It is always beautiful to see similarities across different programming and query languages. It is also intuitive, since the fundamentals remain the same. In this article, we will introduce the concepts of a shallow copy (a view) and a deep copy in NumPy arrays. First, however, let’s introduce NumPy arrays and their advantages with the help of a matrix multiplication example. Later on, we will explore the nuances of copies in a NumPy array.
NumPy is a core library for scientific computing in Python. In scientific computing, vectors, matrices and tensors form the building blocks. In Python, there are two ways of dealing with these entities: either with basic data structures like lists or with NumPy arrays. To appreciate the importance of NumPy arrays, let us first perform a simple matrix multiplication without them. Later on, we will use NumPy and see the contrast for ourselves.
Matrix Multiplication without NumPy
Without a numpy array, we create a matrix using a list of lists.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Now, in order to multiply, we can write a function as shown below.
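def matmul(A, B):
    # Rows = length of the outer list; columns = length of an inner list.
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    if cols_A == rows_B:
        # Start from a rows_A x cols_B matrix of zeros and accumulate.
        product = [[0 for cols in range(cols_B)] for row in range(rows_A)]
        for i in range(rows_A):
            for j in range(cols_B):
                for k in range(rows_B):
                    product[i][j] += A[i][k] * B[k][j]
    else:
        # Inner dimensions do not match.
        product = 'Not Possible'
    return product

matmul(A, B)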
Since B is the identity matrix, the result is simply A:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Furthermore, let us analyse the performance of the user-defined function matmul with the IPython magic command %timeit. Please keep the resulting number in mind: an average of 23 microseconds per call.
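Outside IPython or Jupyter, where the %timeit magic is not available, the standard-library timeit module gives a comparable measurement. The following is only a minimal sketch: the batch size (n_calls) and the number of repeats are arbitrary choices, and the figure it prints depends on the machine, so it will not necessarily match the 23 microseconds quoted above.

```python
import timeit

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(A, B):
    """Multiply two matrices stored as lists of lists."""
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    if cols_A != rows_B:
        return 'Not Possible'
    product = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(rows_B):
                product[i][j] += A[i][k] * B[k][j]
    return product

# Run 5 batches of n_calls calls each and keep the fastest batch,
# which is the measurement least disturbed by other processes.
n_calls = 2_000  # arbitrary batch size chosen for this sketch
best = min(timeit.repeat(lambda: matmul(A, B), number=n_calls, repeat=5))
per_call_us = best / n_calls * 1e6
print(f"pure Python matmul: {per_call_us:.1f} microseconds per call")
```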
Matrix multiplication is one of the simplest operations in linear algebra. Intuitively, we can deduce that for more complex tensor operations, the code complexity will increase as well. The NumPy library was created to ‘simplify’ and ‘optimize’ such linear algebra operations in Python.
Matrix Multiplication with NumPy arrays
Firstly, let us focus on the simplification aspect of NumPy arrays. The code snippet below creates two arrays.
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
With NumPy arrays, matrix multiplication is a one-liner.
Product = np.matmul(A,B)
Once again, the result of the matrix multiplication is the matrix A itself, since B is the identity matrix.
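As a quick check, the sketch below prints the product and compares it with the @ operator, which performs the same matrix multiplication for 2-D arrays. It also shows what happens with incompatible shapes: where the hand-written function returned the string 'Not Possible', NumPy raises a ValueError. The array C is a hypothetical example that does not appear in the article.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # identity matrix

Product = np.matmul(A, B)
print(Product)

# For 2-D arrays, the @ operator gives the same result as np.matmul.
same_as_operator = np.array_equal(Product, A @ B)

# Multiplying by the identity matrix leaves A unchanged.
unchanged = np.array_equal(Product, A)

# Hypothetical mismatched operand: (3, 3) times (2, 3) is not defined,
# because the inner dimensions 3 and 2 differ.
C = np.ones((2, 3))
shape_error = None
try:
    np.matmul(A, C)
except ValueError as exc:
    shape_error = exc
    print("shape mismatch:", exc)
```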
However, the more pertinent contrast with the traditional list-of-lists approach is with regard to performance. Let us analyze the performance of this approach.
Whoa! We can see that with NumPy, the performance is better by a whopping factor of 11.
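The factor of 11 was measured on 3x3 matrices, where the fixed cost of each function call is likely a large share of the total time. To see how the comparison behaves on larger inputs, one could repeat it along the lines of the rough sketch below. The matrix size n = 30, the random seed and the run counts are illustrative assumptions, not values from the article, and the speedup printed will vary from machine to machine.

```python
import timeit
import numpy as np

def matmul(A, B):
    """Pure-Python matrix product of two lists of lists."""
    rows_A, cols_A = len(A), len(A[0])
    rows_B, cols_B = len(B), len(B[0])
    if cols_A != rows_B:
        return 'Not Possible'
    product = [[0 for _ in range(cols_B)] for _ in range(rows_A)]
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(rows_B):
                product[i][j] += A[i][k] * B[k][j]
    return product

n = 30                          # hypothetical matrix size
rng = np.random.default_rng(0)  # fixed seed for repeatability
A_np = rng.random((n, n))
B_np = rng.random((n, n))
A_list, B_list = A_np.tolist(), B_np.tolist()  # same data as nested lists

runs = 20  # calls per timing batch
t_list = min(timeit.repeat(lambda: matmul(A_list, B_list),
                           number=runs, repeat=3)) / runs
t_np = min(timeit.repeat(lambda: np.matmul(A_np, B_np),
                         number=runs, repeat=3)) / runs

print(f"list of lists: {t_list * 1e6:.1f} microseconds per call")
print(f"NumPy:         {t_np * 1e6:.1f} microseconds per call")
print(f"speedup:       {t_list / t_np:.0f}x")
```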
Having said that, let us focus on a few other aspects of the numpy array, like views and copies. Let us understand the concept of a view first.
If you are familiar with SQL, you will know that a view is the result of a stored query. It is not a physical table, but a semantic layer on top of one. Similarly, in the case of a NumPy array, a view is the result of an expression such as slicing an array. Let us dive into an example.
import numpy as np
Original = np.arange(10)
Original
View = Original[::2]
View
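Here ‘View’ is a semantic layer, or shallow copy, created by slicing ‘Original’. Slicing does not copy the data: ‘View’ is a new array object, but it reads from and writes to the same memory buffer as ‘Original’. To verify this, we can use the NumPy function shares_memory, which returns True for these two arrays.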
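The check can be sketched as follows. Besides np.shares_memory, the snippet inspects two attributes the article does not mention, base and flags.owndata, which are further ways of seeing that the slice does not own its data.

```python
import numpy as np

Original = np.arange(10)
View = Original[::2]

# True: both arrays read from the same memory buffer.
shared = np.shares_memory(Original, View)
print(shared)

# A view records the array it was derived from in its .base attribute.
is_based_on_original = View.base is Original
print(is_based_on_original)

# False: the view does not own the data it exposes.
owns_data = bool(View.flags.owndata)
print(owns_data)
```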
There is another way to verify this. Let us modify the first element of the view.
View[0] = 10
It is evident that a modification of the view also modifies the original, thus confirming that ‘Original’ and ‘View’ share the same underlying data.
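A short sketch of this write-through behaviour is given below. Note that the write has to go through indexing: assigning to the bare name only rebinds that name to a new object and touches no array. The name Rebound is a hypothetical variable used purely to illustrate that difference.

```python
import numpy as np

Original = np.arange(10)
View = Original[::2]

# Indexed assignment writes into the buffer shared with Original.
View[0] = 10
print(Original.tolist())  # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(View.tolist())      # [10, 2, 4, 6, 8]

# Plain assignment rebinds the name instead of modifying any array.
Rebound = Original[::2]
Rebound = 99              # Rebound is now an int; Original is untouched
print(Original.tolist())  # [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```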
To elaborate, this is similar to the concept of a shallow copy in C++, where two pointers reference the same object. However, I find it much more similar to the concept of views in SQL.
As opposed to views, copies duplicate the data. When you create a copy of a slice, the selected elements are copied and stored in a different memory location. This is similar to the concept of a deep copy in C++. Let us understand this with an example.
import numpy as np
Original = np.arange(10)
Original
Now, create a copy using the copy() method.
Copy = Original[::2].copy()
However, checking the two arrays with shares_memory again brings out the stark contrast between a view and a copy.
This time the check returns False: ‘Original’ and ‘Copy’ are stored at different memory locations.
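The same checks as for the view can be sketched for the copy. The base and flags.owndata attributes are again additions for illustration and are not used in the article itself.

```python
import numpy as np

Original = np.arange(10)
Copy = Original[::2].copy()

# False: the copy has its own memory buffer.
shared = np.shares_memory(Original, Copy)
print(shared)

# A copy is not derived from another array, so its .base is None.
has_no_base = Copy.base is None
print(has_no_base)

# True: the copy owns its data.
owns_data = bool(Copy.flags.owndata)
print(owns_data)
```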
Lastly, let us manipulate the ‘Copy’ and see if it affects the ‘Original’ array.
Copy[0] = 10
It is evident that manipulating ‘Copy’ did not change ‘Original’.
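To put the two behaviours side by side, the sketch below creates a view and a copy from the same array and writes to each. The particular indices and values written are arbitrary choices for illustration.

```python
import numpy as np

Original = np.arange(10)
View = Original[::2]         # shares data with Original
Copy = Original[::2].copy()  # owns a separate buffer

Copy[0] = 10  # changes only Copy
View[1] = 20  # writes through to Original[2]

print(Original.tolist())  # [0, 1, 20, 3, 4, 5, 6, 7, 8, 9]
print(Copy.tolist())      # [10, 2, 4, 6, 8]
print(View.tolist())      # [0, 20, 4, 6, 8]
```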
The two examples above, matrix multiplication and views, show that NumPy arrays are designed for efficiency in both speed and memory: array operations replace slow Python loops, and views avoid unnecessary copies of data. Do try to play around with the examples. I hope this article helps.
Featured Image Credit: By David Cournapeau