What is NumPy?
NumPy is a Python library designed to make it easy for us to work with arrays. Basic Python has no built-in array type; the closest equivalent is the list, which offers limited support for numerical work. For example, the + operator joins two lists end to end instead of adding their elements, and the - operator is not defined for lists at all, so even simple mathematical operations require explicit loops.
NumPy solves this problem for us. With NumPy, we can perform various complex operations on arrays, including shape manipulation, basic linear algebra, random simulation, and more.
The main data structure in NumPy is an array object known as the ndarray. An ndarray is a multidimensional array of elements that all share the same data type. A 2D ndarray, for instance, can be thought of as an array whose elements are themselves 1D arrays.
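To make the difference concrete, the short sketch below contrasts the + operator on Python lists with the same operator on NumPy arrays. The variable names (list_a, arr_a, and so on) are made up for illustration, and the np alias used here is explained in the next section.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [10, 20, 30]

# With Python lists, + joins the two lists end to end
joined = list_a + list_b
print(joined)          # [1, 2, 3, 10, 20, 30]

# With NumPy arrays, + adds the elements pair by pair
arr_a = np.array(list_a)
arr_b = np.array(list_b)
summed = arr_a + arr_b
print(summed)          # [11 22 33]

# Other arithmetic works element by element as well
print(arr_b - arr_a)   # [ 9 18 27]
print(arr_a * 2)       # [2 4 6]
```

Trying list_b - list_a with the plain lists would raise a TypeError, which is why numerical code usually converts lists to arrays first.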
Datasets in machine learning are frequently stored as two-dimensional ndarrays. Therefore, familiarity with NumPy is an essential skill for any data scientist.
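As a rough illustration of what such a dataset might look like, the sketch below stores a tiny table in a two-dimensional array. The housing data and the name houses are invented for this example; the np alias and the array() function are covered in the sections that follow.

```python
import numpy as np

# Hypothetical dataset: each row is one house, and the columns are
# [size in square metres, number of bedrooms, age in years]
houses = np.array([
    [70.0, 2, 15],
    [120.5, 4, 3],
    [55.0, 1, 40],
])

print(houses.ndim)    # 2, i.e. two dimensions (rows and columns)
print(houses.dtype)   # float64: every element shares one data type
```

Because the input mixes integers and floating-point numbers, NumPy stores all of the values as floating-point numbers so that the array has a single data type.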
Importing the NumPy Library
To work with NumPy, we need to import it.
First, create a new notebook in Google Colab and name it NumPy.ipynb. Next, add the following code to the first cell and run it:
import numpy as np
It is customary to use np as the alias when importing NumPy; we’ll follow the same convention in this book. (Note that this import statement does not produce any output when you run it.)
Creating a NumPy Array
After importing NumPy, we can use it to create an ndarray (also known as a NumPy array, or simply an array).
There are two main ways to do it. The first is to convert a Python array-like structure (such as a Python list or tuple) to an ndarray using the array() function. Let’s look at some examples:
list1 = [1, 2, 3, 4]
list2 = [[1, 2, 3, 4]]
list3 = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
arr1 = np.array(list1)
arr2 = np.array(list2)
arr3 = np.array(list3)
print(type(arr1), type(arr2), type(arr3))
Here, we first declare and initialize three Python lists (list1, list2, and list3).
Notice that list1 and list2 are very similar, except that list1 has one set of square brackets while list2 has two.
list1 is a one-dimensional (1D) list with four elements.
list2, on the other hand, is a two-dimensional (2D) list with one element. This element – [1, 2, 3, 4] – is a list itself and consists of four elements.
list3 is also a 2D list, with three nested lists of four elements each.
After declaring the lists, we pass them to the NumPy array() function to convert them to ndarrays. We then print the data types of the resulting arrays. If you run the code above, you’ll get the following output:
<class 'numpy.ndarray'> <class 'numpy.ndarray'> <class 'numpy.ndarray'>
This output indicates we have successfully converted list1, list2, and list3 to NumPy arrays. We can print the shapes of these ndarrays. The shape of an ndarray (stored in the shape attribute) is given as a tuple and tells us the number of elements along each dimension of the array:
print(arr1.shape)
print(arr2.shape)
print(arr3.shape)
This gives us the following output:
(4,)
(1, 4)
(3, 4)
(4,) tells us that arr1 is a 1D array with four elements, while (1, 4) tells us that arr2 is a 2D array with one nested array; this nested array has four elements.
Finally, the last line (3, 4) tells us that arr3 is a 2D array with three nested arrays; each nested array has four elements.
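The examples above use lists only, but a tuple converts in the same way. The sketch below shows this with a made-up tuple (tuple_data) and also prints two related attributes, ndim and size, which are not discussed in the text above: ndim is the number of dimensions and size is the total number of elements.

```python
import numpy as np

# A tuple of tuples converts the same way as a list of lists
tuple_data = ((1, 2, 3), (4, 5, 6))
arr_from_tuple = np.array(tuple_data)

print(type(arr_from_tuple))   # <class 'numpy.ndarray'>
print(arr_from_tuple.shape)   # (2, 3)
print(arr_from_tuple.ndim)    # 2
print(arr_from_tuple.size)    # 6
```

Note that the length of the shape tuple equals ndim, and multiplying the entries of the shape tuple gives size (2 x 3 = 6 here).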
The example above shows how to use the array() function to convert Python built-in structures to ndarrays. Next, let’s learn to create NumPy arrays from scratch. To do that, we can use different predefined functions in the NumPy library. An example is the linspace() function.
linspace(start, stop, num) gives us a NumPy array of num evenly spaced numbers within the interval from start (inclusive) to stop (inclusive). Let’s look at an example:
arr4 = np.linspace(0, 10, 5)
print(arr4)
This example gives us 5 evenly spaced numbers from 0 to 10. If you run the example, you’ll get the following output:
[ 0.   2.5  5.   7.5 10. ]
linspace() generates floating-point numbers by default. In addition, NumPy arrays are printed without commas between their elements. Hence, we get decimal points after the whole numbers (e.g., 0.) and no commas in the output above.
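The sketch below explores linspace() a little further and shows a few of the other predefined creation functions. The endpoint parameter, zeros(), ones(), and arange() are part of NumPy but are not covered in the text above, and the variable names are made up for illustration.

```python
import numpy as np

# The gap between neighbouring values is (stop - start) / (num - 1)
points = np.linspace(0, 10, 5)
print(points[1] - points[0])     # 2.5

# endpoint=False leaves stop out of the result
open_points = np.linspace(0, 10, 5, endpoint=False)
print(open_points)               # [0. 2. 4. 6. 8.]

# A few other functions that create arrays from scratch
zeros = np.zeros((2, 3))         # 2 x 3 array filled with 0.0
ones = np.ones(4)                # 1D array of four 1.0 values
steps = np.arange(0, 10, 2)      # like range(): stop is excluded
print(steps)                     # [0 2 4 6 8]
```

One way to choose between the two: linspace() is convenient when we know how many values we want, while arange() is convenient when we know the step between them.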