NumPy Tutorial for Beginners

Selecting Data from a NumPy Array

As with Python lists, we can access the elements of a NumPy array using their indexes. Recall that indexes start from 0 and that negative indexes count from the end of the array, so -1 refers to the last element.

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
print(arr1[2])   # third element
print(arr1[-1])  # last element

The code above gives us

3
5

as the output. The next example shows how we can access elements in a 2D ndarray:

arr2 = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
print(arr2[0])
print(arr2[0][1])
print(arr2[0, 1])

Here, arr2 is a 2D ndarray with two elements: ['a', 'b', 'c'] and ['d', 'e', 'f'].

arr2[0] gives us the first element in arr2. In other words, it gives us the array ['a', 'b', 'c'].

To access the elements in this array, we can use two sets of square brackets or a comma. For instance, to access the second element in arr2[0], we use arr2[0][1] or arr2[0, 1].

If you run the code above, you’ll get the following output:

['a' 'b' 'c']
b
b
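As a quick check of the two notations, here is a minimal sketch. The variable names (letters, same, last) are illustrative and not part of the example above. It confirms that both forms select the same element, and that negative indexes can be used on each axis of a 2D array.

```python
import numpy as np

# Same data as arr2 above, under a hypothetical name
letters = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])

# Both notations select row 0, column 1
same = letters[0][1] == letters[0, 1]
print(same)  # True

# Negative indexes work on each axis: last row, last column
last = letters[-1, -1]
print(last)  # f
```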

Next, we can slice a NumPy array. To do that, we use the [start:stop:step] notation. This gives us elements from index start to stop (including start but excluding stop), with a step of step.

start and step have default values of 0 and 1, respectively. The default value for stop is the length of the array.

arr3 = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print(arr3[1:6])

Here, we use the slice 1:6 to select the elements from index 1 (i.e., the element 'b') to index 5 (i.e., the element 'f'). The element at index 6 is excluded. This gives us the following output:

['b' 'c' 'd' 'e' 'f']

If we add a step of 2 to the slice, we’ll get every second element from index 1 to 5, that is, the elements at indexes 1, 3, and 5. In other words, print(arr3[1:6:2]) gives us:

['b' 'd' 'f']
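The default values described above mean that parts of the slice can be left out. The sketch below uses an illustrative array name (letters) holding the same data as arr3 and shows what happens when start, stop, or both are omitted.

```python
import numpy as np

# Same data as arr3 above, under a hypothetical name
letters = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

first_three = letters[:3]   # start omitted, so it defaults to 0
print(first_three)          # ['a' 'b' 'c']

from_seven = letters[7:]    # stop omitted, so it defaults to the array length
print(from_seven)           # ['h' 'i' 'j']

every_third = letters[::3]  # only step given: indexes 0, 3, 6, 9
print(every_third)          # ['a' 'd' 'g' 'j']
```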

ndarray Methods

An ndarray comes with many useful methods defined in the ndarray class. To use these methods, we write the array name, followed by the dot operator and the name of the method. Let’s look at some examples.

sum() and mean()

The sum() and mean() methods give us the sum and mean of an array, respectively.

Let’s use sum() as an illustration. If we use sum() without specifying the axis, it returns the sum of all the elements in the array. On the other hand, if we specify axis=0, it sums the elements in each column, and if we specify axis=1, it sums the elements in each row. An example is shown below:

arr1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr1.sum())
print(arr1.sum(axis=0))
print(arr1.sum(axis=1))

This gives us the following output:

36
[ 6  8 10 12]
[10 26]

arr1.sum() gives us 36 as 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36.

arr1.sum(axis=0) gives us [6 8 10 12] as 1 + 5 = 6, 2 + 6 = 8, 3 + 7 = 10, and 4 + 8 = 12.

Finally, arr1.sum(axis=1) gives us [10 26] as 1 + 2 + 3 + 4 = 10 and 5 + 6 + 7 + 8 = 26.
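mean() accepts the axis argument in the same way. The following sketch repeats the example with mean() instead of sum(); the variable names are illustrative.

```python
import numpy as np

# Same data as arr1 above, under a hypothetical name
table = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

overall = table.mean()           # mean of all eight elements: 36 / 8
print(overall)                   # 4.5

per_column = table.mean(axis=0)  # (1 + 5) / 2, (2 + 6) / 2, ...
print(per_column)                # [3. 4. 5. 6.]

per_row = table.mean(axis=1)     # 10 / 4 and 26 / 4
print(per_row)                   # [2.5 6.5]
```

Note that the means are floating-point numbers even though the original array holds integers.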

reshape()

reshape() gives a new shape to a NumPy array without changing its data.

Recall that the shape of a 2D array is represented as a tuple with two values (refer to Section 2.3). For instance, the shape of [[1, 2, 3], [4, 5, 6]] is (2, 3), which tells us that this array consists of two rows and three columns.

Suppose we have a 1D array with 8 elements and we want to reshape it to a (4, 2) 2D array. The example below shows how we can do it:

arr2 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
reshaped_array = arr2.reshape((4, 2))
print(arr2)
print(reshaped_array)

If you run this example, you’ll get the following output:

[1 2 3 4 5 6 7 8]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]

The first line shows the original array, which is not changed by the reshape() method. The remaining lines show the reshaped array, which is a (4, 2) array consisting of 4 rows and 2 columns.

It is important to know how to reshape 1D arrays into 2D arrays because certain methods in the Scikit-Learn library require 2D arrays as input. We’ll learn more about this library in Chapter 5.

A useful trick when reshaping arrays to two-dimensional is to pass (n, -1) or (-1, n) to the reshape() method and let it infer the shape for us. For instance, for the example above, we can pass (4, -1) to reshape arr2 to a 2D array with four rows and an unspecified number of columns (denoted by -1). When we do that, reshape() infers the number of columns and returns a (4, 2) array to us.
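The sketch below illustrates this trick with the same eight values. The variable names are illustrative. It also shows, as an additional observation, that reshape() raises a ValueError when the number of elements cannot be divided evenly into the requested shape.

```python
import numpy as np

# Same data as arr2 above, under a hypothetical name
values = np.array([1, 2, 3, 4, 5, 6, 7, 8])

four_rows = values.reshape((4, -1))  # columns inferred: 8 / 4 = 2
print(four_rows.shape)               # (4, 2)

four_cols = values.reshape((-1, 4))  # rows inferred: 8 / 4 = 2
print(four_cols.shape)               # (2, 4)

# 8 elements cannot be split into 3 equal rows, so this fails
try:
    values.reshape((3, -1))
    fits = True
except ValueError:
    fits = False
print(fits)                          # False
```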

We can also pass (1, -1) to convert a 1D array to a 2D array with one row and an unspecified number of columns. An example is shown below:

arr3 = np.array([1, 2, 3, 4, 5])
reshaped_array_2 = arr3.reshape((1, -1))
print(arr3)
print(reshaped_array_2)

If you run the code above, you’ll get the following output:

[1 2 3 4 5]
[[1 2 3 4 5]]

[1 2 3 4 5] is a 1D array, while [[1 2 3 4 5]] is a 2D array with one row and five columns.
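One way to confirm the difference is to inspect each array's shape and number of dimensions. The sketch below uses the shape and ndim attributes; ndim is not covered in the text above and is included here only as a convenience, and the variable names are illustrative.

```python
import numpy as np

# Same data as arr3 above, under a hypothetical name
flat = np.array([1, 2, 3, 4, 5])
one_row = flat.reshape((1, -1))

print(flat.ndim, flat.shape)        # 1 (5,)
print(one_row.ndim, one_row.shape)  # 2 (1, 5)
```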

NumPy Functions

Next, let’s move on to NumPy functions. In the previous section, we learned about methods that are defined inside the ndarray class.

Besides these methods, NumPy comes with its own set of functions (also known as routines) that are defined outside the ndarray class. To use these functions, we pass the NumPy array as an argument to the function.

concatenate()

A commonly used NumPy function is concatenate(), which can be used to join two or more NumPy arrays.

When we use concatenate() to join arrays, we can specify whether we want to join along axis 0 (which is the default) or axis 1. Joining along axis 0 combines the rows and requires the arrays to have the same number of columns, while joining along axis 1 combines the columns and requires the arrays to have the same number of rows.

Let’s look at some examples:

arr1 = np.array([[1, 2], [3, 4], [5, 6]])
arr2 = np.array([[7, 8]])
combined_array = np.concatenate((arr1, arr2))
print(combined_array)

Here, arr1 can be viewed as a table with three rows and two columns, while arr2 is a table with one row and two columns.

As the two arrays have the same number of columns, we can concatenate them along axis 0. We do that on the third line by passing a tuple of the two arrays – (arr1, arr2) – to the function. If you run the code above, you’ll get the following output:

[[1 2]
 [3 4]
 [5 6]
 [7 8]]

The rows in arr1 and arr2 have been combined to give us a new array with four rows and two columns.

If we want to add columns to this array (combined_array), we need to concatenate it with an array that has four rows (i.e., four nested arrays, one per row). An example is shown below:

arr3 = np.array([[9], [10], [11], [12]])
combined_array = np.concatenate((combined_array, arr3), axis=1)
print(combined_array)

Here, we concatenate combined_array with arr3 along axis 1 and assign the result back to combined_array. If you run the code above, you’ll get the following output:

[[ 1  2  9]
 [ 3  4 10]
 [ 5  6 11]
 [ 7  8 12]]
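To see why the shape requirements matter, the following sketch tries to join two arrays along an axis where their shapes do not match. The variable names are illustrative. NumPy raises a ValueError in this case, which the code catches and reports.

```python
import numpy as np

table = np.array([[1, 2], [3, 4], [5, 6]])  # 3 rows, 2 columns
extra_row = np.array([[7, 8]])              # 1 row, 2 columns

# Same number of columns, so joining along axis 0 works
stacked = np.concatenate((table, extra_row), axis=0)
print(stacked.shape)  # (4, 2)

# Different number of rows (3 vs 1), so joining along axis 1 fails
try:
    np.concatenate((table, extra_row), axis=1)
    joined = True
except ValueError as err:
    joined = False
    print("Cannot join along axis 1:", err)
```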
