Looking to get started with NumPy? This guide will teach you the basics of NumPy arrays in Python.
As a first step, you’ll learn how NumPy arrays work differently than Python lists. You’ll then learn several ways to create NumPy arrays and perform basic operations on them.
Basics of NumPy Arrays
NumPy is one of the most popular Python libraries for scientific computing and data analysis. The basic data structures in NumPy are N-dimensional arrays (N-D arrays). They have broadcasting capabilities and allow us to vectorize operations for speed and use built-in mathematical functions for performance improvement.
To start working with NumPy, you should first install the library and import it into your working environment. It is available as a PyPI package that is installable through pip.
To install NumPy, open up your terminal and run the following command:
pip3 install numpy
After installing NumPy, you can import it into your working environment under an alias. The usual alias is np:
import numpy as np
Note: Importing NumPy under the alias np is not a requirement but a recommended convention.
Python Lists vs. NumPy Arrays
Consider the following Python list of numbers:
py_list = [1,2,3,4]
You can get a NumPy array from an existing list by calling the np.array() function with the list as the argument.
np_arr1 = np.array(py_list)
print(np_arr1)
[1 2 3 4]
To check the type of np_arr1, call the built-in type() function. You’ll see that it’s ndarray, the fundamental data structure in NumPy.
type(np_arr1) # numpy.ndarray
Though the Python list and the NumPy array may look similar, there are certain differences:
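- A Python list can hold objects of different data types, whereas a NumPy array contains elements of the same data type. np.array() infers the data type from the input (an integer type for py_list above), while creation functions such as np.zeros() default to float with a precision of 64 bits (float64).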
- The elements of a Python list are not necessarily stored in contiguous locations in memory. However, the elements of a NumPy array are stored in a contiguous block in memory. As a result, it is faster to look up and access elements.
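The single-data-type rule can be seen in a small sketch. The names mixed_list, int_arr, and mixed_arr are illustrative only and do not appear elsewhere in this guide.

```python
import numpy as np

# A list keeps each element's own Python type.
mixed_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in mixed_list]

# np.array() picks one dtype that can represent every element.
int_arr = np.array([1, 2, 3])
mixed_arr = np.array(mixed_list)

print(list_types)       # ['int', 'float', 'int']
print(int_arr.dtype)    # an integer dtype (the exact width depends on the platform)
print(mixed_arr.dtype)  # float64
```

Mixing integers and floats makes NumPy store every element as a float, so the integer 1 becomes 1.0 inside mixed_arr.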
Let’s go over a couple of other differences.
A powerful feature of NumPy arrays is broadcasting. Suppose we’d like to add 2 to all the elements of a list or an array.
Let’s try adding 2 to py_list and see what happens:
>>> py_list + 2
We see that we get a TypeError stating that we can only concatenate two lists, and adding py_list + 2 like this is not supported.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-c0f9974899df> in <module>
----> 1 py_list + 2

TypeError: can only concatenate list (not "int") to list
Let’s try the same operation on the array, np_arr1:
>>> np_arr1 + 2
In the result, we see that 2 has been added to each element of the array.
array([3, 4, 5, 6])
This is because NumPy implicitly broadcast the scalar 2 to an array of compatible shape to yield this result.
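Broadcasting is not limited to scalars. As a minimal sketch, assuming a 2D array named matrix and a 1D array named row (both made up for this illustration), a row whose length matches the number of columns is stretched across every row of the matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The row is broadcast across both rows of the matrix.
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]
```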
NumPy arrays support vectorization for faster element-wise operations. Suppose we’d like to find the element-wise sum of two sequences of numbers.
Using a simple + operation on the list would return the concatenation of the two lists (which is not what we want!).
>>> py_list + py_list # [1, 2, 3, 4, 1, 2, 3, 4]
But the same operation on the NumPy array, np_arr1, returns the element-wise sum of np_arr1 with itself.
>>> np_arr1 + np_arr1 # array([2, 4, 6, 8])
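For comparison, here is a rough sketch of what the element-wise sum takes with plain lists: an explicit loop or comprehension. The names list_sum and array_sum are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_arr1 = np.array(py_list)

# With lists, an element-wise sum needs an explicit loop or comprehension.
list_sum = [a + b for a, b in zip(py_list, py_list)]

# With arrays, the + operator is applied element by element.
array_sum = np_arr1 + np_arr1

print(list_sum)   # [2, 4, 6, 8]
print(array_sum)  # [2 4 6 8]
```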
Similarly, nested lists may look similar in structure to an N-dimensional NumPy array. However, the differences discussed so far hold.
nested_list = [[1,2],[3,4],[5,6]]
np_arr2 = np.array(nested_list)
print(np_arr2)
[[1 2]
 [3 4]
 [5 6]]
How to Create NumPy Arrays
You can always create NumPy arrays from existing Python lists using np.array(list-obj). However, building a list first is not the most efficient way to create an array.
Instead, you can use several built-in functions that let you create arrays of a specific shape. The shape of the array is a tuple that denotes the size of the array along each dimension. For example, the shape of a 2×2 array with two rows and two columns is (2,2). In this section, we’ll learn how to use some of these built-in functions.
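To make the idea of shape concrete, the short sketch below inspects a small 2D array built from a nested list. Besides shape, it also prints the ndim and size attributes, which report the number of dimensions and the total number of elements.

```python
import numpy as np

np_arr2 = np.array([[1, 2], [3, 4], [5, 6]])

print(np_arr2.shape)  # (3, 2): three rows, two columns
print(np_arr2.ndim)   # 2
print(np_arr2.size)   # 6
```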
Creating Arrays of Zeros and Ones
It’s often helpful to create an array of specific dimensions populated with all zeros or all ones, and then modify it in subsequent steps in the program.
We can use the zeros() function to create an array of zeros. Pass in the shape of the required array as a tuple:
array0 = np.zeros((3,3))
print(array0)
Here’s the output, a 2D array of zeros:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
You can access attributes of the NumPy array, such as dtype and shape, using the dot notation, as shown below:
print(array0.dtype) # float64
print(array0.shape) # (3, 3)
To get an array of ones, you can use the np.ones() function:
array1 = np.ones((3,3))
print(array1)
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
Creating an Identity Matrix
The identity matrix is widely used in several applications in linear algebra. You can use the np.eye() function to create an identity matrix. In its simplest form, np.eye() takes one argument: the order of the matrix.
arrayi = np.eye(3)
print(arrayi)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
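As a quick sanity check of what makes the identity matrix useful, the sketch below multiplies it with an arbitrary 3×3 matrix using the @ operator. The values in matrix are made up for illustration.

```python
import numpy as np

identity = np.eye(3)
matrix = np.array([[2., 0., 1.],
                   [1., 3., 5.],
                   [4., 1., 6.]])

# Multiplying by the identity matrix leaves the other matrix unchanged.
product = identity @ matrix
print(np.array_equal(product, matrix))  # True
```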
Creating Arrays of Random Numbers
You can also create arrays of a specific shape populated with random numbers drawn from specific distributions. The commonly used probability distributions are the uniform distribution and the standard normal distribution.
The randn() function, which is part of NumPy’s random module, can be used to generate arrays of numbers that are sampled from a standard normal distribution. The standard normal distribution is a Gaussian distribution with zero mean and unit variance.
std_arr = np.random.randn(3,4)
print(std_arr)
[[-0.13604072  1.21884359  2.06850932  0.78212093]
 [ 0.44314719 -0.78084801 -0.70517138  1.17984949]
 [ 1.13214829  1.02339351  0.15317809  1.83191128]]
np.random.rand() returns an array of numbers sampled from a uniform distribution over the interval [0,1).
uniform_arr = np.random.rand(2,3)
print(uniform_arr)
[[0.90470384 0.18877441 0.10021817]
 [0.741 0.10657658 0.71334643]]
You can also create an array of random integers using the randint() function that is part of NumPy’s random module.
np.random.randint(low, high, size) returns an array of integers. The shape of the array is inferred from the size argument and the integers take on values in the interval [low, high).
Here’s an example:
int_arr = np.random.randint(1,100,(2,3))
print(int_arr)
[[53 89 33]
 [24 85 33]]
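One detail worth checking is that the upper bound passed to randint() is excluded. The sketch below draws many integers with low=1 and high=4 and lists the distinct values that occur. The seed value 0 and the variable names are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable

# Draw 1000 integers with low=1 and high=4.
draws = np.random.randint(1, 4, 1000)

# Only 1, 2, and 3 appear: the upper bound 4 is never drawn.
print(np.unique(draws))  # [1 2 3]

# The size argument can also be a tuple that gives the array's shape.
sample = np.random.randint(1, 100, (2, 3))
print(sample.shape)  # (2, 3)
```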
Other Useful Built-In Functions
Next, let’s go over a few other helpful functions to create NumPy arrays.
The arange() function returns an array of numbers between a start and a stop value in steps of a step value: start, start + step, start + 2*step, and so on, up to but not including stop. The start and the step values are optional. The default step size is 1 and the default start value is 0.
In this example, array_a is an array of numbers starting at 1 going up to but not including 10 in steps of 0.5.
array_a = np.arange(1,10,0.5)
print(array_a)
[1. 1.5 2. 2.5 3. 3.5 4. 4.5 5. 5.5 6. 6.5 7. 7.5 8. 8.5 9. 9.5]
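The defaults for start and step are easiest to see side by side. This small sketch calls arange() with one, two, and three arguments. The variable names are illustrative.

```python
import numpy as np

only_stop = np.arange(5)         # start defaults to 0, step defaults to 1
start_stop = np.arange(2, 6)     # step defaults to 1
all_three = np.arange(0, 10, 3)  # explicit start, stop, and step

print(only_stop)   # [0 1 2 3 4]
print(start_stop)  # [2 3 4 5]
print(all_three)   # [0 3 6 9]
```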
You can also create arrays of evenly spaced numbers using the linspace() function. Use np.linspace(start, stop, num) to get an array of num evenly spaced numbers between the start and stop values.
array_lin is an array of 5 evenly spaced numbers in the interval [1,10].
array_lin = np.linspace(1,10,5)
print(array_lin)
[ 1. 3.25 5.5 7.75 10. ]
array_lin2 is an array of 10 evenly spaced numbers in the interval [1,20].
array_lin2 = np.linspace(1,20,10)
print(array_lin2)
[ 1. 3.11111111 5.22222222 7.33333333 9.44444444 11.55555556 13.66666667 15.77777778 17.88888889 20. ]
💡 Unlike the arange() function, the linspace() function includes the endpoint by default.
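If you need linspace() to behave like arange() and leave out the stop value, you can pass endpoint=False. The following sketch compares the two settings on the interval from 0 to 1. The variable names are illustrative.

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # stop value included
without_end = np.linspace(0, 1, 5, endpoint=False)  # stop value excluded

print(with_end)     # [0.   0.25 0.5  0.75 1.  ]
print(without_end)  # [0.  0.2 0.4 0.6 0.8]
```

With endpoint=False the spacing changes from 0.25 to 0.2, because the same five points now cover the interval without reaching the stop value.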
Basic Operations on NumPy Arrays
Next, let’s go over some of the basic operations on NumPy arrays.
Finding the Minimum and Maximum Elements
Whenever we use functions from NumPy’s random module to create arrays, we’ll get a different result each time the code is run. To get reproducible results, we should set a seed.
In the following example, I have set the seed for reproducibility; int_arr1 is an array of seven random integers in the interval [1,100).
np.random.seed(27)
int_arr1 = np.random.randint(1,100,7)
print(int_arr1)
# [20 57 73 32 57 38 25]
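A minimal sketch of what setting a seed buys you: resetting the same seed before each call reproduces the same array. The names first and second are illustrative.

```python
import numpy as np

np.random.seed(27)
first = np.random.randint(1, 100, 7)

np.random.seed(27)  # reset to the same seed
second = np.random.randint(1, 100, 7)

# Both draws start from the same generator state, so they match.
print(np.array_equal(first, second))  # True
```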
- To find the maximum element in the array, you can call the max() method on the array object.
- To find the minimum element in the array, you can call the min() method on the array object.
int_arr1.max() # 73
int_arr1.min() # 20
Finding the Index of the Maximum and Minimum Elements
Sometimes, you may need to find the index of the maximum and the minimum elements. To do this, you can call the argmax() and the argmin() methods on the array object.
Here, the maximum element 73 occurs at index 2.
int_arr1.argmax() # 2
And the minimum element 20 occurs at index 0.
int_arr1.argmin() # 0
You can also use np.argmax(array) and np.argmin(array) to find the indices of the maximum and minimum elements, respectively.
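The function forms can be sketched as follows. To keep the example independent of the random seed, the array is written out with the same values as int_arr1 above, and the returned indices are used to look the elements up again.

```python
import numpy as np

# Same values as int_arr1 in the example above, written out explicitly.
int_arr1 = np.array([20, 57, 73, 32, 57, 38, 25])

max_idx = np.argmax(int_arr1)
min_idx = np.argmin(int_arr1)

print(max_idx, int_arr1[max_idx])  # 2 73
print(min_idx, int_arr1[min_idx])  # 0 20
```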
How to Concatenate NumPy Arrays
Another common operation that you may want to do with NumPy arrays is concatenation.
Vertical Concatenation Using vstack
You can concatenate arrays vertically using the vstack() function.
Here is an example. arr1 is an array of ones with two rows and three columns, and arr2 is an array of zeros with two rows and three columns.
arr1 = np.ones((2,3))
arr2 = np.zeros((2,3))
We can concatenate these two arrays vertically using the vstack() function as shown:
np.vstack((arr1,arr2))
array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.]])
As the stacking happens vertically, the two arrays should have the same number of columns.
Let’s change arr2 to be of shape (2,2). It now has two rows and two columns.
arr1 = np.ones((2,3))
arr2 = np.zeros((2,2))
np.vstack((arr1,arr2))
The column counts no longer match, so vertical concatenation is not possible, and we get a ValueError.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-d5d3bf37fc21> in <module>
----> 1 np.vstack((arr1,arr2))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
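In a script, you may prefer to handle this error rather than let it stop the program. One possible pattern, shown here only as a sketch, wraps the call in try/except and falls back to None. The stacked variable and the printed message are illustrative choices.

```python
import numpy as np

arr1 = np.ones((2, 3))
arr2 = np.zeros((2, 2))

try:
    stacked = np.vstack((arr1, arr2))
except ValueError:
    # The column counts differ (3 vs. 2), so vstack() raises ValueError.
    stacked = None
    print("Cannot stack arrays of shapes", arr1.shape, "and", arr2.shape)
```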
Horizontal Concatenation Using hstack
You can concatenate NumPy arrays horizontally using the hstack() function, as shown below.
arr1 = np.ones((3,3))
arr2 = np.zeros((3,2))
Because the stacking happens horizontally, the input arrays should have the same number of rows. Here, both arr1 and arr2 have three rows.
np.hstack((arr1,arr2))
array([[1., 1., 1., 0., 0.],
       [1., 1., 1., 0., 0.],
       [1., 1., 1., 0., 0.]])
You can also concatenate NumPy arrays along a specific axis using the concatenate() function. Set the optional axis argument to the axis you’d like to concatenate along; the default value of the axis is zero.
Here are a few examples:
arr1 = np.ones((2,3))
arr2 = np.zeros((2,3))
When we don’t specify the axis to concatenate along, the arrays are concatenated along axis 0. In the resultant array, the second array arr2 is added (as rows) below the first array.
np.concatenate((arr1,arr2))
array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.]])
When we specify axis = 1, we get the following result. Now, arr2 is concatenated (as columns) beside the first array, arr1.
np.concatenate((arr1,arr2),axis=1)
array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])
As with the hstack() and vstack() functions, the arrays must have matching sizes along every axis except the one you concatenate along.
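For 2D arrays like the ones in this section, the stacking functions and concatenate() produce the same results. The sketch below checks this with np.array_equal(). The names same_v and same_h are illustrative.

```python
import numpy as np

arr1 = np.ones((2, 3))
arr2 = np.zeros((2, 3))

# For 2D inputs, vstack() matches concatenation along axis 0 ...
same_v = np.array_equal(np.vstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=0))

# ... and hstack() matches concatenation along axis 1.
same_h = np.array_equal(np.hstack((arr1, arr2)),
                        np.concatenate((arr1, arr2), axis=1))

print(same_v, same_h)  # True True
```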
In this tutorial, you’ve learned the differences between NumPy arrays and Python lists, with a focus on the advantages of N-dimensional arrays in terms of speed and efficiency.
You’ve also learned several useful functions to create arrays of a particular dimension and perform common operations, such as finding the minimum and the maximum elements, concatenating arrays, and more.
Next, learn how to reshape NumPy arrays.