Python Topics : NumPy - Basics
Introduction
What is NumPy
NumPy is a Python library used for working with arrays
has functions for working in the domains of linear algebra, Fourier transforms, and matrices
NumPy stands for Numerical Python

Why Use NumPy?
Python lists can be slow to process
NumPy aims to provide an array object that is up to 50x faster than traditional Python lists
the array object in NumPy is called ndarray
provides many supporting functions

Why is NumPy Faster Than Lists?
NumPy arrays are stored in one contiguous block of memory, unlike lists
processes can therefore access and manipulate them very efficiently
this behavior is called locality of reference
NumPy is also optimized to work with the latest CPU architectures
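a rough sketch of how the speed difference could be measured, assuming the standard library timeit module; the array size and repeat count are arbitrary choices, and the timings depend on the machine, so no numbers are claimed here

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# the same computation done two ways: double every element
list_result = [v * 2 for v in py_list]
array_result = np_arr * 2

# time both approaches over the same number of repetitions
list_time = timeit.timeit(lambda: [v * 2 for v in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * 2, number=20)

print('list comprehension:', list_time)
print('NumPy expression  :', array_time)
```

both approaches produce the same values; only the time taken differs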
Getting Started
Installation of NumPy
pip install numpy
Import and Alias NumPy
import numpy as np
Checking NumPy Version
import numpy as np

print(np.__version__)
Creating Arrays
Create a NumPy ndarray Object
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
use a tuple to create a NumPy array
import numpy as np

arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
a dimension in arrays is one level of array depth (nested arrays)
nested arrays are arrays that have arrays as their elements

0-D Arrays
0-D arrays or scalars are the elements in an array
each value in an array is a 0-D array
create 0-D array with value 42
import numpy as np

arr = np.array(42)
print(arr)
1-D Arrays
an array that has 0-D arrays as its elements is called uni-dimensional or 1-D array
most common and basic arrays
create a 1-D array
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
an array that has 1-D arrays as its elements is called a 2-D array
often used to represent matrices or 2nd order tensors
NumPy has a whole submodule dedicated to linear algebra operations on matrices, named numpy.linalg
create a 2-D array containing two arrays
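a minimal sketch of such an array, assuming two inner arrays of three values each (the values are illustrative)

```python
import numpy as np

# two 1-D arrays of equal length nested in one outer list give a 2-D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)
print(arr.ndim)
```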

                
3-D Arrays
an array that has 2-D arrays (matrices) as its elements is called a 3-D array
often used to represent a 3rd order tensor
create a 3-D array with two 2-D arrays each containing two arrays
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check the Number of Dimensions
NumPy arrays provide the ndim attribute
returns an integer indicating how many dimensions the array has
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Higher Dimensional Arrays
an array can have any number of dimensions
can define the number of dimensions when the array is created using the ndmin argument
create an array with 5 dimensions and verify that it has 5 dimensions
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('number of dimensions :', arr.ndim)
Array Indexing
Access Array Elements
array indexing is the same as accessing an array element
you can access an array element by referring to its index number
indexes in NumPy arrays start at 0, so the first element has index 0 and the second has index 1
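a short sketch of index access on a 1-D array; the values are illustrative

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[0])           # first element
print(arr[1])           # second element
print(arr[2] + arr[3])  # indexed elements can be used like ordinary numbers
```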

Access 2-D Arrays
to access elements from 2-D arrays use comma separated integers representing the dimension and the index of the element
2-D arrays are like a table with rows and columns
the dimension represents the row and the index represents the column
access the element on the 2nd row, 5th column
import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('5th element on 2nd row: ', arr[1, 4])
Access 3-D Arrays
to access elements from 3-D arrays use comma separated integers representing the dimensions and the index of the element
access the third element of the second array of the first array
import numpy as np

arr = np.array([ [[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]] ])
print(arr[0, 1, 2])
Negative Indexing
use negative indexing to access an array from the end
print the last element from the 2nd dimension
import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element from 2nd dimension: ', arr[1, -1])
Array Slicing
Slicing arrays
slicing means taking elements from one given index to another given index
pass slice [start:end]
end arg is exclusive
can also define the step [start:end:step]
start default is 0
end default is the length of the array
step default is 1
slice elements from the beginning to index 4 (not included)
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[:4])
Negative Slicing
use the minus operator to refer to an index from the end
slice from the index 3 from the end to index 1 from the end
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[-3:-1])
Step
use the step value to determine the step of the slicing
return every other element from the entire array
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[::2])
Slicing 2-D Arrays
from the second row, slice elements from index 1 to index 4 (not included)
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])
from both rows, slice index 1 to index 4 (not included)
will return a 2-D array
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[0:2, 1:4])
output
[[2 3 4]
 [7 8 9]]
Data Types
Data Types in Python
strings used to represent text data
the text is given under quote marks
integer used to represent integer numbers
float used to represent real numbers
boolean used to represent True or False
complex used to represent complex numbers
Data Types in NumPy
NumPy has some extra data types
NumPy refers to data types with one-character codes

i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type (void)
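the one-character code of an existing array can be read back through dtype.kind; a small sketch, where the samples dictionary and its values are made up for illustration

```python
import numpy as np

# one sample array per character code (illustrative values)
samples = {
    'i': np.array([1, 2, 3]),
    'b': np.array([True, False]),
    'u': np.array([1, 2, 3], dtype='u1'),
    'f': np.array([1.5, 2.5]),
    'c': np.array([1 + 2j]),
    'm': np.array([np.timedelta64(1, 'D')]),
    'M': np.array([np.datetime64('2024-01-01')]),
    'O': np.array([{'a': 1}, [1, 2]], dtype=object),
    'S': np.array([b'abc']),
    'U': np.array(['abc']),
}

# dtype.kind is the one-character code, dtype is the full type including size
for code, arr in samples.items():
    print(code, arr.dtype.kind, arr.dtype)
```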
Checking the Data Type of an Array
the NumPy array object has a property called dtype which returns the data type of the array
get the data type of an array object
import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr.dtype) # int64 on most platforms (the default integer size is platform dependent)
Creating Arrays With a Defined Data Type
use the array() function to create arrays
function can take an optional argument dtype
allows defining the expected data type of the array elements
create an array with data type string
import numpy as np

arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)
output
[b'1' b'2' b'3' b'4']
|S1
for i, u, f, S and U the size can be defined as well
create an array with data type 4 bytes integer
import numpy as np

arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)
output
[1 2 3 4]
int32
What if a Value Can Not Be Converted?
if a type is given to which the elements cannot be cast, NumPy will raise a ValueError
import numpy as np

arr = np.array(['a', '2', '3'], dtype='i')
output
Traceback (most recent call last):
  File "./prog.py", line 3, in <module>
ValueError: invalid literal for int() with base 10: 'a'
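a sketch of how the error could be handled instead of stopping the program, assuming a plain try/except is acceptable; the fallback value None is an arbitrary choice

```python
import numpy as np

# strings that look like integers convert without problems
ok = np.array(['1', '2', '3'], dtype='i')
print(ok)

# a non-numeric string makes the conversion fail
try:
    arr = np.array(['a', '2', '3'], dtype='i')
except ValueError as err:
    arr = None  # arbitrary fallback for this sketch
    print('conversion failed:', err)
```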
Converting Data Type on Existing Arrays
the best way to change the data type of an existing array is to make a copy of the array with the astype() method
astype() returns a new array and leaves the original array unchanged
can specify the data type as a parameter
the data type can be specified using a string or can use the data type directly
change data type from float to integer by using 'i' as parameter value
import numpy as np

arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')

print(newarr)
print(newarr.dtype)
or
import numpy as np

arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype(int)

print(newarr)
print(newarr.dtype)
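the first version prints
[1 2 3]
int32
the second version prints the same values, but the dtype is the platform's default integer type (int64 on most systems)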
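the same method works for other target types; a short sketch converting integers to booleans, with illustrative values

```python
import numpy as np

arr = np.array([1, 0, 3])

# zero becomes False, every non-zero value becomes True
newarr = arr.astype(bool)

print(newarr)
print(newarr.dtype)
print(arr)  # the original array is unchanged
```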
Copy vs. View
The Difference Between Copy and View
the main difference between a copy and a view of an array is that the copy is a new array while the view is just a view of the original array

the copy owns the data
any changes made to the copy will not affect the original array
any changes made to the original array will not affect the copy

the view does not own the data
any changes made to the view will affect the original array
any changes made to the original array will affect the view

make a copy, change the original array, and display both arrays

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)
print(x)
output
[42  2  3  4  5]
[1 2 3 4 5]
make a view, change the original array, and display both arrays
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42

print(arr)
print(x)
output
[42  2  3  4  5]
[42  2  3  4  5]
make a view, change the view, and display both arrays
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
x[0] = 31

print(arr)
print(x)
output
[31  2  3  4  5]
[31  2  3  4  5]
Check if Array Owns its Data
copies own the data while views do not own the data
NumPy arrays have the attribute base which returns None if the array owns the data
otherwise, the base attribute refers to the original object
print the value of the base attribute to check if an array owns its data or not
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)
output
None
[1 2 3 4 5]
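basic slicing also returns a view, which can be checked with the same base attribute; a small sketch with illustrative values

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# a basic slice does not copy the data, it is a view on arr
part = arr[1:4]
print(part.base is arr)

# writing through the view changes the original array
part[0] = 99
print(arr)
```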
Array Shaping
Shape of an Array
the shape of an array is the number of elements in each dimension

Get the Shape of an Array
NumPy arrays have an attribute named shape
returns a tuple with each index having the number of corresponding elements
print the shape of a 2-D array
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
output
(2, 4)
create an array with 5 dimensions using ndmin from a vector with values 1,2,3,4
verify that the last dimension has value 4
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)
output
[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)
What does the shape tuple represent?
the integer at each index gives the number of elements in the corresponding dimension
in the example above the value at index 4 is 4
so the 5th (4 + 1) dimension has 4 elements
Array Reshaping
Reshaping Arrays
reshaping means changing the shape of an array
the shape of an array is the number of elements in each dimension
by reshaping can add or remove dimensions or change number of elements in each dimension

Reshape From 1-D to 2-D
convert the following 1-D array with 12 elements into a 2-D array
the outermost dimension will have 4 arrays, each with 3 elements:
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
output
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Reshape From 1-D to 3-D
convert the following 1-D array with 12 elements into a 3-D array
the outermost dimension will have 2 arrays that each contain 3 arrays, each with 2 elements
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
output
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
Can We Reshape Into any Shape?
yes, as long as the number of elements is the same in both shapes
can reshape an 8-element 1-D array into a 2-D array with 2 rows of 4 elements
cannot reshape it into a 2-D array with 3 rows of 3 elements, as that would require 3x3 = 9 elements
resulting error
Traceback (most recent call last):
  File "demo_numpy_array_reshape_error.py", line 5, in <module>
ValueError: cannot reshape array of size 8 into shape (3,3)
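a sketch of code that shows both cases, catching the error so the script keeps running; the variable name message is only for illustration

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# 2 x 4 = 8 elements, so this works
print(arr.reshape(2, 4))

# 3 x 3 = 9 elements, so this raises a ValueError
try:
    arr.reshape(3, 3)
except ValueError as err:
    message = str(err)
    print(message)
```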
Returns Copy or View?
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(arr.reshape(2, 4).base)
the output is the original array, so the reshaped array is a view
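one consequence, sketched below with illustrative values: a change made through the reshaped array shows up in the original. reshape() returns a view whenever it can, but for some non-contiguous inputs it has to copy, so this should not be relied on in general

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(2, 4)

# newarr shares its data with arr in this case
newarr[0, 0] = 42

print(arr)
print(newarr.base is arr)
```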

Unknown Dimension
you are allowed to have one "unknown" dimension
you do not have to specify an exact number for one of the dimensions in the reshape method
pass -1 as the value, and NumPy will calculate this number
convert 1D array with 8 elements to 3D array with 2x2 elements
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(2, 2, -1)
print(newarr)
output
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
cannot pass -1 for more than one dimension
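a sketch of the restriction in practice; the flag name failed is only for illustration

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# one unknown dimension: NumPy works out that it must be 2
print(arr.reshape(4, -1).shape)

# two unknown dimensions: NumPy cannot decide, so it raises a ValueError
failed = False
try:
    arr.reshape(-1, -1)
except ValueError as err:
    failed = True
    print(err)
```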

Flattening the Arrays
flattening array means converting a multidimensional array into a 1D array
can use reshape(-1) to do this
convert the array into a 1D array
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(-1)
print(newarr)
Array Iterating
Iterating Arrays
iterate on a 1-D array and it will go through each element one by one
import numpy as np

arr = np.array([1, 2, 3])
for x in arr:
  print(x)
Iterating 2-D Arrays
iterate on the elements of a 2-D array
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
  print(x)
output
[1 2 3]
[4 5 6]
iterating over an n-D array goes through its (n-1)-D sub-arrays one by one
to return the actual values, the scalars, you have to iterate over the arrays in each dimension
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
  for y in x:
    print(y)
Iterating 3-D Arrays
in a 3-D array it will go through all the 2-D arrays
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  print("x represents the 2-D array:")
  print(x)
output
x represents the 2-D array:
[[1 2 3]
 [4 5 6]]
x represents the 2-D array:
[[ 7  8  9]
 [10 11 12]]
iterate down to the scalars
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  for y in x:
    for z in y:
      print(z)
Iterating Arrays Using nditer()
the function nditer() is a helper function that can be used from very basic to very advanced iterations
solves some basic issues which we face in iteration

with basic for loops, iterating through each scalar of an n-D array needs n nested for loops
this can be difficult to write for arrays with very high dimensionality

import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
  print(x)
output
1
2
3
4
5
6
7
8
Iterating Array With Different Data Types
can use the op_dtypes argument and pass it the expected datatype to change the datatype of elements while iterating

NumPy does not change the data type of the element in-place so it needs some other space to perform this action
extra space is called a buffer
in order to enable it in nditer() pass flags=['buffered']
iterate through the array as a string

import numpy as np

arr = np.array([1, 2, 3])

for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
  print(x)
output
b'1'
b'2'
b'3'
Iterating With Different Step Size
can use slicing followed by iteration
iterate through the scalar elements of the 2-D array, skipping every other element
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for x in np.nditer(arr[:, ::2]):
  print(x)
output
1
3
5
7
Enumerated Iteration Using ndenumerate()
enumeration means listing the sequence number of items one by one
sometimes the corresponding index of the element is required while iterating
the ndenumerate() function can be used for those use cases
enumerate the elements of the following 1-D array
import numpy as np

arr = np.array([1, 2, 3])

for idx, x in np.ndenumerate(arr):
  print(idx, x)
output
(0,) 1
(1,) 2
(2,) 3
enumerate the elements of the following 2-D array
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
  print(idx, x)
output
(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8
Array Join
Joining NumPy Arrays
joining means putting contents of two or more arrays in a single array

in SQL join tables based on a key
in NumPy join arrays by axes

pass a sequence of arrays to join to the concatenate() function along with the axis
if axis is not explicitly passed it defaults to 0
join 2 arrays

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))

print(arr)
output
[1 2 3 4 5 6]
join two 2-D arrays along the second axis (axis=1), so that each row is extended
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)

print(arr)
output
[[1 2 5 6]
 [3 4 7 8]]
Joining Arrays Using Stack Functions
stacking is the same as concatenation, with the difference that stacking is done along a new axis
two 1-D arrays can be joined along a new axis so that they end up one over the other (axis=0) or side by side as columns (axis=1)
pass a sequence of arrays to join to the stack() function along with the axis
if axis is not explicitly passed it defaults to 0
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1)

print(arr)
output
[[1 4]
 [2 5]
 [3 6]]
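for comparison, a sketch of the same call with the default axis=0, which puts the two arrays one over the other

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# no axis given, so the new axis is axis 0
arr = np.stack((arr1, arr2))

print(arr)
print(arr.shape)
```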
Stacking Along Rows
NumPy provides a helper function hstack() to stack along rows, i.e. horizontally (the arrays are placed side by side)
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.hstack((arr1, arr2))

print(arr)
output
[1 2 3 4 5 6]
Stacking Along Columns
NumPy provides a helper function vstack() to stack along columns, i.e. vertically (the arrays are placed one on top of the other)
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.vstack((arr1, arr2))

print(arr)
output
[[1 2 3]
 [4 5 6]]
Stacking Along Height (depth)
NumPy provides a helper function dstack() to stack along height, which is the same as depth (the third axis)
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.dstack((arr1, arr2))

print(arr)
output
[[[1 4]
  [2 5]
  [3 6]]]
Array Splitting
Splitting NumPy Arrays
splitting is the reverse operation of joining
joining merges multiple arrays into one while splitting breaks one array into multiple arrays
use array_split() for splitting arrays, pass it the array to split and the number of splits
split the array in 3 parts
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)

print(newarr)
if the array cannot be divided evenly, the sizes are adjusted: the first parts get one extra element and the later parts are shorter
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 4)

print(newarr)
output
[array([1, 2]), array([3, 4]), array([5]), array([6])]
Split Into Arrays
the return value of array_split() is a list containing each of the splits as an array
after splitting an array into 3 arrays, each one can be accessed from the result just like any list element
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr[0])
print(newarr[1])
print(newarr[2])
when the division won't yield equal length arrays
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr[0])
print(newarr[1])
print(newarr[2])
output
[1 2]
[3 4]
[5]
Splitting 2-D Arrays
use the same syntax when splitting 2-D arrays
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)

print(newarr)
output
[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]
split the 2-D array into three 2-D arrays
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3)

print(newarr)
output
[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]
can specify which axis you want to do the split around
split the 2-D array into three 2-D arrays along the second axis (axis=1), i.e. column by column
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3, axis=1)

print(newarr)
output
[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]
an alternate solution is using hsplit()
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.hsplit(arr, 3)

print(newarr)
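a quick check, sketched below, that hsplit() gives the same pieces as array_split() with axis=1 for this array; the names by_axis, by_hsplit and same are only for illustration

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                [10, 11, 12], [13, 14, 15], [16, 17, 18]])

by_axis = np.array_split(arr, 3, axis=1)
by_hsplit = np.hsplit(arr, 3)

# compare the pieces pairwise
same = all(np.array_equal(a, b) for a, b in zip(by_axis, by_hsplit))
print(same)
```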
Array Search
Searching Arrays
can search an array for a certain value, and return the indexes that get a match
to search an array, use the where() function
find the indexes where the value is 4
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)

print(x)
output
(array([3, 5, 6]),)
find the indexes where the values are even
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)

print(x)
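the result of where() is a tuple holding one index array per dimension, and it can be used directly to pick out the matching values; a short sketch

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr % 2 == 0)

print(x)       # tuple with one array of indexes
print(arr[x])  # the even values found at those indexes
```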
Search Sorted
the searchsorted() function performs a binary search in an array
returns the index where the specified value would be inserted to maintain the sort order
assumes the array is already sorted
find the indexes where the value 7 should be inserted
import numpy as np

arr = np.array([6, 7, 8, 9])
x = np.searchsorted(arr, 7)

print(x)
the number 7 should be inserted at index 1 to keep the sort order
the method starts the search from the left and returns the first index where the number 7 is no longer larger than the next value

Search From the Right Side
by default the left most index is returned
can give side='right' to return the right most index instead
find the indexes where the value 7 should be inserted starting from the right
import numpy as np

arr = np.array([6, 7, 8, 9])
x = np.searchsorted(arr, 7, side='right')

print(x)
the number 7 should be inserted at index 2 to keep the sort order
the method starts the search from the right and returns the first index where the number 7 is no longer less than the next value

Multiple Values
to search for more than one value, use an array with the specified values
find the indexes where the values 2, 4, and 6 should be inserted
import numpy as np

arr = np.array([1, 3, 5, 7])
x = np.searchsorted(arr, [2, 4, 6])

print(x)
the return value is the array [1 2 3]
it contains the three indexes where 2, 4, and 6 would be inserted in the original array to maintain the order
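one way to confirm that the returned indexes keep the order is to actually insert the values there; a sketch using np.insert(), which is not part of the notes above and is used here only as a check

```python
import numpy as np

arr = np.array([1, 3, 5, 7])
values = [2, 4, 6]

idx = np.searchsorted(arr, values)

# np.insert places each value before the given index of the original array
merged = np.insert(arr, idx, values)

print(idx)
print(merged)
```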
Array Sort
Sorting Arrays
NumPy has a function called sort() which will sort a specified array
it returns a sorted copy of the array and leaves the original array unchanged
import numpy as np

arr = np.array([3, 2, 0, 1])

print(np.sort(arr))
can also sort arrays of strings, or any other data type
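a short sketch with a string array and a boolean array; the values are illustrative

```python
import numpy as np

fruits = np.array(['banana', 'cherry', 'apple'])
flags = np.array([True, False, True])

sorted_fruits = np.sort(fruits)  # alphabetical order
sorted_flags = np.sort(flags)    # False sorts before True

print(sorted_fruits)
print(sorted_flags)
```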

Sorting a 2-D Array
if sort() is used on a 2-D array, each of the inner arrays (rows) will be sorted
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))
output
[[2 3 4]
 [0 1 5]]
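np.sort() sorts along the last axis by default; a sketch of the axis argument, which sorts each column instead when set to 0. the names by_row and by_col are only for illustration

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)          # default: sort within each row
by_col = np.sort(arr, axis=0)  # sort within each column

print(by_row)
print(by_col)
print(arr)  # the original array is unchanged
```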
Array Filter
Filtering Arrays
getting some elements out of an existing array and creating a new array out of them is called filtering
in NumPy filter an array using a boolean index list
a boolean index list is a list of booleans corresponding to indexes in the array
if the value at an index is True that element is contained in the filtered array
if the value at that index is False that element is excluded from the filtered array
create an array from the elements on index 0 and 2
import numpy as np

arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]
newarr = arr[x]

print(newarr)
Creating the Filter Array
example above uses hard-coded True and False values
common use is to create a filter array based on conditions
create a filter array that will return only values higher than 42
import numpy as np

arr = np.array([41, 42, 43, 44])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
  # if the element is higher than 42, set the value to True, otherwise False:
  if element > 42:
    filter_arr.append(True)
  else:
    filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)
create a filter array that will return only even elements from the original array
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
  # if the element is completely divisible by 2, set the value to True, otherwise False
  if element % 2 == 0:
    filter_arr.append(True)
  else:
    filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)
Creating Filter Directly From Array
the above example is quite a common task in NumPy
NumPy provides a better solution
can directly use the array in place of the loop variable in our condition
create a filter array that will return only values higher than 42
import numpy as np

arr = np.array([41, 42, 43, 44])
filter_arr = arr > 42
newarr = arr[filter_arr]

print(filter_arr)
print(newarr)
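the same shortcut applied to the even-number filter; a short sketch

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# the condition is evaluated for every element at once
filter_arr = arr % 2 == 0
newarr = arr[filter_arr]

print(filter_arr)
print(newarr)
```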