In Part 1 of our series on writing efficient code with NumPy, we cover why loops are slow in Python and how to replace them with vectorized code. We also dig deep into how broadcasting works, along with a few practical examples.

Phew! That was one detailed post! Truth be told, vectorization and broadcasting are two cornerstones of writing efficient code in NumPy, and that is why I thought the topics warranted such a long discussion. I encourage you to come up with toy examples to get a better grasp of the concepts.

The good news, however, is that NumPy provides us with a feature called broadcasting, which defines how arithmetic operations are to be performed on arrays of unequal size. The SciPy docs page on broadcasting describes the rules.

In the next part, we will use the things we covered in this post to optimize a naive implementation of the K-Means clustering algorithm (implemented using Python lists and loops) using vectorization and broadcasting, achieving speed-ups of 70x!

```
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Add col_vector to every column of arr, one column at a time
num_cols = arr.shape[1]
for col in range(num_cols):
    arr[:, col] += col_vector
```
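As a minimal sketch of how broadcasting could replace the column loop above: reshaping `col_vector` from shape `(3,)` to `(3, 1)` lets NumPy stretch it across all four columns. The names `looped` and `broadcasted` are introduced here only for comparison and are not part of the original post.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Loop version from the text, applied to a copy so both results can be compared
looped = arr.copy()
for col in range(looped.shape[1]):
    looped[:, col] += col_vector

# Broadcasting version: (3, 4) + (3, 1) -> (3, 4), no explicit loop
broadcasted = arr + col_vector[:, np.newaxis]

print(np.array_equal(looped, broadcasted))
```

The `[:, np.newaxis]` indexing is what aligns the vector with the rows; adding `col_vector` directly would fail because shapes `(3, 4)` and `(3,)` are not compatible.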

Here's a vectorized approach -

```
m, n, r = volume.shape
x, y, z = np.mgrid[0:m, 0:n, 0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2
```

Possible improvement: we can probably speed up the last step with the `numexpr` module -

```
import numexpr as ne
mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
```

We can also gradually build the three ranges corresponding to the shape parameters and subtract the three elements of `roi` on the fly, without actually creating the meshes as done earlier with `np.mgrid`. This benefits from `broadcasting` for efficiency. The implementation would look like this -

```
m, n, r = volume.shape
vals = ((np.arange(m) - roi[0])**2)[:, None, None] + \
       ((np.arange(n) - roi[1])**2)[:, None] + ((np.arange(r) - roi[2])**2)
mask = vals < radius**2
```
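To make the broadcasting step above more concrete, here is a small sketch that prints the shapes of the three intermediate terms. The volume shape, `roi` and `radius` values are hypothetical, chosen only to keep the arrays tiny.

```python
import numpy as np

m, n, r = 4, 5, 6                    # hypothetical small volume shape
roi = np.array([1.5, 2.0, 2.5])      # hypothetical sphere center
radius = 2.0                         # hypothetical radius

dx2 = ((np.arange(m) - roi[0]) ** 2)[:, None, None]   # shape (m, 1, 1)
dy2 = ((np.arange(n) - roi[1]) ** 2)[:, None]         # shape (n, 1)
dz2 = (np.arange(r) - roi[2]) ** 2                    # shape (r,)

# The three small arrays broadcast against each other to shape (m, n, r)
vals = dx2 + dy2 + dz2
mask = vals < radius ** 2

print(dx2.shape, dy2.shape, dz2.shape, vals.shape)
```

Only `m + n + r` squared distances are computed before the final sum, whereas the `np.mgrid` version materializes three full `(m, n, r)` grids first.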

Function definitions -

```
import numpy as np
import numexpr as ne

def vectorized_app1(volume, roi, radius):
    # Full coordinate grids via np.mgrid
    m, n, r = volume.shape
    x, y, z = np.mgrid[0:m, 0:n, 0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return X**2 + Y**2 + Z**2 < radius**2

def vectorized_app1_improved(volume, roi, radius):
    # Same grids, but the final expression is evaluated by numexpr
    m, n, r = volume.shape
    x, y, z = np.mgrid[0:m, 0:n, 0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

def vectorized_app2(volume, roi, radius):
    # Broadcast three 1D ranges instead of building full grids
    m, n, r = volume.shape
    vals = ((np.arange(m) - roi[0])**2)[:, None, None] + \
           ((np.arange(n) - roi[1])**2)[:, None] + ((np.arange(r) - roi[2])**2)
    return vals < radius**2

def vectorized_app2_simplified(volume, roi, radius):
    # The original one-liner `np.ogrid[...] - roi` relies on ragged-array
    # creation, which newer NumPy releases may reject, so subtract per axis.
    m, n, r = volume.shape
    x, y, z = np.ogrid[0:m, 0:n, 0:r]
    return ((x - roi[0])**2 + (y - roi[1])**2 + (z - roi[2])**2) < radius**2
```
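Before comparing timings, it can be worth checking that the approaches agree. The sketch below does this on a tiny volume against a brute-force loop; `loop_mask` is a hypothetical stand-in written for this check (the original `_make_mask` is not shown here), and the input values are chosen so that all squared distances are exactly representable in floating point.

```python
import numpy as np

def loop_mask(volume, roi, radius):
    """Hypothetical brute-force reference: test every voxel one by one."""
    mask = np.zeros(volume.shape, dtype=bool)
    for i in range(volume.shape[0]):
        for j in range(volume.shape[1]):
            for k in range(volume.shape[2]):
                d2 = (i - roi[0]) ** 2 + (j - roi[1]) ** 2 + (k - roi[2]) ** 2
                mask[i, j, k] = d2 < radius ** 2
    return mask

def mgrid_mask(volume, roi, radius):
    # Same idea as vectorized_app1: full coordinate grids
    m, n, r = volume.shape
    x, y, z = np.mgrid[0:m, 0:n, 0:r]
    return (x - roi[0]) ** 2 + (y - roi[1]) ** 2 + (z - roi[2]) ** 2 < radius ** 2

def broadcast_mask(volume, roi, radius):
    # Same idea as vectorized_app2: broadcast three 1D ranges
    m, n, r = volume.shape
    vals = ((np.arange(m) - roi[0]) ** 2)[:, None, None] + \
           ((np.arange(n) - roi[1]) ** 2)[:, None] + ((np.arange(r) - roi[2]) ** 2)
    return vals < radius ** 2

volume = np.zeros((8, 9, 7))          # small hypothetical volume
roi = np.array([3.5, 4.0, 2.5])       # hypothetical center
radius = 3.2                          # hypothetical radius

reference = loop_mask(volume, roi, radius)
print(np.array_equal(reference, mgrid_mask(volume, roi, radius)))
print(np.array_equal(reference, broadcast_mask(volume, roi, radius)))
```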

Timings -

```
In [106]: # Setup input arrays
     ...: volume = np.random.rand(90, 110, 100)  # Half of original input sizes
     ...: roi = np.random.rand(3)
     ...: radius = 3.4
     ...:

In [107]: %timeit _make_mask(volume, roi, radius)
1 loops, best of 3: 41.4 s per loop

In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop

In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop

In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop

In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop
```

Say you first build an `xyz` array:

```
import itertools
xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))]
```

Now, using `numpy.linalg.norm`,

```
np.linalg.norm(xyz - roi, axis=1) < radius
```
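The norm-based test above returns a flat boolean vector, one entry per coordinate. As a sketch under stated assumptions, the coordinate list can also be built with `np.indices` instead of `itertools.product`, and the result reshaped back to the volume's shape. The small `volume`, `roi` and `radius` values here are hypothetical.

```python
import numpy as np

volume = np.zeros((4, 5, 6))          # hypothetical small volume
roi = np.array([1.5, 2.0, 2.5])       # hypothetical center
radius = 2.0                          # hypothetical radius

# (3, m, n, r) index grids -> (m*n*r, 3) coordinate list, in the same
# order that itertools.product over the three ranges would produce
xyz = np.indices(volume.shape).reshape(3, -1).T

# One distance per coordinate, then back to the shape of the volume
flat_mask = np.linalg.norm(xyz - roi, axis=1) < radius
mask = flat_mask.reshape(volume.shape)

print(xyz.shape, mask.shape)
```

Note that this still materializes every coordinate, so it is closer in memory cost to the `np.mgrid` approach than to the broadcasting one.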

Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays. The vectorized function evaluates `pyfunc` over successive tuples of the input arrays like the python `map` function, except it uses the broadcasting rules of numpy.

The data type of the output of `vectorized` is determined by calling the function with the first element of the input. This can be avoided by specifying the `otypes` argument.

The `excluded` argument is a set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These will be passed directly to `pyfunc` unmodified.

The `signature` argument allows for vectorizing functions that act on non-scalar arrays of fixed length. For example, you can use it for a vectorized calculation of Pearson correlation coefficient and its p-value.
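As a small illustration of the `signature` argument described above, the sketch below vectorizes a function that consumes a whole 1D vector and returns a scalar. The helper `peak_to_peak` and the sample data are hypothetical and are not taken from the NumPy documentation.

```python
import numpy as np

def peak_to_peak(v):
    """Hypothetical helper: range (max - min) of a 1D vector."""
    return v.max() - v.min()

# '(n)->()' means: consume one axis of length n, produce a scalar.
# Any leading axes of the input are looped over.
vptp = np.vectorize(peak_to_peak, signature='(n)->()')

data = np.array([[1, 5, 2],
                 [10, 10, 10]])
result = vptp(data)   # one value per row
print(result)
```

Without the signature, `np.vectorize` would call the function once per scalar element instead of once per row.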

```
>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
```

```
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
```

```
>>> vfunc.__doc__
'Return a-b if a>b, otherwise return a+b'
>>> vfunc = np.vectorize(myfunc, doc='Vectorized `myfunc`')
>>> vfunc.__doc__
'Vectorized `myfunc`'
```

```
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.int64'>
>>> vfunc = np.vectorize(myfunc, otypes=[float])
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.float64'>
```

```
>>> def mypolyval(p, x):
...     _p = list(p)
...     res = _p.pop(0)
...     while _p:
...         res = res*x + _p.pop(0)
...     return res
>>> vpolyval = np.vectorize(mypolyval, excluded=['p'])
>>> vpolyval(p=[1, 2, 3], x=[0, 1])
array([3, 6])
```

```
>>> vpolyval.excluded.add(0)
>>> vpolyval([1, 2, 3], x=[0, 1])
array([3, 6])
```
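Keep in mind that `np.vectorize` is provided mainly for convenience and is essentially a Python-level loop. Where a function can be expressed with array operations, that is usually the faster route. As a sketch, `myfunc` from the example above could be rewritten with `np.where`; the variable names below are chosen for illustration only.

```python
import numpy as np

def myfunc(a, b):
    """Return a-b if a>b, otherwise return a+b"""
    if a > b:
        return a - b
    return a + b

a = np.array([1, 2, 3, 4])
b = 2

# Convenience wrapper: calls myfunc once per element
via_vectorize = np.vectorize(myfunc)(a, b)

# Array-level equivalent: both branches are computed for the whole array,
# and np.where picks one of them per element
via_where = np.where(a > b, a - b, a + b)

print(via_vectorize, via_where)
```

The `np.where` form evaluates both branches for every element, so it only suits branches that are cheap and safe to compute everywhere.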

Use broadcasting to implicitly loop over data.

Vectorize calculations to avoid explicit loops.

What if what we really want is pairwise addition of `a`, `b`? Without broadcasting, we could accomplish this by looping.
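A minimal sketch of the pairwise addition mentioned above, first with an explicit double loop and then with broadcasting. The second array `b` is a hypothetical example value, and `looped` and `broadcasted` are names introduced only for this comparison.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.array([1, 2, 3])     # hypothetical second array

# Explicit double loop: result[i, j] = a[i] + b[j]
looped = np.empty((a.size, b.size), dtype=a.dtype)
for i in range(a.size):
    for j in range(b.size):
        looped[i, j] = a[i] + b[j]

# Broadcasting: (4, 1) + (3,) -> (4, 3), same values without the loops
broadcasted = a[:, np.newaxis] + b

print(np.array_equal(looped, broadcasted))
```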

```
import numpy as np
a = np.array([10, 20, 30, 40])
a + 5
```

`array([15, 25, 35, 45])`

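```
b = np.array([5])
a + b  # same result as a + 5
```

```
b = np.array([5, 6, 7])
a + b  # raises ValueError: shapes (4,) and (3,) cannot be broadcast together
```

```
b = np.array([5, 5, 10, 10])
a + b
```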

`array([15, 25, 40, 50])`
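The three cases above follow the broadcasting rule: compared from the trailing axis, two dimensions are compatible when they are equal or when one of them is 1. As a sketch, compatibility can be checked without building any arrays by using `np.broadcast_shapes`, which is assumed here to be available (NumPy 1.20 or later). The helper `can_broadcast` is hypothetical.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes are broadcast-compatible."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False

length_one = can_broadcast((4,), (1,))     # like a + np.array([5])
mismatch = can_broadcast((4,), (3,))       # like a + np.array([5, 6, 7])
same_size = can_broadcast((4,), (4,))      # like a + np.array([5, 5, 10, 10])
column_row = can_broadcast((4, 1), (3,))   # pairwise case, gives (4, 3)

print(length_one, mismatch, same_size, column_row)
```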

Last Updated : 04 Oct, 2019

**Output:**

```
dot_product = 833323333350000.0
Computation time = 35.59449199999999 ms

n_dot_product = 833323333350000
Computation time = 0.1559900000000225 ms
```

Vectorization is a technique for implementing array operations without using for loops. Instead, we use highly optimized functions provided by various modules, which reduces the execution time of the code. Vectorized array operations will be faster than their pure Python equivalents, with the biggest impact in any kind of numerical computation.

Vectorization is used widely in complex systems and mathematical models because of faster execution and smaller code size. Now that you know how to use vectorization in Python, you can apply it to make your projects run faster. So congratulations!

The element-wise product of two matrices is the algebraic operation in which each element of the first matrix is multiplied by its corresponding element in the second matrix. The dimensions of the two matrices should be the same.

Here we can see that NumPy operations are way faster than built-in methods, which in turn are faster than for loops.
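To illustrate the element-wise product described above, here is a short sketch that compares a nested loop with NumPy's `*` operator, and contrasts both with the matrix product `@`. The two small matrices are hypothetical example values.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Loop version: multiply matching entries one at a time
looped = np.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        looped[i, j] = A[i, j] * B[i, j]

elementwise = A * B   # vectorized element-wise product
matmul = A @ B        # matrix product, a different operation

print(elementwise)
print(matmul)
```

The `*` operator on NumPy arrays is always element-wise; the matrix product needs `@` or `np.matmul`.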

```
import numpy as np
from timeit import Timer

# Creating a large array of size 10**6
array = np.random.randint(1000, size=10**6)

# method that adds elements using for loop
def add_forloop():
    new_array = [element + 1 for element in array]

# method that adds elements using vectorization
def add_vectorized():
    new_array = array + 1

# Finding execution time using timeit
computation_time_forloop = Timer(add_forloop).timeit(1)
computation_time_vectorized = Timer(add_vectorized).timeit(1)

# Fixed: the original printed undefined execution_time_* names
print("Computation time is %0.9f using for-loop" % computation_time_forloop)
print("Computation time is %0.9f using vectorization" % computation_time_vectorized)
```

```
Computation time is 0.001202600 using for-loop
Computation time is 0.000236700 using vectorization
```
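A single `timeit(1)` run, as used above, can be noisy. One possible refinement, sketched here under the assumption that a smaller array is acceptable for a quick check, is to repeat each measurement with `timeit.repeat` and keep the fastest run. The names `best_forloop` and `best_vectorized` are introduced for illustration.

```python
import timeit
import numpy as np

# Smaller array than in the text, to keep this check quick
array = np.random.randint(1000, size=10 ** 5)

def add_forloop():
    return [element + 1 for element in array]

def add_vectorized():
    return array + 1

# Run each function 5 times (one call per run) and keep the fastest time,
# which is less sensitive to background load than a single measurement
best_forloop = min(timeit.repeat(add_forloop, number=1, repeat=5))
best_vectorized = min(timeit.repeat(add_vectorized, number=1, repeat=5))

print("Best for-loop time:   %0.9f s" % best_forloop)
print("Best vectorized time: %0.9f s" % best_vectorized)
```

Actual numbers depend on the machine and NumPy build, so they are not reproduced here.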

```
import numpy as np
from timeit import Timer

# Creating a large array of size 10**5
array = np.random.randint(1000, size=10**5)

def sum_using_forloop():
    sum_array = 0
    for element in array:
        sum_array += element

def sum_using_builtin_method():
    sum_array = sum(array)

def sum_using_numpy():
    sum_array = np.sum(array)

time_forloop = Timer(sum_using_forloop).timeit(1)
time_builtin = Timer(sum_using_builtin_method).timeit(1)
time_numpy = Timer(sum_using_numpy).timeit(1)

print("Summing elements takes %0.9f units using for loop" % time_forloop)
print("Summing elements takes %0.9f units using builtin method" % time_builtin)
print("Summing elements takes %0.9f units using numpy" % time_numpy)
print()

def max_using_forloop():
    maximum = array[0]
    for element in array:
        if element > maximum:
            maximum = element

def max_using_builtin_method():
    maximum = max(array)

def max_using_numpy():
    maximum = np.max(array)

time_forloop = Timer(max_using_forloop).timeit(1)
# Fixed: the original had a broken name, `max_using_built - in_method`
time_builtin = Timer(max_using_builtin_method).timeit(1)
time_numpy = Timer(max_using_numpy).timeit(1)

print("Finding maximum element takes %0.9f units using for loop" % time_forloop)
print("Finding maximum element takes %0.9f units using built-in method" % time_builtin)
print("Finding maximum element takes %0.9f units using numpy" % time_numpy)
```

```
Summing elements takes 0.069638600 units using for loop
Summing elements takes 0.044852800 units using builtin method
Summing elements takes 0.000202500 units using numpy

Finding maximum element takes 0.034151200 units using for loop
Finding maximum element takes 0.029331300 units using builtin method
Finding maximum element takes 0.000242700 units using numpy
```

```
import numpy as np
from timeit import Timer

# Create 2 vectors of same length
length = 100000
vector1 = np.random.randint(1000, size=length)
vector2 = np.random.randint(1000, size=length)

# Finds dot product of vectors using for loop
def dotproduct_forloop():
    dot = 0.0
    for i in range(length):
        dot += vector1[i] * vector2[i]

# Finds dot product of vectors using numpy vectorization
def dotproduct_vectorize():
    dot = np.dot(vector1, vector2)

# Finding execution time using timeit
time_forloop = Timer(dotproduct_forloop).timeit(1)
time_vectorize = Timer(dotproduct_vectorize).timeit(1)

print("Finding dot product takes %0.9f units using for loop" % time_forloop)
print("Finding dot product takes %0.9f units using vectorization" % time_vectorize)
```

```
Finding dot product takes 0.155011500 units using for loop
Finding dot product takes 0.000219400 units using vectorization
```