# How do I vectorize this loop in NumPy?


In Part 1 of our series on writing efficient code with NumPy, we cover why loops are slow in Python and how to replace them with vectorized code. We also dig deep into how broadcasting works, along with a few practical examples.

Phew! That was one detailed post! Truth be told, vectorization and broadcasting are two cornerstones of writing efficient code in NumPy, which is why I thought the topics warranted such a long discussion. I encourage you to come up with toy examples to get a better grasp of the concepts.

The good news, however, is that NumPy provides a feature called broadcasting, which defines how arithmetic operations are performed on arrays of unequal size. According to the SciPy docs page on broadcasting, ...

In the next part, we will use the things we covered in this post to optimize a naive implementation of the K-Means clustering algorithm (implemented using Python lists and loops) using vectorization and broadcasting, achieving speed-ups of 70x!

```
import numpy as np

arr = np.arange(12).reshape(3, 4)

col_vector = np.array([5, 6, 7])

num_cols = arr.shape[1]

# Add the column vector to every column of arr, one column at a time
for col in range(num_cols):
    arr[:, col] += col_vector
```
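
The column loop above can probably be replaced by a single broadcast addition. The following is a minimal sketch, not the original post's code: it assumes the same `arr` and `col_vector`, and the names `looped` and `broadcast` are introduced here only for comparison.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Loop version: add col_vector to each column in turn
looped = arr.copy()
for col in range(looped.shape[1]):
    looped[:, col] += col_vector

# Broadcast version: reshape col_vector from (3,) to (3, 1) so that it
# is stretched across the 4 columns of the (3, 4) array
broadcast = arr + col_vector[:, np.newaxis]

print(np.array_equal(looped, broadcast))  # True
```

Adding `col_vector` directly would fail here, because shapes `(3, 4)` and `(3,)` are compared from the trailing dimension; the extra axis is what makes the shapes compatible.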

Suggestion : 2

Here's a vectorized approach -

```
m, n, r = volume.shape
x, y, z = np.mgrid[0:m, 0:n, 0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2
```

Possible improvement: we can probably speed up the last step with the `numexpr` module -

```
import numexpr as ne

mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
```

We can also build the three ranges corresponding to the shape parameters and subtract the three elements of `roi` on the fly, without actually creating the meshes as done earlier with `np.mgrid`. This benefits from `broadcasting` for efficiency. The implementation would look like this -

```
m, n, r = volume.shape
vals = ((np.arange(m) - roi[0]) ** 2)[:, None, None] + \
    ((np.arange(n) - roi[1]) ** 2)[:, None] + ((np.arange(r) - roi[2]) ** 2)
mask = vals < radius ** 2
```

Function definitions -

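```
import numpy as np
import numexpr as ne

def vectorized_app1(volume, roi, radius):
    m, n, r = volume.shape
    x, y, z = np.mgrid[0:m, 0:n, 0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return X**2 + Y**2 + Z**2 < radius**2

def vectorized_app1_improved(volume, roi, radius):
    m, n, r = volume.shape
    x, y, z = np.mgrid[0:m, 0:n, 0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    # numexpr picks up X, Y, Z and radius from the local scope
    return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

def vectorized_app2(volume, roi, radius):
    m, n, r = volume.shape
    vals = ((np.arange(m) - roi[0]) ** 2)[:, None, None] + \
        ((np.arange(n) - roi[1]) ** 2)[:, None] + ((np.arange(r) - roi[2]) ** 2)
    return vals < radius ** 2

def vectorized_app2_simplified(volume, roi, radius):
    m, n, r = volume.shape
    # Subtract each roi component from its own open grid; subtracting roi from
    # the whole ogrid result relies on ragged arrays, which recent NumPy rejects
    x, y, z = np.ogrid[0:m, 0:n, 0:r]
    return ((x - roi[0]) ** 2 + (y - roi[1]) ** 2 + (z - roi[2]) ** 2) < radius ** 2
```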
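
To see why the open-grid variants need far less memory than the `np.mgrid` ones, it may help to compare the grid shapes directly. This is a small sketch under assumed inputs: the `volume` shape, `roi`, and `radius` values below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical small inputs, chosen only for illustration
volume = np.zeros((6, 7, 8))
roi = np.array([2.5, 3.0, 4.2])
radius = 2.0

m, n, r = volume.shape

# Dense grids: three full (m, n, r) coordinate arrays
x, y, z = np.mgrid[0:m, 0:n, 0:r]
mask_dense = (x - roi[0]) ** 2 + (y - roi[1]) ** 2 + (z - roi[2]) ** 2 < radius ** 2

# Open grids: shapes (m, 1, 1), (1, n, 1), (1, 1, r), expanded by broadcasting
xo, yo, zo = np.ogrid[0:m, 0:n, 0:r]
mask_open = (xo - roi[0]) ** 2 + (yo - roi[1]) ** 2 + (zo - roi[2]) ** 2 < radius ** 2

print(x.shape, xo.shape)                       # (6, 7, 8) (6, 1, 1)
print(np.array_equal(mask_dense, mask_open))   # True
```

Both masks are identical; the difference is only in how many intermediate coordinate values are materialized before the final comparison.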

Timings -

```
In [106]: # Setup input arrays
     ...: volume = np.random.rand(90, 110, 100) # Half of original input sizes
     ...: roi = np.random.rand(3)
     ...:

1 loops, best of 3: 41.4 s per loop

In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop

In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop

In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop

In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop
```

Say you first build an `xyz` array:

```
import itertools

xyz = np.array(list(itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))))
```

Then `np.linalg.norm(xyz - roi, axis=1) < radius` checks, for each point, whether its distance from `roi` is smaller than `radius`.
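
The norm-based check returns a flat array with one entry per voxel, so it likely needs a reshape before it can be used as a mask on `volume`. Here is a minimal sketch assuming small, hypothetical inputs (`volume`, `roi`, and `radius` are made-up values for illustration):

```python
import itertools
import numpy as np

# Hypothetical small inputs, for illustration only
volume = np.zeros((4, 5, 6))
roi = np.array([1.5, 2.0, 3.0])
radius = 2.0

# One row per voxel index (i, j, k); itertools.product varies the last
# index fastest, which matches NumPy's default C ordering
xyz = np.array(list(itertools.product(*(range(s) for s in volume.shape))))

# Euclidean distance of every voxel from roi, reshaped back to the volume
mask = (np.linalg.norm(xyz - roi, axis=1) < radius).reshape(volume.shape)

print(xyz.shape)   # (120, 3)
print(mask.shape)  # (4, 5, 6)
```

Note that this approach materializes all `m * n * r` coordinate triples, so it trades memory for readability compared with the broadcasting versions.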

Suggestion : 3

`np.vectorize` defines a vectorized function which takes a nested sequence of objects or NumPy arrays as inputs and returns a single NumPy array or a tuple of NumPy arrays. The vectorized function evaluates `pyfunc` over successive tuples of the input arrays like the Python `map` function, except it uses the broadcasting rules of NumPy.

The `signature` argument allows for vectorizing functions that act on non-scalar arrays of fixed length. For example, you can use it for a vectorized calculation of the Pearson correlation coefficient and its p-value.

The `excluded` argument is a set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These are passed directly to `pyfunc` unmodified.

The data type of the output of the vectorized function is determined by calling the function with the first element of the input. This can be avoided by specifying the `otypes` argument.

```
>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
```

```
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
```
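
The NumPy documentation notes that `np.vectorize` is provided primarily for convenience, not performance, since it is essentially a for loop. For a simple branch like `myfunc`, an array-level alternative might look like the sketch below; `np.where` is swapped in here as a different technique, and the names `looped` and `selected` are introduced only for comparison.

```python
import numpy as np

def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b

a = np.array([1, 2, 3, 4])
b = 2

# np.vectorize calls myfunc once per element
looped = np.vectorize(myfunc)(a, b)

# np.where evaluates both branches on whole arrays, then selects elementwise
selected = np.where(a > b, a - b, a + b)

print(looped)    # [3 4 1 2]
print(selected)  # [3 4 1 2]
```

One design caveat: `np.where` computes both branches for every element, so it only fits when both expressions are safe to evaluate everywhere.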
```
>>> vfunc.__doc__
'Return a-b if a>b, otherwise return a+b'
>>> vfunc = np.vectorize(myfunc, doc='Vectorized `myfunc`')
>>> vfunc.__doc__
'Vectorized `myfunc`'
```

```
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.int64'>
>>> vfunc = np.vectorize(myfunc, otypes=[float])
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.float64'>
```
```
>>> def mypolyval(p, x):
...     _p = list(p)
...     res = _p.pop(0)
...     while _p:
...         res = res * x + _p.pop(0)
...     return res
>>> vpolyval = np.vectorize(mypolyval, excluded=['p'])
>>> vpolyval(p=[1, 2, 3], x=[0, 1])
array([3, 6])
```

```
>>> vpolyval.excluded.add(0)
>>> vpolyval([1, 2, 3], x=[0, 1])
array([3, 6])
```
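
The `signature` argument described earlier can be sketched in a similar way. The example below is an illustration rather than the documentation's own: it computes only the Pearson coefficient (not the p-value) with a hypothetical helper `pearson_r` built on `np.corrcoef`, to avoid depending on SciPy.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays (illustrative helper)."""
    return np.corrcoef(x, y)[0, 1]

# '(n),(n)->()': each call consumes two length-n vectors and returns a scalar
vpearson = np.vectorize(pearson_r, signature='(n),(n)->()')

x = np.array([[0, 1, 2, 3]])                # shape (1, 4)
y = np.array([[1, 2, 3, 4], [4, 3, 2, 1]])  # shape (2, 4)

# The leading dimensions (1,) and (2,) broadcast, giving one coefficient
# per row of y: approximately 1 for the first row and -1 for the second
r = vpearson(x, y)
print(r.shape)  # (2,)
print(r)
```

Only the last axis is consumed by `pearson_r`; all leading axes are treated as loop dimensions and follow the usual broadcasting rules.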

Suggestion : 4

Use broadcasting to implicitly loop over data, and vectorize calculations to avoid explicit loops.

What if what we really want is pairwise addition of `a` and `b`? Without broadcasting, we could accomplish this by looping.

Further reading: NumPy Broadcasting Article

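```
import numpy as np

a = np.array([10, 20, 30, 40])
a + 5
```
`array([15, 25, 35, 45])`
```
b = np.array([5])
a + b
```
`array([15, 25, 35, 45])`
```
b = np.array([5, 6, 7])
a + b
```
This raises a `ValueError`, because shapes `(4,)` and `(3,)` cannot be broadcast together.
```
b = np.array([5, 5, 10, 10])
a + b
```
`array([15, 25, 40, 50])`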
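
For the pairwise addition of `a` and `b` raised above, a sketch of both the looping and the broadcasting versions might look like this. The arrays reuse values from the examples; the names `pairwise_loop` and `pairwise` are introduced here for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.array([5, 6, 7])

# Looping version: fill a (4, 3) result one element at a time
pairwise_loop = np.empty((a.size, b.size), dtype=a.dtype)
for i, a_i in enumerate(a):
    for j, b_j in enumerate(b):
        pairwise_loop[i, j] = a_i + b_j

# Broadcasting version: shapes (4, 1) and (3,) broadcast to (4, 3)
pairwise = a[:, np.newaxis] + b

print(pairwise)
# [[15 16 17]
#  [25 26 27]
#  [35 36 37]
#  [45 46 47]]
```

Adding the new axis turns the shape mismatch that raised an error for `a + b` into a well-defined outer sum.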

Suggestion : 5

Last Updated : 04 Oct, 2019

Output:

```
dot_product = 833323333350000.0
Computation time = 35.59449199999999 ms

n_dot_product = 833323333350000
Computation time = 0.1559900000000225 ms
```
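
The code that produced this output is not shown. The sketch below is a guess at its shape, under the assumption that the two vectors hold the integers `0..n-1` and `n..2n-1` with `n = 100000`; those inputs happen to give the same dot product value, but the timing calls and variable names are illustrative and timings will differ by machine.

```python
import time
import numpy as np

n = 100000  # assumed length; the original code is not shown

# Assumed inputs: 0..n-1 and n..2n-1, stored as 64-bit integers
a = np.arange(n, dtype=np.int64)
b = np.arange(n, 2 * n, dtype=np.int64)

# Classic loop-based dot product
tic = time.perf_counter()
dot_product = 0.0
for i in range(n):
    dot_product += a[i] * b[i]
toc = time.perf_counter()
print("dot_product = " + str(dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + " ms")

# Vectorized dot product
tic = time.perf_counter()
n_dot_product = np.dot(a, b)
toc = time.perf_counter()
print("\nn_dot_product = " + str(n_dot_product))
print("Computation time = " + str(1000 * (toc - tic)) + " ms")
```

The explicit `int64` dtype is a deliberate choice in this sketch, so that the products do not overflow on platforms where NumPy's default integer is 32 bits.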

Suggestion : 6

Vectorization is a technique for implementing array operations without using for loops. Instead, we use highly optimized functions defined by various modules, which reduces the running time of the code. Vectorized array operations will be faster than their pure Python equivalents, with the biggest impact in numerical computations.

Vectorization is used widely in complex systems and mathematical models because of faster execution and smaller code size. Now that you know how to use vectorization in Python, you can apply it to make your project execute faster. So congratulations!

The element-wise product of two matrices is the algebraic operation in which each element of the first matrix is multiplied by its corresponding element in the second matrix. The dimensions of the matrices should be the same.

Here we can see that NumPy operations are way faster than built-in methods, which are in turn faster than for loops.

```
import numpy as np
from timeit import Timer

# Creating a large array of size 10 ** 6
array = np.random.randint(1000, size=10 ** 6)

# method that adds elements using for loop
def add_forloop():
    new_array = [element + 1 for element in array]

# method that adds elements using vectorization
def add_vectorized():
    new_array = array + 1

# Finding execution time using timeit
# (same Timer pattern as the later examples; function names are illustrative)
execution_time_forloop = Timer(add_forloop).timeit(1)
execution_time_vectorized = Timer(add_vectorized).timeit(1)

print("Computation time is %0.9f using for-loop" % execution_time_forloop)
print("Computation time is %0.9f using vectorization" % execution_time_vectorized)
```
```
Computation time is 0.001202600 using for-loop
Computation time is 0.000236700 using vectorization
```
```
import numpy as np
from timeit import Timer

# Creating a large array of size 10 ** 5
array = np.random.randint(1000, size=10 ** 5)

def sum_using_forloop():
    sum_array = 0
    for element in array:
        sum_array += element

def sum_using_builtin_method():
    sum_array = sum(array)

def sum_using_numpy():
    sum_array = np.sum(array)

time_forloop = Timer(sum_using_forloop).timeit(1)
time_builtin = Timer(sum_using_builtin_method).timeit(1)
time_numpy = Timer(sum_using_numpy).timeit(1)

print("Summing elements takes %0.9f units using for loop" % time_forloop)
print("Summing elements takes %0.9f units using builtin method" % time_builtin)
print("Summing elements takes %0.9f units using numpy" % time_numpy)

print()

def max_using_forloop():
    maximum = array[0]
    for element in array:
        if element > maximum:
            maximum = element

def max_using_builtin_method():
    maximum = max(array)

def max_using_numpy():
    maximum = np.max(array)

time_forloop = Timer(max_using_forloop).timeit(1)
time_builtin = Timer(max_using_builtin_method).timeit(1)
time_numpy = Timer(max_using_numpy).timeit(1)

print("Finding maximum element takes %0.9f units using for loop" % time_forloop)
print("Finding maximum element takes %0.9f units using built-in method" % time_builtin)
print("Finding maximum element takes %0.9f units using numpy" % time_numpy)
```
```
Summing elements takes 0.069638600 units using for loop
Summing elements takes 0.044852800 units using builtin method
Summing elements takes 0.000202500 units using numpy

Finding maximum element takes 0.034151200 units using for loop
Finding maximum element takes 0.029331300 units using builtin method
Finding maximum element takes 0.000242700 units using numpy
```
```
import numpy as np
from timeit import Timer

# Create 2 vectors of same length
length = 100000
vector1 = np.random.randint(1000, size=length)
vector2 = np.random.randint(1000, size=length)

# Finds dot product of vectors using for loop
def dotproduct_forloop():
    dot = 0.0
    for i in range(length):
        dot += vector1[i] * vector2[i]

# Finds dot product of vectors using numpy vectorization
def dotproduct_vectorize():
    dot = np.dot(vector1, vector2)

# Finding execution time using timeit
time_forloop = Timer(dotproduct_forloop).timeit(1)
time_vectorize = Timer(dotproduct_vectorize).timeit(1)

print("Finding dot product takes %0.9f units using for loop" % time_forloop)
print("Finding dot product takes %0.9f units using vectorization" % time_vectorize)
```
```
Finding dot product takes 0.155011500 units using for loop
Finding dot product takes 0.000219400 units using vectorization
```
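
The element-wise product of two matrices described earlier has no example of its own. A minimal sketch, with matrix values chosen only for illustration, could be:

```python
import numpy as np

# Two small matrices of the same shape (values chosen only for illustration)
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])

# Loop version: multiply corresponding elements one at a time
product_loop = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        product_loop[i, j] = A[i, j] * B[i, j]

# Vectorized version: `*` on arrays is element-wise (np.multiply);
# it is not the matrix product, which is `@` / np.matmul
product_vectorized = A * B

print(product_vectorized)
# [[ 7 16 27]
#  [40 55 72]]
```

Both versions give the same result; the vectorized form simply pushes the double loop into NumPy's compiled code.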