# how to vectorize a simple for loop in python/numpy

• Last Update :
• Techknowledgy :

In Part 1 of our series on writing efficient code with NumPy we cover why loops are slow in Python, and how to replace them with vectorized code. We also dig deep into how broadcasting works, along with a few practical examples. ,Phew! That was one detailed post! Truth be said, vectorization and broadcasting are two cornerstones of writing efficient code in NumPy and that is why I thought the topics warranted such a long discussion. I encourage you to come up with toy examples to get a better grasp of the concepts. ,The good news, however, is that NumPy provides us with a feature called Broadcasting, which defines how arithmetic operations are to be performed on arrays of unequal size. According to the SciPy docs page on broadcasting, ,In the next part, we will use the things we covered in this post to optimize a naive implementation of the K-Means clustering algorithm (implemented using Python lists and loops) using vectorization and broadcasting, achieving speed-ups of 70x!

```arr = np.arange(12).reshape(3, 4)

col_vector = np.array([5, 6, 7])

num_cols = arr.shape[1]

for col in range(num_cols):
arr[: , col] += col_vector```

Suggestion : 2

I'd probably use a `Counter` and a list comprehension to solve this:

```In[1]: import numpy as np
...:
...: unique_words = np.array(['a', 'b', 'c', 'd'])
...: array_to_compare = np.array(['a', 'b', 'a', 'd'])

In[2]: from collections
import Counter

In[3]: counter = Counter(array_to_compare)

In[4]: counter
Out[4]: Counter({
'a': 2,
'b': 1,
'd': 1
})

In[5]: vector_array = np.array([counter[key]
for key in unique_words
])

In[6]: vector_array
Out[6]: array([2, 1, 0, 1])```

A `numpy` comparison of array values using `broadcasting`:

```In[76]: unique_words[: , None] == array_to_compare
Out[76]:
array([
[True, False, True, False],
[False, True, False, False],
[False, False, False, False],
[False, False, False, True]
])
In[77]: (unique_words[: , None] == array_to_compare).sum(1)
Out[77]: array([2, 1, 0, 1])

In[78]: timeit(unique_words[: , None] == array_to_compare).sum(1)
9.5 µs± 2.79 ns per loop(mean± std.dev.of 7 runs, 100000 loops each)```

But `Counter` is also a good choice:

```In[72]: % % timeit
...: c = Counter(array_to_compare)
...: [c[key]
for key in unique_words
]
12.7 µs± 30.6 ns per loop(mean± std.dev.of 7 runs, 100000 loops each)```

Your use of `count_nonzero` can be improved with

```In[73]: % % timeit
...: words = unique_words.tolist()
...: vector_array = np.zeros(len(words))
...: for i, word in enumerate(words):
...: counter = np.count_nonzero(array_to_compare == word)
...: vector_array[i] = counter
...:
23.4 µs± 505 ns per loop(mean± std.dev.of 7 runs, 10000 loops each)```

Similar to @DanielLenz's answer, but using `np.unique` to create a `dict`:

```import numpy as np
unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
counts = dict(zip( * np.unique(array_to_compare, return_counts = True)))
result = np.array([counts[word]
if word in counts
else 0
for word in unique_words
])[2 1 0 1]```

Suggestion : 3

Vectorization is a technique of implementing array operations without using for loops. Instead, we use functions defined by various modules which are highly optimized that reduces the running and execution time of code. Vectorized array operations will be faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations.,Vectorization is used widely in complex systems and mathematical models because of faster execution and less code size. Now you know how to use vectorization in python, you can apply this to make your project execute faster. So Congratulations!,The element-wise product of two matrices is the algebraic operation in which each element of the first matrix is multiplied by its corresponding element in the second matrix. The dimension of the matrices should be the same.,Here we can see numpy operations are way faster than built-in methods which are faster than for loops.

```import numpy as np
from timeit
import Timer

# Creating a large array of size 10 ** 6
array = np.random.randint(1000, size = 10 ** 6)

# method that adds elements using
for loop
new_array = [element + 1
for element in array
]

# method that adds elements using vectorization
new_array = array + 1

# Finding execution time using timeit

print("Computation time is %0.9f using for-loop" % execution_time_forloop)
print("Computation time is %0.9f using vectorization" % execution_time_vectorized)```
```Computation time is 0.001202600 using
for -loop
Computation time is 0.000236700 using vectorization```
```import numpy as np
from timeit
import Timer

# Creating a large array of size 10 ** 5
array = np.random.randint(1000, size = 10 ** 5)

def sum_using_forloop():
sum_array = 0
for element in array:
sum_array += element

def sum_using_builtin_method():
sum_array = sum(array)

def sum_using_numpy():
sum_array = np.sum(array)

time_forloop = Timer(sum_using_forloop).timeit(1)
time_builtin = Timer(sum_using_builtin_method).timeit(1)
time_numpy = Timer(sum_using_numpy).timeit(1)

print("Summing elements takes %0.9f units using for loop" % time_forloop)
print("Summing elements takes %0.9f units using builtin method" % time_builtin)
print("Summing elements takes %0.9f units using numpy" % time_numpy)

print()

def max_using_forloop():
maximum = array[0]
for element in array:
if element > maximum:
maximum = element

def max_using_builtin_method():
maximum = max(array)

def max_using_numpy():
maximum = np.max(array)

time_forloop = Timer(max_using_forloop).timeit(1)
time_builtin = Timer(max_using_built - in_method).timeit(1)
time_numpy = Timer(max_using_numpy).timeit(1)

print("Finding maximum element takes %0.9f units using for loop" % time_forloop)
print("Finding maximum element takes %0.9f units using built-in method" % time_builtin)
print("Finding maximum element takes %0.9f units using numpy" % time_numpy)```
```Summing elements takes 0.069638600 units using
for loop
Summing elements takes 0.044852800 units using builtin method
Summing elements takes 0.000202500 units using numpy

Finding maximum element takes 0.034151200 units using
for loop
Finding maximum element takes 0.029331300 units using builtin method
Finding maximum element takes 0.000242700 units using numpy```
```import numpy as np
from timeit
import Timer

# Create 2 vectors of same length
length = 100000
vector1 = np.random.randint(1000, size = length)
vector2 = np.random.randint(1000, size = length)

# Finds dot product of vectors using
for loop
def dotproduct_forloop():
dot = 0.0
for i in range(length):
dot += vector1[i] * vector2[i]

# Finds dot product of vectors using numpy vectorization
def dotproduct_vectorize():
dot = np.dot(vector1, vector2)

# Finding execution time using timeit
time_forloop = Timer(dotproduct_forloop).timeit(1)
time_vectorize = Timer(dotproduct_vectorize).timeit(1)

print("Finding dot product takes %0.9f units using for loop" % time_forloop)
print("Finding dot product takes %0.9f units using vectorization" % time_vectorize)```
```Finding dot product takes 0.155011500 units using
for loop
Finding dot product takes 0.000219400 units using vectorization```

Suggestion : 4

Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy.,The signature argument allows for vectorizing functions that act on non-scalar arrays of fixed length. For example, you can use it for a vectorized calculation of Pearson correlation coefficient and its p-value:,Set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These will be passed directly to pyfunc unmodified.,The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

```>>> def myfunc(a, b):
..."Return a-b if a>b, otherwise return a+b"
...
if a > b:
...
return a - b
...
else:
...
return a + b```
```>>> vfunc = np.vectorize(myfunc) >>>
vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])```
```>>> vfunc.__doc__ 'Return a-b if a>b, otherwise return a+b' >>>
vfunc = np.vectorize(myfunc, doc = 'Vectorized `myfunc`') >>>
vfunc.__doc__ 'Vectorized `myfunc`'```
```>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.int64'>
>>> vfunc = np.vectorize(myfunc, otypes=[float])
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.float64'>```
```>>> def mypolyval(p, x):
..._p = list(p)
...res = _p.pop(0)
...
while _p:
...res = res * x + _p.pop(0)
...
return res >>>
vpolyval = np.vectorize(mypolyval, excluded = ['p']) >>>
vpolyval(p = [1, 2, 3], x = [0, 1])
array([3, 6])```
```>>> vpolyval.excluded.add(0) >>>
vpolyval([1, 2, 3], x = [0, 1])
array([3, 6])```

Suggestion : 5

Last Updated : 04 Oct, 2019

Output:

```dot_product = 833323333350000.0
Computation time = 35.59449199999999 ms

n_dot_product = 833323333350000
Computation time = 0.1559900000000225 ms```