In Part 1 of our series on writing efficient code with NumPy, we cover why loops are slow in Python and how to replace them with vectorized code. We also dig deep into how broadcasting works, along with a few practical examples.

Phew! That was one detailed post! Truth be told, vectorization and broadcasting are two cornerstones of writing efficient NumPy code, which is why I thought the topics warranted such a long discussion. I encourage you to come up with toy examples to get a better grasp of the concepts.

The good news, however, is that NumPy provides a feature called broadcasting, which defines how arithmetic operations are performed on arrays of unequal size. According to the SciPy docs page on broadcasting,

In the next part, we will use what we covered in this post to optimize a naive implementation of the K-Means clustering algorithm (written with Python lists and loops) using vectorization and broadcasting, achieving speed-ups of 70x!

```
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Add col_vector to every column of arr, one column at a time
num_cols = arr.shape[1]
for col in range(num_cols):
    arr[:, col] += col_vector
```
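The column-by-column loop above can probably be replaced by a single broadcast addition. The sketch below is one way to do it, assuming the same `arr` and `col_vector`; the names `looped` and `broadcasted` are introduced here only for the comparison. Note that `arr + col_vector` on its own would raise a `ValueError`, because the trailing dimensions (4 and 3) do not match, so the vector is first reshaped to a column of shape `(3, 1)`.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Loop version, kept for comparison
looped = arr.copy()
for col in range(looped.shape[1]):
    looped[:, col] += col_vector

# Broadcast version: shapes (3, 4) + (3, 1) -> (3, 4)
broadcasted = arr + col_vector[:, np.newaxis]

# Both approaches give the same result
assert np.array_equal(looped, broadcasted)
print(broadcasted)
```

The reshaped vector is stretched across the four columns without being copied, which is what makes the broadcast version both shorter and faster.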

I'd probably use a `Counter` and a list comprehension to solve this:

```
In [1]: import numpy as np
   ...:
   ...: unique_words = np.array(['a', 'b', 'c', 'd'])
   ...: array_to_compare = np.array(['a', 'b', 'a', 'd'])

In [2]: from collections import Counter

In [3]: counter = Counter(array_to_compare)

In [4]: counter
Out[4]: Counter({'a': 2, 'b': 1, 'd': 1})

In [5]: vector_array = np.array([counter[key] for key in unique_words])

In [6]: vector_array
Out[6]: array([2, 1, 0, 1])
```

A `numpy` comparison of array values using `broadcasting`:

```
In [76]: unique_words[:, None] == array_to_compare
Out[76]:
array([[ True, False,  True, False],
       [False,  True, False, False],
       [False, False, False, False],
       [False, False, False,  True]])

In [77]: (unique_words[:, None] == array_to_compare).sum(1)
Out[77]: array([2, 1, 0, 1])

In [78]: timeit (unique_words[:, None] == array_to_compare).sum(1)
9.5 µs ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
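To make the broadcasting step in that comparison more explicit, here is a small sketch that prints the shapes involved. The intermediate names `column`, `matches`, and `counts` are mine, not part of the original answer.

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# Adding a new axis turns the (4,) vector into a (4, 1) column
column = unique_words[:, None]

# (4, 1) compared with (4,) broadcasts to a (4, 4) boolean matrix:
# row i marks where unique_words[i] occurs in array_to_compare
matches = column == array_to_compare

# Summing each row counts the occurrences of each word
counts = matches.sum(axis=1)

print(column.shape, array_to_compare.shape, matches.shape)
print(counts)
```

One trade-off to keep in mind: the intermediate boolean matrix has `len(unique_words) * len(array_to_compare)` entries, so this approach may become memory-hungry for large inputs.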

But `Counter` is also a good choice:

```
In [72]: %%timeit
    ...: c = Counter(array_to_compare)
    ...: [c[key] for key in unique_words]
12.7 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Your use of `count_nonzero` can be improved with:

```
In [73]: %%timeit
    ...: words = unique_words.tolist()
    ...: vector_array = np.zeros(len(words))
    ...: for i, word in enumerate(words):
    ...:     counter = np.count_nonzero(array_to_compare == word)
    ...:     vector_array[i] = counter
    ...:
23.4 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Similar to @DanielLenz's answer, but using `np.unique` to create a `dict`:

```
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# Map each distinct value in array_to_compare to its number of occurrences
counts = dict(zip(*np.unique(array_to_compare, return_counts=True)))

# Words that never occur get a count of 0
result = np.array([counts[word] if word in counts else 0 for word in unique_words])
print(result)  # [2 1 0 1]
```
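If you want to avoid the Python-level `dict` lookup entirely, one possible variation (not from the answers above) is to look the words up in the sorted output of `np.unique` with `np.searchsorted`. This is only a sketch: it assumes `array_to_compare` is non-empty, and the names `values`, `value_counts`, `idx`, and `found` are introduced here for illustration.

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# Sorted distinct values and their counts (assumes a non-empty input)
values, value_counts = np.unique(array_to_compare, return_counts=True)

# Index at which each word would be inserted into the sorted `values`
idx = np.searchsorted(values, unique_words)

# Words larger than every value get index len(values); clip to stay in range
idx = np.clip(idx, 0, len(values) - 1)

# A word is present only if the value at its insertion index equals the word
found = values[idx] == unique_words

# Use the stored count where the word was found, otherwise 0
result = np.where(found, value_counts[idx], 0)
print(result)
```

For four words this is unlikely to beat `Counter`; it may pay off only when both arrays are large.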

Vectorization is a technique for implementing array operations without explicit for loops. Instead, we use highly optimized functions provided by various modules, which reduces the running time of the code. Vectorized array operations are generally faster than their pure Python equivalents, with the biggest impact in numerical computations.

Vectorization is widely used in complex systems and mathematical models because it runs faster and needs less code. Now that you know how to use vectorization in Python, you can apply it to make your projects execute faster. Congratulations!

The element-wise product of two matrices is the algebraic operation in which each element of the first matrix is multiplied by its corresponding element in the second matrix. The dimensions of the two matrices should be the same.

Here we can see that NumPy operations are much faster than built-in methods, which in turn are faster than for loops.
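As a small illustration of the element-wise product described above, the sketch below compares a nested loop with NumPy's `*` operator. The matrices `a` and `b` are made-up example values.

```python
import numpy as np

# Two example matrices with the same dimensions (2 x 3)
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# Loop version: multiply corresponding elements one at a time
looped = np.zeros_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * b[i, j]

# Vectorized version: * on NumPy arrays is element-wise,
# not matrix multiplication (that would be the @ operator)
vectorized = a * b

assert np.array_equal(looped, vectorized)
print(vectorized)
```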

```
import numpy as np
from timeit import Timer

# Create a large array of size 10 ** 6
array = np.random.randint(1000, size=10 ** 6)

# Add 1 to every element using a Python-level loop (list comprehension)
def add_forloop():
    new_array = [element + 1 for element in array]

# Add 1 to every element using vectorization
def add_vectorized():
    new_array = array + 1

# Measure execution time with timeit (a single run of each function)
computation_time_forloop = Timer(add_forloop).timeit(1)
computation_time_vectorized = Timer(add_vectorized).timeit(1)

print("Computation time is %0.9f using for-loop" % computation_time_forloop)
print("Computation time is %0.9f using vectorization" % computation_time_vectorized)
```

```
Computation time is 0.001202600 using for-loop
Computation time is 0.000236700 using vectorization
```

```
import numpy as np
from timeit import Timer

# Create a large array of size 10 ** 5
array = np.random.randint(1000, size=10 ** 5)

def sum_using_forloop():
    sum_array = 0
    for element in array:
        sum_array += element

def sum_using_builtin_method():
    sum_array = sum(array)

def sum_using_numpy():
    sum_array = np.sum(array)

time_forloop = Timer(sum_using_forloop).timeit(1)
time_builtin = Timer(sum_using_builtin_method).timeit(1)
time_numpy = Timer(sum_using_numpy).timeit(1)

print("Summing elements takes %0.9f units using for loop" % time_forloop)
print("Summing elements takes %0.9f units using builtin method" % time_builtin)
print("Summing elements takes %0.9f units using numpy" % time_numpy)
print()

def max_using_forloop():
    maximum = array[0]
    for element in array:
        if element > maximum:
            maximum = element

def max_using_builtin_method():
    maximum = max(array)

def max_using_numpy():
    maximum = np.max(array)

time_forloop = Timer(max_using_forloop).timeit(1)
time_builtin = Timer(max_using_builtin_method).timeit(1)
time_numpy = Timer(max_using_numpy).timeit(1)

print("Finding maximum element takes %0.9f units using for loop" % time_forloop)
print("Finding maximum element takes %0.9f units using built-in method" % time_builtin)
print("Finding maximum element takes %0.9f units using numpy" % time_numpy)
```

```
Summing elements takes 0.069638600 units using for loop
Summing elements takes 0.044852800 units using builtin method
Summing elements takes 0.000202500 units using numpy

Finding maximum element takes 0.034151200 units using for loop
Finding maximum element takes 0.029331300 units using builtin method
Finding maximum element takes 0.000242700 units using numpy
```

```
import numpy as np
from timeit import Timer

# Create 2 vectors of the same length
length = 100000
vector1 = np.random.randint(1000, size=length)
vector2 = np.random.randint(1000, size=length)

# Find the dot product of the vectors using a for loop
def dotproduct_forloop():
    dot = 0.0
    for i in range(length):
        dot += vector1[i] * vector2[i]

# Find the dot product of the vectors using NumPy vectorization
def dotproduct_vectorize():
    dot = np.dot(vector1, vector2)

# Measure execution time with timeit
time_forloop = Timer(dotproduct_forloop).timeit(1)
time_vectorize = Timer(dotproduct_vectorize).timeit(1)

print("Finding dot product takes %0.9f units using for loop" % time_forloop)
print("Finding dot product takes %0.9f units using vectorization" % time_vectorize)
```

```
Finding dot product takes 0.155011500 units using for loop
Finding dot product takes 0.000219400 units using vectorization
```
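The timings above come from a single run of each function, which can be noisy. A hedged variation, assuming the same kind of random input, is to use `timeit.repeat` and take the minimum over several runs, and to have both functions return their result so they can be checked against each other. The seed and the function names `dot_loop` and `dot_numpy` are illustrative choices, not part of the original benchmark.

```python
import timeit
import numpy as np

# Fixed seed so the two implementations can be compared reproducibly
rng = np.random.default_rng(0)
length = 100_000
vector1 = rng.integers(1000, size=length)
vector2 = rng.integers(1000, size=length)

def dot_loop():
    """Dot product with an explicit Python loop."""
    total = 0
    for i in range(length):
        total += vector1[i] * vector2[i]
    return int(total)

def dot_numpy():
    """Dot product with NumPy's vectorized np.dot."""
    return int(np.dot(vector1, vector2))

# Check correctness before comparing speed
assert dot_loop() == dot_numpy()

# Run each function 5 times and keep the fastest run,
# which is usually the least disturbed by other processes
loop_time = min(timeit.repeat(dot_loop, number=1, repeat=5))
numpy_time = min(timeit.repeat(dot_numpy, number=1, repeat=5))

print("for loop: %0.6f s" % loop_time)
print("np.dot:   %0.6f s" % numpy_time)
```

The exact numbers depend on the machine, so no expected output is shown here.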

Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the Python map function, except it uses the broadcasting rules of numpy.

The signature argument allows for vectorizing functions that act on non-scalar arrays of fixed length. For example, you can use it for a vectorized calculation of Pearson correlation coefficient and its p-value:

The excluded argument is a set of strings or integers representing the positional or keyword arguments for which the function will not be vectorized. These will be passed directly to pyfunc unmodified.

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.

```
>>> def myfunc(a, b):
...     "Return a-b if a>b, otherwise return a+b"
...     if a > b:
...         return a - b
...     else:
...         return a + b
```

```
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
```
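Keep in mind that `np.vectorize` is a convenience wrapper that still calls the Python function once per element, so it is not a performance tool. For a simple branch like the one in `myfunc`, an expression built on `np.where` should give the same result with true array operations. The sketch below compares the two; `via_vectorize` and `via_where` are names introduced here.

```python
import numpy as np

def myfunc(a, b):
    """Return a-b if a>b, otherwise return a+b."""
    if a > b:
        return a - b
    else:
        return a + b

vfunc = np.vectorize(myfunc)

a = np.array([1, 2, 3, 4])
b = 2

# Element-by-element calls into Python
via_vectorize = vfunc(a, b)

# Same logic as whole-array operations: pick a - b where a > b, else a + b
via_where = np.where(a > b, a - b, a + b)

assert np.array_equal(via_vectorize, via_where)
print(via_where)
```

Unlike the `if`/`else` version, `np.where` evaluates both branches for every element, which is fine here but matters when one branch is expensive or invalid for some inputs.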

```
>>> vfunc.__doc__
'Return a-b if a>b, otherwise return a+b'
>>> vfunc = np.vectorize(myfunc, doc='Vectorized `myfunc`')
>>> vfunc.__doc__
'Vectorized `myfunc`'
```

```
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.int64'>
>>> vfunc = np.vectorize(myfunc, otypes=[float])
>>> out = vfunc([1, 2, 3, 4], 2)
>>> type(out[0])
<class 'numpy.float64'>
```

```
>>> def mypolyval(p, x):
...     _p = list(p)
...     res = _p.pop(0)
...     while _p:
...         res = res * x + _p.pop(0)
...     return res
>>> vpolyval = np.vectorize(mypolyval, excluded=['p'])
>>> vpolyval(p=[1, 2, 3], x=[0, 1])
array([3, 6])
```

```
>>> vpolyval.excluded.add(0)
>>> vpolyval([1, 2, 3], x=[0, 1])
array([3, 6])
```
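The `signature` argument mentioned earlier can be sketched with a simplified Pearson correlation example. This version uses `np.corrcoef` and returns only the coefficient, not the p-value, so it is an assumption-laden stand-in rather than the example from the NumPy docs. The helper `pearson_r` and the sample data are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays (hypothetical helper)."""
    return np.corrcoef(x, y)[0, 1]

# '(n),(n)->()' means: two vectors of equal length n in, one scalar out
vpearson = np.vectorize(pearson_r, signature='(n),(n)->()')

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Three rows, each a length-5 vector to correlate against y
x = np.stack([y, -y, y ** 2])   # shape (3, 5)

# The leading dimension of x is looped over; y is broadcast to every row
r = vpearson(x, y)
print(r.shape)
print(r)
```

The first row is `y` itself (correlation 1), the second is its negation (correlation -1), and the third is a monotonic but non-linear transform, so its coefficient is close to, but below, 1.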

Last Updated : 04 Oct, 2019

**Output:**

```
dot_product = 833323333350000.0
Computation time = 35.59449199999999 ms

n_dot_product = 833323333350000
Computation time = 0.1559900000000225 ms
```