Fastest way to filter a NumPy array by a set of values


You can do it in one line, but you have to use list(b), so it might not actually be any faster:

>>> a[np.in1d(a[:, 2], list(b))]
array([
   [430, 382, 121486, 2],
   [451, 412, 153521, 2],
   [607, 567, 121473, 2]
])

It works because np.in1d tells you which items of the first argument are present in the second:

>>> np.in1d(a[:, 2], list(b))
array([False, True, True, False, False, False, False, True, False], dtype=bool)
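As a minimal sketch of the same mask-then-index idea, the snippet below rebuilds the sample a and b used on this page and swaps in np.isin, the function NumPy recommends over np.in1d for new code. The names mask and filtered are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([
    [368, 322, 175238, 2],
    [430, 382, 121486, 2],
    [451, 412, 153521, 2],
    [480, 442, 121468, 2],
    [517, 475, 109543, 2],
    [543, 503, 121471, 2],
    [576, 537, 100566, 2],
    [607, 567, 121473, 2],
    [640, 597, 153561, 2],
])
b = {121486, 153521, 121473}

# np.isin expects an array-like, so the set is converted to a list first
mask = np.isin(a[:, 2], list(b))

# Boolean indexing keeps only the rows where the mask is True
filtered = a[mask]

print(mask)
print(filtered)
```

The mask has one entry per row of a, so indexing with it selects whole rows rather than single elements.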

For large a and b, the following is probably faster than your solution: it still uses b as a set, but builds only a boolean array instead of rebuilding the entire array one row at a time. For large a and small b, I think np.in1d might be faster.

ainb = np.array([x in b for x in a[:, 2]])
a[ainb]

For relatively small inputs like those in your question, the fastest method is by far the naïve one:

np.array([x for x in a if x[2] in b])

For larger inputs, @askewchan's solution with NumPy may be faster:

a[np.in1d(a[:, 2], list(b))]
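Which approach wins depends on the input sizes, so it is worth measuring on your own data. The sketch below is one way to do that with timeit. The random test data, its sizes, and the function names naive and with_isin are assumptions made for illustration, and np.isin stands in for np.in1d.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: 10,000 rows, 4 columns, values below 1,000
a = rng.integers(0, 1000, size=(10_000, 4))
b = set(range(0, 1000, 50))


def naive(a, b):
    # Rebuild the array row by row, testing membership in the set
    return np.array([x for x in a if x[2] in b])


def with_isin(a, b):
    # Build one boolean mask for column 2, then index with it
    return a[np.isin(a[:, 2], list(b))]


# Both approaches should select the same rows
assert np.array_equal(naive(a, b), with_isin(a, b))

for fn in (naive, with_isin):
    seconds = timeit.timeit(lambda: fn(a, b), number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```

Changing the size argument and the contents of b lets you look for the crossover point between the two methods on your machine.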

However, when using CPython, a Numba-based implementation would be even faster (at all scales):

import numpy as np
import numba as nb

@nb.jit
def custom_filter(arr, values):
    # Single pass: allocate room for every row, then trim to the k rows kept
    values = set(values)
    n, m = arr.shape
    result = np.empty((n, m), dtype=arr.dtype)
    k = 0
    for i in range(n):
        if arr[i, 2] in values:
            result[k, :] = arr[i, :]
            k += 1
    return result[:k, :].copy()

@nb.jit
def custom_filter2(arr, values):
    # Two passes: count the matching rows first, then allocate exactly k rows
    values = set(values)
    n, m = arr.shape
    k = 0
    for i in range(n):
        if arr[i, 2] in values:
            k += 1
    result = np.empty((k, m), dtype=arr.dtype)
    k = 0
    for i in range(n):
        if arr[i, 2] in values:
            result[k, :] = arr[i, :]
            k += 1
    return result

Suggestion : 2

You can filter a NumPy array with a list or array of boolean values that indicate whether to keep each corresponding element. This method is called boolean mask slicing. For example, filtering the array [1, 2, 3] with the boolean list [True, False, True] gives [1, 3].

np.where() returns the indexes of the elements that satisfy a condition, and those indexes can then be used to filter the array. Conditions can also be combined: filtering on elements greater than 5 and less than 9 returns only the elements of the original array that satisfy both conditions.

The following is the syntax to filter a numpy array using this method –

# arr is a numpy array
# boolean array of which elements to keep, here elements less than 4
mask = arr < 4
# filter the array
arr_filtered = arr[mask]
# above filtering in a single line
arr_filtered = arr[arr < 4]
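The [1, 2, 3] example mentioned earlier can be checked directly. This is a small sketch; the names keep and kept are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])

# A plain Python list of booleans works as a mask, as long as it has
# the same length as the array being filtered
keep = [True, False, True]
kept = arr[keep]
print(kept)  # [1 3]
```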

Alternatively, you can also use np.where() to get the indexes of the elements to keep and filter the numpy array based on those indexes. The following is the syntax –

# arr is a numpy array
# indexes to keep based on the condition, here elements less than 4
indexes_to_keep = np.where(arr < 4)
# filter the array
arr_filtered = arr[indexes_to_keep]
# above filtering in a single line
arr_filtered = arr[np.where(arr < 4)]
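To see what np.where() hands back, the sketch below prints the indexes for a small sample array and confirms that both methods agree. np.flatnonzero is shown as a related call that returns the positions as a plain array. The names flat_indexes, via_where, and via_mask are illustrative.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# With a single argument, np.where returns a tuple holding one index
# array per dimension; for a 1-D array that is a 1-tuple
indexes_to_keep = np.where(arr < 4)
print(indexes_to_keep)

# np.flatnonzero returns the same positions as a plain 1-D array
flat_indexes = np.flatnonzero(arr < 4)
print(flat_indexes)  # [0 2 5]

# Index-based and mask-based filtering select the same elements
via_where = arr[indexes_to_keep]
via_mask = arr[arr < 4]
print(via_where)  # [1 2 3]
```

For a simple condition like this, the boolean mask alone is enough. The index form is mainly useful when you need the positions themselves.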

First, we will create a numpy array that we will be using throughout this tutorial –

import numpy as np

# create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])
# print the array
print(arr)

Let’s filter the above array arr on a single condition, say elements greater than 5 using the boolean masking method.

# boolean mask of elements to keep
mask = arr > 5
print(mask)

# filter the array
arr_filtered = arr[mask]
# show the filtered array
print(arr_filtered)

You can see that we printed the boolean mask and the filtered array. Masking and filtering can be done in a single line –

# filter array
filtered_arr = arr[arr > 5]
print(filtered_arr)
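Filtering on more than one condition works the same way once the masks are combined. The sketch below uses the element-wise operators & (and) and | (or) on the same sample array. The names both and either are illustrative.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# Each comparison needs its own parentheses because & and | bind more
# tightly than the comparison operators
both = arr[(arr > 5) & (arr < 9)]
print(both)  # [7 8]

# Elements satisfying at least one of the two conditions
either = arr[(arr < 2) | (arr > 8)]
print(either)  # [1 9]
```

Python's and / or keywords do not work here, because they try to reduce each whole array to a single truth value.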

Suggestion : 3

If arr is a subclass of ndarray, a base-class ndarray is returned. Here, we first create a NumPy array and a filter holding the values to keep. We then pass this filter (fltr) to numpy.in1d(), which returns True for each element of the original array that appears in the filter; indexing the array with that result keeps only the matching elements.

NumPy has built-in functions for creating arrays. For example, arange quickly creates a one-dimensional (rank 1) array: passing 20 to arange creates an array with values ranging from 0 to 19.

As another example, build a NumPy array of size 1000x1000 with a value of 1 in each element and multiply each element by the float 1.0000001. On the same machine, multiplying those array values by 1.0000001 in a regular floating-point loop took 1.28507 seconds.


>>> a = np.array([
...    [368, 322, 175238, 2],
...    [430, 382, 121486, 2],
...    [451, 412, 153521, 2],
...    [480, 442, 121468, 2],
...    [517, 475, 109543, 2],
...    [543, 503, 121471, 2],
...    [576, 537, 100566, 2],
...    [607, 567, 121473, 2],
...    [640, 597, 153561, 2]
... ])
>>> b = {121486, 153521, 121473}
>>> np.array([x for x in a if x[2] in b])
array([
   [430, 382, 121486, 2],
   [451, 412, 153521, 2],
   [607, 567, 121473, 2]
])

>>> a[np.in1d(a[:, 2], list(b))]
array([
   [430, 382, 121486, 2],
   [451, 412, 153521, 2],
   [607, 567, 121473, 2]
])

Suggestion : 4

Summary: in this tutorial, you’ll learn how to filter list elements by using the built-in Python filter() function.

Use the Python filter() function to filter a list (or a tuple). In fact, you can pass any iterable as the second argument of the filter() function, not just a list.

Suppose that you have the following list of scores:

scores = [70, 60, 80, 90, 50]

To get all elements from the scores list where each element is greater than or equal to 70, you use the following code:

scores = [70, 60, 80, 90, 50]

filtered = []

for score in scores:
    if score >= 70:
        filtered.append(score)

print(filtered)

Output:

[70, 80, 90]

The following shows the syntax of the filter() function:

filter(fn, list)
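As a rough sketch of how filter() could replace the loop over scores, the example below passes a small predicate function and then repeats the call with a lambda on a tuple. The helper name is_passing and the variable filtered_from_tuple are hypothetical and not from the tutorial.

```python
scores = [70, 60, 80, 90, 50]


def is_passing(score):
    # Hypothetical helper: keep scores of 70 or more
    return score >= 70


# filter() returns a lazy iterator, so wrap it in list() to see the items
filtered = list(filter(is_passing, scores))
print(filtered)  # [70, 80, 90]

# The same test written as a lambda and applied to a tuple, since
# filter() accepts any iterable
filtered_from_tuple = list(filter(lambda s: s >= 70, tuple(scores)))
print(filtered_from_tuple)  # [70, 80, 90]
```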

Suppose you have the following list of pairs (written here as two-element lists):

countries = [
   ['China', 1394015977],
   ['United States', 329877505],
   ['India', 1326093247],
   ['Indonesia', 267026366],
   ['Bangladesh', 162650853],
   ['Pakistan', 233500636],
   ['Nigeria', 214028302],
   ['Brazil', 21171597],
   ['Russia', 141722205],
   ['Mexico', 128649565]
]
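One possible use of filter() on this list is sketched below. It assumes the second item of each pair is a population figure, and the 300 million cutoff (THRESHOLD) and the name large are chosen purely for illustration.

```python
countries = [
    ['China', 1394015977],
    ['United States', 329877505],
    ['India', 1326093247],
    ['Indonesia', 267026366],
    ['Bangladesh', 162650853],
    ['Pakistan', 233500636],
    ['Nigeria', 214028302],
    ['Brazil', 21171597],
    ['Russia', 141722205],
    ['Mexico', 128649565],
]

# Illustrative cutoff, assuming the numbers are populations
THRESHOLD = 300_000_000

# Keep the pairs whose second item exceeds the cutoff
large = list(filter(lambda country: country[1] > THRESHOLD, countries))
print(large)
```

The lambda receives one inner list at a time, so country[1] is the number paired with each name.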