Get groups of consecutive elements of a NumPy array based on condition

Suggestion : 1
def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
consecutive(a)

yields

[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]
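To see why this works, here is roughly what the intermediate values look like for the example array (worked out by hand, so treat the comments as illustrative):

import numpy as np

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
np.diff(a)                        # array([47,  1,  1,  1, 47,  1,  1])
np.where(np.diff(a) != 1)[0]      # array([0, 4]) -- positions where a run ends
np.where(np.diff(a) != 1)[0] + 1  # array([1, 5]) -- split points into a
np.split(a, [1, 5])               # [array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]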

Here's a little function that might help:

def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]

This is what I came up with so far; I'm not sure it is 100% correct:

import numpy as np

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
print(np.split(a, np.cumsum(np.where(a[1:] - a[:-1] > 1)) + 1))

returns:

[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]

(Note: the np.cumsum only coincidentally gives the right split points for this particular array, because the first gap index is 0 and there are only two gaps; for general input it distorts the indices. np.where(np.diff(a) != 1)[0] + 1, as in the first snippet, is the reliable form.)

Get where diff isn't one

diffs = numpy.diff(array) != 1

Get the indexes of the diffs, grab the first dimension, and add one to all of them, because diff compares each element with the previous one

indexes = numpy.nonzero(diffs)[0] + 1

Split with the given indexes

groups = numpy.split(array, indexes)
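Putting the three steps together on the example array from above (the same logic is wrapped up as JozeWs in the benchmark code further down):

import numpy as np

array = np.array([0, 47, 48, 49, 50, 97, 98, 99])
diffs = np.diff(array) != 1         # [ True, False, False, False,  True, False, False]
indexes = np.nonzero(diffs)[0] + 1  # [1, 5]
groups = np.split(array, indexes)
print(groups)  # [array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]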

It turns out that, instead of np.split, a list comprehension is more performant. So the function below (almost the same as @unutbu's consecutive function, except that it uses a list comprehension to split the array) is much faster:

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0] + 1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]

For example, for an array of length 100_000, consecutive_w_list_comprehension is over 4x faster:

arr = np.sort(np.random.choice(range(150000), size=100000, replace=False))

%timeit -n 100 consecutive(arr)
96.1 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n 100 consecutive_w_list_comprehension(arr)
23.2 ms ± 858 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Code used to produce the benchmark plot:

import perfplot
import numpy as np

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0] + 1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]

def group_consecutives(vals, step=1):
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

def JozeWs(array):
    diffs = np.diff(array) != 1
    indexes = np.nonzero(diffs)[0] + 1
    groups = np.split(array, indexes)
    return groups

perfplot.show(
    setup=lambda n: np.sort(np.random.choice(range(2 * n), size=n, replace=False)),
    kernels=[consecutive, consecutive_w_list_comprehension, group_consecutives, JozeWs],
    labels=['consecutive', 'consecutive_w_list_comprehension', 'group_consecutives', 'JozeWs'],
    n_range=[2 ** k for k in range(5, 22)],
    equality_check=lambda *lst: all((x == y).all() for x, y in zip(*lst)),
    xlabel='~len(arr)'
)

You can iterate over a list using:

for i in range(len(a)):
    print(a[i])

You could test whether the next element in the list meets some criterion, like so:

if a[i + 1] == a[i] + 1:
    print("it must be a consecutive run")

And you can store results separately in:

results = []
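Putting those pieces together into one rough sketch (my own combination of the snippets above, not part of the original answer):

a = [0, 47, 48, 49, 50, 97, 98, 99]

results = []
run = [a[0]]
for i in range(len(a) - 1):
    if a[i + 1] == a[i] + 1:   # the next element continues the run
        run.append(a[i + 1])
    else:                      # gap found: close the current run
        results.append(run)
        run = [a[i + 1]]
results.append(run)

print(results)  # [[0], [47, 48, 49, 50], [97, 98, 99]]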

Suggestion : 2


I have a NumPy array as follows:

import numpy as np
a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])

Based on a previous question, I can count the number c, which is defined as the number of times the elements in a are less than b two or more times consecutively.

from itertools import groupby

b = 6
sum(len(list(g)) >= 2 for i, g in groupby(a < b) if i)
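For the example array above, this count comes out as 3, matching the three runs listed just below (worked out by hand, so treat it as illustrative):

>>> sum(len(list(g)) >= 2 for i, g in groupby(a < b) if i)
3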

Now I would like to output an array each time the condition is met, instead of counting the number of times it is met. So with this example the right output would be:

array1 = [1, 4, 2]
array2 = [4, 4]
array3 = [3, 4, 4, 5]

So far I have tried different options:

np.isin((len(list(g)) >= 2 for i, g in groupby(a < b) if i), a)

and

np.extract((len(list(g)) >= 2 for i, g in groupby(a < b) if i), a)

But none of them achieved what I am searching for. Can someone point me to the right Python tools in order to output the different arrays satisfying my condition?

Based on this answer, I came up with the following solution using np.split, which is more efficient than both previously added answers here:

array = np.append(a, -np.inf)  # padding so we don't lose the last element
mask = array >= 6              # values to be removed
split_indices = np.where(mask)[0]
for subarray in np.split(array, split_indices + 1):
    if len(subarray) > 2:
        print(subarray[:-1])

gives:

[1. 4. 2.]
[4. 4.]
[3. 4. 4. 5.]
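If you want the groups in a list instead of printed, the same idea can be wrapped in a small helper (a sketch; the function name groups_below and its parameters are my own and not part of the answer above):

import numpy as np

def groups_below(a, b=6, min_len=2):
    """Return the runs of values < b that are at least min_len long."""
    padded = np.append(a, -np.inf)            # pad so the final run is closed too
    split_indices = np.where(padded >= b)[0]  # positions of values to remove
    return [sub[:-1] for sub in np.split(padded, split_indices + 1)
            if len(sub) > min_len]

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
print(groups_below(a))  # [array([1., 4., 2.]), array([4., 4.]), array([3., 4., 4., 5.])]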

Use groupby and grab the groups:

from itertools import groupby

lst = []
b = 6
for i, g in groupby(a, key=lambda x: x < b):
    grp = list(g)
    if i and len(grp) >= 2:
        lst.append(grp)

print(lst)
# [[1, 4, 2], [4, 4], [3, 4, 4, 5]]
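The same filtering can also be written as a single comprehension; this variant is my own rewrite (it needs Python 3.8+ for the walrus operator):

from itertools import groupby

b = 6
lst = [grp for i, g in groupby(a, key=lambda x: x < b)
       if i and len(grp := list(g)) >= 2]
print(lst)  # [[1, 4, 2], [4, 4], [3, 4, 4, 5]]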

This task is very similar to image labeling, but in your case it is one-dimensional. The SciPy library provides some useful functionality for image processing that we can employ here:

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion, label

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6  # your threshold
min_consequent_count = 2

mask = a < b
structure = [False] + [True] * min_consequent_count  # used for erosion and dilation
eroded = binary_erosion(mask, structure)
dilated = binary_dilation(eroded, structure)
labeled_array, labels_count = label(dilated)  # labels_count == c

for label_number in range(1, labels_count + 1):  # labeling starts from 1
    subarray = a[labeled_array == label_number]
    print(subarray)

gives:

[1 4 2]
[4 4]
[3 4 4 5]

mask = a < b returns a boolean array with True values where the elements are less than the threshold b:

array([ True,  True,  True, False,  True,  True, False,  True, False,
       False,  True, False, False,  True, False,  True,  True,  True,
        True, False])

As you can see, the result contains some True elements that don't have any other True neighbors around them. To eliminate them we can use binary erosion; I use scipy.ndimage.binary_erosion for that purpose. Its default structure parameter is not suitable for our needs, as it would also delete two consecutive True values, so I construct my own:

>>> structure = [False] + [True] * min_consequent_count
>>> structure
[False, True, True]
>>> eroded = binary_erosion(mask, structure)
>>> eroded
array([ True,  True, False, False,  True, False, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
       False, False])

We managed to remove the single True values, but now we need to get back the initial configuration of the remaining groups. In order to do so, we use binary dilation with the same structure:

>>> dilated = binary_dilation(eroded, structure)
>>> dilated
array([ True,  True,  True, False,  True,  True, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
        True, False])

And as a final step, we label each group with scipy.ndimage.label:

>>> labeled_array, labels_count = label(dilated)
>>> labeled_array
array([1, 1, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0])
>>> labels_count
3
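As a small side note (my own addition, reusing the variable names from above), scipy.ndimage.find_objects can replace the explicit loop over label numbers:

from scipy.ndimage import find_objects

# find_objects returns one tuple of slices per label, in label order
groups = [a[slc[0]] for slc in find_objects(labeled_array)]
print(groups)  # [array([1, 4, 2]), array([4, 4]), array([3, 4, 4, 5])]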

Suggestion : 3

I have to cluster the consecutive elements from a NumPy array. Considering the following example, the output should be a list of tuples as …


a = [0, 47, 48, 49, 50, 97, 98, 99]

Suggestion : 4

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data; for example, a list is a good candidate for conversion. Whenever you see "array", "NumPy array", or "ndarray" in the text, with few exceptions they all refer to the same thing: the ndarray object. NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements; one-dimensional arrays are simple, and on the surface they act similarly to Python lists. As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.

In [13]: data1 = [6, 7.5, 8, 0, 1]

In [14]: arr1 = np.array(data1)

In [15]: arr1
Out[15]: array([6., 7.5, 8., 0., 1.])

In [27]: arr1 = np.array([1, 2, 3], dtype=np.float64)

In [28]: arr2 = np.array([1, 2, 3], dtype=np.int32)

In [29]: arr1.dtype
Out[29]: dtype('float64')

In [30]: arr2.dtype
Out[30]: dtype('int32')
In [45]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [46]: arr
Out[46]:
array([[1., 2., 3.],
       [4., 5., 6.]])

In [47]: arr * arr
Out[47]:
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [48]: arr - arr
Out[48]:
array([[0., 0., 0.],
       [0., 0., 0.]])
In [51]: arr = np.arange(10)

In [52]: arr
Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [53]: arr[5]
Out[53]: 5

In [54]: arr[5:8]
Out[54]: array([5, 6, 7])

In [55]: arr[5:8] = 12

In [56]: arr
Out[56]: array([0, 1, 2, 3, 4, 12, 12, 12, 8, 9])

In [75]: arr[1:6]
Out[75]: array([1, 2, 3, 4, 64])
In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [84]: data = np.random.randn(7, 4)

In [85]: names
Out[85]:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype='|S4')

In [86]: data
Out[86]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])
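The summary at the top of this suggestion also mentions evaluating sqrt(x^2 + y^2) over a regular grid with np.meshgrid; here is a minimal sketch of that idea (my own example, grid values chosen arbitrarily):

import numpy as np

points = np.arange(-5, 5, 0.01)       # 1,000 equally spaced points
xs, ys = np.meshgrid(points, points)  # two 2D grids covering all (x, y) pairs
z = np.sqrt(xs ** 2 + ys ** 2)        # evaluate the function on the grid
print(z.shape)                        # (1000, 1000)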

Suggestion : 5


IIUC, you can first make your NumPy array 2D and build a DataFrame, which makes everything easier. Take a look:

import pandas as pd

# m is the 3D array from the question
row, cols = m.shape[0], m.shape[1] * m.shape[2]
df = pd.DataFrame(m.reshape(row, cols))

0 1 2 3
0 1.0 0.0 0.0 1.0
1 0.0 1.0 0.0 1.0
2 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 0.0
4 1.0 0.0 0.0 1.0
5 1.0 1.0 1.0 1.0

Now you can use a reverse rolling window of 3 on axis=0 and check if all elements are 1

ndf = df[::-1].rolling(3, axis=0).apply(all, raw=True)[::-1]

0 1 2 3
0 NaN NaN NaN 1.0
1 NaN 1.0 NaN NaN
2 1.0 NaN NaN NaN
3 1.0 NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN

And use idxmax() to get the index of the first 1 occurrence:

ndf[ndf >= 1].idxmax()

0 2.0
1 1.0
2 NaN
3 0.0
dtype: float