 # get groups of consecutive elements of a numpy array based on condition

```
import numpy as np

def consecutive(data, stepsize=1):
    # np.where returns a tuple; [0] selects the index array for 1-D input
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
consecutive(a)
```

yields

`[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]`

Here's a lil func that might help:

```
def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]
```
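As an alternative sketch (not taken from the answers here), the standard-library `itertools.groupby` can build the same runs. Inside a run, `value - index * step` stays constant, so it can serve as the grouping key. The name `group_runs` is made up for illustration.

```python
from itertools import groupby

def group_runs(vals, step=1):
    """Split vals into runs where each element equals the previous one plus step."""
    runs = []
    # Adjacent elements share a key exactly when they differ by `step`.
    for _, grp in groupby(enumerate(vals), key=lambda pair: pair[1] - pair[0] * step):
        runs.append([v for _, v in grp])
    return runs

print(group_runs([0, 47, 48, 49, 50, 97, 98, 99]))
# [[0], [47, 48, 49, 50], [97, 98, 99]]
```

Like `group_consecutives`, this works on plain Python sequences and does not need NumPy.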

This is what I came up with so far; I'm not sure it is 100% correct:

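```
import numpy as np
a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
# np.where already gives the break positions, so no cumsum is needed;
# summing them would shift later split points for other inputs
print(np.split(a, np.where(a[1:] - a[:-1] > 1)[0] + 1))
```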

returns:

`[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]`

Get where diff isn't one

`diffs = numpy.diff(array) != 1`

Get the indexes of diffs, grab the first dimension and add one to all because diff compares with the previous index

`indexes = numpy.nonzero(diffs)[0] + 1`

Split with the given indexes

`groups = numpy.split(array, indexes)`
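To make the three steps above concrete, here is a small worked sketch that prints each intermediate value. The sample array is the one used earlier on this page, and the `np` alias is just the usual import convention.

```python
import numpy as np

array = np.array([0, 47, 48, 49, 50, 97, 98, 99])

diffs = np.diff(array) != 1          # True where the step to the next element is not 1
indexes = np.nonzero(diffs)[0] + 1   # position of the first element of each new group
groups = np.split(array, indexes)

print(diffs.tolist())                # [True, False, False, False, True, False, False]
print(indexes.tolist())              # [1, 5]
print([g.tolist() for g in groups])  # [[0], [47, 48, 49, 50], [97, 98, 99]]
```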

It turns out that a list comprehension is more performant than `np.split`. So the function below (almost identical to @unutbu's `consecutive` function, except that it uses a list comprehension to split the array) is much faster:

```
def consecutive_w_list_comprehension(arr, stepsize=1):
    # boundaries: 0, the start of every new group, and len(arr)
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0] + 1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]
```

For example, for an array of length 100_000, `consecutive_w_list_comprehension` is over 4x faster:

```
arr = np.sort(np.random.choice(range(150000), size=100000, replace=False))

%timeit -n 100 consecutive(arr)
96.1 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit -n 100 consecutive_w_list_comprehension(arr)
23.2 ms ± 858 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
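The `%timeit` magic only works in IPython or Jupyter. A rough sketch of the same comparison with the standard-library `timeit` module follows; the seed, the repeat count of 5, and the use of `np.random.default_rng` are arbitrary choices made here, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0] + 1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]

rng = np.random.default_rng(0)
arr = np.sort(rng.choice(150_000, size=100_000, replace=False))

# Check that both functions return the same groups before comparing timings.
res_split = consecutive(arr)
res_list = consecutive_w_list_comprehension(arr)
same = len(res_split) == len(res_list) and all(
    np.array_equal(x, y) for x, y in zip(res_split, res_list)
)
print("same groups:", same)

for fn in (consecutive, consecutive_w_list_comprehension):
    seconds = timeit.timeit(lambda: fn(arr), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```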

Code used to produce the plot above:

```
import perfplot
import numpy as np

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)

def consecutive_w_list_comprehension(arr, stepsize=1):
    idx = np.r_[0, np.where(np.diff(arr) != stepsize)[0] + 1, len(arr)]
    return [arr[i:j] for i, j in zip(idx, idx[1:])]

def group_consecutives(vals, step=1):
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

def JozeWs(array):
    diffs = np.diff(array) != 1
    indexes = np.nonzero(diffs)[0] + 1
    groups = np.split(array, indexes)
    return groups

perfplot.show(
    setup=lambda n: np.sort(np.random.choice(range(2 * n), size=n, replace=False)),
    kernels=[consecutive, consecutive_w_list_comprehension, group_consecutives, JozeWs],
    labels=['consecutive', 'consecutive_w_list_comprehension', 'group_consecutives', 'JozeWs'],
    n_range=[2 ** k for k in range(5, 22)],
    equality_check=lambda *lst: all((x == y).all() for x, y in zip(*lst)),
    xlabel='~len(arr)'
)
```

You can iterate over a list using

```
for i in range(len(a)):
    print(a[i])
```

You could test whether the next element in the list continues the run (for `i < len(a) - 1`) as follows:

```
if a[i + 1] == a[i] + 1:
    print("it must be a consecutive run")
```

And you can store the results separately in

`results = []`
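Putting those pieces together, a minimal pure-Python sketch might look like the following. The variable `current` is a name introduced here for the run being built, and the sample list is the one used earlier on this page.

```python
a = [0, 47, 48, 49, 50, 97, 98, 99]

results = []                      # finished runs
current = [a[0]] if a else []     # the run currently being built

for i in range(1, len(a)):
    if a[i] == a[i - 1] + 1:
        current.append(a[i])      # still inside the same consecutive run
    else:
        results.append(current)   # the run ended, so start a new one
        current = [a[i]]

if current:
    results.append(current)       # do not forget the last run

print(results)  # [[0], [47, 48, 49, 50], [97, 98, 99]]
```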

Suggestion : 2


I have a NumPy array as follows:

```import numpy as np
a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])```

Based on a previous question, I can count the number `c`, defined as the number of times the elements in `a` are less than `b` two or more times consecutively.

```
from itertools import groupby

b = 6
sum(len(list(g)) >= 2 for i, g in groupby(a < b) if i)
```
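To see what `groupby` produces here, the following sketch lists every run of the mask `a < b` together with its length before counting. The names `runs`, `flag` and `length` are illustrative only.

```python
from itertools import groupby
import numpy as np

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6

# One (is_below_threshold, run_length) pair per run of equal mask values.
runs = [(bool(flag), len(list(g))) for flag, g in groupby(a < b)]
print(runs)

# Count only the runs that are below the threshold and at least two long.
c = sum(1 for flag, length in runs if flag and length >= 2)
print(c)  # 3
```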

So with this example the right output would be:

```array1 = [1, 4, 2]
array2 = [4, 4]
array3 = [3, 4, 4, 5]```

So far I have tried different options:

```
np.isin((len(list(g)) >= 2 for i, g in groupby(a < b) if i), a)
```

and

```
np.extract((len(list(g)) >= 2 for i, g in groupby(a < b) if i), a)
```

Based on this answer I came up with the following solution using `np.split`, which is more efficient than both previously added answers here:

```
array = np.append(a, -np.inf)  # padding so we don't lose the last element
mask = array >= 6  # values to be removed
split_indices = np.where(mask)[0]  # positions of the values to be removed
for subarray in np.split(array, split_indices + 1):
    if len(subarray) > 2:
        print(subarray[:-1])
```

gives:

```[1. 4. 2.]
[4. 4.]
[3. 4. 4. 5.]```

Use groupby and grab the groups:

```
from itertools import groupby

lst = []
b = 6
for i, g in groupby(a, key=lambda x: x < b):
    grp = list(g)
    if i and len(grp) >= 2:
        lst.append(grp)

print(lst)

# [[1, 4, 2], [4, 4], [3, 4, 4, 5]]
```
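For larger arrays, a loop-free way to locate the runs may be preferable. The sketch below is one possible NumPy-only variant, not part of the answers here: it pads the mask with `False` on both sides and uses `np.diff` to find where each run starts and stops. The names `min_len`, `starts` and `stops` are made up for illustration.

```python
import numpy as np

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6
min_len = 2

mask = a < b
# Pad with False so that runs touching either end still have a start and a stop.
padded = np.concatenate(([False], mask, [False]))
edges = np.diff(padded.astype(np.int8))
starts = np.flatnonzero(edges == 1)    # index of the first element of each run
stops = np.flatnonzero(edges == -1)    # index one past the last element of each run

groups = [a[s:e] for s, e in zip(starts, stops) if e - s >= min_len]
print([g.tolist() for g in groups])  # [[1, 4, 2], [4, 4], [3, 4, 4, 5]]
```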

This task is very similar to image labeling but, in your case, it is one-dimensional. The SciPy library provides some useful functionality for image processing that we could employ here:

```
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion, label

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6  # your threshold
min_consequent_count = 2

mask = a < b
structure = [False] + [True] * min_consequent_count  # used for erosion and dilation
eroded = binary_erosion(mask, structure)
dilated = binary_dilation(eroded, structure)
labeled_array, labels_count = label(dilated)  # labels_count == c

for label_number in range(1, labels_count + 1):  # labeling starts from 1
    subarray = a[labeled_array == label_number]
    print(subarray)
```

gives:

```[1 4 2]
[4 4]
[3 4 4 5]```

`mask = a < b` returns a boolean array with `True` values where elements are less than the threshold `b`:

```
>>> mask = a < b
>>> mask
array([ True,  True,  True, False,  True,  True, False,  True, False,
       False,  True, False, False,  True, False,  True,  True,  True,
        True, False])
```

As you can see, the result contains some `True` elements that don't have any other `True` neighbors around them. To eliminate them, we can use binary erosion. I use `scipy.ndimage.binary_erosion` for that purpose. Its default `structure` parameter is not suitable for our needs, as it would also delete two consecutive `True` values, so I construct my own:

```
>>> structure = [False] + [True] * min_consequent_count
>>> structure
[False, True, True]
>>> eroded = binary_erosion(mask, structure)
>>> eroded
array([ True,  True, False, False,  True, False, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
       False, False])
```

We managed to remove the single `True` values, but we need to restore the initial configuration of the other groups. To do so, we use binary dilation with the same structure:

```
>>> dilated = binary_dilation(eroded, structure)
>>> dilated
array([ True,  True,  True, False,  True,  True, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
        True, False])
```

And as a final step, we label each group with `scipy.ndimage.label`:

```
>>> labeled_array, labels_count = label(dilated)
>>> labeled_array
array([1, 1, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0])
>>> labels_count
3
```
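A possible shortcut, sketched here as an assumption rather than as part of the answer above: label the raw mask directly and filter the runs by length afterwards, using `scipy.ndimage.find_objects` to turn the labels into slices. This skips erosion and dilation entirely. The name `min_count` is illustrative.

```python
import numpy as np
from scipy.ndimage import find_objects, label

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6
min_count = 2

labeled, n_labels = label(a < b)   # every run of True gets its own label
slices = find_objects(labeled)     # one tuple holding a single slice per label (1-D input)

# Keep only the runs that are long enough.
groups = [a[sl] for sl in slices if sl[0].stop - sl[0].start >= min_count]
print([g.tolist() for g in groups])  # [[1, 4, 2], [4, 4], [3, 4, 4, 5]]
```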

Suggestion : 3

I have to cluster the consecutive elements from a NumPy array. Consider the following example:

`a = [0, 47, 48, 49, 50, 97, 98, 99]`
`def consecutive(data, stepsize = 1): return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)`
`a = np.array([0, 47, 48, 49, 50, 97, 98, 99])`
`consecutive(a)`
`[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]`
```
def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]
```

Suggestion : 4

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion.

Whenever you see “array”, “NumPy array”, or “ndarray” in the text, with few exceptions they all refer to the same thing: the ndarray object.

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists.

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.

```In: data1 = [6, 7.5, 8, 0, 1]

In: arr1 = np.array(data1)

In: arr1
Out: array([6., 7.5, 8., 0., 1.])```
```
In: arr1 = np.array([1, 2, 3], dtype=np.float64)

In: arr2 = np.array([1, 2, 3], dtype=np.int32)

In: arr1.dtype
Out: dtype('float64')

In: arr2.dtype
Out: dtype('int32')
```
```
In: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In: arr
Out:
array([[1., 2., 3.],
       [4., 5., 6.]])

In: arr * arr
Out:
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In: arr - arr
Out:
array([[0., 0., 0.],
       [0., 0., 0.]])
```
```In: arr = np.arange(10)

In: arr
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In: arr[5]
Out: 5

In: arr[5: 8]
Out: array([5, 6, 7])

In: arr[5: 8] = 12

In: arr
Out: array([0, 1, 2, 3, 4, 12, 12, 12, 8, 9])```
```
# Assuming arr[5:8] has been set to 64 in a step not shown here
In: arr[1:6]
Out: array([1, 2, 3, 4, 64])
```
```In: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In: data = np.random.randn(7, 4)

In: names
Out:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
dtype = '|S4')

In: data
Out:
array([
[-0.048, 0.5433, -0.2349, 1.2792],
[-0.268, 0.5465, 0.0939, -2.0445],
[-0.047, -2.026, 0.7719, 0.3103],
[2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[0.1913, 0.4544, 0.4519, 0.5535],
[0.5994, 0.8174, -0.9297, -1.2564]
])```
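The `np.meshgrid` example mentioned in the text above has no code on this page. A minimal sketch, assuming a small 5-point grid chosen here for readability, could look like this:

```python
import numpy as np

points = np.linspace(-2, 2, 5)        # [-2., -1., 0., 1., 2.]
xs, ys = np.meshgrid(points, points)  # two 2-D arrays covering all (x, y) pairs
z = np.sqrt(xs ** 2 + ys ** 2)        # evaluated element-wise over the whole grid

print(xs.shape, ys.shape, z.shape)    # (5, 5) (5, 5) (5, 5)
print(z[2, 2])                        # 0.0, the value at the grid centre (x = 0, y = 0)
```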

Suggestion : 5


IIUC, you can first make your np array 2D and build a data frame, which makes everything easier. Take a look

```
# m is the 3-D array from the question; flatten its last two axes into columns
row, cols = m.shape[0], m.shape[1] * m.shape[2]
df = pd.DataFrame(m.reshape(row, cols))

0 1 2 3
0 1.0 0.0 0.0 1.0
1 0.0 1.0 0.0 1.0
2 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 0.0
4 1.0 0.0 0.0 1.0
5 1.0 1.0 1.0 1.0```

Now you can use a reverse `rolling` window of `3` on `axis=0` and check if `all` elements are `1`

```ndf = df[::-1].rolling(3, axis = 0).apply(all, raw = True)[::-1]

0 1 2 3
0 NaN NaN NaN 1.0
1 NaN 1.0 NaN NaN
2 1.0 NaN NaN NaN
3 1.0 NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN```

And use `idxmax()` to get the index of the first `1` occurrence

```ndf[ndf >= 1].idxmax()

0 2.0
1 1.0
2 NaN
3 0.0
dtype: float64```
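The same lookup can be sketched without pandas. The version below is an assumption-laden alternative, not the answer's method: it needs NumPy 1.20 or newer for `sliding_window_view`, it copies the 6x4 frame shown above into a hypothetical array `m2d`, and it uses `-1` where pandas reports `NaN`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The 2-D data shown in the DataFrame above.
m2d = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
], dtype=float)

window = 3
# Shape (rows - window + 1, cols, window): each entry holds 3 vertically adjacent values.
windows = sliding_window_view(m2d, window, axis=0)
hits = (windows == 1).all(axis=-1)   # True where a window of three 1s starts
# First starting row per column, or -1 if the column has no such window.
first = np.where(hits.any(axis=0), hits.argmax(axis=0), -1)

print(first.tolist())  # [2, 1, -1, 0]
```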