numpy slicing function: dynamically create slice indices np.r_[a:b, c:d, ...] from array shaped (x, 2) for selection in array


Boolean indexing is one efficient option: build a boolean mask over the columns, then index the columns with that mask to get the output -

# Generate mask for cols
mask = np.zeros(arr.shape[1], dtype=bool)
for i, j in selector:
    mask[i:j] = True

# Boolean index into cols for final o/p
out = arr[:, mask]

If there are many entries in selector, there's a broadcasting-based vectorized way to create the mask for cols, like so -

r = np.arange(arr.shape[1])
mask = ((selector[:, 0, None] <= r) & (selector[:, 1, None] > r)).any(0)
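
For reference, a minimal self-contained sketch comparing the two ways of building the mask. The sample values of `arr` and `selector` are assumptions chosen for illustration; `selector` is taken to be a NumPy array of shape (x, 2) holding half-open [start, stop) column ranges.

```python
import numpy as np

# Illustrative data (assumed): 2 rows, 12 columns.
arr = np.arange(24).reshape(2, 12)
selector = np.array([[0, 2], [6, 9]])  # half-open [start, stop) ranges

# Loop-based mask: one slice assignment per range.
mask_loop = np.zeros(arr.shape[1], dtype=bool)
for i, j in selector:
    mask_loop[i:j] = True

# Broadcasting-based mask: compare every column index with every range,
# then reduce over the ranges with any(0).
r = np.arange(arr.shape[1])
mask_vec = ((selector[:, 0, None] <= r) & (selector[:, 1, None] > r)).any(0)

out = arr[:, mask_vec]
print(out)
```

Note that a mask always returns the selected columns in their original order and at most once each, even if the ranges overlap or are listed out of order.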

You can also build an integer indexing array by concatenating one arange per slice:

slices = [[0, 2], [6, 9]]
np.concatenate([np.arange(*i) for i in slices])
# array([0, 1, 6, 7, 8])

and use it to extract the data

arr[:, np.concatenate([np.arange(*i) for i in slices])]
# array([[ 0,  1,  6,  7,  8],
#        [12, 13, 18, 19, 20]])
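
Since the question asks about np.r_ specifically, here is a small sketch of one way to feed it dynamically: each (start, stop) row becomes a slice object and the resulting tuple is passed to np.r_. The sample `arr` and `selector` are assumed for illustration.

```python
import numpy as np

arr = np.arange(24).reshape(2, 12)      # assumed sample data
selector = np.array([[0, 2], [6, 9]])   # shape (x, 2), half-open ranges

# np.r_[0:2, 6:9] is the same as indexing np.r_ with a tuple of slice objects.
idx = np.r_[tuple(slice(a, b) for a, b in selector)]
print(idx)          # [0 1 6 7 8]

out = arr[:, idx]
print(out)
```

Unlike the mask, an index array keeps the order of the ranges and repeats columns when ranges overlap.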

Suggestion : 2

The top-level method np.sort returns a sorted copy of an array instead of modifying the array in place. A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank.

Transposing is a special form of reshaping which similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute.

Calling astype always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.

Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.
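
A quick way to check these copy-versus-view statements is np.shares_memory. The following is only a sketch with made-up sample data; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

sorted_copy = np.sort(arr)                        # new array, arr untouched
transposed = arr.T                                # view on the same buffer
same_dtype = arr.astype(arr.dtype)                # copy, even with an unchanged dtype
masked = arr[np.ones(arr.shape[0], dtype=bool)]   # boolean indexing copies

print(np.shares_memory(arr, sorted_copy))  # False
print(np.shares_memory(arr, transposed))   # True
print(np.shares_memory(arr, same_dtype))   # False
print(np.shares_memory(arr, masked))       # False

# Quick-and-dirty quantile: sort, then pick the value at a given rank.
values = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
ranked = np.sort(values)
median_ish = ranked[int(0.5 * len(ranked))]
print(median_ish)  # 3.0
```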

In[13]: data1 = [6, 7.5, 8, 0, 1]

In[14]: arr1 = np.array(data1)

In[15]: arr1
Out[15]: array([6., 7.5, 8., 0., 1.])
In[27]: arr1 = np.array([1, 2, 3], dtype = np.float64)

In[28]: arr2 = np.array([1, 2, 3], dtype = np.int32)

In[29]: arr1.dtype
Out[29]: dtype('float64')

In[30]: arr2.dtype
Out[30]: dtype('int32')
In[45]: arr = np.array([
   [1., 2., 3.],
   [4., 5., 6.]
])

In[46]: arr
Out[46]:
   array([
      [1., 2., 3.],
      [4., 5., 6.]
   ])

In[47]: arr * arr
Out[47]:
   array([
      [1., 4., 9.],
      [16., 25., 36.]
   ])

In[48]: arr - arr
Out[48]:
   array([
      [0., 0., 0.],
      [0., 0., 0.]
   ])
In[51]: arr = np.arange(10)

In[52]: arr
Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In[53]: arr[5]
Out[53]: 5

In[54]: arr[5: 8]
Out[54]: array([5, 6, 7])

In[55]: arr[5: 8] = 12

In[56]: arr
Out[56]: array([0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
In[75]: arr[1: 6]
Out[75]: array([1, 2, 3, 4, 64])
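
The 64 at index 5 is consistent with a slice view having been written to between the two outputs. A minimal sketch, assuming intermediate steps of that kind (they are not shown here, and the name `arr_slice` is hypothetical):

```python
import numpy as np

arr = np.arange(10)
arr[5:8] = 12

arr_slice = arr[5:8]   # a basic slice is a view, not a copy
arr_slice[:] = 64      # so this writes through to arr

result = arr[1:6]
print(result)          # [ 1  2  3  4 64]
```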
In[83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In[84]: data = np.random.randn(7, 4)

In[85]: names
Out[85]:
   array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype = '|S4')

In[86]: data
Out[86]:
   array([
      [-0.048, 0.5433, -0.2349, 1.2792],
      [-0.268, 0.5465, 0.0939, -2.0445],
      [-0.047, -2.026, 0.7719, 0.3103],
      [2.1452, 0.8799, -0.0523, 0.0672],
      [-1.0023, -0.1698, 1.1503, 1.7289],
      [0.1913, 0.4544, 0.4519, 0.5535],
      [0.5994, 0.8174, -0.9297, -1.2564]
   ])

Suggestion : 3

Last Updated : 05 Aug, 2021

Output :

TypeError: can't multiply sequence by non-int of type 'list'
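
The code that produced this error is not shown; presumably it multiplied two plain Python lists. A small sketch under that assumption (the operands are made up), with the NumPy equivalent for contrast:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Lists only support repetition by an integer, so list * list fails.
try:
    a * b
except TypeError as exc:
    message = str(exc)
print(message)

# NumPy arrays multiply element by element.
product = np.array(a) * np.array(b)
print(product)  # [ 4 10 18]
```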

Output :

Array is: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

a[-8: 17: 1] = [12 13 14 15 16]

a[10: ] = [10 11 12 13 14 15 16 17 18 19]
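
A short sketch that reproduces the output above, assuming `a` was created with np.arange(20). A negative start counts from the end, so -8 maps to index 12 here.

```python
import numpy as np

a = np.arange(20)
print("Array is:", a)

first = a[-8:17:1]   # start at index 20 - 8 = 12, stop before 17
second = a[10:]      # from index 10 to the end

print("a[-8:17:1] =", first)
print("a[10:] =", second)
```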

Suggestion : 4

Datasets are very similar to NumPy arrays. They are homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape. Unlike NumPy arrays, they support a variety of transparent storage features such as compression, error-detection, and chunked I/O.

HDF5 datasets re-use the NumPy slicing syntax to read and write to the file. Slice specifications are translated directly to HDF5 “hyperslab” selections, and are a fast and efficient way to access data in the file. The following slicing arguments are recognized:

read_direct: Read from an HDF5 dataset directly into a NumPy array, which can avoid making an intermediate copy as happens with slicing. The destination array must be C-contiguous and writable, and must have a datatype to which the source data may be cast. Data type conversion will be carried out on the fly by HDF5.

astype: Return a wrapper allowing you to read data as a particular type. Conversion is handled by HDF5 directly, on the fly:

>>> dset = f.create_dataset("default", (100,))
>>> dset = f.create_dataset("ints", (100,), dtype='i8')

>>> arr = np.arange(100)
>>> dset = f.create_dataset("init", data=arr)

>>> dset = f.create_dataset("MyDataset", (10, 10, 10), 'f')
>>> dset[0, 0, 0]
>>> dset[0, 2:10, 1:9:3]
>>> dset[:, ::2, 5]
>>> dset[0]
>>> dset[1, 5]
>>> dset[0, ...]
>>> dset[..., 6]
>>> dset[()]

>>> dset.fields("FieldA")[:10]  # Read a single field
>>> dset[:10]["FieldA"]         # Read all fields, select in NumPy

>>> dset[0, :, :] = np.arange(10)  # Broadcasts to (10, 10)

>>> f = h5py.File('my_hdf5_file.h5', 'w')
>>> dset = f.create_dataset("test", (2, 2))
>>> dset[0][1] = 3.0  # No effect!
>>> print(dset[0][1])
0.0
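
The "No effect!" case is easy to trip over, so here is a minimal sketch of why it happens and how a single combined selection avoids it. It assumes h5py is installed; the file name is arbitrary, and the in-memory `core` driver with `backing_store=False` is used only so that nothing is written to disk.

```python
import h5py

# In-memory HDF5 file (core driver, no backing store on disk).
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("test", (2, 2))

    # dset[0] reads row 0 into a temporary NumPy array; the assignment
    # changes that temporary array only, never the dataset.
    dset[0][1] = 3.0
    after_chained = float(dset[0, 1])

    # One selection on the dataset itself writes to the file.
    dset[0, 1] = 3.0
    after_direct = float(dset[0, 1])

print(after_chained, after_direct)  # 0.0 3.0
```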

Suggestion : 5

A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
A list or array of labels ['a', 'b', 'c']
A slice object with labels 'a':'f' (note that contrary to usual Python slices, both the start and the stop are included!)
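
A small sketch of these label rules, assuming they describe label-based selection (`.loc` in current pandas). The Series contents are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=list("abcd"))

by_label = s.loc["b":"c"]    # label slice: both 'b' and 'c' are included
by_position = s.iloc[1:2]    # positional slice: usual half-open behaviour

# With an integer index, 5 is looked up as a label, not as position 5.
t = pd.Series(["x", "y", "z"], index=[5, 6, 7])
picked = t.loc[5]

print(by_label.tolist(), by_position.tolist(), picked)
```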

In [1]: dates = date_range('1/1/2000', periods=8)

In [2]: df = DataFrame(randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

In [3]: df
Out[3]: 
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

In [4]: panel = Panel({'one' : df, 'two' : df - df.mean()})

In [5]: panel
Out[5]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major_axis) x 4 (minor_axis)
Items axis: one to two
Major_axis axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor_axis axis: A to D
In[6]: s = df['A']

In[7]: s[dates[5]]
Out[7]: -0.67368970808837025

In[8]: panel['two']
Out[8]:
                   A         B         C         D
2000-01-01  0.409571  0.113086 -0.610826 -0.936507
2000-01-02  1.152571  0.222735  1.017442 -0.845111
2000-01-03 -0.921390 -1.708620  0.403304  1.270929
2000-01-04  0.662014 -0.310822 -0.141342  0.470985
2000-01-05 -0.484513  0.962970  1.174465 -0.888276
2000-01-06 -0.733231  0.509598 -0.580194  0.724113
2000-01-07  0.345164  0.972995 -0.816769 -0.840143
2000-01-08 -0.430188 -0.761943 -0.446079  1.044010
In[9]: df
Out[9]:
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

In[10]: df[['B', 'A']] = df[['A', 'B']]

In[11]: df
Out[11]:
                   A         B         C         D
2000-01-01 -0.282863  0.469112 -1.509059 -1.135632
2000-01-02 -0.173215  1.212112  0.119209 -1.044236
2000-01-03 -2.104569 -0.861849 -0.494929  1.071804
2000-01-04 -0.706771  0.721555 -1.039575  0.271860
2000-01-05  0.567020 -0.424972  0.276232 -1.087401
2000-01-06  0.113648 -0.673690 -1.478427  0.524988
2000-01-07  0.577046  0.404705 -1.715002 -1.039268
2000-01-08 -1.157892 -0.370647 -1.344312  0.844885
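
The column swap in In[10] can be tried on a tiny frame. This is only a sketch with made-up values, using the modern `pd.DataFrame` spelling rather than the bare `DataFrame` name used in the session above.

```python
import pandas as pd

df_small = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# The right-hand side is evaluated first, then its columns are assigned
# in order: B receives the old A, A receives the old B.
df_small[["B", "A"]] = df_small[["A", "B"]]

print(df_small)
```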
In[12]: sa = Series([1, 2, 3], index = list('abc'))

In[13]: dfa = df.copy()

In[14]: sa.b
Out[14]: 2

In[15]: dfa.A
Out[15]:
2000-01-01   -0.282863
2000-01-02   -0.173215
2000-01-03   -2.104569
2000-01-04   -0.706771
2000-01-05    0.567020
2000-01-06    0.113648
2000-01-07    0.577046
2000-01-08   -1.157892
Freq: D, Name: A, dtype: float64

In[16]: panel.one
Out[16]:
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
In[17]: sa.a = 5

In[18]: sa
Out[18]:
a    5
b    2
c    3
dtype: int64

In[19]: dfa.A = list(range(len(dfa.index))) # ok if A already exists

In[20]: dfa
Out[20]:
            A         B         C         D
2000-01-01  0  0.469112 -1.509059 -1.135632
2000-01-02  1  1.212112  0.119209 -1.044236
2000-01-03  2 -0.861849 -0.494929  1.071804
2000-01-04  3  0.721555 -1.039575  0.271860
2000-01-05  4 -0.424972  0.276232 -1.087401
2000-01-06  5 -0.673690 -1.478427  0.524988
2000-01-07  6  0.404705 -1.715002 -1.039268
2000-01-08  7 -0.370647 -1.344312  0.844885

In[21]: dfa['A'] = list(range(len(dfa.index))) # use this form to create a new column

In[22]: dfa
Out[22]:
            A         B         C         D
2000-01-01  0  0.469112 -1.509059 -1.135632
2000-01-02  1  1.212112  0.119209 -1.044236
2000-01-03  2 -0.861849 -0.494929  1.071804
2000-01-04  3  0.721555 -1.039575  0.271860
2000-01-05  4 -0.424972  0.276232 -1.087401
2000-01-06  5 -0.673690 -1.478427  0.524988
2000-01-07  6  0.404705 -1.715002 -1.039268
2000-01-08  7 -0.370647 -1.344312  0.844885
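
The two comments above ("ok if A already exists" and "use this form to create a new column") can be illustrated with a small sketch. The frame and the column names E and F are made up; in current pandas, attribute assignment to a name that is not yet a column emits a warning and sets an ordinary attribute instead.

```python
import warnings

import pandas as pd

dfa_small = pd.DataFrame({"A": [1.0, 2.0, 3.0]})
new_values = list(range(len(dfa_small.index)))

# Attribute assignment works for a column that already exists.
dfa_small.A = new_values

# For a new name it only sets a Python attribute; no column is created.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence the pandas warning for this demo
    dfa_small.E = new_values
created_by_attribute = "E" in dfa_small.columns

# Item assignment is the form that creates a new column.
dfa_small["F"] = new_values
created_by_item = "F" in dfa_small.columns

print(created_by_attribute, created_by_item)  # False True
```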