Efficient way to find several rows above and below a subset of data


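This is an efficient approach: look up the integer positions of the start and stop labels with df.index.get_loc, then take a positional slice with .iloc, clamped to the bounds of the frame. The session assumes numpy is imported as np and DataFrame is imported from pandas.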

In [39]: df = DataFrame(np.random.randn(10, 2))

In [41]: start = 3

In [42]: stop = 4

In [43]: df.iloc[max(df.index.get_loc(start) - 2, 0):min(df.index.get_loc(stop) + 2, len(df))]
Out[43]:
          0         1
1  0.348326  1.413770
2  1.898784  0.053780
3  0.825941 -1.986920
4  0.075956 -0.324657
5 -2.736800 -0.075813

[5 rows x 2 columns]
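The same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name `rows_around` and its `before`/`after` parameters are hypothetical, and it assumes a unique index so that `get_loc` returns a single integer. Because the end of a positional slice is exclusive, the sketch adds 1 to the stop position; the session above adds only 2, which is why it returns a single row below `stop`.

```python
import numpy as np
import pandas as pd


def rows_around(df, start, stop, before=2, after=2):
    """Return rows from `before` rows above `start` to `after` rows below `stop`.

    `start` and `stop` are index labels. Assumes a unique index, so that
    get_loc returns a single integer position.
    """
    first = max(df.index.get_loc(start) - before, 0)
    # +1 because the end of a positional slice is exclusive
    last = min(df.index.get_loc(stop) + after + 1, len(df))
    return df.iloc[first:last]


df = pd.DataFrame(np.random.randn(10, 2))
window = rows_around(df, start=3, stop=4)
print(window.index.tolist())  # [1, 2, 3, 4, 5, 6]
```

Clamping with `max` and `min` keeps the window valid when the subset sits at the very top or bottom of the frame.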

If you want this for an arbitrary set of positions, build a list of the integer positions you want and pass it to .iloc.

In [18]: index_wanted = [71, 102, 103, 179, 505, 506, 607]

In [19]: from itertools import chain

In [20]: df = DataFrame(np.random.randn(1000, 2))

You probably want only the unique positions, since neighbouring windows can overlap (Index here is pandas.Index):

f = lambda i: [i - 2, i - 1, i, i + 1, i + 2]

In [21]: indexers = Index(list(chain(*[f(i) for i in index_wanted]))).unique()

In [22]: df.iloc[indexers]
Out[22]:
            0         1
69   0.792996  0.264597
70   1.084315 -0.620006
71  -0.030432  1.219576
72  -0.767855  0.765041
73  -0.637771 -0.103378
100 -1.087505  1.698133
101  1.007143  2.594046
102 -0.307440  0.308360
103  0.944429 -0.411742
104  1.332445 -0.149350
105  0.165213  1.125668
177  0.409580 -0.375709
178 -1.757021 -0.266762
179  0.736809 -1.286848
180  1.856241  0.176931
181 -0.492590  0.083519
503 -0.651788  0.717922
504 -1.612517 -1.729867
505 -1.786807 -0.066421
506  1.423571  0.768161
507  0.186871  1.162447
508  1.233441 -0.028261
605 -0.060117 -1.459827
606 -0.541765 -0.350981
607 -1.166172 -0.026404
608 -0.045338  1.641864
609 -0.337748  0.955940

[27 rows x 2 columns]
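One thing the list-based version does not handle is a wanted position near either end of the frame: a negative position passed to `.iloc` counts from the end, and a position past the last row raises an IndexError. The following is a sketch of a vectorised alternative that clips to the valid range. It swaps the `chain`/`Index.unique` approach for NumPy broadcasting, and the helper name `positions_with_context` and its parameter `k` are hypothetical.

```python
import numpy as np
import pandas as pd


def positions_with_context(positions, n_rows, k=2):
    """Expand each integer position to position-k .. position+k, deduplicated."""
    offsets = np.arange(-k, k + 1)
    # Broadcasting gives one row of neighbours per wanted position
    expanded = (np.asarray(positions)[:, None] + offsets).ravel()
    # Drop positions that fall outside the frame
    expanded = expanded[(expanded >= 0) & (expanded < n_rows)]
    # np.unique removes overlaps and returns the positions sorted
    return np.unique(expanded)


df = pd.DataFrame(np.random.randn(1000, 2))
index_wanted = [71, 102, 103, 179, 505, 506, 607]
indexers = positions_with_context(index_wanted, len(df))
subset = df.iloc[indexers]
print(len(subset))  # 27
```

With the same `index_wanted` as above, this selects the same 27 rows.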

You can also use shift with the | operator on a boolean mask. For example, to keep the rows within +/- 2 days of each buy signal (assuming one row per day), you can do:

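idx = (data2['buy'] == True).fillna(False)
before = idx.shift(-1, fill_value=False) | idx.shift(-2, fill_value=False)  # one & two days before
after = idx.shift(1, fill_value=False) | idx.shift(2, fill_value=False)  # one & two days after
idx = idx | before | after
data2[idx]  # this is what you need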
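To make the shift approach concrete, here is a self-contained sketch with made-up data. The frame contents and the helper name `mask_with_context` are assumptions for illustration; only the `buy` column name comes from the snippet above. Note that shift moves by rows, so "days" only holds if the index has exactly one row per day.

```python
import pandas as pd

# Hypothetical data: a daily frame with a boolean 'buy' column
data2 = pd.DataFrame(
    {"price": range(10), "buy": [False] * 4 + [True] + [False] * 5},
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)


def mask_with_context(flag, k=2):
    """Return a boolean mask that is True within k rows of any True in `flag`."""
    flag = flag.fillna(False).astype(bool)
    mask = flag.copy()
    for step in range(1, k + 1):
        # shift(step) marks rows after a hit, shift(-step) marks rows before it
        mask = mask | flag.shift(step, fill_value=False)
        mask = mask | flag.shift(-step, fill_value=False)
    return mask


idx = mask_with_context(data2["buy"], k=2)
print(data2[idx])  # the buy row plus two rows on either side
```

Passing `fill_value=False` keeps the shifted series boolean, so the `|` operator never has to deal with the NaN values that shift would otherwise introduce at the edges.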

Suggestion : 2

Manipulate and extract data using column headings and index locations. We can select specific ranges of our data in both the row and column directions using either label or integer-based indexing. Employ label and integer-based indexing to select ranges of data in a dataframe.

# Make sure pandas is loaded
import pandas as pd

# Read in the survey CSV
surveys_df = pd.read_csv("data/surveys.csv")

# TIP: use the .head() method we saw earlier to make output shorter
# Method 1: select a 'subset' of the data using the column name
surveys_df['species_id']

# Method 2: use the column name as an 'attribute'; gives the same output
surveys_df.species_id

# Creates an object, surveys_species, that only contains the `species_id` column
surveys_species = surveys_df['species_id']

# Select the species and plot columns from the DataFrame
surveys_df[['species_id', 'plot_id']]

# What happens when you flip the order?
surveys_df[['plot_id', 'species_id']]

# What happens if you ask for a column that doesn't exist?
surveys_df['speciess']

# Create a list of numbers:
a = [1, 2, 3, 4, 5]
a[0]
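The snippets above select columns only. As a minimal sketch of selecting row and column ranges with label and integer-based indexing, the example below uses a small hypothetical stand-in for the survey data (the values and the `weight` column are made up, so it runs without data/surveys.csv).

```python
import pandas as pd

# Hypothetical stand-in for the survey data; the real file has more columns
surveys_df = pd.DataFrame(
    {
        "species_id": ["NL", "NL", "DM", "PF", "PE"],
        "plot_id": [2, 3, 2, 7, 3],
        "weight": [None, None, 40.0, 8.0, 22.0],
    }
)

# Integer-based: rows 0-2 and the first two columns (end of each range excluded)
by_position = surveys_df.iloc[0:3, 0:2]

# Label-based: rows labelled 0 through 3 (end label included) and named columns
by_label = surveys_df.loc[0:3, ["species_id", "plot_id"]]

print(by_position.shape)  # (3, 2)
print(by_label.shape)     # (4, 2)
```

The difference in row counts comes from the slicing rules: `.iloc` excludes the end position, as Python lists do, while `.loc` includes the end label.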

Suggestion : 3

In R's dplyr, slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases, whose signatures are listed below. Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number(). The weight_by argument of slice_sample() gives the sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1. These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

slice(.data, ..., .preserve = FALSE)

slice_head(.data, ..., n, prop)

slice_tail(.data, ..., n, prop)

slice_min(.data, order_by, ..., n, prop, with_ties = TRUE)

slice_max(.data, order_by, ..., n, prop, with_ties = TRUE)

slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)
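Since the rest of this page works in pandas, the sketch below shows rough pandas counterparts of these helpers. The mapping is an approximation offered for orientation, not an official equivalence, and the example frame is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": list("abcdef"), "score": [3, 1, 4, 1, 5, 9]})

# Roughly slice_head(n = 2) and slice_tail(n = 2)
first_two = df.head(2)
last_two = df.tail(2)

# Roughly slice_min(score, n = 2); keep="all" retains ties, like with_ties = TRUE
lowest = df.nsmallest(2, "score", keep="all")

# Roughly slice_max(score, n = 1)
highest = df.nlargest(1, "score")

# Roughly slice(1, 3); R positions are 1-based, pandas positions are 0-based
by_position = df.iloc[[0, 2]]

# Roughly slice_sample(n = 3, weight_by = score); pandas also normalises
# weights that do not sum to 1
sample = df.sample(n=3, weights="score", random_state=0)

print(lowest["name"].tolist())  # ['b', 'd']
```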