This should be an efficient approach: look up the integer positions of the start and stop labels, then slice by position with .iloc.
# Assumes: import numpy as np; from pandas import DataFrame
In[39]: df = DataFrame(np.random.randn(10, 2))
In[41]: start = 3
In[42]: stop = 4
In[43]: df.iloc[max(df.index.get_loc(start) - 2, 0):min(df.index.get_loc(stop) + 2, len(df))]
Out[43]:
          0         1
1  0.348326  1.413770
2  1.898784  0.053780
3  0.825941 -1.986920
4  0.075956 -0.324657
5 -2.736800 -0.075813

[5 rows x 2 columns]
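The slice above can be wrapped in a small helper. This is only a sketch: the name `window_around` and its `before`/`after` parameters are made up for illustration, and it assumes a unique index so that `get_loc` returns a single integer. Unlike the snippet above, where `get_loc(stop) + 2` is an exclusive end bound and so only one row after `stop` is returned, this version treats `after` as an inclusive count.

```python
import numpy as np
import pandas as pd


def window_around(df, start, stop, before=2, after=2):
    """Rows from `before` positions ahead of `start` to `after` positions past `stop`.

    Hypothetical helper; assumes df.index is unique.
    """
    # Clamp the lower bound at 0 so a negative value does not wrap around.
    lo = max(df.index.get_loc(start) - before, 0)
    # +1 because the end of an iloc slice is exclusive.
    hi = min(df.index.get_loc(stop) + after + 1, len(df))
    return df.iloc[lo:hi]


df = pd.DataFrame(np.random.randn(10, 2))
window = window_around(df, start=3, stop=4)
```

Clamping both bounds means a window that runs past either end of the frame is truncated rather than raising an error.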
If you want the rows around an arbitrary set of positions, build a list
of the positions you want and pass it to .iloc
# Assumes: import numpy as np; from pandas import DataFrame, Index
In[18]: index_wanted = [71, 102, 103, 179, 505, 506, 607]
In[19]: from itertools import chain
In[20]: df = DataFrame(np.random.randn(1000, 2))
You probably want only the unique positions:
f = lambda i: [i - 2, i - 1, i, i + 1, i + 2]
In[21]: indexers = Index(list(chain(*[f(i) for i in index_wanted]))).unique()
In[22]: df.iloc[indexers]
Out[22]:
            0         1
69   0.792996  0.264597
70   1.084315 -0.620006
71  -0.030432  1.219576
72  -0.767855  0.765041
73  -0.637771 -0.103378
100 -1.087505  1.698133
101  1.007143  2.594046
102 -0.307440  0.308360
103  0.944429 -0.411742
104  1.332445 -0.149350
105  0.165213  1.125668
177  0.409580 -0.375709
178 -1.757021 -0.266762
179  0.736809 -1.286848
180  1.856241  0.176931
181 -0.492590  0.083519
503 -0.651788  0.717922
504 -1.612517 -1.729867
505 -1.786807 -0.066421
506  1.423571  0.768161
507  0.186871  1.162447
508  1.233441 -0.028261
605 -0.060117 -1.459827
606 -0.541765 -0.350981
607 -1.166172 -0.026404
608 -0.045338  1.641864
609 -0.337748  0.955940

[27 rows x 2 columns]
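The same idea can be written with NumPy broadcasting instead of `chain` and a lambda. This is a sketch rather than the answer's own method: `neighbor_positions` is a hypothetical name, and the bounds check is an addition, since the list-based version would yield negative or out-of-range positions for centers near either end of the frame.

```python
import numpy as np
import pandas as pd


def neighbor_positions(centers, radius, n_rows):
    """Sorted unique positions within `radius` of any center, clipped to [0, n_rows)."""
    centers = np.asarray(centers)
    offsets = np.arange(-radius, radius + 1)
    # Broadcasting builds one row of neighbors per center, then flattens.
    positions = (centers[:, None] + offsets).ravel()
    # Drop positions that fall outside the frame.
    positions = positions[(positions >= 0) & (positions < n_rows)]
    return np.unique(positions)


df = pd.DataFrame(np.random.randn(1000, 2))
index_wanted = [71, 102, 103, 179, 505, 506, 607]
rows = df.iloc[neighbor_positions(index_wanted, 2, len(df))]
```

`np.unique` both de-duplicates overlapping windows (102 and 103 share neighbors) and returns the positions in sorted order.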
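You can use shift together with the | operator; for example, to keep rows within +/- 2 days of a buy (assuming one row per day) you can do
idx = (data2['buy'] == True).fillna(False)
idx |= idx.shift(-1, fill_value=False) | idx.shift(-2, fill_value=False)  # one & two days before
idx |= idx.shift(1, fill_value=False) | idx.shift(2, fill_value=False)  # one & two days after
data2[idx]  # this is what you need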
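A self-contained version of the shift approach, for trying it out. The data here is invented for illustration (a daily index with a single buy), and `within_days` is a hypothetical helper name. Note that `shift` moves by rows, so "days" only holds when the index has one row per day with no gaps.

```python
import pandas as pd

# Made-up data: ten consecutive days, with one buy on the fifth day.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
data2 = pd.DataFrame(
    {"buy": [False] * 4 + [True] + [False] * 5, "price": list(range(10))},
    index=dates,
)


def within_days(flag, days):
    """Boolean mask that is True within `days` rows of any True in `flag`."""
    mask = flag.copy()
    for k in range(1, days + 1):
        # shift(k) marks the k-th row after a True; shift(-k) the k-th row before.
        # fill_value=False keeps the result boolean instead of introducing NaN.
        mask |= flag.shift(k, fill_value=False) | flag.shift(-k, fill_value=False)
    return mask


selected = data2[within_days(data2["buy"], 2)]
```

Looping over `k` avoids writing each shift out by hand when the window is wider than a couple of rows.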
Manipulate and extract data using column headings and index locations.
We can select specific ranges of our data in both the row and column directions using either label or integer-based indexing.
Employ label and integer-based indexing to select ranges of data in a dataframe.
# Make sure pandas is loaded
import pandas as pd
# Read in the survey CSV
surveys_df = pd.read_csv("data/surveys.csv")
# TIP: use the .head() method we saw earlier to make output shorter
# Method 1: select a 'subset' of the data using the column name
surveys_df['species_id']
# Method 2: use the column name as an 'attribute'; gives the same output
surveys_df.species_id
# Creates an object, surveys_species, that only contains the `species_id` column
surveys_species = surveys_df['species_id']
# Select the species and plot columns from the DataFrame
surveys_df[['species_id', 'plot_id']]
# What happens when you flip the order?
surveys_df[['plot_id', 'species_id']]
# What happens if you ask for a column that doesn't exist?
surveys_df['speciess']
# Create a list of numbers:
a = [1, 2, 3, 4, 5]
a[0]
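Extending the list example to a DataFrame, the sketch below contrasts integer-based selection (`.iloc`) with label-based selection (`.loc`). The small `demo_df` is invented for illustration so the block runs without `data/surveys.csv`; only its column names echo the survey data.

```python
import pandas as pd

# Made-up stand-in for the survey data, with a non-zero-based index
# so that labels and positions are visibly different.
demo_df = pd.DataFrame(
    {
        "species_id": ["NL", "DM", "PF", "PE", "DM"],
        "plot_id": [2, 3, 2, 7, 3],
        "weight": [10.0, 40.0, 8.0, 21.0, 44.0],
    },
    index=[10, 11, 12, 13, 14],
)

a = [1, 2, 3, 4, 5]
first_three = a[0:3]  # list slices exclude the end position

# Integer-based: rows at positions 0, 1, 2 and the first two columns.
# Like list slicing, the end position is excluded.
by_position = demo_df.iloc[0:3, 0:2]

# Label-based: index labels 10 through 12 and two named columns.
# Unlike .iloc, the end label is included.
by_label = demo_df.loc[10:12, ["species_id", "plot_id"]]
```

Both selections return the same three rows here; the difference is that `.iloc` counts positions from 0 while `.loc` matches the index labels themselves.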
dplyr's slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases.
Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
weight_by: Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
slice(.data, ..., .preserve = FALSE)
slice_head(.data, ..., n, prop)
slice_tail(.data, ..., n, prop)
slice_min(.data, order_by, ..., n, prop, with_ties = TRUE)
slice_max(.data, order_by, ..., n, prop, with_ties = TRUE)
slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)