So my approach would be:
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1, high=100, size=100), index=[pd.date_range(start='2000-01-01', freq='W', periods=100)])
number_series = number_series.astype(float)

def func(s):
    # Each window is a Series, so use .iloc for positional access
    if len(s) > 2:
        if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
            return 'High'
        elif s.iloc[-1] > s.iloc[-2]:
            return 'Medium'
        else:
            return 'Low'
    else:
        return ''

# Iterating over the rolling object yields one Series per window
results = [func(window) for window in list(number_series.rolling(5))]
new_series = pd.Series(results, index=number_series.index)
- Get the WindowIndexer of the rolling() method.
- Apply func, returning a string, and store the results in a list.
- Convert the results back to a Series.
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1, high=100, size=100), index=[pd.date_range(start='2000-01-01', freq='W', periods=100)])
number_series = number_series.astype(float)

def func(s):
    if (len(s) >= 3) and (s.iloc[-1] > s.iloc[-2] > s.iloc[-3]):
        return 'High'
    elif (len(s) >= 2) and s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

# Step 1: Get the window indexer (_get_window_indexer is a private pandas method)
window_indexer = number_series.rolling(5)._get_window_indexer()
start, end = window_indexer.get_window_bounds(num_values=len(number_series))

# Step 2: Apply func
results = [func(number_series.iloc[slice(s, e)]) for s, e in zip(start, end)]

# Step 3: Get results back to a pandas Series
new_series = pd.Series(results, index=number_series.index)
new_series

Output:
2000-01-02       Low
2000-01-09       Low
2000-01-16    Medium
2000-01-23       Low
2000-01-30    Medium
               ...
2001-10-28       Low
2001-11-04    Medium
2001-11-11      High
2001-11-18      High
2001-11-25       Low
Length: 100, dtype: object
Here's another way, using the boolean 'or' trick with a list and the pd.Series constructor:
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1, high=100, size=100), index=[pd.date_range(start='2000-01-01', freq='W', periods=100)])
number_series = number_series.astype(float)

def func(s):
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 'High'
    elif s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

labels = []
# list.append returns None, so `None or 0` hands rolling.apply the number it expects
new_series = number_series.rolling(5).apply(lambda x: labels.append(func(x)) or 0)
# apply skips the first 4 incomplete windows, so the labels belong to the last len(labels) dates
pd.Series(labels, index=number_series.index[-len(labels):])
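Because func only ever inspects the last three values of each window, the same labels can also be produced without rolling at all. The following is a minimal sketch of that swapped-in technique (shifted comparisons plus np.select), not one of the approaches from the answers above. The names rising, rising_twice and vectorized_series are illustrative, and the sketch assumes the first two positions should be labelled as in the length-checking version of func ('Low' or 'Medium' rather than an empty string).

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start="2000-01-01", freq="W", periods=100),
)

# True where a value is greater than its predecessor. The first comparison
# is against NaN and is therefore False.
rising = number_series > number_series.shift(1)

# True where the value rose and the previous value rose as well,
# i.e. s[t] > s[t-1] > s[t-2].
rising_twice = rising & rising.shift(1, fill_value=False)

# np.select picks the first matching condition, mirroring the if/elif/else chain.
labels = np.select(
    [rising_twice.to_numpy(), rising.to_numpy()],
    ["High", "Medium"],
    default="Low",
)
vectorized_series = pd.Series(labels, index=number_series.index)
print(vectorized_series.head())
```

This avoids calling a Python function once per window, at the cost of hard-coding the three-value lookback.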
You can convert your rolling windows to a list and apply the function to that list (thanks to this discussion). Also note that func needs to handle the first items differently because indices would otherwise be out of bounds.

I have a datetime series of dtype: float64. I am trying to apply a custom function to a rolling window on the series. I want this function to return strings. However, this generates a TypeError. Why does this generate the error and is there a way to make this work directly with the application of one function?
Here is an example:
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(np.random.randint(low=1, high=100, size=100), index=[pd.date_range(start='2000-01-01', freq='W', periods=100)])
number_series = number_series.astype(float)

def func(s):
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 'High'
    elif s.iloc[-1] > s.iloc[-2]:
        return 'Medium'
    else:
        return 'Low'

# Fails because rolling.apply expects func to return a number, not a string
new_series = number_series.rolling(5).apply(func)
The result is the following error:
TypeError: must be real number, not str
The workaround that I have in place at the moment is to amend the func to output integers to a series and then to apply another function to this series to generate the new series. As per the example below:
def func_float(s):
    if s.iloc[-1] > s.iloc[-2] > s.iloc[-3]:
        return 1
    elif s.iloc[-1] > s.iloc[-2]:
        return 2
    else:
        return 3

float_series = number_series.rolling(5).apply(func_float)

def func_text(s):
    if s == 1:
        return 'High'
    elif s == 2:
        return 'Medium'
    else:
        return 'Low'

new_series = float_series.apply(func_text)
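The second step of this workaround can be shortened with a dictionary lookup. The sketch below is only an illustration under stated assumptions: code_to_label and mapped_series are hypothetical names, raw=True is used so that each window arrives as a NumPy array, and, unlike func_text, the first four incomplete windows stay NaN instead of being labelled 'Low'.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
number_series = pd.Series(
    np.random.randint(low=1, high=100, size=100).astype(float),
    index=pd.date_range(start="2000-01-01", freq="W", periods=100),
)

def func_float(window):
    # With raw=True the window is a NumPy array, so negative
    # positional indexing is unambiguous.
    if window[-1] > window[-2] > window[-3]:
        return 1
    elif window[-1] > window[-2]:
        return 2
    return 3

# Step 1: numeric codes, which rolling.apply can store as floats.
float_series = number_series.rolling(5).apply(func_float, raw=True)

# Step 2: translate codes to labels; codes missing from the dict (NaN) stay NaN.
code_to_label = {1.0: "High", 2.0: "Medium", 3.0: "Low"}
mapped_series = float_series.map(code_to_label)
print(mapped_series.head(8))
```

Whether the leading windows should be NaN or 'Low' is a design choice; chaining .fillna('Low') would reproduce the behaviour of func_text.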
- method: this argument is only implemented when specifying engine='numba' in the method call.
- window, if a BaseIndexer subclass: the window boundaries are based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, and closed, will be passed to get_window_bounds.
- window, if an offset: the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. The pandas time series documentation covers offsets and frequency strings.

With window=2 and min_periods=1, a rolling sum uses a window length of 2 observations but needs only a minimum of 1 observation to calculate a value.
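As a small sketch of the min_periods behaviour just described, the following applies a rolling sum with a window of 2 and min_periods=1 to a single column containing one NaN. The variable name rolled is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})

# A window of 2 observations, but one valid observation is enough
# to produce a value, so no row of the result is NaN here.
rolled = df.rolling(2, min_periods=1).sum()
print(rolled)
#      B
# 0  0.0
# 1  1.0
# 2  3.0
# 3  2.0
# 4  4.0
```

Row 3 is 2.0 because its window holds 2 and NaN, and the single valid value satisfies min_periods=1.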
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.rolling(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN
>>> df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                        index=[pd.Timestamp('20130101 09:00:00'),
...                               pd.Timestamp('20130101 09:00:02'),
...                               pd.Timestamp('20130101 09:00:03'),
...                               pd.Timestamp('20130101 09:00:05'),
...                               pd.Timestamp('20130101 09:00:06')])
>>> df_time
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
>>> df_time.rolling('2s').sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0
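The FixedForwardWindowIndexer used above is one BaseIndexer subclass; writing your own follows the same pattern. The sketch below is an assumption-laden illustration rather than pandas' own implementation: ForwardLookingIndexer is a hypothetical class that reproduces the forward-looking window by returning start and end positions from get_window_bounds. The step parameter is given a default so the method also works on pandas versions that do not pass it.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardLookingIndexer(BaseIndexer):
    """Hypothetical indexer: each window starts at the current row and
    extends window_size rows forward, truncated at the end of the data."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        step = 1 if step is None else step
        # Window i covers the half-open positional range [start[i], end[i]).
        start = np.arange(0, num_values, step, dtype=np.int64)
        end = np.clip(start + self.window_size, 0, num_values).astype(np.int64)
        return start, end


df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
custom_sum = df.rolling(window=ForwardLookingIndexer(window_size=2),
                        min_periods=1).sum()
print(custom_sum)
```

For this data the result should match the FixedForwardWindowIndexer output shown above, which makes the built-in indexer a convenient reference when checking a custom one.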
By using loc on col, the actual DataFrame is being modified in each iteration. The introduction of NaN in the column eventually means the window becomes all NaN. The easiest fix (without understanding more about how the skewness is to be applied) would be to create a copy of col to work on:
import numpy as np
import pandas as pd

def _get_skewness(col, q=(0.05, 0.95)):
    copy_col = col.copy()  # Make a copy so as to not overwrite future values.
    if q[0] > 0:
        quantiles = copy_col.quantile(q)
        copy_col.loc[(copy_col < quantiles[q[0]]) | (copy_col > quantiles[q[1]])] = np.nan
    skew = copy_col.skew(axis=0, skipna=True)
    return skew

df = pd.DataFrame(np.arange(40).reshape(-1, 2))
df_skew = df.rolling(20, 10).apply(_get_skewness)

df_skew:
      0    1
0   NaN  NaN
1   NaN  NaN
2   NaN  NaN
3   NaN  NaN
4   NaN  NaN
5   NaN  NaN
6   NaN  NaN
7   NaN  NaN
8   NaN  NaN
9   0.0  0.0
10  0.0  0.0
11  0.0  0.0
12  0.0  0.0
13  0.0  0.0
14  0.0  0.0
15  0.0  0.0
16  0.0  0.0
17  0.0  0.0
18  0.0  0.0
19  0.0  0.0
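An alternative to copying is to avoid in-place assignment altogether. The sketch below is one possible variant, not the answer's method: get_skewness_where is a hypothetical name, and it relies on Series.where returning a new object, so the window passed in is never modified. Series.between is inclusive on both ends, which matches the original condition of discarding only values strictly outside the quantiles.

```python
import numpy as np
import pandas as pd


def get_skewness_where(col, q=(0.05, 0.95)):
    if q[0] > 0:
        lower, upper = col.quantile(q[0]), col.quantile(q[1])
        # where() returns a new Series with out-of-range values set to NaN;
        # rebinding the local name leaves the caller's data untouched.
        col = col.where(col.between(lower, upper))
    return col.skew(skipna=True)


df = pd.DataFrame(np.arange(40).reshape(-1, 2))
df_skew_where = df.rolling(20, 10).apply(get_skewness_where)
print(df_skew_where)
```

On this evenly spaced data the trimmed windows are symmetric, so the skewness should come out as zero (up to floating-point error) from row 9 onward, as in the output above.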
The rolling() function returns a Rolling subclass holding the values used in the calculation. By default, the result of a rolling calculation such as a sum is set to the right edge of the window. You can change this to the center of the window by setting center=True.
The following is the syntax of the DataFrame.rolling() function, which returns a Rolling subclass.
# Syntax of DataFrame.rolling()
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
First, let’s create a pandas DataFrame to explain rolling() with examples.
# Create DataFrame
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4],
                   'B': [0, 1, 3, 6, 9, np.nan, 4]})
print(df)

# Outputs
#     A    B
# 0   0  0.0
# 1   1  1.0
# 2   2  3.0
# 3   4  6.0
# 4   6  9.0
# 5  10  NaN
# 6   4  4.0
# Returns Rolling subclass.
rolling = df.rolling(window=2)
print(rolling)

# Outputs
# Rolling [window=2,center=False,axis=0,method=single]
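The center parameter listed in the syntax above controls where each result is labelled. As a rough sketch, the following compares a right-edge rolling sum with a centered one on column A's values. A window of 3 is an assumption chosen here for illustration so that the shift in labels is visible, and the names right_edge and centered are illustrative.

```python
import pandas as pd

values = pd.Series([0, 1, 2, 4, 6, 10, 4], name="A")

# Default: each sum is labelled at the last row of its window.
right_edge = values.rolling(window=3).sum()

# center=True: each sum is labelled at the middle row of its window.
centered = values.rolling(window=3, center=True).sum()

print(pd.DataFrame({"right_edge": right_edge, "centered": centered}))
#    right_edge  centered
# 0         NaN       NaN
# 1         NaN       3.0
# 2         3.0       7.0
# 3         7.0      12.0
# 4        12.0      20.0
# 5        20.0      20.0
# 6        20.0       NaN
```

The sums are identical; centering only moves each one up by one row, leaving a NaN at both ends instead of two at the start.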
You can also calculate the mean with the pandas.DataFrame.rolling() function. The rolling mean, also known as the moving average, is computed over each rolling window. This uses win_type=None, meaning all points are evenly weighted.
# Rolling() of mean with window length 3
df2 = df.rolling(window=3).mean()
print(df2)

# Outputs
#           A         B
# 0       NaN       NaN
# 1       NaN       NaN
# 2  1.000000  1.333333
# 3  2.333333  3.333333
# 4  4.000000  6.000000
# 5  6.666667       NaN
# 6  6.666667       NaN
The following example computes the rolling mean with a window length of 3, using the ‘triang’ window type.
# Rolling() of mean with win_type triang (weighted windows require SciPy)
df2 = df.rolling(window=3, win_type='triang').mean()
print(df2)

# Outputs
#       A     B
# 0   NaN   NaN
# 1   NaN   NaN
# 2  1.00  1.25
# 3  2.25  3.25
# 4  4.00  6.00
# 5  6.50   NaN
# 6  7.50   NaN
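To see where the ‘triang’ numbers come from, the weighted mean can be reproduced by hand. This is a sketch under the assumption that a triangular window of length 3 has weights 0.5, 1.0 and 0.5; triang_weights, weighted_mean and manual are illustrative names, and np.average is used in place of pandas' internal weighted-window code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 4, 6, 10, 4],
                   "B": [0, 1, 3, 6, 9, np.nan, 4]})

# Assumed weights of a length-3 triangular window.
triang_weights = np.array([0.5, 1.0, 0.5])


def weighted_mean(window):
    # sum(w * x) / sum(w) over the three values in the window.
    return np.average(window, weights=triang_weights)


# raw=True passes each window as a NumPy array; windows containing NaN
# have fewer than 3 valid observations and therefore yield NaN.
manual = df.rolling(window=3).apply(weighted_mean, raw=True)
print(manual)
```

For example, row 3 of column A is (0.5 * 1 + 1.0 * 2 + 0.5 * 4) / 2 = 2.25, matching the output above.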