MultiIndex level duplicated when rolling() is applied to a pandas groupby object


Use the group_keys argument of groupby:

df.groupby('machin', group_keys = False).rolling(window = 5, min_periods = 1).mean()

Alternatively, you can drop the 0th level, which rolling inserts, with reset_index:

df.groupby('machin').rolling(window = 5, min_periods = 1).mean().reset_index(level = 0, drop = True)

Output for either:

              a column
machin  truc
machin1 truc1      1.0
        truc2      1.5
        truc3      2.0
        truc4      2.5
machin2 truc1    100.0
        truc2     99.5
        truc3     99.0
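To see the duplicated level and the fix side by side, here is a minimal sketch. It rebuilds the question's data directly with pd.MultiIndex.from_tuples (a shortcut, not the question's own construction), applies the reset_index fix from the answer, and checks that droplevel(0) is an equivalent spelling.

```python
import pandas as pd

# The question's data, built directly with a (machin, truc) MultiIndex
idx = pd.MultiIndex.from_tuples(
    [("machin1", "truc1"), ("machin1", "truc2"), ("machin1", "truc3"),
     ("machin1", "truc4"), ("machin2", "truc1"), ("machin2", "truc2"),
     ("machin2", "truc3")],
    names=["machin", "truc"],
)
df = pd.DataFrame({"a column": [1, 2, 3, 4, 100, 99, 98]}, index=idx)

# groupby(...).rolling(...) prepends the group key as an extra outer level
rolled = df.groupby("machin").rolling(window=5, min_periods=1).mean()
print(list(rolled.index.names))  # ['machin', 'machin', 'truc']

# Drop the extra outer level (level 0) that rolling inserted
fixed = rolled.reset_index(level=0, drop=True)
print(fixed)

# droplevel(0) removes the same level
same = rolled.droplevel(0)
```

Dropping by position (0) rather than by name matters here, because the name 'machin' now appears twice and is ambiguous.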

Suggestion : 2

I compute x.field.rolling(window=5, min_periods=1).mean(), where x is a pandas.core.groupby.groupby.DataFrameGroupBy object, and I also tried x.field.apply(lambda x: x.rolling(window=5, min_periods=1).mean()). As you can see below, the index level 'machin' is duplicated, whereas before using the rolling method it appears correctly.

Contrary to the webpage introduced above, I still get the same bug.

+---------+---------+-------+--------------------+
| machin  | machin  | truc  | a column of series |
+---------+---------+-------+--------------------+
| machin1 | machin1 | truc1 | 1                  |
|         |         | truc2 | 2                  |
|         |         | truc3 | 3                  |
|         |         | truc4 | 4                  |
| machin2 | machin2 | truc1 | 100                |
|         |         | truc2 | 99                 |
|         |         | truc3 | 98                 |
+---------+---------+-------+--------------------+

For instance, let's write x.field.apply(lambda x: x+1). It returns:

+---------+-------+--------------------+
| machin  | truc  | a column of series |
+---------+-------+--------------------+
| machin1 | truc1 | 2                  |
|         | truc2 | 3                  |
|         | truc3 | 4                  |
|         | truc4 | 5                  |
| machin2 | truc1 | 101                |
|         | truc2 | 100                |
|         | truc3 | 99                 |
+---------+-------+--------------------+

Here is some code to reproduce my computation:

import pandas as pd

#creation of records
rec = [{
      'machin': 'machin1',
      'truc': ['truc1', 'truc2', 'truc3', 'truc4'],
      'a column': [1, 2, 3, 4]
   },
   {
      'machin': 'machin2',
      'truc': ['truc1', 'truc2', 'truc3'],
      'a column': [100, 99, 98]
   }
]

#creation of pandas dataframe
df = pd.concat([pd.DataFrame(rec[0]), pd.DataFrame(rec[1])])

# creation of a MultiIndex
df.set_index(['machin', 'truc'], inplace=True)

# creation of a groupby object
x = df.groupby(by='machin')

# rolling computation. Note that x.field and x['field'] are equivalent and give the same bug, as I checked.
a = x['a column'].rolling(window=5, min_periods=1).mean()

# rolling with apply and lambda gives the same bug
x['a column'].apply(lambda x: x.rolling(window=5, min_periods=1).mean())

# apply and lambda alone give no bug (pandas < 2.0; since 2.0, group_keys=True also prepends the key here)
b = x['a column'].apply(lambda x: x + 1)

whereas after rolling you can see 'machin' twice in the level names of a's MultiIndex:

a.index
MultiIndex(levels = [
      ['machin1', 'machin2'],
      ['machin1', 'machin2'],
      ['truc1', 'truc2', 'truc3', 'truc4']
   ],
   labels = [
      [0, 0, 0, 0, 1, 1, 1],
      [0, 0, 0, 0, 1, 1, 1],
      [0, 1, 2, 3, 0, 1, 2]
   ],
   names = ['machin', 'machin', 'truc'])

I tried drop too:

a.drop(index = 'machin')
a.drop(index = 0)
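drop removes rows by label, not index levels by name, so neither call above can remove the duplicated level: 'machin' is a level name, and 0 is not a label in this index. Removing a whole level is what droplevel (or reset_index(level=..., drop=True)) does. A minimal sketch on a small hypothetical Series shaped like the question's a:

```python
import pandas as pd

# Hypothetical result with the same shape as the question's `a`:
# the group key 'machin' appears twice as an index level
idx = pd.MultiIndex.from_tuples(
    [("machin1", "machin1", "truc1"),
     ("machin1", "machin1", "truc2"),
     ("machin2", "machin2", "truc1")],
    names=["machin", "machin", "truc"],
)
a = pd.Series([1.0, 1.5, 100.0], index=idx, name="a column")

# The name 'machin' is ambiguous (it occurs twice), so drop the level by position
b = a.droplevel(0)
print(b)
```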


Suggestion : 3

pandas Index objects support duplicate values. If a non-unique index is used as the group key in a groupby operation, all values for the same index value will be considered to be in one group and thus the output of aggregation functions will only contain unique index values.

A DataFrame may be grouped by a combination of columns and index levels by specifying the column names as strings and the index levels as pd.Grouper objects.

For functions run with engine='numba', the function signature must start with values, index exactly, as the data belonging to each group will be passed into values, and the group index will be passed into index.

A function passed to transform (optionally) operates on the entire group chunk. If this is supported, a fast path is used starting from the second chunk.
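The first two points can be sketched with small hypothetical data, following the pattern of the pandas user guide: a Series with repeated index labels aggregates into one row per unique label, and pd.Grouper(level=1) lets an index level be combined with a column name in the same groupby.

```python
import pandas as pd

# Non-unique index: equal labels fall into the same group
s = pd.Series([1, 2, 3, 10, 20, 30], index=[1, 2, 3, 1, 2, 3])
summed = s.groupby(level=0).sum()  # one row per unique label

# Group by an index level (via pd.Grouper) together with a column
arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 2], "B": [0, 1, 2, 3]}, index=index)
by_both = df.groupby([pd.Grouper(level=1), "A"]).sum()

print(summed)
print(by_both)
```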

SELECT Column1, Column2, mean(Column3), sum(Column4)
FROM SomeTable
GROUP BY Column1, Column2

In [1]: df = pd.DataFrame(
   ...:     [
   ...:         ("bird", "Falconiformes", 389.0),
   ...:         ("bird", "Psittaciformes", 24.0),
   ...:         ("mammal", "Carnivora", 80.2),
   ...:         ("mammal", "Primates", np.nan),
   ...:         ("mammal", "Carnivora", 58),
   ...:     ],
   ...:     index=["falcon", "parrot", "lion", "monkey", "leopard"],
   ...:     columns=("class", "order", "max_speed"),
   ...: )
   ...:

In [2]: df
Out[2]:
          class           order  max_speed
falcon     bird   Falconiformes      389.0
parrot     bird  Psittaciformes       24.0
lion     mammal       Carnivora       80.2
monkey   mammal        Primates        NaN
leopard  mammal       Carnivora       58.0

# default is axis=0
In [3]: grouped = df.groupby("class")

In [4]: grouped = df.groupby("order", axis="columns")

In [5]: grouped = df.groupby(["class", "order"])

In [6]: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
   ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
   ...:         "C": np.random.randn(8),
   ...:         "D": np.random.randn(8),
   ...:     }
   ...: )
   ...:

In [7]: df
Out[7]:
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

In [8]: grouped = df.groupby("A")

In [9]: grouped = df.groupby(["A", "B"])

In [10]: df2 = df.set_index(["A", "B"])

In [11]: grouped = df2.groupby(level=df2.index.names.difference(["B"]))

In [12]: grouped.sum()
Out[12]:
            C         D
A
bar -1.591710 -1.739537
foo -0.752861 -1.402938

In [13]: def get_letter_type(letter):
   ....:     if letter.lower() in 'aeiou':
   ....:         return 'vowel'
   ....:     else:
   ....:         return 'consonant'
   ....:

In [14]: grouped = df.groupby(get_letter_type, axis=1)

Suggestion : 4

Last Updated : 09 Jun, 2022

Syntax: 

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True,
   group_keys=True, squeeze=False, **kwargs)

(The squeeze parameter was deprecated in pandas 1.1 and removed in pandas 2.0.)
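The group_keys parameter in the signature above controls whether apply() adds the group key to the index of its result. A minimal sketch with hypothetical data: with group_keys=False, a rolling mean computed inside apply (the apply variant from the question) keeps the original (machin, truc) index instead of gaining an extra 'machin' level. Whether group_keys also affects groupby(...).rolling() called directly may depend on the pandas version, so dropping the level afterwards remains the version-independent option.

```python
import pandas as pd

# Hypothetical Series indexed by (machin, truc)
idx = pd.MultiIndex.from_tuples(
    [("machin1", "truc1"), ("machin1", "truc2"), ("machin2", "truc1")],
    names=["machin", "truc"],
)
s = pd.Series([1, 2, 100], index=idx, name="a column")

# group_keys=False: apply() does not prepend the group key to the result index
out = s.groupby(level="machin", group_keys=False).apply(
    lambda g: g.rolling(window=5, min_periods=1).mean()
)
print(out)
```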

Suggestion : 5


Can you try the following to see if it works?

df['30_day_volume'] = df.groupby(level = 0)['PX_VOLUME'].rolling(window = 30).mean().values

df['volume_change_%'] = (df['PX_VOLUME'] - df['30_day_volume']) / df['30_day_volume']
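One caveat with .values: it discards the index, so the assignment is purely positional. groupby(...).rolling(...) returns rows ordered by group, so .values only lines up with df when df is already sorted by the grouping level. A minimal sketch of a label-aligned alternative, assuming hypothetical tickers "A"/"B" whose rows are interleaved, and window=2 instead of 30 only to keep the data small: drop the extra group-key level and let pandas align on the index.

```python
import pandas as pd

# Hypothetical data: rows are NOT sorted by the grouping level (ticker)
idx = pd.MultiIndex.from_tuples(
    [("B", 1), ("A", 1), ("B", 2), ("A", 2), ("B", 3), ("A", 3)],
    names=["ticker", "day"],
)
df = pd.DataFrame({"PX_VOLUME": [10, 100, 20, 200, 30, 300]}, index=idx)

# Result is ordered by group (A first), with the key prepended: (ticker, ticker, day)
rolled = df.groupby(level=0)["PX_VOLUME"].rolling(window=2).mean()

# Drop the extra level so the assignment aligns on (ticker, day) labels
df["2_day_volume"] = rolled.droplevel(0)
df["volume_change_%"] = (df["PX_VOLUME"] - df["2_day_volume"]) / df["2_day_volume"]
print(df)
```

Here rolled.values would put the "A" results into the first rows of df, which belong partly to "B"; the droplevel version avoids that.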

I can verify Allen's answer works when using pandas_datareader, modifying the index level for the groupby operation for the datareader multiindexing.

import pandas_datareader.data as web
import datetime

start = datetime.datetime(2016, 12, 1)
end = datetime.datetime(2017, 2, 28)
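# Older pandas_datareader versions returned a Panel for a list of symbols; Panel.to_frame()
# gives a (Date, symbol) MultiIndex frame. Panel was removed in pandas 0.25.
data = web.DataReader(['AAPL', 'IBM', 'MSFT'], 'yahoo', start, end).to_frame()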

data['30_day_volume'] = data.groupby(level = 1).rolling(window = 30)['Volume'].mean().values

data['volume_change_%'] = (data['Volume'] - data['30_day_volume']) / data['30_day_volume']

# double-check that it computed starting at 30 trading days.
data.loc['2017-1-17': '2017-1-30']

The original poster might try editing this line:

df['30_day_volume'] = df.groupby(level = 0, group_keys = True)['PX_VOLUME'].rolling(window = 30).mean()

to the following, using mean().values:

df['30_day_volume'] = df.groupby(level = 0, group_keys = True)['PX_VOLUME'].rolling(window = 30).mean().values
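Another option that avoids both the extra index level and the positional .values assignment is transform, which returns a result indexed exactly like the original frame. A minimal sketch with hypothetical data whose rows are not sorted by ticker, using window=2 instead of 30 only to keep it short:

```python
import pandas as pd

# Hypothetical data: two tickers with interleaved rows
idx = pd.MultiIndex.from_tuples(
    [("B", 1), ("A", 1), ("B", 2), ("A", 2), ("B", 3), ("A", 3)],
    names=["ticker", "day"],
)
df = pd.DataFrame({"PX_VOLUME": [10, 100, 20, 200, 30, 300]}, index=idx)

# transform() keeps df's index and row order, so the result can be assigned directly
df["2_day_volume"] = df.groupby(level=0)["PX_VOLUME"].transform(
    lambda s: s.rolling(window=2).mean()
)
df["volume_change_%"] = (df["PX_VOLUME"] - df["2_day_volume"]) / df["2_day_volume"]
print(df)
```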