How to get a list of index values after groupby().mean() in pandas?


Just look at the index if you have an Index. If you have a MultiIndex, see @jezrael's answer with get_level_values.

means.index.tolist()
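As a minimal sketch of both cases, assuming a small made-up DataFrame (the `data` frame and its `name`, `city`, and `score` columns are hypothetical, not from the original question), the group labels can be read from a plain Index with `tolist()` and from a MultiIndex with `get_level_values`:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
data = pd.DataFrame({
    "name": ["John", "Mary", "Suzan", "Eric", "John", "Mary"],
    "city": ["Oslo", "Rome", "Oslo", "Rome", "Oslo", "Rome"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# One grouping key: the result carries a plain Index of group labels.
means = data.groupby("name")["score"].mean()
single_labels = means.index.tolist()

# Two grouping keys: the result carries a MultiIndex, so pick one level by name.
means_multi = data.groupby(["city", "name"])["score"].mean()
name_level = means_multi.index.get_level_values("name").tolist()

print(single_labels)
print(name_level)
```

With a MultiIndex, `tolist()` alone would return tuples of labels, which is why selecting a level is usually the more convenient route.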

You can use:

print(list(means.index))
['John', 'Mary', 'Suzan', 'Eric']

Another, better solution is to use Series.unique and omit the groupby:

print(data.name.unique())
['John' 'Mary' 'Suzan' 'Eric']
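One difference worth knowing: `Series.unique` returns values in order of first appearance, while `groupby` sorts the group keys by default. The sketch below uses a hypothetical `data` frame (its contents are assumptions) to compare the two, along with `sort=False`:

```python
import pandas as pd

# Hypothetical data for illustration only.
data = pd.DataFrame({
    "name": ["John", "Mary", "Suzan", "Eric", "John"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# unique() keeps the order in which each name first appears.
unique_names = data["name"].unique().tolist()

# groupby() sorts the group keys by default...
sorted_labels = data.groupby("name")["score"].mean().index.tolist()

# ...unless sort=False is passed, which keeps first-appearance order.
unsorted_labels = data.groupby("name", sort=False)["score"].mean().index.tolist()

print(unique_names)
print(sorted_labels)
print(unsorted_labels)
```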

Suggestion : 2

DataFrame.take returns the elements in the given positional indices along an axis. This means that we are not indexing according to actual values in the index attribute of the object; we are indexing according to the actual position of the element in the object. The axis argument sets the axis on which to select elements: 0 means that we are selecting rows, 1 means that we are selecting columns. We may also take elements using negative integers, which count from the end of the object, just like with Python lists.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=['name', 'class', 'max_speed'],
...                   index=[0, 2, 3, 1])
>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN
>>> df.take([0, 3])
     name   class  max_speed
0  falcon    bird      389.0
1  monkey  mammal        NaN
>>> df.take([1, 2], axis=1)
    class  max_speed
0    bird      389.0
2    bird       24.0
3  mammal       80.5
1  mammal        NaN
>>> df.take([-1, -2])
     name   class  max_speed
1  monkey  mammal        NaN
3    lion  mammal       80.5
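To make the position-versus-label distinction concrete, here is a small sketch that contrasts `take` with label-based `loc` on a shuffled index. The frame is a trimmed-down version of the one above, and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["falcon", "parrot", "lion", "monkey"],
        "max_speed": [389.0, 24.0, 80.5, np.nan],
    },
    index=[0, 2, 3, 1],
)

# take() selects by position: the first and fourth rows.
by_position = df.take([0, 3])

# loc selects by label: the rows whose index labels are 0 and 3.
by_label = df.loc[[0, 3]]

print(by_position["name"].tolist())
print(by_label["name"].tolist())
```

Here `df.take([0, 3])` selects the same rows as `df.iloc[[0, 3]]`, whereas `df.loc[[0, 3]]` picks a different second row because label 3 sits at position 2.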

Suggestion : 3

DataFrame.groupby() accepts a string or a list of column or index names to group by. The index must have a name for this to work; if it doesn't, set one with DataFrame.index.name = 'index-name'.

How do you group by index in pandas? Pass the index name of the DataFrame to groupby() to group rows on that index. When you pass a list, you can also combine index and column names, so grouping on a column and an index level at the same time works too.

The examples below cover grouping on a single index, on multiple indexes, and on a combination of column and index.
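As a rough sketch of the naming step, assuming a tiny made-up frame with an unnamed index (the data here is hypothetical), the index can be given a name and then used as a grouping key. Passing `level=0` is an alternative that does not need a name:

```python
import pandas as pd

# Hypothetical frame whose index has no name yet.
df = pd.DataFrame(
    {"Discount": [1000, 2300, 2000]},
    index=["Spark", "PySpark", "Spark"],
)

# Name the index so it can be referenced in groupby().
df.index.name = "Courses"
by_name = df.groupby("Courses")["Discount"].sum()

# Alternative: group on the index level by position instead of by name.
by_level = df.groupby(level=0)["Discount"].sum()

print(by_name)
```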

# Below are quick examples (the technologies dict is defined in the next example)

# Create DataFrame
df = pd.DataFrame(technologies)
# Set Index to DataFrame
df.set_index(['Courses', 'Fee'], inplace = True)
print(df)

# Groupby Index
# numeric_only=True keeps the text column Duration out of the sum on recent pandas versions
result = df.groupby('Courses').sum(numeric_only=True)
print(result)

# Groupby Multiple Index
result = df.groupby(['Courses', 'Fee']).sum()
print(result)

# Groupby Column & Index
result = df.groupby(['Courses', 'Duration']).sum()
print(result)

Let’s create a pandas DataFrame from a dict object and explore the above examples.

import pandas as pd
technologies = {
   'Courses': ["Spark", "PySpark", "Hadoop", "Python", "PySpark", "Spark", "Spark"],
   'Fee': [20000, 25000, 26000, 22000, 25000, 20000, 35000],
   'Duration': ['30day', '40days', '35days', '40days', '60days', '60days', '70days'],
   'Discount': [1000, 2300, 1200, 2500, 2000, 2000, 3000]
}

df = pd.DataFrame(technologies)
df.set_index(['Courses', 'Fee'], inplace = True)
print(df)

This yields the output below. Note that the example uses DataFrame.set_index() to set multiple columns as the index. I will use these two index levels to group rows.

               Duration  Discount
Courses Fee
Spark   20000     30day      1000
PySpark 25000    40days      2300
Hadoop  26000    35days      1200
Python  22000    40days      2500
PySpark 25000    60days      2000
Spark   20000    60days      2000
        35000    70days      3000

Grouping on the Courses index level and summing, as in the first quick example, yields the output below.

         Discount
Courses
Hadoop       1200
PySpark      4300
Python       2500
Spark        6000

Now let’s see how to group by multiple index fields at the same time. To do so, pass all index names as a list. The example below groups rows by the Courses and Fee index levels.

# Groupby Multiple Index
result = df.groupby(['Courses', 'Fee']).sum()
print(result)
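Tying this back to the original question, the sketch below rebuilds the same `technologies` data and reads the group labels off the result. Selecting only the Discount column before summing is my own choice here, made to keep the text column out of the aggregation:

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "PySpark", "Spark", "Spark"],
    "Fee": [20000, 25000, 26000, 22000, 25000, 20000, 35000],
    "Duration": ["30day", "40days", "35days", "40days", "60days", "60days", "70days"],
    "Discount": [1000, 2300, 1200, 2500, 2000, 2000, 3000],
}
df = pd.DataFrame(technologies).set_index(["Courses", "Fee"])

# Group on both index levels and sum only the numeric Discount column.
result = df.groupby(["Courses", "Fee"])["Discount"].sum()

# Each group label is a (Courses, Fee) tuple.
group_labels = result.index.tolist()

# Pull out one level of the grouped index by name and de-duplicate it.
courses = result.index.get_level_values("Courses").unique().tolist()

print(group_labels)
print(courses)
```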

Suggestion : 4

Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

axis: 0 or 1 (default: 0). We can use this to specify the orientation along which the DataFrame is to be split.
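Of the parameters in that signature, `as_index` is the one that decides whether the group keys end up in the index at all. A minimal sketch, using a small hypothetical frame rather than the article's data:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Department": ["Technical", "Marketing", "Technical", "Administration"],
    "Salary": [70000, 50000, 125000, 120000],
})

# Default as_index=True: the group keys become the index of the result.
keys_in_index = df.groupby("Department")["Salary"].mean()

# as_index=False: the group keys stay as an ordinary column.
keys_in_column = df.groupby("Department", as_index=False)["Salary"].mean()

print(keys_in_index.index.tolist())
print(keys_in_column["Department"].tolist())
```

Either form gives the same list of group labels; which one is more convenient depends on whether later steps expect an index or a column.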

Create a simple DataFrame, as shown below, with details of employees of different departments.

# Create DataFrame
import pandas as pd

# Create the data of the DataFrame as a dictionary
data_df = {
   'Name': ['Asha', 'Harsh', 'Sourav', 'Riya', 'Hritik',
      'Shivansh', 'Rohan', 'Akash', 'Soumya', 'Kartik'
   ],

   'Department': ['Administration', 'Marketing', 'Technical', 'Technical', 'Marketing',
      'Administration', 'Technical', 'Marketing', 'Technical', 'Administration'
   ],

   'Employment Type': ['Full-time Employee', 'Intern', 'Intern', 'Part-time Employee', 'Part-time Employee',
      'Full-time Employee', 'Full-time Employee', 'Intern', 'Intern', 'Full-time Employee'
   ],

   'Salary': [120000, 50000, 70000, 70000, 55000,
      120000, 125000, 60000, 50000, 120000
   ],

   'Years of Experience': [5, 1, 2, 3, 4,
      7, 6, 2, 1, 6
   ]
}

# Create the DataFrame
df = pd.DataFrame(data_df)
df

Now, use groupby function to group the data as per the ‘Department’ type as shown below.

# Use pandas groupby to group rows by department and get only employees of technical department
df_grouped = df.groupby('Department')

df_grouped.get_group('Technical')

Let us say you want to find the average salary of different departments, then take the ‘Salary’ column from the grouped df and take the mean.

# Group by department and find average salary of each group
df.groupby('Department')['Salary'].mean()
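The result of that call is a Series indexed by department, so the department names can be read straight from its index. A short sketch, reusing only the Department and Salary values from the DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Administration", "Marketing", "Technical", "Technical", "Marketing",
                   "Administration", "Technical", "Marketing", "Technical", "Administration"],
    "Salary": [120000, 50000, 70000, 70000, 55000,
               120000, 125000, 60000, 50000, 120000],
})

# Average salary per department; the departments form the index.
means = df.groupby("Department")["Salary"].mean()

# List of index values after groupby().mean().
departments = means.index.tolist()

print(departments)
print(means["Marketing"])
```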

The groups attribute of the grouped object is a dictionary where the keys are the group keys and the value for each key holds the row index labels that share that group key.

# View the indices of the rows which are in the same group
groups = df.groupby('Department')
print(groups.groups)
# > {'Administration': [0, 5, 9], 'Marketing': [1, 4, 7], 'Technical': [2, 3, 6, 8]}
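Because `groups` behaves like a dictionary, its keys are another way to list the group labels without aggregating anything. A minimal sketch, assuming the same Department values as above (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Administration", "Marketing", "Technical", "Technical", "Marketing",
                   "Administration", "Technical", "Marketing", "Technical", "Administration"],
})

grouped = df.groupby("Department")

# The dictionary keys are the group labels.
group_keys = list(grouped.groups)

# Each value holds the row index labels that belong to that group.
technical_rows = grouped.groups["Technical"].tolist()

print(group_keys)
print(technical_rows)
```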