pandas groupby and rank within groups that start with 1 for each group

Count the issues per session, then use SeriesGroupBy.rank, grouping by the first level of the MultiIndex (session):

s = (df.groupby(['session', 'issue'])
       .size()
       .groupby(level=0)
       .rank(ascending=False, method='dense'))
print(s)

session  issue
1        a        1.0
         b        2.0
2        a        1.0
         b        1.0
3        a        2.0
         b        1.0
dtype: float64

Suggestion : 2

The method parameter of rank controls how tied values are ranked:

  • average: average rank of group.
  • min: lowest rank in group.
  • max: highest rank in group.
  • first: ranks assigned in order they appear in the array.
  • dense: like ‘min’, but rank always increases by 1 between groups.

>>> df = pd.DataFrame(
...     {
...         "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"],
...         "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5],
...     }
... )
>>> df
  group  value
0     a      2
1     a      4
2     a      2
3     a      3
4     a      5
5     b      1
6     b      2
7     b      4
8     b      1
9     b      5
>>> for method in ['average', 'min', 'max', 'dense', 'first']:
...     df[f'{method}_rank'] = df.groupby('group')['value'].rank(method)
>>> df
  group  value  average_rank  min_rank  max_rank  dense_rank  first_rank
0     a      2           1.5       1.0       2.0         1.0         1.0
1     a      4           4.0       4.0       4.0         3.0         4.0
2     a      2           1.5       1.0       2.0         1.0         2.0
3     a      3           3.0       3.0       3.0         2.0         3.0
4     a      5           5.0       5.0       5.0         4.0         5.0
5     b      1           1.5       1.0       2.0         1.0         1.0
6     b      2           3.0       3.0       3.0         2.0         3.0
7     b      4           4.0       4.0       4.0         3.0         4.0
8     b      1           1.5       1.0       2.0         1.0         2.0
9     b      5           5.0       5.0       5.0         4.0         5.0

Suggestion : 3

Why don't the ranks start from 1, 2, 3... within each group? Given the data below, the expected ranks are:

  • for group session=1, there are three a issues and one b issue, so the ranks should be a=1 and b=2
  • for group session=2, both issues occur equally often, so both should get the same rank, 1
  • for group session=3, there are two b issues and one a issue, so the ranks should be b=1 and a=2

import pandas as pd

df = pd.DataFrame([
   [1, 'a'],
   [1, 'a'],
   [1, 'b'],
   [1, 'a'],
   [2, 'a'],
   [2, 'b'],
   [2, 'a'],
   [2, 'b'],
   [3, 'b'],
   [3, 'a'],
   [3, 'b'],

], columns = ['session', 'issue'])
df

I would like to rank issues within sessions. I tried with:

df.groupby(['session', 'issue']).size().rank(ascending=False, method='dense')

session  issue
1        a        1.0
         b        3.0
2        a        2.0
         b        2.0
3        a        3.0
         b        2.0
dtype: float64

That call ranks the counts across all sessions at once. To restart the ranking in every session, use SeriesGroupBy.rank, grouping by the first level of the MultiIndex (session):

s = (df.groupby(['session', 'issue'])
       .size()
       .groupby(level=0)
       .rank(ascending=False, method='dense'))
print(s)

session  issue
1        a        1.0
         b        2.0
2        a        1.0
         b        1.0
3        a        2.0
         b        1.0
dtype: float64
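
As a self-contained sketch of the approach above, the snippet below rebuilds the example data, ranks the issue counts within each session, and converts the ranks to integers. The names counts, ranks and ranked, and the final reset_index step that produces a flat table, are illustrative choices and not part of the original answer.

```python
import pandas as pd

# Same data as in the question: one row per reported issue.
df = pd.DataFrame({
    "session": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "issue": list("aabaababbab"),
})

# Count how often each issue occurs in each session.
counts = df.groupby(["session", "issue"]).size()

# Rank the counts separately inside every session (index level "session").
# With no missing values the ranks are whole numbers, so casting to int is safe here.
ranks = (counts.groupby(level="session")
               .rank(ascending=False, method="dense")
               .astype(int))

# Optional: flatten the MultiIndex into ordinary columns.
ranked = ranks.rename("rank").reset_index()
print(ranked)
```

Grouping by the level name instead of level=0 is equivalent here, because size() keeps session as the name of the first index level.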

Suggestion : 4

Pandas rank computes the rank of each data point relative to the others in a dataset. It is useful for picking out the first or second item of a sub-dataset. It applies to an entire Series or DataFrame, and it can also be called on a group by object, in which case the ranks are computed separately within each subgroup. On a group by it returns one rank per row, so it acts as a transformation and not as an aggregation. We will look at two methods:

  1. Rank data within your entire DataFrame: pd.DataFrame.rank()
  2. Rank data within subgroups (group by): pd.DataFrame.groupby().rank()

import pandas as pd
import numpy as np

np.random.seed(seed=42)

# Random example data: number of parks and schools per city.
df = pd.DataFrame(
    data=np.random.normal(loc=100, scale=50, size=(8, 2)),
    columns=['Parks', 'Schools'],
    index=['San Francisco', 'San Diego', 'Los Angeles', 'New York',
           'Chicago', 'Denver', 'Seattle', 'Portland'],
)
df = df.astype(int)
df

# Rank a single column (ascending by default: the smallest value gets rank 1).
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank()
df_copy

# Rank every column of the DataFrame at once.
df_copy = df.copy()
df_copy.rank()

# Rank in descending order: the largest value gets rank 1.
df_copy = df.copy()
df_copy['park_rank'] = df_copy['Parks'].rank(ascending=False)
df_copy
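
Ranking within subgroups can be sketched as follows. The State column, the city list and all the numbers are hypothetical, added only so that there is something to group by; they do not come from the example above.

```python
import pandas as pd

# Hypothetical data: the 'State' column and the values are assumptions.
df = pd.DataFrame(
    {
        "State": ["CA", "CA", "CA", "NY", "NY", "WA", "WA"],
        "Parks": [124, 93, 132, 88, 150, 71, 110],
    },
    index=["San Francisco", "San Diego", "Los Angeles",
           "New York", "Buffalo", "Seattle", "Spokane"],
)

# Rank parks within each state; the largest value in a state gets rank 1.
df["park_rank_in_state"] = (
    df.groupby("State")["Parks"]
      .rank(ascending=False, method="dense")
      .astype(int)
)
print(df)
```

Every state restarts at rank 1, and the result has one row per input row.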

Suggestion : 5

29 May 2019

import pandas as pd
data = pd.DataFrame([
   ['a', 1, 'A'],
   ['a', 2, 'B'],
   ['a', 3, 'C'],
   ['b', 5, 'D'],
   ['b', 6, 'E'],
   ['b', 7, 'F'],
   ['b', 8, 'G'],
   ['c', 10, 'H'],
   ['c', 11, 'I'],
   ['c', 12, 'J'],
   ['c', 13, 'K']
], columns = ['group_name', 'used_for_sorting', 'the_value'])
sorted_data_frame = data.sort_values(['used_for_sorting'], ascending = False)

Suggestion : 6

You need method='dense' in SeriesGroupBy.rank(), so that the rank always increases by exactly 1 from one group of tied values to the next:

df['z_rank'] = df.groupby(['instance', 'D'])['z'].rank(method = 'dense').astype(int)

I tried it with the following code, but I get 1 for every row in the FrSeg column.

Merge_Data['FrSeg'] = Merge_Data.groupby(['CustomerKey'])['Frequency'].rank(method = 'dense').astype(int)
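
One possible explanation, offered as a guess since the data is not shown: if every CustomerKey appears in only one row (or all rows of a customer share the same Frequency), each group holds a single distinct value, so its dense rank is always 1. The sketch below uses made-up values for Merge_Data to reproduce that symptom. Assuming the goal is to rank customers against each other, it then ranks the whole Frequency column instead.

```python
import pandas as pd

# Made-up data: each CustomerKey occurs exactly once (an assumption).
Merge_Data = pd.DataFrame({
    "CustomerKey": [101, 102, 103, 104],
    "Frequency": [5, 2, 5, 9],
})

# Ranking inside single-row groups: every row is first in its own group.
per_customer = (Merge_Data.groupby("CustomerKey")["Frequency"]
                          .rank(method="dense")
                          .astype(int))
print(per_customer.tolist())

# Ranking the whole column compares customers with each other.
Merge_Data["FrSeg"] = Merge_Data["Frequency"].rank(method="dense").astype(int)
print(Merge_Data)
```

If the ranking really is meant to restart per group, the grouping key needs to be a column whose groups contain several rows with different Frequency values.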