pandas and category replacement

My Work Around
Use pd.DataFrame.apply with pd.Series.replace.
This has the advantage that you don't need to convert the columns to another dtype first.

import pandas as pd

df = pd.DataFrame({
    's1': [1, 2, 3],
    's2': [1, 3, 4]
}, dtype='category')
# Call Series.replace on each column separately.
df.apply(pd.Series.replace, to_replace=1, value=2)

   s1  s2
0   2   2
1   2   3
2   3   4

Or

df = pd.DataFrame({
    's1': ['a', 'b', 'c'],
    's2': ['a', 'c', 'd']
}, dtype='category')
df.apply(pd.Series.replace, to_replace='a', value=1)

  s1 s2
0  1  1
1  b  c
2  c  d

@cᴏʟᴅsᴘᴇᴇᴅ's Work Around

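df = pd.DataFrame({
    's1': ['a', 'b', 'c'],
    's2': ['a', 'c', 'd']
}, dtype='category')
# Convert every cell to str first, then replace on plain object columns.
# Note: DataFrame.applymap was renamed to DataFrame.map in pandas 2.1.
df.applymap(str).replace('a', 1)

  s1 s2
0  1  1
1  b  c
2  c  d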

The reason for this behavior is that each column has a different set of categories:

In [224]: df.s1.cat.categories
Out[224]: Index(['a', 'b', 'c'], dtype='object')

In [225]: df.s2.cat.categories
Out[225]: Index(['a', 'c', 'd'], dtype='object')

So if the replacement value is already a category in both columns, it works:

In [226]: df.replace('d', 'a')
Out[226]:
  s1 s2
0  a  a
1  b  c
2  c  a

As a solution, you might want to make your columns categorical manually, giving every column the same explicit list of categories:

pd.Categorical(..., categories=[...])
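
A minimal sketch of that idea, assuming the two example columns from above. The name `cats` and the choice of the union `['a', 'b', 'c', 'd']` are illustrative, not part of the original answer. Recent pandas releases may emit a deprecation warning for replace on categorical data, so treat this as a sketch rather than a guaranteed recipe.

```python
import pandas as pd

# Hypothetical shared category list: the union of the values in both columns.
cats = ['a', 'b', 'c', 'd']

df = pd.DataFrame({
    's1': pd.Categorical(['a', 'b', 'c'], categories=cats),
    's2': pd.Categorical(['a', 'c', 'd'], categories=cats),
})

# Both columns now report the same categories.
same = df['s1'].cat.categories.equals(df['s2'].cat.categories)

# 'a' is a valid category in every column, so the replacement applies to the whole frame.
result = df.replace('d', 'a')
print(same)
print(result)
```

Because every column shares one category list, a replacement value only has to be valid once for it to be valid in all columns.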

Suggestion : 2

rename_categories takes the new categories which will replace the old categories. They can be:

  • list-like: all items must be unique, and the number of items in the new categories must match the existing number of categories.
  • dict-like: specifies a mapping from old categories to new. Categories not contained in the mapping are passed through, and extra categories in the mapping are ignored.

An error is raised if the new categories are list-like and do not have the same number of items as the current categories, or do not validate as categories.

>>> c = pd.Categorical(['a', 'a', 'b'])
>>> c.rename_categories([0, 1])
[0, 0, 1]
Categories (2, int64): [0, 1]
>>> c.rename_categories({'a': 'A', 'c': 'C'})
['A', 'A', 'b']
Categories (2, object): ['A', 'b']
>>> c.rename_categories(lambda x: x.upper())
['A', 'A', 'B']
Categories (2, object): ['A', 'B']
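
The same method is available on categorical columns through the `.cat` accessor, which gives one possible way to do the 'a' to 1 replacement from the question without leaving the category dtype. This is a sketch under the assumption that renaming the category is what you want. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    's1': ['a', 'b', 'c'],
    's2': ['a', 'c', 'd']
}, dtype='category')

# Dict-like mapping: 'a' is renamed to 1; every other category passes through.
mapping = {'a': 1}

# Rename column by column, since each column has its own categories.
for col in df.columns:
    df[col] = df[col].cat.rename_categories(mapping)

print(df)
print(df.dtypes)
```

Renaming changes the category label itself, so every cell that held 'a' now holds 1 and the columns stay categorical.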

Suggestion : 3

Last Updated : 08 Aug, 2022

Output:

   Array_1  Array_2
0     60.0     65.1
1     70.0     60.0