How can I map values in a pandas DataFrame?


This recipe helps you map values in a Pandas DataFrame. Last Updated: 23 Apr 2022

We sometimes need to map values in Python, i.e. pair the values of one feature with the values of another feature. First, we build a dictionary whose keys are values of the feature first_name and whose values are the entries of a new feature, Subjects:

Subjects = {"Sheldon": "Science", "Raj": "Chemistry", "Leonard": "Maths", "Howard": "Astronaut", "Amy": "Science"}
print(Subjects)

Then we call Series.map on the first_name column, which looks up each name in the dictionary, and store the result in a new column:

df["Subjects"] = df["first_name"].map(Subjects)
print(df)

The original DataFrame, the dictionary, and the resulting DataFrame are shown below.

   first_name     last_name  age  Comedy_Score  Rating_Score
0     Sheldon        Copper   42             9            25
1         Raj  Koothrappali   38             7            25
2     Leonard    Hofstadter   36             8            49
3      Howard      Wolowitz   41             8            62
4         Amy        Fowler   35             5            70

  {
     "Sheldon": "Science",
     "Raj": "Chemistry",
     "Leonard": "Maths",
     "Howard": "Astronaut",
     "Amy": "Science"
  }

   first_name     last_name  age  Comedy_Score  Rating_Score   Subjects
0     Sheldon        Copper   42             9            25    Science
1         Raj  Koothrappali   38             7            25  Chemistry
2     Leonard    Hofstadter   36             8            49      Maths
3      Howard      Wolowitz   41             8            62  Astronaut
4         Amy        Fowler   35             5            70    Science
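The recipe above never shows how df itself is built. The following is a minimal, self-contained sketch that reconstructs the DataFrame from the first printed table (the construction code is an assumption; the column names, values, and the Subjects dictionary come from the recipe) and then applies the mapping.

```python
import pandas as pd

# Assumption: the DataFrame is rebuilt from the printed table above.
df = pd.DataFrame({
    "first_name": ["Sheldon", "Raj", "Leonard", "Howard", "Amy"],
    "last_name": ["Copper", "Koothrappali", "Hofstadter", "Wolowitz", "Fowler"],
    "age": [42, 38, 36, 41, 35],
    "Comedy_Score": [9, 7, 8, 8, 5],
    "Rating_Score": [25, 25, 49, 62, 70],
})

# Dictionary from the recipe: first_name -> subject
Subjects = {
    "Sheldon": "Science",
    "Raj": "Chemistry",
    "Leonard": "Maths",
    "Howard": "Astronaut",
    "Amy": "Science",
}

# Look up every first_name in the dictionary and store the result as a new column
df["Subjects"] = df["first_name"].map(Subjects)
print(df)
```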

Suggestion : 2

Last Updated : 08 Jan, 2019

   First_name  Last_name  Age    City  Qualification
0         Ram      Kumar   42  Mumbai          B.Com
1       Mohan     Sharma   52   Noida            IAS
2        Tina        Ali   36    Pune            LLB
3       Jeetu     Gandhi   21   Delhi         B.Tech
4       Meera     Kumari   23   Bihar           MBBS

   First_name  Last_name  Age    City
0         Ram      Kumar   42  Mumbai
1       Mohan     Sharma   52   Noida
2        Tina        Ali   36    Pune
3       Jeetu     Gandhi   21   Delhi
4       Meera     Kumari   23   Bihar

   First_name  Last_name  Age    City
0       Shyam      Kumar   42  Mumbai
1       Mohan     Sharma   52   Noida
2        Riya        Ali   36    Pune
3    Jitender     Gandhi   21   Delhi
4       Meera     Kumari   23   Bihar

   First_name  Last_name  Age    City
0       Shyam      Kumar   42  Mumbai
1       Mohan     Sharma   52   Noida
2        Riya        Ali   36    Pune
3    Jitender     Gandhi   21   Delhi
4       Meera     Kumari   23   Bihar
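The code that produced the tables in this suggestion is missing, so what follows is only a guess at it. The sketch assumes the first table is the starting DataFrame, that the Qualification column was dropped, and that three first names were swapped through a dictionary. The dictionary name new_names and both approaches are assumptions, chosen because they reproduce the printed values.

```python
import pandas as pd

# Assumption: starting data rebuilt from the first printed table.
df = pd.DataFrame({
    "First_name": ["Ram", "Mohan", "Tina", "Jeetu", "Meera"],
    "Last_name": ["Kumar", "Sharma", "Ali", "Gandhi", "Kumari"],
    "Age": [42, 52, 36, 21, 23],
    "City": ["Mumbai", "Noida", "Pune", "Delhi", "Bihar"],
    "Qualification": ["B.Com", "IAS", "LLB", "B.Tech", "MBBS"],
})

# Second table: same data without the Qualification column
df = df.drop(columns="Qualification")

# Hypothetical mapping that matches the names in the last two tables
new_names = {"Ram": "Shyam", "Tina": "Riya", "Jeetu": "Jitender"}

# Option 1: replace only touches values that appear as keys
via_replace = df.replace({"First_name": new_names})

# Option 2: map, then fall back to the original name where no key matched
via_map = df.copy()
via_map["First_name"] = df["First_name"].map(new_names).fillna(df["First_name"])

print(via_replace)
print(via_map)
```

Both options give the same names here, which would be consistent with the last two tables being identical.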

Suggestion : 3

Series.map maps the values of a Series according to an input mapping or function. It is used for substituting each value in a Series with another value, which may be derived from a function, a dict, or a Series.

When arg is a dictionary, values in the Series that are not among the dictionary's keys are converted to NaN. However, if the dictionary is a dict subclass that defines __missing__ (i.e. it provides a method for default values, as defaultdict does), then this default is used rather than NaN.

>>> s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0       cat
1       dog
2       NaN
3    rabbit
dtype: object
>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0    kitten
1     puppy
2       NaN
3       NaN
dtype: object
>>> s.map('I am a {}'.format)
0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object
>>> s.map('I am a {}'.format, na_action='ignore')
0       I am a cat
1       I am a dog
2              NaN
3    I am a rabbit
dtype: object
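The description above mentions that a dict subclass defining __missing__ supplies a default instead of NaN, but the examples do not show it. Here is a small sketch using collections.defaultdict; the fallback string 'unknown' is an arbitrary choice for illustration.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# Plain dict: values without a matching key become NaN
plain = s.map({'cat': 'kitten', 'dog': 'puppy'})

# defaultdict defines __missing__, so unmatched values get the default instead
lookup = defaultdict(lambda: 'unknown', {'cat': 'kitten', 'dog': 'puppy'})
with_default = s.map(lookup)

print(plain)
print(with_default)
```

Note that the default is also applied to the NaN entry, since NaN is not a key either; pass na_action='ignore' if missing values should stay missing.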

Suggestion : 4

You can use .replace. For example:

>>> df = pd.DataFrame({'col1': {0: 'w', 1: 1, 2: 2}, 'col2': {0: 'a', 1: 2, 2: np.nan}})
>>> di = {1: "A", 2: "B"}
>>> df
  col1 col2
0    w    a
1    1    2
2    2  NaN
>>> df.replace({"col1": di})
  col1 col2
0    w    a
1    A    2
2    B  NaN

If the dictionary covers every value in the column (an exhaustive mapping), the form is very simple:

df['col1'].map(di)  # note: if the dictionary does not exhaustively map all
                    # entries, then non-matched entries are changed to NaN

If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:

df['col1'].map(di).fillna(df['col1'])
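To make the difference concrete, here is a short sketch that reuses the col1 values and the di dictionary from the .replace example above and compares the two forms.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['w', 1, 2]})
di = {1: "A", 2: "B"}

# 'w' is not a key in di, so plain map turns it into NaN
mapped_only = df['col1'].map(di)

# fillna restores the original value wherever map produced NaN
mapped_keep = df['col1'].map(di).fillna(df['col1'])

print(mapped_only)
print(mapped_keep)
```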

Using the following data with pandas version 0.23.1:

di = {
   1: "A",
   2: "B",
   3: "C",
   4: "D",
   5: "E",
   6: "F",
   7: "G",
   8: "H"
}
df = pd.DataFrame({
   'col1': np.random.choice(range(1, 9), 100000)
})
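The timing results that went with this setup are not included here. The sketch below shows one way to measure the two approaches yourself with the standard timeit module; the number of repetitions is arbitrary, and the actual timings depend on your machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})

repeats = 5  # arbitrary; increase for more stable numbers

# Total time for `repeats` calls of each approach
t_map = timeit.timeit(lambda: df['col1'].map(di), number=repeats)
t_replace = timeit.timeit(lambda: df['col1'].replace(di), number=repeats)

# With an exhaustive dictionary both approaches should give the same values
same_result = df['col1'].map(di).tolist() == df['col1'].replace(di).tolist()

print(f"map:     {t_map / repeats:.4f} s per call")
print(f"replace: {t_replace / repeats:.4f} s per call")
print("same result:", same_result)
```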

Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

df['col1'].update(pd.Series(di))

For example,

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {
    0: "A",
    2: "B"
}

# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
df['col1'].update(pd.Series(di))
print(df)

yields

  col1 col2
1    w    a
2    B   30
0    A  NaN
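One caveat: with Copy-on-Write enabled in recent pandas versions, df['col1'] returns a new object, so calling update on it may leave df itself unchanged. A sketch of an alternative that should not depend on that behavior is to call DataFrame.update on df directly, wrapping the dictionary in a one-column frame (the wrapping is an illustrative choice, not part of the original answer).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])

di = {0: "A", 2: "B"}

# Build a frame whose index holds the keys of di and whose only column is col1;
# DataFrame.update aligns on index labels and column names and modifies df in place.
df.update(pd.DataFrame({'col1': pd.Series(di)}))
print(df)
```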

Case 3: If the keys in di refer to index locations, then you could use

df['col1'].put(di.keys(), di.values())

For example:

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
di = {
    0: "A",
    2: "B"
}

# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(di.keys(), di.values())
print(df)
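Series.put comes from older pandas releases and, as far as I can tell, is not available in current ones. A positional assignment through iloc is one way to get the same effect; the sketch below uses the same df and di as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Keys of di are treated as row positions (not index labels)
col_pos = df.columns.get_loc('col1')
df.iloc[list(di.keys()), col_pos] = list(di.values())
print(df)
```

Here the first row (label 1) receives 'A' and the third row (label 0) receives 'B', because iloc counts positions rather than labels.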

DSM has the accepted answer, but the code doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):

import pandas as pd

df = pd.DataFrame({
   'col1': [1, 2, 2, 3, 1],
   'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']
})

conversion_dict = {
   'negative': -1,
   'neutral': 0,
   'positive': 1
}
df['converted_column'] = df['col2'].replace(conversion_dict)

print(df.head())

You'll see it looks like:

   col1      col2  converted_column
0     1  negative                -1
1     2  positive                 1
2     2   neutral                 0
3     3   neutral                 0
4     1  positive                 1

Given that map is faster than replace (@JohnE's solution), you need to be careful with non-exhaustive mappings where you intend to map specific values to NaN. The proper method in this case requires that you mask the Series when you call .fillna; otherwise the fill undoes the mapping to NaN.

import pandas as pd
import numpy as np

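d = {
    'm': 'Male',
    'f': 'Female',
    'missing': np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
}
df = pd.DataFrame({
    'gender': ['m', 'f', 'missing', 'Male', 'U']
})

# Keys that are deliberately mapped to NaN
keep_nan = [k for k, v in d.items() if pd.isnull(v)]
s = df['gender']

# Mask those keys in the fallback Series so fillna cannot restore them
df['mapped'] = s.map(d).fillna(s.mask(s.isin(keep_nan)))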
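To see why the mask matters, the following sketch runs the same data through a plain fillna and through the masked version and compares the entry that was meant to become NaN.

```python
import numpy as np
import pandas as pd

d = {'m': 'Male', 'f': 'Female', 'missing': np.nan}
s = pd.Series(['m', 'f', 'missing', 'Male', 'U'])

# Naive fallback: the intended NaN for 'missing' is filled back in
naive = s.map(d).fillna(s)

# Masked fallback: keys mapped to NaN are blanked out of the fallback first
keep_nan = [k for k, v in d.items() if pd.isnull(v)]
masked = s.map(d).fillna(s.mask(s.isin(keep_nan)))

print(naive.tolist())
print(masked.tolist())
```

In the naive version the third entry comes back as the string 'missing'; in the masked version it stays NaN, while 'Male' and 'U' are retained in both.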

Adding to this question, in case you ever have more than one column to remap in a DataFrame:

def remap(data, dict_labels):
    """
    Take a dictionary of labels, dict_labels, and replace the
    (previously label-encoded) values with their strings.

    ex: dict_labels = {'col1': {1: 'A', 2: 'B'}}
    """
    for field, values in dict_labels.items():
        print("I am remapping %s" % field)
        data.replace({field: values}, inplace=True)
    print("DONE")

    return data
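As a side note, DataFrame.replace also accepts the nested dictionary directly, so several columns can be remapped in one call without a loop. A minimal sketch, with made-up column names and labels for illustration:

```python
import pandas as pd

# Hypothetical label-encoded data
df = pd.DataFrame({
    'col1': [1, 2, 1],
    'col2': [0, 1, 1],
    'col3': [10, 20, 30],
})

# Outer keys are column names, inner dictionaries map old value -> new value
dict_labels = {
    'col1': {1: 'A', 2: 'B'},
    'col2': {0: 'no', 1: 'yes'},
}

# Returns a new DataFrame; columns not named in dict_labels are left alone
remapped = df.replace(dict_labels)
print(remapped)
```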

Suggestion : 5

This tutorial covers how to analyze and transform a Pandas DataFrame using vectorized functions and the .map() and .apply() methods. The .map() method transforms and maps a single Pandas column, while the .apply() method can pass a function to either a single column or an entire DataFrame.

To follow along with this tutorial, copy the code provided below to load a sample Pandas DataFrame. The dataset provides a number of helpful columns, allowing us to manipulate and transform our data in different ways.

# Loading a Sample Pandas DataFrame
import pandas as pd
df = pd.DataFrame({
   'name': ['James', 'Jane', 'Melissa', 'Ed', 'Neil'],
   'age': [30, 40, 32, 67, 43],
   'score': ['90%', '95%', '100%', '82%', '87%'],
   'age_missing_data': [30, 40, 32, 67, None],
   'income': [100000, 80000, 55000, 62000, 120000]
})
print(df)

# Returns:
#       name  age  score  age_missing_data  income
# 0    James   30    90%              30.0  100000
# 1     Jane   40    95%              40.0   80000
# 2  Melissa   32   100%              32.0   55000
# 3       Ed   67    82%              67.0   62000
# 4     Neil   43    87%               NaN  120000

In fact, you’ve likely been using vectorized expressions, perhaps, without even knowing it! When you apply, say, .mean() to a Pandas column, you’re applying a vectorized method. Let’s visualize how we could do this both with a for loop and with a vectorized function.

# Visualizing the Difference Between Vectorization and Scalar Operations
# Scalar Operations (Simplified using a for loop)
length = 0
age_sum = 0
for item in df['age']:
    length += 1
    age_sum += item

average_age_for_loop = age_sum / length

# Vectorized Implementation
average_age_vectorized = df['age'].mean()

For example, we could map in the gender of each person in our DataFrame by using the .map() method. Let's define a dictionary whose keys are the people's names and whose values are their corresponding genders.

# Creating a dictionary of genders
genders = {
   'James': 'Male',
   'Jane': 'Female',
   'Melissa': 'Female',
   'Ed': 'Male',
   'Neil': 'Male'
}
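The dictionary is defined above, but the step that applies it is not shown. A minimal sketch of that step, using only the name column of the sample DataFrame; the new column name gender is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'name': ['James', 'Jane', 'Melissa', 'Ed', 'Neil']})

genders = {
    'James': 'Male',
    'Jane': 'Female',
    'Melissa': 'Female',
    'Ed': 'Male',
    'Neil': 'Male'
}

# Each name is looked up in the dictionary; names without a key would become NaN
df['gender'] = df['name'].map(genders)
print(df)
```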

Let’s design a function that evaluates whether each person’s income is higher or lower than the average income. We’ll then apply that function using the .map() method:

# Mapping in a custom function
mean_income = df['income'].mean()

def higher_income(x):
    return x > mean_income

df['higher_than_avg_income'] = df['income'].map(higher_income)
print(df)

# Returns:
#       name  age  score  age_missing_data  income  higher_than_avg_income
# 0    James   30    90%              30.0  100000                    True
# 1     Jane   40    95%              40.0   80000                   False
# 2  Melissa   32   100%              32.0   55000                   False
# 3       Ed   67    82%              67.0   62000                   False
# 4     Neil   43    87%               NaN  120000                    True

Python allows us to define anonymous functions, lambda functions, which are functions that are defined without a name. This can be helpful when we need to use a function only a single time and want to simplify the use of the function. Let’s see how we can replicate the example above with the use of a lambda function:

# Mapping in an Anonymous Function
mean_income = df['income'].mean()
df['higher_than_avg_income'] = df['income'].map(lambda x: x > mean_income)
print(df)

# Returns:
#       name  age  score  age_missing_data  income  higher_than_avg_income
# 0    James   30    90%              30.0  100000                    True
# 1     Jane   40    95%              40.0   80000                   False
# 2  Melissa   32   100%              32.0   55000                   False
# 3       Ed   67    82%              67.0   62000                   False
# 4     Neil   43    87%               NaN  120000                    True

# Stripping the '%' sign and converting each score to an integer
df['percent'] = df['score'].map(lambda x: int(x.replace('%', '')))
print(df)

# Returns:
#       name  age  score  age_missing_data  income  percent
# 0    James   30    90%              30.0  100000       90
# 1     Jane   40    95%              40.0   80000       95
# 2  Melissa   32   100%              32.0   55000      100
# 3       Ed   67    82%              67.0   62000       82
# 4     Neil   43    87%               NaN  120000       87

# Vectorized alternative: each income as a fraction of the total income
total_income = df['income'].sum()
df['perc_of_total'] = df['income'] / total_income

print(df)

# Returns:
#       name  age  score  age_missing_data  income  perc_of_total
# 0    James   30    90%              30.0  100000       0.239808
# 1     Jane   40    95%              40.0   80000       0.191847
# 2  Melissa   32   100%              32.0   55000       0.131894
# 3       Ed   67    82%              67.0   62000       0.148681
# 4     Neil   43    87%               NaN  120000       0.287770
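The tutorial also mentions that .apply() can pass a function to either a single column or an entire DataFrame, which none of the examples above show. Here is a brief sketch on two columns of the sample data; the specific functions (age in months, range per column, income per year of age) are arbitrary illustrations.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [30, 40, 32, 67, 43],
    'income': [100000, 80000, 55000, 62000, 120000]
})

# On a single column, the function receives one value at a time
age_in_months = df['age'].apply(lambda x: x * 12)

# On a DataFrame (default axis=0), the function receives each column as a Series
column_ranges = df.apply(lambda col: col.max() - col.min())

# With axis=1, the function receives each row as a Series
income_per_year_of_age = df.apply(lambda row: row['income'] / row['age'], axis=1)

print(age_in_months)
print(column_ranges)
print(income_per_year_of_age)
```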

Suggestion : 6

The pandas map() function from Series is used to substitute each value in a Series with another value, which may be derived from a function, a dict, or a Series. Since DataFrame columns are Series, you can use map() to update a column and assign it back to the DataFrame. When passed a dictionary or Series, map() maps elements based on the keys of that dictionary or the index of that Series; values with no match are recorded as NaN in the output.

The following is the syntax of the pandas map() function. It accepts arg and na_action as parameters and returns a Series.

# Syntax of Series.map()
Series.map(arg, na_action=None)

# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Fee': [22000, 25000, 23000, np.nan, 26000],
    'Duration': ['30days', '50days', '30days', '35days', '40days']
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.

       Fee Duration
0  22000.0   30days
1  25000.0   50days
2  23000.0   30days
3      NaN   35days
4  26000.0   40days

       Fee Duration
0  19800.0   30days
1  22500.0   50days
2  20700.0   30days
3      NaN   35days
4  23400.0   40days
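The code that produced the second table is not included here. Its values are consistent with reducing each Fee by 10%, so the sketch below assumes that transformation; the lambda is a guess that reproduces the printed numbers, not necessarily the article's own code.

```python
import numpy as np
import pandas as pd

technologies = {
    'Fee': [22000, 25000, 23000, np.nan, 26000],
    'Duration': ['30days', '50days', '30days', '35days', '40days']
}
df = pd.DataFrame(technologies)

# Assumption: subtract 10% from every fee; NaN stays NaN under arithmetic
df['Fee'] = df['Fee'].map(lambda x: x - x * 10 / 100)
print(df)
```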

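You can also wrap a named function in a lambda, as below. Note that fun1 as written divides each fee by 100, so its result is not the same as the table above.

# Using a custom function
def fun1(x):
    return x / 100

df['Fee'] = df['Fee'].map(lambda x: fun1(x))  # equivalent to df['Fee'].map(fun1)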

Suggestion : 7

In this tutorial, we'll learn how to map a column with a dictionary in a Pandas DataFrame. We are going to use the Pandas method pandas.Series.map; an alternative solution is the function pandas.Series.replace. Together they give several options to map, replace, update, and add new columns based on a dictionary.

In the post, we'll use the following DataFrame, which consists of several rows and columns:

import pandas as pd
import numpy as np

data = {
   'Member': {
      0: 'John',
      1: 'Bill',
      2: 'Jim',
      3: 'Steve'
   },
   'Disqualified': {
      0: 0,
      1: 1,
      2: 0,
      3: 1
   },
   'Paid': {
      0: 1,
      1: 0,
      2: 0,
      3: np.nan
   }
}

df = pd.DataFrame(data)

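We are going to map the column Disqualified to true/false labels: 1 will be mapped to 'True' and 0 will be mapped to 'False'. Note that the dictionary below uses the strings 'True' and 'False', not Python booleans, which is why the result has dtype object: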

dict_map = {
   1: 'True',
   0: 'False'
}
df['Disqualified'].map(dict_map)

The result is a new Pandas Series with the mapped values:

0    False
1     True
2    False
3     True
Name: Disqualified, dtype: object

To store the mapped values in a new column instead of overwriting the existing one, assign the result to a different column name:

df['Disqualified Boolean'] = df['Disqualified'].map(dict_map)

What happens if a value is not present in the mapping dictionary? In that case the result contains a missing value (NaN) at that position:

df['Paid'].map(dict_map)
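The output of this last call is not shown, so here is a self-contained sketch that runs it on the same data. It also includes a variant with real booleans as dictionary values, which is an addition for comparison rather than part of the tutorial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Member': ['John', 'Bill', 'Jim', 'Steve'],
    'Disqualified': [0, 1, 0, 1],
    'Paid': [1, 0, 0, np.nan],
})

dict_map = {1: 'True', 0: 'False'}

# Paid contains NaN, which is not a key in dict_map, so that row stays missing
paid_mapped = df['Paid'].map(dict_map)
print(paid_mapped)

# Variant (assumption, not from the tutorial): map to real booleans instead of strings
disqualified_bool = df['Disqualified'].map({1: True, 0: False})
print(disqualified_bool)
```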