This recipe helps you map values in a Pandas DataFrame. Last Updated: 23 Apr 2022
We sometimes need to map values in Python, i.e. map the values of one feature to the values of another feature.
First, we make a dictionary whose keys are values of the feature first_name and whose values are the entries for the new feature Subjects.
Subjects = {"Sheldon" : "Science", "Raj" : "Chemistry", "Leonard" : "Maths", "Howard" : "Astronaut", "Amy" : "Science"}
print(Subjects)
Now we use the map function to look up each value of first_name in the dictionary and store the result in a new column.
df["Subjects"] = df["first_name"].map(Subjects)
print(df)
The output shows the original DataFrame, then the dictionary, then the DataFrame with the new column:
first_name last_name age Comedy_Score Rating_Score
0 Sheldon Copper 42 9 25
1 Raj Koothrappali 38 7 25
2 Leonard Hofstadter 36 8 49
3 Howard Wolowitz 41 8 62
4 Amy Fowler 35 5 70
{
"Sheldon": "Science",
"Raj": "Chemistry",
"Leonard": "Maths",
"Howard": "Astronaut",
"Amy": "Science"
}
first_name last_name age Comedy_Score Rating_Score Subjects
0 Sheldon Copper 42 9 25 Science
1 Raj Koothrappali 38 7 25 Chemistry
2 Leonard Hofstadter 36 8 49 Maths
3 Howard Wolowitz 41 8 62 Astronaut
4 Amy Fowler 35 5 70 Science
Last Updated : 08 Jan, 2019
First_name Last_name Age City Qualification
0 Ram Kumar 42 Mumbai B.Com
1 Mohan Sharma 52 Noida IAS
2 Tina Ali 36 Pune LLB
3 Jeetu Gandhi 21 Delhi B.Tech
4 Meera Kumari 23 Bihar MBBS
First_name Last_name Age City
0 Ram Kumar 42 Mumbai
1 Mohan Sharma 52 Noida
2 Tina Ali 36 Pune
3 Jeetu Gandhi 21 Delhi
4 Meera Kumari 23 Bihar

First_name Last_name Age City
0 Shyam Kumar 42 Mumbai
1 Mohan Sharma 52 Noida
2 Riya Ali 36 Pune
3 Jitender Gandhi 21 Delhi
4 Meera Kumari 23 Bihar
Map values of Series according to an input mapping or function. Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.
map accepts a dict or a Series. Values that are not found in the dict are converted to NaN, unless the dict has a default value (e.g. defaultdict).
When arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN. However, if the dictionary is a dict subclass that defines __missing__ (i.e. provides a method for default values), then this default is used rather than NaN.
>>> s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0 cat
1 dog
2 NaN
3 rabbit
dtype: object
>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0 kitten
1 puppy
2 NaN
3 NaN
dtype: object
>>> s.map('I am a {}'.format)
0 I am a cat
1 I am a dog
2 I am a nan
3 I am a rabbit
dtype: object
>>> s.map('I am a {}'.format, na_action = 'ignore')
0 I am a cat
1 I am a dog
2 NaN
3 I am a rabbit
dtype: object
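The default-value behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the Series contents and the 'unknown' fallback string are assumptions chosen for the demo.

```python
from collections import defaultdict

import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit'])

# A plain dict: 'rabbit' has no key, so it becomes NaN.
plain = {'cat': 'kitten', 'dog': 'puppy'}
plain_result = s.map(plain)

# A defaultdict defines __missing__, so unmatched values get the default
# ('unknown' is a hypothetical fallback chosen for this sketch).
with_default = defaultdict(lambda: 'unknown', plain)
default_result = s.map(with_default)

print(plain_result)
print(default_result)
```

With the plain dict the last entry is NaN; with the defaultdict it is the fallback string instead.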
You can use .replace. For example:
>>> df = pd.DataFrame({'col1': {0: 'w', 1: 1, 2: 2}, 'col2': {0: 'a', 1: 2, 2: np.nan}})
>>> di = {1: "A", 2: "B"}
>>> df
col1 col2
0 w a
1 1 2
2 2 NaN
>>> df.replace({"col1": di})
col1 col2
0 w a
1 A 2
2 B NaN
When the dictionary maps every value in the column (an exhaustive mapping), the form is very simple:
df['col1'].map(di)  # note: if the dictionary does not exhaustively map all
                    # entries, then non-matched entries are changed to NaNs
If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:
df['col1'].map(di).fillna(df['col1'])
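To make the difference concrete, here is a small sketch of map with and without fillna on a non-exhaustive dictionary. The three-row column is an assumption chosen to match the earlier example data.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['w', 1, 2]})
di = {1: "A", 2: "B"}  # 'w' is deliberately left out of the mapping

# Without fillna, the unmatched 'w' becomes NaN.
mapped_only = df['col1'].map(di)

# With fillna, unmatched entries fall back to their original values.
mapped_filled = df['col1'].map(di).fillna(df['col1'])

print(mapped_only)
print(mapped_filled)
```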
Using the following data with pandas version 0.23.1:
di = {
1: "A",
2: "B",
3: "C",
4: "D",
5: "E",
6: "F",
7: "G",
8: "H"
}
df = pd.DataFrame({
'col1': np.random.choice(range(1, 9), 100000)
})
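The timings that went with this data are not reproduced here. As a rough sketch, assuming you want to compare map and replace on data of this shape yourself, something like the following could be used. The seed, the use of timeit and the repeat count are choices made for this illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame({'col1': rng.integers(1, 9, 100000)})

# Both approaches should produce the same values for an exhaustive mapping.
mapped = df['col1'].map(di)
replaced = df['col1'].replace(di)
same_values = mapped.tolist() == replaced.tolist()

# Time a few repetitions of each call.
t_map = timeit.timeit(lambda: df['col1'].map(di), number=5)
t_replace = timeit.timeit(lambda: df['col1'].replace(di), number=5)

print("same values:", same_values)
print("map:     %.4f s" % t_map)
print("replace: %.4f s" % t_replace)
```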
Case 1:
If the keys of di are meant to refer to index values, then you could use the update method:
df['col1'].update(pd.Series(di))
For example,
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
#   col1 col2
# 1    w    a
# 2   10   30
# 0   20  NaN

di = {0: "A", 2: "B"}

# The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
# Note: written for pandas 0.23; with Copy-on-Write (the default in pandas 3.0),
# this chained update may not modify df.
df['col1'].update(pd.Series(di))
print(df)
yields
col1 col2
1 w a
2 B 30
0 A NaN
Case 3:
If the keys in di refer to index locations, then you could use
df['col1'].put(di.keys(), di.values())
(Series.put was deprecated in pandas 0.25 and removed in 1.0, so this form only runs on older versions.) For example,
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
di = {0: "A", 2: "B"}

# The values at the 0 and 2 index locations are replaced by 'A' and 'B'
df['col1'].put(di.keys(), di.values())
print(df)
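On pandas versions where Series.put is not available, positional assignment can be sketched with iloc. This is an assumed substitute for the call above, not the original answer's code; it uses the same data and treats the dictionary keys as row positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['w', 10, 20],
    'col2': ['a', 30, np.nan]
}, index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Row positions come from the dict keys; the column position is looked up by name.
rows = list(di.keys())
col = df.columns.get_loc('col1')
df.iloc[rows, col] = list(di.values())

print(df)
```

Because the keys are positions rather than labels, the first and third rows change (labels 1 and 0), which is the opposite of what the label-based update in Case 1 does.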
DSM has the accepted answer, but the code doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):
import pandas as pd
df = pd.DataFrame({
'col1': [1, 2, 2, 3, 1],
'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']
})
conversion_dict = {
'negative': -1,
'neutral': 0,
'positive': 1
}
df['converted_column'] = df['col2'].replace(conversion_dict)
print(df.head())
You'll see it looks like:
   col1      col2  converted_column
0     1  negative                -1
1     2  positive                 1
2     2   neutral                 0
3     3   neutral                 0
4     1  positive                 1
Given that map is faster than replace (@JohnE's solution), you need to be careful with non-exhaustive mappings where you intend to map specific values to NaN. The proper method in this case requires that you mask the Series when you .fillna, else you undo the mapping to NaN.
import pandas as pd
import numpy as np
d = {
'm': 'Male',
'f': 'Female',
    'missing': np.nan
}
df = pd.DataFrame({
'gender': ['m', 'f', 'missing', 'Male', 'U']
})
keep_nan = [k for k, v in d.items() if pd.isnull(v)]
s = df['gender']
df['mapped'] = s.map(d).fillna(s.mask(s.isin(keep_nan)))
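To see why the mask matters, the sketch below runs the same data through a plain fillna and through the masked version. The variable names naive and masked are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

d = {'m': 'Male', 'f': 'Female', 'missing': np.nan}
s = pd.Series(['m', 'f', 'missing', 'Male', 'U'])

# Keys that are intentionally mapped to NaN.
keep_nan = [k for k, v in d.items() if pd.isnull(v)]

# Plain fillna puts 'missing' back, undoing the intended NaN.
naive = s.map(d).fillna(s)

# Masking those keys first keeps the NaN while still restoring other non-matches.
masked = s.map(d).fillna(s.mask(s.isin(keep_nan)))

print(naive.tolist())
print(masked.tolist())
```

In the naive result the third entry is the string 'missing' again; in the masked result it stays NaN, while 'Male' and 'U' are retained in both.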
Adding to this question, if you ever have more than one column to remap in a dataframe:
def remap(data, dict_labels):
    """
    This function takes in a dictionary of labels, dict_labels,
    and replaces the values (previously label-encoded) with strings.

    ex: dict_labels = {'col1': {1: 'A', 2: 'B'}}
    """
    for field, values in dict_labels.items():
        print("I am remapping %s" % field)
        data.replace({field: values}, inplace=True)
    print("DONE")
    return data
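Since replace accepts a nested dictionary keyed by column name, the same multi-column remapping can also be sketched as a single call. The column names and labels below are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 1],
    'col2': [0, 1, 0],
})

# Outer keys are column names, inner dicts map old values to new ones
# (hypothetical labels chosen for this sketch).
dict_labels = {
    'col1': {1: 'A', 2: 'B'},
    'col2': {0: 'no', 1: 'yes'},
}

remapped = df.replace(dict_labels)
print(remapped)
```

This returns a new DataFrame and leaves df untouched, whereas the loop above modifies data in place.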
This tutorial covers how to analyze and transform a Pandas DataFrame using vectorized functions and the .map() and .apply() methods. The following sections dive deeper into each of these scenarios to see how the .map() method can be used to transform and map a Pandas column. The Pandas .apply() method can pass a function to either a single column or an entire DataFrame.
To follow along with this tutorial, copy the code provided below to load a sample Pandas DataFrame. The dataset provides a number of helpful columns, allowing us to manipulate and transform our data in different ways.
# Loading a Sample Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'name': ['James', 'Jane', 'Melissa', 'Ed', 'Neil'],
    'age': [30, 40, 32, 67, 43],
    'score': ['90%', '95%', '100%', '82%', '87%'],
    'age_missing_data': [30, 40, 32, 67, None],
    'income': [100000, 80000, 55000, 62000, 120000]
})

print(df)

# Returns:
#       name  age score  age_missing_data  income
# 0    James   30   90%              30.0  100000
# 1     Jane   40   95%              40.0   80000
# 2  Melissa   32  100%              32.0   55000
# 3       Ed   67   82%              67.0   62000
# 4     Neil   43   87%               NaN  120000
In fact, you’ve likely been using vectorized expressions, perhaps without even knowing it! When you apply, say, .mean() to a Pandas column, you’re applying a vectorized method. Let’s visualize how we could do this both with a for loop and with a vectorized function.
# Visualizing the Difference Between Vectorization and Scalar Operations
# Scalar Operations (Simplified using a for loop)
length = 0
age_sum = 0
for item in df['age']:
    length += 1
    age_sum += item
average_age_for_loop = age_sum / length

# Vectorized Implementation
average_age_vectorized = df['age'].mean()
For example, we could map in the gender of each person in our DataFrame by using the .map() method. Let’s define a dictionary where the keys are the people and their corresponding genders are the keys’ values.
# Creating a dictionary of genders
genders = {
    'James': 'Male',
    'Jane': 'Female',
    'Melissa': 'Female',
    'Ed': 'Male',
    'Neil': 'Male'
}
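The step that applies this dictionary is not shown above. A minimal sketch, assuming the new column is called gender (a name chosen here) and using only the name column of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['James', 'Jane', 'Melissa', 'Ed', 'Neil']})

genders = {
    'James': 'Male',
    'Jane': 'Female',
    'Melissa': 'Female',
    'Ed': 'Male',
    'Neil': 'Male'
}

# Look up each name in the dictionary and store the result in a new column.
df['gender'] = df['name'].map(genders)
print(df)
```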
Let’s design a function that evaluates whether each person’s income is higher or lower than the average income. We’ll then apply that function using the .map() method:
# Mapping in a custom function
mean_income = df['income'].mean()

def higher_income(x):
    return x > mean_income

df['higher_than_avg_income'] = df['income'].map(higher_income)
print(df)

# Returns:
#       name  age score  age_missing_data  income  higher_than_avg_income
# 0    James   30   90%              30.0  100000                    True
# 1     Jane   40   95%              40.0   80000                   False
# 2  Melissa   32  100%              32.0   55000                   False
# 3       Ed   67   82%              67.0   62000                   False
# 4     Neil   43   87%               NaN  120000                    True
Python allows us to define anonymous functions, lambda functions, which are functions that are defined without a name. This can be helpful when we need to use a function only a single time and want to simplify the use of the function. Let’s see how we can replicate the example above with the use of a lambda function:
# Mapping in an Anonymous Function
mean_income = df['income'].mean()
df['higher_than_avg_income'] = df['income'].map(lambda x: x > mean_income)
print(df)

# Returns:
#       name  age score  age_missing_data  income  higher_than_avg_income
# 0    James   30   90%              30.0  100000                    True
# 1     Jane   40   95%              40.0   80000                   False
# 2  Melissa   32  100%              32.0   55000                   False
# 3       Ed   67   82%              67.0   62000                   False
# 4     Neil   43   87%               NaN  120000                    True
df['percent'] = df['score'].map(lambda x: int(x.replace('%', '')))
print(df)

# Returns:
#       name  age score  age_missing_data  income  percent
# 0    James   30   90%              30.0  100000       90
# 1     Jane   40   95%              40.0   80000       95
# 2  Melissa   32  100%              32.0   55000      100
# 3       Ed   67   82%              67.0   62000       82
# 4     Neil   43   87%               NaN  120000       87
total_income = df['income'].sum()
df['perc_of_total'] = df['income'] / total_income
print(df)

#       name  age score  age_missing_data  income  perc_of_total
# 0    James   30   90%              30.0  100000       0.239808
# 1     Jane   40   95%              40.0   80000       0.191847
# 2  Melissa   32  100%              32.0   55000       0.131894
# 3       Ed   67   82%              67.0   62000       0.148681
# 4     Neil   43   87%               NaN  120000       0.287770
The pandas map() function from Series is used to substitute each value in a Series with another value, which may be derived from a function, a dict or a Series. It returns a Series object. Since DataFrame columns are Series, you can use map() to update a column and assign it back to the DataFrame. When passed a dictionary or Series, map() will map elements based on the keys in that dictionary or Series. Missing values will be recorded as NaN in the output.
The following is the syntax of the pandas map() function. This accepts arg and na_action as parameters and returns a Series.
# Syntax of Series.map()
Series.map(arg, na_action=None)
# Create a pandas DataFrame.
import pandas as pd
import numpy as np

technologies = {
    'Fee': [22000, 25000, 23000, np.nan, 26000],
    'Duration': ['30days', '50days', '30days', '35days', '40days']
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
Fee Duration
0 22000.0 30days
1 25000.0 50days
2 23000.0 30days
3 NaN 35days
4 26000.0 40days
Fee Duration
0 19800.0 30days
1 22500.0 50days
2 20700.0 30days
3 NaN 35days
4 23400.0 40days
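The code that produced this second table is not shown here. The numbers are consistent with a 10% reduction of each Fee value, so, purely as an assumption, a map call like the following would give the same result:

```python
import numpy as np
import pandas as pd

technologies = {
    'Fee': [22000, 25000, 23000, np.nan, 26000],
    'Duration': ['30days', '50days', '30days', '35days', '40days']
}
df = pd.DataFrame(technologies)

# Assumed transformation: subtract 10% from each fee.
# NaN passes through the arithmetic unchanged.
df['Fee'] = df['Fee'].map(lambda x: x - x * 10 / 100)
print(df)
```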
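You can also wrap a named function in a lambda, as below. Note that fun1 as written divides Fee by 100, so its result differs from the tables above. Passing fun1 directly, as in df['Fee'].map(fun1), is equivalent.
# Using custom function
def fun1(x):
    return x / 100
df['Fee'] = df['Fee'].map(lambda x: fun1(x))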
In this tutorial, we'll learn how to map a column with a dictionary in a Pandas DataFrame. We are going to use the Pandas method pandas.Series.map. An alternative solution to map a column to a dict is the function pandas.Series.replace. The tutorial covers several options to map, replace, update and add new columns based on a dictionary in Pandas.
In the post, we'll use the following DataFrame, which consists of several rows and columns:
import pandas as pd
import numpy as np
data = {
'Member': {
0: 'John',
1: 'Bill',
2: 'Jim',
3: 'Steve'
},
'Disqualified': {
0: 0,
1: 1,
2: 0,
3: 1
},
'Paid': {
0: 1,
1: 0,
2: 0,
3: np.nan
}
}
df = pd.DataFrame(data)
We are going to map the column Disqualified to True/False labels: 1 will be mapped to 'True' and 0 will be mapped to 'False' (note that the dictionary below uses strings, not Python booleans):
dict_map = {
1: 'True',
0: 'False'
}
df['Disqualified'].map(dict_map)
The result is a new Pandas Series with the mapped values:
0    False
1     True
2    False
3     True
Name: Disqualified, dtype: object
To map the dictionary from an existing column into a new column, assign the result to a new column name:
df['Disqualified Boolean'] = df['Disqualified'].map(dict_map)
What will happen if a value is not present in the mapping dictionary? In this case we will end up with an NA value:
df['Paid'].map(dict_map)
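A self-contained sketch of that last call is below. It rebuilds only the Paid column from the sample data; the last row is NaN in the input, has no key in dict_map, and so stays missing after the mapping.

```python
import numpy as np
import pandas as pd

# The Paid column from the sample DataFrame above.
paid = pd.Series([1, 0, 0, np.nan], name='Paid')

dict_map = {1: 'True', 0: 'False'}

# Values without a matching key in dict_map come back as NaN.
mapped = paid.map(dict_map)
print(mapped)
```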