Create dummy variables for multiple columns with Python

In this section, we are going to use Pandas get_dummies() to create dummy variables in Python. First, we are going to work with the categorical variable “sex”. That is, we will start by dummy coding a categorical variable with two levels (i.e., male and female).

To create dummy variables in Python with Pandas, we can use this code template:

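```python
import pandas as pd

# Creating dummy variables ('ColumnToDummyCode' is a placeholder for the
# name of the categorical column you want to dummy code):
df_dc = pd.get_dummies(df, columns=['ColumnToDummyCode'])
```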
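As a minimal sketch of this template, here it is applied to a small made-up DataFrame (the column names `color` and `weight` and their values are hypothetical). Since pandas 2.0, get_dummies returns True/False columns by default, so the sketch passes `dtype=int` to get 0/1 dummies.

```python
import pandas as pd

# Hypothetical example data: one categorical and one numeric column
df = pd.DataFrame({
    'color': ['red', 'blue', 'red'],
    'weight': [1, 2, 3],
})

# Dummy code 'color'; dtype=int gives 0/1 instead of True/False
df_dc = pd.get_dummies(df, columns=['color'], dtype=int)
print(df_dc.columns.tolist())  # ['weight', 'color_blue', 'color_red']
```

Columns not listed in `columns` are kept as they are, and the dummy columns are appended after them.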

Before we start using the Pandas get_dummies() method, we need to import pandas and load the data.

```python
import pandas as pd

data_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/Salaries.csv'
df = pd.read_csv(data_url, index_col=0)

df.head()
```
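Once the data is loaded, it can help to check which columns are categorical before dummy coding. The sketch below uses a small invented stand-in with the same column names as the Salaries data (the rows are made up, not taken from the dataset, so it runs without a download). `select_dtypes(exclude='number')` lists the non-numeric columns, and get_dummies without a `columns` argument encodes every object, string, or category column.

```python
import pandas as pd

# Invented stand-in rows using the Salaries column names (not real data)
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'AssocProf', 'Prof'],
    'discipline': ['B', 'A', 'B', 'A'],
    'sex': ['Male', 'Female', 'Male', 'Male'],
    'salary': [139750, 79750, 101000, 115000],
})

# Non-numeric columns are the candidates for dummy coding
categorical = df.select_dtypes(exclude='number').columns.tolist()
print(categorical)  # ['rank', 'discipline', 'sex']

# Without `columns`, get_dummies encodes all categorical columns at once
df_all = pd.get_dummies(df, dtype=int)
print(df_all.columns.tolist())
```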

In the first Python dummy coding example below, we use Pandas get_dummies to make dummy variables. Note that we pass a Series as data and, thus, get two new columns named Female and Male.

```python
# Pandas get_dummies on one column:
pd.get_dummies(df['sex']).head()
```
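A self-contained sketch on small made-up data (hypothetical rows, not the Salaries data) contrasts the two ways of calling get_dummies: a Series yields columns named after the levels only, while passing the DataFrame together with `columns=['sex']` adds the column name and an underscore as a prefix.

```python
import pandas as pd

# Hypothetical stand-in data (not the Salaries rows)
df = pd.DataFrame({'salary': [139750, 79750, 101000],
                   'sex': ['Male', 'Female', 'Male']})

# A Series: the new columns are named after the levels only
from_series = pd.get_dummies(df['sex'], dtype=int)
print(from_series.columns.tolist())  # ['Female', 'Male']

# The DataFrame plus columns=[...]: default prefix 'sex' and separator '_'
from_frame = pd.get_dummies(df, columns=['sex'], dtype=int)
print(from_frame.columns.tolist())  # ['salary', 'sex_Female', 'sex_Male']
```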

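In the output (shown with Pandas head()), the new columns are named after the levels only. If we instead pass the whole DataFrame and name the column in the columns argument, Pandas get_dummies automatically uses the column name, “sex”, as the prefix and an underscore as the prefix separator (e.g., sex_Female and sex_Male). If we want to change the prefix as well as the prefix separator, we can pass the prefix and prefix_sep arguments to Pandas get_dummies():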

```python
# Changing the prefix for the dummy variables:
df_dummies = pd.get_dummies(df, prefix='Gender', prefix_sep='.',
                            columns=['sex'])
df_dummies.head()
```
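A minimal sketch of the same call on hypothetical data shows the resulting names. Because the new column names contain a dot, they cannot be used with attribute access (`df_dummies.Gender.Female` would not work), so the sketch selects them with brackets.

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({'salary': [139750, 79750, 101000],
                   'sex': ['Male', 'Female', 'Male']})

df_dummies = pd.get_dummies(df, prefix='Gender', prefix_sep='.',
                            columns=['sex'], dtype=int)
print(df_dummies.columns.tolist())  # ['salary', 'Gender.Female', 'Gender.Male']

# Dotted names need bracket access
female = df_dummies['Gender.Female']
```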

In the next Pandas dummies example, we again make dummy variables, but this time we set the prefix and prefix_sep arguments to empty strings so that the column names will be the factor levels (categories) only:

```python
# Remove the prefix and separator when dummy coding:
df_dummies = pd.get_dummies(df, prefix='', prefix_sep='',
                            columns=['sex'])
df_dummies.head()
```
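A short sketch on hypothetical data shows the unprefixed names, and also one caveat worth knowing: without a prefix, two columns that share a level (here an invented survey where "Yes" appears in both `q1` and `q2`) produce duplicate column names.

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({'salary': [139750, 79750, 101000],
                   'sex': ['Male', 'Female', 'Male']})
no_prefix = pd.get_dummies(df, prefix='', prefix_sep='',
                           columns=['sex'], dtype=int)
print(no_prefix.columns.tolist())  # ['salary', 'Female', 'Male']

# Caveat: shared levels across columns give duplicate names without a prefix
survey = pd.DataFrame({'q1': ['Yes', 'No'], 'q2': ['Yes', 'Yes']})  # hypothetical
clash = pd.get_dummies(survey, prefix='', prefix_sep='', dtype=int)
print(clash.columns.tolist())  # ['No', 'Yes', 'Yes']
```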

Thank you for your kind comment. I am glad you liked the article and the notebook with the get_dummies() code examples. If I understand your question correctly, you can pass a list of unique prefixes to the prefix parameter, one per column listed in columns and in the same order. For example, in the last example (in the Notebook), you can do this:

```python
df_dummies = pd.get_dummies(df, prefix=['p1', 'p2', 'p3'], prefix_sep='',
                            columns=['rank', 'sex', 'discipline'])
```
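A minimal sketch with invented rows (Salaries column names, made-up values) shows the names this produces. The prefix parameter also accepts a dict keyed by column name, which avoids depending on the order of `columns`; both forms give the same result here.

```python
import pandas as pd

# Invented rows using the Salaries column names (not the real data)
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'AssocProf'],
    'sex': ['Male', 'Female', 'Male'],
    'discipline': ['B', 'A', 'B'],
})
cols = ['rank', 'sex', 'discipline']

# A list: prefixes are matched to `columns` by position
by_list = pd.get_dummies(df, prefix=['p1', 'p2', 'p3'], prefix_sep='',
                         columns=cols, dtype=int)

# A dict: prefixes are matched to columns by name
by_dict = pd.get_dummies(df, prefix={'rank': 'p1', 'sex': 'p2', 'discipline': 'p3'},
                         prefix_sep='', columns=cols, dtype=int)

print(by_list.columns.tolist())
# ['p1AssocProf', 'p1AsstProf', 'p1Prof', 'p2Female', 'p2Male', 'p3A', 'p3B']
```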

Suggestion : 2

If you need indicators (0/1) in the output, use max; if you need counts, use sum. Apply it after calling get_dummies with an empty prefix and prefix separator, and after casting the values to strings:

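```python
import pandas as pd

# df: the input DataFrame of category codes.
# DataFrame.max(level=...) was removed in pandas 2.0, so the duplicate
# columns are grouped on the transposed frame instead.
df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).max().T
# count alternative
# df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).sum().T
print(df)
```

```
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
```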
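A self-contained sketch of the max/sum difference, assuming a hypothetical input with two code columns `a` and `b` (the last row repeats the value 2 so that the two aggregations differ):

```python
import pandas as pd

# Hypothetical input: each row holds several category codes
df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [2, 3, 4, 2]})

# One dummy column per value; columns sharing a value get the same name
wide = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)

# Collapse the duplicate names: max -> indicator, sum -> count
indicators = wide.T.groupby(level=0).max().T
counts = wide.T.groupby(level=0).sum().T
print(indicators)
print(counts)
```

In the last row, `indicators` has a 1 under column `2`, while `counts` has a 2, because the value appears in both `a` and `b`.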

There is more than one way to skin a cat; here's how I'd do it, with an additional groupby:

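```python
import pandas as pd

# groupby(..., axis=1) is deprecated since pandas 2.1, so group the transposed frame
# pd.get_dummies(df.astype(str), dtype=int).T.groupby(lambda x: x.split('_')[1]).sum().T
pd.get_dummies(df.astype(str), dtype=int).T.groupby(lambda x: x.split('_')[1]).max().T
```

```
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
```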
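One caveat with `x.split('_')[1]`: the default prefix is the column name, so a column name that itself contains an underscore breaks the split. A sketch with hypothetical column names `code_a` and `code_b` shows the problem and a variant that splits on the last underscore instead (this assumes the values themselves contain no underscores):

```python
import pandas as pd

# Hypothetical input whose column names contain underscores
df = pd.DataFrame({'code_a': [1, 2, 3], 'code_b': [2, 3, 4]})

wide = pd.get_dummies(df.astype(str), dtype=int)  # names like 'code_a_1'

# split('_')[1] picks 'a'/'b' here, not the value
naive = wide.T.groupby(lambda name: name.split('_')[1]).max().T

# rsplit on the last '_' keeps only the value part of each dummy name
result = wide.T.groupby(lambda name: name.rsplit('_', 1)[1]).max().T
print(result)
```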

Another option is stacking, if you like conciseness:

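```python
import pandas as pd

# Series.max(level=...) was removed in pandas 2.0; use groupby(level=0) instead
# pd.get_dummies(df.stack(), dtype=int).groupby(level=0).sum()
pd.get_dummies(df.stack(), dtype=int).groupby(level=0).max()
```

```
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
```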
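A self-contained sketch of the stacking approach on a hypothetical input: `stack()` turns the frame into a Series indexed by (row, column), get_dummies encodes its values, and grouping on level 0 of the index folds the result back to one row per original row. Since the values are not cast to strings here, the resulting column labels are integers.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})  # hypothetical input

# Series indexed by (row label, column name), one value per cell
long_dummies = pd.get_dummies(df.stack(), dtype=int)

# Level 0 of the index is the original row label
indicators = long_dummies.groupby(level=0).max()
print(indicators)
```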

Suggestion : 3

This tutorial demonstrates how to convert a pandas DataFrame column to a 1/0 dummy matrix in Python.

```python
import pandas as pd  # Import pandas library in Python

data = pd.DataFrame({  # Create example DataFrame
    'x1': range(10, 15),
    'x2': ['x', 'y', 'y', 'x', 'y'],
})
print(data)  # Print example DataFrame

# Convert column to dummies (dtype=int gives 1/0; pandas >= 2.0 defaults to True/False)
data_dummies = pd.get_dummies(data['x2'], dtype=int)
print(data_dummies)  # Print dummies

data_all = pd.concat([data, data_dummies], axis=1)  # Combine original data & dummies
print(data_all)  # Print all data
```
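For comparison, a minimal sketch of a one-call alternative on the same example data: passing the whole DataFrame with `columns=['x2']` replaces `x2` with prefixed dummy columns, whereas the concat approach keeps `x2` next to unprefixed ones.

```python
import pandas as pd

data = pd.DataFrame({'x1': range(10, 15),
                     'x2': ['x', 'y', 'y', 'x', 'y']})

# Keeps x2 and adds unprefixed dummies x and y (the concat approach)
data_all = pd.concat([data, pd.get_dummies(data['x2'], dtype=int)], axis=1)

# Replaces x2 with prefixed dummies x2_x and x2_y in one call
data_replaced = pd.get_dummies(data, columns=['x2'], dtype=int)

print(data_all.columns.tolist())       # ['x1', 'x2', 'x', 'y']
print(data_replaced.columns.tolist())  # ['x1', 'x2_x', 'x2_y']
```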
