Merge columns and create a new column with pandas


You just need to handle NaNs

df['Colors'] = df[['Black', 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1)

    ID  Black  Red  Blue  Green       Colors
0  120    NaN  red   NaN  green   red, green
1  121  black  NaN  blue    NaN  black, blue
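
As a self-contained sketch of the NaN-handling join above: the sample frame below is reconstructed from the output shown and is an assumption, not the asker's real data. It uses dropna() in place of the boolean mask, which has the same effect here.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the output shown above (illustrative data).
df = pd.DataFrame({
    "ID": [120, 121],
    "Black": [np.nan, "black"],
    "Red": ["red", np.nan],
    "Blue": [np.nan, "blue"],
    "Green": ["green", np.nan],
})

color_cols = ["Black", "Red", "Blue", "Green"]

# dropna() removes the missing cells in each row before the strings are joined.
df["Colors"] = df[color_cols].apply(lambda row: ", ".join(row.dropna()), axis=1)

print(df["Colors"].tolist())
```

Without the NaN filter, join() would fail on the float NaN values, which is why the answer says the NaNs need handling.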

Using dot

s = df.iloc[:, 1:]
s.notnull()
   Black   Red   Blue  Green
0  False  True  False   True
1   True  True   True  False
s.notnull().dot(s.columns + ',').str[:-1]
0         Red,Green
1    Black,Red,Blue
dtype: object

df['color'] = s.notnull().dot(s.columns + ',').str[:-1]
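
A runnable sketch of the dot approach, using an assumed sample frame (the same illustrative data as the first answer, not the data behind the output above). Note that this technique yields the column names of the non-null cells, not the cell values.

```python
import numpy as np
import pandas as pd

# Illustrative data: each colour column holds either a string or NaN.
df = pd.DataFrame({
    "ID": [120, 121],
    "Black": [np.nan, "black"],
    "Red": ["red", np.nan],
    "Blue": [np.nan, "blue"],
    "Green": ["green", np.nan],
})

s = df.iloc[:, 1:]     # every column except ID
mask = s.notnull()     # True where a value is present

# Multiplying a boolean by a string keeps or drops it, and the dot product
# concatenates the kept names; str[:-1] trims the trailing comma.
names = mask.dot(s.columns + ",").str[:-1]

print(names.tolist())
```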

Suggestion : 2

You can combine two or more text columns in a pandas DataFrame with the + operator. Note that applying + to numeric columns performs addition instead of concatenation, so convert such columns to strings first. Other options are Series.map(), DataFrame.agg(), Series.str.cat(), and DataFrame.apply(). To join multiple string columns with DataFrame.agg(), pass all the columns you want to merge as a list, just as you would with apply().
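
The numeric caveat can be sketched as follows. The Fee and Discount values here are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000], "Discount": [1000, 1500]})

# On numeric columns, + is ordinary element-wise addition.
added = df["Fee"] + df["Discount"]

# Converting to str first turns + into string concatenation.
joined = df["Fee"].astype(str) + "-" + df["Discount"].astype(str)

print(added.tolist())
print(joined.tolist())
```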

If you are in a hurry, below are some quick examples of how to combine two columns of text in pandas DataFrame.

# Quick examples
# Using the + operator to combine two columns
df["Period"] = df["Courses"].astype(str) + "-" + df["Duration"]

# Using apply() to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)

# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].agg("-".join, axis=1)

# Using Series.str.cat()
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")

# Using DataFrame.apply() with a lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)

# Using map() to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
import pandas as pd
technologies = ({
   'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
   'Fee': [20000, 25000, 26000, 22000, 24000],
   'Duration': ['30days', '40days', '35days', '40days', '60days'],
   'Discount': [1000, 1500, 2500, 2100, 2000]
})
df = pd.DataFrame(technologies)
print(df)

Yields below output.

   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      1500
2   Hadoop  26000   35days      2500
3   Python  22000   40days      2100
4   pandas  24000   60days      2000
After combining Courses and Duration with "-" as the separator, the DataFrame gains a Period column.

   Courses    Fee Duration  Discount          Period
0    Spark  20000   30days      1000    Spark-30days
1  PySpark  25000   40days      1500  PySpark-40days
2   Hadoop  26000   35days      2500   Hadoop-35days
3   Python  22000   40days      2100   Python-40days
4   pandas  24000   60days      2000   pandas-60days

You can also use DataFrame.apply() to compress two or more columns of the DataFrame into a single column. apply() calls a function along the given axis; here that function is the string method join(), which concatenates the values in each row.

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis = 1)
print(df)
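
The examples in this suggestion assume no missing values. As a hedged sketch of one way to cope with them, Series.str.cat() accepts an na_rep argument that substitutes a placeholder for missing entries. The placeholder "unknown" and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data with one missing value in each column.
df = pd.DataFrame({
    "Courses": ["Spark", None, "Hadoop"],
    "Duration": ["30days", "40days", None],
})

# By default a missing value on either side makes the whole result missing.
plain = df["Courses"].str.cat(df["Duration"], sep="-")

# na_rep replaces missing entries before concatenation.
filled = df["Courses"].str.cat(df["Duration"], sep="-", na_rep="unknown")

print(plain.isna().tolist())
print(filled.tolist())
```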

Suggestion : 3

  • Appending rows to a DataFrame
  • This is also a valid argument to DataFrame.append
  • left: A DataFrame object
  • Joining multiple DataFrame or Panel objects

# Assumes: import numpy as np; from pandas import DataFrame, concat
In [1]: df = DataFrame(np.random.randn(10, 4))

In [2]: df
Out[2]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914

[10 rows x 4 columns]

# break it into pieces
In [3]: pieces = [df[:3], df[3:7], df[7:]]

In [4]: concatenated = concat(pieces)

In [5]: concatenated
Out[5]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914

[10 rows x 4 columns]

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False)
# Note: the join_axes argument in this older signature was removed in pandas 1.0.

In [6]: concatenated = concat(pieces, keys=['first', 'second', 'third'])

In [7]: concatenated
Out[7]:
                 0         1         2         3
first  0  0.469112 -0.282863 -1.509059 -1.135632
       1  1.212112 -0.173215  0.119209 -1.044236
       2 -0.861849 -2.104569 -0.494929  1.071804
second 3  0.721555 -0.706771 -1.039575  0.271860
       4 -0.424972  0.567020  0.276232 -1.087401
       5 -0.673690  0.113648 -1.478427  0.524988
       6  0.404705  0.577046 -1.715002 -1.039268
third  7 -0.370647 -1.157892 -1.344312  0.844885
       8  1.075770 -0.109050  1.643563 -1.469388
       9  0.357021 -0.674600 -1.776904 -0.968914

[10 rows x 4 columns]
# .ix was removed in pandas 1.0; .loc performs the same label lookup.
In [8]: concatenated.loc['second']
Out[8]:
          0         1         2         3
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268

[4 rows x 4 columns]
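
As a self-contained sketch of the same steps on current pandas: split a frame into pieces, concatenate them under keys, then select one group by its key. The integer data is an assumption, used in place of random numbers so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame used above.
df = pd.DataFrame(np.arange(40).reshape(10, 4))

# Break it into pieces, then glue the pieces back together.
pieces = [df[:3], df[3:7], df[7:]]
restored = pd.concat(pieces)

# keys= adds an outer index level that labels each piece.
labelled = pd.concat(pieces, keys=["first", "second", "third"])

# Selecting by key returns the rows of that piece.
second = labelled.loc["second"]
print(second)
```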
# pandas.util.testing has been deprecated since pandas 1.0 and may be unavailable
# in newer releases; any list of 10 unique strings works as the index here.
In [9]: from pandas.util.testing import rands

In [10]: df = DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'],
   ....:                index=[rands(5) for _ in range(10)])
   ....:

In [11]: df
Out[11]:
              a         b         c         d
6I74i -1.294524  0.413738  0.276662 -0.472035
RP8O8 -0.013960 -0.362543 -0.006154 -0.923061
lTKuy  0.895717  0.805244 -1.206412  2.565646
BmVOx  1.431256  1.340309 -1.170299 -0.226169
qp7p7  0.410835  0.813850  0.132003 -0.827317
k3K2f -0.076467 -1.187678  1.130127 -1.436737
HGqMS -1.413681  1.607920  1.024180  0.569605
Xby44  0.875906 -2.211372  0.974466 -2.006747
PL69Z -0.410001 -0.078638  0.545952 -1.219217
AZAf4 -1.226825  0.769804 -1.281247 -0.727707

[10 rows x 4 columns]

# .ix was removed in pandas 1.0; .iloc selects the same rows by position.
# sort=True asks pandas to sort the combined row labels, as in the output below.
In [12]: concat([df.iloc[:7][['a', 'b']], df.iloc[2:-2][['c']],
   ....:         df.iloc[-7:][['d']]], axis=1, sort=True)
   ....:
Out[12]:
              a         b         c         d
6I74i -1.294524  0.413738       NaN       NaN
AZAf4       NaN       NaN       NaN -0.727707
BmVOx  1.431256  1.340309 -1.170299 -0.226169
HGqMS -1.413681  1.607920  1.024180  0.569605
PL69Z       NaN       NaN       NaN -1.219217
RP8O8 -0.013960 -0.362543       NaN       NaN
Xby44       NaN       NaN  0.974466 -2.006747
k3K2f -0.076467 -1.187678  1.130127 -1.436737
lTKuy  0.895717  0.805244 -1.206412       NaN
qp7p7  0.410835  0.813850  0.132003 -0.827317

[10 rows x 4 columns]
# .ix was removed in pandas 1.0; .iloc selects the same rows by position.
In [13]: concat([df.iloc[:7][['a', 'b']], df.iloc[2:-2][['c']],
   ....:         df.iloc[-7:][['d']]], axis=1, join='inner')
   ....:
Out[13]:
              a         b         c         d
BmVOx  1.431256  1.340309 -1.170299 -0.226169
qp7p7  0.410835  0.813850  0.132003 -0.827317
k3K2f -0.076467 -1.187678  1.130127 -1.436737
HGqMS -1.413681  1.607920  1.024180  0.569605

[4 rows x 4 columns]
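
The difference between the outer and inner joins can be sketched with a deterministic frame. The row labels A to J and the integer values are assumptions chosen so the result is easy to check by eye.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(10), "b": range(10, 20), "c": range(20, 30), "d": range(30, 40)},
    index=list("ABCDEFGHIJ"),
)

# Three overlapping row slices, each carrying different columns.
parts = [
    df.iloc[:7][["a", "b"]],   # rows A..G
    df.iloc[2:-2][["c"]],      # rows C..H
    df.iloc[-7:][["d"]],       # rows D..J
]

# Outer join (the default): union of the row labels, NaN where a part has no row.
outer = pd.concat(parts, axis=1)

# Inner join: only the row labels present in every part.
inner = pd.concat(parts, axis=1, join="inner")

print(outer)
print(inner)
```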

Suggestion : 4

November 27, 2018 by cmdline

Let us first create a simple Pandas data frame using Pandas’ DataFrame function.

# import pandas
import pandas as pd
# create a new data frame
df = pd.DataFrame({
    'Last': ['Smith', 'Nadal', 'Federer'],
    'First': ['Steve', 'Joe', 'Roger'],
    'Age': [32, 34, 36]
})
df

Here we made a toy data frame with three columns; the last names and first names are in two separate columns.

      Last  First  Age
0    Smith  Steve   32
1    Nadal    Joe   34
2  Federer  Roger   36

Let us use the pandas .str accessor on the first-name column, chain it with the cat method, and pass the last-name column as the argument to cat.

df['Name'] = df['First'].str.cat(df['Last'], sep = " ")
df

Another way to join two columns in Pandas is to simply use the + symbol. For example, to concatenate the First and Last columns, we can do

df["Name"] = df["First"] + df["Last"]

We will get our results like this.

      Last  First  Age          Name
0    Smith  Steve   32    SteveSmith
1    Nadal    Joe   34      JoeNadal
2  Federer  Roger   36  RogerFederer
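
The + version above joins the names with no separator. As a small sketch, adding a literal " " between the columns gives the same result as str.cat with sep=" ". The variable names with_plus and with_cat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Last": ["Smith", "Nadal", "Federer"],
    "First": ["Steve", "Joe", "Roger"],
    "Age": [32, 34, 36],
})

# + with an explicit separator string between the two columns.
with_plus = df["First"] + " " + df["Last"]

# str.cat with sep produces the same strings.
with_cat = df["First"].str.cat(df["Last"], sep=" ")

df["Name"] = with_plus
print(df)
```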