You just need to handle the NaNs: drop the missing values in each row before joining.
df['Colors'] = df[['Black', 'Red', 'Blue', 'Green']].apply(lambda x: ', '.join(x[x.notnull()]), axis = 1)
    ID  Black  Red  Blue  Green       Colors
0  120    NaN  red   NaN  green   red, green
1  121  black  NaN  blue    NaN  black, blue
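A self-contained sketch of the same idea, assuming the small frame printed above (the data is reconstructed from that output, and `color_cols` is only an illustrative name). It uses `dropna()` in place of the boolean mask; both skip the missing cells before joining.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output (an assumption about the original data).
df = pd.DataFrame({
    "ID": [120, 121],
    "Black": [np.nan, "black"],
    "Red": ["red", np.nan],
    "Blue": [np.nan, "blue"],
    "Green": ["green", np.nan],
})

color_cols = ["Black", "Red", "Blue", "Green"]  # illustrative name

# For each row, drop the missing cells and join whatever is left.
df["Colors"] = df[color_cols].apply(lambda row: ", ".join(row.dropna()), axis=1)
print(df["Colors"].tolist())
```

Rows where every colour is missing end up as an empty string, since joining an empty sequence returns "".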
Using dot
s = df.iloc[:, 1:]
s.notnull()
   Black    Red   Blue  Green
0  False   True  False   True
1   True  False   True  False
s.notnull().dot(s.columns + ', ').str[:-2]
0     Red, Green
1    Black, Blue
dtype: object
df['color'] = s.notnull().dot(s.columns + ', ').str[:-2]
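Why the dot trick works: multiplying a string by True keeps it, multiplying by False gives an empty string, and the dot product adds the pieces together, which for strings means concatenation. A minimal sketch, assuming the same two-row frame as in the question (rebuilt here by hand) and a ", " separator that is trimmed at the end:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example frame (an assumption about the original data).
df = pd.DataFrame({
    "ID": [120, 121],
    "Black": [np.nan, "black"],
    "Red": ["red", np.nan],
    "Blue": [np.nan, "blue"],
    "Green": ["green", np.nan],
})

s = df.iloc[:, 1:]   # every column except ID
mask = s.notnull()   # True where a colour is present

# True * "Red, " -> "Red, " and False * "Red, " -> "", so the dot product
# concatenates the labels of the non-null columns; the slice drops the last ", ".
names = mask.dot(s.columns + ", ").str[:-2]
print(names.tolist())
```

Note that this returns the column labels, not the cell values, so it only matches the apply/join answer when labels and values carry the same information.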
You can combine two or more text columns in a pandas DataFrame with the + operator. Note that + applied to numeric columns performs addition rather than concatenation, so cast such columns to str first. To join several string columns you can also use DataFrame.agg(), passing the columns you want to merge as a list. This article covers the + operator, Series.map(), DataFrame.agg(), Series.str.cat(), and DataFrame.apply().
If you are in a hurry, below are some quick examples of how to combine two columns of text in pandas DataFrame.
# Below are quick examples

# Using + operator to combine two columns
df["Period"] = df['Courses'].astype(str) + "-" + df["Duration"]

# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)

# Using DataFrame.agg() to combine two columns of text
df["Period"] = df[['Courses', 'Duration']].agg('-'.join, axis=1)

# Using Series.str.cat() function
df["Period"] = df["Courses"].str.cat(df["Duration"], sep="-")

# Using DataFrame.apply() and lambda function
df["Period"] = df[["Courses", "Duration"]].apply(lambda x: "-".join(x), axis=1)

# Using map() function to combine two columns of text
df["Period"] = df["Courses"].map(str) + "-" + df["Duration"]
import pandas as pd
technologies = ({
'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
'Fee': [20000, 25000, 26000, 22000, 24000],
'Duration': ['30days', '40days', '35days', '40days', '60days'],
'Discount': [1000, 1500, 2500, 2100, 2000]
})
df = pd.DataFrame(technologies)
print(df)
Yields below output.
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      1500
2   Hadoop  26000   35days      2500
3   Python  22000   40days      2100
4   pandas  24000   60days      2000
Combining Courses and Duration into a Period column yields:
   Courses    Fee Duration  Discount          Period
0    Spark  20000   30days      1000    Spark-30days
1  PySpark  25000   40days      1500  PySpark-40days
2   Hadoop  26000   35days      2500   Hadoop-35days
3   Python  22000   40days      2100   Python-40days
4   pandas  24000   60days      2000   pandas-60days
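The difference between + on numeric columns and + on text columns is easy to see side by side. A small sketch using the first two rows of the example data (the variable names `total` and `label` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 1500],
})

# + on two numeric columns is arithmetic addition.
total = df["Fee"] + df["Discount"]

# Casting the numeric column to str first turns + into concatenation.
label = df["Courses"] + "-" + df["Fee"].astype(str)

print(total.tolist())
print(label.tolist())
```

Adding a text column to a numeric column without the cast raises an error, which is why astype(str) or map(str) shows up in the quick examples.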
You can also use DataFrame.apply() to compress two or more columns of the DataFrame into a single column. apply() runs a function along a given axis; with axis=1 it passes each row to str.join(), which joins the strings in that row.
# Using apply() method to combine two columns of text
df["Period"] = df[["Courses", "Duration"]].apply("-".join, axis=1)
print(df)
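For plain string columns with no missing values, apply(), agg() and str.cat() should all give the same result. A quick sketch to check that, using two rows of the example data (the `via_*` names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Duration": ["30days", "40days"],
})

cols = df[["Courses", "Duration"]]

via_apply = cols.apply("-".join, axis=1)                  # row-wise str.join
via_agg = cols.agg("-".join, axis=1)                      # same function through agg
via_cat = df["Courses"].str.cat(df["Duration"], sep="-")  # vectorised string concat

print(via_apply.tolist())
print(via_apply.equals(via_agg), via_apply.equals(via_cat))
```

str.cat() is usually the better fit for two columns, while apply()/agg() with join scale more naturally to a longer list of columns.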
# assumes: import numpy as np; from pandas import DataFrame, concat
In[1]: df = DataFrame(np.random.randn(10, 4))
In[2]: df
Out[2]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914
[10 rows x 4 columns]
# break it into pieces
In[3]: pieces = [df[:3], df[3:7], df[7:]]
In[4]: concatenated = concat(pieces)
In[5]: concatenated
Out[5]:
          0         1         2         3
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
7 -0.370647 -1.157892 -1.344312  0.844885
8  1.075770 -0.109050  1.643563 -1.469388
9  0.357021 -0.674600 -1.776904 -0.968914
[10 rows x 4 columns]
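The session above can be reproduced as a standalone script. This is a sketch that assumes the `pd.` namespace style and a seeded generator (the seed is arbitrary), so the numbers will differ from the output shown:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 4)))

# Break the frame into three row slices...
pieces = [df[:3], df[3:7], df[7:]]

# ...and glue them back together along the row axis (axis=0 is the default).
concatenated = pd.concat(pieces)

print(concatenated.shape)
print(concatenated.index.tolist())
```

Because the pieces keep their original row labels, the concatenated frame has the same index and values as the frame it was cut from.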
concat(objs, axis=0, join='outer', ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False)
In[6]: concatenated = concat(pieces, keys = ['first', 'second', 'third'])
In[7]: concatenated
Out[7]:
                 0         1         2         3
first  0  0.469112 -0.282863 -1.509059 -1.135632
       1  1.212112 -0.173215  0.119209 -1.044236
       2 -0.861849 -2.104569 -0.494929  1.071804
second 3  0.721555 -0.706771 -1.039575  0.271860
       4 -0.424972  0.567020  0.276232 -1.087401
       5 -0.673690  0.113648 -1.478427  0.524988
       6  0.404705  0.577046 -1.715002 -1.039268
third  7 -0.370647 -1.157892 -1.344312  0.844885
       8  1.075770 -0.109050  1.643563 -1.469388
       9  0.357021 -0.674600 -1.776904 -0.968914
[10 rows x 4 columns]
In[8]: concatenated.loc['second']
Out[8]:
          0         1         2         3
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
5 -0.673690  0.113648 -1.478427  0.524988
6  0.404705  0.577046 -1.715002 -1.039268
[4 rows x 4 columns]
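The keys argument can be sketched the same way in a standalone script. The seed and the `second` variable name are assumptions made for this example; selection uses `.loc` on the outer index level:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 4)))
pieces = [df[:3], df[3:7], df[7:]]

# Each key becomes the outer level of a two-level row index.
concatenated = pd.concat(pieces, keys=["first", "second", "third"])

# Selecting an outer label with .loc returns the rows that came from that piece.
second = concatenated.loc["second"]

print(concatenated.index.nlevels)
print(second.index.tolist())
```

This makes it possible to trace every row back to the piece it came from after concatenation.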
In[9]: from pandas.util.testing import rands  # random-string helper from older pandas releases
In[10]: df = DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'],
  ....:                index=[rands(5) for _ in range(10)])
  ....:
In[11]: df
Out[11]:
              a         b         c         d
6I74i -1.294524  0.413738  0.276662 -0.472035
RP8O8 -0.013960 -0.362543 -0.006154 -0.923061
lTKuy  0.895717  0.805244 -1.206412  2.565646
BmVOx  1.431256  1.340309 -1.170299 -0.226169
qp7p7  0.410835  0.813850  0.132003 -0.827317
k3K2f -0.076467 -1.187678  1.130127 -1.436737
HGqMS -1.413681  1.607920  1.024180  0.569605
Xby44  0.875906 -2.211372  0.974466 -2.006747
PL69Z -0.410001 -0.078638  0.545952 -1.219217
AZAf4 -1.226825  0.769804 -1.281247 -0.727707
[10 rows x 4 columns]
In[12]: concat([df.iloc[:7][['a', 'b']], df.iloc[2:-2][['c']],
  ....:         df.iloc[-7:][['d']]], axis=1, sort=True)
  ....:
Out[12]:
              a         b         c         d
6I74i -1.294524  0.413738       NaN       NaN
AZAf4       NaN       NaN       NaN -0.727707
BmVOx  1.431256  1.340309 -1.170299 -0.226169
HGqMS -1.413681  1.607920  1.024180  0.569605
PL69Z       NaN       NaN       NaN -1.219217
RP8O8 -0.013960 -0.362543       NaN       NaN
Xby44       NaN       NaN  0.974466 -2.006747
k3K2f -0.076467 -1.187678  1.130127 -1.436737
lTKuy  0.895717  0.805244 -1.206412       NaN
qp7p7  0.410835  0.813850  0.132003 -0.827317
[10 rows x 4 columns]
In[13]: concat([df.iloc[:7][['a', 'b']], df.iloc[2:-2][['c']],
  ....:         df.iloc[-7:][['d']]], axis=1, join='inner')
  ....:
Out[13]:
              a         b         c         d
BmVOx  1.431256  1.340309 -1.170299 -0.226169
qp7p7  0.410835  0.813850  0.132003 -0.827317
k3K2f -0.076467 -1.187678  1.130127 -1.436737
HGqMS -1.413681  1.607920  1.024180  0.569605
[4 rows x 4 columns]
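The outer versus inner behaviour along axis=1 can be checked with a small standalone sketch. It assumes fixed row labels `r0`..`r9` in place of random strings (hypothetical labels, chosen so the result is predictable) and positional row slices via `.iloc`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(
    rng.standard_normal((10, 4)),
    columns=["a", "b", "c", "d"],
    index=[f"r{i}" for i in range(10)],  # hypothetical labels
)

# Three column groups that cover different, overlapping row ranges.
parts = [
    df.iloc[:7][["a", "b"]],   # rows r0..r6
    df.iloc[2:-2][["c"]],      # rows r2..r7
    df.iloc[-7:][["d"]],       # rows r3..r9
]

outer = pd.concat(parts, axis=1)                 # union of row labels, gaps become NaN
inner = pd.concat(parts, axis=1, join="inner")   # only labels present in every part

print(outer.shape, int(outer.isna().sum().sum()))
print(sorted(inner.index))
```

With join='outer' every label appears once and missing cells are filled with NaN; with join='inner' only the rows shared by all three parts survive.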
November 27, 2018 by cmdline
Let us first create a simple Pandas data frame using Pandas’ DataFrame function.
# import pandas as pd
import pandas as pd
# create a new data frame
df = pd.DataFrame({'Last': ['Smith', 'Nadal', 'Federer'],
                   'First': ['Steve', 'Joe', 'Roger'],
                   'Age': [32, 34, 36]})
df
Here we made a toy data frame with three columns, with the last names and first names in two separate columns.
      Last  First  Age
0    Smith  Steve   32
1    Nadal    Joe   34
2  Federer  Roger   36
Let us use the .str accessor on the First column, chain it with the cat method, and pass the Last column as the argument to cat, with a space as the separator.
df['Name'] = df['First'].str.cat(df['Last'], sep = " ")
df
Another way to join two columns in Pandas is to simply use the + symbol. For example, to concatenate the First and Last columns, we can do the following. Note that + inserts no separator on its own, so the two names run together.
df["Name"] = df["First"] + df["Last"]
We will get our results like this.
      Last  First  Age          Name
0    Smith  Steve   32    SteveSmith
1    Nadal    Joe   34      JoeNadal
2  Federer  Roger   36  RogerFederer
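The two approaches differ once a value is missing. The sketch below assumes a variant of the toy data with one last name deliberately set to NaN (not part of the original example) and adds an explicit space separator to both forms:

```python
import numpy as np
import pandas as pd

# Variant of the toy frame with one missing last name (an assumption for illustration).
df = pd.DataFrame({
    "Last": ["Smith", "Nadal", np.nan],
    "First": ["Steve", "Joe", "Roger"],
})

# With +, a missing value on either side makes the whole result missing.
plus = df["First"] + " " + df["Last"]

# str.cat can substitute a placeholder for missing values via na_rep;
# strip() removes the trailing separator left behind by the empty placeholder.
cat = df["First"].str.cat(df["Last"], sep=" ", na_rep="").str.strip()

print(plus.isna().tolist())
print(cat.tolist())
```

So + is fine for clean columns, while str.cat() with na_rep is the safer choice when either column may contain NaN.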