You can use the reindex method. Pass in the list of column names and specify axis='columns'. Columns that do not exist in the DataFrame are filled with NaN by default:
>>> df1.reindex(column_master_list, axis='columns')
    b   c   e   d  a
0  32  32 NaN NaN  1
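To make the default fill behaviour concrete, here is a minimal sketch. The contents of df1 and column_master_list are assumptions chosen to match the output above, and fill_value is shown as one way to replace the NaN default.

```python
import pandas as pd

# Hypothetical data chosen to mirror the example above.
df1 = pd.DataFrame({"a": [1], "b": [32], "c": [32]})
column_master_list = ["b", "c", "e", "d", "a"]

# Default behaviour: columns absent from df1 ("e" and "d") are filled with NaN.
with_nan = df1.reindex(column_master_list, axis="columns")

# fill_value supplies a different filler for the newly created columns.
with_zero = df1.reindex(columns=column_master_list, fill_value=0)

print(with_nan)
print(with_zero)
```

Existing columns keep their values in both cases; only the columns missing from df1 receive the filler.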
You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner.
As a convenience, you can pass a list of arrays directly into Series or DataFrame to construct a MultiIndex automatically.
As a convenience, DataFrame provides a function called reset_index, which transfers the index values into the DataFrame's columns and sets a simple integer index. This is the inverse operation to set_index.
With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
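The MultiIndex and reset_index behaviour described above can be sketched as follows. The labels and values are made up for illustration and are not taken from the original answer.

```python
import pandas as pd

# Hypothetical labels: two parallel arrays become a two-level MultiIndex.
arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
s = pd.Series([1, 2, 3, 4], index=arrays)

# A small frame indexed by a "key" column (illustrative data).
df = pd.DataFrame({"key": ["x", "y", "z"], "value": [10, 20, 30]}).set_index("key")

# reset_index moves the index into a regular column and installs 0..n-1.
flat = df.reset_index()

# set_index on that column reverses the operation.
restored = flat.set_index("key")
```

After reset_index, "key" is an ordinary column again, and set_index("key") recovers a frame equal to the original.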
In [542]: dates = np.asarray(date_range('1/1/2000', periods=8))
In [543]: df = DataFrame(randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
In [544]: df
Out[544]:
A B C D
2000-01-01 0.469112 -0.282863 -1.509059 -1.135632
2000-01-02 1.212112 -0.173215 0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929 1.071804
2000-01-04 0.721555 -0.706771 -1.039575 0.271860
2000-01-05 -0.424972 0.567020 0.276232 -1.087401
2000-01-06 -0.673690 0.113648 -1.478427 0.524988
2000-01-07 0.404705 0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312 0.844885
In [545]: panel = Panel({'one' : df, 'two' : df - df.mean()})
In [546]: panel
Out[546]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 8 (major) x 4 (minor)
Items: one to two
Major axis: 2000-01-01 00:00:00 to 2000-01-08 00:00:00
Minor axis: A to D
In [547]: s = df['A']
In [548]: s[dates[5]]
Out[548]: -0.67368970808837059
In [549]: panel['two']
Out[549]:
A B C D
2000-01-01 0.409571 0.113086 -0.610826 -0.936507
2000-01-02 1.152571 0.222735 1.017442 -0.845111
2000-01-03 -0.921390 -1.708620 0.403304 1.270929
2000-01-04 0.662014 -0.310822 -0.141342 0.470985
2000-01-05 -0.484513 0.962970 1.174465 -0.888276
2000-01-06 -0.733231 0.509598 -0.580194 0.724113
2000-01-07 0.345164 0.972995 -0.816769 -0.840143
2000-01-08 -0.430188 -0.761943 -0.446079 1.044010
In [550]: s.get_value(dates[5])
Out[550]: -0.67368970808837059
In [551]: df.get_value(dates[5], 'A')
Out[551]: -0.67368970808837059
In [552]: df.set_value(dates[5], 'E', 7)
Out[552]:
A B C D E
2000-01-01 0.469112 -0.282863 -1.509059 -1.135632 NaN
2000-01-02 1.212112 -0.173215 0.119209 -1.044236 NaN
2000-01-03 -0.861849 -2.104569 -0.494929 1.071804 NaN
2000-01-04 0.721555 -0.706771 -1.039575 0.271860 NaN
2000-01-05 -0.424972 0.567020 0.276232 -1.087401 NaN
2000-01-06 -0.673690 0.113648 -1.478427 0.524988 7
2000-01-07 0.404705 0.577046 -1.715002 -1.039268 NaN
2000-01-08 -0.370647 -1.157892 -1.344312 0.844885 NaN
In [553]: df.A
Out[553]:
2000-01-01 0.469112
2000-01-02 1.212112
2000-01-03 -0.861849
2000-01-04 0.721555
2000-01-05 -0.424972
2000-01-06 -0.673690
2000-01-07 0.404705
2000-01-08 -0.370647
Name: A
In [554]: df
Out[554]:
A B C D
2000-01-01 0.469112 -0.282863 -1.509059 -1.135632
2000-01-02 1.212112 -0.173215 0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929 1.071804
2000-01-04 0.721555 -0.706771 -1.039575 0.271860
2000-01-05 -0.424972 0.567020 0.276232 -1.087401
2000-01-06 -0.673690 0.113648 -1.478427 0.524988
2000-01-07 0.404705 0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312 0.844885
In [555]: df[['B', 'A']] = df[['A', 'B']]
In [556]: df
Out[556]:
A B C D
2000-01-01 -0.282863 0.469112 -1.509059 -1.135632
2000-01-02 -0.173215 1.212112 0.119209 -1.044236
2000-01-03 -2.104569 -0.861849 -0.494929 1.071804
2000-01-04 -0.706771 0.721555 -1.039575 0.271860
2000-01-05 0.567020 -0.424972 0.276232 -1.087401
2000-01-06 0.113648 -0.673690 -1.478427 0.524988
2000-01-07 0.577046 0.404705 -1.715002 -1.039268
2000-01-08 -1.157892 -0.370647 -1.344312 0.844885
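The session above was recorded on an early pandas release. Panel, get_value and set_value have since been removed from pandas, so what follows is a minimal sketch of how the same operations might look on a recent version, assuming .at and .loc as the replacements. The random data and the seed are placeholders, not the values printed above.

```python
import numpy as np
import pandas as pd

# Placeholder data: random values stand in for the numbers shown above.
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=dates, columns=list("ABCD"))

# Scalar lookup by row and column label (used here in place of get_value).
value = df.at[dates[5], "A"]

# Scalar assignment that creates column "E" on the fly (in place of set_value).
df.loc[dates[5], "E"] = 7

# Select columns in a chosen order; an unknown label raises KeyError.
subset = df[["B", "A"]]

# Swap two columns. to_numpy() drops the labels, so the values are
# assigned by position instead of being re-aligned by column name.
before = df[["A", "B"]].to_numpy()
df[["B", "A"]] = before

# Slicing inside [] selects rows, not columns.
first_three = df[:3]
```

Converting to a NumPy array before the swap is a deliberate choice: it makes the positional intent explicit and does not depend on how label alignment is handled during assignment.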
In pandas, you can add a new constant column with a literal value to a DataFrame using the assign() method, which returns a new DataFrame with the column added. insert() instead updates the existing DataFrame with the new constant column. In this article, I will explain several ways to add a new column with a constant value to a pandas DataFrame, with examples.
DataFrame.assign() can add a single constant column, such as "Discount_Percentage", and returns a new DataFrame rather than modifying the existing one. It can also add multiple constant columns at once: if the columns need different values, pass them to assign() as an unpacked dictionary.
DataFrame.insert() adds a new column at any position of the existing DataFrame, given the index at which the column should go. The insert() example below adds a constant column at the second position (index 1). Note that in pandas, positions start from zero.
If you are in a hurry, below are some quick examples of how to add a constant column value to pandas DataFrame.
# Quick examples (each snippet is an independent alternative)
# Add a new column with a constant value
df["Discount_Percentage"] = 10
# Use DataFrame.insert() to add a constant column at position 1
df.insert(1, 'Discount_Percentage', '10')
# Add a constant number to each element of a column
df['Discount'] = df['Discount'] + 150
# Use DataFrame.apply() with a lambda function
df['Discount_Percentage'] = df.apply(lambda x: 10, axis=1)
# Use DataFrame.assign() to add a constant column
df2 = df.assign(Discount_Percentage=10)
# Add multiple constant columns
data = {'Discount_Percentage': 10, 'Advance': 1000}
df2 = df.assign(**data)
# Use a pandas Series; index=df.index keeps the values aligned with the rows
df['Discount_Percentage'] = pd.Series([10] * len(df), index=df.index)
import pandas as pd
technologies = {
'Courses': ["Spark", "PySpark", "Python", "pandas"],
'Fee': [20000, 25000, 22000, 30000],
'Duration': ['30days', '40days', '35days', '50days'],
'Discount': [1000, 2300, 1200, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df = pd.DataFrame(technologies, index = index_labels)
print(df)
This yields the output below.
    Courses    Fee  Duration  Discount
r1    Spark  20000    30days      1000
r2  PySpark  25000    40days      2300
r3   Python  22000    35days      1200
r4   pandas  30000    50days      2000
Adding the constant column with df["Discount_Percentage"] = 10 and printing the DataFrame yields:
    Courses    Fee  Duration  Discount  Discount_Percentage
r1    Spark  20000    30days      1000                   10
r2  PySpark  25000    40days      2300                   10
r3   Python  22000    35days      1200                   10
r4   pandas  30000    50days      2000                   10
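One detail worth checking when building the constant column from a Series instead of a scalar is index alignment. The sketch below uses a trimmed, hypothetical version of the example frame; the column names Misaligned and Aligned are illustrative only.

```python
import pandas as pd

# Trimmed version of the example frame (two rows, two columns).
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# A scalar is broadcast to every row regardless of the index.
df["Discount_Percentage"] = 10

# A Series is aligned on index labels first. Its default 0..n-1 index
# shares no labels with r1/r2, so every value becomes NaN.
df["Misaligned"] = pd.Series([10, 10])

# Reusing the frame's own index keeps the values.
df["Aligned"] = pd.Series([10, 10], index=df.index)
```

With a default integer index on the DataFrame the two Series variants behave the same, which is why the difference is easy to miss.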
The insert() method updates the existing DataFrame object with the new column.
# Use DataFrame.insert() to add a constant column
df = pd.DataFrame(technologies, index=index_labels)
df.insert(1, 'Discount_Percentage', '10')
print(df)
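The difference between assign(), which returns a new DataFrame, and insert(), which changes the DataFrame in place, can be sketched as follows. The frame is a trimmed, hypothetical version of the example data, and the names constants and result are illustrative.

```python
import pandas as pd

# Trimmed version of the example frame (two rows, two columns).
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# assign returns a new DataFrame and leaves df untouched.
# Unpacking a dictionary adds several constant columns at once.
constants = {"Discount_Percentage": 10, "Advance": 1000}
df2 = df.assign(**constants)
columns_after_assign = list(df.columns)

# insert modifies df in place and returns None.
# Position 1 places the new column second, since positions start at zero.
result = df.insert(1, "Discount_Percentage", 10)
```

Calling insert() again with the same column name raises a ValueError unless allow_duplicates=True is passed, so re-running the snippet requires rebuilding df first.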