python pandas dataframe fill nan values


pandas.DataFrame.fillna, pandas.DataFrame.ffill, pandas.DataFrame.bfill

>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
>>> df.ffill()  # replaces df.fillna(method="ffill"), which is deprecated since pandas 2.1
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0
>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0
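
The heading of this suggestion also lists ffill and bfill. As a minimal sketch of how the two differ, assuming a small hypothetical frame (the names demo, forward and backward are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame used only for this illustration.
demo = pd.DataFrame({"A": [np.nan, 3.0, np.nan, np.nan],
                     "B": [2.0, 4.0, np.nan, 3.0]})

forward = demo.ffill()    # carry the last valid value downward
backward = demo.bfill()   # pull the next valid value upward

print(forward)
print(backward)
```

A leading NaN survives ffill because nothing precedes it, and a trailing NaN survives bfill for the mirror-image reason.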

Suggestion : 2

Last Updated : 03 Jul, 2020

df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
df.fillna(0)
df.replace(np.nan, 0)
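
As a quick check that the two spellings above behave the same way, here is a small sketch on a hypothetical frame (the column names price and qty are made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data; any numeric frame with NaN values would do.
orders = pd.DataFrame({"price": [1.5, np.nan, 3.0],
                       "qty": [np.nan, 2.0, np.nan]})

filled = orders.fillna(0)             # fill every NaN with 0
replaced = orders.replace(np.nan, 0)  # same result expressed as a replacement

print(filled.equals(replaced))
print(orders.isna().sum().sum())      # the original still holds its NaN values
```

Neither call modifies orders; both return a new DataFrame unless the result is assigned back.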

Suggestion : 3

Example:

In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]:
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:

df.fillna({1: 0}, inplace=True)

Selecting a column is not guaranteed to return a view rather than a copy, so an in-place fill on the selection may never reach df. You can assign the result back instead:

df['column'] = df['column'].fillna(value)
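
A short sketch of the two approaches above, assuming a hypothetical frame with integer column labels like the one in the example (the names frame, via_dict and via_assign are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer column labels 0 and 1.
frame = pd.DataFrame({0: [np.nan, -0.49, np.nan],
                      1: [np.nan, 0.57, np.nan]})

# Dictionary form: only column 1 is filled, column 0 is left alone.
via_dict = frame.fillna({1: 0})

# Assign-back form: fill the selected column and store the result.
via_assign = frame.copy()
via_assign[1] = via_assign[1].fillna(0)

print(via_dict.equals(via_assign))
```

Both forms avoid calling an in-place method on a selection, which is the pattern that triggers the warning.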

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for a single column
df['column'] = df['column'].replace(np.nan, 0)

# for the whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)

The below code worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

I just wanted to provide a bit of an update/special case since it looks like people still come here. If you're using a multi-index or otherwise using an index-slicer the inplace=True option may not be enough to update the slice you've chosen. For example in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:, mask_1], idx[mask_2, :]].fillna(value=0, inplace=True)

The solution is DataFrame.update:

df.update(df.loc[idx[:, mask_1], idx[[mask_2], :]].fillna(value=0))
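
The masks in that answer are not defined, so the following is only a rough sketch of the same idea on a hypothetical two-level row index, with the label "x" standing in for a mask:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 MultiIndex; the level names and labels are made up.
index = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]],
                                   names=["outer", "inner"])
mi = pd.DataFrame({"v": [np.nan, 1.0, np.nan, np.nan]}, index=index)

idx = pd.IndexSlice
subset = mi.loc[idx[:, "x"], :].fillna(0)  # a separate object holding the filled slice
mi.update(subset)                          # write the filled values back by label

print(mi)
```

Only the rows whose inner label is "x" are filled; the NaN at ("b", "y") is outside the slice and stays as it was.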

You can also use a dictionary to fill NaN values in specific columns of the DataFrame, rather than filling the whole DataFrame with a single value.

import pandas as pd

df = pd.read_excel('example.xlsx')
df.fillna({
   'column1': 'Write your values here',
   'column2': 'Write your values here',
   'column3': 'Write your values here',
   'column4': 'Write your values here',
   # ...
   'column-n': 'Write your values here'
}, inplace=True)

Suggestion : 4

The fillna() method fills NaN/NA values in a specified column or in an entire DataFrame with a given value. You can modify the object in place with inplace, cap the number of fills with limit, or choose the axis along which to fill. It can fill a single column, several columns at once, or use a different replacement value for each column. The quick examples below cover each of these cases.

# fillna() on all columns
df2 = df.fillna('None')

# fillna() on one column
df2['Discount'] = df['Discount'].fillna(0)

# fillna() on multiple columns
df2[['Discount', 'Fee']] = df[['Discount', 'Fee']].fillna(0)

# fillna() on multiple columns with different values
df2 = df.fillna(value = {
   'Discount': 0,
   'Fee': 10000
})

# fill with limit
df2 = df.fillna(value = {
   'Discount': 0,
   'Fee': 0
}, limit = 1)

Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast and returns a new DataFrame. When inplace=True is used, it returns None because the replacement happens on the existing DataFrame object. Recent pandas releases deprecate method (in favor of ffill() and bfill()) and downcast.

# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value = None, method = None, axis = None, inplace = False, limit = None, downcast = None)
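
To make the difference between the default call and inplace=True concrete, here is a small sketch on a hypothetical two-column frame (fees, filled_copy and working are names invented for this example):

```python
import numpy as np
import pandas as pd

# Hypothetical frame, not taken from the article.
fees = pd.DataFrame({"Fee": [20000.0, np.nan], "Discount": [np.nan, 500.0]})

# Default call: a new DataFrame comes back and `fees` keeps its NaN values.
filled_copy = fees.fillna(0)

# inplace=True: the object the method is called on is modified directly.
working = fees.copy()
working.fillna(0, inplace=True)

print(fees.isna().sum().sum())         # NaN count in the untouched original
print(filled_copy.isna().sum().sum())  # NaN count in the returned copy
print(working.isna().sum().sum())      # NaN count after the in-place fill
```

Assigning the result of the default call (df = df.fillna(0)) is usually the easier form to reason about, since nothing is changed behind the scenes.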

Let’s create a DataFrame

# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
   'Courses': ["Spark", 'Java', "Scala", 'Python'],
   'Fee': [20000, np.nan, 26000, 24000],
   'Duration': ['30days', '40days', 'NA', '40days'],
   'Discount': [1000, np.nan, 2500, None]
})
print(df)

The df.fillna('None') call in the quick examples fills NaN values across the entire DataFrame. Sometimes you need to replace NaN in just one column; to do so, select that column and call fillna() on it.

# fillna on one column
df2 = df.fillna('None')  # start from the whole-DataFrame fill in the quick examples
df2['Discount'] = df['Discount'].fillna(0)  # numeric 0, not the string '0'
print(df2)

# Outputs
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java     None   40days       0.0
# 2   Scala  26000.0       NA    2500.0
# 3  Python  24000.0   40days       0.0

Use the pandas fillna() method to fill a specified value on multiple DataFrame columns. The example below updates the Discount and Fee columns, replacing NaN values with 0.

# fillna() on multiple columns
df2[['Discount', 'Fee']] = df[['Discount', 'Fee']].fillna(0)  # numeric 0, not the string '0'
print(df2)

# Outputs
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0       NA    2500.0
# 3  Python  24000.0   40days       0.0

Suggestion : 5

DataFrame.fillna() fills (replaces) NA or NaN values in the DataFrame with the specified values. It can fill NaN values in the whole DataFrame or in specific columns, modify the DataFrame in place, limit the number of fills, or choose the axis along which filling takes place. Passing a dictionary that maps column names to values replaces NaN with a different value in each column. limit takes an integer or None: when method is specified, it is the maximum number of consecutive NaN values to forward/backward fill; otherwise it is the maximum number of NaN entries filled along the axis. In the following program, we create a DataFrame containing NaN values and use fillna() to replace them with 0.
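
The two meanings of limit are easy to mix up, so here is a minimal sketch on a hypothetical single-column frame (gaps, by_value and by_ffill are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two gaps of two NaN values each.
gaps = pd.DataFrame({"a": [np.nan, np.nan, 1.0, np.nan, np.nan]})

# With a fill value: at most `limit` NaN entries are filled along the column.
by_value = gaps.fillna(0, limit=1)

# With forward fill: at most `limit` consecutive NaN values after each valid one.
by_ffill = gaps.ffill(limit=1)

print(by_value)
print(by_ffill)
```

In the first result only the very first NaN is replaced; in the second, only the NaN directly after the 1.0 is filled.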

The syntax of DataFrame.fillna() method is

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Python Program

import pandas as pd
import numpy as np

df = pd.DataFrame(
   [
      [np.nan, 72, 67],
      [23, 78, 62],
      [32, 74, np.nan],
      [np.nan, 54, 76]
   ],
   columns = ['a', 'b', 'c'])

df_result = df.fillna(0)

print('Original DataFrame\n', df)
print('\nResulting DataFrame\n', df_result)

Output

Original DataFrame
       a   b     c
0   NaN  72  67.0
1  23.0  78  62.0
2  32.0  74   NaN
3   NaN  54  76.0

Resulting DataFrame
       a   b     c
0   0.0  72  67.0
1  23.0  78  62.0
2  32.0  74   0.0
3   0.0  54  76.0
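
The description above mentions choosing an axis. As a rough sketch using the same data as the program above, ffill(axis=1) fills along each row instead of down each column (the name row_filled is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, 72, 67],
     [23, 78, 62],
     [32, 74, np.nan],
     [np.nan, 54, 76]],
    columns=['a', 'b', 'c'])

# Forward fill across columns: each NaN takes the value to its left in the same row.
row_filled = df.ffill(axis=1)

print(row_filled)
```

The NaN values in column a stay in place because there is no column to their left to copy from, while the NaN in column c of row 2 takes the value from column b.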