Python (pandas): filling blank cells with NaN


You can replace blank values or empty strings with NaN in a pandas DataFrame by using the DataFrame.replace(), DataFrame.apply(), and DataFrame.mask() methods. This article explains how to replace blank values with NaN on the entire DataFrame and on selected columns, with some examples.

The replace() method replaces a specified value with another specified value, either on a given column or on all columns of a DataFrame; every occurrence of the specified value is replaced.

You can also replace blank values with NaN by using the DataFrame.mask() method, which replaces the values where the condition evaluates to True (with NaN by default).

Another option is the DataFrame.apply() method with a lambda function. The apply() method applies a function along one axis of the DataFrame; the default, axis=0, passes each column to the function.

If you are in a hurry, below are some quick examples of how to replace blank values or empty strings with NaN in a pandas DataFrame.

# Below are some quick examples.
# Replace blank values with DataFrame.replace().
df2 = df.replace(r'^\s*$', np.nan, regex=True)

# Using the DataFrame.mask() method (matches exact empty strings only).
df2 = df.mask(df == '')

# Replace on a single column (returns a Series).
df2 = df['Courses'].replace('', np.nan)

# Replace on selected columns.
df2 = df[['Courses', 'Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)

The examples in this article use the following DataFrame, which contains empty strings.

# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
   'Courses': ["Spark", "", "Spark", "", "PySpark"],
   'Fee': [22000, 25000, 23000, 24000, 26000],
   'Duration': ['30days', '', '30days', '', '35days']
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.

   Courses    Fee Duration
0    Spark  22000   30days
1           25000
2    Spark  23000   30days
3           24000
4  PySpark  26000   35days

Replacing the blank values with NaN, for example with df.replace(r'^\s*$', np.nan, regex=True), yields the following output.

   Courses    Fee Duration
0    Spark  22000   30days
1      NaN  25000      NaN
2    Spark  23000   30days
3      NaN  24000      NaN
4  PySpark  26000   35days

You can also replace blank values with NaN by using the DataFrame.mask() method. mask() replaces the values where the condition evaluates to True; by default the replacement is NaN.

# Using DataFrame.mask() method.
df2 = df.mask(df == '')
print(df2)
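
The condition df == '' only matches cells that are exactly the empty string, so cells holding only spaces would be left alone. A minimal sketch of one way to extend mask() to whitespace-only cells follows; the small DataFrame and the name is_blank are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical sample data: one empty string and one whitespace-only string.
df = pd.DataFrame({
    "Courses": ["Spark", "", "  ", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000],
})

# Element-wise test: True only for strings that are empty after stripping.
is_blank = df.apply(
    lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "")
)

# mask() replaces values where the condition is True; the default replacement is NaN.
df2 = df.mask(is_blank)
print(df2)
```

Because the test checks isinstance(v, str) first, numeric columns such as Fee pass through unchanged.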

Suggestion : 2

I think df.replace() does the job, since pandas 0.13:

import numpy as np
import pandas as pd

df = pd.DataFrame([
   [-0.532681, 'foo', 0],
   [1.490752, 'bar', 1],
   [-1.387326, 'foo', 2],
   [0.814772, 'baz', ' '],
   [-0.222552, '   ', 4],
   [-1.176781, 'qux', '  '],
], columns = 'A B C'.split(), index = pd.date_range('2000-01-01', '2000-01-06'))

# replace fields that are entirely whitespace (or empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))

Produces:

                   A    B    C
2000-01-01 -0.532681  foo    0
2000-01-02  1.490752  bar    1
2000-01-03 -1.387326  foo    2
2000-01-04  0.814772  baz  NaN
2000-01-05 -0.222552  NaN    4
2000-01-06 -1.176781  qux  NaN

If you want to replace empty strings and cells that contain only spaces, the correct answer is:

df = df.replace(r'^\s*$', np.nan, regex=True)

The accepted answer

df.replace(r'\s+', np.nan, regex=True)

does not replace an empty string. You can try it yourself with the given example, slightly updated:

df = pd.DataFrame([
   [-0.532681, 'foo', 0],
   [1.490752, 'bar', 1],
   [-1.387326, 'fo o', 2],
   [0.814772, 'baz', ' '],
   [-0.222552, '   ', 4],
   [-1.176781, 'qux', ''],
], columns = 'A B C'.split(), index = pd.date_range('2000-01-01', '2000-01-06'))
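
To see the difference between the two patterns in isolation, here is a minimal sketch. The Series s and its values are made up for illustration; it only checks which cells end up as NaN under each pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a word, a single space, several spaces, an empty string.
s = pd.Series(["foo", " ", "   ", ""], dtype=object)

# Anchored pattern: the whole cell must be zero or more whitespace characters.
anchored = s.replace(r"^\s*$", np.nan, regex=True)

# Pattern from the accepted answer: needs at least one whitespace character.
unanchored = s.replace(r"\s+", np.nan, regex=True)

print(anchored.isna().tolist())    # which cells became NaN with ^\s*$
print(unanchored.isna().tolist())  # which cells became NaN with \s+
```

The empty string survives the second call because \s+ requires at least one whitespace character to match.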

How about:

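# Python 3: str replaces basestring; ''.isspace() is False, so empty strings are kept.
# On pandas 2.1+, use DataFrame.map instead of the deprecated applymap.
df = df.applymap(lambda x: np.nan
   if isinstance(x, str) and x.isspace()
   else x)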

I did this (it assumes every column holds strings):

df = df.apply(lambda x: x.str.strip()).replace('', np.nan)

or, to strip only the string columns:

df = df.apply(lambda x: x.str.strip() if pd.api.types.is_string_dtype(x)
   else x).replace('', np.nan)

If you are importing the data from a CSV file, it can be as simple as this:

df = pd.read_csv(file_csv, na_values = ' ')
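
As a rough illustration of the na_values approach, the sketch below reads a small in-memory CSV instead of a real file. The column names and rows are invented for the example; note that read_csv already treats completely empty fields as NaN by default, so na_values=' ' only adds the single-space case.

```python
import io

import pandas as pd

# Hypothetical CSV content: one field is a single space, one is empty.
csv_text = "name,city\nAnn, \nBob,\nCid,Oslo\n"

# Treat a single space as missing, in addition to the default NaN markers.
df = pd.read_csv(io.StringIO(csv_text), na_values=" ")
print(df)
```

Fields that contain several spaces would not match na_values=' ' and would need either additional entries in na_values or a replace() call after loading.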

Simplest of all solutions (note that \s+ requires at least one whitespace character, so empty strings are not replaced):

df = df.replace(r'^\s+$', np.nan, regex=True)

Suggestion : 3

This example shows how to convert blank cells in a pandas DataFrame to NaN values. The example data is a pandas DataFrame in which some of the cells are empty; note that some of these empty cells contain multiple white spaces.

import pandas as pd                     # Import pandas library
data = pd.DataFrame({                   # Create example DataFrame
   'x1': [1, '', '   ', 2, 3],
   'x2': ['', '', 'a', 'b', 'c'],
   'x3': ['    ', 'a', 'b', 'c', 'd']
})
print(data)                             # Print example DataFrame
#     x1 x2    x3
# 0    1
# 1             a
# 2        a    b
# 3    2   b    c
# 4    3   c    d
data_new = data.copy()                  # Create duplicate of example data
data_new = data_new.replace(r'^\s*$', float('NaN'), regex=True)  # Replace blanks by NaN
print(data_new)                         # Print updated data
#     x1   x2   x3
# 0  1.0  NaN  NaN
# 1  NaN  NaN    a
# 2  NaN    a    b
# 3  2.0    b    c
# 4  3.0    c    d

Suggestion : 4

You can use the following syntax to replace empty strings with NaN values in pandas:

df = df.replace(r'^\s*$', np.nan, regex=True)

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({
   'team': ['A', 'B', ' ', 'D', 'E', ' ', 'G', 'H'],
   'position': [' ', 'G', 'G', 'F', 'F', ' ', 'C', 'C'],
   'points': [5, 7, 7, 9, 12, 9, 9, 4],
   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]
})

#view DataFrame
df

  team position  points  rebounds
0    A                5        11
1    B        G       7         8
2             G       7        10
3    D        F       9         6
4    E        F      12         6
5                     9         5
6    G        C       9         9
7    H        C       4        12

We can use the following syntax to replace these empty strings with NaN values:

import numpy as np

#replace empty values with NaN
df = df.replace(r'^\s*$', np.nan, regex=True)

#view updated DataFrame
df

  team position  points  rebounds
0    A      NaN       5        11
1    B        G       7         8
2  NaN        G       7        10
3    D        F       9         6
4    E        F      12         6
5  NaN      NaN       9         5
6    G        C       9         9
7    H        C       4        12

Notice that each of the blank strings has been replaced with NaN.
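
One way to check the result, sketched here under the assumption that the same basketball DataFrame is used, is to count the missing values per column after the replacement. The variable name missing is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Same example data as above: blanks are stored as single-space strings.
df = pd.DataFrame({
    "team": ["A", "B", " ", "D", "E", " ", "G", "H"],
    "position": [" ", "G", "G", "F", "F", " ", "C", "C"],
    "points": [5, 7, 7, 9, 12, 9, 9, 4],
    "rebounds": [11, 8, 10, 6, 6, 5, 9, 12],
})

# Replace empty or whitespace-only strings with NaN.
df = df.replace(r"^\s*$", np.nan, regex=True)

# Count NaN values in each column to confirm the replacement.
missing = df.isna().sum()
print(missing)
```

The numeric columns should report zero missing values, since the regular expression is only applied to string cells.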