Fill an empty pandas column based on conditions on other columns


The DataFrame needs to be indexed and accessed differently, using a vectorized boolean mask instead of Python's `if`/`and`:

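import numpy as np
# Option 1: start with an empty column, then fill only the matching rows via .loc
df['foo'] = ''
df.loc[(df['Name'] == 'tom') & (df['Age'] == 10), 'foo'] = 'x1'
# Option 2: build the whole column in one step with np.where
df['foo'] = np.where((df['Name'] == 'tom') & (df['Age'] == 10), 'x1', '')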
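A self-contained sketch that runs both options on the toy Name/Age DataFrame from the question and checks that they agree. The variable names `mask` and `foo_where` are illustrative only:

```python
import numpy as np
import pandas as pd

# Toy data from the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Age'])

# Element-wise boolean mask: & combines the Series, parentheses are required
mask = (df['Name'] == 'tom') & (df['Age'] == 10)

# Option 1: empty column first, then assign 'x1' to the masked rows with .loc
df['foo'] = ''
df.loc[mask, 'foo'] = 'x1'

# Option 2: np.where picks 'x1' where mask is True and '' elsewhere
foo_where = np.where(mask, 'x1', '')

print(df)
print(foo_where.tolist())
```

Both produce `'x1'` only in the row where Name is 'tom' and Age is 10; `np.where` is convenient when every row needs a value, `.loc` when only some rows should change.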

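According to the error ("The truth value of a Series is ambiguous"), you are comparing the whole Series df['Name'] with 'tom', and likewise for the other condition, so `if` and `and` receive a Series rather than a single True/False. The condition has to be evaluated for each value in the Series. For that you can use the apply function with axis=1, so that the function receives one row at a time.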

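def new_column(row):
    # With axis=1, row is a single row (a Series), so these are scalar comparisons
    if row['Name'] == 'tom' and row['Age'] == 10:
        return 'x1'
    return ''

# The returned values are collected into the new column
df['foo'] = df.apply(new_column, axis=1)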
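A minimal, self-contained sketch of the row-wise `apply` approach on the question's toy data. For comparison it also builds the same column from a boolean mask with `Series.map`; that comparison is an illustrative addition, not part of the original answer:

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Age'])

def new_column(row):
    # axis=1 hands each row to the function as a Series of scalars
    if row['Name'] == 'tom' and row['Age'] == 10:
        return 'x1'
    return ''

# Row-wise apply: one Python function call per row
df['foo'] = df.apply(new_column, axis=1)

# Vectorized equivalent: map each True/False in the mask to a label
mask = (df['Name'] == 'tom') & (df['Age'] == 10)
vectorized = mask.map({True: 'x1', False: ''})

print(df['foo'].tolist() == vectorized.tolist())
```

The `apply` version is easy to read for complex per-row logic, but it calls Python once per row, so on large frames the vectorized mask is usually preferable.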


Suggestion : 3


Suppose I have the following toy dataframe:

# Import pandas library
import pandas as pd
# initialize list of lists
data = [
   ['tom', 10],
   ['nick', 15],
   ['juli', 14]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# Display the DataFrame
print(df)

and I create an empty column which I want to fill later:

df['foo'] = df.apply(lambda _: '', axis = 1)
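The row-wise `apply` here is not needed just to create an empty column: assigning a scalar broadcasts it to every row. A small sketch comparing the two on the toy data (`via_apply` is an illustrative name):

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Age'])

# The question's approach: call a lambda once per row, each returning ''
via_apply = df.apply(lambda _: '', axis=1)

# Scalar assignment: '' is broadcast to every row without a per-row call
df['foo'] = ''

print(via_apply.tolist() == df['foo'].tolist())
```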

I want to fill the empty column based on conditions on the other two columns. For example:

# This attempt raises ValueError: the truth value of a Series is ambiguous
if (df['Name'] == 'tom'
        and df['Age'] == 10):
    df['foo'] = 'x1'
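Why the attempt above fails can be shown directly: `df['Name'] == 'tom'` is a whole boolean Series, and `if`/`and` need a single True/False, so pandas raises instead of guessing. A sketch on the toy data, contrasting it with the element-wise `&` operator:

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Age'])

# A comparison on a column yields one boolean per row, not a single bool
name_is_tom = df['Name'] == 'tom'

# `if` and `and` call bool() on their operand, which pandas refuses for a Series
try:
    bool(name_is_tom)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# `&` combines the two Series element by element instead
mask = (df['Name'] == 'tom') & (df['Age'] == 10)
print(mask.tolist())
```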


Suggestion : 4

Pandas' loc creates a boolean mask based on a condition. Sometimes that condition just selects rows and columns, but it can also be used to filter DataFrames, and the filtered rows can then have values assigned to them:

df.loc[df['column'] condition, 'new column name'] = 'value if condition is met'

Adding columns to a pandas DataFrame based on conditional statements about values in existing columns can be accomplished in a wide variety of ways; np.where() and np.select() are just two of many potential approaches.

If you need to apply a method over an existing column to compute values that will be added as a new column, pandas.DataFrame.apply() should do the trick. For example, you can define your own method and then pass it to apply(). Let's …
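np.select() is mentioned above but not shown. It generalizes np.where to several conditions. A minimal sketch on the question's toy data; the second rule (Age > 14 gives 'x2') is a hypothetical addition for illustration:

```python
import numpy as np
import pandas as pd

# Toy data from the question
df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Age'])

# Each condition is paired with the choice at the same position
conditions = [
    (df['Name'] == 'tom') & (df['Age'] == 10),
    df['Age'] > 14,  # hypothetical second rule
]
choices = ['x1', 'x2']

# The first matching condition wins for each row; rows matching none get default
df['foo'] = np.select(conditions, choices, default='')
print(df)
```

Because conditions are checked in order, put the most specific rule first when rows can match more than one condition.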



Suggestion : 5

value: Value to replace any values matching to_replace with. For a DataFrame, a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings, and lists or dicts of such objects are also allowed.

Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the value parameter should be None.

How to_replace is matched:
  • numeric: numeric values equal to to_replace will be replaced with value
  • regex: regexes matching to_replace will be replaced with value

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0    5
1    2
2    3
3    4
4    5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')  # the method keyword is deprecated since pandas 2.1
0    3
1    3
2    3
3    4
4    5
dtype: int64
>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
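The per-column dict behaviour described above is not shown in these examples. A short sketch on the same DataFrame: a dict as to_replace with a scalar value restricts each match to one column, and a nested dict maps old to new values within a single column:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# {column: old_value} plus a scalar: replace 0 only in A and 5 only in B with 100
per_column = df.replace({'A': 0, 'B': 5}, 100)

# Nested dict {column: {old: new}}: column-specific mapping, other columns untouched
nested = df.replace({'C': {'a': 'z', 'b': 'y'}})

print(per_column)
print(nested)
```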

Suggestion : 6

pandas.DataFrame.fillna() fills NA/NaN/None values in one or more columns with 0, an empty string, or any other specified value. NaN is considered a missing value. In machine learning work, handling missing values is very important; leaving them unhandled can produce incorrect results.

# fillna() on all columns
df2 = df.fillna('None')

# fillna() on one column
df2['Discount'] = df['Discount'].fillna(0)

# fillna() on multiple columns
df2[['Discount', 'Fee']] = df[['Discount', 'Fee']].fillna(0)

# fillna() on multiple columns with different values
df2 = df.fillna(value = {
   'Discount': 0,
   'Fee': 10000
})

# fill with limit
df2 = df.fillna(value = {
   'Discount': 0,
   'Fee': 0
}, limit = 1)

Below is the syntax of the pandas.DataFrame.fillna() method. It takes the parameters value, method, axis, inplace, limit, and downcast, and returns a new DataFrame. When inplace=True is used, it returns None because the fill happens on the existing DataFrame object. Recent pandas releases deprecate method (use ffill() or bfill() instead) and downcast.

# Syntax of pandas.DataFrame.fillna()
DataFrame.fillna(value = None, method = None, axis = None, inplace = False, limit = None, downcast = None)

Let’s create a DataFrame

# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({
   'Courses': ['Spark', 'Java', 'Scala', 'Python'],
   'Fee': [20000, np.nan, 26000, 24000],
   'Duration': ['30days', '40days', 'NA', '40days'],
   'Discount': [1000, np.nan, 2500, None]
})
print(df)

The above example filled all NaN values in the entire DataFrame. Sometimes you need to fill just one column; you can do so by selecting that column and calling fillna() on it.

# fillna on one column (df2 starts from the df.fillna('None') result above)
df2 = df.fillna('None')
df2['Discount'] = df['Discount'].fillna(0)
print(df2)

# Outputs
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java     None   40days       0.0
# 2   Scala  26000.0       NA    2500.0
# 3  Python  24000.0   40days       0.0

Use fillna() to fill several DataFrame columns with a specified value. The example below updates the Discount and Fee columns, replacing their NaN values with 0.

# fillna() on multiple columns
df2[['Discount', 'Fee']] = df[['Discount', 'Fee']].fillna(0)
print(df2)

# Outputs
#   Courses      Fee Duration  Discount
# 0   Spark  20000.0   30days    1000.0
# 1    Java      0.0   40days       0.0
# 2   Scala  26000.0       NA    2500.0
# 3  Python  24000.0   40days       0.0
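A self-contained sketch of the per-column dict form listed in the summary at the top of this section, run on the Courses DataFrame. It also shows that fillna only touches real missing values: the literal string 'NA' in Duration is not NaN and is left alone:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Java', 'Scala', 'Python'],
    'Fee': [20000, np.nan, 26000, 24000],
    'Duration': ['30days', '40days', 'NA', '40days'],
    'Discount': [1000, np.nan, 2500, None],  # None becomes NaN in a numeric column
})

# Different fill value per column; columns missing from the dict are untouched
df2 = df.fillna(value={'Discount': 0, 'Fee': 10000})
print(df2)

# 'NA' here is an ordinary string, so fillna does not replace it
print(df2['Duration'].tolist())
```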

Suggestion : 7

To replace values in a column based on a condition in a pandas DataFrame, you can use the DataFrame.loc property, numpy.where(), or DataFrame.where().

To replace values in a column based on a condition using DataFrame.loc, use the following syntax.

DataFrame.loc[condition, column_name] = new_value

Python Program

import pandas as pd

df = pd.DataFrame([
      [-10, -9, 8],
      [6, 2, -4],
      [-8, 5, 1]
   ],
   columns = ['a', 'b', 'c'])

df.loc[(df.a < 0), 'a'] = 0
print(df)

Output

   a  b  c
0  0 -9  8
1  6  2 -4
2  0  5  1

To replace values in a column based on a condition using numpy.where, use the following syntax.

DataFrame['column_name'] = numpy.where(condition, new_value, DataFrame.column_name)
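A sketch applying this syntax to the same DataFrame as the loc program. It also shows Series.where, the column-level counterpart of DataFrame.where(), which keeps values where the condition is True and replaces the rest, so its condition is the inverse of the one given to numpy.where:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[-10, -9, 8], [6, 2, -4], [-8, 5, 1]], columns=['a', 'b', 'c'])

# numpy.where: 0 where the condition holds, otherwise keep the existing value
df['a'] = np.where(df['a'] < 0, 0, df['a'])

# Series.where: keep values where c >= 0, replace the others with 0
df['c'] = df['c'].where(df['c'] >= 0, 0)

print(df)
```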