If you need to change the values of a column in a DataFrame for the rows that meet a condition, use DataFrame.loc with the boolean condition and the column name:
df.loc[df['B'] % 2 == 0, 'C'] = 5
print(df)
    A   B   C
0   1   2   5
1   4   5   6
2   7   8   5
3  10  11  12
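A self-contained version of the .loc update, as a sketch. The starting values of A, B and C are an assumption: they are inferred from the printed output, and the values in C that were overwritten (3 and 9) are guessed.

```python
import pandas as pd

# Assumed starting frame; the C values in rows 0 and 2 are guesses, since they get overwritten
df = pd.DataFrame({'A': [1, 4, 7, 10], 'B': [2, 5, 8, 11], 'C': [3, 6, 9, 12]})

# Boolean mask: True for the rows where B is even
mask = df['B'] % 2 == 0

# Row selector (the mask) and column label ('C') go into a single .loc call
df.loc[mask, 'C'] = 5
print(df)
```

Keeping the row condition and the column label in one .loc call means pandas writes directly into df, with no intermediate object in between.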
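You could also select the column first and then index it with the condition:
df['C'][df['B'] % 2 == 0] = 5
This is chained assignment, though: pandas may warn about it, and under Copy-on-Write (the default in pandas 3.0) it does not modify df at all, so .loc is the safer choice.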
Alternatively, use numpy.where:
import numpy as np
df['C'] = np.where(df['B'] % 2 == 0, 5, df['C'])
Output
    A   B   C
0   1   2   5
1   4   5   6
2   7   8   5
3  10  11  12
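A short sketch comparing numpy.where with Series.mask, a pandas method that replaces values where a condition is True. Series.mask is a swapped-in alternative, not part of the original answer, and the starting frame is the same assumed one as in the .loc example.

```python
import numpy as np
import pandas as pd

# Assumed starting frame (C values in rows 0 and 2 are guesses)
df = pd.DataFrame({'A': [1, 4, 7, 10], 'B': [2, 5, 8, 11], 'C': [3, 6, 9, 12]})
mask = df['B'] % 2 == 0

# np.where builds a new array: 5 where mask is True, the old C value elsewhere
with_np = np.where(mask, 5, df['C'])

# Series.mask does the same in pandas and keeps the result as a Series
with_pd = df['C'].mask(mask, 5)

df['C'] = with_pd
print(df)
```

Both build a complete new column and assign it back, so neither depends on chained indexing.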
pandas supports several ways to filter rows by column value. DataFrame.query() is the most commonly used: it filters rows based on a boolean expression and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True. Alternatively, you can use DataFrame[] with a boolean mask, DataFrame.loc[], or DataFrame.apply(), which applies a function along an axis and can return only the rows that match. To filter out rows that have None or NaN values, use DataFrame.dropna().
# Filter rows using DataFrame.query()
df2 = df.query("Courses == 'Spark'")

# Using a variable
value = 'Spark'
df2 = df.query("Courses == @value")

# In place
df.query("Courses == 'Spark'", inplace=True)

# Not equals, in & multiple conditions
df.query("Courses != 'Spark'")
df.query("Courses in ('Spark','PySpark')")
# Backticks quote column names that contain spaces, e.g. a 'Courses Fee' column
df.query("`Courses Fee` >= 23000")
df.query("`Courses Fee` >= 23000 and `Courses Fee` <= 24000")

# Other ways to filter rows
values = ['Spark', 'PySpark']
df.loc[df['Courses'] == value]
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]
df[df["Courses"] == 'Spark']
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]
df.apply(lambda row: row[df['Courses'].isin(['Spark', 'PySpark'])])
df.dropna()
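A small sketch showing that query(), boolean indexing with [] and .loc return the same rows for an equivalent condition, and that none of them change the original frame. The two-column frame is a trimmed, assumed version of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})
values = ['Spark', 'PySpark']

by_query = df.query("Courses in ('Spark','PySpark')")  # expression string
by_mask = df[df['Courses'].isin(values)]                # boolean indexing with []
by_loc = df.loc[df['Courses'].isin(values)]             # the same mask passed to .loc
others = df.loc[~df['Courses'].isin(values)]            # ~ negates the mask

# Each call returns a new DataFrame; df still has all five rows
print(by_query)
```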
If you are a learner, run through these examples with sample data and explore the output to understand them better. First, let's create a pandas DataFrame from a dictionary.
import pandas as pd
import numpy as np
technologies = {
'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee': [22000, 25000, 23000, 24000, 26000],
'Duration': ['30days', '50days', '30days', None, np.nan],
'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
print(df)
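Using the same sample data, a sketch of a few filters from the list above that depend on missing values or ranges. Series.between() and the case=False argument of str.contains() are additions not shown in the original list.

```python
import numpy as np
import pandas as pd

# Same sample data as above
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

# dropna() treats both None and np.nan as missing, so rows 3 and 4 are dropped
complete = df.dropna()

# subset= limits the missing-value check to the listed columns
has_duration = df.dropna(subset=['Duration'])

# between() is inclusive on both ends by default, like >= 1000 & <= 2000
mid_discount = df[df['Discount'].between(1000, 2000)]

# case=False matches "Spark" and "PySpark" without lowercasing the column first
spark_like = df[df['Courses'].str.contains("spark", case=False)]

print(complete)
```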
# Filter all rows where Courses equals 'Spark'
df2 = df.query("Courses == 'Spark'")
print(df2)
To use a Python variable in the expression, prefix its name with the @ character.
# Filter rows by using a Python variable
value = 'Spark'
df2 = df.query("Courses == @value")
print(df2)
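The @ prefix also works with lists and inside combined conditions. A minimal sketch; the variable names wanted and min_fee are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Hypothetical filter values held in ordinary Python variables
wanted = ['Spark', 'PySpark']
min_fee = 23000

in_list = df.query("Courses in @wanted")                   # @ with a list
expensive = df.query("Fee >= @min_fee")                    # @ with a scalar
both = df.query("Courses in @wanted and Fee >= @min_fee")  # combined with 'and'
print(both)
```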
# Replace the existing DataFrame in place
df.query("Courses == 'Spark'", inplace=True)
print(df)
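One detail of inplace=True worth checking: query() then returns None instead of a DataFrame, so assigning its result (df = df.query(..., inplace=True)) would replace df with None. A short sketch on a trimmed, assumed version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Hadoop"], 'Fee': [22000, 25000, 23000]})

result = df.query("Courses == 'Spark'", inplace=True)

# With inplace=True, query() returns None and df itself keeps only the matching rows
print(result)  # None
print(df)
```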
num_df.loc[num_df['a'] == 2]
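The snippet above does not show how num_df was built, so here is a sketch with a hypothetical num_df. It shows that .loc with a boolean mask keeps the original index labels of the matching rows.

```python
import pandas as pd

# Hypothetical num_df; the original snippet does not show how it was created
num_df = pd.DataFrame({'a': [1, 2, 3, 2], 'b': [10, 20, 30, 40]})

# Keep rows where column 'a' equals 2; index labels 1 and 3 are preserved
rows = num_df.loc[num_df['a'] == 2]
print(rows)
```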
February 22, 2018 by cmdline
Let us first load gapminder data as a dataframe into pandas.
# load pandas
import pandas as pd
data_url = 'http://bit.ly/2cLzoxH'
# read data from url as pandas dataframe
gapminder = pd.read_csv(data_url)
This data frame has 1,704 rows and 6 columns. One of the columns is year. Let us look at the first three rows of the data frame.
print(gapminder.head(3))
       country  year         pop continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0      Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0      Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0      Asia   31.997  853.100710
For example, let us filter (subset) the dataframe to the rows where year is 2002. The comparison gapminder['year'] == 2002 produces a boolean Series that is True where year equals 2002 and False otherwise.
# does year equal 2002?
# is_2002 is a boolean Series with True or False in it
is_2002 = gapminder['year'] == 2002
We can then use this boolean Series to filter the dataframe. After subsetting, the new dataframe is much smaller.
# filter rows for year 2002 using the boolean variable
gapminder_2002 = gapminder[is_2002]
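Because downloading the data needs network access, here is an offline sketch. It uses a small stand-in frame built only from the three rows printed above (not the full gapminder data), filters it with one mask, and then combines two masks. & means "and", | means "or" and ~ means "not"; each comparison needs its own parentheses.

```python
import pandas as pd

# Stand-in built from the three printed rows; this is not the full gapminder data
gapminder = pd.DataFrame({
    'country': ['Afghanistan'] * 3,
    'year': [1952, 1957, 1962],
    'pop': [8425333.0, 9240934.0, 10267083.0],
    'continent': ['Asia'] * 3,
    'lifeExp': [28.801, 30.332, 31.997],
    'gdpPercap': [779.445314, 820.853030, 853.100710],
})

is_1957 = gapminder['year'] == 1957
print(is_1957.tolist())  # [False, True, False]
gapminder_1957 = gapminder[is_1957]

# Two conditions combined with &; parentheses are required around each comparison
after_1952_asia = gapminder[(gapminder['year'] > 1952) & (gapminder['continent'] == 'Asia')]
print(after_1952_asia)
```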
Check the shape (dimensions) of the filtered dataframe:
print(gapminder_2002.shape)
(142, 6)
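You can also predict the number of filtered rows before subsetting: summing a boolean Series counts its True values. For the gapminder data this count should match the 142 rows shown above. A sketch on a hypothetical toy frame, since the real data is not loaded here:

```python
import pandas as pd

# Hypothetical toy frame standing in for gapminder
toy = pd.DataFrame({'country': ['A', 'A', 'B', 'B'], 'year': [1997, 2002, 1997, 2002]})

is_2002 = toy['year'] == 2002
# True counts as 1 when summed, so this is the number of rows the filter keeps
n_rows = int(is_2002.sum())

toy_2002 = toy[is_2002]
print(n_rows, toy_2002.shape)  # 2 (2, 2)
```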