How to insert row values that meet a condition in a pandas DataFrame


You could use groupby to split the frame by Id, append the separator row to each group in a list comprehension, and then merge everything again with concat:

df2 = pd.concat([d.append(pd.Series([None, 4], index=['Id', 'type']), ignore_index=True)
                 for _, d in df.groupby('Id')],
                ignore_index=True).iloc[:-1]
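Note that DataFrame.append was removed in pandas 2.0; the same idea can be written with pd.concat only. A minimal self-contained sketch, assuming the Id/type data from the question:

```python
import pandas as pd

# sample data assumed from the question (Id/type columns)
df = pd.DataFrame({'Id': [1, 1, 2, 2, 3],
                   'type': ['car', 'track', 'train', 'plane', 'car']})

# one separator row to append after each group
sep = pd.DataFrame([[None, 4]], columns=['Id', 'type'])

# append the separator to every group, then drop the trailing one
df2 = pd.concat(
    [pd.concat([d, sep], ignore_index=True) for _, d in df.groupby('Id')],
    ignore_index=True
).iloc[:-1]
print(df2)
```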

If the index is sorted, another option is to find the index of the last item per group and use it to generate the new rows:

# get the index of the last item per group (except the last group)
idx = df.index.to_series().groupby(df['Id']).last().values[:-1]

# craft a DataFrame with the new rows
d = pd.DataFrame([[None, 4]] * len(idx), columns=df.columns, index=idx)

# concatenate and reorder
pd.concat([df, d]).sort_index().reset_index(drop=True)

Output:

    Id   type
0  1.0    car
1  1.0  track
2  NaN    4.0
3  2.0  train
4  2.0  plane
5  NaN    4.0
6  3.0    car

You can do this:

df = pd.read_csv('input.csv', sep=";")
 Id   type
0   1    car
1   1  track
2   2  train
3   2  plane
4   3    car

mask = df['Id'].ne(df['Id'].shift(-1))
df1 = pd.DataFrame('4', index=mask.index[mask] + .5, columns=df.columns)
# index misalignment with df leaves NaN in the Id column of the new rows
df1['Id'] = df['Id'].replace({'4': ' '})
df = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]

which gives:

    Id   type
0  1.0    car
1  1.0  track
2  NaN      4
3  2.0  train
4  2.0  plane
5  NaN      4
6  3.0    car
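The same mask/shift idea can be written without the string placeholder, inserting a NaN/4 separator row directly. A minimal self-contained sketch, assuming the same Id/type data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Id': [1, 1, 2, 2, 3],
                   'type': ['car', 'track', 'train', 'plane', 'car']})

# True on the last row of each Id group
mask = df['Id'].ne(df['Id'].shift(-1))

# fractional indices slot the separators after each group once sorted
sep = pd.DataFrame({'Id': np.nan, 'type': 4}, index=mask.index[mask] + .5)

out = pd.concat([df, sep]).sort_index().reset_index(drop=True).iloc[:-1]
print(out)
```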

You can do:

In [244]: grp = df.groupby('Id')

In [256]: res = pd.DataFrame()

In [257]: for x, y in grp:
     ...:     if y['type'].count() > 1:
     ...:         tmp = y.append(pd.DataFrame({'Id': [''], 'type': [4]}))
     ...:         res = res.append(tmp)
     ...:     else:
     ...:         res = res.append(y)
     ...:

In [258]: res
Out[258]:
  Id   type
0  1    car
1  1  track
0         4
2  2  train
3  2  plane
0         4
4  3    car

Please find the solution below, using the index:

# create a shifted column to compare against the Id
df['idshift'] = df['Id'].shift(1)
# where the shifted Id does not match the Id, a new group starts
change_index = df.index[df['idshift'] != df['Id']].tolist()
# loop through the change indices (skipping the first) and insert a row at each
for i in change_index[1:]:
    line = pd.DataFrame({"Id": ' ', "rate": 4}, index=[(i - 1) + .5])
    df = df.append(line, ignore_index=False)
# finally sort the index
df = df.sort_index().reset_index(drop=True)

Input DataFrame:

df = pd.DataFrame({
   'Id': [1, 1, 2, 2, 3, 3, 3, 4],
   'rate': [1, 2, 3, 10, 12, 16, 10, 12]
})
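DataFrame.append was removed in pandas 2.0; the same loop can collect the separator rows and concatenate once at the end. A sketch under that assumption, using the input DataFrame above:

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 1, 2, 2, 3, 3, 3, 4],
                   'rate': [1, 2, 3, 10, 12, 16, 10, 12]})

# indices where the Id changes (a new group starts); skip the very first
change_index = df.index[df['Id'] != df['Id'].shift(1)].tolist()

# one separator row per boundary, placed at a fractional index
rows = [pd.DataFrame({'Id': ' ', 'rate': 4}, index=[(i - 1) + .5])
        for i in change_index[1:]]

out = pd.concat([df] + rows).sort_index().reset_index(drop=True)
print(out)
```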

Suggestion : 2

Select rows based on a value in a column, on any of multiple values, or on multiple conditions on a column — for example, select rows in the DataFrame below for which the ‘Product’ column contains the value ‘Apples’.

First let’s create a DataFrame,

# List of Tuples
students = [('jack', 'Apples', 34),
   ('Riti', 'Mangos', 31),
   ('Aadi', 'Grapes', 30),
   ('Sonia', 'Apples', 32),
   ('Lucy', 'Mangos', 33),
   ('Mike', 'Apples', 35)
]

# Create a DataFrame object
dfObj = pd.DataFrame(students, columns=['Name', 'Product', 'Sale'])

    Name Product  Sale
0   jack  Apples    34
1   Riti  Mangos    31
2   Aadi  Grapes    30
3  Sonia  Apples    32
4   Lucy  Mangos    33
5   Mike  Apples    35

Select rows in the above DataFrame for which the ‘Product’ column contains the value ‘Apples’:

subsetDataFrame = dfObj[dfObj['Product'] == 'Apples']

If we pass this Series object to the [] operator of the DataFrame, it returns a new DataFrame containing only the rows with True in the passed Series, i.e.

dfObj[dfObj['Product'] == 'Apples']

Select rows in the above DataFrame for which the ‘Product’ column contains either ‘Grapes’ or ‘Mangos’, i.e.

subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes'])]
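The section also mentions selecting rows based on multiple conditions on a column; boolean Series combine with & (each condition in parentheses) in the same way. A small sketch on the same dfObj:

```python
import pandas as pd

students = [('jack', 'Apples', 34), ('Riti', 'Mangos', 31),
            ('Aadi', 'Grapes', 30), ('Sonia', 'Apples', 32),
            ('Lucy', 'Mangos', 33), ('Mike', 'Apples', 35)]
dfObj = pd.DataFrame(students, columns=['Name', 'Product', 'Sale'])

# rows where Product is 'Apples' AND Sale is greater than 33
subset = dfObj[(dfObj['Product'] == 'Apples') & (dfObj['Sale'] > 33)]
print(subset)
```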

Suggestion : 3


As we typically do, we’ll start by creating a DataFrame that you can use to follow along this example in your own computer. Enter the code below into your Python Data Analysis development environment:

import pandas as pd

month = ['June', 'August', 'February', 'July']
language = ['Java', 'Kotlin', 'PHP', 'Python']
first_interview = (78, 93, 76, 89)
hr_dict = dict(month=month, language=language, interview_1=first_interview)
hr_df = pd.DataFrame(data=hr_dict)

print(hr_df.head())

First case will be to filter our DataFrame according to rows containing specific values. Initially we’ll use a simple condition as an example:

# select rows by simple condition
condition = (hr_df['language'] == 'Python')
hr_df[condition]

Using the OR operator

# multiple conditions (OR)

condition = (hr_df['language'] == 'Python') | (hr_df['month'] == 'June')
hr_df[condition]

You might also want to select rows whose values match a specific string pattern. In the example below, we'll keep only entries in the language column that start with the letter K.

# select rows where the column value starts with a letter
condition = (hr_df['language'].str.startswith('K'))
hr_df[condition]
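Alongside startswith, the str.contains accessor supports case-insensitive substring matching via its case parameter; a sketch on the same hr_df:

```python
import pandas as pd

month = ['June', 'August', 'February', 'July']
language = ['Java', 'Kotlin', 'PHP', 'Python']
first_interview = (78, 93, 76, 89)
hr_df = pd.DataFrame(dict(month=month, language=language,
                          interview_1=first_interview))

# case-insensitive substring match on the language column
condition = hr_df['language'].str.contains('p', case=False)
print(hr_df[condition])
```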

In our last example for today, we’ll select one or multiple rows only if they match a specific list of values.

# find rows by condition in list

#define the list of values
lang_lst = ['PHP', 'Python']
#subset the dataframe
hr_df[hr_df['language'].isin(lang_lst)]

Suggestion : 4


# filter Rows Based on condition
df[df["Courses"] == 'Spark']
df.loc[df['Courses'] == value]
df.query("Courses == 'Spark'")
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]

# filter Multiple Conditions using Multiple Columns
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]

# using a lambda function
df.apply(lambda row: row[df['Courses'].isin(['Spark', 'PySpark'])])

# drop rows that have None or NaN values
df.dropna()

# Other examples
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]
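As the cheatsheet notes, dropna() removes rows containing missing values; with the sample technologies data below, the two rows with a missing Duration are dropped. A minimal sketch:

```python
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

# rows with any None/NaN value are removed
clean = df.dropna()
print(clean['Courses'].tolist())
```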

If you are a learner, run through these examples with the sample data and explore the output to understand them better. First, let's create a pandas DataFrame from a dictionary.

import pandas as pd
import numpy as np

technologies = {
   'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
   'Fee': [22000, 25000, 23000, 24000, 26000],
   'Duration': ['30days', '50days', '30days', None, np.nan],
   'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
print(df)
Filtering by the condition returns a new DataFrame with only the selected rows:
df2 = df[df["Courses"] == 'Spark']
print(df2)

You can also write the above statement with a variable.

value = "Spark"
df2 = df[df["Courses"] == value]
You can negate the condition to select rows that do not match:
df[df["Courses"] != 'Spark']
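With query(), a local variable can be referenced via the @ prefix instead of building the string by hand; a sketch on the same sample data:

```python
import pandas as pd
import numpy as np

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)

value = "Spark"
# @value refers to the local Python variable inside the query string
df2 = df.query("Courses == @value")
df3 = df.query("Courses != @value")
```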