You could use groupby to split by groups, append a row to each group in a list comprehension, and then merge everything again with concat:
# append a separator row after each group, then drop the trailing one
# (written with pd.concat, since DataFrame.append was removed in pandas 2.0)
df2 = pd.concat(
    [pd.concat([d, pd.DataFrame([[None, 4]], columns=['Id', 'type'])])
     for _, d in df.groupby('Id')],
    ignore_index=True
).iloc[:-1]
If the index is sorted, another option is to find the index of the last item per group and use it to generate the new rows:
# get index of last item per group (except the last group)
idx = df.index.to_series().groupby(df['Id']).last().values[:-1]

# craft a DataFrame with the new rows
d = pd.DataFrame([[None, 4]] * len(idx), columns=df.columns, index=idx)

# concatenate and reorder
pd.concat([df, d]).sort_index().reset_index(drop=True)
output:
    Id   type
0  1.0    car
1  1.0  track
2  NaN    4.0
3  2.0  train
4  2.0  plane
5  NaN    4.0
6  3.0    car
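For reference, the snippets above don't show their input frame; a minimal sketch consistent with the output would be:

import pandas as pd

df = pd.DataFrame({'Id': [1, 1, 2, 2, 3],
                   'type': ['car', 'track', 'train', 'plane', 'car']})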
You can do this:
df = pd.read_csv('input.csv', sep = ";")
Id type
0 1 car
1 1 track
2 2 train
3 2 plane
4 3 car
df = pd.read_csv('input.csv', sep=";")
Id type
0 1 car
1 1 track
2 2 train
3 2 plane
4 3 car
# mark the last row of each Id group (where the next Id differs)
mask = df['Id'].ne(df['Id'].shift(-1))

# build separator rows at fractional indices so they sort in between groups
df1 = pd.DataFrame('4', index=mask.index[mask] + .5, columns=df.columns)
df1['Id'] = float('nan')  # leave Id empty on the separator rows

# interleave, renumber, and drop the separator after the final group
df = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]
which gives:
    Id   type
0  1.0    car
1  1.0  track
2  NaN      4
3  2.0  train
4  2.0  plane
5  NaN      4
6  3.0    car
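The fractional indices are what make this work: the separator rows are created at positions 1.5, 3.5, and 4.5, so sort_index interleaves them between the existing integer positions, reset_index renumbers everything, and iloc[:-1] drops the separator that would otherwise trail the final group.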
You can do:
In [244]: grp = df.groupby('Id')

In [256]: res = pd.DataFrame()

In [257]: for x, y in grp:
     ...:     if y['type'].count() > 1:
     ...:         tmp = y.append(pd.DataFrame({'Id': [''], 'type': [4]}))
     ...:         res = res.append(tmp)
     ...:     else:
     ...:         res = res.append(y)
     ...:

In [258]: res
Out[258]:
  Id   type
0  1    car
1  1  track
0         4
2  2  train
3  2  plane
0         4
4  3    car
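Note that DataFrame.append was removed in pandas 2.0; a sketch of the same loop written with pd.concat instead:

# collect the pieces and concatenate once at the end
parts = []
for x, y in grp:
    parts.append(y)
    if y['type'].count() > 1:
        # same separator row as in the loop above
        parts.append(pd.DataFrame({'Id': [''], 'type': [4]}))
res = pd.concat(parts)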
Please find the solution below, which uses the index:
# create a shifted Id column to compare against the current Id
df['idshift'] = df['Id'].shift(1)

# where the shifted Id does not match the Id, a new group starts
change_index = df.index[df['idshift'] != df['Id']].tolist()

# loop through the change indices (skipping the first) and insert a new row
# at a fractional index just before each group boundary
for i in change_index[1:]:
    line = pd.DataFrame({"Id": ' ', "rate": 4}, index=[(i - 1) + .5])
    df = pd.concat([df, line])

# finally, sort the index and renumber
df = df.sort_index().reset_index(drop=True)
Input DataFrame:
df = pd.DataFrame({
    'Id': [1, 1, 2, 2, 3, 3, 3, 4],
    'rate': [1, 2, 3, 10, 12, 16, 10, 12]
})
First, let’s create a DataFrame:
# List of tuples
students = [('jack', 'Apples', 34),
            ('Riti', 'Mangos', 31),
            ('Aadi', 'Grapes', 30),
            ('Sonia', 'Apples', 32),
            ('Lucy', 'Mangos', 33),
            ('Mike', 'Apples', 35)]

# Create a DataFrame object
dfObj = pd.DataFrame(students, columns=['Name', 'Product', 'Sale'])
    Name Product  Sale
0   jack  Apples    34
1   Riti  Mangos    31
2   Aadi  Grapes    30
3  Sonia  Apples    32
4   Lucy  Mangos    33
5   Mike  Apples    35
Select rows in the above DataFrame for which the ‘Product’ column contains the value ‘Apples’:
subsetDataFrame = dfObj[dfObj['Product'] == 'Apples']
If we pass this boolean Series to the [] operator of the DataFrame, it returns a new DataFrame with only those rows that have True in the passed Series, i.e.
dfObj[dfObj['Product'] == 'Apples']
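To see what the [] operator actually receives, you can print the boolean Series on its own:

mask = dfObj['Product'] == 'Apples'
print(mask)
# 0     True
# 1    False
# 2    False
# 3     True
# 4    False
# 5     True
# Name: Product, dtype: bool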
Select rows in the above DataFrame for which the ‘Product’ column contains either ‘Grapes’ or ‘Mangos’, i.e.
subsetDataFrame = dfObj[dfObj['Product'].isin(['Mangos', 'Grapes'])]
As we typically do, we’ll start by creating a DataFrame that you can use to follow along with this example on your own computer. Enter the code below into your Python data analysis development environment:
import pandas as pd
month = ['June', 'August', 'February', 'July']
language = ['Java', 'Kotlin', 'PHP', 'Python']
first_interview = (78, 93, 76, 89)
hr_dict = dict(month=month, language=language, interview_1=first_interview)
hr_df = pd.DataFrame(data=hr_dict)
print(hr_df.head())
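With the data above, this should print a frame like:

      month language  interview_1
0      June     Java           78
1    August   Kotlin           93
2  February      PHP           76
3      July   Python           89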
The first case is filtering our DataFrame for rows containing specific values. Initially we’ll use a simple condition as an example:
# select rows by a simple condition
condition = (hr_df['language'] == 'Python')
hr_df[condition]
Using the OR operator
# multiple conditions (OR)
condition = (hr_df['language'] == 'Python') | (hr_df['month'] == 'June')
hr_df[condition]
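The AND case works the same way with the & operator, and ~ negates a condition; a quick sketch on the same hr_df (the 80-point threshold is an arbitrary example):

# multiple conditions (AND)
condition = (hr_df['language'] == 'Python') & (hr_df['interview_1'] > 80)
hr_df[condition]

# negating a condition (NOT)
hr_df[~(hr_df['language'] == 'Python')]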
You might also want to select rows whose values match a specific string pattern. In the example below, we’ll keep only entries in the language column that start with the letter K.
# select rows whose column value starts with a prefix
condition = (hr_df['language'].str.startswith('K'))
hr_df[condition]
In our last example for today, we’ll select one or multiple rows only if they match a specific list of values.
# find rows matching a list of values
# define the list of values
lang_lst = ['PHP', 'Python']

# subset the dataframe
hr_df[hr_df['language'].isin(lang_lst)]
You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame; note that this expression returns a new DataFrame with the selected rows. DataFrame.apply() can be used with a lambda function to custom-select rows that match given values, and DataFrame.dropna() drops rows that contain None or NaN values.
# filter rows based on a condition
df[df["Courses"] == 'Spark']
df.loc[df['Courses'] == value]
df.query("Courses == 'Spark'")
df.loc[df['Courses'] != 'Spark']
df.loc[df['Courses'].isin(values)]
df.loc[~df['Courses'].isin(values)]

# filter on multiple conditions across multiple columns
df.loc[(df['Discount'] >= 1000) & (df['Discount'] <= 2000)]
df.loc[(df['Discount'] >= 1200) & (df['Fee'] >= 23000)]

# custom selection with a lambda (note: apply here runs column-wise)
df.apply(lambda row: row[df['Courses'].isin(['Spark', 'PySpark'])])

# drop rows that have None or NaN values
df.dropna()

# other examples: substring and prefix matching
df[df['Courses'].str.contains("Spark")]
df[df['Courses'].str.lower().str.contains("spark")]
df[df['Courses'].str.startswith("P")]
If you are a learner, run through these examples with the sample data and explore the output to understand them better. First, let’s create a pandas DataFrame from a dictionary.
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Duration': ['30days', '50days', '30days', None, np.nan],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
df = pd.DataFrame(technologies)
print(df)
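With the dictionary above, print(df) should show something like:

   Courses    Fee Duration  Discount
0    Spark  22000   30days      1000
1  PySpark  25000   50days      2300
2   Hadoop  23000   30days      1000
3   Python  24000     None      1200
4   Pandas  26000      NaN      2500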
df2 = df[df["Courses"] == 'Spark']
print(df2)
You can also write the above statement with a variable.
value = "Spark"
df2 = df[df["Courses"] == value]
df[df["Courses"] != 'Spark']