How do I extend a pandas DataFrame by repeating the last row?

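Here's an alternative way to do it, using fancy indexing to select the last row three times. DataFrame.append() was removed in pandas 2.0, so pd.concat() is used to attach the repeated rows:

pd.concat([df, df.iloc[[-1] * 3]])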

Out[757]:
            A  B  C  D
2014-01-01  1  0  0  0
2014-01-02  0  1  0  0
2014-01-03  0  0  1  0
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1

You could use nested concat operations: the inner one concatenates the last row 3 times, and the outer one concatenates the result with your original df:

In[181]:

import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
# df[-1:] is the last row as a one-row DataFrame
pd.concat([df, pd.concat([df[-1:]] * 3)])
Out[181]:
            A  B  C  D
2014-01-01  1  0  0  0
2014-01-02  0  1  0  0
2014-01-03  0  0  1  0
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1

This could be put into a function like so:

In[182]:

def repeatRows(d, n=3):
    # Stack n copies of d on top of each other
    return pd.concat([d] * n)

pd.concat([df, repeatRows(df[-1:], 3)])
Out[182]:
            A  B  C  D
2014-01-01  1  0  0  0
2014-01-02  0  1  0  0
2014-01-03  0  0  1  0
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1
2014-01-04  0  0  0  1

Suggestion : 2

Repeating (replicating) the rows of a DataFrame in pandas, i.e. creating duplicate rows, can be done in a roundabout way with the concat() function. By default, concat() repeats the index along with the rows.

First, let's create a DataFrame:

import pandas as pd
import numpy as np

#Create a DataFrame
df1 = {
   'State': ['Arizona AZ', 'Georgia GG', 'Newyork NY', 'Indiana IN', 'Florida FL'],
   'Score': [62, 47, 55, 74, 31]
}

df1 = pd.DataFrame(df1, columns = ['State', 'Score'])
print(df1)

Repeat the DataFrame 3 times with the concat() function. With ignore_index=True the original index is not repeated; a new index is created for the repeated rows.

# Repeat without index
df_repeated = pd.concat([df1] * 3, ignore_index=True)
print(df_repeated)

Without ignore_index, concat() keeps the original index, so the index labels are repeated as well. Here the DataFrame is repeated 2 times:

# Repeat with index
df_repeated_with_index = pd.concat([df1] * 2)
print(df_repeated_with_index)
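To make the difference between the two calls concrete, the following sketch prints the resulting index in each case. It uses a shortened two-row version of df1; the variable names with_index and fresh_index are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"State": ["Arizona AZ", "Georgia GG"], "Score": [62, 47]})

with_index = pd.concat([df1] * 2)                      # index labels repeat
fresh_index = pd.concat([df1] * 2, ignore_index=True)  # index is renumbered

print(with_index.index.tolist())   # [0, 1, 0, 1]
print(fresh_index.index.tolist())  # [0, 1, 2, 3]

# With repeated labels, a label-based lookup returns every matching row.
print(len(with_index.loc[[0]]))    # 2
```

A repeated index is harmless for display, but label-based selection and alignment then operate on several rows at once, which is usually the reason to pass ignore_index=True.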

Suggestion : 3

DataFrame.duplicated() returns a boolean Series denoting duplicate rows. Optionally, only certain columns are considered when identifying duplicates; by default all columns are used. With keep='last', the last occurrence of each set of duplicated values is marked False and all others True.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool
>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool
>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool
>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
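The boolean Series returned by duplicated() is typically used as a mask. The sketch below reuses the same data to filter rows; the subset of 'brand' and 'style' and the variable names are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# True for every repeat of a (brand, style) pair after its first occurrence.
mask = df.duplicated(subset=["brand", "style"])

deduped = df[~mask]  # keep the first occurrence of each pair
all_dupes = df[df.duplicated(subset=["brand", "style"], keep=False)]

print(deduped.index.tolist())    # [0, 2, 3]
print(all_dupes.index.tolist())  # [0, 1, 3, 4]
```

Inverting the mask gives the same rows as drop_duplicates() with the same subset, while keep=False is handy for inspecting every member of each duplicated group.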

Suggestion : 4

The pandas.DataFrame.drop_duplicates() method drops duplicate rows from a DataFrame, either across all columns or on a selected subset of columns. This section walks through several ways to drop duplicate rows, using DataFrame.drop_duplicates() on its own and in combination with DataFrame.apply() and a lambda function.

# Below are quick examples
# keep first duplicate row
df2 = df.drop_duplicates()

# Using DataFrame.drop_duplicates() to keep first duplicate row
df2 = df.drop_duplicates(keep = 'first')

# keep last duplicate row
df2 = df.drop_duplicates(keep = 'last')

# Remove all duplicate rows
df2 = df.drop_duplicates(keep = False)

# Delete duplicate rows based on specific columns
df2 = df.drop_duplicates(subset = ["Courses", "Fee"], keep = False)

# Drop duplicate rows in place
df.drop_duplicates(inplace = True)

# Using DataFrame.apply() and a lambda function to compare values case-insensitively
df2 = df.apply(lambda x: x.astype(str).str.lower()).drop_duplicates(subset = ['Courses', 'Fee'], keep = 'first')
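Note that the apply() and lambda variant above returns the lowercased, string-converted values rather than the original ones. If the original values should be preserved, one possible approach is to compute the duplicate mask on a normalized copy and apply it to the original frame. This is a sketch under that assumption, with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "spark", "Python"],
    "Fee": [20000, 20000, 22000],
})

# Lowercased string copy, used only to decide which rows count as duplicates.
normalized = df.apply(lambda col: col.astype(str).str.lower())
mask = normalized.duplicated(subset=["Courses", "Fee"], keep="first")

# Filter the original frame so casing and dtypes are left untouched.
result = df[~mask]
print(result["Courses"].tolist())  # ['Spark', 'Python']
```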

Below is the syntax of the DataFrame.drop_duplicates() function that removes duplicate rows from the pandas DataFrame.

# Syntax of drop_duplicates
DataFrame.drop_duplicates(subset = None, keep = 'first', inplace = False, ignore_index = False)
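As a brief illustration of the keep and ignore_index parameters in this signature, the sketch below uses a small three-row frame invented for the purpose:

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Python", "Spark"],
    "Fee": [20000, 22000, 20000],
})

# keep='last' drops the earlier occurrence; surviving rows keep their labels.
kept_last = df.drop_duplicates(keep="last")
print(kept_last.index.tolist())   # [1, 2]

# ignore_index=True relabels the result 0..n-1.
renumbered = df.drop_duplicates(keep="last", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1]
```

inplace=True would instead modify df itself and return None, which is why the quick examples above do not assign its result.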

Now, let's create a DataFrame with a few duplicate rows. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.

import pandas as pd
import numpy as np
technologies = {
   'Courses': ["Spark", "PySpark", "Python", "pandas", "Python", "Spark", "pandas"],
   'Fee': [20000, 25000, 22000, 30000, 22000, 20000, 30000],
   'Duration': ['30days', '40days', '35days', '50days', '35days', '30days', '50days'],
   'Discount': [1000, 2300, 1200, 2000, 1200, 1000, 2000]
}
df = pd.DataFrame(technologies)
print(df)

You can use DataFrame.drop_duplicates() without any arguments to drop rows that have the same values in all columns. It uses the default values subset=None and keep='first'. The example below returns four rows after removing the duplicate rows from our DataFrame.

# keep first duplicate row
df2 = df.drop_duplicates()
print(df2)

# Using DataFrame.drop_duplicates() to keep first duplicate row
df2 = df.drop_duplicates(keep = 'first')
print(df2)

This yields the output below.

   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000

Suggestion : 5

pandas provides DataFrame.append() to add rows to a DataFrame. It accepts a dictionary, a Series, or a list of Series whose labels match the DataFrame's column names. A row selected by label with loc[] (for example, the row with index label 'b') can also be passed to append() to add it to another DataFrame. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the append() examples in this section require an older pandas version.

The signature of DataFrame.append() is:

DataFrame.append(other, ignore_index = False, verify_integrity = False, sort = None)

Consider the following DataFrame:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Let's add a new row to the DataFrame above by passing a dictionary:

# Pass the row elements as key-value pairs to the append() function
mod_df = df.append({
      'Name': 'Sahil',
      'Age': 22
   },
   ignore_index = True)

print('Modified Dataframe')
print(mod_df)

A complete example that adds a dictionary as a row to the DataFrame is as follows:

import pandas as pd

# List of Tuples
students = [('jack', 34, 'Sydeny', 'Australia'),
   ('Riti', 30, 'Delhi', 'India'),
   ('Vikas', 31, 'Mumbai', 'India'),
   ('Neelu', 32, 'Bangalore', 'India'),
   ('John', 16, 'New York', 'US'),
   ('Mike', 17, 'las vegas', 'US')
]

#Create a DataFrame object
df = pd.DataFrame(students,
   columns = ['Name', 'Age', 'City', 'Country'],
   index = ['a', 'b', 'c', 'd', 'e', 'f'])

print('Original Dataframe')
print(df)

# Pass the row elements as key-value pairs to the append() function
mod_df = df.append({
      'Name': 'Sahil',
      'Age': 22
   },
   ignore_index = True)

print('Modified Dataframe')
print(mod_df)

Output:

Original Dataframe
    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US
Modified Dataframe
    Name  Age       City    Country
0   jack   34     Sydeny  Australia
1   Riti   30      Delhi      India
2  Vikas   31     Mumbai      India
3  Neelu   32  Bangalore      India
4   John   16   New York         US
5   Mike   17  las vegas         US
6  Sahil   22        NaN        NaN
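On pandas 2.0 and later, where DataFrame.append() is no longer available, the same result can be obtained with pd.concat(). The following is a minimal sketch using a shortened version of the students data; wrapping the dictionary in a one-row DataFrame is one possible approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    [("jack", 34, "Sydeny", "Australia"), ("Riti", 30, "Delhi", "India")],
    columns=["Name", "Age", "City", "Country"],
    index=["a", "b"],
)

# Wrap the dictionary in a one-row DataFrame; missing columns become NaN.
new_row = pd.DataFrame([{"Name": "Sahil", "Age": 22}])
mod_df = pd.concat([df, new_row], ignore_index=True)
print(mod_df)

# Selecting with a list of labels keeps the row as a DataFrame,
# so an existing row can be appended the same way.
with_copy = pd.concat([df, df.loc[["b"]]])
print(with_copy.index.tolist())  # ['a', 'b', 'b']
```

As with append(..., ignore_index=True), the original labels 'a' and 'b' are replaced by a fresh integer index in mod_df.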