The cumsum() method calculates the cumulative sum of a Pandas column. You are looking for that applied to the grouped words, so group by 'words' first:
In[303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()
In[304]: df_2
Out[304]:
index lodgement_year words sum cum_sum cumsum
0 0 2000 the 14 14 14
1 1 2000 australia 10 10 10
2 2 2000 word 12 12 12
3 3 2000 brand 8 8 8
4 4 2000 fresh 5 5 5
5 5 2001 the 8 22 22
6 6 2001 australia 3 13 13
7 7 2001 banana 1 1 1
8 8 2001 brand 7 15 15
9 9 2001 fresh 1 6 6
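To try the groupby approach on its own, here is a minimal sketch that rebuilds df_2 from the lodgement_year, words and sum columns in the output above (the extra index and cum_sum columns are left out):

```python
import pandas as pd

# Rebuild the sample data from the table above (index/cum_sum columns omitted)
df_2 = pd.DataFrame({
    'lodgement_year': [2000] * 5 + [2001] * 5,
    'words': ['the', 'australia', 'word', 'brand', 'fresh',
              'the', 'australia', 'banana', 'brand', 'fresh'],
    'sum': [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# groupby splits the rows by word; cumsum() then keeps a running total
# inside each group while preserving the original row order
df_2['cumsum'] = df_2.groupby('words')['sum'].cumsum()
print(df_2)
```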
If we only need to consider the 'words' column, we can instead loop through the unique values of words:
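import pandas as pd

for unique_words in df_2.words.unique():
    if 'cum_sum' not in df_2:
        # First word: create the column (rows for other words become NaN)
        df_2['cum_sum'] = df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()
    else:
        # Later words: fill in the rows that belong to this word
        df_2.update(pd.DataFrame({
            'cum_sum': df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()
        }))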
The above will result in:
>>> print(df_2)
   lodgement_year  sum      words  cum_sum
0            2000   14        the     14.0
1            2000   10  australia     10.0
2            2000   12       word     12.0
3            2000    8      brand      8.0
4            2000    5      fresh      5.0
5            2001    8        the     22.0
6            2001    3  australia     13.0
7            2001    1     banana      1.0
8            2001    7      brand     15.0
9            2001    1      fresh      6.0
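The loop and the groupby one-liner should produce the same running totals. A small self-contained check, assuming df_2 holds only the three columns printed above (the names running and grouped exist only in this sketch):

```python
import pandas as pd

# df_2 as printed above: lodgement_year, sum and words only
df_2 = pd.DataFrame({
    'lodgement_year': [2000] * 5 + [2001] * 5,
    'sum': [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
    'words': ['the', 'australia', 'word', 'brand', 'fresh',
              'the', 'australia', 'banana', 'brand', 'fresh'],
})

# Loop version
for unique_words in df_2['words'].unique():
    running = df_2.loc[df_2['words'] == unique_words, 'sum'].cumsum()
    if 'cum_sum' not in df_2:
        df_2['cum_sum'] = running  # rows for other words start as NaN
    else:
        df_2.update(pd.DataFrame({'cum_sum': running}))

# groupby version: a single call, no Python-level loop
grouped = df_2.groupby('words')['sum'].cumsum()

# The loop leaves a float column (it held NaN placeholders), so cast
# before comparing the values
print((df_2['cum_sum'].astype(int) == grouped).all())
```

The groupby version is usually preferable: it avoids the NaN placeholders (and the resulting float dtype) and scales better than filtering the frame once per word.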
Pandas makes it easy to calculate a cumulative sum on a column by using the .cumsum() method. There may also be times when you want to calculate cumulative sums on groups in a Pandas DataFrame.
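For the grouped case, a minimal sketch using the Type and Sales values from the sample DataFrame that follows (Date and Profits are left out for brevity; the column name 'Cumulative Sales by Type' is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'],
    'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25],
})

# One running total per Type; rows keep their original order
df['Cumulative Sales by Type'] = df.groupby('Type')['Sales'].cumsum()
print(df)
```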
To begin, let’s load a sample Pandas Dataframe. If you want to follow along, copy the code from below and paste it into your favourite editor:
import pandas as pd
df = pd.DataFrame.from_dict({
'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'],
'Date': ['01-Jan-21', '01-Jan-21', '02-Jan-21', '03-Jan-21', '05-Jan-21', '07-Jan-21', '09-Jan-21', '10-Jan-21', '11-Jan-21'],
'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25],
'Profits': [3, 5, 2, 7, 6, 2, 1, 3, 8]
})
print(df)
This returns the following dataframe:
  Type       Date  Sales  Profits
0    A  01-Jan-21     10        3
1    B  01-Jan-21     15        5
2    A  02-Jan-21      7        2
3    A  03-Jan-21     23        7
4    A  05-Jan-21     18        6
5    B  07-Jan-21      7        2
6    A  09-Jan-21      3        1
7    B  10-Jan-21     10        3
8    B  11-Jan-21     25        8
Let’s say we wanted to calculate the cumulative sum on the Sales column. We can accomplish this by writing:
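# To overwrite Sales with its running total instead, use:
# df['Sales'] = df['Sales'].cumsum()
# (running both lines on the same df would accumulate the totals twice)

# Keep the raw Sales values and store the running total in a new column
df['Cumulative Sales'] = df['Sales'].cumsum()
print(df)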
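A self-contained version of the single-column case, using only the Sales values from the sample DataFrame (other columns omitted). Writing to a new column means the raw values stay available for later steps:

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25]})

# Each row holds its own Sales value plus the total of all rows above it
df['Cumulative Sales'] = df['Sales'].cumsum()
print(df)
```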
The Pandas .cumsum() method also allows you to work with missing data. To test this out, let’s first insert a missing value into our dataframe:
import numpy as np

# np.NaN was removed in NumPy 2.0; np.nan works in every version
df.loc[5, 'Sales'] = np.nan
print(df)
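What cumsum() then does with that missing value depends on skipna. A self-contained sketch reusing the Sales values above, stored as floats so the NaN fits without a dtype change, with the NaN at the same position (row 5); the column names cum_skipna and cum_no_skipna are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25]}, dtype=float)
df.loc[5, 'Sales'] = np.nan

# Default (skipna=True): the NaN row stays NaN, but the running total
# resumes on the next row as if the missing value were skipped
df['cum_skipna'] = df['Sales'].cumsum()

# skipna=False: once a NaN is hit, every later total is NaN too
df['cum_no_skipna'] = df['Sales'].cumsum(skipna=False)
print(df)
```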
By default, cumsum() iterates over rows and finds the running sum in each column. This is equivalent to axis=None, axis=0 or axis='index' (the axis can be given by its index or its name). To iterate over columns and find the running sum in each row, use axis=1 (or axis='columns'). The skipna parameter (default True) excludes NA/null values; if an entire row/column is NA, the result will be NA.
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64
>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0
>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
cumsum() over several columns
Try:
df['C'] = (df.B.replace(0, np.nan).ffill().shift() * (df.A == -1) * -1).fillna(0)
Confirmed jezrael's suggestion:
df['C'] = (df.B.replace(0, np.nan).ffill() * (df.A == -1) * -1).fillna(0)
Confirmed ColonelBeauvel's suggestion:
df['C'] = np.where(df.A == -1, -df.B.replace(0, method='ffill').shift(), 0)
(The method= keyword of replace() is deprecated since pandas 2.1; df.B.replace(0, np.nan).ffill() is the usual replacement.)
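To check these one-liners against the expected result, here is a sketch that rebuilds A and B from the output table at the end of this thread (index a to f). It combines the replace/ffill/shift idea from the answers above with np.where, and assumes a 0 in B means "no value":

```python
import numpy as np
import pandas as pd

# A and B as shown in the output table (index a..f)
df = pd.DataFrame(
    {'A': [0, 1, -1, 1, 0, -1], 'B': [0, 10, 0, 20, 0, 0]},
    index=pd.Index(list('abcdef'), name='Index'),
)

# Treat 0 in B as missing, carry the last non-zero B forward, and shift
# by one row so each row sees the B value from an earlier row
last_b = df['B'].replace(0, np.nan).ffill().shift()

# Negate that value only where A == -1; every other row gets 0
# (an A == -1 row with no earlier non-zero B would get NaN)
df['C'] = np.where(df['A'] == -1, -last_b, 0)
print(df)
```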
It's easy to do in numpy, but I have yet to find a way to do it directly in pandas, because apparently pandas somehow ignores the fancy indexing:
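def generate_C(df, inplace=False):
    import numpy
    if not inplace:
        df = df.copy()
    # Assumes df holds exactly the columns A and B, in that order
    A, B = df.values.T
    C = numpy.zeros_like(A)
    # The k-th row with A == -1 gets minus B from the k-th row with A == 1
    # (requires the same number of 1s and -1s)
    C[A == -1] = -B[A == 1]
    df['C'] = C
    return df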
I found a way to do it with pure pandas:
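def generate_C(df, inplace=False):
    if not inplace:
        df = df.copy()
    # Negated B of the A == 1 rows, padded forward onto the A == -1 rows
    # (method='pad' requires a sorted index)
    df['C'] = (-df.B[df.A == 1]).reindex(df.A[df.A == -1].index, method='pad')
    # All remaining rows get 0; assigning back avoids a chained inplace fillna
    df['C'] = df['C'].fillna(0)
    return df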
You can use:
df.loc[df.A == -1, 'C'] = (-df.loc[df.A == 1, 'B']).values
df['C'] = df['C'].fillna(0)
print(df)
       A   B     C
Index             
a      0   0   0.0
b      1  10   0.0
c     -1   0 -10.0
d      1  20   0.0
e      0   0   0.0
f     -1   0 -20.0
Last Updated : 26 Jul, 2020
Output:
A B C
0 2 1 5
1 5 3 8
2 13 7 17
3 27 10 19
In this tutorial, we will learn the Python pandas DataFrame.cumsum() method. It gives a cumulative sum over a DataFrame or Series axis and returns a DataFrame or Series of the same size containing the cumulative sum.
Syntax
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
The example below shows how to find the cumulative sum of the DataFrame over the index axis using the DataFrame.cumsum() method.
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
print(df)
print("-----------Finding cumulative sum-------")
print(df.cumsum(axis=0))
The example below shows how to find the cumulative sum of the DataFrame over the column axis using the DataFrame.cumsum() method.
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})
print(df)
print("-----------Finding cumulative sum-------")
print(df.cumsum(axis=1))
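The case with null values over the index axis works the same way. A minimal sketch, reusing the A/B frame above with two values replaced by NaN (the NaN positions are arbitrary, chosen only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3, 4], "B": [5, 6, np.nan, 8]})

# Default skipna=True: NaN cells stay NaN, totals continue below them
running = df.cumsum(axis=0)
# skipna=False: the first NaN in a column makes everything below it NaN
strict = df.cumsum(axis=0, skipna=False)

print(running)
print(strict)
```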
When applied to a pandas Series, the cumsum() function returns a Series of the cumulative sums of the original values. You can also apply it to an entire dataframe, in which case it returns a dataframe with the cumulative sum of all the numerical columns.
You can use the pandas Series cumsum() function to calculate the cumulative sum of a pandas column. The following is the syntax:
# cumulative sum of column 'Col1'
df['Col1'].cumsum()
Let’s look at some examples of using the cumsum() function to get the cumulative sum. First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.
import pandas as pd

# create dataframe
df = pd.DataFrame({
    'PageViews': [100, 120, 180, 200, 240, 160, 130],
    'Ad Revenue': [10, 15, 12, 20, 30, 22, 14]
}, index=['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
          '2020-03-05', '2020-03-06', '2020-03-07'])
# print the dataframe
print(df)
Output:
            PageViews  Ad Revenue
2020-03-01        100          10
2020-03-02        120          15
2020-03-03        180          12
2020-03-04        200          20
2020-03-05        240          30
2020-03-06        160          22
2020-03-07        130          14
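The step that creates the “Cumulative Ad Revenue” column (which the NaN example further down drops) is not shown in this excerpt. A sketch of it, following the syntax given above, together with the whole-dataframe version (the name totals is just for this sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'PageViews': [100, 120, 180, 200, 240, 160, 130],
    'Ad Revenue': [10, 15, 12, 20, 30, 22, 14],
}, index=['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
          '2020-03-05', '2020-03-06', '2020-03-07'])

# Single column: running total of ad revenue next to the daily values
df['Cumulative Ad Revenue'] = df['Ad Revenue'].cumsum()
print(df)

# Whole dataframe: cumsum() runs down every selected column at once
totals = df[['PageViews', 'Ad Revenue']].cumsum()
print(totals)
```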
What do you think would happen if we tried to get the cumulative sum of a column with NaN value(s) using the cumsum() function? Let’s find out. First, we will drop the “Cumulative Ad Revenue” column created above and then set one value in the “Ad Revenue” column to NaN.
import numpy as np

# drop Cumulative Ad Revenue
df = df.drop('Cumulative Ad Revenue', axis=1)
# set an Ad Revenue to NaN
df.loc['2020-03-02', 'Ad Revenue'] = np.nan
# display the dataframe
print(df)
You can see that the “Ad Revenue” corresponding to “2020-03-02” is NaN. Let’s go ahead and get the cumulative sum of this column.
# cumulative Ad Revenue
df['Cumulative Ad Revenue'] = df['Ad Revenue'].cumsum()
# display the dataframe
print(df)
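A self-contained sketch of what to expect, rebuilding just the Ad Revenue column (as floats) with the same NaN. The extra 'Filled Cumulative' column is a hypothetical name showing one common workaround, calling fillna(0) before cumsum(), for when you want a total on every row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Ad Revenue': [10, 15, 12, 20, 30, 22, 14]},
                  index=['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
                         '2020-03-05', '2020-03-06', '2020-03-07'],
                  dtype=float)
df.loc['2020-03-02', 'Ad Revenue'] = np.nan

# NaN row stays NaN; the total skips it and carries on from 2020-03-03
df['Cumulative Ad Revenue'] = df['Ad Revenue'].cumsum()

# Treating the missing day as 0 gives a value on every row instead
df['Filled Cumulative'] = df['Ad Revenue'].fillna(0).cumsum()
print(df)
```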
The cumsum() function of a DataFrame object is used to obtain the cumulative sum over its axis. It returns a Series or DataFrame object showing the cumulative sum along that axis. In the code below, df.cumsum(axis="index") obtains the cumulative sum running downwards across the rows (axis 0), and df.cumsum(axis="columns") obtains it running horizontally across the columns (axis 1). Each result is printed to the console.
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
# A code to illustrate the cumsum() function in Pandas
# importing the pandas library
import pandas as pd

# creating a dataframe
df = pd.DataFrame([
    [5, 10, 4, 15, 3],
    [1, 7, 5, 9, 0.5],
    [3, 11, 13, 14, 12]
], columns=list('ABCDE'))
# printing the dataframe
print(df)
# obtaining the cumulative sum vertically across rows
print(df.cumsum(axis="index"))
# obtaining the cumulative sum horizontally over columns
print(df.cumsum(axis="columns"))
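cumsum() is easy to confuse with its sibling cummax(), which keeps a running maximum rather than a running total. A sketch comparing the two down the rows of the same data (the names running_sum and running_max exist only in this sketch):

```python
import pandas as pd

df = pd.DataFrame([
    [5, 10, 4, 15, 3],
    [1, 7, 5, 9, 0.5],
    [3, 11, 13, 14, 12]
], columns=list('ABCDE'))

# Running total: each cell adds every value above it in the same column
running_sum = df.cumsum(axis="index")
# Running maximum: each cell keeps the largest value seen so far
running_max = df.cummax(axis="index")

print(running_sum)
print(running_max)
```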