cumsum() over several columns


The cumsum() method calculates the cumulative sum of a pandas column. Here you want that running total computed separately for each word, so group by 'words' first:

In[303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()

In[304]: df_2
Out[304]:
   index  lodgement_year      words  sum  cum_sum  cumsum
0      0            2000        the   14       14      14
1      1            2000  australia   10       10      10
2      2            2000       word   12       12      12
3      3            2000      brand    8        8       8
4      4            2000      fresh    5        5       5
5      5            2001        the    8       22      22
6      6            2001  australia    3       13      13
7      7            2001     banana    1        1       1
8      8            2001      brand    7       15      15
9      9            2001      fresh    1        6       6
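
A minimal, self-contained sketch of the grouped approach. The frame is rebuilt by hand from the printed output, so its construction is an assumption, and `cum_per_word`, `sum_doubled` and `multi` are hypothetical names used only for illustration. Selecting a list of columns after the groupby extends the same idea to several columns at once.

```python
import pandas as pd

# Data rebuilt from the printed output (assumed, for illustration only).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
})

# Running total of 'sum' within each word, in row order.
df_2["cum_per_word"] = df_2.groupby("words")["sum"].cumsum()

# Hypothetical second numeric column, to show several columns in one call.
df_2["sum_doubled"] = df_2["sum"] * 2
multi = df_2.groupby("words")[["sum", "sum_doubled"]].cumsum()

print(df_2)
print(multi)
```

The result of the grouped cumsum() keeps the original row index, which is why it can be assigned straight back as a new column.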

If we only need to consider the 'words' column, another option is to loop through its unique values:

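for unique_words in df_2.words.unique():
    if 'cum_sum' not in df_2:
        # First word: create the column (rows for other words start as NaN)
        df_2['cum_sum'] = df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()
    else:
        # Later words: fill in their rows in the existing column
        df_2.update(pd.DataFrame({
            'cum_sum': df_2.loc[df_2['words'] == unique_words]['sum'].cumsum()
        }))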

The loop above produces:

>>> print(df_2)
   lodgement_year  sum      words  cum_sum
0            2000   14        the     14.0
1            2000   10  australia     10.0
2            2000   12       word     12.0
3            2000    8      brand      8.0
4            2000    5      fresh      5.0
5            2001    8        the     22.0
6            2001    3  australia     13.0
7            2001    1     banana      1.0
8            2001    7      brand     15.0
9            2001    1      fresh      6.0
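
The loop returns floats because the first assignment leaves NaN in every row that does not belong to the first word, which forces the column to a float dtype. A minimal sketch, assuming the same data as the printed output (rebuilt here by hand), that checks the loop against a single grouped cumsum(); `partial` and `grouped` are hypothetical names.

```python
import pandas as pd

# Data rebuilt from the printed output (assumed, for illustration only).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
})

# Loop version: one cumulative sum per unique word.
for unique_word in df_2["words"].unique():
    partial = df_2.loc[df_2["words"] == unique_word, "sum"].cumsum()
    if "cum_sum" not in df_2:
        df_2["cum_sum"] = partial          # other rows become NaN here
    else:
        df_2.update(pd.DataFrame({"cum_sum": partial}))

# Grouped version: the same totals in one call, with an integer dtype.
grouped = df_2.groupby("words")["sum"].cumsum()

print(df_2["cum_sum"].dtype, grouped.dtype)
print((df_2["cum_sum"] == grouped).all())
```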

Suggestion : 2

Pandas makes it easy to calculate a cumulative sum on a column by using the .cumsum() method. This section covers calculating a Pandas cumulative sum on a single column. There may also be times when you want to calculate cumulative sums on groups in a Pandas DataFrame.

To begin, let’s load a sample Pandas Dataframe. If you want to follow along, copy the code from below and paste it into your favourite editor:

import pandas as pd

df = pd.DataFrame.from_dict({
   'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'],
   'Date': ['01-Jan-21', '01-Jan-21', '02-Jan-21', '03-Jan-21', '05-Jan-21', '07-Jan-21', '09-Jan-21', '10-Jan-21', '11-Jan-21'],
   'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25],
   'Profits': [3, 5, 2, 7, 6, 2, 1, 3, 8]
})

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits
0    A  01-Jan-21     10        3
1    B  01-Jan-21     15        5
2    A  02-Jan-21      7        2
3    A  03-Jan-21     23        7
4    A  05-Jan-21     18        6
5    B  07-Jan-21      7        2
6    A  09-Jan-21      3        1
7    B  10-Jan-21     10        3
8    B  11-Jan-21     25        8

Let’s say we wanted to calculate the cumulative sum on the Sales column. We can accomplish this by writing:

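# Keep the original Sales column and store the running total in a new column
df['Cumulative Sales'] = df['Sales'].cumsum()

print(df)

# To overwrite the column in place instead, assign the result back to 'Sales':
# df['Sales'] = df['Sales'].cumsum()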
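
For the grouped case, a minimal sketch using the same sample data: grouping by Type before calling .cumsum() gives a separate running total per type, and selecting a list of columns covers several columns in one call. The names `running`, `Cumulative Sales` and `Cumulative Profits` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'],
    'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25],
    'Profits': [3, 5, 2, 7, 6, 2, 1, 3, 8],
})

# Running totals per Type, computed for both numeric columns at once.
running = df.groupby('Type')[['Sales', 'Profits']].cumsum()
df['Cumulative Sales'] = running['Sales']
df['Cumulative Profits'] = running['Profits']

print(df)
```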

The Pandas .cumsum() also allows you to work with missing data. To test this out, let’s first insert a missing value into our dataframe.

import numpy as np

df.loc[5, 'Sales'] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions

print(df)
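
A minimal sketch of what happens next, assuming the original Sales values from the sample data (rebuilt here as a Series; `sales`, `running` and `strict` are hypothetical names). By default .cumsum() skips the missing value, leaving NaN at that position and continuing the running total afterwards.

```python
import numpy as np
import pandas as pd

sales = pd.Series([10, 15, 7, 23, 18, 7, 3, 10, 25], dtype='float64')
sales.loc[5] = np.nan  # same position as the missing value inserted above

running = sales.cumsum()             # skipna=True is the default
strict = sales.cumsum(skipna=False)  # NaN propagates to every later row

print(running)
print(strict)
```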

Suggestion : 3

By default, cumsum() iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'. To iterate over columns and find the sum in each row, use axis=1.

axis: The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna: Exclude NA/null values. If an entire row/column is NA, the result will be NA.

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64
>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0
>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

Suggestion : 4


Try:

df.C = (df.B.replace(0, np.nan).ffill().shift() * (df.A == -1) * -1).fillna(0)

confirmed jezrael's suggestion:

df.C = (df.B.replace(0, np.nan).ffill() * (df.A == -1) * -1).fillna(0)

confirmed ColonelBeauvel's suggestion:

df.C = np.where(df.A == -1, -df.B.replace(0, method = 'ffill').shift(), 0)

It's easy to do in NumPy, but I have yet to find a way to do it directly in pandas, because pandas apparently ignores the fancy indexing:

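def generate_C(df, inplace = False):
    import numpy

    # Work on a copy unless the caller asked for in-place modification
    if not inplace:
        df = df.copy()

    # Split the two columns into NumPy arrays
    A, B = df.values.T
    C = numpy.zeros_like(A)
    # Rows where A == -1 take the negated B values from the rows where A == 1
    C[A == -1] = -B[A == 1]
    df['C'] = C

    return df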

I found a way to do it with pure pandas:

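def generate_C(df, inplace = False):
    if not inplace:
        df = df.copy()

    # Negated B values where A == 1, forward-filled onto the rows where A == -1
    df['C'] = (-df.B[df.A == 1]).reindex(df.A[df.A == -1].index, method = 'pad')
    # Assign the result back; calling fillna(inplace=True) on df['C'] may act on a copy
    df['C'] = df['C'].fillna(0)

    return df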

You can use:

df.loc[df.A == -1, 'C'] = (-df.loc[df.A == 1, 'B']).values
df['C'] = df['C'].fillna(0)
print(df)
       A   B     C
Index
a      0   0   0.0
b      1  10   0.0
c     -1   0 -10.0
d      1  20   0.0
e      0   0   0.0
f     -1   0 -20.0
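
A minimal, self-contained sketch that rebuilds the frame from the printed output (the construction is an assumption) and checks two of the approaches against each other. The forward-fill variant here uses Series.where in place of multiplying by the boolean mask, a substitution made for readability, and `last_b`, `via_ffill` and `via_loc` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the printed output (assumed, for illustration only).
df = pd.DataFrame(
    {"A": [0, 1, -1, 1, 0, -1], "B": [0, 10, 0, 20, 0, 0]},
    index=pd.Index(list("abcdef"), name="Index"),
)

# Forward-fill the last non-zero B, negate it, and keep it only where A == -1.
last_b = df["B"].replace(0, np.nan).ffill()
via_ffill = (-last_b).where(df["A"] == -1, 0)

# Positional pairing: the i-th row with A == -1 gets minus the i-th B where A == 1.
via_loc = pd.Series(0.0, index=df.index)
via_loc[df["A"] == -1] = (-df.loc[df["A"] == 1, "B"]).to_numpy()

df["C"] = via_ffill
print(df)
```

The positional pairing only works when the number of rows with A == 1 equals the number with A == -1; the forward-fill variant does not depend on that.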

Suggestion : 5

Last Updated : 26 Jul, 2020

Output: 
 

    A B C
    0 2 1 5
    1 5 3 8
    2 13 7 17
    3 27 10 19
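
The code that produced this output is missing from the page. A minimal sketch, assuming the output is a running total down the rows (df.cumsum() with the default axis): the input values below are back-computed from the output by differencing, so they are an assumption and not the original data.

```python
import pandas as pd

# Hypothetical input, obtained by differencing the printed output.
df = pd.DataFrame({
    "A": [2, 3, 8, 14],
    "B": [1, 2, 4, 3],
    "C": [5, 3, 9, 2],
})

result = df.cumsum()  # default axis=0: running total down each column
print(result)
```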

Suggestion : 6

In this tutorial, we will learn the Python pandas DataFrame.cumsum() method. It gives a cumulative sum over a DataFrame or Series axis. It returns a DataFrame or Series of the same size containing the cumulative sum.

Syntax

DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

The below example shows how to find the cumulative sum of the DataFrame over the index axis using the DataFrame.cumsum() method.

import pandas as pd
# Creating the dataframe
df = pd.DataFrame({
   "A": [1, 2, 3, 4],
   "B": [5, 6, 7, 8]
})
print(df)
print("-----------Finding cumulative sum-------")
print(df.cumsum(axis = 0))

The below example shows how to find the cumulative sum of the DataFrame over the column axis using the DataFrame.cumsum() method.

import pandas as pd
# Creating the dataframe
df = pd.DataFrame({
   "A": [1, 2, 3, 4],
   "B": [5, 6, 7, 8]
})
print(df)
print("-----------Finding cumulative sum-------")
print(df.cumsum(axis = 1))
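
For reference, a short sketch of what the two examples above compute, using the same two-column frame; `down_rows` and `across_columns` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

down_rows = df.cumsum(axis=0)       # running total down each column
across_columns = df.cumsum(axis=1)  # running total left to right in each row

print(down_rows)
print(across_columns)
```

With axis=1 the first column is unchanged and each later column holds the sum of itself and every column to its left.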

Suggestion : 7

When applied on a pandas series, the cumsum() function returns a pandas series of the cumulative sum of the original series values. You can also apply it to an entire dataframe, in which case it returns a dataframe with the cumulative sum of all the numerical columns.

You can use the pandas series cumsum() function to calculate the cumulative sum of a pandas column. The following is the syntax:

# cumulative sum of column 'Col1'
df['Col1'].cumsum()

Let’s look at some examples of using the cumsum() function to get the cumulative sum. First, we’ll create a sample dataframe that we’ll be using throughout this tutorial.

import pandas as pd

# create dataframe
df = pd.DataFrame(
   {
      'PageViews': [100, 120, 180, 200, 240, 160, 130],
      'Ad Revenue': [10, 15, 12, 20, 30, 22, 14]
   },
   index = ['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
            '2020-03-05', '2020-03-06', '2020-03-07']
)
# print the dataframe
print(df)

Output:

            PageViews  Ad Revenue
2020-03-01        100          10
2020-03-02        120          15
2020-03-03        180          12
2020-03-04        200          20
2020-03-05        240          30
2020-03-06        160          22
2020-03-07        130          14
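
The step that creates the “Cumulative Ad Revenue” column is not shown in this excerpt. A minimal sketch, assuming it follows the single-column syntax above; `totals` is a hypothetical name for the whole-dataframe result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'PageViews': [100, 120, 180, 200, 240, 160, 130],
        'Ad Revenue': [10, 15, 12, 20, 30, 22, 14],
    },
    index=['2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
           '2020-03-05', '2020-03-06', '2020-03-07'],
)

# Whole dataframe: every numeric column gets its own running total.
totals = df.cumsum()

# Single column: store the running ad revenue next to the daily values.
df['Cumulative Ad Revenue'] = df['Ad Revenue'].cumsum()

print(totals)
print(df)
```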

What do you think would happen if you try to get the cumulative sum of a column with NaN value(s) using the cumsum() function? Let’s find out. First, we will drop the “Cumulative Ad Revenue” column (if you created one earlier) and then set one value in the “Ad Revenue” column to NaN.

import numpy as np
# drop Cumulative Ad Revenue (errors = 'ignore' skips the drop if the column is absent)
df = df.drop('Cumulative Ad Revenue', axis = 1, errors = 'ignore')
# set an Ad Revenue to NaN
df.loc['2020-03-02', 'Ad Revenue'] = np.nan
# display the dataframe
print(df)

You can see that the “Ad Revenue” corresponding to “2020-03-02” is NaN. Let’s go ahead and get the cumulative sum of this column.

# cumulative Ad Revenue
df['Cumulative Ad Revenue'] = df['Ad Revenue'].cumsum()
# display the dataframe
print(df)

Suggestion : 8

The cumsum() function of a DataFrame object is used to obtain the cumulative sum over its axis. This function returns a Series or DataFrame object showing the cumulative sum along the axis. In the code below, cumsum(axis = "index") computes the running sums downwards across the rows (axis 0), and cumsum(axis = "columns") computes them horizontally across the columns (axis 1). Each result is printed to the console.

DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

# Code to illustrate the cumsum() function in pandas

# importing the pandas library
import pandas as pd

# creating a dataframe
df = pd.DataFrame([
      [5, 10, 4, 15, 3],
      [1, 7, 5, 9, 0.5],
      [3, 11, 13, 14, 12]
   ],
   columns = list('ABCDE'))
# printing the dataframe
print(df)

# obtaining the cumulative sum vertically across rows
print(df.cumsum(axis = "index"))

# obtaining the cumulative sum horizontally over columns
print(df.cumsum(axis = "columns"))