From the discussion of this issue on GitHub here, you can solve this by passing numeric_only=False to the mean calculation, as follows:
pd.concat([A, B], axis = 1).groupby("status_reason")["closing_time"]\
.mean(numeric_only = False)
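As a minimal sketch of this suggestion on self-contained data: the small frame below is an assumption standing in for pd.concat([A, B], axis = 1), which is not shown in this thread. Depending on the pandas version, numeric_only may already default to False, in which case the argument is harmless.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.concat([A, B], axis=1)
df = pd.DataFrame({
    "closing_time": pd.to_timedelta(["11:35:00", "07:13:00", np.nan, np.nan, np.nan]),
    "status_reason": ["Won", "Canceled", "In Progress", "In Progress", "In Progress"],
})

# Ask groupby to aggregate the non-numeric (timedelta) column as well
mean_closing = df.groupby("status_reason")["closing_time"].mean(numeric_only=False)
print(mean_closing)
```

A group that contains only NaT should come back as NaT rather than raising.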
The problem might be that the In Progress group contains only NaT values, which might not be allowed in groupby().mean(). Here's the test:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'closing_time': ['11:35:00', '07:13:00', np.nan, np.nan, np.nan],
    'status_reason': ['Won', 'Canceled', 'In Progress', 'In Progress', 'In Progress']
})
df.closing_time = pd.to_timedelta(df.closing_time)
df.groupby('status_reason').closing_time.mean()
gives the exact error. To overcome this, do:
def custom_mean(x):
    try:
        return x.mean()
    except Exception:
        # fall back to NaT when the group cannot be averaged
        return pd.NaT

df.groupby('status_reason').closing_time.apply(custom_mean)
which gives:
status_reason
Canceled      07:13:00
In Progress        NaT
Won           11:35:00
Name: closing_time, dtype: timedelta64[ns]
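To check the suspicion that one group holds nothing but NaT, a quick sketch is to count the non-missing values per group. The names valid_counts and all_nat_groups are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "closing_time": ["11:35:00", "07:13:00", np.nan, np.nan, np.nan],
    "status_reason": ["Won", "Canceled", "In Progress", "In Progress", "In Progress"],
})
df["closing_time"] = pd.to_timedelta(df["closing_time"])

# count() ignores NaT, so a count of 0 marks a group with no usable values
valid_counts = df.groupby("status_reason")["closing_time"].count()
all_nat_groups = valid_counts[valid_counts == 0].index.tolist()
print(all_nat_groups)
```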
I cannot say why groupby's mean() method does not work, but the following slight modification of your code should work. First convert the timedelta column to seconds with the total_seconds() method, then group and take the mean, then convert the seconds back to timedelta:
pd.to_timedelta(pd.concat([A.dt.total_seconds(), B], axis = 1).groupby("status_reason")["closing_time"].mean(), unit = "s")
For the example dataframe below, the code
df = pd.DataFrame({
    'closing_time': ['2 days 11:35:00', '07:13:00', np.nan, np.nan, np.nan],
    'status_reason': ['Won', 'Canceled', 'In Progress', 'In Progress', 'In Progress']
})
# replace the column outright so it takes a numeric dtype
df["closing_time"] = \
    pd.to_timedelta(df.closing_time).dt.days * 24 * 3600 \
    + pd.to_timedelta(df.closing_time).dt.seconds
# or alternatively use total_seconds() to get total seconds in timedelta as follows
# df["closing_time"] = pd.to_timedelta(df.closing_time).dt.total_seconds()
pd.to_timedelta(df.groupby("status_reason")["closing_time"].mean(), unit = "s")
produces
status_reason
Canceled      0 days 07:13:00
In Progress               NaT
Won           2 days 11:35:00
Name: closing_time, dtype: timedelta64[ns]
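The seconds round trip described in this answer can also be wrapped in a small helper. This is only a sketch: the function name mean_timedelta_by_group and its parameters are hypothetical, and it assumes the column already has a timedelta dtype.

```python
import numpy as np
import pandas as pd

def mean_timedelta_by_group(frame, key, col):
    """Average a timedelta column per group by going through float seconds."""
    seconds = frame[col].dt.total_seconds()            # NaT becomes NaN
    mean_seconds = seconds.groupby(frame[key]).mean()  # plain numeric mean
    return pd.to_timedelta(mean_seconds, unit="s")     # NaN becomes NaT again

df = pd.DataFrame({
    "closing_time": pd.to_timedelta(["2 days 11:35:00", "07:13:00", np.nan, np.nan, np.nan]),
    "status_reason": ["Won", "Canceled", "In Progress", "In Progress", "In Progress"],
})
result = mean_timedelta_by_group(df, "status_reason", "closing_time")
print(result)
```

Unlike the in-place conversion, this leaves the original column untouched.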
To overcome this situation, the first thing you have to do is decide how to handle NaN values. The best approach depends on what you want to achieve. In my case, even a simple categorical result is fine, so I can do something like this:
import datetime
import pandas as pd

def define_time(row):
    if pd.isnull(row["closing_time"]):
        return "Null"
    elif row["closing_time"] < datetime.timedelta(days = 100):
        return "<100"
    else:
        # exactly 100 days also lands here instead of returning None
        return ">100"

time_results = pd.concat([A, B], axis = 1).apply(define_time, axis = 1)
In the end the result is like this:
In:
time_results.value_counts()
Out:
>100    1452
<100    1091
Null 1000
dtype: int64
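The same bucketing can be sketched without a row-wise apply by using numpy.select. The sample values below are made up for illustration, and, as an assumption of this sketch, a value of exactly 100 days is placed in the ">100" bucket.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the real column comes from pd.concat([A, B], axis=1)
closing_time = pd.Series(pd.to_timedelta(["150 days", "20 days", None, "100 days"]))
threshold = pd.Timedelta(days=100)

labels = np.select(
    [closing_time.isna().to_numpy(), (closing_time < threshold).to_numpy()],
    ["Null", "<100"],
    default=">100",  # everything at or above the threshold
)
time_results = pd.Series(labels, index=closing_time.index)
counts = time_results.value_counts()
print(counts)
```

The conditions are evaluated in order, so the missing-value check comes first and NaT never reaches the comparison.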
import pandas as pd

df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"]
})
# rows/cols were renamed to index/columns in later pandas versions;
# 'result' holds strings, so the default mean aggregation cannot be used
df1.pivot_table(values = 'result', index = 'index', columns = ['variable1', 'variable2', 'variable3'], aggfunc = 'first')
# these are the columns to end up in the multi-index columns
unstack_cols = ['variable1', 'variable2', 'variable3']
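The snippet above is cut off after unstack_cols is defined. One way such a list could be used, offered purely as an assumption and not as the original author's continuation, is to move those columns into the index and unstack them, which reshapes string values without any aggregation:

```python
import pandas as pd

df1 = pd.DataFrame({
    "index": range(8),
    "variable1": ["A", "A", "B", "B", "A", "B", "B", "A"],
    "variable2": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "variable3": ["x", "x", "x", "y", "y", "y", "x", "y"],
    "result": ["on", "off", "off", "on", "on", "off", "off", "on"],
})
unstack_cols = ["variable1", "variable2", "variable3"]

# Each row is uniquely identified by 'index', so unstack needs no aggregation
wide = df1.set_index(["index"] + unstack_cols)["result"].unstack(unstack_cols)
print(wide)
```

Because nothing is aggregated, this avoids the numeric-only restriction altogether; it does require that the index plus the unstacked columns identify each row uniquely.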