Pandas Series: get mean value of multiple intervals

Suggestion : 1

Let's define the test source Series as:

2019-02-20 13:00:49.268    40
2019-02-20 13:00:50.275    30
2019-02-20 13:02:51.397    18
2019-02-20 13:02:52.434    13
2019-02-20 13:05:53.542    21
2019-02-20 13:05:55.059    51
2019-02-20 13:06:56.169    32
2019-02-20 13:06:57.279    38
2019-02-20 13:08:58.408    48
2019-02-20 13:08:59.518    14
Name: Val, dtype: int64

and the list of intervals as:

intv = [(pd.to_datetime('2019-02-20 13:00'), pd.to_datetime('2019-02-20 13:01')),
   (pd.to_datetime('2019-02-20 13:06'), pd.to_datetime('2019-02-20 13:07'))
]

A preparatory step is to create an IntervalIndex:

intvInd = pd.IntervalIndex.from_tuples(intv)
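For intuition about the version-dependent behavior discussed next, here is a quick sketch (the expected output assumes pandas 0.25 or later, where contains is elementwise and returns one boolean per interval for a given timestamp):

# assumes pandas >= 0.25, where contains() is elementwise:
intvInd.contains(pd.to_datetime('2019-02-20 13:00:49.268'))
# expected: array([ True, False]) -- one flag per interval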

I tried this approach with pandas version 0.24.2. As Inspi noticed, in version 0.25 and later (where contains is elementwise) the last instruction must be changed to:

s[[any(intvInd.contains(v)) for v in s.index.values]].mean()
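Putting the pieces together, a minimal runnable sketch (the Series construction below is an assumption, rebuilt from the sample output above):

import pandas as pd

# rebuild the sample Series shown above (assumed construction)
idx = pd.to_datetime(['2019-02-20 13:00:49.268', '2019-02-20 13:00:50.275',
                      '2019-02-20 13:02:51.397', '2019-02-20 13:02:52.434',
                      '2019-02-20 13:05:53.542', '2019-02-20 13:05:55.059',
                      '2019-02-20 13:06:56.169', '2019-02-20 13:06:57.279',
                      '2019-02-20 13:08:58.408', '2019-02-20 13:08:59.518'])
s = pd.Series([40, 30, 18, 13, 21, 51, 32, 38, 48, 14], index=idx, name='Val')

intv = [(pd.to_datetime('2019-02-20 13:00'), pd.to_datetime('2019-02-20 13:01')),
        (pd.to_datetime('2019-02-20 13:06'), pd.to_datetime('2019-02-20 13:07'))]
intvInd = pd.IntervalIndex.from_tuples(intv)

# keep only values whose timestamp falls inside any interval, then average
result = s[[any(intvInd.contains(v)) for v in s.index.values]].mean()
print(result)  # mean of 40, 30, 32, 38 -> 35.0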

Assume your time series is indexed by date:

import numpy as np
import pandas as pd

dates = pd.date_range('2019-07-01', '2019-07-25', freq='T')
s = pd.Series(np.random.uniform(1, 100, len(dates)), index=dates)

Some sample data:

2019-07-01 00:00:00    54.851538
2019-07-01 00:01:00    82.493677
2019-07-01 00:02:00    80.589765
2019-07-01 00:03:00    54.973948
2019-07-01 00:04:00    18.216064

And your intervals are defined in a data frame:

intervals = pd.DataFrame([
    ['2019-07-01', '2019-07-02'],
    ['2019-07-02', '2019-07-10']
], columns=['StartDate', 'EndDate'], dtype='datetime64[ns]')
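From here, the mean over the combined intervals can be computed by building a boolean mask from the interval bounds (a minimal sketch, assuming half-open [start, end) intervals; not necessarily the original author's exact method):

import numpy as np

# flag every timestamp that falls inside any of the intervals
mask = np.zeros(len(s), dtype=bool)
for start, end in zip(intervals['StartDate'], intervals['EndDate']):
    # assumes half-open [start, end) intervals
    mask |= (s.index >= start) & (s.index < end)

print(s[mask].mean())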

Suggestion : 2

What would be a short way to get the mean value of the Series in the combined intervals? (Any interpolation function can be used here.)

With an unevenly spaced datetime Series like so:

date
2019-02-20 13:00:49.268    41.177929
2019-02-20 13:00:50.275    12.431984
2019-02-20 13:00:51.397    18.042411
2019-02-20 13:00:52.434    13.144179
2019-02-20 13:00:53.542    21.349083
                             ...
2019-02-20 13:05:55.059    51.763360
2019-02-20 13:05:56.169    58.140644
2019-02-20 13:05:57.279     0.411533
2019-02-20 13:05:58.408    48.404780
2019-02-20 13:05:59.518    14.626680
Name: Values, Length: 285, dtype: float64

the same IntervalIndex recipe from Suggestion 1 applies unchanged: build the index with pd.IntervalIndex.from_tuples(intv), then take s[[any(intvInd.contains(v)) for v in s.index.values]].mean().

Suggestion : 3

It is important to account for the difference between instantaneous and interval-averaged irradiance when using interval-averaged weather data for modeling. This example focuses on calculating solar position appropriately for irradiance transposition, but the concept is relevant for other steps in the modeling process as well.

First, calculate the "ground truth" irradiance data. We'll simulate clear-sky irradiance components at 1-second intervals and calculate the corresponding POA irradiance. At such a short timescale, the difference between instantaneous and interval-averaged irradiance is negligible.

Next, we will aggregate the 1-second values into interval averages. To see how the averaging interval affects results, we'll loop over a few common data intervals and accumulate the results.

We can also plot the underlying time series results of the last iteration (hourly in this case). The modeled irradiance using no shift is effectively time-lagged compared with ground truth. In contrast, the half-shift model is nearly identical to the ground truth irradiance.

import pvlib
import pandas as pd
import matplotlib.pyplot as plt

def transpose(irradiance, timeshift):
    """
    Transpose irradiance components to plane-of-array, incorporating
    a timeshift in the solar position calculation.

    Parameters
    ----------
    irradiance: DataFrame
        Has columns dni, ghi, dhi
    timeshift: float
        Number of minutes to shift for the solar position calculation

    Outputs:
        Series of POA irradiance
    """
    idx = irradiance.index
    # calculate solar position for shifted timestamps:
    idx = idx + pd.Timedelta(timeshift, unit='min')
    solpos = location.get_solarposition(idx)
    # but still report the values with the original timestamps:
    solpos.index = irradiance.index

    poa_components = pvlib.irradiance.get_total_irradiance(
        surface_tilt=20,
        surface_azimuth=180,
        solar_zenith=solpos['apparent_zenith'],
        solar_azimuth=solpos['azimuth'],
        dni=irradiance['dni'],
        ghi=irradiance['ghi'],
        dhi=irradiance['dhi'],
        model='isotropic',
    )
    return poa_components['poa_global']

# baseline: all calculations done at 1-second scale
location = pvlib.location.Location(40, -80, tz='Etc/GMT+5')
times = pd.date_range('2019-06-01 05:00', '2019-06-01 19:00',
                      freq='1s', tz='Etc/GMT+5')
solpos = location.get_solarposition(times)
clearsky = location.get_clearsky(times, solar_position=solpos)
poa_1s = transpose(clearsky, timeshift=0)  # no shift needed for 1s data
fig, ax = plt.subplots(figsize=(5, 3))

results = []

for timescale_minutes in [1, 5, 10, 15, 30, 60]:

    timescale_str = f'{timescale_minutes}min'

    # get the "true" interval average of poa as the baseline for comparison
    poa_avg = poa_1s.resample(timescale_str).mean()
    # get interval averages of irradiance components to use for transposition
    clearsky_avg = clearsky.resample(timescale_str).mean()

    # low-res interval averages of 1-second data, with NO shift
    poa_avg_noshift = transpose(clearsky_avg, timeshift=0)

    # low-res interval averages of 1-second data, with half-interval shift
    poa_avg_halfshift = transpose(clearsky_avg,
                                  timeshift=timescale_minutes / 2)

    df = pd.DataFrame({
        'ground truth': poa_avg,
        'modeled, half shift': poa_avg_halfshift,
        'modeled, no shift': poa_avg_noshift,
    })
    error = df.subtract(df['ground truth'], axis=0)
    # add another trace to the error plot
    error['modeled, no shift'].plot(ax=ax, label=timescale_str)
    # calculate error statistics and save for later
    stats = error.abs().mean()  # average absolute error across daylight hours
    stats['timescale_minutes'] = timescale_minutes
    results.append(stats)

ax.legend(ncol=2)
ax.set_ylabel('Transposition Error [W/m$^2$]')
fig.tight_layout()

df_results = pd.DataFrame(results).set_index('timescale_minutes')
print(df_results)
                   ground truth  modeled, half shift  modeled, no shift
timescale_minutes
1.0                         0.0             0.012018           0.702429
5.0                         0.0             0.021197           3.542882
10.0                        0.0             0.062650           7.051620
15.0                        0.0             0.142984          10.531453
30.0                        0.0             0.581701          20.619625
60.0                        0.0             1.955845          39.418585
fig, ax = plt.subplots(figsize=(5, 3))
df_results[['modeled, no shift', 'modeled, half shift']].plot.bar(rot=0, ax=ax)
ax.set_ylabel('Mean Absolute Error [W/m$^2$]')
ax.set_xlabel('Transposition Timescale [minutes]')
fig.tight_layout()

Suggestion : 4


Solution to take each group of 3 rows (if they exist) per Name: first build a counter with GroupBy.cumcount and integer division, then pass it to named aggregations:

g = df.groupby('Name').cumcount() // 3
df = df.groupby(['Name', g]).agg(Start=('Position', 'first'),
                                 End=('Position', 'last'),
                                 Value=('Value', 'mean')).droplevel(1).reset_index()
print(df)
  Name  Start  End      Value
0    A      1    3  10.333333
1    A      4    6   8.666667
2    A      7    9   9.333333
3    A     10   12   9.000000
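For reference, a hypothetical input frame (the original input was not shown; this one is an assumption chosen to reproduce the output above):

import pandas as pd

# hypothetical input: 12 rows for group 'A', values picked so the
# per-3-row means match the result table above
df = pd.DataFrame({
    'Name': ['A'] * 12,
    'Position': range(1, 13),
    'Value': [10, 11, 10, 8, 9, 9, 9, 10, 9, 9, 9, 9],
})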