How to resample an irregular time series to daily frequency and have it span to today?


Suggestion : 2


For example, here is the irregular dataframe:

data
Out[1]:
             Var 1     Var 2  Var 3     Var 4
Dates
2017-09-20    16.0  1.328125  1.375  0.135976
2017-12-13    16.0  1.343750  1.375  0.085391
2018-03-21    15.0  2.191667  2.125  0.274946
2018-06-13    15.0  2.241667  2.375  0.208452
2018-09-26    16.0  4.312500  2.375  0.111803
2018-12-19    17.0  4.279412  2.375  0.083026
2019-03-20    17.0  3.507353  2.375  0.179358

I used

dset = data.resample('D', convention='end').ffill()

which results (the tail end) in

dset.tail()
Out[2]:
             Var 1     Var 2  Var 3     Var 4
Dates
2019-03-16    17.0  4.279412  2.375  0.083026
2019-03-17    17.0  4.279412  2.375  0.083026
2019-03-18    17.0  4.279412  2.375  0.083026
2019-03-19    17.0  4.279412  2.375  0.083026
2019-03-20    17.0  3.507353  2.375  0.179358

Use DataFrame.reindex with the pandas.date_range method:

# pd.datetime was removed in pandas 1.0; use pd.Timestamp.today() instead
dset = data.reindex(
    pd.date_range(start=data.index.min(),
                  end=pd.Timestamp.today(),
                  freq='D'),
    method='ffill')

[output]

             Var 1     Var 2  Var 3     Var 4
2017-09-20    16.0  1.328125  1.375  0.135976
2017-09-21    16.0  1.328125  1.375  0.135976
2017-09-22    16.0  1.328125  1.375  0.135976
2017-09-23    16.0  1.328125  1.375  0.135976
2017-09-24    16.0  1.328125  1.375  0.135976
2017-09-25    16.0  1.328125  1.375  0.135976
2017-09-26    16.0  1.328125  1.375  0.135976
2017-09-27    16.0  1.328125  1.375  0.135976
2017-09-28    16.0  1.328125  1.375  0.135976
2017-09-29    16.0  1.328125  1.375  0.135976
...
2019-04-04    17.0  3.507353  2.375  0.179358
2019-04-05    17.0  3.507353  2.375  0.179358
2019-04-06    17.0  3.507353  2.375  0.179358
2019-04-07    17.0  3.507353  2.375  0.179358
2019-04-08    17.0  3.507353  2.375  0.179358
2019-04-09    17.0  3.507353  2.375  0.179358
2019-04-10    17.0  3.507353  2.375  0.179358
2019-04-11    17.0  3.507353  2.375  0.179358
2019-04-12    17.0  3.507353  2.375  0.179358
2019-04-13    17.0  3.507353  2.375  0.179358
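For reference, here is a self-contained sketch of the reindex approach. It rebuilds the question's index dates with a single illustrative 'Var 1' column standing in for the real data, and uses `pd.Timestamp.today()` as the modern replacement for the removed `pd.datetime`:

```python
import pandas as pd

# Rebuild a small version of the question's irregular frame
# ('Var 1' stands in for the real columns)
idx = pd.to_datetime(['2017-09-20', '2017-12-13', '2018-03-21',
                      '2018-06-13', '2018-09-26', '2018-12-19',
                      '2019-03-20'])
data = pd.DataFrame({'Var 1': [16.0, 16.0, 15.0, 15.0, 16.0, 17.0, 17.0]},
                    index=idx)

# Reindex onto a daily range that runs all the way to today,
# forward-filling each observation until the next one
dset = data.reindex(
    pd.date_range(start=data.index.min(),
                  end=pd.Timestamp.today().normalize(),
                  freq='D'),
    method='ffill')
print(dset.tail())
```

The key point is that reindex lets you choose the target index yourself, so the range can end wherever you like, while `method='ffill'` repeats each irregular observation forward.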

Suggestion : 3

I have an irregularly spaced (with respect to time frequency) pandas data frame. I can successfully up-sample the data frame to a daily frequency using the resample command; however, my problem is that the resampling ends at the last (pre-resampled) observation, and I would like it to span all the way to today's date. In the attempt above, the last "upsampling" ended on 3/20/2019, but I would like for it to end on 4/13/2019 (today's date). The type of resampling I am after simply takes the data from the irregular series and repeats it daily until the next (irregular) data point, from which the new observation is repeated until the next (irregular) data point, and so on. I am sure I am doing something stupid / not adding a simple addendum to the command. I would prefer to stay within pandas, if possible.


Suggestion : 4


The quick answer is that you can use

DataFrame.resample().mean().interpolate()

Here's how I did it:

import pandas as pd
from io import StringIO
from bokeh.plotting import figure, output_notebook, show

# copied and pasted from your post :)
data = StringIO("""Date Time Entry Exit
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
2013-01-07 20:00:00 2340.0 229.0
2013-01-08 05:00:00 3443.0 629.0
2013-01-08 10:00:00 6713.0 1629.0
2013-01-08 15:00:00 9547.0 2965.0
2013-01-08 20:00:00 10440.0 4589.0
""")

# Read in the data, converting the separate date and time columns to a
# single datetime; see the link to do this "after the fact" if your data
# has separate date and time columns.
df = pd.read_csv(data,
                 parse_dates={"date_time": ['Date', 'Time']},
                 delim_whitespace=True)

Now, make the data a time series, resample it, apply a function (mean in this case) and interpolate both data columns at the same time.

df_rs = df.set_index('date_time').resample('H').mean().interpolate('linear')
df_rs

FULL CODE

import pandas as pd
from io import StringIO
from bokeh.plotting import figure, output_notebook, show

output_notebook()

# copied and pasted from your post :)
data = StringIO("""Date Time ENTRIES EXITS
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
2013-01-07 20:00:00 2340.0 229.0
2013-01-08 05:00:00 3443.0 629.0
2013-01-08 10:00:00 6713.0 1629.0
2013-01-08 15:00:00 9547.0 2965.0
2013-01-08 20:00:00 10440.0 4589.0
""")

# Read in the data, converting the separate date and time columns to a
# single datetime.
original_data = pd.read_csv(data,
                            parse_dates={"DATETIME": ['Date', 'Time']},
                            delim_whitespace=True)

# Make it a time series, resample to a higher freq, apply mean,
# interpolate and round.
inter_data = (original_data.set_index(['DATETIME'])
              .resample('H').mean().interpolate('linear').round(1))

# No need to drop the index to select a slice; you can slice on the index.
# I see you are starting at 1/1 (Jan 1st), yet your data starts at 1/7 (Jan 7th?)
inter_data[inter_data.index >= '2013-01-01 00:00:00'].head(20)
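As a quick, bokeh-free check of the same idea, here is a minimal sketch using three of the 5-hour-spaced rows from the post; `sep=r'\s+'` stands in for the older `delim_whitespace` option, and the datetime is assembled by hand rather than with `parse_dates`:

```python
import pandas as pd
from io import StringIO

# Three of the 5-hour-spaced rows from the post
data = StringIO("""Date Time Entry Exit
2013-01-07 05:00:00 29.0 12.0
2013-01-07 10:00:00 98.0 83.0
2013-01-07 15:00:00 404.0 131.0
""")

df = pd.read_csv(data, sep=r'\s+')
df['date_time'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# Upsample to hourly and fill the gaps with linear interpolation
df_rs = (df.set_index('date_time')[['Entry', 'Exit']]
           .resample('h').mean().interpolate('linear'))
print(df_rs)
```

With equally spaced 5-hour observations, each gap is filled with four linearly interpolated hourly values, which is exactly what resample().mean().interpolate() produces.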

Suggestion : 5

The newTimeStep input argument is a character vector or string that specifies a predefined time step. For example, when newTimeStep is 'daily' and method is 'mean', TT2 contains the daily means of the data from TT1. Below, timetable data that are approximately hourly, but with some irregularity in the times, are interpolated so that the output timetable has regular hourly row times. retime can also aggregate data into time bins (for example, to create a timetable containing quarterly means from monthly data).

Time = datetime({'2015-12-18 07:02:12';'2015-12-18 08:00:47';...
                 '2015-12-18 09:01:37';'2015-12-18 10:03:10';...
                 '2015-12-18 10:59:34'});
Temp = [37.3;41.9;45.7;42.3;39.8];
Pressure = [30.1;29.9;30.03;29.9;29.8];
TT = timetable(Time,Temp,Pressure)

TT = 5×2 timetable
            Time            Temp    Pressure
    ____________________    ____    ________
    18-Dec-2015 07:02:12    37.3      30.1
    18-Dec-2015 08:00:47    41.9      29.9
    18-Dec-2015 09:01:37    45.7     30.03
    18-Dec-2015 10:03:10    42.3      29.9
    18-Dec-2015 10:59:34    39.8      29.8

TT2 = retime(TT,'hourly','spline')

TT2 = 5×2 timetable
            Time             Temp    Pressure
    ____________________    ______   ________
    18-Dec-2015 07:00:00    37.228    30.124
    18-Dec-2015 08:00:00    41.824    29.899
    18-Dec-2015 09:00:00    45.694    30.029
    18-Dec-2015 10:00:00    42.552     29.91
    18-Dec-2015 11:00:00    39.808      29.8

Time = [minutes(0):minutes(15):minutes(105)]';
Temp = [98;97.5;97.9;98.1;97.9;98;98.3;97.8];
Pulse = [80;75;73;68;69;65;72;71];
TT = timetable(Time,Temp,Pulse)

TT = 8×2 timetable
     Time      Temp    Pulse
    _______    ____    _____
    0 min        98     80
    15 min     97.5     75
    30 min     97.9     73
    45 min     98.1     68
    60 min     97.9     69
    75 min       98     65
    90 min     98.3     72
    105 min    97.8     71

Suggestion : 6

Resample time series data from hourly to daily, monthly, or yearly using pandas dataframes. Once the data are daily, you can use the same syntax to resample again from daily to monthly, and one last time from monthly to yearly.

# Import necessary packages
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import earthpy as et

# Handle date time conversions between pandas and matplotlib
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Use white grid plot background from seaborn
sns.set(font_scale = 1.5, style = "whitegrid")
# Download the data
data = et.data.get_data('colorado-flood')
# Set working directory
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

# Define relative path to file with hourly precip
file_path = os.path.join("data", "colorado-flood",
   "precipitation",
   "805325-precip-daily-2003-2013.csv")
# Import data using datetime and no data value
precip_2003_2013_hourly = pd.read_csv(file_path,
   parse_dates = ['DATE'],
   index_col = ['DATE'],
   na_values = ['999.99'])

# View first few rows
precip_2003_2013_hourly.head()
# View dataframe info
precip_2003_2013_hourly.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1840 entries, 2003-01-01 01:00:00 to 2013-12-31 00:00:00
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   STATION           1840 non-null   object
 1   STATION_NAME      1840 non-null   object
 2   ELEVATION         1840 non-null   float64
 3   LATITUDE          1840 non-null   float64
 4   LONGITUDE         1840 non-null   float64
 5   HPCP              1746 non-null   float64
 6   Measurement Flag  1840 non-null   object
 7   Quality Flag      1840 non-null   object
dtypes: float64(4), object(4)
memory usage: 129.4+ KB
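The resampling step itself can be sanity-checked without the download, on a hypothetical stand-in for the precip file: `.resample('D').sum()` is the hourly-to-daily aggregation this lesson builds toward.

```python
import pandas as pd
import numpy as np

# 48 hourly 'precip' values of 1.0 each, i.e. exactly two days of data
rng = pd.date_range('2013-01-01', periods=48, freq='h')
hourly = pd.Series(np.ones(48), index=rng)

# Aggregate hourly values into daily totals
daily = hourly.resample('D').sum()
print(daily)
```

Because each of the 48 hours contributes 1.0, the two daily bins each sum to 24.0, which makes it easy to confirm the bin boundaries fall at midnight.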

Suggestion : 7

pandas provides many built-in time series tools and algorithms. You can efficiently work with large time series, and slice and dice, aggregate, and resample irregular- and fixed-frequency time series. Some of these tools are useful for financial and economics applications, but you could certainly use them to analyze server log data, too. Periods represent time spans, like days, months, quarters, or years; the pandas.Period class represents this data type, requiring a string or integer and a supported frequency. Fixed-frequency datasets are sometimes stored with time span information spread across multiple columns; for example, in one macroeconomic dataset, the year and quarter are in different columns.

In[12]: import numpy as np

In[13]: import pandas as pd
In[14]: from datetime import datetime

In[15]: now = datetime.now()

In[16]: now
Out[16]: datetime.datetime(2022, 7, 21, 20, 28, 9, 156796)

In[17]: now.year, now.month, now.day
Out[17]: (2022, 7, 21)
In[18]: delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

In[19]: delta
Out[19]: datetime.timedelta(days = 926, seconds = 56700)

In[20]: delta.days
Out[20]: 926

In[21]: delta.seconds
Out[21]: 56700
In[22]: from datetime import timedelta

In[23]: start = datetime(2011, 1, 7)

In[24]: start + timedelta(12)
Out[24]: datetime.datetime(2011, 1, 19, 0, 0)

In[25]: start - 2 * timedelta(12)
Out[25]: datetime.datetime(2010, 12, 14, 0, 0)
In[26]: stamp = datetime(2011, 1, 3)

In[27]: str(stamp)
Out[27]: '2011-01-03 00:00:00'

In[28]: stamp.strftime("%Y-%m-%d")
Out[28]: '2011-01-03'
In[29]: value = "2011-01-03"

In[30]: datetime.strptime(value, "%Y-%m-%d")
Out[30]: datetime.datetime(2011, 1, 3, 0, 0)

In[31]: datestrs = ["7/6/2011", "8/6/2011"]

In[32]: [datetime.strptime(x, "%m/%d/%Y") for x in datestrs]
Out[32]: [datetime.datetime(2011, 7, 6, 0, 0),
   datetime.datetime(2011, 8, 6, 0, 0)
]
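To illustrate the year-and-quarter point from the intro, here is one way to combine the two columns into a quarterly PeriodIndex; the column names and GDP-like values below are illustrative, not taken from a real dataset:

```python
import pandas as pd

# Year and quarter stored in separate columns, as in the
# macroeconomic dataset described above (values illustrative)
data = pd.DataFrame({'year': [1959, 1959, 1959, 1959],
                     'quarter': [1, 2, 3, 4],
                     'value': [2710.3, 2778.8, 2775.5, 2785.2]})

# Combine the two columns into quarterly periods
index = pd.PeriodIndex(year=data['year'], quarter=data['quarter'],
                       freq='Q-DEC')
data.index = index
print(data.index)
```

Once the frame has a PeriodIndex, the usual time span tools (shifting by quarters, converting to timestamps with to_timestamp, resampling) become available.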

Suggestion : 8

pandas represents null datetimes, timedeltas, and time spans as NaT, which is useful for representing missing or null date-like values and behaves similarly to np.nan for float data. Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years. When resampling, kind can be set to 'timestamp' or 'period' to convert the resulting index to/from timestamp and time span representations; by default, resample retains the input representation.

In[1]: import datetime

In[2]: dti = pd.to_datetime(
      ...: ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
      ...: )

In[3]: dti
Out[3]: DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype = 'datetime64[ns]', freq = None)
In[4]: dti = pd.date_range("2018-01-01", periods = 3, freq = "H")

In[5]: dti
Out[5]:
   DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
         '2018-01-01 02:00:00'
      ],
      dtype = 'datetime64[ns]', freq = 'H')
In[6]: dti = dti.tz_localize("UTC")

In[7]: dti
Out[7]:
   DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
         '2018-01-01 02:00:00+00:00'
      ],
      dtype = 'datetime64[ns, UTC]', freq = 'H')

In[8]: dti.tz_convert("US/Pacific")
Out[8]:
   DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
         '2017-12-31 18:00:00-08:00'
      ],
      dtype = 'datetime64[ns, US/Pacific]', freq = 'H')
In[9]: idx = pd.date_range("2018-01-01", periods = 5, freq = "H")

In[10]: ts = pd.Series(range(len(idx)), index = idx)

In[11]: ts
Out[11]:
2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In[12]: ts.resample("2H").mean()
Out[12]:
2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64
In[13]: friday = pd.Timestamp("2018-01-05")

In[14]: friday.day_name()
Out[14]: 'Friday'

# Add 1 day
In[15]: saturday = friday + pd.Timedelta("1 day")

In[16]: saturday.day_name()
Out[16]: 'Saturday'

# Add 1 business day (Friday --> Monday)
In[17]: monday = friday + pd.offsets.BDay()

In[18]: monday.day_name()
Out[18]: 'Monday'
In[19]: pd.Series(range(3), index = pd.date_range("2000", freq = "D", periods = 3))
Out[19]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64
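As a small sketch of the NaT point above, with NaT behaving like np.nan for date-like data:

```python
import pandas as pd

# A missing entry in a datetime series becomes NaT, the datetime
# analogue of np.nan
s = pd.Series(pd.to_datetime(['2018-01-01', None, '2018-01-03']))
print(s.isna().tolist())  # [False, True, False]
print(s[1] is pd.NaT)     # True
```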