Raw Data in Excel file
MCU
Timestamp    50D    10P1  10P2  10P3  10P6  10P9  10P12
12-Feb-15    25.17  5.88  5.92  5.98  6.18  6.23  6.33
11-Feb-15    25.9   6.05  6.09  6.15  6.28  6.31  6.39
10-Feb-15    26.38  5.94  6.05  6.15  6.33  6.39  6.46
Code
import pandas as pd

xls = pd.ExcelFile('e:/Data.xlsx')
# `asset` is defined elsewhere; header=1 skips the "MCU" title row
vols = xls.parse(asset.upper() + 'VOL', header=1)
vols.set_index('Timestamp', inplace=True)
Data before set_index
   Timestamp    50D  10P1  10P2  10P3  10P6  10P9  10P12  25P1  25P2  \
0 2015-02-12  25.17  5.88  5.92  5.98  6.18  6.23   6.33  2.98  3.08
1 2015-02-11  25.90  6.05  6.09  6.15  6.28  6.31   6.39  3.12  3.17
2 2015-02-10  26.38  5.94  6.05  6.15  6.33  6.39   6.46  3.01  3.16
Output
>>> vols.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-12, ..., NaT]
Length: 1478, Freq: None, Timezone: None
>>> vols[date(2015,2,12)]
*** KeyError: datetime.date(2015, 2, 12)
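One likely reading of the KeyError above: `vols[key]` on a DataFrame looks `key` up among the column labels, not in the row index, so a date is never found there. The sketch below uses a small hypothetical stand-in for the sheet (the values are copied from the sample rows, and the trailing empty row is an assumption made to reproduce the `NaT` seen in the index) and selects the row with `.loc` instead.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the parsed sheet; the last row mimics an empty
# Excel row, which is one way a NaT can end up in the index.
vols = pd.DataFrame(
    {"50D": [25.17, 25.90, 26.38, None], "10P1": [5.88, 6.05, 5.94, None]},
    index=pd.to_datetime(["2015-02-12", "2015-02-11", "2015-02-10", None]),
)
vols.index.name = "Timestamp"

# vols[key] searches the columns, so a date key raises KeyError.
try:
    vols[datetime.date(2015, 2, 12)]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Drop rows whose index is NaT, then select the row by label with .loc.
vols = vols[vols.index.notna()]
key = pd.Timestamp(datetime.date(2015, 2, 12))
row = vols.loc[key]
print(row)
```

Converting the `datetime.date` to a `pd.Timestamp` first keeps the lookup an exact match against the `DatetimeIndex` entries.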
Timestamp and Period can serve as an index. Lists of Timestamp and Period are automatically coerced to DatetimeIndex and PeriodIndex respectively.
If you have data that is outside of the Timestamp bounds (see Timestamp limitations), you can use a PeriodIndex and/or a Series of Periods to do computations.
Timestamped data can be converted to PeriodIndex-ed data using to_period, and vice versa using to_timestamp.
There are several time/date properties that one can access from a Timestamp or a collection of timestamps like a DatetimeIndex.
In [1]: import datetime

In [2]: dti = pd.to_datetime(
   ...:     ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
   ...: )
   ...:

In [3]: dti
Out[3]: DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)
In [4]: dti = pd.date_range("2018-01-01", periods=3, freq="H")

In [5]: dti
Out[5]:
DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

In [6]: dti = dti.tz_localize("UTC")

In [7]: dti
Out[7]:
DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
               '2018-01-01 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [8]: dti.tz_convert("US/Pacific")
Out[8]:
DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
               '2017-12-31 18:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')
In [9]: idx = pd.date_range("2018-01-01", periods=5, freq="H")

In [10]: ts = pd.Series(range(len(idx)), index=idx)

In [11]: ts
Out[11]:
2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In [12]: ts.resample("2H").mean()
Out[12]:
2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64
In [13]: friday = pd.Timestamp("2018-01-05")

In [14]: friday.day_name()
Out[14]: 'Friday'

# Add 1 day
In [15]: saturday = friday + pd.Timedelta("1 day")

In [16]: saturday.day_name()
Out[16]: 'Saturday'

# Add 1 business day (Friday --> Monday)
In [17]: monday = friday + pd.offsets.BDay()

In [18]: monday.day_name()
Out[18]: 'Monday'
In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64
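The conversion between timestamped and period-indexed data mentioned earlier is not covered by the examples above, so here is a minimal sketch of it. The variable names and the choice of monthly periods are illustrative assumptions, not taken from the original text.

```python
import pandas as pd

# Three daily timestamps in January 2018 (illustrative data).
idx = pd.date_range("2018-01-01", periods=3, freq="D")
ts = pd.Series(range(3), index=idx)

# Timestamps -> periods: each day is mapped to the month that contains it.
ps = ts.to_period("M")

# Periods -> timestamps: by default each period maps to its start.
back = ps.to_timestamp()

# Date properties are available directly on a DatetimeIndex.
years = idx.year
weekdays = idx.day_name()

print(ps.index)
print(back.index)
```

Note that the round trip is lossy here: after converting to monthly periods and back, every row carries the first day of the month.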
I took an Excel sheet that has dates and some values, and I want to convert it to a pandas DataFrame and select only the rows that fall between certain dates.
Timestamp is the pandas equivalent of Python's datetime and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas. Its documented parameters include the value to be converted to a Timestamp, the offset the Timestamp will have, and the time zone for the time it represents.
This basic introduction to time series data manipulation with pandas should allow you to get started in your time series analysis. Specific objectives are to show you how to: create a date range, work with timestamp data, convert string data to a …
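Selecting rows between two dates can be sketched in two common ways: a boolean mask on the index, or a label slice with `.loc` on a sorted index. The frame below is a hypothetical stand-in for the Excel data, and the column name `value` is an assumption.

```python
import pandas as pd

# Hypothetical data indexed by date.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2015-02-09", "2015-02-10", "2015-02-11", "2015-02-12"]),
)

start = pd.Timestamp("2015-02-10")
end = pd.Timestamp("2015-02-11")

# Option 1: boolean mask; works regardless of the order of the index.
mask = (df.index >= start) & (df.index <= end)
between = df[mask]

# Option 2: label slicing on a sorted index; both endpoints are inclusive.
sliced = df.sort_index().loc[start:end]

print(between)
```

The mask form is the safer default when the index may be unsorted or contain missing dates; the slice form is shorter once the index is known to be sorted.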
Data after set_index
              50D  10P1  10P2  10P3  10P6  10P9  10P12  25P1  25P2  25P3  \
Timestamp
2015-02-12  25.17  5.88  5.92  5.98  6.18  6.23   6.33  2.98  3.08  3.21
2015-02-11  25.90  6.05  6.09  6.15  6.28  6.31   6.39  3.12  3.17  3.32
2015-02-10  26.38  5.94  6.05  6.15  6.33  6.39   6.46  3.01  3.16  3.31
Indexing a DatetimeIndex using a datetime-like object will use exact indexing. When using datetime-like objects, you need to have exact matches for single indexing, which means matching the resolution of the index. It's important to realize that when you make datetime or pd.Timestamp objects, all the fields you don't specify explicitly will default to 0.
Slicing with datetime-like objects also works. Note that the end item is inclusive, and the defaults for hours, minutes, seconds, and microseconds will set the cutoff for the randomized data on minute boundaries (in our case).
To show how this functionality works, let’s create some sample time series data with different time resolutions.
import pandas as pd
import numpy as np
import datetime

# this is an easy way to create a DatetimeIndex
# both dates are inclusive
d_range = pd.date_range("2021-01-01", "2021-01-20")

# this creates another DatetimeIndex, 10000 minutes long
# ("min" is the minute alias; the older "T" alias is deprecated in recent pandas)
m_range = pd.date_range("2021-01-01", periods=10000, freq="min")

# daily data in a Series
daily = pd.Series(np.random.rand(len(d_range)), index=d_range)

# minute data in a DataFrame
minute = pd.DataFrame(np.random.rand(len(m_range), 1),
                      columns=["value"], index=m_range)

# time boundaries not on the minute boundary, add some random jitter
mr_range = m_range + pd.Series([pd.Timedelta(microseconds=1_000_000.0 * s)
                                for s in np.random.rand(len(m_range))])

# minute data in a DataFrame, but at a higher resolution
minute2 = pd.DataFrame(np.random.rand(len(mr_range), 1),
                       columns=["value"], index=mr_range)
daily.head()
2021-01-01    0.293300
2021-01-02    0.921466
2021-01-03    0.040813
2021-01-04    0.107230
2021-01-05    0.201100
Freq: D, dtype: float64
minute.head()
                        value
2021-01-01 00:00:00  0.124186
2021-01-01 00:01:00  0.542545
2021-01-01 00:02:00  0.557347
2021-01-01 00:03:00  0.834881
2021-01-01 00:04:00  0.732195
minute2.head()
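The exact-match and inclusive-slice rules described earlier can be checked with a small deterministic sketch. It uses a short minute-resolution Series with integer values (an assumption made so the results are predictable, unlike the random sample data) and `.loc` for label-based lookups.

```python
import datetime

import pandas as pd

# Ten minutes of data; the value at each minute equals its position.
m_range = pd.date_range("2021-01-01", periods=10, freq="min")
minute = pd.Series(range(10), index=m_range)

# Exact match: the key lands on an index entry.
exact = minute.loc[datetime.datetime(2021, 1, 1, 0, 3)]

# Unspecified fields default to 0, so this key means midnight exactly.
midnight = minute.loc[datetime.datetime(2021, 1, 1)]

# Slicing with datetime objects includes both endpoints.
window = minute.loc[
    datetime.datetime(2021, 1, 1, 0, 2):datetime.datetime(2021, 1, 1, 0, 4)
]

# A key that falls between index entries is not an exact match.
try:
    minute.loc[datetime.datetime(2021, 1, 1, 0, 3, 30)]
    off_grid_found = True
except KeyError:
    off_grid_found = False

print(exact, midnight, len(window), off_grid_found)
```

The same reasoning suggests why single lookups against jittered data such as `minute2` rarely hit: a key built without microseconds almost never equals a jittered index entry, whereas a slice still returns the rows inside its bounds.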