can't index by timestamp in pandas dataframe


Raw Data in Excel file

MCU
Timestamp   50D    10P1  10P2  10P3  10P6  10P9  10P12
12-Feb-15   25.17  5.88  5.92  5.98  6.18  6.23  6.33
11-Feb-15   25.9   6.05  6.09  6.15  6.28  6.31  6.39
10-Feb-15   26.38  5.94  6.05  6.15  6.33  6.39  6.46

Code

import pandas as pd
# `asset` is defined elsewhere in the asker's code (not shown)
xls = pd.ExcelFile('e:/Data.xlsx')
vols = xls.parse(asset.upper() + 'VOL', header=1)
vols.set_index('Timestamp', inplace=True)

Data before set_index

   Timestamp    50D    10P1  10P2  10P3  10P6  10P9  10P12  25P1  25P2  \
0  2015-02-12   25.17  5.88  5.92  5.98  6.18  6.23  6.33   2.98  3.08
1  2015-02-11   25.90  6.05  6.09  6.15  6.28  6.31  6.39   3.12  3.17
2  2015-02-10   26.38  5.94  6.05  6.15  6.33  6.39  6.46   3.01  3.16

Output

>>> vols.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2015-02-12, ..., NaT]
Length: 1478, Freq: None, Timezone: None

>>> vols[date(2015,2,12)]
*** KeyError: datetime.date(2015, 2, 12)
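
One likely reading of the KeyError above: on a DataFrame, vols[key] looks the key up among the columns, not the row index, so a date can never match there. Row lookups by label go through .loc. The sketch below uses a small hypothetical frame (three rows copied from the question plus one empty row to mimic the NaT seen in the index, with the column names assumed to be 50D and 10P1) and a pd.Timestamp key, since recent pandas versions do not accept a plain datetime.date as a DatetimeIndex label.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; values copied from the question,
# plus one empty row to reproduce the NaT that shows up in the index.
vols = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(["2015-02-12", "2015-02-11", "2015-02-10", None]),
        "50D": [25.17, 25.90, 26.38, None],
        "10P1": [5.88, 6.05, 5.94, None],
    }
)

# Drop rows without a timestamp, then index and sort by it.
vols = vols.dropna(subset=["Timestamp"]).set_index("Timestamp").sort_index()

# vols[key] searches the column labels, which is why a date raises KeyError.
try:
    vols[pd.Timestamp("2015-02-12")]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Row lookup by label goes through .loc instead.
row = vols.loc[pd.Timestamp("2015-02-12")]
print(row["50D"])
```

Sorting the index is not required for a single lookup, but it keeps later date-range slices well defined.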

Suggestion : 2

Timestamp and Period can serve as an index. Lists of Timestamp and Period are automatically coerced to DatetimeIndex and PeriodIndex respectively. If your data falls outside the Timestamp bounds (see Timestamp limitations), you can use a PeriodIndex and/or a Series of Periods to do computations. Timestamped data can be converted to PeriodIndex-ed data using to_period, and vice versa using to_timestamp. Several time/date properties can be accessed from a Timestamp or from a collection of timestamps such as a DatetimeIndex.

In [1]: import datetime

In [2]: dti = pd.to_datetime(
   ...:     ["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
   ...: )
   ...:

In [3]: dti
Out[3]: DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None)

In [4]: dti = pd.date_range("2018-01-01", periods=3, freq="H")

In [5]: dti
Out[5]:
DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00'],
              dtype='datetime64[ns]', freq='H')

In [6]: dti = dti.tz_localize("UTC")

In [7]: dti
Out[7]:
DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00',
               '2018-01-01 02:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='H')

In [8]: dti.tz_convert("US/Pacific")
Out[8]:
DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00',
               '2017-12-31 18:00:00-08:00'],
              dtype='datetime64[ns, US/Pacific]', freq='H')

In [9]: idx = pd.date_range("2018-01-01", periods=5, freq="H")

In [10]: ts = pd.Series(range(len(idx)), index=idx)

In [11]: ts
Out[11]:
2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

In [12]: ts.resample("2H").mean()
Out[12]:
2018-01-01 00:00:00    0.5
2018-01-01 02:00:00    2.5
2018-01-01 04:00:00    4.0
Freq: 2H, dtype: float64

In [13]: friday = pd.Timestamp("2018-01-05")

In [14]: friday.day_name()
Out[14]: 'Friday'

# Add 1 day
In [15]: saturday = friday + pd.Timedelta("1 day")

In [16]: saturday.day_name()
Out[16]: 'Saturday'

# Add 1 business day (Friday --> Monday)
In [17]: monday = friday + pd.offsets.BDay()

In [18]: monday.day_name()
Out[18]: 'Monday'

In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]:
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64
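
The session above does not exercise the to_period / to_timestamp conversion or the date properties that this suggestion mentions. A minimal sketch of both, using a made-up three-day series (the variable names are illustrative only):

```python
import pandas as pd

idx = pd.date_range("2018-01-01", periods=3, freq="D")
ts = pd.Series(range(3), index=idx)

# Timestamp-indexed data -> Period-indexed data (monthly periods here).
ps = ts.to_period("M")

# Period-indexed data -> timestamps again (start of each period by default).
back = ps.to_timestamp()

# Date/time properties are available directly on a DatetimeIndex.
years = idx.year.tolist()
weekdays = idx.dayofweek.tolist()  # Monday is 0
print(ps.index, back.index, years, weekdays)
```

Note that the round trip is lossy here: all three days collapse into the same monthly period, so to_timestamp returns the first day of the month for every row.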

Suggestion : 3

I took an Excel sheet that has dates and some values, and I want to convert it to a pandas DataFrame and select only the rows that fall between certain dates. Timestamp is the pandas equivalent of Python's datetime and is interchangeable with it in most cases. It is the type used for the entries that make up a DatetimeIndex and other time-series-oriented data structures in pandas. Its parameters include the value to be converted to a Timestamp, the offset the Timestamp will have, and the time zone the Timestamp will have. A basic introduction to time series manipulation with pandas typically shows how to create a date range, work with timestamp data, and convert string data to a …


Data after set_index

            50D    10P1  10P2  10P3  10P6  10P9  10P12  25P1  25P2  25P3  \
Timestamp
2015-02-12  25.17  5.88  5.92  5.98  6.18  6.23  6.33   2.98  3.08  3.21
2015-02-11  25.90  6.05  6.09  6.15  6.28  6.31  6.39   3.12  3.17  3.32
2015-02-10  26.38  5.94  6.05  6.15  6.33  6.39  6.46   3.01  3.16  3.31
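
For the "rows between certain dates" goal described in this suggestion, one possible approach is to parse the date column, make it a sorted DatetimeIndex, and slice with .loc. The frame below is a hypothetical stand-in for a sheet that would normally come from pd.read_excel; the column name 50D and the date format are assumptions based on the raw data in the question.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet.
df = pd.DataFrame(
    {
        "Timestamp": ["12-Feb-15", "11-Feb-15", "10-Feb-15"],
        "50D": [25.17, 25.90, 26.38],
    }
)

# Parse the day-month-year strings and use them as a sorted index.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%b-%y")
df = df.set_index("Timestamp").sort_index()

# Label slicing on a sorted DatetimeIndex includes both endpoints.
subset = df.loc["2015-02-11":"2015-02-12"]

# The same selection written as a boolean mask on the index.
mask = (df.index >= "2015-02-11") & (df.index <= "2015-02-12")
same = df[mask]
print(subset)
```

The slice form is shorter; the mask form also works when the index is not sorted.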

Suggestion : 4

When using datetime-like objects, you need exact matches for single indexing: indexing a DatetimeIndex with a datetime-like object uses exact indexing, so the key has to match the resolution of the index. It's important to realize that when you make datetime or pd.Timestamp objects, all the fields you don't specify explicitly default to 0. Slicing with datetime-like objects also works. Note that the end item is inclusive, and that the defaults for hours, minutes, seconds, and microseconds put the cutoff for the randomized data on minute boundaries (in our case).
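
A minimal sketch of the exact-match rule, using a tiny hand-built series (separate from the sample data constructed next; names and values are made up for illustration):

```python
import datetime

import pandas as pd

s = pd.Series(
    [10, 20, 30],
    index=pd.DatetimeIndex(
        ["2021-01-01 00:00", "2021-01-01 12:00", "2021-01-02 00:00"]
    ),
)

# Unspecified fields default to 0, so this key means 2021-01-01 00:00:00.
exact = s[datetime.datetime(2021, 1, 1)]

# No entry sits at exactly 06:00, so the exact lookup raises KeyError.
try:
    s[datetime.datetime(2021, 1, 1, 6)]
    missing = False
except KeyError:
    missing = True

# Slicing with datetime-like endpoints includes the end item.
window = s.loc[datetime.datetime(2021, 1, 1):datetime.datetime(2021, 1, 1, 12)]
print(exact, missing, window.tolist())
```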

To show how this functionality works, let’s create some sample time series data with different time resolutions.

import pandas as pd
import numpy as np

import datetime

# this is an easy way to create a DatetimeIndex
# both dates are inclusive
d_range = pd.date_range("2021-01-01", "2021-01-20")

# this creates another DatetimeIndex, 10000 minutes long
# ("min" is the minute alias; the older "T" alias is deprecated in recent pandas)
m_range = pd.date_range("2021-01-01", periods=10000, freq="min")

# daily data in a Series
daily = pd.Series(np.random.rand(len(d_range)), index=d_range)
# minute data in a DataFrame
minute = pd.DataFrame(np.random.rand(len(m_range), 1),
                      columns=["value"],
                      index=m_range)

# time boundaries not on the minute boundary, add some random jitter
mr_range = m_range + pd.Series([pd.Timedelta(microseconds=1_000_000.0 * s)
                                for s in np.random.rand(len(m_range))]) 
# minute data in a DataFrame, but at a higher resolution
minute2 = pd.DataFrame(np.random.rand(len(mr_range), 1),
                       columns=["value"],
                       index=mr_range)
daily.head()
2021-01-01    0.293300
2021-01-02    0.921466
2021-01-03    0.040813
2021-01-04    0.107230
2021-01-05    0.201100
Freq: D, dtype: float64
minute.head()
                        value
2021-01-01 00:00:00  0.124186
2021-01-01 00:01:00  0.542545
2021-01-01 00:02:00  0.557347
2021-01-01 00:03:00  0.834881
2021-01-01 00:04:00  0.732195
minute2.head()