performance of pandas custom business day offset


I think the way you are implementing it via a lambda is slowing it down. Consider this method (taken more or less straight from the documentation):

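import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

mydate = pd.to_datetime("12/24/2014")
# Build the offset once; it can then be reused for any number of dates
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
mydate + bday_us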

Out[13]: Timestamp('2014-12-26 00:00:00')

The first part (constructing the offset) is slow, but you only need to do it once. The second part (adding the offset to a date) is very fast.

%timeit bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
10 loops, best of 3: 66.5 ms per loop

%timeit mydate + bday_us
10000 loops, best of 3: 44 µs per loop

To get apples to apples, here are the other timings on my machine:

%timeit with_holiday = mydate + bday_offset(1)
10 loops, best of 3: 23.1 ms per loop

%timeit without_holiday = mydate + pd.datetools.offsets.BDay(1)
10000 loops, best of 3: 36.6 µs per loop
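As a minimal sketch of the build-once, reuse-many pattern described above: the helper name next_business_days and the sample dates are illustrative assumptions, not part of the original answer.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Construct the (comparatively slow) offset a single time.
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())


def next_business_days(dates, offset=bday_us):
    """Shift each date forward by one business day, reusing one offset object."""
    return [pd.Timestamp(d) + offset for d in dates]


# Hypothetical sample dates: the day before Christmas and New Year's Eve 2014.
shifted = next_business_days(["2014-12-24", "2014-12-31"])
print(shifted)
```

The offset is passed in as a default argument so that it is created at import time rather than on every call, which is the opposite of what the lambda in the question does.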

Suggestion : 2

Example #1: Use pandas.tseries.offsets.BusinessDay() to create an offset of 5 business days, then add it to a given timestamp to increment the datetime value. The output shows that the offset of 5 business days was created and added to the timestamp.

Example #2: Use pandas.tseries.offsets.BusinessDay() to create an offset of 10 business days and 10 hours, starting from ts = pd.Timestamp('2019-10-10 07:15:11') (after import pandas as pd).

For example, BDay(2) can be added to a date to move it two business days forward. If the starting date is not a valid business day, it is first moved to a valid date and then the offset is applied.

pandas.tseries.offsets.CustomBusinessDay() is used to create your own custom business days. It is a DateOffset subclass representing possibly n custom business days, excluding holidays.

Syntax: pandas.tseries.offsets.CustomBusinessDay()

Parameters:
  • n : int
  • normalize : normalize start/end dates to midnight before generating the date range
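The two examples described above might look roughly like the following sketch. The variable names are assumptions, and passing the extra 10 hours through the offset argument of BusinessDay is one plausible reading of "10 business days and 10 hours", not necessarily the exact code of the quoted article.

```python
from datetime import timedelta

import pandas as pd
from pandas.tseries.offsets import BusinessDay

ts = pd.Timestamp("2019-10-10 07:15:11")  # a Thursday

# Example #1: an offset of 5 business days.
five_bdays = BusinessDay(n=5)
plus_five = ts + five_bdays

# Example #2: 10 business days plus an additional 10 hours,
# supplied through the `offset` argument (assumed interpretation).
ten_bdays_ten_hours = BusinessDay(n=10, offset=timedelta(hours=10))
plus_ten = ts + ten_bdays_ten_hours

print(plus_five)
print(plus_ten)
```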


import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()
# Note: this lambda builds a new CustomBusinessDay on every call.
# pd.datetools is the old alias; newer pandas releases expose these classes under pd.offsets.
bday_offset = lambda n: pd.datetools.offsets.CustomBusinessDay(n, calendar=cal)
mydate = pd.to_datetime("12/24/2014")

%timeit with_holiday = mydate + bday_offset(1)
%timeit without_holiday = mydate + pd.datetools.offsets.BDay(1)


Suggestion : 3

# Assumes: import datetime; import pandas as pd
In[194]: from pandas.tseries.holiday import USFederalHolidayCalendar

In[195]: bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Friday before MLK Day
In[196]: dt = datetime.datetime(2014, 1, 17)

# Tuesday after MLK Day (Monday is skipped because it's a holiday)
In[197]: dt + bday_us
Out[197]: Timestamp('2014-01-21 00:00:00')

In[198]: bmth_us = pd.offsets.CustomBusinessMonthBegin(calendar=USFederalHolidayCalendar())

# Skip new years
In[199]: dt = datetime.datetime(2013, 12, 17)

In[200]: dt + bmth_us
Out[200]: Timestamp('2014-01-02 00:00:00')

# Define date index with custom offset
In[201]: pd.date_range(start="20100101", end="20120101", freq=bmth_us)
Out[201]:
DatetimeIndex(['2010-01-04', '2010-02-01', '2010-03-01', '2010-04-01',
               '2010-05-03', '2010-06-01', '2010-07-01', '2010-08-02',
               '2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01',
               '2011-01-03', '2011-02-01', '2011-03-01', '2011-04-01',
               '2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
               '2011-09-01', '2011-10-03', '2011-11-01', '2011-12-01'],
              dtype='datetime64[ns]', freq='CBMS')

# `cal` is a holiday calendar defined earlier in the source documentation (not shown here);
# judging by the output, it treats Memorial Day and July 4th as holidays.
In[262]: pd.date_range(
   .....:     start="7/1/2012", end="7/10/2012", freq=pd.offsets.CDay(calendar=cal)
   .....: ).to_pydatetime()
   .....:
Out[262]:
array([datetime.datetime(2012, 7, 2, 0, 0),
       datetime.datetime(2012, 7, 3, 0, 0),
       datetime.datetime(2012, 7, 5, 0, 0),
       datetime.datetime(2012, 7, 6, 0, 0),
       datetime.datetime(2012, 7, 9, 0, 0),
       datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)

In[263]: offset = pd.offsets.CustomBusinessDay(calendar = cal)

In[264]: datetime.datetime(2012, 5, 25) + offset
Out[264]: Timestamp('2012-05-29 00:00:00')

In[265]: datetime.datetime(2012, 7, 3) + offset
Out[265]: Timestamp('2012-07-05 00:00:00')

In[266]: datetime.datetime(2012, 7, 3) + 2 * offset
Out[266]: Timestamp('2012-07-06 00:00:00')

In[267]: datetime.datetime(2012, 7, 6) + offset
Out[267]: Timestamp('2012-07-09 00:00:00')
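Because the transcript above uses `cal` without showing its definition, here is a hedged sketch of a calendar that would reproduce those outputs. The class name ExampleCalendar and its two rules are assumptions inferred from the results (Memorial Day and July 4th are skipped); the original calendar may contain more rules.

```python
import datetime

import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    USMemorialDay,
    nearest_workday,
)


class ExampleCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar: only Memorial Day and Independence Day."""

    rules = [
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
    ]


cal = ExampleCalendar()
offset = pd.offsets.CustomBusinessDay(calendar=cal)

after_memorial_day = datetime.datetime(2012, 5, 25) + offset
after_july_3 = datetime.datetime(2012, 7, 3) + offset
two_after_july_3 = datetime.datetime(2012, 7, 3) + 2 * offset

print(after_memorial_day, after_july_3, two_after_july_3)
```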

In[187]: weekmask_egypt = "Sun Mon Tue Wed Thu"

# They also observe International Workers' Day so let's
# add that for a couple of years
In[188]: holidays = [
   .....:     "2012-05-01",
   .....:     datetime.datetime(2013, 5, 1),
   .....:     np.datetime64("2014-05-01"),
   .....: ]
   .....:

In[189]: bday_egypt = pd.offsets.CustomBusinessDay(
   .....:     holidays=holidays,
   .....:     weekmask=weekmask_egypt,
   .....: )
   .....:

In[190]: dt = datetime.datetime(2013, 4, 30)

In[191]: dt + 2 * bday_egypt
Out[191]: Timestamp('2013-05-05 00:00:00')
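To see which days the weekmask and holiday list actually leave as working days, one option is to generate a date range with the same offset. This is a small sketch; the chosen start and end dates are arbitrary assumptions.

```python
import datetime

import numpy as np
import pandas as pd

weekmask_egypt = "Sun Mon Tue Wed Thu"
holidays = [
    "2012-05-01",
    datetime.datetime(2013, 5, 1),
    np.datetime64("2014-05-01"),
]
bday_egypt = pd.offsets.CustomBusinessDay(holidays=holidays, weekmask=weekmask_egypt)

# Working days between Sunday 2013-04-28 and Sunday 2013-05-05:
# Friday, Saturday and the May 1st holiday are left out.
days = pd.date_range(start="2013-04-28", end="2013-05-05", freq=bday_egypt)
print(days.strftime("%a %Y-%m-%d").tolist())
```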

In[1]: from pandas.tseries.offsets import CustomBusinessHour

In[2]: from pandas.tseries.holiday import USFederalHolidayCalendar

In[3]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
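A short sketch of how bhour_us could then be used, assuming the default business hours of 09:00 to 17:00; the sample timestamp is the Friday before MLK Day from the earlier example.

```python
import datetime

from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessHour

bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

# Friday before MLK Day, 15:00
dt = datetime.datetime(2014, 1, 17, 15)

# One business hour later is still within Friday's opening hours.
one_hour_later = dt + bhour_us

# Two business hours later lands on closing time, so the result moves to the
# next opening time; Monday is skipped because it is a holiday.
two_hours_later = dt + 2 * bhour_us

print(one_hour_later, two_hours_later)
```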

Suggestion : 4

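The key features of a DateOffset object are:

  • it can be added to or subtracted from a datetime object to obtain a shifted date
  • it can be multiplied by an integer (positive or negative) so that the increment is applied multiple times
  • it has rollforward and rollback methods for moving a date forward or backward to the next or previous "offset date"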

# Assumes: from datetime import datetime; from dateutil.relativedelta import relativedelta
In[1]: d = datetime(2008, 8, 18, 9, 0)

In[2]: d + relativedelta(months=4, days=5)
Out[2]: datetime.datetime(2008, 12, 23, 9, 0)

In[3]: from pandas.tseries.offsets import *

In[4]: d + DateOffset(months=4, days=5)
Out[4]: Timestamp('2008-12-23 09:00:00')

class BDay(DateOffset):
    """DateOffset increments between business days"""

    def apply(self, other):
        ...

In[5]: d - 5 * BDay()
Out[5]: Timestamp('2008-08-11 09:00:00')

In[6]: d + BMonthEnd()
Out[6]: Timestamp('2008-08-29 09:00:00')

In[7]: d
Out[7]: datetime.datetime(2008, 8, 18, 9, 0)

In[8]: offset = BMonthEnd()

In[9]: offset.rollforward(d)
Out[9]: Timestamp('2008-08-29 09:00:00')

In[10]: offset.rollback(d)
Out[10]: Timestamp('2008-07-31 09:00:00')
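The same operations can be collected into one self-contained sketch. The variable names are illustrative assumptions; the expected values are the ones shown in the transcript above.

```python
from datetime import datetime

from pandas.tseries.offsets import BDay, BMonthEnd

d = datetime(2008, 8, 18, 9, 0)  # a Monday

# Multiplying an offset by an integer applies it several times.
five_bdays_back = d - 5 * BDay()

month_end = BMonthEnd()
# rollforward / rollback move to the next / previous business month end.
rolled_forward = month_end.rollforward(d)
rolled_back = month_end.rollback(d)

# A date that already sits on the offset is left unchanged by rollforward.
unchanged = month_end.rollforward(rolled_forward)

print(five_bdays_back, rolled_forward, rolled_back, unchanged)
```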
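
# Effect of normalize=True. These examples come from an older pandas release;
# newer releases drop DateOffset.apply in favor of `timestamp + offset`.
In[11]: day = Day()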

In[12]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[12]: Timestamp('2014-01-02 09:00:00')

In[13]: day = Day(normalize = True)

In[14]: day.apply(pd.Timestamp('2014-01-01 09:00'))
Out[14]: Timestamp('2014-01-02 00:00:00')

In[15]: hour = Hour()

In[16]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[16]: Timestamp('2014-01-01 23:00:00')

In[17]: hour = Hour(normalize = True)

In[18]: hour.apply(pd.Timestamp('2014-01-01 22:00'))
Out[18]: Timestamp('2014-01-01 00:00:00')

In[19]: hour.apply(pd.Timestamp('2014-01-01 23:00'))
Out[19]: Timestamp('2014-01-02 00:00:00')
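For readers on a pandas version where the .apply calls above are not available, a rough equivalent is plain addition followed by Timestamp.normalize(). This is a sketch of one way to get the same results, not the form used in the quoted documentation.

```python
import pandas as pd

ts = pd.Timestamp("2014-01-01 09:00")

# Same as day.apply(ts) in the transcript above.
next_day = ts + pd.offsets.Day()

# Same effect as normalize=True: shift first, then reset the time to midnight.
next_day_midnight = (ts + pd.offsets.Day()).normalize()

late = pd.Timestamp("2014-01-01 22:00")
next_hour = late + pd.offsets.Hour()
next_hour_midnight = (late + pd.offsets.Hour()).normalize()

# Crossing midnight before normalizing lands on the following day.
crossing = (pd.Timestamp("2014-01-01 23:00") + pd.offsets.Hour()).normalize()

print(next_day, next_day_midnight, next_hour, next_hour_midnight, crossing)
```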

Suggestion : 5

Python Pandas add date offset column with custom business day (CDay; CustomBusinessDay)

It looks like you're only applying cdays to the offset and not the ship_date+offset.

(df['ship_date_et'] + df['offset']).apply(cdays)

@dlstadther, while your answer did give me date offsets, it was not seeing the weekends or holidays as zeros. I think this has to do with my implementation of df['offset'] as type timedelta. This resulted in:

>> (df['ship_date_et'] + df['offset']).apply(cdays)
0   2018-10-02
1   2018-10-03
2   2018-10-04
3   2018-10-05
4   2018-10-08
5   2018-10-08
6   2018-10-08
7   2018-10-09
8   2018-10-11
9   2018-10-11
dtype: datetime64[ns]

With a little bit of sleep and finagling:

>> df['new'] = df['ship_date_et'] + df['offset'].dt.days * cdays

is what I was looking for.

>> df
   offset ship_date_et        new
0  0 days   2018-10-01 2018-10-01
1  1 days   2018-10-01 2018-10-02
2  2 days   2018-10-01 2018-10-03
3  3 days   2018-10-01 2018-10-04
4  4 days   2018-10-01 2018-10-05
5  5 days   2018-10-01 2018-10-08
6  6 days   2018-10-01 2018-10-09
7  7 days   2018-10-01 2018-10-11
8  8 days   2018-10-01 2018-10-12
9  9 days   2018-10-01 2018-10-15
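The result above can be reproduced with a self-contained sketch. The definition of cdays is not shown in the thread, so the single holiday on 2018-10-10 is an assumption chosen only because it matches the printed table; the row-by-row list comprehension is a conservative stand-in for the one-line Series expression.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical calendar: Mon-Fri working week with one holiday (assumed).
cdays = CustomBusinessDay(holidays=["2018-10-10"])

df = pd.DataFrame(
    {
        "offset": pd.to_timedelta(range(10), unit="D"),
        "ship_date_et": pd.Timestamp("2018-10-01"),
    }
)

# Convert each timedelta to a whole number of days, then count that many
# business days (not calendar days) forward from the ship date.
df["new"] = [
    ship_date + int(n_days) * cdays
    for ship_date, n_days in zip(df["ship_date_et"], df["offset"].dt.days)
]

print(df)
```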

Suggestion : 6

  • To allow the datetime to be used in contexts where only certain days of the week are valid, NumPy includes a set of "busday" (business day) functions.
  • The function busday_offset allows you to apply offsets specified in business days to datetimes with a unit of 'D' (day).
  • To find how many valid days there are in a specified range of datetime64 dates, use busday_count.
  • When performance is important for manipulating many business dates with one particular choice of weekmask and holidays, there is an object busdaycalendar which stores the data necessary in an optimized form.

>>> import numpy as np
>>> np.datetime64('2005-02-25')
numpy.datetime64('2005-02-25')
>>> np.datetime64(1, 'Y')
numpy.datetime64('1971')
>>> np.datetime64('2005-02')
numpy.datetime64('2005-02')
>>> np.datetime64('2005-02', 'D')
numpy.datetime64('2005-02-01')
>>> np.datetime64('2005-02-25T03:30')
numpy.datetime64('2005-02-25T03:30')
>>> np.datetime64('nat')
numpy.datetime64('NaT')
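The busday functions described at the start of this suggestion can be tried with a short sketch. The weekmask, the single Christmas holiday, and the sample dates are assumptions picked to mirror the pandas example from the first answer.

```python
import numpy as np

# A reusable calendar: Mon-Fri working week, one holiday (assumed for illustration).
cal = np.busdaycalendar(weekmask="1111100", holidays=["2014-12-25"])

# Next business day after 2014-12-24, skipping the holiday.
next_day = np.busday_offset(np.datetime64("2014-12-24"), 1, roll="forward", busdaycal=cal)

# The same call works on whole arrays of dates at once.
dates = np.array(["2014-12-24", "2014-12-27"], dtype="datetime64[D]")
shifted = np.busday_offset(dates, 1, roll="forward", busdaycal=cal)

# Number of valid days in the half-open range [2014-12-22, 2014-12-29).
n_valid = np.busday_count(
    np.datetime64("2014-12-22"), np.datetime64("2014-12-29"), busdaycal=cal
)

# Check whether a single date is a valid business day.
christmas_is_busday = np.is_busday(np.datetime64("2014-12-25"), busdaycal=cal)

print(next_day, shifted, n_valid, christmas_is_busday)
```

Passing the same busdaycalendar object to every call avoids re-specifying the weekmask and holidays each time.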