You can use the data.table package to do a rolling join:
trades[quotes, on = .(ticker, time), roll = -Inf, c("bid", "ask") := .(bid, ask)]
output:
                  time ticker  price quantity    bid    ask
1: 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2: 2016-05-25 13:30:00   MSFT  51.95      155  51.97  51.98
3: 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4: 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5: 2016-05-25 13:30:00   AAPL  98.00      100     NA     NA
data:
library(data.table)
quotes <- fread("time ticker bid ask
2016-05-25_13:30:00.023 GOOG 720.50 720.93
2016-05-25_13:30:00.023 MSFT 51.95 51.96
2016-05-25_13:30:00.030 MSFT 51.97 51.98
2016-05-25_13:30:00.041 MSFT 51.99 52.00
2016-05-25_13:30:00.048 GOOG 720.50 720.93
2016-05-25_13:30:00.049 AAPL 97.99 98.01
2016-05-25_13:30:00.072 GOOG 720.50 720.88
2016-05-25_13:30:00.075 MSFT 52.01 52.03
")
trades <- fread("time ticker price quantity
2016-05-25_13:30:00.023 MSFT 51.95 75
2016-05-25_13:30:00.038 MSFT 51.95 155
2016-05-25_13:30:00.048 GOOG 720.77 100
2016-05-25_13:30:00.048 GOOG 720.92 100
2016-05-25_13:30:00.048 AAPL 98.00 100
")
# %OS parses the fractional seconds
quotes[, time := as.POSIXct(time, format = "%Y-%m-%d_%H:%M:%OS")]
trades[, time := as.POSIXct(time, format = "%Y-%m-%d_%H:%M:%OS")]
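The output above follows a simple rule: each trade gets the bid/ask of the latest quote for the same ticker at or before the trade time. As a language-neutral illustration of that rule (not of how data.table implements `roll`), here is a small Python sketch using only the standard library. It assumes times are stored as integer milliseconds after 2016-05-25 13:30:00, and the function name `backward_asof` is made up for this example.

```python
from bisect import bisect_right
from collections import defaultdict

# Sample data from above. Times are integer milliseconds after
# 2016-05-25 13:30:00 (an assumption so the sketch avoids datetime parsing).
quotes = [  # (time, ticker, bid, ask)
    (23, "GOOG", 720.50, 720.93),
    (23, "MSFT", 51.95, 51.96),
    (30, "MSFT", 51.97, 51.98),
    (41, "MSFT", 51.99, 52.00),
    (48, "GOOG", 720.50, 720.93),
    (49, "AAPL", 97.99, 98.01),
    (72, "GOOG", 720.50, 720.88),
    (75, "MSFT", 52.01, 52.03),
]
trades = [  # (time, ticker, price, quantity)
    (23, "MSFT", 51.95, 75),
    (38, "MSFT", 51.95, 155),
    (48, "GOOG", 720.77, 100),
    (48, "GOOG", 720.92, 100),
    (48, "AAPL", 98.00, 100),
]


def backward_asof(trades, quotes):
    """Attach (bid, ask) of the latest same-ticker quote at or before each trade."""
    # Group quotes by ticker, ordered by time (stable sort keeps input order on ties).
    by_ticker = defaultdict(list)
    for q in sorted(quotes, key=lambda q: q[0]):
        by_ticker[q[1]].append(q)
    times = {tic: [q[0] for q in qs] for tic, qs in by_ticker.items()}

    out = []
    for t, ticker, price, qty in trades:
        qs = by_ticker.get(ticker, [])
        # bisect_right gives the index of the first quote later than t,
        # so the quote just before it is the last one at or before t.
        i = bisect_right(times.get(ticker, []), t)
        bid, ask = (qs[i - 1][2], qs[i - 1][3]) if i > 0 else (None, None)
        out.append((t, ticker, price, qty, bid, ask))
    return out


for row in backward_asof(trades, quotes):
    print(row)
```

The AAPL trade gets `None` because its only quote (at 49 ms) arrives after the trade, matching the `NA` in the R output.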
Assuming you want to join on ticker and a time difference of less than 0.002 seconds:
library(sqldf)
sqldf("select t.*, q.bid, q.ask
       from trades t
       left join quotes q
         on t.ticker = q.ticker and abs(q.time - t.time) < .002")
giving:
                 time ticker  price quantity    bid    ask
1 2016-05-25 13:30:00   MSFT  51.95       75  51.95  51.96
2 2016-05-25 13:30:00   MSFT  51.95      155     NA     NA
3 2016-05-25 13:30:00   GOOG 720.77      100 720.50 720.93
4 2016-05-25 13:30:00   GOOG 720.92      100 720.50 720.93
5 2016-05-25 13:30:00   AAPL  98.00      100  97.99  98.01
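The tolerance join can be sketched the same way. This Python version uses only the standard library; `tolerance_join` and `tol_ms` are illustrative names, and times are again assumed to be integer milliseconds after 13:30:00. Like the SQL left join, it emits one row per quote inside the window and keeps unmatched trades with missing bid/ask, so with `tol_ms=2` it reproduces the table above.

```python
# Sample data (times in integer ms after 2016-05-25 13:30:00, an assumption).
quotes = [  # (time, ticker, bid, ask)
    (23, "GOOG", 720.50, 720.93),
    (23, "MSFT", 51.95, 51.96),
    (30, "MSFT", 51.97, 51.98),
    (41, "MSFT", 51.99, 52.00),
    (48, "GOOG", 720.50, 720.93),
    (49, "AAPL", 97.99, 98.01),
    (72, "GOOG", 720.50, 720.88),
    (75, "MSFT", 52.01, 52.03),
]
trades = [  # (time, ticker, price, quantity)
    (23, "MSFT", 51.95, 75),
    (38, "MSFT", 51.95, 155),
    (48, "GOOG", 720.77, 100),
    (48, "GOOG", 720.92, 100),
    (48, "AAPL", 98.00, 100),
]


def tolerance_join(trades, quotes, tol_ms):
    """Left join on ticker where |quote time - trade time| < tol_ms.

    Every quote inside the window yields a row; a trade with no quote in
    the window is kept once with bid/ask set to None.
    """
    out = []
    for t, ticker, price, qty in trades:
        matches = [(bid, ask) for qt, qtick, bid, ask in quotes
                   if qtick == ticker and abs(qt - t) < tol_ms]
        for bid, ask in matches or [(None, None)]:
            out.append((t, ticker, price, qty, bid, ask))
    return out


for row in tolerance_join(trades, quotes, tol_ms=2):
    print(row)
```

The second MSFT trade gets no match because its neighbouring quotes are 8 ms and 3 ms away, both outside the 2 ms window.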
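Or, to join on ticker and the minimum time difference (this relies on SQLite's handling of bare columns: with min(), q.bid and q.ask are taken from the row that holds the minimum; [1:6] then drops the helper min column):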
sqldf("select t.*, q.bid, q.ask, min(abs(q.time - t.time)) from trades t left join quotes q on t.ticker = q.ticker group by t.rowid ")[1:6]
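For comparison, here is a standard-library Python sketch of the minimum-time-difference rule (`nearest_join` is a made-up name; times are assumed to be integer milliseconds after 13:30:00). Unlike the rolling join, it can pick a later quote: the MSFT trade at .038 gets the .041 quote (3 ms away) rather than the .030 one. Breaking ties by taking the first candidate is a choice made for this sketch, not something the SQL query promises.

```python
# Sample data (times in integer ms after 2016-05-25 13:30:00, an assumption).
quotes = [  # (time, ticker, bid, ask)
    (23, "GOOG", 720.50, 720.93),
    (23, "MSFT", 51.95, 51.96),
    (30, "MSFT", 51.97, 51.98),
    (41, "MSFT", 51.99, 52.00),
    (48, "GOOG", 720.50, 720.93),
    (49, "AAPL", 97.99, 98.01),
    (72, "GOOG", 720.50, 720.88),
    (75, "MSFT", 52.01, 52.03),
]
trades = [  # (time, ticker, price, quantity)
    (23, "MSFT", 51.95, 75),
    (38, "MSFT", 51.95, 155),
    (48, "GOOG", 720.77, 100),
    (48, "GOOG", 720.92, 100),
    (48, "AAPL", 98.00, 100),
]


def nearest_join(trades, quotes):
    """Attach (bid, ask) of the same-ticker quote closest in time to each trade."""
    out = []
    for t, ticker, price, qty in trades:
        candidates = [q for q in quotes if q[1] == ticker]
        if candidates:
            # min() returns the first candidate among equal distances.
            _, _, bid, ask = min(candidates, key=lambda q: abs(q[0] - t))
        else:
            bid = ask = None
        out.append((t, ticker, price, qty, bid, ask))
    return out


for row in nearest_join(trades, quotes):
    print(row)
```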
Note: the sqldf examples above use trades and quotes data frames created as follows:
Lines1 <- "
    time ticker bid ask
0 2016-05-25T13:30:00.023 GOOG 720.50 720.93
1 2016-05-25T13:30:00.023 MSFT 51.95 51.96
2 2016-05-25T13:30:00.030 MSFT 51.97 51.98
3 2016-05-25T13:30:00.041 MSFT 51.99 52.00
4 2016-05-25T13:30:00.048 GOOG 720.50 720.93
5 2016-05-25T13:30:00.049 AAPL 97.99 98.01
6 2016-05-25T13:30:00.072 GOOG 720.50 720.88
7 2016-05-25T13:30:00.075 MSFT 52.01 52.03
"
# the header has one field fewer than the rows, so the first column becomes row names
quotes <- read.table(text = Lines1, as.is = TRUE)
quotes <- transform(quotes, time = as.POSIXct(sub("T", " ", time)))

Lines2 <- "
    time ticker price quantity
0 2016-05-25T13:30:00.023 MSFT 51.95 75
1 2016-05-25T13:30:00.038 MSFT 51.95 155
2 2016-05-25T13:30:00.048 GOOG 720.77 100
3 2016-05-25T13:30:00.048 GOOG 720.92 100
4 2016-05-25T13:30:00.048 AAPL 98.00 100
"
trades <- read.table(text = Lines2, as.is = TRUE)
trades <- transform(trades, time = as.POSIXct(sub("T", " ", time)))
You can also use the data.table package to perform a non-equi join:
quotes[trades, on = .(ticker, time <= time), .(time = i.time, ticker, price, quantity, bid, ask), mult = 'last']
This gives more control and is easier to adapt to other matching criteria, and the result is the same:
                      time ticker  price quantity    bid    ask
1: 2016-05-25 13:30:00.023   MSFT  51.95       75  51.95  51.96
2: 2016-05-25 13:30:00.038   MSFT  51.95      155  51.97  51.98
3: 2016-05-25 13:30:00.048   GOOG 720.77      100 720.50 720.93
4: 2016-05-25 13:30:00.048   GOOG 720.92      100 720.50 720.93
5: 2016-05-25 13:30:00.048   AAPL  98.00      100     NA     NA
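The "more control" point can be illustrated with a Python sketch of the same non-equi logic: keep same-ticker quotes with quote time <= trade time, then take the last one, as `mult = 'last'` does. "Last in quote order" equals "latest" here only because the quotes are already sorted by time. The function name and the `extra` predicate are hypothetical; the 5 ms maximum quote age is a made-up example of an additional criterion. Times are assumed to be integer milliseconds after 13:30:00.

```python
# Sample data (times in integer ms after 2016-05-25 13:30:00, an assumption).
quotes = [  # (time, ticker, bid, ask), sorted by time
    (23, "GOOG", 720.50, 720.93),
    (23, "MSFT", 51.95, 51.96),
    (30, "MSFT", 51.97, 51.98),
    (41, "MSFT", 51.99, 52.00),
    (48, "GOOG", 720.50, 720.93),
    (49, "AAPL", 97.99, 98.01),
    (72, "GOOG", 720.50, 720.88),
    (75, "MSFT", 52.01, 52.03),
]
trades = [  # (time, ticker, price, quantity)
    (23, "MSFT", 51.95, 75),
    (38, "MSFT", 51.95, 155),
    (48, "GOOG", 720.77, 100),
    (48, "GOOG", 720.92, 100),
    (48, "AAPL", 98.00, 100),
]


def non_equi_last(trades, quotes, extra=lambda quote_time, trade_time: True):
    """For each trade, keep the last quote with the same ticker,
    quote time <= trade time, and extra(quote_time, trade_time) true."""
    out = []
    for t, ticker, price, qty in trades:
        last = (None, None)
        for qt, qtick, bid, ask in quotes:
            if qtick == ticker and qt <= t and extra(qt, t):
                last = (bid, ask)  # later matches overwrite earlier ones
        out.append((t, ticker, price, qty) + last)
    return out


# Same matching rule as the data.table call above.
plain = non_equi_last(trades, quotes)
# Hypothetical extra criterion: ignore quotes more than 5 ms old.
fresh = non_equi_last(trades, quotes, extra=lambda qt, t: t - qt <= 5)
for a, b in zip(plain, fresh):
    print(a[4:], b[4:])
```

Adding a condition only means changing the predicate, which mirrors adding another clause to `on =` in the data.table version.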
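You can also use the merge function to join two data frames in R, but note that this is an ordinary join on ticker only: it pairs every trade with every quote of the same ticker (returning time.x and time.y, since both tables have a time column) rather than picking the nearest quote: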
merge(trades, quotes, by = "ticker", all = TRUE)
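To see why this is not an as-of join, count the rows it produces. An equi-join on ticker yields one row per (trade, quote) pair sharing a ticker, and `all = TRUE` (a full outer join) also keeps keys found on only one side. A quick standard-library Python check with the sample tickers; `full_join_size` is a made-up helper:

```python
from collections import Counter

trade_tickers = ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"]
quote_tickers = ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG", "AAPL", "GOOG", "MSFT"]


def full_join_size(left, right):
    """Row count of a full outer equi-join on a single key column."""
    lc, rc = Counter(left), Counter(right)
    # Each shared key contributes (left count) x (right count) rows.
    matched = sum(lc[k] * rc[k] for k in lc.keys() & rc.keys())
    # Keys present on only one side are kept once per row (all = TRUE).
    left_only = sum(n for k, n in lc.items() if k not in rc)
    right_only = sum(n for k, n in rc.items() if k not in lc)
    return matched + left_only + right_only


print(full_join_size(trade_tickers, quote_tickers))  # 15
```

Five trades become 15 rows (8 for MSFT, 6 for GOOG, 1 for AAPL), so an extra step would still be needed to keep only the relevant quote per trade.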
pandas.merge_asof merges two DataFrames with what is essentially a left join, except that it matches on the nearest key rather than on equal keys. By default (direction='backward'), "nearest" means the last key less than or equal to the left key. In the example below, pd.merge_asof matches each row of trades with the row of quotes that has the same ticker and the most recent time at or before the trade time.
Example (taken from the documentation):
>>> quotes
                     time ticker     bid     ask
0 2016-05-25 13:30:00.023   GOOG  720.50  720.93
1 2016-05-25 13:30:00.023   MSFT   51.95   51.96
2 2016-05-25 13:30:00.030   MSFT   51.97   51.98
3 2016-05-25 13:30:00.041   MSFT   51.99   52.00
4 2016-05-25 13:30:00.048   GOOG  720.50  720.93
5 2016-05-25 13:30:00.049   AAPL   97.99   98.01
6 2016-05-25 13:30:00.072   GOOG  720.50  720.88
7 2016-05-25 13:30:00.075   MSFT   52.01   52.03
>>> trades
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   GOOG  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100
>>> pd.merge_asof(trades, quotes,
...               on='time',
...               by='ticker')
                     time ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.048   AAPL   98.00       100     NaN     NaN
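From the pandas documentation for merge_asof: "Perform an asof merge. This is similar to a left-join except that we match on nearest key rather than equal keys."

The on parameter is the field name to join on. It must be found in both DataFrames, the data MUST be ordered, and it must be a numeric column, such as datetimelike, integer, or float. Either on or left_on/right_on must be given.

A "backward" search (the default) selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.
A "nearest" search selects the row in the right DataFrame whose 'on' key is closest in absolute distance to the left's key.
With allow_exact_matches=False, equal keys are not matched (strictly less-than / strictly greater-than).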
>>> left = pd.DataFrame({'a': [1, 5, 10],
...                      'left_val': ['a', 'b', 'c']})
>>> left
    a left_val
0   1        a
1   5        b
2  10        c
>>> right = pd.DataFrame({'a': [1, 2, 3, 6, 7],
...                       'right_val': [1, 2, 3, 6, 7]})
>>> right
   a  right_val
0  1          1
1  2          2
2  3          3
3  6          6
4  7          7
>>> pd.merge_asof(left, right, on='a')
    a left_val  right_val
0   1        a          1
1   5        b          3
2  10        c          7
>>> pd.merge_asof(left, right, on='a', allow_exact_matches=False)
    a left_val  right_val
0   1        a        NaN
1   5        b        3.0
2  10        c        7.0
>>> pd.merge_asof(left, right, on='a', direction='forward')
    a left_val  right_val
0   1        a        1.0
1   5        b        6.0
2  10        c        NaN
>>> pd.merge_asof(left, right, on='a', direction='nearest')
    a left_val  right_val
0   1        a          1
1   5        b          6
2  10        c          7
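The direction rules above can be written out with binary search over the sorted right keys. This is a standard-library sketch of the semantics, not pandas' implementation; `asof_lookup` is a hypothetical name, and sending "nearest" ties to the backward match is an assumption of this sketch. It reproduces the four outputs above, with `None` standing in for `NaN`.

```python
from bisect import bisect_left, bisect_right


def asof_lookup(keys, vals, x, direction="backward", allow_exact_matches=True):
    """Return the value matched to x from sorted keys/vals, or None."""
    # Backward candidate: last key <= x (or < x when exact matches are disallowed).
    i = bisect_right(keys, x) if allow_exact_matches else bisect_left(keys, x)
    back = i - 1 if i > 0 else None
    # Forward candidate: first key >= x (or > x when exact matches are disallowed).
    j = bisect_left(keys, x) if allow_exact_matches else bisect_right(keys, x)
    fwd = j if j < len(keys) else None

    if direction == "backward":
        k = back
    elif direction == "forward":
        k = fwd
    elif direction == "nearest":
        if back is None or fwd is None:
            k = back if fwd is None else fwd
        else:
            # Ties go to the backward match (a choice made for this sketch).
            k = back if x - keys[back] <= keys[fwd] - x else fwd
    else:
        raise ValueError(f"unknown direction: {direction}")
    return None if k is None else vals[k]


right_a = [1, 2, 3, 6, 7]
right_val = [1, 2, 3, 6, 7]
left_a = [1, 5, 10]

for d in ("backward", "forward", "nearest"):
    print(d, [asof_lookup(right_a, right_val, x, direction=d) for x in left_a])
# backward [1, 3, 7]
# forward [1, 6, None]
# nearest [1, 6, 7]
print([asof_lookup(right_a, right_val, x, allow_exact_matches=False) for x in left_a])
# [None, 3, 7]
```

The `by='ticker'` behaviour in the trades/quotes example amounts to running this lookup separately within each ticker's group of quotes.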