Use duplicated() and cast the result to int with astype:
df['isDup'] = (df['Start time'].duplicated(keep=False) | df['End time'].duplicated(keep=False)).astype(int)
Or did you need to flag rows whose start time falls inside the previous row's interval?
df['isDup'] = (df['Start time'].between(df['Start time'].shift(), df['End time'].shift())).astype(int)
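To make the first variant concrete, here is a minimal sketch on made-up rows (the sample values are an assumption; only the column names come from the snippet above). It shows that keep=False flags every member of a repeated group, not only the later repeats.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the answer above.
df = pd.DataFrame({
    "Start time": ["00:12:38", "00:55:14", "01:00:02", "01:00:02", "01:41:22"],
    "End time":   ["01:00:02", "01:00:02", "01:32:40", "01:08:40", "03:56:23"],
})

# keep=False marks all occurrences of a repeated value, including the first one.
dup_start = df["Start time"].duplicated(keep=False)
dup_end = df["End time"].duplicated(keep=False)

# A row is flagged when either its start or its end is shared with another row.
df["isDup"] = (dup_start | dup_end).astype(int)
is_dup = df["isDup"].tolist()
print(df)
```

Note that this only catches identical start or end values; it does not detect one interval lying inside another.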
Map the time-like values in the columns start_time and end_time to pandas Timedelta objects, then replace every 00:00:00 value in the end_time column with 23:59:59 (one day minus one second), so that an interval ending at midnight ends after it starts.
c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T
e[e == pd.Timedelta(0)] += pd.Timedelta(days = 1, seconds = -1)
Then, for each pair of start_time and end_time in the dataframe df, mark the corresponding duplicate intervals using numpy broadcasting:
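# m[i, j] is True when interval i lies inside interval j
m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)  # an interval is not a duplicate of itself
df['isDupe'] = (m.any(axis=1) & ~df[c].duplicated(keep=False)).astype('i1')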
# example 1
  start_time  end_time  isDupe
0   00:12:38  01:00:02       0
1   00:55:14  01:00:02       1
2   01:00:02  01:32:40       0
3   01:00:02  01:08:40       1
4   01:41:22  03:56:23       0
5   18:58:26  19:16:49       0
6   20:12:37  20:52:49       0
7   20:55:16  22:02:50       0
8   22:21:24  22:48:50       0
9   23:11:30  00:00:00       0

# example 2
  start_time  end_time  isDupe
0   13:32:54  13:32:55       0
1   13:36:10  13:50:16       0
2   13:37:54  13:38:14       1
3   13:46:38  13:46:45       1
4   13:48:59  13:49:05       1
5   13:50:16  13:50:20       0
6   14:03:39  14:03:49       0
7   15:36:20  15:36:20       0
8   15:46:47  15:46:47       0
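For anyone who wants to run the broadcasting approach end to end, here is a self-contained sketch that rebuilds example 1 and reproduces its isDupe column. It departs from the snippet above in two small ways, both my own choices: the midnight fix uses np.where instead of in-place assignment (in case the array returned by to_numpy is read-only), and the final cast uses astype.

```python
import numpy as np
import pandas as pd

# Data from "example 1" above.
df = pd.DataFrame({
    "start_time": ["00:12:38", "00:55:14", "01:00:02", "01:00:02", "01:41:22",
                   "18:58:26", "20:12:37", "20:55:16", "22:21:24", "23:11:30"],
    "end_time":   ["01:00:02", "01:00:02", "01:32:40", "01:08:40", "03:56:23",
                   "19:16:49", "20:52:49", "22:02:50", "22:48:50", "00:00:00"],
})
c = ["start_time", "end_time"]

# Convert both columns to timedelta64 arrays.
s, e = df[c].apply(pd.to_timedelta).to_numpy().T

# An end time of 00:00:00 means "end of day": treat it as 23:59:59.
end_of_day = np.timedelta64(24 * 3600 - 1, "s")
e = np.where(e == np.timedelta64(0, "ns"), end_of_day, e)

# contained[i, j] is True when interval i lies inside interval j.
contained = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(contained, False)  # ignore self-comparison

# Rows that are exact copies of another row are excluded, as in the answer.
exact_copies = df[c].duplicated(keep=False).to_numpy()
df["isDupe"] = (contained.any(axis=1) & ~exact_copies).astype("int8")
print(df)
```

The comparison matrix has n x n entries, so this is fine for a few thousand rows but may need a different approach for very large frames.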
Here's my solution to the above question. However, if there is a more efficient way, I would be happy to accept it. Thanks!
def getDuplicate(data):
    # Compare every row seen so far against the start time of the last row
    data['check_time'] = data.iloc[-1]['start_time']
    data['isDup'] = data.apply(lambda x: 1
        if (x['start_time'] <= x['check_time']) & (x['check_time'] < x['end_time'])
        else 0, axis = 1)
    return data['isDup'].sum()

limit = 1
df_copy = df.copy()
df['isDup'] = 0
for i, row in df.iterrows():
    data = df_copy.iloc[: limit]
    isDup = getDuplicate(data)
    limit = limit + 1
    if isDup > 1:
        df.at[i, 'isDup'] = 1
    else:
        df.at[i, 'isDup'] = 0
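As a possible faster alternative, the loop above can be expressed with numpy broadcasting. This is only a sketch under my reading of the loop: row i is flagged when more than one of the rows 0..i (row i included) satisfies start <= start_i < end. The data is example 2 from the earlier answer, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Data from "example 2" above.
df = pd.DataFrame({
    "start_time": ["13:32:54", "13:36:10", "13:37:54", "13:46:38", "13:48:59",
                   "13:50:16", "14:03:39", "15:36:20", "15:46:47"],
    "end_time":   ["13:32:55", "13:50:16", "13:38:14", "13:46:45", "13:49:05",
                   "13:50:20", "14:03:49", "15:36:20", "15:46:47"],
})

s = pd.to_timedelta(df["start_time"]).to_numpy()
e = pd.to_timedelta(df["end_time"]).to_numpy()

# covers[i, j] is True when interval j contains the start of row i:
# s[j] <= s[i] < e[j]
covers = (s[None, :] <= s[:, None]) & (s[:, None] < e[None, :])

# Keep only columns j <= i, mirroring df_copy.iloc[:limit] in the loop.
covers = np.tril(covers)

# The loop flags a row when the count (which may include the row itself) exceeds 1.
df["isDup"] = (covers.sum(axis=1) > 1).astype(int)
print(df)
```

This trades the per-row apply for a single n x n boolean matrix, so memory grows quadratically with the number of rows.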
The LAG() function is useful for comparing the value of the current row with the value of a previous row. In other words, from the current row you can access data of the previous row, or the row before the previous row, and so on.
The following shows the syntax of the LAG() function:
LAG(return_value, offset [, default])
OVER (
    [PARTITION BY partition_expression, ...]
    ORDER BY sort_expression [ASC | DESC], ...
)
The following query shows the data from the sales.vw_netsales_brands view:
SELECT
*
FROM
sales.vw_netsales_brands
ORDER BY
year,
month,
brand_name,
net_sales;
This example uses the LAG() function to return the net sales of the current month and the previous month in the year 2018:
WITH cte_netsales_2018 AS(
    SELECT
        month,
        SUM(net_sales) net_sales
    FROM
        sales.vw_netsales_brands
    WHERE
        year = 2018
    GROUP BY
        month
)
SELECT
    month,
    net_sales,
    LAG(net_sales, 1) OVER(
        ORDER BY month
    ) previous_month_sales
FROM
    cte_netsales_2018;
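To try the same pattern without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. The table name netsales and all of its rows are invented for illustration, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory database with a hypothetical stand-in for sales.vw_netsales_brands.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netsales (month INTEGER, brand_name TEXT, net_sales REAL)"
)
conn.executemany(
    "INSERT INTO netsales VALUES (?, ?, ?)",
    [
        (1, "A", 100.0), (1, "B", 50.0),
        (2, "A", 120.0), (2, "B", 40.0),
        (3, "A", 90.0), (3, "B", 60.0),
    ],
)

# Total the sales per month, then pull the previous month's total with LAG.
rows = conn.execute(
    """
    WITH monthly AS (
        SELECT month, SUM(net_sales) AS net_sales
        FROM netsales
        GROUP BY month
    )
    SELECT month,
           net_sales,
           LAG(net_sales, 1) OVER (ORDER BY month) AS previous_month_sales
    FROM monthly
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The first month has no predecessor, so its previous_month_sales comes back as NULL (None in Python).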
To compare the sales of the current month with the previous month of net sales by brand in 2018, you use the following query:
WITH cte_sales AS(
    SELECT
        month,
        brand_name,
        net_sales,
        LAG(net_sales, 1) OVER(
            PARTITION BY brand_name
            ORDER BY month
        ) previous_sales
    FROM
        sales.vw_netsales_brands
    WHERE
        year = 2018
)
SELECT
month,
brand_name,
net_sales,
previous_sales,
FORMAT(
(net_sales - previous_sales) / previous_sales,
'P'
) vs_previous_month
FROM
cte_sales;
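Since the thread started with pandas, it may help to see a rough pandas counterpart of LAG with PARTITION BY: a groupby followed by shift. The frame below is invented sample data, and the column names simply mirror the query above.

```python
import pandas as pd

# Hypothetical monthly net sales for two brands.
sales = pd.DataFrame({
    "month": [1, 2, 3, 1, 2, 3],
    "brand_name": ["A", "A", "A", "B", "B", "B"],
    "net_sales": [100.0, 120.0, 90.0, 50.0, 40.0, 60.0],
})

# ORDER BY month within each brand.
sales = sales.sort_values(["brand_name", "month"]).reset_index(drop=True)

# shift(1) inside each group plays the role of
# LAG(net_sales, 1) OVER (PARTITION BY brand_name ORDER BY month).
sales["previous_sales"] = sales.groupby("brand_name")["net_sales"].shift(1)

# Relative change versus the previous month (NaN for each brand's first month).
sales["vs_previous_month"] = (
    (sales["net_sales"] - sales["previous_sales"]) / sales["previous_sales"]
)
print(sales)
```

The ratio is left as a plain float here; formatting it as a percentage, as FORMAT(..., 'P') does in the query, would be a separate display step.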
In this article, you will learn how to fetch the previous row's value both with and without the LAG function. The LAG() function returns data from a previous row based on an offset value. It has been available as a window function since SQL Server 2012, and it is a handy tool for comparing current and previous row values.
I have created a table with four columns: id, name, stack, and salary. For this article, concentrate on id, a column of sequential numbers such as 1, 2, 3, and so on. The task is to find, for each row, the id value of the preceding row.
CREATE DATABASE emp_deatails;
CREATE TABLE details(
id int,
name varchar(50),
stack varchar(50),
salary int
);
INSERT INTO details
VALUES(1, 'ishika', 'Graphite GTC', 100);
INSERT INTO details
VALUES(2, 'deepak', 'CMS', 2000);
INSERT INTO details
VALUES(3, 'nitin', 'Unity', 40000);
INSERT INTO details
VALUES(4, 'abhishek', 'IOS', 60000);
INSERT INTO details
VALUES(5, 'vijay', '.NET', 80000);
INSERT INTO details
VALUES(6, 'Vijay kumari ', 'Hiring', 90000);
SELECT * FROM details;
SELECT
id,
LAG(id) OVER(ORDER BY id) AS previous_id,
name,
stack,
salary
FROM details;
Without LAG, you can use an aggregate over a window frame instead. Either MIN or MAX works and the result stays the same, because the frame never holds more than one row. The ORDER BY clause stays as it was; the only addition is the frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING, which restricts the window to the previous row. Executing the query returns the same result as before: for id 1 the previous value is NULL, for 2 we get 1, for 3 we get 2, for 4 we get 3, and so on.
SELECT
id,
MIN(id) OVER(ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS previous_id,
name,
stack,
salary
FROM details;
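The claim that both queries return the same previous_id can be checked quickly. Below is a small sketch using Python's sqlite3 module rather than SQL Server (an assumption on my part: SQLite 3.25 or newer, which supports the same window frame syntax), with a cut-down version of the details table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO details VALUES (?, ?)",
    [(1, "ishika"), (2, "deepak"), (3, "nitin"), (4, "abhishek")],
)

# Compute the previous id twice: once with LAG, once with a one-row frame.
rows = conn.execute(
    """
    SELECT id,
           LAG(id) OVER (ORDER BY id) AS lag_previous_id,
           MIN(id) OVER (
               ORDER BY id
               ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
           ) AS frame_previous_id
    FROM details
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

For the first row the frame is empty, so MIN returns NULL, matching what LAG gives when there is no previous row.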
07-06-2022 7:07 PM , 08-06-2022 1:55 PM , 08-06-2022 1:50 PM
SELECT
    COUNT(ORDERS) AS TOTAL_ORDERS, --counting total orders
    COUNT(DISTINCT ORDERS) TOTAL_DIST_ORDERS, --counting distinct orders
    LOCAL.TOTAL_ORDERS / LOCAL.TOTAL_DIST_ORDERS AS RATIO_ORDERS, --ratio of orders
    AVG(LOCAL.RATIO_ORDERS) OVER(ORDER BY DATE ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS MOVING_AVERAGE, --moving average of ratio of orders for the past 4 months
    LOCAL.MOVING_AVERAGE - LAG(MOVING_AVERAGE, 1) OVER(ORDER BY LOCAL.DATE) AS day_difference --calculate the difference of today with the previous day
FROM ORDER_TABLE
Hello everyone, I found a solution:
SELECT
    A.*,
    MOVING_AVERAGE - LAG(MOVING_AVERAGE, 1) OVER(ORDER BY DATE) AS day_difference --calculate the difference of today with the previous day
FROM (
    SELECT
        COUNT(ORDERS) AS TOTAL_ORDERS, --counting total orders
        COUNT(DISTINCT ORDERS) TOTAL_DIST_ORDERS, --counting distinct orders
        LOCAL.TOTAL_ORDERS / LOCAL.TOTAL_DIST_ORDERS AS RATIO_ORDERS, --ratio of orders
        AVG(LOCAL.RATIO_ORDERS) OVER(ORDER BY DATE ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS MOVING_AVERAGE --moving average of ratio of orders for the past 4 months
    FROM ORDER_TABLE
) AS A
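The idea behind this solution is that a window function's result cannot be fed into another window function in the same SELECT, so the moving average is computed in a subquery and LAG is applied one level up. Here is a minimal sketch of that two-level shape using Python's sqlite3 module; the ratios table, its dates, and its values are all made up, and the ratio is stored directly instead of being derived from order counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one precomputed order ratio per date.
conn.execute("CREATE TABLE ratios (dat TEXT, ratio_orders REAL)")
conn.executemany(
    "INSERT INTO ratios VALUES (?, ?)",
    [
        ("2022-01-01", 1.0), ("2022-02-01", 2.0), ("2022-03-01", 3.0),
        ("2022-04-01", 4.0), ("2022-05-01", 5.0), ("2022-06-01", 6.0),
    ],
)

# Inner query: moving average over the current row and the 4 rows before it.
# Outer query: difference between each moving average and the previous one.
rows = conn.execute(
    """
    SELECT a.dat,
           a.moving_average,
           a.moving_average
               - LAG(a.moving_average, 1) OVER (ORDER BY a.dat) AS day_difference
    FROM (
        SELECT dat,
               AVG(ratio_orders) OVER (
                   ORDER BY dat
                   ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
               ) AS moving_average
        FROM ratios
    ) AS a
    ORDER BY a.dat
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Note that ROWS BETWEEN 4 PRECEDING AND CURRENT ROW spans up to five rows (the current one plus four before it), which is worth keeping in mind when reading it as "the past 4 months".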
I can't quite make out how your order_table works. Since you work with row offsets in your window, I'm assuming every row is for a month? But then, how would we end up with different count and count_dist values? Well, not too important here.
Maybe this could help you a bit further:
with order_table as(
    select 4711 as orders, 10 as val, to_date('20100101', 'YYYYMMDD') as dat from dual
    union all select 4711, 12, to_date('20100101', 'YYYYMMDD') from dual
    union all select 4712, 15, to_date('20100101', 'YYYYMMDD')
), lv2 as(
    select count(orders) as total_orders, count(distinct orders) as total_dist_orders, local.total_orders / local.total_dist_orders as ratio_orders, dat
    --, avg(local.ratio_orders) over(order by dat rows between 4 preceding and current row) as moving_average
    --, local.moving_average - lag(local.moving_average, 1) over(order by dat) as day_difference
    from order_table group by dat
)
select
a.*, avg(ratio_orders) over(order by dat rows between 4 preceding and current row) as moving_average
from lv2 a
;