I have a dataframe (df1) that contains Start times and End times for observations of different IDs:
df <- structure(list(ID = 1:4, Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00", "2021-05-12 13:30:00", "2021-05-12 13:42:00"),
End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00", "2021-05-13 8:19:00", "2021-05-13 8:12:00")),
class = "data.frame", row.names = c(NA,
-4L))
I want to create a new dataframe that shows the latest Start time and the earliest End time for each possible pairwise comparison between the levels ofID.
I was able to accomplish this by making a duplicate column of ID called ID2, using dplyr::expand to expand them, and saving that in an object called Pairs:
library(dplyr)
df$ID2 <- df$ID
Pairs <-
df%>%
expand(ID, ID2)
Making two new objects a and b that store the Start and End times for each comparison separately, and then combining them into df2:
a <- left_join(df, Pairs, by = 'ID')%>%
rename(StartID1 = Start, EndID1 = End, ID2 = ID2.y)%>%
select(-ID2.x)
b <- left_join(Pairs, df, by = "ID2")%>%
rename(StartID2 = Start, EndID2 = End)%>%
select(ID2, StartID2, EndID2)
df2 <- cbind(a,b)
df2 <- df2[,-4]
and finally using dplyr::if_else to find the LatestStart time and the EarliestEnd time for each of the comparisons:
df2 <-
df2%>%
mutate(LatestStart = if_else(StartID1 > StartID2, StartID1, StartID2),
EarliestEnd = if_else(EndID1 > EndID2, EndID2, EndID1))
This seems like such a simple task to perform, is there a more concise way to achieve this from df1 without creating all of these extra objects?
Aucun commentaire:
Enregistrer un commentaire