Sunday, 13 June 2021

Conditional operations between factor level pairs

I have a dataframe (df) that contains Start times and End times for observations of different IDs:

df <- structure(list(ID = 1:4, Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00", "2021-05-12 13:30:00", "2021-05-12 13:42:00"), 
End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00", "2021-05-13 8:19:00", "2021-05-13 8:12:00")), 
class = "data.frame", row.names = c(NA, 
-4L))
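One caveat about the data as posted: Start and End are character strings, so `>` compares them as text rather than chronologically. That happens to give the right answer for these four rows, but it can fail when hours are not zero-padded. A minimal sketch of the conversion, assuming all timestamps share one time zone (UTC is an arbitrary choice here) and using a made-up 10:00 value purely to show the failure:

```r
# Same data as in the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  ID = 1:4,
  Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00",
            "2021-05-12 13:30:00", "2021-05-12 13:42:00"),
  End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00",
          "2021-05-13 8:19:00", "2021-05-13 8:12:00"),
  stringsAsFactors = FALSE
)

# As strings, "8:15" sorts after "10:00" because "8" > "1".
# (The 10:00 value is hypothetical, not part of the data.)
string_says_later <- "2021-05-13 8:15:00" > "2021-05-13 10:00:00"

# Parse both columns as date-times; tz = "UTC" is an assumption.
df$Start <- as.POSIXct(df$Start, tz = "UTC")
df$End   <- as.POSIXct(df$End, tz = "UTC")

# As date-times, the same comparison is chronological.
time_says_later <- as.POSIXct("2021-05-13 8:15:00", tz = "UTC") >
  as.POSIXct("2021-05-13 10:00:00", tz = "UTC")
```

With the columns stored as POSIXct, the comparisons later in this post no longer depend on how the hours happen to be written.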

I want to create a new dataframe that shows the latest Start time and the earliest End time for each possible pairwise comparison between the levels of ID.

I was able to accomplish this by making a duplicate column of ID called ID2, using tidyr::expand to generate every combination of the two, and saving that in an object called Pairs:

library(dplyr)
library(tidyr)  # expand() comes from tidyr, not dplyr
df$ID2 <- df$ID
Pairs <- 
  df %>%
  expand(ID, ID2)

I then made two new objects, a and b, that store the Start and End times for each side of the comparison separately, and combined them into df2:

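# a: Start/End of the first ID in each pair
a <- left_join(df, Pairs, by = 'ID') %>%
  rename(StartID1 = Start, EndID1 = End, ID2 = ID2.y) %>%
  select(-ID2.x)
# b: Start/End of the second ID in each pair
b <- left_join(Pairs, df, by = "ID2") %>%
  rename(StartID2 = Start, EndID2 = End) %>%
  select(ID2, StartID2, EndID2)
df2 <- cbind(a, b)
df2 <- df2[, -4]  # drop the first of the two ID2 columns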
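The pairing step above amounts to a cross join of df with itself. As a sketch only, base R's merge() with by = NULL produces every row combination in one call, without the ID2 helper column or the intermediate a and b objects. The "1"/"2" suffixes and the resulting column names (Start1, Start2, and so on) are choices made for this illustration and differ from the StartID1/StartID2 names used above.

```r
# Same data as in the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  ID = 1:4,
  Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00",
            "2021-05-12 13:30:00", "2021-05-12 13:42:00"),
  End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00",
          "2021-05-13 8:19:00", "2021-05-13 8:12:00"),
  stringsAsFactors = FALSE
)

# by = NULL asks for the Cartesian product: each row paired with every row.
# The suffixes are appended to the column names shared by both sides.
pairs <- merge(df, df, by = NULL, suffixes = c("1", "2"))

# Optional: keep each unordered pair once and drop self-pairs.
unique_pairs <- pairs[pairs$ID1 < pairs$ID2, ]
```

Like the expand() approach, the full product includes each ID paired with itself and both orderings of every pair; the optional filter is only needed if those are unwanted.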

Finally, I used dplyr::if_else to find the LatestStart time and the EarliestEnd time for each of the comparisons:

df2 <- 
  df2%>%
  mutate(LatestStart = if_else(StartID1 > StartID2, StartID1, StartID2),
         EarliestEnd = if_else(EndID1 > EndID2, EndID2, EndID1))
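Since LatestStart is the element-wise maximum of the two Start columns and EarliestEnd is the element-wise minimum of the two End columns, base R's pmax() and pmin() can stand in for the if_else() calls. The sketch below is one possible shape rather than a definitive answer: it builds the pairs with a base R cross join, assumes UTC for the timestamps, and uses shortened column names (Start1, Start2, End1, End2) that do not appear in the original code.

```r
# Same data as in the question, rebuilt so the snippet runs on its own.
df <- data.frame(
  ID = 1:4,
  Start = c("2021-05-12 13:22:00", "2021-05-12 13:25:00",
            "2021-05-12 13:30:00", "2021-05-12 13:42:00"),
  End = c("2021-05-13 8:15:00", "2021-05-13 8:17:00",
          "2021-05-13 8:19:00", "2021-05-13 8:12:00"),
  stringsAsFactors = FALSE
)

# Compare real date-times rather than strings; tz = "UTC" is an assumption.
df$Start <- as.POSIXct(df$Start, tz = "UTC")
df$End   <- as.POSIXct(df$End, tz = "UTC")

# Every combination of rows; shared column names get the "1"/"2" suffixes.
pairs <- merge(df, df, by = NULL, suffixes = c("1", "2"))

# pmax()/pmin() take the row-wise later Start and earlier End.
result <- data.frame(
  ID1 = pairs$ID1,
  ID2 = pairs$ID2,
  LatestStart = pmax(pairs$Start1, pairs$Start2),
  EarliestEnd = pmin(pairs$End1, pairs$End2)
)
```

For example, the row pairing ID 1 with ID 4 takes its LatestStart from ID 4 (13:42) and its EarliestEnd from ID 4 as well (8:12), since ID 4 both starts last and ends first.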

This seems like a simple task. Is there a more concise way to achieve this from df without creating all of these extra objects?
