if-statement: How to simplify if-statement with multiple data frames/conditions in a list, in R?

Monday, 11 November 2019

I would like help to improve my code/knowledge of R. My code works, but I think it can be more efficient and any help is appreciated.

I have a list (DFx) of nested lists of data frames, so:
- DFx[1] = list of length 1
  - DFx[[1]] = data frame
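
For reference, a minimal sketch of how the two ways of indexing such a list behave in R. The object `DFx_demo` and its values are made up for illustration and are not the real data:

```r
# Hypothetical stand-in for DFx: a named list holding one data frame
DFx_demo <- list(`31457` = data.frame(HR = c(90.2, 95.0), YearDay = c(218, 219)))

sub_list <- DFx_demo[1]    # single bracket: still a list, of length 1
sub_df   <- DFx_demo[[1]]  # double bracket: the data frame itself

class(sub_list)
class(sub_df)
```

Single brackets keep the list wrapper (and the participant ID as its name), which is why the output below starts with the ID.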

If I call on a data frame within this list, for example:

list(`31457` = structure(list(
  by5min = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L),
                     .Label = c("2018-08-06 23:20:00", "2018-08-06 23:25:00", "2018-08-06 23:30:00",
                                "2018-08-06 23:35:00", "2018-08-06 23:40:00", "2018-08-06 23:45:00",
                                "2018-08-06 23:50:00", "2018-08-06 23:55:00", "2018-08-07 00:00:00",
                                "2018-08-07 00:05:00", "2018-08-07 00:10:00"),
                     class = "factor"),
  HR = c(90.1966666666667, 94.99, 95.54, 91.2633333333333,
         93.37, 92.3466666666667, 89.0933333333333, 90.92, 91.0533333333333,
         96.7666666666667, 93.3533333333333),
  WeekDay = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
                      .Label = c("Mon", "Tue"), class = c("ordered", "factor")),
  Hour = c(23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 0L, 0L, 0L),
  YearDay = c(218, 218, 218, 218, 218, 218, 218, 218, 219, 219, 219),
  Name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = "31457", class = "factor")),
  row.names = 245:255, class = "data.frame"))

I get the following (keep in mind this is a tiny subset; the data typically spans 1000+ rows):

$`31457`
                 by5min       HR WeekDay Hour YearDay  Name
245 2018-08-06 23:20:00 90.19667     Mon   23     218 31457
246 2018-08-06 23:25:00 94.99000     Mon   23     218 31457
247 2018-08-06 23:30:00 95.54000     Mon   23     218 31457
248 2018-08-06 23:35:00 91.26333     Mon   23     218 31457
249 2018-08-06 23:40:00 93.37000     Mon   23     218 31457
250 2018-08-06 23:45:00 92.34667     Mon   23     218 31457
251 2018-08-06 23:50:00 89.09333     Mon   23     218 31457
252 2018-08-06 23:55:00 90.92000     Mon   23     218 31457
253 2018-08-07 00:00:00 91.05333     Tue    0     219 31457
254 2018-08-07 00:05:00 96.76667     Tue    0     219 31457
255 2018-08-07 00:10:00 93.35333     Tue    0     219 31457

I then convert each element of this list into a data frame and use left_join to add information about specific dates from a separate data frame (DF2). This lets me split the data into two groups (pre vs. post MRI).

library(dplyr)  # provides %>%, left_join() and filter()

for (y in seq_along(DFx)) {
  DF_Joined_tmp = DFx[y] %>%
    data.frame()

  # skip iteration if DF is empty (all NA, or at most one column)
  if (all(is.na(DF_Joined_tmp))) {next}
  if (ncol(DF_Joined_tmp) <= 1) {next}

  # clean
  colnames(DF_Joined_tmp)[5:6] = c("YearDay", "Name")
  DF_Joined_tmp$Name = as.character(DF_Joined_tmp$Name)

  # Pre
  DF_Prex = left_join(DF_Joined_tmp, DF2, by = c("YearDay" = "Day_MRI", "Name" = "Whoop_ID")) # join by MRI date/ID
  DF_Prex$MRI_DAY = lubridate::yday(DF_Prex$MRI_DATE)
  DF_Prex = filter(DF_Prex, DF_Prex$YearDay <= mean(DF_Prex$MRI_DAY, na.rm = TRUE)) # if less than/equal to MRI date, set as PRE
  DF_Prex = DF_Prex[,-c(7:11)] # clean DF

  # Post
  DF_Postx = left_join(DF_Joined_tmp, DF2, by = c("YearDay" = "Day_MRI", "Name" = "Whoop_ID"))
  DF_Postx$MRI_DAY = lubridate::yday(DF_Postx$MRI_DATE)
  DF_Postx = filter(DF_Postx, DF_Postx$YearDay > mean(DF_Postx$MRI_DAY, na.rm = TRUE)) # if greater than MRI date, set as POST
  DF_Postx = DF_Postx[,-c(7:11)] # clean DF

After the Prex/Postx DFs are created, I run them through an if/else block, which is where I think my code can be improved most:

# loop through processed data
  if (nrow(DF_Prex) == 0 && nrow(DF_Postx) == 0) {  # if there are no pre/post dates
    DF_Post[[y]] = DF_Joined_tmp[,-c(7:11)]

  } else {

    if (length(unique(DF_Prex$YearDay)) == 1) { # if there is only one day of data (one unique YearDay), then:
      DF_Pre[[y]] = DF_Prex
    } else {
      DF_Prex = split(DF_Prex, DF_Prex$YearDay)
      DF_Pre[[y]] = DF_Prex
    }

    if (length(unique(DF_Postx$YearDay)) == 1) { # if there is only one day of data (one unique YearDay), then:
      DF_Post[[y]] = DF_Postx # indexed by y, the loop variable
    } else {
      DF_Postx = split(DF_Postx, DF_Postx$YearDay)
      DF_Post[[y]] = DF_Postx
    }

  }
}

remove(DF_Joined_tmp, DF_Prex, DF_Postx) # tidy workspace

I looked into using case_when, but I'm not sure how to apply it when I have 2 data frames.
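
For context, case_when is vectorised: it chooses a value for each element of a vector, so it is not a natural fit for deciding what to do with a whole data frame. Because the Pre and Post branches repeat the same logic, one possible direction is to move that logic into a small helper and call it twice. The sketch below is only an illustration under assumptions: the helper name `split_by_day`, the data in `demo`, and the value of `mri_day` are all made up, and it uses base R only.

```r
# Hypothetical helper: keep a single-day (or empty) data frame as is,
# otherwise split it into a list of data frames, one per YearDay.
split_by_day <- function(df) {
  if (length(unique(df$YearDay)) <= 1) {
    return(df)
  }
  split(df, df$YearDay)
}

# Made-up data in the same shape as the example above
demo <- data.frame(
  HR      = c(90.2, 95.0, 91.1, 96.8),
  YearDay = c(217, 218, 218, 219),
  Name    = "31457"
)
mri_day <- 218  # assumed MRI day of year for this participant

pre  <- split_by_day(demo[demo$YearDay <= mri_day, ])  # two days: list of data frames
post <- split_by_day(demo[demo$YearDay >  mri_day, ])  # one day: plain data frame
```

With a helper like this, the two inner if/else blocks in the loop would each reduce to a single assignment.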

I need to split the data into lists by day of the year while nested under the same participant ID.
So each list would hold nested lists of separate days.
The output would be:

- PRE group: list of DFs
- POST group: list of DFs
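
To make the target shape concrete, here is a minimal sketch of a list nested by participant ID and then by day of the year. It uses made-up data (`DFx_demo`, including the second ID `31458`, is hypothetical) and ignores the pre/post split:

```r
# Made-up list of per-participant data frames (IDs and values are illustrative)
DFx_demo <- list(
  `31457` = data.frame(HR = c(90.2, 95.0, 91.1), YearDay = c(218, 218, 219)),
  `31458` = data.frame(HR = c(88.4, 87.9),       YearDay = c(220, 221))
)

# One entry per participant, each holding a list of data frames keyed by day of year
by_day <- lapply(DFx_demo, function(d) split(d, d$YearDay))

str(by_day, max.level = 2)
```

Each participant's entry can then be reached by ID and day, for example `by_day[["31457"]][["218"]]`.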