I would like help improving my R code (and my knowledge of R). My code works, but I think it could be more efficient, and any help is appreciated.
For example, take this list, which holds one participant's data frame:
```r
list(`31457` = structure(list(by5min = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("2018-08-06 23:20:00", "2018-08-06 23:25:00", "2018-08-06 23:30:00",
"2018-08-06 23:35:00", "2018-08-06 23:40:00", "2018-08-06 23:45:00",
"2018-08-06 23:50:00", "2018-08-06 23:55:00", "2018-08-07 00:00:00",
"2018-08-07 00:05:00", "2018-08-07 00:10:00"), class = "factor"),
HR = c(90.1966666666667, 94.99, 95.54, 91.2633333333333,
93.37, 92.3466666666667, 89.0933333333333, 90.92, 91.0533333333333,
96.7666666666667, 93.3533333333333),
WeekDay = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Mon", "Tue"), class = c("ordered","factor")), Hour = c(23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 0L, 0L, 0L),
YearDay = c(218, 218, 218, 218, 218, 218, 218, 218, 219, 219, 219),
Name = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "31457", class = "factor")), row.names = 245:255, class = "data.frame"))
```
When I print it, I get the following (keep in mind that this is a tiny subset; the data typically span 1000+ rows):
```
$`31457`
                 by5min       HR WeekDay Hour YearDay  Name
245 2018-08-06 23:20:00 90.19667     Mon   23     218 31457
246 2018-08-06 23:25:00 94.99000     Mon   23     218 31457
247 2018-08-06 23:30:00 95.54000     Mon   23     218 31457
248 2018-08-06 23:35:00 91.26333     Mon   23     218 31457
249 2018-08-06 23:40:00 93.37000     Mon   23     218 31457
250 2018-08-06 23:45:00 92.34667     Mon   23     218 31457
251 2018-08-06 23:50:00 89.09333     Mon   23     218 31457
252 2018-08-06 23:55:00 90.92000     Mon   23     218 31457
253 2018-08-07 00:00:00 91.05333     Tue    0     219 31457
254 2018-08-07 00:05:00 96.76667     Tue    0     219 31457
255 2018-08-07 00:10:00 93.35333     Tue    0     219 31457
```
Inside a loop, I then convert each list element into a data frame and use `left_join` to add information about specific dates (the MRI date of each participant) from a separate data frame, `DF2`. This allows me to split the data into two groups: pre-MRI and post-MRI.
```r
library(dplyr)
library(lubridate)

DF_Pre  <- list()  # pre-MRI results, one element per participant
DF_Post <- list()  # post-MRI results, one element per participant

for (y in seq_along(DFx)) {
  # [[ ]] returns the data frame itself with its original column names
  # (DFx[y] %>% data.frame() prefixes every name, hence the old renaming step)
  DF_Joined_tmp <- DFx[[y]]

  # skip the iteration if the element is not a data frame (NULL/NA) or is empty
  if (!is.data.frame(DF_Joined_tmp) || nrow(DF_Joined_tmp) == 0 || ncol(DF_Joined_tmp) <= 1) next

  DF_Joined_tmp$Name <- as.character(DF_Joined_tmp$Name)

  # join by MRI date/ID once; both groups are filtered from the same result
  DF_joined <- left_join(DF_Joined_tmp, DF2, by = c("YearDay" = "Day_MRI", "Name" = "Whoop_ID"))
  mri_day <- mean(yday(DF_joined$MRI_DATE), na.rm = TRUE)  # NaN if no MRI date matched

  # Pre: on or before the MRI day; Post: after it. Keep only the original columns.
  DF_Prex  <- DF_joined %>% filter(YearDay <= mri_day) %>% select(all_of(names(DF_Joined_tmp)))
  DF_Postx <- DF_joined %>% filter(YearDay >  mri_day) %>% select(all_of(names(DF_Joined_tmp)))
```
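As an alternative to converting and joining one list element at a time, the whole list can be stacked into one data frame and every row labelled "Pre" or "Post" in a single pass. This is a minimal sketch under stated assumptions: `DF2` is assumed (hypothetically) to have one row per participant with the columns `Whoop_ID` and `MRI_DATE`, and, unlike the loop, it joins on the participant ID only, so every row receives its participant's MRI day. The second participant `31458` and all values are made up.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for DFx: a named list of per-participant data frames
DFx <- list(
  `31457` = data.frame(HR = c(90.2, 95.0, 91.1, 93.4),
                       YearDay = c(217, 218, 219, 220),
                       Name = factor("31457")),
  `31458` = data.frame(HR = c(88.4, 87.9),
                       YearDay = c(230, 231),
                       Name = factor("31458"))
)

# Hypothetical MRI dates, one per participant (2018-08-06 is day 218)
DF2 <- data.frame(Whoop_ID = c("31457", "31458"),
                  MRI_DATE = as.Date(c("2018-08-06", "2018-08-18")))

# Drop elements that are not data frames or have no rows
DFx <- Filter(function(d) is.data.frame(d) && nrow(d) > 0, DFx)

DF_labelled <- bind_rows(DFx) %>%                 # one data frame for all participants
  mutate(Name = as.character(Name)) %>%
  left_join(DF2, by = c("Name" = "Whoop_ID")) %>% # a single join
  mutate(MRI_DAY = yday(MRI_DATE),
         Period  = if_else(YearDay <= MRI_DAY, "Pre", "Post")) %>%
  select(-MRI_DATE, -MRI_DAY)                     # drop helper columns by name
```

Rows of a participant without an MRI date get `NA` in `Period`, so that case has to be handled explicitly. Dropping the helper columns by name also avoids depending on their position, as `-c(7:11)` does.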
After the `DF_Prex`/`DF_Postx` data frames are created, I run them through an if/else block (still inside the loop), which is where I think my code can be improved the most:
```r
  # sort the processed data into the pre/post lists
  if (nrow(DF_Prex) == 0 && nrow(DF_Postx) == 0) {
    # no pre/post split possible (no MRI date matched): keep everything as post
    DF_Post[[y]] <- DF_Joined_tmp
  } else {
    # count distinct days: one day is stored as a data frame,
    # several days as a list of data frames (one per YearDay)
    if (length(unique(DF_Prex$YearDay)) == 1) {
      DF_Pre[[y]] <- DF_Prex
    } else {
      DF_Pre[[y]] <- split(DF_Prex, DF_Prex$YearDay)
    }
    if (length(unique(DF_Postx$YearDay)) == 1) {
      DF_Post[[y]] <- DF_Postx
    } else {
      DF_Post[[y]] <- split(DF_Postx, DF_Postx$YearDay)
    }
  }
}
remove(DF_Joined_tmp, DF_Prex, DF_Postx) # tidy workspace
```
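The one-day/several-days branches can probably be dropped: `split()` returns a named list of data frames whether there is one day, several days or none, so every element of `DF_Pre` and `DF_Post` would have the same shape and later steps would not need to check which form they received. A base-R sketch with made-up values:

```r
# Hypothetical pre-MRI rows: a single day of data
one_day <- data.frame(HR = c(90.2, 95.0), YearDay = c(218, 218))
# Hypothetical post-MRI rows: two days of data
two_days <- data.frame(HR = c(91.1, 93.4, 96.8), YearDay = c(219, 219, 220))

# split() gives a named list of data frames, one per YearDay, in both cases
pre_split  <- split(one_day,  one_day$YearDay)
post_split <- split(two_days, two_days$YearDay)

length(pre_split)   # 1 element, named "218"
names(post_split)   # "219" "220"

# With no rows at all, the result is an empty list rather than an error
empty <- one_day[0, ]
length(split(empty, empty$YearDay))  # 0
```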
I looked into using `case_when`, but I'm not sure how to apply it when I have two data frames.
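`case_when()` evaluates its conditions on the columns of a single data frame, so it fits once the MRI day has been joined onto the heart-rate rows; separate pre and post data frames are then not needed. A minimal sketch on made-up rows that already carry an `MRI_DAY` column (an assumption: in practice it would come from the join with `DF2`). The first condition mirrors the loop, where a participant without an MRI date ends up in the post group.

```r
library(dplyr)

# Hypothetical rows after the join: MRI_DAY is NA for a participant with no MRI date
DF_labelled <- data.frame(Name    = c("31457", "31457", "31458"),
                          YearDay = c(217, 219, 230),
                          MRI_DAY = c(218, 218, NA)) %>%
  mutate(Period = case_when(
    is.na(MRI_DAY)     ~ "Post",  # no MRI date: everything goes to post, as in the loop
    YearDay <= MRI_DAY ~ "Pre",   # on or before the MRI day
    TRUE               ~ "Post"   # after the MRI day
  ))
```

Conditions are checked in order and the first `TRUE` one wins, which is why the `NA` case is listed first.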
- I need to split the data into lists by day of the year, nested under each participant ID.
- So each participant's list would hold a nested list of separate days.
- The output would be:
  - PRE group: a list of data frames
  - POST group: a list of data frames
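Once each row carries a Pre/Post label, the nested output described above can be built with `split()` and `lapply()` instead of index bookkeeping inside a loop. A base-R sketch, assuming a labelled data frame with the columns `Period`, `Name` and `YearDay` (the `Period` column and all values are hypothetical):

```r
# Hypothetical labelled data for two participants
DF_labelled <- data.frame(
  Name    = c("31457", "31457", "31457", "31458", "31458"),
  YearDay = c(217, 218, 219, 230, 231),
  HR      = c(90.2, 95.0, 91.1, 88.4, 87.9),
  Period  = c("Pre", "Pre", "Post", "Pre", "Post")
)

# Split by Period, then by participant, then by day of the year
split_days <- function(d) split(d, d$YearDay)
nested <- lapply(split(DF_labelled, DF_labelled$Period), function(p) {
  lapply(split(p, p$Name), split_days)
})

names(nested)                 # "Post" "Pre"
names(nested$Pre)             # "31457" "31458"
names(nested$Pre[["31457"]])  # "217" "218"
```

A single day is then reached as `nested$Pre[["31457"]][["218"]]`. If `Name` is a factor, `split(p, p$Name, drop = TRUE)` avoids empty entries for participants who have no rows in a period.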
I am asking for help simplifying this because I then use similar loops to further split the data and add information from other data frames.
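One way to avoid writing a new nested loop for each later step is a small helper that applies a function to every day-level data frame while keeping the Period, participant and day nesting. In this sketch `map_days()` is a hypothetical name, and the extra data frame with its `Sleep` column is made up to stand in for "information from other data frames"; the per-day join uses base `merge()` with `all.x = TRUE`, which behaves like a left join.

```r
# Hypothetical helper: apply f to every day-level data frame in a
# Period -> participant -> day nested list, keeping the same nesting
map_days <- function(nested, f) {
  lapply(nested, function(period) {
    lapply(period, function(participant) lapply(participant, f))
  })
}

# Hypothetical nested list with one day in each group
nested <- list(
  Pre  = list(`31457` = list(`218` = data.frame(Name = "31457", YearDay = 218, HR = c(90.2, 95.0)))),
  Post = list(`31457` = list(`219` = data.frame(Name = "31457", YearDay = 219, HR = 91.1)))
)

# Hypothetical extra information, keyed by participant and day
extra <- data.frame(Name = "31457", YearDay = c(218, 219), Sleep = c(7.5, 6.8))

# Add the matching extra columns to every day (left join via merge)
nested2 <- map_days(nested, function(d) {
  merge(d, extra, by = c("Name", "YearDay"), all.x = TRUE)
})

nested2$Pre[["31457"]][["218"]]$Sleep  # 7.5 7.5
```

Often it is simpler still to do such joins on the flat data frame before splitting, so that the nested list is built only once, at the very end.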