Problem
I have a data frame, df, with 82 columns. After the fourth column, the remaining columns come in groups of three that share the same prefix, e.g. mass.mean, mass.stdev, mass.rsd, density.mean, density.stdev, density.rsd, etc. I need to:
1) Match the triplicate columns by a partial string (e.g. mass or density), and
2) Replace the values in the matched columns with a specific computation depending on which string matched (e.g. if a column name contains mass, replace NAs with the previous value, i.e. impute with zoo::na.locf; if a column name contains density, replace NAs with zero).
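A minimal base-R sketch of both steps, assuming the measurement columns are named `<prefix>.<stat>` and spelled `mass` and `density` as in the question text. The patterns are anchored (`^mass\\.`) so that a hypothetical column such as `biomass.mean` would not be caught by the mass rule. `locf` is a hypothetical helper that carries the last non-NA value forward; it stands in for `zoo::na.locf(x, na.rm = FALSE)`. With zoo's default `na.rm = TRUE`, leading NAs are dropped, so the result would be shorter than the column whenever a column starts with NA.

```r
# Example measurement columns (A is not touched by either rule)
df <- data.frame(
  A             = c(29, 78, 40, 86, 91, 4, 50, 83),
  mass.mean     = c(1, NA, 2, 3, NA, NA, 2, 1),
  mass.stdev    = c(1, NA, 1, 1, NA, NA, 2, 1),
  mass.rsd      = c(0, NA, 0.1, 0.1, NA, NA, 0.2, 0.1),
  density.mean  = c(6, 5, 7, NA, NA, NA, 6, 4),
  density.stdev = c(3, 1, 1, NA, NA, NA, 2, 1),
  density.rsd   = c(0.8, 0.2, 2, NA, NA, NA, 0.5, 0.7)
)

# Hypothetical helper: last observation carried forward, same length as x.
# Leading NAs stay NA because there is nothing to carry forward.
locf <- function(x) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)     # position of the latest non-NA value within x[ok]
  idx[idx == 0] <- NA   # rows before the first observation
  x[ok][idx]
}

# Step 1: match columns by anchored prefix
mass_cols    <- grepl("^mass\\.", names(df))
density_cols <- grepl("^density\\.", names(df))

# Step 2: apply each group's rule column by column, keeping the wide shape
df[mass_cols]    <- lapply(df[mass_cols], locf)
df[density_cols] <- lapply(df[density_cols], function(x) replace(x, is.na(x), 0))

print(df)
```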
It seems like I need to combine grepl, lapply, and an ifelse ladder, but I can't pull it together. I would prefer to avoid converting from wide to long if possible, as my data frame has < 450k rows.
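The "ladder" idea works if the decision is made once per column rather than once per cell (which is what `ifelse()` does). A sketch, assuming the stat suffixes are exactly `mean`, `stdev`, and `rsd`: strip the suffix to get each column's prefix, then let `switch()` pick the rule inside `lapply()`. Columns whose prefix has no branch (here `A`) fall through to the default and are returned unchanged. `fill_na_by_prefix` and `locf` are hypothetical names. Each rule is a vectorised operation on a whole column, so the data stay wide and no reshaping is needed; covering all 26 prefixes means adding one branch per prefix.

```r
# Hypothetical helper: base-R last observation carried forward (length-preserving).
locf <- function(x) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)
  idx[idx == 0] <- NA
  x[ok][idx]
}

# Hypothetical function: choose an NA rule per column from its name prefix.
fill_na_by_prefix <- function(df) {
  prefix <- sub("\\.(mean|stdev|rsd)$", "", names(df))  # "mass.rsd" -> "mass"
  df[] <- lapply(seq_along(df), function(j) {
    x <- df[[j]]
    switch(prefix[j],
      mass    = locf(x),                  # impute from the previous row
      density = replace(x, is.na(x), 0),  # NA -> 0
      x                                   # any other column: unchanged
    )
  })
  df
}

# Usage on a small example
df <- data.frame(
  A           = c(29, 78, 40, 86),
  mass.mean   = c(1, NA, 2, NA),
  density.rsd = c(0.8, NA, 2, NA)
)
out <- fill_na_by_prefix(df)
print(out)
```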
Example Dataframe
set.seed(123)
df <- data.frame("A" = sample(0:100,8),
"B" = sample(0:100,8),
"C" = sample(0:100,8),
"D" = sample(0:100,8),
"mass.mean" = c(1, NA, 2, 3, NA, NA, 2, 1),
"mass.stdev" = c(1, NA, 1, 1, NA, NA, 2, 1),
"mass.rsd" = c(0, NA, 0.1, 0.1, NA, NA, 0.2, 0.1),
"density.mean" = c(6, 5, 7, NA, NA, NA, 6, 4),
"density.stdev" = c(3, 1, 1, NA, NA, NA, 2, 1),
"density.rsd" = c(0.8, 0.2, 2, NA, NA, NA, 0.5, 0.7),
stringsAsFactors = FALSE)
print(df)
   A  B  C  D mass.mean mass.stdev mass.rsd density.mean density.stdev density.rsd
1 29 55 24 66         1          1      0.0            6             3         0.8
2 78 45  4 70        NA         NA       NA            5             1         0.2
3 40 94 32 53         2          1      0.1            7             1         2.0
4 86 44 93 58         3          1      0.1           NA            NA          NA
5 91 65 86 28        NA         NA       NA           NA            NA          NA
6  4 54 66 14        NA         NA       NA           NA            NA          NA
7 50  9 60 91         2          2      0.2            6             2         0.5
8 83 84 97 84         1          1      0.1            4             1         0.7
Desired Output
   A  B  C  D mass.mean mass.stdev mass.rsd density.mean density.stdev density.rsd
1 29 55 24 66         1          1      0.0            6             3         0.8
2 78 45  4 70         1          1      0.0            5             1         0.2
3 40 94 32 53         2          1      0.1            7             1         2.0
4 86 44 93 58         3          1      0.1            0             0         0.0
5 91 65 86 28         3          1      0.1            0             0         0.0
6  4 54 66 14         3          1      0.1            0             0         0.0
7 50  9 60 91         2          2      0.2            6             2         0.5
8 83 84 97 84         1          1      0.1            4             1         0.7
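One way to check a candidate solution against the desired output is to type both tables in literally and compare them with `all.equal()`. The sketch below uses a named list of rules keyed by prefix (`na_rules` and `locf` are hypothetical names) and applies each rule to the columns whose names start with that prefix, assuming the columns are spelled `density`. The A to D values are copied from the printed tables rather than regenerated with `sample()`, because the sampled values can differ between R versions (the default sampling algorithm changed in R 3.6.0).

```r
# Hypothetical helper: base-R last observation carried forward (length-preserving).
locf <- function(x) {
  ok  <- !is.na(x)
  idx <- cumsum(ok)
  idx[idx == 0] <- NA
  x[ok][idx]
}

# Input, typed in from the example table
df <- data.frame(
  A = c(29, 78, 40, 86, 91, 4, 50, 83),
  B = c(55, 45, 94, 44, 65, 54, 9, 84),
  C = c(24, 4, 32, 93, 86, 66, 60, 97),
  D = c(66, 70, 53, 58, 28, 14, 91, 84),
  mass.mean     = c(1, NA, 2, 3, NA, NA, 2, 1),
  mass.stdev    = c(1, NA, 1, 1, NA, NA, 2, 1),
  mass.rsd      = c(0, NA, 0.1, 0.1, NA, NA, 0.2, 0.1),
  density.mean  = c(6, 5, 7, NA, NA, NA, 6, 4),
  density.stdev = c(3, 1, 1, NA, NA, NA, 2, 1),
  density.rsd   = c(0.8, 0.2, 2, NA, NA, NA, 0.5, 0.7)
)

# Desired output, typed in from the "Desired Output" table
desired <- df
desired$mass.mean     <- c(1, 1, 2, 3, 3, 3, 2, 1)
desired$mass.stdev    <- c(1, 1, 1, 1, 1, 1, 2, 1)
desired$mass.rsd      <- c(0, 0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.1)
desired$density.mean  <- c(6, 5, 7, 0, 0, 0, 6, 4)
desired$density.stdev <- c(3, 1, 1, 0, 0, 0, 2, 1)
desired$density.rsd   <- c(0.8, 0.2, 2, 0, 0, 0, 0.5, 0.7)

# One rule per prefix; extend this list for the remaining prefixes
na_rules <- list(
  mass    = locf,
  density = function(x) replace(x, is.na(x), 0)
)

result <- df
for (p in names(na_rules)) {
  cols <- startsWith(names(result), paste0(p, "."))  # e.g. columns starting "mass."
  result[cols] <- lapply(result[cols], na_rules[[p]])
}

print(all.equal(result, desired))
```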