I have a dataset of 21 variables and 182 observations. First I need to subset my data by the column "Substrate". Then, for each response variable in the dataset (14 out of the 21), I want to compute a gls()
model, extract the residuals and make a new data frame with the residuals of each response variable. I managed to do each of these steps seperately with for-loops and if-statements and it gives me the right output, but when I try to turn it into a function, it doesn't compute all the steps.
This is what my data looks like :
> head(data_soil)
Position Table Treatment Rep Genotype Origin Substrate TotalC TotalN TotalS pHH2O pHCaCl2 P K Ca Mg Mn Al Fe Na CEC
1 1 1 25-H 1 ABI-25 La Corne Mine NA Waste Rock 0.20885 0.00000 4.32 3.43 3.38 14.14522 0.02588793 17.60550 0.09964484 0.0002030385 0.4408331 2.953925 0.04492462 21.17092
2 2 1 MTN4-L 3 MTN4 Eastern A Waste Rock 3.04490 0.03996 4.79 3.56 3.44 17.59083 0.14421424 11.34370 0.34310013 0.0018679479 1.1641636 3.775913 0.14227578 16.91523
3 3 1 MIN3-H 3 MIN3 Central NA Waste Rock 0.12163 0.00000 4.46 3.30 3.23 14.33419 0.02193277 16.10658 0.09913785 0.0000000000 0.4265272 3.526153 0.04765741 20.22765
4 4 1 WOL7-H 3 WOL7 Western NA Waste Rock 0.38140 0.00000 3.85 3.29 3.22 16.94811 0.02768036 10.51024 0.11555323 0.0000000000 0.4307781 3.293853 0.03481761 14.41262
5 5 1 13-C 2 ABI-13 Westwood Control 29.24300 0.53669 0.17 4.13 3.65 142.29403 1.47436677 20.20521 6.68630402 0.2529050181 4.4632055 2.533394 4.26399368 39.87938
6 6 1 BOY3-H 1 BOY3 Western NA Waste Rock 0.14188 0.00000 4.38 3.22 3.18 15.39028 0.01738542 16.72458 0.10505116 0.0000000000 0.4080436 3.454034 0.03663752 20.74553
And this is the code I wrote :
get_residuals <- function(dataset, # name of the data.frame
substrate, # name of the substrate by which the data is to be subseted
colnum) { # first column in the data.frame containing response variable
# SUBSET
if (substrate == "Tailings") {
data_subset <- subset(dataset, Substrate == "Tailings")
} else if (substrate == "Control") {
data_subset <- subset(dataset, Substrate == "Control")
} else if (substrate == "NAWR") {
data_subset <- subset(dataset, Substrate == "NA Waste Rock")
} else if (substrate == "AWR") {
data_subset <- subset(dataset, Substrate == "A Waste Rock")
}
# CREATE A LIST OF DATA FRAMES FOR EACH VARIABLE
DF.ls <- list()
for (i in colnum:length(data_subset)){
DF.ls[[i]] <- data.frame(Genotype = data_subset$Genotype,
y = data_subset[[i]])
}
DF.ls <- DF.ls[-c(1:colnum-1)]
# COMPUTE MODEL
model.ls <- list()
for (i in 1:length(DF.ls)){
model.ls[[i]] <- gls(y ~ Genotype,
data = DF.ls[[i]],
na.action = "na.exclude")
}
# EXTRACT RESIDUALS
resid.ls <- list()
for (i in 1:length(model.ls)){
resid.ls[[i]] <- rbind(residuals(model.ls[[i]], type = "pearson"))
}
# MAKE DATA FRAME
data_frame = data.frame(t(as.data.frame(do.call(rbind, resid.ls))))
colnames(data_frame) <- rbind(colnames(data_subset[colnum:length(data_subset)]))
}
I tried running the code like this :
data_tailings_resid <- get_residuals(dataset = data_soil,
substrate = "Tailings",
colnum = 8)
The output seems to only be the last step of my code :
> data_tailings_resid
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] "TotalC" "TotalN" "TotalS" "pHH2O" "pHCaCl2" "P" "K" "Ca" "Mg" "Mn" "Al" "Fe" "Na" "CEC"
This is output I was expecting :
> head(data_frame)
TotalC TotalN TotalS pHH2O pHCaCl2 P K Ca Mg Mn Al Fe Na CEC
1 -1.1793447 -1.542095e-15 1.29958891 -1.5895147 -1.447781e+00 -5.398460e-16 1.6150879 -0.6963919 1.22550246 1.2664670 0.4755890 0.89620191 0.86626534 -0.1737194
2 -0.2379789 -1.927619e-16 0.13563549 0.3725425 5.161653e-01 -2.024422e-16 0.8591796 1.2267315 1.39357556 -0.6936503 0.5313932 -0.35961939 1.48506380 1.3485134
3 -0.3113096 -1.349333e-15 -0.42197707 -0.5712318 -5.287547e-01 -1.494043e+00 -1.1051493 -0.1830617 -0.28885422 0.3832047 -0.2589761 -1.58296163 0.16422915 -0.4488072
4 0.8955341 -1.349333e-15 0.43704768 -0.2731978 -2.643774e-01 2.988086e+00 1.9158605 0.3972578 1.26406058 -0.5093381 1.5471558 1.30145762 0.93252397 1.1286771
5 -0.6525396 -1.527525e+00 0.04973301 -0.0745085 -3.354490e-15 -6.003841e-01 -0.2686305 -0.1455130 -0.06653803 0.5009465 -0.4714063 0.69187950 0.02845169 -0.1988274
6 0.1082666 0.000000e+00 0.59127026 0.6209042 6.420593e-01 1.349615e-16 0.2639568 0.8615335 0.85692038 1.5660618 0.5566046 0.06234642 0.51111909 1.0147509
I tried removing the steps one by one and the first two steps (#SUBSET and #CREATE A LIST OF DATA FRAMES FOR EACH VARIABLE) seem to work, but I after that I get an empty objet (or the previous output).
This is my first time trying to write a function, so I don't know what is wrong since there is no error. Is there an easier way to do this? I am simply missing a link between the steps?
Aucun commentaire:
Enregistrer un commentaire