I am a new R programmer and am trying to write a loop over a large number of columns to weight data by a certain metric.
I have a large data set of variables (some factors, some numerics). I want to loop through my columns, determine which ones are factors, and, for each factor, use tapply to do some weighting and return a mean. I have written a function that does this for one column at a time:
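weight.by.mean <- function(metric, by, x, funct = sum) {
  # Only factor columns are grouped; anything else returns NULL
  if (is.factor(x)) {
    a <- tapply(metric, x, funct)  # aggregate the metric per level of x
    b <- tapply(by, x, funct)      # aggregate the weights per level of x
    return(a / b)
  }
}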
I am passing in the metric that I want to weight, and the by argument is what I am weighting the metric BY. x is simply a factor variable that I would like to group by.
Example: I have 5 donut types (my argument x) and I would like to see the mean dough (my argument metric) used by donut type, but I need to weight the dough used by the amount (argument by) of dough used for that donut type.
In other words, I want to avoid the skewed means I would get if some donut types were not weighted more than others (maybe I use a lot of normal dough for glazed donuts but don't use as much special dough for cream-filled donuts). I hope this makes sense!
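To make the donut example concrete, here is a minimal sketch of the per-group calculation on made-up data. The data frame `donuts` and its columns `type`, `dough`, and `amount` are hypothetical names chosen for illustration only. It assumes that "weighting" here means dividing the per-type sum of the metric by the per-type sum of the weights, which is what the function above computes.

```r
# Hypothetical toy data: names and values are invented for illustration.
donuts <- data.frame(
  type   = factor(c("glazed", "glazed", "cream", "cream", "plain")),
  dough  = c(10, 30, 4, 6, 8),  # the metric to be weighted
  amount = c(2, 6, 1, 1, 4)     # what the metric is weighted by
)

# Sum the metric and the weights within each donut type, then divide.
num   <- tapply(donuts$dough,  donuts$type, sum)
den   <- tapply(donuts$amount, donuts$type, sum)
ratio <- num / den

print(ratio)
```

The result is a named array with one entry per level of `type`, so individual groups can be picked out by name, for example `ratio["glazed"]`.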
This is the function I am working on to loop through the columns. It is not yet functional because I am not sure what else to add. I have been using R for less than a month, so please keep that in mind. Thank you for any assistance you can provide.
weight.matrix <- function(df,metric,by,funct=sum()){
  n <- ncol(df) ## Number of columns to iterate through
  ColNames <- as.matrix(names(df))
  OutputMatrix <- matrix(1, ,3,nrow=, ncol=3)
  for(i in 1:n){
    if(is.factor(paste("df$",ColNames[i], sep=""))){
      a[[i]] <- tapply(metric, df[,i], funct)
      b[[i]] <- tapply(by, df[,i], funct)
    }
    OutputMatrix <- (a[[i]]/b[[i]])
  }
}
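A few things stop the loop above from working. `is.factor(paste(...))` tests a character string rather than the column, so it is always FALSE. `a` and `b` are never created before being indexed. `funct=sum()` calls `sum` immediately instead of passing the function. `OutputMatrix` is overwritten on every pass and never returned. One possible way to get the intended result is sketched below. The helper name `weight_by_factor_columns` and the toy data are hypothetical, and the sketch returns a list rather than a matrix, on the assumption that the factors may have different numbers of levels.

```r
# Sketch only: weight_by_factor_columns is a hypothetical helper name.
weight_by_factor_columns <- function(df, metric, by, funct = sum) {
  # Logical vector marking which columns of df are factors.
  is_fac <- vapply(df, is.factor, logical(1))
  # For each factor column, aggregate metric and by per level, then divide.
  lapply(df[is_fac], function(x) {
    tapply(metric, x, funct) / tapply(by, x, funct)
  })
}

# Hypothetical toy data: names and values are invented for illustration.
donuts <- data.frame(
  type   = factor(c("glazed", "glazed", "cream")),
  shop   = factor(c("a", "b", "a")),
  dough  = c(10, 30, 4),
  amount = c(2, 6, 1)
)

result <- weight_by_factor_columns(donuts, donuts$dough, donuts$amount)
print(result)
```

The result is a list named after the factor columns, so `result$type` holds the ratios per donut type and `result$shop` the ratios per shop. Numeric columns such as `dough` and `amount` are skipped.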