jeudi 29 mars 2018

How to look through column names in R and perform operations then store it in a list of unknown row size

I am a new R programmer and am trying to create a loop through a large amount of columns to weigh data by a certain metric.

I have a large data set of variables (some factors, some numerics). I want to loop through my columns, determine which one is a factor, and then if it is a factor I would like to use some tapply functions to do some weighting and return a mean. I have established a function that can do this one at a time here:

weight.by.mean <- function(metric,by,x,funct=sum()){

if(is.factor(x)){
a <- tapply(metric, x, funct)
b <- tapply(by, x, funct)
return (a/b)
} 
}

I am passing in the metric that I want to weigh and the by argument is what 
I am weighting the metric BY. x is simply a factor variable that I would 
like to group by.

Example: I have 5 donut types (my argument x) and I would like to see the mean dough (my argument metric) used by donut type but I need to weigh the dough used by the amount (argument by) of dough used for that donut type.

In other words, I am trying to avoid skewing my means by not weighting different donut types more than others (maybe I use a lot of normal dough for glazed donuts but dont use as much special dough for cream filled donuts. I hope this makes sense!

This is the function I am working on to loop through. It is not yet functional because I am not sure what else to add. Thank you for any assistance you can provide for me. I have been using R for less than a month so please keep that in mind.

weight.matrix <- function(df,metric,by,funct=sum()){


  n <- ncol(df) ##Number of columns to iterate through
  ColNames <- as.matrix(names(df))
  OutputMatrix <- matrix(1, ,3,nrow=, ncol=3)

 for(i in 1:n){


 if(is.factor(paste("df$",ColNames[i], sep=""))){
  a[[i]] <- tapply(metric, df[,i], funct)
  b[[i]] <- tapply(by, df[,i], funct)
}
OutputMatrix <- (a[[i]]/b[[i]])
}
}

Aucun commentaire:

Enregistrer un commentaire