I have a data set with 900 columns of numeric data, and I need to convert those numeric columns to labelled factors. Many columns share the same set of labels. I am trying to write a function that takes each numeric column, identifies which set of labels that column needs, and then applies it.
Here is an example data frame:
#create data frame with columns a,b,c,d
a<-c(1,2,3,4,5)
b<-c(0,1,0,1,0)
c<-c(1,0,1,0,1)
d<-c(2,3,4,5,3)
x<-as.data.frame(cbind(a,b,c,d))
I have a separate data frame, y, in which column f holds the names of the columns in x and column e is a key identifying which set of factor labels each of those columns should get. Notice that b and c should have the same labels.
e<-c(1,2,2,3)
f<-c("a","b","c","d")
y<-data.frame(e,f,stringsAsFactors=FALSE)
I would like to write a function that does the following automatically. Here are the example labels that I would like to apply to a, b, c and d, where a and d get different labels but b and c get the same ones.
x$a<-factor(x$a,
levels=c(1,2,3,4,5),
labels=c("Less than 25%",
"25-50%",
"51-75%",
"76-90%",
"More than 90%"))
x$b<-factor(x$b,
levels=c(0,1),
labels=c("Yes","No"))
x$c<-factor(x$c,
levels=c(0,1),
labels=c("Yes","No"))
x$d<-factor(x$d,
levels=c(1,2,3,4,5),
labels=c("l","m","n","o","p"))
The final data set should look like this:
> x
              a   b   c d
1 Less than 25% Yes  No m
2        25-50%  No Yes n
3        51-75% Yes  No o
4        76-90%  No Yes p
5 More than 90% Yes  No n
In the actual data set, there will be close to 60 labels.
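For reference, the automation described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions: the label sets are stored in a named list keyed by the values of e, and the names label_sets and apply_labels are hypothetical and do not come from the data described above.

```r
# Example data from the question
x <- data.frame(a = c(1, 2, 3, 4, 5),
                b = c(0, 1, 0, 1, 0),
                c = c(1, 0, 1, 0, 1),
                d = c(2, 3, 4, 5, 3))

# Key: f names a column of x, e says which label set that column gets
y <- data.frame(e = c(1, 2, 2, 3),
                f = c("a", "b", "c", "d"),
                stringsAsFactors = FALSE)

# Hypothetical lookup (an assumption): one entry per value of e
label_sets <- list(
  "1" = list(levels = 1:5,
             labels = c("Less than 25%", "25-50%", "51-75%",
                        "76-90%", "More than 90%")),
  "2" = list(levels = 0:1, labels = c("Yes", "No")),
  "3" = list(levels = 1:5, labels = c("l", "m", "n", "o", "p"))
)

# Hypothetical helper: convert every column listed in key$f to a factor,
# using the label set that key$e points to
apply_labels <- function(data, key, label_sets) {
  for (i in seq_len(nrow(key))) {
    col <- key$f[i]
    set <- label_sets[[as.character(key$e[i])]]
    data[[col]] <- factor(data[[col]],
                          levels = set$levels,
                          labels = set$labels)
  }
  data
}

x_labelled <- apply_labels(x, y, label_sets)
```

With this layout, each of the roughly 60 label sets would be defined once in label_sets and reused by every column whose key in e points to it.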