Friday, 19 August 2016

Convert numeric columns to factors with different labels using key

I have a data set with 900 columns of numeric data, and I need to convert those numeric columns to labelled factors. Many columns share the same set of labels. I am trying to write a function that takes each numeric column, identifies which set of labels that column needs, and then applies it.

Here is an example data frame:

    #create data frame with columns a,b,c,d
    a<-c(1,2,3,4,5)
    b<-c(0,1,0,1,0)
    c<-c(1,0,1,0,1)
    d<-c(2,3,4,5,3)

    x<-as.data.frame(cbind(a,b,c,d))

I have a separate data frame, y, that acts as a key: column f holds the column names of x, and column e identifies which set of factor labels should be applied to each of them. Notice that b and c should have the same labels.

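    e<-c(1,2,2,3)
    #column names are quoted so that f holds the names, not the vectors a,b,c,d
    f<-c("a","b","c","d")

    #data.frame() keeps e numeric; cbind() would coerce it to character
    y<-data.frame(e,f)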

I would like to write a function that does the following automatically. Here are the example labels that I would like to apply to a, b, c and d: a and d each get their own labels, while b and c share the same ones.

    x$a<-factor(x$a,
        levels=c(1,2,3,4,5),
        labels=c("Less than 25%",
        "25-50%",
        "51-75%",
        "76-90%",
        "More than 90%"))

    x$b<-factor(x$b,
        levels=c(0,1),
        labels=c("Yes","No"))

    x$c<-factor(x$c,
        levels=c(0,1),
        labels=c("Yes","No"))

    x$d<-factor(x$d,
        levels=c(1,2,3,4,5),
        labels=c("l","m","n","o","p"))

With the final data set looking like:

    > x
                  a   b   c d
    1 Less than 25% Yes  No m
    2        25-50%  No Yes n
    3        51-75% Yes  No o
    4        76-90%  No Yes p
    5 More than 90% Yes  No n

In the actual data set, there will be close to 60 labels.
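
One possible way to automate this, offered as a minimal sketch rather than a definitive solution: store each set of levels and labels once in a named list, then loop over the rows of the key. The names `label_sets`, `apply_labels` and `x_labelled` are hypothetical and do not come from the original question. The sketch assumes every value of `e` has a matching entry in the list and every name in `f` is a column of `x`.

```r
# Example data and key from the question
x <- data.frame(a = c(1, 2, 3, 4, 5),
                b = c(0, 1, 0, 1, 0),
                c = c(1, 0, 1, 0, 1),
                d = c(2, 3, 4, 5, 3))

y <- data.frame(e = c(1, 2, 2, 3),
                f = c("a", "b", "c", "d"),
                stringsAsFactors = FALSE)

# Hypothetical lookup: one entry per label set, named by the key value in y$e
label_sets <- list(
  "1" = list(levels = 1:5,
             labels = c("Less than 25%", "25-50%", "51-75%",
                        "76-90%", "More than 90%")),
  "2" = list(levels = 0:1, labels = c("Yes", "No")),
  "3" = list(levels = 1:5, labels = c("l", "m", "n", "o", "p"))
)

# Hypothetical helper: for each row of the key, convert the named column
# to a factor using the label set that the key points to
apply_labels <- function(data, key, label_sets) {
  for (i in seq_len(nrow(key))) {
    col <- as.character(key$f[i])               # column name in data
    set <- label_sets[[as.character(key$e[i])]] # label set for that column
    data[[col]] <- factor(data[[col]],
                          levels = set$levels,
                          labels = set$labels)
  }
  data
}

x_labelled <- apply_labels(x, y, label_sets)
print(x_labelled)
```

With about 60 label sets, only `label_sets` grows. The key `y` continues to decide which of the 900 columns receives which set, so repeated labels are written once.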
