Tuesday, October 5, 2021

Creating new numeric variable in R based on specific strings from character variable

I have a character variable in a data frame, and would like to create a new numeric variable in that data frame based on the existence of specific strings in that character variable.

I've figured out one way to do this, but was wondering if there was a quicker, or more elegant approach than mine here.

Here is my solution, using ifelse and substr:

First, the value of each observation of my character variable is a random series of digits and letters (0-9 and A-Z). I will create an example variable below:

library(stringi)
set.seed(100)
my.df<-data.frame("V1"=sprintf("%s%s%s", stri_rand_strings(10, 5, c('[A-B]','[0-2]')),
    stri_rand_strings(10, 4, c('[0-9]','[A-J]')), stri_rand_strings(10, 1, '[A-Z]')))

In my actual data, as I noted above, the first character of each observation can be any of the digits 0-9 or the letters A-Z.

Now, what I've done is create a new variable, value, that takes on specific values (with no pattern) depending on the first character of each observation in the V1 variable. So, if the first character is the letter "A", the corresponding observation (row) in value is 3; if the first character of V1 is "B", the corresponding observation in value is 12, and so on.

Here is how I have chosen to do this. It's unwieldy, as my real data would need ifelse clauses for all digits 0-9 and all letters A-Z, which would be about 35 lines or so. Here is my code:

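# Map the first character of V1 to a value; anything unmatched falls through to 27
my.df$value <- ifelse(substr(my.df$V1, 1, 1) == "A", 3,
               ifelse(substr(my.df$V1, 1, 1) == "B", 12,
               ifelse(substr(my.df$V1, 1, 1) == "0", 44,
               ifelse(substr(my.df$V1, 1, 1) == "1", 6, 27))))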

This yields:

          V1 value
'AABAA3122X'     3
'12110FCBCF'     6
'BBAAB5246J'    12
'20112JGEDL'    27
'BBBBA4426X'    12
'02210EDFJK'    44
'ABABB6687N'     3
'20120IAEFD'    27
'ABBBB9905A'     3
'12200HCDHU'     6

Is there a way to do this without using so many lines of code?

Thanks!
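
One possible shorter approach, sketched here rather than offered as the definitive answer, is to keep the mapping in a named vector and index it by the first character of V1. The names lookup, default_value and first_char are made up for this illustration, the four sample strings are copied from the output above, and only the four mappings stated in the question are filled in; the remaining ones would have to come from the real data.

```r
# Example rows copied from the output shown in the question
my.df <- data.frame(V1 = c("AABAA3122X", "12110FCBCF", "BBAAB5246J", "20112JGEDL"),
                    stringsAsFactors = FALSE)

# Hypothetical lookup table: names are first characters, values are the codes.
# Only the mappings given in the question are listed here.
lookup <- c(A = 3, B = 12, "0" = 44, "1" = 6)

# Fallback for first characters missing from the lookup (27 in the ifelse chain)
default_value <- 27

# Extract the first character once, then index the lookup vector by name
first_char <- substr(my.df$V1, 1, 1)
my.df$value <- unname(lookup[first_char])

# Characters not found in the lookup come back as NA; replace them with the fallback
my.df$value[is.na(my.df$value)] <- default_value
```

With this layout, covering all of 0-9 and A-Z would mean adding entries to lookup rather than nesting more ifelse calls. One difference to be aware of: a missing (NA) entry in V1 would receive default_value here, whereas the nested ifelse version would return NA for it.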
