Monday, 30 March 2020

How to ignore characters in a statement in R?

Currently I have three values of interest: a, b, and c. I have a column 'v4' that is a binary variable based on the 'v1' column: 1 if 'v1' is a, b, or c, and 0 if not.

Example:

v1 v2 v3 v4
a  b  c  1
b  b  c  1
d  b  c  0

An issue I have with my data is that sometimes there are years or other characters before the value. For example, I have instances of '2020 c'. This would be correct, and I would want to capture it in column 'v4'. However, if the year comes after the value, it would be incorrect. For example, 'c 2020' should appear as a 0 in column 'v4'.

Example of how I want it to look:

v1     v2 v3 v4
a      b  c  1
b      b  c  1
d      b  c  0
c 2020 b  c  0
2020 c b  c  1
1990 c b  a  1

How could I make this work? Currently I am using

df1$v4 <- as.integer(grepl("(a|b|c)$", df1$v1))

This is good at capturing all instances, but I am not able to exclude the instances where the data comes after the variable I am trying to capture. Hopefully this makes sense.
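
For reference, here is a minimal sketch of one way this might be approached, assuming the value of interest is always the last whitespace-separated token in 'v1'. The data frame is rebuilt from the example table, and the name `pattern` is only illustrative. The `$` anchor in the pattern above already rejects 'c 2020', but it would also accept any string that merely ends in a, b, or c (for example 'abc'), so the sketch additionally requires the letter to be preceded by the start of the string or by whitespace.

```r
# Example data rebuilt from the table in the question
df1 <- data.frame(
  v1 = c("a", "b", "d", "c 2020", "2020 c", "1990 c"),
  v2 = "b",
  v3 = c("c", "c", "c", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Assumption: a, b, or c must be the final token, i.e. preceded by the
# start of the string or by whitespace, and followed by the end of the string.
pattern <- "(^|\\s)[abc]$"

# trimws() drops stray leading/trailing spaces (an assumption about the data)
df1$v4 <- as.integer(grepl(pattern, trimws(df1$v1), perl = TRUE))

print(df1)
```

With the example rows this should give 1, 1, 0, 0, 1, 1 in 'v4'. If the real values are longer than one character, the character class `[abc]` could be swapped for an alternation such as `(foo|bar)`, where `foo` and `bar` are hypothetical placeholders.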
