I have a requirement where I need to look at values in one data frame and check that with other data frame and create a new categorical column. This seems to be something simple as vlookup but I am not able to get how to do it in R.
My data is big and has thousands of rows and many columns. Below I have tried to create a sample data of similar format.
#####Generating sample data
library("plyr")
library("data.table")
library("rpart")
set.seed(1200)
id <- 1:1000
ibd <- sample(1:15,1000,replace = T)
bills <- sample(1:20,1000,replace = T)
nos <- sample(1:80,1000,replace = T)
stru <- sample(c("A","B","C","D"),1000,replace = T)
v1 <- sample(1:80,1000,replace = T)
v2 <- sample(1:80,1000,replace = T)
v3 <- sample(1:80,1000,replace = T)
v4 <- sample(1:80,1000,replace = T)
v5 <- sample(1:80,1000,replace = T)
v6 <- sample(1:80,1000,replace = T)
v7 <- sample(1:80,1000,replace = T)
v8 <- sample(1:80,1000,replace = T)
v9 <- sample(1:80,1000,replace = T)
v10 <- sample(1:80,1000,replace = T)
a1 <- sample(1:80,1000,replace = T)
b1 <- sample(1:80,1000,replace = T)
type <- sample(1:15,1000,replace = T)
value <- sample(100:1000,1000,replace = T)
df1 <- data.frame(id,ibd,bills,nos,stru,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,a1,b1,type,value)
num_var <- c("bills","nos","v1","v2","v3")
v0 <- num_var
ibda <- sort(rep(1:15,4),decreasing = F)
billsa <- sample(5:15,60,replace = T)
nosa <- sample(15:60,60,replace = T)
v1a <- sample(10:70,60,replace = T)
v2a <- sample(20:70,60,replace = T)
v3a <- sample(20:70,60,replace = T)
df2 <- data.frame(ibda,billsa,nosa,v1a,v2a,v3a)
bills_ibd1 <- sort(df2[ibda == 1,"billsa"])
So if you see bills_ibd1 contains 05,10,13,15. I want to check these values in df1 for ibd==1 and create a categorical variable "bills_cat" in df1 which will have codes as below
if (ibd == 1 & bills_ibd1 <= 05) bills_cat = 1
if (ibd == 1 & bills_ibd1 > 05 & bills_ibd1 <= 10) bills_cat = 2
if (ibd == 1 & bills_ibd1 > 10 & bills_ibd1 <= 13) bills_cat = 3
if (ibd == 1 & bills_ibd1 > 13 & bills_ibd1 <= 15) bills_cat = 4
if (ibd == 1 & bills_ibd1 > 15 ) bills_cat = 5
Note - bills_ibd1 is getting generated from df2 and I would have such variable for each ibd and column variable.
But this way I will have to write many if statements and I observed variable bills_cat getting replaced.
Is there a simple and better way of achieving this? I need to check for variable in df1 at ibd level based on the values from df2. Please suggest
Aucun commentaire:
Enregistrer un commentaire