I have this dataset, bank-full, with a variable job:

summary(bank.full$job)
       admin.   blue-collar  entrepreneur     housemaid    management
         5171          9732          1487          1240          9458
      retired self-employed      services       student    technician
         2264          1579          4154           938          7597
   unemployed       unknown
         1303           288

This is the percentage cross-tab of the variable with the target variable y:

                 no  yes
admin.         0.88 0.12
blue-collar    0.93 0.07
entrepreneur   0.92 0.08
housemaid      0.92 0.08
management     0.87 0.13
retired        0.83 0.17
self-employed  0.89 0.11
services       0.91 0.09
student        0.72 0.28
technician     0.90 0.10
unemployed     0.84 0.16
unknown        0.89 0.11

Now I wish to merge the job categories whose cross-tab values are similar. I tried the following two approaches.

First Approach
bank.full$newjob<-ifelse(c(bank.full$job=='admin.',
+ bank.full$job=='self-employed',
+ bank.full$job=='unknown'),'CAT1',
+ ifelse(c(bank.full$job=='blue-collar',
+ bank.full$job=='entrepreneur'),'CAT2',
+ ifelse(c(bank.full$job=='housemaid',
+ bank.full$job=='services'),'CAT3',
+ ifelse(c(bank.full$job=='management',
+ bank.full$job=='unemployed',
+ bank.full$job=='technician'),'CAT4',
+ ifelse(bank.full$job=='student','student','retired')))))
Error in `$<-.data.frame`(`*tmp*`, newjob, value = c("CAT4", "retired", :
replacement has 135633 rows, data has 45211
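A note on the numbers in this error: 135633 is exactly 3 × 45211, which suggests that c() concatenated the three logical vectors in the first test end to end instead of combining them element by element. A minimal sketch of one way to keep the test at the length of the data is %in%. The small bank data frame below is a hypothetical stand-in for bank.full, and keeping student and retired under their own names is an assumption taken from the last line of the attempt above.

```r
# Hypothetical toy data standing in for bank.full (one row per job level)
bank <- data.frame(
  job = c("admin.", "blue-collar", "entrepreneur", "housemaid", "management",
          "retired", "self-employed", "services", "student", "technician",
          "unemployed", "unknown"),
  stringsAsFactors = TRUE
)

# c() glues the logical vectors together, so the test grows with each condition
length(c(bank$job == "admin.", bank$job == "unknown"))  # 24, i.e. 2 * nrow(bank)

# %in% tests each row against a set of values and returns one value per row
bank$newjob <- ifelse(bank$job %in% c("admin.", "self-employed", "unknown"), "CAT1",
               ifelse(bank$job %in% c("blue-collar", "entrepreneur"), "CAT2",
               ifelse(bank$job %in% c("housemaid", "services"), "CAT3",
               ifelse(bank$job %in% c("management", "unemployed", "technician"), "CAT4",
                      as.character(bank$job)))))  # student and retired keep their names

bank$newjob <- factor(bank$newjob)
table(bank$newjob)
```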
Second Approach
bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
+ ifelse(bank.full$job=='self-employed','CAT1',
+ ifelse(bank.full$job=='unknown'),'CAT1',
+ ifelse(bank.full$job=='blue-collar','CAT2',
+ ifelse(bank.full$job=='entrepreneur','CAT2',
+ ifelse(bank.full$job=='housemaid','CAT3',
+ ifelse(bank.full$job=='services','CAT3',
+ ifelse(bank.full$job=='management','CAT4',
+ ifelse(bank.full$job=='unemployed','CAT4',
+ ifelse(bank.full$job=='technician','CAT4',"")))))))))
Error in ifelse(bank.full$job == "self-employed", "CAT1", ifelse(bank.full$job == :
unused arguments ("CAT1", ifelse(bank.full$job == "blue-collar", "CAT2", ifelse(bank.full$job ==
"entrepreneur", "CAT2", ifelse(bank.full$job == "housemaid", "CAT3", ifelse(bank.full$job == "services", "CAT3", ifelse(bank.full$job == "management", "CAT4", ifelse(bank.full$job == "unemployed", "CAT4",
ifelse(bank.full$job == "technician", "CAT4", ""))))))))
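A note on this error: in the third line of the second approach the parenthesis after 'unknown' closes that ifelse() early, so 'CAT1' and everything after it are passed to the enclosing ifelse() as extra arguments, which is what "unused arguments" reports. Deep nesting makes this kind of slip easy. One alternative that avoids nesting altogether is a named lookup vector, sketched below. The names jobs and lookup are hypothetical, and mapping student and retired to themselves is an assumption (the second approach would have left them as empty strings).

```r
# Hypothetical sample of job values; in the real data this would be bank.full$job
jobs <- factor(c("admin.", "unknown", "entrepreneur", "services",
                 "technician", "student", "retired", "blue-collar"))

# One entry per original category: name = old level, value = merged level
lookup <- c("admin."        = "CAT1",
            "self-employed" = "CAT1",
            "unknown"       = "CAT1",
            "blue-collar"   = "CAT2",
            "entrepreneur"  = "CAT2",
            "housemaid"     = "CAT3",
            "services"      = "CAT3",
            "management"    = "CAT4",
            "unemployed"    = "CAT4",
            "technician"    = "CAT4",
            "student"       = "student",
            "retired"       = "retired")

# Index by name; as.character() matters because a factor would index by integer code
newjob <- unname(lookup[as.character(jobs)])
newjob
```

Any job value missing from lookup comes back as NA, which makes an unmapped category easy to spot.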
I was able to get output up to this level, but when I inserted all the conditions it gave me an error:
bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
+ ifelse(bank.full$job=='self-employed','CAT1',
+ ifelse(bank.full$job=='unknown','CAT1',
+ ifelse(c(bank.full$job=='blue-collar',bank.full$job=='entrepreneur'),'CAT2',""))))
> bank.full$newjob<-as.factor(bank.full$newjob)
> summary(bank.full$newjob)
       CAT1  CAT2
28441  7038  9732
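A note on this output: it runs without an error, but the counts suggest it is not doing what was intended. CAT1 is 7038 = 5171 + 1579 + 288 as expected, yet CAT2 is 9732, which is the blue-collar count alone. The 1487 entrepreneur rows appear to have landed among the 28441 rows with the empty label. A likely explanation is that the inner ifelse() on c(...) returns a vector twice as long as the data, and the outer ifelse() only uses its first half. The toy vector below (hypothetical data) is a small sketch of that behaviour, with the element-wise | operator as one possible fix.

```r
# Hypothetical toy vector standing in for bank.full$job
job <- c("blue-collar", "entrepreneur", "admin.", "entrepreneur")

# Inner test built with c(): twice as long as job
inner_c <- ifelse(c(job == "blue-collar", job == "entrepreneur"), "CAT2", "")
length(inner_c)  # 8

# The outer ifelse() is as long as its own test, so only the first
# length(job) elements of inner_c (the blue-collar half) are ever used
outer_c <- ifelse(job == "admin.", "CAT1", inner_c)
outer_c  # entrepreneur rows come out as ""

# Element-wise OR keeps the inner test at length(job)
outer_or <- ifelse(job == "admin.", "CAT1",
            ifelse(job == "blue-collar" | job == "entrepreneur", "CAT2", ""))
outer_or
```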