I have a data frame df, shown below.
a <- c(1:6)
b <- c("Audi,BMW,Skoda, Rackets,Toy,Football",
"Suzuki,Kawasaki,Ducati,Aprilia,Baseball, Rugby",
"Mazda, Ford, chevrolet,Mercedes,Gloves,Helmet",
"Lemon,Yamaha,Table,Kawasaki,Chair,Fruits",
"Ford, chevrolet,Bread,Ducati,Tesla,Hyundai",
"Honey,Apple,Alcohol,cake,Sweets, Mango")
df <- data.frame(a,b)
I also have two vectors containing the brand names of cars and motorbikes.
cars <- c("Audi","BMW","Ford","Skoda","Mazda","chevrolet","Mercedes","Volkswagen","Tesla","Hyundai","Lamborghini","Mini-Cooper","Lexus")
motorbike <- c("Yamaha","Suzuki","Kawasaki","Harley-Davidson","Ducati","Aprilia","KTM", "Triumph","Piaggio","Hyosung","Vespa","MV-Agusta")
I used grepl with ifelse to match the words from the two vectors against df$b and assign a value to each row that has at least one match.
df$c <- ifelse(grepl(paste(cars, collapse = "|"), df$b), "cars",
        ifelse(grepl(paste(motorbike, collapse = "|"), df$b), "bikes", "others"))
Now I want to add a condition: a value ("cars" or "bikes") should be assigned in df$c only if 4 or more words in a row match the corresponding vector; otherwise the row should get "others". I want my df to look like this:
structure(list(a = 1:6, b = structure(c(1L, 6L, 5L, 4L, 2L, 3L
), .Label = c("Audi,BMW,Skoda, Rackets,Toy,Football", "Ford, chevrolet,Bread,Ducati,Tesla,Hyundai",
"Honey,Apple,Alcohol,cake,Sweets, Mango", "Lemon,Yamaha,Table,Kawasaki,Chair,Fruits",
"Mazda, Ford, chevrolet,Mercedes,Gloves,Helmet", "Suzuki,Kawasaki,Ducati,Aprilia,Baseball, Rugby"
), class = "factor"), c = c("others", "bikes", "cars", "others",
"cars", "others")), row.names = c(NA, 6L), class = "data.frame")
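One possible way to express the "4 or more matches" condition is sketched below. It is only a sketch under a few assumptions: each entry in df$b is split on commas, surrounding whitespace is trimmed, and a word counts as a match only if it is exactly equal to a brand name. The helper count_matches and the variables n_cars and n_bikes are hypothetical names introduced for illustration, and cars are checked before bikes, as in the original ifelse.

```r
a <- 1:6
b <- c("Audi,BMW,Skoda, Rackets,Toy,Football",
       "Suzuki,Kawasaki,Ducati,Aprilia,Baseball, Rugby",
       "Mazda, Ford, chevrolet,Mercedes,Gloves,Helmet",
       "Lemon,Yamaha,Table,Kawasaki,Chair,Fruits",
       "Ford, chevrolet,Bread,Ducati,Tesla,Hyundai",
       "Honey,Apple,Alcohol,cake,Sweets, Mango")
df <- data.frame(a, b, stringsAsFactors = FALSE)

cars <- c("Audi", "BMW", "Ford", "Skoda", "Mazda", "chevrolet", "Mercedes",
          "Volkswagen", "Tesla", "Hyundai", "Lamborghini", "Mini-Cooper", "Lexus")
motorbike <- c("Yamaha", "Suzuki", "Kawasaki", "Harley-Davidson", "Ducati",
               "Aprilia", "KTM", "Triumph", "Piaggio", "Hyosung", "Vespa", "MV-Agusta")

# Hypothetical helper: for each element of x, split on commas, trim
# whitespace, and count how many of the resulting words appear in brands.
count_matches <- function(x, brands) {
  tokens <- strsplit(as.character(x), ",")
  vapply(tokens, function(tk) sum(trimws(tk) %in% brands), integer(1))
}

n_cars  <- count_matches(df$b, cars)
n_bikes <- count_matches(df$b, motorbike)

# Assign a label only when at least 4 words in the row match.
df$c <- ifelse(n_cars >= 4, "cars",
        ifelse(n_bikes >= 4, "bikes", "others"))
```

Because the match is on whole words rather than on a regular expression, a brand name that merely appears inside a longer word is not counted. The threshold of 4 could be pulled out into a variable if it needs to change.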