I have a data frame with a feature called place that has many levels. My goal is to keep the top ten levels and replace all the others with "other".
topten.area <- names(sort(table(raw.train$place), decreasing = TRUE)[1:10])
This returns a character vector of names of the top ten levels.
> topten.area
[1] "Glasgow" "Edinburgh" "Aberdeen"
[4] "Dundee" "Stirling" "Inverness"
[7] "Perth" "Aye" "Dingwall"
[10] "Avoch"
p.train <- raw.train %>%
  mutate(place = ifelse(place %in% topten.area, place, "other"))
I had hoped to see the feature "place" updated so that its values are either one of the top ten or "other". Instead I get this:
> unique(p.train$place)
[1] "other" "66" "61" "49" "73" "135" "103" "95" "106" "88" "104"
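The numeric strings above look like factor level codes, which would be consistent with place being stored as a factor (an assumption; the post does not show str(raw.train)). A minimal sketch of that behaviour and one possible workaround, using a small made-up data frame in place of raw.train and the top two levels instead of ten:

```r
# Hypothetical stand-in for raw.train; assumes place is a factor.
raw.train <- data.frame(
  place = factor(c("Glasgow", "Glasgow", "Glasgow",
                   "Edinburgh", "Edinburgh", "Avoch"))
)

# Names of the most frequent levels (two here, ten in the question).
top.area <- names(sort(table(raw.train$place), decreasing = TRUE)[1:2])

# ifelse() does not keep factor attributes from its 'yes' argument,
# so the underlying integer codes are returned instead of the labels.
codes <- ifelse(raw.train$place %in% top.area, raw.train$place, "other")

# Converting to character first keeps the labels.
place.chr <- as.character(raw.train$place)
fixed <- ifelse(place.chr %in% top.area, place.chr, "other")
```

If that assumption holds, the same idea inside the original pipeline would be mutate(place = ifelse(place %in% topten.area, as.character(place), "other")).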