My dataset contains the following typos:
> unique(d$gender)
[1] "k" "kobieta" "M" "K" "mężczyzna" "21" "m" "Mężczyzna"
> unique(d$age)
[1] 19 NA 21 20 30 32 22 25 29
In fact, the gender and age values have been switched in the rows that show 21 for gender and NA for age. In addition, several different labels have been used for the gender variable: every value starting with 'k' corresponds to female ('F'), and every value starting with 'm' stands for male ('M'). I wrote the following commands to fix the gender variable:
> d$gender = ifelse(d$gender == 'K', 'F',
+ ifelse(d$gender == 'kobieta', 'F', ifelse(d$gender == 'k', 'F',
+ ifelse(d$gender == 'mężczyzna', 'M', ifelse(d$gender == '21', 'M',
+ ifelse(d$gender == 'm', 'M', ifelse(d$gender == 'Mężczyzna', 'M',
+ ifelse(d$gender == 'M', 'M', 'M'))))))))
>
> unique(d$gender)
[1] "F" "M"
But I don't know how to do the same for the age variable, or whether this method is the right approach at all. Does anyone have any suggestions?
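For illustration, here is a minimal sketch of one way both fixes might be done. It rests on several assumptions that are not in the post: the toy data frame only imitates the values shown above, a swapped row is taken to be one where `gender` holds digits and `age` is NA, and the gender of such a row is treated as unknown (NA) rather than 'M' as in the nested `ifelse()` call. The variable name `swapped` is hypothetical.

```r
# Toy data imitating the values in the question (assumed; the real `d` is not shown).
# "\u0119\u017c" spells the Polish letters in "mezczyzna" without relying on file encoding.
d <- data.frame(
  gender = c("k", "kobieta", "M", "K", "m\u0119\u017cczyzna", "21", "m",
             "M\u0119\u017cczyzna"),
  age    = c(19, 21, 20, 30, 32, NA, 22, 25),
  stringsAsFactors = FALSE
)

# Step 1: find rows where the two columns look swapped
# (assumption: gender contains only digits and age is missing).
swapped <- is.na(d$age) & grepl("^[0-9]+$", d$gender)

# Move the number into age; the gender of these rows is then unknown.
d$age[swapped]    <- as.numeric(d$gender[swapped])
d$gender[swapped] <- NA

# Step 2: recode gender by its first letter, ignoring case.
# grepl() returns FALSE for NA, so unknown values stay NA.
d$gender <- ifelse(grepl("^k", d$gender, ignore.case = TRUE), "F",
            ifelse(grepl("^m", d$gender, ignore.case = TRUE), "M", NA_character_))

print(unique(d$gender))
print(unique(d$age))
```

Matching on the first letter avoids listing every spelling by hand, and leaving unmatched values as NA makes unexpected entries visible instead of silently turning them into 'M'.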