Thursday, 8 July 2021

Create a column with multiple categories based on string patterns in Tidyverse

I checked this question, but I am not sure how to extend it to multiple categories (not just two). This one is conceptually similar, but I am not sure it is a good option for strings.

I have a data frame:

Gender          Frequency 
female            49719         
male              14835         
NA                712           
female, male      518   

Moreover, there are many more options, such as female, female, female or male, male, female. I have dozens of combinations.

I would like a new column with only four categories: male, female, both, and NA. For instance, if a single gender such as female is repeated multiple times, classify it as female. If the value is a combination of different genders (of any length), call it both.

Desired output:

Gender              Frequency  Category
female              49719      female
male                14835      male
NA                  712        NA
female, male        518        both
male, male, male    100        male
male, male, female  100        both

I would appreciate a tidyverse solution.
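
One possible approach, offered as a minimal sketch rather than a definitive answer: detect each gender as a whole word with stringr, then map the two flags to a category with dplyr's case_when(). The data frame name df and the helper columns has_female and has_male are assumptions made for illustration. The sketch also assumes that Gender only ever contains the tokens female and male (possibly repeated and comma-separated) or a missing value.

```r
library(dplyr)
library(stringr)

# Example data rebuilt from the desired output shown in the question.
df <- tibble(
  Gender = c("female", "male", NA, "female, male",
             "male, male, male", "male, male, female"),
  Frequency = c(49719, 14835, 712, 518, 100, 100)
)

result <- df %>%
  mutate(
    # \\b is a word boundary, so "\\bmale\\b" does not match inside "female".
    has_female = str_detect(Gender, "\\bfemale\\b"),
    has_male   = str_detect(Gender, "\\bmale\\b"),
    Category = case_when(
      has_female & has_male ~ "both",
      has_female            ~ "female",
      has_male              ~ "male",
      # Missing values (and anything unmatched) fall through to NA.
      TRUE                  ~ NA_character_
    )
  ) %>%
  select(-has_female, -has_male)

print(result)
```

Because the check is "does this gender appear at all", the number of repetitions and their order do not matter, which covers the dozens of combinations without listing them. If other tokens can occur in Gender, the final branch would label them NA, so that case may need its own rule.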
