Monday, 18 March 2019

If - Then with multiple characters and conditions

I hope someone can help me, as my current approach with grepl does not produce anything that works.

I have several categories (stored as character strings). I now want to build variables that take different values for different categories.

The data looks like this:

category                                 

Candidate Biography                        
Candidate Biography                         
Candidate Biography                         
Candidate Biography, Campaign Finance       
Justice, Candidate Biography, Economy       
Candidate Biography, Jobs                   
Economy, Education, Candidate Biography    
Economy, Civil Rights, Candidate Biography

Now I want to create new variables that take different values depending on the category, as shown below:

category                                 CandBio   Economy  CivilRights   Family
Candidate Biography                         1         0          0           0
Candidate Biography                         1         0          0           0
Candidate Biography                         1         0          0           0
Candidate Biography, Campaign Finance       0.5       0.5        0           0
Justice, Candidate Biography, Economy       0.33      0.33       0.33        0
Candidate Biography, Jobs                   0.5       0.5        0           0
Economy, Education, Candidate Biography     0.33      0.33       0           0.33
Economy, Civil Rights, Candidate Biography  0.33      0.33       0.33        0

Each category has a specific weight for each variable, and a single category can load on several variables. For example, "Candidate Biography, Campaign Finance" loads on CandBio and Economy with 0.5 each. Categories recur across many observations in the dataset: in total there are 49k observations with 120 different categories that need to be aggregated into 10 variables (like CandBio, Economy, CivilRights, etc. in the example).
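To make the intended mapping concrete, here is a minimal sketch in base R of one way it could be expressed: a lookup table with one row per distinct category string and one column per target variable, joined to the data by exact string match. The names `df`, `weights`, `idx` and `result` are hypothetical, and only three of the 120 categories are filled in, with weights copied from the example table above.

```r
# Hypothetical example data: full category strings as they appear in the question.
df <- data.frame(
  category = c(
    "Candidate Biography",
    "Candidate Biography, Campaign Finance",
    "Justice, Candidate Biography, Economy",
    "Candidate Biography"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per distinct category string,
# one column per target variable (weights taken from the example table).
weights <- data.frame(
  category    = c("Candidate Biography",
                  "Candidate Biography, Campaign Finance",
                  "Justice, Candidate Biography, Economy"),
  CandBio     = c(1, 0.5, 0.33),
  Economy     = c(0, 0.5, 0.33),
  CivilRights = c(0, 0,   0.33),
  Family      = c(0, 0,   0),
  stringsAsFactors = FALSE
)

# match() compares whole strings, so the result does not depend on
# the order in which the categories are listed in the lookup table.
idx <- match(df$category, weights$category)

# Attach the weight columns (everything except the key) to the data.
result <- cbind(df, weights[idx, -1])
rownames(result) <- NULL
```

With this layout, the 120 categories would be maintained as rows of the lookup table rather than as nested conditions, and any category missing from the table would show up as NA in the weight columns instead of being silently misclassified.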

I first tried combining ifelse and grepl, but I realized that grepl is very sensitive to order, and that I can get faulty categorizations for each category depending on how I structure my ifelse. I also tried to build vectors containing all category terms that share the same weight and then pass those vectors to the grepl function, but that didn't work either.
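One plausible reason for the order sensitivity described above is that grepl does substring (pattern) matching, so a short category string also matches every longer string that contains it. The sketch below is an assumed reconstruction of that situation, not the original code; the variable names are hypothetical.

```r
# Two category strings where the first is a substring of the second.
category <- c("Candidate Biography",
              "Candidate Biography, Campaign Finance")

# grepl() reports a hit wherever the pattern occurs inside the string,
# so the short pattern matches both entries.
substring_hit <- grepl("Candidate Biography", category, fixed = TRUE)

# In a nested ifelse(), the first condition that is TRUE decides the value.
# With the short pattern first, the second row never reaches its own rule.
cand_bio_short_first <- ifelse(
  grepl("Candidate Biography", category, fixed = TRUE), 1,
  ifelse(grepl("Candidate Biography, Campaign Finance", category, fixed = TRUE),
         0.5, 0)
)

# Comparing whole strings with == avoids the overlap, so the rules can be
# written in any order.
cand_bio_exact <- ifelse(
  category == "Candidate Biography", 1,
  ifelse(category == "Candidate Biography, Campaign Finance", 0.5, 0)
)
```

Here `cand_bio_short_first` assigns 1 to both rows, while `cand_bio_exact` assigns 1 and 0.5. Exact comparison only helps if the category strings are consistent (same spelling, spacing, and term order) across observations.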

So I am looking for any solution that helps me assign my weights to the variables depending on the category text.

I hope I have described my problem clearly. Any help is very much appreciated. Many thanks in advance!
