Wednesday, May 12, 2021

R: Replace multiple unwanted values with NA across multiple columns

I'm having a tough time with some data wrangling. I am trying to keep only certain author institutional affiliations in a bibliographic data frame and replace the many unwanted affiliations with NA. I need to do this across multiple columns, since a paper can have multiple authors (up to 68).

Reproducible code:

so <- data.frame(inst_1=c("FC1","FC2","Uni1","lab3","lab2"),
                 inst_2=c("FC1","FC5","college4","laboratory1","lab2"),
                 inst_3=c("FC2","FC2","University2","lab5","lab5"),
                 inst_4=c("FC3","FC6","dept2","lab3.2","lab1"),
                 inst_5=c("FC1","FC2","Uni3","labB","lab5"))

Example data frame:

  inst_1      inst_2      inst_3 inst_4 inst_5
1    FC1         FC1         FC2    FC3    FC1
2    FC2         FC5         FC2    FC6    FC2
3   Uni1    college4 University2  dept2   Uni3
4   lab3 laboratory1        lab5 lab3.2   labB
5   lab2        lab2        lab5   lab1   lab5

I want to select every column whose name has the prefix "inst" (likely using str_detect) and, in those selected columns, replace any institution that does not contain the characters "FC" with NA. This is necessary because the real sheet has 68 institution columns (inst prefix) and hundreds of rows (individual scientific articles). I can't just list which institutions to replace with NA, because there are hundreds of different institutions and I am only interested in keeping the ones that contain "FC".
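To make the goal concrete, here is a minimal base R sketch of the operation described above, run on the example data. It assumes the columns hold character (or factor) values and that "contains FC" means a literal, case-sensitive substring match. The names inst_cols and keep_fc are hypothetical, introduced only for illustration, and this is one possible approach rather than the definitive one.

```r
# Example data from the question.
so <- data.frame(
  inst_1 = c("FC1", "FC2", "Uni1", "lab3", "lab2"),
  inst_2 = c("FC1", "FC5", "college4", "laboratory1", "lab2"),
  inst_3 = c("FC2", "FC2", "University2", "lab5", "lab5"),
  inst_4 = c("FC3", "FC6", "dept2", "lab3.2", "lab1"),
  inst_5 = c("FC1", "FC2", "Uni3", "labB", "lab5"),
  stringsAsFactors = FALSE
)

# Names of every column whose name starts with "inst".
inst_cols <- grep("^inst", names(so), value = TRUE)

# Hypothetical helper: keep values containing the literal string "FC",
# set everything else to NA. as.character() guards against factor columns.
keep_fc <- function(x) {
  x <- as.character(x)
  x[!grepl("FC", x, fixed = TRUE)] <- NA_character_
  x
}

# Apply the helper to the selected columns only; other columns are untouched.
so[inst_cols] <- lapply(so[inst_cols], keep_fc)

# A tidyverse equivalent, assuming dplyr (>= 1.0) and stringr are installed
# and the columns are character vectors:
# library(dplyr); library(stringr)
# so <- so %>%
#   mutate(across(starts_with("inst"),
#                 ~ if_else(str_detect(.x, "FC"), .x, NA_character_)))
```

On the example data, rows 1 and 2 are left unchanged and every value in rows 3 to 5 becomes NA. Because the columns are selected by name pattern, the same code should carry over to the full sheet with 68 inst columns.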
