I'm having a tough time with some data wrangling. I am trying to keep only certain author institutional affiliations in a bibliographic data frame and replace the many unwanted affiliations with NA. I need to do this across multiple columns, since papers have multiple authors (up to 68).
Reproducible code:
so <- data.frame(inst_1 = c("FC1", "FC2", "Uni1", "lab3", "lab2"),
                 inst_2 = c("FC1", "FC5", "college4", "laboratory1", "lab2"),
                 inst_3 = c("FC2", "FC2", "University2", "lab5", "lab5"),
                 inst_4 = c("FC3", "FC6", "dept2", "lab3.2", "lab1"),
                 inst_5 = c("FC1", "FC2", "Uni3", "labB", "lab5"))
Example data frame:
  inst_1      inst_2      inst_3 inst_4 inst_5
1    FC1         FC1         FC2    FC3    FC1
2    FC2         FC5         FC2    FC6    FC2
3   Uni1    college4 University2  dept2   Uni3
4   lab3 laboratory1        lab5 lab3.2   labB
5   lab2        lab2        lab5   lab1   lab5
I want to select every column that has the prefix "inst" (likely using str_detect) and, in those selected columns, replace any institution that does not contain the characters "FC" with NA. This is necessary because the real sheet has 68 institution columns (inst prefix) and hundreds of rows (individual scientific articles). I can't just list which institutions to replace with NA because there are hundreds of different institutions, while I am only interested in keeping the ones that contain "FC".
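To make the goal concrete, here is a minimal sketch of one way this could look, assuming the dplyr (1.0.0 or later, for across()) and stringr packages are available. The helper name keep_fc and the output name result are made up for illustration, and the as.character() call is only there in case the columns were read in as factors.

```r
library(dplyr)
library(stringr)

so <- data.frame(inst_1 = c("FC1", "FC2", "Uni1", "lab3", "lab2"),
                 inst_2 = c("FC1", "FC5", "college4", "laboratory1", "lab2"),
                 inst_3 = c("FC2", "FC2", "University2", "lab5", "lab5"),
                 inst_4 = c("FC3", "FC6", "dept2", "lab3.2", "lab1"),
                 inst_5 = c("FC1", "FC2", "Uni3", "labB", "lab5"))

# Hypothetical helper: keep values containing the literal text "FC",
# replace everything else with NA.
keep_fc <- function(x) {
  x <- as.character(x)  # guard against factor columns
  if_else(str_detect(x, fixed("FC")), x, NA_character_)
}

# Apply the helper to every column whose name starts with "inst".
result <- so %>%
  mutate(across(starts_with("inst"), keep_fc))

print(result)
```

On the example data this should leave rows 1 and 2 unchanged and turn every value in rows 3 to 5 into NA. starts_with("inst") handles the column selection, so str_detect is only needed on the cell values, not on the column names.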