Tuesday, October 2, 2018

Improve detection of words like "she" and "her" in sentences and return "Female" as the result

I have a variable, "bio_sentences", which, as its name suggests, holds four to five bio sentences for an individual (extracted from the "bio" variable and split into sentences). I am trying to determine each individual's gender using this logic:

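library(stringr)

# Words taken to indicate a female subject (matching is case-sensitive)
Femalew <- c("She", "Her")
# With a single bio string, this returns one list element per pattern,
# holding every match of that pattern found in bio
Check <- str_extract_all(bio, Femalew)
# Keep only the patterns that produced at least one match
Check <- Check[lengths(Check) > 0]
# "Female" if any pattern matched, otherwise "Male"
Gender <- if (length(Check) > 0) "Female" else "Male"
# Repeat the label once per bio sentence
Gender <- rep(Gender, length(bio_sentences))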
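To see why some bios can slip through, here is a minimal base-R sketch that restates the test above. For literal patterns like "She" and "Her", `grepl(..., fixed = TRUE)` finds the same matches that `str_extract_all()` does. The function name `classify_original` and the sample sentences are hypothetical, and `bio` is assumed to be a single string.

```r
# Hypothetical base-R restatement of the test above: does any of the literal,
# case-sensitive patterns occur anywhere in the bio text?
# Assumes `bio` is a single string.
classify_original <- function(bio, patterns = c("She", "Her")) {
  found <- vapply(patterns, function(p) grepl(p, bio, fixed = TRUE), logical(1))
  if (any(found)) "Female" else "Male"
}

classify_original("She studied law. Her first job was in Paris.")  # "Female"
classify_original("Born in 1970, she studied law.")                # "Male": lowercase "she" is missed
classify_original("He co-founded the firm with Hermann Weber.")    # "Female": "Her" inside "Hermann"
```

The second example shows the kind of sentence that gets labelled "Male" even though it contains "she". The third shows that plain substring matching can also produce false positives.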

I am getting good results (the majority of my dataset is male); however, there are a few misses: some females aren't detected even though their sentences contain "she" or "her". Is there any way I can improve the accuracy of this logic, or use a different function such as grepl?
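One possible approach, offered as a sketch under stated assumptions, is to use base R's `grepl()` with `ignore.case = TRUE` and word boundaries (`\\b`). With these settings "she", "She", "her" and "Her" all match, while words such as "here" or "Hermann" do not. The pattern `female_pattern`, the extra forms "hers" and "herself", and the helper `classify_bio()` are illustrative choices that are not part of the original code. `bio_sentences` is assumed to be a character vector holding one individual's sentences.

```r
# Hypothetical pattern: she / her / hers / herself as whole words only
female_pattern <- "\\b(she|her|hers|herself)\\b"

# Hypothetical helper: "Female" if any sentence contains one of the words
classify_bio <- function(bio_sentences) {
  # grepl() returns one TRUE/FALSE per sentence; case is ignored
  hits <- grepl(female_pattern, bio_sentences, ignore.case = TRUE, perl = TRUE)
  if (any(hits)) "Female" else "Male"
}

bio_sentences <- c("Born in 1970, she studied law.",
                   "Her first job was in Paris.")
# One label per sentence, as in the original loop
Gender <- rep(classify_bio(bio_sentences), length(bio_sentences))
Gender  # "Female" "Female"
```

A male bio can still mention another person as "her" (for example, "He married her in 1995"). A stricter variant might compare counts of female and male pronouns, but that refinement is not shown here.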
