Generally speaking, I am trying to define a function that compares, row by row, the number of occurrences of specified strings in two columns, and modifies the value of a third column depending on the outcome of that comparison.
More specifically, I want a function that corrects the sentiment value of a word when the word contains a negation that is not in its stem, given that the sentiment value currently in the data frame is associated with the stem.
Example data frame:
df <- data.frame(word = c("disgraceful", "ungrateful", "impatient", "unimportant", "disloyal", "loyal"),
                 stem = c("grace", "grateful", "patient", "important", "loyal", "loyal"),
                 sentiment = c(1, 1, 1, 1, 1, 1))
         word      stem sentiment
1 disgraceful     grace         1
2  ungrateful  grateful         1
3   impatient   patient         1
4 unimportant important         1
5    disloyal     loyal         1
6       loyal     loyal         1
Desired outcome after running the newly defined correct_negation(df,word,stem,sentiment) function:
         word      stem sentiment
1 disgraceful     grace        -1
2  ungrateful  grateful        -1
3   impatient   patient        -1
4 unimportant important        -1
5    disloyal     loyal        -1
6       loyal     loyal         1
This is how I tried to define the function, without luck:
correct_negation <- function(x, word_x, stem_x, sentiment_x) {
  sapply(x[[sentiment_x]], function(x, word_x, stem_x, sentiment_x)
    if (str_count(x[[word_x]], paste(c("dis", "un", "im"), collapse = "|")) >
        str_count(x[[stem_x]], paste(c("dis", "un", "im"), collapse = "|")))
    {x[[sentiment_x]] * (-1)})
}
Calling it gives this error: Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'sentiment' not found.
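One likely cause of that error is that the column names are passed unquoted (word, stem, sentiment), so R looks for an object called sentiment when x[[sentiment_x]] is evaluated. The inner function given to sapply also receives only a single sentiment value, not the data frame. Below is a minimal vectorized sketch of one way to get the desired outcome, assuming the column names are passed as strings. The helper count_matches and the prefixes argument are hypothetical names introduced for illustration; count_matches is a base-R stand-in for stringr's str_count so that the example runs without extra packages.

```r
# Hypothetical helper: number of regex matches in each string
# (base-R stand-in for stringr::str_count).
count_matches <- function(strings, pattern) {
  # gregexpr() returns -1 when there is no match, so count positions > 0
  vapply(gregexpr(pattern, strings), function(m) sum(m > 0), integer(1))
}

# Flip the sign of the sentiment wherever the word contains more
# negation prefixes than its stem. Column names are passed as strings.
correct_negation <- function(df, word_col, stem_col, sentiment_col,
                             prefixes = c("dis", "un", "im")) {
  pattern <- paste(prefixes, collapse = "|")
  negated <- count_matches(df[[word_col]], pattern) >
    count_matches(df[[stem_col]], pattern)
  df[[sentiment_col]] <- ifelse(negated, -df[[sentiment_col]], df[[sentiment_col]])
  df
}

df <- data.frame(word = c("disgraceful", "ungrateful", "impatient",
                          "unimportant", "disloyal", "loyal"),
                 stem = c("grace", "grateful", "patient",
                          "important", "loyal", "loyal"),
                 sentiment = c(1, 1, 1, 1, 1, 1),
                 stringsAsFactors = FALSE)

corrected <- correct_negation(df, "word", "stem", "sentiment")
print(corrected)
```

Note that the pattern matches "dis", "un" and "im" anywhere in the string, not only at the start, which is why "unimportant" counts two matches against one in "important". Anchoring the pattern with "^" would restrict it to true prefixes, if that is what is wanted.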