Saturday, 3 March 2018

R: define a function for comparing the number of string occurrences between cells in a data frame

Generally speaking, I am trying to define a function that compares, row by row, the number of occurrences of specified strings in two columns, and modifies the value of a third column depending on the outcome of that comparison.

More specifically, I want a function that corrects the sentiment value of a word when the word contains a negation that is not in its stem, given that the sentiment value currently in the data frame is associated with the stem.

Example data frame:

df <- data.frame(word=c("disgraceful","ungrateful","impatient","unimportant","disloyal","loyal"), 
                 stem=c("grace","grateful","patient","important","loyal","loyal"), 
                 sentiment=c(1,1,1,1,1,1))


  word        stem      sentiment
1 disgraceful grace     1
2 ungrateful  grateful  1
3 impatient   patient   1
4 unimportant important 1
5 disloyal    loyal     1
6 loyal       loyal     1

Desired outcome after running the newly defined correct_negation(df,word,stem,sentiment) function:

  word        stem      sentiment
1 disgraceful grace     -1
2 ungrateful  grateful  -1
3 impatient   patient   -1
4 unimportant important -1
5 disloyal    loyal     -1
6 loyal       loyal     1

This is how I tried to define the function, without success:

correct_negation <- function(x, word_x, stem_x, sentiment_x) {
  sapply(x[[sentiment_x]], function(x, word_x, stem_x, sentiment_x)
    if (str_count(x[[word_x]], paste(c("dis","un","im"),collapse = "|")) > 
        str_count(x[[stem_x]], paste(c("dis","un","im"),collapse = "|")))
    {x[[sentiment_x]]*(-1)})
}

It fails with the following error: Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'sentiment' not found.
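
For reference, here is a minimal sketch of one way the comparison could be written. It is not a definitive fix. It assumes the column names are passed as quoted strings: the error above looks like what happens when a bare name such as sentiment reaches x[[sentiment_x]] and R looks for an object called sentiment. It also assumes the counts can be compared for the whole column at once instead of inside sapply. The helper count_matches is a hypothetical base-R stand-in for stringr's str_count, used only so the example runs without extra packages, and the prefixes argument is an illustrative addition.

```r
# Hypothetical helper: number of regex matches in each element of x
# (base-R stand-in for stringr::str_count).
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x)))
}

# Flip the sign of the sentiment wherever the word contains more negation
# strings than its stem. Column names are passed as character strings.
correct_negation <- function(df, word_col, stem_col, sentiment_col,
                             prefixes = c("dis", "un", "im")) {
  pattern <- paste(prefixes, collapse = "|")

  # as.character() guards against factor columns (the data.frame default
  # in R versions before 4.0).
  word_hits <- count_matches(as.character(df[[word_col]]), pattern)
  stem_hits <- count_matches(as.character(df[[stem_col]]), pattern)

  # Logical vector: TRUE for rows whose sentiment should be negated.
  flip <- word_hits > stem_hits
  df[[sentiment_col]][flip] <- -df[[sentiment_col]][flip]
  df
}

df <- data.frame(word = c("disgraceful", "ungrateful", "impatient",
                          "unimportant", "disloyal", "loyal"),
                 stem = c("grace", "grateful", "patient",
                          "important", "loyal", "loyal"),
                 sentiment = c(1, 1, 1, 1, 1, 1))

result <- correct_negation(df, "word", "stem", "sentiment")
print(result)
```

Note that the pattern matches the strings anywhere in the word, not only at the start, so a stem that itself contains "un" or "im" (for example "important") is counted too. The comparison still works in that case because only the difference between the word count and the stem count matters.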
