Friday, December 18, 2020

Receiving "missing value where TRUE/FALSE needed" error message when web scraping

I'm writing code to scrape data from a blog. The posts are written by two different authors and I only want the data from one of them, so I wrote a function that uses an if statement to filter the posts by author. But when I run the function on the blog address, I get the following error message: "ERROR : missing value where TRUE/FALSE needed". Does anyone know what this means and what I can do to resolve it?

The function code:

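library(rvest)    # read_html(), html_nodes(), html_attr(), html_text(), %>%
library(stringr)  # str_detect(), str_c()

extract_articles_blogger_preto <- function(x) {
  tryCatch({
    webpage <- read_html(x)
    text <- html_nodes(webpage, ".cabecalho") %>% html_nodes(".corpo")
    i <- 0                # counter cycling 0, 1, 2, 3 over the matched nodes
    pular_texto <- FALSE  # TRUE when the current post should be skipped
    article <- ""
    for (p in text) {
      if (i == 0) {
        i <- 1
      } else if (i == 1) {
        i <- 2
      } else if (i == 2) {
        # The second link found in this node is used to identify the author
        autor <- html_nodes(p, "a[href]") %>% html_attr("href")
        i <- 3
        if (str_detect(autor[2], "rainhafragil")) {
          pular_texto <- FALSE
        } else {
          pular_texto <- TRUE
        }
      } else if (i == 3) {
        if (pular_texto == FALSE) {
          article <- str_c(article, html_text(text), "\n")
        }
        i <- 0
      }
    }
    return(article)
  }, error = function(e) {
    cat("ERROR :", conditionMessage(e), "\n")
  })
}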

#Trying to apply the function to the blog address:

extract_articles_blogger_preto("http://web.archive.org/web/20070430023653mp_/http://fragilreino.blogger.com.br/2002_12_01_archive.html")

# Error message: "missing value where TRUE/FALSE needed"
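
For context, R raises this message whenever the condition of an if statement evaluates to NA instead of TRUE or FALSE. One plausible trigger in the function above, assuming that some matched node contains fewer than two links, is that autor[2] is NA, and str_detect() returns NA for NA input. The sketch below reproduces the situation in base R only. The vector contents and the helper is_target_author are hypothetical, and whether the archived page really has such nodes has not been checked.

```r
# Hypothetical stand-in for the hrefs found in one node: a single link,
# so there is no second element to test.
autor <- c("http://example.com/only-link")

# Indexing past the end of a vector yields NA rather than an error.
second_link <- autor[2]
print(is.na(second_link))

# `if` cannot branch on NA. In an English locale the captured message is
# the one quoted in the question.
msg <- tryCatch(
  if (NA) "yes" else "no",
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical helper: an NA-safe check that always returns a single
# TRUE or FALSE, so it can be used directly as an `if` condition.
is_target_author <- function(links, pattern) {
  length(links) >= 2 &&
    !is.na(links[2]) &&
    grepl(pattern, links[2], fixed = TRUE)
}

print(is_target_author(autor, "rainhafragil"))
print(is_target_author(c("a", "http://rainhafragil.example"), "rainhafragil"))
```

Under the same assumption, wrapping the original condition as isTRUE(str_detect(autor[2], "rainhafragil")) would have a similar effect, since isTRUE() returns FALSE for NA.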
