Tuesday, August 13, 2019

Multiple if statements using lapply to scrape multiple URLs

I want to use multiple if statements to scrape different URLs. I have managed to put together two if statements, but I need to add a third one and I am struggling with the structure of the code. I need these statements because, depending on the page, the content I want sits under different tags.
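Because the title sits under a different selector depending on the page, one way to avoid nesting ifs at all is to try a vector of CSS selectors in order and keep the first non-empty result. This is a minimal sketch, assuming rvest is installed; first_matching_text and selectores are hypothetical names, and minimal_html() (rvest 1.0 or later) stands in for a downloaded page so the example runs offline.

```r
library(rvest)  # minimal_html(), html_nodes(), html_text()

# Hypothetical helper: return the text of the first CSS selector that matches
# anything on `page`, or character(0) if none of them match.
first_matching_text <- function(page, selectors) {
  for (sel in selectors) {
    found <- html_text(html_nodes(page, sel))
    if (length(found) > 0) return(found)  # stop at the first selector that matched
  }
  character(0)
}

# The three selectors from the question, in order of preference
selectores <- c(".tittleArticuloOpinion", ".nameColumnista", ".article-header h2")

# In-memory page that only has the second layout
page <- minimal_html('<span class="nameColumnista">Columnist</span>')
first_matching_text(page, selectores)  # returns "Columnist"
```

Adding a fourth layout then only means appending one more selector to the vector.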

This works fine (noticias_semana_lapply is a list of 10,000 URLs):

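library(rvest)  # read_html(), html_nodes(), html_text() and the %>% pipe

prueba_titulos <- lapply(noticias_semana_lapply[12:14, 1], function(x) {
  tryCatch(
    {
      Sys.sleep(0.1)  # pause briefly between requests
      # Try the opinion-title class first; if it matches nothing, fall back to the columnist name
      read_html(x) %>% html_nodes(".tittleArticuloOpinion") %>% html_text %>%
        {if (length(.) == 0) read_html(x) %>% html_nodes(".nameColumnista") %>% html_text else .} %>%
        as.character
    },
    error = function(cond) return(NULL),  # URLs that fail to load become NULL
    finally = print(x)                    # log each URL as it is processed
  )
})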
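The fallback works because a braced block on the right-hand side of %>% is run with . bound to the piped value, so {if (length(.) == 0) ... else .} either computes a replacement or passes the original result through unchanged. This is a minimal sketch of just that step; fallback_if_empty is a hypothetical name used for illustration.

```r
library(magrittr)  # provides %>%; rvest re-exports the same pipe

# Hypothetical helper: a braced block after %>% runs with `.` bound to the
# piped value, so it can act as an inline "use a fallback if empty" step.
fallback_if_empty <- function(found, fallback) {
  found %>% {if (length(.) == 0) fallback else .}
}

fallback_if_empty(character(0), "from .nameColumnista")    # returns "from .nameColumnista"
fallback_if_empty("from .tittleArticuloOpinion", "unused")  # returns "from .tittleArticuloOpinion"
```

The else . branch is what hands the first result on when it is not empty; without it, the block would return NULL in that case.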

However, when I add the third condition, I get chr(0) for every website:

prueba_titulos2 <- lapply(noticias_semana_lapply[12:14, 1], function(x) {
  tryCatch(
    {
      Sys.sleep(0.1)
      read_html(x) %>% html_nodes(".tittleArticuloOpinion") %>% html_text %>%
        {
          if (length(.) == 0) {
            read_html(x) %>% html_nodes(".nameColumnista") %>% html_text
          } else {
            if (length(.) == 0) read_html(x) %>% html_nodes(".article-header h2") %>% html_text
          }
        } %>%
        as.character
    },
    error = function(cond) return(NULL),
    finally = print(x)
  )
})
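The third check never runs because it is nested in the else branch of the first one, where length(.) is already greater than 0. That branch also returns NULL instead of the first result, and as.character(NULL) is chr(0), which is why every page comes back empty. A minimal sketch of a fix, assuming the same three selectors, keeps the fallbacks flat so each step inspects the result of the previous one and passes it through with else . as in the working version. The names scrape_titulo and scrape_urls are hypothetical, and each page is parsed once and reused rather than calling read_html(x) again for every selector.

```r
library(rvest)  # read_html(), html_nodes(), html_text(), minimal_html() and %>%

# Hypothetical helper: `page` is an already-parsed document, so each URL is
# downloaded once instead of once per selector.
scrape_titulo <- function(page) {
  page %>% html_nodes(".tittleArticuloOpinion") %>% html_text() %>%
    # 2nd selector only if the 1st matched nothing; otherwise pass `.` through
    {if (length(.) == 0) page %>% html_nodes(".nameColumnista") %>% html_text() else .} %>%
    # 3rd selector only if the 1st and 2nd both matched nothing
    {if (length(.) == 0) page %>% html_nodes(".article-header h2") %>% html_text() else .} %>%
    as.character()
}

# Hypothetical wrapper around the original lapply/tryCatch loop
scrape_urls <- function(urls) {
  lapply(urls, function(x) {
    tryCatch(
      {
        Sys.sleep(0.1)               # pause between requests
        scrape_titulo(read_html(x))  # download and parse once per URL
      },
      error = function(cond) NULL,   # URLs that fail to load become NULL
      finally = print(x)             # log each URL as it is processed
    )
  })
}

# Offline check with an in-memory page that only has the third layout
scrape_titulo(minimal_html('<div class="article-header"><h2>Titular</h2></div>'))  # returns "Titular"
```

With the original data, the loop would then be scrape_urls(noticias_semana_lapply[12:14, 1]). Parsing each page once also avoids the extra request the original code makes whenever it falls back to another selector.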

Could someone please help me? Thanks a lot!
