Tuesday, February 19, 2019

Adding ifelse() into a Map function in R

I have a simple Map() call that scrapes text files from a blog site. Getting a scraper that downloads all of the text files to my working directory is easy. My goal is to use ifelse() or a plain if statement so that a file is only downloaded when it was posted on a certain date.

E.g., if four files were posted on 1/31/19 and I pointed my ifelse() at that date, the function would return only those four files. Code:

library(tidyverse)
library(rvest)
library(httr)       # config()
library(lubridate)  # parse_date_time()

# URL set up
url <- "https://www.example-blog/posts.aspx"
page <- html_session(url, config(ssl_verifypeer = FALSE))

# Picking elements
links <- page %>% 
  html_nodes("td") %>% 
  html_nodes("a") %>% 
  html_attr("href") 

# Getting date elements
dates <- page %>%
  html_nodes("node.dates") %>% 
  html_text()

dates <- parse_date_time(dates, "%m/%d/%Y", tz = "EST", 
                     locale = Sys.getlocale("LC_TIME"))

# Function 
out <- Map(function(ln) {

  fun1 <- html_session(
    URLencode(paste0("https://www.example-blog", ln)),
    config(ssl_verifypeer = FALSE))

  write <- writeBin(fun1$response$content)

  ifelse(dates == '2019-01-31', write, "He's dead, Jim")

}, links)

I've tried various ways of getting that if statement in there, and I've also tried moving the writeBin() call around. (Usually the writeBin() call would not be vectorized; I wrote it that way to make the ifelse() easier to read.)

If I leave out the if code, everything works, but it returns every text file, when I only want the ones from the specified date.
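
One thing that may explain the behavior: `dates == '2019-01-31'` compares every date at once, so inside the function ifelse() returns a whole vector for each link instead of making one yes/no decision per file. A possible way around this, sketched below under assumptions, is to filter the links by date first and then Map() over only the matching ones. The sample `links` and `dates` values, `fetch_stub`, and `download_one` are hypothetical names made up for illustration, and the sketch assumes `links` and `dates` have the same length and order, which the real page may not guarantee.

```r
# Hypothetical stand-ins for the scraped vectors. In the real script, `links`
# comes from html_attr("href") and `dates` from the parsed date nodes.
# Assumption: both vectors have the same length and the same order.
links <- c("/files/a.txt", "/files/b.txt", "/files/c.txt")
dates <- as.Date(c("1/31/19", "2/1/19", "1/31/19"), format = "%m/%d/%y")

target <- as.Date("2019-01-31")

# Filter first: keep only the links whose date matches the target.
wanted <- links[!is.na(dates) & dates == target]

# Hypothetical offline fetcher so the sketch runs without a network.
# In the real script this might instead return
# html_session(u, config(ssl_verifypeer = FALSE))$response$content
fetch_stub <- function(u) charToRaw(u)

download_one <- function(ln, fetch = fetch_stub, dir = tempdir()) {
  u <- URLencode(paste0("https://www.example-blog", ln))
  dest <- file.path(dir, basename(ln))
  writeBin(fetch(u), dest)  # writeBin() takes the destination as its second argument
  dest                      # return the path that was written
}

# Map() now only ever sees the links that passed the date filter.
out <- Map(download_one, wanted)
```

With this shape no if or ifelse() is needed inside the function at all, since the date test happens once, before any download. If `dates` is a date-time rather than a Date, it would probably need converting (for example with as.Date()) before the comparison.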
