I'm relatively new to using R in this way and I'm completely stuck with the following problem.
I'm attempting to save html pages for parliamentary debates to a local folder in order to carry out some scraping in the future. I've written the following function (relying on other snippets of code rather than entirely freestyle!) in order to construct the directory, strip the URL down to a more understandable format (e.g. "2010-12_academieshl.html"), and then, if the file does not exist, write the file to the specified folder. (At this point I'm aware that the use of gsub below is kind of clumsy!)
dlPages <- function(pageurl, folder ,handle) {
dir.create(folder, showWarnings = FALSE)
gsub_URL <- gsub("/stages.html", "", link_list)
gsub_URL <- gsub("http://ift.tt/1oeOe1g", "", gsub_URL)
gsub_URL <- gsub("/", "_", gsub_URL)
page_name <- str_c(gsub_URL, ".html")
if (!file.exists(str_c(folder, "/", page_name))) {
content <- try(getURL(pageurl, curl = handle))
write(content, str_c(folder, "/", page_name))
Sys.sleep(1)
} }
I'm then using l_ply to run a list (link_list) of links over the function:
handle <- getCurlHandle()
l_ply(link_list, dlPages,
folder = "lords_bills_all",
handle = handle)
The following error message is being returned:
Error in file(file, ifelse(append, "a", "w")) : invalid 'description' argument
Along with the following warning messages:
In addition: Warning messages:
1: In if (!file.exists(str_c(folder, "/", page_name))) { :
the condition has length > 1 and only the first element will be used
2: In if (file == "") file <- stdout() else if (substring(file, 1L, :
the condition has length > 1 and only the first element will be used
3: In if (substring(file, 1L, 1L) == "|") { :
the condition has length > 1 and only the first element will be used
Can someone help me understand where I'm going wrong? Also, is it best to use an 'if' argument in this scenario or write a 'for' loop instead?
Thanks in advance. Andy
Aucun commentaire:
Enregistrer un commentaire