Thursday, April 9, 2015

If statement returns argument of length 0 error when called from a function, but works fine when called line by line

Warning: this is a long post, so thank you in advance for your time.


I'm writing a set of functions to parse and process emails and I'm getting stuck on what seems like a trivial error. I have a set of functions that takes in a set of mbox files one by one, makes a corpus of the emails in that mbox file and then parses each email one by one to make a data frame. A snippet of the code follows:



library(tm)
library(tm.plugin.mail)
library(stringr)  # provides str_extract() and re-exports %>%

parseSingleMbox <- function(mailbox) {
  convert_mbox_eml(mailbox, '~/Temp/OSE/Temp')
  myCorpus <- Corpus(DirSource('~/Temp/OSE/Temp'), readerControl=list(reader=readMail))
  unlink('~/Temp/OSE/Temp', recursive=T)
  return(myCorpus)
}

parseEmail <- function(email) {
  #Content
  content <- email$content
  idx <- grep('^Content-Type: ', content)
  start <- idx - 1
  finish <- c(idx[-1]-2, length(content))

  for (i in 1:length(idx)) {
    #Get formatting data
    c1 <- content[start[i]:finish[i]]
    type <- grep('^Content-Type: ', c1, ignore.case=F, value=T) %>%
      str_extract(., ': .*;') %>%
      substr(., 3, nchar(.)-1)
    encoding <- grep('^Content-Transfer-Encoding: ', c1, ignore.case=F,

    #Get body of email
    print(type)
    if (type == 'text/plain') {
      idxx <- which(nchar(c1) == 0) %>% min()
      c2 <- c1[idxx:length(c1)] %>% paste(sep='', collapse='')
      content <- formatEmail(...)
      body <- content
    }
  }
}

emailsToDF <- function(projectDir, pattern='.*\\.mbox$') {
  files <- list.files(path=file.path(projectDir, 'Originals'), pattern=pattern, full.names=T, recursive=T)
  dat <- NULL
  for (i in 1:length(files)) {
    mboxCorpus <- parseSingleMbox(files[i])
    for (j in 1:length(mboxCorpus)) {
      dat.temp <- parseEmail(mboxCorpus[[i]])
      dat <- rbind(dat, dat.temp)
    }
  }
  return(dat)
}


I don't know how to provide a reproducible example here because these emails are confidential, but here is a redacted version:



> email$content
[1] "--0016e6de00577b89540497f2f3b3"
[2] "Content-Type: text/plain; charset=UTF-8"
[3] "Content-Transfer-Encoding: base64"
[4] ""
[5] "[:alnum:]"
[6] "[:alnum:]"
[7] "[:alnum:]"
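
For anyone who wants to try this without the tm packages, here is a minimal sketch that rebuilds the redacted sample as a plain character vector and runs the same Content-Type extraction on it. It uses only base R; regmatches()/regexpr() is a stand-in I am assuming for str_extract(), so the behavior on non-matching input may differ from the real code.

```r
# Redacted sample from above, rebuilt as a character vector
content <- c(
  "--0016e6de00577b89540497f2f3b3",
  "Content-Type: text/plain; charset=UTF-8",
  "Content-Transfer-Encoding: base64",
  "",
  "[:alnum:]",
  "[:alnum:]",
  "[:alnum:]"
)

# Keep only the Content-Type header line
header <- grep("^Content-Type: ", content, value = TRUE)

# Pull out ": text/plain;" (base-R stand-in for stringr::str_extract)
m <- regmatches(header, regexpr(": .*;", header))

# Drop the leading ": " and the trailing ";"
type <- substr(m, 3, nchar(m) - 1)
print(type)  # "text/plain"
```

On this sample the extraction returns a single string, so the if comparison gets a length-one argument.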


The problem is that when I run this function I get an error back on the if (type == 'text/plain') line saying the argument is of length 0. If I run the code line by line, however, using the exact same email, it works fine. What am I missing?
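
To show what the error itself means, here is a small sketch under assumptions: the input below is hypothetical (a message with no Content-Type line), not one of the real emails. It reproduces the same error whenever grep() finds nothing, and it also shows that 1:length(idx) still enters the loop when idx is empty. Separately, the inner loop in emailsToDF calls parseEmail(mboxCorpus[[i]]) rather than mboxCorpus[[j]], so the email parsed inside the function may not be the one tested by hand.

```r
# Hypothetical message body with no "Content-Type: " line
content <- c("Subject: hello", "", "plain body text")

idx <- grep("^Content-Type: ", content)   # integer(0): nothing matched
bad_seq  <- 1:length(idx)                 # c(1, 0): a for loop would still run twice
good_seq <- seq_along(idx)                # integer(0): a for loop would be skipped

# With value = TRUE a failed match gives character(0), not NA
type <- grep("^Content-Type: ", content, value = TRUE)

# character(0) == "text/plain" is logical(0), which if() cannot evaluate
res <- tryCatch(
  if (type == "text/plain") "plain" else "other",
  error = function(e) conditionMessage(e)
)
print(res)  # "argument is of length zero" (in an English locale)

# One possible guard: test the length before comparing
is_plain <- length(type) == 1 && !is.na(type) && type == "text/plain"
print(is_plain)  # FALSE
```

Whether this is what happens with the real emails is a guess; printing length(idx) at the top of parseEmail would confirm or rule it out.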


Thanks again for your time!

