Thursday, January 23, 2020

Why are lapply and parLapply returning different lists using the same function with if-conditions?

I have a simple function that combines the 3 date columns of a data frame into one vector and then checks whether the date passed in by lapply is in that vector. Running it with lapply and with parLapply gives two different outcomes. Here is an example:

mydt <- data.frame(date1 = seq(as.Date('2014-01-01'), as.Date('2014-01-31'), by = 'days'),
                   date2 = seq(as.Date('2014-03-01'), as.Date('2014-03-31'), by = 'days'),
                   date3 = seq(as.Date('2014-05-01'), as.Date('2014-05-31'), by = 'days'))

date.vector <- as.character(seq(as.Date('2014-01-01'), as.Date('2014-07-31'), by = 'days'))

myfunction <- function(x) {
  if (any(as.character(do.call(c, mydt)) == x)) {
    a <- 'TRUE'
  } else {
    a <- 'FALSE'
  }
  return(a)
}

# Use lapply
test1 <- lapply(date.vector, myfunction)
length(test1)

# Use parLapply
cl <- parallel::makeCluster(getOption("cl.cores", 3))
junk <- parallel::clusterEvalQ(cl, c(library(data.table)))
parallel::clusterExport(cl, c('mydt'), envir = environment())
test2 <- parallel::parLapply(cl, date.vec, myfunction)
length(test2)

test1 has length 212, as it should, but test2 contains only 25 elements. Why do the two results differ, and how can I adjust my parLapply call so that it returns the same result as lapply?
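
One thing that may be worth checking in the code as posted: lapply iterates over date.vector, while parLapply is given date.vec, which is not defined anywhere in the snippet. If date.vec is a different, shorter object left over in the workspace (an assumption, since the post does not show it), that alone could account for the difference in length. A minimal sketch, assuming both calls are meant to iterate over date.vector, and using 2 workers and no data.table (neither choice comes from the original):

```r
library(parallel)

mydt <- data.frame(date1 = seq(as.Date('2014-01-01'), as.Date('2014-01-31'), by = 'days'),
                   date2 = seq(as.Date('2014-03-01'), as.Date('2014-03-31'), by = 'days'),
                   date3 = seq(as.Date('2014-05-01'), as.Date('2014-05-31'), by = 'days'))

date.vector <- as.character(seq(as.Date('2014-01-01'), as.Date('2014-07-31'), by = 'days'))

myfunction <- function(x) {
  # Same logic as in the question: is x among the dates stored in mydt?
  if (any(as.character(do.call(c, mydt)) == x)) 'TRUE' else 'FALSE'
}

# Sequential version
test1 <- lapply(date.vector, myfunction)

# Parallel version: export mydt to the workers and pass the SAME input vector
cl <- makeCluster(2)  # worker count chosen for illustration only
clusterExport(cl, 'mydt', envir = environment())
test2 <- parLapply(cl, date.vector, myfunction)
stopCluster(cl)

length(test1)
length(test2)
identical(test1, test2)
```

If the two lengths still differ after this change, the next thing to compare would be the contents of date.vec and date.vector in the session where the problem occurs.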
