Thursday, 9 September 2021

For loop over a tibble takes too much time

For the project I am working on, I am analyzing two datasets of 500,000 rows each. I need to filter these rows based on the value in one specific column. Here is the function I wrote to apply to the tibbles:


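theme_analyser <- function(tibble_to_analyse) {
  # Visit rows from last to first so that deleting row i does not shift
  # rows that have not been checked yet (a forward loop skips rows)
  for (i in rev(seq_len(nrow(tibble_to_analyse)))) {
    # Split the semicolon-separated "themes" value of row i
    theme <- unlist(strsplit(tibble_to_analyse$themes[i], ";"))
    # Drop the row when none of its themes is in themes_to_use
    if (!any(theme %in% themes_to_use)) {
      tibble_to_analyse <- tibble_to_analyse[-i, ]
    }
  }
  # Return the filtered tibble; without this the function returns NULL
  tibble_to_analyse
}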

In this function, themes_to_use is a vector of strings. The problem is that the code runs far too slowly: in 18 hours it processed only 250,000 rows. How can I speed this up? Thank you in advance.
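A likely reason for the slowness is that removing rows one at a time with tibble_to_analyse[-i, ] builds a new copy of the whole tibble on every deletion, so the total work grows roughly with the square of the number of rows. The sketch below is one possible vectorized alternative, not a definitive implementation: it splits the whole themes column in one strsplit() call, computes a single TRUE/FALSE per row, and subsets once at the end. The function name theme_filter is hypothetical, themes_to_use is passed as an argument instead of being read as a global, and the themes column is assumed to be a character vector of semicolon-separated values.

```r
# Hypothetical helper: keep rows whose semicolon-separated "themes" value
# contains at least one entry of themes_to_use (assumes a character column)
theme_filter <- function(tibble_to_analyse, themes_to_use) {
  # Split every row's "themes" string in a single call;
  # fixed = TRUE treats ";" as a literal separator rather than a regex
  theme_lists <- strsplit(tibble_to_analyse$themes, ";", fixed = TRUE)

  # One logical value per row: does any of its themes appear in themes_to_use?
  keep <- vapply(theme_lists,
                 function(theme) any(theme %in% themes_to_use),
                 logical(1))

  # Subset once, instead of copying the data on every deleted row;
  # drop = FALSE keeps a data frame even if there is a single column
  tibble_to_analyse[keep, , drop = FALSE]
}

# Usage example with made-up data
df <- data.frame(id = 1:4,
                 themes = c("a;b", "c", "b;d", "e;f"),
                 stringsAsFactors = FALSE)
theme_filter(df, c("b", "f"))  # keeps the rows with id 1, 3 and 4
```

vapply() still calls a small R function per row, but each call only touches one short character vector, and the data frame or tibble is copied a single time. This should scale roughly linearly with the number of rows instead of quadratically.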
