Preface: I have two CSV tables, each containing 3 million rows and about 20 columns, and I want to extract 5 columns for all rows that meet certain requirements. It would be better if I worked with SQL or some other database tool, but hey, I started out in R and I have to finish it now.
Currently my job is running on an R server with about 16 GB of RAM. Tomorrow the run on the first table will hit one week of runtime, and about 80% is done.
This leads me to the following question: does it make any difference how I formulate my if-clause? Currently I do the following (omitting loading the CSV, preparing the data frame, etc.):
i = 1
while(i < length_csv){
    if((csv$row11[i] != condition1) && (csv$row11[i] != condition2)
       && (csv$row11[i] != condition3) && (csv$row11[i] != condition4)
       && (csv$row11[i] != condition5) && (csv$row11[i] != condition6)
       && (csv$row11[i] != condition7) && (csv$row3[i] == condition8)){
        dataframe = rbind(dataframe, c(csv$row1[i], csv$row2[i], csv$row11[i], csv$row12[i], csv$row13[i]))
    }
    i = i + 1
}
Would it be more efficient if the conditions were nested, like this?
i = 1
while(i < length_csv){
    if(csv$row3[i] == condition8){
        if(csv$row11[i] != condition1){
            if(csv$row11[i] != condition2){
                ... etc
            }
        }
    }
    i = i + 1
}
Or are there other ways to formulate the request I might have overlooked?
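One other way to formulate it, as a rough sketch rather than a drop-in replacement: build the whole test as a single logical vector and subset once, so there is no per-row loop and no rbind() call that copies the growing data frame on every match. The toy data, the excluded vector (standing in for condition1 to condition7) and the value of condition8 below are made up for illustration; only the column names come from the code above.

```r
# Toy stand-in for the real table. Column names follow the question;
# the values and the condition placeholders are assumptions.
csv <- data.frame(
  row1  = 1:6,
  row2  = c("a", "b", "c", "d", "e", "f"),
  row3  = c("keep", "keep", "drop", "keep", "keep", "drop"),
  row11 = c("x", "bad1", "y", "bad2", "z", "x"),
  row12 = 11:16,
  row13 = 21:26,
  stringsAsFactors = FALSE
)

excluded   <- c("bad1", "bad2")  # hypothetical: stands in for condition1 ... condition7
condition8 <- "keep"             # hypothetical value

# One logical vector covering every row, instead of one test per iteration.
keep <- !(csv$row11 %in% excluded) & csv$row3 == condition8

# Subset in a single step; which() drops rows where the test is NA.
result <- csv[which(keep), c("row1", "row2", "row11", "row12", "row13")]
print(result)
```

Since && already stops at the first FALSE, nesting the if statements probably changes little. At most, putting the test that fails most often first saves a few comparisons per row. Note also that %in% treats NA as "no match", whereas != returns NA, so rows with missing values may be handled differently than in the loop.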