if-statement: Increased efficiency in nested for{ifelse} loop across conditional mean

Tuesday, March 21, 2017


I have a data frame, datav2, with the following structure:

      project_id  hour cap_factor load_area  fuel capacity
17521    1000097 17902   0.753329  CA_PGE_S Solar    21.54
17522    1000097 17901   0.847296  CA_PGE_S Solar    21.54
17523    1001197 17924   0.586530  CA_PGE_S Solar     9.88
17524    2200097 25374   0.000000  CA_PGE_S Solar    44.54
17525    1077597 25414   0.635047  CA_PGE_S Wind     11.33
17526    1000097 19770   -0.39957  CA_PGE_S Solar    21.54
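To make the question reproducible, the six rows printed above can be rebuilt as a data frame. This is a minimal sketch: the column types (numeric project_id, hour and capacity; character load_area and fuel) are assumptions, because the printout does not show them.

```r
# Rebuild the six sample rows printed above.
# Assumption: numeric IDs/hours, character load_area and fuel.
datav2 <- data.frame(
  project_id = c(1000097, 1000097, 1001197, 2200097, 1077597, 1000097),
  hour       = c(17902, 17901, 17924, 25374, 25414, 19770),
  cap_factor = c(0.753329, 0.847296, 0.586530, 0.000000, 0.635047, -0.39957),
  load_area  = "CA_PGE_S",  # recycled to all six rows
  fuel       = c("Solar", "Solar", "Solar", "Solar", "Wind", "Solar"),
  capacity   = c(21.54, 21.54, 9.88, 44.54, 11.33, 21.54),
  stringsAsFactors = FALSE
)
row.names(datav2) <- 17521:17526  # keep the row names from the printout
str(datav2)
```

Note that in these six rows the only negative value (hour 19770) has no other row with the same hour, load_area and fuel, so its conditional mean is taken over an empty set; in R, `mean(numeric(0))` is `NaN`.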

The data frame has 21 million rows. For every row where cap_factor < 0, I want to replace that row's cap_factor with the mean of all other cap_factor values that share the same load_area, hour, and fuel and are non-negative.
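The rule above is a grouped operation, so it can be expressed without a row loop. A minimal base-R sketch (not the asker's code), assuming the goal is exactly that rule: build one group key per hour/fuel/load_area combination, compute each group's mean of non-negative values with `ave()`, then substitute it only where cap_factor is negative. The toy data is hypothetical.

```r
# Hypothetical toy data (not from the question): one load area, one fuel, two hours
datav2 <- data.frame(
  hour       = c(1, 1, 1, 2, 2),
  cap_factor = c(0.8, 0.6, -0.2, 0.5, -0.1),
  load_area  = "CA_PGE_S",
  fuel       = "Solar",
  stringsAsFactors = FALSE
)

# One factor level per hour/fuel/load_area combination that actually occurs;
# drop = TRUE avoids creating levels for combinations that never appear
key <- interaction(datav2$hour, datav2$fuel, datav2$load_area, drop = TRUE)

# Mask negative values as NA so they are ignored by the group mean
nonneg <- ifelse(datav2$cap_factor >= 0, datav2$cap_factor, NA_real_)

# For every row, the mean of the non-negative values in its group
# (NaN if the group has no non-negative values)
group_mean <- ave(nonneg, key, FUN = function(x) mean(x, na.rm = TRUE))

# Keep non-negative values; replace negatives with their group mean
datav2$cap_factor2 <- ifelse(datav2$cap_factor < 0, group_mean, datav2$cap_factor)
print(datav2$cap_factor2)
```

Passing a single pre-built key to `ave()`, rather than the three columns separately, matters on large data: `ave()` calls `interaction()` on its grouping arguments without `drop = TRUE`, which could otherwise create a level for every possible hour/fuel/load_area combination.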

Thus far I have been using the following:

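datav2$cap_factor2 <- NA_real_

for (i in seq_len(nrow(datav2))) {
  if (datav2$cap_factor[i] < 0) {
    # Mean of the non-negative cap_factor values with this row's hour, fuel and load_area
    datav2$cap_factor2[i] <- mean(datav2$cap_factor[datav2$hour == datav2$hour[i] &
                                                    datav2$fuel == datav2$fuel[i] &
                                                    datav2$load_area == datav2$load_area[i] &
                                                    datav2$cap_factor >= 0])
  } else {
    datav2$cap_factor2[i] <- datav2$cap_factor[i]
  }
}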

This is very slow on a dataset this large, since every negative row triggers a scan of the full 21-million-row data frame. Is there a technique that would make this process more efficient?
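One common alternative (a swapped-in technique, not from the original post) is to compute each group's mean of non-negative values once, store the results as a lookup table, and then index into it for the negative rows only. Each negative row then costs a table lookup instead of a full scan. This is a sketch on hypothetical toy data; its speed on the real 21-million-row data has not been measured here.

```r
# Hypothetical toy data (not from the question); hour 3 has no non-negative value
datav2 <- data.frame(
  hour       = c(1, 1, 1, 2, 2, 3),
  cap_factor = c(0.8, 0.6, -0.2, 0.5, -0.1, -0.3),
  load_area  = "CA_PGE_S",
  fuel       = c("Solar", "Solar", "Solar", "Wind", "Wind", "Solar"),
  stringsAsFactors = FALSE
)

# Group key: one level per hour/fuel/load_area combination present in the data
key <- interaction(datav2$hour, datav2$fuel, datav2$load_area, drop = TRUE)
ok  <- datav2$cap_factor >= 0

# Lookup table: mean of the non-negative values per group, computed once per group.
# key[ok] keeps all levels of key, so groups without non-negative values get NA.
lookup <- tapply(datav2$cap_factor[ok], key[ok], mean)

# Replace only the negative rows, indexing the lookup table by group label
neg <- which(!ok)
datav2$cap_factor2 <- datav2$cap_factor
datav2$cap_factor2[neg] <- lookup[as.character(key[neg])]
print(datav2$cap_factor2)
```

A group with no non-negative values yields `NA` here (from `tapply()`'s empty cell), whereas the loop version yields `NaN` from `mean()` of an empty vector; `is.na()` is `TRUE` for both.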
