I have a dataframe datav2 with the structure:
project_id hour cap_factor load_area fuel capacity
17521 1000097 17902 0.753329 CA_PGE_S Solar 21.54
17522 1000097 17901 0.847296 CA_PGE_S Solar 21.54
17523 1001197 17924 0.586530 CA_PGE_S Solar 9.88
17524 2200097 25374 0.000000 CA_PGE_S Solar 44.54
17525 1077597 25414 0.635047 CA_PGE_S Wind 11.33
17526 1000097 19770 -0.39957 CA_PGE_S Solar 21.54
The dataframe is 21million rows in length. I want to loop across every row and, when a given rows df$cap_factor < 0, replace that rows df$cap_factor with the average of every other cap_factor variable that is of the same load_area, hour, and fuel, and is positive.
Thus far I have been using the following:
datav2$cap_factor2 <-NA
for (i in 1:length(datav2$cap_factor)) {
if (datav2[i,3] < 0)
datav2[i,7] <-mean(datav2$cap_factor[datav2$hour == datav2[i,2] & datav2$fuel == datav2[i,5] &
datav2$fuel == datav2[i,4] & datav2$cap_factor >= 0 ])
else
datav2[i,7] <- datav2[i,3]}
This is very slow when looping across such a large dataset. Any suggestions on a technique to improve the efficiency of this process?
Aucun commentaire:
Enregistrer un commentaire