Wednesday, 10 April 2019

Alternative for my slow nested loop with if-statement?

I have two large data frames (df with 7038 rows and df2 with 14076 rows). I want to compare them and copy values from df2 into df wherever the team and date fields match.

I tried a nested for loop with an if-statement but it takes several hours to complete.

df:

   Date       HomeTeam        AwayTeam      FTR   GoalScoreHome GoalScoreAway
   <date>     <chr>           <chr>         <chr> <chr>         <chr>        
 1 1995-08-18 For Sittard     PSV Eindhoven A     NA            NA           
 2 1995-08-19 Go Ahead Eagles Groningen     D     NA            NA           
 3 1995-08-19 Roda JC         Heerenveen    D     NA            NA           
 4 1995-08-19 Willem II       Sparta        H     NA            NA           
 5 1995-08-20 Ajax            Utrecht       H     NA            NA           
 6 1995-08-20 Feyenoord       Vitesse       H     NA            NA           
 7 1995-08-20 Graafschap      Nijmegen      A     NA            NA           
 8 1995-08-20 Volendam        Twente        A     NA            NA           
 9 1995-08-20 Waalwijk        NAC Breda     D     NA            NA           
10 1995-08-23 Groningen       For Sittard   H     NA            NA   


df2:

  Round       Date Team GDPerGame PointsPerGame GoalScore5.2
1     1 1995-08-20 Ajax         4             3           NA
2     2 1995-08-25 Ajax         6             3           NA
3     3 1995-09-10 Ajax         4             3           NA
4     4 1995-09-17 Ajax         4             3           NA
5     5 1995-09-20 Ajax         4             3           NA
6     6 1995-09-24 Ajax         1             3           22

I'm using the following loop:

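# For each match in df, scan every row of df2 for the same team and date
for (i in seq_len(nrow(df))) {
  for (j in seq_len(nrow(df2))) {
    if (df$HomeTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # df2 row belongs to the home team on this date
      df$GoalScoreHome[i] <- df2$GoalScore5.2[j]
    } else if (df$AwayTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # df2 row belongs to the away team on this date
      df$GoalScoreAway[i] <- df2$GoalScore5.2[j]
    }
  }
}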


This works as intended but, as mentioned above, it is far too slow: the inner loop runs 7038 × 14076 times, roughly 99 million comparisons.

I found some alternatives for a nested loop, but not with an if-statement in it. Does anyone know a good, faster alternative?
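
For reference, one possible direction is a vectorised lookup with base R's match() instead of the two loops. The following is only a minimal sketch under some assumptions: the two small data frames are made-up stand-ins for df and df2 (the value 17 is invented for illustration), and df2 is assumed to hold at most one row per team and date. Unlike the loop, it overwrites GoalScoreHome and GoalScoreAway with NA where no match is found, which is harmless here because both columns start out as NA.

```r
# Made-up stand-ins for df and df2, using the same column names as above
df <- data.frame(
  Date     = as.Date(c("1995-08-20", "1995-09-24")),
  HomeTeam = c("Ajax", "Ajax"),
  AwayTeam = c("Utrecht", "Feyenoord"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date         = as.Date(c("1995-08-20", "1995-09-24", "1995-09-24")),
  Team         = c("Ajax", "Ajax", "Feyenoord"),
  GoalScore5.2 = c(NA, 22, 17),
  stringsAsFactors = FALSE
)

# Build one text key per (team, date) pair in df2
key2 <- paste(df2$Team, df2$Date, sep = "|")

# match() returns, for each df row, the index of the first df2 row
# with the same key, or NA if there is none
home_idx <- match(paste(df$HomeTeam, df$Date, sep = "|"), key2)
away_idx <- match(paste(df$AwayTeam, df$Date, sep = "|"), key2)

# Indexing with NA yields NA, so unmatched rows stay NA
df$GoalScoreHome <- df2$GoalScore5.2[home_idx]
df$GoalScoreAway <- df2$GoalScore5.2[away_idx]
```

Each match() call is a single hashed lookup over all rows, so the work grows roughly with nrow(df) + nrow(df2) rather than with their product.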
