I have two large data frames (df with 7038 rows and df2 with 14076 rows). I want to compare them and copy a value from df2 into df wherever the team and the date match.
I tried a nested for loop with an if-statement but it takes several hours to complete.
df:
   Date       HomeTeam        AwayTeam      FTR   GoalScoreHome GoalScoreAway
   <date>     <chr>           <chr>         <chr> <chr>         <chr>
 1 1995-08-18 For Sittard     PSV Eindhoven A     NA            NA
 2 1995-08-19 Go Ahead Eagles Groningen     D     NA            NA
 3 1995-08-19 Roda JC         Heerenveen    D     NA            NA
 4 1995-08-19 Willem II       Sparta        H     NA            NA
 5 1995-08-20 Ajax            Utrecht       H     NA            NA
 6 1995-08-20 Feyenoord       Vitesse       H     NA            NA
 7 1995-08-20 Graafschap      Nijmegen      A     NA            NA
 8 1995-08-20 Volendam        Twente        A     NA            NA
 9 1995-08-20 Waalwijk        NAC Breda     D     NA            NA
10 1995-08-23 Groningen       For Sittard   H     NA            NA
df2:
  Round Date       Team GDPerGame PointsPerGame GoalScore5.2
1     1 1995-08-20 Ajax         4             3           NA
2     2 1995-08-25 Ajax         6             3           NA
3     3 1995-09-10 Ajax         4             3           NA
4     4 1995-09-17 Ajax         4             3           NA
5     5 1995-09-20 Ajax         4             3           NA
6     6 1995-09-24 Ajax         1             3           22
I'm using the following loop:
for (i in seq_len(nrow(df))) {
  for (j in seq_len(nrow(df2))) {
    if (df$HomeTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # Same date, and the df2 team is the home side
      df$GoalScoreHome[i] <- df2$GoalScore5.2[j]
    } else if (df$AwayTeam[i] == df2$Team[j] && df$Date[i] == df2$Date[j]) {
      # Same date, and the df2 team is the away side
      df$GoalScoreAway[i] <- df2$GoalScore5.2[j]
    }
  }
}
This works as intended, but as I said before, it is far too slow.
I found some alternatives to a nested loop, but none that include an if-statement. Does anyone know a good, faster alternative?
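For reference, the lookup the loop performs could be sketched without any loop by building a combined team-and-date key and using base R's match(). This is only a minimal sketch under some assumptions: each (Team, Date) pair occurs at most once in df2, the toy data below is illustrative (the Vitesse row and its value 10 are made up), and rows of df without a match end up as NA rather than keeping a previous value.

```r
# Toy versions of the two data frames; values are illustrative only.
df <- data.frame(
  Date          = as.Date(c("1995-08-20", "1995-09-24", "1995-09-20")),
  HomeTeam      = c("Ajax", "Ajax", "Feyenoord"),
  AwayTeam      = c("Utrecht", "Vitesse", "Ajax"),
  GoalScoreHome = NA_real_,
  GoalScoreAway = NA_real_,
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date         = as.Date(c("1995-08-20", "1995-09-20", "1995-09-24", "1995-09-24")),
  Team         = c("Ajax", "Ajax", "Ajax", "Vitesse"),
  GoalScore5.2 = c(NA, NA, 22, 10),  # the Vitesse value is a made-up example
  stringsAsFactors = FALSE
)

# One key per df2 row: team and date pasted together.
key2 <- paste(df2$Team, df2$Date, sep = "|")

# For every df row, find the position of the matching df2 row (NA if none).
home_idx <- match(paste(df$HomeTeam, df$Date, sep = "|"), key2)
away_idx <- match(paste(df$AwayTeam, df$Date, sep = "|"), key2)

# Indexing with NA yields NA, so unmatched rows simply stay missing.
df$GoalScoreHome <- df2$GoalScore5.2[home_idx]
df$GoalScoreAway <- df2$GoalScore5.2[away_idx]

print(df)
```

Because match() does a hashed lookup over the whole vector at once, this avoids the 7038 × 14076 pairwise comparisons of the nested loop. If (Team, Date) pairs can repeat in df2, note that match() returns the first hit, whereas the loop keeps the last one.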