I have a large dataframe with 31181 observations and 9 variables. In this dataframe, the academic performance of students is registered.
I also have a second dataframe, in which each student is represented in 1 row. In this row I would like to store his/her results from the academic performance dataframe.
Dataframe 1 (let's call it Academic) looks as follows:
Programme Resits Student_ID Course_code Academic_year Course_Grade_Binned Graduated Master_Student Course.rating_M Rating.tutor_M Selfstudy_M
1 IB 0 9000006 ABC1198 2013 B TRUE 1 7.5 8.2 14.1
2 IB 0 9000006 ABC1192 2014 B TRUE 1 8.4 8.8 13.0
3 IB 0 9000006 ABC1277 2014 A TRUE 1 6.0 6.4 10.6
4 IB 0 9000006 ABC1448 2013 B TRUE 1 5.7 7.8 14.4
5 IB 0 9000006 ABC1120 2014 B TRUE 1 7.1 7.4 11.2
6 IB 0 9000006 ABC1362 2013 B TRUE 1 6.7 7.5 15.8
7 IB 0 9000006 ABC1213 2013 C TRUE 1 7.7 8.1 11.4
8 IB 0 9000006 ABC1382 2013 B TRUE 1 6.6 7.1 16.3
9 IB 0 9000006 ABC1108 2013 C TRUE 1 7.1 7.6 15.7
10 IB 1 9000006 ABC1329 2014 B TRUE 1 7.5 7.9 10.7
11 IB 0 9000006 ABC1126 2013 B TRUE 1 6.7 7.5 15.3
12 IB 0 9000006 ABC1003 2013 B TRUE 1 7.3 8.5 12.6
13 IB 0 9000014 ABC1309 2014 B TRUE 0 6.9 6.1 12.4
14 IB 0 9000014 ABC1198 2013 A TRUE 0 7.5 8.2 14.1
15 IB 0 9000014 ABC1277 2014 A TRUE 0 6.0 6.4 10.6
16 IB 0 9000014 ABC1448 2013 A TRUE 0 5.7 7.8 14.4
17 IB 0 9000014 ABC1362 2013 B TRUE 0 6.7 7.5 15.8
18 IB 0 9000014 ABC1213 2013 B TRUE 0 7.7 8.1 11.4
19 IB 0 9000014 ABC1152 2014 A TRUE 0 7.0 7.6 12.3
20 IB 0 9000014 ABC1382 2013 A TRUE 0 6.6 7.1 16.3
21 IB 0 9000014 ABC1108 2013 B TRUE 0 7.1 7.6 15.7
22 IB 0 9000014 ABC1455 2014 A TRUE 0 6.7 7.3 11.2
23 IB 0 9000014 ABC1126 2013 B TRUE 0 6.7 7.5 15.3
24 IB 0 9000014 ABC1003 2013 A TRUE 0 7.3 8.5 12.6
25 IB 1 9000028 ABC1213 2014 C TRUE 0 7.8 8.6 10.7
26 IB 0 9000028 ABC1198 2014 B TRUE 0 7.1 8.0 15.5
Dataframe 2 (let's call it NewData) looks like this:
Student_ID Master Resits Programme ABC1198 ABC1192 ABC1277 ABC1448 ABC1120 ABC1362 ABC1213 ABC1382 ABC1108 ABC1329 ABC1126 ABC1003 ABC1309 ABC1152 ABC1455 ABC1123 ABC1409
1 9000006 1 1 IB NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 9000014 0 0 IB NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 9000028 0 5 IB NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4 9000045 1 5 EBE NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
As you can see, all the course columns are still NA. I would like to create a loop to check if a course_code exists in a group (i.e. by Student_ID) in the academic dataframe and then put a 1 in the particular course column in the NewData dataframe and a 0 if the student didn't do that course.
The end result (the NewData) should thus look like this:
Student_ID Master Resits Programme ABC1198 ABC1192 ABC1277 ABC1448 ABC1120 ABC1362 ABC1213 ABC1382 ABC1108 ABC1329 ABC1126 ABC1003 ABC1309 ABC1152 ABC1455 ABC1123 ABC1409
1 9000006 1 1 IB 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 0
How would I do this?
Thank you!
Aucun commentaire:
Enregistrer un commentaire