Sunday, July 5, 2020

Pandas: Insert missing row data and iterate with conditions within groups Complex & Fun

I have a dataframe and need to insert missing row data. Here is the dataframe:

import pandas as pd

df = pd.DataFrame({
    'name': ['Jim', 'Jim', 'Jim', 'Jim', 'Mike', 'Mike', 'Mike', 'Mike', 'Mike',
             'Polo', 'Polo', 'Polo', 'Polo', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [80, 68, 751, 'Started', 32, 68, 126, 49, 'Started', 105, 68, 76, 'Started', 251, 49, 23, 'Started'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49, 324, 105, 114, 76, 96, 115, 49, 23],
})
print(df)
    name From_num  To_num
0    Jim       80      99
1    Jim       68      80
2    Jim      751      68
3    Jim  Started     751
4   Mike       32     105
5   Mike       68      32
6   Mike      126      68
7   Mike       49     126
8   Mike  Started      49
9   Polo      105     324
10  Polo       68     105
11  Polo       76     114 #Missing record between line 10 and 11
12  Polo  Started      76
13   Tom      251      96
14   Tom       49     115 # Missing record between 13 and 14
15   Tom       23      49
16   Tom  Started      23

Within each group (person's name), the records form a continuous chain from 'From_num' to 'To_num', read from bottom to top. For Jim, for example: 'Started' -> 751, 751 -> 68, 68 -> 80, 80 -> 99. Mike follows the same pattern. Polo and Tom, however, have missing records. For Polo, I want to insert a row between lines 10 and 11, 114 -> 68, so that the chain is continuous again. Likewise for Tom, a row 115 -> 251 is needed between lines 13 and 14. I tried to code this with loops and conditions and failed, so please help if you have any ideas. Please DO NOT hard-code the missing records, as this is only a simplified example. Many thanks for your help! Hopefully the question is clear. The expected result is below:

df_expected:
    name From_num  To_num
0    Jim       80      99
1    Jim       68      80
2    Jim      751      68
3    Jim  Started     751
4   Mike       32     105
5   Mike       68      32
6   Mike      126      68
7   Mike       49     126
8   Mike  Started      49
9   Polo      105     324
10  Polo       68     105
11  Polo      114      68 # New Inserted line
12  Polo       76     114
13  Polo  Started      76
14   Tom      251      96
15   Tom      115     251 # New Inserted line
16   Tom       49     115
17   Tom       23      49
18   Tom  Started      23
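
One possible way to get from the input to the expected result is sketched below. It is only a minimal sketch, not a definitive answer: it assumes that every gap is exactly one record wide and that the rows of each group are already ordered as shown (newest on top, 'Started' at the bottom). The helper name fill_gaps is hypothetical. For each pair of neighbouring rows in a group, it checks whether the lower row's 'To_num' equals the upper row's 'From_num', and inserts a bridging row when it does not.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jim', 'Jim', 'Jim', 'Jim', 'Mike', 'Mike', 'Mike', 'Mike', 'Mike',
             'Polo', 'Polo', 'Polo', 'Polo', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [80, 68, 751, 'Started', 32, 68, 126, 49, 'Started',
                 105, 68, 76, 'Started', 251, 49, 23, 'Started'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49,
               324, 105, 114, 76, 96, 115, 49, 23],
})


def fill_gaps(group):
    """Return one group's rows with a bridging row added at every break.

    Hypothetical helper: assumes each gap is a single missing record.
    """
    rows = group.to_dict('records')
    out = []
    # Walk neighbouring pairs: `upper` is the later record, `lower` the earlier one.
    for upper, lower in zip(rows, rows[1:]):
        out.append(upper)
        # The chain is broken when the lower row does not end where the upper row starts.
        if lower['To_num'] != upper['From_num']:
            out.append({
                'name': upper['name'],
                'From_num': lower['To_num'],
                'To_num': upper['From_num'],
            })
    out.append(rows[-1])  # the bottom ('Started') row never acts as `upper`
    return pd.DataFrame(out)


# sort=False keeps the groups in their original order of appearance.
parts = [fill_gaps(g) for _, g in df.groupby('name', sort=False)]
result = pd.concat(parts, ignore_index=True)
print(result)
```

Looping over the groups explicitly and concatenating the pieces keeps the row order predictable. If a gap could span more than one missing record, the intermediate values cannot be inferred from the data shown here, so this approach would only bridge the two ends with a single row.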
