I have a dataframe and need to insert missing row data. Here is the dataframe:
import pandas as pd

df = pd.DataFrame({
    'name': ['Jim', 'Jim', 'Jim', 'Jim', 'Mike', 'Mike', 'Mike', 'Mike', 'Mike',
             'Polo', 'Polo', 'Polo', 'Polo', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [80, 68, 751, 'Started', 32, 68, 126, 49, 'Started',
                 105, 68, 76, 'Started', 251, 49, 23, 'Started'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49,
               324, 105, 114, 76, 96, 115, 49, 23],
})
    name  From_num  To_num
0    Jim        80      99
1    Jim        68      80
2    Jim       751      68
3    Jim   Started     751
4   Mike        32     105
5   Mike        68      32
6   Mike       126      68
7   Mike        49     126
8   Mike   Started      49
9   Polo       105     324
10  Polo        68     105
11  Polo        76     114  # Missing record between rows 10 and 11
12  Polo   Started      76
13   Tom       251      96
14   Tom        49     115  # Missing record between rows 13 and 14
15   Tom        23      49
16   Tom   Started      23
The records for each group (person's name) form a continuous chain from 'From_num' to 'To_num', read from bottom to top. For example, for Jim: 'Started' -> 751, 751 -> 68, 68 -> 80, 80 -> 99. Mike follows the same pattern. But some records are missing for Polo and Tom. For Polo, I want to insert a row between rows 10 and 11 (114 -> 68) so that the whole chain is continuous. Likewise for Tom, a row should be inserted between rows 13 and 14 (115 -> 251). I tried to code this with loops and conditions and failed, so please help if you have any ideas. Please DO NOT hard-code the missing records, as this is only a simplified example. Many thanks for any help! Hopefully the question is clear. The expected result is below:
df_expected:
    name  From_num  To_num
0    Jim        80      99
1    Jim        68      80
2    Jim       751      68
3    Jim   Started     751
4   Mike        32     105
5   Mike        68      32
6   Mike       126      68
7   Mike        49     126
8   Mike   Started      49
9   Polo       105     324
10  Polo        68     105
11  Polo       114      68  # Newly inserted row
12  Polo        76     114
13  Polo   Started      76
14   Tom       251      96
15   Tom       115     251  # Newly inserted row
16   Tom        49     115
17   Tom        23      49
18   Tom   Started      23
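
For reference, here is a minimal sketch of one way the rule described above might be coded. It is not a definitive solution. It assumes that each person's rows are contiguous and already ordered newest first, and that a single bridging row is enough to close each gap. The helper name `fill_gaps` is made up for illustration. The idea: within one person, a row's 'From_num' should equal the next row's 'To_num'; where it does not, a row running from the next row's 'To_num' to the current row's 'From_num' is inserted between them.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Jim', 'Jim', 'Jim', 'Jim', 'Mike', 'Mike', 'Mike', 'Mike', 'Mike',
             'Polo', 'Polo', 'Polo', 'Polo', 'Tom', 'Tom', 'Tom', 'Tom'],
    'From_num': [80, 68, 751, 'Started', 32, 68, 126, 49, 'Started',
                 105, 68, 76, 'Started', 251, 49, 23, 'Started'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49,
               324, 105, 114, 76, 96, 115, 49, 23],
})


def fill_gaps(frame):
    """Hypothetical helper: insert a bridging row wherever the chain breaks."""
    records = frame.to_dict('records')
    out = []
    # Pair every row with the row that follows it (None after the last row).
    for cur, nxt in zip(records, records[1:] + [None]):
        out.append(cur)
        # A gap exists when the next row belongs to the same person but its
        # To_num does not match the current row's From_num.
        if (nxt is not None
                and nxt['name'] == cur['name']
                and nxt['To_num'] != cur['From_num']):
            out.append({
                'name': cur['name'],
                'From_num': nxt['To_num'],  # start where the older row ended
                'To_num': cur['From_num'],  # end where the newer row starts
            })
    # Rebuilding the frame also renumbers the index from 0.
    return pd.DataFrame(out, columns=frame.columns)


result = fill_gaps(df)
print(result)
```

A plain loop over the records is used here because 'From_num' mixes integers with the string 'Started', which makes a row-by-row comparison easier to reason about than a vectorized one. If several consecutive records were missing in one place, this sketch would still insert only a single row to join the two ends.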