I have a dataset similar to the this sample below:
| id | old_a | old_b | new_a | new_b |
|----|-------|-------|-------|-------|
| 6 | 3 | 0 | 0 | 0 |
| 6 | 9 | 0 | 2 | 0 |
| 13 | 3 | 0 | 0 | 0 |
| 13 | 37 | 0 | 0 | 1 |
| 13 | 30 | 0 | 0 | 6 |
| 13 | 12 | 2 | 0 | 0 |
| 6 | 7 | 0 | 2 | 0 |
| 6 | 8 | 0 | 0 | 0 |
| 6 | 19 | 0 | 3 | 0 |
| 6 | 54 | 0 | 0 | 0 |
| 87 | 6 | 0 | 2 | 0 |
| 87 | 11 | 1 | 1 | 0 |
| 87 | 25 | 0 | 1 | 0 |
| 87 | 10 | 0 | 0 | 0 |
| 9 | 8 | 1 | 0 | 0 |
| 9 | 19 | 0 | 2 | 0 |
| 9 | 1 | 0 | 0 | 0 |
| 9 | 34 | 0 | 7 | 0 |
I'm providing this sample dataset for the above table:
data=[[6,3,0,0,0],[6,9,0,2,0],[13,3,0,0,0],[13,37,0,0,1],[13,30,0,0,6],[13,12,2,0,0],[6,7,0,2,0],
[6,8,0,0,0],[6,19,0,3,0],[6,54,0,0,0],[87,6,0,2,0],[87,11,1,1,0],[87,25,0,1,0],[87,10,0,0,0],
[9,8,1,0,0],[9,19,0,2,0],[9,1,0,0,0],[9,34,0,7,0]]
data= pd.DataFrame(data,columns=['id','old_a','old_b','new_a','new_b'])
I want to look into columns 'new_a' and 'new_b' for each id and even if a single value exists in these two columns for each id, I want to count it as 1 irrespective of the number of times any value has occurred and assign 0 if no value is present. For example, if I look into the id '9', there are two distinct values in new_a, but I want to count it as 1. Similarly, for id '13', there are no values in new_a, so I would want to assign it 0.
My final output should like:
| id | new_a | new_b |
|----|-------|-------|
| 6 | 1 | 0 |
| 9 | 1 | 0 |
| 13 | 0 | 1 |
| 87 | 1 | 0 |
I would eventually want to calculate the % of clients using new_a and new_b. So from the above table, 75% clients use new_a and 25% use new_b. I'm a beginner in python and not sure how to proceed in this.
Aucun commentaire:
Enregistrer un commentaire