mardi 20 octobre 2020

Assign a total value of 1 if any number is present in a column, else 0

I have a dataset similar to the this sample below:

| id | old_a | old_b | new_a | new_b |
|----|-------|-------|-------|-------|
| 6  | 3     | 0     | 0     | 0     |
| 6  | 9     | 0     | 2     | 0     |
| 13 | 3     | 0     | 0     | 0     |
| 13 | 37    | 0     | 0     | 1     |
| 13 | 30    | 0     | 0     | 6     |
| 13 | 12    | 2     | 0     | 0     |
| 6  | 7     | 0     | 2     | 0     |
| 6  | 8     | 0     | 0     | 0     |
| 6  | 19    | 0     | 3     | 0     |
| 6  | 54    | 0     | 0     | 0     |
| 87 | 6     | 0     | 2     | 0     |
| 87 | 11    | 1     | 1     | 0     |
| 87 | 25    | 0     | 1     | 0     |
| 87 | 10    | 0     | 0     | 0     |
| 9  | 8     | 1     | 0     | 0     |
| 9  | 19    | 0     | 2     | 0     |
| 9  | 1     | 0     | 0     | 0     |
| 9  | 34    | 0     | 7     | 0     |

I'm providing this sample dataset for the above table:

data=[[6,3,0,0,0],[6,9,0,2,0],[13,3,0,0,0],[13,37,0,0,1],[13,30,0,0,6],[13,12,2,0,0],[6,7,0,2,0],
                                 [6,8,0,0,0],[6,19,0,3,0],[6,54,0,0,0],[87,6,0,2,0],[87,11,1,1,0],[87,25,0,1,0],[87,10,0,0,0],
                                [9,8,1,0,0],[9,19,0,2,0],[9,1,0,0,0],[9,34,0,7,0]]
data= pd.DataFrame(data,columns=['id','old_a','old_b','new_a','new_b'])

I want to look into columns 'new_a' and 'new_b' for each id and even if a single value exists in these two columns for each id, I want to count it as 1 irrespective of the number of times any value has occurred and assign 0 if no value is present. For example, if I look into the id '9', there are two distinct values in new_a, but I want to count it as 1. Similarly, for id '13', there are no values in new_a, so I would want to assign it 0.

My final output should like:

| id | new_a | new_b |
|----|-------|-------|
| 6  | 1     | 0     |
| 9  | 1     | 0     |
| 13 | 0     | 1     |
| 87 | 1     | 0     |

I would eventually want to calculate the % of clients using new_a and new_b. So from the above table, 75% clients use new_a and 25% use new_b. I'm a beginner in python and not sure how to proceed in this.

Aucun commentaire:

Enregistrer un commentaire