I am trying to distribute rows of a pandas data frame into a bucket based on conditions.
       topic1  topic2
name1       1       4
name2       4       4
name3       4       3
name4       4       4
name5       2       4
I need bucket 1 to hold 2 rows where topic1 is 4 and 3 rows where topic2 is 4. Once the bucket is filled, I want the loop to stop. Hence, my bucket variables look like this:
bucket1_topic1 = 2
bucket1_topic2 = 3
I wrote this pretty convoluted starter that is 'almost' working, but I am having issues dealing with rows that fulfil the conditions for both topic1 and topic2. What is a more efficient and correct way to do this?
rows_list = []
counter1 = 0
counter2 = 0
for index, row in data.iterrows():
    if counter1 < bucket1_topic1:
        if row.topic1 == 4:
            counter1 += 1
            rows_list.append([row[1], row.topic1, row.topic2])
    if counter2 < bucket1_topic2:
        if row.topic2 == 4 and row.topic1 != 4:
            counter2 += 1
            if [row[1], row.topic1, row.topic2] not in rows_list:
                rows_list.append([row[1], row.topic1, row.topic2])
Desired result:
       topic1  topic2
name1       1       4
name2       4       4
name3       4       3
name5       2       4
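For reference, here is a minimal sketch of one way the desired result could be produced. It rests on an assumption inferred from the desired output rather than stated anywhere: a row is skipped if it matches a topic whose quota is already full, which is why name4 is left out, and a row matching both topics counts toward both. The names `fill_bucket`, `quotas` and `target` are made up for illustration.

```python
import pandas as pd

# Sample frame from the question
data = pd.DataFrame(
    {"topic1": [1, 4, 4, 4, 2], "topic2": [4, 4, 3, 4, 4]},
    index=["name1", "name2", "name3", "name4", "name5"],
)


def fill_bucket(df, quotas, target=4):
    """Greedily pick rows until each column in `quotas` has its count of `target` values.

    Assumed rule: a row that matches a column whose quota is already
    full is skipped, even if it also matches a column with room left.
    """
    counts = dict.fromkeys(quotas, 0)
    picked = []
    for name, row in df.iterrows():
        # Columns in which this row carries the target value
        hits = [col for col in quotas if row[col] == target]
        # Skip rows that match nothing or that would overfill a full column
        if not hits or any(counts[col] >= quotas[col] for col in hits):
            continue
        for col in hits:
            counts[col] += 1
        picked.append(name)
        # Stop scanning as soon as every quota is met
        if all(counts[col] >= quotas[col] for col in quotas):
            break
    return df.loc[picked], counts


bucket1, counts = fill_bucket(data, {"topic1": 2, "topic2": 3})
print(bucket1)
print(counts)
```

Counting a row once per matching topic avoids the special case for rows that satisfy both conditions, and passing the quotas as a dict means a third topic would not need another counter variable.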