Monday, 29 June 2020

Delete rows of dataframe according to condition (comparing data in two columns)

I have a dataframe of trade flows (exports) between countries of origin and countries of destination, which are stored in two different columns of the df. I need to clean the data and delete rows where the country of origin matches the country of destination. I used the following code:

dfn = dfn[dfn["Destination"] != dfn["Origin"]]

However, I realized I actually need to keep the rows where "World" is both the destination and the origin (i.e. total world exports towards the world). How can I delete all rows where Destination == Origin, except for the rows where both Destination and Origin are "World"?

I was thinking of iterating through my ~2 million rows and deleting only those that meet a certain condition. I tried something along those lines, but it doesn't really work. Could you please help me?

for index, row in dfn.iterrows():
    # drop rows where origin equals destination, unless both are 'World'
    if row['Destination'] == row['Origin'] and row['Destination'] != 'World':
        dfn.drop(index, inplace=True)

Many thanks in advance
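
One possible approach that avoids the row-by-row loop is to combine two boolean masks, which pandas evaluates over whole columns at once. The following is only a minimal sketch: the sample data and the "Exports" column are made up for illustration, and it assumes the aggregate is spelled exactly "World" in both columns.

```python
import pandas as pd

# Hypothetical sample data; only the "Origin" and "Destination"
# column names come from the question.
dfn = pd.DataFrame({
    "Origin": ["France", "France", "World", "Germany", "World"],
    "Destination": ["France", "Germany", "World", "Germany", "France"],
    "Exports": [10, 20, 300, 40, 50],
})

# True where origin and destination are the same.
same = dfn["Origin"] == dfn["Destination"]
# True where the origin is the "World" aggregate.
world = dfn["Origin"] == "World"

# Keep rows whose endpoints differ, plus the World-to-World total.
dfn = dfn[~same | world]
```

With this sample, the France-to-France and Germany-to-Germany rows are removed, while the World-to-World row is kept. On a frame of about 2 million rows, a mask like this should generally be much faster than calling drop inside iterrows, since no Python-level loop is involved.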
