Tuesday, March 17, 2020

How to apply conditions on pandas dataframe columns?

I have a pandas DataFrame that holds information on rejections. Some background on the problem: a sender may send the same email multiple times, but it is only resolved once. I still want to mark the emails that have the same sender and the same message as 'Resolved' in a new column.

The starting dataframe looks like this:

data = [['Sent from automated email', 'jim@yahoo.com', 'Resolved','2020-01-13 07:06:34'], 
        ['Sent from automated email', 'jim@yahoo.com', 'Rejected','2020-01-13 07:06:39'], 
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:39'], 
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Rejected', '2020-02-14 09:06:41'],
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:59'],
        ['Take one newspaper','notneeded@gmail.com', 'Resolved', '2020-02-17 09:05:39'],
        ['Hey hows it going','jamie@gmail.com', 'Rejected', '2020-03-12 09:03:42'],
        ] 

# Create the pandas DataFrame
import pandas as pd
df = pd.DataFrame(data, columns=['Message', 'Email', 'Resolution', 'Time Sent'])

I want to take all the emails that have the same sender and the same message but different resolutions, and label them as 'Resolved' if any of the prior emails were resolved. My desired output would be:

data = [['Sent from automated email', 'jim@yahoo.com', 'Resolved','2020-01-13 07:06:34','Resolved' ], 
        ['Sent from automated email', 'jim@yahoo.com', 'Rejected','2020-01-13 07:06:39','Resolved'], 
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:39','Resolved'], 
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Rejected', '2020-02-14 09:06:41','Resolved'],
        ['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:59','Resolved'],
        ] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Message', 'Email','Resolution','Time Sent','Real Resolution']) 

I have tried writing a function like below:

def a(df):
    # Rows whose Message and Email both repeat an earlier row and whose Resolution is Rejected or Resolved
    mask = df['Message'].duplicated() & df['Resolution'].isin(['Rejected', 'Resolved']) & df['Email'].duplicated()
    return df[mask]

I do not think this is correct, since it does not single out only the duplicated messages that were Resolved and then Rejected. Any tips? Thanks!
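
One possible approach, offered as a minimal sketch rather than a definitive answer: group the rows by sender and message, then broadcast a per-group flag back to every row with `groupby(...).transform(...)`. The sketch assumes that "resolved" means at least one row in the same (Message, Email) group has Resolution 'Resolved', and that groups with only a single kind of resolution should be dropped, as in the desired output above. The variable names `keys`, `is_resolved`, `group_has_resolved`, `mixed`, and `result` are illustrative and not part of the original question.

```python
import pandas as pd

data = [
    ['Sent from automated email', 'jim@yahoo.com', 'Resolved', '2020-01-13 07:06:34'],
    ['Sent from automated email', 'jim@yahoo.com', 'Rejected', '2020-01-13 07:06:39'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Resolved', '2020-02-14 09:06:39'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Rejected', '2020-02-14 09:06:41'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Resolved', '2020-02-14 09:06:59'],
    ['Take one newspaper', 'notneeded@gmail.com', 'Resolved', '2020-02-17 09:05:39'],
    ['Hey hows it going', 'jamie@gmail.com', 'Rejected', '2020-03-12 09:03:42'],
]
df = pd.DataFrame(data, columns=['Message', 'Email', 'Resolution', 'Time Sent'])

# Columns that identify "the same email from the same sender"
keys = ['Message', 'Email']

# Boolean flag per row: is this particular row marked Resolved?
is_resolved = df['Resolution'].eq('Resolved')

# Broadcast to every row: does the row's (Message, Email) group
# contain at least one Resolved row? (assumption: any row in the
# group counts, not only strictly earlier ones)
group_has_resolved = is_resolved.groupby([df['Message'], df['Email']]).transform('any')

# Keep only groups that contain more than one distinct resolution
mixed = df.groupby(keys)['Resolution'].transform('nunique').gt(1)

result = df[mixed].copy()
result['Real Resolution'] = group_has_resolved[mixed].map(
    {True: 'Resolved', False: 'Rejected'}
)

print(result)
```

If "prior" has to be taken literally (only emails sent earlier count), the same idea could be adapted by sorting on 'Time Sent' and using a cumulative per-group flag instead of the group-wide `'any'`; that variant is not shown here.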
