jeudi 30 mai 2019

Conditional If Statement applied to dataframe

I am trying to iterate in a pythonic way (i.e. without a loop) through a dataframe in order to create a new columns based on whether the condition was met. In particular, given a dataframe of daily returns, I would like to create a new column that tells me whether either an upper limit or lower limit was crossed (limit is symmetric, but stock specific, so each row might have a different limit, called std in the df below) , something like this:

import pandas as pd
dict = [
        {'ticker':'jpm','date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{ 'ticker':'ge','date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'fb', 'date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'aapl', 'date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'msft','date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'amzn','date': '2016-11-28','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'jpm','date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'ge', 'date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'fb','date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'aapl','date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'msft','date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'},
{'ticker':'amzn','date': '2016-11-29','returns': '0.2','returns2': '0.3','std': '0.1'}
]
df = pd.DataFrame(dict)
df['date']      = pd.to_datetime(df1['date'])
df=df.set_index(['date','ticker'], drop=True)  

That should be transformed, such that I obtain a new column which contains the respective day's return, if the upper/lower threshold was crossed, if it wasn't crossed, it should just contain the last day's return (so returns2).

dict2 = [
        {'ticker':'jpm','date': '2016-11-28','returns': '0.2','returns2': '-0.3','std': '0.1','sl': '0.2'},
{ 'ticker':'ge','date': '2016-11-28','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'},
{'ticker':'fb', 'date': '2016-11-28','returns': '0.05','returns2': '-0.3','std': '0.1','sl': '-0.3'},
{'ticker':'aapl', 'date': '2016-11-28','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'},
{'ticker':'msft','date': '2016-11-28','returns': '0.2','returns2': '-0.3','std': '0.1','sl': '0.2'},
{'ticker':'amzn','date': '2016-11-28','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'},
{'ticker':'jpm','date': '2016-11-29','returns': '0.2','returns2': '-0.3','std': '0.1','sl': '0.2'},
{'ticker':'ge', 'date': '2016-11-29','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'},
{'ticker':'fb','date': '2016-11-29','returns': '0.2','returns2': '-0.3','std': '0.1','sl': '0.2'},
{'ticker':'aapl','date': '2016-11-29','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'},
{'ticker':'msft','date': '2016-11-29','returns': '0.2','returns2': '-0.3','std': '0.1','sl': '0.2'},
{'ticker':'amzn','date': '2016-11-29','returns': '-0.2','returns2': '0.3','std': '0.1','sl': '-0.2'}
]
df2 = pd.DataFrame(dict2)
df2['date']      = pd.to_datetime(df2['date'])
df2=df2.set_index(['date','ticker'], drop=False)   

I am trying to keep this flexible (so it works for more than just 2 columns with returns) and efficient (so that it works on very large dfs.

Can anyone suggest an approach?

Aucun commentaire:

Enregistrer un commentaire