samedi 3 juillet 2021

Adding a value to a column IF the company ID matches between two dataframes AND the year is lower or equal to the year for the same row

I have two datasets: Dataset_1 has about 140 variables, one of which is "company id", "yr" and a column for "lower_year" This dataset has multiple rows for each year for each company id.

Dataset_2 has ids (some of which match the "company_id") named "org_number", and "year" with only a single row and year for each id.

I want to add a 1 in the column called "lower_year" for all the rows of Dataset_1 where the company ID matches one of the IDs in Dataset_2 and the "yr" value in Dataset_1 is equal to or lower than the "year" value in Dataset_2 at the specific row of the ID.

Down the road I would like to delete the matching IDs that are not of a lower than or equal year (but not values where the ids do not match) but that should be simpler when this first step is done.

I have made some attempts and tried combining if and and functions but have had no luck and am not entirely sure I am going in the right direction.

# assign data of lists.
Dataset_1 = {'company_id': ['111', '111', '223', '444'], 'yr': [2012, 2014, 2020, 1843], 'lower_year': [0, 0, 0, 0]}
Dataset_2 = {'org_number': ['111', '444'], 'year': [2015, 2020]}
Dataset_1 = pd.DataFrame(Dataset_1)
Dataset_2 = pd.DataFrame(Dataset_2)
print(Dataset_1)
print(Dataset_2)
data.insert(1, 'investment', '0')


#one attempt that doesn't work
if Dataset_2['org_number'] == Dataset_1['company_id'] and Dataset_1['yr'] <= Dataset_2['year']:
    Dataset_2['lower_year'] = Dataset_1['lower_year'] = 1
else:
    Dataset_1['lower_year'] = Dataset_1['lower_year'] = 0


#alternatively: This runs but I think it is not filtering for the year needing to be the same in the same row as the company ID of Dataset_2
Dataset_1['lower_year'] = np.where((Dataset_1['company_id'].isin(Dataset_2['org_number'])) & (Dataset_1['yr'].isin((Dataset_2)['year'])), '1', '2')

Here is a reproductable sample and my two various attempts that didn't work. My issue seems to be that I need it to be from the same row as the company ID in Dataset_2.

Aucun commentaire:

Enregistrer un commentaire