I have two dataframes:
Page1:
name dob
John 07-20200
Lilly 05-1999
James 02-2002
Page2:
name dob
chris 8-1997
robert 4-1989
barb 07-20022
They are stored in an OrderedDict:
OrderedDict([('page1', name dob
0 John 07-20200
1 Lilly 05-1999
2 James 02-2002), ('page2', name dob
0 Chris 07-2020
1 Robert 05-1999
2 barb 02-20022)])
I need the dates in a particular format, so I have a regular expression to pick them out:
date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'
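To make the intended behaviour concrete, here is a minimal sketch that checks the pattern against a few of the sample values using only the standard `re` module. The `samples` list and the `results` dict are names made up for this illustration, and `re.fullmatch` is used on the assumption that the whole value should be a date and nothing else.

```python
import re

date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'

# A few dob values taken from the tables above (hypothetical test list).
samples = ['07-20200', '05-1999', '02-2002', '8-1997', '02-20022']

# fullmatch succeeds only if the entire string fits the pattern.
results = {s: re.fullmatch(date_pattern, s) is not None for s in samples}

for value, ok in results.items():
    print(value, 'valid' if ok else 'invalid')
```

The two values with a five-digit year are rejected because the trailing `(?!\d)` forbids another digit after the four-digit year.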
I want to test this date pattern against all values in the dob column in both DataFrames. If any values aren't in this format, I want to print a statement showing which rows in each DataFrame don't follow it. If they all follow the format, the program should just continue with whatever it does next.
I got to this point:
for dfname, df in employbd.items():
    dd = df['dob'].str.extract(date_pattern)
    print(dd)
But all it does is show me where the pattern matches, with NaN for the values that don't follow it.
Any ideas?
If they all follow the format I don't want to print anything, but if they don't I want to print something like:
invalid format: page 1: index 0: dob: 02-20200
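One possible way to get that kind of report is sketched below. It is not a definitive solution: it assumes pandas 1.1 or later (for `Series.str.fullmatch`), rebuilds `employbd` from the OrderedDict printed above, and uses a hypothetical helper name, `find_invalid_dobs`. It builds a boolean mask per DataFrame and lists only the rows where the mask is False.

```python
from collections import OrderedDict

import pandas as pd

date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'

# Sample data copied from the OrderedDict shown above.
employbd = OrderedDict([
    ('page1', pd.DataFrame({'name': ['John', 'Lilly', 'James'],
                            'dob': ['07-20200', '05-1999', '02-2002']})),
    ('page2', pd.DataFrame({'name': ['Chris', 'Robert', 'barb'],
                            'dob': ['07-2020', '05-1999', '02-20022']})),
])


def find_invalid_dobs(frames, pattern):
    """Return (dfname, index, dob) for every dob that does not match pattern."""
    invalid = []
    for dfname, df in frames.items():
        # True where the whole value matches; missing values count as invalid.
        ok = df['dob'].astype(str).str.fullmatch(pattern, na=False)
        for idx, dob in df.loc[~ok, 'dob'].items():
            invalid.append((dfname, idx, dob))
    return invalid


invalid_rows = find_invalid_dobs(employbd, date_pattern)

# Nothing is printed when every dob is valid, so the program can carry on.
for dfname, idx, dob in invalid_rows:
    print(f'invalid format: {dfname}: index {idx}: dob: {dob}')
```

If `invalid_rows` comes back empty, every dob passed the check. Replacing `str.fullmatch` with `str.contains` would instead accept values that merely contain a valid date somewhere inside them.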