Sunday, May 3, 2020

How to show where a column value doesn't match a format in each DataFrame?

I have two dataframes:

Page1:
name  dob
John   07-20200
Lilly  05-1999
James  02-2002

Page2:
name    dob
chris   8-1997
robert  4-1989
barb    07-20022

They are stored in an OrderedDict:

OrderedDict([('page1',     name       dob
          0   John  07-20200
          1  Lilly   05-1999
          2  James   02-2002), ('page2',      name       dob
          0   Chris   07-2020
          1  Robert   05-1999
          2    barb  02-20022)])

I need the dates in a particular format, so I have a regular expression to filter them:

date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'
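To make it concrete what this pattern accepts, here is a small sketch that runs it against a few plain strings with Python's `re` module. Three of the samples come from the data above; `13-1999` is a made-up value added only to show the month check.

```python
import re

date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'

# "13-1999" is a hypothetical sample (not in the data above) with an invalid month.
samples = ["05-1999", "8-1997", "07-20200", "13-1999"]

# re.search returns a match object when the pattern is found, otherwise None.
results = {s: re.search(date_pattern, s) is not None for s in samples}
print(results)
```

The trailing `(?!\d)` is what rejects `07-20200`: the first seven characters look like a date, but they are followed by another digit.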

I want to test this date pattern against every value in the dob column of both DataFrames. If any value isn't in this format, I want to print a statement showing which row in which DataFrame doesn't follow it. If they all follow the format, the program should continue with whatever comes next.

I got to this point:

for dfname, df in employbd.items():
    # extract() returns the captured date, or NaN where the pattern is not found
    dd = df['dob'].str.extract(date_pattern)
    print(dd)

but all it does is show me the values that match, with NaN for the ones that don't follow the pattern.

Any ideas?

If they all follow the format I don't want to print anything, but if they don't I want to print something like:

invalid format: page 1: index 0: dob: 02-20200
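One possible way to get output of this shape is sketched below. It is not a definitive solution. It assumes the dob values are strings and that a value counts as valid only when the whole string matches the pattern. The helper name `find_invalid_dobs` is hypothetical, and the sample frames are rebuilt from the OrderedDict shown earlier.

```python
import re
from collections import OrderedDict

import pandas as pd

date_pattern = r'(?<!\d)((?:0?[1-9]|1[0-2])-(?:19|20)\d{2})(?!\d)'


def find_invalid_dobs(frames, pattern=date_pattern, column="dob"):
    """Return one message per row whose value does not fully match the pattern.

    Hypothetical helper: `frames` maps a name to a DataFrame, like the
    OrderedDict in the question.
    """
    compiled = re.compile(pattern)
    messages = []
    for dfname, df in frames.items():
        # Cast to str so missing values become "nan" and are reported as invalid.
        values = df[column].astype(str)
        is_valid = values.map(lambda v: compiled.fullmatch(v) is not None).astype(bool)
        for idx, value in values[~is_valid].items():
            messages.append(f"invalid format: {dfname}: index {idx}: {column}: {value}")
    return messages


# Sample data rebuilt from the OrderedDict in the question.
employbd = OrderedDict([
    ("page1", pd.DataFrame({"name": ["John", "Lilly", "James"],
                            "dob": ["07-20200", "05-1999", "02-2002"]})),
    ("page2", pd.DataFrame({"name": ["Chris", "Robert", "barb"],
                            "dob": ["07-2020", "05-1999", "02-20022"]})),
])

problems = find_invalid_dobs(employbd)
for message in problems:
    print(message)
# An empty list means every value matched, so the program can simply continue.
```

Using `re.fullmatch` through `Series.map` avoids depending on a particular pandas version. If dates embedded in longer text should also count as valid, `compiled.search` could be swapped in instead.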
