Tuesday, April 4, 2017

Conditional Regex Function on Pandas Dataframe

I have the following function (see below). I might be overcomplicating this, and a fresh pair of eyes would be deeply appreciated.

The goal is to:

  1. Iterate over df['Plan Unique ID'] and search each value for a we or uk ID (we_match or uk_match).

  2. If there is a match, check that the matched ID is greater than the threshold for its group (we12720203 or uk11350200).

  3. If it is greater, add that we or uk value to a new column, df['Consolidated ID'].

  4. If it is lower, or there is no match, search df['Atlas Placement ID'] with new_id_search.

  5. If there is a match, add it to df['Consolidated ID'].

  6. If not, return 0 to df['Consolidated ID'].

The current problem is that it returns an empty column.

    def placement_extract(df="mediaplan_df", we_search="we\d{8}", uk_search="uk\d{8}", new_id_search= "(\d{14})"):

        if type(df['Plan Unique ID']) is str:
            we_match = re.search(we_search, df['Plan Unique ID'])
            if we_match:
                if we_match > "we12720203":
                    return we_match.group(0)
                else:
                    uk_match =  re.search(uk_search, df['Plan Unique ID'])
                    if uk_match:
                        if uk_match > "uk11350200":
                            return uk_match.group(0)
                        else:
                            match_new =  re.search(new_id_search, df['Atlas Placement ID'])
                            if match_new:
                                return match_new.group(0)

                            return 0


    mediaplan_df['Consolidated ID'] = mediaplan_df.apply(placement_extract, axis=1)
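
For comparison, here is a minimal sketch of how the six steps could be flattened, under a few assumptions. In the function above, most branches fall off the end without reaching a return statement, so apply gets None for those rows, and the match object itself (not its matched text) is compared to the threshold string; both seem likely to contribute to the empty column. The sketch compares match.group(0) as a string, which orders the same way as the numbers only because the IDs have a fixed width of eight digits. The sample frame, the WE_MIN and UK_MIN names, and the str() conversion of the Atlas column are illustrative assumptions, not part of the original data.

```python
import re

import pandas as pd

# Thresholds from the question. String comparison is safe here only because
# every ID is a two-letter prefix followed by exactly eight digits.
WE_MIN = "we12720203"
UK_MIN = "uk11350200"


def placement_extract(row, we_search=r"we\d{8}", uk_search=r"uk\d{8}",
                      new_id_search=r"\d{14}"):
    """Return the consolidated ID for one row, or 0 when nothing qualifies."""
    plan_id = row["Plan Unique ID"]
    if isinstance(plan_id, str):
        # Steps 1-3: try the "we" pattern, then the "uk" pattern.
        for pattern, minimum in ((we_search, WE_MIN), (uk_search, UK_MIN)):
            match = re.search(pattern, plan_id)
            # Compare the matched text, not the match object itself.
            if match and match.group(0) > minimum:
                return match.group(0)
    # Steps 4-5: fall back to a 14-digit ID in the Atlas column.
    # str() is an assumption in case that column holds non-string values.
    match_new = re.search(new_id_search, str(row["Atlas Placement ID"]))
    if match_new:
        return match_new.group(0)
    # Step 6: every path that found nothing ends here.
    return 0


# Hypothetical sample data, one row per branch of the logic.
mediaplan_df = pd.DataFrame({
    "Plan Unique ID": ["abc_we12720204_x", "we12720203", "uk11350201",
                       None, "uk00000001"],
    "Atlas Placement ID": ["", "id 12345678901234", "",
                           "98765432109876", "none"],
})
mediaplan_df["Consolidated ID"] = mediaplan_df.apply(placement_extract, axis=1)
print(mediaplan_df["Consolidated ID"].tolist())
```

Because the single return 0 sits at the end of the function rather than inside the deepest else, every row gets a value, and the fallback search runs whenever neither prefix qualifies.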
