Monday, 8 March 2021

Creating a Pandas dataframe column which is conditional on a function

Say I have a dataframe like the one below, and I create a new column (track_len) that gives the length of each string in the track_no column.

import pandas as pd
df = pd.DataFrame({'item_id': [1,2,3], 'track_no': ['qwerty23', 'poiu2', 'poiuyt5']})

df['track_len'] = df['track_no'].str.len()
df.head()

My question is:

How do I now create a new column (new_col) that holds a specific slice of the track_no string, where the slice depends on the length of the track number (track_len)?

I tried writing a function that returns the appropriate slice of track_no for each track_len condition, and then using the apply method to create the column, but it doesn't work. The code is below:

Tried:


def f(row):
    if row['track_len'] == 8:
        val = row['track_no'].str[0:3]
    elif row['track_len'] == 5:
        val = row['track_no'].str[0:1]
    elif row['track_len'] =7:
        val = row['track_no'].str[0:2]
    return val

df['new_col'] = df.apply(f, axis=1)
df.head()

The desired output, based on the string slicing in f, is:

Output

{'new_col': ['qwe', 'p', 'po']}
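For reference, two things in the attempt above look likely to cause the failure: with `axis=1`, `row['track_no']` is a plain Python string, which has no `.str` accessor, and `=7` is an assignment rather than a comparison. A minimal sketch of a corrected version follows. Returning `None` for any other length is an assumption, since the question does not say what should happen in that case.

```python
import pandas as pd

df = pd.DataFrame({'item_id': [1, 2, 3],
                   'track_no': ['qwerty23', 'poiu2', 'poiuyt5']})
df['track_len'] = df['track_no'].str.len()

def f(row):
    # row['track_no'] is an ordinary str here, so slice it directly (no .str)
    if row['track_len'] == 8:
        return row['track_no'][0:3]
    elif row['track_len'] == 5:
        return row['track_no'][0:1]
    elif row['track_len'] == 7:  # comparison uses ==, not =
        return row['track_no'][0:2]
    return None  # assumption: no slice is defined for other lengths

df['new_col'] = df.apply(f, axis=1)
print(df['new_col'].tolist())
```

The `.str` accessor belongs to a pandas Series, so it applies to a whole column such as `df['track_no']`, not to the single value that `apply` passes in for each row.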

If there are better alternative solutions to this problem, those would also be appreciated.
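One possible alternative, sketched under assumptions: keep the length-to-slice rules in a dictionary and build the column with a list comprehension, which avoids a row-wise `apply`. The name `prefix_len` is hypothetical, and using `None` for lengths without a rule is an assumption rather than something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({'item_id': [1, 2, 3],
                   'track_no': ['qwerty23', 'poiu2', 'poiuyt5']})
df['track_len'] = df['track_no'].str.len()

# Hypothetical lookup table: track length -> number of leading characters to keep
prefix_len = {8: 3, 5: 1, 7: 2}

# Pair each track number with its length and slice according to the table;
# lengths with no entry get None (assumption)
df['new_col'] = [
    s[:prefix_len[n]] if n in prefix_len else None
    for s, n in zip(df['track_no'], df['track_len'])
]
print(df['new_col'].tolist())
```

This is plain Python iteration rather than a vectorized pandas operation, but it keeps the rules in one place, so adding a new length only means adding a dictionary entry.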
