Monday, 4 November 2019

Reduce comma-separated strings in a DataFrame if all elements are identical

This is my code so far:

import pandas as pd
from io import StringIO

data = StringIO("""
"name1","hej","7aa","a"
"name1","du","71al","a"
"name1","aj","74a","a"
"name1","oj","7aj","a"
"name2","fin","7ag","a"
"name2","katt","7a","a"
""")
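# header=0 combined with names: the first data row is consumed as a header
# and replaced by these column names
df = pd.read_csv(data, header=0,
                 names=["name","text2","text","as"])
df[['text2','text','as']] = df.groupby(['name']).transform(
    lambda x: ','.join(x))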
df = df[['name','text','text2','as']].drop_duplicates()
df

This gets me most of the way:

df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj  a,a,a
3  name2        7ag,7a  fin,katt    a,a

I just need one line that checks each of the columns ['text','text2','as'] and, if all comma-separated elements in a cell are identical, returns just the first one.

So the result I'm after is:

df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj    a
3  name2        7ag,7a  fin,katt    a

I've tried apply with split(','), but I can't get it to work.
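
One possible approach, as a minimal sketch rather than a definitive answer: split each cell on ',' and keep only the first element when the set of elements has size one. The helper name collapse_if_identical is hypothetical, and df is rebuilt by hand here to match the intermediate output shown above so that the snippet runs on its own.

```python
import pandas as pd

def collapse_if_identical(cell):
    """Return the first element if all comma-separated elements are equal,
    otherwise return the cell unchanged. (Hypothetical helper name.)"""
    parts = cell.split(',')
    return parts[0] if len(set(parts)) == 1 else cell

# Hand-built copy of the intermediate frame from the question
df = pd.DataFrame(
    {
        'name': ['name1', 'name2'],
        'text': ['71al,74a,7aj', '7ag,7a'],
        'text2': ['du,aj,oj', 'fin,katt'],
        'as': ['a,a,a', 'a,a'],
    },
    index=[0, 3],
)

cols = ['text', 'text2', 'as']
# The "one line": map the helper over every cell of the chosen columns
df[cols] = df[cols].apply(lambda s: s.map(collapse_if_identical))
print(df)
```

Series.map is applied column by column here instead of DataFrame.applymap, since applymap is deprecated in recent pandas releases. The check assumes every cell is a string; missing values would need separate handling.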
