This is my code so far:
import pandas as pd
from io import StringIO
data = StringIO("""
"name1","hej","7aa","a"
"name1","du","71al","a"
"name1","aj","74a","a"
"name1","oj","7aj","a"
"name2","fin","7ag","a"
"name2","katt","7a","a"
""")
df = pd.read_csv(data, header=0, names=["name","text2","text","as"])
df[['text2','text','as']] = df.groupby(['name']).transform(lambda x: ','.join(x))
df = df[['name','text','text2','as']].drop_duplicates()
df
This gets me most of the way:
df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj  a,a,a
3  name2        7ag,7a  fin,katt    a,a
I just need one line that checks each of the columns ['text','text2','as'] and, if all comma-separated elements in a cell are identical, returns just the first one.
So the result I'm after is:
df
    name          text     text2  as
0  name1  71al,74a,7aj  du,aj,oj   a
3  name2        7ag,7a  fin,katt   a
I've tried apply with split(','), but can't get it to work.
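One possible approach, offered as a sketch rather than a definitive answer: split each cell on commas and keep a single element only when all of the parts are equal. The helper name `collapse_if_identical` is hypothetical, and the frame below is rebuilt by hand from the intermediate output shown above so the snippet runs on its own.

```python
import pandas as pd

def collapse_if_identical(cell):
    # Hypothetical helper: split the cell on commas and, if every
    # part is the same, return just the first one; otherwise leave
    # the cell unchanged.
    parts = cell.split(',')
    return parts[0] if len(set(parts)) == 1 else cell

# Intermediate frame, rebuilt by hand from the output above.
df = pd.DataFrame(
    {
        'name': ['name1', 'name2'],
        'text': ['71al,74a,7aj', '7ag,7a'],
        'text2': ['du,aj,oj', 'fin,katt'],
        'as': ['a,a,a', 'a,a'],
    },
    index=[0, 3],
)

cols = ['text', 'text2', 'as']
# Apply the helper to every cell of the selected columns.
df[cols] = df[cols].apply(lambda s: s.map(collapse_if_identical))
print(df)
```

This assumes every cell in those columns is a string and that no individual value contains a comma itself. Collapsing before the join, for example by checking `x.nunique() == 1` inside the groupby, would avoid the split entirely.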