I want to categorize data by city based on the string in a DataFrame column. First, I tried writing an if-else statement, but the code became very long. So I plan to drive the if-else statement from an array instead: the function checks whether a value in the DataFrame matches any entry in the array, and then categorizes the data.
Sample data:
**full**
london
menchester
mench
lndon
lndn
chester
scotland
scot
menches
manchaster
My code is:
import pandas as pd
data = pd.read_excel(r'/c:/Documents/data.xlsx')
def func(a):
    london = ['london','lo','ldn','lnn','lndon','lon','ld','ndn']
    menchester = ['hester','ester','mencstr']
    if str(london) in a.lower():
        return "london"
    elif str(menchester) in a.lower():
        return "menchester"
    else:
        return "others"
data["city"] = data["full"].apply(lambda x: func(x))
This code is definitely wrong, but I'm not sure how to change it so that I don't have to write a long if-else statement; instead, the if-else should compare the data against an array or dictionary. Note: the data in the code is just an example; the real data is large.
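For reference, here is a minimal sketch of one possible dictionary-driven approach. It is not a definitive answer. The attempt above fails because `str(london)` turns the whole list into a single string such as `"['london', 'lo', ...]"`, which is never a substring of a cell value. The sketch tests each keyword individually with `any()`. The names `CITY_KEYWORDS` and `categorize` are hypothetical, the keyword lists are copied from the code above, and a small inline DataFrame stands in for the Excel file.

```python
import pandas as pd

# Hypothetical lookup table: city label -> substrings that identify it.
# The keyword lists are copied from the question; extend them as needed.
CITY_KEYWORDS = {
    "london": ["london", "lo", "ldn", "lnn", "lndon", "lon", "ld", "ndn"],
    "menchester": ["hester", "ester", "mencstr"],
}


def categorize(value, keywords=CITY_KEYWORDS):
    """Return the first city with a keyword contained in value, else 'others'."""
    text = str(value).lower()  # str() also copes with non-string cells
    for city, substrings in keywords.items():
        # Check each keyword on its own instead of the list as a whole.
        if any(s in text for s in substrings):
            return city
    return "others"


# Stand-in for the Excel file, using the sample data from the question.
data = pd.DataFrame({"full": ["london", "menchester", "mench", "lndon", "lndn",
                              "chester", "scotland", "scot", "menches",
                              "manchaster"]})
data["city"] = data["full"].apply(categorize)
print(data)
```

Adding a city then only requires a new dictionary entry, not another `elif` branch. Cities are checked in dictionary order, so the first match wins. Very short keywords such as `"lo"` or `"ld"` can match unrelated strings, so longer keywords are likely safer on real data. With the keyword lists as given, values like `mench` and `manchaster` still fall through to `"others"`.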