I want to categorize data based on certain keywords that appear in a column. The logic should be:
- Check whether any of the keywords in the data dictionary appears in the DataFrame column.
- If one does, create a new column holding the matching value from the data dictionary.
- If none does, put "others" in the new column.
Problem: so far the code works, but it only categorizes a value when it is exactly equal to one of the keywords in the data dictionary. For example:
- if the value is "scotland", it is categorized as "scotland";
- if the value is "I love scotland", it should also be categorized as "scotland", but the current program categorizes it as "others".
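Equality and containment are different tests: `"scotland" == "i love scotland"` is False, while `"scotland" in "i love scotland"` is True. A minimal sketch of the rules above using containment, assuming plain substring matching is acceptable. `CITY_KEYWORDS` and `categorize` are hypothetical names, and placing "cheshire" as the last entry of the mapping is an assumption that keeps it checked after the other keywords.

```python
import pandas as pd

# Hypothetical keyword -> category mapping. Keys are lowercase so they can be
# compared against lowercased text; dict order decides which keyword wins.
CITY_KEYWORDS = {
    "scotland": "scotland",
    "scot": "scotland",
    "sctland": "scotland",
    "cambrgeshire": "cambridgeshire",
    "cambridgeshire": "cambridgeshire",
    "idgeshire": "cambridgeshire",
    "cheshire": "cheshire",
}


def categorize(text: str) -> str:
    """Return the category of the first keyword contained in text, else 'others'."""
    lowered = text.lower()
    for keyword, category in CITY_KEYWORDS.items():
        # Substring test: 'scot' in 'scot54' is True, unlike an equality test.
        if keyword in lowered:
            return category
    return "others"


df = pd.DataFrame({"country": [
    "cheshire", "scotland", "scot", "scot54", "sctland is my country",
    "here is Cambrgeshire", "Cambridgeshire",
    "County of Cambridgeshire Tourism Website", "berkshire",
]})
df["city"] = df["country"].apply(categorize)
print(df["city"])
```

Plain substring matching also fires inside unrelated words (for example "scot" in "mascot"). Wrapping keywords in `\b` word boundaries would prevent that, but it would also stop "scot" from matching "scot54", which the expected output relies on.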
Code:
```python
import pandas as pd

data = {'country': ['cheshire', 'scotland', 'scot', 'scot54', 'sctland is my country', 'here is Cambrgeshire', 'Cambridgeshire', 'County of Cambridgeshire Tourism Website', 'berkshire']}

# Create DataFrame
df = pd.DataFrame(data)
print(df)

def func(a):
    # Map every keyword to its category
    scotland_dict = {k: "scotland" for k in ['scotland', 'scot', 'sctland']}
    # Note: 'Cambrgeshire' has a capital C, so it can never equal a.lower()
    cambridgeshire_dict = {k: "cambridgeshire" for k in ['Cambrgeshire', 'cambridgeshire', 'idgeshire']}
    city_dict = {**scotland_dict, **cambridgeshire_dict}
    # Exact match: the whole lowercased value must equal one of the keys
    if a.lower() in city_dict.keys():
        return city_dict[a.lower()]
    # Substring match, but only for "cheshire"
    elif "cheshire" in a.lower():
        return "cheshire"
    else:
        return "others"

df["city"] = df["country"].apply(lambda x: func(x))
print(df["city"])
```
Current output:
```
0 cheshire
1 scotland
2 scotland
3 others
4 others
5 others
6 cambridgeshire
7 others
8 others
```
Expected output:
```
0 cheshire
1 scotland
2 scotland
3 scotland
4 scotland
5 cambridgeshire
6 cambridgeshire
7 cambridgeshire
8 others
```
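The expected column can also be produced without a per-row Python function, by letting pandas search every string for one combined regular expression. A minimal sketch, assuming the keyword lists from the question (lowercased) plus "cheshire" as its own category; `CATEGORIES`, `keyword_to_city` and `pattern` are hypothetical names.

```python
import re

import pandas as pd

# Hypothetical category -> keywords table (all keywords lowercase).
CATEGORIES = {
    "scotland": ["scotland", "scot", "sctland"],
    "cambridgeshire": ["cambrgeshire", "cambridgeshire", "idgeshire"],
    "cheshire": ["cheshire"],
}
# Invert it into keyword -> category for the lookup after extraction.
keyword_to_city = {kw: city for city, kws in CATEGORIES.items() for kw in kws}

# One capturing alternation of the escaped keywords, longest first, so a longer
# keyword wins over a shorter one that starts at the same position.
pattern = "(" + "|".join(
    re.escape(kw) for kw in sorted(keyword_to_city, key=len, reverse=True)
) + ")"

df = pd.DataFrame({"country": [
    "cheshire", "scotland", "scot", "scot54", "sctland is my country",
    "here is Cambrgeshire", "Cambridgeshire",
    "County of Cambridgeshire Tourism Website", "berkshire",
]})

# str.extract returns the leftmost keyword found in each lowercased string
# (NaN when none is found); map converts it to its category, fillna adds 'others'.
matched = df["country"].str.lower().str.extract(pattern, expand=False)
df["city"] = matched.map(keyword_to_city).fillna("others")
print(df["city"])
```

Unlike an `if`/`elif` chain, this picks whichever keyword occurs earliest in the string, so a value containing keywords from two categories is decided by position rather than by the order of the checks.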
Update: here is what I've tried:
```python
if city_dict.keys() in a.lower():
    return city_dict[a.lower()]
elif "cheshire" in a.lower():
    ...
```
Error:
```
Exception has occurred: TypeError
'in <string>' requires string as left operand, not dict_keys
File "/home/abyres/testa.py", line 22, in func
if city_dict.keys() in a.lower():
...
```
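The `TypeError` comes from the fact that `x in some_string` only accepts a string `x`: `city_dict.keys()` is a `dict_keys` view, so Python cannot search for it inside the text. Each key has to be tested on its own, for example with `any()`, or with `next()` over a generator, which also yields the key that matched. A minimal sketch using a trimmed-down, hypothetical `city_dict`:

```python
# Trimmed-down, hypothetical version of the question's city_dict.
city_dict = {"scotland": "scotland", "scot": "scotland", "sctland": "scotland"}
text = "I love scotland".lower()

# Reproduces the error: the left operand of `in` on a string must be a string.
try:
    city_dict.keys() in text
except TypeError as exc:
    print("TypeError:", exc)

# Test every key separately instead.
print(any(key in text for key in city_dict))  # True

# next() yields the first key contained in text, or None if there is none.
match = next((key for key in city_dict if key in text), None)
print(city_dict[match] if match is not None else "others")  # scotland
```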