To turn a text file into a dataframe of features, I am writing a custom function. I want the function to find question marks and exclamation marks in the text input and turn each result into a value in a dataframe column. The relevant part of the function looks like this:
    discount = ['[%]','[€]','[$]','[£]','korting','deal','discount','reduct','remise','voucher',
                'descuento', 'rebaja', 'скидка', 'sconto','rabat','alennus','kedvezmény',
                '할인','折扣','ディスカウント','diskon']
    data = [text_input.split()]
    for word in data:
        if any(char in discount for char in word):
            df['discount'] = 1
        else:
            df['discount'] = 0
    for word in data:
        if any(char == '!' for char in word):
            df['exclamation'] = 1
        else:
            df['exclamation'] = 0
    for word in data:
        if any(char == '?' for char in word):
            df['question'] = 1
        else:
            df['question'] = 0
The problem is that if the text input contains, for example, 'discount!', neither the '!' nor the word 'discount' is recognized, resulting in a 0 in both of those columns. If I remove the '!' from 'discount', it recognizes them both.
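A minimal reproduction of this behaviour, using a made-up sample string (the variable names `tokens`, `exact_match`, and `substring_match` are only for illustration). Since `split()` separates on whitespace only, the token stays `'discount!'`, which equals neither `'discount'` nor `'!'`:

```python
# Hypothetical sample input, chosen only to reproduce the problem.
text_input = "Big discount! today"

# split() breaks on whitespace only, so punctuation stays attached.
tokens = text_input.split()
print(tokens)  # ['Big', 'discount!', 'today']

# Comparing whole tokens: no token is exactly '!' or exactly 'discount'.
exact_match = any(tok == '!' for tok in tokens)
word_match = 'discount' in tokens
print(exact_match, word_match)  # False False

# Checking inside each token finds the character.
substring_match = any('!' in tok for tok in tokens)
print(substring_match)  # True
```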
Therefore I am wondering how I should split text_input so that the '!' is stripped from the words. Or is there a more efficient way to find these characters?
Thanks in advance!
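One possible direction for the question above, as a rough sketch rather than a definitive answer: skip the splitting entirely and search the whole string. This assumes the bracketed entries such as `'[%]'` were meant as regular-expression character classes, so the list can be joined into a single pattern. The helper name `text_flags` and the returned dictionary are hypothetical; the values could be assigned to the dataframe columns afterwards.

```python
import re

discount = ['[%]', '[€]', '[$]', '[£]', 'korting', 'deal', 'discount', 'reduct',
            'remise', 'voucher', 'descuento', 'rebaja', 'скидка', 'sconto',
            'rabat', 'alennus', 'kedvezmény', '할인', '折扣', 'ディスカウント', 'diskon']

# Assumption: every entry is a valid regex fragment, so they can be
# combined into one alternation and searched in a single pass.
DISCOUNT_RE = re.compile('|'.join(discount), flags=re.IGNORECASE)


def text_flags(text_input):
    """Return 0/1 flags for discount terms, '!' and '?' (hypothetical helper)."""
    return {
        'discount': int(bool(DISCOUNT_RE.search(text_input))),
        'exclamation': int('!' in text_input),
        'question': int('?' in text_input),
    }


print(text_flags("Big discount! today"))
# {'discount': 1, 'exclamation': 1, 'question': 0}
```

Note that this matches substrings, so a term like 'deal' would also match inside 'ideal'; whether that is acceptable depends on the data.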