Thursday, June 18, 2020

Check for special characters in text with a Python function

To turn a text file into a DataFrame of features, I am writing a custom function. I want it to find question marks and exclamation marks (as well as discount keywords) in the text input and turn each finding into a value in a DataFrame column. The relevant part of the function looks like this:

# Keywords and currency symbols that indicate a discount
discount = ['[%]','[€]','[$]','[£]','korting','deal','discount','reduct','remise','voucher', 
            'descuento', 'rebaja', 'скидка', 'sconto','rabat','alennus','kedvezmény',
            '할인','折扣','ディスカウント','diskon']
data = [text_input.split()]

# Flag a discount if any token matches a keyword
for word in data:
    if any(char in discount for char in word):
        df['discount'] = 1
    else:
        df['discount'] = 0
# Flag an exclamation mark
for word in data:
    if any(char == '!' for char in word):
        df['exclamation'] = 1
    else:
        df['exclamation'] = 0
# Flag a question mark
for word in data:
    if any(char == '?' for char in word):
        df['question'] = 1
    else:
        df['question'] = 0
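A minimal sketch of a different approach, assuming the texts live in a pandas DataFrame with one text per row (the `text` column and the sample sentences are hypothetical). pandas' vectorised `Series.str.contains` searches inside each string, so punctuation attached to a word does not hide it, and each row gets its own flag instead of the whole column being set to a single scalar. It also assumes the bracketed entries such as '[%]' were meant as regex character classes, which is how they behave once the list is joined into one pattern.

```python
import pandas as pd

# Keyword list from the question; '[%]', '[$]' etc. are regex character
# classes, so joining with '|' gives one pattern that matches any entry.
discount = ['[%]', '[€]', '[$]', '[£]', 'korting', 'deal', 'discount', 'reduct',
            'remise', 'voucher', 'descuento', 'rebaja', 'скидка', 'sconto',
            'rabat', 'alennus', 'kedvezmény', '할인', '折扣', 'ディスカウント', 'diskon']
discount_pattern = '|'.join(discount)

# Hypothetical input: one text per row.
df = pd.DataFrame({'text': ['Huge discount!', 'Is this 20% off?', 'Nothing here.']})

# Case-insensitive regex search inside each string, converted to 0/1.
df['discount'] = df['text'].str.contains(discount_pattern, case=False, regex=True).astype(int)
# regex=False treats '!' and '?' as literal characters.
df['exclamation'] = df['text'].str.contains('!', regex=False).astype(int)
df['question'] = df['text'].str.contains('?', regex=False).astype(int)
print(df)
```

Because this is substring matching, short keywords can produce false positives (for example 'deal' also matches 'ideal'), so the keyword list may need word boundaries depending on the data.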

The problem is that if the text input contains, for example, 'discount!', neither the '!' nor the word 'discount' is recognized, so both columns get a 0. If I remove the '!' from 'discount', both are recognized.
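A small reproduction of this behaviour, using a shortened keyword list and a made-up sentence (both assumptions for illustration). In the loops above, `data` holds a single list, so `word` is the whole list of tokens and `char` is each whole token; the checks therefore compare complete tokens, and 'discount!' equals neither 'discount' nor '!'. Checking for substrings inside each token gives the expected result:

```python
# Simplified reproduction with a shortened keyword list and a made-up sentence.
discount = ['korting', 'deal', 'discount']
tokens = "Huge discount! today".split()
print(tokens)  # ['Huge', 'discount!', 'today']

# Whole-token comparisons, as in the question's loops: both are False.
token_has_keyword = any(tok in discount for tok in tokens)
token_is_bang = any(tok == '!' for tok in tokens)

# Substring comparisons look inside each token: both are True.
substring_has_keyword = any(kw in tok for tok in tokens for kw in discount)
substring_has_bang = any('!' in tok for tok in tokens)

print(token_has_keyword, token_is_bang)            # False False
print(substring_has_keyword, substring_has_bang)   # True True
```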

So I am wondering how I should split text_input so that the '!' is stripped from the words. Or is there a more efficient way to find these characters?
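One way to split so that punctuation becomes its own token is a regular expression instead of `str.split()`. This is a sketch under a few assumptions: the helper names `tokenize` and `text_features` are hypothetical, and the bracketed entries such as '[%]' are rewritten as plain symbols on the assumption that the brackets were regex character classes (a plain membership test would look for the literal text '[%]').

```python
import re

# Assumption: '[%]', '[€]', '[$]', '[£]' were regex classes; written as plain symbols here.
DISCOUNT_KEYWORDS = {
    '%', '€', '$', '£', 'korting', 'deal', 'discount', 'reduct', 'remise',
    'voucher', 'descuento', 'rebaja', 'скидка', 'sconto', 'rabat', 'alennus',
    'kedvezmény', '할인', '折扣', 'ディスカウント', 'diskon',
}

def tokenize(text):
    """Lower-case text and split it into words and single punctuation marks."""
    # \w+ matches a run of (Unicode) word characters; [^\w\s] matches one
    # character that is neither a word character nor whitespace, e.g. '!'.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def text_features(text):
    """Return 0/1 flags for discount keywords, '!' and '?' (hypothetical helper)."""
    tokens = tokenize(text)
    return {
        'discount': int(not DISCOUNT_KEYWORDS.isdisjoint(tokens)),
        'exclamation': int('!' in tokens),
        'question': int('?' in tokens),
    }

tokens = tokenize("Huge DISCOUNT! Want it?")
print(tokens)  # ['huge', 'discount', '!', 'want', 'it', '?']
print(text_features("Huge DISCOUNT! Want it?"))
# {'discount': 1, 'exclamation': 1, 'question': 1}
```

For '!' and '?' alone no tokenizer is needed: `'!' in text_input` checks the raw string directly. Token matching also has limits: a stem such as 'reduct' will not match the token 'reduction', and Chinese or Japanese text without spaces forms one long `\w+` run, so substring search may suit those entries better.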

Thanks in advance!
