Sunday, 25 July 2021

Feature extraction: using if statements for multiple different features [closed]

I am trying to use if statements in several places to append suffix and prefix features for a piece of text. It seems as though Python doesn't pick up the prefixes, only the suffixes. How can I make it clear to Python that these conditional statements are used for different feature extractions?

    # Suffix up to length 5
    if len(token) > 1:
        feature_list.append("SUF_" + token[-1:])
    if len(token) > 2:
        feature_list.append("SUF_" + token[-2:])
    if len(token) > 3:
        feature_list.append("SUF_" + token[-3:])
    if len(token) > 4:
        feature_list.append("SUF_" + token[-4:])
    if len(token) > 5:
        feature_list.append("SUF_" + token[-5:])
        
    # Prefix up to length 5
    if len(token) < 1:  
        feature_list.append("PRE_1" + token[0])
    if len(token) < 2:  
        feature_list.append("PRE_2" + token[:1])
    if len(token) < 3:  
        feature_list.append("PRE_3" + token[:2])
    if len(token) < 4:  
        feature_list.append("PRE_4" + token[:3])
    if len(token) < 5:  
        feature_list.append("PRE_5" + token[:4])

Below is the current output:

```
['SUF_P', 'SUF_BP', 'SUF_VBP', 'SUF_BP', 'SUF_VBP', 'WORD_steve', 'POS_PRPVBP']
['SUF_N', 'SUF_BN', 'SUF_VBN', 'SUF_BP', 'SUF_VBP', 'WORD_mcqueen', 'POS_VBN']
['SUF_N', 'SUF_BN', 'SUF_VBN', 'SUF_BP', 'SUF_VBP', 'WORD_provided', 'POS_VBN']
['SUF_T', 'SUF_DT', 'SUF__DT', 'SUF_BP', 'SUF_VBP', 'WORD_a', 'POS_DT']
['SUF_N', 'SUF_NN', 'SUF__NN', 'SUF_BP', 'SUF_VBP', 'WORD_thrilling', 'POS_NN']
['SUF_N', 'SUF_NN', 'SUF__NN', 'SUF_BP', 'SUF_VBP', 'WORD_motorcycle', 'POS_NN']
```

The desired output would also include the prefixes, so that each line would look like the following:

    ['SUF_P', 'SUF_BP', 'SUF_VBP', 'SUF_BP', 'SUF_VBP', 'PRE_1P', 'PRE_2BP', 'PRE_3VBP', 'PRE_4BP', 'PRE_5VBP', 'WORD_steve', 'POS_PRPVBP']
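
Judging only from the snippet shown, a likely cause is the direction of the comparisons in the prefix block. The suffix checks use `len(token) > n`, which holds for ordinary tokens, while the prefix checks use `len(token) < n`, which holds only for very short ones, so most of those branches never run. The prefix slices also look off by one (`token[:1]` under `PRE_2` returns a single character), and `token[0]` under `len(token) < 1` would raise an `IndexError` on an empty string. Below is a minimal sketch of one way to write both blocks as loops. The function name `affix_features` and the `max_len` parameter are hypothetical, and it assumes that a token of length n should yield affixes of every length up to n, which is slightly more permissive than the original `>` checks. Prefixes here are taken from the start of the string, so their values will differ from the suffix-like values in the desired-output line above.

```python
def affix_features(token, max_len=5):
    """Return suffix and prefix features of length 1..max_len for token.

    Hypothetical helper: the name and the max_len parameter are
    illustrative and not part of the original code.
    """
    feature_list = []
    # Only take affixes that actually fit inside the token.
    longest = min(max_len, len(token))

    # Suffixes: token[-n:] is the last n characters.
    for n in range(1, longest + 1):
        feature_list.append("SUF_" + token[-n:])

    # Prefixes: token[:n] is the first n characters.
    # The label keeps the "PRE_<n>" form used in the question.
    for n in range(1, longest + 1):
        feature_list.append("PRE_" + str(n) + token[:n])

    return feature_list


print(affix_features("VBP"))
# ['SUF_P', 'SUF_BP', 'SUF_VBP', 'PRE_1V', 'PRE_2VB', 'PRE_3VBP']
```

The separate `if` statements were never interfering with each other; each one is evaluated independently, so only the conditions themselves need to change.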
