Right now I've implemented several ifs to cover each condition. Once a case is identified, I:
- strip the @ (@ only appears at the end of a token)
- join the token with the following words
- replace the following tokens with the newly created token
But as you can see from the code, it's very repetitive. Could anyone suggest a more concise way to write it?
A snapshot of the code:
# str2_tokens is the tokenized sentence
for i in range(len(str2_tokens)):
    if "@" in str2_tokens[i] and "@" in str2_tokens[i+1] and "@" in str2_tokens[i+2]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
            str2_tokens[i+2].strip("@") + str2_tokens[i+3].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
        str2_tokens[i+2] = str2_tokens[i]
        str2_tokens[i+3] = str2_tokens[i]
    if "@" in str2_tokens[i] and "@" in str2_tokens[i+1]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
            str2_tokens[i+2].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
        str2_tokens[i+2] = str2_tokens[i]
    if "@" in str2_tokens[i]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
Edited
For instance:
Case 1: the input is "paper and board — determination of the ink absorb@@ ency" and I would like to obtain "paper and board — determination of the ink absorbency absorbency", with absorbency repeated twice because two tokens were combined.
Case 2: the input is "related substance in f@@ ti@@ bam@@ zone can be determined with this method" and I would like to obtain "related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method", with ftibamzone repeated 4 times because 4 tokens were combined.
Any number of consecutive tokens may end with @.
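One possible way to handle a run of any length is to collect pieces until a token without a trailing @ closes the run, then emit the joined word once per piece. The following is only a minimal sketch, not a definitive solution: the function name merge_split_tokens is made up for illustration, it returns a new list instead of modifying str2_tokens in place, and it assumes that a sentence ending in an @ token should simply have its pending pieces joined.

```python
def merge_split_tokens(tokens):
    """Join each run of @-terminated tokens with the token that follows it,
    repeating the joined word once for every token in the run."""
    merged = []
    pieces = []  # parts of the word currently being assembled
    for token in tokens:
        pieces.append(token.rstrip("@"))
        if not token.endswith("@"):
            # The run is closed: emit the joined word once per piece.
            word = "".join(pieces)
            merged.extend([word] * len(pieces))
            pieces = []
    if pieces:
        # Assumption: if the sentence ends on an @ token, join what is left.
        word = "".join(pieces)
        merged.extend([word] * len(pieces))
    return merged


sentence = "related substance in f@@ ti@@ bam@@ zone can be determined with this method"
print(" ".join(merge_split_tokens(sentence.split())))
# related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method
```

A token without @ is treated as a run of length one, so ordinary words pass through unchanged and no index arithmetic (i+1, i+2, ...) is needed.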