Friday, 20 October 2017

Join contiguous tokens if the token includes the "@" char

Right now I've implemented several ifs, one per case. Once a case is matched, I:

  1. strip the @ (@ only appears at the end of a token)
  2. join the token with the words that follow it
  3. replace each of the joined tokens with the newly created token

But as you can see from the code, it's very repetitive. Could anyone suggest a more concise way to write it?

A snapshot of the code:

    # str2_tokens is the tokenized sentence
    for i in range(len(str2_tokens)):
        if "@" in str2_tokens[i] and "@" in str2_tokens[i+1] and "@" in str2_tokens[i+2]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                             str2_tokens[i+2].strip("@") + str2_tokens[i+3].strip("@")
            str2_tokens[i+1] = str2_tokens[i]
            str2_tokens[i+2] = str2_tokens[i]
            str2_tokens[i+3] = str2_tokens[i]

        if "@" in str2_tokens[i] and "@" in str2_tokens[i+1]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                             str2_tokens[i+2].strip("@")
            str2_tokens[i+1] = str2_tokens[i]
            str2_tokens[i+2] = str2_tokens[i]

        if "@" in str2_tokens[i]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@")
            str2_tokens[i+1] = str2_tokens[i]

Edited

For instance:

Case 1: the input is "paper and board — determination of the ink absorb@@ ency" and I would like to obtain the output "paper and board — determination of the ink absorbency absorbency". Here absorbency is repeated twice because two tokens were combined.

Case 2: the input is "related substance in f@@ ti@@ bam@@ zone can be determined with this method" and I would like to obtain the output "related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method". Here ftibamzone is repeated 4 times because 4 tokens were combined.

Any number of consecutive tokens may end with @.
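
One possible way to cover runs of any length is to scan for each run instead of writing one if per case. This is only a minimal sketch: merge_at_tokens is a hypothetical name, and it assumes the sentence was tokenized by splitting on whitespace and that @ only ever appears at the end of a token, as stated above.

```python
def merge_at_tokens(tokens):
    """Join each run of '@'-suffixed tokens with the token that follows it,
    then repeat the joined word once for every token that was combined."""
    result = []
    i = 0
    while i < len(tokens):
        # Advance j past every token ending in "@"; j stops on the final piece
        # of the word (or stays at i when tokens[i] is an ordinary token).
        j = i
        while j < len(tokens) - 1 and tokens[j].endswith("@"):
            j += 1
        # Strip the trailing "@" characters and glue the pieces together.
        merged = "".join(t.rstrip("@") for t in tokens[i:j + 1])
        # Emit the merged word once per original token so positions line up.
        result.extend([merged] * (j - i + 1))
        i = j + 1
    return result


sentence = "related substance in f@@ ti@@ bam@@ zone can be determined with this method"
print(" ".join(merge_at_tokens(sentence.split())))
# related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method
```

Unlike the snapshot above, this builds a new list rather than modifying str2_tokens in place, and it uses rstrip("@") since the @ is only expected at the end of a token.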
