Monday, February 1, 2021

Python: get n next elements in list if they match condition

I have a list that contains lists of strings:

[['Far', 'far', 'away', ',', 'behind', 'the', 'word', 'mountains', ',', 'far', 'from', 'the', 'countries', 'Vokalia', 'and', 'Consonantia', ',', 'there', 'live', 'the', 'blind', 'texts', '.'], ['Separated', 'they', 'live', 'in', 'Bookmarksgrove', 'right', 'at', 'the', 'coast', 'of', 'the', 'Semantics', ',', 'a', 'large', 'language', 'ocean', '.'], ['A', 'small', 'river', 'named', 'Duden', 'flows', 'by', 'their', 'place', 'and', 'supplies', 'it', 'with', 'the', 'necessary', 'regelialia', '.'], ['It', 'is', 'a', 'paradisematic', 'country', ',', 'in', 'which', 'roasted', 'parts', 'of', 'sentences', 'fly', 'into', 'your', 'mouth', '20', '2', '.'], ['Even', 'the', 'all', 'powerful', 'Pointing', 'has', 'no', 'control', 'about', 'the', 'blind', 'texts', 'it', 'is', 'an', 'almost', 'unorthographic', 'life', '1', 'day', 'however', 'a', 'small', 'line', 'of', 'blind', 'text', 'by', 'the', 'name', 'of', 'Lorem', 'Ipsum', 'decided', 'to', 'leave', 'for', 'the', 'far', 'World', 'of', 'Grammar', '.'], ['The', 'Big', 'Oxmox', 'advised', 'her', 'not', 'to', 'do', 'so', ',', 'because', 'there', 'were', 'thousands', 'of', 'bad', 'Commas', ',', 'wild', 'Question', 'Marks', 'and', 'devious', 'Semikoli', ',', 'but', 'the', 'Little', 'Blind', 'Text', 'didn', '’', 't', 'listen', '.'], ['She', 'packed', 'her', '7', 'versalia', ',', 'put', 'her', 'initial', 'into', 'the', 'belt', 'and', 'made', 'herself', '40', '4', 'on', 'the', 'way', '.'], ['When', 'she', 'reached', 'the', 'first', 'hills', 'of', 'the', 'Italic', 'Mountains', ',', 'she', 'had', 'a', 'last', 'view', '3', '00', 'and', '3', 'back', 'on', 'the', 'skyline', 'of', 'her', 'hometown', 'Bookmarksgrove', ',', '2', '000', 'and', '20', '1', 'the', 'headline', 'of', 'Alphabet', 'Village', 'and', 'the', '2', '000', ',', '20', '1', 'subline', 'of', 'her', 'own', 'road', ',', 'the', 'Line', 'Lane', '.'], ['Pityful', 'a', 'rethoric', 'question', 'ran', 'over', 'her', 'cheek', ',', 'then']]

I want to loop through every sublist and capture every sequence of digit tokens as its own list. So I want to capture, in groups:

['20', '2'], ['1'], ['3', '00', 'and', '3'], ['20', '1'], ['2', '000', ',', '20', '1']

So far this is what I managed to do (listToks is the list shown above):

for sentence in listToks :
    temp = []
    print(sentence)
    for i in range(len(sentence)-1) :
        if sentence[i].isdigit() :
            if not sentence[i+1].isdigit() and not sentence[i+1] in ["and",","]:
                temp.append(sentence[i])
            else :
                temp.append(sentence[i:i+2])
    print("==>",temp)
    print('\n')

The output, for some of the sentences that contain digits, is:

['It', 'is', 'a', 'paradisematic', 'country', ',', 'in', 'which', 'roasted', 'parts', 'of', 'sentences', 'fly', 'into', 'your', 'mouth', '20', '2', '.']
==> [['20', '2'], '2']

['She', 'packed', 'her', '7', 'versalia', ',', 'put', 'her', 'initial', 'into', 'the', 'belt', 'and', 'made', 'herself', '40', '4', 'on', 'the', 'way', '.']
==> ['7', ['40', '4'], '4']

['When', 'she', 'reached', 'the', 'first', 'hills', 'of', 'the', 'Italic', 'Mountains', ',', 'she', 'had', 'a', 'last', 'view', '3', '00', 'and', '3', 'back', 'on', 'the', 'skyline', 'of', 'her', 'hometown', 'Bookmarksgrove', ',', '2', '000', 'and', '20', '1', 'the', 'headline', 'of', 'Alphabet', 'Village', 'and', 'the', '2', '000', ',', '20', '1', 'subline', 'of', 'her', 'own', 'road', ',', 'the', 'Line', 'Lane', '.']
==> [['3', '00'], ['00', 'and'], '3', ['2', '000'], ['000', 'and'], ['20', '1'], '1', ['2', '000'], ['000', ','], ['20', '1'], '1']

I also tried:

for sentence in listToks :
    temp = []
    print(sentence)
    for i in range(len(sentence)-1) :
        if sentence[i].isdigit() :
            if not sentence[i+1].isdigit() and not sentence[i+1] in ["and",","]:
                temp.append(sentence[i])
            else :
                while True :
                    temp2 =[]
                    temp2.append(sentence[i])
                    i+=1
                    if not sentence[i+1].isdigit() and not sentence[i+1] in ["and",","]:
                        break
                temp.append(temp2)         
    print("==>",temp)
    print('\n')

And the output, again only for sentences that contain digits, is:

['It', 'is', 'a', 'paradisematic', 'country', ',', 'in', 'which', 'roasted', 'parts', 'of', 'sentences', 'fly', 'into', 'your', 'mouth', '20', '2', '.']
==> [['20'], '2']


['Even', 'the', 'all', 'powerful', 'Pointing', 'has', 'no', 'control', 'about', 'the', 'blind', 'texts', 'it', 'is', 'an', 'almost', 'unorthographic', 'life', '1', 'day', 'however', 'a', 'small', 'line', 'of', 'blind', 'text', 'by', 'the', 'name', 'of', 'Lorem', 'Ipsum', 'decided', 'to', 'leave', 'for', 'the', 'far', 'World', 'of', 'Grammar', '.']
==> ['1']

['She', 'packed', 'her', '7', 'versalia', ',', 'put', 'her', 'initial', 'into', 'the', 'belt', 'and', 'made', 'herself', '40', '4', 'on', 'the', 'way', '.']
==> ['7', ['40'], '4']

['When', 'she', 'reached', 'the', 'first', 'hills', 'of', 'the', 'Italic', 'Mountains', ',', 'she', 'had', 'a', 'last', 'view', '3', '00', 'and', '3', 'back', 'on', 'the', 'skyline', 'of', 'her', 'hometown', 'Bookmarksgrove', ',', '2', '000', 'and', '20', '1', 'the', 'headline', 'of', 'Alphabet', 'Village', 'and', 'the', '2', '000', ',', '20', '1', 'subline', 'of', 'her', 'own', 'road', ',', 'the', 'Line', 'Lane', '.']
==> [['and'], ['and'], '3', ['20'], ['20'], ['20'], '1', ['20'], ['20'], ['20'], '1']
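
As far as I can tell, two things explain this output. First, temp2 is re-created on every pass of the while loop, so only the last token visited survives (hence ['and'] and ['20']). Second, incrementing i inside the body does not make the for loop skip ahead, because range() hands out the next index regardless. A minimal sketch of the second point (the name seen is only for illustration):

```python
seen = []
for i in range(4):
    seen.append(i)
    i += 2  # rebinds the local name only; range() still yields the next index

print(seen)  # [0, 1, 2, 3]
```

This is why the same run of digits gets visited several times in the attempt above.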

What I want is something like:

[['7'], ['40', '4']]
[['3', '00', 'and', '3'],['2', '000', 'and', '20', '1']]
etc.
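
A minimal sketch of one way to get this grouping. It assumes a group starts at a digit token, continues through the following digit tokens, and only absorbs "and" or "," when the very next token is a digit again. The names digit_groups and CONNECTORS are made up for this illustration, and a while loop with a manual index is used so that tokens already consumed are not visited twice.

```python
CONNECTORS = {"and", ","}  # assumption: the only tokens allowed inside a group


def digit_groups(tokens):
    """Return the runs of digit tokens found in one tokenized sentence."""
    groups = []
    i = 0
    n = len(tokens)
    while i < n:
        if not tokens[i].isdigit():
            i += 1
            continue
        # A digit token opens a new group.
        group = [tokens[i]]
        i += 1
        while i < n:
            if tokens[i].isdigit():
                group.append(tokens[i])
                i += 1
            elif tokens[i] in CONNECTORS and i + 1 < n and tokens[i + 1].isdigit():
                # Keep the connector only when a digit follows it.
                group.extend(tokens[i:i + 2])
                i += 2
            else:
                break
        groups.append(group)
    return groups


sentence = ['She', 'packed', 'her', '7', 'versalia', ',', 'made',
            'herself', '40', '4', 'on', 'the', 'way', '.']
print(digit_groups(sentence))  # [['7'], ['40', '4']]
```

Applied to the full list, this would be something like [digit_groups(s) for s in listToks].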

The idea is then to rebuild the numbers from each group: 20 + 1 = 21, '2' followed by '000' gives 2000, and 21 + 2000 = 2021.
Thank you
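
A hedged sketch of that last step, working on groups that have already been extracted. The rule is an assumption on my part: a token that starts with '0' (such as '000' or '00') is glued onto the digit token before it, connectors are ignored, and the resulting pieces are summed. The name group_to_number is hypothetical.

```python
def group_to_number(group):
    """Rebuild one integer from a group such as ['2', '000', 'and', '20', '1']."""
    parts = []
    for tok in group:
        if not tok.isdigit():
            continue  # skip connectors like "and" and ","
        if tok.startswith("0") and parts:
            parts[-1] += tok  # assumption: '2' + '000' means 2000
        else:
            parts.append(tok)
    return sum(int(p) for p in parts)


print(group_to_number(['2', '000', 'and', '20', '1']))  # 2021
print(group_to_number(['3', '00', 'and', '3']))         # 303
print(group_to_number(['40', '4']))                     # 44
```

Other readings of the groups are possible, so the gluing rule may need adjusting for real data.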
