I would like to create a variable based on a specific condition. I have a text variable named description, and I tried to write a loop over this variable. What I'm trying to do is:
If word1 and word2 are both in description on the first row, then append word1 and word2 to classe. If word3 alone is contained in the second row, then append word3 to classe, and so on. And if none of the three words is present, then append Others.
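To make the rule above concrete, here is a minimal sketch of the behaviour I am after. The sample DataFrame is made up for illustration (my real data is not shown here), the helper name classify is hypothetical, and joining several matches with " and " is only an assumption about how the combined label should look. It also assumes description contains no missing values.

```python
import pandas as pd

# Hypothetical sample data, not the real df from the question.
df = pd.DataFrame({
    "description": [
        "some word1 and word2 here",
        "only word3 in this one",
        "nothing relevant",
    ]
})

container = ("word1", "word2", "word3")


def classify(text):
    """Return the matched words joined by ' and ', or 'Others' if none match."""
    # Plain substring test against this single row's text.
    found = [w for w in container if w in text]
    return " and ".join(found) if found else "Others"


# One value per row, so classe has the same length as df.
classe = df["description"].apply(classify).tolist()
print(classe)  # ['word1 and word2', 'word3', 'Others']
```

The point of the sketch is that exactly one label is produced per row, so len(classe) equals df.shape[0].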
Here's the loop I wrote so far:
container = ("word1", "word2", "word3")
classe = []
for i in range(0, df.shape[0]):
    for n in container:
        if [df['description'].str.contains(n).any()]:
            classe.append(n)
    else :
        classe.append('Other')
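Two details of the loop above may be worth checking in isolation. The condition is wrapped in square brackets, which makes it a one-element list, and a non-empty list is always truthy in Python. In addition, .any() is applied to the whole column, so the test never looks at row i specifically. A small demonstration on a made-up two-row frame (the data is an assumption, only there to show the behaviour):

```python
import pandas as pd

# Hypothetical two-row frame, only for demonstration.
df = pd.DataFrame({"description": ["has word1", "nothing"]})

# A non-empty list is truthy even when its only element is False.
print(bool([False]))  # True

# .any() reduces the whole column to a single boolean,
# so the answer is the same whichever row the loop is on.
whole_column = df["description"].str.contains("word1").any()
print(whole_column)  # True

# A per-row test has to look at row i explicitly.
per_row = ["word1" in df["description"].iloc[i] for i in range(df.shape[0])]
print(per_row)  # [True, False]
```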
I would like exactly one entry in classe per row of df, following the rule described above. But I got this weird result instead, which is not at all what I want:
classe
['word1',
 'word2',
 'word3',
 'Others',
 'word1',
 'word2',
 'word3',
 'Others',
 'word1',
 'word2',
 'word3',
 'Others',
 ...]
classe ends up with a length of 3040, whereas df.shape[0] is equal to 608, so there is a problem: the two need to match. I need classe to have a length of 608, one entry per row. Something's wrong.
Any idea how to fix that? Or if you have another solution, I'm also interested :)
Thanks.