Wednesday, December 9, 2020

Python - trying to get BeautifulSoup to find words from a list, but it's unable to find them

I'm working on my first project that isn't straight out of a book but I'm having trouble getting a function to work.

The function receives a list of strings and a BeautifulSoup object and attempts to find each word in the soup.text. However, the code seems unable to find any words/strings at all even when I am certain it should be finding them. I checked and confirmed that the function is definitely receiving the list properly and that the URL works and returns what I expect it to when I do something like print(urlSoup).

The relevant code:

def find_words(words_list, urlSoup):
    for word in words_list:
        words_count = 0
        if word.casefold() in urlSoup:
            # ideally it should also count the number of times the word shows up with the 'words_count' bit,
            # but I have an impression that this also won't work how I want it to. 
            words_count += 1
            print("The word " + word + " was found " + str(words_count) + " times in " + url + ".")
        else:
            print("The word '" + word + "' was not found in the URL you provided.")
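One possible explanation for the if branch never firing, offered as a sketch rather than a confirmed diagnosis: `word in urlSoup` tests whether the word is one of the soup's direct child nodes, not whether it occurs as a substring of the page text. A minimal version that searches the extracted text and counts occurrences might look like this. The inline HTML, the `count_words` name, and the returned dictionary are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup


def count_words(words_list, url_soup):
    """Return {word: occurrences} using case-insensitive substring matching."""
    # Extract the page text once and normalise its case.
    page_text = url_soup.get_text().casefold()
    # str.count counts non-overlapping substring matches,
    # so "cat" would also match inside "category".
    return {word: page_text.count(word.casefold()) for word in words_list}


# Hypothetical page standing in for the HTML fetched from the real URL.
html = "<html><body><p>Python is fun. I like python.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

counts = count_words(["Python", "fun", "java"], soup)
for word, n in counts.items():
    if n:
        print(f"The word '{word}' was found {n} times.")
    else:
        print(f"The word '{word}' was not found.")
```

Because the count is computed once per word from the whole text, there is no need to reset and increment a counter inside the loop. If whole-word matching is wanted instead of substring matching, a regular expression with word boundaries would be one option.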

Things I have tried so far to get the if statement to fire (presumably it isn't finding any of the words/strings from the list in soup.text) include removing the .casefold() call, changing soup.text to soup.content, and changing the if statement to something like

if urlSoup.find_all(word):
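As a hedged illustration of why this variant may also come back empty: the first positional argument of `find_all` is treated as a tag name, so `find_all(word)` looks for elements such as `<python>` rather than for text. Passing a pattern through the `string=` argument searches text nodes instead. The HTML below is a made-up stand-in for the real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page standing in for the HTML fetched from the real URL.
html = "<html><body><p>Python is fun.</p><p>I like python.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# A bare string is interpreted as a tag name: this looks for <python> elements.
tags_named_python = soup.find_all("python")

# string= matches against text nodes; a compiled regex allows a
# case-insensitive search anywhere inside each node.
pattern = re.compile(re.escape("python"), re.IGNORECASE)
matching_strings = soup.find_all(string=pattern)

print(tags_named_python)      # no <python> tags exist in this page
print(len(matching_strings))  # number of text nodes containing the word
```

Note that this counts text nodes that contain the word, not total occurrences; a node that mentions the word twice is still one match.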

I also changed the parser for BeautifulSoup to lxml but that didn't work either. At this point I'm a bit stuck and despite looking around a bit on Stack Overflow and in the bs4 documentation I haven't managed to crack this yet. I'm sure the solution is painfully obvious but as a beginner I'm afraid that I need a bit of help here.

I hope that I have provided enough information; please feel free to ask if you need me to explain further.
