if-statement: Web Scraping links

Friday, 24 April 2020

I am working on scraping links from a Christmas tree farm website. First, I used this tutorial method to get all the links. Then I noticed that the links I wanted were relative: they did not start with the protocol and domain, so I created a variable holding that prefix to concatenate. Now I am trying to write an if statement that checks each link for any two characters followed by "xmastrees.php". If a link matches, my concatenate variable should be added to the front of it. If the link does not contain that text, it should be removed. For example, NYxmastrees.php should become http://www.pickyourownchristmastree.org/NYxmastrees.php, and ../disclaimer.htm should be removed. I've tried multiple ways, but can't seem to find the right one.

Here is what I currently have. I keep running into a syntax error on the line with del. When I comment out that line, I get another error saying my string object has no attribute 're'. This confuses me because I thought I could use regex with strings?

source = requests.get('http://www.pickyourownchristmastree.org/').text
soup = BeautifulSoup(source, 'lxml')
concatenate = 'http://www.pickyourownchristmastree.org/'

find_state_group = soup.find('div', class_ = 'alert')
for link in find_state_group.find_all('a', href=True):
    if link['href'].re.search('^.\B.\$xmastrees'):
        states = concatenate + link
    else del link['href']
    print(link['href']

Error with else del link['href']:

    else del link['href']
           ^
SyntaxError: invalid syntax

Error without else del link['href']:

    if link['href'].re.search('^.\B.\$xmastrees'):
AttributeError: 'str' object has no attribute 're'
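
For reference, here is a minimal sketch of one way the filtering described above could be written. It is not a definitive fix. Both errors come from syntax and API usage: re is a standard-library module rather than an attribute of str, so the call takes the form re.search(pattern, string), and else needs a colon before its body. The pattern also needs attention, since \$ matches a literal dollar sign. The sketch works on plain href strings (the values of link['href'] in the loop above) so that it runs without a network request. The names BASE_URL, STATE_PAGE, state_links and sample_hrefs are hypothetical, and "any two characters" is taken literally as .{2}.

```python
import re

# Prefix from the question; BASE_URL is a hypothetical name for `concatenate`.
BASE_URL = 'http://www.pickyourownchristmastree.org/'

# Any two characters followed by "xmastrees.php". The dot before "php" is
# escaped so it matches a literal "." rather than any character.
STATE_PAGE = re.compile(r'.{2}xmastrees\.php')


def state_links(hrefs, base=BASE_URL):
    """Return full URLs for hrefs that look like state pages; drop the rest."""
    # fullmatch requires the whole href to match, so '../disclaimer.htm'
    # and longer paths are filtered out instead of being deleted in place.
    return [base + href for href in hrefs if STATE_PAGE.fullmatch(href)]


# Sample values standing in for link['href'] from find_all('a', href=True).
sample_hrefs = ['NYxmastrees.php', '../disclaimer.htm', 'CAxmastrees.php']
states = state_links(sample_hrefs)
print(states)
```

Building a new list avoids deleting attributes from the parsed tags while looping over them. In the original loop, the same function could be fed with [link['href'] for link in find_state_group.find_all('a', href=True)].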
