I am working on scraping links from a Christmas tree farm website. First, I used this tutorial method to get all the links. Then, I noticed that the links that I wanted did not lead with the proper hypertext transfer protocol, so I created a variable to concatenate. Now I am trying to create a if statement that grabs each link and looks for any two characters followed by "xmastrees.php". If that is true then my concatenate variable to the front of it. If the link does not contain the specific text then it is deleted. For example NYxmastrees.php will be http://www.pickyourownchristmastree.org/NYxmastrees.php and ../disclaimer.htm will be removed. I've tried multiple ways, but can't seem to find the right one.
Here is what I currently have and keep running into a syntax error: del. I commented out that line and get another error saying my string object has no attribute 're'. This confuses me because I though i could use regex with strings??
source = requests.get('http://www.pickyourownchristmastree.org/').text
soup = BeautifulSoup(source, 'lxml')
concatenate = 'http://www.pickyourownchristmastree.org/'
find_state_group = soup.find('div', class_ = 'alert')
for link in find_state_group.find_all('a', href=True):
if link['href'].re.search('^.\B.\$xmastrees'):
states = concatenate + link
else del link['href']
print(link['href']
Error with else del link['href']:
else del link['href']
^
SyntaxError: invalid syntax
Error without else del link['href']:
if link['href'].re.search('^.\B.\$xmastrees'):
AttributeError: 'str' object has no attribute 're'
Aucun commentaire:
Enregistrer un commentaire