Tuesday, October 24, 2017

Unable to keep a particular item and ignore the rest

I've written some code in Python to scrape a particular item from three different sites. The element that holds the item is different on each site, so I had to write three different selectors to catch it. My script looks for the item on one site, and if it fails to find it there, it moves on to the next site, and so on. What I want is this: if the scraper finds the item on the first link, it should ignore the remaining links; if it finds it on the second link, it should skip the third, and so on. However, if the scraper finds the item on the second link and then finds nothing on the third link, it prints nothing. How can I fix my script so that it stops searching for the item as soon as it finds it on any of the links?

My script looks roughly like this:

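import requests
from lxml.html import fromstring

# 'url1'..'url3' and the three selector names are placeholders for the real values
list_urls = ['url1', 'url2', 'url3']

for link in list_urls:
    res = requests.get(link).text
    root = fromstring(res)
    # cssselect() returns a list, so [0] raises IndexError when nothing matches
    try:
        item = root.cssselect(some_selector)[0].text
    except IndexError:
        item = ""
    try:
        item = root.cssselect(another_selector)[0].text
    except IndexError:
        item = ""  # overwrites any match found by the previous selector
    try:
        item = root.cssselect(some_other_selector)[0].text
    except IndexError:
        item = ""  # overwrites any match found by the previous selectors
    print(item)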
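One way to get the behaviour asked for is to return as soon as a selector matches, instead of assigning to `item` unconditionally in every `try` block. The sketch below assumes each URL is paired with the one selector that fits its site; `find_item`, `requests_fetch` and `css_extract` are hypothetical helper names, and fetching and parsing are passed in as functions so the search logic can be exercised without the network.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_item(
    sources: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
    extract: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Try each (url, selector) pair in order and return the first item found.

    The loop returns as soon as a page yields a non-empty item, so later
    URLs are never requested and a miss on a later site cannot overwrite
    an earlier hit.
    """
    for url, selector in sources:
        item = extract(fetch(url), selector)
        if item:
            return item  # found: stop searching the remaining sites
    return None  # none of the sites had the item


def requests_fetch(url: str) -> str:
    # Imported lazily so find_item can be used without network libraries.
    import requests
    return requests.get(url, timeout=10).text


def css_extract(html: str, selector: str) -> Optional[str]:
    # lxml's cssselect() also needs the separate `cssselect` package installed.
    from lxml.html import fromstring
    matches = fromstring(html).cssselect(selector)
    return matches[0].text if matches else None
```

With the placeholders from the question, a call would look like `find_item([('url1', some_selector), ('url2', another_selector), ('url3', some_other_selector)], requests_fetch, css_extract)`. If every site should instead be tried with all three selectors, the inner check can loop over the selectors and return on the first non-empty result in the same way.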
