Tuesday, April 10, 2018

Scrapy: replace the image using multiple if statements after checking the URL response

This question is related to this post: Python Scrapy check xpath url image exists

Building on that earlier post, where I received a working answer, and continuing with the same code, I was wondering whether it is possible to do this:

The code should check 3 images. If the 1st is OK, it is printed. If the 1st is broken and the 2nd is OK, the 2nd is printed. If the 1st and the 2nd are both broken, the 3rd is printed. Only if all 3 images are broken is the reference image printed.
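
The fallback order described above can be sketched as a small helper. This is only a minimal sketch of the intended logic, not code from the original spider: `first_working_url`, the `is_ok` callable, and the `example.com` URLs are hypothetical names introduced for illustration, and the checker here is a stand-in rather than a real HTTP request.

```python
from typing import Callable, Iterable, Optional


def first_working_url(
    urls: Iterable[Optional[str]],
    fallback: str,
    is_ok: Callable[[str], bool],
) -> str:
    """Return the first URL for which is_ok(url) is true, else the fallback."""
    for url in urls:
        # Skip missing entries, then stop at the first URL that passes the check.
        if url and is_ok(url):
            return url
    return fallback


# Hypothetical usage: a stand-in checker that treats two URLs as broken.
broken = {"http://example.com/1.jpg", "http://example.com/2.jpg"}
candidates = [
    "http://example.com/1.jpg",
    "http://example.com/2.jpg",
    "http://example.com/3.jpg",
]
chosen = first_working_url(
    candidates,
    "http://example.com/placeholder.jpg",
    lambda u: u not in broken,
)
print(chosen)
```

Because the loop returns on the first success, each candidate is checked at most once and later images are never requested when an earlier one works.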

This is what I tried:

        for picUrl in ntp.xpath('//div/p[3]/img'):

            imgUrl1 = response.urljoin(ntp.xpath('//div/p[3]/img[1]/@src').extract_first())
            imgUrl2 = response.urljoin(ntp.xpath('//div/p[3]/img[2]/@src').extract_first())

            if 404 in {requests.get(imgUrl1).status_code}:
                picUrl = ("http://i.ebayimg.com/00/s/MTQyN1gxNjAw/z/w-oAAOSwtfhYpIK3/$_7.JPG")
            else:
                picUrl = (imgUrl1)

            if 404 in {requests.get(imgUrl2).status_code}:
                picUrl = ("http://i.ebayimg.com/00/s/MTQyN1gxNjAw/z/w-oAAOSwtfhYpIK3/$_7.JPG")
            else:
                picUrl = imgUrl2

I also tried this:

        for picUrl in ntp.xpath('//div/p[3]/img'):

            imgUrl1 = response.urljoin(ntp.xpath('//div/p[3]/img[1]/@src').extract_first())
            imgUrl2 = response.urljoin(ntp.xpath('//div/p[3]/img[2]/@src').extract_first())
            imgUrl3 = response.urljoin(ntp.xpath('//div/p[3]/img[3]/@src').extract_first())

            if 404 in {requests.get(imgUrl1).status_code, requests.get(imgUrl2).status_code, requests.get(imgUrl3).status_code}:
                picUrl = ("http://i.ebayimg.com/00/s/MTQyN1gxNjAw/z/w-oAAOSwtfhYpIK3/$_7.JPG")
            elif 200 in {requests.get(imgUrl1).status_code, requests.get(imgUrl2).status_code, requests.get(imgUrl3).status_code}:
                picUrl = (imgUrl1 or imgUrl2 or imgUrl3)

And this:

        imgUrl1 = response.urljoin(ntp.xpath('//div/p[3]/img[1]/@src').extract_first())
        imgUrl2 = response.urljoin(ntp.xpath('//div/p[3]/img[2]/@src').extract_first())
        imgUrl3 = response.urljoin(ntp.xpath('//div/p[3]/img[3]/@src').extract_first())
        if (requests.get(imgUrl1).status_code == 404 or requests.get(imgUrl2).status_code == 404 or requests.get(imgUrl3).status_code == 404):
            picUrl = ("http://i.ebayimg.com/00/s/MTQyN1gxNjAw/z/w-oAAOSwtfhYpIK3/$_7.JPG")
        else:
            picUrl = (response.urljoin(ntp.xpath('//div/p[3]/img[1]/@src').extract_first()) or response.urljoin(ntp.xpath('//div/p[3]/img[2]/@src').extract_first()) or response.urljoin(ntp.xpath('//div/p[3]/img[3]/@src').extract_first()))

And also this:

        for picUrl in ntp.xpath('//div/p[3]/img'):

            imgUrl1 = response.urljoin(ntp.xpath('//div/p[3]/img[1]/@src').extract_first())
            imgUrl2 = response.urljoin(ntp.xpath('//div/p[3]/img[2]/@src').extract_first())
            imgUrl3 = response.urljoin(ntp.xpath('//div/p[3]/img[3]/@src').extract_first())

            if 200 in {requests.get(imgUrl1).status_code}:
                picUrl = (imgUrl1)
            elif 200 in {requests.get(imgUrl2).status_code}:
                picUrl = (imgUrl2)
            elif 200 in {requests.get(imgUrl3).status_code}:
                picUrl = (imgUrl3)
            else:
                picUrl = ("http://i.ebayimg.com/00/s/MTQyN1gxNjAw/z/w-oAAOSwtfhYpIK3/$_7.JPG")

As you can see, I am trying to get the exact result, but the results vary. Sometimes the image is replaced with the reference one and sometimes it is not; in some cases the output is the URL of the product page, in others it is a broken image URL, and in some cases it is the second image when the first doesn't exist, but in most cases it is the first image.
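
One plausible explanation for the product page URL showing up, as far as I can tell, is that `extract_first()` returns `None` when an `img` is missing, and joining a missing `src` onto the page URL falls back to the page URL itself. A minimal sketch of filtering out missing values before joining follows; `absolute_srcs` and the `example.com` values are hypothetical, and the standard library `urljoin` is used here in place of `response.urljoin`.

```python
from typing import Iterable, List, Optional
from urllib.parse import urljoin


def absolute_srcs(page_url: str, srcs: Iterable[Optional[str]]) -> List[str]:
    """Join only the src values that actually exist; drop None and empty strings."""
    return [urljoin(page_url, src) for src in srcs if src]


# Hypothetical page with a missing second image.
page = "http://example.com/item/42"
raw = ["/img/a.jpg", None, "img/c.jpg"]
print(absolute_srcs(page, raw))
```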

I also noticed that the code takes longer when checking whether the URLs are broken, and that it goes back and forth checking the same image URL.

Although my original post was answered, it left me wondering whether the code could be improved, so after several searches and attempts I ended up here again.

Again, any help is welcome.
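
The repeated checks probably come from two things in the attempts above: the lookups sit inside a `for` loop over every `img` node, and some conditions call `requests.get` on the same URL more than once. A minimal sketch of remembering each result so a URL is requested only once is shown below. `make_cached_checker`, `head_status`, and `fake_status` are hypothetical names; using a HEAD request instead of GET is an assumption that the image server answers HEAD correctly, which not every server does.

```python
from typing import Callable, Dict, List


def make_cached_checker(fetch_status: Callable[[str], int]) -> Callable[[str], bool]:
    """Wrap a status-fetching function so each URL is fetched at most once."""
    cache: Dict[str, bool] = {}

    def is_ok(url: str) -> bool:
        if url not in cache:
            cache[url] = fetch_status(url) == 200
        return cache[url]

    return is_ok


def head_status(url: str) -> int:
    """Fetch only the headers; report 0 when the request itself fails."""
    import requests  # imported lazily so the caching logic can be tried without it

    try:
        return requests.head(url, allow_redirects=True, timeout=5).status_code
    except requests.RequestException:
        return 0


# Hypothetical demonstration with a fake fetcher that records every call.
calls: List[str] = []


def fake_status(url: str) -> int:
    calls.append(url)
    return 404 if url.endswith("broken.jpg") else 200


is_ok = make_cached_checker(fake_status)
is_ok("http://example.com/a.jpg")
is_ok("http://example.com/a.jpg")  # answered from the cache, no second fetch
is_ok("http://example.com/broken.jpg")
print(len(calls))
```

In the spider, `make_cached_checker(head_status)` would be created once and reused. Note that `requests` is synchronous, so each call still blocks Scrapy's event loop while it waits.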
