Monday, February 19, 2018

Unexpected Behavior Scraping Images With Selenium

Can someone help me understand why my function doesn't return each individual URL in the list of URLs I pass in as a parameter, and why I am getting the following output? I am simply trying to return, for each URL in the list, the URL itself and all of the corresponding item's images.

beta_test_items = ['https://www.facebook.com/marketplace/item/2009940172578816',
 'https://www.facebook.com/marketplace/item/1591865710899243']

from selenium import webdriver
from time import sleep
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def scrape_item_details(beta_test_items, first=True):
    #finish this function
    for url in beta_test_items:
        driver.get(url)
        sleep(3)
        image_element = driver.find_element_by_xpath('//img[contains(@class, "_5m")]')
        if first:
            images = [image_element.get_attribute('src')]
            first = False
        else:
            pass
        print(images)

        #try:
        previous_and_next_buttons = driver.find_elements_by_xpath("//i[contains(@class, '_3ffr')]")
        next_image_button = previous_and_next_buttons[1]
        print(next_image_button.text)
        if next_image_button.is_displayed():
            next_image_button.click()

            image_element = driver.find_element_by_xpath('//img[contains(@class, "_5m")]')
            print(image_element.get_attribute('src'))
            sleep(2)

            if image_element.get_attribute('src') in images:
                pass
            else:
                images.append(image_element.get_attribute('src'))

        else:
            pass
        #except Exception:
        #    print(Exception)
        #    pass

        return(url, images)
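If `return(url, images)` sits inside the `for` loop, the function exits during the first iteration, so only the first URL is ever processed. Separately, because `images` is only assigned while `first` is True, a second iteration (if one were reached) would keep appending to the first item's list. The pure-Python sketch below contrasts the two placements of `return`; the placeholder strings such as `"item/1/photo"` are hypothetical stand-ins for scraped image URLs.

```python
from typing import List, Optional, Tuple


def return_inside_loop(urls: List[str]) -> Optional[Tuple[str, List[str]]]:
    for url in urls:
        images = [url + "/photo"]  # placeholder for the scraped image URLs
        return url, images         # leaves the function on the first pass
    return None                    # only reached when urls is empty


def return_after_loop(urls: List[str]) -> List[Tuple[str, List[str]]]:
    results: List[Tuple[str, List[str]]] = []
    for url in urls:
        images = [url + "/photo"]  # a fresh list for every URL
        results.append((url, images))
    return results                 # runs once every URL has been processed


urls = ["item/1", "item/2"]
print(return_inside_loop(urls))  # ('item/1', ['item/1/photo'])
print(return_after_loop(urls))   # [('item/1', ['item/1/photo']), ('item/2', ['item/2/photo'])]
```

A likely fix along these lines for the question's function: create the result collection before the loop, reset `images` for each URL instead of relying on the `first` flag, and move the `return` out one indentation level.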

When I run it, I get the following output. I'm not sure why it stops on the first URL right after the second photo is appended to the list of images:

In [46]: scrape_item_details(beta_items_list)
['https://scontent-atl3-1.xx.fbcdn.net/v/t1.0-9/27750896_2002108023449096_2229019388723795634_n.jpg?oh=26d3fe06595affdcbd142754766fe934&oe=5B0933C9']
Next
https://scontent-atl3-1.xx.fbcdn.net/v/t1.0-9/27655331_2002108026782429_4575620607831413757_n.jpg?oh=a7c94bc2b8ef8b39bc65291b641f7953&oe=5B0A11DD
Out[46]: 
('https://www.facebook.com/marketplace/item/2009940172578816',
 ['https://scontent-atl3-1.xx.fbcdn.net/v/t1.0-9/27750896_2002108023449096_2229019388723795634_n.jpg?oh=26d3fe06595affdcbd142754766fe934&oe=5B0933C9',
  'https://scontent-atl3-1.xx.fbcdn.net/v/t1.0-9/27655331_2002108026782429_4575620607831413757_n.jpg?oh=a7c94bc2b8ef8b39bc65291b641f7953&oe=5B0A11DD'])
