Friday, July 6, 2018

Python List Index Out of Range on If Statement

I am writing a script to scrape a proxy site, and am stuck on a scenario where I receive a list index error while evaluating if a variable is True or False.

To save space, I haven't included the full HTML (it can be found at https://pastebin.com/meJKmtTB ). However, I can confirm that with the hardcoded HTML I am testing against, the flags and lists hold the following values after iterating through all the table rows.

elite_seconds = False

elite_minutes = True

elite_minute = True

elite_proxies_seconds = []

elite_proxies_minutes = [('168.235.79.189', '10808', 'US', 'elite proxy', 'yes', '10 minutes ago'), ('185.26.153.4', '53281', 'US', 'elite proxy', 'yes', '11 minutes ago'), ('104.238.146.146', '8118', 'US', 'elite proxy', 'yes', '20 minutes ago'), ('35.198.23.6', '8080', 'US', 'elite proxy', 'yes', '20 minutes ago'), ('38.29.146.10', '53281', 'US', 'elite proxy', 'yes', '20 minutes ago'), ('72.14.19.2', '53281', 'US', 'elite proxy', 'yes', '30 minutes ago'), ('97.107.153.28', '8080', 'US', 'elite proxy', 'yes', '41 minutes ago'), ('209.190.4.117', '8080', 'US', 'elite proxy', 'yes', '51 minutes ago')]

elite_proxies_minute = [('204.52.206.65', '8080', 'US', 'elite proxy', 'yes', '1 minute ago')]

I receive this error at the line if elite_seconds is True: and do not know why. Please note that some of the indentation may not have survived the copy/paste. Any help would be greatly appreciated!

Traceback (most recent call last):
  File "proxy_scrape.py", line 90, in <module>
    if elite_seconds is True:
IndexError: list index out of range

my code below:


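from random import randint

from bs4 import BeautifulSoup

# html holds the page source (see the pastebin link above)
page_soup = BeautifulSoup(html, "html.parser")
table_find = page_soup.findAll('table')
proxy_table = table_find[0]
rows = proxy_table.findAll('tr')

# One list and one flag per freshness bucket
elite_proxies_seconds = []
elite_seconds = False
elite_proxies_minutes = []
elite_minutes = False
elite_proxies_minute = []
elite_minute = False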

for x in rows:
    columns = x.findAll('td')

    if len(columns) > 0:
        ip = columns[0].get_text().strip()
        port = columns[1].get_text().strip()
        code = columns[2].get_text().strip()
        anon = columns[4].get_text().strip()
        https = columns[6].get_text().strip()
        update = columns[7].get_text().strip()

        update_time = update.split()

        if anon == 'elite proxy':
            if https == 'yes':
                if code == 'US':
                    if update_time[1] == 'seconds':
                        elite_proxies_seconds.append((ip,port,code,anon,https,update))
                        elite_seconds = True

                    if update_time[1] == 'minutes':
                        elite_proxies_minutes.append((ip,port,code,anon,https,update))
                        elite_minutes = True

                    if update_time[1] == 'minute':
                        elite_proxies_minute.append((ip,port,code,anon,https,update))
                        elite_minute = True

if elite_seconds is True:
    n_proxies = (len(elite_proxies_seconds) - 1)
    i = randint(0,n_proxies)
    proxy = elite_proxies_seconds[i]
elif elite_minute is True:
    n_proxies = (len(elite_proxies_minute) - 1)
    i = randint(0,n_proxies)
    proxy = elite_proxies_seconds[i]
elif elite_minutes is True:
    n_proxies = (len(elite_proxies_minutes) - 1)
    i = randint(0,n_proxies)
    proxy = elite_proxies_minutes[i]

proxy_ip = proxy[0]
proxy_port = proxy[1]
proxy_line = 'https://' + str(proxy_ip) + ':' + str(proxy_port)
PROXY = {"https": proxy_line}
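A likely cause, assuming the lists hold the values listed earlier: the elif elite_minute branch computes its index from elite_proxies_minute but then reads from elite_proxies_seconds, which is empty, so elite_proxies_seconds[0] raises the IndexError even though the traceback names the if line. The sketch below is one way to avoid that class of mistake, not the original script: pick_proxy is a hypothetical helper, the tuples are shortened to IP and port for illustration, and random.choice replaces the manual randint indexing.

```python
import random

# Values copied from the question's test data, shortened to (ip, port).
elite_proxies_seconds = []
elite_proxies_minute = [('204.52.206.65', '8080')]
elite_proxies_minutes = [('168.235.79.189', '10808'), ('185.26.153.4', '53281')]


def pick_proxy(*pools):
    """Hypothetical helper: return a random entry from the first non-empty pool."""
    for pool in pools:
        if pool:  # an empty list is falsy, so no separate boolean flag is needed
            return random.choice(pool)
    return None  # every pool was empty; the caller decides what to do


# Freshest pool first: seconds, then a single minute, then minutes.
proxy = pick_proxy(elite_proxies_seconds, elite_proxies_minute, elite_proxies_minutes)

if proxy is not None:
    proxy_ip, proxy_port = proxy[0], proxy[1]
    PROXY = {"https": 'https://' + proxy_ip + ':' + proxy_port}
    print(PROXY)  # {'https': 'https://204.52.206.65:8080'}
```

Because each pool is both tested and sampled through the same variable, a branch cannot check one list and index another. Returning None when every pool is empty also avoids the NameError the original would hit on proxy[0] if no flag were set.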
