Thursday, May 17, 2018

Trying to read each line of a CSV file, grab the URL in that line, and check the page title. If the title is 'File not found', remove that line from the CSV file?

Yes, the title is quite long. Allow me to explain.

I have a CSV file called md.csv. It stores the HTML link code for each submitted URL, one per line.

Example:

<a href="https://example.com/1">name of url</a><br>
<a href="https://example.com/2">name of url</a><br>
<a href="https://example.com/3">name of url</a><br>

I would like to read each line of this CSV file and extract only the URL from it.
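For the extraction step on its own, here is a minimal sketch, assuming Python 3 and that every line holds a single anchor tag like the examples above. The names HrefExtractor and extract_url are made up for illustration. It uses the standard-library html.parser module, so nothing extra needs installing.

```python
from html.parser import HTMLParser


class HrefExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_url(line):
    """Return the first href found in the line, or None if there is none."""
    parser = HrefExtractor()
    parser.feed(line)
    parser.close()
    return parser.hrefs[0] if parser.hrefs else None


sample = '<a href="https://example.com/1">name of url</a><br>'
print(extract_url(sample))  # https://example.com/1
```

Returning None for a line without a link lets the caller decide whether to keep or skip that line.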

Then, for each URL:

import csv
import urllib2
from bs4 import BeautifulSoup as soup  # needed for the soup(...) call below

for line in lines:  # [lines: placeholder for the lines read from md.csv]
    url = extract_url(line)  # [placeholder: extract only the URL from the line]
    html = urllib2.urlopen(url).read()
    page_soup = soup(html, 'html.parser')
    if page_soup.title.string == 'File not found':
        pass  # [delete that line from the CSV file]
    else:
        continue

Am I even on the right track here?
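One possible shape for the whole flow, offered as a sketch rather than a definitive answer. It assumes Python 3 (where urllib2's urlopen lives in urllib.request) and swaps BeautifulSoup for the standard-library html.parser so that it runs without extra packages. TitleParser, fetch_title, filter_lines and clean_file are hypothetical names. Instead of deleting lines in place, it collects the lines to keep and writes the file back once at the end.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumes the URL is always in a double-quoted href attribute.
HREF_RE = re.compile(r'href="([^"]+)"')


class TitleParser(HTMLParser):
    """Records the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_title(url):
    """Download the page and return its <title> text (empty string if none)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def filter_lines(lines, get_title=fetch_title, bad_title="File not found"):
    """Return the lines whose linked page does not have the bad title."""
    kept = []
    for line in lines:
        match = HREF_RE.search(line)
        # Lines without a URL are kept unchanged.
        if match is None or get_title(match.group(1)) != bad_title:
            kept.append(line)
    return kept


def clean_file(path="md.csv"):
    """Rewrite the file in place, keeping only the surviving lines."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    kept = filter_lines(lines)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kept)
```

Calling clean_file() would process md.csv in the current directory. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so if the server sends its 'File not found' page with a 404 status, the call to fetch_title would need a try/except around it. Passing get_title as a parameter makes it possible to try filter_lines with a fake lookup before hitting the network.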
