Yes, the title is quite long. Allow me to explain.
I have a CSV file called md.csv. It stores an HTML link (an anchor tag) for each URL that is submitted.
Example:
<a href="https://example.com/1">name of url</a><br>
<a href="https://example.com/2">name of url</a><br>
<a href="https://example.com/3">name of url</a><br>
I would like to read this CSV file line by line, extracting only the URL from each line, and then do something like this:
import csv
import urllib2
from bs4 import BeautifulSoup as soup

# [ for loop over the lines of md.csv here ]
    var = [read line from CSV file and extract only the URL]
    url = var
    html = urllib2.urlopen(url).read()
    page_soup = soup(html, 'html.parser')
    if page_soup.title.string == 'File not found':
        # delete that line in the CSV file
    else:
        continue
Am I even on the right track here?
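
To make the idea concrete, here is a minimal sketch of the flow described above. It is only one possible way to do it, under some assumptions: it targets Python 3 (where urllib2 became urllib.request), it swaps BeautifulSoup for the standard library's html.parser so nothing needs installing, and it rewrites the file from the lines worth keeping instead of deleting lines in place. The names extract_url, TitleParser, page_title, fetch_title, prune and prune_file are made up for illustration.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Matches the value of the first double-quoted href attribute on a line.
HREF_RE = re.compile(r'href="([^"]+)"')


def extract_url(line):
    """Return the first href value in a line, or None if there is none."""
    match = HREF_RE.search(line)
    return match.group(1) if match else None


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the stripped title text of an HTML string ('' if missing)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url):
    """Download a page and return its title (needs network access)."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return page_title(html)


def prune(lines, get_title=fetch_title, dead_title="File not found"):
    """Keep every line whose URL does not lead to a page titled dead_title."""
    kept = []
    for line in lines:
        url = extract_url(line)
        # Lines without a URL are kept untouched rather than guessed at.
        if url is None or get_title(url) != dead_title:
            kept.append(line)
    return kept


def prune_file(path="md.csv"):
    """Read the file, drop the dead links, and write the rest back."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    kept = prune(lines)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kept)
```

Calling prune_file() would then rewrite md.csv without the dead links. One caveat: urllib.request.urlopen raises an exception for HTTP error statuses such as 404, so a real run would probably need a try/except around the download, and a server that answers 404 would never reach the title check at all.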