Lately I've been working on a loop that scrapes some pages, with some exceptions. What I want is to scrape pages that start with 'http://www2' but don't start with 'http://www2.abcd.com/xyz'. What I tried is:
for li in links:
    if li.startswith('http://www2') and not li.startswith('http://www2.abcd.com/xyz'):
        pass  # scrape the page at li here
But this still returns pages that start with 'http://www2.abcd.com/xyz'. I suspect the fix is simple, but I can't see what I'm doing wrong. I also tried using re.compile, but a compiled pattern doesn't work with startswith.
Any ideas?
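For reference, here is a minimal sketch of the regex route, using a negative lookahead with the compiled pattern's own match method instead of startswith. The sample links list and the wanted_links helper are hypothetical, since the real data isn't shown above. The strip() call rests on an assumption too: the startswith condition looks logically correct as written, so the unwanted matches may come from stray whitespace around the URLs, but that is only a guess.

```python
import re

# Hypothetical sample data; the real `links` list is not shown in the post.
links = [
    "http://www2.abcd.com/xyz/page1",
    "http://www2.abcd.com/other",
    " http://www2.abcd.com/xyz/page2",  # leading whitespace (assumed cause)
    "http://www2.example.org/index",
    "http://www.abcd.com/xyz",
]

# Match 'http://www2' at the start of the string, unless it is
# immediately followed by '.abcd.com/xyz' (negative lookahead).
WANTED = re.compile(r"http://www2(?!\.abcd\.com/xyz)")


def wanted_links(links):
    """Return the links to scrape (hypothetical helper name)."""
    result = []
    for li in links:
        li = li.strip()  # assumption: URLs may carry stray whitespace
        # Pattern.match() anchors at the start, like startswith does.
        if WANTED.match(li):
            result.append(li)
    return result


print(wanted_links(links))
```

With the sample data, only 'http://www2.abcd.com/other' and 'http://www2.example.org/index' are kept.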