Monday, April 16, 2018

How do I prioritize one condition over another?

I've written a script that grabs, from each webpage, the link whose visible text contains contact or about. However, when I run it, the scraper always picks the about link; it only parses the contact link when about is not available. How can I make the script do the opposite, i.e. look for the link connected to contact first, and parse about only if contact is not available? I tried the approach below, but it still behaves the way I described.

Here is my attempt:

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

links = (
    "http://www.mount-zion.biz/",
    "http://www.latamcham.org/",
    "http://www.innovaprint.com.sg/",
    "http://www.cityscape.com.sg/"
    )

def Get_Link(site):
    res = requests.get(site)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("a[href]"):
        if "contact" in item.text.lower():
            abslink = urljoin(site,item['href']) ## I expected the first condition to take priority, but it doesn't
            print(abslink)
            break
        else:
            if "about" in item.text.lower():
                abslink = urljoin(site,item['href'])
                print(abslink)
                break

if __name__ == '__main__':
    for link in links:
        Get_Link(link)
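The behavior follows from the loop structure: both conditions are tested against each link before the loop moves on to the next one, so whichever keyword appears first in document order wins, no matter which branch comes first in the if/else. The sketch below reproduces this offline. It assumes a hypothetical navigation bar (`nav`) in which "About Us" precedes "Contact Us", which is presumably the case on the sites in question; `first_match_single_pass` is a hypothetical stand-in for `Get_Link` that returns the URL instead of printing it.

```python
from urllib.parse import urljoin


def first_match_single_pass(site, anchors):
    """Mimic the loop in the question: each link is tested for both
    keywords before moving on, so document order decides the winner."""
    for text, href in anchors:
        if "contact" in text.lower():
            return urljoin(site, href)
        elif "about" in text.lower():
            return urljoin(site, href)
    return None  # neither keyword found


# Hypothetical (text, href) pairs where "About Us" comes before "Contact Us"
nav = [("Home", "/"), ("About Us", "/about.html"), ("Contact Us", "/contact.html")]

print(first_match_single_pass("http://example.com/", nav))
# -> http://example.com/about.html
```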

Is there any way to prioritize a condition based on its availability? The bottom line is that I want the link connected to contact. If it is not available, the script should then look for the link connected to about.
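One way to get this priority, sketched under the assumption that the link texts and hrefs have already been extracted into `(text, href)` pairs: make the keyword loop the outer loop and the link loop the inner one, so every link is checked for contact before any link is checked for about. `find_link_by_priority`, `get_link`, and the `nav` sample are hypothetical names for illustration; `get_link` shows how the same logic could be wired to `requests` and `BeautifulSoup` as in the question.

```python
from urllib.parse import urljoin


def find_link_by_priority(site, anchors, keywords=("contact", "about")):
    """Return the absolute URL of the first link whose text contains the
    highest-priority keyword; lower-priority keywords are tried only if
    no link matches a higher one. Returns None if nothing matches."""
    for keyword in keywords:            # outer loop: priority order
        for text, href in anchors:      # inner loop: document order
            if keyword in text.lower():
                return urljoin(site, href)
    return None


def get_link(site):
    """Hypothetical wrapper that fetches a page and applies the same logic.
    The imports are local so the selection logic above can be tried
    without network access or the scraping libraries installed."""
    import requests
    from bs4 import BeautifulSoup

    res = requests.get(site, timeout=10)
    soup = BeautifulSoup(res.text, "lxml")
    anchors = [(item.text, item["href"]) for item in soup.select("a[href]")]
    return find_link_by_priority(site, anchors)


# Hypothetical navigation bar where "About Us" precedes "Contact Us"
nav = [("Home", "/"), ("About Us", "/about.html"), ("Contact Us", "/contact.html")]

print(find_link_by_priority("http://example.com/", nav))      # -> http://example.com/contact.html
print(find_link_by_priority("http://example.com/", nav[:2]))  # -> http://example.com/about.html
```

With this structure, adding another fallback only means extending `keywords`, e.g. `("contact", "about", "support")`. The cost is at most one pass over the links per keyword, which is negligible for a typical page.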
