Thursday, December 17, 2020

How to scan a paragraph on a page for keywords? (and also build a list/directory of keywords that maps each match to the correct result)

I am trying to do something like the pseudocode below. I want to determine the language of a job description. My assumption is that if the page does not already have a div class or h2 for the language, I have to scan the job description for keywords that indicate each language (such as 'and' for English), preferably using some kind of directory that I build myself. How could I do something like this?

P.S. I am a complete beginner in Python and in coding in general, so please use simple terms if possible. The code below is not real code: it is how I imagine the structure would look if I made up the framework myself. I hope it is easy to understand. The purpose is scraping job offers from company pages.

    ## The last field is the language. First I would need to work out the language
    ## of the text (if there is a tool that does this automatically), or I could
    ## review the job description against a keyword list for each language.
    ## Of course it is easier if the job offer page already has this part.

    language = joblink.get('h2', class_ = language)(language.redirect)
    # keep reading for this directory
        if True:   # the page has this
            return language
        if False:  # the page doesn't have this
            language = jobdescription.search(languagetxt.review & language.review)
                if duplicates: remove
                if multiple: (language.redirect)

            # 'languagetxt.review' directory
            ' und ' => 'German'
            ' and ' => 'English'
            ' et '  => 'French'

            # 'language.review' directory
            'German'           => 'German'
            'Native German'    => 'German'
            'Fluent in German' => 'German'

            # 'language.redirect' directory
            'English and German' => 'German and English'
            'German, English'    => 'German and English'

            # expect (German, English, English) to become 'German and English'
            # because of the "multiple" rule

    # result would look something like
    # German and English
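
For illustration, the pseudocode above could be sketched in plain Python roughly as follows. This is only a minimal sketch under several assumptions, not a definitive implementation: the class name `language` on the `h2`, the hint words beyond 'und', 'and' and 'et', the fixed output order (German, English, French), and all function names (`mentioned_languages`, `guess_text_language`, `job_languages`) are made up for this example. It uses only the standard library (`re` and `html.parser`).

```python
import re
from html.parser import HTMLParser

# 'languagetxt.review': common words hinting at the language the text is written in.
# Only 'und', 'and' and 'et' come from the question; the rest are assumed examples.
HINT_WORDS = {
    "German": {"und", "oder", "mit", "für"},
    "English": {"and", "the", "with", "for"},
    "French": {"et", "les", "avec", "pour"},
}

# 'language.redirect': a fixed order, so 'English and German' and
# 'German, English' both come out as 'German and English'.
LANGUAGE_ORDER = ["German", "English", "French"]


class LanguageHeading(HTMLParser):
    """Collects the text of <h2 class="language"> (a hypothetical class name)."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "language" in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def mentioned_languages(text):
    """'language.review': language names found in the text, without duplicates."""
    lowered = text.lower()
    return [lang for lang in LANGUAGE_ORDER
            if re.search(rf"\b{lang.lower()}\b", lowered)]


def guess_text_language(text):
    """Guess the language the text is written in by counting hint words."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(word in hints for word in words)
              for lang, hints in HINT_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def job_languages(page_html, description):
    """Use the language heading if the page has one, else the description."""
    parser = LanguageHeading()
    parser.feed(page_html)
    found = mentioned_languages(parser.text.strip() or description)
    if not found:  # no language named anywhere: fall back to the hint words
        guess = guess_text_language(description)
        found = [guess] if guess else []
    return " and ".join(found)


description = "Fluent in German and English required. English is our team language."
print(job_languages("", description))  # German and English
```

Because `mentioned_languages` walks through `LANGUAGE_ORDER` and checks each name once, the duplicate removal and the reordering happen in the same step, so a separate 'redirect' table is not needed in this version. Counting hint words is a rough heuristic; a dedicated language-detection library such as `langdetect` could replace `guess_text_language` if installing a package is an option.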
