I am trying to determine the language of a job description. My assumption is that if the page does not already have a div class or h2 for the language, I would have to search the job description for keywords that indicate each language (such as 'and'), preferably using some kind of directory (lookup list) that I build myself. How could I do something like the pseudocode below?
P.S. I am a complete beginner in Python and in coding in general, so please use simple terms if possible. The structure below is something I made up myself to show what I mean; I hope it is easy to understand. The purpose is to scrape job offers from company pages.
## The last field is the language. I would first need to detect the language of the text,
## if there is a tool to do that automatically,
## or I could review job descriptions and create a keyword list for each language.
## Of course, it is easier if the job offer page already has this part.
language = joblink.get('h2', class_ = language)(language.redirect)
#keep reading for the 'language.redirect' directory
return if True  #if the page has this
if False:  #if the page doesn't have this
    language = jobdescription.search(languagetxt.review & language.review)
    if duplicates.remove
    if multiple(language.redirect)
#'languagetxt.review' Directory
' und ' => 'German'
' and ' => 'English'
' et ' => 'French'
#'language.review' Directory
'German' => 'German'
'Native German' => 'German'
'Fluent in German' => 'German'
#'language.redirect' Directory
'English and German' => 'German and English'
'German, English' => 'German and English'
# I expect it to read something like (German, English, English) => 'German and English',
# because duplicates are removed and multiple languages go through 'language.redirect'
# result would look something like
# German and English
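
For illustration, here is a minimal sketch of how the idea above might look in plain Python, using only the built-in `re` module. Everything in it is an assumption made for the example: the names `STOPWORDS`, `LANGUAGE_PHRASES`, `PREFERRED_ORDER` and `job_language` are hypothetical, the extra keywords beyond ' und ', ' and ' and ' et ' are made up, and fetching the h2 from the page is left out (its text is simply passed in as `page_language`). Instead of a 'language.redirect' directory listing every combination, the sketch sorts the languages into one fixed order and joins them.

```python
import re

# Hypothetical 'languagetxt.review' directory: small common words per language.
STOPWORDS = {
    "German": {"und", "mit", "für"},
    "English": {"and", "the", "with"},
    "French": {"et", "les", "avec"},
}

# Hypothetical 'language.review' directory: phrases that name a language.
LANGUAGE_PHRASES = {
    "native german": "German",
    "fluent in german": "German",
    "german": "German",
    "english": "English",
    "french": "French",
}

# Replaces the 'language.redirect' directory: one fixed output order.
PREFERRED_ORDER = ["German", "English", "French"]


def guess_text_language(text):
    """Guess which language the text is written in by counting small words."""
    words = re.findall(r"\w+", text.lower())
    counts = {
        language: sum(word in vocabulary for word in words)
        for language, vocabulary in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def mentioned_languages(text):
    """Return the languages named in the text, each one only once."""
    lowered = text.lower()
    found = []
    for phrase, language in LANGUAGE_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, lowered) and language not in found:
            found.append(language)
    return found


def job_language(description, page_language=None):
    """Use the page's own language field if present, else search the text."""
    if page_language:
        languages = mentioned_languages(page_language)
    else:
        languages = mentioned_languages(description)
        if not languages:
            # No language is named, so fall back to guessing from small words.
            guess = guess_text_language(description)
            languages = [guess] if guess else []
    ordered = [name for name in PREFERRED_ORDER if name in languages]
    return " and ".join(ordered)


print(job_language("Fluent in German, English is a plus. Native German preferred."))
# prints: German and English
```

Counting a handful of small words is a rough guess and can be wrong on short texts. Third-party packages such as `langdetect` exist for guessing the language of a text automatically; none is used here so that the sketch runs without installing anything.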