I want to extract the profile information of a care home from this URL: https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA
The information is given in the following format on the website:
Group: Excelcare Holdings
Person in charge: Denise Marks (Registered Manager)
Local Authority / Social Services: London Borough of Tower Hamlets Council (click for contact details)
etc
My get_deets function only outputs the first element of each of its lists, "tag" and "sibling". I want the entire list of tag texts and the corresponding information as well.
SCRIPT
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as soup
from selenium import webdriver

driver = webdriver.Chrome(executable_path=r'C:\Users\Main\Documents\Work\Projects\chromedriver')
my_url = "https://www.carehome.co.uk/carehome.cfm/searchazref/10001005FITA"

def make_soup(url):
    driver.get(url)
    m_soup = soup(driver.page_source, features='html.parser')
    return m_soup

main_page = make_soup(my_url)
strongs = main_page.select(".blue")

def get_deets(strongs):
    tag = []
    sibling = []
    for strong_tag in strongs:
        if strong_tag.next_sibling == '\n':
            tag.append(strong_tag.text)
            sibling.append(strong_tag.next_sibling.next_sibling.text)
        else:
            tag.append(strong_tag.text)
            sibling.append(strong_tag.next_sibling.strip())
        return tag, sibling
My current output:
get_deets(strongs)
(['Group:'], ['Excelcare Holdings'])
Desired Output
tag
['Group:', 'Person in charge:', 'Local Authority / Social Services:']
sibling
['Excelcare Holdings', 'Denise Marks (Registered Manager)', 'London Borough of Tower Hamlets Council (click for contact details)']
Using this HTML:
<div class="profile-group-description col-xs-12 col-sm-8">
<p><strong class="blue">Group:</strong>
<a href="https://www.carehome.co.uk/care_search_results.cfm/searchgroup/36151505EXCA">Excelcare Holdings</a>
</p>
<p><strong class="blue">Person in charge:</strong>
Denise Marks (Registered Manager)</p>
<p><strong class="blue">Local Authority / Social Services:</strong>
London Borough of Tower Hamlets Council (<a href="https://www.carehome.co.uk/local-authorities/profile.cfm/id/Tower-Hamlets">click for contact details</a>)</p>
<p>
<strong class="blue">Type of Service:</strong>
Care Home only (Residential Care) – Privately Owned , Registered for a maximum of 44 Service Users
</p>
<p>
<strong class="blue">Registered Care Categories*:</strong>
Dementia • Learning Disability • Mental Health Condition • Old Age
</p>
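One likely cause, judging from the output shown, is that the `return` statement sits inside the `for` loop, so the function exits after the first `<strong>` tag. The sketch below is one possible fix, not a definitive implementation: it moves the `return` out of the loop and joins all of the nodes that follow each `<strong>` tag, so that values split across text and `<a>` elements (as in the "Local Authority" row) are kept whole. It parses a trimmed copy of the HTML above rather than the live page; `SAMPLE_HTML` and `text_after` are names introduced here for illustration only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed copy of the HTML from the question (assumption: the live page
# has the same structure). A closing </div> is added so the sample is complete.
SAMPLE_HTML = """
<div class="profile-group-description col-xs-12 col-sm-8">
<p><strong class="blue">Group:</strong>
<a href="https://www.carehome.co.uk/care_search_results.cfm/searchgroup/36151505EXCA">Excelcare Holdings</a>
</p>
<p><strong class="blue">Person in charge:</strong>
Denise Marks (Registered Manager)</p>
<p><strong class="blue">Local Authority / Social Services:</strong>
London Borough of Tower Hamlets Council (<a href="https://www.carehome.co.uk/local-authorities/profile.cfm/id/Tower-Hamlets">click for contact details</a>)</p>
</div>
"""


def text_after(strong_tag):
    """Hypothetical helper: text of every node following strong_tag in its parent."""
    parts = []
    for node in strong_tag.next_siblings:
        # Tags (e.g. <a>) contribute their inner text; plain strings are used as-is.
        parts.append(node.get_text() if isinstance(node, Tag) else str(node))
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join("".join(parts).split())


def get_deets(strongs):
    tag = []
    sibling = []
    for strong_tag in strongs:
        tag.append(strong_tag.get_text(strip=True))
        sibling.append(text_after(strong_tag))
    # The return is outside the loop, so every tag is processed first.
    return tag, sibling


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tag, sibling = get_deets(sample_soup.select(".blue"))
print(tag)
print(sibling)
```

With the Selenium setup from the script, the same function could be called as `get_deets(main_page.select(".blue"))`. Joining all following siblings also avoids the original `else` branch cutting the "Local Authority" value off at the opening parenthesis.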