I do web scraping, and it works fine for almost all of my data. I build DataFrames from the results for later analysis. However, on some web pages I run into a problem with None values in the scraping step.
In the last step of my work, I scrape the <span> elements that hold the price values in a loop, as shown below.
<span class="abababa">100,00$</span>
Here is my for loop. I use .text to get the text out of the <span> element:
price = []
for i in range(4):
    for j in range(100):
        pr = data[i][j].find('span', class_='abababa')
        price.append(pr.text)
dPP = pd.DataFrame(price, columns=['Price'])
Note: There are 4 main web pages, and each page has at least 100 price values to scrape in the code above. That is why I used two nested for loops.
The code above works as long as none of the 100 values is None. In practice, however, some items have no price <span> at all, so find returns None for them.
For example:
<span class="abababa">100,00$</span>
<span class="abababa">48,00$</span>
None
<span class="abababa">100,00$</span>
I tried an if statement to skip the None values, but the code below then drops those entries entirely, and my DataFrame no longer lines up. For example, with 4 main web pages and 100 values each, I expect a single-column DataFrame with 4 x 100 = 400 rows. If there are 10 None values in total, my DataFrame ends up with only 390 rows.
price = []
for i in range(4):
    for j in range(100):
        pr = data[i][j].find('span', class_='abababa')
        if pr is None:
            continue
        price.append(pr.text)
dPP = pd.DataFrame(price, columns=['Price'])
To summarize, I could not make it work without dropping the None entries from the index. .text also does not work on None; I get the error: 'NoneType' object has no attribute 'text'.
Could you help me?
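One possible approach, sketched here under assumptions rather than as a definitive fix, is to append a placeholder instead of skipping the item, so every item still produces exactly one entry. The function name extract_prices and the helper fake_item are hypothetical; fake_item only mimics the .find() / .text behaviour of the scraped items so the sketch runs without a real page.

```python
from types import SimpleNamespace


def extract_prices(data, css_class="abababa"):
    """Return one entry per item; None where the price <span> is missing."""
    prices = []
    for page in data:
        for item in page:
            span = item.find("span", class_=css_class)
            # Append a placeholder instead of skipping, so the row count is preserved.
            prices.append(span.text if span is not None else None)
    return prices


# Hypothetical stand-in for a scraped item: find() returns an object with
# a .text attribute, or None when the item has no price <span>.
def fake_item(price):
    span = SimpleNamespace(text=price) if price is not None else None
    return SimpleNamespace(find=lambda name, class_=None: span)


data = [[fake_item("100,00$"), fake_item("48,00$"), fake_item(None), fake_item("100,00$")]]
prices = extract_prices(data)
print(prices)  # ['100,00$', '48,00$', None, '100,00$']
```

The resulting list could then be passed to pd.DataFrame(prices, columns=['Price']) as in the code above, which should give one row per item with a missing value where the price was absent. Iterating over the pages and items directly, instead of range(4) and range(100), is a design choice that also avoids an IndexError if a page holds fewer items than expected.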