I am currently trying to create a connection from my Sublime web scraper to my Notepad++ database (a plain text file). I want to grab the output of my web scraper using subprocess and store that output in a string. I then want to use the string in an insert-ignore function or if/else statement that reads the file and checks whether any dates in the string already exist in the file. The dates that don't already exist in the file should be written to it, and the dates that do exist should be ignored. To prevent duplicates, I would like the date to act as the primary key.
Example:
Output String:
03012019, 03022019, 03032019, 03042019, 03052019, 03062019
Text File:
03042019, 03052019, 03062019, 03072019, 03082019, 03092019
If the example above happens, I want the first 3 dates of the output string to be written to the file
(03012019, 03022019, 03032019)
and the last 3 dates to be ignored because they already exist.
(03042019, 03052019, 03062019)
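A minimal sketch of the comparison described above, assuming both the output string and the file contents are comma-separated MMDDYYYY dates. The helper names split_dates and dates_to_add are made up for illustration and are not part of my existing code.

```python
def split_dates(text):
    """Split a comma-separated string of MMDDYYYY dates into a list."""
    return [part.strip() for part in text.split(",") if part.strip()]


def dates_to_add(output_string, file_text):
    """Return the dates in output_string that are not already in file_text."""
    existing = set(split_dates(file_text))
    fresh = []
    for date in split_dates(output_string):
        if date not in existing:
            fresh.append(date)
            existing.add(date)  # also skip repeats inside the output itself
    return fresh


output_string = "03012019, 03022019, 03032019, 03042019, 03052019, 03062019"
file_text = "03042019, 03052019, 03062019, 03072019, 03082019, 03092019"
print(dates_to_add(output_string, file_text))
# ['03012019', '03022019', '03032019']
```

Using a set for the dates already in the file is what makes each date behave like a primary key: a date is only kept if it has not been seen before.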
Webscraper:
import requests
from bs4 import BeautifulSoup
from datetime import datetime

response = requests.get('https://www.lotteryusa.com/michigan/powerball/')
soup = BeautifulSoup(response.text, 'html.parser')
title = soup.find(class_='game-title').get_text()
dates = soup.find_all("td", {"class": "date"})
results = soup.find_all("ul", {"class": "draw-result list-unstyled list-inline"})
print(title)
# Print one line per draw: the date as MMDDYYYY followed by the drawn numbers.
for date, result in zip(dates, results):
    d = datetime.strptime(date.time['datetime'], '%Y-%m-%d')
    print(d.strftime("%m%d%Y") + ',' + result.get_text()[:-20].strip().replace('\n', ','))
Subprocessor:
from subprocess import check_output

# "Webscraper" is a placeholder for the command that runs the scraper above.
p = check_output("Webscraper", shell=True)
data = p.decode("utf-8")
print(data)

with open('Notepad++ Text File', 'r+') as file3:
    lines = file3.readlines()
for line in lines:
    if '23' in line:
        # Can only put integers in ().
        print(line)
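A rough sketch of how the two pieces might be joined, under several assumptions: the scraper prints one draw per line with the MMDDYYYY date as the first comma-separated field, the text file stores one draw per line in the same layout and ends with a newline, and the scraper is started with the same Python interpreter. The function names run_scraper and append_new_rows are hypothetical.

```python
import subprocess
import sys


def run_scraper(script_path):
    """Run the scraper with the current interpreter and return its stdout as text."""
    return subprocess.check_output([sys.executable, script_path], text=True)


def append_new_rows(scraper_output, db_path):
    """Append rows whose leading date is not yet in the file; return the rows added."""
    try:
        with open(db_path, "r", encoding="utf-8") as f:
            # The first field of every stored row is treated as the key.
            seen = {line.split(",")[0].strip() for line in f if line.strip()}
    except FileNotFoundError:
        seen = set()

    added = []
    with open(db_path, "a", encoding="utf-8") as f:
        for row in scraper_output.splitlines():
            row = row.strip()
            key = row.split(",")[0].strip()
            # Only rows starting with an 8-digit date count as data, which
            # skips the title line the scraper prints first.
            if len(key) == 8 and key.isdigit() and key not in seen:
                f.write(row + "\n")
                seen.add(key)
                added.append(row)
    return added
```

With these helpers, a call such as append_new_rows(run_scraper("webscraper.py"), "powerball.txt") would write only the draws whose dates are missing from the file; both file names here are placeholders.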