Monday, March 4, 2019

Sublime Webscraper to Notepad++

I want to redirect the output (data) of my Sublime web scraper into a string in a different program using the subprocess module. I then want to use that output string in an if/else statement to prevent duplicates. The if/else statement should read the file I'm importing the data into and check whether any of the dates in the string already exist in the file. I also want to set the date as the primary key.

Example:
Output String: 03012019, 03022019, 03032019, 03042019, 03052019, 03062019

Text File: 03042019, 03052019, 03062019, 03072019, 03082019, 03092019

If the example above happens, I want the first 3 dates of the output string to be written to the file, and the last 3 to be ignored because they already exist.
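The rule in the example can be sketched as a set lookup. This is only a minimal illustration, assuming both the output string and the file contents are comma-separated dates in MMDDYYYY form; `split_dates` and `filter_new_dates` are hypothetical helper names, not part of the original code.

```python
def split_dates(text):
    """Split a comma-separated string into a list of stripped, non-empty dates."""
    return [part.strip() for part in text.split(",") if part.strip()]


def filter_new_dates(output_string, file_contents):
    """Return the dates from output_string that do not already appear in file_contents."""
    existing = set(split_dates(file_contents))  # set gives fast membership checks
    return [d for d in split_dates(output_string) if d not in existing]


# The two strings from the example above.
output_string = "03012019, 03022019, 03032019, 03042019, 03052019, 03062019"
text_file = "03042019, 03052019, 03062019, 03072019, 03082019, 03092019"

print(filter_new_dates(output_string, text_file))
# prints ['03012019', '03022019', '03032019']
```

Because the existing dates are held in a set, each date behaves like a key: a value is written only if it is not already present.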

Webscraper:

import requests

from bs4 import BeautifulSoup

from datetime import datetime

response = requests.get('https://www.lotteryusa.com/michigan/powerball/')

soup = BeautifulSoup(response.text, 'html.parser')

title = soup.find(class_='game-title').get_text()

date = soup.find_all("td", {"class":"date"})

results = soup.find_all("ul",{"class":"draw-result list-unstyled list-inline"})

print(title)

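for date_cell, result_list in zip(date, results):

    # Parse the ISO date from the <time> tag, then print it as MMDDYYYY followed by the drawn numbers
    d = datetime.strptime(date_cell.time['datetime'], '%Y-%m-%d')

    print(d.strftime("%m%d%Y") + ',' + result_list.get_text()[:-20].strip().replace('\n', ','))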

Subprocessor:

from subprocess import check_output

p2 = check_output("Webscraper", shell=True)

data = p2.decode("utf-8")

print(data)

with open('Notepad++ Text File','r+') as file3:

    lines = file3.readlines()

for line in lines:

    if '23' in line:
        # Can only put integers in ().
        print(line)
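One possible way to combine the two scripts is sketched below. It is an illustration under several assumptions rather than a definitive answer: the scraper is assumed to be saved as a Python script, each data row is assumed to start with an 8-digit MMDDYYYY date followed by a comma, and the target file is assumed to end with a newline. The names `run_scraper` and `append_new_rows` are hypothetical.

```python
import subprocess
import sys


def run_scraper(script_path):
    """Run a Python script with the current interpreter and return its stdout as text."""
    return subprocess.check_output([sys.executable, script_path]).decode("utf-8")


def append_new_rows(scraper_output, path):
    """Append rows whose leading date is not yet in the file; return the rows added."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            # The first comma-separated field of each line acts as the key.
            seen = {line.split(",", 1)[0].strip() for line in f if line.strip()}
    except FileNotFoundError:
        seen = set()  # no file yet, so nothing can be a duplicate

    added = []
    with open(path, "a", encoding="utf-8") as f:
        for row in scraper_output.splitlines():
            key = row.split(",", 1)[0].strip()
            # Skip blank lines, non-date lines such as the title, and known dates.
            if len(key) != 8 or not key.isdigit() or key in seen:
                continue
            f.write(row.strip() + "\n")
            seen.add(key)
            added.append(row.strip())
    return added
```

With these helpers, something like `append_new_rows(run_scraper("webscraper.py"), "draws.txt")` would write only the draws whose dates are missing from the file; both file names here are placeholders.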
