Tuesday, March 5, 2019

Sublime to Notepad++

I am trying to connect my Sublime web scraper to my Notepad++ database. I want to grab the output of the web scraper using subprocess and store it in a string. I then want to use that string in an insert-ignore function or if/else statement that reads the file and checks whether any dates in the string already exist in the file. Dates that are not yet in the file should be written to it, and dates that already exist should be ignored. To prevent duplicates, I would like the date to act as the primary key.

Example:
Output String:

03012019, 03022019, 03032019, 03042019, 03052019, 03062019

Text File:

03042019, 03052019, 03062019, 03072019, 03082019, 03092019

In the example above, I want the first 3 dates of the output string to be written to the file

(03012019, 03022019, 03032019)

and the last 3 dates to be ignored because they already exist.

(03042019, 03052019, 03062019)
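
The behavior described in the example could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name `append_new_dates` is hypothetical, and it assumes the text file holds dates separated by commas and/or newlines, as in the example.

```python
from pathlib import Path


def append_new_dates(path, scraped):
    """Append the dates in `scraped` that are missing from the file; return them."""
    file_path = Path(path)
    text = file_path.read_text() if file_path.exists() else ""

    # Assumption: dates in the file are separated by commas and/or newlines.
    existing = {d.strip() for d in text.replace("\n", ",").split(",") if d.strip()}

    new_dates = []
    for date in scraped:
        date = date.strip()
        if date and date not in existing:
            existing.add(date)  # also skips duplicates inside `scraped` itself
            new_dates.append(date)

    if new_dates:
        # Continue the comma-separated list unless the file is empty
        # or already ends with a line break.
        prefix = ", " if text.strip() and not text.endswith("\n") else ""
        with file_path.open("a") as f:
            f.write(prefix + ", ".join(new_dates))
    return new_dates
```

Called with the output string from the example (split on commas), this would return the three missing dates and append only those to the file. The set of existing dates plays the role of a primary key here; a real database such as SQLite could enforce the same rule with a PRIMARY KEY column and `INSERT OR IGNORE`.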

Webscraper:

import requests
from bs4 import BeautifulSoup
from datetime import datetime

response = requests.get('https://www.lotteryusa.com/michigan/powerball/')
soup = BeautifulSoup(response.text, 'html.parser')

title = soup.find(class_='game-title').get_text()
dates = soup.find_all("td", {"class": "date"})
results = soup.find_all("ul", {"class": "draw-result list-unstyled list-inline"})

print(title)

# Print one line per draw: the date as MMDDYYYY, then the drawn numbers.
# The loop variables get their own names so they no longer shadow the lists.
for date_cell, result in zip(dates, results):
    d = datetime.strptime(date_cell.time['datetime'], '%Y-%m-%d')
    # Drop the last 20 characters of the result text and join the numbers with commas.
    print(d.strftime("%m%d%Y") + ',' + result.get_text()[:-20].strip().replace('\n', ','))

Subprocessor:

from subprocess import check_output

# Run the scraper and capture everything it prints.
p = check_output("Webscraper", shell=True)
data = p.decode("utf-8")
print(data)

with open('Notepad++ Text File', 'r+') as file3:
    lines = file3.readlines()
    for line in lines:
        # Problem: so far I can only put integers in the parentheses.
        if '23' in line:
            print(line)
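
One possible way to turn the captured output into a list of dates is sketched below. This is a rough sketch under stated assumptions: it assumes each draw is printed as `MMDDYYYY,number,number,...` as in the scraper above, `run_and_collect_dates` is a hypothetical helper name, and the `fake_scraper` command is only a stand-in so the example runs without network access. It needs Python 3.7 or later for `capture_output`.

```python
import subprocess
import sys


def run_and_collect_dates(command):
    """Run `command` and return the 8-digit first field of each output line."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    dates = []
    for line in completed.stdout.splitlines():
        first_field = line.split(",", 1)[0].strip()
        # Keep only fields that look like MMDDYYYY; this skips the title line.
        if len(first_field) == 8 and first_field.isdigit():
            dates.append(first_field)
    return dates


# Stand-in for the real scraper (made-up sample output). In practice the command
# might look like [sys.executable, "webscraper.py"], where the file name is hypothetical.
fake_scraper = [
    sys.executable,
    "-c",
    "print('Powerball'); print('03022019,1,19,25,27,68'); print('02272019,21,31,42,49,59')",
]
print(run_and_collect_dates(fake_scraper))
```

Passing the command as a list avoids `shell=True`, and the resulting list of dates can then be compared against the contents of the text file instead of a hard-coded value such as '23'.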
