Tuesday, 29 January 2019

Python re.search inside for-loop is giving false positives. How do I fix this?

I am writing a script that automatically updates my website. While working on the part that identifies tags and labels the pages in my database, I ran into a bug that I have no idea how to fix.
I wrote a for loop to iterate over the lines of each .php file, then used an if statement to find the tags. Judging from the output, though, the if statement fires twice.

First I checked whether my regex was giving false positives. I ran the same regex manually in a text editor, and it found only one line.
Then I checked how re.compile and re.search work, but I could not find anything I was doing wrong there.
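
For reference, here is a minimal sketch of the behaviour I was checking. `search` scans the whole line rather than only its start, and the trailing `.` in the pattern matches any single character, not just a literal dot. The sample lines are made up for illustration and are not taken from my actual files.

```python
import re

flag_d = re.compile(r'<!--desc.')

# Hypothetical sample lines, not taken from the real .php files.
samples = [
    "<!--desc.My page description-->",     # tag at the start of the line
    "    echo '<!--desc:other text-->';",  # tag in the middle of a line
    "<!-- desc is documented here -->",    # space after <!--, so no match
]

# search() returns a match object or None for each line.
matches = [flag_d.search(s) is not None for s in samples]
print(matches)
```

So any line that contains the tag text anywhere would count as a hit, which is worth keeping in mind when comparing against an editor's search.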

Here is the portion of the code.

        # Imports this excerpt relies on (they belong at the top of the module).
        import os
        import re
        from os.path import splitext

        import mysql.connector

        mydb = mysql.connector.connect(
        [Personal information redacted]
        )
        mycursor = mydb.cursor()
        local = input('Select directory.')
        # Compile the tag patterns once, outside the loops.
        flagD = re.compile(r'<!--desc.')
        flagS = re.compile(r'<!--subject.')
        flagE = re.compile(r'-->')
        for paths, dirs, files in os.walk(local):
            for f in files:
                print(f)
                if splitext(f)[1] == ".php":
                    print("found .php")
                    # Join with the directory os.walk is currently visiting, so
                    # files in subdirectories are opened from their real location.
                    with open(os.path.join(paths, f), 'r') as opened:
                        lines = opened.readlines()
                    date = splitext(f)[0]
                    desc = None
                    subject = None
                    for l in lines:
                        if flagD.search(l) is not None:
                            print("found desc")
                            # Strip the opening tag, then the closing "-->".
                            desc = flagD.sub("", l)
                            descF = flagE.sub("", desc)
                        if flagS.search(l) is not None:
                            print("found subj")
                            subject = flagS.sub("", l)
                            subjectF = flagE.sub("", subject)
                    # Skip files that lack either tag.
                    if desc is None or subject is None:
                        continue
                    sql = "INSERT INTO arquivos (quando, descricao, assunto, file) VALUES (%s, %s, %s, %s)"
                    val = (date, descF, subjectF, f)
                    mycursor.execute(sql, val)
                    mydb.commit()

And this is the output:

2018-11-15.php
found .php
2018-11-16.php
found .php
2018-11-26.php
found .php
2019-01-13.php
found .php
2019-01-15.php
found .php
2019-01-16.php
found .php
2019-01-17.php
found .php
2019-01-22.php
found .php
found desc
found subj
2019-01-24.php
found .php
found desc
found desc
found subj
found subj
BdUpdate.php
found .php
BdUpdate1.php
found .php
Comentarios.php
found .php
FINAL.php
found .php
Foot.inc
Formulario.php
found .php
FormularioCompleto.php
found .php
Head.inc
index.php
found .php
index1.php
found .php
Java.php
found .php
Layout Base - Copy.php
found .php
Layout Base.php
found .php
Php_Test.ste
Phyton.php
found .php
SalvandoDB.php
found .php
sidenav.inc
Side_Menu.php
found .php
Thema.php
found .php
Translations.php
found .php
Web.php
found .php
2019-01-13.php
found .php

As you can see, print("found desc") and print("found subj") are each called twice within a single print("found .php"). That means something in my code is giving a false positive, which should be impossible, since I tested this regex in other software. This is completely unintended, and it causes the rest of the code to end up as an entry in my database.
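
One way to narrow this down might be to report exactly which file and which line triggers each match, instead of only printing "found desc". The sketch below is a diagnostic, not a fix. The function name `find_tag_lines` and the demo file are hypothetical, and it assumes the same two tag patterns as my code above.

```python
import os
import re
import tempfile

FLAG_D = re.compile(r'<!--desc.')
FLAG_S = re.compile(r'<!--subject.')


def find_tag_lines(root):
    """Return (path, line_number, tag, line) for every line matching a tag pattern."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if os.path.splitext(name)[1] != ".php":
                continue
            # Join with dirpath so files in subdirectories are opened correctly.
            path = os.path.join(dirpath, name)
            with open(path, "r") as handle:
                for number, line in enumerate(handle, start=1):
                    if FLAG_D.search(line):
                        hits.append((path, number, "desc", line))
                    if FLAG_S.search(line):
                        hits.append((path, number, "subject", line))
    return hits


if __name__ == "__main__":
    # Hypothetical demo file with two desc lines, written to a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "demo.php"), "w") as handle:
            handle.write("<!--desc.First-->\n<?php echo 'x'; ?>\n<!--desc.Second-->\n")
        for path, number, tag, line in find_tag_lines(root):
            # repr() makes trailing newlines and stray characters visible.
            print(f"{os.path.basename(path)}:{number}: {tag}: {line!r}")
```

Running something like this against the real directory would show whether 2019-01-24.php really contains two matching lines, and what is on each of them.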

PS: Most of my questions get closed or locked, and no one explains why. I have edited past questions to match the guidelines, but they are buried soon after. Please stop.
