I'm trying to open an XML file and parse it, looking through its tags and reading the text within each specific tag. If the text within a tag matches a string, I want to remove part of the string or substitute it with something else.
However, the code always seems to take the first branch of the inner if statement (the third if in the code), as though end_int were always None. I'm not sure why: when I print end_int right after computing it, I see every 'end_char' value from the XML file, which is what end_int should be. But inside the if statement, end_int always appears to be None.
The mfn_pn variable is a barcode inputted by the user, something similar to ATL-157-1815, DFW-184-8378., ATL-324-3243., DFW-432-2343, ATL 343 8924, DFW 342 3413, DFW-324 3423 T&R.
The XML file has the following data:
<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+\.$</regex>
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex>
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
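For reference, here is a small sketch of how ElementTree reports these optional values. An empty element such as <end_char></end_char> has its text set to None, while a filter with no start_char element at all makes find() return None. The helper name optional_int and the shortened SAMPLE document are assumptions made for illustration; they are not part of the original code.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the filter file: one filter with an empty <end_char>,
# one filter with no <start_char>/<end_char> elements at all.
SAMPLE = r"""<metadata>
  <filter>
    <regex>ATL|LAX|DFW</regex>
    <start_char>3</start_char>
    <end_char></end_char>
    <action>remove</action>
  </filter>
  <filter>
    <regex>\-</regex>
    <replacement></replacement>
    <action>substitute</action>
  </filter>
</metadata>"""


def optional_int(parent, tag):
    """Return the tag's text as an int, or None if the tag is missing or empty."""
    element = parent.find(tag)
    if element is None or element.text is None or not element.text.strip():
        return None
    return int(element.text)


root = ET.fromstring(SAMPLE)
bounds = [(optional_int(f, "start_char"), optional_int(f, "end_char"))
          for f in root.findall("filter")]
print(bounds)  # [(3, None), (None, None)]
```

Both the missing element and the empty element end up as None, so the later code cannot tell them apart, which is fine here because both mean "no bound".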
The Python code I'm using is:
import re
from xml.etree.ElementTree import ElementTree

# Example barcode (assumption for testing); in the real program
# mfn_pn is entered by the user
mfn_pn = "DFW-184-8378."

# filters.xml is the file that holds the things to be filtered
tree = ElementTree()
tree.parse("filters.xml")
# Get the data in the XML file
root = tree.getroot()

# Loop through filters
for x in root.findall('filter'):
    # Find the text inside the regex tag
    regex = x.find('regex').text

    # Find the start_char tag; an empty or missing tag leaves start_int as None
    start_prim = x.find('start_char')
    start = start_prim.text if start_prim is not None else None
    start_int = int(start) if start is not None else None
    print('start: ', start_int)

    # Find the end_char tag; an empty or missing tag leaves end_int as None
    end_prim = x.find('end_char')
    end = end_prim.text if end_prim is not None else None
    end_int = int(end) if end is not None else None
    print('end: ', end_int)

    # Find the text inside the action tag
    action = x.find('action').text

    if action == 'remove':
        # Note: re.match only matches at the beginning of mfn_pn
        if re.match(regex, mfn_pn, re.IGNORECASE):
            print('if statement start:', start_int)
            print('if statement end:', end_int)
            if end_int is None:
                print('if statement start_int:', start_int)
                print('if statement end_int:', end_int)
                mfn_pn = mfn_pn[start_int:]
            elif start_int is None:
                print('elif statement start_int:', start_int)
                print('elif statement end_int:', end_int)
                mfn_pn = mfn_pn[:end_int]
            else:
                print('else statement start_int:', start_int)
                print('else statement end_int:', end_int)
                mfn_pn = mfn_pn[start_int:end_int]
    elif action == 'substitute':
        # Remove every occurrence of the pattern
        mfn_pn = re.sub(regex, '', mfn_pn)
None of the print statements inside the elif and else branches ever produce output, so the code apparently never sees start_int as None, and the else case never runs either. It behaves as though end_int == None is always true, and I'm not sure why, because printing end_int outside the if statements shows every end_char value from the XML file.
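One possible explanation, offered as a sketch rather than a confirmed diagnosis: re.match only matches at the beginning of the string, and each filter sees the string left behind by the previous one. Once the first filter has stripped the leading "DFW", the second pattern can no longer match at the start, and a pattern like " T&R$" can never match at the start at all. That would leave the first filter, whose end_char is empty, as the only one that ever enters the inner if statement. The function name matching_filters and the REMOVE_FILTERS list are hypothetical, and the substitute filters are left out to keep the example short.

```python
import re

# (regex, start, end) for the three 'remove' filters, copied from the XML above.
REMOVE_FILTERS = [
    (r"ATL|LAX|DFW", 3, None),
    (r"DFW.+\.$", 3, -1),
    (r" T&R$", None, -4),
]


def matching_filters(barcode, matcher):
    """Apply the remove filters in order and record which ones matched."""
    matched = []
    for regex, start, end in REMOVE_FILTERS:
        if matcher(regex, barcode, re.IGNORECASE):
            matched.append(regex)
            # A slice accepts None for either bound, so one expression
            # covers all three branches of the original if/elif/else.
            barcode = barcode[start:end]
    return matched, barcode


# With re.match, only the first filter ever matches.
print(matching_filters("DFW-184-8378.", re.match))
print(matching_filters("DFW-324 3423 T&R", re.match))
# With re.search, the trailing " T&R" filter matches as well.
print(matching_filters("DFW-324 3423 T&R", re.search))
```

If this is indeed the cause, switching to re.search and reordering the filters (for example, trimming the trailing "." or " T&R" before removing the prefix and the whitespace) may be worth trying.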