mercredi 21 octobre 2020

Python: How can I add if then logic to this excel/xml code?

This current code I am using for a project, and would like to add logic to it for certain instances. This I am the code I am using, with easy data/columns for this example.

from openpyxl import load_workbook
from yattag import Doc, indent
Path=('/Users/username/Desktop/Popular_Baby_Names.xlsx')

wb = load_workbook(Path)
ws = wb.worksheets[0]
# Create Yattag doc, tag and text objects
doc, tag, text = Doc().tagtext()

xml_header = '<?xml version="1.0" encoding="UTF-8"?>'
xml_schema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
doc.asis(xml_header)
doc.asis(xml_schema)


with tag('Rows'):

    # Use ws.max_row for all rows
    for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=6):
        row = [cell.value for cell in row]
        with tag("Row"):
            with tag("Year of Birth"):
                text(row[0])
            with tag("Gender"):
                text(row[1])
            with tag("Ethnicity"):
                text(row[2])
            with tag("First Name"):
                text(row[3])
            with tag("Count"):
                text(row[4])  
            with tag("Rank"):
                text(row[5])       
result = indent(
    doc.getvalue(),
    indentation = '    ',
    indent_text = False
)
with open("baby_names.xml", "w") as f:
    f.write(result)

Which returns

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" dev = "">
<Rows>
    <Row>
        <Year of Birth>2011</Year of Birth>
        <Gender>FEMALE</Gender>
        <Ethnicity>HISPANIC</Ethnicity>
        <First Name>GERALDINE</First Name>
        <Count>13</Count>
        <Rank>75</Rank>
    </Row> 

.
.
.


I am trying to add logic so that certain elements get picked up. So I am hoping (excuse my bad python code, mostly a r user) to do something like

# Use ws.max_row for all rows
    for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=6):
        row = [cell.value for cell in row]

   if row[3] == "Geraldine":   (hope that makes sense in psuedocode)
   Print(
        with tag("Row"):
            with tag("Gender"):
                text(row[1])
            with tag("Ethnicity"):
                text(row[2])
            with tag("First Name"):
                text(row[3]))

 if row[3] == "Alex":
  Print(
      with tag("Row"):
            with tag("Gender"):
                text(row[1])
            with tag("Rank"):   #different from previous
                text(row[5])
            with tag("First Name"):
                text(row[3])))

Does that make sense? Kinda want an if then statement so that if something is true, kinda filters for that and prints the corresponding rows with the columns I would want.

I want to expand that onto a dataset where if there is a certain element in a column such as "referred" or "notreffered", I can choose the columns like "referred date" or "referral rejected" that corresponds to that. I believe "Row" when iterating is a list, not sure tho.

Aucun commentaire:

Enregistrer un commentaire