Monday, 30 November 2015

Find tags inside an XML with Python

I have an XML like this:

    <w:p>
        <w:r>
            <w:rPr />
            <w:t> Description 1</w:t>
        </w:r>
    </w:p>
    <w:p>
        <w:r>
            <w:rPr />
            <w:t>Checkbox 1</w:t>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="begin">
                <w:ffData>
                    <w:name w:val="" />
                    <w:enabled />
                    <w:calcOnExit w:val="0" />
                    <w:checkBox>
                        <w:sizeAuto />
                        <w:checked />
                    </w:checkBox>
                </w:ffData>
            </w:fldChar>
        </w:r>
        <w:r>
            <w:rPr />
            <w:t> Checkbox 2</w:t>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="begin">
                <w:ffData>
                    <w:name w:val="" />
                    <w:enabled />
                    <w:calcOnExit w:val="0" />
                    <w:checkBox>
                        <w:sizeAuto />
                    </w:checkBox>
                </w:ffData>
            </w:fldChar>
        </w:r>
    </w:p>
    <w:p>
        <w:r>
            <w:rPr />
            <w:t> Description 2</w:t>
        </w:r>
    </w:p>
    <w:p>
        <w:r>
            <w:rPr />
            <w:t> Description 3</w:t>
        </w:r>
    </w:p>
.....

This XML contains a series of <w:p> </w:p> paragraphs. Some description <w:p> paragraphs are followed by a paragraph that contains checkbox tags, and some are not. For each description I need to create a JSON object and store it in a list.

I need to find each <w:p> tag, take the text inside its <w:t>, and then move on to the next <w:p> tag to see whether it contains a checkbox. If it does, I also take its <w:t> value, and the JSON will look like this:

json['description'] = description
json['checkbox_text'] = checkbox

Otherwise, if the paragraph after the description contains no checkbox, the JSON will contain only one element:

json['description'] = description

My code looks like this:

import re

results = []
# Offsets of every 'w:p' occurrence in the raw XML string
default_positions = [m.start() for m in re.finditer('w:p', xml_content)]
jsonobj = {}
for position in default_positions:
    if ...:
        pass  # code
    else:
        pass  # code
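
A side note on the offset-based approach above: the bare pattern 'w:p' matches more than the opening paragraph tags. The small check below is only an illustration. The one-line `xml_content` string and the `<w:pPr />` element are assumptions made for the demo (they are not taken from the XML above), and the stricter pattern is one possible fix, not the only one.

```python
import re

# Hypothetical one-paragraph input, for illustration only.
xml_content = "<w:p><w:pPr /><w:r><w:t>Description 1</w:t></w:r></w:p>"

# The bare pattern hits the opening tag, <w:pPr /> and the closing tag.
loose = [m.start() for m in re.finditer("w:p", xml_content)]

# Requiring "<w:p" followed by whitespace, "/" or ">" keeps opening tags only.
strict = [m.start() for m in re.finditer(r"<w:p(?=[\s/>])", xml_content)]

print(len(loose), len(strict))
```

Even with the stricter pattern, the offsets only say where a paragraph starts, not what it contains, so an XML parser is probably a better fit for this task.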

Any help?
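
One possible way to approach this, as a minimal sketch rather than a definitive answer, is to parse the XML with `xml.etree.ElementTree` instead of searching for string offsets. Several things here are assumptions: the helper names (`paragraph_texts`, `has_checkbox`, `collect`) are made up for illustration, the fragment is wrapped in a dummy `<root>` element because it does not declare the `w:` prefix itself, a paragraph counts as a "checkbox paragraph" when it contains a `<w:checkBox>` element, and `checkbox_text` is stored as a list of all `<w:t>` values in that paragraph.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace bound to the w: prefix in .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_texts(p):
    """Return the stripped text of every <w:t> inside one <w:p>."""
    return [(t.text or "").strip() for t in p.iterfind(".//w:t", NS)]


def has_checkbox(p):
    """True if the paragraph contains a <w:checkBox> element."""
    return p.find(".//w:checkBox", NS) is not None


def collect(xml_fragment):
    # Assumption: the input is a bare fragment, so wrap it in a dummy root
    # that declares the w: prefix.
    root = ET.fromstring(f'<root xmlns:w="{W_NS}">{xml_fragment}</root>')
    results = []
    for p in root.iterfind("w:p", NS):
        if has_checkbox(p):
            # Attach the checkbox labels to the preceding description, if any.
            if results:
                results[-1].setdefault("checkbox_text", []).extend(paragraph_texts(p))
        else:
            results.append({"description": " ".join(paragraph_texts(p))})
    return results


# Shortened, hypothetical sample in the same shape as the XML above.
sample = """
<w:p><w:r><w:rPr /><w:t> Description 1</w:t></w:r></w:p>
<w:p>
  <w:r><w:rPr /><w:t>Checkbox 1</w:t></w:r>
  <w:r>
    <w:fldChar w:fldCharType="begin">
      <w:ffData><w:checkBox><w:sizeAuto /><w:checked /></w:checkBox></w:ffData>
    </w:fldChar>
  </w:r>
</w:p>
<w:p><w:r><w:rPr /><w:t> Description 2</w:t></w:r></w:p>
"""

print(collect(sample))
```

If only the checked boxes matter, the same loop could test for `<w:checked />` inside each `<w:checkBox>`, but the question does not say which behaviour is wanted, so the sketch keeps every label.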
