Tuesday, December 1, 2015

Find tags inside XML content with Python

I have an XML document like this:

<w:p>
    <w:r>
        <w:rPr />
        <w:t> Description 1</w:t>
    </w:r>
</w:p>
<w:p>
    <w:r>
        <w:rPr />
        <w:t>Checkbox 1</w:t>
    </w:r>
    <w:r>
        <w:fldChar w:fldCharType="begin">
            <w:ffData>
                <w:name w:val="" />
                <w:enabled />
                <w:calcOnExit w:val="0" />
                <w:checkBox>
                    <w:sizeAuto />
                    <w:checked />
                </w:checkBox>
            </w:ffData>
        </w:fldChar>
    </w:r>
    <w:r>
        <w:rPr />
        <w:t> Checkbox 2</w:t>
    </w:r>
    <w:r>
        <w:fldChar w:fldCharType="begin">
            <w:ffData>
                <w:name w:val="" />
                <w:enabled />
                <w:calcOnExit w:val="0" />
                <w:checkBox>
                    <w:sizeAuto />
                </w:checkBox>
            </w:ffData>
        </w:fldChar>
    </w:r>
</w:p>
<w:p>
    <w:r>
        <w:rPr />
        <w:t> Description 2</w:t>
    </w:r>
</w:p>
<w:p>
    <w:r>
        <w:rPr />
        <w:t> Description 3</w:t>
    </w:r>
</w:p>

.....

This XML contains a series of <w:p> </w:p> paragraphs. Some description <w:p> tags are followed by a <w:p> that contains checkbox tags, and some are not. For each description I need to create a JSON object and store it in a list.

I need to find the <w:p> tags, take the text inside <w:t>, and then move on to the next <w:p> tag to see whether it contains a checkbox. If it does, I also take its <w:t> value, and the JSON will look like this:

json['description'] = description
json['checkbox_text'] = checkbox

Otherwise, if the tag after the description tag contains no checkbox, the JSON will contain only one element:

json['description'] = description
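
One way the pairing described above could be sketched is with the standard library's xml.etree.ElementTree instead of string offsets. This is only a sketch under several assumptions: the fragment comes from a Word document, so the w: prefix is bound to the WordprocessingML namespace; the fragment has no root element, so it is wrapped in a hypothetical <root> element; and checkbox_text is taken to be the list of all <w:t> texts in the checkbox paragraph. The function name parse_paragraphs is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the w: prefix is the WordprocessingML namespace used by .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def parse_paragraphs(xml_fragment):
    """Return one dict per description paragraph (hypothetical helper)."""
    # The fragment has no root and no namespace declaration, so wrap it
    # in a made-up <root> element that declares the w: prefix.
    root = ET.fromstring('<root xmlns:w="%s">%s</root>' % (W_NS, xml_fragment))

    results = []
    for p in root.findall("w:p", NS):
        # All non-empty <w:t> texts in this paragraph, whitespace trimmed.
        texts = [t.text.strip() for t in p.iterfind(".//w:t", NS) if t.text]
        has_checkbox = p.find(".//w:checkBox", NS) is not None

        if has_checkbox:
            # Attach the checkbox labels to the description that came before.
            if results:
                results[-1]["checkbox_text"] = texts
        else:
            # A paragraph without a checkbox is treated as a description.
            results.append({"description": " ".join(texts)})
    return results
```

Calling parse_paragraphs(xml_content) on the XML above would give a list of dictionaries that can be passed to json.dumps. If only the checked boxes matter, the presence of <w:checked /> inside each <w:checkBox> would have to be tested as well, which this sketch does not do.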

My code looks like this:

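import re

results = []
# Character offsets of every 'w:p' occurrence in the raw XML string
default_positions = [m.start() for m in re.finditer('w:p', xml_content)]
for position in default_positions:
    jsonobj = {}
    if ...:  # condition still to be written: the next <w:p> contains a checkbox
        # code
        jsonobj['description'] = description
        jsonobj['checkbox_text'] = checkbox
    else:
        # code
        jsonobj['description'] = description
    results.append(jsonobj)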
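
A side note on the 'w:p' pattern used above: as a plain substring search it also hits closing tags and longer tag names. The small check below illustrates this on a made-up sample string (not the real document); the stricter pattern is just one possible alternative, not a full XML parser.

```python
import re

# Made-up sample: two paragraphs, the second containing a <w:pPr/> child.
sample = "<w:p><w:r><w:t>A</w:t></w:r></w:p><w:p><w:pPr/></w:p>"

# Plain substring search: matches <w:p>, </w:p> and <w:pPr/> alike.
loose = [m.start() for m in re.finditer('w:p', sample)]

# Only opening <w:p> tags: '<w:p' followed by whitespace or '>'.
strict = [m.start() for m in re.finditer(r'<w:p[\s>]', sample)]

print(len(loose), len(strict))
```

The loose pattern reports more positions than there are paragraphs, so any logic built on those offsets would need extra filtering.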

Can anyone help?
