Saturday, March 5, 2022

[FIXED] Get path of tags using attribute field in XSD


Issue

My current task is to extract information from an XSD file (field type, field name, etc.). The XSD file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2018 rel. 2 sp1 (x64) (http://www.altova.com) by test (123321) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:complexType name="attribute">
        <xs:annotation>
            <xs:documentation>Атрибуты ОГХ</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="owner_id">
                <xs:annotation>
                    <xs:documentation>Данные о балансодержателе</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="legal_person" type="xs:integer">
                            <xs:annotation>
                                <xs:documentation>ID балансодержателя</xs:documentation>
                            </xs:annotation>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="snow_clean_area" type="xs:double">
                <xs:annotation>
                    <xs:documentation>Площадь вывоза снега, кв. м</xs:documentation>
                </xs:annotation>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

As we can see, some <xs:element> fields contain other <xs:element> fields (nesting).

I need to get the names of all elements in that XSD. But if an element is nested inside another one, I need to write its name as "all_prev_names;cur_name". For the XSD shown above, the result should be:

"owner_id;legal_person"
"snow_clean_area"

With deeper nesting, the name must include the names of all ancestor elements.

I wrote this code:

def recursive(xml, name=None):
    # find_all() returns every descendant <xs:element>, not only the nearest ones
    res = xml.find_all('xs:element')

    if res:
        for elem in res:
            if name:
                # parent names first, then the current name
                yield from recursive(elem, name + ';' + elem['name'])
            else:
                yield from recursive(elem, elem['name'])
    else:
        if name:
            yield name
        else:
            yield xml['name']

But it produces duplicate paths. The function returns:

"owner_id;legal_person"
"legal_person"
"snow_clean_area"

I need to either fix this code or find another way to solve the task.

Solution

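I found a solution that works for me. I use ElementTree.iterparse instead of BeautifulSoup. When an <xs:element> tag opens, I push its name onto a list and remember its type; when the tag closes, I pop the name again. When an <xs:documentation> tag is encountered inside a typed element, I save the collected names, the type and the description to my structure: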

import typing as t
import xml.etree.ElementTree as ET
from io import BytesIO

XS = '{http://www.w3.org/2001/XMLSchema}'

def getXsd(self, typeNumber: int) -> t.List[t.Dict[str, str]]:
    paths = []
    # judging by its use here, self.xsds holds (type number, XSD text) pairs
    for xsd in self.xsds:
        if xsd[0] == typeNumber:
            codes = []          # names of the currently open <xs:element> tags
            type_field = None   # type of the most recently opened typed element
            source = BytesIO(xsd[1].encode("UTF-8"))
            for event, elem in ET.iterparse(source, events=("start", "end")):
                if event == 'start' and elem.tag == XS + 'element':
                    codes.append(elem.attrib['name'])
                    if 'type' in elem.attrib:
                        type_field = elem.attrib['type']
                # elem.text is only guaranteed to be filled in at the "end" event
                elif event == 'end' and elem.tag == XS + 'documentation':
                    if codes and type_field:
                        # innermost name first, each part capitalized
                        paths.append({'code': "".join([str(item).capitalize() for item in codes[::-1]]),
                                      'type': type_field,
                                      'name': elem.text})
                        type_field = None
                elif event == 'end' and elem.tag == XS + 'element':
                    codes.pop()
    return paths

The result is:

[{'code': 'Legal_personOwner_id', 'type': 'xs:integer', 'name': 'ID балансодержателя'},
 {'code': 'Legal_personCustomer_id', 'type': 'xs:integer', 'name': 'ID заказчика'},
 {'code': 'Improvement_object_categoryImprovement_object_category_id', 'type': 'xs:integer', 'name': 'Код категории озеленения'},
 {'code': 'Legal_personDepartment_id', 'type': 'xs:integer', 'name': 'ID ведомственного ОИВ'},
 {'code': 'Snow_clean_area', 'type': 'xs:double', 'name': 'Площадь вывоза снега, кв. м'},
 {'code': 'Reservoir_area', 'type': 'xs:double', 'name': 'Водоемы, кв. м'}]
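
If the semicolon-separated format from the question ("owner_id;legal_person") is what you need, a recursive walk with ElementTree is one possible way to get it. The sketch below is not the code from the answer above: element_paths, XS and SAMPLE_XSD are names made up for this illustration, the sample is a shortened copy of the XSD from the question, and it assumes every <xs:element> carries a name attribute. The duplicate "legal_person" entry is avoided because each call only looks at direct children and passes the collected names down, instead of searching all descendants from the top.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in the form ElementTree uses for tag names.
XS = "{http://www.w3.org/2001/XMLSchema}"


def element_paths(node, prefix=()):
    """Yield 'parent;child' paths for the innermost xs:element nodes below node."""
    for child in node:
        if child.tag == XS + "element":
            # Assumption: every xs:element has a name attribute.
            path = prefix + (child.get("name"),)
            nested = list(element_paths(child, path))
            if nested:
                # Elements with nested elements contribute only the deeper paths.
                yield from nested
            else:
                # No nested elements: report the full path of this one.
                yield ";".join(path)
        else:
            # Wrappers such as xs:complexType or xs:sequence add no name.
            yield from element_paths(child, prefix)


# Shortened copy of the schema from the question (annotations left out).
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="attribute">
    <xs:sequence>
      <xs:element name="owner_id">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="legal_person" type="xs:integer"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="snow_clean_area" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

paths = list(element_paths(ET.fromstring(SAMPLE_XSD)))
print(paths)  # ['owner_id;legal_person', 'snow_clean_area']
```

The names are kept in a tuple, so each level of recursion extends its own copy and sibling elements never see each other's names.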

Answered By - magicarm22

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
