Issue
My current task is to extract information from an XSD file (field types, field names, etc.). The XSD file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2018 rel. 2 sp1 (x64) (http://www.altova.com) by test (123321) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:complexType name="attribute">
        <xs:annotation>
            <xs:documentation>Атрибуты ОГХ</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="owner_id">
                <xs:annotation>
                    <xs:documentation>Данные о балансодержателе</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="legal_person" type="xs:integer">
                            <xs:annotation>
                                <xs:documentation>ID балансодержателя</xs:documentation>
                            </xs:annotation>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="snow_clean_area" type="xs:double">
                <xs:annotation>
                    <xs:documentation>Площадь вывоза снега, кв. м</xs:documentation>
                </xs:annotation>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:schema>
As we can see, some <xs:element> fields contain other <xs:element> fields (nesting).
I need to get the names of all elements in that XSD. However, if an element is nested inside another one, its name must be written as "all_prev_names;cur_name". For the XSD shown above, the result should be:
"owner_id;legal_person"
"snow_clean_area"
With deeper nesting, the name must include the names of all enclosing elements.
I wrote this code:
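def recursive(xml, name=None):
    # xml is a BeautifulSoup tag; name is the path collected so far
    res = xml.find_all('xs:element')
    if res:
        for elem in res:
            if name:
                # enclosing names first, then the current element
                yield from recursive(elem, name + ';' + elem['name'])
            else:
                yield from recursive(elem, elem['name'])
    else:
        # no nested xs:element: this is a leaf, so emit its path
        if name:
            yield name
        else:
            yield xml['name']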
But there is a problem with duplicate paths. The function returns:
"owner_id;legal_person"
"legal_person"
"snow_clean_area"
I need to either fix this code or find another way to solve the task.
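A likely cause of the duplicate: BeautifulSoup's find_all searches all descendants by default, so the top-level call already returns legal_person alongside owner_id, and the recursive call on owner_id then reaches it a second time. The sketch below reproduces that effect with the standard library's Element.iter, used here only as a stand-in for find_all. The shortened schema and the variable names are illustrative and not part of the original post.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened version of the schema from the question (annotations omitted).
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="attribute">
    <xs:sequence>
      <xs:element name="owner_id">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="legal_person" type="xs:integer"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="snow_clean_area" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

root = ET.fromstring(XSD)

# Descendant search from the root: the nested element is already included.
from_root = [e.get("name") for e in root.iter(XS + "element")]

# Descendant search from owner_id reaches legal_person a second time.
owner = root.find(f".//{XS}element[@name='owner_id']")
from_owner = [e.get("name") for e in owner.iter(XS + "element") if e is not owner]

print(from_root)   # ['owner_id', 'legal_person', 'snow_clean_area']
print(from_owner)  # ['legal_person']
```

Because legal_person shows up in both lists, any traversal that recurses into every descendant match will report it once with its parent prefix and once on its own.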
Solution
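I found a solution that works for me. I use ElementTree.iterparse instead of BeautifulSoup. On each opening xs:element tag I push the field name onto a stack and remember its type, on each xs:documentation tag I save the record, and on the closing xs:element tag I pop the name. Note that the 'code' value is built from the names in reverse order, capitalized and joined without a separator: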
import typing as t
import xml.etree.ElementTree as ET
from io import BytesIO

XS = '{http://www.w3.org/2001/XMLSchema}'

# Method of a class; self.xsds appears to hold (type number, XSD text) pairs.
def getXsd(self, typeNumber: int) -> t.List[t.Dict[str, str]]:
    paths = []
    for xsd in self.xsds:
        if xsd[0] == typeNumber:
            events = ("start", "end")
            codes = []  # stack of names of the currently open xs:element tags
            type_field = None
            for event, elem in ET.iterparse(BytesIO(xsd[1].encode("UTF-8")), events=events):
                if event == 'start' and elem.tag == XS + 'element':
                    codes.append(elem.attrib['name'])
                    if 'type' in elem.attrib:
                        type_field = elem.attrib['type']
                # Read the text on 'end': elem.text is not guaranteed to be set on 'start'.
                elif event == 'end' and elem.tag == XS + 'documentation':
                    if codes and type_field:
                        paths.append({'code': "".join([str(item).capitalize() for item in codes[::-1]]),
                                      'type': type_field,
                                      'name': elem.text})
                        type_field = None
                elif event == 'end' and elem.tag == XS + 'element':
                    codes.pop()
    return paths
The result, here for a schema with more fields than the excerpt in the question, is:
[{'code': 'Legal_personOwner_id', 'type': 'xs:integer', 'name': 'ID балансодержателя'},
 {'code': 'Legal_personCustomer_id', 'type': 'xs:integer', 'name': 'ID заказчика'},
 {'code': 'Improvement_object_categoryImprovement_object_category_id', 'type': 'xs:integer', 'name': 'Код категории озеленения'},
 {'code': 'Legal_personDepartment_id', 'type': 'xs:integer', 'name': 'ID ведомственного ОИВ'},
 {'code': 'Snow_clean_area', 'type': 'xs:double', 'name': 'Площадь вывоза снега, кв. м'},
 {'code': 'Reservoir_area', 'type': 'xs:double', 'name': 'Водоемы, кв. м'}]
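The method above depends on self.xsds and produces its own 'code' format. For readers who want the "all_prev_names;cur_name" strings requested in the question, the same start/end stack idea can be sketched as a standalone function. This is a minimal sketch, not the answer author's code: element_paths, SAMPLE and the sep parameter are hypothetical names, and it assumes that a "leaf" is any xs:element with no xs:element nested inside it.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

XS_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"


def element_paths(xsd_text, sep=";"):
    """Return one 'outer;inner' path per xs:element that has no nested xs:element."""
    stack = []  # [name, has_nested_element] for every currently open xs:element
    paths = []
    source = BytesIO(xsd_text.encode("utf-8"))
    for event, node in ET.iterparse(source, events=("start", "end")):
        if node.tag != XS_ELEMENT:
            continue
        if event == "start":
            if stack:
                stack[-1][1] = True  # the enclosing element is not a leaf
            # Assumption: every xs:element has a name (ref-only elements get "").
            stack.append([node.get("name", ""), False])
        else:
            name, has_nested = stack.pop()
            if not has_nested:
                paths.append(sep.join([n for n, _ in stack] + [name]))
    return paths


# Shortened version of the schema from the question.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="attribute">
    <xs:sequence>
      <xs:element name="owner_id">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="legal_person" type="xs:integer"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="snow_clean_area" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

print(element_paths(SAMPLE))  # ['owner_id;legal_person', 'snow_clean_area']
```

Each element is pushed once and popped once, so a nested name can only ever appear together with its enclosing names.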
Answered By - magicarm22
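For comparison, the recursive approach from the question can probably be repaired as well, by recursing only into the nearest nested xs:element tags instead of all descendants. The sketch below uses the standard library's ElementTree rather than BeautifulSoup. nearest_elements, leaf_paths and SAMPLE are hypothetical names introduced for illustration, and the code assumes every xs:element has a name attribute.

```python
import xml.etree.ElementTree as ET

XS_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"


def nearest_elements(node):
    """Yield xs:element descendants of node, stopping at the first one on each branch."""
    for child in node:
        if child.tag == XS_ELEMENT:
            yield child
        else:
            # Look through wrappers such as xs:complexType and xs:sequence.
            yield from nearest_elements(child)


def leaf_paths(node, prefix=()):
    """Yield 'outer;inner' paths for elements that contain no further xs:element."""
    for elem in nearest_elements(node):
        path = prefix + (elem.get("name"),)
        if any(True for _ in nearest_elements(elem)):
            yield from leaf_paths(elem, path)
        else:
            yield ";".join(path)


# Shortened version of the schema from the question.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="attribute">
    <xs:sequence>
      <xs:element name="owner_id">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="legal_person" type="xs:integer"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="snow_clean_area" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

root = ET.fromstring(SAMPLE)
print(list(leaf_paths(root)))  # ['owner_id;legal_person', 'snow_clean_area']
```

Since nearest_elements never descends past an xs:element, legal_person is only reachable through owner_id and is reported once.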