Issue
I'm trying to parse an XML document written by a BI tool (Tableau, specifically!). I'm using BeautifulSoup 4 (BS4) and have tried several other StackOverflow solutions, none of which worked for me. I'm hoping someone can point out what I'm doing wrong.
This is my XML:
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>
And here is a cleaned-up version of my code:
from bs4 import BeautifulSoup

# Parsing the tree (xmlstr holds the XML shown above)
soup = BeautifulSoup(xmlstr, 'lxml')
print(soup.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph"))
# This works! Prints the object markup

datasources = soup.find('datasources').find_all('datasource')
for ds in datasources:
    print(ds['caption'])
    print(ds['name'])
    # This works!

    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)
    # This doesn't work! returns None

    for tag in ds:
        if tag.name == "_.fcp.objectmodelencapsulatelegacy.true...object-graph":
            print(tag.name)
    # This works ^^
As you can tell, the element definitely exists inside the tag it's supposed to be in. Iterating over the children of the datasource yields the element I'm looking for, and comparing its name against the one I want confirms it's there. But when I call find or find_all on the datasource, I keep getting None back. I thought the issue was the name (as some StackOverflow posts suggested), but apparently not, since soup.find does catch the element. I'm at a loss, so any help would be appreciated.
Thanks!
Solution
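Try the following code, which works on the sample XML. It searches from the datasources element with find_all and then calls find on each match. Note that 'lxml' selects lxml's HTML parser, which lowercases tag names, so the search string has to be lowercase too.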
from bs4 import BeautifulSoup

xmlstr = '''
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>
'''

soup = BeautifulSoup(xmlstr, 'lxml')

# Search from <datasources> itself instead of chaining .find_all('datasource')
datasources = soup.find_all('datasources')
for ds in datasources:
    # find() searches all descendants, not just direct children
    print(ds.find('object')['caption'])
    print(ds.find('relation')['name'])
    # This works!

    # The lxml HTML parser lowercases tag names, so search in lowercase
    result = ds.find("_.fcp.objectmodelencapsulatelegacy.true...object-graph")
    print(result.name)
Output:
table
Extract
_.fcp.objectmodelencapsulatelegacy.true...object-graph
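
If the lowercased names get in the way, one alternative (not part of the answer above) is Python's standard-library xml.etree.ElementTree, which is an XML parser and keeps tag names exactly as written. The following is only a minimal sketch under the assumption that the input looks like the sample XML; the names GRAPH_TAG and found are made up for illustration.

```python
import xml.etree.ElementTree as ET

xmlstr = """
<datasources>
  <datasource>
    <_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
      <objects>
        <object caption='table' id='table'>
          <properties context='extract'>
            <relation name='Extract' table='[Extract].[Extract]' type='table' />
          </properties>
        </object>
      </objects>
    </_.fcp.ObjectModelEncapsulateLegacy.true...object-graph>
  </datasource>
</datasources>
"""

# Hypothetical constant: an XML parser preserves the mixed-case tag name.
GRAPH_TAG = "_.fcp.ObjectModelEncapsulateLegacy.true...object-graph"

root = ET.fromstring(xmlstr.strip())

found = []  # hypothetical list of (tag name, relation name) pairs
for ds in root.iter("datasource"):
    # Compare child tags directly, which sidesteps any question of how
    # the dots in the name would be treated in a path expression.
    graph = next((child for child in ds if child.tag == GRAPH_TAG), None)
    if graph is None:
        # Skip datasources without the element instead of failing on None.
        continue
    relation = graph.find(".//relation")
    if relation is not None:
        found.append((graph.tag, relation.get("name")))

print(found)
```

On the sample XML this prints a single pair: the mixed-case tag name and Extract. A datasource that lacks the element is skipped rather than raising an error on None.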
Answered By - F.Hoque