Issue
I am trying to extract the numbers from some XML data.
The XML data looks like:
<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>
And so on, with a new name and count in each comment.
My code is:
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()
# print(data)
tree = ET.fromstring(data)
# print('Name:',tree.find('count').text)
lst = tree.findall('comments/comment/count')
# print(len(lst))
# print(lst)
# x1 = result[1].find('comment')
# for item in lst:
# print('Count', item.find('count').text)
counts = tree.findall('.//count')
print(counts)
When I print counts, I get a longer version of:
<Element 'count' at 0x000000000A09FB88>, <Element 'count' at 0x000000000A09FC78>, <Element 'count' at 0x000000000A09FD68>, <Element 'count' at 0x000000000A09FE58>, <Element 'count' at 0x000000000A09FF48>, <Element 'count' at 0x000000000A0A3098>]
I am quite new to this, so I don't understand why I am getting these hex numbers, nor do I know how to extract the actual figures.
I am hoping someone can help.
Solution
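findall returns a list of Element objects, and printing an Element shows its default representation: the tag name plus the object's memory address (the hex number), not its contents. To get the actual values, loop through the list and print the text attribute of each element.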
import urllib.request
import xml.etree.ElementTree as ET
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
uh = urllib.request.urlopen(url)
data = uh.read()
tree = ET.fromstring(data)
# findall returns a list of Element objects, one per <count> tag
counts = tree.findall('.//count')
for each in counts:
    # .text is the string between <count> and </count>
    print(each.text)
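If the goal is to work with the figures as numbers rather than just print them, the text of each element can be converted with int. The following is a minimal sketch that runs without network access: SAMPLE_XML is a hypothetical stand-in modelled on the snippet in the question (the second comment entry is made up), and extract_counts is an illustrative helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modelled on the snippet in the question;
# the second <comment> entry is made up for illustration.
SAMPLE_XML = """<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment>
      <name>Romina</name>
      <count>97</count>
    </comment>
    <comment>
      <name>Example</name>
      <count>3</count>
    </comment>
  </comments>
</commentinfo>"""


def extract_counts(xml_text):
    """Return the integer value of every <count> element in xml_text."""
    root = ET.fromstring(xml_text)
    # findall returns Element objects; .text is the string between the tags,
    # so it has to be converted before doing arithmetic with it.
    return [int(element.text) for element in root.findall('.//count')]


counts = extract_counts(SAMPLE_XML)
print(counts)
print(sum(counts))
```

The same helper should work on the downloaded data from the code above, since ET.fromstring accepts the bytes returned by uh.read() as well as a string.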
Answered By - kyle