Issue
I want to print the text inside a particular tag in an XML file using SAX.
However, some of the printed text consists only of spaces or a newline character.
Is there a way to just pick out the actual strings? What am I doing wrong?
See code extract and XML document below.
(I get the same effect with both Python 2 and Python 3.)
#!/usr/bin/env python3
import xml.sax


class MyHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        self.tag = name

    def characters(self, content):
        if self.tag == "artist":
            print('[%s]' % content)


if __name__ == '__main__':
    parser = xml.sax.make_parser()
    Handler = MyHandler()
    parser.setContentHandler(Handler)  # overriding default ContentHandler
    parser.parse("songs.xml")
<?xml version="1.0"?>
<genre catalogue="Pop">
  <song title="No Tears Left to Cry">
    <artist>Ariana Grande</artist>
    <year>2018</year>
    <album>Sweetener</album>
  </song>
  <song title="Delicate">
    <artist>Taylor Swift</artist>
    <year>2018</year>
    <album>Reputation</album>
  </song>
  <song title="Mrs. Potato Head">
    <artist>Melanie Martinez</artist>
    <year>2015</year>
    <album>Cry Baby</album>
  </song>
</genre>
Solution
The value of self.tag is set to "artist" when the <artist> start tag is encountered, and it does not change until startElement() is called for the <year> start tag. Between those two elements there is some uninteresting whitespace (the newline and any indentation after </artist>). The parser reports that whitespace through characters() as well, so it is printed while self.tag is still "artist".
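To make those extra events visible, here is a minimal sketch that reuses the logic of MyHandler but collects each chunk instead of printing it. The class name TracingHandler, the chunks list, and the inlined, trimmed-down XML are assumptions made for illustration; they are not part of the original question.

```python
import xml.sax

# Trimmed-down copy of songs.xml, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0"?>
<genre catalogue="Pop">
  <song title="Delicate">
    <artist>Taylor Swift</artist>
    <year>2018</year>
  </song>
</genre>
"""


class TracingHandler(xml.sax.ContentHandler):
    """Same logic as MyHandler, but stores every chunk seen for <artist>."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.chunks = []  # hypothetical helper list, for illustration only

    def startElement(self, name, attrs):
        self.tag = name

    def characters(self, content):
        # Still true after </artist>, until the next start tag arrives.
        if self.tag == "artist":
            self.chunks.append(content)


handler = TracingHandler()
xml.sax.parseString(XML, handler)
for chunk in handler.chunks:
    # repr() makes newlines and spaces visible.
    print(repr(chunk))
```

Besides the artist name, the list should contain at least one chunk that is pure whitespace. How the parser splits the whitespace into chunks is up to the parser, so the exact number of entries may vary.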
One way to get around this is to add an endElement() method to MyHandler that sets self.tag to something else.
def endElement(self, name):
    self.tag = "whatever"
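Put together, a complete handler might look like the sketch below. It goes slightly beyond the answer above: since a SAX parser may deliver the text of one element in several characters() calls, the sketch buffers the chunks and joins them in endElement(). The names ArtistHandler, buffer, and artists are hypothetical, and the XML is inlined only to keep the example self-contained.

```python
import xml.sax

# Copy of songs.xml, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0"?>
<genre catalogue="Pop">
  <song title="No Tears Left to Cry">
    <artist>Ariana Grande</artist>
    <year>2018</year>
    <album>Sweetener</album>
  </song>
  <song title="Delicate">
    <artist>Taylor Swift</artist>
    <year>2018</year>
    <album>Reputation</album>
  </song>
  <song title="Mrs. Potato Head">
    <artist>Melanie Martinez</artist>
    <year>2015</year>
    <album>Cry Baby</album>
  </song>
</genre>
"""


class ArtistHandler(xml.sax.ContentHandler):
    """Collects the text of every <artist> element."""

    def __init__(self):
        super().__init__()
        self.tag = None    # name of the element currently open, if any
        self.buffer = []   # text chunks of the current <artist>
        self.artists = []  # finished artist names

    def startElement(self, name, attrs):
        self.tag = name
        if name == "artist":
            self.buffer = []

    def characters(self, content):
        # Only text that arrives while <artist> is open is kept.
        if self.tag == "artist":
            self.buffer.append(content)

    def endElement(self, name):
        if name == "artist":
            self.artists.append("".join(self.buffer).strip())
        # Reset so whitespace after the end tag is ignored.
        self.tag = None


handler = ArtistHandler()
xml.sax.parseString(XML, handler)
for artist in handler.artists:
    print('[%s]' % artist)
```

Resetting self.tag to None does the same job as the "whatever" placeholder. The sketch uses super().__init__() without arguments, so it assumes Python 3.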
Answered By - mzjn