Issue
I have the following code:
In [8]: st = u"опа"
In [11]: st.encode("ascii", "xmlcharrefreplace")
Out[11]: '&#1086;&#1087;&#1072;'
In [14]: st1 = st.encode("ascii", "xmlcharrefreplace")
In [15]: st1.decode("ascii", "xmlcharrefreplace")
Out[15]: u'&#1086;&#1087;&#1072;'
In [16]: st1.decode("utf-8", "xmlcharrefreplace")
Out[16]: u'&#1086;&#1087;&#1072;'
Do you have any idea how to convert st1 back to u"опа"?
Solution
Use the html.unescape() function (Python 3.4 and newer):
>>> import html
>>> html.unescape('&#1086;&#1087;&#1072;')
'опа'
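For context: the error handler passed to decode() is only consulted when the bytes cannot be decoded, and st1 is plain ASCII, so decoding succeeds and the character references are left untouched. Below is a minimal round-trip sketch for Python 3.4+, reusing the names st and st1 from the question; restored is just an illustrative name.

```python
import html

st = "опа"

# Encoding replaces each non-ASCII character with a numeric character reference.
st1 = st.encode("ascii", "xmlcharrefreplace")  # bytes in Python 3

# Decode the ASCII bytes back to text first, then resolve the references.
restored = html.unescape(st1.decode("ascii"))

print(restored == st)
```

Note that in Python 3 encode() returns bytes, so the decode("ascii") step is needed before html.unescape(), which expects a str.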
On older versions (including Python 2), you’d have to use an instance of HTMLParser.HTMLParser():
>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> parser.unescape('&#1086;&#1087;&#1072;')
u'\u043e\u043f\u0430'
>>> print parser.unescape('&#1086;&#1087;&#1072;')
опа
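If the same code has to run on both Python 2 and Python 3.4+, one possible approach is to pick the function at import time. This is only a sketch: the fallback pattern and the name text are assumptions for illustration, and it does not cover Python 3.0 to 3.3.

```python
try:
    # Python 3.4 and newer
    from html import unescape
except ImportError:
    # Python 2 fallback: use the unescape() method of a parser instance
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

text = unescape(u"&#1086;&#1087;&#1072;")
print(text)
```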
Answered By - Martijn Pieters