Issue
My string is Niệm Bồ Tát (Thiá»n sÆ° Nhất Hạnh)
and I want to decode it to Niệm Bồ Tát (Thiền sư Nhất Hạnh).
This site can do the conversion: http://www.enderminh.com/minh/utf8-to-unicode-converter.aspx
so I tried to do the same in Python:
mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
mystr.decode('utf-8')
but the result is not what I expect, even though the original string is UTF-8.
Note: the text is Vietnamese.
How can I fix this? Is it a Windows encoding or something similar? How can I detect the encoding here?
Solution
I'm not sure what you can do with this kind of data in general, but for the example in your original post, this works:
>>> mystr = '09. BÃ¡t NhÃ£ TÃ¢m Kinh'
>>> s = mystr.decode('utf8').encode('latin1').decode('utf8')
>>> s
u'09. B\xe1t Nh\xe3 T\xe2m Kinh'
>>> print(s)
09. Bát Nhã Tâm Kinh
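The session above is Python 2, where a plain string literal is a byte string. The chain works because the garbled text looks like UTF-8 bytes that were decoded as Latin-1 somewhere along the way; encoding back to Latin-1 recovers the original bytes, which then decode cleanly as UTF-8. In Python 3 a str is already Unicode text, so the first decode step is not needed. A minimal sketch, assuming Python 3; the helper name fix_mojibake and its wrong_encoding parameter are made up for illustration and are not part of any library:

```python
def fix_mojibake(text, wrong_encoding="latin-1"):
    """Undo one round of UTF-8 bytes that were mis-decoded as `wrong_encoding`.

    Hypothetical helper. Returns the input unchanged when the round trip
    fails, which usually means the text was not damaged in this way.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


garbled = "09. BÃ¡t NhÃ£ TÃ¢m Kinh"
print(fix_mojibake(garbled))  # 09. Bát Nhã Tâm Kinh
```

Some garbled strings contain characters such as ‡ that exist in Windows-1252 but not in Latin-1; for those, passing wrong_encoding="cp1252" may work where the default does not. If bytes were dropped during the garbling (Windows-1252 leaves a few byte values, such as 0x81, undefined), the text probably cannot be fully recovered this way.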
Answered By - Jonathan Ballet