Issue
I have an XML file, test.xml, with this content:
<?xml version='1.0' encoding='UTF-8'?>
<Configurations version="1.0">
<Item name="a" value="avalue" />
<Item name="b" value="bvalue" />
</Configurations>
I want to change some attribute values with Python's ElementTree. The Python program is:
#!/usr/bin/python
#coding=utf-8
###############################################################################
import sys
from xml.etree import ElementTree as ET
a_value = ""
b_value = ""
# Functions to modify the Upgrader configuration files.
def Modify_Config ():
    tree = ET.parse('./test.xml')
    root = tree.getroot()
    for child in root.getiterator():
        childName = child.get('name')
        if 'a' == childName:
            child.set('value', a_value)
        elif 'b' == childName:
            child.set('value', b_value)
    tree.write('./test.xml', encoding="UTF-8")
###############################################################################
a_value = sys.argv[1]
b_value = sys.argv[2]
###############################################################################
Modify_Config()
When I run the script like this in the Windows Command Prompt: "encode.py 测试 bvalue", it fails with this exception:
D:\Test_study\python\encode>encode.py 测试 bvalue
Traceback (most recent call last):
File "D:\Test_study\python\encode\encode.py", line 25, in <module>
Modify_Config()
File "D:\Test_study\python\encode\encode.py", line 20, in Modify_Config
tree.write('./test.xml', encoding="UTF-8")
File "C:\Python27\lib\xml\etree\ElementTree.py", line 821, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 933, in _serialize_xml
v = _escape_attrib(v, encoding)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1091, in _escape_attrib
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb2 in position 0: ordinal not in range(128)
Why does this error occur? My OS is Windows 7, 64-bit.
Solution
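Byte strings and unicode strings are interchangeable only when they contain nothing but ASCII characters (code points below 128). In Python 2.x the elements of sys.argv are byte strings, and it seems they are encoded in GB2312, not UTF-8. When ElementTree serializes the attribute, it calls text.encode(encoding, "xmlcharrefreplace") (see the last frame of the traceback). On a byte string, Python 2 first decodes it implicitly with the ASCII codec, and that fails on the first non-ASCII byte, 0xb2. Thus you need to decode the arguments explicitly: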
a_value = sys.argv[1].decode('GB2312')
b_value = sys.argv[2].decode('GB2312')
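The mechanism can be checked with a small sketch. It is written for Python 3, which is an assumption here: in Python 3, sys.argv entries are already text, so the GB2312 bytes that a Python 2 script would receive from the console are simulated with "测试".encode("gb2312"). The names XML_BYTES, modify_config and raw_arg are hypothetical. The sketch shows that decoding those bytes as ASCII fails, while decoding them as GB2312 yields text that ElementTree writes out as UTF-8 without complaint.

```python
# Sketch for Python 3 (assumption): sys.argv is already str there, so the
# GB2312 bytes a Python 2 script would receive are simulated by hand.
import xml.etree.ElementTree as ET

# Same content as test.xml, kept in memory (hypothetical name).
XML_BYTES = b"""<?xml version='1.0' encoding='UTF-8'?>
<Configurations version="1.0">
<Item name="a" value="avalue" />
<Item name="b" value="bvalue" />
</Configurations>"""


def modify_config(xml_bytes, a_value, b_value):
    """Set the value attribute of Items 'a' and 'b'; return UTF-8 bytes."""
    root = ET.fromstring(xml_bytes)
    for child in root.iter("Item"):
        name = child.get("name")
        if name == "a":
            child.set("value", a_value)
        elif name == "b":
            child.set("value", b_value)
    # a_value and b_value must already be text (str), not bytes, here.
    return ET.tostring(root, encoding="UTF-8")


# Hypothetical stand-in for sys.argv[1] under Python 2 on a GB2312 console.
raw_arg = "测试".encode("gb2312")

# Decoding with the ASCII codec is what Python 2 does implicitly; it fails.
try:
    raw_arg.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc)

# Decoding explicitly with the console's encoding gives proper text.
a_value = raw_arg.decode("gb2312")
print(modify_config(XML_BYTES, a_value, "bvalue").decode("utf-8"))
```

Under Python 2 the fix is just the two decode lines above; if the console uses a code page other than GB2312, that encoding name would have to be substituted.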
Answered By - Antti Haapala