Issue
So, I have this code to fetch a JSON string from a URL:
```python
import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()
data = json.loads(string)
for x in data:
    print x['foo']
```
The problem is x['foo']: if I try to print it as seen above, I get this error:
Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1
If I use x['foo'].decode("utf-8"), I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)
If I try encode('ascii', 'ignore').decode('ascii'), then I get this error:
x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'
Is there any way to fix this problem?
Solution
x['foo'].decode("utf-8") resulting in UnicodeEncodeError means that x['foo'] is of type unicode. str.decode takes a str and translates it to unicode. Python 2 is trying to be helpful here: it implicitly converts your unicode to str so that it can call decode on it. It does this with the default encoding (sys.getdefaultencoding()), which is ascii. ASCII can't encode all of Unicode, hence the exception.
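The hidden step can be reproduced explicitly. This is a sketch that runs under both Python 2 and Python 3 (Python 3's str has no decode method, so only the implicit ascii encode is imitated here): encoding the character from the traceback, U+4E3A, with the ascii codec raises the same UnicodeEncodeError at position 0.

```python
# The character from the traceback: U+4E3A.
text = u'\u4e3a'

error = None
try:
    # ASCII only covers code points 0-127, so this encode must fail.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    # Keep a reference; Python 3 unbinds `exc` when the except block ends.
    error = exc
    print(exc)
```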
The solution here is to remove the decode call: the value is already unicode.
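If the same code path might receive both raw byte strings and already-decoded text, one option is to decode only when the value really is bytes. This is a minimal sketch under that assumption; to_text is a hypothetical helper name, not part of any library, and it runs under both Python 2 (where bytes is an alias of str) and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass everything else through unchanged."""
    if isinstance(value, bytes):
        # Only byte strings (str on Python 2) need decoding.
        return value.decode(encoding)
    # Already text (unicode on Python 2), or a non-string such as None.
    return value

from_json = to_text(u'\u4e3a Co')        # already text: returned as-is
from_wire = to_text(b'\xe4\xb8\xba Co')  # UTF-8 bytes: decoded to text
assert from_json == from_wire
```

For values coming out of json.loads this check is redundant, since their strings are never bytes.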
Read Ned Batchelder's presentation, Pragmatic Unicode; it will greatly enhance your understanding of this and help prevent similar errors in the future.
It's worth noting here that every string returned by json.load or json.loads, keys and values alike, will be unicode and not str.
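A quick way to confirm this is to inspect the parsed values. The sketch below runs under both Python 2 and 3; the JSON document is made up to resemble the question's data. It also shows where the question's AttributeError comes from: that error means x['foo'] was None for at least one item, which is what json.loads produces for a JSON null, and None has no encode method.

```python
import json

# Made-up sample resembling the question's data; "bar" is a JSON null.
raw = r'[{"foo": "\u4e3a Co", "bar": null, "n": 1}]'
data = json.loads(raw)
item = data[0]

# String values come back as text: unicode on Python 2, str on Python 3.
print(type(item['foo']))
# JSON null becomes None, so item['bar'].encode(...) would raise AttributeError.
print(item['bar'] is None)
# Numbers stay numbers; only strings become text.
print(type(item['n']))
```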
Addressing the new question after edits:
When you print, you need bytes: unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes; in Python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.
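To make the mapping concrete, here is a small sketch (runs under Python 2 and 3) that encodes U+4E3A to UTF-8. The result is three bytes, E4 B8 BA, the same bytes that begin the value quoted in the question's warning ('\xE4\xB8\xBA Co...'). Decoding with the same encoding reverses the mapping.

```python
# U+4E3A, the character from the question's traceback.
text = u'\u4e3a'

# Encoding maps the abstract code point to concrete bytes.
encoded = text.encode('utf-8')
print(repr(encoded))

# Decoding with the same encoding gets the original text back.
assert encoded.decode('utf-8') == text
```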
This should work:

```python
print x['foo'].encode('utf-8')
```
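Putting the pieces together, one possible loop encodes each value to UTF-8 and skips entries whose 'foo' is null, which the one-liner above would still trip over. This is a sketch, not the answerer's code: write_foo_values is a hypothetical name, and x.get('foo') (rather than x['foo']) is an assumption that also treats a missing key as null. It writes to any binary stream, so it behaves the same under Python 2 and 3; the demo uses an in-memory io.BytesIO.

```python
import io
import json

def write_foo_values(items, out):
    """Write each item's 'foo' value to the binary stream `out` as a UTF-8 line."""
    for x in items:
        value = x.get('foo')  # assumption: a missing key is treated like null
        if value is None:
            # JSON null has nothing to encode; skip it instead of raising AttributeError.
            continue
        out.write(value.encode('utf-8') + b'\n')

data = json.loads(r'[{"foo": "\u4e3a Co"}, {"foo": null}]')
buf = io.BytesIO()
write_foo_values(data, buf)
print(repr(buf.getvalue()))
```

To print to the terminal instead, pass sys.stdout on Python 2, or sys.stdout.buffer on Python 3 (whose sys.stdout only accepts text).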
Answered By - Daenyth