Issue
I have a pickle file written by Python 2. When I try to read it with Python 3, it raises the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Here are code samples for Python 2 and Python 3:
python_2_dump.py
# -*- coding: utf-8 -*-
# Python 2 version
import cPickle
test = {
    'Á': 'A',
    'á': 'a',
    'Ã': 'A',
    'ã': 'a',
    'Â': 'A',
    'â': 'a',
}
# Open in binary mode so the pickle bytes are written unchanged on every platform
with open('test.pickle', 'wb') as f:
    cPickle.dump(test, f)
python_3_load.py
# Python 3 version
import pickle
with open('test.pickle', 'rb') as f:
    print(pickle.load(f))
Is there any reason Python 3 doesn't detect the old protocol and convert the data accordingly? A failure would make sense the other way around, i.e. Python 2 reading pickle data written by Python 3.
Solution
The protocol is detected automatically, as stated in the docs:
The protocol version of the pickle is detected automatically, so no protocol argument is needed.
However, you need to use fix_imports, encoding and errors to control compatibility support for pickle streams generated by Python 2. The relevant docs:
The optional arguments fix_imports, encoding and errors are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.
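To make the effect of the encoding argument concrete, here is a minimal Python 3 sketch. The byte string PY2_STREAM is hand-written and assumed to match what Python 2 emits at protocol 0 for the byte string 'Á' (UTF-8 bytes 0xC3 0x81); the helper name default_load_error is made up for this illustration.

```python
import pickle

# Hand-written protocol 0 stream (an assumption for illustration): the
# STRING opcode 'S', the escaped Python 2 byte string, then STOP ('.').
PY2_STREAM = b"S'\\xc3\\x81'\n."


def default_load_error(stream):
    """Return the exception name raised with the default encoding, or None."""
    try:
        pickle.loads(stream)  # default is encoding='ASCII', errors='strict'
    except UnicodeDecodeError as exc:
        return type(exc).__name__
    return None


default_error = default_load_error(PY2_STREAM)
as_text = pickle.loads(PY2_STREAM, encoding="utf-8")     # decoded to str
as_bytes = pickle.loads(PY2_STREAM, encoding="bytes")    # left as bytes
as_latin1 = pickle.loads(PY2_STREAM, encoding="latin1")  # never fails, may be wrong text

print(default_error)            # UnicodeDecodeError
print(as_text == "\u00c1")      # True
print(as_bytes)                 # b'\xc3\x81'
print(as_latin1 == "\u00c3\x81")  # True
```

Note that latin1 decodes without error but yields two characters instead of one here, so it is the right choice only when the stored bytes must round-trip unchanged (as with NumPy arrays), not when they are UTF-8 text.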
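In your example, the Python 2 source file declares coding: utf-8, so the dictionary keys were pickled as UTF-8 byte strings. Python 3 will therefore read test.pickle if you pass encoding='utf-8':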
print(pickle.load(f, encoding='utf-8'))
output:
{'Ã': 'A', 'â': 'a', 'Á': 'A', 'ã': 'a', 'Â': 'A', 'á': 'a'}
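If you do not know in advance whether the byte strings in a Python 2 pickle are UTF-8 text, one possible approach is to try the text encoding first and fall back to encoding='bytes'. The sketch below assumes exactly that; load_py2_pickle is a hypothetical helper, not part of the pickle module, and SAMPLE is a hand-written protocol 0 stream standing in for a file written by Python 2.

```python
import os
import pickle
import tempfile


def load_py2_pickle(path, encoding="utf-8"):
    """Hypothetical helper: load a Python 2 pickle, decoding 8-bit strings
    with `encoding`; if that fails, reload them as bytes objects."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f, encoding=encoding)
    except UnicodeDecodeError:
        # Reopen the file so unpickling restarts from the first byte.
        with open(path, "rb") as f:
            return pickle.load(f, encoding="bytes")


# Hand-written protocol 0 stream (assumed) for the dict {'\xc3\x81': 'A'}:
# MARK, DICT, two STRING opcodes, SETITEM, STOP.
SAMPLE = b"(dS'\\xc3\\x81'\nS'A'\ns."

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.pickle")
    with open(path, "wb") as f:
        f.write(SAMPLE)
    loaded = load_py2_pickle(path)                 # decoded as UTF-8
    raw = load_py2_pickle(path, encoding="ascii")  # ASCII fails, falls back

print(loaded == {"\u00c1": "A"})  # True
print(raw)                        # {b'\xc3\x81': b'A'}
```

The fallback returns bytes keys and values, so the caller still has to decode them once the real encoding is known.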
Answered By - buran