Issue
What is the default encoding used for strings in Python 2.x? I've read that there are two possible ways to declare a string.
string = 'this is a string'
unicode_string = u'this is a unicode string'
The second string is in Unicode. What is the encoding of the first string?
Solution
As per "Python default/implicit string encodings and conversions" (summarizing its Py2 part concisely, to minimize duplication):
There are actually multiple independent "default" string encodings in Python 2, used by different parts of its functionality.
Parsing the code and string literals:
- str from a literal -- will contain raw bytes from the file, no transcoding is done
- unicode from a literal -- the bytes from the file are decode'd with the file's "source encoding", which defaults to ascii
- with unicode_literals future, all literals in the file are treated as Unicode literals
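To make the first two points concrete, here is a minimal sketch, assuming a source file saved as UTF-8 that contains the literal café. It spells the file's bytes out explicitly so that it runs unchanged on Python 2 and 3; the names raw and text are illustrative and not part of the answer.

```python
# Assumption: the source file is UTF-8 and the literal is "cafe" with an
# acute accent on the final e, which UTF-8 stores as the bytes C3 A9.

# What a Python 2 str literal would hold: the file's bytes, untouched.
raw = b'caf\xc3\xa9'

# What a Python 2 unicode literal would hold: those same bytes decoded
# with the declared source encoding (utf-8 here; ascii if none is declared).
text = raw.decode('utf-8')

print(len(raw))   # 5 (bytes)
print(len(text))  # 4 (characters)
```

With the default ascii source encoding, Python 2 would refuse to parse such a file at all unless a coding declaration is present.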
Transcoding/type conversion:
- str<->unicode type conversion and encode/decode w/o arguments are done with sys.getdefaultencoding()
  - which is ascii almost always, so any national characters will cause a UnicodeError
- str can only be decode'd and unicode -- encode'd. Trying otherwise will involve an implicit type conversion (with the aforementioned result)
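The implicit conversion can be sketched as follows. This is an emulation rather than Python 2's actual code path: py2_style_encode is a hypothetical helper, and its default_encoding parameter stands in for whatever sys.getdefaultencoding() returns on Python 2 (assumed to be ascii). It runs on both Python 2 and 3.

```python
def py2_style_encode(raw, target_encoding, default_encoding='ascii'):
    """Emulate calling .encode() on a Python 2 str holding text bytes.

    Hypothetical helper for illustration: the bytes are first decoded
    with the default encoding (the hidden step), then encoded.
    """
    as_unicode = raw.decode(default_encoding)   # the implicit conversion
    return as_unicode.encode(target_encoding)


# ASCII-only input survives the hidden decode step.
print(py2_style_encode(b'plain', 'utf-8'))

try:
    py2_style_encode(b'caf\xc3\xa9', 'utf-8')   # non-ASCII bytes
except UnicodeError as exc:
    print(type(exc).__name__)                   # UnicodeDecodeError
```

This is why calling encode on a Python 2 str with non-ASCII bytes reports a UnicodeDecodeError rather than an encoding error: the failure happens in the implicit decode.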
I/O, including printing:
- unicode -- encode'd with <file>.encoding if set, otherwise implicitly converted to str (with the aforementioned result)
- str -- raw bytes are written to the stream, no transcoding is done. For national characters, a terminal will show different glyphs depending on its locale settings.
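The I/O rules above can be approximated with a small sketch. py2_style_write is a hypothetical helper, its stream_encoding parameter plays the role of <file>.encoding, and ascii is assumed as the fallback default encoding. It writes to an in-memory byte stream so the result can be inspected on either Python 2 or 3.

```python
import io


def py2_style_write(value, stream, stream_encoding=None):
    """Write value to a byte stream roughly the way Python 2's print would.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, bytes):
        data = value                                     # str: bytes pass through
    else:
        # unicode: use the stream's encoding, else fall back to ascii
        data = value.encode(stream_encoding or 'ascii')
    stream.write(data + b'\n')


out = io.BytesIO()
py2_style_write(u'caf\xe9', out, stream_encoding='utf-8')  # unicode, encoding set
py2_style_write(b'caf\xc3\xa9', out)                       # str, written as-is
print(repr(out.getvalue()))

# The same raw bytes as two differently configured terminals would read them:
print(repr(b'caf\xc3\xa9'.decode('utf-8')))
print(repr(b'caf\xc3\xa9'.decode('latin-1')))
```

Passing a non-ASCII unicode value without stream_encoding raises UnicodeEncodeError in this sketch, mirroring what happens when Python 2 prints unicode to a stream with no encoding set.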
Answered By - ivan_pozdeev