Issue
The only similar question to this I've found is Django UnicodeDecodeError when using pdb - unfortunately, the solution there does not apply to this case.
Consider the following code, test.py:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# encoding: utf-8


def subtract(ina, inb):
    myresult = ina - inb
    return myresult

def main():
    y2 = 10
    y1 = 7
    # calculate (y₂-y₁)
    print("Calculating difference between y2: {} and y1: {}".format(y2, y1))
    result = subtract(y2, y1)
    print("The result is: {}".format(result))

if __name__ == '__main__':
    main()
Using Python3 from Anaconda3 on Windows 10:
(base) C:\tmp>conda --version
conda 4.7.12
(base) C:\tmp>python --version
Python 3.7.3
... I can run this program without a problem:
(base) C:\tmp>python test.py
Calculating difference between y2: 10 and y1: 7
The result is: 3
However, if I want to debug/step through this program using pdb, it fails as soon as I type b main to set a breakpoint on the main function:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 648, in do_break
lineno = int(arg)
ValueError: invalid literal for int() with base 10: 'main'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 659, in do_break
code = func.__code__
AttributeError: 'str' object has no attribute '__code__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1701, in main
pdb._runscript(mainpyfile)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 1570, in _runscript
self.run(statement)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 585, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "c:\tmp\test.py", line 6, in <module>
def subtract(ina, inb):
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 88, in trace_dispatch
return self.dispatch_line(frame)
File "C:\ProgramData\Anaconda3\lib\bdb.py", line 112, in dispatch_line
self.user_line(frame)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 261, in user_line
self.interaction(frame, None)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 352, in interaction
self._cmdloop()
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 321, in _cmdloop
self.cmdloop()
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 138, in cmdloop
stop = self.onecmd(line)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 418, in onecmd
return cmd.Cmd.onecmd(self, line)
File "C:\ProgramData\Anaconda3\lib\cmd.py", line 217, in onecmd
return func(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 667, in do_break
(ok, filename, ln) = self.lineinfo(arg)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 740, in lineinfo
answer = find_function(item, fname)
File "C:\ProgramData\Anaconda3\lib\pdb.py", line 100, in find_function
for lineno, line in enumerate(fp, start=1):
File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 199: character maps to <undefined>
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> c:\programdata\anaconda3\lib\encodings\cp1252.py(23)decode()
-> return codecs.charmap_decode(input,self.errors,decoding_table)[0]
(Pdb) q
Post mortem debugger finished. The test.py will be restarted
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) q
(base) C:\tmp>
The problem is the comment line: # calculate (y₂-y₁); if it is deleted, then pdb starts fine:
(base) C:\tmp>python -m pdb test.py
> c:\tmp\test.py(6)<module>()
-> def subtract(ina, inb):
(Pdb) b main
Breakpoint 1 at c:\tmp\test.py:10
(Pdb) q
(base) C:\tmp>
I'm slightly surprised by this - wasn't Python3 supposed to be "utf-8 by default"?
Obviously, this is a trivial case where I can easily erase the single comment line that causes the trouble. However, I have a large script with UTF-8 characters all over the place, both in comments and in print statements I'd actually want to step through, and it is not really viable to go in and manually change all those instances to ASCII characters.
So, is there a way to cheat Python3's pdb, so it works even if there are UTF-8 characters present in the source code (regardless of whether they are in comments or in actual commands)?
Solution
Python 3 treats source code as UTF-8 by default, but the environment in which it is operating does not - it has a default encoding of cp1252.
You can set the PYTHONIOENCODING environment variable to UTF-8 to override the default encoding, or change the environment to use UTF-8.
Edit
I analysed this too hastily. The above solutions apply to fixing unicode errors raised when reading or writing from stdin/stdout, but the problem here is that pdb opens a file for reading without specifying an encoding:
def find_function(funcname, filename):
    cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
    try:
        fp = open(filename)
    except OSError:
        return None
If no encoding is specified, according to the io docs Python will default to using the result of locale.getpreferredencoding - presumably cp1252 in this case.
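To make the failure concrete, here is a minimal sketch (not taken from pdb itself) that decodes the bytes of the offending comment with both encodings. The helper name decodes_cleanly is made up for illustration, and the comment text is written with escapes so the snippet stays pure ASCII.

```python
import locale

# UTF-8 bytes of the comment from test.py; \u2082 and \u2081 are the
# subscript two and subscript one characters.
raw = "# calculate (y\u2082-y\u2081)".encode("utf-8")


def decodes_cleanly(data, encoding):
    """Hypothetical helper: True if data can be decoded with encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The encoding open() falls back to when none is given (outside UTF-8 mode).
print("preferred encoding:", locale.getpreferredencoding(False))

# Subscript one ends in the byte 0x81, which cp1252 leaves undefined.
print("utf-8 ok :", decodes_cleanly(raw, "utf-8"))
print("cp1252 ok:", decodes_cleanly(raw, "cp1252"))
```

On a machine where the preferred encoding is cp1252, a bare open(filename) reads the file with the second codec, which is where the 'charmap' error in the traceback comes from.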
One solution might be to set the console locale before running the debugger.
It may also be possible to set the PYTHONUTF8 environment variable to 1 (available from Python 3.7). Amongst other things, this will cause open(), io.open(), and codecs.open() to use the UTF-8 encoding by default.
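As a rough way to check that the variable is actually picked up, the sketch below starts a child interpreter with PYTHONUTF8=1 and reads back sys.flags.utf8_mode. The helper name utf8_mode_in_child is invented for this example, and it assumes Python 3.7 or later.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(extra_env):
    """Hypothetical helper: run a fresh interpreter with extra environment
    variables and return its sys.flags.utf8_mode value as a string."""
    env = dict(os.environ, **extra_env)
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout.decode("ascii").strip()


print("utf8_mode with PYTHONUTF8=1:", utf8_mode_in_child({"PYTHONUTF8": "1"}))
```

In the Windows console from the question, the equivalent would presumably be running set PYTHONUTF8=1 before python -m pdb test.py, or passing the -X utf8 option to the interpreter; I have not tried either on that exact setup.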
Since I originally answered this question, pdb's behaviour has been changed: it now uses the encoding specified in the source file's encoding cookie, if present, falling back to UTF-8.
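The standard library's tokenize module implements this kind of lookup (cookie or BOM first, UTF-8 otherwise). The sketch below only illustrates that lookup; it makes no claim about the exact call pdb uses, and the helper names source_encoding and read_source are made up here.

```python
import io
import os
import tempfile
import tokenize


def source_encoding(source_bytes):
    """Hypothetical helper: the encoding tokenize detects for source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def read_source(path):
    """Hypothetical helper: read a source file using its detected encoding."""
    with tokenize.open(path) as fp:
        return fp.read()


# A cookie is honoured; without one the result is UTF-8.
print(source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
print(source_encoding(b"x = 1\n"))

# A stand-in for test.py, written as UTF-8 bytes to a temporary file.
sample = "# -*- coding: utf-8 -*-\n# calculate (y\u2082-y\u2081)\n"
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
try:
    # Reads back intact whatever the locale's preferred encoding is.
    assert read_source(tmp.name) == sample
finally:
    os.remove(tmp.name)
```

Read this way, the comment with the subscript characters no longer depends on the console's code page.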
Answered By - snakecharmerb