Monday, December 4, 2023

[FIXED] Why I'm getting "UnicodeEncodeError: 'charmap' codec can't encode character '\u25b2' in position 84811: character maps to <undefined>" error?

Tags: beautifulsoup, encoding, python-3.x, web-scraping

Issue

I'm getting "UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>" while running this code:

```
from bs4 import BeautifulSoup
import requests
r = requests.get('https://stackoverflow.com').text
soup = BeautifulSoup(r, 'lxml')
print(soup.prettify())
```

and the output is:

```
Traceback (most recent call last):
  File "c:\Users\Asus\Documents\Hello World\Web Scraping\st.py", line 5, in <module>
    print(soup.prettify())
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined>
```

I'm using Python 3.8.1, and VS Code is set to UTF-8. How can I solve this?

Solution

The full error message contains the hints. Here are the lines that matter most:

```
Traceback ...
  File "...\cp1252.py", ...
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' ...
```

The error is raised by the print call. Somewhere in your text there is a ZERO WIDTH SPACE character (Unicode U+200B). When you print to a Windows console, the string is first encoded into the console code page (cp1252 here, as the cp1252.py frame in the traceback shows), and ZERO WIDTH SPACE has no representation in that code page. By the way, the default Windows console is not very Unicode-friendly.
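The failure can be reproduced without requests or BeautifulSoup. This sketch hard-codes cp1252 as a stand-in for the encoding Python chose for stdout on the asker's machine (an assumption; `sys.stdout.encoding` shows what yours uses), and shows that the same string encodes fine as UTF-8.

```python
import sys

# The encoding print() will use; the asker's traceback suggests cp1252.
print("stdout encoding:", sys.stdout.encoding)

# A string containing ZERO WIDTH SPACE (U+200B), the character from the traceback.
text = "before\u200bafter"

try:
    # cp1252 has no byte for U+200B, so this raises just like print() did.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    # exc.encoding is the codec name shown in the message ("charmap");
    # exc.start is the "position" reported in the message.
    print(exc.encoding, hex(ord(exc.object[exc.start])), exc.start)

# UTF-8 can encode every code point, so this succeeds.
print(text.encode("utf-8"))
```

On CPython the `except` branch prints `charmap 0x200b 6`, matching the shape of the original error message.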

There is little you can do inside a Windows console itself, so I would suggest one of these workarounds:

- Do not print to the console; write to a UTF-8 file instead. You can then read it with a UTF-8-capable text editor such as Notepad++.
- Manually encode the text before printing it, with errors='ignore' or errors='replace'. Offending characters will then be dropped (or replaced) and no error will be raised:
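```
  # Drop characters cp1252 cannot encode, then decode back to str so print() shows text, not a b'...' literal
  print(soup.prettify().encode('cp1252', errors='ignore').decode('cp1252'))
```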
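Both workarounds can be sketched with a hard-coded string standing in for `soup.prettify()`; the string and the output file name are hypothetical. The encode step is followed by `.decode('cp1252')` so that `print()` receives a str; printing the bytes object directly would show a `b'...'` literal with escaped newlines.

```python
import os
import tempfile

# Hypothetical stand-in for soup.prettify(): contains U+25B2 (BLACK UP-POINTING
# TRIANGLE) and U+200B (ZERO WIDTH SPACE), neither of which exists in cp1252.
html = "<p>Vote\u25b2 up\u200b</p>"

# Workaround 1: write to a UTF-8 file instead of the console; nothing is lost.
path = os.path.join(tempfile.gettempdir(), "page.html")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Workaround 2: encode with an error handler, then decode back to str for print().
ignored = html.encode("cp1252", errors="ignore").decode("cp1252")
replaced = html.encode("cp1252", errors="replace").decode("cp1252")
print(ignored)   # characters cp1252 cannot represent are dropped
print(replaced)  # characters cp1252 cannot represent become "?"
```

Here `ignored` is `<p>Vote up</p>` and `replaced` is `<p>Vote? up?</p>`, while the file keeps the original characters for a UTF-8 editor to display.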

Answered By - Serge Ballesta

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
