Issue
Hi, I'm working through the 100 Days of Code boot camp on Udemy. I am currently on the web scraping lesson that uses BeautifulSoup, but I have not been able to complete it because I keep getting an error that I do not understand and cannot fix, even though the code is very simple. Here is my Python code:
from bs4 import BeautifulSoup
with open("website.html") as file:
    html_doc = file.read()
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.title.name)
Here is the error:
Traceback (most recent call last):
  File "C:\Users\xarss\Desktop\100 days of python\Webdev_projects\Websrapingproyect\main.py", line 12, in <module>
    html_doc = file.read()
  File "C:\Users\xarss\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 281: character maps to <undefined>
I have already tried reinstalling the Beautiful Soup package and using other HTML files, but the problem persists. This is website.html:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Angela's Personal Site</title>
</head>
<body>
<h1 id="name">Angela Yu</h1>
<p><em>Founder of <strong><a href="https://www.appbrewery.co/">The App Brewery</a></strong>.</em></p>
<p>I am an iOS and Web Developer. I ❤️ coffee and motorcycles.</p>
<hr>
<h3 class="heading">Books and Teaching</h3>
<ul>
<li>The Complete iOS App Development Bootcamp</li>
<li>The Complete Web Development Bootcamp</li>
<li>100 Days of Code - The Complete Python Bootcamp</li>
</ul>
<hr>
<h3 class="heading">Other Pages</h3>
<a href="https://angelabauer.github.io/cv/hobbies.html">My Hobbies</a>
<a href="https://angelabauer.github.io/cv/contact-me.html">Contact Me</a>
</body>
</html>
Solution
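This is a common error when a file is opened without specifying its encoding. On Windows, open() falls back to the locale encoding, which here is cp1252 (see the traceback), while website.html is presumably saved as UTF-8 (note its <meta charset="utf-8">). The heart character is stored in UTF-8 as the bytes E2 9D A4, and byte 0x9d has no mapping in cp1252, hence the UnicodeDecodeError.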
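The mismatch can be reproduced without any file. The snippet below is only a small sketch: it assumes the heart in the page is U+2764 and that it is the character behind the failing byte, which the traceback does not confirm directly.

```python
# U+2764 (the heart from the page), encoded the way a UTF-8 file stores it.
heart_bytes = "\u2764".encode("utf-8")
print(heart_bytes)  # b'\xe2\x9d\xa4'

error_message = None
try:
    # This is what open() does implicitly when the default encoding is cp1252.
    heart_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)  # mentions byte 0x9d, as in the traceback

# Decoding with the encoding the bytes were written in succeeds.
decoded = heart_bytes.decode("utf-8")
```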
One of the following options may work:
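with open("website.html", errors="ignore") as file:   # drop bytes that cannot be decoded
with open("website.html", errors='replace') as file:  # substitute U+FFFD for bytes that cannot be decoded
with open("website.html", 'rb') as file:              # read raw bytes; BeautifulSoup detects the encoding itself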
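If the file really is UTF-8, a further option not listed above is to name the encoding explicitly, which keeps every character intact instead of dropping or replacing it. The sketch below is an assumption-laden illustration: it writes a hypothetical stand-in for website.html to a temporary directory so that it runs on its own.

```python
import os
import tempfile

# Hypothetical stand-in for website.html: a small page containing the heart (U+2764).
sample_html = (
    "<html><head><title>Angela's Personal Site</title></head>"
    "<body><p>I \u2764 coffee and motorcycles.</p></body></html>"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "website.html")
    with open(path, "w", encoding="utf-8") as file:
        file.write(sample_html)

    # Naming the encoding overrides the platform default (cp1252 in the traceback).
    with open(path, encoding="utf-8") as file:
        html_doc = file.read()

print("\u2764" in html_doc)  # True: the heart survived the round trip
```

In the original script this amounts to changing the call to open("website.html", encoding="utf-8"); html_doc can then be passed to BeautifulSoup(html_doc, 'html.parser') as before.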
Answered By - Goku