Issue
I try to parse the HTML content using the following code:
import requests
from bs4 import BeautifulSoup
url = "https://www.thespruce.com/christmas-village-display-ideas-8407777"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
print(soup.prettify())
But I only get the following response:
<html>
<body>
<p>
Signal - Not Acceptable
</p>
</body>
</html>
Is it possible to get the page content using requests?
(I need to do this without using Selenium.)
Solution
Try adding a user-agent header. The site appears to check which client is making the request, and when no browser-like user agent is sent it answers with a 406 Not Acceptable error instead of the page content.
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
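To see why the header matters, you can inspect the request that requests would send without contacting the site at all. The following is an illustrative sketch, not part of the original answer. It prepares the same GET twice through a `requests.Session`. Without custom headers, the User-Agent is the library default (`python-requests/<version>`), which is presumably what the site rejects. With the dictionary above, the User-Agent becomes the browser string.

```python
import requests

url = "https://www.thespruce.com/christmas-village-display-ideas-8407777"
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}

session = requests.Session()

# Build (but do not send) the request the way requests.get(url) would.
default_request = session.prepare_request(requests.Request("GET", url))

# Same request with the custom headers; these override the session defaults
# (header names are matched case-insensitively).
browser_request = session.prepare_request(requests.Request("GET", url, headers=headers))

print(default_request.headers["User-Agent"])  # starts with "python-requests/"
print(browser_request.headers["User-Agent"])  # the Mozilla/5.0 string above
```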
Example
import requests
from bs4 import BeautifulSoup
url = "https://www.thespruce.com/christmas-village-display-ideas-8407777"
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'lxml')
print(soup.prettify())
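If you fetch pages repeatedly, a slightly more defensive variant may help. This is a sketch under assumptions: the helper name `fetch_html` is hypothetical and does not come from the answer. It sets the header once on a `requests.Session`, adds a timeout, and calls `raise_for_status()`. That way, a 406 raises `requests.HTTPError` instead of letting the error page reach BeautifulSoup unnoticed.

```python
import requests

BROWSER_HEADERS = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}

# One session reuses connections and sends the browser-like header on every request.
session = requests.Session()
session.headers.update(BROWSER_HEADERS)


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the raw page bytes, or raise on an HTTP error."""
    response = session.get(url, timeout=timeout)
    # Turns a 4xx/5xx answer (e.g. 406 Not Acceptable) into requests.HTTPError.
    response.raise_for_status()
    return response.content
```

You can then call `soup = BeautifulSoup(fetch_html(url), 'lxml')` in place of the two lines in the example above.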
Answered By - HedgeHog