Issue
I'm currently running Ubuntu on a VRM and trying to scrape data from an e-commerce website as a test. So far I'm able to load the HTML contents, but I can't access any of the tags. I've checked other similar posts about this problem and tried their suggestions, such as including a header:
from requests import get
from bs4 import BeautifulSoup

url = 'https://shopee.com.my/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}
response = get(url, headers=headers)
html_soup = BeautifulSoup(response.text, 'html.parser')

def findDiv():
    # Print every <div> in the response and its first nested <div>
    for container in html_soup.find_all('div'):
        print(container)
        print(container.div)  # returns None

# findDiv() prints its results and returns nothing, so call it directly
findDiv()
However, it still won't load anything other than two div tags, which are <main> and <modal>.
Solution
For a dynamic page you have to use Selenium, or use a bot user-agent with requests. To inspect the elements the server returns in that case, install a Chrome extension that changes the user-agent, or save the page source.
headers = {'User-Agent': 'Googlebot/2.1 (+http://www.google.com/bot.html)'}
response = get(url, headers=headers)
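The Selenium route mentioned above could look roughly like the sketch below. It is not part of the original answer: the helper names render_with_selenium and count_tags are made up for illustration, and the sketch assumes Selenium 4 with a recent Chrome that accepts the --headless=new argument. The page may also need an explicit wait before all of its content is present, which is left out here.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags so two versions of a page can be compared."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def render_with_selenium(url):
    """Return the page source after a real browser has run the page's JavaScript."""
    # Imported here so count_tags() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumption: recent Chrome, no visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

Comparing count_tags(response.text)['div'] with count_tags(render_with_selenium(url))['div'] gives a quick check of whether the rendered page contains more markup than the raw response. The rendered source can also be written to a file and opened in an editor, or passed to BeautifulSoup in place of response.text.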
Answered By - ewwink