Issue
I am trying to scrape all the offers from the following website with BeautifulSoup.
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import requests
my_url='https://www.promobit.com.br/promocoes/playstation-4/s/'
uclient=uReq(my_url)
page_html=uclient.read()
uclient.close()
page_soup=soup(page_html, "html.parser")
containers=page_soup.findAll("div",{"class":"in-size"})
However, when I check the length of my list, it returns 3 instead of the 96 I was expecting.
print(len(containers))
>>> 3
For some reason, when I print the text in "containers", I get the data for some of the offers labeled "de graca".
I've tried different parsers, but the result is still wrong.
Solution
The selector you used to find the elements is wrong. If you press Ctrl+F in the Elements tab after inspecting an element on that page, you will find that only 3 nodes have the class "in-size". Hence the length of 3.
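The same Ctrl+F check can be done from Python by counting how many elements carry each class. The following is only a rough sketch: SAMPLE_HTML is made-up markup standing in for the real page, and count_classes is a hypothetical helper name, not something from the original code.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption for illustration).
SAMPLE_HTML = """
<div id="offers">
  <div class="pr-tl-card"><div class="in-size">de graca</div></div>
  <div class="pr-tl-card"><div class="title">Offer B</div></div>
  <div class="pr-tl-card"><div class="title">Offer C</div></div>
</div>
"""


def count_classes(html):
    """Return a Counter mapping each CSS class to the number of tags that use it."""
    parsed = BeautifulSoup(html, "html.parser")
    counts = Counter()
    # class_=True matches every tag that has a class attribute at all.
    for tag in parsed.find_all(class_=True):
        counts.update(tag.get("class", []))
    return counts


class_counts = count_classes(SAMPLE_HTML)
print(class_counts.most_common())
```

A class that appears far fewer times than the number of items you expect is usually not the one that marks each item.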
I have modified your code a bit to use the right selectors. It finds the container with the id "offers", which is the parent container of all the offers. If you look at the HTML structure, you will notice that each card has the class "pr-tl-card".
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url='https://www.promobit.com.br/promocoes/playstation-4/s/'
# Download the page HTML
uclient=uReq(my_url)
page_html=uclient.read()
uclient.close()
page_soup=soup(page_html, "html.parser")
# Select every offer card inside the container with the id "offers"
offers=page_soup.select("#offers div.pr-tl-card")
print(len(offers))
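To see why the two selectors give different counts without hitting the live site, here is a minimal offline sketch. SAMPLE_HTML is invented markup (an assumption, not the real page structure), so the numbers only illustrate the behavior of find_all and select.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three offer cards inside #offers, only one of which
# contains an "in-size" element, plus one card outside the container.
SAMPLE_HTML = """
<div id="offers">
  <div class="pr-tl-card"><div class="in-size">Offer A (de graca)</div></div>
  <div class="pr-tl-card"><div class="title">Offer B</div></div>
  <div class="pr-tl-card"><div class="title">Offer C</div></div>
</div>
<div class="pr-tl-card">Card outside #offers</div>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Original approach: matches only the divs that happen to carry "in-size".
by_class = page_soup.find_all("div", {"class": "in-size"})

# Corrected approach: every div.pr-tl-card that is a descendant of #offers.
by_selector = page_soup.select("#offers div.pr-tl-card")

# Pull the visible text out of each selected card.
titles = [card.get_text(strip=True) for card in by_selector]

print(len(by_class), len(by_selector))
print(titles)
```

The descendant selector also skips the card outside "#offers", which is why scoping to the parent container is safer than matching the card class alone.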
Answered By - mithushancj