Issue
I am trying to write a python program that will count the words on a web page. I use Beautiful Soup 4 to scrape the page but I have difficulties accessing nested HTML tags (for example: <p class="hello">
inside <div>
).
Every time I try finding such tag using page.findAll()
(page is Beautiful Soup object containing the whole page) method it simply doesn't find any, although there are. Is there any simple method or another way to do it?
Solution
Maybe I'm guessing what you are trying to do is first looking in a specific div tag and the search all p tags in it and count them or do whatever you want. For example:
soup = bs4.BeautifulSoup(content, 'html.parser')
# This will get the div
div_container = soup.find('div', class_='some_class')
# Then search in that div_container for all p tags with class "hello"
for ptag in div_container.find_all('p', class_='hello'):
# prints the p tag content
print(ptag.text)
Hope that helps
Answered By - Mario Kirov
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.