Issue
In my code I need to get only the main text of a page, not the header or footer data. I would also like to filter out any HTML/CSS/JS code that comes back with the response. How would I do this? I have tried making a request with requests, parsing the data with Beautiful Soup, and then printing the body content. The problem is that this also picks up the header and footer contents. Thanks in advance for any responses!
Solution
Use the browser developer tools (usually F12) to find out which element contains the content you are looking for. Content other than headers and footers will usually be in <section> or <article> elements.
You can then use something like soup.article.get_text() to retrieve the text from the containing element.
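As a rough sketch of that approach, and not the only way to do it: the helper below parses the HTML, removes script/style and header/footer/nav elements, and then reads the text from the first container it finds. The function name extract_main_text and the fallback order (article, then main, then section, then body) are assumptions for illustration; check in the developer tools which element your target page actually uses.

```python
from bs4 import BeautifulSoup


def extract_main_text(html: str) -> str:
    """Return the visible text of the main content container (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements whose text we never want: JS, CSS, and page chrome.
    for tag in soup(["script", "style", "noscript", "header", "footer", "nav"]):
        tag.extract()

    # Assumed fallback order; adjust to whatever element holds the content.
    container = (
        soup.find("article")
        or soup.find("main")
        or soup.find("section")
        or soup.body
        or soup
    )

    # get_text() drops the markup; strip=True trims whitespace around each piece.
    return container.get_text(separator="\n", strip=True)


sample = (
    "<html><body><header>Site title</header>"
    "<article><h1>Title</h1><p>Main text.</p><script>var x = 1;</script></article>"
    "<footer>Copyright</footer></body></html>"
)
print(extract_main_text(sample))
```

With a live page you would pass in the response body, for example extract_main_text(requests.get(url).text), where url is the page you are scraping.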
Answered By - typewriter