Issue
I need to remove the tags and leave only the text in the below codes output using python and beautifulsoup.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content)
print(soup.prettify())
first_header = soup.find(["h2", "h2"])
first_headers = soup.find_all(["h2", "h2"])
first_headers
Solution
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content,features="html.parser") # getting content from webpage
# retriving all h1 and h2 tags and extracting text from each of them
first_headers = [html.text for html in soup.find_all(["h1", "h2"])]
print(first_headers)
I used list comprehension to solve it in a single line you can use a for loop instead which goes as
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content,features="html.parser")
first_headers = soup.find_all(["h1", "h2"])
for i in first_headers:
print(i.text)
This is the output of my code:
Tutorials
References
Exercises and Quizzes
HTML Tutorial
HTML Forms
HTML Graphics
HTML Media
HTML APIs
HTML Examples
HTML References
HTML Introduction
What is HTML?
A Simple HTML Document
What is an HTML Element?
Web Browsers
HTML Page Structure
HTML History
Report Error
Thank You For Helping Us!
Answered By - Sukka Rishivarun Goud
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.