Issue
I'm new to Python web scraping and BeautifulSoup.
I'd like to format the output so that, when it prints the heading tags, each one is indented according to its level:
H1 text
    H2 text
        H3 text
    H2 text
...
etc.
from bs4 import BeautifulSoup
import requests
website = requests.get(url)  # url: address of the page to scrape (not given in the original post)
soup = BeautifulSoup(website.content, 'html.parser')
tags = soup.find_all(['h1', 'h2'])
for soups in tags:
    print(soups.string)
Your help is much appreciated.
Solution
You can define a dictionary of indents/prefixes, keyed by tag name:
preString = {'h1': '', 'h2': '\t', 'h3': '\t\t', 'h4': '\t\t\t'}
Then you can just loop over the tags and print each one with its prefix:
tags = soup.find_all(list(preString))
for soups in [t for t in tags if t.string]:
    print(preString[soups.name] + soups.string)
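As a variation on the dictionary, the indent could also be derived from the heading level instead of being written out by hand. This is only a sketch: indent_for is a hypothetical helper name (not part of BeautifulSoup), and it assumes the tag names are h1 through h6, with one tab per level below h1.

```python
import re


def indent_for(tag_name, unit="\t"):
    """Return the indent for a heading tag name such as 'h1'..'h6'.

    Hypothetical helper: h1 gets no indent, h2 one unit, and so on.
    Any other tag name falls back to an empty indent.
    """
    match = re.fullmatch(r"h([1-6])", tag_name)
    if match is None:
        return ""
    return unit * (int(match.group(1)) - 1)


# Build the same kind of mapping as preString, for all six heading levels.
preString = {f"h{level}": indent_for(f"h{level}") for level in range(1, 7)}
```

The resulting preString can be used in the loop exactly like the hand-written dictionary.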
I filtered with if t.string in the list comprehension in case some headings have tags inside rather than just text (in which case .string may be None). Using .text gets you the full text regardless of child tags; if you want that, and you want your find_all to be independent of preString, you can instead do:
tags = soup.find_all(['h1', 'h2'])
for soups in tags:
    preStr = preString[soups.name] if soups.name in preString else ''
    # .text (not .string) so headings with child tags still print their full text
    print(preStr + soups.text)
(You can add a default indent/prefix after the else when defining preStr.)
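To see the difference between .string and .text in practice, here is a small self-contained sketch. The HTML snippet is made up for illustration (it is not from the question), and it is parsed from a string so that no network request is needed. One of the h2 headings mixes plain text with a child tag, which is the case where .string returns None.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the second <h2> contains a child tag.
html = """
<h1>Title</h1>
<h2>Plain section</h2>
<h2>Section with <em>emphasis</em></h2>
<h3>Subsection</h3>
"""

preString = {'h1': '', 'h2': '\t', 'h3': '\t\t', 'h4': '\t\t\t'}
soup = BeautifulSoup(html, 'html.parser')

lines = []
for tag in soup.find_all(list(preString)):
    # tag.string is None for the mixed-content <h2>, so use get_text(),
    # which concatenates the text of all descendants.
    lines.append(preString.get(tag.name, '') + tag.get_text().strip())

print('\n'.join(lines))
```

Here preString.get(tag.name, '') plays the same role as the conditional expression with else: it falls back to an empty prefix for tag names that are missing from the dictionary.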
Answered By - Driftr95