Sunday, January 21, 2024

[FIXED] BeautifulSoup to get 'span' contents next to each other

Labels: beautifulsoup, parsing, python, web-scraping

Issue

Part of the HTML looks like the snippet below. I want to extract the contents of the 'span' tags:

from bs4 import BeautifulSoup
data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
"""
soup = BeautifulSoup(data, "html.parser")

classification = soup.find_all('section')[0].find_all('span')

for c in classification:
    print(c.text)

This prints:

J36
John
B56
Bratt
K3
Kate

But the output I want is:

J36-John
B56-Bratt
K3-Kate

What's the proper BeautifulSoup way to extract the contents, other than the workaround below?

contents = [c.text for c in classification]

codes = contents[0::2]  # even-indexed spans: J36, B56, K3
names = contents[1::2]  # odd-indexed spans: John, Bratt, Kate

for pair in zip(codes, names):
    print('-'.join(pair))

Solution

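You could look at each span's next sibling node. For the first span in each 'li', the next sibling is the dash text, so the span's text and the dash are printed together without a trailing newline. The second span has no next sibling, so only its text is printed, which ends the line. Note that this prints the dash character exactly as it appears in the HTML rather than a plain hyphen.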

for c in classification:
    if c.next_sibling:
        print(c.text + str(c.next_sibling), end='')
    else:
        print(c.text)
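
As an alternative sketch (not part of the answer above), the spans could also be grouped by their parent 'li' and joined with whatever separator is wanted, which gives the plain hyphen from the question. The helper name span_pairs and its sep parameter are made up for illustration, and the code assumes each pair of spans sits directly inside its own 'li', as in the sample HTML.

```python
from bs4 import BeautifulSoup

data = """
<section><h2>Team</h2><ul><li><ul><li><span>J36</span>—<span>John</span></li><li><span>B56</span>—<span>Bratt</span></li><li><span>K3</span>—<span>Kate</span></li></ul></li></ul></section>
"""
soup = BeautifulSoup(data, "html.parser")


def span_pairs(soup, sep="-"):
    """Hypothetical helper: join the spans that are direct children of each <li>."""
    rows = []
    for li in soup.find_all("li"):
        # recursive=False limits the search to direct children, so the
        # outer <li> (which only wraps the nested <ul>) yields no spans.
        spans = li.find_all("span", recursive=False)
        if spans:
            rows.append(sep.join(s.get_text(strip=True) for s in spans))
    return rows


for row in span_pairs(soup):
    print(row)
```

Because the separator is supplied by the caller instead of being read from the document, this version does not depend on which dash character the page happens to use between the spans.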

Answered By - alec

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
