Sunday, March 20, 2022

[FIXED] How to find a line of text inside multiple div classes python

Tags: beautifulsoup, python, web-scraping

Issue

Hello everyone. I'm trying to pull specific pieces of text from a website. Not all of the text is needed, and I'm confused about how to do this when the text is nested inside multiple divs with several rows inside them. I need to pull the "Number" title and its text (837270), and the "Location" title and its text (Ohio). Here is the HTML I'm looking at:

                   <br>
                <br>
              </p>
            </div>
          </div>
          <div class="row">
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Number</span>
                  <br>
                  "837270"
                </p>
            </div>
            <div class="col-md-4">
              <p>
                <span class="text-muted">Location</span>
                <br>
                "Ohio"
              </p>
            </div>
              <div class="col-md-4">
                <p>
                  <span class="text-muted">Office</span>
                <be>
                   "Joanna" 
                </p>
              </div>
          </div>
          <div class="row">
            <div class="col-md-4">
              <p>
                <span class="text-muted">Date</span>
              <be>
                "07/01/2022"
              </p>
            </div>
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Type</span>
                <br>
                  "Business"
                </p>
            </div>
            <div class="col-md-4">
                <p>
                  <span class="text-muted">Status</span>
                  <br>
                  "Open"
                </p>
            </div>
          </div>
        </div>
      </div>

    </div>

I've tried the following, but it doesn't find anything (I get None):

soup = BeautifulSoup(driver.page_source,'html.parser')  
df = soup.find('div', id = "Location")
print(df.string)

I want to pull these values and save them. Any help would be appreciated. Thank you.

Solution

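Your attempt finds nothing because find('div', id="Location") looks for a div whose id attribute is "Location", and no such element exists in this markup; "Location" is the text of a span. HTML often lacks IDs or other patterns that are easy to follow, but you can still get at the data by looking at the structure, and you don't have to rely on pages using table structures.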

In this case, for example, it appears each section is titled by a <span class="text-muted"> tag and its value is the last sibling of that span tag.
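
To make the "last sibling" observation concrete, here is a minimal sketch that lists everything following one of these span tags. The fragment is a hypothetical excerpt modeled on the question's markup, and it uses the standard-library 'html.parser' backend only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical fragment modeled on the markup in the question.
html = """
<div class="col-md-4">
  <p>
    <span class="text-muted">Location</span>
    <br>
    "Ohio"
  </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="text-muted")

# next_siblings yields every node after the span inside the same <p>:
# whitespace-only strings, the <br> tag, and finally the value text.
siblings = list(span.next_siblings)
for node in siblings:
    kind = "tag" if isinstance(node, Tag) else "string"
    print(kind, repr(str(node)))

# The value is the last node; here it is a string, so strip() is enough.
last_value = siblings[-1].strip()
print(last_value)
```

The whitespace between tags shows up as separate string nodes, which is why taking the last sibling is more dependable here than taking the first one.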

To scrape each of these titles and their values, we can do something like this:

import bs4
from bs4 import BeautifulSoup
soup = BeautifulSoup(..., 'lxml')

for title_tag in soup.find_all('span', class_='text-muted'):

    # get the last sibling
    *_, value_tag = title_tag.next_siblings

    title = title_tag.text.strip()

    if isinstance(value_tag, bs4.element.Tag):
        value = value_tag.text.strip()
    else:  # it's a navigable string element
        value = value_tag.strip()

    print(title, value)

Output:

Number "837270"
Location "Ohio"
Office "Joanna"
Date "07/01/2022"
Type "Business"
Status "Open"

There are of course other patterns you could identify here to reliably get the values. This is just one example.
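
As one such alternative, the sketch below reads each title's enclosing <p> and takes the last non-empty string inside it. The helper name extract_fields and the sample fragment are made up for illustration, and stripping the surrounding double quotes assumes they are literally part of the page text, as they appear in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the question's markup, including the
# unusual <be> tag that appears in one of the cells.
html = """
<div class="row">
  <div class="col-md-4">
    <p>
      <span class="text-muted">Number</span>
      <br>
      "837270"
    </p>
  </div>
  <div class="col-md-4">
    <p>
      <span class="text-muted">Location</span>
      <br>
      "Ohio"
    </p>
  </div>
  <div class="col-md-4">
    <p>
      <span class="text-muted">Office</span>
      <be>
      "Joanna"
    </p>
  </div>
</div>
"""


def extract_fields(soup):
    """Map each text-muted title to the last non-empty string in its <p>."""
    fields = {}
    for title_tag in soup.find_all("span", class_="text-muted"):
        paragraph = title_tag.find_parent("p")
        if paragraph is None:
            continue  # title is not inside a <p>; skip it
        # stripped_strings walks all descendants, so it also picks up text
        # that ended up nested inside an unknown tag such as <be>.
        strings = list(paragraph.stripped_strings)
        if len(strings) < 2:
            continue  # a title without a value
        title = title_tag.get_text(strip=True)
        fields[title] = strings[-1].strip('"')
    return fields


soup = BeautifulSoup(html, "html.parser")
fields = extract_fields(soup)
print(fields["Number"], fields["Location"])
```

Collecting the results in a dict makes it easy to keep only the fields you need, such as Number and Location, and ignore the rest.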

If you wanted to get just the Location, you could locate it by its text.

location_tag = soup.find('span', class_='text-muted', string='Location')  # string= is the newer name for text=

Getting its value then works the same way as above.

*_, location_value_element = location_tag.next_siblings
print(location_value_element.strip()) # "Ohio"
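
Since the question also asks about saving the values, here is a hedged sketch that wraps the lookup in a helper and writes the result as CSV. The helper name value_for, the sample fragment, and the choice of CSV are assumptions made for illustration; the output goes to an in-memory buffer, and you would swap in a real file opened with newline='' to save it to disk.

```python
import csv
import io

from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment modeled on the question's markup.
html = """
<p><span class="text-muted">Number</span><br>"837270"</p>
<p><span class="text-muted">Location</span><br>"Ohio"</p>
"""


def value_for(soup, title):
    """Return the text after the span titled `title`, or None if not found."""
    title_tag = soup.find("span", class_="text-muted", string=title)
    if title_tag is None:
        return None  # avoids calling methods on None when a title is missing
    siblings = list(title_tag.next_siblings)
    if not siblings:
        return None
    last = siblings[-1]
    text = last.get_text() if isinstance(last, Tag) else str(last)
    # Assumes the double quotes are literal page text and not wanted.
    return text.strip().strip('"')


soup = BeautifulSoup(html, "html.parser")
row = {name: value_for(soup, name) for name in ("Number", "Location")}

# Write a header line and one data row into an in-memory CSV buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Returning None for a missing title keeps the failure explicit, so the caller can decide whether to skip the field or stop.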

Answered By - sytech

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
