Issue
THE PROBLEM
I am trying to collect the text contents of a div tag at two places on this website: https://doffin.no/Notice/Details/2022-311143. The content I want to scrape sits in a different location on every variation of this page (e.g., if I open a similar page for another contract), and there are numerous elements with the same tag (i.e., "div" with class "eps-text"). However, the following strings always appear in front of the information I am looking for:
- "Navn og Adresser" (h3)
- "Den valgte leverandørens navn og adresse" (h3)
WHAT I TRIED
I tried to use findNext after searching for the text strings "Navn og Adresser" and "Den valgte leverandørens navn og adresse", but it yields no result. I believe this is because I am searching for text instead of a tag.
MY CODE
import requests
from bs4 import BeautifulSoup

new_url = "https://doffin.no/Notice/Details/2022-311143"  # the notice page from the question
ua = {"User-Agent": "Mozilla/5.0"}
page = requests.get(new_url, headers=ua)
soup = BeautifulSoup(page.text, 'lxml')
a = soup.find("h3", text="Navn og adresser")
print(a.findNext('div', {'class': 'eps-text'}))
b = soup.find("h3", text="Den valgte leverandørens navn og adresse")
print(b.findNext('div', {'class': "eps-text"}))
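A likely reason the lookup in the code above finds nothing: when find() gets both a tag name and text= (or its newer spelling string=), Beautiful Soup compares the argument with the tag's .string, and the comparison is exact. Whitespace around the heading text, nested tags inside the h3, or a capitalisation difference (the list above says "Adresser", the code says "adresser") is enough to make find() return None. The sketch below matches on the heading's normalised visible text instead. It runs against a small inline sample rather than the live page: the markup is hypothetical, loosely modelled on the question, and normalise and text_after_heading are made-up helper names.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the question; the real page may differ.
SAMPLE_HTML = """
<div class="eps-section">
  <h3>
    Navn og adresser
  </h3>
  <div class="eps-text">Sarpsborg kommune</div>
  <div class="eps-text">Some other field</div>
</div>
<div class="eps-section">
  <h3><span>Den valgte leverandørens navn og adresse</span></h3>
  <div class="eps-text">Damslet Skilt AS</div>
</div>
"""


def normalise(text):
    """Collapse runs of whitespace and lower-case the text, so small
    formatting differences in a heading do not break the match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def text_after_heading(soup, heading):
    """Return the text of the first div.eps-text that follows the <h3>
    whose visible text matches `heading`, or None if either is missing."""
    wanted = normalise(heading)
    # find() accepts a function: it is called with each tag in turn.
    h3 = soup.find(lambda tag: tag.name == "h3" and normalise(tag.get_text()) == wanted)
    if h3 is None:
        return None
    div = h3.find_next("div", class_="eps-text")
    return div.get_text(strip=True) if div is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The exact-string lookup fails here because .string keeps the surrounding whitespace.
print(soup.find("h3", string="Navn og adresser"))  # None

print(text_after_heading(soup, "Navn og Adresser"))  # Sarpsborg kommune
print(text_after_heading(soup, "Den valgte leverandørens navn og adresse"))  # Damslet Skilt AS
```

Because the heading is the anchor, the target div can move around the page as long as it still comes after its heading.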
EXAMPLE OUTPUT
To illustrate exactly what I am looking for: in this case, I want "Sarpsborg kommune" and "Damslet Skilt AS".
FINAL PLEA
As you probably understand, I am very inexperienced with Python, so any help is super appreciated.
Solution
Try this:
import requests
from bs4 import BeautifulSoup

your_text = (
    BeautifulSoup(
        requests.get(
            "https://doffin.no/Notice/Details/2022-311143"
        ).text,
        "html.parser"
    ).select_one(".eps-sub-section-body > div").getText()  # select_one returns only the first match
)
print(your_text)
Output:
Sarpsborg kommune
EDIT:
You've asked for Damslet Skilt AS, which wasn't in the original version of the question, but here it is:
import requests
from bs4 import BeautifulSoup

# Position-based selector: it depends on the layout of this particular notice.
damslet_css = "div.eps-section:nth-child(5) > div:nth-child(2) > div:nth-child(7) > div:nth-child(1)"
find_damslet = (
    BeautifulSoup(
        requests.get(
            "https://doffin.no/Notice/Details/2022-311143"
        ).text,
        "html.parser"
    ).select_one(damslet_css).getText()
)
print(find_damslet)
Output:
Damslet Skilt AS
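The nth-child selector above encodes where the element sits on this one page, while the question notes that the location changes from one notice to the next. The sketch below contrasts a position-based selector with a heading-anchored lookup on two hypothetical layouts (not real doffin.no markup; PAGE_A, PAGE_B, by_position and by_heading are illustrative names). It assumes soupsieve 2.1 or later, the CSS engine behind select_one, which provides the :-soup-contains() pseudo-class.

```python
from bs4 import BeautifulSoup

# Two hypothetical notices. In PAGE_B an extra section is present, so the
# supplier section moves from the second to the third position.
PAGE_A = """
<div class="notice">
  <div class="eps-section"><h3>Navn og adresser</h3>
    <div class="eps-text">Sarpsborg kommune</div></div>
  <div class="eps-section"><h3>Den valgte leverandørens navn og adresse</h3>
    <div class="eps-text">Damslet Skilt AS</div></div>
</div>
"""
PAGE_B = """
<div class="notice">
  <div class="eps-section"><h3>Navn og adresser</h3>
    <div class="eps-text">Sarpsborg kommune</div></div>
  <div class="eps-section"><h3>Extra section</h3>
    <div class="eps-text">Unrelated text</div></div>
  <div class="eps-section"><h3>Den valgte leverandørens navn og adresse</h3>
    <div class="eps-text">Damslet Skilt AS</div></div>
</div>
"""

# Position-based: "the second section", in the spirit of the nth-child selector above.
POSITIONAL = "div.notice > div.eps-section:nth-child(2) > div.eps-text"
# Heading-based: :-soup-contains() matches an h3 whose text contains the string.
HEADING = 'h3:-soup-contains("Den valgte leverandørens navn og adresse")'


def by_position(html):
    return BeautifulSoup(html, "html.parser").select_one(POSITIONAL).get_text(strip=True)


def by_heading(html):
    h3 = BeautifulSoup(html, "html.parser").select_one(HEADING)
    return h3.find_next("div", class_="eps-text").get_text(strip=True)


for page in (PAGE_A, PAGE_B):
    print(by_position(page), "|", by_heading(page))
# Damslet Skilt AS | Damslet Skilt AS
# Unrelated text | Damslet Skilt AS
```

On the real site the heading text and class names would need checking against the actual HTML before relying on either approach.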
Answered By - baduker