Issue
from bs4 import BeautifulSoup
import re
# Triple quotes let the string span several lines and contain double quotes
text = """<tr>
<td style="width:127.5pt;padding:3.75pt 0in 3.75pt 0in" width="170">
<p class="MsoNormal"><span style="font-size:11.0pt">Job #<o:p></o:p></span></p>
</td>
<td style="padding:3.75pt 0in 3.75pt 3.75pt">
<p class="MsoNormal"><strong><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>TEST-12311</span></strong><span style="font-size:11.0pt"><o:p></o:p></span></p>
</td>
</tr>"""
soup = BeautifulSoup(text, "html.parser")
print(soup)
job_number = soup.find("span", string="Job #")
print(job_number)
When I search for Job #, the result is None, even though there is a <span> with the text Job #.
Is there a way to find the <span> with this text and then get the <td> that follows its cell?
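The None comes from how find() treats string= when it is combined with a tag name: Beautiful Soup compares the value against each tag's .string, and .string is None for a tag with more than one child. Here the <span> holds both the text Job # and an empty <o:p> tag. A minimal sketch of this, using a trimmed copy of the question's markup (inline styles dropped, an assumption for brevity), plus two find()-based workarounds:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (inline styles removed for brevity)
html = """<tr>
<td width="170"><p class="MsoNormal"><span>Job #<o:p></o:p></span></p></td>
<td><p class="MsoNormal"><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")
first_span = soup.find("span")

# The <span> holds two children, the text "Job #" and an empty <o:p> tag,
# so its .string is None and find("span", string="Job #") has nothing to match.
print(list(first_span.children))
print(first_span.string)

# Option 1: search for the text node itself, then step up to the enclosing tag.
text_node = soup.find(string="Job #")
job_span = text_node.parent if text_node is not None else None

# Option 2: test each tag's full text (descendants included) with a filter function.
job_span_alt = soup.find(lambda tag: tag.name == "span" and "Job #" in tag.get_text())

print(job_span)
print(job_span is job_span_alt)
```

Option 1 needs an exact match on the text node; option 2 tolerates extra text around the label.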
Solution
I have to check whether Job # is present in the HTML content.
You could use CSS selectors with the :-soup-contains() pseudo-class to check whether an element contains a string:
soup.select_one('span:-soup-contains("Job #")')
or, to also check that it is directly followed by a sibling <td>:
soup.select_one('td:-soup-contains("Job #"):has(+ td)')
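A sketch of both checks, assuming a trimmed copy of the question's markup and a Soup Sieve version recent enough to support :-soup-contains() (Soup Sieve is the library behind select() and select_one()). The variable html_no_sibling and the label "Invoice #" are hypothetical, added only to show the cases where nothing matches:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (inline styles removed for brevity)
html = """<tr>
<td width="170"><p class="MsoNormal"><span>Job #<o:p></o:p></span></p></td>
<td><p class="MsoNormal"><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""
# Hypothetical variant in which the label cell is the last cell of its row
html_no_sibling = "<tr><td>Other</td><td><span>Job #</span></td></tr>"

soup = BeautifulSoup(html, "html.parser")

# :-soup-contains() is a Soup Sieve extension that matches elements whose text,
# including the text of their descendants, contains the given substring.
job_span = soup.select_one('span:-soup-contains("Job #")')

# select_one() returns None when nothing matches, so it doubles as a presence test.
has_job_number = job_span is not None
absent = soup.select_one('span:-soup-contains("Invoice #")')

# :has(+ td) keeps only cells whose next sibling element is a <td>.
with_sibling = 'td:-soup-contains("Job #"):has(+ td)'
label_cell = soup.select_one(with_sibling)
no_label_cell = BeautifulSoup(html_no_sibling, "html.parser").select_one(with_sibling)

print(has_job_number, job_span)
print(absent)
print(label_cell is not None, no_label_cell)
```

Sibling combinators skip the whitespace text nodes between cells, so the newline between the two <td> elements does not stop :has(+ td) from matching.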
Going the other way, you can combine both to select the <td> that directly follows a <td> containing a <span> with your string:
soup.select_one('td:has(span:-soup-contains("Job #")) + td').get_text(strip=True)
or, less strictly (any <td> containing the string, with or without a <span>):
soup.select_one('td:-soup-contains("Job #") + td').get_text(strip=True)
Both of the above return TEST-12311, provided your string is found in the immediately preceding sibling <td>.
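If the label is missing, select_one() returns None and calling .get_text() on it raises AttributeError. A minimal sketch that guards against this, again on a trimmed copy of the question's markup; value_after_label is a hypothetical helper name, and it assumes the label contains no double quotes because the label is pasted into the selector string:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (inline styles removed for brevity)
html = """<tr>
<td width="170"><p class="MsoNormal"><span>Job #<o:p></o:p></span></p></td>
<td><p class="MsoNormal"><strong><span>TEST-12311</span></strong><span><o:p></o:p></span></p></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# Strict: the value cell must follow a <td> whose <span> contains the label.
strict = soup.select_one('td:has(span:-soup-contains("Job #")) + td')
# Less strict: the value cell must follow any <td> containing the label.
loose = soup.select_one('td:-soup-contains("Job #") + td')

# Guard against None before extracting the text.
strict_value = strict.get_text(strip=True) if strict is not None else None
loose_value = loose.get_text(strip=True) if loose is not None else None
print(strict_value, loose_value)


def value_after_label(soup, label):
    """Hypothetical helper: text of the <td> right after the cell containing label.

    Assumes label has no double quotes, since it is inserted into the selector.
    """
    cell = soup.select_one(f'td:-soup-contains("{label}") + td')
    return cell.get_text(strip=True) if cell is not None else None


print(value_after_label(soup, "Job #"))
print(value_after_label(soup, "Invoice #"))
```

Returning None for a missing label keeps the presence check and the extraction in one call.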
Answered By - HedgeHog