Issue
I have an HTML template like the one below:
<tr>
<td width="45">
<p style="text-align: center;"><strong>STT</strong></p>
</td>
<td width="204">
<p style="text-align: center;"><strong>Tên bệnh viện</strong></p>
</td>
<td width="364">
<p style="text-align: center;"><strong>Địa chỉ</strong></p>
</td>
</tr>,
<tr>
<td width="45"><strong> </strong>
<p><strong>1</strong></p>
</td>
<td width="204"><strong> </strong>
<h3><span id="list hospital"><strong> ABC HOSPITAL</strong></span></h3>
</td>
<td width="364">
<img alt="abc hospital" class="aligncenter size-full wp-image-5549" height="470" sizes="(max-width: 705px) 100vw, 705px" src="https://suckhoe2t.net/wp-content/uploads/2017/11/benh-vien-an-binh-suckhoe2t.jpg" srcset="https://suckhoe2t.net/wp-content/uploads/2017/11/benh-vien-an-binh-suckhoe2t.jpg 705w, https://suckhoe2t.net/wp-content/uploads/2017/11/benh-vien-an-binh-suckhoe2t-696x464.jpg 696w, https://suckhoe2t.net/wp-content/uploads/2017/11/benh-vien-an-binh-suckhoe2t-630x420.jpg 630w" width="705"/>
<p><iframe allowfullscreen="allowfullscreen" frameborder="0" height="450" src="https://www.google.com/maps/embed?pb=!1m18!221m12!1m3!1d3919.743463626014!2d106.66938211450157!3d10.75424379233658!2m3!1f0!2f0!3f0!3m2!1i1024!2i768!4f13.1!3m3!1m2!1s0x31752efc4039dee3%3A0x9157c2008d49be79!2sAn+Binh+Hospital!5e0!3m2!1sen!2s!4v1557202875759!5m2!1sen!2s" style="border: 0;" width="600"></iframe></p>
<ul>
<li>address: 1345 Golden View , LA</li>
<li>phonenumber: 3923 4260</li>
<li>Email: [email protected]</li>
</ul>
<ul>
<li>Website: <a href="xxxxxx" rel="noopener" target="_blank">xxxxxxxxx</a></li>
I would like to have an output like this:
ABC HOSPITAL
address: 1345 Golden View , LA
phonenumber: 3923 4260
Email: [email protected]
Because there are many li tags, I don't know how to extract exactly the fields I want. Could you please help with this?
My code is below:
from bs4 import BeautifulSoup

res = '''html code above'''
soup = BeautifulSoup(res, 'html.parser')
data = soup.find_all('tr')
for temp in data:
    each = temp.find('h3')
    print(each)
Output I got:
None
<h3><span id="list hospital"><strong> ABC HOSPITAL</strong></span></h3>
Solution
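This should work. It finds the span holding the hospital name in each row, then prints only the li items whose text starts with one of the accepted labels.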
soup = BeautifulSoup(res, 'html.parser')
data = soup.find_all('tr')
accepted_li = ('address', 'phonenumber', 'email')  # tuple of "li" labels you want to keep
for tr in data:
    hospital_span = tr.find('span', {'id': 'list hospital'})  # span holding the hospital name
    if hospital_span is not None:  # the header row has no such span
        print(hospital_span.find('strong').text.strip())
    for li in tr.find_all('li'):  # iterate over every li in the row
        if li.text.lower().startswith(accepted_li):  # str.startswith accepts a tuple of prefixes
            print(li.text)
Answered By - darthbane426
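As a follow-up sketch (not part of the original answer), the same idea can be wrapped in a function that returns the data instead of printing it. The names SAMPLE_HTML, WANTED_LABELS, and extract_hospitals are hypothetical, and the sample markup is a trimmed-down stand-in for the question's HTML, so treat this as one possible arrangement under those assumptions.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample modeled on the question's markup (hypothetical test data).
SAMPLE_HTML = """
<table>
<tr><td><p><strong>STT</strong></p></td></tr>
<tr>
<td><h3><span id="list hospital"><strong> ABC HOSPITAL</strong></span></h3></td>
<td>
<ul>
<li>address: 1345 Golden View , LA</li>
<li>phonenumber: 3923 4260</li>
</ul>
<ul>
<li>Website: <a href="xxxxxx">xxxxxxxxx</a></li>
</ul>
</td>
</tr>
</table>
"""

# Lowercase prefixes of the li items to keep (assumed from the desired output).
WANTED_LABELS = ("address", "phonenumber", "email")


def extract_hospitals(html):
    """Return one dict per table row that contains a hospital name."""
    soup = BeautifulSoup(html, "html.parser")
    hospitals = []
    for tr in soup.find_all("tr"):
        name_span = tr.find("span", {"id": "list hospital"})
        if name_span is None:
            continue  # header row: no hospital name, so skip it
        fields = []
        for li in tr.find_all("li"):
            text = li.get_text(strip=True)
            # str.startswith accepts a tuple and matches any of its prefixes.
            if text.lower().startswith(WANTED_LABELS):
                fields.append(text)
        hospitals.append({"name": name_span.get_text(strip=True), "fields": fields})
    return hospitals


for hospital in extract_hospitals(SAMPLE_HTML):
    print(hospital["name"])
    for field in hospital["fields"]:
        print(field)
```

Skipping rows without the name span also avoids the None seen in the question's output, which came from the header row having no h3.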