Issue
I'm having issues scraping the HTML IDs from the HTML below, because 2 lines of code under 14 Jun 2020 do not have any ID, meaning that there are no further appointment slots after 8.15am on 14 June; appointments resume on 15 June.
<table class="table table-borderless table-striped no-background clear-padding-first-child available-slots-mobile main-table clone">
<thead>
<tr>
<th width="14%" class="text-left nowrap fixed-side">Session Date</th>
<th width="14%" class="text-center">
<b>1</b>
</th>
<th width="14%" class="text-center">
<b>2</b>
</tr>
</thead>
<tbody class="tr-border-bottom">
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('13 Jun 2020');">13 Jun 2020</a>
<br> Saturday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217464_1_13/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
</tr>
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('14 Jun 2020');">13 Jun 2020</a>
<br> Sunday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217482_1_14/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
</tr>
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('15 Jun 2020');">15 Jun 2020</a>
<br> Monday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217506_1_15/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
</tr>
</tbody>
</table>
I have come up with the code below, but it only prints the HTML IDs of the appointments up to the 8.15am slot on 14 June 2020. After the ID of that slot has been printed I get a TypeError ('NoneType' object is not iterable), and none of the IDs of the 15 June slots are printed.
for slots in soup.findAll(attrs={"class" : "pb-15 text-center"}):
    tags = slots.find("a")
    for IDS in tags:
        IDS = tags.attrs["id"]
        print (IDS)
I also tried exception handling here, but I get a syntax error (and I'm not sure exactly what I've done wrong).
for slots in soup.findAll(attrs={"class" : "pb-15 text-center"}):
    tags = slots.find("a")
    for IDS in tags:
        try:
            IDS = tags.attrs["id"]
        except TypeError:
        else:
            print (IDS)
Solution
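Check that each cell actually contains an <a> tag with an id attribute before printing it. For the n/a cells, slots.find("a") returns None, which is why iterating over the result raises the TypeError.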
data='''<table class="table table-borderless table-striped no-background clear-padding-first-child available-slots-mobile main-table clone">
<thead>
<tr>
<th width="14%" class="text-left nowrap fixed-side">Session Date</th>
<th width="14%" class="text-center">
<b>1</b>
</th>
<th width="14%" class="text-center">
<b>2</b>
</tr>
</thead>
<tbody class="tr-border-bottom">
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('13 Jun 2020');">13 Jun 2020</a>
<br> Saturday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217464_1_13/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
</tr>
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('14 Jun 2020');">13 Jun 2020</a>
<br> Sunday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217482_1_14/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
</tr>
<tr>
<th class="pb-15 text-left fixed-side">
<a href="javascript:changeDate('15 Jun 2020');">15 Jun 2020</a>
<br> Monday
</th>
<td class="pb-15 text-center">
<a href="#" id="1217506_1_15/6/2020 12:00:00 AM" class="slotBooking">
8:15 AM ✔
</a>
</td>
</tr>
</tbody>
</table>'''
from bs4 import BeautifulSoup

soup=BeautifulSoup(data,'html.parser')
for slots in soup.findAll(attrs={"class" : "pb-15 text-center"}):
    # id=True matches only <a> tags that carry an id attribute
    tag= slots.find("a",id=True)
    if tag:
        print(tag.attrs["id"])
You can achieve the same using a single CSS selector.
# select() only returns <a> tags that matched, so no None check is needed
for slots in soup.select('.pb-15.text-center>a[id]'):
    print(slots.attrs["id"])
Output:
1217464_1_13/6/2020 12:00:00 AM
1217482_1_14/6/2020 12:00:00 AM
1217506_1_15/6/2020 12:00:00 AM
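To make the cause of the original TypeError visible, here is a minimal, self-contained sketch. The HTML is a trimmed-down sample (one bookable cell and one n/a cell from the 14 June row), and the variable names `cells`, `found` and `slot_ids` are illustrative choices rather than anything from the question.

```python
from bs4 import BeautifulSoup

# Trimmed sample: one cell with a booking link, one "n/a" cell.
html = """
<table><tr>
<td class="pb-15 text-center"><a href="#" id="1217482_1_14/6/2020 12:00:00 AM" class="slotBooking">8:15 AM</a></td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td", class_="pb-15 text-center")

# find() returns None when a cell has no <a>; looping over None raises the TypeError.
found = [cell.find("a") for cell in cells]
print([tag is None for tag in found])

# Skipping None (and anchors without an id) keeps only the bookable slots.
slot_ids = [tag["id"] for tag in found if tag is not None and tag.has_attr("id")]
print(slot_ids)
```

Collecting the IDs into a list instead of printing them directly makes them easier to reuse later, for example when building booking requests.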
Update
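The same filter can also be passed through the attrs dictionary (note the colon: {"id": True} is a dict, whereas {"id", True} would be a set):
for slots in soup.findAll(attrs={"class" : "pb-15 text-center"}):
    tag= slots.find("a",attrs={"id":True})
    if tag:
        print(tag.attrs["id"])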
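As for the exception-handling attempt in the question: the syntax error comes from the `except TypeError:` clause having no body, and the exception itself is raised by the `for IDS in tags:` line, which sat outside the `try`. A sketch of a working try/except variant follows, assuming a trimmed sample of the table; the names `cell` and `slot_ids` are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed sample: two bookable cells with an "n/a" cell in between.
html = """
<table><tr>
<td class="pb-15 text-center"><a href="#" id="1217464_1_13/6/2020 12:00:00 AM" class="slotBooking">8:15 AM</a></td>
<td class="pb-15 text-center"><span class="c-gray">n/a</span></td>
<td class="pb-15 text-center"><a href="#" id="1217506_1_15/6/2020 12:00:00 AM" class="slotBooking">8:15 AM</a></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
slot_ids = []
for cell in soup.find_all("td", class_="pb-15 text-center"):
    try:
        # cell.find("a") is None for "n/a" cells; subscripting None raises TypeError.
        slot_id = cell.find("a")["id"]
    except (TypeError, KeyError):
        # KeyError covers an <a> that exists but has no id attribute.
        continue
    slot_ids.append(slot_id)

print(slot_ids)
```

The explicit `if tag:` check shown earlier is usually easier to read; the try/except form is mainly useful if you prefer to keep the lookup on one line.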
Answered By - KunduK