Issue
I am working on a web scraper to extract a value from a nested HTML tag in an HTML file. Here is a snippet of the HTML:
<table class="Total_coverage" cellspacing="0" id="coveragetable">
<thead>
<tr>
<td class="sortable" id="a" onclick="toggleSort(this)">Element</td>
<td class="down sortable bar" id="b" onclick="toggleSort(this)">Nike</td>
<td class="sortable ctr2" id="c" onclick="toggleSort(this)">Value.</td>
<td class="sortable bar" id="d" onclick="toggleSort(this)">Adidas</td>
<td class="sortable ctr2" id="e" onclick="toggleSort(this)">Value.</td>
<td class="sortable ctr1" id="f" onclick="toggleSort(this)">Russia</td>
<td class="sortable ctr2" id="g" onclick="toggleSort(this)">UAE</td>
<td class="sortable ctr1" id="h" onclick="toggleSort(this)">Japan</td>
<td class="sortable ctr2" id="i" onclick="toggleSort(this)">India</td>
</tr>
</thead>
<tfoot>
<tr>
<td>Total</td>
<td class="bar">2323</td>
<td class="ctr2">12%</td>
<td class="bar">233</td>
<td class="ctr2">61%</td>
<td class="ctr1">222</td>
<td class="ctr2">322</td>
<td class="ctr1">233</td>
<td class="ctr2">455</td>
</tr>
</tfoot>
I want to extract the 12% from <td class="ctr2">12%</td>. I tried the step below and got all the values under <tfoot>, as <tfoot><tr><td>Total</td><td class="bar">2323</td><td class="ctr2">12%</td><td class="bar"> 233</td>.... </tfoot>
from bs4 import BeautifulSoup

with open('index.html', 'r') as f:
    contents = f.read()
print(contents)
soup = BeautifulSoup(contents, 'lxml')
mydivs = soup.select_one("tfoot", {"class": "ctr2"})
print("mydivs", mydivs)
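The second argument in the attempt above looks like the attribute-dictionary form used by find(). As far as I know, select_one() takes a CSS selector and treats its second positional argument as a namespace mapping rather than a filter, which would explain why the whole tfoot came back. A minimal sketch of the find()-based equivalent, assuming the real file has the same layout as the snippet (trimmed here to a few cells):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's footer row (assumption: the real file has the same layout)
html = '''
<table class="Total_coverage" id="coveragetable">
<tfoot>
<tr>
<td>Total</td>
<td class="bar">2323</td>
<td class="ctr2">12%</td>
<td class="bar">233</td>
<td class="ctr2">61%</td>
</tr>
</tfoot>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# find() does accept an attribute dict: first locate <tfoot>,
# then the first <td> inside it whose class list contains "ctr2"
cell = soup.find('tfoot').find('td', {'class': 'ctr2'})
value = cell.get_text(strip=True)
print(value)  # 12%
```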
Then I tried the script below and got <td class="ctr2">12%</td>:
mydivs = soup.select_one('td[class="ctr2"]')
print("mydivs", mydivs)
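Why the second attempt lands on the footer cell: an attribute selector such as [class="ctr2"] compares the whole class attribute string, whereas .ctr2 matches any element whose class list contains ctr2. The header cells in this table carry class="sortable ctr2", so the two forms can return different elements. A small sketch on a trimmed copy of the table, calling get_text() to get just the text instead of the whole tag:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table: a header cell and a footer cell both carry the ctr2 class
html = '''
<table id="coveragetable">
<thead><tr><td class="sortable ctr2" id="c">Value.</td></tr></thead>
<tfoot><tr><td>Total</td><td class="ctr2">12%</td></tr></tfoot>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

# .ctr2 matches any <td> whose class list contains ctr2, so the header cell is found first
by_class = soup.select_one('td.ctr2').get_text()
# [class="ctr2"] needs the whole attribute to equal "ctr2", so "sortable ctr2" is skipped
by_attribute = soup.select_one('td[class="ctr2"]').get_text()

print(by_class)      # Value.
print(by_attribute)  # 12%
```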
Where am I going wrong, and how can I get only the 12% and also all the values in the td elements? I am using Python to extract the data.
Solution
How to get the only 12% and also all the values in td. I am using Python to extract the data
Because the question is not very focused and the expected output is not entirely clear, here are a few ways to get the data.
Get all the values in your tfoot as a list using stripped_strings:
list(soup.tfoot.stripped_strings)
#['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']
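One thing to keep in mind: stripped_strings yields only non-empty text, so an empty cell disappears from the list and shifts every later index. If the position of each cell matters, collecting one string per td is a safer sketch (the empty cell below is hypothetical and not part of the question's data):

```python
from bs4 import BeautifulSoup

# Hypothetical footer row with an empty second cell
html = '<table><tfoot><tr><td>Total</td><td class="bar"></td><td class="ctr2">12%</td></tr></tfoot></table>'
soup = BeautifulSoup(html, 'html.parser')

# stripped_strings skips the empty cell entirely
strings = list(soup.tfoot.stripped_strings)
# one entry per <td>, with '' kept for the empty cell
cells = [td.get_text(strip=True) for td in soup.select('tfoot td')]

print(strings)  # ['Total', '12%']
print(cells)    # ['Total', '', '12%']
```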
Get your specific value by picking it by index:
list(soup.tfoot.stripped_strings)[2]
#12%
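Picking by a hard-coded index works, but it ties the code to the column order. A sketch that pairs each footer value with the header label in the same column instead, assuming the thead and tfoot rows always have the same number of cells. The header "Value." appears twice, so a list of pairs is used rather than a dict:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table (first five columns only)
html = '''
<table id="coveragetable">
<thead><tr>
<td class="sortable" id="a">Element</td>
<td class="down sortable bar" id="b">Nike</td>
<td class="sortable ctr2" id="c">Value.</td>
<td class="sortable bar" id="d">Adidas</td>
<td class="sortable ctr2" id="e">Value.</td>
</tr></thead>
<tfoot><tr>
<td>Total</td>
<td class="bar">2323</td>
<td class="ctr2">12%</td>
<td class="bar">233</td>
<td class="ctr2">61%</td>
</tr></tfoot>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')

headers = [td.get_text(strip=True) for td in soup.select('thead td')]
values = [td.get_text(strip=True) for td in soup.select('tfoot td')]

# zip() pairs cells column by column; duplicate labels such as "Value." are kept
pairs = list(zip(headers, values))
print(pairs)
# [('Element', 'Total'), ('Nike', '2323'), ('Value.', '12%'), ('Adidas', '233'), ('Value.', '61%')]
```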
Get your specific value directly with a CSS selector:
soup.select_one('tfoot td:nth-of-type(3)').text
#12%
or
soup.select_one('tfoot td.ctr2').text
#12%
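For the "all the values" part of the question, select() returns every match instead of only the first one. A sketch, assuming the goal is every ctr2 cell in the footer row:

```python
from bs4 import BeautifulSoup

# The footer row from the question
html = '''
<table><tfoot><tr>
<td>Total</td>
<td class="bar">2323</td>
<td class="ctr2">12%</td>
<td class="bar">233</td>
<td class="ctr2">61%</td>
<td class="ctr1">222</td>
<td class="ctr2">322</td>
<td class="ctr1">233</td>
<td class="ctr2">455</td>
</tr></tfoot></table>
'''
soup = BeautifulSoup(html, 'html.parser')

# select() returns a list of every matching element, in document order
ctr2_values = [td.get_text(strip=True) for td in soup.select('tfoot td.ctr2')]
print(ctr2_values)  # ['12%', '61%', '322', '455']
```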
Example
from bs4 import BeautifulSoup
html='''
<table class="Total_coverage" cellspacing="0" id="coveragetable">
<thead>
<tr>
<td class="sortable" id="a" onclick="toggleSort(this)">Element</td>
<td class="down sortable bar" id="b" onclick="toggleSort(this)">Nike</td>
<td class="sortable ctr2" id="c" onclick="toggleSort(this)">Value.</td>
<td class="sortable bar" id="d" onclick="toggleSort(this)">Adidas</td>
<td class="sortable ctr2" id="e" onclick="toggleSort(this)">Value.</td>
<td class="sortable ctr1" id="f" onclick="toggleSort(this)">Russia</td>
<td class="sortable ctr2" id="g" onclick="toggleSort(this)">UAE</td>
<td class="sortable ctr1" id="h" onclick="toggleSort(this)">Japan</td>
<td class="sortable ctr2" id="i" onclick="toggleSort(this)">India</td>
</tr>
</thead>
<tfoot>
<tr>
<td>Total</td>
<td class="bar">2323</td>
<td class="ctr2">12%</td>
<td class="bar">233</td>
<td class="ctr2">61%</td>
<td class="ctr1">222</td>
<td class="ctr2">322</td>
<td class="ctr1">233</td>
<td class="ctr2">455</td>
</tr>
</tfoot>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
print(list(soup.tfoot.stripped_strings))
print(list(soup.tfoot.stripped_strings)[2])
Output
['Total', '2323', '12%', '233', '61%', '222', '322', '233', '455']
12%
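The example parses an inline string, while the question reads from index.html. A minimal sketch of the same extraction wrapped in a function that takes a file path; the name read_footer_values is hypothetical, and a temporary file stands in for index.html so the snippet runs on its own:

```python
import os
import tempfile

from bs4 import BeautifulSoup


def read_footer_values(path):
    """Hypothetical helper: return the text of every <td> inside <tfoot> in an HTML file."""
    with open(path, 'r', encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
    return [td.get_text(strip=True) for td in soup.select('tfoot td')]


# Stand-in for index.html so the example is self-contained
html = '<table><tfoot><tr><td>Total</td><td class="bar">2323</td><td class="ctr2">12%</td></tr></tfoot></table>'
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False, encoding='utf-8') as tmp:
    tmp.write(html)
    path = tmp.name

values = read_footer_values(path)
os.remove(path)
print(values)     # ['Total', '2323', '12%']
print(values[2])  # 12%
```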
Answered By - HedgeHog