Issue
I am trying to convert HTML table to json using beautifulsoup() function python, I was able to convert but the data coming in wrong json format.
from bs4 import BeautifulSoup
import json
reading_table = """\
<table>
<tbody>
<tr>
<td><span class="customlabel">Energy Source</span></td>
<td><span class="custominput">EB</span></td>
<td><span class="customlabel">Grid Reading </span></td>
<td><span class="custominput">2666.2</span></td>
<td><span class="customlabel">DG Reading </span></td>
<td><span class="custominput">15.5</span></td>
</tr>
<tr>
<td><span class="customlabel">Power Factor</span></td>
<td><span class="custominput">0.844</span></td>
<td><span class="customlabel">Total Kw</span></td>
<td><span class="custominput">0.273</span></td>
<td><span class="customlabel">Total KVA</span></td>
<td><span class="custominput">0.34</span></td>
</tr>
<tr>
<td><span class="customlabel">Average Voltage</span></td>
<td><span class="custominput">241.7</span></td>
<td><span class="customlabel">Total Current</span></td>
<td><span class="custominput">1.54</span></td>
<td><span class="customlabel">Frequency Hz</span></td>
<td><span class="custominput">50</span></td>
</tr>
</tbody>
</table>
"""
reading_table_data = [
[cell.text for cell in row("td")]
for row in BeautifulSoup(reading_table, features="html.parser")("tr")
]
print(reading_table_data)
The above code prints JSON in the below format.
[['Energy Source', 'EB', 'Grid Reading ', '2666.2', 'DG Reading ', '15.5'], ['Power Factor', '0.844', 'Total Kw', '0.273', 'Total KVA', '0.34'], ['Average Voltage', '241.7', 'Total Current', '1.54', 'Frequency Hz', '50']]
I would like to get it in below format
[
'Energy Source': 'EB',
'Grid Reading ': '2666.2'
'DG Reading ', '15.5',
'Power Factor', '0.844',
'Total Kw', '0.273',
'Total KVA', '0.34',
'Average Voltage', '241.7',
'Total Current', '1.54',
'Frequency Hz', '50'
]
Some help is appreciated
Solution
The output you want is not a valid format, so you can print it after converting the dict to string and replacing the braces.
Here is the working code:
tds = BeautifulSoup(reading_table, features="html.parser").findAll("td")
data = {}
for td in tds:
if "customlabel" in td.span.get("class"):
attr_key = td.span.text
data[attr_key] = ""
if "custominput" in td.span.get("class"):
attr_value = td.span.text
data[attr_key] = attr_value
print(json.dumps(data).replace("{", "[").replace("}", "]"))
Answered By - defiant
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.