Issue
I am trying to extract a table from HTML using Python and BeautifulSoup. The HTML arrives inside a JSON response:
{"html":"<table width=\"100%\" cellspacing=\"0\" cellpadding=\"0\" class=\"gr_table_b1\">\n <tbody>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\" class=\"gr_table_colm10\">Bond Type<\/td>\n <td valign=\"top\" align=\"right\">\n US Corporate Debentures\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Debt Type<\/td>\n <td valign=\"top\" align=\"right\">\n \t Senior Unsecured Note\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Industry Group<\/td>\n <td valign=\"top\" align=\"right\">\n \t Industrial\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Industry Sub Group<\/td>\n <td valign=\"top\" align=\"right\">\n \t Transportation\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\" class=\"gr_table_colm10\">Sub-Product Asset<\/td>\n <td valign=\"top\" align=\"right\">\n CORP\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Sub-Product Asset Type<\/td>\n <td valign=\"top\" align=\"right\">\n \t Corporate Bond\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">State<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Use of Proceeds<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Security Code<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <\/tbody>\n<\/table>\n<div class=\"gr_row_b6 gr_table_title\">Special Characteristics<\/div>\n<div class=\"gr_section_b1\">\n <table width=\"100%\" cellspacing=\"0\" cellpadding=\"0\" class=\"gr_table_b1\">\n <tbody>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Medium Term Note<\/td>\n <td valign=\"top\" align=\"right\">\n \t N\n <\/td>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>"}
My desired outcome is:
| Bond Type | US Corporate Debentures |
| Debt Type | Senior Unsecured Note |
| Industry Group | Industrial |
| Industry Sub Group | Transportation |
| Sub-Product Asset | CORP |
| Sub-Product Asset Type | Corporate Bond |
| State | — |
| Use of Proceeds | — |
| Security Code | — |
Solution
Assuming you have already located and extracted the JSON, use pandas.read_html() to parse the HTML. It returns a list of DataFrames, one per table, so index [0] selects the first table:
pd.read_html(json_data['html'])[0]
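If the payload is still raw JSON text, decoding it with json.loads first turns the JSON escapes (`\"`, `\/`, `\n`) into plain characters, so `<\/td>` becomes `</td>`. Pasting the same text into a Python dict literal leaves `\/` as a backslash followed by a slash. The following is a minimal sketch under some assumptions: `raw` is a hypothetical, shortened stand-in for the real response body, the column names are made up for illustration, and an HTML parser that pandas supports (lxml, or bs4 with html5lib) is installed. Wrapping the markup in StringIO avoids the deprecation warning that recent pandas versions emit for literal HTML strings.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical, shortened payload standing in for the real response body.
# A raw string keeps the JSON escapes (\" \/ \n) intact for json.loads.
raw = r'{"html":"<table class=\"gr_table_b1\"><tbody><tr><td>Bond Type<\/td><td align=\"right\">\n US Corporate Debentures\n <\/td><\/tr><tr><td>Debt Type<\/td><td align=\"right\">\n Senior Unsecured Note\n <\/td><\/tr><\/tbody><\/table>"}'

json_data = json.loads(raw)   # decodes "<\/td>" to "</td>"
html = json_data["html"]

# read_html returns one DataFrame per <table>; take the first one.
df = pd.read_html(StringIO(html))[0]
df.columns = ["field", "value"]   # illustrative names, not from the source

print(df)
```

Without header cells in the markup, pandas labels the columns 0 and 1, which is why the sketch renames them.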
Example
import pandas as pd
json_data = {"html":"<table width=\"100%\" cellspacing=\"0\" cellpadding=\"0\" class=\"gr_table_b1\">\n <tbody>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\" class=\"gr_table_colm10\">Bond Type<\/td>\n <td valign=\"top\" align=\"right\">\n US Corporate Debentures\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Debt Type<\/td>\n <td valign=\"top\" align=\"right\">\n \t Senior Unsecured Note\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Industry Group<\/td>\n <td valign=\"top\" align=\"right\">\n \t Industrial\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Industry Sub Group<\/td>\n <td valign=\"top\" align=\"right\">\n \t Transportation\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\" class=\"gr_table_colm10\">Sub-Product Asset<\/td>\n <td valign=\"top\" align=\"right\">\n CORP\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Sub-Product Asset Type<\/td>\n <td valign=\"top\" align=\"right\">\n \t Corporate Bond\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">State<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Use of Proceeds<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Security Code<\/td>\n <td valign=\"top\" align=\"right\">\n \t —\n <\/td>\n <\/tr>\n <\/tbody>\n<\/table>\n<div class=\"gr_row_b6 gr_table_title\">Special Characteristics<\/div>\n<div class=\"gr_section_b1\">\n <table width=\"100%\" cellspacing=\"0\" cellpadding=\"0\" class=\"gr_table_b1\">\n <tbody>\n <tr class=\"gr_table_row4\">\n <td valign=\"top\">Medium Term Note<\/td>\n <td valign=\"top\" align=\"right\">\n \t N\n <\/td>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>"}
pd.read_html(json_data['html'])[0]
Output
| | 0 | 1 |
|---|---|---|
| 0 | Bond Type | US Corporate Debentures |
| 1 | Debt Type | Senior Unsecured Note |
| 2 | Industry Group | Industrial |
| 3 | Industry Sub Group | Transportation |
| 4 | Sub-Product Asset | CORP |
| 5 | Sub-Product Asset Type | Corporate Bond |
| 6 | State | — |
| 7 | Use of Proceeds | — |
| 8 | Security Code | — Special Characteristics Medium Term Note N |
| 9 | Medium Term Note | N |
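The output above also picks up text from the second table ("Special Characteristics"), whereas the desired outcome lists only the first table's rows. Since the question asks about BeautifulSoup, here is a hedged alternative sketch that restricts parsing to the first table. It assumes the HTML has already been decoded so that closing tags read `</td>`. The `html` string is a shortened, hypothetical sample, and `rows` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample with the same shape as the real markup:
# a first table, then a second table nested in a div.
html = """
<table class="gr_table_b1">
 <tbody>
  <tr class="gr_table_row4">
   <td valign="top">Bond Type</td>
   <td valign="top" align="right">
     US Corporate Debentures
   </td>
  </tr>
  <tr class="gr_table_row4">
   <td valign="top">Debt Type</td>
   <td valign="top" align="right">
     Senior Unsecured Note
   </td>
  </tr>
 </tbody>
</table>
<div class="gr_row_b6 gr_table_title">Special Characteristics</div>
<div class="gr_section_b1">
 <table class="gr_table_b1">
  <tbody>
   <tr class="gr_table_row4">
    <td valign="top">Medium Term Note</td>
    <td valign="top" align="right"> N </td>
   </tr>
  </tbody>
 </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching table, so the second is ignored.
table = soup.find("table", class_="gr_table_b1")

rows = []
for tr in table.find_all("tr"):
    # strip=True removes the newlines and padding around each cell's text.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 2:
        rows.append(tuple(cells))

for label, value in rows:
    print(f"| {label} | {value} |")
```

The resulting list of pairs can be passed to pandas.DataFrame(rows) if a DataFrame is still wanted.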
Answered By - HedgeHog