Issue
In this HTML I am trying to parse the text fields and the impact, but the impact is not text; it is an image (an icon rendered from a CSS class):
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>,
<td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>,
<td class="fxs_c_item fxs_c_currency"><span>USD</span></td>,
<td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>,
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>,
<td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>,
<td class="fxs_c_item fxs_c_notify"></td>,
<td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>]
I am able to get all the table text with this line:
cols = [ele.text.strip() for ele in cols]
But substituting span.text does not work. For the impact I need the class value of the span, fxs_c_impact-icon fxs_c_impact-none, for each row.
I am trying to extract all the span text from a table:
data3 = []
table3 = soup.find('table', attrs={'class':'fxs_c_table'})
table_body3 = table3.find('tbody')
rows = table_body3.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.span.text for ele in cols]
    data3.append([ele for ele in cols if ele])
The span item looks like this:
<span class="fxs_c_impact-icon fxs_c_impact-medium"></span>
Error I get
AttributeError: 'NoneType' object has no attribute 'text'
The script works when I extract text from the text fields of the table, but I can't seem to extract this span value.
Solution
As mentioned in the comments, try to select the elements more specifically. The impact is stored in the class attribute of the span, not in its text, so read the class:
.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]
The main issue is that one td does not have a span:
<td class="fxs_c_item fxs_c_notify"></td>
So ele.span becomes None for that cell, and you cannot call .text on it.
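If you would rather keep the original column-by-column loop, a small guard against the missing span may be enough. This is only a sketch: cell_value is a hypothetical helper, the HTML is a shortened stand-in for one row of the real table, and falling back to the last class name when a span has no text is an assumption about what you want to collect.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for one row of the real table (assumption).
html = '''
<table class="fxs_c_table"><tbody><tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-medium"></span></td>
<td class="fxs_c_item fxs_c_notify"></td>
</tr></tbody></table>
'''

soup = BeautifulSoup(html, 'html.parser')


def cell_value(td):
    """Return the span text, else the span's last class name, else ''."""
    span = td.find('span')
    if span is None:           # e.g. the empty fxs_c_notify cell
        return ''
    text = span.get_text(strip=True)
    if text:
        return text
    classes = span.get('class', [])   # class is parsed as a list of names
    return classes[-1] if classes else ''


rows = []
table = soup.find('table', attrs={'class': 'fxs_c_table'})
for tr in table.find_all('tr'):
    rows.append([cell_value(td) for td in tr.find_all('td')])

print(rows)  # [['01:00', 'fxs_c_impact-medium', '']]
```

Empty strings are kept here so the columns stay aligned between rows; filter them out afterwards if you do not need the positions.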
Example
from bs4 import BeautifulSoup
html = '''
<tr>
<td class="fxs_c_item fxs_c_time"><span>01:00</span></td>
<td class="fxs_c_item fxs_c_flag"><span class="fxs_flag fxs_us" title="United States"></span></td>
<td class="fxs_c_item fxs_c_currency"><span>USD</span></td>
<td class="fxs_c_item fxs_c_name"><span>New Year's Day</span><span> <span></span></span></td>
<td class="fxs_c_item fxs_c_impact"><span class="fxs_c_impact-icon fxs_c_impact-none"></span></td>
<td class="fxs_c_item fxs_c_type" colspan="4"><span class="fxs_c_label fxs_c_label_info">All Day</span></td>
<td class="fxs_c_item fxs_c_notify"></td>
<td class="fxs_c_item fxs_c_dashboard" data-gtmid="features-calendar-eventdetails-eventoptions-4d3300ad-c168-4a5f-a4ac-a60a338e63c4"><span><svg aria-hidden="true" class="fxs_icon svg-inline--fa fa-ellipsis-h fa-w-16" data-icon="ellipsis-h" data-prefix="fas" focusable="false" role="img" viewbox="0 0 512 512" xmlns="http://www.w3.org/2000/svg"><path d="M328 256c0 39.8-32.2 72-72 72s-72-32.2-72-72 32.2-72 72-72 72 32.2 72 72zm104-72c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72zm-352 0c-39.8 0-72 32.2-72 72s32.2 72 72 72 72-32.2 72-72-32.2-72-72-72z" fill="currentColor"></path></svg></span></td>
</tr>
'''
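# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')

data = []
for e in soup.find_all('tr'):
    data.append(
        {
            # first span in the row holds the time
            'time': e.span.text,
            'title': e.find('span',{'class' : 'fxs_flag'}).get('title'),
            'currency': e.find('td',{'class' : 'fxs_c_currency'}).text,
            '...': '...',
            # class is a list; the last entry is the impact level
            'impact': e.find('span',{'class' : 'fxs_c_impact-icon'}).get('class')[-1]
        }
    )

data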
Output
[{'time': '01:00','title': 'United States','currency': 'USD', '...':'...','impact': 'fxs_c_impact-none'}]
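If only the level itself (none, medium, and so on) is wanted instead of the full class name, the prefix could be stripped afterwards. This is a rough sketch under assumptions: impact_level is a hypothetical helper, and it assumes the level class always starts with fxs_c_impact- and is never the fxs_c_impact-icon class itself.

```python
from bs4 import BeautifulSoup

PREFIX = 'fxs_c_impact-'


def impact_level(row):
    """Return the impact level of a <tr>, e.g. 'medium', or None if absent."""
    span = row.select_one('td.fxs_c_impact span.fxs_c_impact-icon')
    if span is None:
        return None
    for cls in span.get('class', []):
        # skip the generic icon class, keep the one that names the level
        if cls.startswith(PREFIX) and cls != PREFIX + 'icon':
            return cls[len(PREFIX):]
    return None


html = ('<table><tr><td class="fxs_c_item fxs_c_impact">'
        '<span class="fxs_c_impact-icon fxs_c_impact-medium"></span>'
        '</td></tr></table>')
row = BeautifulSoup(html, 'html.parser').find('tr')
print(impact_level(row))  # medium
```

Looking the class up by prefix rather than by position avoids depending on the order of the class names in the attribute.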
Answered By - HedgeHog