Issue
I'm trying to teach myself how to web scrape stock data. I'm quite a newbie, so please excuse any stupid questions I may ask.
Here's my code for scraping the price; I'm trying to scrape the P/E ratio as well.
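import urllib.request
from bs4 import BeautifulSoup

start = 'http://www.google.com/finance?cid=694653'
page = urllib.request.urlopen(start)
# Name the parser explicitly; otherwise BeautifulSoup warns and picks the best installed one
soup = BeautifulSoup(page, 'html.parser')

# Current price: this span has a unique id, so a direct lookup works
P = soup.find('span', {'id': 'ref_694653_l'})
print(P.get_text())

# Works, but only indirectly: relies on the P/E value being the sixth "val" cell
pe = soup.find_all('td', {'class': 'val'})
print(pe[5].get_text())

# Attempt to reach the P/E value directly; this print raises AttributeError
pe = soup.find('td', {'data-snapfield': 'pe_ratio'})
print(pe.td.next_sibling.get_text())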
I can get the price data, and I managed to get the P/E ratio, but only indirectly (by indexing into the list of all val cells). I tried to use next_sibling and next_element, but it gives me an error saying there is no attribute.
I'm having trouble figuring out how to scrape data from a table, as it's usually set up in rows and the tags around the data are very generic, like <td> or <tr>.
So I just wanted to ask for some help scraping the P/E ratio.
Thanks guys
YS
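The broader table problem in the question (data wrapped in generic <td> and <tr> tags) is often handled by walking the rows and pairing the cells inside each one, instead of indexing into every class="val" cell on the page. The sketch below assumes each row holds one class="key" label cell and one class="val" value cell, as in the P/E row shown in the answer; the inline HTML (including its made-up second row) and the table_to_dict helper are hypothetical, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a stats table: each row has a label
# cell (class="key") followed by a value cell (class="val").
# The second row and its value are made up for illustration.
HTML = """
<table>
<tr><td class="key" data-snapfield="pe_ratio">P/E</td><td class="val">29.69</td></tr>
<tr><td class="key" data-snapfield="beta">Beta</td><td class="val">1.10</td></tr>
</table>
"""


def table_to_dict(soup):
    """Map each row's key-cell label to the text of the value cell in the same row."""
    data = {}
    for row in soup.find_all('tr'):
        key = row.find('td', class_='key')
        val = row.find('td', class_='val')
        # Skip rows that do not follow the key/value pattern
        if key is not None and val is not None:
            data[key.get_text(strip=True)] = val.get_text(strip=True)
    return data


soup = BeautifulSoup(HTML, 'html.parser')
stats = table_to_dict(soup)
print(stats['P/E'])  # 29.69
```

Looking up stats['P/E'] this way no longer depends on the cell's position on the page, which is what makes pe[5] fragile.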
Solution
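The problem is that pe is already the <td> key cell, so pe.td looks for another <td> nested inside it, finds none, and returns None; calling .next_sibling on None is what raises the AttributeError. The P/E value lives in the next <td>, but the immediate next sibling is the newline text between the two cells, so you have to step over it: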
>>> pe = soup.find('td',{'data-snapfield':'pe_ratio'})
>>> pe
<td class="key" data-snapfield="pe_ratio">P/E
</td>
>>> print(pe.td.next_sibling.get_text())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'next_sibling'
>>> pe.td  # None (there is no <td> inside this <td>), so nothing is echoed
>>> pe.next_sibling
u'\n'
>>> pe.next_sibling.next_sibling
<td class="val">29.69
</td>
>>> pe.next_sibling.next_sibling.get_text()
u'29.69\n'
Answered By - KGo