Issue
I want to extract some data from a website. I saved the page as 'Webpage, HTML Only' in a file called soccerway.html on my Desktop.
Then I ran the following code in an IPython notebook:
from bs4 import BeautifulSoup
soup=BeautifulSoup(open("soccerway.html"))
I get the following error:
IOError: [Errno 2] No such file or directory: 'soccerway.html'
How can I solve this?
Solution
You don't need to save the page manually. Use urllib2 (Python 2) to fetch the HTML source directly:
from bs4 import BeautifulSoup
from urllib2 import urlopen
soup = BeautifulSoup(urlopen("http://my_site.com/mypage"), "html.parser")
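urllib2 exists only in Python 2. In Python 3 the same urlopen function lives in urllib.request. The sketch below uses only the standard library. It downloads a page and decodes it using the charset the server declares, falling back to UTF-8 (an assumption) when none is given. The helper name fetch_html is hypothetical; the decoded string can then be passed to BeautifulSoup.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return its HTML as text (hypothetical helper, Python 3 stdlib only)."""
    with urlopen(url) as response:
        raw = response.read()
        # Use the charset declared in the response headers; assume UTF-8 if none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (needs network access and the bs4 package):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(fetch_html("http://my_site.com/mypage"), "html.parser")
```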
Example:
>>> from bs4 import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://google.com'))
>>> soup('a')
[<a class="gb1" href="http://www.google.com/imghp?hl=en&tab=wi">Images</a>,
...
]
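If you still want to parse the saved copy, note what causes the IOError. A relative name such as "soccerway.html" is resolved against the current working directory (usually the notebook's folder), not against the Desktop. A minimal sketch, assuming the file sits in a Desktop folder inside your home directory and is UTF-8 encoded, builds an absolute path instead. The helper name read_saved_page is hypothetical.

```python
import os
from pathlib import Path


def read_saved_page(filename, folder=None):
    """Return the text of a saved HTML file from `folder` (default: ~/Desktop, an assumption)."""
    # Assumption: the page was saved to a "Desktop" folder inside the home directory.
    folder = Path(folder) if folder is not None else Path.home() / "Desktop"
    path = folder / filename
    if not path.is_file():
        # Report the path tried and the working directory that relative names resolve against.
        raise FileNotFoundError(
            f"{path} not found (current working directory: {os.getcwd()})"
        )
    # Assumption: the saved file is UTF-8; undecodable bytes are replaced rather than raising.
    return path.read_text(encoding="utf-8", errors="replace")


# Usage (needs the bs4 package):
# from bs4 import BeautifulSoup
# soup = BeautifulSoup(read_saved_page("soccerway.html"), "html.parser")
```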
Answered By - alecxe