Issue
I am attempting to scrape a Persian website with the following code:
import urlparse, urllib
parts = urlparse.urlsplit(u'http://fa.wikipedia.org/wiki/صفحهٔ_اصلی')
parts = parts._replace(path=urllib.quote(parts.path.encode('utf8')))
encoded_url = parts.geturl().encode('ascii')
'https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C'
I get this error message in the prompt when I run my crawler:
ModuleNotFoundError: No module named 'urlparse'
And in VS Code there are three underlined words. When I click on them, the following error messages are displayed:
- Unable to import 'scrapy'
- Unable to import 'urlparse'
- Module 'urllib' has no quote member
What is wrong with my code?
Solution
import urllib.parse
# Python 3: urlsplit and quote both live in urllib.parse
parts = urllib.parse.urlsplit('http://fa.wikipedia.org/wiki/صفحهٔ_اصلی')
# quote() accepts str and percent-encodes it as UTF-8, so no manual .encode() is needed
parts = parts._replace(path=urllib.parse.quote(parts.path))
encoded_url = parts.geturl()
print(encoded_url)
# http://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C
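This code runs on Python 3, where the urlparse module was renamed to urllib.parse and urllib.quote moved to urllib.parse.quote. That is why "import urlparse" raises ModuleNotFoundError and why the editor reports that urllib has no quote member. The "Unable to import 'scrapy'" warning is a separate problem: it most likely means the Python interpreter selected in VS Code does not have Scrapy installed.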
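If several URLs need the same treatment, the steps can be wrapped in a small function. The following is only a sketch: to_ascii_url is a hypothetical helper name, not part of urllib or Scrapy, and it assumes the host name is already ASCII (it does not IDNA-encode domains). Keeping "%" in the safe characters is a design choice so that already-encoded URLs are not encoded a second time.

```python
from urllib.parse import quote, urlsplit


def to_ascii_url(url):
    """Percent-encode the path and query of a URL (hypothetical helper).

    Assumes the scheme and host are already ASCII.
    """
    parts = urlsplit(url)
    # "%" is kept safe so existing escapes such as %D8 are left untouched.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return parts._replace(path=path, query=query).geturl()


print(to_ascii_url('http://fa.wikipedia.org/wiki/صفحهٔ_اصلی'))
```

The returned string contains only ASCII characters and can be used as a request URL in the crawler.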
Answered By - SahilDesai