Issue
I'm new on scrapy and i have to crawl a webpage for a test. So I use the code below on a terminal but its returns a empty list i Don't understand why. When i use the same command on a another website, like amazon, with the right selector, it works. Can someone put light on it? thank you so much
scrapy shell "'https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas"
response.css('.tileList-title').extract()
Solution
First of all, when I consulted the source code of the page you seemed interested to scrape the title Iced Teas
in a header tags <h1>
. Am I right ?
Second, I tried scrapy shell sessions to understand the issue. It seems to be a settings of user-agent request's headers. Look at the code sessions below:
Without user-agent set
scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas
In [1]: response.css('.tileList-title').extract()
Out[1]: []
view(response) #open the given response in your local web browser, for inspection.
With user agent set
scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas -s USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
In [1]: response.css('.tileList-title').extract()
Out[1]: ['<h1 class="tileList-title" ng-if="$ctrl.listTitle" tabindex="-1">Iced Teas</h1>']
#now as you can see it does not return an empty list.
view(response)
So to improve your future practices, know you can use -s KEYWORDSETTING=value
in your scrapy shell sessions. Here the settings key words for scrapy.
And to check with view(response)
to see if the requests returns the expected content even if it sent a 200. For my experience, with view(response)
you can see that the content page, and even source code sometimes, is a little different when you use it in scrapy shell than when you use it in a normal browser. So that's a good practice to check with this shortcut. Here the shorcuts for scrapy. They are mentioned at each scrapy shell session too.
Answered By - AvyWam
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.