Issue
The goal of the scraper is to find out which board games got the most thumbs up and print them as a sorted list, basically a dictionary of "name: thumbs up". Here is the list of games I want to sort: https://boardgamegeek.com/geeklist/268396/20-most-anticipated-games-2020-11th-year-nominatio
I am using the Scrapy framework in Python. I found that the following selectors work well for extracting the titles and the thumbs up:
response.css('.fl > a:nth-child(2)::text').getall()
response.css('.recs a::text').getall()
The problem arises when a game has 0 thumbs up: Scrapy simply skips that entry, so the list of titles ends up longer than the list of thumbs up. For example, I could get 25 titles but only 20 thumbs-up values with the commands above. Is there a way to convert empty strings to a default value of 0 so that the list of names and the list of thumbs up have the same length? Something like:
response.css('.recs a::text').getall(default="0")
When there is no thumbs up, it looks like this:
<a aria-label="Recommendations and tip info" class="js-score" href="javascript://" onclick="RecSpy( 'listitem', '7520669', 'tippers' ); return false;"></a>
Solution
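Instead of collecting the names and the likes separately from the whole page, you could select every element that contains both the name and the likes of a single board game, then extract both values from inside that element. That way a missing like count stays attached to its own game and can be replaced with a default, e.g.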
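games = response.css('.mb5')  # one selector per element with class "mb5"
for game in games:
    # Both queries are scoped to this game's element only
    name = game.css('.fl > a:nth-child(2)::text').get()
    # .get() returns None when the link has no text, so fall back to 0
    likes = game.css('.recs a::text').get() or 0
    ...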
Pseudo-code, but I hope you get the idea.
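To make the difference concrete, here is a minimal sketch that runs without Scrapy. It uses the standard library's ElementTree as a stand-in for Scrapy selectors, and the markup is a simplified, hypothetical version of the page (the class names "mb5", "fl" and "recs" come from the snippets above; the game names and scores are made up, and the real page structure may differ). It first reproduces the mismatch from the question, then pairs each name with its score per container.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one "mb5" container per game.
# "Game B" has an empty score link, like the 0-thumbs case in the question.
PAGE = """
<div>
  <div class="mb5">
    <div class="fl"><a href="#">Game A</a></div>
    <div class="recs"><a class="js-score">12</a></div>
  </div>
  <div class="mb5">
    <div class="fl"><a href="#">Game B</a></div>
    <div class="recs"><a class="js-score"></a></div>
  </div>
  <div class="mb5">
    <div class="fl"><a href="#">Game C</a></div>
    <div class="recs"><a class="js-score">7</a></div>
  </div>
</div>
""".strip()

root = ET.fromstring(PAGE)

# Page-wide extraction: empty links contribute no text, so the lists drift apart.
names = [a.text for a in root.findall(".//div[@class='fl']/a")]
likes_global = [a.text for a in root.findall(".//div[@class='recs']/a") if a.text]


def thumbs_by_game(tree):
    """Pair each name with its own score, defaulting to 0 when the link is empty."""
    result = {}
    for game in tree.findall(".//div[@class='mb5']"):
        name = game.find("./div[@class='fl']/a").text
        score = game.find("./div[@class='recs']/a").text  # None for an empty link
        result[name] = int(score) if score else 0
    return result


thumbs = thumbs_by_game(root)
# Sort by thumbs up, highest first
ranked = sorted(thumbs.items(), key=lambda item: item[1], reverse=True)

print(len(names), len(likes_global))
print(ranked)
```

In a real spider the same loop would run over response.css('.mb5'), and converting the score with int() is what lets the final sort compare numbers instead of strings.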
Answered By - Krisz