Issue
I am trying to get the GDP Estimate (Under IMF) from the following page: https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)
However, I am only getting the first row (93,863,851). Here's the Scrapy Spider code:
def parse(self, response):
    title = response.xpath("(//tbody)[3]")
    for country in title:
        yield {'GDP': country.xpath(".//td[3]/text()").get()}
On the other hand, I can use the getall() method to get all the data, but this puts every data point into one single cell when I export it to CSV/XLSX, so it is not a solution for me.
How can I get all the data points via the loop? Please help.
Solution
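Your selector is not correct. (//tbody)[3] matches a single tbody element, so the loop body runs only once, and .get() returns just the first matching cell inside it. Loop through the table rows instead and yield the data that you need from each row. See the sample below.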
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)']

    def parse(self, response):
        # Iterate over the rows of the captioned table: one yielded item per row.
        # Rows without <td> cells (e.g. header rows) yield None for every field.
        for row in response.xpath("//caption/parent::table/tbody/tr"):
            yield {
                "country": row.xpath("./td[1]/a/text()").get(),
                "region": row.xpath("./td[2]/a/text()").get(),
                "imf_est": row.xpath("./td[3]/text()").get(),
                "imf_est_year": row.xpath("./td[4]/text()").get(),
                "un_est": row.xpath("./td[5]/text()").get(),
                "un_est_year": row.xpath("./td[6]/text()").get(),
                "worldbank_est": row.xpath("./td[7]/text()").get(),
                "worldbank_est_year": row.xpath("./td[8]/text()").get(),
            }
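To see why the loop target matters, here is a minimal offline sketch of the same idea. It deliberately uses the standard library's xml.etree.ElementTree instead of Scrapy so it runs without a network request, and the three-row table, its placeholder values, and the helper names per_container, per_row and to_csv are all made up for illustration. It is not the Wikipedia markup, only an approximation of the container-versus-row difference.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature table; the values are placeholders, not real GDP figures.
HTML = """
<table>
  <tbody>
    <tr><td>A</td><td>Region1</td><td>100</td></tr>
    <tr><td>B</td><td>Region2</td><td>200</td></tr>
    <tr><td>C</td><td>Region3</td><td>300</td></tr>
  </tbody>
</table>
""".strip()

root = ET.fromstring(HTML)


def per_container(root):
    # Mirrors the question: the loop runs over the single <tbody>,
    # and only the first matching third cell is taken.
    for body in root.findall(".//tbody"):
        yield {"GDP": body.findtext(".//td[3]")}


def per_row(root):
    # Mirrors the answer: the loop runs over each <tr>,
    # so every row produces its own item.
    for row in root.findall(".//tbody/tr"):
        yield {"GDP": row.findtext("td[3]")}


def to_csv(items):
    # One dict per item becomes one CSV line, which is roughly what a
    # CSV export of the yielded items looks like.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["GDP"])
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    print(list(per_container(root)))  # a single item
    print(to_csv(per_row(root)))      # header plus one line per row
```

Because each row yields its own dict, an export such as scrapy crawl test -o gdp.csv (gdp.csv being an arbitrary file name) should then write one line per country rather than one cell holding everything.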
Answered By - msenior_