Issue
import scrapy
import logging
class AssetSpider(scrapy.Spider):
name = 'asset'
start_urls = ['http://mnregaweb4.nic.in/netnrega/asset_report_dtl.aspx?lflag=eng&state_name=WEST%20BENGAL&state_code=32&district_name=NADIA&district_code=3201&block_name=KRISHNAGAR-I&block_code=&panchayat_name=DOGACHI&panchayat_code=3201009009&fin_year=2020-2021&source=national&Digest=8+kWKUdwzDQA1IJ5qhD8Fw']
def parse(self, response):
i = 4
while i<2236:
assetid = response.xpath("//table[2]//tr['i']/td[2]/text()")
assetcategory = response.xpath("//table[2]//tr['i']/td[3]/text()")
schemecode = response.xpath("//table[2]//tr['i']/td[5]/text()")
link = response.xpath("//table[2]//tr['i']/td[6]/a/@href")
schemename = response.xpath("//table[2]//tr['i']/td[7]/text()")
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1
I want to use 'i' variable to loop in the xpath of tr[position] from 4 to 2235. i just dont know if it is possible! and if it is possible, then what is the right way to do it? mine does not work.
Solution
Sure, it is possible and widely used.
You can format the string with variable.
There are several syntaxes for that.
For example you can do it like this:
i = 4
while i<2236:
assetid_path = "//table[2]//tr[{1}]/td[2]/text()".format(i)
assetcategory_path = "//table[2]//tr[{1}]/td[3]/text()".format(i)
schemecode_path = "//table[2]//tr[{1}]/td[5]/text()".format(i)
link_path = "//table[2]//tr[{1}]/td[6]/a/@href".format(i)
schemename_path = "//table[2]//tr[{1}]/td[7]/text()".format(i)
assetid = response.xpath(assetid_path)
assetcategory = response.xpath(assetcategory_path)
schemecode = response.xpath(schemecode_path)
link = response.xpath(link_path)
schemename = response.xpath(schemename_path)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1
While the above can be shortened like this:
i = 4
while i<2236:
root_path = "//table[2]//tr[{1}]".format(i)
assetid_path = root_path + "/td[2]/text()"
assetcategory_path = root_path + "/td[3]/text()"
schemecode_path = root_path + "/td[5]/text()"
link_path = root_path + "/td[6]/a/@href"
schemename_path = root_path + "/td[7]/text()"
assetid = response.xpath(assetid_path)
assetcategory = response.xpath(assetcategory_path)
schemecode = response.xpath(schemecode_path)
link = response.xpath(link_path)
schemename = response.xpath(schemename_path)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1
But the better way is to use bind variable. As following:
i = 4
while i<2236:
assetid = response.xpath("//table[2]//tr[$i]/td[2]/text()",i=i))
assetcategory = response.xpath("//table[2]//tr[$i]/td[3]/text()",i=i))
schemecode = response.xpath("//table[2]//tr[$i]/td[5]/text()",i=i)
link = response.xpath("//table[2]//tr[$i]/td[6]/a/@href",i=i)
schemename = response.xpath("//table[2]//tr[$i]/td[7]/text()",i=i)
yield {
'assetid' : assetid,
'assetcategory' : assetcategory,
'schemecode' : schemecode,
'link' : link,
'schemename' : schemename
}
i += 1
Answered By - Prophet
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.