Issue

    from scrapy.spiders import Spider
    from ..items import QtItem

    class QuoteSpider(Spider):
        name = 'acres'
        start_urls = ['any_url']

        def parse(self, response):
            items = QtItem()
            all_div_names = response.xpath('//article')
            for bks in all_div_names:
                name = all_div_names.xpath('//span[@class="css-fwbz9r"]/text()').extract()
                price = all_div_names.xpath('//h2[@class="css-yr18fa"]/text()').extract()
                sqft = all_div_names.xpath('//div[@class="css-1ty8tu4"]/text()').extract()
                bhk = all_div_names.xpath('//a[@class="css-163eyf0"]/text()').extract()
            yield {
                'ttname': name,
                'ttprice': price,
                'ttsqft': sqft,
                'ttbhk': bhk
            }
Solution
Corrections

- Use `.//` instead of `//` in each XPath expression inside the loop.
- Use `bks` (the loop variable) instead of `all_div_names`.
- Use `get()` instead of `extract()`, since each span contains a single item: `get()` returns one item, while `extract()` returns a list of every match.
- Your yield statement is not within the for loop. To yield each listing as its own dictionary, the yield statement needs to be inside the for loop.

e.g. `name = bks.xpath('.//span[@class="css-fwbz9r"]/text()').get()`
Tips

- `.//` traverses the child elements of the selector you are looping over. Always use `.//` when looping over an XPath selector that matched multiple items, such as `all_div_names`; e.g. `name = bks.xpath('.//span[@class="css-fwbz9r"]/text()').get()` selects only the span elements inside `bks`.
- Use `getall()` instead of `extract()` and `get()` instead of `extract_first()`. With `get()` you always get a single string (or None if nothing matched); with `extract()` you won't know whether you're getting a list or a string, unfortunately.
- Use an Item class rather than yielding a plain dictionary; it makes things like pipelines easier. A pipeline modifies the scraped data, e.g. changing which items end up in a JSON output file. A common example is a duplicates pipeline (there is one in the Scrapy docs) that drops an item when it repeats data you have already scraped. I almost never yield a plain dictionary in a scraping project unless the data is highly structured, requires no modification, or contains no duplicate information.
- Consider using Scrapy's ItemLoaders for any scraping project where the extracted data needs simple modification, e.g. clearing newlines or lightly reshaping values. You'll be surprised how often that is.
Code Example

    def parse(self, response):
        items = QtItem()
        all_div_names = response.xpath('//article')
        for bks in all_div_names:
            name = bks.xpath('.//span[@class="css-fwbz9r"]/text()').get()
            price = bks.xpath('.//h2[@class="css-yr18fa"]/text()').get()
            sqft = bks.xpath('.//div[@class="css-1ty8tu4"]/text()').get()
            bhk = bks.xpath('.//a[@class="css-163eyf0"]/text()').get()
            yield {
                'ttname': name,
                'ttprice': price,
                'ttsqft': sqft,
                'ttbhk': bhk
            }
Answered By - AaronS