Issue
I am trying to scrape data with the following spider. I have split the XPath queries in the parse function into two parts: the first part holds fixed race-level data that I don't want to loop over, and the second part is a table whose rows I want to loop over. When I run the script, I only get the second part's data. I am using Splash to render the HTML.
import scrapy
from scrapy_splash import SplashRequest


class RaceSpider(scrapy.Spider):
    name = 'race'
    allowed_domains = ['www.racing.com']

    script = '''
        function main(splash, args)
            splash.private_mode_enabled = false
            assert(splash:go(args.url))
            assert(splash:wait(5))
            splash:set_viewport_full()
            return splash:html()
        end
    '''

    def start_requests(self):
        yield SplashRequest(
            url='https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results',
            callback=self.parse, endpoint='execute', args={
                'lua_source': self.script
            }
        )

    def parse(self, response):
        information = response.xpath("//div[@class='race-results-table ng-scope']/table")
        yield {
            # part 1
            'Race Number': response.xpath("(.//span[@class='number-circle xlg'])[1]/text()").get(),
            'Title': response.xpath("(.//div[@class='popup ng-scope']/h1)[1]/text()").get(),
            'Result Distance Thumbnail': response.xpath(".//div[@class='ng-scope']/p/text()").get(),
            'Track Condition': response.xpath(".//div[@class='condition']/div/p/span/text()").get(),
            'Rail': response.xpath("(.//div[@class='rail']/div/p/span)[1]/text()").get(),
        }
        for info in information:
            yield {
                # part 2
                'Position': info.xpath("(.//td[@class='td-position tcenter']/span)[1]/text()").get(),
                'Horse Entry Number': info.xpath("(.//td[@class='horse-name']/h3/a/span)[1]/text()").get(),
                'Horse Full Name': info.xpath("(.//td[@class='horse-name']/h3/a/span)[2]/text()").get(),
                'Horse Barrier Number': info.xpath("(.//td[@class='horse-name']/h3/a/span)[3]/text()").get(),
                'Trainers': info.xpath("(.//td[@class='horse-details']/span/a)[1]/text()").get(),
                'Jockey': info.xpath("(.//td[@class='horse-details']/span/a)[2]/text()").get(),
            }
Output:
2021-09-08 22:58:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results via http://localhost:8050/execute> (referer: None)
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Race Number': '1', 'Title': 'Flemington', 'Date': 'Sat, 8th Aug', 'Result Time': '2:05am', 'Result Distance': '2530m\xa0\xa0', 'Race Name': 'TAB Handicap', 'Result Distance Thumbnail': '2530m', 'Track Condition': 'Soft 7', 'Rail': 'Out 10m Entire Circuit\n ', 'Track Record': 'Unavailable', 'Price Money': '$135,000'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': None, 'Horse Entry Number': None, 'Horse Full Name': None, 'Horse Barrier Number': None, 'Trainers': None, 'Jockey': None, 'Gear': None, 'WGT': None, 'Price': None, '800m': None, '400m': None, 'Margin': None, 'SP': None}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '1st', 'Horse Entry Number': '5. ', 'Horse Full Name': 'Exemplar (IRE)', 'Horse Barrier Number': ' (7)', 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'J.Allen', 'Gear': '1', 'WGT': '56.5kg', 'Price': '$74,250', '800m': '1st', '400m': '1st', 'Margin': '2:45.74', 'SP': '$7.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '2nd', 'Horse Entry Number': '3. ', 'Horse Full Name': 'Double You Tee', 'Horse Barrier Number': ' (6)', 'Trainers': 'P.Payne', 'Jockey': 'W.J.Egan', 'Gear': '0', 'WGT': '57.5kg', 'Price': '$24,300', '800m': '6th', '400m': '4th', 'Margin': '1.25L', 'SP': '$4.80'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '3rd', 'Horse Entry Number': '6. ', 'Horse Full Name': 'Bertwhistle', 'Horse Barrier Number': ' (4)', 'Trainers': 'D.I.Dodson', 'Jockey': 'L.J.Neindorf', 'Gear': '0', 'WGT': '54kg', 'Price': '$12,150', '800m': '4th', '400m': '3rd', 'Margin': '4.75L', 'SP': '$11.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '4th', 'Horse Entry Number': '7. ', 'Horse Full Name': 'Flag Edition (NZ)', 'Horse Barrier Number': ' (2)', 'Trainers': 'M.Payne', 'Jockey': 'M.Payne', 'Gear': '0', 'WGT': '56kg', 'Price': '$6,750', '800m': '5th', '400m': '6th', 'Margin': '4.85L', 'SP': '$21.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '5th', 'Horse Entry Number': '8. ', 'Horse Full Name': 'Blandford Lad (NZ)', 'Horse Barrier Number': ' (3)', 'Trainers': 'P.Gelagotis', 'Jockey': 'W.T.Price', 'Gear': '2', 'WGT': '53kg', 'Price': '$4,050', '800m': '7th', '400m': '7th', 'Margin': '5.6L', 'SP': '$10.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '6th', 'Horse Entry Number': '4. ', 'Horse Full Name': 'South Pacific (GB)', 'Horse Barrier Number': ' (5)', 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'D.Oliver', 'Gear': '3', 'WGT': '57.5kg', 'Price': '$2,700', '800m': '2nd', '400m': '2nd', 'Margin': '5.8L', 'SP': '$1.95'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '7th', 'Horse Entry Number': '1. ', 'Horse Full Name': 'Home By Midnight (NZ)', 'Horse Barrier Number': ' (1)', 'Trainers': 'P.Payne', 'Jockey': 'T.J.Hope', 'Gear': '2', 'WGT': '60kg', 'Price': '$2,700', '800m': '3rd', '400m': '5th', 'Margin': '6.55L', 'SP': '$16.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '\n ', 'Horse Entry Number': '2. ', 'Horse Full Name': 'Lord Belvedere (GB)', 'Horse Barrier Number': None, 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'B.J.Melham', 'Gear': '0', 'WGT': '60kg', 'Price': '–', '800m': None, '400m': None, 'Margin': None, 'SP': None}
2021-09-08 22:58:36 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-08 22:58:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (10 items) in: data1.csv
2021-09-08 22:58:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 839,
'downloader/request_count': 1,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 427762,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 23.855061,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 9, 8, 16, 58, 36, 640162),
'item_scraped_count': 10,
'log_count/DEBUG': 86,
'log_count/INFO': 13,
'log_count/WARNING': 3,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'splash/execute/request_count': 1,
'splash/execute/response_count/200': 1,
'start_time': datetime.datetime(2021, 9, 8, 16, 58, 12, 785101)}
2021-09-08 22:58:36 [scrapy.core.engine] INFO: Spider closed (finished)
Solution
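A Scrapy callback can yield more than once for the same response, but each yield produces a separate item, so the fixed fields from part 1 and the table rows from part 2 are never combined into one record.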
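If the goal is one record per runner that also carries the race-level fields, one option is to build the fixed fields once and merge them into every row before yielding. The following is only a minimal sketch in plain Python: the helper name merge_race_rows is hypothetical, and the sample values are copied from the log output in the question.

```python
def merge_race_rows(race_info, rows):
    """Yield one flat dict per table row, each carrying the shared race fields."""
    for row in rows:
        # If a key appears in both dicts, the row's value wins.
        yield {**race_info, **row}


# Sample values copied from the log output in the question.
race_info = {"Race Number": "1", "Track Condition": "Soft 7"}
rows = [
    {"Position": "1st", "Horse Full Name": "Exemplar (IRE)"},
    {"Position": "2nd", "Horse Full Name": "Double You Tee"},
]
items = list(merge_race_rows(race_info, rows))
```

Inside the spider, race_info would be the dict built from the part 1 XPaths and rows would be the dicts built in the loop, so parse could end with something like `yield from merge_race_rows(race_info, rows)`.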
The data on that page is actually loaded from an API call that returns JSON. It is easier to request that API directly and pick out whatever fields you want from the response.
Here is an example of a working solution:
CODE:
import json

import scrapy


class RaceSpider(scrapy.Spider):
    name = 'race'

    # Browser-like request headers sent with the API call.
    headers = {
        'accept': 'application/json, text/plain, */*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,bn;q=0.8,es;q=0.7,ar;q=0.6',
        'origin': 'https://www.racing.com',
        'referer': 'https://www.racing.com/',
        'sec-ch-ua': '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'}

    def start_requests(self):
        yield scrapy.Request(
            url='https://api.racing.com/v1/en-au/meet/details/5162295/',
            callback=self.parse,
            method="GET",
            headers=self.headers)

    def parse(self, response):
        # Decode the JSON body of the API response.
        data = json.loads(response.body)
        for resp in data['raceCollection']:
            for res in resp['raceResultsCollection']:
                # One item per runner, with the race-level fields repeated.
                items = {
                    'Race Number': resp['raceNumber'],
                    'Result Distance Thumbnail': resp['distance'],
                    'Title_name': resp['name'],
                    # Note: this value is the barrier number, not the finishing position.
                    'Position': res['barrierNumber'],
                    'Horse Full Name': res['horse']['fullName'],
                    'Jockey': res['jockey']['fullName']
                }
                yield items
Output:
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 6, 'Horse Full Name': 'Double You Tee', 'Jockey': 'W.J.Egan'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 4, 'Horse Full Name': 'Bertwhistle', 'Jockey': 'L.J.Neindorf'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 2, 'Horse Full Name': 'Flag Edition (NZ)', 'Jockey': 'M.Payne'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Honorable Mention (NZ)', 'Jockey': 'B.Allen'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Copper Fox', 'Jockey': 'G.J.Cartwright'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Muswellbrook', 'Jockey': 'J.Mott'}
2021-09-09 23:22:22 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-09 23:22:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 605,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 19582,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 5.144949,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 9, 9, 17, 22, 22, 334263),
'httpcompression/response_bytes': 205617,
'httpcompression/response_count': 1,
'item_scraped_count': 100,
... and so on
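The nested loop in parse can also be pulled out into a plain function, which makes it possible to check the flattening logic without hitting the network. This is a rough sketch, not the API's documented schema: the key names are the ones used in the code above, the function name iter_results and the output labels "Distance", "Race Name" and "Barrier" are hypothetical, and the sample payload is hand-built from values in the output above.

```python
def iter_results(meet):
    """Yield one flat item per runner from a decoded meet-details payload."""
    for race in meet.get("raceCollection") or []:
        for result in race.get("raceResultsCollection") or []:
            yield {
                "Race Number": race.get("raceNumber"),
                "Distance": race.get("distance"),
                "Race Name": race.get("name"),
                "Barrier": result.get("barrierNumber"),
                # Nested objects may be missing or null, so fall back to {}.
                "Horse Full Name": (result.get("horse") or {}).get("fullName"),
                "Jockey": (result.get("jockey") or {}).get("fullName"),
            }


# Hand-built sample that mirrors the keys used in the answer's code.
sample = {
    "raceCollection": [
        {
            "raceNumber": 1,
            "distance": 2530,
            "name": "TAB Handicap",
            "raceResultsCollection": [
                {
                    "barrierNumber": 6,
                    "horse": {"fullName": "Double You Tee"},
                    "jockey": {"fullName": "W.J.Egan"},
                },
                {"barrierNumber": 4, "horse": {"fullName": "Bertwhistle"}, "jockey": None},
            ],
        }
    ]
}
items = list(iter_results(sample))
```

In the spider, parse could then be reduced to `yield from iter_results(json.loads(response.body))`. Using .get with fallbacks means a runner with a missing jockey or horse entry yields None for that field instead of raising KeyError.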
Answered By - Fazlul