Issue
I'm new to Scrapy. With the code below I can extract the name, but not the price. Any idea what I'm doing wrong to get the price? Thank you!
This is the code:
import scrapy


class BfPreciosSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['https://www.boerse-frankfurt.de']
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        what_name = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[1]/div/app-widget-datasheet-header/div/div/div/div/div[1]/div/h1/text()').extract_first()
        what_price = response.xpath('/html/body/app-root/app-wrapper/div/div[2]/app-bond/div[2]/div[3]/div[1]/font/text()').extract_first()
        yield {'name': what_name, 'price': what_price}
And these are the items (marked in red), name and price:
Solution
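The name is available directly in the page HTML, but the price is loaded from an API. If you inspect the network traffic in your browser's developer tools, you will find an API call that returns the price information. The spider below also lists a bare domain in allowed_domains, since that setting expects domain names rather than full URLs. See the example below for one way to obtain this data.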
import scrapy
from time import time


class RealtorSpider(scrapy.Spider):
    name = 'BF_precios'
    allowed_domains = ['boerse-frankfurt.de']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36'
    }
    start_urls = ['https://www.boerse-frankfurt.de/anleihe/xs1186131717-fce-bank-plc-1-134-15-22']

    def parse(self, response):
        item = {}
        current_time = int(time())
        # Extract the name, ISIN and MIC from the page itself.
        name = response.xpath('//h1/text()').get()
        isin = response.xpath("//span[contains(text(),'ISIN:')]/text()").re_first(r"ISIN:\s(.*)$")
        mic = response.xpath("//app-widget-index-price-information/@mic").get()
        # Build the API URL from adjacent string literals so that no
        # indentation whitespace ends up inside the URL.
        api_url = (
            "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"
            "?resolution=D&isKeepResolutionForLatestWeeksIfPossible=false"
            f"&from={current_time}&to={current_time}&isBidAskPrice=false&symbols={mic}%3A{isin}"
        )
        item['name'] = name
        item['isin'] = isin
        item['mic'] = mic
        # Pass the partially filled item on to the callback that adds the price.
        yield response.follow(api_url, callback=self.parse_price, cb_kwargs={"item": item})

    def parse_price(self, response, item):
        # The API responds with JSON; take the first quoted value.
        item['price'] = response.json()[0]['quotes']['timeValuePairs'][0]['value']
        yield item
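As a side note, the query string can also be assembled with the standard library instead of by hand. The following is only a sketch: build_history_url is a hypothetical helper name, and the parameter names and endpoint are copied from the URL used in the spider, not from any official API documentation.

```python
from time import time
from urllib.parse import urlencode

# Endpoint copied from the spider; not verified against official documentation.
API_BASE = "https://api.boerse-frankfurt.de/v1/tradingview/lightweight/history/single"


def build_history_url(mic, isin, timestamp=None):
    """Hypothetical helper: build the price-history URL for one instrument."""
    if timestamp is None:
        timestamp = int(time())
    params = {
        "resolution": "D",
        "isKeepResolutionForLatestWeeksIfPossible": "false",
        "from": timestamp,
        "to": timestamp,
        "isBidAskPrice": "false",
        # urlencode percent-encodes the colon as %3A, as in the spider's URL.
        "symbols": f"{mic}:{isin}",
    }
    return f"{API_BASE}?{urlencode(params)}"


print(build_history_url("XFRA", "XS1186131717", timestamp=1700000000))
```

Letting urlencode handle the escaping avoids stray whitespace or unescaped characters in the URL if the MIC or ISIN ever contain something unexpected.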
Running the above spider will yield a dictionary similar to the one below:
{'name': 'FCE Bank PLC 1,134% 15/22', 'isin': 'XS1186131717', 'mic': 'XFRA', 'price': 99.955}
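The chained indexing in parse_price raises an exception if the response does not have the expected shape. How the API behaves when it has no quote for the requested time window is not covered here, so the following is only a defensive sketch. extract_price is a hypothetical helper, and the sample payload is made up to match the indexing used in parse_price (other fields are omitted).

```python
def extract_price(payload):
    """Hypothetical helper: return the first quoted value, or None if absent."""
    try:
        return payload[0]["quotes"]["timeValuePairs"][0]["value"]
    except (IndexError, KeyError, TypeError):
        # Empty list, missing key, or a payload that is not list/dict shaped.
        return None


# Made-up sample shaped like the indexing in parse_price; not real API output.
sample = [{"quotes": {"timeValuePairs": [{"value": 99.955}]}}]

print(extract_price(sample))  # 99.955
print(extract_price([]))      # None
```

In the spider, this could replace the direct indexing, for example item['price'] = extract_price(response.json()), so that an item is still yielded with a price of None instead of the callback failing.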
Answered By - msenior_