Issue
I have been trying to extract property id from the following website: https://www.kwsouthafrica.co.za/Property/RouteUrl?ids=P22%2C&ForSale=ForSale&PropertyTypes=&Beds=Any&Baths=Any&MinPrice=Any&MaxPrice=Any
But whichever combination I try, I can't seem to retrieve it.
The property id is located here:
<div class="corner-ribbon">
<span class="ribbon-green">NEW!</span>
</div>
<a href="Details?id=182519" title="view this property">
<img class="img-responsive img-prop" src="https://kwsadocuments.blob.core.windows.net/devblob/24c21aa4-ae17-41d1-8719-5abf8f24c766.jpg" alt="Living close to Nature">
</a>
And here is what I have tried so far:
response.xpath('//a[@title="view this property"]/@href').getall(),
response.xpath('//*[@id="divListingResults"]/div/div/a/@href').getall(),
response.xpath('//*[@class="corner-ribbon"]/a/@href').getall()
Any suggestion on what I might be doing wrong? Thank you in advance!
Solution
First, you need to understand how this page works. It loads the properties using JavaScript (check the page source in your browser with Ctrl+U), and, as you know, Scrapy can't execute JavaScript.
But if you check the page source, you'll find that all the information you need is "hidden" inside the <input id="propertyJson" name="ListingResults.JsonResult"> tag. So all you need to do is get that value and process it with the json module:
import json

import scrapy


class PropertySpider(scrapy.Spider):
    name = 'property_spider'
    start_urls = ['https://www.kwsouthafrica.co.za/Property/RouteUrl?ids=P22%2C&ForSale=ForSale&PropertyTypes=&Beds=Any&Baths=Any&MinPrice=Any&MaxPrice=Any']

    def parse(self, response):
        # The listing data is embedded as JSON in the value of a hidden <input>.
        property_json = response.xpath('//input[@id="propertyJson"]/@value').get()
        # Uncomment to save the raw JSON for inspection:
        # with open('Samples/Properties.json', 'w', encoding='utf-8') as f:
        #     f.write(property_json)
        property_data = json.loads(property_json)
        # "item" rather than "property", which would shadow the Python builtin.
        for item in property_data:
            property_id = item['Id']
            property_title = item['Title']
            print(property_id, property_title)
        print(property_data)
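
If you want to check the idea without running a spider, here is a minimal sketch that uses only the standard library. The helper names (HiddenInputParser, extract_property_ids) and the SAMPLE_HTML string are made up for illustration; the sample reuses the id and title from the question, and it assumes the JSON is a list of objects with an "Id" key, as in the spider above. The real page may hold more fields or a different value type.

```python
import json
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects the value attribute of the first <input> with a given id."""

    def __init__(self, input_id):
        super().__init__()
        self.input_id = input_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("id") == self.input_id and self.value is None:
            # HTMLParser already unescapes entities such as &quot; here.
            self.value = attrs.get("value")


def extract_property_ids(html):
    """Return the "Id" of every property found in the hidden JSON input."""
    parser = HiddenInputParser("propertyJson")
    parser.feed(html)
    if not parser.value:
        # Input missing or empty: return nothing instead of crashing in json.loads.
        return []
    return [item["Id"] for item in json.loads(parser.value)]


# Hypothetical sample, shaped like the tag described above (not copied from the site).
SAMPLE_HTML = (
    '<input id="propertyJson" name="ListingResults.JsonResult" '
    'value="[{&quot;Id&quot;: 182519, &quot;Title&quot;: &quot;Living close to Nature&quot;}]">'
)

if __name__ == "__main__":
    print(extract_property_ids(SAMPLE_HTML))
```

The empty-list fallback matters in the spider too: .get() returns None when the input is not found, and json.loads(None) raises a TypeError.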
Answered By - gangabass