Issue
Recently, I tried to build a tool to make my apartment search easier and get only the relevant information as quickly as possible (the site is not very user-friendly), but I've run into a problem. Maybe I'm just blind to the obvious at the moment, or just out of my depth, as this is not quite my area of expertise.
Anyway, I have a link to the filtered results, which I use as the start URL of this spider:
class BostadSpider(scrapy.Spider):
    name = "bostadformedlingen"
    start_urls = ['https://bostad.stockholm.se/Lista/?s=58.66266&n=59.99899&w=17.07550&e=19.23431&sort=annonserad-fran-desc']

    def parse(self, response):
        for ad in response.css(
                "div.apartment-search-hits > ul.apartment-search-ad-list > li.ad-list__item > a::attr('href')"):
            print(ad.get())
And this is the structure of the website:
<main class="display-flex flex-column search-wrapper u-m-a-0 u-p-a-0" id="main-content">
  <div class="row no-gutters search-wrapper__inner">
    <div id="apartment-search-hits" class="apartment-search-hits" aria-hidden="false">
      <ul id="apartment-search-ad-list" class="ad-list" aria-hidden="false">
        <li class="ad-list__item"> <a href="/Lista/Details?aid=190412" class="ad-list__link">
Should I start the selector "a bit higher up" and include "main"?
Solution
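The data is actually loaded from the JSON response of an API call. If you disable JavaScript in the browser, you will see the page go blank, which means the listing is rendered dynamically and is not part of the HTML that Scrapy downloads. That's why we can't get the data this way. Here is a working solution that requests the JSON endpoint directly: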
CODE:
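import json

import scrapy


class BostSpider(scrapy.Spider):
    name = 'bost'

    def start_requests(self):
        # Request the JSON endpoint instead of the JavaScript-rendered page.
        yield scrapy.Request(
            url='https://bostad.stockholm.se/Lista/AllaAnnonser',
            method='GET',
            callback=self.parse)

    def parse(self, response):
        # The body is a JSON array with one object per ad.
        resp = json.loads(response.body)
        for h in resp:
            url = h['Url']
            # Resolve the ad link against the response URL to get an absolute URL.
            abs_url = response.urljoin(url)
            yield {
                'URL': abs_url
            }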
Output:
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190400'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190401'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190360'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190325'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190413'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190412'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190383'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190229'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190230'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190414'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190407'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190432'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190377'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190424'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190291'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190382'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190384'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190356'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190349'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190287'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190399'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190428'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190404'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190368'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190371'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190373'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190390'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190385'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190416'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190396'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190394'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190402'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190359'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190358'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190357'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190265'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190264'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190422'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190420'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190410'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190398'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190429'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190403'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190423'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190417'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190362'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190361'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190387'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190376'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190386'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190391'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190369'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190363'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190409'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190427'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190364'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190378'}
2021-09-20 05:43:29 [scrapy.core.scraper] DEBUG: Scraped from <200 https://bostad.stockholm.se/Lista/AllaAnnonser>
{'URL': 'https://bostad.stockholm.se/Lista/Details?aid=190375'}
... and so on
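To try the parsing step without running a crawl, the same idea can be sketched with only the standard library. This is an illustration, not part of the spider: the sample payload and the helper name extract_ad_urls are made up for the example, and only the 'Url' key is taken from the code above. The real response very likely contains more fields per ad.

```python
import json
from urllib.parse import urljoin

# Endpoint used by the spider above; here it only serves as the base for urljoin.
BASE_URL = "https://bostad.stockholm.se/Lista/AllaAnnonser"

# Hypothetical sample payload: only the "Url" key is known from the answer.
SAMPLE_BODY = (
    '[{"Url": "/Lista/Details?aid=190412"},'
    ' {"Url": "/Lista/Details?aid=190400"}]'
)


def extract_ad_urls(body, base_url=BASE_URL):
    """Return absolute ad URLs from a JSON array of objects with a 'Url' key."""
    ads = json.loads(body)
    # Skip entries without a 'Url' key instead of raising KeyError.
    return [urljoin(base_url, ad["Url"]) for ad in ads if "Url" in ad]


if __name__ == "__main__":
    for url in extract_ad_urls(SAMPLE_BODY):
        print(url)
```

urljoin resolves the path against the base in the same way response.urljoin does in the spider, so the output should match the 'URL' values in the log above for the same ad ids.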
Answered By - Fazlul