Issue
I want to scrape this page "https://www.yaencontre.com/alquiler/pisos/barcelona" to get the price, latitude, and longitude of every apartment.
I'm able to get the price, but not the latitude and longitude.
Here's my attempt:
#!/usr/bin/env python3
import scrapy
from scraping.items import fields
import pandas as pd
import re

n = 'barcelona'
list_of_urls = []
for i in range(1, 2):
    url = 'https://www.yaencontre.com/alquiler/pisos/barcelona/pag-{}'.format(i)
    list_of_urls.append(url)

class scraperApp(scrapy.Spider):
    name = n
    start_urls = list_of_urls

    def parse(self, response):
        for href in response.xpath("//a[@class='d-ellipsis']/@href"):
            u = 'https://www.yaencontre.com' + href.extract()
            print(u)
            yield scrapy.Request(u, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        if response:
            item = fields()
            item['vivienda'] = n
            item['price'] = response.xpath("//div[@class='price-wrapper mb-sm']/span").extract_first()
            item['lat'] = response.xpath('substring-after(substring-before(//img[@class="d-block"]/@src, "%2C"), "=")').extract()
            item['lon'] = response.xpath('substring-after(substring-before(//img[@class="d-block"]/@src, "&zoom"), "%2C")*1').extract()
            print(item)
            yield item
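For what it's worth, the `substring-before`/`substring-after` XPath in the attempt above is just string slicing on the map image's `src`. A quick offline sketch of that logic in plain Python (the sample URL here is invented to match the `=`, `%2C`, and `&zoom` markers the XPath relies on; the real page's URL may be shaped differently):

```python
import re

# Hypothetical static-map src, shaped to match the "=", "%2C" and "&zoom"
# markers used in the question's XPath -- not taken from the real page.
src = "https://maps.googleapis.com/maps/api/staticmap?center=41.3885%2C2.1666&zoom=15"

# Latitude sits between "=" and the URL-encoded comma "%2C";
# longitude sits between "%2C" and "&zoom".
match = re.search(r"=([0-9.+-]+)%2C([0-9.+-]+)", src)
if match:
    lat, lon = float(match.group(1)), float(match.group(2))
    print(lat, lon)
```

If this fails on the live page, it usually means no `img.d-block` element with such a `src` is present in the raw HTML, which is exactly the situation described in the solution below: the data never appears in the static markup.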
Solution
The data is loaded dynamically by JavaScript from a hidden API via a GET request, so you can grab the required fields directly from the API's JSON response instead of parsing the rendered HTML. Below is a working solution as an example.
import scrapy
import json

class TestSpider(scrapy.Spider):
    name = "test"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://api.yaencontre.com/v3/searchmap?family=FLAT&lang=es&latMax=45.58873524958013&latMin=37.00876645649905&location=barcelona&lonMax=7.707878748840145&lonMin=-11.628058751159855&operation=RENT&orderBy=RELEVANCE&size=200",
            callback=self.parse,
            method="GET",
            headers=self.headers,
        )

    def parse(self, response):
        json_response = json.loads(response.text)
        for item in json_response["result"]["items"]:
            yield {
                'lat': item['lat'],
                'lon': item['lon'],
                'price': item['price'],
            }
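The `parse` callback only walks the JSON payload, so the extraction can be sanity-checked offline against a hand-made response body that mirrors the `result.items` shape seen in the output below (the values here are invented, not real API data):

```python
import json

# Hand-made sample mirroring the API's {"result": {"items": [...]}} shape;
# the listing values are invented for illustration.
payload = '{"result": {"items": [{"lat": 41.39, "lon": 2.17, "price": 1890}]}}'

def extract(text):
    """Yield lat/lon/price for each listing in an API response body."""
    for item in json.loads(text)["result"]["items"]:
        yield {'lat': item['lat'], 'lon': item['lon'], 'price': item['price']}

print(list(extract(payload)))
```

To run the spider itself, `scrapy runspider` (or `scrapy crawl test` inside a project) with `-O items.json` will export the yielded dicts to a file.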
Output:
{'lat': 41.388501578993406, 'lon': 2.1665850093524353, 'price': 4800}
2023-03-22 00:30:25 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.yaencontre.com/v3/searchmap?family=FLAT&lang=es&latMax=45.58873524958013&latMin=37.00876645649905&location=barcelona&lonMax=7.707878748840145&lonMin=-11.628058751159855&operation=RENT&orderBy=RELEVANCE&size=200>
{'lat': 41.39807929851551, 'lon': 2.1822061506785406, 'price': 1890}
{'lat': 41.380495743458205, 'lon': 2.1555876168586092, 'price': 2650}
{'lat': 41.38348724393205, 'lon': 2.1584030638883083, 'price': 2600}
{'lat': 41.37912095360161, 'lon': 2.1722173322787586, 'price': 2990}
2023-03-22 00:30:25 [scrapy.core.engine] INFO: Closing spider (finished)
2023-03-22 00:30:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 538,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 135736,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 2.20321,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2023, 3, 21, 18, 30, 25, 752063),
'item_scraped_count': 200
... so on
Answered By - Md. Fazlul Hoque