Issue
<svg version="1.1" id="Calque_1" xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" width="700" height="700" viewBox="0 0 300 300" overflow="visible" enable-background="new 0 0 300 300" xml:space="preserve">
<a xlink:href="https://www.pros-locations-de-voitures.fr/location-de-voiture-ain-01/" onmouseover="TipFunction('Ain')" onmouseout="TipFunction('')"><path id="Z1" title="Ain" d="M237.125,152.725l-1.7-1l-2.4,3.3l-2.7,1.6l-2,0.1l-0.2-1.4l-1.6-0.8l-2,2.2l-1.5,0.1v-1.5h-1.5l-2.1-3.9 l-2.5-1.6l-2.7,0.6l-2.9-0.8l-2.9,10.5l-0.8,4l1.5,4.6l1.5-0.3l1.8,2.9l3.2-0.3l3,1l1.5-2.5l1.4-0.4l5.6,7.6l2.9-3.3l1.1-6.8 l-0.4-4.7h1.5l1.3-1.4h-0.1l0.3-2.6l2.8-1.7L237.125,152.725z" fill="red" stroke="#EEEEEE" stroke-width="0.9"></path> </a>
<a xlink:href="https://www.pros-locations-de-voitures.fr/location-de-voiture-aisne-02/" onmouseover="TipFunction('Aisne')" onmouseout="TipFunction('')"><path id="Z2" title="Aisne" d="M179.025,42.325l-6.3,0.4l-0.2,1.8l-1.9,4.1l1.1,3.5l0.2,5.1l-0.3,2.2l1.1,0.9l-1.3,0.6l-1.2,2.8l-1.3,0.8 l1.4,2.3l-1.5-0.1l0.4,1.5l1.2-0.8l1.4,0.6l0.3,1.4l-1.1,0.8l1.3,0.4l0.9,1.2l-0.3,1.4l1.9,2.1l4.7,3l3.8-5.1l-1.3-0.6l0.5-1.4 l-0.8-1.2l2.7-1.1l-1.6-4l0.6-1.4l4-2l2.7,1l0.4-1.5l-0.1-7.1l1.4-0.1l2.5-3.6l-0.7-1.6l0.7-1.7l-0.4-2.9h-0.2l-1.8-0.6v-0.1 l-7.8-2.1l-2.6,0.9l-1.2-0.9L179.025,42.325z " fill="#094353" stroke="#EEEEEE" stroke-width="0.9"></path> </a>
When I test the regex pattern on its own, it works and matches the links, but when I apply it in code it returns an empty list.
import scrapy

class scraper(scrapy.Spider):
    name = "scraper"
    start_urls = ["https://www.pros-locations-de-voitures.fr/"]

    def parse(self, response):
        yield {
            'Links': response.selector.re('(?<=xlink:href=").*?(?=")')
        }
Solution
The data you are looking for is loaded via JavaScript, so the response Scrapy downloads does not contain it. To access it you will have to pre-render the page using scrapy-splash, selenium, or scrapy-playwright. You can then use the XPath selector below to obtain the URLs; there is no need for a regex in this case.
response.xpath("//*/@*[name()='xlink:href']").getall()
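To check outside Scrapy whether a given HTML string actually contains the links, a stdlib-only sketch like the one below may help. It is not the XPath approach above but a stand-in built on `html.parser`. The names `XlinkHrefCollector`, `extract_xlink_hrefs`, `RENDERED` and `UNRENDERED` are hypothetical, the SVG is a trimmed copy of the markup from the question with the path data shortened, and the unrendered body is an invented placeholder for what a request without JavaScript might return.

```python
from html.parser import HTMLParser


class XlinkHrefCollector(HTMLParser):
    """Collect the value of every xlink:href attribute in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        for name, value in attrs:
            if name == "xlink:href" and value:
                self.links.append(value)


def extract_xlink_hrefs(html):
    """Return all xlink:href values found in the given HTML text."""
    collector = XlinkHrefCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Trimmed stand-in for the rendered markup (path data shortened for brevity).
RENDERED = """
<svg width="700" height="700" viewBox="0 0 300 300">
<a xlink:href="https://www.pros-locations-de-voitures.fr/location-de-voiture-ain-01/"><path id="Z1" title="Ain" d="M0,0l1,1z"></path></a>
<a xlink:href="https://www.pros-locations-de-voitures.fr/location-de-voiture-aisne-02/"><path id="Z2" title="Aisne" d="M0,0l1,1z"></path></a>
</svg>
"""

# Hypothetical placeholder for a body fetched without running JavaScript.
UNRENDERED = "<html><body><div id='map'></div></body></html>"

if __name__ == "__main__":
    print(extract_xlink_hrefs(RENDERED))    # links are present after rendering
    print(extract_xlink_hrefs(UNRENDERED))  # nothing to match before rendering
```

If the same helper returns an empty list for `response.text` inside the spider, the markup was most likely never in the downloaded page, which would explain why the regex finds nothing there.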
Answered By - msenior_