Issue
I am trying to scrape some elements on this page:
I would like to scrape the URL of the article's image. Here is the part of the HTML where the image link can be found:
<figure class="lead-art-wrapper"><div><div class="sc-ckMVTt hVOpns"><img src="https://www.liberation.fr/resizer/Kmpp6T1oKcLS4NfCHPYuP-bPGMk=/1024x0/filters:format(jpg):quality(70)/cloudfront-eu-central-1.images.arcpublishing.com/liberation/QGDR2IJDFAWHBV35O7NBAJONJI.jpg" width="1024px" height="0px" class="sc-GVOUr jdlgMc"></div></div><figcaption><p class="ImageMetadata__MetadataParagraph-sc-1gn0vty-0 dkGqa-d image-metadata"><span>Peu après minuit, les premiers résultats négatifs parviennent au Luna Park, stade couvert de Buenos Aires, où sont rassemblés les partisans de la présidente Cristina Kirchner. </span>(JUAN MABROMATA/AFP)</p></figcaption></figure>
Using the Scrapy shell, I am not able to select the image link:
response.css('div.sc-ckMVTt img::attr(src)')
Even with:
response.css('img')
I only get the website's logo. Could you let me know how I can scrape the image URL? I need to use CSS selectors, as I would like to select multiple pages and XPath would not be convenient.
Thank you very much,
Solution
Your image is rendered by JavaScript. If you view the page's HTML source (Ctrl+U), you will see that the markup above does not exist in the raw HTML, which is the only thing Scrapy downloads.
Unfortunately, Scrapy can't execute JavaScript, so you need to parse the image path from the JSON-like object in the Fusion.globalContent string embedded in the page source.
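Below is a minimal sketch of that approach, under a few assumptions not confirmed by the answer: that a script tag contains an assignment of the form Fusion.globalContent = {...}; and that the object is valid JSON (if it is only "JSON-like", json will raise an error and the text needs cleaning first). The helper names are hypothetical. Instead of guessing the exact key path, it walks the whole object and collects every "url" value ending in a common image extension; on Arc Publishing sites the lead image is often stored under promo_items, then basic, then url, but treat that as an assumption to verify against the real page.

```python
import json
import re
from typing import Any, Iterable, Iterator, List, Optional

# Assumption: the page contains "Fusion.globalContent = {...};" in a <script>.
_ASSIGNMENT = re.compile(r"Fusion\.globalContent\s*=\s*")
# Heuristic (assumption): image URLs end with one of these extensions.
_IMAGE_EXT = re.compile(r"\.(jpe?g|png|gif|webp)$", re.IGNORECASE)


def extract_global_content(script_text: str) -> Optional[Any]:
    """Return the object assigned to Fusion.globalContent, or None if absent."""
    match = _ASSIGNMENT.search(script_text)
    if match is None:
        return None
    # raw_decode parses exactly one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and any other JavaScript).
    # It raises json.JSONDecodeError if the object is not strictly valid JSON.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


def iter_image_urls(node: Any) -> Iterator[str]:
    """Recursively yield string values stored under a "url" key that look like images."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str) and _IMAGE_EXT.search(value):
                yield value
            else:
                yield from iter_image_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_image_urls(item)


def image_urls_from_scripts(script_texts: Iterable[str]) -> List[str]:
    """Search script bodies for Fusion.globalContent and return its image URLs."""
    for text in script_texts:
        content = extract_global_content(text)
        if content is not None:
            return list(iter_image_urls(content))
    return []
```

In the Scrapy shell this still uses only a CSS selector, which fits the multi-page requirement: image_urls_from_scripts(response.css('script::text').getall()). Because the data comes from the raw script text rather than the rendered DOM, no JavaScript execution is needed.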
Answered By - gangabass