Issue
I am trying to scrape the feature image of an article using Scrapy in Python, but the selector returns a "data:image/svg+xml; ..." value instead of the image src. Can anyone help me fix this and explain why I am getting "data:image/svg+xml;charset=utf-8,%3Csvg%20xmlns%3D'http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg'%20viewBox%3D'0%200%20600%20400'%2F%3E"
instead of the image src?
Here is my code:

import scrapy


class NewsSpider(scrapy.Spider):
    name = "cruiseradio"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        try:
            Feature_Image = [i.strip() for i in response.css('img[class*="wp-image-150660 sp-no-webp"] ::attr(src)').getall()][0]
        except IndexError:
            # No matching <img> found: store a placeholder value
            Feature_Image = "NULL"
        yield {
            'Feature_Image': Feature_Image,
        }
Here is the URL of the site: https://cruiseradio.net/new-expedition-ship-delivered-atlas-ocean-voyages/
Solution
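You get the data URI because the src attribute of that image is filled in dynamically. In the HTML the server sends, src only holds a placeholder: URL-decoded, the value is <svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 600 400'/>, an empty 600x400 SVG. The real URL is swapped in later in the browser, most likely by a lazy-loading script, and Scrapy does not execute JavaScript, so it only ever sees the placeholder. Use the data-origin-src
attribute instead, which holds the real image URL in the raw HTML.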
Feature_Image = [i.strip() for i in response.css('img[class*="wp-image-150660 sp-no-webp"] ::attr(data-origin-src)').getall()][0]
Result:
{'Feature_Image': 'https://cdn.cruiseradio.net/wp-content/uploads/2022/10/Atlas_Ocean_Voyages-1030x687.jpeg'}
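If you want the spider to cope with pages where the image is not lazy-loaded, a small helper can prefer the lazy-load attribute and only fall back to src when it is not a data: placeholder. This is a minimal sketch, not part of the original answer: pick_image_url and LAZY_ATTRS are hypothetical names, and only data-origin-src is confirmed for this page; the other attribute names are assumptions about what other lazy-loading plugins might use.

```python
from typing import Mapping, Optional

# "data-origin-src" is the attribute this page uses. The other two names are
# assumptions about other lazy-loading setups, not something seen on this site.
LAZY_ATTRS = ("data-origin-src", "data-src", "data-lazy-src")


def pick_image_url(attrs: Mapping[str, str]) -> Optional[str]:
    """Return the real image URL from an <img> attribute mapping, or None."""
    # Prefer the attributes where lazy loaders keep the real URL.
    for name in LAZY_ATTRS:
        value = attrs.get(name, "").strip()
        if value:
            return value
    # Fall back to src, but skip inline placeholders such as data:image/svg+xml.
    src = attrs.get("src", "").strip()
    if src and not src.startswith("data:"):
        return src
    return None
```

In the spider you could pass it the attributes of the first matching element, for example pick_image_url(response.css('img[class*="wp-image-150660"]').attrib). The .attrib property returns an empty dict when nothing matches, so the helper returns None and no IndexError handling is needed.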
Answered By - Alexander