Issue
I'm currently working on a scraping project that requires some images to be extracted from webpages.
I am attempting to extract an image from this website: https://www.princype.com. However, I would like to set a condition so that it extracts the first image that is larger than 600x600.
I've previously worked on this page: https://www.abitareco.it/scheda-ADRIANO.html and used the following code:
for website in url_list:
    driver.get(website)
    element = driver.find_element_by_xpath('/html/body/div/div/div[1]/div[3]/div[1]/div/span[1]/img')
    image_list.append(element.get_attribute('src'))
That works fine for pages on that same website, but now I am facing websites that are not built in the same style, and at this point I would just like to get the URL of the first image that meets my condition.
I'd really appreciate any help!
Solution
What you are trying to do should be achievable as follows. After you call driver.get on the website, you may need to wait for all the images to load, for example by adding a time.sleep or some other kind of wait. Let us assume that is not needed for now. Then get all the images as follows:
elements = driver.find_elements_by_tag_name("img")
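If the images have not finished loading by the time the elements are collected, a polling wait before that call may help. The following is only a minimal sketch: wait_for_images is a hypothetical helper (not part of Selenium), and it assumes the driver exposes execute_script and that checking the complete property of every entry in document.images is a good enough signal for the page in question. Lazy-loaded images that are only requested on scroll would not be covered.

```python
import time


def wait_for_images(driver, timeout=10.0, poll=0.5):
    """Poll until every <img> on the page reports complete, or give up.

    Hypothetical helper: returns True if all images reported complete
    within `timeout` seconds, and False otherwise.
    """
    # Runs in the browser; true only when every image has finished loading
    # (the browser also sets complete for images that failed or have no src).
    script = "return Array.from(document.images).every(img => img.complete);"
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(script):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

It would be called right after driver.get(website), for example wait_for_images(driver, timeout=15), before the img elements are collected.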
Now loop over them until you find the first one that is large enough:
for element in elements:
    el_width = int(element.get_attribute('width'))
    el_height = int(element.get_attribute('height'))
    if min(el_width, el_height) > 600:
        image_list.append(element.get_attribute('src'))
        break
I tested this on abitareco.it and it works, and the idea is general. It will not find anything big enough on princype.com, whose only actual image with a source is 30 by 30. If you inspect the other pictures on that site, you will find that they are not set up as images (nor do they have src attributes).
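The loop above can also be wrapped in a small helper that tolerates images without usable dimensions or without a src, which matters on pages like princype.com. This is a sketch under assumptions: first_large_image_src and _to_int are hypothetical names, and the helper only relies on each element having a get_attribute method, as Selenium WebElements do.

```python
def _to_int(value):
    """Convert an attribute value to int; missing or non-numeric values become 0."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return 0


def first_large_image_src(elements, min_side=600,
                          width_attr="width", height_attr="height"):
    """Return the src of the first element whose width and height both
    exceed min_side, or None if no element qualifies."""
    for element in elements:
        width = _to_int(element.get_attribute(width_attr))
        height = _to_int(element.get_attribute(height_attr))
        src = element.get_attribute("src")
        # Skip elements with no src, since there is nothing to download.
        if src and min(width, height) > min_side:
            return src
    return None
```

With the elements list collected earlier, src = first_large_image_src(elements) gives either a URL or None, so a page with no large image no longer raises an error. Passing width_attr="naturalWidth" and height_attr="naturalHeight" should compare the intrinsic image size instead of the rendered size, assuming the driver returns DOM properties through get_attribute.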
Answered By - Jeremy Kahan