Issue
Trying to learn something today and doing a bit of scraping.
I am trying to list product names and corresponding image URLs into a spreadsheet.
I managed to store the names, but the images don't seem to work. Hopefully you can help!
Here is the code I use for extracting the text:
results[0].find('p', {'class': 'product-card__name'}).get_text()
Here is what I thought would extract the image:
results[0].find('img', {'class':'product-card__image'}).get_src()
This is obviously not working: it raises "'NoneType' object is not callable".
Any pointers?
For reference, below is the source I am trying to scrape.
<li class="product-grid__item"><a href="/p/63818/bumbu-the-original-rum-glass-pack" class="product-card" title=" Bumbu The Original Rum Glass Pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])"><div class="product-card__image-container"><img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" alt="Bumbu The Original Rum Glass Pack" class="product-card__image" loading="lazy" width="3" height="4"></div><div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p><p class="product-card__meta"> 70cl / 40% </p></div><div class="product-card__data"><p class="product-card__price"> £39.95 </p><p class="product-card__unit-price"> (£57.07 per litre) </p></div></a></li>
Solution
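To grab the image URL, call .get('src') instead of .get_src(). BeautifulSoup tags have no get_src method: an unknown attribute name such as get_src is treated as a search for a child tag called get_src, which returns None, and calling None is what raises "'NoneType' object is not callable".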
results[0].find('img', {'class':'product-card__image'}).get('src')
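To see where the error comes from, here is a minimal sketch comparing the failing attribute lookup with the two usual ways of reading a tag attribute. The one-line HTML string and the variable names are assumptions made for illustration; only the class name and image URL come from the question.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the <img> tag in the question (assumed for illustration).
html = '<img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" class="product-card__image">'
img = BeautifulSoup(html, "html.parser").find("img", {"class": "product-card__image"})

# An unknown attribute name is treated as a child-tag search, so this is
# equivalent to img.find("get_src") and evaluates to None. Adding () to it
# is what raises "'NoneType' object is not callable".
missing = img.get_src

# .get() returns the attribute value, or None if the attribute is absent.
src_via_get = img.get("src")
missing_attr = img.get("data-src")

# Dictionary-style access also works, but raises KeyError if the attribute is absent.
src_via_index = img["src"]

print(missing, src_via_get, missing_attr)
```

Prefer .get('src') when some images may lack the attribute, and img['src'] when a missing attribute should fail loudly.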
Example:
html='''
<li class="product-grid__item">
<a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack">
<div class="product-card__image-container">
<img alt="Bumbu The Original Rum Glass Pack" class="product-card__image" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/>
</div>
<div class="product-card__content">
<p class="product-card__name">
Bumbu The Original Rum
<span class="product-card__name-secondary">
Glass Pack
</span>
</p>
<p class="product-card__meta">
70cl / 40%
</p>
</div>
<div class="product-card__data">
<p class="product-card__price">
£39.95
</p>
<p class="product-card__unit-price">
(£57.07 per litre)
</p>
</div>
</a>
</li>
'''
from bs4 import BeautifulSoup
soup=BeautifulSoup(html, "html.parser")
#print(soup.prettify())
print(soup.find('img', {'class':'product-card__image'}).get('src'))
Output:
https://img.thewhiskyexchange.com/480/rum_bum4.jpg
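Since the goal in the question is a spreadsheet of names and image URLs, the same lookup can be applied to every product card and written out as CSV. This is only a sketch under a few assumptions: each product sits in an li with class product-grid__item (as in the snippet above), the helper names extract_products and write_csv are made up for this example, and the sample HTML is a shortened copy of the card from the question.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_products(html):
    """Return a list of (name, image_url) tuples, one per product card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("li", {"class": "product-grid__item"}):
        name_tag = card.find("p", {"class": "product-card__name"})
        img_tag = card.find("img", {"class": "product-card__image"})
        # Fall back to an empty string when a card lacks a name or an image.
        name = name_tag.get_text(" ", strip=True) if name_tag else ""
        image_url = img_tag.get("src", "") if img_tag else ""
        rows.append((name, image_url))
    return rows


def write_csv(rows, fileobj):
    """Write a header row followed by the product rows to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "image_url"])
    writer.writerows(rows)


# Shortened copy of the product card from the question.
sample = '''
<li class="product-grid__item"><a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack">
<div class="product-card__image-container"><img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" class="product-card__image"></div>
<div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p></div>
</a></li>
'''

rows = extract_products(sample)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To produce an actual file, pass a file opened with open(path, "w", newline="", encoding="utf-8") to write_csv instead of the in-memory buffer.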
Answered By - F.Hoque