Issue
I'm trying to learn web scraping but can't get the code right. Any suggestions?
I want it to return the product name and price, but it returns:
Amira Spandex Abaya <span class="product-price">579 kr</span>
Selena Abaya <span class="product-price">579 kr</span>
....
I tried changing the code to:
code | output | error / note |
---|---|---|
price=product.find("span", class_="product-price").span.text | - | 'NoneType' object has no attribute 'text' |
price=product.find("span", class_="product-price").text | Amira Spandex Abaya 579 kr | (what I want, but it doesn't return all products) |
print(title, price.text) | Amira Spandex Abaya 579 kr | (what I want, but it doesn't return all products) |
My current code is:
#pip install bs4 & requests
from bs4 import BeautifulSoup as bs
import requests
#Load webpage to scrape
url = "https://www.tahara.se/dam/abaya"
source = requests.get (url)
#convert bs4 object
soup = bs(source.text,"html.parser")
#print out HTML
#print(soup.prettify())
#Run code
products = soup.findAll("div", class_ ="col-md-4 col-12 product")
#print(len(products))
for product in products:
    title=product.find("h2", class_="producttitle-font-size").a.text
    price=product.find("span", class_="product-price")
    print(title, price)
I'm trying to extract from:
<div class="col-md-4 col-12 product" data-pid="6792" data-s-price="579.00" data-s-title="Amira Spandex Abaya" data-s-sortcount="1">
  <div class="position-relative text-left">
    <h2 class="producttitle-font-size">
      <a class="color-text-base" href="/dam/amira-spandex-abaya">Amira Spandex Abaya</a>
    </h2>
    <span class="opacity-7 text-base">
      <span class="product-price">579 kr</span>
    </span>
  </div>
</div>
Solution
Your code is almost working. Just take the text from the price tag, if it exists:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

url = "https://www.tahara.se/dam/abaya"
source = requests.get(url)
soup = bs(source.text, "html.parser")

products = soup.find_all("div", class_="col-md-4 col-12 product")

all_data = []
for product in products:
    title = product.find("h2", class_="producttitle-font-size").a.text
    price = product.find("span", class_="product-price")
    # find() returns None when a product has no price tag
    if price:
        price = price.text.replace(' kr', '')
    else:
        price = '-'
    all_data.append((title, price))

df = pd.DataFrame(all_data, columns=['name', 'price'])
print(df)
Prints:
name price
0 Amira Spandex Abaya 579
1 Selena Abaya 579
2 Senada Kimono 789
3 Saliha Abaya 884
4 Simone Abaya | Lila 429
5 Simone Abaya | Oliv 429
6 Simone Abaya | Svart -
7 Zoya Abaya 1 469
8 Farah Abaya 589
9 Naila Abaya | Black 589
10 Eshaal Abaya | Marinblå -
11 Eshaal Abaya | Svart 879
12 Amara Abaya | Mullvad 879
13 Amara Abaya | Svart 879
14 Salina Abaya Svart 429
15 Salina Abaya Lila 429
16 Inaya Abaya -
17 Salina Abaya Mörkgrå 429
18 Salina Abaya Beige 429
19 Salina Abaya Marinblå 429
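The error in the question and the rows marked `-` above share one cause: `find()` and the `.span` shortcut both return `None` when nothing matches. The sketch below reproduces this offline on two product cards modeled on the question's markup. The second card (a product without a price span), its `href`, and the helper name `extract_products` are assumptions made for illustration, not taken from the live page.

```python
from bs4 import BeautifulSoup

# Two product cards modeled on the markup in the question. The second card is
# a hypothetical example of a product that has no price span at all.
HTML = """
<div class="col-md-4 col-12 product" data-s-title="Amira Spandex Abaya">
  <h2 class="producttitle-font-size">
    <a class="color-text-base" href="/dam/amira-spandex-abaya">Amira Spandex Abaya</a>
  </h2>
  <span class="opacity-7 text-base">
    <span class="product-price">579 kr</span>
  </span>
</div>
<div class="col-md-4 col-12 product" data-s-title="Inaya Abaya">
  <h2 class="producttitle-font-size">
    <a class="color-text-base" href="/dam/inaya-abaya">Inaya Abaya</a>
  </h2>
</div>
"""


def extract_products(html):
    """Return (title, price) pairs; price is '-' when no price tag exists."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for product in soup.find_all("div", class_="col-md-4 col-12 product"):
        title = product.find("h2", class_="producttitle-font-size").a.get_text(strip=True)
        price_tag = product.find("span", class_="product-price")
        # find() returns None when nothing matches, so check before reading text
        if price_tag is not None:
            price = price_tag.get_text(strip=True).replace(" kr", "")
        else:
            price = "-"
        rows.append((title, price))
    return rows


rows = extract_products(HTML)
print(rows)  # [('Amira Spandex Abaya', '579'), ('Inaya Abaya', '-')]

# The price span contains only text, no nested <span>, so .span is None here.
# Calling .text on it is what raised "'NoneType' object has no attribute 'text'".
first_price = BeautifulSoup(HTML, "html.parser").find("span", class_="product-price")
print(first_price.span)  # None
```

Printing `price` without `.text`, as in the question's code, shows the whole `<span>` tag. Calling `.text` without the `None` check stops the loop at the first product that has no price, which would explain why not all products were returned.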
Answered By - Andrej Kesely