Issue
Hello, I am learning Python and working on web scraping. When I run my scraper, it prints the same line several times in the output, because the website lists the same products on several different pages. Is there a way to print each product only once, even if it appears on several pages?
from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.lamazuna.com/en/")
soup = BeautifulSoup(page.content, "html.parser")
all_product_item_lists = soup.find_all(class_="col-sm-12 mega-col")
for product_link in all_product_item_lists:
    for link in product_link.find_all("a", href=True):
        find_product_url = link.get('href')
        next_page = requests.get(find_product_url)
        next_soup = BeautifulSoup(next_page.content, "html.parser")
        product_name_none = next_soup.find(class_="h3 product-title")
        product_price_none = next_soup.find(class_="price")
        if product_name_none is not None:
            product_name_several_times = product_name_none.get_text().replace("\n","")
        if product_price_none is not None:
            product_price = product_price_none.get_text().replace("\n","")
        print(product_name_several_times)
Solution
You can keep a list of the titles you have already seen and append a title only if it is not in that list yet, like this:
from bs4 import BeautifulSoup
import requests

# Titles seen so far; each product name is stored only once.
titles = []
page = requests.get("https://www.lamazuna.com/en/")
soup = BeautifulSoup(page.content, "html.parser")
all_product_item_lists = soup.find_all(class_="col-sm-12 mega-col")
for product_link in all_product_item_lists:
    for link in product_link.find_all("a", href=True):
        find_product_url = link.get('href')
        next_page = requests.get(find_product_url)
        next_soup = BeautifulSoup(next_page.content, "html.parser")
        product_name_none = next_soup.find(class_="h3 product-title")
        product_price_none = next_soup.find(class_="price")
        if product_name_none is not None:
            product_name_several_times = product_name_none.get_text().replace("\n","")
            # Add and report the title only the first time it is seen.
            if product_name_several_times not in titles:
                titles.append(product_name_several_times)
                print(f'"{product_name_several_times}" is added in list')
        if product_price_none is not None:
            product_price = product_price_none.get_text().replace("\n","")
# Print the de-duplicated titles once, after all pages are visited.
print(titles)
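As a possible variation on the same idea, membership checks on a set are faster than on a list, and the same trick can be applied to the links before requesting them, so a page that appears in several menus is only downloaded once. The sketch below is a minimal illustration, not part of the original answer: the helper name unique_in_order and the example.com URLs are made up for the example, and it runs without any network access.

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first-seen order."""
    seen = set()      # values already encountered (fast membership test)
    result = []       # first occurrences, in their original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical hrefs standing in for the links collected from the menus.
links = [
    "https://example.com/en/solid-shampoo",
    "https://example.com/en/toothpaste",
    "https://example.com/en/solid-shampoo",
]

unique_links = unique_in_order(links)
print(unique_links)
```

In the scraper, the hrefs from the nested loops could first be collected into a list, passed through a helper like this, and only then requested one by one. The same helper would also work on the product names if they are gathered before printing.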
Answered By - Samsul Islam