Issue
I need to scrape several Glovo pages and extract the name and the price of each product. I tried the BeautifulSoup library, but the information I need is deeply nested. I set up the code like this:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
def filter_price(price):
    # Strip the euro sign, whether it comes first ('€15,50') or last ('15,50€')
    if price[0] == '€':
        return price[1:]
    else:
        return price[0:len(price)-1]
df_siti_1 = pd.read_excel(".../Link glovo KFC.xlsx", header=None)
siti_1=df_siti_1[0].tolist()
# Automatic code for gathering the lengths
len_menus = []
siti_1_filtered = []
iterator = 0
for url in siti_1:
    iterator += 1
    print(f"I am at:{100*iterator/len(siti_1)}%", end="\r")
    result = requests.get(url)
    doc = BeautifulSoup(result.content, "html.parser")
    menus = doc.findAll("div", class_="product-row")  # HERE THE PROBLEM
    if len(menus) > 0:
        ## save length and corresponding URL, drop the others
        len_menus.append(len(menus))
        siti_1_filtered.append(url)
tuples = [(idx, i) for d, idx in zip(len_menus, siti_1_filtered) for i in range(1, d+1)]
multi_idx = pd.MultiIndex.from_tuples(tuples, names=["URL", "nid"])
##Initialize the dataframe
df_new_1 = pd.DataFrame(index=multi_idx, columns=["name", "price"])
for url in siti_1_filtered:
    print(url, end="\r")
    #print(result)
    result = requests.get(url)
    doc = BeautifulSoup(result.content, "html.parser")
    menus = doc.findAll("div", class_="product-row")
    db_name = []
    db_price = []
    for menu in menus:
        name = menu.find("p", class_="product-row__name")  # HERE THE PROBLEM
        price = menu.find("span", class_="product-price__effective product-price__effective--new-card")  # HERE THE PROBLEM
        #print(name.text + " > " + price.text)
        db_name.append(name.text)
        db_price.append(price.text)
    #df_new.loc[url, 'name'] = pd.Series(db_name)
    #df_new.loc[url, 'price'] = pd.Series(db_price)
    ## Once you filter, no problem
    price_float = []
    for pr in db_price:
        price_float.append(filter_price(pr))
    df_new_1.loc[url, 'name'] = db_name
    #df_new.loc[url, 'price'] = db_price
    df_new_1.loc[url, 'price'] = price_float
#the code continues...
Basically, I cannot find the right HTML elements. Am I doing something wrong because the markup contains a '<span'?
Here is a link that can be used for testing: https://glovoapp.com/it/it/leini-settimo-torinese/kfc-lst/
The end result should be a table with the name and price of each product, for example:
Box Meal Colonel's Burger -> 15,50 €
Menu Colonel's Burger -> 11,90 €
I can't get any further from here. I hope I have made clear what I need to do, and I would be very grateful to anyone who can solve the problem.
Solution
You can try:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://glovoapp.com/it/it/leini-settimo-torinese/kfc-lst/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = []
for p in soup.select('[type="product_row"]'):
    name, desc, price = p.get_text(strip=True, separator="|||").split("|||")
    data.append((name, desc, price))
df = pd.DataFrame(data, columns=["Name", "Description", "Price"]).drop_duplicates()
print(df.head())
Prints:
Name | Description | Price |
---|---|---|
Box Meal Colonel's Burger | Il Box Meal contiene un Colonel's Burger (bacon, formaggio, insalata, salsa barbecue, maionese e un irresistibile filetto Original Recipe), una porzione di Pop Corn Chicken, un contorno speciale, patatine e bibita a scelta. Così buono che il Colonnel... | 15,50 € |
Menu Colonel's Burger | Menu con un contorno a scelta, una bibita e il Burger del Colonnello. Bacon, formaggio, insalata, salsa barbecue, maionese e un irresistibile filetto Original Recipe, preparato secondo la sua leggendaria ricetta, con il mix segreto di 11 erbe e spezie. | 11,90 € |
Menu Double Krunch | Menu con un contorno a scelta, una bibita e il Double Krunch, che raddoppia la sua croccantezza con 2 Tenders Crispy! | 10,90 € |
Box Meal Zinger | Box con Zinger (burger dal delizioso filetto di pollo con panatura Hot & Spicy, lattuga, cheddar e doppio strato di maionese), 2 Hot Wings, patatine, bibita e una pannocchia . Only the brave! | 14,90 € |
Box Meal All Star | Box con Double Krunch, un Tender Crispy, pannocchia, patatine e bibita a scelta. | 14,50 € |
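Two practical points follow from the code and output above. The Price column still holds strings such as "15,50 €", and the three-way unpacking of the split text raises ValueError for any row that does not produce exactly three parts. The following is a minimal sketch of how both could be handled, not part of the original answer. The helper names parse_price and split_product_text are hypothetical. It assumes Italian-style prices (comma as decimal separator, dot as thousands separator) and that the first text part is the name and the last is the price.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

SEPARATOR = "|||"  # same separator that is passed to get_text() above


def parse_price(text: str) -> Optional[Decimal]:
    """Convert a price string such as '15,50 €' or '€15,50' to a Decimal.

    Assumption: comma is the decimal separator and dot the thousands
    separator. Returns None if the remaining text is not a number.
    """
    cleaned = text.replace("€", "").replace("\xa0", " ").strip()
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def split_product_text(text: str) -> Tuple[str, str, str]:
    """Split 'name|||description|||price' into (name, description, price).

    Assumption: the first part is the name and the last part is the price;
    anything in between is joined into the description, which may be empty.
    """
    parts = [p for p in text.split(SEPARATOR) if p]
    if len(parts) < 2:
        raise ValueError(f"expected at least a name and a price, got {parts!r}")
    name, price = parts[0], parts[-1]
    description = " ".join(parts[1:-1])
    return name, description, price
```

Inside the loop, the unpacking line could then read name, desc, price = split_product_text(p.get_text(strip=True, separator="|||")), with parse_price(price) applied before appending if numeric prices are wanted. Decimal is used here rather than float so that amounts like 15,50 are stored exactly.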
Answered By - Andrej Kesely