Issue
Usually I can pick up all the hrefs, but this script doesn't scrape anything and I can't figure out why.
Here's my script:
import warnings
warnings.filterwarnings("ignore")
import re
import json
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# get the links
all_title = soup.find_all('a', class_='prod-item__container')

data_titles = []
for title in all_title:
    try:
        product_link = title['href']
        data_titles.append(product_link)
    except:
        pass
print(data_titles)

data = pd.DataFrame({
    'links': data_titles
})

data.to_csv("testlink.csv", sep=';', index=False)
Here's the HTML:
It seems that soup.find_all('a', class_='prod-item__container') should work, but it doesn't.
Any ideas why?
Solution
Use headers in your request to get the content. Some sites serve a different response depending on the user agent, to discourage scraping or crawling:
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
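A quick way to confirm the request itself succeeded is to check the status before parsing; a blocked or throttled request usually shows up as a non-200 response or an empty body. A minimal sketch:
response = requests.get(url, headers=headers)
response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
soup = BeautifulSoup(response.content, "html.parser")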
Example
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://www.frayssinet-joaillier.fr/fr/marques/longines"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# get the links
all_title = soup.find_all('a', class_='prod-item__container')

data_titles = []
for title in all_title:
    try:
        product_link = title['href']
        data_titles.append(product_link)
    except:
        pass
print(data_titles)
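To finish the pipeline the way the original script does, the collected links can be written to a CSV with pandas. The urljoin step below is an assumption for the case where the hrefs are relative paths rather than full URLs; drop it if the site already returns absolute links.
from urllib.parse import urljoin
import pandas as pd

# Assumption: hrefs may be relative, so resolve them against the page URL
absolute_links = [urljoin(url, link) for link in data_titles]

data = pd.DataFrame({'links': absolute_links})
data.to_csv("testlink.csv", sep=';', index=False)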
Answered By - HedgeHog