Issue
I'm new to web scraping with Beautiful Soup and I'm running into some problems.
Here is my code:
from bs4 import BeautifulSoup
import numpy as np
from time import sleep
from random import randint
from selenium import webdriver
page="https://www.acheteralasource.com/producteurs-en-france/all/departement/75/page/1"
driver = webdriver.Chrome()
driver.get(page)
sleep(randint(2, 10))  # random pause to avoid being blocked by IP
soup = BeautifulSoup(driver.page_source, 'html.parser')
my_table = soup.find_all(class_=['companyName', 'presentation', 'addressCity',
                                 'addressPostalCode'])
I want to extract several fields (the class names listed in the find_all call above), but when I print my_table it returns an empty list.
Unfortunately, there is no API available for this website.
Any help?
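The selector itself is valid: passing a list to class_ makes find_all return every element that carries any of the listed classes. An empty result therefore means none of those class names occur in the HTML that was parsed. A minimal offline check, using hand-written HTML (not the real page markup) and a made-up class name sc-x1y2z3:

```python
from bs4 import BeautifulSoup

# Hand-written sample HTML (hypothetical, not the real page markup).
html = """
<div class="card">
  <h2 class="companyName">Example Farm</h2>
  <p class="presentation">Honey and vegetables</p>
  <span class="addressCity">Paris</span>
  <span class="addressPostalCode">75001</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list passed to class_ matches elements that have ANY of the listed classes.
wanted = ["companyName", "presentation", "addressCity", "addressPostalCode"]
matches = soup.find_all(class_=wanted)
print([tag.get_text(strip=True) for tag in matches])

# If the markup uses other class names (made-up "sc-x1y2z3" here), the result is empty.
other = BeautifulSoup('<div class="sc-x1y2z3">Example Farm</div>', "html.parser")
print(other.find_all(class_=wanted))
```

If the class names do not appear in driver.page_source at all, the data has to be found somewhere else in the page.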
Solution
There is no API, but the data is embedded in a <script> tag in JSON format (the __APOLLO_STATE__ object), so a plain requests call is enough:
Code:
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.acheteralasource.com/producteurs-en-france/all/departement/75/page/1"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

r = requests.get(url, headers=headers)
r.raise_for_status()
soup = BeautifulSoup(r.text, 'html.parser')

# Pick the <script> that holds the Apollo cache instead of relying on its position (scripts[4]).
script = next(str(s) for s in soup.find_all('script') if '__APOLLO_STATE__' in str(s))
# re.search (not re.match) finds the assignment anywhere; the greedy {.*} runs to the last "}" on that line.
jsonStr = re.search(r'__APOLLO_STATE__\s*=\s*({.*})', script).group(1)
jsonData = json.loads(jsonStr)

results = []
for k, v in jsonData.items():
    if 'Producer' in k:  # keep only the producer entries of the cache
        categories = v.pop('categories')['json']
        addressCoordinates = v.pop('addressCoordinates')['json']
        alpha = v.copy()
        alpha.update({'categories': categories,
                      'addressCoordinates': addressCoordinates})
        results.append(alpha)

df = pd.DataFrame(results)
# Coordinates are stored as [longitude, latitude]: for Paris the first value is ~2.3 and the second ~48.8.
df['longitude'], df['latitude'] = zip(*df['addressCoordinates'])
print(df)
Output:
id companyName ... longitude latitude
0 168704 La SAUGE ... 2.376545 48.880406
1 168702 La SAUGE ... 2.376545 48.880406
2 164858 cfraisclivre ... 2.256519 48.822534
3 144464 Le Rucher de Sainte Aulde ... 2.434072 48.871433
4 169286 Cultures en ville ... 2.269878 48.830487
5 136834 Bell'Abeille ... 2.349333 48.883017
6 238911 Les Ruchers de Montreuil ... 2.442508 48.862581
7 238678 Miel OTT ... 2.357560 48.870498
8 168791 BienElevées - Maison d'agriculture urbaine ... 2.325042 48.858802
9 114454 Dobreiu Nicolae ... 2.345627 48.888822
10 169233 Famille Herbelin Apiculture ... 2.306824 48.822076
11 169495 APIS CIVI ... 2.352293 48.887543
12 169394 Association Les Ruches POP ... 2.371658 48.879516
13 38919 Télé Sapin ... 2.341048 48.862797
14 28430 La Ferme Parisienne ... 2.354042 48.887165
15 28428 I.T.A.V.I (Institut Technique Aviculture) ... 2.322347 48.876362
16 18815 Maryse Gaitelli Duc de Brabant ... 2.330939 48.897045
17 18810 LES VIGNERONS DE CARNAS ... 2.351173 48.835880
18 18808 Les Domaines Qui Montent ... 2.303042 48.881824
19 18807 Le Nez Rouge ... 2.302591 48.847828
20 18806 La Maison du Vin et des Vignobles ... 2.293603 48.886036
21 18803 Jambon-Chanrion Paul ... 2.321920 48.859814
22 18798 Domaine Les Roques De Cana ... 2.468187 48.831852
23 18797 Domaine Clarence Dillon (SA) ... 2.300900 48.870148
24 18795 Château Margaux ... 2.303760 48.865993
25 18792 Champagne Louis Roederer ... 2.322103 48.871593
26 18788 Bristol ... 2.289587 48.871471
27 18787 Borie-Manoux ... 2.294968 48.880180
28 18785 BOCQUILLON (SA) ... 2.303881 48.885979
29 18780 Vignerons de Paris (Les) ... 2.386668 48.855598
30 18779 Versein et Minvielle (Sté) ... 2.336202 48.867680
31 18778 V 3 ... 2.339331 48.856140
32 18777 Travers Marie ... 2.321962 48.888638
33 18776 Tour des Chênes (SARL) ... 2.340296 48.839760
34 18775 Société Des Domaines ... 2.345637 48.838497
35 18772 RDVINS ... 2.375235 48.857235
36 18771 Quié Jean-Michel ... 2.407137 48.825703
37 18770 Pavillon des Vins ... 2.392453 48.826790
38 5742 Fromageries Bel ... 2.320089 48.871593
39 4994 Matines (SA) ... 2.414774 48.867092
40 2744 Damolini Bonduelle ... 2.292681 48.894718
41 951 Kanabou ... 2.241684 48.832230
42 154464 Nuage Sauvage ... 2.385420 48.870101
43 76256 LES ABEILLES ... 2.349466 48.827641
44 76253 L'Abeille de France ... 2.320717 48.880245
45 76247 Au Miel ... 2.317932 48.879715
46 76244 Un apiculteur pres de chez vous ... 2.406641 48.859982
47 76231 Maison du Miel ... 2.326399 48.871696
[48 rows x 11 columns]
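To test the extraction and flattening steps without hitting the site, here is a minimal sketch that runs them on a hand-written page. The sample Producer entry, its values and the ROOT_QUERY key are invented; only the __APOLLO_STATE__ name, the 'Producer' key filter, the 'json' sub-key and the field names come from the answer above. The helper names extract_apollo_state and producers_frame are hypothetical, and the regex assumes the JSON sits on a single line, as JSON serializers normally write it.

```python
import json
import re

import pandas as pd
from bs4 import BeautifulSoup

# Invented sample data shaped like the answer's Apollo cache (not real site data).
state = {
    "Producer:1": {
        "id": 1,
        "companyName": "Farm A",
        "categories": {"type": "json", "json": ["honey"]},
        "addressCoordinates": {"type": "json", "json": [2.35, 48.86]},
    },
    "ROOT_QUERY": {"producers": []},
}
# json.dumps writes everything on one line, like a typical server-rendered page.
html = f"<html><body><script>window.__APOLLO_STATE__={json.dumps(state)};</script></body></html>"


def extract_apollo_state(page_html):
    """Hypothetical helper: return the __APOLLO_STATE__ object from a <script> tag."""
    soup = BeautifulSoup(page_html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        # Greedy {.*} runs to the last "}" on the line; "." does not cross newlines.
        match = re.search(r"__APOLLO_STATE__\s*=\s*({.*})", text)
        if match:
            return json.loads(match.group(1))
    raise ValueError("no __APOLLO_STATE__ found in any <script> tag")


def producers_frame(state):
    """Hypothetical helper: one row per 'Producer' entry, with lon/lat columns."""
    rows = []
    for key, value in state.items():
        if "Producer" in key:
            row = dict(value)  # shallow copy: the parsed state is not modified
            # As in the answer, these two fields hold their data under a "json" key.
            row["categories"] = row.pop("categories")["json"]
            row["addressCoordinates"] = row.pop("addressCoordinates")["json"]
            rows.append(row)
    df = pd.DataFrame(rows)
    # Stored as [longitude, latitude]; split into two columns.
    df["longitude"], df["latitude"] = zip(*df["addressCoordinates"])
    return df


df = producers_frame(extract_apollo_state(html))
print(df[["id", "companyName", "longitude", "latitude"]])
```

Copying each entry instead of popping from it directly keeps the parsed cache intact, which makes it easier to inspect other keys afterwards.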
Answered By - chitown88