Issue
I am trying to scrape a website with multiple tables. My plan is to use 3 variables (oem, model, leadtime) to generate the desired output. However, I cannot figure out how to scrape this webpage into these 3 variables. Since I am new to Python and BeautifulSoup, I would really appreciate your feedback. Thanks in advance!
Desired output with the 3 variables, printed with the command print(oem, model, leadtime):
Audi, A1 Sportback, 27 weeks
Audi, A3 Sportback, 27 weeks
...
Volvo, XC90, 27 weeks
Error from my current code:
AttributeError: 'NavigableString' object has no attribute 'select'
My current code:
from bs4 import BeautifulSoup
import requests
response = requests.get("https://www.carwow.co.uk/new-car-delivery-times#gref").text
soup = BeautifulSoup(response, 'html.parser')
for tbody in soup.select('tbody'):
    for tr in tbody:  # yields whitespace NavigableString nodes as well as <tr> tags
        oem = tr.select('td > a')[0].get('href').split('/')[3].capitalize()
        model = tr.select('td > a')[0].get('href').split('/')[4].capitalize()
        lead_time = tr.select('td')[1].getText(strip=True)
        print(oem, model, lead_time)
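The AttributeError comes from the line for tr in tbody:. Iterating over a BeautifulSoup Tag yields all of its direct children, including the whitespace text between the <tr> tags, and those text nodes are NavigableString objects, which have no .select() method. The sketch below reproduces this on a small hypothetical HTML snippet (not the real carwow markup) and shows two ways to get only the row tags.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup; the newlines/indentation between rows become text nodes.
html_doc = """
<table>
  <tbody>
    <tr><td>Audi A1 Sportback</td><td>27</td></tr>
    <tr><td>Volvo XC90</td><td>27</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
tbody = soup.select_one("tbody")

# Iterating the Tag itself yields its direct children: <tr> Tags *and*
# the whitespace between them, which is parsed as NavigableString.
children = list(tbody)
text_nodes = [c for c in children if isinstance(c, NavigableString)]
tag_nodes = [c for c in children if isinstance(c, Tag)]
print(len(text_nodes), "text nodes,", len(tag_nodes), "tags")

# Fix 1: ask for the <tr> tags explicitly (this is what the answer below does).
rows = tbody.select("tr")

# Fix 2: keep iterating the children, but skip anything that is not a Tag.
rows_filtered = [child for child in tbody if isinstance(child, Tag)]

for tr in rows:
    print([td.get_text(strip=True) for td in tr.select("td")])
```

Either fix means every tr in the loop is a real <tr> Tag, so calling .select() on it no longer fails.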
Solution
Try:
import requests
from bs4 import BeautifulSoup
response = requests.get(
"https://www.carwow.co.uk/new-car-delivery-times#gref"
).text
soup = BeautifulSoup(response, "html.parser")
for tbody in soup.select("tbody"):  # for each table
    for tr in tbody.select("tr")[1:]:  # skip header
        brand, leadtime = [
            td.get_text(strip=True, separator=" ") for td in tr.select("td")
        ][:2]
        oem, model = brand.split(maxsplit=1)
        print("{:<20} {:<20} {}".format(oem, model, leadtime))
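Because the live page can change or block automated requests, here is a hedged offline sketch of the same parsing logic run against a small hypothetical HTML snippet. It assumes, like the answer, that each <tbody> starts with a header row and that the first two cells of every other row hold the car name and the lead time in weeks. The helper name parse_lead_times is made up for illustration; it returns (oem, model, leadtime) tuples so the result can be printed in the comma-separated format the question asks for.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure the answer assumes:
# one <tbody> per brand, a header row first, then one row per model.
SAMPLE_HTML = """
<table><tbody>
  <tr><th>Model</th><th>Weeks</th></tr>
  <tr><td><a href="/audi/a1-sportback">Audi A1 Sportback</a></td><td>27</td></tr>
  <tr><td><a href="/audi/a3-sportback">Audi A3 Sportback</a></td><td>27</td></tr>
</tbody></table>
<table><tbody>
  <tr><th>Model</th><th>Weeks</th></tr>
  <tr><td><a href="/volvo/xc90">Volvo XC90</a></td><td>27</td></tr>
</tbody></table>
"""


def parse_lead_times(html):
    """Return a list of (oem, model, leadtime) tuples (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tbody in soup.select("tbody"):      # one table per brand
        for tr in tbody.select("tr")[1:]:   # skip the header row
            cells = [td.get_text(strip=True, separator=" ") for td in tr.select("td")]
            if len(cells) < 2:              # ignore rows without two <td> cells
                continue
            brand, leadtime = cells[:2]
            oem, model = brand.split(maxsplit=1)  # first word is taken as the make
            rows.append((oem, model, leadtime))
    return rows


for oem, model, leadtime in parse_lead_times(SAMPLE_HTML):
    print(f"{oem}, {model}, {leadtime} weeks")
```

Passing requests.get(...).text for the real URL into parse_lead_times should behave like the answer's loop, provided the page keeps the structure assumed here.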
Prints (lead times in weeks):
...
Toyota RAV4 45
Toyota Yaris 18
Volkswagen Golf 41
Volkswagen Golf GTI 45
Volkswagen Golf R 45
Volkswagen Polo 32
Volkswagen T-Cross 23
Volkswagen Tiguan 52
Volkswagen Touareg 52
Volkswagen ID3 52
Volkswagen ID4 52
Volvo V60 27
Volvo V90 27
Volvo XC40 18
Volvo XC60 27
Volvo XC90 27
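One assumption worth flagging in oem, model = brand.split(maxsplit=1): it treats the first word as the make. That holds for every make in the output above, but a two-word make would be split in the wrong place (a hypothetical "Land Rover Defender" row would give oem "Land"). A hedged workaround is to check a hand-maintained list of multi-word makes first; the MULTI_WORD_MAKES tuple and split_brand helper below are illustrative assumptions, not something the page provides.

```python
# Hypothetical, hand-maintained list; adjust it to the makes actually on the page.
MULTI_WORD_MAKES = ("Land Rover", "Alfa Romeo", "Aston Martin")


def split_brand(brand):
    """Split 'Make Model' text into (make, model), honouring multi-word makes."""
    for make in MULTI_WORD_MAKES:
        if brand.startswith(make + " "):
            return make, brand[len(make) + 1:]
    make, model = brand.split(maxsplit=1)  # default: first word is the make
    return make, model


print(split_brand("Volkswagen Golf GTI"))
print(split_brand("Land Rover Defender"))
```

In the answer's loop, oem, model = split_brand(brand) would replace the plain split.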
Answered By - Andrej Kesely