The class names on the website are too generic for me to pick out the pieces of text that I want to save.
This is the website: I'm using Python. Ideally, I would like to extract the data like this:
name = soup.find('h1', class_="ride-name").text.strip()
queue = soup.find('span', class_="wait-time").text.strip()
reservation = soup.find('span', class_="reservation-time").text.strip()
(I made these class names up.)
But I cannot work out how to use the site's actual classes to get what I want: the ride names, the queue times, and the availability of reservation slots.
This is what I have tried, but I have not been successful.
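import csv
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = ""
html = requests.get(url).text
soup = BeautifulSoup(html, "lxml")

rides = soup.find_all(class_="has-text-weight-normal")
output = []
for element in rides:
    # The loop body is missing from the post; collecting each element's text
    # as a one-item row is an assumption that matches the output shown below.
    row = [element.text.strip()]
    print(row)
    output.append(row)

with open('input.csv', 'w', encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(output)

# Transpose the CSV so that the rows become columns.
pd.read_csv('input.csv', header=None).T.to_csv('output.csv', header=False, index=False)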
The output looks like this:
["A Pirate's Adventure ~ Treasures of the Seven Seas"]
['Jungle Cruise']
['↳ No reservation slots currently available']
['Pirates of the Caribbean']
['↳ Reservation slots available for 20:45']
['Swiss Family Treehouse']
['The Magic Carpets of Aladdin']
['↳ Reservation slots available for 20:20']
In the end I am aiming for something like this:
Ride | Queue Time | Reservation Time |
Jungle Cruise | x mins | 00:00 |
Pirates of the Caribbean | y mins | 00:00 |
If you know what to do next, I would appreciate the help. I know this website has an API, but the reservation slots aren't included in it and I want that data as well.
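Starting from the flat list printed above, one way to get closer to that layout is to attach each "↳" line to the ride name printed just before it. This is only a minimal sketch in plain Python, using the strings from the output above; the helper name pair_rides and the "Reservation" key are made up for illustration, and it assumes that a "↳" line always belongs to the ride directly preceding it.

```python
# Flat list as printed in the question: ride names, with an optional
# "↳ ..." line directly after the ride it is assumed to belong to.
lines = [
    "A Pirate's Adventure ~ Treasures of the Seven Seas",
    "Jungle Cruise",
    "↳ No reservation slots currently available",
    "Pirates of the Caribbean",
    "↳ Reservation slots available for 20:45",
    "Swiss Family Treehouse",
    "The Magic Carpets of Aladdin",
    "↳ Reservation slots available for 20:20",
]


def pair_rides(lines):
    """Attach each "↳" line to the ride name that precedes it (hypothetical helper)."""
    rows = []
    for line in lines:
        if line.startswith("↳"):
            if rows:  # skip a stray "↳" line that has no ride before it
                rows[-1]["Reservation"] = line.lstrip("↳ ").strip()
        else:
            rows.append({"Ride": line, "Reservation": None})
    return rows


rows = pair_rides(lines)
for row in rows:
    print(row["Ride"], "|", row["Reservation"])
```

Queue times are not part of that list, so they would still have to be read from the page separately.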
Here is one option:
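import re
from collections import defaultdict

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = ""
soup = BeautifulSoup(requests.get(url).text, "html.parser")

data = defaultdict(list)
# The append calls are missing from the post; the keys are inferred from
# the column names used further down.
for ride in soup.find_all("a", {"class": "panel-block"}):
    # Ride name
    rn = ride.find("span", {"class": "has-text-weight-normal"}).text.strip()
    data["Ride"].append(rn)
    # Queue time: match any class that starts with "has-text-dark-"
    qt_tag = ride.find("span", {"class": re.compile("has-text-dark-(.*)")})
    qt = qt_tag.text.strip() if qt_tag else None
    data["Queue Time"].append(qt)
    # Reservation text, if present
    rt_tag = ride.find("span", {"class": "has-text-grey"})
    rt = rt_tag.text.strip() if rt_tag else None
    data["Reservation_Time"].append(rt)

# Keep only a trailing HH:MM and move it up by one row.
df = (pd.DataFrame(data)
        .assign(Reservation_Time=lambda x: x["Reservation_Time"]
                .str.extract(r"(\d{2}:\d{2})$", expand=False).shift(-1))
     )
print(df)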
Output:
                                          Ride  Queue Time  Reservation Time
0                                Jungle Cruise     70 mins               NaN
1                     Pirates of the Caribbean     30 mins             21:45
2                       Swiss Family Treehouse      5 mins              None
..                                         ...         ...               ...
30                       Tomorrowland Speedway     25 mins               NaN
31  Tomorrowland Transit Authority PeopleMover     20 mins              None
32          Walt Disney's Carousel of Progress      5 mins              None

[42 rows x 3 columns]
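To make the last step of the code above less opaque: the pattern passed to .str.extract keeps only an HH:MM time at the very end of the text, and .shift(-1) moves each value up by one row. The sketch below imitates both in plain Python, without pandas. The names extract_time and shift_up are made up for illustration, and the sample column is hypothetical, not data scraped from the site.

```python
import re

# Same pattern as in the answer: an HH:MM time at the very end of the string.
TIME_AT_END = re.compile(r"(\d{2}:\d{2})$")


def extract_time(text):
    """Return the trailing HH:MM from text, or None if there is none."""
    if text is None:
        return None
    match = TIME_AT_END.search(text)
    return match.group(1) if match else None


def shift_up(values):
    """Rough plain-Python stand-in for Series.shift(-1): move every value up one row."""
    return values[1:] + [None] if values else []


# Hypothetical column: the reservation text sits one row below the ride row.
column = [
    None,
    "↳ Reservation slots available for 20:45",
    None,
    "↳ No reservation slots currently available",
]

times = [extract_time(text) for text in column]
print(times)            # [None, '20:45', None, None]
print(shift_up(times))  # ['20:45', None, None, None]
```

A text without a trailing time, such as the "No reservation slots" line, yields no match, which is why such rows end up empty in the table.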
Answered By - Timeless