Issue
I am new to BeautifulSoup and am trying to scrape a results page whose results are kept in an ordered list. I can access the ordered list that contains all the results, but I am unsure how to iterate through them and pull out the info I need.
Here is the HTML for the page I'm scraping: Page HTML
I am trying to grab the href title, href link, description, and data timestamp. The end result should be in a container that looks like this:
{title: ['Disney plus account . - THIEF'], link: [/search/search/redirect?search_term=...], description: ['No description provided'], timestamp: ['June 26, 2023, 8:59 p.m.']}
Here is my code for accessing the list:
result = session.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
ordered_list = soup.find_all('ol', class_ = 'searchResults')
Where should I go from here? How would I go about iterating and pulling info from each result? Any help is appreciated, thanks!
Solution
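The code below loops over every li inside the ordered list and prints one dict per result. Each lookup is guarded with a conditional, so a missing tag gives an empty string instead of raising an AttributeError. Note that it calls soup.find rather than soup.find_all for the ol: find_all returns a list of matching tags, and that list has no find_all method of its own. If you want to keep the results for later, append each dict to a list or load them into a DataFrame instead of printing.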
from bs4 import BeautifulSoup

# session and url are the same objects used in the question
result = session.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
# Loop over every <li> inside the <ol class="searchResults"> element
for list_element in soup.find('ol', class_='searchResults').find_all('li'):
    # Each lookup falls back to '' when the tag is missing
    print({
        'title': list_element.find('h4').text if list_element.find('h4') else '',
        'link': list_element.find('a').get('href') if list_element.find('a') else '',
        'description': list_element.find('p').text if list_element.find('p') else '',
        'timestamp': list_element.find('span', {'class': 'lastSeen'}).get('data-timestamp') if list_element.find('span', {'class': 'lastSeen'}) else '',
    })
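If you want the container from the question (one dict with a list per field) rather than printed output, something along these lines might work. This is only a sketch: the SAMPLE_HTML markup and the collect_results helper are made up for illustration, modeled on the selectors used above, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the selectors used in the answer.
# The real page's HTML is not shown here and may differ.
SAMPLE_HTML = """
<ol class="searchResults">
  <li>
    <h4><a href="/search/search/redirect?id=1">First result</a></h4>
    <p>No description provided</p>
    <span class="lastSeen" data-timestamp="June 26, 2023, 8:59 p.m.">seen</span>
  </li>
  <li>
    <h4><a href="/search/search/redirect?id=2">Second result</a></h4>
  </li>
</ol>
"""


def collect_results(html):
    """Return a dict of lists, one entry per <li> in ol.searchResults."""
    soup = BeautifulSoup(html, "html.parser")
    results = {"title": [], "link": [], "description": [], "timestamp": []}

    ordered_list = soup.find("ol", class_="searchResults")
    if ordered_list is None:
        # No results list on the page: return the empty container
        return results

    for item in ordered_list.find_all("li"):
        # Look each tag up once, then fall back to '' if it is missing
        heading = item.find("h4")
        anchor = item.find("a")
        paragraph = item.find("p")
        last_seen = item.find("span", class_="lastSeen")

        results["title"].append(heading.get_text(strip=True) if heading else "")
        results["link"].append(anchor.get("href", "") if anchor else "")
        results["description"].append(
            paragraph.get_text(strip=True) if paragraph else ""
        )
        results["timestamp"].append(
            last_seen.get("data-timestamp", "") if last_seen else ""
        )
    return results


results = collect_results(SAMPLE_HTML)
print(results)
```

With a real page you would pass result.text from the session.get(url) call instead of SAMPLE_HTML. Appending an empty string for missing fields keeps all four lists the same length, so index i in each list always refers to the same result.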
Answered By - Zero