Issue
I am trying to scrape the website https://www.eu-startups.com/directory/wpbdp_category/austrian-startups to get the listed information for Austrian startups. For each startup, I would like to scrape where it is based, the tags listed and the founding date. I am using Beautiful Soup, but I have no idea how to access this information. Right now I am able to select the .listing-title class, which gives me the name of the startup. The problem is that I don't know how to navigate within the .listing-details class, where the rest of the information is listed.
The current code that I am using is:
import bs4
import requests

result = requests.get('https://www.eu-startups.com/directory/wpbdp_category/austrian-startups')
content = bs4.BeautifulSoup(result.text, 'lxml')
content.select('.listing-details')[0]
The output is:
<div class="listing-details">
<div class="wpbdp-field-display wpbdp-field wpbdp-field-value field-display field-value wpbdp-field-business_name wpbdp-field-title wpbdp-field-type-textfield wpbdp-field-association-title"><span class="field-label">Business Name</span> <div class="value"><a href="https://www.eu-startups.com/directory/shopstory/" target="_self">Shopstory</a></div></div> <div class="wpbdp-field-display wpbdp-field wpbdp-field-value field-display field-value wpbdp-field-category wpbdp-field-category wpbdp-field-type-select wpbdp-field-association-category"><span class="field-label">Category</span> <div class="value"><a href="https://www.eu-startups.com/directory/wpbdp_category/austrian-startups/" rel="tag">Austria</a></div></div> <div class="wpbdp-field-display wpbdp-field wpbdp-field-value field-display field-value wpbdp-field-based_in wpbdp-field-meta wpbdp-field-type-textfield wpbdp-field-association-meta"><span class="field-label">Based in</span> <div class="value">Vienna</div></div> <div class="wpbdp-field-display wpbdp-field wpbdp-field-value field-display field-value wpbdp-field-tags wpbdp-field-meta wpbdp-field-type-textfield wpbdp-field-association-meta"><span class="field-label">Tags</span> <div class="value">Artificial Intelligence, E-Commerce, Marketing Automation, SaaS</div></div> <div class="wpbdp-field-display wpbdp-field wpbdp-field-value field-display field-value wpbdp-field-founded wpbdp-field-meta wpbdp-field-type-select wpbdp-field-association-meta"><span class="field-label">Founded</span> <div class="value">2020</div></div>
</div>
How can I access the other fields (Based in, Tags and Founded)?
Solution
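Loop over each listing and use the :-soup-contains() pseudo-class together with the adjacent sibling combinator (+) to select the div that directly follows each field label: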
import requests
from bs4 import BeautifulSoup

url = "https://www.eu-startups.com/directory/wpbdp_category/austrian-startups/page/1/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each startup on the page is wrapped in an element with the class "wpbdp-listing".
for l in soup.select(".wpbdp-listing"):
    # The first link inside the listing holds the startup's name.
    title = l.a.text
    # "span:-soup-contains(X) + div" matches the div right after the label span containing X.
    based = l.select_one("span:-soup-contains(Based) + div").text
    tags = l.select_one("span:-soup-contains(Tags) + div").text.split(", ")
    founded = l.select_one("span:-soup-contains(Founded) + div").text
    print(title, based, founded)
    print(tags)
    print()
Prints:
Shopstory Vienna 2020
['Artificial Intelligence', 'E-Commerce', 'Marketing Automation', 'SaaS']
Tubics Vienna 2017
['Advertising', 'SaaS', 'Software', 'Video', 'VideoEditing']
25superstars Vienna 2020
['content creator', 'social media']
myCulture GmbH Vienna 2022
['CultTech', 'marketplace', 'big data']
And-Less Wien 2022
['Packaging', 'Plastic waste', 'Circular economy', 'Sustainable']
heyqq – ask away Vienna 2022
['audio', 'social', 'app']
NXRT Wien 2022
['Artificial Intelligence', 'Automotive', 'Autonomous Vehicles', 'Education', 'Enterprise Software', 'Information Technology', 'Railroad', 'Software', 'Software Engineering']
ReDev Vienna 2022
['Information Technology', 'Recruiting', 'SaaS', 'Software']
Revitalyze Innsbruck 2022
['Building Material', 'Green Building', 'Logistics', 'Marketplace', 'Recycling', 'Waste Management']
Coachfident Vienna 2022
['coaching', 'personal development', 'career coaching']
Goddard – Discovery Hagenberg 2022
['Artificial Intelligence', 'Machine Learning', 'Application Development']
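As a possible variation, the fields can also be selected by the field-specific classes visible in the question's output (wpbdp-field-based_in, wpbdp-field-tags, wpbdp-field-founded) instead of by label text. The sketch below is not part of the original answer: it runs on a trimmed copy of the HTML from the question rather than on the live page, and field_value and parse_details are hypothetical helper names. It assumes every listing uses the same class names, and it returns None for a missing field rather than raising an AttributeError.

```python
from bs4 import BeautifulSoup

# Trimmed sample of one listing, based on the markup shown in the question.
SAMPLE_HTML = """
<div class="listing-details">
  <div class="wpbdp-field wpbdp-field-based_in"><span class="field-label">Based in</span> <div class="value">Vienna</div></div>
  <div class="wpbdp-field wpbdp-field-tags"><span class="field-label">Tags</span> <div class="value">Artificial Intelligence, E-Commerce, Marketing Automation, SaaS</div></div>
  <div class="wpbdp-field wpbdp-field-founded"><span class="field-label">Founded</span> <div class="value">2020</div></div>
</div>
"""


def field_value(details, field_name):
    """Return the text of one field, or None if it is absent (hypothetical helper)."""
    node = details.select_one(f".wpbdp-field-{field_name} .value")
    return node.get_text(strip=True) if node else None


def parse_details(details):
    """Collect the three fields from one .listing-details element (hypothetical helper)."""
    tags = field_value(details, "tags")
    return {
        "based_in": field_value(details, "based_in"),
        # Split the comma-separated tag string into a clean list.
        "tags": [t.strip() for t in tags.split(",")] if tags else [],
        "founded": field_value(details, "founded"),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
details = soup.select_one(".listing-details")
record = parse_details(details)
print(record)
```

On the live page, the same parse_details call could be applied to each element returned by soup.select(".listing-details"), provided the markup matches the sample.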
Answered By - Andrej Kesely