Issue
I want to scrape all the car makes listed for each country on the Wikipedia page below and collect them into a dictionary.
https://en.wikipedia.org/wiki/List_of_car_brands
I am currently using BeautifulSoup and found that each car sits inside ul and li tags, but I am not sure how to select only those. Where the cars come from isn't really relevant; I just want to get all of them into a list.
My current code gets all of the li tags on the page, but I am not sure how to narrow the selection.
# Import packages
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Specify url of the web page
source = urlopen('https://en.wikipedia.org/wiki/List_of_car_brands').read()
# Make a soup
soup = BeautifulSoup(source,'lxml')
for link in soup.find_all("li"):
    print(link)
Any direction or idea of what I can do would be much appreciated. Cheers!
Solution
Flat list with cars
Select every <li> inside a <ul> that is a following sibling of an <h2>, take its text, and slice the resulting list at the first entry of the 'See also' section:
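# Text of every <li> under a <ul> that follows an <h2>, minus footnote markers
cars = [li.text.split('[')[0] for li in soup.select('h2~ul li')]
# Keep only the entries before the 'See also' section
cars = cars[:cars.index('Timeline of motor vehicle brands')]
print(cars)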
Output
['Zanella (1948–present)', 'Anasagasti (1911–1915)', 'Andino (1967–1973)', 'ASA (1961– 1969)', 'Eniak (1983–1989)', 'Hispano-Argentina (1925–1953)', 'Industrias Aeronáuticas y Mecánicas del Estado (IAME, Mechanical Aircraft Industries of the State, 1951–1979), not to be confused with Italian American Motor Engineering', 'Industrias Kaiser Argentina (IKA, 1956–1975), United Kingdom', 'Alpha Sports (1963–present)', 'Arrow (1963–present)',...]
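To see what the 'h2~ul li' selector picks up without fetching the live page, here is a minimal sketch on a hand-written HTML fragment. The fragment and the name SAMPLE_HTML are assumptions made for illustration; they only mimic a layout where headings and lists are siblings, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page layout: headings and lists are siblings.
SAMPLE_HTML = """
<div>
  <ul><li>Navigation item</li></ul>
  <h2>Argentina[edit]</h2>
  <ul><li>Zanella (1948-present)</li><li>Andino (1967-1973)[1]</li></ul>
  <h2>Australia[edit]</h2>
  <ul><li>Bolwell (1979-present)</li></ul>
  <h2>See also[edit]</h2>
  <ul><li>Timeline of motor vehicle brands</li></ul>
</div>
"""

# html.parser ships with Python, so lxml is not needed for this sketch.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 'h2 ~ ul li' matches <li> elements inside any <ul> that comes after an
# <h2> with the same parent; the <ul> before the first <h2> is skipped.
names = [li.get_text().split("[")[0] for li in soup.select("h2 ~ ul li")]

# Cut the list at the first entry of the 'See also' section.
cars = names[: names.index("Timeline of motor vehicle brands")]
print(cars)
```

The navigation item is left out because its <ul> precedes every <h2>, and the slice drops the 'See also' entry along with anything after it.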
List of dicts with cars and country
A bit more work, but nearly the same approach: select every <h2> that has a <ul> as a following sibling, then iterate over its next siblings and collect the list entries until the next <h2> is reached:
data = []
for h in soup.select('h2:has(~ul)'):
    cars = []
    for tag in h.next_siblings:
        if tag.name == 'ul':
            for x in tag.text.split('\n'):
                cars.append(x.split('[')[0])
        elif tag.name == 'h2':
            break
    if 'See also' not in h.text:
        data.append({
            'country':h.text.split('[')[0],
            'cars': cars
        })
print(data)
Output
[{'country': 'Argentina', 'cars': ['Zanella (1948–present)', 'Anasagasti (1911–1915)', 'Andino (1967–1973)', 'ASA (1961– 1969)', 'Eniak (1983–1989)', 'Hispano-Argentina (1925–1953)', 'Industrias Aeronáuticas y Mecánicas del Estado (IAME, Mechanical Aircraft Industries of the State, 1951–1979), not to be confused with Italian American Motor Engineering', 'Industrias Kaiser Argentina (IKA, 1956–1975), United Kingdom']}, {'country': 'Australia', 'cars': ['Alpha Sports (1963–present)', 'Arrow (1963–present)', 'Birchfield (2003–present)', 'Bolwell (1979–present)', 'Borland Racing Developments (1984–present)', 'Bufori (1986–present)', 'Bullet (1996–present)', 'Carbontech (1999–present)', 'Daytona (2002–present)', 'Devaux (2001–present)', 'DRB Sports Cars (1997–present)', 'Elfin Cars (1958–present)', 'Finch Restorations (1965–present)', 'Jacer (1995–present)', 'Joss Developments (2004–present)', 'McKernan (2012–present)', 'Minetti Sports Cars (2003–present)', 'Nota (1955–present)', 'PRB (1978–present)', 'Puma Clubman (1998–present)', 'Python (1981–present)', 'Quantum (2015–present)', 'Roaring Forties (1997–present)', 'Spartan-V (2004–present)', 'Stealth Special Vehicles (2004–present)', 'Stohr Cars (1991–present)', 'Ascort (1958–1960)', 'Austin (1954–1983)', 'Australian Six (1919–1930)', 'Australis (1897–1907)', 'Birchfield (2003–2004)', 'Blade (2008–2013)', 'Buchanan', 'Buckle (1955–1959)', 'Bush Ranger (1977–2016)', 'Caldwell Vale (1907–1913)', 'Cheetah', 'Chrysler (1957–1981)', 'Ford (1925–2016) (continues as a brand applied to imported cars)', 'FPV (2002–2014)', 'Giocattolo (1986–1989)', 'Goggomobil (1958–1961)', 'Hartnett (1949–1955)', 'Holden (1948–2017) (continues as a brand applied to imported cars)', 'HSV (1987–2017)', 'Honda', 'Ilinga (1974-1975)', 'Kaditcha', 'Leyland (1973–1982)', 'Lloyd-Hartnett (1957–1962)', 'Lonsdale (1982–1983) (Cars produced and exported by Mitsubishi Australia and sold in the UK by the Colt Car Company under the Lonsdale brand.)', 
'Mercedes-Benz (1890–present)', 'Mitsubishi (1980–2008) (The brand continued to be used in Australia for fully imported cars after 2008.)', 'Morris (1947–1973)', 'Nissan (1983–1992) (The brand continued to be used in Australia for fully imported cars after 1992.)', 'Pellandini (1970–1978)', 'Pioneer', 'Purvis Eureka (1974–1991)', 'Shrike (1988–1989)', 'Southern Cross (1931–1935)', 'Statesman (1971–1984)', 'Tarrant (1900–1907)', 'Toyota, Australian production finished (1963–2017)', 'Volkswagen', 'Zeta (1963–1965)']}, {'country': 'Austria', 'cars': ['Eurostar Automobilwerk', 'KTM', 'Magna Steyr', 'ÖAF', 'Puch', 'Steyr Motors GmbH', 'Rosenbauer', 'Tushek&Spigel Supercars', 'Austro-Daimler (1889–1934)', 'Austro-Tatra (1934–1948)', 'Custoca (also known as Custoka) (1966–1988)', 'Denzel (1948–1959)', 'Felber Autoroller (1952–1953)', 'Gräf & Stift (1902–2001)', 'Grofri (1921–1931)', 'Libelle (1952–1954)', 'Lohner-Porsche (1900–1905)', 'Möve 101', 'Steyr automobile', 'Steyr-Daimler-Puch']}, {'country': 'Azerbaijan', 'cars': ['GA (1986–present)', 'Khazar (2018–present)', 'NAZ (2010–present)', 'Aziz (2005–2010)']}, {'country': 'Belgium', 'cars': ['Ecar (2015–present)', 'Edran (1984–present)', 'Gillet (1982–present)', 'Imperia Automobiles (2008–present)', 'ADK (1930)', 'Alatac (1913–1914)', 'Alberta (1906)', 'Alfa Legia (1914)', 'ALP (1920)', 'Altona (1946)', 'AMA (1913)', 'Antoine (1903)', "d'Aoust (1927)", 'Apal (1998)', 'Aquila (1903)', 'Astra (1931)', 'ATA', 'Auto Garage (1911)', 'Auto-Mixte (1906–1912)', 'Avior (1947)', 'Bastin (1909)', 'Beckett & Farlow (1908)', 'Belga (1921)', 'Belga-Rise (1935)', 'Belgica (1909)', 'Bercley (1900)', 'Bovy (1914)', 'Cambier (1898)', 'CAP (1914)', 'Catala (1914)', 'CIE (1898)', "CLA (Compagnie Liégeoise d'Automobiles) (1901)", 'Claeys-Flandria (1955)', 'Coune (1947)', 'Cyclecars R&D (1921)', 'Dasse (1924)', 'De Cosmo (1908)', 'De Wandre (1923)', 'DéChamps (1906)', 'Delecroix (1899)', 'Delin (1901)', 'Direct (1905)', 'Dyle & Bacalan 
(1906)', 'Escol (1938)', 'Excelsior (1904–1932)', 'Fab (1914)', 'FD (1925)', 'Fif (1914)', 'Flaid (1921)', 'FN (1935)', 'Fondu (1912)', 'Frenay (1914)', 'Germain (1901)', 'Hermes (1909)', 'Hermes-Mathis (1914)', 'Imperia (1906–1948)', 'Imperia-Abadal (1913–1917)', 'Jeecy-Vea (1926)', 'Jenatzy (Société Générale des Transports Automobiles) (1905)', 'Juwel (1928)', 'Kleinstwagen (1952)', 'Knap (moved to France in 1899 or 1900) (1909)', 'L&B', 'Linon (1914)', 'Loza (1925)', 'Matthieu (1906)', 'Matthys Frères et Osy (1927)', 'Mécanique et Moteurs (1906)', 'Meeussen (1972)', 'Métallurgique (1913)', 'Miesse (1926)', 'Minerva (1939)', 'Nagant (1927)', 'Oracle (2005)', 'P.L.M. (Keller) (1955)', 'P-M (1924)', 'Peterill (1899)', 'Pieper (1903)', 'Pipe (1922)', 'R.A.L. (1914)', 'Ranger (General Motors brand) (1970–1978)', 'Royal Star (1910)', 'Rumpf (1899)', 'S.C.H. (1928)', 'Sava (1923)', 'SOMEA', 'Speedsport (1927)', 'Springuel (1912)', 'Taunton (1922)', 'Turner-Miesse (1913)', 'Vanclee (1989)', 'Vincke (1905)', 'Vivinus (1912)', 'Widi (1960)', 'Wilford (1901)', 'Zelensis (1962)']},...]
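Since the question asks for a dictionary, the list of dicts can be reshaped into a mapping from country to makes. This is only a sketch: the sample data and the name cars_by_country are made up for illustration, and the empty string in the sample stands in for a blank entry that splitting the list text on newlines could leave behind.

```python
# Hypothetical sample shaped like the `data` list built by the loop above.
data = [
    {"country": "Argentina", "cars": ["Zanella (1948-present)", "", "Andino (1967-1973)"]},
    {"country": "Austria", "cars": ["KTM", "Puch"]},
]

# Map each country to its list of makes, dropping blank entries and
# stripping stray whitespace from the rest.
cars_by_country = {
    entry["country"]: [car.strip() for car in entry["cars"] if car.strip()]
    for entry in data
}

print(cars_by_country["Austria"])
```

A lookup such as cars_by_country["Argentina"] then returns that country's makes directly, with no need to scan the list.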
Answered By - HedgeHog