Tuesday, October 12, 2021

[FIXED] How do I modify code to parse multiple URLs?

Tags: beautifulsoup, python, web-scraping

Issue

I have this code that gets all child URLs within a page.

How do I parse multiple URLs through this code?

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
source = requests.get("https://www.oddsportal.com/soccer/england/efl-cup/results/", headers=headers)

soup = BeautifulSoup(source.text, 'html.parser')
main_div = soup.find("div", class_="main-menu2 main-menu-gray")
a_tag = main_div.find_all("a")
for i in a_tag:
    print(i['href'])

How do I modify it to run for multiple URLs, when my URL list is stored in a DataFrame df like this:

|    | URL                                                                 |
|----|---------------------------------------------------------------------|
|  0 | https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/  |
|  1 | https://www.oddsportal.com/soccer/england/efl-cup/results/          |
|  2 | https://www.oddsportal.com/soccer/europe/guadiana-cup/results/      |
|  3 | https://www.oddsportal.com/soccer/world/kings-cup-thailand/results/ |
|  4 | https://www.oddsportal.com/soccer/poland/division-2-east/results/   |

I tried parsing it this way:

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
for url in df:
    source = requests.get(df['URL'], headers=headers)

    soup = BeautifulSoup(source.text, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    a_tag = main_div.find_all("a")
    for i in a_tag:
        print(i['href'])

However, I am getting this error:

line 742, in get_adapter
    raise InvalidSchema("No connection adapters were found for {!r}".format(url))

How can I modify this code to parse multiple URLs?

Solution

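Change

for url in df:
    source = requests.get(df['URL'], headers=headers)

to

for url in df['URL']:
    source = requests.get(url, headers=headers)

Iterating over a DataFrame directly yields its column labels, not its rows, and the original loop passed the entire df['URL'] column to requests.get on every pass instead of a single URL string. That is why requests could not find a connection adapter for the value it received. Looping over df['URL'] yields one URL string per iteration, which is what requests.get expects.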
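
To illustrate the corrected loop end to end, here is a minimal sketch that wraps the question's scraping code in small functions. The helper names fetch_html, menu_links and collect_links, the fetch parameter, the timeout value, and the guard for a missing menu div are assumptions added for illustration; they are not part of the original question or answer.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
}


def fetch_html(url):
    """Download one page and return its HTML (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=30)  # timeout is an assumption
    response.raise_for_status()
    return response.text


def menu_links(html):
    """Return the href of every <a> inside the menu div used in the question."""
    soup = BeautifulSoup(html, "html.parser")
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    if main_div is None:
        # Assumed guard: some pages may not contain the menu div at all.
        return []
    return [a["href"] for a in main_div.find_all("a", href=True)]


def collect_links(urls, fetch=fetch_html):
    """Map each URL to its menu links; `fetch` can be swapped out for testing."""
    return {url: menu_links(fetch(url)) for url in urls}


df = pd.DataFrame({"URL": [
    "https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/",
    "https://www.oddsportal.com/soccer/england/efl-cup/results/",
]})

# Iterating over the DataFrame itself yields column labels, not URLs.
column_labels = list(df)

# Iterating over the column yields one URL string per step.
url_strings = list(df["URL"])
```

Calling collect_links(df["URL"]) would then request each page in turn and return a dictionary keyed by URL. Passing a different fetch function makes it possible to try the parsing logic on saved HTML without touching the network.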

Answered By - αԋɱҽԃ αмєяιcαη

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
