Issue
I have this code that gets all child URLs within a page.
How do I run multiple URLs through this code?
from bs4 import BeautifulSoup
import requests
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
source = requests.get("https://www.oddsportal.com/soccer/england/efl-cup/results/", headers=headers)
soup = BeautifulSoup(source.text, 'html.parser')
main_div = soup.find("div", class_="main-menu2 main-menu-gray")
a_tag = main_div.find_all("a")
for i in a_tag:
    print(i['href'])
How do I modify it to run for multiple URLs, given that my URL list is stored in a DataFrame df:
| | URL |
|----|---------------------------------------------------------------------|
| 0 | https://www.oddsportal.com/soccer/nigeria/npfl-pre-season/results/ |
| 1 | https://www.oddsportal.com/soccer/england/efl-cup/results/ |
| 2 | https://www.oddsportal.com/soccer/europe/guadiana-cup/results/ |
| 3 | https://www.oddsportal.com/soccer/world/kings-cup-thailand/results/ |
| 4 | https://www.oddsportal.com/soccer/poland/division-2-east/results/ |
I tried it this way:
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}
for url in df:
    source = requests.get(df['URL'], headers=headers)
    soup = BeautifulSoup(source.text, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    a_tag = main_div.find_all("a")
    for i in a_tag:
        print(i['href'])
However, I am getting this error:
line 742, in get_adapter
raise InvalidSchema("No connection adapters were found for {!r}".format(url))
How can I modify the same to parse multiple URLs?
Solution
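Iterating over a DataFrame directly yields its column labels, not its rows, and df['URL'] passes the whole column to requests.get instead of a single URL string. That is why requests raises InvalidSchema. Change
for url in df:
    source = requests.get(df['URL'], headers=headers)
to
for url in df['URL']:
    source = requests.get(url, headers=headers)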
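For reference, here is a minimal sketch of how the whole loop could look once the fix is applied. The helper names extract_menu_links, collect_links and fetch_html are hypothetical (they are not from the question), the 10-second timeout is an assumption, and the sketch assumes the pages still contain a div whose class attribute is exactly "main-menu2 main-menu-gray". Pages without that div return an empty list instead of raising AttributeError on None.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

HEADERS = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/91.0.4472.114 Safari/537.36'}


def extract_menu_links(html):
    """Return the href of every <a> inside the menu div, or [] if the div is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    main_div = soup.find("div", class_="main-menu2 main-menu-gray")
    if main_div is None:
        return []
    # href=True skips anchors that have no href attribute
    return [a['href'] for a in main_div.find_all("a", href=True)]


def fetch_html(url):
    """Download one page; the timeout value is an assumption, not from the question."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def collect_links(urls, fetch=fetch_html):
    """Map each URL to the list of links found on its page."""
    return {url: extract_menu_links(fetch(url)) for url in urls}


# Usage with the DataFrame from the question (performs real HTTP requests):
# for url, links in collect_links(df['URL']).items():
#     for href in links:
#         print(href)
```

Passing df['URL'] (the column) rather than df (the frame) is the essential change; the rest only separates downloading from parsing so that each part can be checked on its own.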
Answered By - αԋɱҽԃ αмєяιcαη