I am trying to fetch a URL from a webpage. Here is how the URL looks in the browser's Inspect section:
Here is how the URL looks in my Python code:
How can I get the actual URL, without the ../../ part, using BeautifulSoup? Here is my code in case it's needed:
import re
import requests
from bs4 import BeautifulSoup

source = requests.get('').text
soup = BeautifulSoup(source, 'lxml')
# article = soup.find('article')
# title = article.div.a.img['alt']
# print(title['alt'])
titles, topics, urls, sources = [], [], [], []
article_productPod = soup.findAll('article', {"class": "product_pod"})
for i in article_productPod:
    # Loop body missing from the post; assumed to collect the image alt text as the title.
    titles.append(i.div.a.img['alt'])
# print(titles)
for q in article_productPod:
    # Loop body missing from the post; assumed to collect the raw (relative) href.
    urls.append(q.h3.a['href'])
# for z in range(len(urls)):
#     source2 = requests.get("https://" + urls[z])
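Use urljoin from the standard library's urllib.parse module (import the submodule explicitly, since a bare "import urllib" is not guaranteed to load it):
import urllib.parse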
Store your target URL in a separate variable:
src_url = r''
source = requests.get(src_url).text
Join the website's URL and the relative URL:
for q in article_productPod:
    urls.append(urllib.parse.urljoin(src_url, q.h3.a['href']))
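To see what urljoin does with a ../../ prefix without touching the network, here is a minimal sketch. The page URL and the href values are made-up placeholders on example.com, not the site from the question, and absolutize is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical page URL and hrefs (assumptions, not taken from the question).
page_url = "https://example.com/catalogue/category/books_1/index.html"
relative_hrefs = [
    "../../a-light-in-the-attic_1000/index.html",  # climbs two directories
    "page-2.html",                                 # sibling of the current page
    "/static/cover.jpg",                           # root-relative path
    "https://other.example/x",                     # already absolute, left alone
]

absolute_urls = absolutize(page_url, relative_hrefs)
for url in absolute_urls:
    print(url)
```

Each ../ removes one directory from the base URL's path, so the base must be the URL of the page the href was scraped from; joining against only the site root would resolve ../../ differently.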
Answered By - Achraf Mansari