Thursday, November 10, 2022

[FIXED] Python requests, download file

November 10, 2022 beautifulsoup, download, python-3.x, python-requests No comments

Issue

I am trying to access an AMD download link for my gpu drivers but the site detects that I am not using a browser and redirects me to a page in their forums... ('https://www.amd.com/fr/support/kb/faq/download-incomplete')

How to 'fool' their system to access it?

I tried embedding the headers with the site's cookie but it doesn't work. Here is my code:

from requests import get
from bs4 import BeautifulSoup

header = {
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Mobile Safari/537.36"}

url = 'https://www.amd.com/fr/support/graphics/amd-radeon-6000-series/amd-radeon-6900-series/amd-radeon-rx-6900-xt'
req = get(url, headers=header)
latestlink = BeautifulSoup(req.text, 'html.parser').find("a", {"class":"btn-transparent-black"})['href']

for key in req.headers:
    header[key]= req.headers[key]

dlpage = get(latestlink, headers=header).url
print(dlpage)   #Actually the forum page

Solution

If you want the download links, try this:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0",
}

url = 'https://www.amd.com/fr/support/graphics/amd-radeon-6000-series/amd-radeon-6900-series/amd-radeon-rx-6900-xt'
soup = (
    BeautifulSoup(requests.get(url, headers=headers).text, 'lxml')
    .find_all("a", href=True)
)
links = [i["href"] for i in soup if i["href"].endswith(".exe")]
print("\n".join(links))

This should output:

https://drivers.amd.com/drivers/amd-software-adrenalin-edition-22.10.3-win10-win11-oct28.exe
https://drivers.amd.com/drivers/whql-amd-software-adrenalin-edition-22.5.1-win10-win11-may10.exe
https://drivers.amd.com/drivers/prographics/amd-software-pro-edition-22.q3-win10-win11.exe
https://drivers.amd.com/drivers/installer/22.20/beta/amd-software-adrenalin-edition-22.10.3-minimalsetup-221027_web.exe
https://drivers.amd.com/drivers/amd-software-adrenalin-edition-22.10.3-win10-win11-oct28.exe
https://drivers.amd.com/drivers/whql-amd-software-adrenalin-edition-22.5.1-win10-win11-may10.exe
https://drivers.amd.com/drivers/prographics/amd-software-pro-edition-22.q3-win10-win11.exe
https://drivers.amd.com/drivers/installer/22.20/beta/amd-software-adrenalin-edition-22.10.3-minimalsetup-221027_web.exe
https://drivers.amd.com/drivers/radeon-software-adrenalin-2020-22.6.1-win7-64bit-june23-2022.exe

And if you want to download the drivers, use this:

import os
from pathlib import Path
from shutil import copyfileobj

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0",
}


def download_file(source_url: str, directory: str) -> None:
    os.makedirs(directory, exist_ok=True)
    save_dir = Path(directory)
    file_name = save_dir / source_url.rsplit('/', 1)[-1]
    with s.get(source_url, stream=True) as file, open(file_name, "wb") as output:
        copyfileobj(file.raw, output)


with requests.session() as s:
    url = 'https://www.amd.com/fr/support/graphics/amd-radeon-6000-series/amd-radeon-6900-series/amd-radeon-rx-6900-xt'
    s.headers.update(headers)
    soup = BeautifulSoup(s.get(url).text, 'lxml').find_all("a", href=True)
    links = [i["href"] for i in soup if i["href"].endswith(".exe")]
    print("\n".join(links))
    for link in links:
        download_file(link, "downloads")

Answered By - baduker

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, November 10, 2022

[FIXED] Python requests, download file

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels