Issue
I'm trying to automate downloading a batch of PDFs. One of the URLs is:
https://www.unpri.org/download?ac=4195
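I'm using the following code to get the headers from this URL:
import requests
url = "https://www.unpri.org/download?ac=4195"
# HEAD fetches only the response headers; follow any redirects
h = requests.head(url, allow_redirects=True)
header = h.headers
print(header)
These are the headers: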
{'Cache-Control': 'no-cache', 'Connection': 'close', 'Content-Type': 'text/html'}
There is no Content-Disposition header or anything else that would give me the file name. However, when I open the URL in a browser and choose right-click > Save as, I get the option to save the file under its original name.
Is there any way to get this file name with Python?
Solution
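Add a browser-like User-Agent header (without one, the server's response carried no Content-Disposition) and then read the file name from the Content-Disposition response header. Here's how: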
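import requests
headers = {
    # Browser-like User-Agent; without it the response had no Content-Disposition
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko/20100101 Firefox/95.0",
}
r = requests.get("https://www.unpri.org/download?ac=4195", headers=headers)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
# Take the text after "filename=" and drop any surrounding quotes
print(r.headers["Content-Disposition"].split("filename=")[-1].strip('"'))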
Output:
PRI_Investor_guide_on_agricultural_supply_chain.pdf
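Since the goal is to download many PDFs, it may help to wrap this in a reusable function. The following is a minimal sketch, not the answerer's code: it swaps requests for the standard library's urllib.request, reuses the same browser-like User-Agent, and parses Content-Disposition with email.message.Message.get_filename(), which strips quotes and decodes RFC 2231 style filename*= values. The helper names filename_from_headers and download are hypothetical. When no file name is sent, it falls back to the last segment of the URL path.

```python
import os
import shutil
import urllib.request
from email.message import Message
from urllib.parse import urlsplit

# Browser-like User-Agent copied from the answer; per the answer, the server
# only sends Content-Disposition when the request carries one like this.
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko/20100101 Firefox/95.0"


def filename_from_headers(headers, url, default="download.pdf"):
    """Hypothetical helper: name from Content-Disposition, else from the URL path."""
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        # get_filename() unquotes the value and decodes filename*=UTF-8''... forms.
        # basename() drops directory parts so a header cannot escape dest_dir.
        name = os.path.basename(msg.get_filename() or "")
        if name not in ("", ".", ".."):
            return name
    return os.path.basename(urlsplit(url).path) or default


def download(url, dest_dir="."):
    """Hypothetical helper: stream url into dest_dir under the server's file name."""
    os.makedirs(dest_dir, exist_ok=True)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        # resp.headers is case-insensitive, like requests' r.headers
        path = os.path.join(dest_dir, filename_from_headers(resp.headers, url))
        with open(path, "wb") as fh:
            shutil.copyfileobj(resp, fh)  # copy in chunks rather than all at once
    return path
```

If the server behaves as the answer describes, download("https://www.unpri.org/download?ac=4195", "pdfs") should save the file as pdfs/PRI_Investor_guide_on_agricultural_supply_chain.pdf. Note that filename_from_headers looks up the exact key "Content-Disposition" when given a plain dict; the header objects from urllib and requests both match keys case-insensitively.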
Answered By - baduker