Issue
I am trying to write a program that downloads all the xkcd comic images and saves them in a directory, naming each image title.png, where title is the title of the comic. Here's the code for it:
#Downloads all the xkcd comics
import requests, bs4, os

site = requests.get('https://www.xkcd.com')

def downloadImage(site):
    soup = bs4.BeautifulSoup(site.text)
    img_tag = soup.select('div[id="comic"] img')
    img_title = img_tag[0].get('alt')
    img_file = open(img_title+'.png', 'wb')
    print("Downloading %s..." %img_title)
    img_res = requests.get("https:" + img_tag[0].get('src'))
    for chunk in img_res.iter_content(100000):
        img_file.write(chunk)
    print("Saved %s in " %img_title, os.getcwd())

def downloadPrevious(site):
    soup = bs4.BeautifulSoup(site.text)
    prev_tag_list = soup.select("ul[class='comicNav'] li > a")
    prev_tag = None
    for each in prev_tag_list:
        if(each.get('rel')==['prev']):
            prev_tag = each
            break
    if(prev_tag.get('href') == '#'):
        return True
    prev_site = requests.get('https://xkcd.com' + prev_tag.get('href'))
    downloadImage(prev_site)
    return False, prev_site

def download_XKCD_Comics(site):
    try:
        os.makedirs('E:\\XKCD Comics')
    except:
        os.chdir('E:\XKCD Comics')
    done = False
    downloadImage(site)
    while(not done):
        done, site = downloadPrevious(site)
    return

download_XKCD_Comics(site)
The output of the code:
==== RESTART: E:\Computer_Science_Programs\Python\Get all XKCD Comics.py ====
Downloading Data Pipeline...
Saved Data Pipeline in E:\XKCD Comics
Downloading Incoming Calls...
Saved Incoming Calls in E:\XKCD Comics
Downloading Stanislav Petrov Day...
Saved Stanislav Petrov Day in E:\XKCD Comics
Downloading Bad Opinions...
Saved Bad Opinions in E:\XKCD Comics
Traceback (most recent call last):
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 45, in <module>
    download_XKCD_Comics(site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 42, in download_XKCD_Comics
    done, site = downloadPrevious(site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 30, in downloadPrevious
    downloadImage(prev_site)
  File "E:\Computer_Science_Programs\Python\Get all XKCD Comics.py", line 11, in downloadImage
    img_file = open(img_title+'.png', 'wb')
FileNotFoundError: [Errno 2] No such file or directory: '6/6 Time.png'
>>>
I don't understand the problem. None of the other files existed, but the error was raised only with this file name. Please somebody help me with this one!
Solution
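The character / is not allowed in Windows filenames. It is a path separator, so open() treats '6/6 Time.png' as a file named '6 Time.png' inside a directory named '6'. That directory does not exist, which is why you get FileNotFoundError. The earlier titles contained no slash, so they saved without trouble.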
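To see how the title is interpreted as a path, here is a small sketch using pathlib from the standard library. PureWindowsPath only parses the string and never touches the disk, so it behaves the same on any operating system. The variable names are only for illustration.

```python
from pathlib import PureWindowsPath

# The comic title that triggered the error in the question.
title = '6/6 Time'

# Parse the file name the way Windows path rules would, without touching the disk.
path = PureWindowsPath(title + '.png')

print(path.parts)   # ('6', '6 Time.png')
print(path.parent)  # 6
print(path.name)    # 6 Time.png
```

The slash splits the title into a directory part and a file part, so the file can only be created if a folder named 6 already exists in the current directory.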
There are many ways to get a valid file name. One example is the approach Django uses:
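import re

def get_valid_filename(s):
    # Trim surrounding whitespace and turn spaces into underscores
    s = str(s).strip().replace(' ', '_')
    # Drop every character that is not a word character, '-' or '.'
    return re.sub(r'(?u)[^-\w.]', '', s)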
It strips leading and trailing whitespace, replaces spaces with underscores, then removes every character that is not a letter, digit, underscore, hyphen, or period.
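As a rough sketch of how this could be applied to the comic titles, the snippet below runs the sanitizer on two titles from the question's output. The image_filename helper and its extension parameter are hypothetical names introduced here for illustration; they are not part of the original program or of Django.

```python
import re

def get_valid_filename(s):
    # Trim surrounding whitespace and turn spaces into underscores
    s = str(s).strip().replace(' ', '_')
    # Drop every character that is not a word character, '-' or '.'
    return re.sub(r'(?u)[^-\w.]', '', s)

def image_filename(title, extension='.png'):
    # Hypothetical helper: build a safe file name from a comic title.
    return get_valid_filename(title) + extension

print(image_filename('6/6 Time'))       # 66_Time.png
print(image_filename('Data Pipeline'))  # Data_Pipeline.png
```

In downloadImage, the open call would then become something like open(get_valid_filename(img_title) + '.png', 'wb'). Note that different titles can map to the same sanitized name, so a later download could overwrite an earlier one.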
Answered By - Loocid