Issue
I am trying to download many (thousands of) images from Tumblr with a particular tag (e.g. #art), and I want to find the fastest and easiest way to do this. I have considered both Scrapy and Puppeteer as options, and I read a little about the Tumblr API, but I'm not sure how to use the API to download the images I want locally. Currently, Puppeteer seems like the best way, but I'm not sure how to deal with the fact that Tumblr uses lazy loading (e.g. what is the code for getting all the images, scrolling down, waiting for more images to load, and getting those as well?). Would appreciate any tips!
Solution
My solution is below. Since I couldn't use offset with the tagged endpoint, I used the timestamp of the last post in each batch as the offset instead (passed as the before parameter). Since I specifically wanted the links of the images in the posts, I also did a little processing of the output: photo posts give the URL directly, while for other posts I pull the first img src out of the HTML body. I then used a simple Python script to download every image from my list of links. I have included a website and an additional Stack Overflow post which I found helpful.
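import pytumblr

def get_all_posts(client, tag):
    """Yield one image URL per tagged post, paging backwards by timestamp."""
    before = None  # None means "start from the newest posts"
    for _ in range(48):  # 48 batches of up to 20 posts each
        # Blog-based alternative that supports offset (not used here):
        # response = client.posts(blog, limit=20, offset=offset, reblog_info=True, notes_info=True)
        response = client.tagged(tag, limit=20, before=before)
        # stop on an empty batch or an error payload instead of crashing below
        if not isinstance(response, list) or not response:
            break
        for post in response:
            if 'photos' in post:
                # photo post: take the original-size URL of the first photo
                yield post['photos'][0]['original_size']['url']
            elif 'body' in post:
                # other posts: pull the first <img src="..."> out of the HTML body
                parts = [b for b in post['body'].split('<') if 'img src=' in b]
                if parts:
                    yield parts[0].split('"')[1]
                # posts without any image are skipped
        # move to the next batch: only posts older than the last one seen
        before = response[-1]['timestamp']
        print(before)

client = pytumblr.TumblrRestClient('USE YOUR API KEY HERE')
tag = 'YOUR TAG HERE'
blog = 'staff'  # only used to name the output file

# use our function: write one image link per line
with open('{}-posts.txt'.format(blog), 'w') as out_file:
    for url in get_all_posts(client, tag):
        print(url, file=out_file)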
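
The download script mentioned above isn't shown in the answer, so here is a minimal sketch of what it could look like. It is not the original script. It assumes the links file is the staff-posts.txt written by the code above, with one URL per line, and the helper names (read_links, local_name, download_all) and the images output folder are made up for illustration. It uses only the standard library.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_links(path):
    """Return the unique, non-empty URLs from a text file with one link per line."""
    seen = set()
    links = []
    with open(path) as f:
        for line in f:
            url = line.strip()
            # Skip blanks, duplicates, and the literal "None" that print()
            # writes whenever a None value ends up in the list of links.
            if not url or url == "None" or url in seen:
                continue
            seen.add(url)
            links.append(url)
    return links


def local_name(url, index):
    """Build a file name from the URL path, prefixed with an index to avoid clashes."""
    base = os.path.basename(urlparse(url).path) or "image"
    return "{:05d}_{}".format(index, base)


def download_all(links_file, out_dir, fetch=urllib.request.urlretrieve):
    """Download every link into out_dir and return the URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for i, url in enumerate(read_links(links_file)):
        target = os.path.join(out_dir, local_name(url, i))
        try:
            fetch(url, target)
        except (OSError, ValueError) as err:
            # URLError and HTTPError are subclasses of OSError;
            # ValueError covers malformed URLs.
            print("skipping {}: {}".format(url, err))
            failed.append(url)
    return failed
```

Calling download_all('staff-posts.txt', 'images') would fetch everything into an images folder and return the links that could not be downloaded, so they can be retried later. The fetch argument is only there so the network call can be swapped out, for example when testing.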
Links:
Print more than 20 posts from Tumblr API
Also thank you very much to Harada, whose advice helped a lot!
Answered By - gollyzoom