Issue
I am attempting to make an API request, pull down specific chunks of the response, and ultimately save them into a file for later processing. I also want to mention up front that the script works fully until I begin to pull larger sets of data. When I open the params to a larger date range, I receive:

aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: text/html'
async def get_dataset(session, url):
    async with session.get(url=url, headers=headers, params=params) as resp:
        dataset = await resp.json()
        return dataset['time_entries']

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for page in range(1, total_pages):
            url = "https://api.harvestapp.com/v2/time_entries?page=" + str(page)
            tasks.append(asyncio.ensure_future(get_dataset(session, url)))
        dataset = await asyncio.gather(*tasks)
If I keep my params small enough, it works without issue, but with too large a date range the error pops up and nothing past the snippet I shared above runs. More for reference:
url_address = "https://api.harvestapp.com/v2/time_entries/"

headers = {
    "Content-Type": 'application/json',
    "Authorization": authToken,
    "Harvest-Account-ID": accountID
}

params = {
    "from": StartDate,
    "to": EndDate
}
Any ideas on what would cause this to work on certain data sizes but fail on larger sets? I am assuming the JSON is becoming malformed at some point, but I am unsure how to examine that and/or prevent it from happening, since I am able to pull multiple pages from the API and append them successfully on the smaller data pulls.
Solution
OP: Thank you to the others who gave answers. I discovered the issue and implemented a solution. A friend pointed out that aiohttp can raise that error if the response is an error page instead of the expected JSON content, i.e. an HTML page returned with HTTP 429 Too Many Requests. I looked up the API limits and found they are set to 100 requests per 15 seconds.
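One way to confirm a failure like this is to check the status code and content type before decoding, and dump the raw body when it is not JSON. This is just a minimal diagnostic sketch reusing the headers and params from above, not part of my final code:

async def get_dataset(session, url):
    async with session.get(url=url, headers=headers, params=params) as resp:
        # A rate-limited request comes back as an HTML error page, so
        # inspect the status and content type before calling resp.json()
        if resp.status != 200 or resp.content_type != 'application/json':
            body = await resp.text()
            print(f"Unexpected response {resp.status} ({resp.content_type}): {body[:200]}")
            return []
        dataset = await resp.json()
        return dataset['time_entries']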
My solution was to use the asyncio-throttle module, which allowed me to directly limit the number of requests per time period. You can find it on the dev's GitHub.

Here is my updated code with the implementation, very simple! For my instance I needed to limit my requests to 100 per 15 seconds, which you can see below as well.
import asyncio

import aiohttp
from asyncio_throttle import Throttler

async def get_dataset(session, url, throttler):
    while True:
        # Wait for a free slot from the throttler before each request
        async with throttler:
            async with session.get(url=url, headers=headers, params=params) as resp:
                dataset = await resp.json()
                return dataset['time_entries']

async def main():
    tasks = []
    # Match the Harvest API limit of 100 requests per 15 seconds
    throttler = Throttler(rate_limit=100, period=15)
    async with aiohttp.ClientSession() as session:
        try:
            for page in range(1, total_pages):
                url = "https://api.harvestapp.com/v2/time_entries?page=" + str(page)
                tasks.append(asyncio.ensure_future(get_dataset(session, url, throttler)))
            dataset = await asyncio.gather(*tasks)
        except aiohttp.ClientError as err:
            # The original snippet was truncated after the gather call;
            # this minimal handler is added so the try block parses
            print(err)
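For reference, the coroutine can then be driven like this (assuming total_pages has already been determined, e.g. from the total_pages field the Harvest API returns with each page of results):

if __name__ == "__main__":
    asyncio.run(main())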
Answered By - Srichard90