Issue
I am downloading content from a website with a very strict rate limit. If I exceed 10 req/sec, I am banned for 10 minutes. I have been using the following code to rate-limit my aiohttp requests:
import asyncio
import time


class RateLimitedClientSession:
    """Rate-limited client.

    Attributes:
        client (aiohttp.ClientSession): The client used to make requests.
        rate_limit (int): Maximum number of requests per second to make.

    https://quentin.pradet.me/blog/how-do-you-rate-limit-calls-with-aiohttp.html
    """

    def __init__(self, client, rate_limit):
        self.client = client
        self.rate_limit = rate_limit
        self.max_tokens = rate_limit
        self.tokens = self.max_tokens
        self.updated_at = time.monotonic()
        self.start = time.monotonic()

    async def get(self, *args, **kwargs):
        """Wrapper for ``client.get`` that first waits for a token."""
        await self.wait_for_token()
        return self.client.get(*args, **kwargs)

    async def wait_for_token(self):
        """Sleeps until a token is available, then consumes it."""
        while self.tokens < 1:
            self.add_new_tokens()
            await asyncio.sleep(0.03)  # Arbitrary polling delay; must be small, though.

        self.tokens -= 1

    def add_new_tokens(self):
        """Adds tokens proportional to the time elapsed since the last update."""
        now = time.monotonic()
        time_since_update = now - self.updated_at
        new_tokens = time_since_update * self.rate_limit
        if self.tokens + new_tokens >= 1:
            self.tokens = min(self.tokens + new_tokens, self.max_tokens)
            self.updated_at = now
Then I can use it like so:

import asyncio

from aiohttp import ClientSession, TCPConnector

limit = 9  # 9 requests per second
inputs = ['url1', 'url2', 'url3', ...]


async def download_link(link, session):
    async with await session.get(link) as resp:
        data = await resp.read()
        # Then write data to a file


async def main():
    conn = TCPConnector(limit=limit)
    raw_client = ClientSession(connector=conn, headers={'Connection': 'keep-alive'})
    async with raw_client:
        session = RateLimitedClientSession(raw_client, limit)
        tasks = [asyncio.ensure_future(download_link(link, session)) for link in inputs]
        for task in asyncio.as_completed(tasks):
            await task


asyncio.run(main())
My issue is that the code works correctly for a random number of requests, usually between 100 and 2000, and then exits after hitting the rate limit. I suspect this has to do with the latency of my internet connection.
For example, imagine a limit of 3 requests per second.
SECOND 1:
+ REQ 1
+ REQ 2
+ REQ 3
SECOND 2:
+ REQ 4
+ REQ 5
+ REQ 6
With a little bit of lag, this might instead look like:
SECOND 1:
+ REQ 1
+ REQ 2
SECOND 2:
+ REQ 3 - rolled over from previous second due to internet speed
+ REQ 4
+ REQ 5
+ REQ 6
Which then triggers the rate limit.
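To make the effect concrete, here is a minimal simulation sketch (not part of my actual code; the 0-400 ms latency range is a hypothetical value) showing how sends spaced exactly at the limit can still cluster at the server inside some one-second window:

import random

rate_limit = 3
# Client sends 12 requests, evenly spaced at exactly the rate limit.
send_times = [i / rate_limit for i in range(12)]
# Hypothetical network latency of 0-400 ms added to each request.
arrivals = sorted(t + random.uniform(0.0, 0.4) for t in send_times)

# Worst-case number of arrivals the server sees in any sliding 1-second window.
worst = max(sum(1 for a in arrivals if w <= a < w + 1.0) for w in arrivals)
print(f"worst-case arrivals in one second: {worst}")  # can exceed rate_limit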
What can I do to minimize the chance of this happening?
I have already tried lowering the rate limit; it works for longer but still eventually hits the limit.
I have also tried firing each request 1/10 of a second apart, but this still triggers the rate limit (perhaps for unrelated reasons?).
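A minimal sketch of such a spacing wrapper (the SpacedClientSession name and 0.1 s interval here are illustrative, not code from the original post) would serialize request starts behind a lock. Note that it only spaces out sends, not arrivals, which may be why this approach also fails:

import asyncio
import time


class SpacedClientSession:
    """Illustrative wrapper that spaces out request starts by a minimum interval."""

    def __init__(self, client, min_interval):
        self.client = client
        self.min_interval = min_interval  # e.g. 0.1 s for 10 req/sec
        self._lock = asyncio.Lock()
        self._last_start = 0.0

    async def get(self, *args, **kwargs):
        async with self._lock:
            # Sleep until at least min_interval has passed since the last send.
            wait = self._last_start + self.min_interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._last_start = time.monotonic()
        return self.client.get(*args, **kwargs)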
Solution
I decided that the best solution was to batch the requests into groups and sleep away whatever remains of each second. I no longer use a rate-limiting wrapper around aiohttp.
import asyncio
import time

import aiohttp


async def download_link(link, session):
    async with await session.get(link) as resp:
        data = await resp.read()
        # Then write data to a file


def batch(iterable, n):
    """Yields successive n-sized chunks of ``iterable``."""
    length = len(iterable)
    for ndx in range(0, length, n):
        yield iterable[ndx:min(ndx + n, length)]


rate_limit = 10


async def main():
    conn = aiohttp.TCPConnector(limit=rate_limit)
    client = aiohttp.ClientSession(
        connector=conn, headers={'Connection': 'keep-alive'}, raise_for_status=True)
    async with client:
        for group in batch(inputs, rate_limit):  # ``inputs`` is the URL list from above
            start = time.monotonic()
            tasks = [download_link(link, client) for link in group]
            await asyncio.gather(*tasks)  # If results are needed they can be assigned here
            execution_time = time.monotonic() - start
            # If a batch takes longer than 1 s, some quota is wasted, but that is a small price to pay.
            await asyncio.sleep(max(0, 1 - execution_time))


asyncio.run(main())
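This works because asyncio.gather does not return until every response in the batch has fully arrived, so a slow response cannot roll over into the next second's quota, and the sleep keeps consecutive batch starts at least one second apart.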
Answered By - PAS