Issue
I wrote a small script for checking proxies:
async def proxy_check(session, proxy):
    global good_proxies
    proxy_str = f'http://{proxy}'
    async with semaphore:
        try:
            async with session.get(host, proxy=proxy_str, timeout=10) as r:
                if r.status == 200:
                    resp = await r.json()
                    if resp['ip'] == proxy:
                        good_proxies.append(proxy)
                        proxies.remove(proxy)
        except Exception:
            logging.exception(proxy)
            proxies.remove(proxy)
async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for proxy in proxies:
            tasks.append(asyncio.create_task(proxy_check(session, proxy)))
        await asyncio.gather(*tasks)
But when I run it, I get one of these errors:
aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid constant string'
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid constant string'
concurrent.futures._base.TimeoutError
There are almost 20,000 proxies in my list, but this script does not connect through any of them. Not a single proxy works in this script.
But if you do this:
proxy = {'http': f'http://{proxy}'}
r = requests.get(url, proxies=proxy)
then everything works and a lot of the proxies succeed. What am I doing wrong?
Solution
The collection proxies is iterated in your main function, and its elements are processed in parallel by multiple tasks. That is fine so far, but inside the processing function you alter the very collection you are iterating over. This causes a race condition that corrupts the collection while it is being traversed.
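The effect is easy to reproduce without any networking. A minimal sketch (the list contents are illustrative) showing how removing elements from a list while looping over it shifts the remaining elements and skips entries, and the safe pattern of building a new list and swapping it in afterwards:

```python
# Unsafe: mutating `items` inside the loop over it.
items = ['a', 'b', 'c', 'd']
visited = []
for item in items:
    visited.append(item)
    items.remove(item)   # shifts the remaining elements left,
                         # so the loop skips every second element
# visited is now ['a', 'c'] and items is ['b', 'd'] -- half the
# entries were never processed.

# Safe: filter into a new list, then reassign once, after the
# iteration has finished.
items = ['a', 'b', 'c', 'd']
keep = [x for x in items if x not in ('a', 'c')]
items[:] = keep          # items is now ['b', 'd'], with no skips
```

The same skipping happens in your script, except spread across concurrent tasks, which makes it look random.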
- You should never alter a collection you are iterating over.
- If your code alters a shared resource from parallel tasks, you need mutual exclusion to make it safe. You could use a Lock (e.g. asyncio.Lock).
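A minimal sketch of the Lock suggestion, assuming the shared lists are replaced by task-local results guarded by asyncio.Lock (the function and proxy strings here are illustrative, not from the original script):

```python
import asyncio

lock = asyncio.Lock()
good, bad = [], []

async def record(proxy, ok):
    # Only one task at a time may mutate the shared lists.
    async with lock:
        (good if ok else bad).append(proxy)

async def main():
    await asyncio.gather(
        record('1.2.3.4:80', True),
        record('5.6.7.8:80', False),
    )

asyncio.run(main())
```

An even simpler alternative is to have each task return its result and let asyncio.gather collect them, so no shared collection is mutated at all.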
Answered By - Felix Quehl