I've been experimenting with web scraping using aiohttp, and I ran into an issue: whenever I use a proxy, the code inside the session.get block doesn't run. I've searched all over the internet and couldn't find a solution.
import asyncio
import random
import time

import aiohttp
from aiohttp.client import ClientSession

failed = 0
success = 0
proxypool = []

with open("proxies.txt", "r") as proxy_file:
    for line in proxy_file:
        x = line.strip().split(":")
        # Assumption: each line is "ip:port:user:pass"; the post does not show how x is used.
        proxypool.append(f"http://{x[2]}:{x[3]}@{x[0]}:{x[1]}")


async def download_link(url: str, session: ClientSession):
    global failed
    global success
    # random.choice avoids the IndexError that randint(0, len(proxypool)) can raise
    proxy = random.choice(proxypool)
    async with session.get(url, proxy=proxy) as response:
        if response.status != 200:
            failed += 1
        else:
            success += 1
        result = await response.text()


async def download_all(urls: list):
    my_conn = aiohttp.TCPConnector(limit=1000)
    async with aiohttp.ClientSession(connector=my_conn, trust_env=True) as session:
        tasks = []
        for url in urls:
            task = asyncio.ensure_future(download_link(url=url, session=session))
            tasks.append(task)
        # the await must be nested inside the session
        await asyncio.gather(*tasks, return_exceptions=True)


url_list = [""] * 100
start = time.time()
asyncio.run(download_all(url_list))
end = time.time()
print(f'download {len(url_list)-failed} links in {end - start} seconds')
print(failed, success)
Here is the problem, though: the code works fine on my Mac, but when I run the exact same code on Windows, it doesn't work. It also works fine without proxies; it only fails once I add them.
At the end, you can see that I print failed and success. On my Mac the output is 0 100, whereas on my Windows computer it is 0 0. This suggests that the code inside session.get isn't running (also, nothing is printed).
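One possible reason both counters stay at zero with no visible error: asyncio.gather(..., return_exceptions=True) returns exceptions as ordinary results instead of raising them, so a connection or SSL error raised inside session.get would pass silently. Below is a minimal standard-library sketch of how the gathered results could be inspected. The flaky coroutine is a hypothetical stand-in for download_link, not part of the original code.

```python
import asyncio


async def flaky(i: int) -> int:
    # Hypothetical stand-in for download_link: odd inputs fail.
    if i % 2:
        raise ConnectionError(f"request {i} failed")
    return i


async def run_all(n: int) -> tuple:
    results = await asyncio.gather(
        *(flaky(i) for i in range(n)), return_exceptions=True
    )
    # With return_exceptions=True, failures come back as exception objects.
    errors = [r for r in results if isinstance(r, BaseException)]
    values = [r for r in results if not isinstance(r, BaseException)]
    return values, errors


values, errors = asyncio.run(run_all(4))
for err in errors:
    # Printing the exception type usually reveals what actually went wrong.
    print(type(err).__name__, err)
```

Applied to the script above, this would mean keeping the return value of asyncio.gather and printing any exception objects it contains.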
The proxies I am using are paid proxies, and they work normally if I use requests.get(). Their format is "http://user:pass@ip:port".
I have also tried using just "http://ip:port" and passing the user and password via BasicAuth, but this does not work either.
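For reference, the second form needs the credentials separated from the proxy URL. Here is a minimal sketch of that split using only the standard library. split_proxy is a hypothetical helper name, and the address in the example is a placeholder.

```python
from urllib.parse import urlsplit


def split_proxy(proxy: str) -> tuple:
    """Split "http://user:pass@ip:port" into (bare proxy URL, user, password)."""
    parts = urlsplit(proxy)
    bare = f"{parts.scheme}://{parts.hostname}:{parts.port}"
    return bare, parts.username or "", parts.password or ""


# Placeholder proxy address, for illustration only.
bare, user, password = split_proxy("http://user:pass@10.0.0.1:8080")
print(bare, user, password)
```

With aiohttp, these values could then be passed as proxy=bare and proxy_auth=aiohttp.BasicAuth(user, password) in the session.get call.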
I've seen that many other people have had this problem, but the issue never seems to get solved.
Any help would be appreciated :)
After some more testing and research, I found the issue: I needed to add ssl=False.
So the correct way to make the request would be:
async with session.get(url, proxy=proxy, ssl=False) as response:
That worked for me.
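As a minimal sketch, the same call could be wrapped in a small helper. fetch is a hypothetical name, and session is assumed to be an aiohttp.ClientSession. Note that ssl=False turns off TLS certificate verification for the request, so it is best kept to requests where that is acceptable.

```python
async def fetch(session, url: str, proxy: str) -> tuple:
    # `session` is assumed to be an aiohttp.ClientSession (not imported here).
    # ssl=False disables TLS certificate verification for this request.
    async with session.get(url, proxy=proxy, ssl=False) as response:
        return response.status, await response.text()
```

Inside download_all, this would be awaited as fetch(session, url, proxy), with the proxy chosen from proxypool as before.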
Answered By - Hello_Darkness