Issue
I am writing a script that crawls a website to gather some data, but the site blocks me after too many requests. With a proxy I could send more requests than I currently do. I have set up the proxy with the Chrome option --proxy-server:
options.add_argument('--proxy-server={}'.format('http://ip:port'))
However, I am using a paid proxy that requires authentication, so Chrome shows an alert box asking for the username and password.
Then I tried passing the username and password in the proxy URL:
options.add_argument('--proxy-server={}'.format('http://username:password@ip:port'))
But that does not seem to work either. While looking for a solution I found the one below, and I tried it both with and without the Chrome extension Proxy Auto Auth:
proxy = {'address': settings.PROXY,
         'username': settings.PROXY_USER,
         'password': settings.PROXY_PASSWORD}

capabilities = dict(DesiredCapabilities.CHROME)
capabilities['proxy'] = {'proxyType': 'MANUAL',
                         'httpProxy': proxy['address'],
                         'ftpProxy': proxy['address'],
                         'sslProxy': proxy['address'],
                         'noProxy': '',
                         'class': "org.openqa.selenium.Proxy",
                         'autodetect': False,
                         'socksUsername': proxy['username'],
                         'socksPassword': proxy['password']}

options.add_extension(os.path.join(settings.DIR, "extension_2_0.crx"))  # proxy auth extension
Neither of the above worked properly. It seemed to work, because the proxy authentication alert disappeared after the code above, but when I checked my IP by googling "what is my IP" I confirmed that the proxy was not being used.
Can anyone help me authenticate with the proxy server in chromedriver?
Solution
Selenium Chrome Proxy Authentication
Setting chromedriver proxy with Selenium using Python
If you need to use a proxy with Python and the Selenium library with chromedriver, you usually use the following code (without any username and password):
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s:%s' % (hostname, port))
driver = webdriver.Chrome(chrome_options=chrome_options)
This works fine unless the proxy requires authentication. If the proxy requires you to log in with a username and password, it will not work, and you have to use the trickier solution explained below. By the way, if you whitelist your server's IP address with the proxy provider, the proxy should not ask for credentials.
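Since the --proxy-server flag only carries the scheme, host and port, a proxy URL with embedded credentials has to be split before use. The following is a minimal sketch of that step, not part of the original answer; the helper name split_proxy_url is hypothetical, and it assumes the credentials contain no percent-encoded characters.

```python
from urllib.parse import urlsplit


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into a credential-free
    --proxy-server flag plus the username and password."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError("expected scheme://[user:pass@]host:port")
    server_flag = "--proxy-server=%s://%s:%d" % (
        parts.scheme, parts.hostname, parts.port)
    # username/password are None when the URL carries no credentials.
    return server_flag, parts.username, parts.password


flag, user, password = split_proxy_url(
    "http://proxy-user:proxy-password@192.168.3.2:8080")
print(flag)
```

The flag can be passed to chrome_options.add_argument as before, while the username and password still need to be supplied some other way, such as the extension approach described in this answer.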
HTTP Proxy Authentication with Chromedriver in Selenium
To set up proxy authentication, we generate a small Chrome extension as a zip file and load it into chromedriver dynamically using the code below. This code configures Selenium with chromedriver to use an HTTP proxy that requires authentication with a username/password pair.
import os
import zipfile
from selenium import webdriver
PROXY_HOST = '192.168.3.2' # rotating proxy or host
PROXY_PORT = 8080 # port
PROXY_USER = 'proxy-user' # username
PROXY_PASS = 'proxy-password' # password

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
    }
};

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)


def get_chromedriver(use_proxy=False, user_agent=None):
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        # Package the manifest and background script as a Chrome extension.
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    # The chromedriver binary is expected next to this script.
    driver = webdriver.Chrome(
        os.path.join(path, 'chromedriver'),
        chrome_options=chrome_options)
    return driver


def main():
    driver = get_chromedriver(use_proxy=True)
    #driver.get('https://www.google.com/search?q=my+ip+address')
    driver.get('https://httpbin.org/ip')


if __name__ == '__main__':
    main()
The get_chromedriver function returns a configured Selenium webdriver that you can use in your application. This code is tested and works just fine.
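The code above substitutes the credentials into the JavaScript with plain string formatting, which would break if a password contained a double quote or a backslash. As a hedged variation, not part of the tested answer, the sketch below builds the same two files but escapes each value with json.dumps and writes the zip to a temporary file. The names build_proxy_extension, MANIFEST and BACKGROUND_JS_TEMPLATE are hypothetical.

```python
import json
import os
import tempfile
import zipfile

# Same manifest as in the answer, kept as a dict so json.dumps renders it.
MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

# The placeholders receive ready-made JavaScript literals, quotes included.
BACKGROUND_JS_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: "http", host: %(host)s, port: %(port)d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(user)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def build_proxy_extension(host, port, user, password, directory=None):
    """Write the proxy-auth extension to a temporary zip and return its path."""
    background_js = BACKGROUND_JS_TEMPLATE % {
        "host": json.dumps(host),          # quoted and escaped string literal
        "port": int(port),
        "user": json.dumps(user),
        "password": json.dumps(password),
    }
    fd, path = tempfile.mkstemp(suffix=".zip", prefix="proxy_auth_", dir=directory)
    os.close(fd)
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=2))
        zp.writestr("background.js", background_js)
    return path
```

The returned path can be passed to chrome_options.add_extension in place of 'proxy_auth_plugin.zip'. The caller is responsible for deleting the file once the driver has started.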
Read more about onAuthRequired event in Chrome.
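To confirm that traffic really goes through the proxy, the body returned by https://httpbin.org/ip can be compared with your own address. This is only a sketch: the helper names are hypothetical, and it assumes the response is a JSON object with an "origin" field that may list several comma-separated addresses.

```python
import json


def origin_ips(body):
    """Return the IP addresses reported in an httpbin.org/ip response body."""
    origin = json.loads(body)["origin"]
    return [ip.strip() for ip in origin.split(",")]


def goes_through_proxy(body, direct_ip):
    """True when the machine's own address is absent from the reported origin."""
    return direct_ip not in origin_ips(body)


# Example with a hard-coded body; 203.0.113.7 stands in for the proxy's
# exit address and 198.51.100.20 for the machine's direct address.
sample_body = '{"origin": "203.0.113.7"}'
print(goes_through_proxy(sample_body, "198.51.100.20"))
```

With a live driver, the body might be read after driver.get('https://httpbin.org/ip') from the page text, for example via driver.find_element on the body tag. That lookup depends on how Chrome renders the JSON response and is an assumption to verify.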
Answered By - itsmnthn