Issue
I'm using .NET Core 6 with C# 10.
What I'm trying to achieve is to run Selenium in headless mode whilst remaining "undetectable". I followed the instructions from here: https://intoli.com/blog/not-possible-to-block-chrome-headless/ which provided a page to test your bot: https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html
Headless mode causes some JS vars (like window.chrome) to be unset or invalid which causes the bot to be detected.
IJavaScriptExecutor doesn't work since it runs after the page has loaded. The same author mentions that you have to capture the response and inject JS in this article: https://intoli.com/blog/making-chrome-headless-undetectable/ (Putting It All Together section)
Since the article uses python, I followed this: https://www.automatetheplanet.com/webdriver-capture-modify-http-traffic/ and this: Titanium Web Proxy - Can't modify request body which uses the Titanium Web Proxy library (found here: https://github.com/justcoding121/titanium-web-proxy)
For testing, I used this site http://www.example.com and tried to modify the response (change something in the HTML, set JS vars, etc)
Here is the proxy class:
public static class Proxy
{
static ProxyServer proxyServer = new ProxyServer(userTrustRootCertificate: true);
public static void StartProxy()
{
//Run on port 8080, decrypt ssl
ExplicitProxyEndPoint explicitEndPoint = new ExplicitProxyEndPoint(IPAddress.Any, 8080, true);
proxyServer.Start();
proxyServer.AddEndPoint(explicitEndPoint);
proxyServer.BeforeResponse += OnBeforeResponse;
}
static async Task OnBeforeResponse(object sender, SessionEventArgs ev)
{
var request = ev.HttpClient.Request;
var response = ev.HttpClient.Response;
//Modify title tag in example.com
if (String.Equals(ev.HttpClient.Request.RequestUri.Host, "www.example.com", StringComparison.OrdinalIgnoreCase))
{
var body = await ev.GetResponseBodyAsString();
body = body.Replace("<title>Example Domain</title>", "<title>Completely New Title</title>");
ev.SetResponseBodyString(body);
}
}
public static void StopProxy()
{
proxyServer.Stop();
}
}
And here is the selenium code:
Proxy.StartProxy();
string url = "localhost:8080";
var seleniumProxy = new OpenQA.Selenium.Proxy
{
HttpProxy = url,
SslProxy = url,
FtpProxy = url
};
ChromeOptions options = new ChromeOptions();
options.AddArgument("ignore-certificate-errors");
options.Proxy = seleniumProxy;
IWebDriver driver = new ChromeDriver(@"C:\ChromeDrivers\103\", options);
driver.Manage().Window.Maximize();
driver.Navigate().GoToUrl("http://www.example.com");
Console.ReadLine();
TornCityBot.Proxy.StopProxy();
When selenium loads http://www.example.com, the <title>Example Domain</title>
should be changed to <title>Completely New Title</title>
, but there was no change. I tried setting the proxy URL as http://localhost:8080, 127.0.0.1:8080, localhost:8080, etc but there was no change.
As a test, I ran the code and left the proxy on. I then ran curl --proxy http://localhost:8080 http://www.example.com
in git bash and the output was:
<!doctype html>
<html>
<head>
<title>Completely New Title</title>
. . .
The proxy was working, it was modifying the response for the curl command. But for some reason, it wasn't working with selenium.
If you guys have a solution that can also work on HTTPS or a better method to execute JavaScript on page load, that would be great. If it's not possible, then I might need to forget about headless.
Thanks in advance for any help.
Solution
Selenium.WebDriver 4.3.0
and ChromeDriver 103
Try use the ExecuteCdpCommand
method
var options = new ChromeOptions();
options.AddArgument("--headless");
options.AddArgument("--user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'");
using var driver = new ChromeDriver(options);
Dictionary<string, object> cmdParams= new();
cmdParams.Add("source", "Object.defineProperty(navigator, 'webdriver', { get: () => false });");
driver.ExecuteCdpCommand("Page.addScriptToEvaluateOnNewDocument", cmdParams);
With this piece of code we bypass the first two but if you follow the guide you've already mentioned i think it's easy to bypass the rest.
UPDATE
var initialScript = @"Object.defineProperty(Notification, 'permission', {
get: function () { return ''; }
})
window.chrome = true
Object.defineProperty(navigator, 'webdriver', {
get: () => false})
Object.defineProperty(window, 'chrome', {
get: () => true})
Object.defineProperty(navigator, 'plugins', {
writeable: true,
configurable: true,
enumerable: true,
value: 'works'})
navigator.plugins.length = 1
Object.defineProperty(navigator, 'language', {
get: () => 'el - GR'});
Object.defineProperty(navigator, 'deviceMemory', {
get: () => 8});
Object.defineProperty(navigator, 'hardwareConcurrency', {
get: () => 8});";
cmdParams.Add("source", initialScript);
driver.ExecuteCdpCommand("Page.addScriptToEvaluateOnNewDocument", cmdParams);
Answered By - ggeorge
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.