Issue
I am trying to log in to https://signin.siemens.com/regpublic/login.aspx in order to view item availability on a connected domain. The site appears to accept the POST request, but it does not complete the login and redirect to https://www.automation.siemens.com/, where it would show a 403 Forbidden error. That 403 is actually what I want, because the login still takes effect for the other pages I need. My form request payload appears to match the successful manual login request exactly, but it does not work.
I have also tried adding clickdata (both with and without a click) and formname, with no effect. Cookies appear to be gathered correctly. Similar posts suggest __EVENTTARGET is sometimes the culprit, but this site does not appear to populate __EVENTTARGET in the final POST request; it remains an empty field. I also matched the format exactly against several successful POST requests from manual logins, so now I'm not sure whether it is targeting the form correctly.
import logging

import scrapy
from scrapy.http import FormRequest

# (these methods live inside the Spider subclass)
def start_requests(self):
    login_url = "https://signin.siemens.com/regpublic/login.aspx"
    yield scrapy.Request(login_url, callback=self.login)

def login(self, response):
    logging.debug("starting login")
    req = FormRequest.from_response(
        response,
        method="POST",
        formdata={
            "ctl00$GlobalScriptManager": "ctl00$ContentPlaceHolder1$LoginUserNamePasswordUpdatePanel|ctl00$ContentPlaceHolder1$LoginUserNamePasswordButton",
            "__LASTFOCUS": "",
            "GlobalScriptManager_TSM": "",
            "__EVENTTARGET": "",
            "__EVENTARGUMENT": "",
            # hidden ASP.NET state fields scraped from the login page
            "__VIEWSTATE": response.css("input#__VIEWSTATE::attr(value)").get(),
            "__VIEWSTATEGENERATOR": response.css("input#__VIEWSTATEGENERATOR::attr(value)").get(),
            "__EVENTVALIDATION": response.css("input#__EVENTVALIDATION::attr(value)").get(),
            "ctl00$ContentPlaceHolder1$TextSiemensLogin": "EMAIL_USERNAME_HERE",
            "ctl00$ContentPlaceHolder1$TextPassword": "PASSWORD_HERE",
            "ctl00$ContentPlaceHolder1$hdflLoginUserNamePassword": "1",
            "ctl00$ContentPlaceHolder1$LoginUserNamePasswordButton": "Login",
            "__ASYNCPOST": "true",
        },
        callback=self.start_scraping,
    )
    logging.debug(req.headers)
    logging.debug(req.body)
    return req
This is what the successful request looks like when logging in manually with JavaScript disabled (as with Scrapy; the manual login works fine without JavaScript, but fails with cookies disabled):
__LASTFOCUS:
GlobalScriptManager_TSM:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE: [VIEW-STATE-EXTRACT GOES HERE]
__VIEWSTATEGENERATOR: 93C9AE53
__EVENTVALIDATION: [EVENT-VALIDATION-EXTRACT GOES HERE]
ctl00$ContentPlaceHolder1$TextSiemensLogin: [EMAIL_USERNAME_HERE]
ctl00$ContentPlaceHolder1$TextPassword: [PASSWORD_HERE]
ctl00$ContentPlaceHolder1$LoginUserNamePasswordButton: Login
ctl00$ContentPlaceHolder1$hdflLoginUserNamePassword:
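A quick way to check whether a Scrapy request body really matches a payload copied from the browser's network tab is to parse and diff the two query strings. This is a minimal stdlib sketch; the payload values below are placeholders and diff_payloads is a hypothetical helper name:

```python
# Sketch: diff a Scrapy FormRequest body against a payload copied from the
# browser. Any field present on only one side, or with a differing value,
# shows up in the result.
from urllib.parse import parse_qsl

def diff_payloads(scrapy_body: bytes, browser_body: str) -> dict:
    """Return fields that differ between the two payloads."""
    a = dict(parse_qsl(scrapy_body.decode(), keep_blank_values=True))
    b = dict(parse_qsl(browser_body, keep_blank_values=True))
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

# Placeholder example: __ASYNCPOST appears only on the Scrapy side.
scrapy_body = b"__EVENTTARGET=&__ASYNCPOST=true&ctl00%24ContentPlaceHolder1%24TextPassword=pw"
browser_body = "__EVENTTARGET=&ctl00%24ContentPlaceHolder1%24TextPassword=pw"
print(diff_payloads(scrapy_body, browser_body))
# {'__ASYNCPOST': ('true', None)}
```

Inside the spider, `req.body` (logged above) can be passed directly as the first argument.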
I noticed that the request did not complete in Scrapy unless "__ASYNCPOST": "true" was added back to the formdata, despite it not being present in the manual request. I am about to give up on FormRequest and try Scrapy Splash to simulate a manual completion instead. I'd be grateful for any insights before I do that.
Solution
I solved the problem with a Scrapy Splash script. Note that while it worked correctly for the page in question, the Splash browser was unable to render the JavaScript-generated pages in other login areas of the site, despite JavaScript being enabled; this was the only page it worked with.
lua_script = """
function main(splash, args)
    splash.js_enabled = true
    assert(splash:go(args.url))
    assert(splash:wait(1))
    -- type the username, tab to the password field, type the password,
    -- then submit with Return
    splash:send_text('[USERNAME]')
    assert(splash:wait(0.5))
    splash:send_keys("<Tab>")
    assert(splash:wait(0.5))
    splash:send_text('[PASSWORD]')
    assert(splash:wait(0.5))
    splash:send_keys("<Return>")
    assert(splash:wait(2))
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end
"""
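For reference, this is roughly how such a script is wired into a spider with the scrapy-splash package. The middleware names and the "execute" endpoint follow the scrapy-splash README; the SPLASH_URL and the splash_request_kwargs helper are assumptions about the local setup, not part of the original answer:

```python
# Sketch of scrapy-splash wiring (assumes the scrapy-splash package and a
# Splash instance running locally, e.g. via Docker).
SPLASH_SETTINGS = {
    "SPLASH_URL": "http://localhost:8050",  # assumption: local Splash instance
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy_splash.SplashCookiesMiddleware": 723,
        "scrapy_splash.SplashMiddleware": 725,
    },
    "SPIDER_MIDDLEWARES": {
        "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
    },
    "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
}

def splash_request_kwargs(lua_source: str) -> dict:
    """Keyword arguments for a scrapy_splash.SplashRequest running a Lua script."""
    return {
        "endpoint": "execute",  # run the script instead of plain /render.html
        "args": {"lua_source": lua_source},
    }

# Inside the spider, the login script would be sent like:
#   yield SplashRequest(login_url, callback=self.after_login,
#                       **splash_request_kwargs(lua_script))
# and the Lua return table (html, png, har) is then available as response.data.
```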
Answered By - cforcomputer