Issue
I'm trying to log in with Scrapy but am receiving lots of "Redirecting (302)" messages. This happens when I use my real login and also with fake login info. I also tried it with another site and still no luck.
import scrapy
from scrapy.http import FormRequest, Request


class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']
    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)
I also tried adding
meta = {'dont_redirect': True, 'handle_httpstatus_list':[302]}
to the initial Request and the FormRequest.
Here's the output from the code above:
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)
Solution
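By default, FormRequest.from_response() submits the first form it finds in the response. On the Stack Overflow login page that is the search form, which is why the log shows a GET request to /search?q=&email=XXXXX&password=XXXXX instead of a login request. Because the login never happens, the request to /questions/ask is redirected (302) back to the login page. Select the login form explicitly with the formname or formid argument, e.g.
FormRequest.from_response(response, formid="login-form", formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)
See the Scrapy docs for FormRequest.from_response.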
Answered By - vold