Issue
I was following a tutorial on spoofing headers, but after adding the set_user_agent function the terminal shows an error:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BestMoviesSpider(CrawlSpider):
    name = 'best_movies'
    allowed_domains = ['imdb.com']
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'

    def start_requests(self):
        yield scrapy.Request(url='https://www.imdb.com/search/title/?genres=drama&groups=top_250&sort=user_rating,desc',
                             headers={
                                 'User_Agent': self.user_agent
                             })

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//h3[@class='lister-item-header']/a"), callback='parse_item',
             follow=True, process_request='set_user_agent'),
        Rule(LinkExtractor(restrict_xpaths="(//a[@class='lister-page-next next-page'])[2]"),
             process_request='set_user_agent'),
    )

    def set_user_agent(self, request):
        request.headers['User-Agent'] = self.user_agent
        return request
Error
TypeError: set_user_agent() takes 2 positional arguments but 3 were given
Solution
You use set_user_agent as the process_request callback in your rules. The documentation says:
process_request is a callable (or a string, in which case a method from the spider object with that name will be used) which will be called for every Request extracted by this rule. This callable should take said request as first argument and the Response from which the request originated as second argument. It must return a Request object or None (to filter out the request). (https://docs.scrapy.org/en/latest/topics/spiders.html)
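As a sketch of that contract (the URL check below is purely illustrative, not part of the original spider), a process_request callback receives both the extracted request and the response it came from, and may modify or drop the request:

    # Illustrative sketch of the process_request contract; the URL
    # filter is hypothetical, not from the original code.
    def process_request(self, request, response):
        # 'request' is the Request extracted by the rule;
        # 'response' is the page the link was extracted from.
        if 'signin' in request.url:
            return None  # returning None filters the request out
        return request   # return the (possibly modified) Request to proceed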
So you need to add response as the second argument to your set_user_agent method:
    def set_user_agent(self, request, response):
        request.headers['User-Agent'] = self.user_agent
        return request
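Incidentally, the headers dict in start_requests uses the key 'User_Agent' with an underscore; the standard HTTP header name is 'User-Agent' with a hyphen, so the spoofed agent is not actually sent for the initial request. A corrected version of that request would look like:

    # The header key should be 'User-Agent' (hyphen), matching the
    # standard HTTP header name; 'User_Agent' sends a nonstandard header.
    yield scrapy.Request(
        url='https://www.imdb.com/search/title/?genres=drama&groups=top_250&sort=user_rating,desc',
        headers={'User-Agent': self.user_agent},
    )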
Answered By - ex4