Issue
I'm learning Python web scraping with Scrapy. I did exactly what the tutorial teaches, but I got an error. Please help!
My Python code:
import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    allowed_domains = ["books.toscrape.com"]
    start_urls = ["https://books.toscrape.com"]

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": book.css("h3 a::text").get(),
                "price": book.css(".product_price .price_color::text").get(),
                "url": book.css("h3 a").attrib["href"],
            }
The terminal shows:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\python\venv\bookscraper\Scripts\scrapy.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 161, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 114, in _run_print_help
    func(*a, **kw)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\cmdline.py", line 169, in _run_command
    cmd.run(args, opts)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\commands\crawl.py", line 30, in run
    self.crawler_process.start()
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\crawler.py", line 390, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "C:\Users\Administrator\python\venv\bookscraper\Lib\site-packages\scrapy\utils\ossignal.py", line 19, in install_shutdown_handlers
    reactor._handleSignals()
    ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'
The ossignal.py file:
import signal

signal_names = {}
for signame in dir(signal):
    if signame.startswith("SIG") and not signame.startswith("SIG_"):
        signum = getattr(signal, signame)
        if isinstance(signum, int):
            signal_names[signum] = signame


def install_shutdown_handlers(function, override_sigint=True):
    """Install the given function as a signal handler for all common shutdown
    signals (such as SIGINT, SIGTERM, etc). If override_sigint is ``False`` the
    SIGINT handler won't be install if there is already a handler in place
    (e.g. Pdb)
    """
    from twisted.internet import reactor

    reactor._handleSignals()
    signal.signal(signal.SIGTERM, function)
    if signal.getsignal(signal.SIGINT) == signal.default_int_handler or override_sigint:
        signal.signal(signal.SIGINT, function)
    # Catch Ctrl-Break in windows
    if hasattr(signal, "SIGBREAK"):
        signal.signal(signal.SIGBREAK, function)
Solution
As pointed out in my comment, the issue you are describing is already being tracked by Scrapy here. It has to do with one of Scrapy's dependencies, Twisted: a new version, 23.8.0, was released a day ago and seems to cause the issue.
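The traceback shows why a Twisted upgrade can break Scrapy this way: install_shutdown_handlers calls reactor._handleSignals(), a private (underscore-prefixed) reactor method, and the reactor in the installed Twisted no longer has that attribute. The sketch below reproduces only that failure mode in plain Python; OldReactor, NewReactor and has_handle_signals are hypothetical stand-ins, not real Twisted classes or Scrapy helpers.

```python
class OldReactor:
    """Hypothetical stand-in for a reactor that still has the private hook."""

    def __init__(self):
        self.handled = False

    def _handleSignals(self):
        self.handled = True


class NewReactor:
    """Hypothetical stand-in for a reactor where the private hook is gone."""


def call_private_handle_signals(reactor):
    # Mirrors the unconditional call in scrapy/utils/ossignal.py.
    reactor._handleSignals()


def has_handle_signals(reactor):
    # Diagnostic only: reports whether the private hook exists.
    return hasattr(reactor, "_handleSignals")


old, new = OldReactor(), NewReactor()
call_private_handle_signals(old)
print(old.handled)  # True

try:
    call_private_handle_signals(new)
except AttributeError as exc:
    print(exc)  # 'NewReactor' object has no attribute '_handleSignals'

print(has_handle_signals(new))  # False
```

Depending on a private method is what makes this fragile: Twisted is free to rename or remove it in any release, and Scrapy's call breaks without any change in your own spider code.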
Another user fixed the issue by installing a previous version of Twisted (see here).
Specifically, they installed the following version of Twisted, which fixed the issue for them:
pip install Twisted==22.10.0
Until the issue is fixed and a new version is released, I suggest using the previous version.
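To confirm which Twisted version your virtual environment actually uses after pinning, a small check like the one below can help. It is a sketch under two assumptions taken from this answer only: 23.8.0 is the release suspected of causing the error, and 22.10.0 is the version reported to work. It does not check whether a newer Scrapy release has since adapted to the change; parse_version and is_suspect are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption from the answer above: Twisted 23.8.0 is the suspected release.
SUSPECT_FROM = (23, 8, 0)


def parse_version(text):
    """Turn '23.8.0' into (23, 8, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_suspect(text):
    # Tuple comparison: (22, 10, 0) < (23, 8, 0) <= (23, 10, 0)
    return parse_version(text) >= SUSPECT_FROM


if __name__ == "__main__":
    try:
        installed = version("Twisted")
    except PackageNotFoundError:
        print("Twisted is not installed in this environment")
    else:
        status = "may be affected" if is_suspect(installed) else "predates 23.8.0"
        print(f"Twisted {installed}: {status}")
```

Run it with the same interpreter as your venv (for example the python.exe under bookscraper\Scripts) so it inspects the environment Scrapy runs in.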
Answered By - Builditluc