Issue
I have a spider in Scrapy and I want to check for bottlenecks. I also have a few classes feeding into the main Spider class. I want to use cProfile to check function execution times:
if __name__ == '__main__':
    import cProfile
    import pstats
    from pstats import SortKey
    cProfile.run("QuotesSpider(scrapy.Spider)", "output.dat")
    # pstats.Stats loads the dump (the pstats module itself is not callable)
    with open('output_time.txt', 'w') as f:
        p = pstats.Stats('output.dat', stream=f)
        p.sort_stats(SortKey.TIME).print_stats()
    with open('output_calls.txt', 'w') as f:
        p = pstats.Stats('output.dat', stream=f)
        p.sort_stats(SortKey.CALLS).print_stats()
where QuotesSpider(scrapy.Spider) is the spider class. Understandably, when running the spider using scrapy crawl quotes, I get the following error: NameError: name 'QuotesSpider' is not defined.
How do I properly integrate cProfile with Scrapy? And is cProfile the best way to approach this, since Scrapy's requests are async?
Solution
It is a bit hidden, but you can actually run cProfile with a standard scrapy command from the command line by passing the --profile option. For example, for your quotes spider:
scrapy crawl quotes --profile output.dat
Then you can analyse output.dat with pstats just as you have done above.
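A minimal sketch of that analysis step, assuming the profile was written to output.dat by the command above. The helper name write_reports and its out_dir parameter are hypothetical, chosen for illustration; they are not part of Scrapy. It uses only the standard-library pstats module and produces the same two reports as the question's script.

```python
import os
import pstats
from pstats import SortKey


def write_reports(profile_path="output.dat", out_dir=".", limit=30):
    """Hypothetical helper: turn a cProfile dump into two text reports.

    output_time.txt  - sorted by internal time (time spent inside each
                       function itself, excluding its sub-calls)
    output_calls.txt - sorted by number of calls
    Returns the list of report paths written.
    """
    reports = {
        "output_time.txt": SortKey.TIME,
        "output_calls.txt": SortKey.CALLS,
    }
    written = []
    for name, sort_key in reports.items():
        report_path = os.path.join(out_dir, name)
        with open(report_path, "w") as f:
            # stream=f redirects print_stats() output into the report file
            stats = pstats.Stats(profile_path, stream=f)
            # strip_dirs() shortens long site-packages paths; limit keeps
            # only the top rows, where the bottlenecks show up
            stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
        written.append(report_path)
    return written
```

After scrapy crawl quotes --profile output.dat has finished, calling write_reports("output.dat") writes output_time.txt and output_calls.txt to the current directory.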
Answered By - tomjn