Issue
I have a Flask app that runs a Scrapy spider. The app works fine on my development machine; however, when I run it in a container, the close method of the spider is not executed.
Here is the code for the spider:
# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup
from scrapy.exceptions import CloseSpider


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        page_text = response.text
        # raise CloseSpider("Blocked")
        soup = BeautifulSoup(page_text, "lxml")
        if "xml" in str.lower(page_text[:20]):
            sitemap = True
            links = soup.findAll("loc")
            for link in links:
                yield scrapy.Request(url=link.text, callback=self.parse)
        else:
            raise CloseSpider("I want to close it")

    def close(spider, reason):
        print("Closing spider")
        # self.pbar.clear()
        # self.pbar.write('Closing {} spider'.format(spider.name))
        print("Spider closed")
Here is my Flask app in main.py:
import crochet
crochet.setup()  # initialize crochet
import json
import pandas as pd
from flask import redirect, url_for, request
from scrapy.crawler import CrawlerRunner, CrawlerProcess
import time
from datetime import datetime, timedelta
import grequests
from flask import render_template, jsonify, Flask, redirect, url_for, request, flash
from app2.articles_finder.spiders.test_spider import ToScrapeCSSSpider
from app2 import app2

# NOTE: crawl_runner is not defined in this excerpt; presumably it is a
# CrawlerRunner instance created elsewhere in the app.


@app2.route("/test_docker")
def test_docker():
    scrap_docker()
    return "Ok", 200


@crochet.run_in_reactor
def scrap_docker():
    eventual = crawl_runner.crawl(ToScrapeCSSSpider)
    eventual.addCallback(finished_docker)


def finished_docker(null):
    print("Scrapping is over in docker container")
And finally, here is my Dockerfile:
FROM phusion/baseimage:0.9.19
# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]
ENV TERM=xterm-256color
ENV SCRAPPER_HOME=/app/links_finder
ENV PYTHON_VERSION="3.6.5"
ENV FRONT_ADDRESS = blabla
# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
# Install necessary packages
RUN apt-get update && apt-get install -y \
    build-essential
#RUN apt-get update && apt-get install -y \
# build-essential \
# Install core packages
#RUN apt-get update
RUN apt-get install -y build-essential checkinstall software-properties-common llvm cmake wget git nano nasm yasm zip unzip pkg-config \
    libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
# Install Python 3.6.5
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz \
    && tar xvf Python-${PYTHON_VERSION}.tar.xz \
    && rm Python-${PYTHON_VERSION}.tar.xz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure \
    && make altinstall \
    && cd / \
    && rm -rf Python-${PYTHON_VERSION}
RUN apt-get install -y python3-pip
WORKDIR ${SCRAPPER_HOME}
COPY . ${SCRAPPER_HOME}
RUN ls
#COPY run_gunicorn_app_2.py ${SCRAPPER_HOME}
RUN pip3 install -r requirements2.txt
RUN chmod 777 -R *
# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py
EXPOSE 3456
ENTRYPOINT python3 run_gunicorn_app_2.py
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py
The requirements2.txt file:
tqdm==4.19.4
APScheduler==3.6.1
Flask==1.0.2
Flask-Admin==1.3.0
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.0
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
Flask-redis==0.4.0
gunicorn==19.4.5
itsdangerous==0.24
pytz==2016.10
structlog==16.1.0
termcolor==1.1.0
WTForms==2.1
scrapy==1.6.0
grequests==0.4.0
#pandas==0.24
crochet==1.10.0
redis==3.3.8
beautifulsoup4==4.7.1
publicsuffixlist==0.7.1
PyMySQL==0.9.3
When I run the docker container this is what I am getting:
Clearly, the close method is not executed at all. Any hints? I have been stuck on this problem for quite some time, so any clues would be more than welcome. Thank you!
Solution
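After lots of debugging, it turned out in the end that there were no issues with the spider. Apparently the close method was running, but its print output was being buffered and did not show up in the container's output. I just needed to add -u after python3, which forces Python's stdout and stderr to be unbuffered, so the messages get logged: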
ENTRYPOINT python3 -u run_gunicorn_app_2.py
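To see why -u matters, here is a minimal sketch (not taken from the original app) that reproduces the effect outside Docker. It assumes only the standard library. The helper name run_child and the child snippet are made up for illustration. The child prints a line to a pipe and then exits via os._exit, which skips the normal flush at shutdown. That stands in for a long-running gunicorn process whose buffer simply has not been flushed yet.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints one line, then exits without flushing.
CHILD = "import os; print('Closing spider'); os._exit(0)"


def run_child(unbuffered):
    """Run the child with or without -u and return what reached the pipe."""
    # Drop PYTHONUNBUFFERED so the comparison is not skewed by the environment.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    cmd = [sys.executable]
    if unbuffered:
        cmd.append("-u")
    cmd += ["-c", CHILD]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, env=env, check=False)
    return result.stdout.decode().strip()


if __name__ == "__main__":
    print("without -u:", repr(run_child(unbuffered=False)))
    print("with -u:   ", repr(run_child(unbuffered=True)))
```

When stdout is a pipe rather than a terminal, Python block-buffers it, so the message stays in memory until the buffer fills or is flushed. Besides -u, setting the PYTHONUNBUFFERED environment variable or calling print(..., flush=True) should have a similar effect.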
Answered By - chiplusplus