exited: scrapy (exit status 0; not expected)

I'm trying to run a bash script that launches several spiders in my Docker container.
My supervisor.conf, placed in /etc/supervisor/conf.d/, looks like this:

    [program:scrapy]
    command=/tmp/start_spider.sh
    autorestart=false
    startretries=0
    stderr_logfile=/tmp/start_spider.err.log
    stdout_logfile=/tmp/start_spider.out.log

but supervisor returns these errors:

    2015-08-21 10:50:30,466 CRIT Supervisor running as root (no user in config file)
    2015-08-21 10:50:30,466 WARN Included extra file "/etc/supervisor/conf.d/tor.conf" during parsing
    2015-08-21 10:50:30,478 INFO RPC interface 'supervisor' initialized
    2015-08-21 10:50:30,478 CRIT Server 'unix_http_server' running without any HTTP authentication checking
    2015-08-21 10:50:30,478 INFO supervisord started with pid 5
    2015-08-21 10:50:31,481 INFO spawned: 'scrapy' with pid 8
    2015-08-21 10:50:31,555 INFO exited: scrapy (exit status 0; not expected)
    2015-08-21 10:50:32,557 INFO gave up: scrapy entered FATAL state, too many start retries too quickly

And my program stops running. But if I run the script manually, it works fine.

How can I resolve this? Any ideas?

Solution

I found the solution to my problem. In supervisor.conf, change:

    [program:scrapy]
    command=/tmp/start_spider.sh
    autorestart=false
    startretries=0

to:

    [program:scrapy]
    command=/bin/bash -c "exec /tmp/start_spider.sh > /dev/null 2>&1 -DFOREGROUND"
    autostart=true
    autorestart=false
    startretries=0
    
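
For context, the log above explains the failure: the process exited well under a second after being spawned. supervisord treats any exit that happens before startsecs (default: 1 second) have elapsed as a failed start, whatever the exit code, which is why it logged "exit status 0; not expected", retried, and gave up. Assuming standard supervisord behavior, an alternative sketch is to keep the original command and declare that an immediate exit counts as a successful start:

    [program:scrapy]
    command=/tmp/start_spider.sh
    startsecs=0        ; an immediate exit still counts as a successful start
    autorestart=false
    startretries=0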

Here is my code:

start_spider.sh:

    #!/bin/bash
    
    # letters to crawl; one spider is launched per letter
    parseLetter=('a' 'b')
    
    # change into the scrapy project directory ($path is expected to be set in the environment)
    cd "$path/scrapy/scrapyTodo/scrapyTodo"
    
    tLen=${#parseLetter[@]}
    for (( i=0; i<${tLen}; i++ ));
    do
        # launch each crawl in the background
        scrapy crawl root -a alpha=${parseLetter[$i]} &
    done
    
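
Note that start_spider.sh backgrounds every crawl and then exits immediately, which is exactly what supervisord objects to. A minimal variant (a sketch of an alternative, not the fix used above) keeps the script in the foreground until all spiders finish by ending it with wait:

    #!/bin/bash
    
    parseLetter=('a' 'b')
    
    cd "$path/scrapy/scrapyTodo/scrapyTodo"
    
    for letter in "${parseLetter[@]}"; do
        scrapy crawl root -a alpha="$letter" &
    done
    
    # Block until every background crawl has exited, so the process
    # supervisord is watching stays alive for the whole run.
    wait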

Here is my scrapy code:

    #!/usr/bin/python -tt
    # -*- coding: utf-8 -*-
    
    from scrapy.selector import Selector
    from elasticsearch import Elasticsearch
    from scrapy.contrib.spiders import CrawlSpider
    from scrapy.http import Request
    from urlparse import urljoin
    from bs4 import BeautifulSoup
    from scrapy.spider import BaseSpider
    from tools import sendEmail
    from tools import ElasticAction
    from tools import runlog
    from scrapy import signals
    from scrapy.xlib.pydispatch import dispatcher
    from datetime import datetime
    import re
    
    class studentCrawler(BaseSpider):
        # crawl start time
        started_on = datetime.now()
    
        name = "root"
    
        DOWNLOAD_DELAY = 0
    
        allowed_domains = ['website.com']
    
        ES_Index = "website"
        ES_Type = "root"
        ES_Ip = "127.0.0.1"
    
        child_type = "level1"
    
        handle_httpstatus_list = [404, 302, 503, 999, 200]  # add any other code you need
    
        es = ElasticAction(ES_Index, ES_Type, ES_Ip)
    
        def __init__(self, alpha=''):
            base_domain = 'https://www.website.com/directory/student-' + str(alpha) + "/"
            self.start_urls = [base_domain]
            super(studentCrawler, self).__init__()
    
        def is_empty(self, any_structure):
            """
            Check whether the given data structure is empty.
            :arg any_structure: any data
            """
            if any_structure:
                return 1
            else:
                return 0
    
        def parse(self, response):
            """
            Main callback: index the response status, then extract links from 200 pages.
            :param response:
            :return:
            """
            # mark unreachable pages as "False" in Elasticsearch
            if response.status in (404, 503, 999):
                self.es.insertIntoES(response.url, "False")
    
            if response.status == 200:
                sel = Selector(response)
                self.es.insertIntoES(response.url, "True")
                # join the extracted fragments into one unicode string and index its links
                self.getAllTheUrl(u''.join(sel.xpath(".//*[@id='seo-dir']/div/div[3]").extract()).strip(), response.url)
    
        def getAllTheUrl(self, data, parent_id):
            # index every link found in the extracted HTML fragment under its parent URL
            soup = BeautifulSoup(data, 'html.parser')
            for a in soup.find_all('a', href=True):
                self.es.insertChildAndParent(self.child_type, str(a['href']), "False", parent_id)
    

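For reference, a single spider can be tested by hand with the same arguments the script passes, which matches the observation above that everything works when run manually:

    scrapy crawl root -a alpha=a
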
I discovered that BeautifulSoup was not working when the spiders were launched by supervisor.
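
One plausible explanation, though the logs alone cannot confirm it: supervisord starts programs as root with a minimal environment, so if BeautifulSoup is installed in a virtualenv or a user-level site-packages, the supervised spiders may resolve a different Python than a manual shell does. A sketch of how that could be addressed (the user name and virtualenv path are hypothetical):

    [program:scrapy]
    command=/bin/bash -c "exec /tmp/start_spider.sh > /dev/null 2>&1"
    user=scraper                                       ; hypothetical account that owns the virtualenv
    environment=PATH="/home/scraper/venv/bin:%(ENV_PATH)s"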
