docker swarm 1.2.0 reschedule with port mapping

I am testing the brand-new Docker Swarm 1.2.0, and especially the rescheduling functionality.

I have one EC2 VM running the Swarm manager and two Swarm agents, each on its own EC2 VM. I deploy an HTTP REST service through Swarm like this:

    docker -H :4000 run -d -p :81 -e reschedule:on-node-failure myTestService
    

    This command works fine and deploys my test service on one node (node-1).
    If I run docker ps, I see my container deployed on node-1:

    CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                   NAMES
    23ce231b5737        myTestService               "/nodejs/bin/npm star"   3 minutes ago       Up 3 minutes        0.0.0.0:32768->81/tcp   distracted_sinoussi
    

    Look at the port mapping, 0.0.0.0:32768->81/tcp: I let the Docker engine choose an available port on the host (32768).

    Now, if I shut down node-1, Swarm should reschedule my container. The Swarm log shows this:

    time="2016-04-19T13:56:31Z" level=info msg="Initializing discovery without TLS"
    time="2016-04-19T13:56:31Z" level=info msg="Listening for HTTP" addr=":4000" proto=tcp
    time="2016-04-19T13:56:38Z" level=info msg="Registered Engine ip-node-1 at ip.node.1:2375"
    time="2016-04-19T13:56:45Z" level=info msg="Registered Engine ip-node -2 at ip.node.2:2375"
    time="2016-04-19T13:58:24Z" level=error msg="Flagging engine as unhealthy. Connect failed 3 times" id="ZSWT:XLYS:D2HA:K5J3:O32D:AFVT:HUNR:ENKI:MBTC:2PVA:JIC2:X74L" name= ip-node-1
    time="2016-04-19T13:58:24Z" level=error msg="Error monitoring events: unexpected EOF." id="ZSWT:XLYS:D2HA:K5J3:O32D:AFVT:HUNR:ENKI:MBTC:2PVA:JIC2:X74L" name= ip-node-1
    time="2016-04-19T13:58:24Z" level=error msg="Restart event monitoring." id="ZSWT:XLYS:D2HA:K5J3:O32D:AFVT:HUNR:ENKI:MBTC:2PVA:JIC2:X74L" name= ip-node-1
    time="2016-04-19T13:58:24Z" level=error msg="Error monitoring events: Get http://ip.node.1:2375/v1.15/events: dial tcp ip.node.1:2375: getsockopt: connection refused." id="ZSWT:XLYS:D2HA:K5J3:O32D:AFVT:HUNR:ENKI:MBTC:2PVA:JIC2:X74L" name=ip-node-1
    time="2016-04-19T13:58:24Z" level=info msg="Rescheduled container 23ce231b57375a386909175f3dcd730720429eb4ed41d4366d5add17a30d210e from  ip-node-1 to  ip-node-2 as c7fe68332bc61f0f4c498848e59d3e34b58821468ce65bd4ebc92055156d5b8c"
    

    The last line shows that the container has been rescheduled on node-2. Fine, let's run docker ps on node-2:

    CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS               NAMES
    c7fe68332bc6        myTestService                   "/nodejs/bin/npm star"   27 seconds ago      Created                                 sleepy_hopper
    

    So, the container is there but not running (just “created”) and the port mapping is empty.
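
    One way to spot containers stuck in this state across the cluster is to filter the output of docker ps -a. The following is only a minimal sketch: created_only is a hypothetical helper name, and the usage line assumes the Swarm manager from the question is reachable on :4000.

```shell
# Reads lines of the form "<id> <status...>" on stdin and prints the IDs of
# containers that were created but never started (status is exactly "Created").
created_only() {
    awk '$2 == "Created" { print $1 }'
}

# Assumed usage against the Swarm manager (not run here):
#   docker -H :4000 ps -a --format '{{.ID}} {{.Status}}' | created_only
```

    The -a flag matters here, because a container that was never started is not listed among the running containers.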

    So what’s going wrong here?

    Thank you

  • One solution collected from the web for “docker swarm 1.2.0 reschedule with port mapping”

    I think this is the expected behaviour. If you shut the node down cleanly, with something like shutdown -h now, the Docker daemon running on that node is also stopped cleanly. The last state known to the Swarm manager is therefore that your containers are stopped, which is why they are not started on a new node.

    Try killing the Docker daemon on the node with kill -9 (as would happen in a real failure). The containers will then be rescheduled and started on another node.

    Tested with swarm 1.2.1
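
    To check the result of such a test, the "Rescheduled container" lines can be pulled out of the Swarm manager log. This is a rough sketch that assumes the log format shown in the question. rescheduled_containers is a hypothetical helper name, and the way you obtain the manager log (here a container assumed to be named swarm-manager) depends on your setup.

```shell
# Reads Swarm manager log lines on stdin and prints
# "<old-container-id> <new-container-id> <target-node>"
# for every "Rescheduled container" entry.
rescheduled_containers() {
    sed -n 's/.*msg="Rescheduled container \([0-9a-f]*\) from  *\([^ ]*\) to  *\([^ ]*\) as \([0-9a-f]*\)".*/\1 \4 \3/p'
}

# Assumed usage (not run here):
#   1. On the node, simulate a crash instead of a clean shutdown:
#        sudo kill -9 <pid of the Docker daemon>
#   2. On the manager host, list what was rescheduled:
#        docker logs swarm-manager 2>&1 | rescheduled_containers
```

    The new container ID printed in the second column can then be checked on the target node to see whether the container is actually running.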
