my coreos/fleet deployed service is dying and I can't tell why

I’m trying to deploy nsqlookupd using fleet on a brand-new CoreOS cluster in EC2. Here is my systemd unit file:

[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run -d --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address

I’ve verified the container works fine if I just run the ExecStart command. My docker logs just look like

    ~ $ docker logs nsqlookupd
    2014/08/08 02:23:58 nsqlookupd v0.2.29-alpha (built w/go1.2.2)
    2014/08/08 02:23:58 TCP: listening on [::]:4160
    2014/08/08 02:23:58 HTTP: listening on [::]:4161
    

and my fleetctl journal looks like

    $ fleetctl journal nsqlookupd.service
    -- Logs begin at Sun 2014-08-03 12:49:00 UTC, end at Fri 2014-08-08 02:30:06 UTC. --
    Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Starting nsqlookupd service...
    Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: Error response from daemon: No such container: nsqlookupd
    Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: 2014/08/08 02:23:57 Error: failed to kill one or more containers
    Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: Error response from daemon: No such container: nsqlookupd
    Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: 2014/08/08 02:23:57 Error: failed to remove one or more containers
    Aug 08 02:23:57 ip-10-147-9-249 etcdctl[6157]: 54.198.93.169
    Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Started nsqlookupd service.
    Aug 08 02:23:57 ip-10-147-9-249 docker[6155]: 0fce4465f61c092541ba9d4c4e89ce13c4d6bedc096519034ed585d7adb5e0d7
    Aug 08 02:23:59 ip-10-147-9-249 docker[6194]: nsqlookupd
    

both of which look just fine. But the container dies quietly, and my fleetctl list-units gives

    $ fleetctl list-units
    UNIT                STATE       LOAD    ACTIVE          SUB     DESC                MACHINE
    nsqlookupd.service  launched    loaded  deactivating    stop    nsqlookupd service  1320802c.../10.147.9.249
    

Running docker images is a little worrying:

    $ docker images
    REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
    <none>                 <none>              8ef9d8f9d18d        9 minutes ago       710 MB
    mikedewar/nsqadmin     latest              432af572bda8        2 days ago          710 MB
    mikedewar/nsqd         latest              00bd4e474964        2 days ago          710 MB
    <none>                 <none>              adf0ed97208e        3 weeks ago         710 MB
    mikedewar/nsqlookupd   latest              2219c0e783d9        3 weeks ago         710 MB
    <none>                 <none>              35d2212f8932        3 weeks ago         710 MB
    mikedewar/nsq          latest              f9794fe056e1        3 weeks ago         710 MB
    busybox                latest              a9eb17255234        9 weeks ago         2.433 MB
    zmarcantel/cassandra   latest              b1168b45b4f8        4 months ago        738 MB
    

as I’ve been updating mikedewar/nsqlookupd quite regularly over the last 3 weeks, yet the image is listed as created 3 weeks ago. Maybe that’s when I first pushed something to Docker Hub? I’d love to know that the image I’m working with is the up-to-date one. I’ve tried docker rmi mikedewar/nsqlookupd followed by docker pull mikedewar/nsqlookupd, but the CREATED column still says it was created 3 weeks ago.

I don’t know if this is useful, but the ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address command seems to have run: the etcdctl log line in the fleet journal shows I managed to set the key to my IP, but after the container dies I can’t get that key from etcd.

Any help on where to look next for clues, or any ideas about why this is happening, would be greatly appreciated! As is probably clear, I’m rather new to this sort of thing…

One solution collected from the web for “my coreos/fleet deployed service is dying and I can't tell why”

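You shouldn’t run Docker containers in detached mode in a unit file. Your ExecStart does exactly that: ExecStart=/usr/bin/docker run -d. With -d the docker client exits as soon as the container has started, so systemd believes the service's main process has finished and moves on to ExecStop and ExecStopPost. That matches your journal: the bare nsqlookupd line at 02:23:59 is the output of docker stop, and the etcd key is gone because etcdctl rm ran. Drop the -d so the docker client stays in the foreground and systemd can track it.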
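As a minimal sketch of that change, the snippet below writes out the unit file from the question with only the -d flag removed. Everything else (image name, ports, etcd key) is copied from the question; the output filename nsqlookupd.service is an assumption based on the fleetctl output shown there.

```shell
# Write a foreground variant of the unit file from the question.
# The quoted 'EOF' keeps the shell from expanding $COREOS_PUBLIC_IPV4,
# so systemd substitutes it from /etc/environment at start time.
cat > nsqlookupd.service <<'EOF'
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
# No -d here: the docker client stays attached, so systemd tracks a live process.
ExecStart=/usr/bin/docker run --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
EOF
```

To try it, you would probably need to remove the old unit first, for example with fleetctl destroy nsqlookupd.service followed by fleetctl start nsqlookupd.service.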

As for managing versions, if you want to be absolutely sure you’re getting the latest copy, tag your images with an explicit version and pull that tag, for example mikedewar/nsqlookupd:1.2.3. You can then increment the tag in your fleet unit file with each release.
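One way to keep the unit file in step with the tag is a small helper like the one below. This is only a sketch: pin_tag is a hypothetical function name, and it assumes the image is referenced as mikedewar/nsqlookupd (with or without an existing tag) somewhere in the unit file.

```shell
# Before running this you would build and push the tagged image, e.g.:
#   docker build -t mikedewar/nsqlookupd:1.2.3 .
#   docker push mikedewar/nsqlookupd:1.2.3

# pin_tag (hypothetical helper): rewrite the image reference in a unit file.
# Usage: pin_tag <unit-file> <tag>
pin_tag() {
  unit="$1"
  tag="$2"
  # Match the image name plus an optional ":oldtag" and replace it with the new tag.
  sed -e "s|mikedewar/nsqlookupd\(:[A-Za-z0-9._-]*\)\{0,1\}|mikedewar/nsqlookupd:${tag}|g" \
    "$unit" > "$unit.tmp" && mv "$unit.tmp" "$unit"
}
```

Calling pin_tag nsqlookupd.service 1.2.3 would leave --name=nsqlookupd untouched and change only the image reference, so each release needs just a new tag and a resubmitted unit.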
