How to connect Docker Swarm to multiple consul servers for failover?

I am running docker swarm with consul. I have a consul cluster of 3 nodes connected to each other for failover. The issue is that I can only connect my swarm workers and masters to a single consul node, and if that node goes down, swarm stops working. How can I connect the swarm workers and masters to all of my consul nodes? The following commands, if run from the master, set up my swarm environment connected to a single consul server:

#### REFERENCE
# {{master_i}} is the IP address of the master server
# {{consul_i}} is the IP address of the consul server
# {{worker_i}} is the IP address of a worker server


#### START THE MASTER
docker run --restart=unless-stopped --name=swarm-manager0 -d -p 4000:4000 swarm manage -H :4000 --replication \
--advertise {{master_0}}:4000 \
consul://{{consul_0}}:8500

#### START THE WORKERS REMOTELY FROM THE MASTER
docker -H={{worker_0}}:2375 run -d --restart=unless-stopped --name=swarm-worker0 swarm join \
--advertise={{worker_0}}:2375 \
consul://{{consul_0}}:8500/

docker -H={{worker_1}}:2375 run -d --restart=unless-stopped --name=swarm-worker1 swarm join \
--advertise={{worker_1}}:2375 \
consul://{{consul_0}}:8500/

docker -H={{worker_2}}:2375 run -d --restart=unless-stopped --name=swarm-worker2 swarm join \
--advertise={{worker_2}}:2375 \
consul://{{consul_0}}:8500/

#### START THE WORKER SERVICE DISCOVERY
docker -H={{worker_0}}:2375 run -d --restart=unless-stopped \
-h {{worker_0}} --name registrator0 -v /var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator \
consul://{{consul_0}}:8500

docker -H={{worker_1}}:2375 run -d --restart=unless-stopped \
-h {{worker_1}} --name registrator1 -v /var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator \
consul://{{consul_0}}:8500

docker -H={{worker_2}}:2375 run -d --restart=unless-stopped \
-h {{worker_2}} --name registrator2 -v /var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator \
consul://{{consul_0}}:8500

Note that simply adding two extra consul://{{consul_i}}:8500 (for the other two consul servers) to the end of each docker run command will not connect the containers to the other consul servers.

4 solutions collected from the web for “How to connect Docker Swarm to multiple consul servers for failover?”

    According to @slugonamission there is no way to connect swarm to the multiple IP addresses of multiple consul servers.

    However, I was able to put an haproxy load balancer in front of my consul servers. The load balancer forwards all traffic arriving on its port 8500 to port 8500 on all of my consul servers. By doing this I was able to use the IP address of my load balancer in place of {{consul_0}}. Here's my pretty basic haproxy.cfg:

    # $CONSUL0, $CONSUL1 and $CONSUL2 are the IP addresses of my consul servers
    
    global
        log 127.0.0.1 local0 notice
        maxconn 2000
        user haproxy
        group haproxy
    
    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        timeout connect  5000
        timeout client  10000
        timeout server  10000
    
    listen appname 0.0.0.0:8500
        mode http
        stats enable
        stats uri /haproxy?stats
        stats realm Strictly\ Private
        stats auth ubuntu:<password>  # stats auth takes user:password; no password is given in the original
        balance roundrobin
        option httpclose
        option forwardfor
        server consul0 $CONSUL0:8500 check
        server consul1 $CONSUL1:8500 check
        server consul2 $CONSUL2:8500 check
    

    After making the change, my consul servers can individually go down and swarm will continue working.
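
    As a rough sketch of what this looks like on the swarm side, the manager command from the question can be pointed at the load balancer instead of a single consul server. The two IP addresses below are hypothetical placeholders, and the script only prints the command (a dry run) rather than starting a container.

```shell
#!/bin/sh
# Hypothetical addresses; substitute your own.
LB_IP="10.0.0.5"        # HAProxy instance in front of the consul servers
MASTER_IP="10.0.0.20"   # swarm master

# Print the swarm manager command from the question, with the discovery
# URL pointing at the load balancer rather than at one consul server.
swarm_manage_cmd() {
  printf '%s\n' "docker run --restart=unless-stopped --name=swarm-manager0 -d -p 4000:4000 swarm manage -H :4000 --replication --advertise $MASTER_IP:4000 consul://$LB_IP:8500"
}

# Dry run: show the command instead of executing it.
swarm_manage_cmd
```

    The worker and registrator commands from the question would change in the same way, with the load balancer address replacing {{consul_0}}.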

    There doesn’t actually seem to be a way to do this straight out of the box; Swarm eventually (via libkv) gets down to the Consul HTTP API, which only connects to the single specified endpoint. Worse, libkv will return an error if multiple Consul hosts are passed.

    There is a way you can achieve this with some more work though. If you start a Consul agent on each node running Swarm and join them to one of the Consul servers, they will learn about the state of the cluster. If you then specify the address of the Consul agent as Swarm’s discovery service, then the Consul agent will forward the request to one of the functioning Consul servers.
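
    A minimal sketch of that setup, assuming hypothetical IP addresses and data directory: the script builds the command line for a Consul client agent on one Swarm node. It uses `-retry-join` once per server, which is my own choice here rather than something the answer prescribes, so that the agent can still start when one server is down. It only prints the commands (a dry run).

```shell
#!/bin/sh
# Hypothetical addresses; substitute your own.
CONSUL_SERVERS="10.0.0.10 10.0.0.11 10.0.0.12"   # the three consul servers
NODE_IP="10.0.0.20"                              # this swarm node

# Build the command line for a Consul client agent that retries joining
# every known server. -client makes the HTTP API listen on NODE_IP so
# that containers on this host can reach it.
consul_agent_cmd() {
  cmd="consul agent -data-dir=/var/lib/consul -bind=$NODE_IP -client=$NODE_IP"
  for server in $CONSUL_SERVERS; do
    cmd="$cmd -retry-join=$server"
  done
  printf '%s\n' "$cmd"
}

# Dry run: print the agent command, then the swarm discovery URL that
# would point at the local agent instead of a remote consul server.
consul_agent_cmd
printf 'consul://%s:8500/\n' "$NODE_IP"
```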

    Another solution is to just run a consul client on every server where you want to run a swarm worker. Then, when you create your swarm workers, just have them bind to the consul agent running on the local machine:

    docker run -d --restart=unless-stopped --name=swarm-client \
      swarm join \
      --advertise=$(hostname -i):2375 \
      consul://$(hostname -i):8500/
    

    Note that this will cause swarm to break if consul dies on the local machine.

    If you are deploying in AWS, you may register the consul server nodes behind an ELB and then point the swarm managers/nodes to the ELB DNS name.

    Alternatively, run a consul client agent on all of the swarm host machines and point your swarm managers/nodes to the consul agent, i.e. the docker0 interface IP, 172.17.0.1:8500.
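
    One way to avoid hard-coding the bridge address is to look it up on the host. The sketch below assumes the `ip` tool from iproute2 is installed and that the bridge is named docker0; the helper name `parse_ipv4` is made up for illustration. If the lookup fails, it falls back to 172.17.0.1.

```shell
#!/bin/sh
# Extract the IPv4 address from one line of `ip -4 -o addr show` output
# (the fourth field looks like 172.17.0.1/16).
parse_ipv4() {
  awk '{print $4}' | cut -d/ -f1
}

# Look up docker0's address; fall back to the usual default bridge
# address if the interface (or the ip tool) is not available.
BRIDGE_IP="$(ip -4 -o addr show docker0 2>/dev/null | parse_ipv4)"
BRIDGE_IP="${BRIDGE_IP:-172.17.0.1}"

# Dry run: print the discovery URL that swarm would be given.
printf 'consul://%s:8500\n' "$BRIDGE_IP"
```

    For this to work, the local consul agent presumably has to listen on that address as well (for example via its `-client` option), since by default its HTTP API listens only on 127.0.0.1.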
