Why do Docker overlay networks require consensus?

Just spent my afternoon reading up on Docker overlay networks, very cool stuff. I just can’t seem to find an answer to one thing.

According to the docs:

    • If you install and use Docker Swarm, you get overlay networks across your manager/worker hosts automagically, and don’t need to configure anything more; but
    • If you simply want a (non-Swarm) overlay network across multiple hosts, you need to configure that network with an external “KV Store” (consensus server) like Consul or ZooKeeper

    I’m wondering: why?!? Clearly, overlay networks require consensus amongst peers, but I’m not sure why or who those “peers” even are.

    And I’m just guessing that, with Swarm, there’s some internal/under-the-hood consensus server running out of the box. Yes? No? Thanks in advance!

One solution collected from the web for “Why do Docker overlay networks require consensus?”

    Swarm mode uses Raft for its manager consensus, with a built-in KV store. Before swarm mode, overlay networking was possible with third-party KV stores. Overlay networking itself doesn’t require consensus; it just relies on whatever the KV store says, regardless of the other nodes or even its own local state (I’ve found this out the hard way). The KV stores out there are typically set up with consensus for HA.

    The KV store tracks IP allocations to containers running on each host (IPAM). This allows Docker to allocate a given address only once, and to know which Docker host it needs to communicate with when you connect to a container running on another host. The store needs to be external to any one Docker host, and preferably in an HA configuration (like swarm mode’s consensus), so that it can continue to work even when some Docker nodes are down.
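    To make the IPAM idea concrete, here is a minimal sketch in Python. It is not how Docker or any real KV store is implemented: `ToyKVStore`, `allocate_ip`, `host_for`, the host names and the subnet are all hypothetical, and the store is just an in-memory dict standing in for something like Consul or ZooKeeper. It only shows the two things described above: an address is handed out once, and any host can look up which host owns it.

```python
import ipaddress


class ToyKVStore:
    """Hypothetical in-memory stand-in for an external KV store."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        # Only the first writer for a key wins; later writers get False.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def allocate_ip(store, subnet, host):
    """Claim the first free address in the subnet and record its owning host."""
    for addr in ipaddress.ip_network(subnet).hosts():
        if store.put_if_absent(str(addr), host):
            return str(addr)
    raise RuntimeError("subnet exhausted")


def host_for(store, ip):
    """Look up which Docker host a container IP lives on (None if unknown)."""
    return store.get(ip)


store = ToyKVStore()
ip_a = allocate_ip(store, "10.0.9.0/29", "host-1")  # container on host-1
ip_b = allocate_ip(store, "10.0.9.0/29", "host-2")  # container on host-2
print(ip_a, "->", host_for(store, ip_a))
print(ip_b, "->", host_for(store, ip_b))
```

    Because every host asks the same store, two hosts can never be given the same address, which is why the store has to live outside any single Docker host.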

    Overlay networking between Docker nodes only involves the nodes that have containers on that overlay network. So once the IP is allocated and discovered, all the communication happens only between the nodes with the relevant containers. This is easy to see with swarm mode: if you create a network and then list networks on a worker, it won’t be there. Once a container on that network gets scheduled on that worker, the network will appear. For Docker, this reduces the overhead of multi-host networking while also adding to the security of the architecture. The result looks like this graphic:

    [Figure: Docker multi-host networking]
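    The "network only shows up on a worker once it runs a container there" behaviour can be modelled with a few lines of Python. This is only a toy model under my own assumptions: `ToyCluster`, its methods, and the worker and network names are made up for illustration and are not Docker APIs.

```python
class ToyCluster:
    """Hypothetical model: the cluster store knows every network, workers learn lazily."""

    def __init__(self, workers):
        self.cluster_networks = set()
        self.local_networks = {w: set() for w in workers}

    def create_network(self, name):
        # Recorded in the cluster-wide store only; no worker is involved yet.
        self.cluster_networks.add(name)

    def schedule_container(self, worker, network):
        if network not in self.cluster_networks:
            raise KeyError(network)
        # The worker joins the overlay when it first runs a container on it.
        self.local_networks[worker].add(network)

    def peers(self, network):
        """Workers that actually take part in the overlay network."""
        return sorted(w for w, nets in self.local_networks.items() if network in nets)


cluster = ToyCluster(["worker-1", "worker-2", "worker-3"])
cluster.create_network("app-net")
before = cluster.peers("app-net")   # no worker has the network yet
cluster.schedule_container("worker-2", "app-net")
after = cluster.peers("app-net")    # only the worker running the container
print(before, after)
```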

    The Raft consensus itself is mainly needed for leader election. Once a node is elected leader and enough nodes remain to form a quorum, only that node writes to the KV store and maintains the current state. Everyone else is a follower that replicates what the leader writes. This animation describes it better than I ever could.
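    "Enough nodes remain to have consensus" means a majority of the managers. As a rough sketch of the arithmetic (the function names here are my own, not anything from Docker or a Raft library):

```python
def quorum(managers):
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers):
    """How many managers can be lost while a majority is still reachable."""
    return (managers - 1) // 2


def has_quorum(managers, reachable):
    """True if the reachable managers can still elect or keep a leader."""
    return reachable >= quorum(managers)


for n in (1, 3, 4, 5):
    print(n, "managers: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

    Note that going from 3 to 4 managers does not let you lose any more nodes, which is why odd manager counts are the usual recommendation.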

    Lastly, you don’t need to set up an external KV store to use overlay networking outside of swarm mode services. You can enable swarm mode, create overlay networks with the --attachable option, and run containers outside of swarm mode on that network, as you would have with an external KV store. I’ve used this in the past as a transition state to get containers into swarm mode, where some were running with docker-compose and others had been deployed as a swarm stack.
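    For reference, the sequence I mean looks roughly like the following. The small Python helper just builds the `docker` command lines so they can be printed or passed to `subprocess.run`; the helper name, the network name `my-overlay`, the container name `web` and the `nginx` image are placeholders I picked for the example. On a multi-host setup the other hosts would also have to join the swarm first, and the standalone container has to be started on a node that is part of the swarm.

```python
def attachable_overlay_commands(network, image, name):
    """Build the docker CLI calls for a standalone container on a swarm overlay.

    Hypothetical helper: it only returns argv lists and does not run anything.
    """
    return [
        # Turn this host into a (single-node) swarm manager.
        ["docker", "swarm", "init"],
        # Create an overlay network that non-service containers may attach to.
        ["docker", "network", "create", "--driver", "overlay", "--attachable", network],
        # Start a plain container (not a swarm service) on that network.
        ["docker", "run", "-d", "--name", name, "--network", network, image],
    ]


commands = attachable_overlay_commands("my-overlay", "nginx", "web")
lines = [" ".join(cmd) for cmd in commands]
for line in lines:
    print(line)
```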
