Why do Docker overlay networks require consensus?

Just spent my afternoon reading up on Docker overlay networks, very cool stuff. I just can’t seem to find an answer to one thing.

According to the docs:

    • If you install and use Docker Swarm, you get overlay networks across your manager/worker hosts automagically, and don’t need to configure anything more; but
    • If you simply want a (non-Swarm) overlay network across multiple hosts, you need to configure that network with an external “KV Store” (consensus server) like Consul or ZooKeeper

    I’m wondering: why?!? Clearly, overlay networks require consensus amongst peers, but I’m not sure why or who those “peers” even are.

    And I’m just guessing that, with Swarm, there’s some internal/under-the-hood consensus server running out of the box. Yes? No? Thanks in advance!

Answer:

    Swarm mode uses Raft for its manager consensus, with a built-in KV store. Before swarm mode, overlay networking was possible with third-party KV stores. Overlay networking itself doesn’t require consensus; it just relies on whatever the KV store says, regardless of the other nodes or even its own local state (I’ve found this out the hard way). The KV stores out there are typically set up with consensus for HA.

    The KV store tracks IP allocations to containers running on each host (IPAM). This allows docker to only allocate a given address once, and to know which docker host it needs to communicate with when you connect to a container running on another host. This needs to be external from any one docker host, and preferably in an HA configuration (like swarm mode’s consensus) so that it can continue to work even when some docker nodes are down.
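To make the IPAM role concrete, here is a toy sketch of a shared store handing out each address only once and remembering which host owns it. It is an illustration, not Docker's implementation: `ToyKVStore`, `put_if_absent`, `allocate_ip`, the host names, and the subnet are all made up for the example, and a plain dict stands in for Consul or ZooKeeper.

```python
import ipaddress


class ToyKVStore:
    """Hypothetical stand-in for an external KV store; just a dict here."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        # Succeeds only for the first writer of a key, loosely mimicking the
        # atomic check-and-set a real store would offer.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def allocate_ip(store, subnet, host, container):
    """Claim the first free address in the subnet and record its owner."""
    for addr in ipaddress.ip_network(subnet).hosts():
        owner = {"host": host, "container": container}
        if store.put_if_absent(str(addr), owner):
            return str(addr)
    raise RuntimeError("subnet exhausted")


store = ToyKVStore()
ip_a = allocate_ip(store, "10.0.0.0/29", "host-1", "web")
ip_b = allocate_ip(store, "10.0.0.0/29", "host-2", "db")

print(ip_a, ip_b)               # two different addresses
print(store.get(ip_b)["host"])  # which host to reach for the "db" container
```

Because both hosts ask the same store, neither can hand out an address the other has already claimed, and either can look up where a given container lives.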

    Overlay networking between docker nodes only involves the nodes that have containers on that overlay network. So once the IP is allocated and discovered, all the communication happens only between the nodes with the relevant containers. This is easy to see with swarm mode: if you create a network and then list networks on a worker, it won’t be there. Once a container on that network gets scheduled on that worker, the network will appear. For docker, this reduces the overhead of multi-host networking while also adding to the security of the architecture. The result looks like this graphic:

    [Image: Docker multi-host networking]
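As a rough illustration of "only the nodes with containers on the network take part", the sketch below derives the peer set for a network from a placement map. The host names, network names, and the `peers_for` helper are hypothetical, not anything Docker exposes.

```python
# Hypothetical placement map: host -> {network name -> containers on it}.
placements = {
    "host-1": {"frontend": ["web"]},
    "host-2": {"frontend": ["api"], "backend": ["db"]},
    "host-3": {},  # no containers yet, so it joins no overlay network
}


def peers_for(network, placements):
    """Return the hosts that have at least one container on the network."""
    return sorted(
        host for host, networks in placements.items() if networks.get(network)
    )


print(peers_for("frontend", placements))  # ['host-1', 'host-2']
print(peers_for("backend", placements))   # ['host-2']
```

Here `host-3` stays out of both networks until something is scheduled on it, which matches the behaviour of a worker not listing the network until a container arrives.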

    The Raft consensus itself is only needed for leader election. Once a node is elected leader and enough nodes remain to have consensus, only that one node writes to the KV store and maintains the current state; everyone else is a follower. This animation describes it better than I ever could.
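"Enough nodes remain to have consensus" means a strict majority of the managers. The small sketch below works through that arithmetic; the function names are mine, and it covers only the majority rule, not Raft itself.

```python
def quorum(managers):
    """Smallest strict majority of a group of managers."""
    return managers // 2 + 1


def tolerated_failures(managers):
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


for n in (1, 3, 5, 7):
    print(f"{n} managers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} down")
```

This is why manager counts are usually odd: going from 3 to 4 managers raises the quorum from 2 to 3 without letting you lose any more nodes.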

    Lastly, you don’t need to set up an external KV store to use overlay networking outside of swarm mode services. You can enable swarm mode, create overlay networks with the --attachable option, and run containers outside of swarm mode on that network, as you would have with an external KV store. I’ve used this in the past as a transition state to get containers into swarm mode, where some were running with docker-compose and others had been deployed as a swarm stack.
