How to run HDFS cluster without DNS

I’m building a local HDFS dev environment (actually Hadoop + Mesos + ZooKeeper + Kafka) to ease development of Spark jobs and to facilitate local integration testing.
All the other components are working fine, but I’m having issues with HDFS. When the data node tries to connect to the name node, I get a DisallowedDatanodeException:

org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode

Most questions related to this issue boil down to name resolution of the data node at the name node, either statically through the /etc/hosts file or via DNS. Static resolution is not an option with Docker, since I don’t know the data nodes’ addresses when the name node container is created. I would also like to avoid creating and maintaining an additional DNS service. Ideally, I would like to wire everything together using Docker’s --link feature.
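
For illustration, this is roughly how I intend to wire the containers together (the image names here are placeholders):

    docker run -d --name namenode my-hadoop-namenode
    docker run -d --name datanode1 --link namenode:namenode my-hadoop-datanode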

Is there a way to configure HDFS in such a way that it only uses IP addresses?

I found this property and set it to false (the hdfs-site.xml snippet I used is shown below), but it didn’t do the trick:

    dfs.namenode.datanode.registration.ip-hostname-check (default: true)

Is there a way to have a multi-node local HDFS cluster working using only IP addresses and without DNS?
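
For completeness, this is how I set the property on the name node in hdfs-site.xml (a sketch; the file location depends on your distribution):

    <property>
      <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
      <value>false</value>
    </property>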

One solution for “How to run HDFS cluster without DNS”

    I would look at reconfiguring your Docker image to use a different hosts file [1]. In particular:

    1. In the Dockerfile(s), apply the hosts-file switcheroo described in [1]
    2. Bring up the master node
    3. Bring up the data nodes, linked to the master
    4. Before starting the datanode, copy /etc/hosts to the new location, /tmp/hosts
    5. Append the master node’s name and IP to the new hosts file (a sketch of a startup script covering steps 4–5 follows the list)
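
    A minimal sketch of such a datanode startup script, assuming the master is linked under the alias namenode and its image EXPOSEs port 8020 (both are assumptions; adjust to your setup):

        #!/bin/sh
        # Copy the read-only /etc/hosts to the location the switcheroo in [1]
        # makes the system read instead.
        cp /etc/hosts /tmp/hosts

        # With --link namenode:namenode, Docker exposes the master's IP through
        # environment variables such as NAMENODE_PORT_8020_TCP_ADDR (for an
        # exposed port 8020). Append a stable entry for the master.
        echo "${NAMENODE_PORT_8020_TCP_ADDR} namenode" >> /tmp/hosts

        # Start the datanode process.
        exec hdfs datanode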

    Hope this works for you!

    [1] https://github.com/dotcloud/docker/issues/2267#issuecomment-40364340
