Distributed wide-and-deep example with the tf.contrib.learn API is stuck on k8s

I am new to distributed TensorFlow. I tried to run the distributed wide-and-deep example on a one-node k8s cluster, but all of the worker tasks get stuck at INFO:tensorflow:Create CheckpointSaverHook.

The same code runs fine both on localhost and in plain Docker.
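
For context, tf.contrib.learn reads the cluster layout from the TF_CONFIG environment variable of each task. Below is a minimal sketch of building that value per task. The host names, ports, and the helper name `make_tf_config` are hypothetical placeholders and are not taken from the linked repository.

```python
import json

# Hypothetical k8s service names and ports; replace with the real ones.
CLUSTER = {
    "ps": ["wide-deep-ps-0:2222"],
    "master": ["wide-deep-master-0:2222"],
    "worker": ["wide-deep-worker-0:2222", "wide-deep-worker-1:2222"],
}


def make_tf_config(task_type, task_index, cluster=CLUSTER):
    """Return the TF_CONFIG JSON string for one task (hypothetical helper)."""
    if task_type not in cluster:
        raise ValueError("unknown task type: %s" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index out of range: %d" % task_index)
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
        # Assumption: "cloud" marks a non-local run so that each task
        # starts its own server; verify against your TensorFlow version.
        "environment": "cloud",
    })


if __name__ == "__main__":
    # Each pod would export its own string as TF_CONFIG before training starts.
    print(make_tf_config("worker", 1))
```

Every address listed in the cluster has to be resolvable and reachable from every other pod, so comparing these values against the k8s service definitions is a reasonable first check.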

Here is my code: https://github.com/zhoudongyan/wide-and-deep

My environment:

  • Docker version: 17.03.1-ce
  • k8s version: v1.6.3
  • TensorFlow version: 1.1.0, Python 3
  • OS: Ubuntu 14.04, 64-bit

Does anyone know how to run it correctly? Thanks a lot!
