Distributed wide-and-deep example with the tf.contrib.learn API gets stuck on k8s

I am new to distributed TensorFlow. I tried to run the distributed wide-and-deep example on a one-node k8s cluster, but all of the worker tasks get stuck at INFO:tensorflow:Create CheckpointSaverHook.

The same tests on localhost and in plain Docker all run fine.
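
For context, tf.contrib.learn reads the cluster layout from the TF_CONFIG environment variable, so each k8s pod needs its own value. Below is a minimal sketch of building that value per task. The host names, the port 2222, the helper name make_tf_config and the "master" job entry are hypothetical placeholders for illustration, not taken from the linked repository; the "environment": "cloud" field follows the tf.contrib.learn convention as far as I understand it.

```python
import json


def make_tf_config(cluster, task_type, task_index):
    """Return the TF_CONFIG JSON string for a single task (hypothetical helper).

    cluster maps a job name to its list of "host:port" addresses.
    """
    if task_type not in cluster:
        raise ValueError("unknown task type: %r" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index %d out of range for %r" % (task_index, task_type))
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
        # Assumption: tf.contrib.learn expects this field for distributed runs.
        "environment": "cloud",
    })


# Hypothetical k8s service names; each address must resolve from every pod.
CLUSTER = {
    "ps": ["wide-deep-ps-0:2222"],
    "master": ["wide-deep-master-0:2222"],
    "worker": ["wide-deep-worker-0:2222", "wide-deep-worker-1:2222"],
}

if __name__ == "__main__":
    # The printed string is what the pod for worker 1 would export as TF_CONFIG.
    print(make_tf_config(CLUSTER, "worker", 1))
```

Every pod would get the same cluster dictionary but a different task entry. If an address in the cluster does not resolve inside the k8s network, or a job listed there is never started, the remaining tasks may wait indefinitely, which is one possible reason for a hang like the one above.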

Here is my code: https://github.com/zhoudongyan/wide-and-deep

    • docker version: 17.03.1-ce
    • k8s version: v1.6.3
    • tensorflow version: 1.1.0, python3
    • os: ubuntu 14.04 64bit

    Does anyone know how to run it correctly? Thanks a lot!
