Distributed wide-and-deep example (tf.contrib.learn API) gets stuck on k8s

I am new to distributed TensorFlow. I tried to run the distributed wide-and-deep example on a single-node k8s cluster, but all the worker tasks get stuck after logging: INFO:tensorflow:Create CheckpointSaverHook.

The same code runs fine on localhost and in Docker.

  • Here is my code: https://github.com/zhoudongyan/wide-and-deep

    • Docker version: 17.03.1-ce
    • k8s version: v1.6.3
    • TensorFlow version: 1.1.0, Python 3
    • OS: Ubuntu 14.04 64-bit
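
For reference, my understanding is that tf.contrib.learn picks up the cluster layout from the TF_CONFIG environment variable set in each pod. The sketch below shows the shape I believe it expects. The service names, ports, and the make_tf_config helper are hypothetical placeholders for illustration, not copied from my repo.

```python
import json
import os

# Hypothetical k8s service names and ports, for illustration only.
CLUSTER = {
    "ps": ["wide-deep-ps-0:2222"],
    "master": ["wide-deep-master-0:2222"],
    "worker": ["wide-deep-worker-0:2222", "wide-deep-worker-1:2222"],
}


def make_tf_config(task_type, task_index, cluster=CLUSTER):
    """Return the TF_CONFIG JSON string for one task in the cluster."""
    if task_type not in cluster:
        raise ValueError("unknown task type: %s" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index out of range: %d" % task_index)
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
        "environment": "cloud",
    })


if __name__ == "__main__":
    # Each pod would export its own value before the training script starts.
    os.environ["TF_CONFIG"] = make_tf_config("worker", 0)
    print(os.environ["TF_CONFIG"])
```

Each pod would set TF_CONFIG to the value for its own task type and index, so the ps, master, and worker pods all describe the same cluster but identify themselves differently.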

Does anyone know how to run it correctly? Thanks a lot!

