Spark standalone cluster on Docker in “bridge” network mode

My problem is the connection between the slaves on the other nodes and the master.
I have 3 nodes set up as follows:

  • 1 node with the master and 1 worker launched in the same Docker container
  • 2 nodes with 1 worker each, also in Docker

The docker-compose file opens these ports:

    version: '2'
        image: xxxxxxxx/spark
        tty: true
        stdin_open: true
        container_name: spark
        volumes:
         - /var/data/dockerSpark/:/var/data
        ports:
         - "7077:7077"
         - ""
         - "7078:7078"
         - ""
         - ""
         - "4040:4040"
         - "18080:18080"
         - "6066:6066"
         - "9000:9000"

The conf/ file is as follows:

     #export STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx # This is the Docker IP address on the node
     export SPARK_WORKER_MEMORY=7g
     export SPARK_WORKER_CORES=4
     export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=86400 -Dspark.worker.cleanup.appDataTtl=86400"

Since the problem is the connection between the slaves on the other nodes and the master, I begin by starting the master with sbin/.
During my first attempts the first 2 lines were commented out and the master started at the address spark://c96____37fb:7077.
I successfully connected the nodes using these commands:

  • sbin/ spark://c96____37fb:7077 --port 7078 for the co-located slave
  • sbin/ spark://masterNodeIP:7077 --port 7078 for the two other slaves

All the ports listed above are forwarded from nodeMaster to the corresponding Docker container.

The web UI showed that my cluster had 3 connected nodes. Unfortunately, when it came to running an application, only the co-located node did any work; the two others continuously disconnected from and reconnected to the application without doing anything.

Next I tried changing STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx to two values: (1) the nodeMasterIP, with which the master did not start, and (2) the Docker IP address inside masterNode. The second attempt worked, and the web UI showed the address spark://172.xx.xx.xx:7077.
The slaves then connected successfully, but again the two external slaves did not show any sign of activity.


The question “Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers” gives me part of the answer, but not the one I want: by adding network_mode: "host" to docker-compose.yml, I managed to build my cluster with STANDALONE_SPARK_MASTER_HOST=ipNodeMaster and connect the slaves to it. Execution went fine until it stopped at a collect operation with the error org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxx/yy.yy.yy.yy:36801, which looks like a port issue.

But my real concern is that I don't want to run the Spark master container on the localhost of masterNode (host networking); I want it on its own Docker network (“bridge”).
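
For readers unfamiliar with the settings named in the linked question title, here is a minimal sketch of how they might look in a worker's Spark environment file. SPARK_LOCAL_IP, SPARK_PUBLIC_DNS and SPARK_WORKER_PORT are standard Spark standalone environment variables, but every value below is a placeholder assumption, not something taken from the setup described here, and this is not a confirmed fix for the bridge-network case.

```shell
# Sketch only: all addresses and ports are placeholders (assumptions).

# Address the Spark process binds to inside the container
# (assumed here to be the container's bridge-network address).
export SPARK_LOCAL_IP="172.17.0.2"

# Address the process advertises to other machines
# (assumed here to be the IP of the host running the container).
export SPARK_PUBLIC_DNS="192.0.2.10"

# Fixed worker port, so that the same port can be published by docker-compose
# instead of a randomly chosen one.
export SPARK_WORKER_PORT=7078
```

The port in the FetchFailedException (36801) looks like a dynamically chosen one. Spark has a spark.blockManager.port property for pinning the block manager port, and a pinned port would also have to be published in the compose file; whether that is sufficient on a bridge network is an open question here.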

Thank you for your advice!
