Spark standalone cluster on Docker in "bridge" network

My problem is the connection from the slaves on the other nodes to the master.
I have 3 nodes set up as follows:

  • 1 node running the master and 1 worker in the same Docker container
  • 2 nodes each running 1 worker in a Docker container

The docker-compose file opens these ports:

    version: '2'
    services:
      spark:
        image: xxxxxxxx/spark
        tty: true
        stdin_open: true
        container_name: spark
        volumes:
          - /var/data/dockerSpark/:/var/data
        ports:
          - "7077:7077"
          - "127.0.0.1:8080:8080"
          - "7078:7078"
          - "127.0.0.1:8081:8081"
          - "127.0.0.1:9010:9010"
          - "4040:4040"
          - "18080:18080"
          - "6066:6066"
          - "9000:9000"
    
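For reference, I sanity-check the published mappings from the master node like this (illustrative commands; spark is the container_name from the compose file):

    # List host -> container port mappings for the 'spark' container
    docker port spark

    # Sample output line: 7077/tcp -> 0.0.0.0:7077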

The conf/spark-env.sh is as follows:

    #export STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx # the Docker IP address inside the node's container
    #export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
    export SPARK_WORKER_MEMORY=7g
    export SPARK_EXECUTOR_MEMORY=6G
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=86400 -Dspark.worker.cleanup.appDataTtl=86400"
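
For context, these are the network-related variables I toggle in the attempts below (a sketch only: the addresses are placeholders, and SPARK_MASTER_IP is the older name that recent Spark versions spell SPARK_MASTER_HOST):

    # Illustrative values only
    export SPARK_MASTER_IP=172.xx.xx.xx    # address the master binds to (must exist inside the container)
    export SPARK_LOCAL_IP=172.xx.xx.xx     # address this daemon binds to
    export SPARK_PUBLIC_DNS=nodeMasterIP   # address advertised to other hosts and shown in the web UI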
    

As said above, my problem is the connection from the slaves on the other nodes to the master, so I begin by starting the master with sbin/start-master.sh.
During my first attempts the first 2 lines were commented out, and the master started at the address spark://c96____37fb:7077.
I successfully connected the slaves using these commands (a quick reachability check is sketched after the list):

  • sbin/start-slave.sh spark://c96____37fb:7077 --port 7078 for the collocated slave
  • sbin/start-slave.sh spark://masterNodeIP:7077 --port 7078 for the two other slaves

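The reachability check mentioned above, roughly (netcat assumed to be available on the worker nodes; masterNodeIP is a placeholder):

    # From each worker node: is the master's RPC port reachable?
    nc -zv masterNodeIP 7077

    # On the master node: is anything listening on 7077?
    ss -tln | grep 7077
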
All the ports cited previously are redirected from the master node to the corresponding Docker container.

The webUI then showed that my cluster had 3 connected workers. Unfortunately, when a job ran, only the collocated worker was doing any work; the two others continuously disconnected from and reconnected to the application without doing anything.
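
When this happens I look at the worker logs, e.g. (illustrative; the file name follows Spark's default spark-<user>-org.apache.spark.deploy.worker.Worker-<n>-<host>.out pattern, adjust the path to your install):

    # Inside a worker container: tail the standalone worker log
    docker exec spark bash -c 'tail -n 50 "$SPARK_HOME"/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out'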

Next I tried to set STANDALONE_SPARK_MASTER_HOST to (1) the masterNode IP, but the master did not start (presumably because that address does not exist inside the container, so the master cannot bind to it), and (2) the 172.xxx address, which is the Docker IP address inside masterNode. The 2nd attempt worked, and the webUI showed me the following address: spark://172.xx.xx.xx:7077.
The slaves then connected successfully, but again the two external slaves did not show any sign of activity.

Edit

Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers gives me part of the answer, but not the one I want: by adding network_mode: "host" to the docker-compose.yml, I succeeded in building my cluster with STANDALONE_SPARK_MASTER_HOST=ipNodeMaster and connecting the slaves to it. Execution was OK but stopped at a collect operation with this error: org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxx/yy.yy.yy.yy:36801, which seems to be a port issue.
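
My current understanding is that the executor-side ports Spark picks at random (like 36801 above) can be pinned through configuration so that they can be opened or published; a sketch under that assumption (the port numbers and the application jar are made up):

    # Pin the otherwise-random communication ports when submitting the job
    spark-submit \
      --master spark://masterNodeIP:7077 \
      --conf spark.driver.port=7001 \
      --conf spark.blockManager.port=7005 \
      --conf spark.port.maxRetries=4 \
      my-app.jar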

But my real concern is that I don't want to run the Spark master container on the host network of the masterNode, but on its own Docker network ("bridge"), roughly as sketched below.
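
To make that concrete, this is the setup I am aiming for, expressed as a single docker run (a sketch only: the pinned ports match the submit example above, and SPARK_PUBLIC_DNS is my assumption for making the container advertise the node's address instead of its internal bridge one):

    # Master container on the default bridge network, with fixed ports published
    docker run -d --name spark \
      --network bridge \
      -p 7077:7077 -p 8080:8080 \
      -p 7001:7001 -p 7005:7005 \
      -e SPARK_PUBLIC_DNS=masterNodeIP \
      xxxxxxxx/spark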

Thank you for your wise advice!
