Spark standalone cluster on docker in network “bridge”
My problem is with the connection between the slaves on the other nodes and the master.
I have 3 nodes set up as follows:
- 1 node with the master and 1 worker launched in the same Docker container
- 2 nodes with 1 worker each, in Docker
The docker-compose.yml opens these ports:
version: '2'
services:
  spark:
    image: xxxxxxxx/spark
    tty: true
    stdin_open: true
    container_name: spark
    volumes:
      - /var/data/dockerSpark/:/var/data
    ports:
      - "7077:7077"
      - "127.0.0.1:8080:8080"
      - "7078:7078"
      - "127.0.0.1:8081:8081"
      - "127.0.0.1:9010:9010"
      - "4040:4040"
      - "18080:18080"
      - "6066:6066"
      - "9000:9000"
The conf/spark-env.sh is as follows:
#export STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx #This is the Docker IP address on the node
#export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
export SPARK_WORKER_MEMORY=7g
export SPARK_EXECUTOR_MEMORY=6G
export SPARK_WORKER_CORES=4
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=86400 -Dspark.worker.cleanup.appDataTtl=86400"
Since the problem is the connection between the slaves on the other nodes and the master, I begin by starting the master with sbin/start-master.sh.
During my first attempts, the first 2 lines were commented out and the master started at this address: spark://c96____37fb:7077.
I successfully connected the nodes using these commands:
- sbin/start-slave.sh spark://c96____37fb:7077 --port 7078 for the collocated slave
- sbin/start-slave.sh spark://masterNodeIP:7077 --port 7078 for the two other slaves
All the ports listed previously are forwarded from nodeMaster to the corresponding Docker container.
The web UI shows that my cluster has 3 connected nodes. Unfortunately, when I run an application, only the collocated node does any work; the two others continuously disconnect from and reconnect to the application without doing anything.
Next, I tried setting STANDALONE_SPARK_MASTER_HOST to two different values: (1) the nodeMasterIP, in which case the master does not start, and (2) the 172.xx.xx.xx address, which is the Docker IP address inside masterNode. The second attempt works, and the web UI shows the following address: spark://172.xx.xx.xx:7077.
The slaves then connected successfully, but again the two external slaves show no sign of activity.
The question "Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers" gives me part of the answer, but not the one I want: by adding network_mode: "host" to the docker-compose.yml, I succeeded in building my cluster at STANDALONE_SPARK_MASTER_HOST=ipNodeMaster and connecting the slaves to it. Execution went fine until a collect operation, where it stopped with this error: org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxx/yy.yy.yy.yy:36801, which seems to be a port issue.
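To make the port issue concrete: 36801 is not one of the fixed ports listed in my docker-compose.yml, and as far as I understand Spark picks such ports at random unless they are pinned. Below is a minimal sketch of what pinning might look like, assuming the standard Spark properties spark.blockManager.port and spark.driver.port. The port numbers 7079 and 7080 and the variable names are placeholders of mine, and I have not verified that this fixes the error.

```shell
#!/bin/sh
# Sketch only: placeholder port numbers, not taken from my actual setup.
BLOCK_MANAGER_PORT=7079   # hypothetical fixed port for shuffle/block transfers
DRIVER_PORT=7080          # hypothetical fixed port for the driver

# spark.blockManager.port and spark.driver.port are standard Spark properties;
# left unset, Spark chooses random ports (such as 36801 in the error).
SPARK_PORT_CONF="--conf spark.blockManager.port=${BLOCK_MANAGER_PORT} --conf spark.driver.port=${DRIVER_PORT}"

# Print the options that would be appended to the spark-submit command line.
echo "${SPARK_PORT_CONF}"
```

In bridge mode, each pinned port would presumably also need a matching entry under ports: in the docker-compose.yml.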
But my real concern is that I don't want to run the Spark master container on the host network of masterNode, but on its own Docker network ("bridge").
Thank you for your wise advice!