Connect Spark master to Spark slave through Docker Compose

I’m using the gettyimages Spark image as the Spark master container, and alongside it I have my own Spark image that will launch a slave node. Here is the corresponding Dockerfile:

FROM debian:jessie

RUN apt-get update \
 && apt-get install -y locales \
 && dpkg-reconfigure -f noninteractive locales \
 && locale-gen C.UTF-8 \
 && /usr/sbin/update-locale LANG=C.UTF-8 \
 && echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
 && locale-gen \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# Users with other locales should set this in their derivative image
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

RUN apt-get update \
 && apt-get install -y curl unzip \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*


# JAVA
ARG JAVA_MAJOR_VERSION=8
ARG JAVA_UPDATE_VERSION=92
ARG JAVA_BUILD_NUMBER=14
ENV JAVA_HOME /usr/jdk1.${JAVA_MAJOR_VERSION}.0_${JAVA_UPDATE_VERSION}

ENV PATH $PATH:$JAVA_HOME/bin
RUN curl -sL --retry 3 --insecure \
  --header "Cookie: oraclelicense=accept-securebackup-cookie;" \
  "http://download.oracle.com/otn-pub/java/jdk/${JAVA_MAJOR_VERSION}u${JAVA_UPDATE_VERSION}-b${JAVA_BUILD_NUMBER}/server-jre-${JAVA_MAJOR_VERSION}u${JAVA_UPDATE_VERSION}-linux-x64.tar.gz" \
  | gunzip \
  | tar x -C /usr/ \
  && ln -s $JAVA_HOME /usr/java \
  && rm -rf $JAVA_HOME/man

# HADOOP
ENV HADOOP_VERSION 2.7.2
ENV HADOOP_HOME /usr/hadoop-$HADOOP_VERSION
ENV HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
ENV PATH $PATH:$HADOOP_HOME/bin
RUN curl -sL --retry 3 \
  "http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz" \
  | gunzip \
  | tar -x -C /usr/ \
 && rm -rf $HADOOP_HOME/share/doc \
 && chown -R root:root $HADOOP_HOME

# SPARK
ENV SPARK_VERSION 2.0.1
ENV SPARK_PACKAGE spark-${SPARK_VERSION}-bin-without-hadoop
ENV SPARK_HOME /usr/spark-${SPARK_VERSION}
ENV SPARK_DIST_CLASSPATH="$HADOOP_HOME/etc/hadoop/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/yarn/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/tools/lib/*"
ENV PATH $PATH:${SPARK_HOME}/bin
RUN curl -sL --retry 3 \
  "http://d3kbcqa49mib13.cloudfront.net/${SPARK_PACKAGE}.tgz" \
  | gunzip \
  | tar x -C /usr/ \
 && mv /usr/$SPARK_PACKAGE $SPARK_HOME \
 && chown -R root:root $SPARK_HOME

WORKDIR $SPARK_HOME 
CMD ["bin/spark-class","org.apache.spark.deploy.worker.Worker", //TODO: Figure out what this should be]

I’m wondering how I could get the slave to reach the master’s host and port when I set this up through Docker Compose.

One solution:

    Let’s say you have a docker-compose.yml something like this:

    version: '2'
    services:
      spark-master:
        image: spark-master
        ports:
          - "7077:7077"
          - "8080:8080"
      spark-slave1:
        image: spark-slave
        ports:
          - "8081:8081"
        depends_on:
          - spark-master
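
    Note that spark-master and spark-slave in the image: fields are just local tags that compose expects to find; they are not official images on Docker Hub. A quick sketch of building them first, assuming the master and slave Dockerfiles live in ./master and ./slave (those directory names are only placeholders):

    # Build the two images referenced by the compose file; the tags must match
    # the image: values above, and the ./master and ./slave paths are hypothetical.
    docker build -t spark-master ./master
    docker build -t spark-slave ./slave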
    

    In the Dockerfile for your Spark slave you need to point the worker at the master's URL, e.g.:
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT

    Using a hard-coded IP address is not a good idea, though. Instead you can use the service name from docker-compose (spark-master becomes the hostname, and Docker makes it resolvable from the other containers on the same compose network):

    CMD ["bin/spark-class","org.apache.spark.deploy.worker.Worker", "spark://spark-master:7077"]
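
    With the master URL wired into the worker's CMD, you can bring the stack up and check that the worker actually registered. This is just a sketch; the exact log wording may vary slightly between Spark versions:

    # Start master and slave in the background, then inspect the worker's logs.
    docker-compose up -d
    # The worker should log something like "Successfully registered with master spark://spark-master:7077".
    docker-compose logs spark-slave1 | grep -i "registered with master"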
    

    Now you can go to:
    DOCKER_IP:8080 -> the master UI, where you should see 1 worker under “Workers”
    DOCKER_IP:8081 -> the worker UI with that worker’s details
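
    If the Docker host is headless, you can sanity-check the same endpoints with curl instead of a browser (replace localhost with DOCKER_IP if the daemon is remote; this is just a convenience sketch):

    # The master UI should mention the registered worker; the worker UI should answer on 8081.
    curl -s http://localhost:8080 | grep -io "worker" | head -n 1
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081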

    If you want more workers, you can add more services to the docker-compose file. The following docker-compose creates 2 workers: the first worker’s UI is on port 8081 and the second’s on 8082.

    version: '2'
    services:
      spark-master:
        image: spark-master
        ports:
          - "7077:7077"
          - "8080:8080"
      spark-slave1:
        image: spark-slave
        ports:
          - "8081:8081"
        depends_on:
          - spark-master
      spark-slave2:
        image: spark-slave
        ports:
          - "8082:8081"
        depends_on:
          - spark-master
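
    To confirm that both workers came up and attached to the master, a quick check (again a sketch, with the same caveat about log wording):

    # Recreate the stack with the extra service, then inspect both workers' logs.
    docker-compose up -d
    docker-compose ps
    docker-compose logs spark-slave1 spark-slave2 | grep -i "registered with master"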
    