neo4j-mazerunner: How to increase the memory size in docker-compose.yml

Using kbastani/spark-neo4j with docker-compose on a MacBook Pro (16 GB RAM), I’m trying to analyze the strongly_connected_components of my graph.

I have a graph with about 60,000 nodes, shaped like (n1:Node {id:1})-[r:NEXT {count:100}]->(n2:Node {id:2}).

Using the neo4j browser, I’ve managed to run pagerank and get the results written back to my nodes.

    However, when I try to run a more complex algorithm like strongly_connected_components, I get the following error:

    mazerunner_1  | 16/11/29 14:58:01 ERROR Utils: Uncaught exception in thread SparkListenerBus
    mazerunner_1  | java.lang.OutOfMemoryError: Java heap space
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    mazerunner_1  |     at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:198)
    mazerunner_1  |     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    mazerunner_1  |     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198)
    mazerunner_1  |     at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
    mazerunner_1  |     at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    mazerunner_1  |     at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    mazerunner_1  |     at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    mazerunner_1  |     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
    mazerunner_1  | Exception in thread "SparkListenerBus" java.lang.OutOfMemoryError: Java heap space
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5$$anonfun$apply$9.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
    mazerunner_1  |     at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:200)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener$$anonfun$onJobStart$5.apply(JobProgressListener.scala:198)
    mazerunner_1  |     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    mazerunner_1  |     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
    mazerunner_1  |     at org.apache.spark.ui.jobs.JobProgressListener.onJobStart(JobProgressListener.scala:198)
    mazerunner_1  |     at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:34)
    mazerunner_1  |     at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    mazerunner_1  |     at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    mazerunner_1  |     at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    mazerunner_1  |     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
    mazerunner_1  |     at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
    

    I have tried to modify my docker-compose.yml file like so:

    hdfs:
      environment:
        - "JAVA_OPTS=-Xmx5g"
      image: sequenceiq/hadoop-docker:2.4.1
      command: /etc/bootstrap.sh -d -bash
    mazerunner:
      environment:
        - "JAVA_OPTS=-Xmx5g"
      image: kbastani/neo4j-graph-analytics:latest
      links:
       - hdfs
    graphdb:
      environment:
        - "JAVA_OPTS=-Xmx2g"
      image: kbastani/docker-neo4j:latest
      ports:
       - "7474:7474"
       - "1337:1337"
      volumes:
       - /opt/data
      links:
       - mazerunner
       - hdfs
    

    with no success. How do I configure Spark and HDFS to use the maximum available memory?
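
    Note that the stack trace above shows the OutOfMemoryError occurring in the SparkListenerBus thread, i.e. in the Spark driver process, so it is the driver heap that needs to grow. A generic JAVA_OPTS entry in docker-compose.yml is only honored if the container’s startup script actually reads it; Spark itself is normally sized via its own settings. A hedged sketch of the mazerunner service, assuming the image’s launch script honors the standard Spark 1.x environment variables SPARK_DRIVER_MEMORY and SPARK_EXECUTOR_MEMORY (verify inside the container whether these, or a spark-defaults.conf file, are actually consulted):

    ```yaml
    mazerunner:
      environment:
        # Standard Spark env vars; whether this image reads them is an assumption.
        - "SPARK_DRIVER_MEMORY=5g"
        - "SPARK_EXECUTOR_MEMORY=5g"
      image: kbastani/neo4j-graph-analytics:latest
      links:
       - hdfs
    ```

    Equivalently, if the image uses a spark-defaults.conf, the properties spark.driver.memory and spark.executor.memory control the same limits. None of this helps, however, if the Docker VM itself has too little memory to grant, which is the situation addressed by the answer below.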

  • One solution

    My solution was to increase the memory size of the virtual machine that Docker runs in. In the VirtualBox UI, I adjusted the “Base Memory” slider.
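
    The same change can be made from the command line, which is convenient if you are using docker-machine (or boot2docker) rather than the VirtualBox UI. A sketch, assuming your machine is named "default" (check with `docker-machine ls`) and that 8 GB is a reasonable share of the host’s 16 GB:

    ```shell
    # Stop the VM before changing its memory allocation
    docker-machine stop default

    # Raise the VM's base memory via VirtualBox directly (value in MB)
    VBoxManage modifyvm default --memory 8192

    docker-machine start default
    ```

    Alternatively, a fresh machine can be created with more memory from the start: `docker-machine create -d virtualbox --virtualbox-memory 8192 <name>`. After restarting the VM, the containers must be recreated for the extra memory to be available to them.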

    [screenshot: VirtualBox “Base Memory” settings slider]
