Image of a data volume using docker

I am very interested in reproducible data science work. To that end, I am exploring Docker as a platform that enables bundling code, data and environment settings. My first simple attempt is a Docker image which contains the data it needs (link).

However, this is only the first step. In this example the data is part of the image, so when the image is loaded into a container, the data is already there. My next objective is to decouple the analysis code from the data. As far as I understand, that would mean having two containers, one with the code (code) and one with the data (data).

    For the code I use a simple Dockerfile:

    FROM continuumio/miniconda3 
    RUN conda install ipython

    and for the data:

    FROM atlassian/ubuntu-minimal
    COPY data.csv /tmp

    where data.csv is a data file I’m copying to the image.

    After building these two images I can run them as described in this solution:

    docker run -i -t --name code --net=data-testing --net-alias=code drorata/minimal-python /bin/bash
    docker run -i -t --name data --net=data-testing --net-alias=data drorata/data-image /bin/bash

    The network has to be created first: docker network create data-testing

    After these steps I can ping one container from the other, and probably also access data.csv from code. But I have the feeling that this is a suboptimal solution and cannot be considered good practice.

    What is considered good practice for giving a container access to data? I read a little about data volumes, but I don’t understand how to use them or how to turn them into images.

One solution collected from the web for “Image of a data volume using docker”

    Using a container as data storage is largely considered outdated at this point. You should be using data volumes instead.

    But a data volume is not something that you can turn into an image, and there is really no need to.
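As a rough sketch of the data-volume route, assuming a named volume called analysis-data (a made-up name), the busybox image as a throwaway copy helper, and the drorata/minimal-python image from the question, the shell functions below seed a volume with a CSV and then mount it read-only into the analysis container. The DOCKER variable is a dry-run hook added here for illustration; it is not a Docker feature.

```shell
# Assumption: set DOCKER="echo docker" to print the commands instead of running them.
DOCKER=${DOCKER:-docker}

# Create a named volume and copy one host CSV into it via a throwaway container.
seed_volume() {
  volume=$1   # volume name, e.g. analysis-data (hypothetical)
  csv=$2      # absolute path to the CSV on the host
  $DOCKER volume create "$volume"
  # Mount the host file read-only and the volume read-write, then copy across.
  $DOCKER run --rm \
    -v "$csv":/src/data.csv:ro \
    -v "$volume":/data \
    busybox cp /src/data.csv /data/data.csv
}

# Start an interactive container with the volume mounted read-only at /data.
run_analysis() {
  volume=$1
  image=$2
  $DOCKER run -i -t --rm -v "$volume":/data:ro "$image" /bin/bash
}
```

Calling seed_volume analysis-data /absolute/path/to/data.csv and then run_analysis analysis-data drorata/minimal-python should leave the file at /data/data.csv inside the container. The data lives in the volume rather than in any image, which is why there is nothing to turn into an image.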

    If you want to deliver a .csv file to someone and let them use it in their Docker container, just give them the .csv file.

    The easiest way to get the file into the container and use it is with a host-mounted volume.

    Using the -v flag on docker run, you can specify a local folder or file to be mounted into the Docker container.

    Say, for example, your docker image expects to find a file at /data/input.csv. When you call docker run and you want to provide your own input.csv file, you would do something like

    docker run -v /my/file/path/input.csv:/data/input.csv my-image

    I’m not providing all of the options that you are showing in your example, only illustrating the -v flag. This takes input.csv from your local filesystem and mounts it into the Docker container, so your container can use your copy of the data.
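A minimal sketch of wrapping that -v call, assuming the image expects its input at /data/input.csv as in the example. The helper names bind_arg and run_with_input and the DOCKER dry-run variable are made up for illustration. The helper turns the host path into an absolute one, because a bare name such as input.csv would be read by -v as a volume name rather than a file, and it adds :ro so the container cannot modify the host copy.

```shell
# Assumption: set DOCKER="echo docker" to print the command instead of running it.
DOCKER=${DOCKER:-docker}

# Build the value for -v: absolute host file, container file path, read-only.
bind_arg() {
  host_file=$1        # may be relative to the current directory
  container_path=$2   # full file path inside the container
  case $host_file in
    /*) abs=$host_file ;;
    *)  abs=$(pwd)/$host_file ;;
  esac
  printf '%s:%s:ro\n' "$abs" "$container_path"
}

# Run an image with one host CSV mounted at /data/input.csv.
run_with_input() {
  image=$1
  host_file=$2
  $DOCKER run --rm -v "$(bind_arg "$host_file" /data/input.csv)" "$image"
}
```

For instance, run_with_input my-image input.csv from the directory holding the file mounts it at /data/input.csv. Swapping in a different CSV only requires changing the second argument, not rebuilding the image.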
