Image of a data volume using Docker

I am very interested in reproducible data science work. To that end, I am now exploring Docker as a platform that enables bundling code, data, and environment settings together. My first simple attempt is a Docker image which contains the data it needs (link).

However, this is only a first step. In this example the data is part of the image, so when the image is run as a container the data is already there. My next objective is to decouple the analysis code from the data. As far as I understand, that would mean having two containers: one with the code (code) and one with the data (data).

For the code I use a simple Dockerfile:

    FROM continuumio/miniconda3 
    RUN conda install ipython
    

and for the data:

    FROM atlassian/ubuntu-minimal
    COPY data.csv /tmp
    

where data.csv is a data file I'm copying to the image.

After building these two images I first create a network:

    docker network create data-testing

and then run the containers as described in this solution:

    docker run -i -t --name code --net=data-testing --net-alias=code drorata/minimal-python /bin/bash
    docker run -i -t --name data --net=data-testing --net-alias=data drorata/data-image /bin/bash

After these steps I can ping one container from the other, and presumably also access data.csv from code. But I have the feeling this is a suboptimal solution and cannot be considered good practice.

What is considered good practice for giving a container access to data? I have read a little about data volumes, but I don't understand how to utilize them or how to turn them into images.

One solution:

The use of a container as data storage is largely considered outdated and deprecated at this point. You should be using data volumes instead.

But a data volume is not something that you can turn into an image, and really, there is no need to.
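
For reference, a named data volume is created once with docker volume create and then mounted by name into any container that needs it. A rough sketch (the volume and image names here are only placeholders):

    # create a named volume managed by Docker
    docker volume create shared-data

    # both containers mount the same volume at /data and see the same files
    docker run -v shared-data:/data --name writer my-writer-image
    docker run -v shared-data:/data --name reader my-reader-image

The data lives in the volume, not in either image, so there is nothing to rebuild when the data changes.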

If you want to deliver a .csv file to someone and let them use it in their Docker container, just give them the .csv file.

The easiest way to get the file into the container and be able to use it is with a host-mounted volume.

Using the -v flag on docker run, you can specify a local folder or file to be mounted into the Docker container.

Say, for example, your Docker image expects to find a file at /data/input.csv. When you call docker run and want to provide your own input.csv file, you would do something like:

    docker run -v /my/file/path/input.csv:/data/input.csv my-image

I am not reproducing all of the options from your example here; I am just illustrating the -v flag. This will take the input.csv from your local filesystem and mount it into the Docker container. Now your container will be able to use your copy of that data.
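
If the container needs more than one file, or you want to be sure it cannot modify your local copy, a small variation (the paths here are placeholders) is to mount the whole directory read-only:

    # mount the local directory containing input.csv into /data, read-only
    docker run -v /my/file/path:/data:ro my-image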
