Image of a data volume using docker
I am very interested in reproducible data science work. To that end, I am exploring Docker as a platform that bundles code, data, and environment settings. My first simple attempt is a Docker image that contains the data it needs (link).
However, this is only a first step: in this example the data is part of the image, so when the image is run as a container the data is already there. My next objective is to decouple the analysis code from the data. As far as I understand, that means having two containers, one with the code (code) and one with the data (data). For code I use a simple Dockerfile:
FROM continuumio/miniconda3
# -y skips conda's confirmation prompt
RUN conda install -y ipython
and for data:
FROM atlassian/ubuntu-minimal
COPY data.csv /tmp
where data.csv is the data file being copied into the image (it ends up at /tmp/data.csv).
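For completeness, the two images can be built from their Dockerfiles with docker build. This is a minimal sketch, assuming each Dockerfile sits in its own directory (code-image/ and data-image/ are hypothetical names) and reusing the image tags from the docker run commands below. The build_images function and the DOCKER variable are illustrative only: setting DOCKER=echo prints the commands instead of running them.

```shell
# Build the two images used in this question.
# Assumes ./code-image/Dockerfile and ./data-image/Dockerfile exist
# (hypothetical layout). Set DOCKER=echo to print the commands instead.
build_images() {
    docker_cmd="${DOCKER:-docker}"
    "$docker_cmd" build -t drorata/minimal-python ./code-image &&
    "$docker_cmd" build -t drorata/data-image ./data-image
}
```

The second build only runs if the first one succeeds.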
After building these two images, I create a network:
docker network create data-testing
and then run the two containers on it (each in its own terminal, since both are interactive), as described in this solution:
docker run -i -t --name code --net=data-testing --net-alias=code drorata/minimal-python /bin/bash
docker run -i -t --name data --net=data-testing --net-alias=data drorata/data-image /bin/bash
After these steps I can ping one container from the other, and probably also access the data from the code container. But I have a feeling this is a suboptimal solution and cannot be considered good practice.
What is considered good practice for giving a container access to data? I have read a little about data volumes, but I don't understand how to use them or how to turn them into images.
Answer
Using a container as data storage (the "data-only container" pattern) is largely considered outdated and deprecated at this point. You should be using data volumes instead.
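A minimal sketch of the named-volume approach, assuming the code image from the question (drorata/minimal-python) and a host file data.csv. The volume name analysis-data and the helper names seed_volume and run_analysis are hypothetical, and busybox is used only as a small throwaway image for the copy step. The data is copied into the volume once and can then be mounted into any container that needs it; no separate data container has to keep running. As before, DOCKER=echo prints the commands instead of running them.

```shell
# Create a named volume and copy a host CSV file into it, using a
# throwaway busybox container that is removed when the copy finishes.
# Usage: seed_volume VOLUME_NAME /path/to/data.csv
seed_volume() {
    docker_cmd="${DOCKER:-docker}"
    volume="$1"
    # The host directory is bind-mounted, so resolve it to an absolute path.
    csv_dir="$(cd "$(dirname "$2")" && pwd)" || return 1
    csv_name="$(basename "$2")"
    "$docker_cmd" volume create "$volume" &&
    "$docker_cmd" run --rm \
        -v "$volume":/data \
        -v "$csv_dir":/src:ro \
        busybox cp "/src/$csv_name" /data/
}

# Start an interactive IPython session with the volume mounted at /data.
# Usage: run_analysis VOLUME_NAME
run_analysis() {
    docker_cmd="${DOCKER:-docker}"
    "$docker_cmd" run -it --rm -v "$1":/data drorata/minimal-python ipython
}
```

For example, seed_volume analysis-data ./data.csv followed by run_analysis analysis-data would make the file available at /data/data.csv inside the analysis container.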
But a data volume is not something you can turn into an image, and there is really no need to.
If you want to deliver a .csv file to someone and let them use it in their Docker container, just give them the .csv file.
The easiest way to get the file into the container and use it is with a host-mounted volume (a bind mount). With the -v flag on docker run, you can specify a local folder or file to be mounted into the Docker container.
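Say, for example, your Docker image expects to find a file at /data/input.csv. When you call docker run and want to provide your own input.csv file, you would do something like
docker run -v /my/file/path/input.csv:/data/input.csv my-image
When mounting a single file, the container-side path should be the full file path the image expects, not just the /data/ directory.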
I'm not including all of the options from your example here; I'm only illustrating the -v flag. This takes the input.csv from your local filesystem and mounts it into the Docker container, so your container can use your copy of that data.
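The same idea can be wrapped in a small helper. This is a sketch: it fails early if the host file does not exist, resolves the host path to an absolute one (a bare name such as input.csv on the left of -v is interpreted as a named volume rather than a host file), and adds :ro so the container cannot modify the caller's copy. The function name run_with_input and the my-image placeholder are hypothetical; /data/input.csv is the path from the example above.

```shell
# Run IMAGE with a host CSV file bind-mounted read-only at /data/input.csv.
# Usage: run_with_input /path/to/input.csv IMAGE [COMMAND...]
# Set DOCKER=echo to print the docker command instead of running it.
run_with_input() {
    docker_cmd="${DOCKER:-docker}"
    if [ ! -f "$1" ]; then
        echo "run_with_input: no such file: $1" >&2
        return 1
    fi
    # Absolute host path, so -v treats it as a bind mount, not a volume name.
    host_file="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    shift
    # Mount onto the exact file path the image expects, read-only.
    "$docker_cmd" run --rm -v "$host_file":/data/input.csv:ro "$@"
}
```

Arguments after the image name are passed through to docker run unchanged, so a command to run inside the container can be appended. The equivalent long form of the mount is --mount type=bind,source=HOST_FILE,target=/data/input.csv,readonly.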