Docker and sensitive information used at run-time

We are dockerizing an application (written in Node.js) that will need to access some sensitive data at run-time (API tokens for different services) and I can’t find any recommended approach to deal with that.

Some information:

  • The sensitive information is not in our codebase; it’s kept in another repository in encrypted form.
  • In our current deployment, without Docker, we update the codebase with git and then manually copy the sensitive information over SSH.
  • The Docker images will be stored in a private, self-hosted registry.

I can think of several different approaches, but all of them have drawbacks:

  1. Include the sensitive information in the Docker images at build time. This is certainly the easiest option; however, it makes the credentials available to anyone with access to the image (I don’t know if we should trust the registry that much).
  2. Like 1, but keeping the credentials in a data-only image.
  3. Create a volume in the image that maps to a directory on the host system, and manually copy the credentials over SSH like we’re doing right now. This is very convenient too, but then we can’t spin up new servers easily (maybe we could use something like etcd to synchronize them?).
  4. Pass the information as environment variables. However, we have five different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run the Docker images), and this can easily create problems (e.g. credentials accidentally committed to git).

PS: I’ve done some research but couldn’t find anything similar to my problem. Other questions (like this one) were about sensitive information needed at build time; in our case, we need the information at run time.

One solution (collected from the web) for “Docker and sensitive information used at run-time”

I’ve used your options 3 and 4 to solve this in the past. To rephrase/elaborate:

    Create a volume in the image that maps to a directory on the host system, and manually copy the credentials over SSH like we’re doing right now.

I use configuration management (Chef or Ansible) to set up the credentials on the host. If the app takes a config file that needs API tokens or database credentials, I use config management to create that file from a template. Chef can read the credentials from an encrypted data bag or from attributes, set up the files on the host, and then start the container with a volume just as you describe.
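
A minimal sketch of the host side (the paths and image name are placeholders, not from the question), assuming config management has already rendered the credentials file:

    # Credentials rendered on the host by Chef/Ansible (hypothetical path):
    #   /etc/myapp/credentials.json
    $ docker run -d \
        -v /etc/myapp:/secrets:ro \
        registry.example.com/myapp:latest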

Note that in the container you may need a wrapper to run the app. The wrapper copies the config file from wherever the volume is mounted to wherever the application expects it, then starts the app.
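
For example, a minimal entrypoint wrapper might look like this (the paths are assumptions that match the sketch above):

    #!/bin/sh
    # Hypothetical wrapper: copy the config from the mounted volume to the
    # path the app expects, then hand control to the Node process.
    set -e
    cp /secrets/credentials.json /app/config/credentials.json
    exec node /app/server.js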

    Pass the information as environment variables. However, we have five different pairs of API credentials right now, which makes this a bit inconvenient. Most importantly, we would need to keep another copy of the sensitive information in the configuration scripts (the commands that will be executed to run the Docker images), and this can easily create problems (e.g. credentials accidentally committed to git).

Yes, it’s cumbersome to pass a bunch of environment variables using the -e key=value syntax, but this is how I prefer to do it. Remember that the variables are still exposed to anyone with access to the Docker daemon (docker inspect will show them). If your docker run command is composed programmatically, it’s easier.
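
For instance, a sketch of such a command (the variable names and image are placeholders):

    $ docker run -d \
        -e SERVICE_A_TOKEN="$SERVICE_A_TOKEN" \
        -e SERVICE_A_SECRET="$SERVICE_A_SECRET" \
        -e SERVICE_B_TOKEN="$SERVICE_B_TOKEN" \
        registry.example.com/myapp:latest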

If not, use the --env-file flag, as discussed here in the Docker docs. You create a file with key=value pairs, then run a container using that file:

    $ cat >> myenv << END
    FOO=BAR
    BAR=BAZ
    END
    $ docker run --env-file myenv alpine env   # image and command are placeholders; env prints the variables
    

That myenv file can be created with Chef/config management as described above.

If you’re hosting on AWS, you can leverage KMS here. Keep either the env file or the config file (the one passed to the container in a volume) encrypted via KMS. In the container, use a wrapper script to call out to KMS, decrypt the file, move it into place, and start the app. This way the config data is not exposed on disk in plaintext.
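
A rough sketch of such a wrapper (the paths are hypothetical; it assumes the AWS CLI and suitable IAM/KMS permissions are available in the container):

    #!/bin/sh
    # Hypothetical KMS wrapper: decrypt the mounted, KMS-encrypted config,
    # write the plaintext where the app expects it, then start the app.
    set -e
    aws kms decrypt \
        --ciphertext-blob fileb:///secrets/credentials.json.enc \
        --query Plaintext --output text | base64 -d > /app/config/credentials.json
    exec node /app/server.js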
