Where are data stored in a clustered environment?

Where on earth are people storing their data when they are creating applications that run in a clustered environment?

I have created an application that reads XSLTs from a directory on the host. However, if I want to run the same application on Google Container Engine inside Docker containers, I have huge problems as soon as I use services (load balancing). There must be a common data store that everything reads from and writes to, and it should be mounted on each pod (right?).

What do I use for this? I tried to use Hadoop, but it is impossible to mount (all the guides are outdated; I am running Ubuntu 14.04).

I can’t be the first man on earth trying to read/store data in a clustered environment. How is this done?
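For concreteness, the setup described in the question boils down to a single-host bind mount like the sketch below (the paths and image name are made up); the problem is that such a host directory is no longer a single shared place once the application runs on several nodes behind a load balancer.

    # Single-host setup: the XSLT directory lives on the host and is
    # bind-mounted into the container (paths and image name are examples)
    docker run -d -v /srv/xslt:/app/xslt my-xslt-app

    # With several nodes behind a load balancer, /srv/xslt would have to
    # exist and stay in sync on every node, which is the problem above.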

2 Solutions collected from the web for “Where are data stored in a clustered environment?”

    Frankly, this is a common weakness of all Docker orchestration systems out there (AFAIK). Google Container Engine has a persistent disk feature, so you can create volumes that survive container restarts. However, each persistent disk should only be attached to containers that are designed to run on a single instance, which defeats the purpose of a distributed environment.
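    As a rough sketch of that limitation (the disk, instance, and zone names below are made up): a persistent disk is created and attached to exactly one instance, so only containers scheduled on that instance can see it.

    # Create a persistent disk and attach it to a single Compute Engine
    # instance (names and zone are examples)
    gcloud compute disks create shared-data --size=200GB --zone=us-central1-a
    gcloud compute instances attach-disk my-node-1 --disk=shared-data --zone=us-central1-a

    # Only containers running on my-node-1 can mount the disk read-write;
    # the other nodes in the cluster cannot share it.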

    Amazon has a similar setup for Docker on Elastic Beanstalk, where you can mount EBS volumes onto an instance, but again it does not play nicely with the concept of Docker volumes.
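    A minimal sketch of what that looks like on an EC2 host (device name, mount point, and image are made up): the EBS volume is formatted and mounted on the host, then handed to the container as an ordinary bind mount rather than a Docker-managed volume.

    # On the EC2 instance: format and mount the attached EBS volume
    # (device name and paths are examples)
    mkfs -t ext4 /dev/xvdf
    mkdir -p /data
    mount /dev/xvdf /data

    # Expose it to the container as a plain bind mount
    docker run -d -v /data:/data my-app

    # Like a GCE persistent disk, the EBS volume is attached to one instance only.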

    CoreOS uses etcd for this purpose, providing a shared key-value store across all machines in the cluster. This is not as useful as a distributed file system, but you can at least share some data between containers.
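    For example, small pieces of configuration or state can be shared through etcd from any machine in the cluster (the key and value below are made up), even though it is no substitute for a shared file system:

    # Writer, on one host: publish a small value to the cluster-wide store
    etcdctl set /config/xslt-path /mnt/shared/xslt

    # Reader, on any other host in the cluster
    etcdctl get /config/xslt-path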

    The point is, given the state of affairs right now, if you want shared data between containers you will have to roll your own solution.

    Edit: By running the container in privileged mode, I was able to mount an S3 bucket into the container using s3fs, so this can be one option for rolling your own solution. I would not use it for write-heavy workloads, though.

    # Start a privileged container (required for FUSE mounts)
    docker run --privileged -it ubuntu bash

    # Inside the container: build and install s3fs-fuse
    apt-get update
    apt-get install -y build-essential git libfuse-dev libcurl4-openssl-dev \
           libxml2-dev mime-support automake libtool pkg-config libssl-dev
    git clone https://github.com/s3fs-fuse/s3fs-fuse
    cd s3fs-fuse/
    ./autogen.sh
    ./configure --prefix=/usr --with-openssl
    make
    make install

    # Store AWS credentials and mount the bucket
    echo AWS_KEY:AWS_SECRET > /etc/passwd-s3fs
    chmod 400 /etc/passwd-s3fs
    s3fs my-bucket /mnt
    

    You could use Google Cloud Storage to store that data; it is available to any app, even outside Google’s network.
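    A minimal sketch with gsutil (the bucket name and paths are made up): every instance, inside or outside Google’s network, can read and write the same objects.

    # Upload the shared XSLT files to a bucket (bucket name is an example)
    gsutil cp templates/*.xslt gs://my-app-xslt/

    # Any instance or container can fetch them from the same bucket
    gsutil cp gs://my-app-xslt/transform.xslt /tmp/transform.xslt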

    In particular, for access from GCE, see the respective row in the Integration with Google Cloud Platform table, “Use Cloud Storage from within a Compute Engine instance”:

    • Using Service Accounts with Applications
    • Exporting an image to Google Cloud Storage
    • Using a startup script stored in Google Cloud Storage
    • Mount a bucket as a file system on a virtual machine instance (see the sketch below)
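
    As a sketch of that last option (the bucket name and mount point are made up), gcsfuse can present the bucket as a directory, so the application keeps reading XSLTs from a plain path:

    # Mount the bucket as a file system on the instance
    # (requires gcsfuse to be installed; names are examples)
    gcsfuse my-app-xslt /mnt/xslt

    # The application can now read /mnt/xslt/*.xslt on every instance
    ls /mnt/xslt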