Where are data stored in a clustered environment?
Where on earth do people store their data when they build applications that run in a clustered environment?
I have created an application that reads XSLTs from a directory on the host. However, if I want to run the same application on Google Container Engine inside Docker containers, I run into huge problems as soon as I use services (load balancing). There must be a common data store that everything reads from and writes to, and it should be mounted on each pod (right?).
What do I use for this? I tried Hadoop, but I could not get it mounted (all the guides are outdated; I am running Ubuntu 14.04).
I can’t be the first person on earth trying to read and store data in a clustered environment. How is this done?
2 Answers
Frankly, this is a common weakness of all Docker orchestration systems out there (AFAIK). Google Container Engine has a persistent disk feature, so you can create volumes that persist across container restarts. However, each persistent disk should only be attached to containers that are designed to run on a single instance, which defeats the purpose of a distributed environment.
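As a rough sketch of what that single-instance limitation looks like in practice, here is how a persistent disk is created and attached with the `gcloud` CLI. The disk and instance names (`xslt-data`, `my-instance`) and the zone are example values, not anything from the question:

```shell
# Create a persistent disk (name, size, and zone are example values)
gcloud compute disks create xslt-data --size 10GB --zone us-central1-a

# Attach it to ONE instance; a persistent disk cannot be attached
# read-write to more than one instance at a time
gcloud compute instances attach-disk my-instance \
    --disk xslt-data --zone us-central1-a
```

So every pod that needs read-write access to the same disk must be scheduled onto that one instance, which is exactly the problem described above.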
Amazon has a similar setup for Docker on Elastic Beanstalk, where you can mount EBS volumes onto an instance, but again it does not play nicely with the concept of Docker volumes.
CoreOS uses etcd for this purpose, providing a shared key-value store between all nodes in the cluster. This is not as useful as a distributed file system, but you can at least share some data between containers.
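To illustrate the idea, here is what sharing a small piece of data through etcd looks like with the `etcdctl` client (etcd v2 syntax). The key and value are made-up examples:

```shell
# On any node in the cluster: write a value under a hypothetical key
etcdctl set /config/xslt-dir /mnt/xslt

# On any other node: read it back
etcdctl get /config/xslt-dir
```

This works well for configuration and coordination data, but it is a key-value store, not a file system, so it is a poor fit for storing the XSLT files themselves.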
The point is, given the state of affairs right now, if you want shared data between containers you will have to roll your own solution.
Edit: Running the container in privileged mode, I was able to mount an S3 bucket into the container using s3fs, so this can be one option for rolling your own solution, although I would not use it for write-heavy workloads.
```shell
docker run --privileged -it ubuntu bash

# Inside the container:
apt-get update
apt-get install build-essential git libfuse-dev libcurl4-openssl-dev \
    libxml2-dev mime-support automake libtool
apt-get install pkg-config libssl-dev  # See (*3)
git clone https://github.com/s3fs-fuse/s3fs-fuse
cd s3fs-fuse/
./autogen.sh
./configure --prefix=/usr --with-openssl  # See (*1)
make
sudo make install
echo AWS_KEY:AWS_SECRET > /etc/passwd-s3fs
chmod 400 /etc/passwd-s3fs
s3fs my-bucket /mnt
```
You could use Google Cloud Storage for storing that data; it is available to any app, even outside Google’s network.
In particular, for access from GCE, see the respective row in the Integration with Google Cloud Platform table, “Use Cloud Storage from within a Compute Engine instance”:
- Using Service Accounts with Applications
- Exporting an image to Google Cloud Storage
- Using a startup script stored in Google Cloud Storage
- Mount a bucket as a file system on a virtual machine instance
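As a minimal sketch of the Cloud Storage route, the `gsutil` tool that ships with the Cloud SDK can copy files to and from a bucket from any instance. The bucket name and file paths below are example values only:

```shell
# Create a bucket (name is an example; bucket names are globally unique)
gsutil mb gs://my-xslt-bucket

# Upload the XSLT files once
gsutil cp templates/*.xslt gs://my-xslt-bucket/

# Any pod/instance can then fetch them
gsutil cp gs://my-xslt-bucket/main.xslt .
```

For per-request reads inside the application, a client library (or the mounted-bucket option listed above) would likely be more convenient than shelling out to `gsutil`.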