Including data in MySQL Docker container
This question is similar to:
Setting up MySQL and importing dump within Dockerfile
But the answer to that question did not address my use case.
I have a MySQL DB that has 5TB of data in Production. For Dev, I only need about 500MB of that data. The integration tests that run as part of the build of my application require access to a MySQL DB. Currently that DB is being created on Jenkins and data is being injected into it by the build process. This is very slow.
I would like to replace this part of the process with Docker. My idea is to have a Docker container that runs MySQL with my 500MB of data already baked into the image, rather than relying on the standard process associated with the MySQL Docker image of only executing the import when the container launches. Based on tests to date, the standard process takes 4-5 minutes, whereas I would like to get this down to seconds.
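For context, the standard process works like this: the official mysql image executes any SQL or shell scripts it finds in /docker-entrypoint-initdb.d the first time the container starts. A typical setup looks something like the following sketch (the dump file name and credentials are illustrative):

```dockerfile
# Sketch of the standard approach: the official mysql image runs any
# *.sql / *.sql.gz / *.sh files in /docker-entrypoint-initdb.d when the
# container starts for the FIRST time.
FROM mysql:5.7
ENV MYSQL_ROOT_PASSWORD=devpassword
ENV MYSQL_DATABASE=appdb
COPY dev-dump.sql /docker-entrypoint-initdb.d/
# The ~500MB import happens at container start-up, not at build time,
# which is why launch takes minutes rather than seconds.
```

The import cost is paid on every fresh container start, which is exactly the delay I want to eliminate.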
I would have thought this would be a common use case, but pre-baking data into MySQL Docker containers seems to be frowned upon, and there isn’t really any guidance on this method.
Has anyone any experience in this regard? Is there a very good reason why data should not be pre-baked into a MySQL Docker container?
2 Answers
Based on investigation I have done with this, it isn’t really possible to include data in a container that uses the standard MySQL image as its base.
I tried to get around this by deploying a container from this base and manipulating it, before then doing a commit to a new image.
However, there is a key thing to understand about the MySQL base image. Both its data directory (/var/lib/mysql/) and config directory (/etc/mysql/) are set up as Docker volumes, which means their contents map to locations on your host system.
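You can see this for yourself: the image’s Dockerfile declares the data directory as a volume, roughly like the (paraphrased) excerpt below, and `docker inspect --format '{{ .Config.Volumes }}' mysql:5.7` will confirm it.

```dockerfile
# Paraphrased excerpt from the official mysql image's Dockerfile:
# the VOLUME instruction makes /var/lib/mysql an anonymous volume,
# so anything written there lives outside the image's layers.
VOLUME /var/lib/mysql
```

Because the data lives outside the image layers, nothing you write to it survives a `docker commit`.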
Volumes like these aren’t saved as part of a commit, so you can’t simply manipulate the data and commit the result. In addition, the image’s ENTRYPOINT routines prevent manipulation of these locations.
All of this is by design, as it is envisaged that this image be used with either persistent or independent data sets. It would be nice if there were an option to include the data in the container, but this looks like something the developers really do not want to entertain.
To resolve my issue, I went back to a base Ubuntu image, built my DB on it, and committed that to a new image, which works fine. The image is a bit larger, but deployment as part of our build job is significantly faster than waiting for the MySQL-based container to run the 500MB import at start-up.
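A minimal sketch of that approach, assuming an Ubuntu base and a hypothetical dev-dump.sql (package names, service commands, and paths may vary by Ubuntu release, so treat this as a starting point rather than a finished recipe):

```dockerfile
FROM ubuntu:16.04
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
    rm -rf /var/lib/apt/lists/*
COPY dev-dump.sql /tmp/dev-dump.sql
# Import at BUILD time: start mysqld, load the dump, shut down cleanly.
# Because this base image declares no VOLUME for /var/lib/mysql, the
# resulting data files are committed into the image layers.
RUN service mysql start && \
    mysql < /tmp/dev-dump.sql && \
    service mysql stop && \
    rm /tmp/dev-dump.sql
EXPOSE 3306
CMD ["mysqld_safe"]
```

The key difference from the official image is the absence of a VOLUME declaration for /var/lib/mysql, which is what lets the imported data persist in the image itself.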
The main argument against this is that your image is a snapshot of the data and the schema at a point in time – it will get stale quickly, and you’ll need a good process to generate new images with fresh data easily, to make it useful without being expensive to maintain.
That said, I wouldn’t frown upon this – I think it’s a particularly good use-case for a non-production Docker image. A 500MB image is pretty cheap to move around, so you could have lots of them – tagged versions for different releases of your database schema, and even multiple images with different datasets for different test scenarios.
A pre-loaded database container should start in seconds, so you can easily run the relevant container as a step in your build pipeline before running integration tests. Just be aware of the maintenance overhead – I would look at automating the data extract from live, cleansing, shrinking and packaging right from the start.
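As a sketch of that pipeline step, the pattern is: start the pre-loaded container, then poll until MySQL accepts connections before launching the tests. The helper below is generic; the image name, port, and `mysqladmin` invocation in the comments are assumptions about your setup.

```shell
# Generic retry helper for a pipeline step: polls a readiness command until
# it succeeds or the attempt limit is hit. In a real pipeline the command
# would be `mysqladmin ping -h 127.0.0.1 --silent` against the pre-loaded
# container started with e.g. `docker run -d -p 3306:3306 mycompany/mysql-devdata`.
wait_for() {
  attempts=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$attempts" ] && return 1
    sleep 1
  done
  return 0
}

# Demonstration with a command that is immediately ready:
wait_for 30 true && echo "ready"
```

With the import already baked in, the readiness loop normally exits on the first or second attempt instead of polling for minutes.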