Troubleshooting API timeout from Django+Celery in Docker Container
I have a micro-services architecture of let say 9 services, each one running in its own container.
The services use a mix of technologies, but mainly Django, Celery (with a Redis Queue), a shared PostgreSQL database (in its own container), and some more specific services/libraries.
The micro-services talk to each other through REST API.
The problem is that, sometimes in a random way, some containers API doesn’t respond anymore and get stuck.
When I issue a
curl request on their interface I get a timeout.
At that moment, all the other containers answer well.
There is two stucking containers.
What I noticed is that both of the blocking containers use:
- An embedded Redis as a Celery broker
- An access to a PostgreSQL DB that stands in another container
I can’t figure out how to troubleshoot the problem since no relevant information is visible in the Services or Docker logs.
The problem is that these API’s are stuck only at random moments. To make it work again, I need to stop the blocking container, and start it again.
I was wondering if it could be a python GIL problem, but I don’t know how to check this hypothesis…
Any idea about how to troubleshot this?
One Solution collect form web for “Troubleshooting API timeout from Django+Celery in Docker Container”
You can shell into the running container and check things out. Is the celery process still running, etc…
docker exec -ti my-container-name /bin/bash
If you are using django, for example, you could go to your django directory and do
manage.py shell and start poking around there.
I have a similar setup where I run multiple web services using django/celery/celerybeat/nginx/…
However, as a rule I run one process per container (kind of exception is django and gunicorn run in same container). I then share things by using
For example, the gunicorn app writes to a .sock file, and the container has its own nginx config; the nginx container does a
--volumes-from the django container to get this info. That way, I can use a stock nginx container for all of my web services.
Another handy thing for debugging is to log to stdout and use docker’s log driver (splunk, logstash, etc.) for production, but have it log to the container when debugging. That way you can get a lot of information from ‘docker logs’ when you’ve got it under test. One of the great things about docker is you can take the exact code that is failing in production and run it under the microscope to debug it.