How do I write a unit test to check for correct behavior under conditions of transient system resource constraints?

I have a Python 3 application that uses the multiprocessing module to spread a large, resource-intensive scientific calculation across many CPUs in parallel. A couple of steps inside the calculation require the process to allocate moderately large arrays. The application works fine when I run it on my OS X laptop; however, the long-term plan is that it will typically run inside a Docker container, which will in turn run on an Amazon Web Services EC2 instance or a comparable cloud-based virtual machine, effectively nesting the application within two levels of machine virtualization.

I anticipate that in the future, some other user may attempt to run the application with the virtual machine resources (memory, swap space, etc.) configured to values that are too aggressively small. (There are clear financial incentives to do this: the fewer resources you use, the less you tend to pay for cloud computing services.)

This brings up the possibility, under a tightly constrained resources scenario, that one of the processes could try to allocate memory for an array at a moment when a sufficiently large block of memory is temporarily not available, thereby triggering a Python MemoryError exception.

A solution to this problem might look something like the code snippet below: try to allocate the memory, catch the exception if it occurs, then wait a while and try again:

    import numpy as np
    import time
    import datetime
    import os
    from warnings import warn

    def getHugeArray(n, retry=1, warnuser=5):
        # Don't retry too often
        if retry < 0.1:
            retry = 1
        # Don't send a useless flood of warning messages
        if warnuser < retry:
            warnuser = retry
        success, outarray = False, None
        startwait, lastcount = datetime.datetime.now(), 0
        # Keep re-asking the OS for memory allocation until the OS gives it to us
        while success is False:
            try:
                outarray = np.zeros(n)
                success = True
            except MemoryError:
                time.sleep(retry)
                wait = (datetime.datetime.now() - startwait).total_seconds()
                newcount = int(wait / warnuser)
                if newcount > lastcount:
                    msg = 'PID {0}: waiting for memory allocation for {1:.0f} seconds'
                    warn(msg.format(os.getpid(), wait))
                    lastcount = newcount
        return outarray

My question: how would I construct a unit test to verify the waiting behavior? Or is that even possible? It seems like I would have to write a test that first sets itself up by hogging most of the memory on my system, then waits a while as the getHugeArray() function begins to execute, then releases the resources so that getHugeArray() can grab them, and finally checks whether getHugeArray() comes back with a correct return value.
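For what it's worth, one way to exercise the waiting behavior without actually exhausting system memory is to mock the allocation call itself so that it raises MemoryError a few times before succeeding. This is a sketch only: it substitutes mocking for the real-resource-hogging setup described above, and it defines a minimal stand-in for getHugeArray (in a real test you would import your own function instead):

```python
import time
import unittest
from unittest import mock

import numpy as np

# Minimal stand-in for the function under test; in practice, import
# getHugeArray from your own module instead of defining it here.
def getHugeArray(n, retry=0.01):
    while True:
        try:
            return np.zeros(n)
        except MemoryError:
            time.sleep(retry)

class TestWaitingBehavior(unittest.TestCase):
    def test_retries_until_allocation_succeeds(self):
        real_zeros = np.zeros        # keep a handle to the unpatched function
        attempts = {'count': 0}

        def flaky_zeros(n):
            # Fail the first two allocation attempts, then succeed,
            # simulating memory that frees up after a transient shortage.
            attempts['count'] += 1
            if attempts['count'] <= 2:
                raise MemoryError
            return real_zeros(n)

        with mock.patch('numpy.zeros', side_effect=flaky_zeros):
            result = getHugeArray(10)

        self.assertEqual(attempts['count'], 3)  # retried twice, then succeeded
        self.assertEqual(len(result), 10)
```

Run with `python -m unittest`. Because the "shortage" is simulated, the test is fast and deterministic, which is usually what you want from a unit test; actually starving the process of memory is closer to an integration or stress test.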

Is this something that would be better done within an integration testing framework rather than as a unit test? In any case, what is an appropriate tool or test framework for testing my code in this type of situation, and how should I set it up?
