How do I write a unit test to check for correct behavior under conditions of transient system resource constraints?

I have a Python 3 application which uses the multiprocessing module to spread a large, resource-intensive scientific calculation over many CPUs in parallel. A couple of steps inside the calculation require the process to allocate moderately large arrays. The application works just fine when I run it on my OS X laptop. However, the long-term plan is that it will typically run inside a Docker container, which will in turn run on an Amazon Web Services EC2 instance or other comparable cloud-based virtual machine, effectively nesting the application within two levels of virtualization.

I anticipate that some future user may run the application with the virtual machine's resources (memory, swap space, etc.) configured too aggressively small. (There is a clear financial incentive to do this, because cloud computing services generally cost less the fewer resources you use.)

This raises the possibility that, in a tightly resource-constrained scenario, one of the processes could try to allocate memory for an array at a moment when a sufficiently large block of memory is temporarily unavailable, thereby triggering a Python MemoryError exception.

A solution to this problem might look something like the code snippet below: try to allocate the memory, catch the exception if it occurs, then wait a while and try again:

    import os
    import time
    from warnings import warn

    import numpy as np

    def getHugeArray(n, retry=1, warnuser=5):
        """Allocate np.zeros(n), retrying every `retry` seconds on MemoryError."""
        # Don't retry too often
        if retry < 0.1:
            retry = 1
        # Don't send a useless flood of warning messages
        if warnuser < retry:
            warnuser = retry

        # Monotonic clock: unaffected by changes to the system time
        startwait, lastcount = time.monotonic(), 0
        # Keep re-asking the OS for memory allocation until the OS gives it to us
        while True:
            try:
                return np.zeros(n)
            except MemoryError:
                time.sleep(retry)
                wait = time.monotonic() - startwait
                # Warn at most once per `warnuser` seconds of waiting
                newcount = int(wait / warnuser)
                if newcount > lastcount:
                    msg = 'PID {0}: waiting for memory allocation for {1:.0f} seconds'
                    warn(msg.format(os.getpid(), wait))
                    lastcount = newcount
    

My question: how would I construct a unit test to verify the waiting behavior? Is that even possible? It seems I would have to write a test which sets itself up by first hogging most of the memory on my system, waits a while as getHugeArray() begins to execute, then releases the memory so that getHugeArray() can grab it, and finally checks whether getHugeArray() comes back with a correct return value.

Is this something that would be better done within an integration testing framework rather than as a unit test? In either case, what is an appropriate tool or test framework for testing my code in this type of situation, and how should I set it up?
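
For illustration, here is a minimal sketch of one way the waiting behavior might be unit tested without actually exhausting memory: simulate the failure with the standard library's unittest.mock, so that the allocation call raises MemoryError a fixed number of times before succeeding, and replace time.sleep so the test does not really wait. Several things here are assumptions, not part of the application: `np` is a hypothetical stand-in namespace so the sketch runs without NumPy (with the real library, `mock.patch("numpy.zeros", ...)` would play the same role), `get_huge_array` is a condensed copy of getHugeArray() with the warning logic omitted, and the test class name is made up.

```python
import time
import types
import unittest
from unittest import mock

# Hypothetical stand-in for the numpy module so this sketch runs without NumPy.
np = types.SimpleNamespace(zeros=lambda n: [0.0] * n)


def get_huge_array(n, retry=1):
    """Condensed copy of getHugeArray(): retry the allocation until it succeeds."""
    while True:
        try:
            return np.zeros(n)
        except MemoryError:
            time.sleep(retry)


class GetHugeArrayTest(unittest.TestCase):
    def test_retries_until_allocation_succeeds(self):
        expected = [0.0] * 4
        # Simulated allocator: the first two calls raise, the third succeeds.
        fake_zeros = mock.Mock(side_effect=[MemoryError, MemoryError, expected])
        with mock.patch.object(np, "zeros", fake_zeros), \
                mock.patch.object(time, "sleep") as fake_sleep:
            result = get_huge_array(4, retry=1)

        # The function returned what the allocator finally produced...
        self.assertIs(result, expected)
        # ...after three attempts, sleeping once after each failure.
        self.assertEqual(fake_zeros.call_count, 3)
        self.assertEqual(fake_sleep.call_count, 2)
        fake_sleep.assert_called_with(1)
```

A test like this only checks the retry logic itself. Whether a real allocation fails and later succeeds under a constrained container would still need something closer to an integration test.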
