-rw-r--r-- | Help/command/ctest_test.rst | 6
-rw-r--r-- | Help/manual/cmake-properties.7.rst | 1
-rw-r--r-- | Help/manual/ctest.1.rst | 229
-rw-r--r-- | Help/prop_test/PROCESSES.rst | 54
-rw-r--r-- | Help/prop_test/RESOURCE_LOCK.rst | 8
-rw-r--r-- | Help/release/dev/ctest-hardware-allocation.rst | 6
6 files changed, 304 insertions, 0 deletions
diff --git a/Help/command/ctest_test.rst b/Help/command/ctest_test.rst
index 4a69491..0a33da3 100644
--- a/Help/command/ctest_test.rst
+++ b/Help/command/ctest_test.rst
@@ -17,6 +17,7 @@ Perform the :ref:`CTest Test Step` as a :ref:`Dashboard Client`.
  [EXCLUDE_FIXTURE_SETUP <regex>]
  [EXCLUDE_FIXTURE_CLEANUP <regex>]
  [PARALLEL_LEVEL <level>]
+ [HARDWARE_SPEC_FILE <file>]
  [TEST_LOAD <threshold>]
  [SCHEDULE_RANDOM <ON|OFF>]
  [STOP_TIME <time-of-day>]
@@ -82,6 +83,11 @@ The options are:
   Specify a positive number representing the number of tests to
   be run in parallel.
 
+``HARDWARE_SPEC_FILE <file>``
+  Specify a
+  :ref:`hardware specification file <ctest-hardware-specification-file>`. See
+  :ref:`ctest-hardware-allocation` for more information.
+
 ``TEST_LOAD <threshold>``
   While running tests in parallel, try not to start tests when they
   may cause the CPU load to pass above a given threshold. If not
diff --git a/Help/manual/cmake-properties.7.rst b/Help/manual/cmake-properties.7.rst
index 3fe609b..1369aa3 100644
--- a/Help/manual/cmake-properties.7.rst
+++ b/Help/manual/cmake-properties.7.rst
@@ -414,6 +414,7 @@ Properties on Tests
  /prop_test/LABELS
  /prop_test/MEASUREMENT
  /prop_test/PASS_REGULAR_EXPRESSION
+ /prop_test/PROCESSES
  /prop_test/PROCESSOR_AFFINITY
  /prop_test/PROCESSORS
  /prop_test/REQUIRED_FILES
diff --git a/Help/manual/ctest.1.rst b/Help/manual/ctest.1.rst
index 9d93bb8..a18d43f 100644
--- a/Help/manual/ctest.1.rst
+++ b/Help/manual/ctest.1.rst
@@ -90,6 +90,15 @@ Options
 
  See `Label and Subproject Summary`_.
 
+``--hardware-spec-file <file>``
+ Run CTest with :ref:`hardware allocation <ctest-hardware-allocation>` enabled,
+ using the
+ :ref:`hardware specification file <ctest-hardware-specification-file>`
+ specified in ``<file>``.
+
+ When ``ctest`` is run as a `Dashboard Client`_ this sets the
+ ``HardwareSpecFile`` option of the `CTest Test Step`_.
+
 ``--test-load <level>``
  While running tests in parallel (e.g. with ``-j``), try not to start
  tests when they may cause the CPU load to pass above a given threshold.
@@ -958,6 +967,11 @@ Arguments to the command may specify some of the step settings.
 
 Configuration settings include:
 
+``HardwareSpecFile``
+  Specify a
+  :ref:`hardware specification file <ctest-hardware-specification-file>`. See
+  :ref:`ctest-hardware-allocation` for more information.
+
 ``LabelsForSubprojects``
   Specify a semicolon-separated list of labels that will be treated as
   subprojects. This mapping will be passed on to CDash when configure, test or
@@ -1267,6 +1281,221 @@ model is defined as follows:
   Test properties.
   Can contain keys for each of the supported test properties.
 
+.. _`ctest-hardware-allocation`:
+
+Hardware Allocation
+===================
+
+CTest provides a mechanism for tests to specify the hardware that they need and
+how much of it they need, and for users to specify the hardware available on
+the running machine. This allows CTest to internally keep track of which
+hardware is in use and which is free, scheduling tests in a way that prevents
+them from trying to claim hardware that is not available.
+
+A common use case for this feature is for tests that require the use of a GPU.
+Multiple tests can simultaneously allocate memory from a GPU, but if too many
+tests try to do this at once, some of them will fail to allocate, resulting in
+a failed test, even though the test would have succeeded if it had the memory
+it needed. By using the hardware allocation feature, each test can specify how
+much memory it requires from a GPU, allowing CTest to schedule tests in a way
+that running several of these tests at once does not exhaust the GPU's memory
+pool.
+
+Please note that CTest has no concept of what a GPU is or how much memory it
+has, nor does it have any way of communicating with a GPU to retrieve this
+information or perform any memory management. CTest simply keeps track of a
+list of abstract resource types, each of which has a certain number of slots
+available for tests to use. Each test specifies the number of slots that it
+requires from a certain resource, and CTest then schedules them in a way that
+prevents the total number of slots in use from exceeding the listed capacity.
+When a test is executed, and slots from a resource are allocated to that test,
+tests may assume that they have exclusive use of those slots for the duration
+of the test's process.
+
+The CTest hardware allocation feature consists of two inputs:
+
+* The :ref:`hardware specification file <ctest-hardware-specification-file>`,
+  described below, which describes the hardware resources available on the
+  system, and
+* The :prop_test:`PROCESSES` property of tests, which describes the resources
+  required by the test.
+
+When CTest runs a test, the hardware allocated to that test is passed in the
+form of a set of
+:ref:`environment variables <ctest-hardware-environment-variables>` as
+described below. Using this information to decide which resource to connect to
+is left to the test writer.
+
+Please note that these processes are not spawned by CTest. The ``PROCESSES``
+property merely tells CTest what processes the test expects to launch. It is up
+to the test itself to do this process spawning, and read the :ref:`environment
+variables <ctest-hardware-environment-variables>` to determine which resources
+each process has been allocated.
+
+.. _`ctest-hardware-specification-file`:
+
+Hardware Specification File
+---------------------------
+
+The hardware specification file is a JSON file which is passed to CTest, either
+on the :manual:`ctest(1)` command line as ``--hardware-spec-file``, or as the
+``HARDWARE_SPEC_FILE`` argument of :command:`ctest_test`. The hardware
+specification file must be a JSON object. All examples in this document assume
+the following hardware specification file:
+
+.. code-block:: json
+
+  {
+    "local": [
+      {
+        "gpus": [
+          {
+            "id": "0",
+            "slots": 2
+          },
+          {
+            "id": "1",
+            "slots": 4
+          },
+          {
+            "id": "2",
+            "slots": 2
+          },
+          {
+            "id": "3"
+          }
+        ],
+        "crypto_chips": [
+          {
+            "id": "card0",
+            "slots": 4
+          }
+        ]
+      }
+    ]
+  }
+
+The members are:
+
+``local``
+  A JSON array consisting of CPU sockets present on the system. Currently, only
+  one socket is supported.
+
+  Each socket is a JSON object with members whose names are equal to the
+  desired resource types, such as ``gpus``. These names must start with a
+  lowercase letter or an underscore, and subsequent characters can be a
+  lowercase letter, a digit, or an underscore. Uppercase letters are not
+  allowed, because certain platforms have case-insensitive environment
+  variables. See the `Environment Variables`_ section below for
+  more information. It is recommended that the resource type name be the plural
+  of a noun, such as ``gpus`` or ``crypto_chips`` (and not ``gpu`` or
+  ``crypto_chip``).
+
+  Please note that the names ``gpus`` and ``crypto_chips`` are just examples,
+  and CTest does not interpret them in any way. You are free to make up any
+  resource type you want to meet your own requirements.
+
+  The value for each resource type is a JSON array consisting of JSON objects,
+  each of which describes a specific instance of the specified resource. These
+  objects have the following members:
+
+  ``id``
+    A string consisting of an identifier for the resource. Each character in
+    the identifier can be a lowercase letter, a digit, or an underscore.
+    Uppercase letters are not allowed.
+
+    Identifiers must be unique within a resource type. However, they do not
+    have to be unique across resource types. For example, it is valid to have a
+    ``gpus`` resource named ``0`` and a ``crypto_chips`` resource named ``0``,
+    but not two ``gpus`` resources both named ``0``.
+
+    Please note that the IDs ``0``, ``1``, ``2``, ``3``, and ``card0`` are just
+    examples, and CTest does not interpret them in any way. You are free to
+    make up any IDs you want to meet your own requirements.
+
+  ``slots``
+    An optional unsigned number specifying the number of slots available on the
+    resource. For example, this could be megabytes of RAM on a GPU, or
+    cryptography units available on a cryptography chip. If ``slots`` is not
+    specified, a default value of ``1`` is assumed.
+
+In the example file above, there are four GPUs with IDs 0 through 3. GPU 0 has
+2 slots, GPU 1 has 4, GPU 2 has 2, and GPU 3 has a default of 1 slot. There is
+also one cryptography chip with 4 slots.
+
+``PROCESSES`` Property
+----------------------
+
+See :prop_test:`PROCESSES` for a description of this property.
+
+.. _`ctest-hardware-environment-variables`:
+
+Environment Variables
+---------------------
+
+Once CTest has decided which resources to allocate to a test, it passes this
+information to the test executable as a series of environment variables. For
+each example below, we will assume that the test in question has a
+:prop_test:`PROCESSES` property of ``2,gpus:2;gpus:4,gpus:1,crypto_chips:2``.
+
+The following variables are passed to the test process:
+
+.. envvar:: CTEST_PROCESS_COUNT
+
+  The total number of processes specified by the :prop_test:`PROCESSES`
+  property. For example:
+
+  * ``CTEST_PROCESS_COUNT=3``
+
+  This variable will only be defined if :manual:`ctest(1)` has been given a
+  ``--hardware-spec-file``, or if :command:`ctest_test` has been given a
+  ``HARDWARE_SPEC_FILE``. If no hardware specification file has been given,
+  this variable will not be defined.
+
+.. envvar:: CTEST_PROCESS_<num>
+
+  The list of resource types allocated to each process, with each item
+  separated by a comma. ``<num>`` is a number from zero to
+  ``CTEST_PROCESS_COUNT`` minus one. ``CTEST_PROCESS_<num>`` is defined for
+  each ``<num>`` in this range. For example:
+
+  * ``CTEST_PROCESS_0=gpus``
+  * ``CTEST_PROCESS_1=gpus``
+  * ``CTEST_PROCESS_2=crypto_chips,gpus``
+
+.. envvar:: CTEST_PROCESS_<num>_<resource-type>
+
+  The list of resource IDs and number of slots from each ID allocated to each
+  process for a given resource type. This variable consists of a series of
+  pairs, each pair separated by a semicolon, and with the two items in the pair
+  separated by a comma. The first item in each pair is ``id:`` followed by the
+  ID of a resource of type ``<resource-type>``, and the second item is
+  ``slots:`` followed by the number of slots from that resource allocated to
+  the given process. For example:
+
+  * ``CTEST_PROCESS_0_GPUS=id:0,slots:2``
+  * ``CTEST_PROCESS_1_GPUS=id:2,slots:2``
+  * ``CTEST_PROCESS_2_GPUS=id:1,slots:4;id:3,slots:1``
+  * ``CTEST_PROCESS_2_CRYPTO_CHIPS=id:card0,slots:2``
+
+  In this example, process 0 gets 2 slots from GPU ``0``, process 1 gets 2 slots
+  from GPU ``2``, and process 2 gets 4 slots from GPU ``1``, 1 slot from GPU
+  ``3``, and 2 slots from cryptography chip ``card0``.
+
+  ``<num>`` is a number from zero to ``CTEST_PROCESS_COUNT`` minus one.
+  ``<resource-type>`` is the name of a resource type, converted to uppercase.
+  ``CTEST_PROCESS_<num>_<resource-type>`` is defined for the product of each
+  ``<num>`` in the range listed above and each resource type listed in
+  ``CTEST_PROCESS_<num>``.
+
+  Because some platforms have case-insensitive names for environment variables,
+  the names of resource types may not clash in a case-insensitive environment.
+  Because of this, for the sake of simplicity, all resource types must be
+  listed in all lowercase in the
+  :ref:`hardware specification file <ctest-hardware-specification-file>` and in
+  the :prop_test:`PROCESSES` property, and they are converted to all uppercase
+  in the ``CTEST_PROCESS_<num>_<resource-type>`` environment variable.
+
 See Also
 ========
diff --git a/Help/prop_test/PROCESSES.rst b/Help/prop_test/PROCESSES.rst
new file mode 100644
index 0000000..d09c6d1
--- /dev/null
+++ b/Help/prop_test/PROCESSES.rst
@@ -0,0 +1,54 @@
+PROCESSES
+----------
+
+Set to specify the number of processes spawned by a test, and the resources
+that they require. See :ref:`hardware allocation <ctest-hardware-allocation>`
+for more information on how this property integrates into the CTest hardware
+allocation feature.
+
+The ``PROCESSES`` property is a :ref:`semicolon-separated list <CMake Language
+Lists>` of process descriptions. Each process description consists of an
+optional number of processes for the description followed by a series of
+resource requirements for those processes. These requirements (and the number
+of processes) are separated by commas. The resource requirements consist of the
+name of a resource type, followed by a colon, followed by an unsigned integer
+specifying the number of slots required on one resource of the given type.
+
+Please note that these processes are not spawned by CTest. The ``PROCESSES``
+property merely tells CTest what processes the test expects to launch. It is up
+to the test itself to do this process spawning, and read the :ref:`environment
+variables <ctest-hardware-environment-variables>` to determine which resources
+each process has been allocated.
+
+Consider the following example:
+
+.. code-block:: cmake
+
+  add_test(NAME MyTest COMMAND MyExe)
+  set_property(TEST MyTest PROPERTY PROCESSES
+    "2,gpus:2"
+    "gpus:4,crypto_chips:2")
+
+In this example, there are two process descriptions (implicitly separated by a
+semicolon). The content of the first description is ``2,gpus:2``. This
+description spawns 2 processes, each of which requires 2 slots from a single
+GPU. The content of the second description is ``gpus:4,crypto_chips:2``. This
+description does not specify a process count, so a default of 1 is assumed.
+This single process requires 4 slots from a single GPU and 2 slots from a
+single cryptography chip. In total, 3 processes are spawned from this test,
+each with their own unique requirements.
+
+When CTest sets the :ref:`environment variables
+<ctest-hardware-environment-variables>` for a test, it numbers the processes in
+the order of their descriptions, from 0 for the leftmost process to the number
+of processes minus 1 for the rightmost. In the example above, the two
+processes in the first description would have IDs of 0 and 1, and the single
+process in the second description would have an ID of 2.
+
+Both the ``PROCESSES`` and :prop_test:`RESOURCE_LOCK` properties serve similar
+purposes, but they are distinct and orthogonal. Resources specified by
+``PROCESSES`` do not affect :prop_test:`RESOURCE_LOCK`, and vice versa. Whereas
+:prop_test:`RESOURCE_LOCK` is a simpler property that is used for locking one
+global resource, ``PROCESSES`` is a more advanced property that allows multiple
+tests to simultaneously use multiple resources of the same type, specifying
+their requirements in a fine-grained manner.
diff --git a/Help/prop_test/RESOURCE_LOCK.rst b/Help/prop_test/RESOURCE_LOCK.rst
index 755e0aa..7d61f77 100644
--- a/Help/prop_test/RESOURCE_LOCK.rst
+++ b/Help/prop_test/RESOURCE_LOCK.rst
@@ -8,3 +8,11 @@ not to run concurrently.
 
 See also :prop_test:`FIXTURES_REQUIRED` if the resource requires any setup or
 cleanup steps.
+
+Both the :prop_test:`PROCESSES` and ``RESOURCE_LOCK`` properties serve similar
+purposes, but they are distinct and orthogonal. Resources specified by
+:prop_test:`PROCESSES` do not affect ``RESOURCE_LOCK``, and vice versa. Whereas
+``RESOURCE_LOCK`` is a simpler property that is used for locking one global
+resource, :prop_test:`PROCESSES` is a more advanced property that allows
+multiple tests to simultaneously use multiple resources of the same type,
+specifying their requirements in a fine-grained manner.
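
Reviewer sketch (not part of the patch): to illustrate the ``PROCESSES`` syntax documented in ``Help/prop_test/PROCESSES.rst`` above, the following Python snippet expands a property value into one requirement list per process, in process-number order. It is only a reading aid under stated assumptions: the function name ``parse_processes`` is hypothetical, this is not CTest's own parser, and no validation of names or counts is attempted.

```python
def parse_processes(prop):
    """Expand a PROCESSES value into one requirement list per process.

    Each requirement is a (resource_type, slots) tuple. A list is used
    instead of a dict because one process may list the same type twice.
    """
    processes = []
    for description in prop.split(";"):
        items = description.split(",")
        count = 1
        # An optional leading item without a colon is the process count.
        if ":" not in items[0]:
            count = int(items[0])
            items = items[1:]
        requirements = []
        for item in items:
            resource_type, slots = item.split(":")
            requirements.append((resource_type, int(slots)))
        # Repeat the same requirements once per declared process.
        processes.extend(list(requirements) for _ in range(count))
    return processes


# The value from the add_test() example, joined with the implicit semicolon.
procs = parse_processes("2,gpus:2;gpus:4,crypto_chips:2")
for number, requirements in enumerate(procs):
    print(number, requirements)
```

The list index of each entry corresponds to the process number that the documentation assigns from left to right.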
diff --git a/Help/release/dev/ctest-hardware-allocation.rst b/Help/release/dev/ctest-hardware-allocation.rst
new file mode 100644
index 0000000..875dbdc
--- /dev/null
+++ b/Help/release/dev/ctest-hardware-allocation.rst
@@ -0,0 +1,6 @@
+ctest-hardware-allocation
+-------------------------
+
+* :manual:`ctest(1)` now has the ability to serialize tests based on hardware
+  requirements for each test. See :ref:`ctest-hardware-allocation` for
+  details.
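
Reviewer sketch (not part of the patch): the ``CTEST_PROCESS_*`` environment variables documented in the ``ctest.1.rst`` hunk are left for the test itself to interpret. A minimal Python reading of that format might look like the following. The helper name ``read_allocation`` and the returned data shape are assumptions made for illustration, and the sample values are copied from the documentation's own example rather than captured from a real CTest run.

```python
import os


def read_allocation(environ=os.environ):
    """Return one {resource_type: [(id, slots), ...]} dict per process.

    Returns None when CTEST_PROCESS_COUNT is absent, which the docs say
    happens when no hardware specification file was given.
    """
    count = environ.get("CTEST_PROCESS_COUNT")
    if count is None:
        return None
    allocation = []
    for num in range(int(count)):
        per_process = {}
        # CTEST_PROCESS_<num> lists lowercase resource types, comma-separated.
        for resource_type in environ[f"CTEST_PROCESS_{num}"].split(","):
            # The per-type variable uses the uppercased type name.
            key = f"CTEST_PROCESS_{num}_{resource_type.upper()}"
            pairs = []
            for pair in environ[key].split(";"):
                id_part, slots_part = pair.split(",")
                resource_id = id_part[len("id:"):]
                slots = int(slots_part[len("slots:"):])
                pairs.append((resource_id, slots))
            per_process[resource_type] = pairs
        allocation.append(per_process)
    return allocation


# Sample environment taken from the documentation's example values.
example = {
    "CTEST_PROCESS_COUNT": "3",
    "CTEST_PROCESS_0": "gpus",
    "CTEST_PROCESS_1": "gpus",
    "CTEST_PROCESS_2": "crypto_chips,gpus",
    "CTEST_PROCESS_0_GPUS": "id:0,slots:2",
    "CTEST_PROCESS_1_GPUS": "id:2,slots:2",
    "CTEST_PROCESS_2_GPUS": "id:1,slots:4;id:3,slots:1",
    "CTEST_PROCESS_2_CRYPTO_CHIPS": "id:card0,slots:2",
}
allocation = read_allocation(example)
print(allocation[2]["gpus"])
```

Passing a plain dictionary instead of ``os.environ`` keeps the sketch testable outside of CTest; a real test would call ``read_allocation()`` with no arguments and then hand each process its own entry.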