author: Kyle Edwards <kyle.edwards@kitware.com> 2019-07-11 21:14:01 (GMT)
committer: Brad King <brad.king@kitware.com> 2019-10-02 13:33:54 (GMT)
commit: e9500271a3acdc0d1ee448cae5912f768491f429
tree: 31fadd6b09989bdcfb79f4551e9bc33c930d4ded /Help/manual/ctest.1.rst
parent: d1f100a41582fecc05c3e04a87c72de94b0e6316
Help: Add documentation for CTest hardware allocation
Diffstat (limited to 'Help/manual/ctest.1.rst')
-rw-r--r-- Help/manual/ctest.1.rst | 229
1 files changed, 229 insertions, 0 deletions
diff --git a/Help/manual/ctest.1.rst b/Help/manual/ctest.1.rst
index 9d93bb8..a18d43f 100644
--- a/Help/manual/ctest.1.rst
+++ b/Help/manual/ctest.1.rst
@@ -90,6 +90,15 @@ Options
See `Label and Subproject Summary`_.
+``--hardware-spec-file <file>``
+ Run CTest with :ref:`hardware allocation <ctest-hardware-allocation>` enabled,
+ using the
+ :ref:`hardware specification file <ctest-hardware-specification-file>`
+ specified in ``<file>``.
+
+ When ``ctest`` is run as a `Dashboard Client`_ this sets the
+ ``HardwareSpecFile`` option of the `CTest Test Step`_.
+
``--test-load <level>``
While running tests in parallel (e.g. with ``-j``), try not to start
tests when they may cause the CPU load to pass above a given threshold.
@@ -958,6 +967,11 @@ Arguments to the command may specify some of the step settings.
Configuration settings include:
+``HardwareSpecFile``
+ Specify a
+ :ref:`hardware specification file <ctest-hardware-specification-file>`. See
+ :ref:`ctest-hardware-allocation` for more information.
+
``LabelsForSubprojects``
Specify a semicolon-separated list of labels that will be treated as
subprojects. This mapping will be passed on to CDash when configure, test or
@@ -1267,6 +1281,221 @@ model is defined as follows:
Test properties.
Can contain keys for each of the supported test properties.
+.. _`ctest-hardware-allocation`:
+
+Hardware Allocation
+===================
+
+CTest provides a mechanism for tests to specify the hardware that they need and
+how much of it they need, and for users to specify the hardware available on
+the running machine. This allows CTest to internally keep track of which
+hardware is in use and which is free, scheduling tests in a way that prevents
+them from trying to claim hardware that is not available.
+
+A common use case for this feature is for tests that require the use of a GPU.
+Multiple tests can simultaneously allocate memory from a GPU, but if too many
+tests try to do this at once, some of them will fail to allocate, resulting in
+a failed test, even though the test would have succeeded if it had the memory
+it needed. By using the hardware allocation feature, each test can specify
+how much memory it requires from a GPU, allowing CTest to schedule tests so
+that running several of them at once does not exhaust the GPU's memory
+pool.
+
+Please note that CTest has no concept of what a GPU is or how much memory it
+has, nor does it have any way of communicating with a GPU to retrieve this
+information or perform any memory management. CTest simply keeps track of a
+list of abstract resource types, each of which has a certain number of slots
+available for tests to use. Each test specifies the number of slots that it
+requires from a certain resource, and CTest then schedules them in a way that
+prevents the total number of slots in use from exceeding the listed capacity.
+When a test is executed and slots from a resource are allocated to it, the
+test may assume that it has exclusive use of those slots for the duration of
+the test's process.
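
As an illustration outside the patch itself, the slot accounting described above can be modeled in a few lines of Python. This is a toy sketch, not CTest's scheduler: the ``SlotPool`` class and its first-fit policy are assumptions made for the example, and only the capacities are taken from the ``gpus`` entries of the example specification file.

```python
class SlotPool:
    """Toy model of slot accounting for one resource type.

    Hypothetical helper for illustration; CTest's real scheduler is not
    described in this document and may behave differently.
    """

    def __init__(self, capacity):
        # capacity maps resource ID -> total slots, e.g. {"0": 2, "1": 4}
        self.capacity = dict(capacity)
        self.in_use = {rid: 0 for rid in self.capacity}

    def try_allocate(self, slots):
        """Reserve `slots` on the first resource with enough free slots.

        Returns the chosen resource ID, or None if the request has to wait.
        """
        for rid, total in self.capacity.items():
            if total - self.in_use[rid] >= slots:
                self.in_use[rid] += slots
                return rid
        return None

    def release(self, rid, slots):
        """Give slots back to the pool once the test has finished."""
        self.in_use[rid] -= slots


# Capacities taken from the "gpus" entries of the example specification file.
gpus = SlotPool({"0": 2, "1": 4, "2": 2, "3": 1})
first = gpus.try_allocate(4)   # only GPU "1" has 4 slots in total
second = gpus.try_allocate(4)  # no GPU has 4 free slots any more
```

The point of the model is that a request which cannot be satisfied is deferred rather than started, so the slots in use never exceed the listed capacity.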
+
+The CTest hardware allocation feature consists of two inputs:
+
+* The :ref:`hardware specification file <ctest-hardware-specification-file>`,
+ described below, which describes the hardware resources available on the
+ system, and
+* The :prop_test:`PROCESSES` property of tests, which describes the resources
+ required by the test.
+
+When CTest runs a test, the hardware allocated to that test is passed in the
+form of a set of
+:ref:`environment variables <ctest-hardware-environment-variables>` as
+described below. Using this information to decide which resource to connect to
+is left to the test writer.
+
+Please note that these processes are not spawned by CTest. The ``PROCESSES``
+property merely tells CTest what processes the test expects to launch. It is up
+to the test itself to spawn these processes and to read the :ref:`environment
+variables <ctest-hardware-environment-variables>` to determine which resources
+each process has been allocated.
+
+.. _`ctest-hardware-specification-file`:
+
+Hardware Specification File
+---------------------------
+
+The hardware specification file is a JSON file which is passed to CTest, either
+on the :manual:`ctest(1)` command line as ``--hardware-spec-file``, or as the
+``HARDWARE_SPEC_FILE`` argument of :command:`ctest_test`. The hardware
+specification file must be a JSON object. All examples in this document assume
+the following hardware specification file:
+
+.. code-block:: json
+
+  {
+    "local": [
+      {
+        "gpus": [
+          {
+            "id": "0",
+            "slots": 2
+          },
+          {
+            "id": "1",
+            "slots": 4
+          },
+          {
+            "id": "2",
+            "slots": 2
+          },
+          {
+            "id": "3"
+          }
+        ],
+        "crypto_chips": [
+          {
+            "id": "card0",
+            "slots": 4
+          }
+        ]
+      }
+    ]
+  }
+
+The members are:
+
+``local``
+ A JSON array consisting of CPU sockets present on the system. Currently, only
+ one socket is supported.
+
+ Each socket is a JSON object with members whose names are equal to the
+ desired resource types, such as ``gpus``. These names must start with a
+ lowercase letter or an underscore, and subsequent characters can be a
+ lowercase letter, a digit, or an underscore. Uppercase letters are not
+ allowed, because certain platforms have case-insensitive environment
+ variables. See the `Environment Variables`_ section below for
+ more information. It is recommended that the resource type name be the plural
+ of a noun, such as ``gpus`` or ``crypto_chips`` (and not ``gpu`` or
+ ``crypto_chip``).
+
+ Please note that the names ``gpus`` and ``crypto_chips`` are just examples,
+ and CTest does not interpret them in any way. You are free to make up any
+ resource type you want to meet your own requirements.
+
+ The value for each resource type is a JSON array consisting of JSON objects,
+ each of which describes a specific instance of the specified resource. These
+ objects have the following members:
+
+ ``id``
+ A string consisting of an identifier for the resource. Each character in
+ the identifier can be a lowercase letter, a digit, or an underscore.
+ Uppercase letters are not allowed.
+
+ Identifiers must be unique within a resource type. However, they do not
+ have to be unique across resource types. For example, it is valid to have a
+ ``gpus`` resource named ``0`` and a ``crypto_chips`` resource named ``0``,
+ but not two ``gpus`` resources both named ``0``.
+
+ Please note that the IDs ``0``, ``1``, ``2``, ``3``, and ``card0`` are just
+ examples, and CTest does not interpret them in any way. You are free to
+ make up any IDs you want to meet your own requirements.
+
+ ``slots``
+ An optional unsigned number specifying the number of slots available on the
+ resource. For example, this could be megabytes of RAM on a GPU, or
+ cryptography units available on a cryptography chip. If ``slots`` is not
+ specified, a default value of ``1`` is assumed.
+
+In the example file above, there are four GPUs with IDs 0 through 3. GPU 0 has
+2 slots, GPU 1 has 4, GPU 2 has 2, and GPU 3 has a default of 1 slot. There is
+also one cryptography chip with 4 slots.
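
As an illustration outside the patch itself, the naming rules and the ``slots`` default described above can be checked with a short Python sketch. The function name ``load_spec`` and its error messages are hypothetical. It mirrors only the rules stated in this section and is not CTest's own parser.

```python
import json
import re

# Naming rules as stated in the text above.
TYPE_RE = re.compile(r"[a-z_][a-z0-9_]*")
ID_RE = re.compile(r"[a-z0-9_]+")


def load_spec(text):
    """Parse a specification file's text into {resource_type: {id: slots}}.

    Hypothetical helper for illustration; raises ValueError on rule violations.
    """
    spec = json.loads(text)
    sockets = spec["local"]
    if len(sockets) != 1:
        raise ValueError("only one socket is currently supported")
    result = {}
    for rtype, instances in sockets[0].items():
        if not TYPE_RE.fullmatch(rtype):
            raise ValueError(f"invalid resource type name: {rtype!r}")
        slots_by_id = {}
        for inst in instances:
            rid = inst["id"]
            if not isinstance(rid, str) or not ID_RE.fullmatch(rid):
                raise ValueError(f"invalid resource id: {rid!r}")
            if rid in slots_by_id:
                raise ValueError(f"duplicate id {rid!r} in {rtype!r}")
            # "slots" is optional and defaults to 1.
            slots_by_id[rid] = inst.get("slots", 1)
        result[rtype] = slots_by_id
    return result


EXAMPLE = """
{"local": [{"gpus": [{"id": "0", "slots": 2}, {"id": "1", "slots": 4},
                     {"id": "2", "slots": 2}, {"id": "3"}],
            "crypto_chips": [{"id": "card0", "slots": 4}]}]}
"""
spec = load_spec(EXAMPLE)
```

With the example file, ``spec["gpus"]["3"]`` picks up the default of one slot because that entry has no ``slots`` member.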
+
+``PROCESSES`` Property
+----------------------
+
+See :prop_test:`PROCESSES` for a description of this property.
+
+.. _`ctest-hardware-environment-variables`:
+
+Environment Variables
+---------------------
+
+Once CTest has decided which resources to allocate to a test, it passes this
+information to the test executable as a series of environment variables. For
+each example below, we will assume that the test in question has a
+:prop_test:`PROCESSES` property of ``2,gpus:2;gpus:4,gpus:1,crypto_chips:2``.
+
+The following variables are passed to the test process:
+
+.. envvar:: CTEST_PROCESS_COUNT
+
+ The total number of processes specified by the :prop_test:`PROCESSES`
+ property. For example:
+
+ * ``CTEST_PROCESS_COUNT=3``
+
+ This variable will only be defined if :manual:`ctest(1)` has been given a
+ ``--hardware-spec-file``, or if :command:`ctest_test` has been given a
+ ``HARDWARE_SPEC_FILE``. If no hardware specification file has been given,
+ this variable will not be defined.
+
+.. envvar:: CTEST_PROCESS_<num>
+
+ The list of resource types allocated to each process, with each item
+ separated by a comma. ``<num>`` is a number from zero to
+ ``CTEST_PROCESS_COUNT`` minus one. ``CTEST_PROCESS_<num>`` is defined for
+ each ``<num>`` in this range. For example:
+
+ * ``CTEST_PROCESS_0=gpus``
+ * ``CTEST_PROCESS_1=gpus``
+ * ``CTEST_PROCESS_2=crypto_chips,gpus``
+
+.. envvar:: CTEST_PROCESS_<num>_<resource-type>
+
+ The list of resource IDs and number of slots from each ID allocated to each
+ process for a given resource type. This variable consists of a series of
+ pairs, each pair separated by a semicolon, and with the two items in the pair
+ separated by a comma. The first item in each pair is ``id:`` followed by the
+ ID of a resource of type ``<resource-type>``, and the second item is
+ ``slots:`` followed by the number of slots from that resource allocated to
+ the given process. For example:
+
+ * ``CTEST_PROCESS_0_GPUS=id:0,slots:2``
+ * ``CTEST_PROCESS_1_GPUS=id:2,slots:2``
+ * ``CTEST_PROCESS_2_GPUS=id:1,slots:4;id:3,slots:1``
+ * ``CTEST_PROCESS_2_CRYPTO_CHIPS=id:card0,slots:2``
+
+ In this example, process 0 gets 2 slots from GPU ``0``, process 1 gets 2 slots
+ from GPU ``2``, and process 2 gets 4 slots from GPU ``1``, 1 slot from GPU
+ ``3``, and 2 slots from cryptography chip ``card0``.
+
+ ``<num>`` is a number from zero to ``CTEST_PROCESS_COUNT`` minus one.
+ ``<resource-type>`` is the name of a resource type, converted to uppercase.
+ ``CTEST_PROCESS_<num>_<resource-type>`` is defined for the product of each
+ ``<num>`` in the range listed above and each resource type listed in
+ ``CTEST_PROCESS_<num>``.
+
+ Because some platforms have case-insensitive names for environment variables,
+ resource type names must not clash when compared case-insensitively. For
+ this reason, and for the sake of simplicity, all resource types must be
+ listed in all lowercase in the
+ :ref:`hardware specification file <ctest-hardware-specification-file>` and in
+ the :prop_test:`PROCESSES` property, and they are converted to all uppercase
+ in the ``CTEST_PROCESS_<num>_<resource-type>`` environment variable.
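
As an illustration outside the patch itself, a test written in Python might decode these variables as sketched below. The function name ``read_allocation`` is hypothetical, and ``EXAMPLE_ENV`` simply repeats the example values listed above. The parsing follows the format described in this section and assumes well-formed input.

```python
import os


def read_allocation(environ=None):
    """Decode the CTEST_PROCESS_* variables into a list with one dict per process.

    Each dict maps a resource type to a list of (id, slots) tuples.
    Returns None when CTEST_PROCESS_COUNT is not defined, i.e. when no
    hardware specification file was given. Hypothetical helper for illustration.
    """
    env = os.environ if environ is None else environ
    count = env.get("CTEST_PROCESS_COUNT")
    if count is None:
        return None
    processes = []
    for num in range(int(count)):
        allocation = {}
        # CTEST_PROCESS_<num> lists the resource types, separated by commas.
        for rtype in env[f"CTEST_PROCESS_{num}"].split(","):
            # The per-type variable uses the uppercased resource type name.
            raw = env[f"CTEST_PROCESS_{num}_{rtype.upper()}"]
            pairs = []
            for pair in raw.split(";"):
                id_part, slots_part = pair.split(",")
                rid = id_part[len("id:"):]
                slots = int(slots_part[len("slots:"):])
                pairs.append((rid, slots))
            allocation[rtype] = pairs
        processes.append(allocation)
    return processes


# The example values from this section.
EXAMPLE_ENV = {
    "CTEST_PROCESS_COUNT": "3",
    "CTEST_PROCESS_0": "gpus",
    "CTEST_PROCESS_1": "gpus",
    "CTEST_PROCESS_2": "crypto_chips,gpus",
    "CTEST_PROCESS_0_GPUS": "id:0,slots:2",
    "CTEST_PROCESS_1_GPUS": "id:2,slots:2",
    "CTEST_PROCESS_2_GPUS": "id:1,slots:4;id:3,slots:1",
    "CTEST_PROCESS_2_CRYPTO_CHIPS": "id:card0,slots:2",
}
allocation = read_allocation(EXAMPLE_ENV)
```

Calling ``read_allocation()`` with no argument reads the real environment, so the same helper works both inside a test launched by CTest and in a standalone check like the one above.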
+
See Also
========