The Dataspace Interface (H5S)

1. Introduction

The dataspace interface (H5S) provides a mechanism for describing the positions of the elements of a dataset. It is designed so that new features can be added easily without disrupting applications that already use it. A dataset (defined with the dataset interface) is a collection of raw data points of homogeneous type, defined with the datatype (H5T) interface, and organized according to a dataspace defined with this interface.

A dataspace describes where dataset elements are located. A dataspace is either a regular N-dimensional array of data points, called a simple dataspace, or a more general collection of data points organized in another manner, called a complex dataspace. A scalar dataspace is a special case of the simple dataspace and is defined to be a 0-dimensional dataspace holding a single data point. Only scalar and simple dataspaces are supported in this version of the H5S interface; complex dataspaces will be defined and implemented in a future version. Complex dataspaces are intended for structures that are awkward to express as simple dataspaces, such as irregularly gridded data or adaptive mesh refinement data. This interface provides functions to set and query properties of a dataspace.

Operations on a dataspace include defining or extending the extent of the dataspace, selecting portions of the dataspace for I/O, and storing the dataspace in the file. The extent of a dataspace is the range of coordinates over which dataset elements are defined and stored. Dataspace selections are subsets of the extent (up to the entire extent) which are selected for some operation.

For example, a 2-dimensional dataspace with an extent of 10 by 10 may have the following very simple selection:

  0123456789
0 ----------
1 -XXX------
2 -XXX------
3 -XXX------
4 -XXX------
5 -XXX------
6 ----------
7 ----------
8 ----------
9 ----------

Example 1: Contiguous rectangular selection

Or, a more complex selection may be defined:

  0123456789
0 ----------
1 -XXX--X---
2 -X-X------
3 -X-X--X---
4 -X-X------
5 -XXX--X---
6 ----------
7 --XXXX----
8 ----------
9 ----------

Example 2: Non-contiguous selection

Selections within dataspaces have an offset, which locates the selection within the extent of the dataspace. Selection offsets default to 0 in each dimension, but may be changed to move the selection within a dataspace. In Example 2 above, if the offset were changed to (1,1), the selection would look like this:

  0123456789
0 ----------
1 ----------
2 --XXX--X--
3 --X-X-----
4 --X-X--X--
5 --X-X-----
6 --XXX--X--
7 ----------
8 ---XXXX---
9 ----------

Example 3: Non-contiguous selection with 1,1 offset
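The effect of an offset can be sketched in plain C without the HDF5 library. The names below (`example2`, `shift_selection`) are hypothetical and exist only for this illustration; the grid-of-characters representation is an assumption, not how the library stores selections.

```c
#include <assert.h>
#include <string.h>

#define N 10 /* the extent is N by N, as in the examples above */

/* Example 2, one string per row; 'X' marks a selected element. */
const char example2[N][N + 1] = {
    "----------", "-XXX--X---", "-X-X------", "-X-X--X---", "-X-X------",
    "-XXX--X---", "----------", "--XXXX----", "----------", "----------"
};

/* Write into dst the selection src moved by (drow, dcol).
 * Returns 0 on success, or -1 if any selected element would fall
 * outside the extent (dst is then only partly filled in). */
int shift_selection(const char src[N][N + 1], char dst[N][N + 1],
                    int drow, int dcol)
{
    assert(src != NULL && dst != NULL);

    /* Start from an empty selection. */
    for (int r = 0; r < N; r++) {
        memset(dst[r], '-', N);
        dst[r][N] = '\0';
    }

    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            if (src[r][c] != 'X')
                continue;
            int nr = r + drow;
            int nc = c + dcol;
            if (nr < 0 || nr >= N || nc < 0 || nc >= N)
                return -1; /* offset pushes the selection out of the extent */
            dst[nr][nc] = 'X';
        }
    }
    return 0;
}
```

Calling `shift_selection(example2, out, 1, 1)` fills `out` with the rows shown in Example 3. An offset that pushes any selected element past the edge of the extent, such as (0, 4) here, is reported as an error instead.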

Selections also have a linearization ordering of the points selected (defaulting to "C" order, i.e. last dimension changing fastest). The linearization order may be specified for each point or it may be chosen by the axis of the dataspace. For example, writing each point as (column, row) in the diagrams above, the default "C" ordering iterates through Example 1's selected points in this order: (1,1), (2,1), (3,1), (1,2), (2,2), etc. With "FORTRAN" ordering, Example 1's selected points would be iterated through in this order: (1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), etc.
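The two orderings can be reproduced with a pair of nested loops. The sketch below is plain C and does not use the HDF5 library; `point_t`, `list_c_order` and `list_fortran_order` are hypothetical names. It assumes that the points listed above are written as (column, row) relative to the diagram of Example 1, which is the reading under which the listed sequences match the selection.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int col, row; } point_t;

/* Example 1 selects columns 1..3 of rows 1..5. */
enum { COL0 = 1, NCOLS = 3, ROW0 = 1, NROWS = 5 };

/* "C" order: the column index (the last array dimension) changes fastest. */
size_t list_c_order(point_t *out, size_t cap)
{
    size_t n = 0;
    for (int r = ROW0; r < ROW0 + NROWS; r++) {
        for (int c = COL0; c < COL0 + NCOLS; c++) {
            assert(n < cap);
            out[n].col = c;
            out[n].row = r;
            n++;
        }
    }
    return n;
}

/* "FORTRAN" order: the row index (the first array dimension) changes fastest. */
size_t list_fortran_order(point_t *out, size_t cap)
{
    size_t n = 0;
    for (int c = COL0; c < COL0 + NCOLS; c++) {
        for (int r = ROW0; r < ROW0 + NROWS; r++) {
            assert(n < cap);
            out[n].col = c;
            out[n].row = r;
            n++;
        }
    }
    return n;
}
```

Both functions visit the same 15 points; only the sequence differs.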

A dataspace may be stored in the file as a permanent object, to allow many datasets to use a commonly defined dataspace. Dataspaces with extendable extents (i.e. unlimited dimensions) cannot be stored as permanent dataspaces.

Dataspaces may be created using an existing permanent dataspace as a container to locate the new dataspace within. These dataspaces are complete dataspaces and may be used to define datasets. A dataspace with a "parent" can be queried to determine the parent dataspace and the location within the parent. These dataspaces must currently have the same number of dimensions as the parent dataspace.

2. General Dataspace Operations

The functions defined in this section operate on dataspaces as a whole. New dataspaces can be created from scratch or copied from existing dataspaces. When a dataspace is no longer needed, its resources should be released by calling H5Sclose().
hid_t H5Screate(H5S_class_t type)
This function creates a new dataspace of a particular type. The types currently supported are H5S_SCALAR, H5S_SIMPLE, and H5S_NONE, although others are planned to be added later. The H5S_NONE dataspace can only hold a selection, not an extent.
hid_t H5Sopen(hid_t location, const char *name)
This function opens a permanent dataspace for use in an application. The location argument is a file or group ID and name is an absolute or relative path to the permanent dataspace. The dataspace ID which is returned is a handle to a permanent dataspace which can't be modified.
hid_t H5Scopy (hid_t space)
This function creates a new dataspace which is an exact copy of the dataspace space.
hid_t H5Ssubspace (hid_t space)
This function uses the currently defined selection and offset in space to create a dataspace which is located within space. The space dataspace must be a sharable dataspace located in the file, not a dataspace for a dataset. The relationship of the new dataspace within the existing dataspace is preserved when the new dataspace is used to create datasets. Currently, only subspaces which are equivalent to simple dataspaces (i.e. rectangular contiguous areas) are allowed. A subspace is not "simplified" or reduced in the number of dimensions used if the selection is "flat" in one dimension; subspaces always have the same number of dimensions as their parent dataspace.
herr_t H5Scommit (hid_t location, const char *name, hid_t space)
The dataspace specified by space is stored in the file specified by location. The location argument may be either a file or group handle, and name is an absolute or relative path to the location in which to store the dataspace. After this call, the dataspace is permanent and can't be modified.
herr_t H5Sclose (hid_t space)
Releases resources associated with a dataspace. Subsequent use of the dataspace identifier after this call is undefined.
H5S_class_t H5Sextent_class (hid_t space)
Queries a dataspace to determine its current class. The value returned is H5S_SCALAR or H5S_SIMPLE on success, or H5S_NO_CLASS on failure.

3. Dataspace Extent Operations

These functions operate on the extent portion of a dataspace.
herr_t H5Sset_extent_simple (hid_t space, int rank, const hsize_t *current_size, const hsize_t *maximum_size)
Sets or resets the size of an existing dataspace, where rank is the dimensionality, or number of dimensions, of the dataspace. current_size is an array of size rank which contains the new size of each dimension in the dataspace. maximum_size is an array of size rank which contains the maximum size of each dimension in the dataspace. Any previous extent is removed from the dataspace, the dataspace type is set to H5S_SIMPLE, and the extent is set as specified.
herr_t H5Sset_extent_none (hid_t space)
Removes the extent from a dataspace and sets the type to H5S_NONE.
herr_t H5Sextent_copy (hid_t dest_space, hid_t source_space)
Copies the extent from source_space to dest_space, which may change the type of the dataspace. Returns non-negative on success, negative on failure.
hsize_t H5Sextent_npoints (hid_t space)
This function determines the number of elements in a dataspace. For example, a simple 3-dimensional dataspace with dimensions 2, 3 and 4 would have 24 elements. Returns the number of elements in the dataspace, negative on failure.
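The element count is the product of the dimension sizes. A minimal sketch in plain C follows; `extent_npoints` is a hypothetical helper, not the library function. It treats rank 0 as holding one element, following the description of a scalar dataspace in the introduction.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a simple extent: the product of its dimension
 * sizes. With rank 0 the loop does not run and the result is 1. */
unsigned long long extent_npoints(int rank, const unsigned long long *dims)
{
    assert(rank >= 0);
    assert(rank == 0 || dims != NULL);

    unsigned long long n = 1;
    for (int d = 0; d < rank; d++)
        n *= dims[d];
    return n;
}
```

For dimensions 2, 3 and 4 this gives 2 × 3 × 4 = 24, matching the example in the text.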
int H5Sextent_ndims (hid_t space)
This function determines the dimensionality (or rank) of a dataspace. Returns the number of dimensions in the dataspace, negative on failure.
herr_t H5Sextent_dims (hid_t space, hsize_t *dims, hsize_t *max)
This function retrieves the size of the extent of the dataspace space, placing the size of each dimension in the array dims. It also retrieves the maximum size of the extent, placing the results in max. Returns non-negative on success, negative on failure.

4. Dataspace Selection Operations

Selections are maintained separately from extents in dataspaces, and operations on the selection of a dataspace do not affect its extent. Selections are independent of extent type, and the boundaries of selections are reconciled with the extent at the time of the data transfer. Selection offsets apply a selection to a location within an extent, allowing the same selection to be moved within the extent without requiring a new selection to be specified. Offsets default to 0 when the dataspace is created. Offsets are applied when an I/O transfer is performed (and checked during calls to H5Sselect_valid). Selections have an iteration order for the points selected. This can be any permutation of the dimensions involved (defaulting to 'C' array order) or, for selections composed of single array elements chosen with H5Sselect_elements, a specific order for the selected points. Selections can also be copied or combined in various ways with H5Sselect_op. Further methods of selecting portions of a dataspace may be added in the future.
herr_t H5Sselect_hyperslab (hid_t space, h5s_selopt_t op, const hssize_t * start, const hsize_t * stride, const hsize_t * count, const hsize_t * block)
This function selects a hyperslab region to add to the current selected region for the space dataspace. The start, stride, count and block arrays must be the same size as the rank of the dataspace. The selection operator op determines how the new selection is to be combined with the already existing selection for the dataspace. Currently, only the H5S_SELECT_SET operator is supported, which replaces the existing selection with the parameters from this call. Overlapping blocks are not supported with the H5S_SELECT_SET operator.

The start array determines the starting coordinates of the hyperslab to select. The stride array determines how many elements to move in each dimension between selected locations. A stride value of 1 moves to each element in that dimension of the dataspace; a value of 2 moves to every other element in that dimension. Stride values of 0 are not allowed. If the stride parameter is NULL, a contiguous hyperslab is selected (as if each value in the stride array were set to 1). The count array determines how many blocks to select from the dataspace in each dimension. The block array determines the size of the element block selected from the dataspace. If the block parameter is NULL, the block size defaults to a single element in each dimension (as if each value in the block array were set to 1).

For example, in a 2-dimensional dataspace, setting start to [1,1], stride to [4,4], count to [3,7] and block to [2,2] selects 21 2x2 blocks of array elements starting with location (1,1) and selecting blocks at locations (1,1), (5,1), (9,1), (1,5), (5,5), etc.
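The arithmetic behind this example can be checked with a short sketch in plain C that does not call the HDF5 library. The `hyperslab_t` struct and the `slab_*` helpers are hypothetical names. The disjointness test (block no larger than stride wherever more than one block is requested) is an assumption about what "overlapping blocks" means.

```c
#include <assert.h>

#define RANK 2

typedef struct {
    long start[RANK];  /* coordinates of the first block */
    long stride[RANK]; /* distance between block origins */
    long count[RANK];  /* number of blocks in each dimension */
    long block[RANK];  /* size of each block in each dimension */
} hyperslab_t;

/* The parameters used in the example above. */
const hyperslab_t example_slab = { {1, 1}, {4, 4}, {3, 7}, {2, 2} };

/* Total number of blocks: the product of count[]. */
long slab_nblocks(const hyperslab_t *h)
{
    long n = 1;
    for (int d = 0; d < RANK; d++) {
        assert(h->count[d] > 0);
        n *= h->count[d];
    }
    return n;
}

/* Total number of elements: blocks times elements per block. */
long slab_npoints(const hyperslab_t *h)
{
    long n = slab_nblocks(h);
    for (int d = 0; d < RANK; d++) {
        assert(h->block[d] > 0);
        n *= h->block[d];
    }
    return n;
}

/* Origin of the block with per-dimension index idx[]. */
void slab_block_origin(const hyperslab_t *h, const long idx[RANK],
                       long origin[RANK])
{
    for (int d = 0; d < RANK; d++) {
        assert(idx[d] >= 0 && idx[d] < h->count[d]);
        origin[d] = h->start[d] + idx[d] * h->stride[d];
    }
}

/* Returns 1 if no two blocks share an element, 0 otherwise. */
int slab_blocks_disjoint(const hyperslab_t *h)
{
    for (int d = 0; d < RANK; d++)
        if (h->count[d] > 1 && h->block[d] > h->stride[d])
            return 0;
    return 1;
}
```

For `example_slab` there are 3 × 7 = 21 blocks and 21 × 4 = 84 selected elements. The blocks with indices (1,0), (2,0) and (0,1) have origins (5,1), (9,1) and (1,5), as listed in the text.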

Regions selected with this function call default to 'C' order iteration when I/O is performed.

herr_t H5Sselect_elements (hid_t space, h5s_selopt_t op, const size_t num_elements, const hssize_t *coord[])
This function selects array elements to be included in the selection for the space dataspace. The number of elements selected must be given in the num_elements parameter. The coord array is a two-dimensional array of size <dataspace rank> by <num_elements> (i.e. a list of coordinates in the array). The order of the element coordinates in the coord array also specifies the order in which the array elements are iterated through when I/O is performed. Duplicate coordinate locations are not checked for.

The selection operator op determines how the new selection is to be combined with the already existing selection for the dataspace. Currently, only the H5S_SELECT_SET operator is supported, which replaces the existing selection with the parameters from this call. When operators other than H5S_SELECT_SET are used to combine a new selection with an existing selection, the selection ordering is reset to 'C' array ordering.
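The consequence of list-ordered iteration can be illustrated with a small gather routine in plain C. This is a sketch under stated assumptions: `coord_t` and `gather_elements` are hypothetical, the data is a row-major two-dimensional array, and no HDF5 calls are involved.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t row, col; } coord_t;

/* Copy the listed elements of a row-major nrows x ncols array into
 * out[], in the order the coordinates are listed. Duplicates are not
 * detected, so a repeated coordinate is simply copied again.
 * Returns the number of elements copied, or -1 if a coordinate lies
 * outside the array. */
long gather_elements(const int *data, size_t nrows, size_t ncols,
                     const coord_t *coords, size_t n, int *out)
{
    assert(data != NULL && coords != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (coords[i].row >= nrows || coords[i].col >= ncols)
            return -1;
        out[i] = data[coords[i].row * ncols + coords[i].col];
    }
    return (long)n;
}
```

Listing the coordinates (2,1), (0,3), (2,1) copies the element at (2,1) first, then the element at (0,3), then the element at (2,1) a second time.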

herr_t H5Sselect_all (hid_t space)
This function selects the special H5S_SELECT_ALL region for the space dataspace. H5S_SELECT_ALL selects the entire dataspace for any dataspace it is applied to.
herr_t H5Sselect_none (hid_t space)
This function resets the selection region for the space dataspace so that it includes no elements.
herr_t H5Sselect_op (hid_t space1, h5s_selopt_t op, hid_t space2)
Uses space2 to perform an operation on space1. The valid operations for op are:
H5S_SELECT_COPY
Copies the selection from space2 into space1, removing any previously defined selection for space1. The selection order and offset are also copied to space1.
H5S_SELECT_UNION
Performs a set union of the selection of the dataspace space2 with the selection from the dataspace space1, with the result being stored in space1. The selection order for space1 is reset to 'C' order.
H5S_SELECT_INTERSECT
Performs a set intersection of the selection from space2 with space1, with the result being stored in space1. The selection order for space1 is reset to 'C' order.
H5S_SELECT_DIFFERENCE
Performs a set difference of the selection from space2 with space1, with the result being stored in space1. The selection order for space1 is reset to 'C' order.
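The four operations can be pictured as element-wise operations on two selection masks. The sketch below is plain C with hypothetical names (`sel_op_t`, `select_op_masks`); it does not use the HDF5 library. The text does not spell out the direction of the difference, so reading it as "space1 minus space2" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SEL_COPY, SEL_UNION, SEL_INTERSECT, SEL_DIFFERENCE } sel_op_t;

/* Combine mask sel2 into mask sel1, element by element. Each mask holds
 * n entries that are 1 (selected) or 0 (not selected). */
void select_op_masks(unsigned char *sel1, sel_op_t op,
                     const unsigned char *sel2, size_t n)
{
    assert(sel1 != NULL && sel2 != NULL);

    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case SEL_COPY:
            sel1[i] = sel2[i];
            break;
        case SEL_UNION:
            sel1[i] = (unsigned char)(sel1[i] || sel2[i]);
            break;
        case SEL_INTERSECT:
            sel1[i] = (unsigned char)(sel1[i] && sel2[i]);
            break;
        case SEL_DIFFERENCE:
            /* Assumed direction: keep what is in sel1 but not in sel2. */
            sel1[i] = (unsigned char)(sel1[i] && !sel2[i]);
            break;
        }
    }
}
```

With `sel1 = {1,1,0,0}` and `sel2 = {1,0,1,0}`, the union is `{1,1,1,0}`, the intersection is `{1,0,0,0}`, and the difference under the assumption above is `{0,1,0,0}`.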
herr_t H5Sselect_order (hid_t space, hsize_t perm_vector[])
This function selects the order in which to iterate through the dimensions of a dataspace when performing I/O on a selection. If a specific order has already been set for the selection with H5Sselect_elements, this function will remove it and use a dimension-oriented ordering on the selected elements. The elements of the perm_vector array must be unique and between 0 and the rank of the dataspace minus 1. The order of the elements in perm_vector specifies the order in which to iterate through the selection for each dimension of the dataspace. To iterate through a 3-dimensional dataspace selection in 'C' order, specify the elements of perm_vector as [0, 1, 2]; for FORTRAN order they would be [2, 1, 0]. Other orderings, such as [1, 2, 0], are also possible, but may execute more slowly.
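A permutation-driven iteration can be sketched as an odometer over the coordinates. The code below is plain C and does not call the HDF5 library; `perm_is_valid`, `next_point` and `MAX_RANK` are hypothetical. It assumes that the permutation lists dimensions from slowest-changing to fastest-changing, which is the reading under which [0, 1, 2] yields 'C' order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 8

/* Returns 1 if perm[] holds each of 0..rank-1 exactly once, else 0. */
int perm_is_valid(int rank, const unsigned *perm)
{
    unsigned char seen[MAX_RANK] = {0};

    assert(rank > 0 && rank <= MAX_RANK);
    for (int i = 0; i < rank; i++) {
        if (perm[i] >= (unsigned)rank || seen[perm[i]])
            return 0;
        seen[perm[i]] = 1;
    }
    return 1;
}

/* Advance coord[] to the next point of a box of size dims[].
 * perm[rank-1] names the fastest-changing dimension and perm[0] the
 * slowest. Returns 1 if coord[] now holds a new point, or 0 if the
 * previous point was the last one (coord[] is then back at all zeros). */
int next_point(int rank, const unsigned *dims, const unsigned *perm,
               unsigned *coord)
{
    for (int i = rank - 1; i >= 0; i--) {
        unsigned d = perm[i];
        if (++coord[d] < dims[d])
            return 1;
        coord[d] = 0; /* wrap this dimension and carry into the next */
    }
    return 0;
}
```

Starting from (0,0,0) in a 2 by 3 by 4 box, the permutation [0, 1, 2] steps to (0,0,1) first, while [2, 1, 0] steps to (1,0,0) and then (0,1,0). Either way all 24 points are visited once.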
hbool_t H5Sselect_valid (hid_t space)
This function verifies that the selection for a dataspace is within the extent of the dataspace, if the currently set offset for the dataspace is used. Returns TRUE if the selection is contained within the extent, FALSE if it is not contained within the extent and FAIL on error conditions (such as if the selection or extent is not defined).
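For a regular (hyperslab-style) selection, the containment test reduces to checking the first and last selected coordinate in each dimension after the offset is applied. The sketch below is plain C with a hypothetical helper, `selection_within_extent`. It is not the library routine, and it reports only inside or outside, without the separate error result described above. Treating a NULL offset as zero is an assumption made for convenience.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the regular selection described by start, stride, count
 * and block, moved by offset[], lies entirely inside an extent of size
 * dims[]; returns 0 otherwise. A NULL offset is treated as all zeros. */
int selection_within_extent(int rank, const long *start, const long *stride,
                            const long *count, const long *block,
                            const long *offset, const long *dims)
{
    assert(rank > 0);

    for (int d = 0; d < rank; d++) {
        assert(stride[d] > 0 && count[d] > 0 && block[d] > 0);

        /* Lowest and highest selected coordinate in this dimension. */
        long first = start[d] + (offset != NULL ? offset[d] : 0);
        long last = first + (count[d] - 1) * stride[d] + block[d] - 1;

        if (first < 0 || last >= dims[d])
            return 0;
    }
    return 1;
}
```

Taking Example 1 as 5 rows by 3 columns starting at (1,1) in a 10 by 10 extent, with dimensions ordered (row, column): an offset of (4,6) keeps the selection inside the extent, while (5,0) or (-2,0) moves it outside.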
hsize_t H5Sselect_npoints (hid_t space)
This function determines the number of elements in the current selection of a dataspace.
herr_t H5Soffset_simple (hid_t space, const hssize_t * offset)
Sets the offset of the simple dataspace space. The offset array must have the same number of elements as the dataspace has dimensions. If the offset array is set to NULL, the offset for the dataspace is reset to 0.

5. Misc. Dataspace Operations

herr_t H5Slock (hid_t space)
Locks the dataspace so that it cannot be modified or closed. When the library exits, the dataspace will be unlocked and closed.
hid_t H5Screate_simple(int rank, const hsize_t *current_size, const hsize_t *maximum_size)
This function is a "convenience" wrapper to create a simple dataspace and set its extent in one call. It is equivalent to calling H5Screate() and H5Sset_extent_simple() in two steps.
int H5Sis_subspace(hid_t space)
This function returns positive if space is located within another dataspace, zero if it is not, and negative on a failure.
char *H5Ssubspace_name(hid_t space)
This function returns the name of the named dataspace that space is located within. If space is not located within another dataspace, or an error occurs, NULL is returned. The application is responsible for freeing the string returned.
herr_t H5Ssubspace_location(hid_t space, hsize_t *loc)
If space is located within another dataspace, this function puts the location of the origin of space in the loc array. The loc array must be at least as large as the number of dimensions of space. If space is not located within another dataspace or an error occurs, a negative value is returned, otherwise a non-negative value is returned.

Robb Matzke
Quincey Koziol

Last modified: Thu May 28 15:12:04 EST 1998