author    Allen Byrne <50328838+byrnHDF@users.noreply.github.com>    2022-07-05 16:08:56 (GMT)
committer GitHub <noreply@github.com>    2022-07-05 16:08:56 (GMT)
commit    5e3017f31cf220852b37c6aad5daf6597cbc2da8 (patch)
tree      f4c9d9b541fc0d7eefb2ee2eaa35961b80feeff7 /src
parent    4fa879ab4140df94e0edccc258b3d4025dc8616a (diff)
Add datasets UG chapter, plus fix doxygen refs (#1841)
* Add datasets UG chapter, plus fix doxygen refs
* Remove duplicates
Diffstat (limited to 'src')
-rw-r--r--  src/H5Dmodule.h  2931
-rw-r--r--  src/H5Gpublic.h     2
-rw-r--r--  src/H5Tmodule.h     4
-rw-r--r--  src/H5Zpublic.h     2
4 files changed, 2932 insertions, 7 deletions
diff --git a/src/H5Dmodule.h b/src/H5Dmodule.h
index d04310e..e3742cb 100644
--- a/src/H5Dmodule.h
+++ b/src/H5Dmodule.h
@@ -32,29 +32,2954 @@
/** \page H5D_UG HDF5 Datasets
*
* \section sec_dataset HDF5 Datasets
+ *
* \subsection subsec_dataset_intro Introduction
+ *
+ * An HDF5 dataset is an object composed of a collection of data elements, or raw data, and
+ * metadata that stores a description of the data elements, data layout, and all other information
+ * necessary to write, read, and interpret the stored data. From the viewpoint of the application, the
+ * raw data is stored as a one-dimensional or multi-dimensional array of elements; those elements
+ * can be any of several numerical or character types, small arrays, or even compound types
+ * similar to C structs. The dataset object may have attribute objects. See the
+ * figure below.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig1.gif "Application view of a dataset"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * A dataset object is stored in a file in two parts: a header and a data array. The header contains
+ * information that is needed to interpret the array portion of the dataset, as well as metadata (or
+ * pointers to metadata) that describes or annotates the dataset. Header information includes the
+ * name of the object, its dimensionality, its number-type, information about how the data itself is
+ * stored on disk (the storage layout), and other information used by the library to speed up access
+ * to the dataset or maintain the file’s integrity.
+ *
+ * The HDF5 dataset interface, comprising the @ref H5D functions, provides a mechanism for managing
+ * HDF5 datasets including the transfer of data between memory and disk and the description of
+ * dataset properties.
+ *
+ * A dataset is used by other HDF5 APIs, either by name or by an identifier. For more information,
+ * \see \ref api-compat-macros.
+ *
+ * \subsubsection subsubsec_dataset_intro_link Link/Unlink
+ * A dataset can be added to a group with one of the H5Lcreate calls, and deleted from a group with
+ * #H5Ldelete. The link and unlink operations use the name of an object, which may be a dataset.
+ * The dataset does not have to be open to be linked or unlinked.
+ *
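+ * For illustration, the sketch below links an existing dataset under a second name and then removes
+ * the original link; the file name, dataset name, and alias are hypothetical.
+ *
+ * <em>A minimal sketch: link and unlink a dataset</em>
+ * \code
+ * hid_t file_id;
+ * herr_t status;
+ *
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Create a second (hard) link to the dataset "/dset"; the dataset itself need not be open
+ * status = H5Lcreate_hard(file_id, "/dset", file_id, "/dset_alias", H5P_DEFAULT, H5P_DEFAULT);
+ *
+ * // Remove the original link; the data remains reachable through "/dset_alias"
+ * status = H5Ldelete(file_id, "/dset", H5P_DEFAULT);
+ *
+ * status = H5Fclose(file_id);
+ * \endcode
+ *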
+ * \subsubsection subsubsec_dataset_intro_obj Object Reference
+ * A dataset may be the target of an object reference. The object reference is created by
+ * #H5Rcreate with the name of an object which may be a dataset and the reference type
+ * #H5R_OBJECT. The dataset does not have to be open to create a reference to it.
+ *
+ * An object reference may also refer to a region (selection) of a dataset. The reference is created
+ * with #H5Rcreate and a reference type of #H5R_DATASET_REGION.
+ *
+ * An object reference can be accessed by a call to #H5Rdereference. When the reference is to a
+ * dataset or dataset region, the #H5Rdereference call returns an identifier to the dataset just as if
+ * #H5Dopen has been called.
+ *
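+ * For illustration, a minimal sketch of creating and dereferencing an object reference to a dataset
+ * is shown below; the file and dataset names are hypothetical, and #H5Rdereference is used in its
+ * macro form.
+ *
+ * <em>A minimal sketch: object reference to a dataset</em>
+ * \code
+ * hid_t file_id, deref_id;
+ * hobj_ref_t ref;
+ * herr_t status;
+ *
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Create an object reference to the dataset "/dset"; no dataspace is needed for H5R_OBJECT
+ * status = H5Rcreate(&ref, file_id, "/dset", H5R_OBJECT, H5I_INVALID_HID);
+ *
+ * // Later, obtain a dataset identifier from the reference, as if H5Dopen had been called
+ * deref_id = H5Rdereference(file_id, H5P_DEFAULT, H5R_OBJECT, &ref);
+ *
+ * status = H5Dclose(deref_id);
+ * status = H5Fclose(file_id);
+ * \endcode
+ *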
+ * \subsubsection subsubsec_dataset_intro_attr Adding Attributes
+ * A dataset may have user-defined attributes which are created with #H5Acreate and accessed
+ * through the @ref H5A API. To create an attribute for a dataset, the dataset must be open, and the
+ * identifier is passed to #H5Acreate. The attributes of a dataset are discovered and opened using
+ * #H5Aopen_name, #H5Aopen_idx, or #H5Aiterate; these functions use the identifier of the dataset.
+ * An attribute can be deleted with #H5Adelete which also uses the identifier of the dataset.
+ *
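+ * For illustration, a minimal sketch of adding an attribute to an open dataset is shown below; the
+ * attribute name and value are hypothetical, and dataset_id is assumed to come from #H5Dcreate or
+ * #H5Dopen.
+ *
+ * <em>A minimal sketch: add an attribute to a dataset</em>
+ * \code
+ * hid_t dataset_id, attr_space, attr_id;
+ * int units = 3;
+ * herr_t status;
+ *
+ * // The dataset must be open; its identifier is passed to H5Acreate
+ * attr_space = H5Screate(H5S_SCALAR);
+ * attr_id = H5Acreate(dataset_id, "units", H5T_NATIVE_INT, attr_space, H5P_DEFAULT, H5P_DEFAULT);
+ * status = H5Awrite(attr_id, H5T_NATIVE_INT, &units);
+ *
+ * status = H5Aclose(attr_id);
+ * status = H5Sclose(attr_space);
+ * \endcode
+ *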
* \subsection subsec_dataset_function Dataset Function Summaries
+ * Functions that can be used with datasets (@ref H5D functions) and property list functions that can
+ * be used with datasets (@ref H5P functions) are listed below.
+ *
+ * <table>
+ * <caption>Dataset functions</caption>
+ * <tr>
+ * <th>Function</th>
+ * <th>Purpose</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Dcreate</td>
+ * <td>Creates a dataset at the specified location. The
+ * C function is a macro: \see \ref api-compat-macros.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dcreate_anon</td>
+ * <td>Creates a dataset in a file without linking it into the file structure.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dopen</td>
+ * <td>Opens an existing dataset. The C function is a macro: \see \ref api-compat-macros.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dclose</td>
+ * <td>Closes the specified dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_space</td>
+ * <td>Returns an identifier for a copy of the dataspace for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_space_status</td>
+ * <td>Determines whether space has been allocated for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_type</td>
+ * <td>Returns an identifier for a copy of the datatype for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_create_plist</td>
+ * <td>Returns an identifier for a copy of the dataset creation property list for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_access_plist</td>
+ * <td>Returns the dataset access property list associated with a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_offset</td>
+ * <td>Returns the dataset address in a file.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_storage_size</td>
+ * <td>Returns the amount of storage required for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dvlen_get_buf_size</td>
+ * <td>Determines the number of bytes required to store variable-length (VL) data.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dvlen_reclaim</td>
+ * <td>Reclaims VL datatype memory buffers.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dread</td>
+ * <td>Reads raw data from a dataset into a buffer.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dwrite</td>
+ * <td>Writes raw data from a buffer to a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Diterate</td>
+ * <td>Iterates over all selected elements in a dataspace.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dgather</td>
+ * <td>Gathers data from a selection within a memory buffer.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dscatter</td>
+ * <td>Scatters data into a selection within a memory buffer.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dfill</td>
+ * <td>Fills dataspace elements with a fill value in a memory buffer.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dset_extent</td>
+ * <td>Changes the sizes of a dataset’s dimensions.</td>
+ * </tr>
+ * </table>
+ *
+ * <table>
+ * <caption>Dataset creation property list functions (H5P)</caption>
+ * <tr>
+ * <th>Function</th>
+ * <th>Purpose</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_layout</td>
+ * <td>Sets the type of storage used to store the raw data for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_layout</td>
+ * <td>Returns the layout of the raw data for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_chunk</td>
+ * <td>Sets the size of the chunks used to store a chunked layout dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_chunk</td>
+ * <td>Retrieves the size of chunks for the raw data of a chunked layout dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_deflate</td>
+ * <td>Sets compression method and compression level.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_fill_value</td>
+ * <td>Sets the fill value for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_fill_value</td>
+ * <td>Retrieves a dataset fill value.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pfill_value_defined</td>
+ * <td>Determines whether the fill value is defined.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_fill_time</td>
+ * <td>Sets the time when fill values are written to a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_fill_time</td>
+ * <td>Retrieves the time when fill values are written to a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_alloc_time</td>
+ * <td>Sets the timing for storage space allocation.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_alloc_time</td>
+ * <td>Retrieves the timing for storage space allocation.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_filter</td>
+ * <td>Adds a filter to the filter pipeline.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pall_filters_avail</td>
+ * <td>Verifies that all required filters are available.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_nfilters</td>
+ * <td>Returns the number of filters in the pipeline.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_filter</td>
+ * <td>Returns information about a filter in a pipeline.
+ * The C function is a macro: \see \ref api-compat-macros.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_filter_by_id</td>
+ * <td>Returns information about the specified filter.
+ * The C function is a macro: \see \ref api-compat-macros.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pmodify_filter</td>
+ * <td>Modifies a filter in the filter pipeline.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Premove_filter</td>
+ * <td>Deletes one or more filters in the filter pipeline.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_fletcher32</td>
+ * <td>Sets up use of the Fletcher32 checksum filter.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_nbit</td>
+ * <td>Sets up use of the n-bit filter.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_scaleoffset</td>
+ * <td>Sets up use of the scale-offset filter.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_shuffle</td>
+ * <td>Sets up use of the shuffle filter.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_szip</td>
+ * <td>Sets up use of the Szip compression filter.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_external</td>
+ * <td>Adds an external file to the list of external files.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_external_count</td>
+ * <td>Returns the number of external files for a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_external</td>
+ * <td>Returns information about an external file.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_char_encoding</td>
+ * <td>Sets the character encoding used to encode a string. Use to set ASCII or UTF-8 character
+ * encoding for object names.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_char_encoding</td>
+ * <td>Retrieves the character encoding used to create a string.</td>
+ * </tr>
+ * </table>
+ *
+ * <table>
+ * <caption>Dataset access and data transfer property list functions (H5P)</caption>
+ * <tr>
+ * <th>Function</th>
+ * <th>Purpose</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_buffer</td>
+ * <td>Sets type conversion and background buffers.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_buffer</td>
+ * <td>Reads buffer settings.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_chunk_cache</td>
+ * <td>Sets the raw data chunk cache parameters.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_chunk_cache</td>
+ * <td>Retrieves the raw data chunk cache parameters.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_edc_check</td>
+ * <td>Sets whether to enable error-detection when reading a dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_edc_check</td>
+ * <td>Determines whether error-detection is enabled for dataset reads.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_filter_callback</td>
+ * <td>Sets user-defined filter callback function.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_data_transform</td>
+ * <td>Sets a data transform expression.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_data_transform</td>
+ * <td>Retrieves a data transform expression.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_type_conv_cb</td>
+ * <td>Sets user-defined datatype conversion callback function.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_type_conv_cb</td>
+ * <td>Gets user-defined datatype conversion callback function.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_hyper_vector_size</td>
+ * <td>Sets number of I/O vectors to be read/written in hyperslab I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_hyper_vector_size</td>
+ * <td>Retrieves number of I/O vectors to be read/written in hyperslab I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_btree_ratios</td>
+ * <td>Sets B-tree split ratios for a dataset transfer property list.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_btree_ratios</td>
+ * <td>Gets B-tree split ratios for a dataset transfer property list.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_vlen_mem_manager</td>
+ * <td>Sets the memory manager for variable-length datatype allocation in #H5Dread and
+ * #H5Dvlen_reclaim.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_vlen_mem_manager</td>
+ * <td>Gets the memory manager for variable-length datatype allocation in #H5Dread and
+ * #H5Dvlen_reclaim.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio</td>
+ * <td>Sets data transfer mode.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_dxpl_mpio</td>
+ * <td>Returns the data transfer mode.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio_chunk_opt</td>
+ * <td>Sets a flag specifying linked-chunk I/O or multi-chunk I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio_chunk_opt_num</td>
+ * <td>Sets a numeric threshold for linked-chunk I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio_chunk_opt_ratio</td>
+ * <td>Sets a ratio threshold for collective I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio_collective_opt</td>
+ * <td>Sets a flag governing the use of independent versus collective I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_multi_type</td>
+ * <td>Sets the type of data property for the MULTI driver.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_multi_type</td>
+ * <td>Retrieves the type of data property for the MULTI driver.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_small_data_block_size</td>
+ * <td>Sets the size of a contiguous block reserved for small data.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_small_data_block_size</td>
+ * <td>Retrieves the current small data block size setting.</td>
+ * </tr>
+ * </table>
+ *
* \subsection subsec_dataset_program Programming Model for Datasets
+ * This section explains the programming model for datasets.
+ *
* \subsubsection subsubsec_dataset_program_general General Model
+ *
+ * The programming model for using a dataset has three main phases:
+ * \li Obtain access to the dataset
+ * \li Operate on the dataset using the dataset identifier returned at access
+ * \li Release the dataset
+ *
+ * These three phases or steps are described in more detail below the figure.
+ *
+ * A dataset may be opened several times, and operations may be performed through several different
+ * identifiers to the same dataset. All of the operations affect the dataset, although the calling program
+ * must synchronize if necessary to serialize accesses.
+ *
+ * Note that the dataset remains open until every identifier is closed. The figure below shows the
+ * basic sequence of operations.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig2.gif "Dataset programming sequence"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Creation and data access operations may have optional parameters which are set with property
+ * lists. The general programming model is:
+ * \li Create property list of appropriate class (dataset create, dataset transfer)
+ * \li Set properties as needed; each type of property has its own format and datatype
+ * \li Pass the property list as a parameter of the API call
+ *
+ * The steps below describe the programming phases or steps for using a dataset.
+ * <h4>Step 1. Obtain Access</h4>
+ * A new dataset is created by a call to #H5Dcreate. If successful, the call returns an identifier for the
+ * newly created dataset.
+ *
+ * Access to an existing dataset is obtained by a call to #H5Dopen. This call returns an identifier for
+ * the existing dataset.
+ *
+ * An object reference may be dereferenced to obtain an identifier to the dataset it points to.
+ *
+ * In each of these cases, the successful call returns an identifier to the dataset. The identifier is
+ * used in subsequent operations until the dataset is closed.
+ *
+ * <h4>Step 2. Operate on the Dataset</h4>
+ * The dataset identifier can be used to write and read data to the dataset, to query and set
+ * properties, and to perform other operations such as adding attributes, linking in groups, and
+ * creating references.
+ *
+ * The dataset identifier can be used for any number of operations until the dataset is closed.
+ *
+ * <h4>Step 3. Close the Dataset</h4>
+ * When all operations are completed, the dataset identifier should be closed with a call to
+ * #H5Dclose. This releases the dataset.
+ *
+ * After the identifier is closed, it cannot be used for further operations.
+ *
* \subsubsection subsubsec_dataset_program_create Create Dataset
+ *
+ * A dataset is created and initialized with a call to #H5Dcreate. The dataset create operation sets
+ * permanent properties of the dataset:
+ * \li Name
+ * \li Dataspace
+ * \li Datatype
+ * \li Storage properties
+ *
+ * These properties cannot be changed for the life of the dataset, although the dataspace may be
+ * expanded up to its maximum dimensions.
+ *
+ * <h4>Name</h4>
+ * A dataset name is a sequence of alphanumeric ASCII characters. The full name would include a
+ * tracing of the group hierarchy from the root group of the file. An example is
+ * /rootGroup/groupA/subgroup23/dataset1. The local name, or relative name within the
+ * lowest-level group containing the dataset, would include none of the group hierarchy. An
+ * example is dataset1.
+ *
+ * <h4>Dataspace</h4>
+ * The dataspace of a dataset defines the number of dimensions and, for each dimension, the current
+ * size and the maximum size. The maximum dimension size can be a fixed value or the constant
+ * #H5S_UNLIMITED, in which case the actual dimension size can be changed with calls to
+ * #H5Dset_extent, up to the maximum set with the maxdims parameter in the #H5Screate_simple
+ * call that established the dataset’s original dimensions. The maximum dimension size is set when
+ * the dataset is created and cannot be changed.
+ *
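+ * For illustration, a minimal sketch of creating a dataset with one unlimited dimension is shown
+ * below; the identifiers, name, and sizes are hypothetical. An extendible dataset must use chunked
+ * storage, so a chunked dataset creation property list is supplied.
+ *
+ * <em>A minimal sketch: dataspace with an unlimited dimension</em>
+ * \code
+ * hid_t dataspace, dcpl, dataset;
+ * hsize_t dims[2] = {7, 8};
+ * hsize_t maxdims[2] = {H5S_UNLIMITED, 8};
+ * hsize_t chunk[2] = {7, 8};
+ * herr_t status;
+ *
+ * // Current size is 7 x 8; the first dimension may later be extended with H5Dset_extent
+ * dataspace = H5Screate_simple(2, dims, maxdims);
+ *
+ * dcpl = H5Pcreate(H5P_DATASET_CREATE);
+ * status = H5Pset_chunk(dcpl, 2, chunk);
+ *
+ * // 'file' is an open file identifier, as in the examples below
+ * dataset = H5Dcreate(file, "/extendible", H5T_NATIVE_INT, dataspace, H5P_DEFAULT, dcpl, H5P_DEFAULT);
+ *
+ * status = H5Dclose(dataset);
+ * status = H5Pclose(dcpl);
+ * status = H5Sclose(dataspace);
+ * \endcode
+ *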
+ * <h4>Datatype</h4>
+ * Raw data has a datatype which describes the layout of the raw data stored in the file. The
+ * datatype is set when the dataset is created and can never be changed. When data is transferred to
+ * and from the dataset, the HDF5 library will assure that the data is transformed to and from the
+ * stored format.
+ *
+ * <h4>Storage Properties</h4>
+ * Storage properties of the dataset are set when it is created. The optional inputs table below shows
+ * the categories of storage properties. The storage properties cannot be changed after the dataset is
+ * created.
+ *
+ * <h4>Filters</h4>
+ * When a dataset is created, optional filters are specified. The filters are added to the data transfer
+ * pipeline when data is read or written. The standard library includes filters to implement
+ * compression, data shuffling, and error detection code. Additional user-defined filters may also be
+ * used.
+ *
+ * The required filters are stored as part of the dataset, and the list may not be changed after the
+ * dataset is created. The HDF5 library automatically applies the filters whenever data is
+ * transferred.
+ *
+ * <h4>Summary</h4>
+ *
+ * A newly created dataset has no attributes and no data values. The dimensions, datatype, storage
+ * properties, and selected filters are set. The table below lists the required inputs, and the second
+ * table below lists the optional inputs.
+ *
+ * <table>
+ * <caption>Required inputs</caption>
+ * <tr>
+ * <th>Required Inputs</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Dataspace</td>
+ * <td>The shape of the array.</td>
+ * </tr>
+ * <tr>
+ * <td>Datatype</td>
+ * <td>The layout of the stored elements.</td>
+ * </tr>
+ * <tr>
+ * <td>Name</td>
+ * <td>The name of the dataset in the group.</td>
+ * </tr>
+ * </table>
+ *
+ * <table>
+ * <caption>Optional inputs</caption>
+ * <tr>
+ * <th>Optional Inputs</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Storage Layout</td>
+ * <td>How the data is organized in the file including chunking.</td>
+ * </tr>
+ * <tr>
+ * <td>Fill Value</td>
+ * <td>The behavior and value for uninitialized data.</td>
+ * </tr>
+ * <tr>
+ * <td>External Storage</td>
+ * <td>Option to store the raw data in an external file.</td>
+ * </tr>
+ * <tr>
+ * <td>Filters</td>
+ * <td>Select optional filters to be applied. One of the filters that might be applied is compression.</td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Example</h4>
+ * To create a new dataset, go through the following general steps:
+ * <ul><li> Set dataset characteristics (optional where default settings are acceptable)
+ * <ul><li> Datatype</li>
+ * <li> Dataspace</li>
+ * <li> Dataset creation property list</li></ul></li>
+ * <li> Create the dataset</li>
+ * <li> Close the datatype, dataspace, and property list (as necessary)</li>
+ * <li> Close the dataset</li></ul>
+ *
+ * Example 1 below shows example code to create an empty dataset. The dataspace is 7 x 8, and the
+ * datatype is a little-endian integer. The dataset is created with the name “dset” and is a member of
+ * the root group, “/”.
+ *
+ * <em> Example 1. Create an empty dataset</em>
+ * \code
+ * hid_t dataset, datatype, dataspace;
+ * hsize_t dimsf[2];
+ * herr_t status;
+ *
+ * // Create dataspace: Describe the size of the array and create the dataspace for fixed-size dataset.
+ * dimsf[0] = 7;
+ * dimsf[1] = 8;
+ * dataspace = H5Screate_simple(2, dimsf, NULL);
+ *
+ * // Define datatype for the data in the file.
+ * // For this example, store little-endian integer numbers.
+ * datatype = H5Tcopy(H5T_NATIVE_INT);
+ * status = H5Tset_order(datatype, H5T_ORDER_LE);
+ *
+ * // Create a new dataset within the file using defined
+ * // dataspace and datatype. No properties are set.
+ * dataset = H5Dcreate(file, "/dset", datatype, dataspace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
+ * H5Dclose(dataset);
+ * H5Sclose(dataspace);
+ * H5Tclose(datatype);
+ * \endcode
+ *
+ * Example 2, below, shows example code to create a similar dataset with a fill value of ‘-1’. This
+ * code has the same steps as in the example above, but uses a non-default property list. A dataset
+ * creation property list is created, and then the fill value is set to the desired value. Then the
+ * property list is passed to the #H5Dcreate call.
+ *
+ * <em> Example 2. Create a dataset with fill value set</em>
+ * \code
+ * hid_t plist; // dataset creation property list
+ * hid_t dataset, datatype, dataspace;
+ * hsize_t dimsf[2];
+ * herr_t status;
+ * int fillval = -1;
+ *
+ * dimsf[0] = 7;
+ * dimsf[1] = 8;
+ * dataspace = H5Screate_simple(2, dimsf, NULL);
+ * datatype = H5Tcopy(H5T_NATIVE_INT);
+ * status = H5Tset_order(datatype, H5T_ORDER_LE);
+ *
+ * // Example of Dataset Creation property list: set fill value to '-1'
+ * plist = H5Pcreate(H5P_DATASET_CREATE);
+ * status = H5Pset_fill_value(plist, datatype, &fillval);
+ *
+ * // Same as above, but use the property list
+ * dataset = H5Dcreate(file, "/dset", datatype, dataspace, H5P_DEFAULT, plist, H5P_DEFAULT);
+ * H5Dclose(dataset);
+ * H5Sclose(dataspace);
+ * H5Tclose(datatype);
+ * H5Pclose(plist);
+ * \endcode
+ *
+ * After this code is executed, the dataset has been created and written to the file. The data array is
+ * uninitialized. Depending on the storage strategy and fill value options that have been selected,
+ * some or all of the space may be allocated in the file, and fill values may be written in the file.
+ *
* \subsubsection subsubsec_dataset_program_transfer Data Transfer Operations on a Dataset
+ * Data is transferred between memory and the raw data array of the dataset through #H5Dwrite and
+ * #H5Dread operations. A data transfer has the following basic steps:
+ * \li 1. Allocate and initialize memory space as needed
+ * \li 2. Define the datatype of the memory elements
+ * \li 3. Define the elements to be transferred (a selection, or all the elements)
+ * \li 4. Set data transfer properties (including parameters for filters or file drivers) as needed
+ * \li 5. Call the @ref H5D API
+ *
+ * Note that the location of the data in the file, the datatype of the data in the file, the storage
+ * properties, and the filters do not need to be specified because these are stored as a permanent part
+ * of the dataset. A selection of elements from the dataspace is specified; the selected elements may
+ * be the whole dataspace.
+ *
+ * The following figure shows a diagram of a write operation which
+ * transfers a data array from memory to a dataset in the file (usually on disk). A read operation has
+ * similar parameters with the data flowing the other direction.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig3.gif "A write operation"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Memory Space</h4>
+ * The calling program must allocate sufficient memory to store the data elements to be transferred.
+ * For a write (from memory to the file), the memory must be initialized with the data to be written
+ * to the file. For a read, the memory must be large enough to store the elements that will be read.
+ * The amount of storage needed can be computed from the memory datatype (which defines the
+ * size of each data element) and the number of elements in the selection.
+ *
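+ * For illustration, a minimal sketch of computing a read buffer size from the memory datatype and
+ * the selection is shown below; it assumes an open dataset identifier dataset_id and requires
+ * stdlib.h for malloc.
+ *
+ * <em>A minimal sketch: sizing the memory buffer</em>
+ * \code
+ * hid_t file_space;
+ * hssize_t npoints;
+ * size_t nbytes;
+ * int *buf;
+ *
+ * // Number of selected elements (here the whole dataspace) times the size of one memory element
+ * file_space = H5Dget_space(dataset_id);
+ * npoints = H5Sget_select_npoints(file_space);
+ * nbytes = (size_t)npoints * H5Tget_size(H5T_NATIVE_INT);
+ *
+ * buf = (int *)malloc(nbytes); // buffer passed later to H5Dread (requires stdlib.h)
+ *
+ * H5Sclose(file_space);
+ * \endcode
+ *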
+ * <h4>Memory Datatype</h4>
+ * The memory layout of a single data element is specified by the memory datatype. This specifies
+ * the size, alignment, and byte order of the element as well as the datatype class. Note that the
+ * memory datatype must be the same datatype class as the file, but may have different byte order
+ * and other properties. The HDF5 Library automatically transforms data elements between the
+ * source and destination layouts. For more information, see \ref sec_datatype.
+ *
+ * For a write, the memory datatype defines the layout of the data to be written; an example is IEEE
+ * floating-point numbers in native byte order. If the file datatype (defined when the dataset is
+ * created) is different but compatible, the HDF5 Library will transform each data element when it
+ * is written. For example, if the file byte order is different than the native byte order, the HDF5
+ * library will swap the bytes.
+ *
+ * For a read, the memory datatype defines the desired layout of the data to be read. This must be
+ * compatible with the file datatype, but should generally use native formats such as byte orders.
+ * The HDF5 library will transform each data element as it is read.
+ *
+ * <h4>Selection</h4>
+ * The data transfer will transfer some or all of the elements of the dataset depending on the
+ * dataspace selection. The selection has two dataspace objects: one for the source, and one for the
+ * destination. These objects describe which elements of the dataspace are to be transferred. Some
+ * (partial I/O) or all of the data may be transferred. Partial I/O is defined by defining hyperslabs or
+ * lists of elements in a dataspace object.
+ *
+ * The dataspace selection for the source defines the indices of the elements to be read or written.
+ * The two selections must define the same number of points, but the order and layout may be
+ * different. The HDF5 Library automatically selects and distributes the elements according to the
+ * selections. It might, for example, perform a scatter-gather operation or a sub-setting of the data.
+ *
+ * <h4>Data Transfer Properties</h4>
+ * For some data transfers, additional parameters should be set using the transfer property list. The
+ * table below lists the categories of transfer properties. These properties set parameters for the
+ * HDF5 Library and may be used to pass parameters for optional filters and file drivers. For
+ * example, transfer properties are used to select independent or collective operation when using
+ * MPI-I/O.
+ *
+ * <table>
+ * <caption>Categories of transfer properties</caption>
+ * <tr>
+ * <th>Properties</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Library parameters</td>
+ * <td>Internal caches, buffers, B-Trees, etc.</td>
+ * </tr>
+ * <tr>
+ * <td>Memory management</td>
+ * <td>Variable-length memory management, data overwrite</td>
+ * </tr>
+ * <tr>
+ * <td>File driver management</td>
+ * <td>Parameters for file drivers</td>
+ * </tr>
+ * <tr>
+ * <td>Filter management</td>
+ * <td>Parameters for filters</td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Data Transfer Operation (Read or Write)</h4>
+ * The data transfer is done by calling #H5Dread or #H5Dwrite with the parameters described above.
+ * The HDF5 Library constructs the required pipeline, which will scatter-gather, transform
+ * datatypes, apply the requested filters, and use the correct file driver.
+ *
+ * During the data transfer, the transformations and filters are applied to each element of the data in
+ * the required order until all the data is transferred.
+ *
+ * <h4>Summary</h4>
+ * To perform a data transfer, it is necessary to allocate and initialize memory, describe the source
+ * and destination, set required and optional transfer properties, and call the \ref H5D API.
+ *
+ * <h4>Examples</h4>
+ * The basic procedure to write to a dataset is the following:
+ * \li Open the dataset.
+ * \li Set the dataset dataspace for the write (optional if dataspace is #H5S_ALL).
+ * \li Write data.
+ * \li Close the datatype, dataspace, and property list (as necessary).
+ * \li Close the dataset.
+ *
+ * Example 3 below shows example code to write a 4 x 6 array of integers. In the example, the data
+ * is initialized in the memory array dset_data. The dataset has already been created in the file, so it
+ * is opened with #H5Dopen.
+ *
+ * The data is written with #H5Dwrite. The arguments are the dataset identifier, the memory
+ * datatype (#H5T_NATIVE_INT), the memory and file selections (#H5S_ALL in this case: the
+ * whole array), and the default (empty) property list. The last argument is the data to be
+ * transferred.
+ *
+ * <em> Example 3. Write an array of integers</em>
+ * \code
+ * hid_t file_id, dataset_id; // identifiers
+ * herr_t status;
+ * int i, j, dset_data[4][6];
+ *
+ * // Initialize the dataset.
+ * for (i = 0; i < 4; i++)
+ * for (j = 0; j < 6; j++)
+ * dset_data[i][j] = i * 6 + j + 1;
+ *
+ * // Open an existing file.
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Open an existing dataset.
+ * dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
+ *
+ * // Write the entire dataset, using 'dset_data': memory type is 'native int'
+ * // write the entire dataspace to the entire dataspace, no transfer properties
+ * status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
+ *
+ * status = H5Dclose(dataset_id);
+ * \endcode
+ *
+ * Example 4 below shows a similar write except for setting a non-default value for the transfer
+ * buffer. The code is the same as Example 3, but a transfer property list is created, and the desired
+ * buffer size is set. The #H5Dwrite function has the same arguments, but uses the property list to set
+ * the buffer.
+ *
+ * <em> Example 4. Write an array using a property list</em>
+ * \code
+ * hid_t file_id, dataset_id;
+ * hid_t xferplist;
+ * herr_t status;
+ * int i, j, dset_data[4][6];
+ *
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ * dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
+ *
+ * // Example: set type conversion buffer to 64MB
+ * xferplist = H5Pcreate(H5P_DATASET_XFER);
+ * status = H5Pset_buffer( xferplist, 64 * 1024 *1024, NULL, NULL);
+ *
+ * // Write the entire dataset, using 'dset_data': memory type is 'native int'
+ * // write the entire dataspace to the entire dataspace, set the buffer size with the property list
+ * status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, xferplist, dset_data);
+ *
+ * status = H5Dclose(dataset_id);
+ * \endcode
+ *
+ * The basic procedure to read from a dataset is the following:
+ * \li Open the dataset.
+ * \li Get the dataset dataspace (if using #H5S_ALL); otherwise, define the dataset dataspace of the read.
+ * \li Define the memory dataspace of the read (optional if dataspace is #H5S_ALL).
+ * \li Define the memory datatype (optional).
+ * \li Define the memory buffer.
+ * \li Read data.
+ * \li Close the datatype, dataspace, and property list (as necessary).
+ * \li Close the dataset.
+ *
+ * The example below shows code that reads a 4 x 6 array of integers from a dataset called “dset1”.
+ * First, the dataset is opened. The #H5Dread call has parameters:
+ * \li The dataset identifier (from #H5Dopen)
+ * \li The memory datatype (#H5T_NATIVE_INT)
+ * \li The memory and file dataspace (#H5S_ALL, the whole array)
+ * \li A default (empty) property list
+ * \li The memory to be filled
+ *
+ * <em> Example 5. Read an array from a dataset</em>
+ * \code
+ * hid_t file_id, dataset_id;
+ * herr_t status;
+ * int i, j, dset_data[4][6];
+ *
+ * // Open an existing file.
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Open an existing dataset.
+ * dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
+ *
+ * // read the entire dataset, into 'dset_data': memory type is 'native int'
+ * // read the entire dataspace to the entire dataspace, no transfer properties,
+ * status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
+ *
+ * status = H5Dclose(dataset_id);
+ * \endcode
+ *
* \subsubsection subsubsec_dataset_program_read Retrieve the Properties of a Dataset
+ * The functions listed below allow the user to retrieve information regarding a dataset including
+ * the datatype, the dataspace, the dataset creation property list, and the total stored size of the data.
+ *
+ * <table>
+ * <caption>Retrieve dataset information</caption>
+ * <tr>
+ * <th>Query Function</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_space</td>
+ * <td>Retrieve the dataspace of the dataset as stored in the file.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_type</td>
+ * <td>Retrieve the datatype of the dataset as stored in the file.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_create_plist</td>
+ * <td>Retrieve the dataset creation properties.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dget_storage_size</td>
+ * <td>Retrieve the total bytes for all the data of the dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Dvlen_get_buf_size</td>
+ * <td>Retrieve the total bytes for all the variable-length data of the dataset.</td>
+ * </tr>
+ * </table>
+ *
+ * The example below illustrates how to retrieve dataset information.
+ *
+ * <em> Example 6. Retrieve dataset information</em>
+ * \code
+ * hid_t file_id, dataset_id;
+ * hid_t dspace_id, dtype_id, plist_id;
+ * herr_t status;
+ *
+ * // Open an existing file.
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Open an existing dataset.
+ * dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
+ * dspace_id = H5Dget_space(dataset_id);
+ * dtype_id = H5Dget_type(dataset_id);
+ * plist_id = H5Dget_create_plist(dataset_id);
+ *
+ * // use the objects to discover the properties of the dataset
+ *
+ * // close the identifiers when they are no longer needed
+ * status = H5Pclose(plist_id);
+ * status = H5Tclose(dtype_id);
+ * status = H5Sclose(dspace_id);
+ * status = H5Dclose(dataset_id);
+ * \endcode
+ *
* \subsection subsec_dataset_transfer Data Transfer
+ * The HDF5 library implements data transfers through a pipeline which implements data
+ * transformations (according to the datatype and selections), chunking (as requested), and I/O
+ * operations using different mechanisms (file drivers). The pipeline is automatically configured by
+ * the HDF5 library. Metadata is stored in the file so that the correct pipeline can be constructed to
+ * retrieve the data. In addition, optional filters such as compression may be added to the standard
+ * pipeline.
+ *
+ * The figure below illustrates data layouts for different layers of an application using HDF5. The
+ * application data is organized as a multidimensional array of elements. The HDF5 format
+ * specification defines the stored layout of the data and metadata. The storage layout properties
+ * define the organization of the abstract data. This data is written to and read from some storage
+ * medium.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig4.gif "Data layouts in an application"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * The last stage of a write (and first stage of a read) is managed by an HDF5 file driver module.
+ * The virtual file layer of the HDF5 Library implements a standard interface to alternative I/O
+ * methods, including memory (AKA “core”) files, single serial file I/O, multiple file I/O, and
+ * parallel I/O. The file driver maps a simple abstract HDF5 file to the specific access methods.
+ *
+ * The raw data of an HDF5 dataset is conceived to be a multidimensional array of data elements.
+ * This array may be stored in the file according to several storage strategies:
+ * \li Contiguous
+ * \li Chunked
+ * \li Compact
+ *
+ * The storage strategy does not affect data access methods except that certain operations may be
+ * more or less efficient depending on the storage strategy and the access patterns.
+ *
+ * Overall, the data transfer operations (#H5Dread and #H5Dwrite) work identically for any storage
+ * method, for any file driver, and for any filters and transformations. The HDF5 library
+ * automatically manages the data transfer process. In some cases, transfer properties should or
+ * must be used to pass additional parameters such as MPI/IO directives when using the parallel file
+ * driver.
+ *
* \subsubsection subsubsec_dataset_transfer_pipe The Data Pipeline
+ * When data is written or read to or from an HDF5 file, the HDF5 library passes the data through a
+ * sequence of processing steps which are known as the HDF5 data pipeline. This data pipeline
+ * performs operations on the data in memory such as byte swapping, alignment, scatter-gather, and
+ * hyperslab selections. The HDF5 library automatically determines which operations are needed
+ * and manages the organization of memory operations such as extracting selected elements from a
+ * data block. The data pipeline modules operate on data buffers: each module processes a buffer
+ * and passes the transformed buffer to the next stage.
+ *
+ * The table below lists the stages of the data pipeline. The figure below the table shows the order
+ * of processing during a read or write.
+ *
+ * <table>
+ * <caption>Stages of the data pipeline</caption>
+ * <tr>
+ * <th>Layers</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>I/O initiation</td>
+ * <td>Initiation of HDF5 I/O activities (#H5Dwrite and #H5Dread) in a user’s application program.</td>
+ * </tr>
+ * <tr>
+ * <td>Memory hyperslab operation</td>
+ * <td>Data is scattered to (for read), or gathered from (for write) the application’s memory buffer
+ * (bypassed if no datatype conversion is needed).</td>
+ * </tr>
+ * <tr>
+ * <td>Datatype conversion</td>
+ * <td>Datatype is converted if it is different between memory and storage (bypassed if no datatype
+ * conversion is needed).</td>
+ * </tr>
+ * <tr>
+ * <td>File hyperslab operation</td>
+ * <td>Data is gathered from (for read), or scattered to (for write), file space in memory (bypassed
+ * if no datatype conversion is needed).</td>
+ * </tr>
+ * <tr>
+ * <td>Filter pipeline</td>
+ * <td>Data is processed by filters when it passes. Data can be modified and restored here (bypassed
+ * if no datatype conversion is needed, no filter is enabled, or dataset is not chunked).</td>
+ * </tr>
+ * <tr>
+ * <td>Virtual File Layer</td>
+ * <td>Facilitates the plug-in of file drivers such as MPIO or POSIX I/O.</td>
+ * </tr>
+ * <tr>
+ * <td>Actual I/O</td>
+ * <td>Actual file driver used by the library such as MPIO or STDIO.</td>
+ * </tr>
+ * </table>
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig5.gif "The processing order in the data pipeline"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * The HDF5 library automatically applies the stages as needed.
+ *
+ * When the memory dataspace selection is other than the whole dataspace, the memory hyperslab
+ * stage scatters/gathers the data elements between the application memory (described by the
+ * selection) and a contiguous memory buffer for the pipeline. On a write, this is a gather operation;
+ * on a read, this is a scatter operation.
+ *
+ * When the memory datatype is different from the file datatype, the datatype conversion stage
+ * transforms each data element. For example, if data is written from 32-bit big-endian memory,
+ * and the file datatype is 32-bit little-endian, the datatype conversion stage will swap the bytes of
+ * every element. Similarly, when data is read from the file to native memory, byte swapping will
+ * be applied automatically when needed.
+ *
+ * The file hyperslab stage is similar to the memory hyperslab stage, but manages the
+ * arrangement of the elements according to the dataspace selection. When data is read, data
+ * elements are gathered from the data blocks of the file to fill the contiguous buffers which are
+ * then processed by the pipeline. When data is written, the elements from a buffer are scattered to the
+ * data blocks of the file.
+ *
* \subsubsection subsubsec_dataset_transfer_filter Data Pipeline Filters
+ * In addition to the standard pipeline, optional stages, called filters, can be inserted in the pipeline.
+ * The standard distribution includes optional filters to implement compression and error checking.
+ * User applications may add custom filters as well.
+ *
+ * The HDF5 library distribution includes or employs several optional filters. These are listed in the
+ * table below. The filters are applied in the pipeline between the virtual file layer and the file
+ * hyperslab operation. See the figure above. The application can use any number of filters in any
+ * order.
+ *
+ * <table>
+ * <caption>Data pipeline filters</caption>
+ * <tr>
+ * <th>Filter</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>gzip compression</td>
+ * <td>Data compression using zlib.</td>
+ * </tr>
+ * <tr>
+ * <td>Szip compression</td>
+ * <td>Data compression using the Szip library. See The HDF Group website for more information
+ * regarding the Szip filter.</td>
+ * </tr>
+ * <tr>
+ * <td>N-bit compression</td>
+ * <td>Data compression using an algorithm specialized for n-bit datatypes.</td>
+ * </tr>
+ * <tr>
+ * <td>Scale-offset compression</td>
+ * <td>Data compression using a “scale and offset” algorithm.</td>
+ * </tr>
+ * <tr>
+ * <td>Shuffling</td>
+ * <td>To improve compression performance, data is regrouped by its byte position in the data
+ * unit. In other words, the 1st, 2nd, 3rd, and 4th bytes of integers are stored together
+ * respectively.</td>
+ * </tr>
+ * <tr>
+ * <td>Fletcher32</td>
+ * <td>Fletcher32 checksum for error-detection.</td>
+ * </tr>
+ * </table>
+ *
+ * Filters may be used only for chunked data and are applied to chunks of data between the file
+ * hyperslab stage and the virtual file layer. At this stage in the pipeline, the data is organized as
+ * fixed-size blocks of elements, and the filter stage processes each chunk separately.
+ *
+ * Filters are selected by dataset creation properties, and some behavior may be controlled by data
+ * transfer properties. The library determines what filters must be applied and applies them in the
+ * order in which they were set by the application. That is, if an application calls
+ * #H5Pset_shuffle and then #H5Pset_deflate when creating a dataset’s creation property list, the
+ * library will apply the shuffle filter first and then the deflate filter.
+ *
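+ * For illustration, a minimal sketch of setting the shuffle filter followed by the deflate (gzip)
+ * filter on a dataset creation property list is shown below; the chunk size and compression level
+ * are hypothetical. The filters require a chunked layout.
+ *
+ * <em>A minimal sketch: shuffle followed by deflate</em>
+ * \code
+ * hid_t dcpl;
+ * hsize_t chunk[2] = {64, 64};
+ * herr_t status;
+ *
+ * dcpl = H5Pcreate(H5P_DATASET_CREATE);
+ * status = H5Pset_chunk(dcpl, 2, chunk);
+ *
+ * // Filters are applied in the order they are set: shuffle first, then deflate at level 6
+ * status = H5Pset_shuffle(dcpl);
+ * status = H5Pset_deflate(dcpl, 6);
+ *
+ * // Pass 'dcpl' to H5Dcreate, then close it with H5Pclose when done
+ * \endcode
+ *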
+ * For more information,
+ * \li @see @ref subsubsec_dataset_filters_nbit
+ * \li @see @ref subsubsec_dataset_filters_scale
+ *
* \subsubsection subsubsec_dataset_transfer_drive File Drivers
+ * I/O is performed by the HDF5 virtual file layer. The file driver interface writes and reads blocks
+ * of data; each driver module implements the interface using different I/O mechanisms. The table
+ * below lists the file drivers currently supported. Note that the I/O mechanisms are separated from
+ * the pipeline processing: the pipeline and filter operations are identical no matter what data access
+ * mechanism is used.
+ *
+ * <table>
+ * <caption>I/O file drivers</caption>
+ * <tr>
+ * <th>File Driver</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_CORE</td>
+ * <td>Store in memory (optional backing store to disk file).</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_FAMILY</td>
+ * <td>Store in a set of files.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_LOG</td>
+ * <td>Store in logging file.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_MPIO</td>
+ * <td>Store using MPI/IO.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_MULTI</td>
+ * <td>Store in multiple files. There are several options to control layout.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_SEC2</td>
+ * <td>Serial I/O to file using Unix “section 2” functions.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5FD_STDIO</td>
+ * <td>Serial I/O to file using Unix “stdio” functions.</td>
+ * </tr>
+ * </table>
+ *
+ * Each file driver writes/reads contiguous blocks of bytes from a logically contiguous address
+ * space. The file driver is responsible for managing the details of the different physical storage
+ * methods.
+ *
+ * In serial environments, everything above the virtual file layer tends to work identically no matter
+ * what storage method is used.
+ *
+ * Some options may have substantially different performance depending on the file driver that is
+ * used. In particular, multi-file and parallel I/O may perform considerably differently from serial
+ * drivers depending on chunking and other settings.
+ *
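+ * For illustration, a minimal sketch of selecting a file driver through a file access property list
+ * is shown below; it uses the memory ("core") driver with a hypothetical 1 MiB allocation increment
+ * and a backing store on disk. The file name is illustrative.
+ *
+ * <em>A minimal sketch: selecting the core file driver</em>
+ * \code
+ * hid_t fapl, file_id;
+ * herr_t status;
+ *
+ * fapl = H5Pcreate(H5P_FILE_ACCESS);
+ * status = H5Pset_fapl_core(fapl, 1024 * 1024, 1); // increment, backing_store
+ *
+ * // All dataset I/O on this file now goes through the core driver
+ * file_id = H5Fcreate("core.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
+ *
+ * status = H5Pclose(fapl);
+ * status = H5Fclose(file_id);
+ * \endcode
+ *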
* \subsubsection subsubsec_dataset_transfer_props Data Transfer Properties to Manage the Pipeline
+ * Data transfer properties set optional parameters that control parts of the data pipeline. The
+ * function listing below shows transfer properties that control the behavior of the library.
+ *
+ * <table>
+ * <caption>Data transfer property list functions</caption>
+ * <tr>
+ * <th>C Function</th>
+ * <th>Purpose</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_buffer</td>
+ * <td>Sets the maximum size for the type conversion buffer and the background buffer. May also supply
+ * pointers to application-allocated buffers.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_hyper_vector_size</td>
+ * <td>Sets the number of "I/O vectors" (offset and length pairs) that are to be
+ * accumulated in memory before being issued to the lower levels
+ * of the library for reading or writing the actual data.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_btree_ratios</td>
+ * <td>Sets the B-tree split ratios for a dataset transfer property list. The split ratios determine
+ * what percent of children go in the first node when a node splits.</td>
+ * </tr>
+ * </table>
+ *
+ * Some filters and file drivers require or use additional parameters from the application program.
+ * These can be passed in the data transfer property list. The table below shows file driver property
+ * list functions.
+ *
+ * <table>
+ * <caption>File driver property list functions</caption>
+ * <tr>
+ * <th>C Function</th>
+ * <th>Purpose</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_dxpl_mpio</td>
+ * <td>Control the MPI I/O transfer mode (independent or collective) during data I/O operations.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_small_data_block_size</td>
+ * <td>Reserves blocks of size bytes for the contiguous storage of the raw data portion of small
+ * datasets. The HDF5 Library then writes the raw data from small datasets to this reserved space
+ * which reduces unnecessary discontinuities within blocks of metadata and improves
+ * I/O performance.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_edc_check</td>
+ * <td>Disable/enable EDC checking for read. When selected, EDC is always written.</td>
+ * </tr>
+ * </table>
+ *
+ * The transfer properties are set in a property list which is passed as a parameter of the #H5Dread or
+ * #H5Dwrite call. The transfer properties are passed to each pipeline stage. Each stage may use or
+ * ignore any property in the list. In short, there is one property list that contains all the properties.
+ *
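+ * For illustration, a minimal sketch of passing a transfer property to a read is shown below; it
+ * disables checksum verification for a dataset written with the Fletcher32 filter. The dataset
+ * identifier and buffer are hypothetical.
+ *
+ * <em>A minimal sketch: disabling error detection on read</em>
+ * \code
+ * hid_t xferplist;
+ * herr_t status;
+ * int dset_data[4][6];
+ *
+ * xferplist = H5Pcreate(H5P_DATASET_XFER);
+ * status = H5Pset_edc_check(xferplist, H5Z_DISABLE_EDC);
+ *
+ * // The property list is passed to the read; each pipeline stage uses the properties it needs
+ * status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, xferplist, dset_data);
+ *
+ * status = H5Pclose(xferplist);
+ * \endcode
+ *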
* \subsubsection subsubsec_dataset_transfer_store Storage Strategies
+ * The raw data is conceptually a multi-dimensional array of elements that is stored as a contiguous
+ * array of bytes. The data may be physically stored in the file in several ways. The table below lists
+ * the storage strategies for a dataset.
+ *
+ * <table>
+ * <caption> Dataset storage strategies</caption>
+ * <tr>
+ * <th>Storage Strategy</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Contiguous</td>
+ * <td>The dataset is stored as one continuous array of bytes.</td>
+ * </tr>
+ * <tr>
+ * <td>Chunked </td>
+ * <td>The dataset is stored as fixed-size chunks.</td>
+ * </tr>
+ * <tr>
+ * <td>Compact</td>
+ * <td>A small dataset is stored in the metadata header.</td>
+ * </tr>
+ * </table>
+ *
+ * The different storage strategies do not affect the data transfer operations of the dataset: reads and
+ * writes work the same for any storage strategy.
+ *
+ * These strategies are described in the following sections.
+ *
+ * <h4>Contiguous</h4>
+ * A contiguous dataset is stored in the file as a header and a single continuous array of bytes. See
+ * the figure below. In the case of a multi-dimensional array, the data is serialized in row-major order. By
+ * default, data is stored contiguously.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig6.gif "Contiguous data storage"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Contiguous storage is the simplest model. It has several limitations. First, the dataset must have a
+ * fixed size: it is not possible to extend the limit of the dataset or to have unlimited dimensions. In
+ * other words, if the size of the array might change over time, then chunked
+ * storage must be used instead of contiguous.
+ * as fixed-size blocks, compression and other filters cannot be used with contiguous data.
+ *
+ * <h4>Chunked</h4>
+ * The data of a dataset may be stored as fixed-size chunks. A chunk is a hyper-rectangle of any
+ * shape. When a dataset is chunked, each chunk is read or written as a single I/O
+ * operation, and individually passed from stage to stage of the data pipeline.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig7.gif "Chunked data storage"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Chunks may be any size and shape that fits in the dataspace of the dataset. For example, a three
+ * dimensional dataspace can be chunked as 3-D cubes, 2-D planes, or 1-D lines. The chunks may
+ * extend beyond the size of the dataspace. For example, a 3 x 3 dataset might be chunked in 2 x 2
+ * chunks. Sufficient chunks will be allocated to store the array, and any extra space will not be
+ * accessible. So, to store the 3 x 3 array, four 2 x 2 chunks would be allocated, with 7 unused
+ * elements stored.
+ *
+ * Chunked datasets can be unlimited in any direction and can be compressed or filtered.
+ *
+ * Since the data is read or written by chunks, chunking can have a dramatic effect on performance
+ * by optimizing what is read and written. Note, too, that for specific access patterns such as
+ * parallel I/O, decomposition into chunks can have a large impact on performance.
+ *
+ * Two restrictions have been placed on chunk shape and size:
+ * <ul><li> The rank of a chunk must be less than or equal to the rank of the dataset</li>
+ * <li> Chunk size cannot exceed the size of a fixed-size dataset; for example, a dataset consisting of
+ * a 5 x 4 fixed-size array cannot be defined with 10 x 10 chunks</li></ul>
+ *
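+ * For illustration, a minimal sketch of creating a chunked dataset is shown below; the 8 x 10
+ * dataspace, the 4 x 5 chunk shape, and the names are hypothetical and respect the restrictions
+ * above.
+ *
+ * <em>A minimal sketch: chunked storage layout</em>
+ * \code
+ * hid_t dataspace, dcpl, dataset;
+ * hsize_t dims[2] = {8, 10};
+ * hsize_t chunk[2] = {4, 5};
+ * herr_t status;
+ *
+ * dataspace = H5Screate_simple(2, dims, NULL);
+ *
+ * dcpl = H5Pcreate(H5P_DATASET_CREATE);
+ * status = H5Pset_chunk(dcpl, 2, chunk); // each 4 x 5 chunk is a separate I/O block
+ *
+ * // 'file' is an open file identifier
+ * dataset = H5Dcreate(file, "/chunked", H5T_NATIVE_INT, dataspace, H5P_DEFAULT, dcpl, H5P_DEFAULT);
+ *
+ * status = H5Dclose(dataset);
+ * status = H5Pclose(dcpl);
+ * status = H5Sclose(dataspace);
+ * \endcode
+ *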
+ * <h4>Compact</h4>
+ * For contiguous and chunked storage, the dataset header information and data are stored in two
+ * (or more) blocks. Therefore, at least two I/O operations are required to access the data: one to
+ * access the header, and one (or more) to access data. For a small dataset, this is considerable
+ * overhead.
+ *
+ * A small dataset may be stored in a continuous array of bytes in the header block using the
+ * compact storage option. This dataset can be read entirely in one operation which retrieves the
+ * header and data. The dataset must fit in the header. This may vary depending on the metadata
+ * that is stored. In general, a compact dataset should be approximately 30 KB or less total size.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig8.gif "Compact data storage"
+ * </td>
+ * </tr>
+ * </table>
+ *
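+ * For illustration, a minimal sketch of requesting compact storage for a small dataset is shown
+ * below; the names and sizes are hypothetical.
+ *
+ * <em>A minimal sketch: compact storage layout</em>
+ * \code
+ * hid_t dataspace, dcpl, dataset;
+ * hsize_t dims[1] = {100};
+ * herr_t status;
+ *
+ * dataspace = H5Screate_simple(1, dims, NULL);
+ *
+ * // Store the raw data in the dataset header; the data must fit in the header block
+ * dcpl = H5Pcreate(H5P_DATASET_CREATE);
+ * status = H5Pset_layout(dcpl, H5D_COMPACT);
+ *
+ * // 'file' is an open file identifier
+ * dataset = H5Dcreate(file, "/small", H5T_NATIVE_INT, dataspace, H5P_DEFAULT, dcpl, H5P_DEFAULT);
+ *
+ * status = H5Dclose(dataset);
+ * status = H5Pclose(dcpl);
+ * status = H5Sclose(dataspace);
+ * \endcode
+ *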
* \subsubsection subsubsec_dataset_transfer_partial Partial I/O Sub‐setting and Hyperslabs
+ * Data transfers can write or read some of the data elements of the dataset. This is controlled by
+ * specifying two selections: one for the source and one for the destination. Selections are specified
+ * by creating a dataspace with selections.
+ *
+ * Selections may be a union of hyperslabs or a list of points. A hyperslab is a contiguous hyper-
+ * rectangle from the dataspace. Selected fields of a compound datatype may be read or written. In
+ * this case, the selection is controlled by the memory and file datatypes.
+ *
+ * Summary of procedure:
+ * \li 1. Open the dataset
+ * \li 2. Define the memory datatype
+ * \li 3. Define the memory dataspace selection and file dataspace selection
+ * \li 4. Transfer data (#H5Dread or #H5Dwrite)
+ *
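+ * For illustration, a minimal sketch of reading a 2 x 3 hyperslab from a 4 x 6 dataset into a
+ * matching memory buffer is shown below; the dataset identifier and the offsets are hypothetical.
+ *
+ * <em>A minimal sketch: partial read through a hyperslab selection</em>
+ * \code
+ * hid_t file_space, mem_space;
+ * hsize_t start[2] = {1, 2}; // offset of the hyperslab in the file dataspace
+ * hsize_t count[2] = {2, 3}; // size of the hyperslab
+ * int slab[2][3];
+ * herr_t status;
+ *
+ * // Select the elements to read from the file
+ * file_space = H5Dget_space(dataset_id);
+ * status = H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);
+ *
+ * // The memory dataspace describes the destination buffer; here it matches the selection exactly
+ * mem_space = H5Screate_simple(2, count, NULL);
+ *
+ * status = H5Dread(dataset_id, H5T_NATIVE_INT, mem_space, file_space, H5P_DEFAULT, slab);
+ *
+ * status = H5Sclose(mem_space);
+ * status = H5Sclose(file_space);
+ * \endcode
+ *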
+ * For more information,
+ * @see @ref sec_dataspace
+ *
* \subsection subsec_dataset_allocation Allocation of Space in the File
+ * When a dataset is created, space is allocated in the file for its header and initial data. The amount
+ * of space allocated when the dataset is created depends on the storage properties. When the
+ * dataset is modified (data is written, attributes added, or other changes), additional storage may be
+ * allocated if necessary.
+ *
+ * <table>
+ * <caption>Initial dataset size</caption>
+ * <tr>
+ * <th>Object</th>
+ * <th>Size</th>
+ * </tr>
+ * <tr>
+ * <td>Header</td>
+ * <td>Variable, but typically around 256 bytes at the creation of a simple dataset with a simple
+ * datatype.</td>
+ * </tr>
+ * <tr>
+ * <td>Data</td>
+ * <td>Size of the data array (number of elements x size of element). Space allocated in
+ * the file depends on the storage strategy and the allocation strategy.</td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Header</h4>
+ * A dataset header consists of one or more header messages containing persistent metadata
+ * describing various aspects of the dataset. These records are defined in the HDF5 File Format
+ * Specification. The amount of storage required for the metadata depends on the metadata to be
+ * stored. The table below summarizes the metadata.
+ *
+ * <table>
+ * <caption>Metadata storage sizes</caption>
+ * <tr>
+ * <th>Header Information</th>
+ * <th>Approximate Storage Size</th>
+ * </tr>
+ * <tr>
+ * <td>Datatype (required)</td>
+ * <td>Bytes or more. Depends on type.</td>
+ * </tr>
+ * <tr>
+ * <td>Dataspace (required)</td>
+ * <td>Bytes or more. Depends on number of dimensions and hsize_t.</td>
+ * </tr>
+ * <tr>
+ * <td>Layout (required)</td>
+ * <td>Points to the stored data. Bytes or more. Depends on hsize_t and number of dimensions.</td>
+ * </tr>
+ * <tr>
+ * <td>Filters</td>
+ * <td>Depends on the number of filters. The size of the filter message depends on the name and
+ * data that will be passed.</td>
+ * </tr>
+ * </table>
+ *
+ * The header blocks also store the name and values of attributes, so the total storage depends on
+ * the number and size of the attributes.
+ *
+ * In addition, the dataset must have at least one link, including a name, which is stored in the file
+ * and in the group it is linked from.
+ *
+ * The different storage strategies determine when and how much space is allocated for the data
+ * array. See the discussion of fill values below for a detailed explanation of the storage allocation.
+ *
+ * <h4>Contiguous Storage</h4>
+ * For the contiguous storage option, the data is stored in a single, contiguous block in the file. The
+ * data is nominally a fixed-size, (number of elements x size of element). The figure below shows
+ * an example of a two dimensional array stored as a contiguous dataset.
+ *
+ * Depending on the fill value properties, the space may be allocated when the dataset is created or
+ * when first written (default), and filled with fill values if specified. For parallel I/O, by default the
+ * space is allocated when the dataset is created.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig9.gif "A two dimensional array stored as a contiguous dataset"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Chunked Storage</h4>
+ * For chunked storage, the data is stored in one or more chunks. Each chunk is a contiguous block
+ * in the file, but chunks are not necessarily stored contiguously. Each chunk has the same size. The
+ * data array has the same nominal size as a contiguous array (number of elements x size of
+ * element), but the storage is allocated in chunks, so the total size in the file can be larger than the
+ * nominal size of the array. See the figure below.
+ *
+ * If a fill value is defined, each chunk will be filled with the fill value. Chunks must be allocated
+ * before data is written to them, but they may be allocated when the dataset is created, as the
+ * dataset is extended, or when data is first written to the chunk.
+ *
+ * For serial I/O, by default chunks are allocated incrementally, as data is written to the chunk. For
+ * a sparse dataset, chunks are allocated only for the parts of the dataset that are written. In this
+ * case, if the dataset is extended, no storage is allocated.
+ *
+ * For parallel I/O, by default chunks are allocated when the dataset is created or extended with fill
+ * values written to the chunk.
+ *
+ * In either case, the default can be changed using fill value properties. For example, using serial
+ * I/O, the properties can select to allocate chunks when the dataset is created.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig10.gif "A two dimensional array stored in chunks"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * <h4>Changing Dataset Dimensions</h4>
+ * #H5Dset_extent is used to change the current dimensions of the dataset within the limits of the
+ * dataspace. Each dimension can be extended up to its maximum or unlimited. Extending the
+ * dataspace may or may not allocate space in the file and may or may not write fill values, if they
+ * are defined. See the example code below.
+ *
+ * The dimensions of the dataset can also be reduced. If the sizes specified are smaller than the
+ * dataset’s current dimension sizes, #H5Dset_extent will reduce the dataset’s dimension sizes to the
+ * specified values. It is the user’s responsibility to ensure that valuable data is not lost;
+ * #H5Dset_extent does not check.
+ *
+ * <em>Using #H5Dset_extent to increase the size of a dataset</em>
+ * \code
+ * hid_t file_id, dataset_id;
+ * herr_t status;
+ * hsize_t newdims[2];
+ *
+ * // Open an existing file.
+ * file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Open an existing dataset.
+ * dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
+ *
+ * // Example: dataset is 2 x 3, each dimension is UNLIMITED
+ * // extend to 2 x 7
+ * newdims[0] = 2;
+ * newdims[1] = 7;
+ * status = H5Dset_extent(dataset_id, newdims);
+ *
+ * // dataset is now 2 x 7
+ *
+ * status = H5Dclose(dataset_id);
+ * status = H5Fclose(file_id);
+ * \endcode
+ *
* \subsubsection subsubsec_dataset_allocation_store Storage Allocation in the File: Early, Incremental, Late
+ * The HDF5 Library implements several strategies for when storage is allocated and for whether, and
+ * when, it is filled with fill values for elements not yet written by the user. Different strategies are
+ * recommended for different storage layouts and file drivers. In particular, a parallel program
+ * needs storage allocated during a collective call (for example, create or extend), while serial
+ * programs may benefit from delaying the allocation until the data is written.
+ *
+ * Dataset creation properties control when to allocate space, when to write the fill value, and the
+ * actual fill value to write.
+ *
+ * <h4>When to Allocate Space</h4>
+ * The table below shows the options for when data is allocated in the file. Early allocation is done
+ * during the dataset create call. Certain file drivers (especially MPI-I/O and MPI-POSIX) require
+ * space to be allocated when a dataset is created, so all processors will have the correct view of the
+ * data.
+ *
+ * <table>
+ * <caption>File storage allocation options</caption>
+ * <tr>
+ * <th>Strategy</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Early</td>
+ * <td>Allocate storage for the dataset immediately when the dataset is created.</td>
+ * </tr>
+ * <tr>
+ * <td>Late</td>
+ * <td>Defer allocating space for storing the dataset until the dataset is written.</td>
+ * </tr>
+ * <tr>
+ * <td>Incremental</td>
+ * <td>Defer allocating space for storing each chunk until the chunk is written.</td>
+ * </tr>
+ * <tr>
+ * <td>Default</td>
+ * <td>Use the strategy (Early, Late, or Incremental) for the storage method and
+ * access method. This is the recommended strategy.</td>
+ * </tr>
+ * </table>
+ *
+ * Late allocation is done at the time of the first write to the dataset. Space for the whole dataset is
+ * allocated at the first write.
+ *
+ * Incremental allocation (chunks only) is done at the time of the first write to the chunk. Chunks
+ * that have never been written are not allocated in the file. In a sparsely populated dataset, this
+ * option allocates chunks only where data is actually written.
+ *
+ * The “Default” property selects the option recommended as appropriate for the storage method
+ * and access method. The defaults are shown in the table below. Note that Early allocation is
+ * recommended for all Parallel I/O, while other options are recommended as the default for serial
+ * I/O cases.
+ *
+ * <table>
+ * <caption>Default storage options</caption>
+ * <tr>
+ * <th>Storage Type</th>
+ * <th>Serial I/O</th>
+ * <th>Parallel I/O</th>
+ * </tr>
+ * <tr>
+ * <td>Contiguous</td>
+ * <td>Late</td>
+ * <td>Early</td>
+ * </tr>
+ * <tr>
+ * <td>Chunked</td>
+ * <td>Incremental</td>
+ * <td>Early</td>
+ * </tr>
+ * <tr>
+ * <td>Compact</td>
+ * <td>Early</td>
+ * <td>Early</td>
+ * </tr>
+ * </table>
+ *
+ * <h4>When to Write the Fill Value</h4>
+ * The second property is when to write the fill value. The possible values are “Never” and
+ * “Allocation”. The table below shows these options.
+ *
+ * <table>
+ * <caption>When to write fill values</caption>
+ * <tr>
+ * <th>When</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Never</td>
+ * <td>Fill value will never be written.</td>
+ * </tr>
+ * <tr>
+ * <td>Allocation</td>
+ * <td>Fill value is written when space is allocated. (Default for chunked and contiguous
+ * data storage.)</td>
+ * </tr>
+ * </table>
+ *
+ * <h4>What Fill Value to Write</h4>
+ * The third property is the fill value to write. The table below shows the values. By default, the
+ * data is filled with zeros. The application may choose no fill value (Undefined). In this case,
+ * uninitialized data may have random values. The application may define a fill value of an
+ * appropriate type. For more information, @see @ref subsec_datatype_fill.
+ *
+ * <table>
+ * <caption>Fill values to write</caption>
+ * <tr>
+ * <th>What to Write</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>Default</td>
+ * <td>By default, the library fills allocated space with zeros.</td>
+ * </tr>
+ * <tr>
+ * <td>Undefined</td>
+ * <td>Allocated space is not initialized with a fill value; it may contain arbitrary (random) values.</td>
+ * </tr>
+ * <tr>
+ * <td>User-defined</td>
+ * <td>The application specifies the fill value.</td>
+ * </tr>
+ * </table>
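+ *
+ * These properties are set on a dataset creation property list with #H5Pset_alloc_time,
+ * #H5Pset_fill_time, and #H5Pset_fill_value. The sketch below is only illustrative; the
+ * user-defined fill value of -1 is an assumption for this example.
+ *
+ * <em>Setting allocation time, fill time, and fill value (illustrative sketch)</em>
+ * \code
+ * hid_t dcpl_id;
+ * int   fill_value = -1; // illustrative user-defined fill value
+ *
+ * dcpl_id = H5Pcreate (H5P_DATASET_CREATE);
+ *
+ * // Allocate space early (when the dataset is created)
+ * H5Pset_alloc_time (dcpl_id, H5D_ALLOC_TIME_EARLY);
+ *
+ * // Write the fill value when space is allocated
+ * H5Pset_fill_time (dcpl_id, H5D_FILL_TIME_ALLOC);
+ *
+ * // Use a user-defined fill value instead of the default of zero
+ * H5Pset_fill_value (dcpl_id, H5T_NATIVE_INT, &fill_value);
+ *
+ * // ... create the dataset with dcpl_id, then close the list with H5Pclose ...
+ * \endcode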
+ *
+ * Together these three properties control the library’s behavior. The table below summarizes the
+ * possibilities during the dataset create-write-close cycle.
+ *
+ * <table>
+ * <caption>Storage allocation and fill summary</caption>
+ * <tr>
+ * <th>When to allocate space</th>
+ * <th>When to write fill value</th>
+ * <th>What fill value to write</th>
+ * <th>Library create-write-close behavior</th>
+ * </tr>
+ * <tr>
+ * <td>Early</td>
+ * <td>Never</td>
+ * <td>-</td>
+ * <td>Library allocates space when dataset is created, but never writes a fill value to dataset. A read
+ * of unwritten data returns undefined values.</td>
+ * </tr>
+ * <tr>
+ * <td>Late</td>
+ * <td>Never</td>
+ * <td>-</td>
+ * <td>Library allocates space when dataset is written to, but never writes a fill value to the dataset. A
+ * read of unwritten data returns undefined values.</td>
+ * </tr>
+ * <tr>
+ * <td>Incremental</td>
+ * <td>Never</td>
+ * <td>-</td>
+ * <td>Library allocates space when a dataset or chunk (whichever is the smallest unit of space)
+ * is written to, but it never writes a fill value to a dataset or a chunk. A read of unwritten data
+ * returns undefined values.</td>
+ * </tr>
+ * <tr>
+ * <td>-</td>
+ * <td>Allocation</td>
+ * <td>Undefined</td>
+ * <td>Error on creating the dataset. The dataset is not created.</td>
+ * </tr>
+ * <tr>
+ * <td>Early</td>
+ * <td>Allocation</td>
+ * <td>Default or User-defined</td>
+ * <td>Allocate space for the dataset when the dataset is created. Write the fill value (default or
+ * user-defined) to the entire dataset when the dataset is created.</td>
+ * </tr>
+ * <tr>
+ * <td>Late</td>
+ * <td>Allocation</td>
+ * <td>Default or User-defined</td>
+ * <td>Allocate space for the dataset when the application first writes data values to the dataset.
+ * Write the fill value to the entire dataset before writing application data values.</td>
+ * </tr>
+ * <tr>
+ * <td>Incremental</td>
+ * <td>Allocation</td>
+ * <td>Default or User-defined</td>
+ * <td>Allocate space for the dataset when the application first writes data values to the dataset or
+ * chunk (whichever is the smallest unit of space). Write the fill value to the entire dataset
+ * or chunk before writing application data values.</td>
+ * </tr>
+ * </table>
+ *
+ * During the #H5Dread function call, the library behavior depends on whether space has been
+ * allocated, whether the fill value has been written to storage, how the fill value is defined, and
+ * when to write the fill value. The table below summarizes the different behaviors.
+ *
+ * <table>
+ * <caption>H5Dread summary</caption>
+ * <tr>
+ * <th>Is space allocated in the file?</th>
+ * <th>What is the fill value?</th>
+ * <th>When to write the fill value?</th>
+ * <th>Library read behavior</th>
+ * </tr>
+ * <tr>
+ * <td>No</td>
+ * <td>Undefined</td>
+ * <td>anytime</td>
+ * <td>Error. Cannot create this dataset.</td>
+ * </tr>
+ * <tr>
+ * <td>No</td>
+ * <td>Default or User-defined</td>
+ * <td>anytime</td>
+ * <td>Fill the memory buffer with the fill value.</td>
+ * </tr>
+ * <tr>
+ * <td>Yes</td>
+ * <td>Undefined</td>
+ * <td>anytime</td>
+ * <td>Return data from storage (dataset). Trash is possible if the application has not written data
+ * to the portion of the dataset being read.</td>
+ * </tr>
+ * <tr>
+ * <td>Yes</td>
+ * <td>Default or User-defined</td>
+ * <td>Never</td>
+ * <td>Return data from storage (dataset). Trash is possible if the application has not written data
+ * to the portion of the dataset being read.</td>
+ * </tr>
+ * <tr>
+ * <td>Yes</td>
+ * <td>Default or User-defined</td>
+ * <td>Allocation</td>
+ * <td>Return data from storage (dataset).</td>
+ * </tr>
+ * </table>
+ *
+ * There are two cases to consider depending on whether the space in the file has been allocated
+ * before the read or not. When space has not yet been allocated and if a fill value is defined, the
+ * memory buffer will be filled with the fill values and returned. In other words, no data has been
+ * read from the disk. If space has been allocated, the values are returned from the stored data. The
+ * unwritten elements will be filled according to the fill value.
+ *
* \subsubsection subsubsec_dataset_allocation_delete Deleting a Dataset from a File and Reclaiming Space
+ * HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to
+ * reclaim the storage space occupied by a deleted object.
+ *
+ * Removing a dataset and reclaiming the space it used can be done with the #H5Ldelete function
+ * and the h5repack utility program. With the H5Ldelete function, links to a dataset can be removed
+ * from the file structure. After all the links have been removed, the dataset becomes inaccessible to
+ * any application and is effectively removed from the file. The way to recover the space occupied
+ * by an unlinked dataset is to write all of the objects of the file into a new file. Any unlinked object
+ * is inaccessible to the application and will not be included in the new file. Writing objects to a
+ * new file can be done with a custom program or with the h5repack utility program.
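+ *
+ * As a minimal sketch (the file and dataset names below are illustrative assumptions), the link to a
+ * dataset can be removed with #H5Ldelete and the remaining objects then copied to a new file with
+ * the h5repack utility:
+ *
+ * <em>Unlinking a dataset and repacking the file (illustrative sketch)</em>
+ * \code
+ * hid_t  file_id;
+ * herr_t status;
+ *
+ * file_id = H5Fopen ("orig.h5", H5F_ACC_RDWR, H5P_DEFAULT);
+ *
+ * // Remove the only link to the dataset; the object becomes inaccessible
+ * status = H5Ldelete (file_id, "/dset", H5P_DEFAULT);
+ *
+ * status = H5Fclose (file_id);
+ *
+ * // Then, from the command line, copy the remaining objects to a new file
+ * // to reclaim the unused space, for example:
+ * //     h5repack orig.h5 repacked.h5
+ * \endcode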
+ *
+ * For more information, @see @ref sec_group
+ *
* \subsubsection subsubsec_dataset_allocation_release Releasing Memory Resources
+ * The system resources required for HDF5 objects such as datasets, datatypes, and dataspaces
+ * should be released once access to the object is no longer needed. This is accomplished via the
+ * appropriate close function. This is not unique to datasets but a general requirement when
+ * working with the HDF5 Library; failure to close objects will result in resource leaks.
+ *
+ * In the case where a dataset is created or data has been transferred, there are several objects that
+ * must be closed. These objects include datasets, datatypes, dataspaces, and property lists.
+ *
+ * The application program must free any memory variables and buffers it allocates. When
+ * accessing data from the file, the amount of memory required can be determined by calculating
+ * the size of the memory datatype and the number of elements in the memory selection.
+ *
+ * Variable-length data are organized in two or more areas of memory. For more information,
+ * \see \ref h4_vlen_datatype "Variable-length Datatypes".
+ *
+ * When writing data, the application creates an array of
+ * vl_info_t which contains pointers to the elements. The elements might be, for example, strings.
+ * In the file, the variable-length data is stored in two parts: a heap with the variable-length values
+ * of the data elements and an array of vl_info_t elements. When the data is read, the amount of
+ * memory required for the heap can be determined with the #H5Dvlen_get_buf_size call.
+ *
+ * The data transfer property may be used to set a custom memory manager for allocating variable-
+ * length data for a #H5Dread. This is set with the #H5Pset_vlen_mem_manager call.
+ * To free the memory for variable-length data, it is necessary to visit each element, free the
+ * variable-length data, and reset the element. The application must free the memory it has
+ * allocated. For memory allocated by the HDF5 Library during a read, the #H5Dvlen_reclaim
+ * function can be used to perform this operation.
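+ *
+ * The sketch below, with illustrative file and dataset names, reads variable-length data and then
+ * lets the library free the per-element buffers it allocated during the read by calling
+ * #H5Dvlen_reclaim. The dataset is assumed to hold 10 variable-length sequences of int.
+ *
+ * <em>Reading and reclaiming variable-length data (illustrative sketch)</em>
+ * \code
+ * hid_t  file_id, dataset_id, vl_type, space_id;
+ * hvl_t  rdata[10];   // each hvl_t holds a length and a pointer to the element data
+ * herr_t status;
+ *
+ * file_id    = H5Fopen ("vlen.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
+ * dataset_id = H5Dopen (file_id, "/vlen_dset", H5P_DEFAULT);
+ *
+ * // Memory datatype: variable-length sequences of int
+ * vl_type  = H5Tvlen_create (H5T_NATIVE_INT);
+ * space_id = H5Dget_space (dataset_id);
+ *
+ * // The library allocates the buffer for each variable-length element during the read
+ * status = H5Dread (dataset_id, vl_type, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
+ *
+ * // ... use the data ...
+ *
+ * // Free the element buffers that were allocated by the library
+ * status = H5Dvlen_reclaim (vl_type, space_id, H5P_DEFAULT, rdata);
+ *
+ * H5Sclose (space_id);
+ * H5Tclose (vl_type);
+ * H5Dclose (dataset_id);
+ * H5Fclose (file_id);
+ * \endcode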
+ *
* \subsubsection subsubsec_dataset_allocation_ext External Storage Properties
+ * The external storage format allows data to be stored across a set of non-HDF5 files. A set of
+ * segments (offsets and sizes) in one or more files is defined as an external file list, or EFL, and
+ * the contiguous logical addresses of the data storage are mapped onto these segments. Currently,
+ * only the #H5D_CONTIGUOUS storage format allows external storage. External storage is
+ * enabled by a dataset creation property. The table below shows the API.
+ *
+ * <table>
+ * <caption>External storage API</caption>
+ * <tr>
+ * <th>Function</th>
+ * <th>Description</th>
+ * </tr>
+ * <tr>
+ * <td>#H5Pset_external</td>
+ * <td>This function adds a new segment to the end of the external file list of the specified dataset
+ * creation property list. The segment begins at the given byte offset in the named file and continues
+ * for the specified number of bytes. The space represented by this segment is adjacent to the space already represented by
+ * the external file list. The last segment in a file list may have the size #H5F_UNLIMITED, in
+ * which case the external file may be of unlimited size and no more files can be added to the
+ * external files list.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_external_count</td>
+ * <td>Calling this function returns the number of segments in an external file list. If the dataset
+ * creation property list has no external data, then zero is returned.</td>
+ * </tr>
+ * <tr>
+ * <td>#H5Pget_external</td>
+ * <td>This is the counterpart for the #H5Pset_external function. Given a dataset creation
+ * property list and a zero-based index into that list, the file name, byte offset, and segment
+ * size are returned through non-null arguments. At most name_size characters are copied into
+ * the name argument which is not null terminated if the file name is longer than the
+ * supplied name buffer (this is similar to strncpy()).</td>
+ * </tr>
+ * </table>
+ *
+ * The figure below shows an example of how a contiguous, one-dimensional dataset is partitioned
+ * into three parts and each of those parts is stored in a segment of an external file. The top
+ * rectangle represents the logical address space of the dataset while the bottom rectangle represents
+ * an external file.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig11.gif "External file storage"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * The example below shows code that defines the external storage for the example. Note that the
+ * segments are defined in order of the logical addresses they represent, not their order within the
+ * external file. It would also have been possible to put the segments in separate files. Care should
+ * be taken when setting up segments in a single file since the library does not automatically check
+ * for segments that overlap.
+ *
+ * <em>External storage</em>
+ * \code
+ * plist = H5Pcreate (H5P_DATASET_CREATE);
+ * H5Pset_external (plist, "velocity.data", 3000, 1000);
+ * H5Pset_external (plist, "velocity.data", 0, 2500);
+ * H5Pset_external (plist, "velocity.data", 4500, 1500);
+ * \endcode
+ *
+ * The figure below shows an example of how a contiguous, two-dimensional dataset is partitioned
+ * into three parts and each of those parts is stored in a separate external file. The top rectangle
+ * represents the logical address space of the dataset while the bottom rectangles represent external
+ * files.
+ *
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_fig12.gif "Partitioning a 2-D dataset for external storage"
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * The example below shows code for the partitioning described above. In this example, the library
+ * maps the multi-dimensional array onto a linear address space as defined by the HDF5 format
+ * specification, and then maps that address space into the segments defined in the external file list.
+ *
+ * <em>Partitioning a 2-D dataset for external storage</em>
+ * \code
+ * plist = H5Pcreate (H5P_DATASET_CREATE);
+ * H5Pset_external (plist, "scan1.data", 0, 24);
+ * H5Pset_external (plist, "scan2.data", 0, 24);
+ * H5Pset_external (plist, "scan3.data", 0, 16);
+ * \endcode
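+ *
+ * Continuing from the example above, the recorded segments could be examined with
+ * #H5Pget_external_count and #H5Pget_external, as in the following sketch; the name buffer size is
+ * an illustrative assumption.
+ *
+ * <em>Querying the external file list (illustrative sketch)</em>
+ * \code
+ * char    name[256];
+ * off_t   offset;
+ * hsize_t size;
+ * int     count, k;
+ *
+ * count = H5Pget_external_count (plist);
+ * for (k = 0; k < count; k++) {
+ *     H5Pget_external (plist, (unsigned)k, sizeof(name), name, &offset, &size);
+ *     printf ("segment %d: file=%s, offset=%ld, size=%llu\n",
+ *             k, name, (long)offset, (unsigned long long)size);
+ * }
+ * \endcode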
+ *
+ * The segments of an external file can exist beyond the end of the (external) file. The library reads
+ * that part of a segment as zeros. When writing to a segment that exists beyond the end of a file,
+ * the external file is automatically extended. Using this feature, one can create a segment (or set of
+ * segments) which is larger than the current size of the dataset. This allows the dataset to be
+ * extended at a future time (provided the dataspace also allows the extension).
+ *
+ * All referenced external data files must exist before performing raw data I/O on the dataset. This
+ * is normally not a problem since those files are being managed directly by the application or
+ * indirectly through some other library. However, if the file is transferred from its original context,
+ * care must be taken to assure that all the external files are accessible in the new location.
+ *
* \subsection subsec_dataset_filters Using HDF5 Filters
- * \subsubsection subsubsec_dataset_filters_nbit N‐bit Filter
- * \subsubsection subsubsec_dataset_filters_scale Scale‐offset Filter
- * \subsubsection subsubsec_dataset_filters_szip Szip Filter
+ * This section describes in detail how to use the n-bit, scale-offset, and szip filters.
+ *
+ * \subsubsection subsubsec_dataset_filters_nbit Using the N‐bit Filter
+ * N-bit data has n significant bits, where n may not correspond to a precise number of bytes. On
+ * the other hand, computing systems and applications universally, or nearly so, run most efficiently
+ * when manipulating data as whole bytes or multiple bytes.
+ *
+ * Consider the case of 12-bit integer data. In memory, that data will be handled in at least 2 bytes,
+ * or 16 bits, and on some platforms in 4 or even 8 bytes. The size of such a dataset can be
+ * significantly reduced when written to disk if the unused bits are stripped out.
+ *
+ * The n-bit filter is provided for this purpose, packing n-bit data on output by stripping off all
+ * unused bits and unpacking on input, restoring the extra bits required by the computational
+ * processor.
+ *
+ * <h4>N-bit Datatype</h4>
+ * An n-bit datatype is a datatype of n significant bits. Unless it is packed, an n-bit datatype is
+ * presented as an n-bit bitfield within a larger-sized value. For example, a 12-bit datatype might be
+ * presented as a 12-bit field in a 16-bit, or 2-byte, value.
+ *
+ * Currently, the datatype classes of n-bit datatype or n-bit field of a compound datatype or an array
+ * datatype are limited to integer or floating-point.
+ *
+ * The HDF5 user can create an n-bit datatype through a series of function calls. For example, the
+ * following calls create a 16-bit datatype that is stored in a 32-bit value with a 4-bit offset:
+ * \code
+ * hid_t nbit_datatype = H5Tcopy(H5T_STD_I32LE);
+ * H5Tset_precision(nbit_datatype, 16);
+ * H5Tset_offset(nbit_datatype, 4);
+ * \endcode
+ *
+ * In memory, one value of the above example n-bit datatype would be stored on a little-endian
+ * machine as follows:
+ * <table>
+ * <tr>
+ * <th>byte 3</th>
+ * <th>byte 2</th>
+ * <th>byte 1</th>
+ * <th>byte 0</th>
+ * </tr>
+ * <tr>
+ * <td>????????</td>
+ * <td>????SPPP</td>
+ * <td>PPPPPPPP</td>
+ * <td>PPPP????</td>
+ * </tr>
+ * <tr>
+ * <td colspan="4">
+ * <em>Note: Key: S - sign bit, P - significant bit, ? - padding bit. Sign bit is
+ * included in signed integer datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * <h4>N-bit Filter</h4>
+ * When data of an n-bit datatype is stored on disk using the n-bit filter, the filter packs the data by
+ * stripping off the padding bits; only the significant bits are retained and stored. The values on disk
+ * will appear as follows:
+ * <table>
+ * <tr>
+ * <th>1st value</th>
+ * <th>2nd value</th>
+ * <th>nth value</th>
+ * </tr>
+ * <tr>
+ * <td>SPPPPPPP PPPPPPPP</td>
+ * <td>SPPPPPPP PPPPPPPP</td>
+ * <td>...</td>
+ * </tr>
+ * <tr>
+ * <td colspan="3">
+ * <em>Note: Key: S - sign bit, P - significant bit, ? - padding bit. Sign bit is
+ * included in signed integer datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * <h4>How Does the N-bit Filter Work?</h4>
+ * The n-bit filter always compresses and decompresses according to dataset properties supplied by
+ * the HDF5 library in the datatype, dataspace, or dataset creation property list.
+ *
+ * The dataset datatype refers to how data is stored in an HDF5 file while the memory datatype
+ * refers to how data is stored in memory. The HDF5 library will do datatype conversion when
+ * writing data in memory to the dataset or reading data from the dataset to memory if the memory
+ * datatype differs from the dataset datatype. Datatype conversion is performed by HDF5 library
+ * before n-bit compression and after n-bit decompression.
+ *
+ * The following sub-sections examine the common cases:
+ * \li N-bit integer conversions
+ * \li N-bit floating-point conversions
+ *
+ * <h4>N-bit Integer Conversions</h4>
+ * Integer data with a dataset of integer datatype of less than full precision and a memory datatype
+ * of #H5T_NATIVE_INT, provides the simplest application of the n-bit filter.
+ *
+ * The precision of #H5T_NATIVE_INT is 8 multiplied by sizeof(int). This value, the size of an
+ * int in bytes, differs from platform to platform; we assume a value of 4 for the following
+ * illustration. We further assume the memory byte order to be little-endian.
+ *
+ * In memory, therefore, the precision of #H5T_NATIVE_INT is 32 and the offset is 0. One value of
+ * #H5T_NATIVE_INT is laid out in memory as follows:
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_NbitInteger1.gif "H5T_NATIVE_INT in memory"<br />
+ * <em>Note: Key: S - sign bit, P - significant bit, ? - padding bit. Sign bit is
+ * included in signed integer datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Suppose the dataset datatype has a precision of 16 and an offset of 4. After HDF5 converts
+ * values from the memory datatype to the dataset datatype, it passes something like the following
+ * to the n-bit filter for compression:
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_NbitInteger2.gif "Passed to the n-bit filter"<br />
+ * <em>Note: Key: S - sign bit, P - significant bit, ? - padding bit. Sign bit is
+ * included in signed integer datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Notice that only the specified 16 bits (15 significant bits and the sign bit) are retained in the
+ * conversion. All other significant bits of the memory datatype are discarded because the dataset
+ * datatype calls for only 16 bits of precision. After n-bit compression, none of these discarded bits,
+ * known as padding bits, will be stored on disk.
+ *
+ * <h4>N-bit Floating-point Conversions</h4>
+ * Things get more complicated in the case of a floating-point dataset datatype class. This sub-
+ * section provides an example that illustrates the conversion from a memory datatype of
+ * #H5T_NATIVE_FLOAT to a dataset datatype of class floating-point.
+ *
+ * As before, let the #H5T_NATIVE_FLOAT be 4 bytes long, and let the memory byte order be
+ * little-endian. Per the IEEE standard, one value of #H5T_NATIVE_FLOAT is laid out in memory
+ * as follows:
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_NbitFloating1.gif "H5T_NATIVE_FLOAT in memory"<br />
+ * <em>Note: Key: S - sign bit, E - exponent bit, M - mantissa bit, ? - padding bit. Sign bit is
+ * included in floating-point datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * Suppose the dataset datatype has a precision of 20, offset of 7, mantissa size of 13, mantissa
+ * position of 7, exponent size of 6, exponent position of 20, and sign position of 26. For more
+ * information, @see @ref subsubsec_datatype_program_define.
+ *
+ * After HDF5 converts values from the memory datatype to the dataset datatype, it passes
+ * something like the following to the n-bit filter for compression:
+ * <table>
+ * <tr>
+ * <td>
+ * \image html Dsets_NbitFloating2.gif "Passed to the n-bit filter"<br />
+ * <em>Note: Key: S - sign bit, E - exponent bit, M - mantissa bit, ? - padding bit. Sign bit is
+ * included in floating-point datatype precision.</em>
+ * </td>
+ * </tr>
+ * </table>
+ *
+ * The sign bit and truncated mantissa bits are not changed during datatype conversion by the
+ * HDF5 library. On the other hand, the conversion of the 8-bit exponent to a 6-bit exponent is a
+ * little tricky:
+ *
+ * The bias for the new exponent in the n-bit datatype is:
+ * <code>
+ * 2<sup>(n-1)</sup>-1
+ * </code>
+ *
+ * The following formula is used for this exponent conversion:<br />
+ * <code>
+ * exp8 - (2<sup>(8-1)</sup> -1) = exp6 - (2<sup>(6-1)</sup>-1) = actual exponent value
+ * </code><br />
+ * where exp8 is the stored decimal value as represented by the 8-bit exponent, and exp6 is the
+ * stored decimal value as represented by the 6-bit exponent.
+ *
+ * In this example, caution must be taken to ensure that, after conversion, the actual exponent value
+ * is within the range that can be represented by a 6-bit exponent. For example, an 8-bit exponent
+ * can represent values from -127 to 128 while a 6-bit exponent can represent values only from -31
+ * to 32.
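+ *
+ * As an illustrative worked example (the values are assumed for this sketch): a stored 8-bit exponent of
+ * 130 represents an actual exponent value of 130 - (2<sup>(8-1)</sup>-1) = 3; re-encoded with the 6-bit
+ * bias, it is stored as 3 + (2<sup>(6-1)</sup>-1) = 34, which is within the range a 6-bit exponent can
+ * represent.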
+ *
+ * <h4>N-bit Filter Behavior</h4>
+ * The n-bit filter was designed to treat the incoming data byte by byte at the lowest level. The
+ * purpose was to make the n-bit filter as generic as possible so that no pointer cast related to the
+ * datatype is needed.
+ *
+ * Bitwise operations are employed for packing and unpacking at the byte level.
+ *
+ * Recursive function calls are used to treat compound and array datatypes.
+ *
+ * <h4>N-bit Compression</h4>
+ * The main idea of n-bit compression is to use a loop to compress each data element in a chunk.
+ * Depending on the datatype of each element, the n-bit filter will call one of four functions. Each
+ * of these functions performs one of the following tasks:
+ * \li Compress a data element of a no-op datatype
+ * \li Compress a data element of an atomic datatype
+ * \li Compress a data element of a compound datatype
+ * \li Compress a data element of an array datatype
+ *
+ * No-op datatypes: The n-bit filter does not actually compress no-op datatypes. Rather, it copies
+ * the data buffer of the no-op datatype from the non-compressed buffer to the proper location in
+ * the compressed buffer; the compressed buffer has no holes. The term “compress” is used here
+ * simply to distinguish this function from the function that performs the reverse operation during
+ * decompression.
+ *
+ * Atomic datatypes: The n-bit filter will find the bytes where significant bits are located and try to
+ * compress these bytes, one byte at a time, using a loop. At this level, the filter needs the following
+ * information:
+ * <ul><li>The byte offset of the beginning of the current data element with respect to the
+ * beginning of the input data buffer</li>
+ * <li>Datatype size, precision, offset, and byte order</li></ul>
+ *
+ * The n-bit filter compresses from the most significant byte containing significant bits to the least
+ * significant byte. For big-endian data, therefore, the loop index progresses from smaller to larger
+ * while for little-endian, the loop index progresses from larger to smaller.
+ *
+ * In the extreme case of when the n-bit datatype has full precision, this function copies the content
+ * of the entire non-compressed datatype to the compressed output buffer.
+ *
+ * Compound datatypes: The n-bit filter will compress each data member of the compound
+ * datatype. If the member datatype is of an integer or floating-point datatype, the n-bit filter will
+ * call the function described above. If the member datatype is of a no-op datatype, the filter will
+ * call the function described above. If the member datatype is of a compound datatype, the filter
+ * will make a recursive call to itself. If the member datatype is of an array datatype, the filter will
+ * call the function described below.
+ *
+ * Array datatypes: The n-bit filter will use a loop to compress each array element in the array. If
+ * the base datatype of array element is of an integer or floating-point datatype, the n-bit filter will
+ * call the function described above. If the base datatype is of a no-op datatype, the filter will call
+ * the function described above. If the base datatype is of a compound datatype, the filter will call
+ * the function described above. If the member datatype is of an array datatype, the filter will make
+ * a recursive call of itself.
+ *
+ * <h4>N-bit Decompression</h4>
+ * The n-bit decompression algorithm is very similar to n-bit compression. The only difference is
+ * that at the byte level, compression packs out all padding bits and stores only significant bits into
+ * a continuous buffer (unsigned char) while decompression unpacks significant bits and inserts
+ * padding bits (zeros) at the proper positions to recover the data bytes as they existed before
+ * compression.
+ *
+ * <h4>Storing N-bit Parameters in the Array cd_values[]</h4>
+ * All of the information, or parameters, required by the n-bit filter are gathered and stored in the
+ * array cd_values[] by the private function H5Z__set_local_nbit and are passed to another private
+ * function, H5Z__filter_nbit, by the HDF5 Library.
+ * These parameters are as follows:
+ * \li Parameters related to the datatype
+ * \li The number of elements within the chunk
+ * \li A flag indicating whether compression is needed
+ *
+ * The first and second parameters can be obtained using the HDF5 dataspace and datatype
+ * interface calls.
+ *
+ * A compound datatype can have members of array or compound datatype. An array datatype’s
+ * base datatype can be a complex compound datatype. Recursive calls are required to set
+ * parameters for these complex situations.
+ *
+ * Before setting the parameters, the number of parameters should be calculated to dynamically
+ * allocate the array cd_values[], which will be passed to the HDF5 Library. This also requires
+ * recursive calls.
+ *
+ * For an atomic datatype (integer or floating-point), parameters that will be stored include the
+ * datatype’s size, endianness, precision, and offset.
+ *
+ * For a no-op datatype, only the size is required.
+ *
+ * For a compound datatype, parameters that will be stored include the datatype’s total size and
+ * number of members. For each member, its member offset needs to be stored. Other parameters
+ * for members will depend on the respective datatype class.
+ *
+ * For an array datatype, the total size parameter should be stored. Other parameters for the array’s
+ * base type depend on the base type’s datatype class.
+ *
+ * Further, to correctly retrieve the parameter for use of n-bit compression or decompression later,
+ * parameters for distinguishing between datatype classes should be stored.
+ *
+ * <h4>Implementation</h4>
+ * Three filter callback functions were written for the n-bit filter:
+ * \li H5Z__can_apply_nbit
+ * \li H5Z__set_local_nbit
+ * \li H5Z__filter_nbit
+ *
+ * These functions are called internally by the HDF5 library. A number of utility functions were
+ * written for the function H5Z__set_local_nbit. Compression and decompression functions were
+ * written and are called by function H5Z__filter_nbit. All these functions are included in the file
+ * H5Znbit.c.
+ *
+ * The public function #H5Pset_nbit is called by the application to set up the use of the n-bit filter.
+ * This function is included in the file H5Pdcpl.c. The application does not need to supply any
+ * parameters.
+ *
+ * <h4>How N-bit Parameters are Stored</h4>
+ * A scheme of storing parameters required by the n-bit filter in the array cd_values[] was
+ * developed utilizing recursive function calls.
+ *
+ * Four private utility functions were written for storing the parameters associated with atomic
+ * (integer or floating-point), no-op, array, and compound datatypes:
+ * \li H5Z__set_parms_atomic
+ * \li H5Z__set_parms_array
+ * \li H5Z__set_parms_nooptype
+ * \li H5Z__set_parms_compound
+ *
+ * The scheme is briefly described below.
+ *
+ * First, assign a numeric code for datatype class atomic (integer or float), no-op, array, and
+ * compound datatype. The code is stored before other datatype related parameters are stored.
+ *
+ * The first three parameters of cd_values[] are reserved for:
+ * \li 1. The number of valid entries in the array cd_values[]
+ * \li 2. A flag indicating whether compression is needed
+ * \li 3. The number of elements in the chunk
+ *
+ * Throughout the balance of this explanation, i represents the index of cd_values[].
+ * In the function H5Z__set_local_nbit:
+ * <ul><li>1. i = 2</li>
+ * <li>2. Get the number of elements in the chunk and store in cd_value[i]; increment i</li>
+ * <li>3. Get the class of the datatype:
+ * <ul><li>For an integer or floating-point datatype, call H5Z__set_parms_atomic</li>
+ * <li>For an array datatype, call H5Z__set_parms_array</li>
+ * <li>For a compound datatype, call H5Z__set_parms_compound</li>
+ * <li>For none of the above, call H5Z__set_parms_nooptype</li></ul></li>
+ * <li>4. Store i in cd_value[0] and flag in cd_values[1]</li></ul>
+ *
+ * In the function H5Z__set_parms_atomic:
+ * \li 1. Store the assigned numeric code for the atomic datatype in cd_value[i]; increment i
+ * \li 2. Get the size of the atomic datatype and store in cd_value[i]; increment i
+ * \li 3. Get the order of the atomic datatype and store in cd_value[i]; increment i
+ * \li 4. Get the precision of the atomic datatype and store in cd_value[i]; increment i
+ * \li 5. Get the offset of the atomic datatype and store in cd_value[i]; increment i
+ * \li 6. Determine the need to do compression at this point
+ *
+ * In the function H5Z__set_parms_nooptype:
+ * \li 1. Store the assigned numeric code for the no-op datatype in cd_value[i]; increment i
+ * \li 2. Get the size of the no-op datatype and store in cd_value[i]; increment i
+ *
+ * In the function H5Z__set_parms_array:
+ * <ul><li>1. Store the assigned numeric code for the array datatype in cd_value[i]; increment i</li>
+ * <li>2. Get the size of the array datatype and store in cd_value[i]; increment i</li>
+ * <li>3. Get the class of the array’s base datatype.
+ * <ul><li>For an integer or floating-point datatype, call H5Z__set_parms_atomic</li>
+ * <li>For an array datatype, call H5Z__set_parms_array</li>
+ * <li>For a compound datatype, call H5Z__set_parms_compound</li>
+ * <li>If none of the above, call H5Z__set_parms_nooptype</li></ul></li></ul>
+ *
+ * In the function H5Z__set_parms_compound:
+ * <ul><li>1. Store the assigned numeric code for the compound datatype in cd_value[i]; increment i</li>
+ * <li>2. Get the size of the compound datatype and store in cd_value[i]; increment i</li>
+ * <li>3. Get the number of members and store in cd_values[i]; increment i</li>
+ * <li>4. For each member
+ * <ul><li>Get the member offset and store in cd_values[i]; increment i</li>
+ * <li>Get the class of the member datatype</li>
+ * <li>For an integer or floating-point datatype, call H5Z__set_parms_atomic</li>
+ * <li>For an array datatype, call H5Z__set_parms_array</li>
+ * <li>For a compound datatype, call H5Z__set_parms_compound</li>
+ * <li>If none of the above, call H5Z__set_parms_nooptype</li></ul></li></ul>
+ *
+ * <h4>N-bit Compression and Decompression Functions</h4>
+ * The n-bit compression and decompression functions above are called by the private HDF5
+ * function H5Z__filter_nbit. The compress and decompress functions retrieve the n-bit parameters
+ * from cd_values[] as it was passed by H5Z__filter_nbit. Parameters are retrieved in exactly the
+ * same order in which they are stored and lower-level compression and decompression functions
+ * for different datatype classes are called.
+ *
+ * N-bit compression is not implemented in place. Due to the difficulty of calculating actual output
+ * buffer size after compression, the same space as that of the input buffer is allocated for the output
+ * buffer as passed to the compression function. However, the size of the output buffer passed by
+ * reference to the compression function will be changed (smaller) after the compression is
+ * complete.
+ *
+ * <h4>Usage Examples</h4>
+ *
+ * The following code example illustrates the use of the n-bit filter for writing and reading n-bit
+ * integer data.
+ *
+ * <em>N-bit compression for integer data</em>
+ * \code
+ * #include "hdf5.h"
+ * #include "stdlib.h"
+ * #include "math.h"
+ *
+ * #define H5FILE_NAME "nbit_test_int.h5"
+ * #define DATASET_NAME "nbit_int"
+ * #define NX 200
+ * #define NY 300
+ * #define CH_NX 10
+ * #define CH_NY 15
+ *
+ * int main(void)
+ * {
+ * hid_t file, dataspace, dataset, datatype, mem_datatype, dset_create_props;
+ * hsize_t dims[2], chunk_size[2];
+ * int orig_data[NX][NY];
+ * int new_data[NX][NY];
+ * int i, j;
+ * size_t precision, offset;
+ *
+ * // Define dataset datatype (integer), and set precision, offset
+ * datatype = H5Tcopy(H5T_NATIVE_INT);
+ * precision = 17; // precision includes sign bit
+ * if(H5Tset_precision(datatype,precision) < 0) {
+ * printf("Error: fail to set precision\n");
+ * return -1;
+ * }
+ * offset = 4;
+ * if(H5Tset_offset(datatype,offset) < 0) {
+ * printf("Error: fail to set offset\n");
+ * return -1;
+ * }
+ *
+ * // Copy to memory datatype
+ * mem_datatype = H5Tcopy(datatype);
+ *
+ * // Set order of dataset datatype
+ * if(H5Tset_order(datatype, H5T_ORDER_BE) < 0) {
+ * printf("Error: fail to set endianness\n");
+ * return -1;
+ * }
+ *
+ * // Initialize data buffer with random data within correct
+ * // range corresponding to the memory datatype's precision
+ * // and offset.
+ * for (i = 0; i < NX; i++)
+ * for (j = 0; j < NY; j++)
+ * orig_data[i][j] = rand() % (int)pow(2, precision-1) << offset;
+ *
+ * // Describe the size of the array.
+ * dims[0] = NX;
+ * dims[1] = NY;
+ * if((dataspace = H5Screate_simple (2, dims, NULL)) < 0) {
+ * printf("Error: fail to create dataspace\n");
+ * return -1;
+ * }
+ *
+ * // Create a new file using read/write access, default file
+ * // creation properties, and default file access properties.
+ * if((file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create file\n");
+ * return -1;
+ * }
+ *
+ * // Set the dataset creation property list to specify that
+ * // the raw data is to be partitioned into 10 x 15 element
+ * // chunks and that each chunk is to be compressed.
+ * chunk_size[0] = CH_NX;
+ * chunk_size[1] = CH_NY;
+ * if((dset_create_props = H5Pcreate (H5P_DATASET_CREATE)) < 0) {
+ * printf("Error: fail to create dataset property\n");
+ * return -1;
+ * }
+ * if(H5Pset_chunk (dset_create_props, 2, chunk_size) < 0) {
+ * printf("Error: fail to set chunk\n");
+ * return -1;
+ * }
+ *
+ * // Set parameters for n-bit compression; check the description
+ * // of the H5Pset_nbit function in the HDF5 Reference Manual
+ * // for more information.
+ * if(H5Pset_nbit (dset_create_props) < 0) {
+ * printf("Error: fail to set nbit filter\n");
+ * return -1;
+ * }
+ *
+ * // Create a new dataset within the file. The datatype
+ * // and dataspace describe the data on disk, which may
+ * // be different from the format used in the application's
+ * // memory.
+ * if((dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace,
+ * H5P_DEFAULT, dset_create_props, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create dataset\n");
+ * return -1;
+ * }
+ *
+ * // Write the array to the file. The datatype and dataspace
+ * // describe the format of the data in the 'orig_data' buffer.
+ * // The raw data is translated to the format required on disk,
+ * // as defined above. We use default raw data transfer
+ * // properties.
+ * if(H5Dwrite (dataset, mem_datatype, H5S_ALL, H5S_ALL, H5P_DEFAULT, orig_data) < 0) {
+ * printf("Error: fail to write to dataset\n");
+ * return -1;
+ * }
+ * H5Dclose (dataset);
+ *
+ * if((dataset = H5Dopen(file, DATASET_NAME, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to open dataset\n");
+ * return -1;
+ * }
+ *
+ * // Read the array. This is similar to writing data,
+ * // except the data flows in the opposite direction.
+ * // Note: Decompression is automatic.
+ * if(H5Dread (dataset, mem_datatype, H5S_ALL, H5S_ALL, H5P_DEFAULT, new_data) < 0) {
+ * printf("Error: fail to read from dataset\n");
+ * return -1;
+ * }
+ *
+ * H5Tclose (datatype);
+ * H5Tclose (mem_datatype);
+ * H5Dclose (dataset);
+ * H5Sclose (dataspace);
+ * H5Pclose (dset_create_props);
+ * H5Fclose (file);
+ *
+ * return 0;
+ * }
+ * \endcode
+ *
+ * The following code example illustrates the use of the n-bit filter for writing and reading n-bit
+ * floating-point data.
+ *
+ * <em>N-bit compression for floating-point data</em>
+ * \code
+ * #include "hdf5.h"
+ *
+ * #define H5FILE_NAME "nbit_test_float.h5"
+ * #define DATASET_NAME "nbit_float"
+ * #define NX 2
+ * #define NY 5
+ * #define CH_NX 2
+ * #define CH_NY 5
+ *
+ * int main(void)
+ * {
+ * hid_t file, dataspace, dataset, datatype, dset_create_props;
+ * hsize_t dims[2], chunk_size[2];
+ *
+ * // orig_data[] are initialized to be within the range that
+ * // can be represented by dataset datatype (no precision
+ * // loss during datatype conversion)
+ * //
+ * float orig_data[NX][NY] = {{188384.00, 19.103516,-1.0831790e9, -84.242188, 5.2045898},
+ * {-49140.000, 2350.2500, -3.2110596e-1, 6.4998865e-5, -0.0000000}};
+ * float new_data[NX][NY];
+ * size_t precision, offset;
+ *
+ * // Define single-precision floating-point type for dataset
+ * //---------------------------------------------------------------
+ * // size=4 byte, precision=20 bits, offset=7 bits,
+ * // mantissa size=13 bits, mantissa position=7,
+ * // exponent size=6 bits, exponent position=20,
+ * // exponent bias=31.
+ * // It can be illustrated in little-endian order as:
+ * // (S - sign bit, E - exponent bit, M - mantissa bit, ? - padding bit)
+ * //
+ * // 3 2 1 0
+ * // ?????SEE EEEEMMMM MMMMMMMM M???????
+ * //
+ * // To create a new floating-point type, the following
+ * // properties must be set in the order of
+ * // set fields -> set offset -> set precision -> set size.
+ * // All these properties must be set before the type can
+ * // function. Other properties can be set anytime. Derived
+ * // type size cannot be expanded bigger than original size
+ * // but can be decreased. There should be no holes
+ * // among the significant bits. Exponent bias usually
+ * // is set 2^(n-1)-1, where n is the exponent size.
+ * //---------------------------------------------------------------
+ * datatype = H5Tcopy(H5T_IEEE_F32BE);
+ * if(H5Tset_fields(datatype, 26, 20, 6, 7, 13) < 0) {
+ * printf("Error: fail to set fields\n");
+ * return -1;
+ * }
+ * offset = 7;
+ * if(H5Tset_offset(datatype,offset) < 0) {
+ * printf("Error: fail to set offset\n");
+ * return -1;
+ * }
+ * precision = 20;
+ * if(H5Tset_precision(datatype,precision) < 0) {
+ * printf("Error: fail to set precision\n");
+ * return -1;
+ * }
+ * if(H5Tset_size(datatype, 4) < 0) {
+ * printf("Error: fail to set size\n");
+ * return -1;
+ * }
+ * if(H5Tset_ebias(datatype, 31) < 0) {
+ * printf("Error: fail to set exponent bias\n");
+ * return -1;
+ * }
+ *
+ * // Describe the size of the array.
+ * dims[0] = NX;
+ * dims[1] = NY;
+ * if((dataspace = H5Screate_simple (2, dims, NULL)) < 0) {
+ * printf("Error: fail to create dataspace\n");
+ * return -1;
+ * }
+ *
+ * // Create a new file using read/write access, default file
+ * // creation properties, and default file access properties.
+ * if((file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create file\n");
+ * return -1;
+ * }
+ *
+ * // Set the dataset creation property list to specify that
+ * // the raw data is to be partitioned into 2 x 5 element
+ * // chunks and that each chunk is to be compressed.
+ * chunk_size[0] = CH_NX;
+ * chunk_size[1] = CH_NY;
+ * if((dset_create_props = H5Pcreate (H5P_DATASET_CREATE)) < 0) {
+ * printf("Error: fail to create dataset property\n");
+ * return -1;
+ * }
+ * if(H5Pset_chunk (dset_create_props, 2, chunk_size) < 0) {
+ * printf("Error: fail to set chunk\n");
+ * return -1;
+ * }
+ *
+ * // Set parameters for n-bit compression; check the description
+ * // of the H5Pset_nbit function in the HDF5 Reference Manual
+ * // for more information.
+ * if(H5Pset_nbit (dset_create_props) < 0) {
+ * printf("Error: fail to set nbit filter\n");
+ * return -1;
+ * }
+ *
+ * // Create a new dataset within the file. The datatype
+ * // and dataspace describe the data on disk, which may
+ * // be different from the format used in the application's memory.
+ * if((dataset = H5Dcreate(file, DATASET_NAME, datatype, dataspace, H5P_DEFAULT,
+ * dset_create_props, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create dataset\n");
+ * return -1;
+ * }
+ *
+ * // Write the array to the file. The datatype and dataspace
+ * // describe the format of the data in the 'orig_data' buffer.
+ * // The raw data is translated to the format required on disk,
+ * // as defined above. We use default raw data transfer properties.
+ * if(H5Dwrite (dataset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, orig_data) < 0) {
+ * printf("Error: fail to write to dataset\n");
+ * return -1;
+ * }
+ * H5Dclose (dataset);
+ * if((dataset = H5Dopen(file, DATASET_NAME, H5P_DEFAULT))<0) {
+ * printf("Error: fail to open dataset\n");
+ * return -1;
+ * }
+ *
+ * // Read the array. This is similar to writing data,
+ * // except the data flows in the opposite direction.
+ * // Note: Decompression is automatic.
+ * if(H5Dread (dataset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, new_data) < 0) {
+ * printf("Error: fail to read from dataset\n");
+ * return -1;
+ * }
+ * H5Tclose (datatype);
+ * H5Dclose (dataset);
+ * H5Sclose (dataspace);
+ * H5Pclose (dset_create_props);
+ * H5Fclose (file);
+ *
+ * return 0;
+ * }
+ * \endcode
+ *
+ * <h4>Limitations</h4>
+ * Because the array cd_values[] has to fit into an object header message of 64K, the n-bit filter has
+ * an upper limit on the number of n-bit parameters that can be stored in it. To be conservative, a
+ * maximum of 4K is allowed for the number of parameters.
+ *
+ * The n-bit filter currently only compresses n-bit datatypes or fields derived from integer or
+ * floating-point datatypes. The n-bit filter assumes padding bits of zero. This may not be true since
+ * the HDF5 user can set the padding bits to zero, to one, or leave them unchanged. However, it is
+ * expected that the n-bit filter will be modified to handle such situations.
+ *
+ * The n-bit filter does not have a way to handle the situation where the fill value of a dataset is
+ * defined and the fill value is not of an n-bit datatype although the dataset datatype is.
+ *
+ * \subsubsection subsubsec_dataset_filters_scale Using the Scale‐offset Filter
+ * Generally speaking, scale-offset compression performs a scale and/or offset operation on each
+ * data value and truncates the resulting value to a minimum number of bits (minimum-bits) before
+ * storing it.
+ *
+ * The current scale-offset filter supports integer and floating-point datatypes only. For the floating-
+ * point datatype, float and double are supported, but long double is not supported.
+ *
+ * Integer data compression uses a straightforward algorithm. Floating-point data compression
+ * adopts the GRIB data packing mechanism, which offers two alternate methods: a fixed minimum-
+ * bits method and a variable minimum-bits method. Currently, only the variable minimum-bits
+ * method is implemented.
+ *
+ * Like other I/O filters supported by the HDF5 library, applications using the scale-offset filter
+ * must store data with chunked storage.
+ *
+ * Integer type: The minimum-bits of integer data can be determined by the filter. For example, if
+ * the maximum value of the data to be compressed is 7065 and the minimum value is 2970, then the
+ * “span” of dataset values is equal to (max - min + 1), which is 4096. If no fill value is defined for the
+ * dataset, the minimum-bits is: ceiling(log2(span)) = 12. With a fill value set, the minimum-bits is:
+ * ceiling(log2(span + 1)) = 13.
+ *
+ * HDF5 users can also set the minimum-bits. However, if the user gives a minimum-bits that is
+ * less than that calculated by the filter, the compression will be lossy.
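+ *
+ * As a minimal usage sketch (the chunk sizes are illustrative assumptions), the filter is enabled for
+ * integer data with #H5Pset_scaleoffset on a chunked dataset creation property list; passing
+ * H5Z_SO_INT_MINBITS_DEFAULT lets the filter compute the minimum-bits itself.
+ *
+ * <em>Enabling the scale-offset filter for integer data (illustrative sketch)</em>
+ * \code
+ * hid_t   dcpl_id;
+ * hsize_t chunk_dims[2] = {10, 15}; // illustrative chunk size
+ *
+ * dcpl_id = H5Pcreate (H5P_DATASET_CREATE);
+ * H5Pset_chunk (dcpl_id, 2, chunk_dims);
+ *
+ * // Enable the scale-offset filter for integer data; with
+ * // H5Z_SO_INT_MINBITS_DEFAULT the filter determines the minimum-bits
+ * H5Pset_scaleoffset (dcpl_id, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
+ *
+ * // ... create the dataset with dcpl_id ...
+ * \endcode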
+ *
+ * Floating-point type: The basic idea of the scale-offset filter for the floating-point type is to
+ * transform the data by some kind of scaling to integer data, and then to follow the procedure of
+ * the scale-offset filter for the integer type to do the data compression. Due to the data
+ * transformation from floating-point to integer, the scale-offset filter is lossy in nature.
+ *
+ * Two methods of scaling the floating-point data are used: the so-called D-scaling and E-scaling.
+ * D-scaling is more straightforward and easier to understand. For the HDF5 1.8 release, only the
+ * D-scaling method has been implemented.
+ *
+ * <h4>Design</h4>
+ * Before the filter does any real work, it needs to gather some information from the HDF5 Library
+ * through API calls. The parameters the filter needs are:
+ * \li The minimum-bits of the data value
+ * \li The number of data elements in the chunk
+ * \li The datatype class, size, sign (only for integer type), byte order, and fill value if defined
+ *
+ * Size and sign are needed to determine what kind of pointer cast to use when retrieving values
+ * from the data buffer.
+ *
+ * The pipeline of the filter can be divided into four parts: (1) pre-compression, (2) compression,
+ * (3) decompression, and (4) post-decompression.
+ *
+ * Depending on whether a fill value is defined or not, the filter will handle pre-compression and
+ * post-decompression differently.
+ *
+ * The scale-offset filter only needs the memory byte order, size of datatype, and minimum-bits for
+ * compression and decompression.
+ *
+ * Since decompression has no access to the original data, the minimum-bits and the minimum
+ * value need to be stored with the compressed data for decompression and post-decompression.
+ *
+ * <h4>Integer Type</h4>
+ * Pre-compression: During pre-compression minimum-bits is calculated if it is not set by the user.
+ * For more information on how minimum-bits are calculated, @see @ref subsubsec_dataset_filters_nbit.
+ *
+ * If the fill value is defined, finding the maximum and minimum values should ignore the data
+ * element whose value is equal to the fill value.
+ *
+ * If no fill value is defined, the value of each data element is subtracted by the minimum value
+ * during this stage.
+ *
+ * If the fill value is defined, the fill value is assigned to the maximum value. In this way, minimum-
+ * bits can represent a data element whose value is equal to the fill value, and the minimum value is
+ * subtracted from each data element whose value is not equal to the fill value.
+ *
+ * The fill value (if defined), the number of elements in a chunk, the class of the datatype, the size
+ * of the datatype, the memory order of the datatype, and other similar elements will be stored in
+ * the HDF5 object header for the post-decompression usage.
+ *
+ * After pre-compression, all values are non-negative and are within the range that can be stored by
+ * minimum-bits.
+ *
+ * Compression: All modified data values after pre-compression are packed together into the
+ * compressed data buffer. The number of bits for each data value decreases from the number of
+ * bits of integer (32 for most platforms) to minimum-bits. The value of minimum-bits and the
+ * minimum value are added to the data buffer and the whole buffer is sent back to the library. In
+ * this way, the number of bits for each modified value is no more than the size of minimum-bits.
+ *
+ * Decompression: In this stage, the number of bits for each data value is restored from minimum-
+ * bits to the number of bits of the integer type.
+ *
+ * Post-decompression: For the post-decompression stage, the filter does the opposite of what it
+ * does during pre-compression except that it does not calculate the minimum-bits or the minimum
+ * value. These values were saved during compression and can be retrieved from the restored
+ * data buffer. If no fill value is defined, the filter adds the minimum value back to each data
+ * element.
+ *
+ * If the fill value is defined, the filter assigns the fill value to the data element whose value is equal
+ * to the maximum value that minimum-bits can represent and adds the minimum value back to
+ * each data element whose value is not equal to the maximum value that minimum-bits can
+ * represent.
+ *
+ * @anchor h4_float_datatype <h4>Floating-point Type</h4>
+ * The filter will do data transformation from floating-point type to integer type and then handle the
+ * data by using the procedure for handling the integer data inside the filter. Insignificant bits of
+ * floating-point data will be cut off during data transformation, so this filter is a lossy compression
+ * method.
+ *
+ * There are two scaling methods: D-scaling and E-scaling. The HDF5 1.8 release only supports D-
+ * scaling. D-scaling is short for decimal scaling. E-scaling should be similar conceptually. In order
+ * to transform data from floating-point to integer, a scale factor is introduced. The minimum value
+ * will be calculated. Each data element value will subtract the minimum value. The modified data
+ * will be multiplied by 10 (Decimal) to the power of scale_factor, and only the integer part will be
+ * kept and manipulated through the routines for the integer type of the filter during pre-
+ * compression and compression. Integer data will be divided by 10 to the power of scale_factor to
+ * transform back to floating-point data during decompression and post-decompression. The minimum
+ * value is then added back to each data element, and the floating-point data are restored.
+ * However, the restored data will have lost some insignificant bits compared with the original values.
+ *
+ * For example, the following floating-point data are manipulated by the filter, and the D-scaling
+ * factor is 2.
+ * <em>{104.561, 99.459, 100.545, 105.644}</em>
+ *
+ * The minimum value is 99.459; after subtracting it from each data element, the modified data are
+ * <em>{5.102, 0, 1.086, 6.185}</em>
+ *
+ * Since the D-scaling factor is 2, all floating-point data will be multiplied by 10^2 with this result:
+ * <em>{510.2, 0, 108.6, 618.5}</em>
+ *
+ * The digits after the decimal point are rounded off, and the set becomes:
+ * <em>{510, 0, 109, 619}</em>
+ *
+ * After decompression, each value is divided by 10^2 and the offset 99.459 is added back. The
+ * floating-point data become
+ * <em>{104.559, 99.459, 100.549, 105.649}</em>
+ *
+ * The error in each restored value should be no more than 5 * 10^-(D-scaling factor + 1); in this
+ * example, no more than 0.005. D-scaling is also sometimes referred to as a variable minimum-bits
+ * method, since the minimum-bits needed to represent the same decimal precision varies from
+ * dataset to dataset. For E-scaling, the data values are instead scaled by 2 to the power of
+ * scale_factor. E-scaling is also called a fixed-bits method since, for different datasets, the
+ * minimum-bits is always fixed to the scale factor of E-scaling.
+ * Currently, HDF5 ONLY supports the D-scaling (variable minimum-bits) method.
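+ *
+ * The arithmetic of the D-scaling example above can be reproduced with a few lines of C. This is
+ * only an illustration of the transform (offset, scale, round, and reverse), not the filter's
+ * internal code; the rounding call and the hard-coded minimum are illustrative.
+ *
+ * \code
+ * #include <math.h>
+ * #include <stdio.h>
+ *
+ * int main(void)
+ * {
+ *     double data[4] = {104.561, 99.459, 100.545, 105.644};
+ *     int    scale_factor = 2;                            // decimal scale factor
+ *     double scale = pow(10.0, scale_factor);
+ *     double min   = 99.459;                              // minimum of the data
+ *     long   packed[4];
+ *     int    i;
+ *
+ *     for (i = 0; i < 4; i++)                             // forward: offset, scale, round
+ *         packed[i] = lround((data[i] - min) * scale);    // {510, 0, 109, 619}
+ *     for (i = 0; i < 4; i++)                             // reverse: unscale, add offset back
+ *         printf("%.3f\n", (double)packed[i] / scale + min);
+ *     // prints 104.559, 99.459, 100.549, 105.649 (one per line),
+ *     // each within 5 * 10^-3 of the original value
+ *     return 0;
+ * }
+ * \endcode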
+ *
+ * <h4>Implementation</h4>
+ * The scale-offset filter is implemented in the file H5Zscaleoffset.c, and the function
+ * #H5Pset_scaleoffset is implemented in the file H5Pdcpl.c. The HDF5 user can supply the
+ * minimum-bits by calling the function #H5Pset_scaleoffset.
+ *
+ * The scale-offset filter was implemented based on the design outlined in this section. However,
+ * the following factors need to be considered:
+ * <ol><li>
+ * The filter needs the appropriate cast pointer whenever it needs to retrieve data values.
+ * </li>
+ * <li>
+ * The HDF5 Library passes the to-be-compressed data to the filter in the format of the dataset
+ * datatype, and the filter passes back the decompressed data in the same format. If a fill value is
+ * defined, it is also in the dataset datatype format. For example, if the byte order of the dataset
+ * datatype differs from the memory byte order of the platform, compression or decompression must
+ * perform an endianness conversion of the data buffer. Moreover, the filter must account for the
+ * possibility that the memory byte order differs between compression and decompression.
+ * </li>
+ * <li>
+ * Differences in endianness and datatype between file and memory should be considered when
+ * saving and retrieving the minimum-bits, minimum value, and fill value.
+ * </li>
+ * <li>
+ * If the user sets the minimum-bits to the full precision of the datatype, no operation is needed
+ * on the filter side. If the full precision is the result of a calculation by the filter, then the
+ * minimum-bits needs to be saved for decompression, but no compression or decompression is needed
+ * (only a copy of the input buffer is needed).</li>
+ * <li>
+ * If the minimum-bits calculated by the filter is equal to zero, special handling is needed: it
+ * means all values are the same, so no compression or decompression is needed, but the
+ * minimum-bits and minimum value still need to be saved during compression.</li>
+ * <li>
+ * For floating-point data, the minimum value of the dataset should be calculated first. The
+ * minimum value is then subtracted from each data element value to obtain the “offset” data. The
+ * offset data then follow the steps outlined above in the discussion of floating-point types for
+ * transformation to integer and rounding. For more information, @see @ref h4_float_datatype.
+ * </li></ol>
+ *
+ * <h4>Usage Examples</h4>
+ * The following code example illustrates the use of the scale-offset filter for writing and reading
+ * integer data.
+ *
+ * <em>Scale-offset compression integer data</em>
+ * \code
+ * #include "hdf5.h"
+ * #include "stdlib.h"
+ *
+ * #define H5FILE_NAME "scaleoffset_test_int.h5"
+ * #define DATASET_NAME "scaleoffset_int"
+ * #define NX 200
+ * #define NY 300
+ * #define CH_NX 10
+ * #define CH_NY 15
+ *
+ * int main(void)
+ * {
+ * hid_t file, dataspace, dataset, datatype, dset_create_props;
+ * hsize_t dims[2], chunk_size[2];
+ * int orig_data[NX][NY];
+ * int new_data[NX][NY];
+ * int i, j, fill_val;
+ *
+ * // Define dataset datatype
+ * datatype = H5Tcopy(H5T_NATIVE_INT);
+ *
+ * // Initialize data buffer
+ * for (i=0; i < NX; i++)
+ * for (j=0; j < NY; j++)
+ * orig_data[i][j] = rand() % 10000;
+ *
+ * // Describe the size of the array.
+ * dims[0] = NX;
+ * dims[1] = NY;
+ * if((dataspace = H5Screate_simple (2, dims, NULL)) < 0) {
+ * printf("Error: fail to create dataspace\n");
+ * return -1;
+ * }
+ *
+ * // Create a new file using read/write access, default file
+ * // creation properties, and default file access properties.
+ * if((file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create file\n");
+ * return -1;
+ * }
+ *
+ * // Set the dataset creation property list to specify that
+ * // the raw data is to be partitioned into 10 x 15 element
+ * // chunks and that each chunk is to be compressed.
+ * chunk_size[0] = CH_NX;
+ * chunk_size[1] = CH_NY;
+ * if((dset_create_props = H5Pcreate (H5P_DATASET_CREATE)) < 0) {
+ * printf("Error: fail to create dataset property\n");
+ * return -1;
+ * }
+ * if(H5Pset_chunk (dset_create_props, 2, chunk_size) < 0) {
+ * printf("Error: fail to set chunk\n");
+ * return -1;
+ * }
+ *
+ * // Set the fill value of dataset
+ * fill_val = 10000;
+ * if (H5Pset_fill_value(dset_create_props, H5T_NATIVE_INT, &fill_val)<0) {
+ * printf("Error: can not set fill value for dataset\n");
+ * return -1;
+ * }
+ *
+ * // Set parameters for scale-offset compression. Check the
+ * // description of the H5Pset_scaleoffset function in the
+ * // HDF5 Reference Manual for more information.
+ * if(H5Pset_scaleoffset (dset_create_props, H5Z_SO_INT, H5Z_SO_INT_MINIMUMBITS_DEFAULT) < 0) {
+ * printf("Error: fail to set scaleoffset filter\n");
+ * return -1;
+ * }
+ *
+ * // Create a new dataset within the file. The datatype
+ * // and dataspace describe the data on disk, which may
+ * // or may not be different from the format used in the
+ * // application's memory. The link creation and
+ * // dataset access property list parameters are passed
+ * // with default values.
+ * if((dataset = H5Dcreate (file, DATASET_NAME, datatype, dataspace, H5P_DEFAULT,
+ * dset_create_props, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create dataset\n");
+ * return -1;
+ * }
+ *
+ * // Write the array to the file. The datatype and dataspace
+ * // describe the format of the data in the 'orig_data' buffer.
+ * // We use default raw data transfer properties.
+ * if(H5Dwrite (dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, orig_data) < 0) {
+ * printf("Error: fail to write to dataset\n");
+ * return -1;
+ * }
+ *
+ * H5Dclose (dataset);
+ *
+ * if((dataset = H5Dopen(file, DATASET_NAME, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to open dataset\n");
+ * return -1;
+ * }
+ *
+ * // Read the array. This is similar to writing data,
+ * // except the data flows in the opposite direction.
+ * // Note: Decompression is automatic.
+ * if(H5Dread (dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, new_data) < 0) {
+ * printf("Error: fail to read from dataset\n");
+ * return -1;
+ * }
+ *
+ * H5Tclose (datatype);
+ * H5Dclose (dataset);
+ * H5Sclose (dataspace);
+ * H5Pclose (dset_create_props);
+ * H5Fclose (file);
+ *
+ * return 0;
+ * }
+ * \endcode
+ *
+ * The following code example illustrates the use of the scale-offset filter (set for variable
+ * minimum-bits method) for writing and reading floating-point data.
+ *
+ * <em>Scale-offset compression floating-point data</em>
+ * \code
+ * #include "hdf5.h"
+ * #include "stdlib.h"
+ *
+ * #define H5FILE_NAME "scaleoffset_test_float_Dscale.h5"
+ * #define DATASET_NAME "scaleoffset_float_Dscale"
+ * #define NX 200
+ * #define NY 300
+ * #define CH_NX 10
+ * #define CH_NY 15
+ *
+ * int main(void)
+ * {
+ * hid_t file, dataspace, dataset, datatype, dset_create_props;
+ * hsize_t dims[2], chunk_size[2];
+ * float orig_data[NX][NY];
+ * float new_data[NX][NY];
+ * float fill_val;
+ * int i, j;
+ *
+ * // Define dataset datatype
+ * datatype = H5Tcopy(H5T_NATIVE_FLOAT);
+ *
+ * // Initialize data buffer
+ * for (i=0; i < NX; i++)
+ * for (j=0; j < NY; j++)
+ * orig_data[i][j] = (rand() % 10000) / 1000.0;
+ *
+ * // Describe the size of the array.
+ * dims[0] = NX;
+ * dims[1] = NY;
+ * if((dataspace = H5Screate_simple (2, dims, NULL)) < 0) {
+ * printf("Error: fail to create dataspace\n");
+ * return -1;
+ * }
+ *
+ * // Create a new file using read/write access, default file
+ * // creation properties, and default file access properties.
+ * if((file = H5Fcreate (H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create file\n");
+ * return -1;
+ * }
+ *
+ * // Set the dataset creation property list to specify that
+ * // the raw data is to be partitioned into 10 x 15 element
+ * // chunks and that each chunk is to be compressed.
+ * chunk_size[0] = CH_NX;
+ * chunk_size[1] = CH_NY;
+ * if((dset_create_props = H5Pcreate (H5P_DATASET_CREATE)) < 0) {
+ * printf("Error: fail to create dataset property\n");
+ * return -1;
+ * }
+ * if(H5Pset_chunk (dset_create_props, 2, chunk_size) < 0) {
+ * printf("Error: fail to set chunk\n");
+ * return -1;
+ * }
+ *
+ * // Set the fill value of dataset
+ * fill_val = 10000.0;
+ * if (H5Pset_fill_value(dset_create_props, H5T_NATIVE_FLOAT, &fill_val) < 0) {
+ * printf("Error: can not set fill value for dataset\n");
+ * return -1;
+ * }
+ *
+ * // Set parameters for scale-offset compression; use variable
+ * // minimum-bits method, set decimal scale factor to 3. Check
+ * // the description of the H5Pset_scaleoffset function in the
+ * // HDF5 Reference Manual for more information.
+ * if(H5Pset_scaleoffset (dset_create_props, H5Z_SO_FLOAT_DSCALE, 3) < 0) {
+ * printf("Error: fail to set scaleoffset filter\n");
+ * return -1;
+ * }
+ *
+ * // Create a new dataset within the file. The datatype
+ * // and dataspace describe the data on disk, which may
+ * // or may not be different from the format used in the
+ * // application's memory.
+ * if((dataset = H5Dcreate (file, DATASET_NAME, datatype, dataspace, H5P_DEFAULT,
+ * dset_create_props, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to create dataset\n");
+ * return -1;
+ * }
+ *
+ * // Write the array to the file. The datatype and dataspace
+ * // describe the format of the data in the 'orig_data' buffer.
+ * // We use default raw data transfer properties.
+ * if(H5Dwrite (dataset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, orig_data) < 0) {
+ * printf("Error: fail to write to dataset\n");
+ * return -1;
+ * }
+ *
+ * H5Dclose (dataset);
+ *
+ * if((dataset = H5Dopen(file, DATASET_NAME, H5P_DEFAULT)) < 0) {
+ * printf("Error: fail to open dataset\n");
+ * return -1;
+ * }
+ *
+ * // Read the array. This is similar to writing data,
+ * // except the data flows in the opposite direction.
+ * // Note: Decompression is automatic.
+ * if(H5Dread (dataset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, new_data) < 0) {
+ * printf("Error: fail to read from dataset\n");
+ * return -1;
+ * }
+ *
+ * H5Tclose (datatype);
+ * H5Dclose (dataset);
+ * H5Sclose (dataspace);
+ * H5Pclose (dset_create_props);
+ * H5Fclose (file);
+ *
+ * return 0;
+ * }
+ * \endcode
+ *
+ * <h4>Limitations</h4>
+ * For floating-point data handling, there are some algorithmic limitations to the GRIB data
+ * packing mechanism:
+ * <ol><li>
+ * Both the E-scaling and D-scaling methods are lossy compression methods
+ * </li>
+ * <li>
+ * For the D-scaling method, since data values are rounded to non-negative integer values before
+ * being truncated to the minimum-bits, their range is limited by the maximum value that can be
+ * represented by the corresponding unsigned integer type (of the same size as that of the
+ * floating-point type)
+ * </li></ol>
+ *
+ * <h4>Suggestions</h4>
+ * The following are some suggestions for using the filter for floating-point data:
+ * <ol><li>
+ * It is better to convert the units of the data so that the values fall within a common range
+ * (for example, 1200 m to 1.2 km)
+ * </li>
+ * <li>
+ * If the data values to be compressed are very near zero, it is strongly recommended that the
+ * user set the fill value away from zero (for example, to a large positive number); if the user
+ * does nothing, the HDF5 library will set the fill value to zero, and this may cause undesirable
+ * compression results (see the short sketch after this list)
+ * </li>
+ * <li>
+ * Users are discouraged from using a very large decimal scale factor (for example, 100) for the
+ * D-scaling method; this can cause the filter to fail to ignore the fill value when finding the
+ * maximum and minimum values, resulting in a much larger minimum-bits value (poor compression)
+ * </li></ol>
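+ *
+ * As a short sketch of the second suggestion, the fill value can be moved away from zero on the
+ * dataset creation property list before the scale-offset filter is set; the property list handle
+ * and the decimal scale factor below are illustrative only.
+ *
+ * \code
+ * hid_t dcpl     = H5Pcreate(H5P_DATASET_CREATE);   // chunking must also be set on dcpl
+ * float fill_val = 1.0e4f;                          // well away from data values near zero
+ * H5Pset_fill_value(dcpl, H5T_NATIVE_FLOAT, &fill_val);
+ * H5Pset_scaleoffset(dcpl, H5Z_SO_FLOAT_DSCALE, 3);
+ * \endcode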
+ *
+ * \subsubsection subsubsec_dataset_filters_szip Using the Szip Filter
+ * See The HDF Group website for further information regarding the Szip filter.
+ *
+ * Previous Chapter \ref sec_group - Next Chapter \ref sec_datatype
+ *
*/
/**
diff --git a/src/H5Gpublic.h b/src/H5Gpublic.h
index b91c09c..74f0da7 100644
--- a/src/H5Gpublic.h
+++ b/src/H5Gpublic.h
@@ -167,7 +167,7 @@ H5_DLL hid_t H5Gcreate2(hid_t loc_id, const char *name, hid_t lcpl_id, hid_t gcp
* H5Gclose() when the group is no longer needed so that resource
* leaks will not develop.
*
- * \see H5Olink(), H5Dcreate(), Using Identifiers
+ * \see H5Olink(), H5Dcreate(), \ref api-compat-macros
*
* \since 1.8.0
*
diff --git a/src/H5Tmodule.h b/src/H5Tmodule.h
index 5820393..c9d52bd 100644
--- a/src/H5Tmodule.h
+++ b/src/H5Tmodule.h
@@ -2717,7 +2717,7 @@ filled according to the value of this property. The padding can be:
* </tr>
* </table>
*
- * <h4>Variable-length Datatypes</h4>
+ * @anchor h4_vlen_datatype <h4>Variable-length Datatypes</h4>
*
* A variable-length (VL) datatype is a one-dimensional sequence of a datatype which are not fixed
* in length from one dataset location to another. In other words, each data element may have a
@@ -3854,7 +3854,7 @@ filled according to the value of this property. The padding can be:
* goto out;
* \endcode
*
- *
+ * Previous Chapter \ref sec_dataset - Next Chapter \ref sec_dataspace
*
*/
diff --git a/src/H5Zpublic.h b/src/H5Zpublic.h
index 78e8543..595c183 100644
--- a/src/H5Zpublic.h
+++ b/src/H5Zpublic.h
@@ -445,7 +445,7 @@ typedef struct H5Z_class2_t {
* H5Z_class2_t, depending on the needs of the application. To affect
* only this macro, H5Z_class_t_vers may be defined to either 1 or 2.
* Otherwise, it will behave in the same manner as other API
- * compatibility macros. See API Compatibility Macros in HDF5 for more
+ * compatibility macros. \see \ref api-compat-macros for more
* information. H5Z_class1_t matches the #H5Z_class_t structure that is
* used in the 1.6.x versions of the HDF5 library.
*