A dataset is a multidimensional array of data elements, together with supporting metadata. To create a dataset, the application program must specify the location at which to create the dataset, the dataset name, the datatype and dataspace of the data array, and the dataset creation property list.
There are two categories of datatypes in HDF5: atomic and compound datatypes. An atomic datatype is a datatype which cannot be decomposed into smaller datatype units at the API level. These include the integer, float, date and time, string, bitfield, and opaque datatypes. A compound datatype is a collection of one or more atomic datatypes and/or small arrays of such datatypes.
Figure 5.1 shows the HDF5 datatypes. Some of the HDF5 predefined atomic datatypes are listed in Figures 5.2a and 5.2b. In this tutorial, we consider only HDF5 predefined integers. For further information on datatypes, see The Datatype Interface (H5T) in the HDF5 User's Guide.
Fig 5.1 HDF5 datatypes
                                    +-- integer
                                    +-- floating point
                   +---- atomic ----+-- date and time
                   |                +-- character string
  HDF5 datatypes --|                +-- bitfield
                   |                +-- opaque
                   |
                   +---- compound
Fig. 5.2a Examples of HDF5 predefined datatypes
Fig. 5.2b Examples of HDF5 predefined native datatypes
Fig 5.3 HDF5 dataspaces
                    +-- simple
  HDF5 dataspaces --|
                    +-- complex

The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extensible. A dataspace can also describe a portion of a dataset, making it possible to do partial I/O operations on selections.
In HDF5, datatypes and dataspaces are independent objects which are created separately from any dataset that they might be attached to. Because of this, creating a dataset requires that its datatype and dataspace be defined first. In this tutorial, we use only HDF5 predefined integer datatypes and simple dataspaces. Because the predefined datatypes already exist, only the dataspace object has to be created.
To create an empty dataset (no data written), the program must define the dataspace, create the dataset, and then close both. The dataspace is created and closed with the following calls:

C:
    space_id = H5Screate_simple (rank, dims, maxdims);
    status = H5Sclose (space_id);

FORTRAN:
    CALL h5screate_simple_f (rank, dims, space_id, hdferr, maxdims=max_dims)
         or
    CALL h5screate_simple_f (rank, dims, space_id, hdferr)

    CALL h5sclose_f (space_id, hdferr)

To create a dataset, the calling program must contain calls to create and close the dataset. For example:
C:
    dset_id = H5Dcreate (loc_id, name, type_id, space_id, creation_prp);
    status = H5Dclose (dset_id);

FORTRAN:
    CALL h5dcreate_f (loc_id, name, type_id, space_id, dset_id, &
                      hdferr, creation_prp=creat_plist_id)
         or
    CALL h5dcreate_f (loc_id, name, type_id, space_id, dset_id, hdferr)

    CALL h5dclose_f (dset_id, hdferr)

When the predefined datatypes are used in FORTRAN, calls must also be made to initialize and terminate access to them:
    CALL h5init_types_f (hdferr)
    CALL h5close_types_f (hdferr)

h5init_types_f must be called before any HDF5 library subroutine calls are made; h5close_types_f must be called after the final HDF5 library subroutine call. See the programming example below for an illustration of the use of these calls.
The example program creates a file named dset.h5 in the C version (dsetf.h5 in Fortran), defines the dataset dataspace, creates a dataset which is a 4x6 integer array, and then closes the dataspace, the dataset, and the file. The example source files are:

    h5_crtdat.c          (C)
    dsetexample.f90      (Fortran 90)
    CreateDataset.java   (Java)
H5Screate_simple/h5screate_simple_f creates a new simple dataspace and returns a dataspace identifier.

    C:
      hid_t H5Screate_simple (int rank, const hsize_t * dims,
                              const hsize_t * maxdims)
    FORTRAN:
      h5screate_simple_f (rank, dims, space_id, hdferr, maxdims)

            rank        INTEGER
            dims(*)     INTEGER(HSIZE_T)
            space_id    INTEGER(HID_T)
            hdferr      INTEGER
                        (Valid values: 0 on success and -1 on failure)
            maxdims(*)  INTEGER(HSIZE_T), OPTIONAL
H5Dcreate/h5dcreate_f creates a dataset at the specified location and returns a dataset identifier.

    C:
      hid_t H5Dcreate (hid_t loc_id, const char *name, hid_t type_id,
                       hid_t space_id, hid_t creation_prp)
    FORTRAN:
      h5dcreate_f (loc_id, name, type_id, space_id, dset_id, &
                   hdferr, creation_prp)

            loc_id        INTEGER(HID_T)
            name          CHARACTER(LEN=*)
            type_id       INTEGER(HID_T)
            space_id      INTEGER(HID_T)
            dset_id       INTEGER(HID_T)
            hdferr        INTEGER
                          (Valid values: 0 on success and -1 on failure)
            creation_prp  INTEGER(HID_T), OPTIONAL
H5P_DEFAULT in C and H5P_DEFAULT_F in FORTRAN specify the default dataset creation property list. This parameter is optional in FORTRAN; if it is omitted, the default dataset creation property list will be used.
H5Dcreate/h5dcreate_f creates an empty array and initializes the data to 0.
H5Dclose/h5dclose_f must be called to release the resources used by the dataset. This call is mandatory.

    C:
      herr_t H5Dclose (hid_t dset_id)
    FORTRAN:
      h5dclose_f (dset_id, hdferr)

            dset_id  INTEGER(HID_T)
            hdferr   INTEGER
                     (Valid values: 0 on success and -1 on failure)
The contents of dset.h5 (dsetf.h5 for FORTRAN) are shown in Figure 5.4 and Figures 5.5a and 5.5b.
Figure 5.4 Contents of dset.h5 ( dsetf.h5 )
Figure 5.5a dset.h5 in DDL

    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0
          }
       }
    }
    }

Figure 5.5b dsetf.h5 in DDL

    HDF5 "dsetf.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 6, 4 ) / ( 6, 4 ) }
          DATA {
             0, 0, 0, 0,
             0, 0, 0, 0,
             0, 0, 0, 0,
             0, 0, 0, 0,
             0, 0, 0, 0,
             0, 0, 0, 0
          }
       }
    }
    }
Note in Figures 5.5a and 5.5b that H5T_STD_I32BE, a 32-bit big-endian integer, is an HDF5 atomic datatype.

Dataset Definition in DDL

The following is the simplified DDL dataset definition:

Fig. 5.6 HDF5 Dataset Definition

    <dataset> ::= DATASET "<dataset_name>" { <datatype>
                                             <dataspace>
                                             <data>
                                             <dataset_attribute>* }

    <datatype> ::= DATATYPE { <atomic_type> }

    <dataspace> ::= DATASPACE { SIMPLE <current_dims> / <max_dims> }

    <dataset_attribute> ::= <attribute>
The National Center for Supercomputing Applications
University of Illinois
at Urbana-Champaign
hdfhelp@ncsa.uiuc.edu
Last Modified: June 22, 2001