NCSA

Creating a Dataset


What is a Dataset?

A dataset is a multidimensional array of data elements, together with supporting metadata. To create a dataset, the application program must specify the location at which to create the dataset, the dataset name, the data type and dataspace of the data array, and the dataset creation properties.

Data Types

A data type is a collection of data type properties, all of which can be stored on disk and which, taken as a whole, provide complete information for converting data to or from that data type.

There are two categories of data types in HDF5: atomic and compound data types. An atomic type is a type which cannot be decomposed into smaller units at the API level. A compound data type is a collection of one or more atomic types or small arrays of such types.

Atomic types include integer, floating point, date and time, string, bit field, and opaque. Figure 5.1 shows the HDF5 data types, and Figure 5.2 lists some of the HDF5 predefined atomic data types. In this tutorial, we consider only HDF5 predefined integers. For more information on data types, see the HDF5 User's Guide.

Fig. 5.1   HDF5 data types


                                          +--  integer
                                          +--  floating point
                        +---- atomic  ----+--  date and time
                        |                 +--  character string
       HDF5 datatypes --|                 +--  bit field
                        |                 +--  opaque
                        |
                        +---- compound

Fig. 5.2   Examples of HDF5 predefined data types

       Data Type         Description
       ---------------   ---------------------------------------------------------
       H5T_STD_I32LE     Four-byte, little-endian, signed two's complement integer
       H5T_STD_U16BE     Two-byte, big-endian, unsigned integer
       H5T_IEEE_F32BE    Four-byte, big-endian, IEEE floating point
       H5T_IEEE_F64LE    Eight-byte, little-endian, IEEE floating point
       H5T_C_S1          One-byte, null-terminated string of eight-bit characters
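
The LE and BE suffixes in the names above refer to the order in which the bytes of a value are stored. The following is a minimal sketch, in plain C with no HDF5 calls, of what that means for a four-byte signed integer. The function names encode_i32_le and encode_i32_be are hypothetical helpers written for this illustration; they are not part of the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit signed integer least-significant byte first, the byte
   order named by the LE suffix (as in H5T_STD_I32LE).
   Hypothetical helper for illustration only. */
void encode_i32_le(int32_t value, unsigned char out[4])
{
    uint32_t bits = (uint32_t)value;   /* two's complement bit pattern */

    assert(out != NULL);
    out[0] = (unsigned char)(bits & 0xFFu);
    out[1] = (unsigned char)((bits >> 8) & 0xFFu);
    out[2] = (unsigned char)((bits >> 16) & 0xFFu);
    out[3] = (unsigned char)((bits >> 24) & 0xFFu);
}

/* Store the same value most-significant byte first, the byte order
   named by the BE suffix (as in H5T_STD_I32BE).
   Hypothetical helper for illustration only. */
void encode_i32_be(int32_t value, unsigned char out[4])
{
    uint32_t bits = (uint32_t)value;

    assert(out != NULL);
    out[0] = (unsigned char)((bits >> 24) & 0xFFu);
    out[1] = (unsigned char)((bits >> 16) & 0xFFu);
    out[2] = (unsigned char)((bits >> 8) & 0xFFu);
    out[3] = (unsigned char)(bits & 0xFFu);
}
```

For example, the value 0x12345678 is laid out as the bytes 78 56 34 12 by encode_i32_le and as 12 34 56 78 by encode_i32_be.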

Dataspaces

A dataspace describes the dimensionality of the data array. A dataspace is either a regular N-dimensional array of data points, called a simple dataspace, or a more general collection of data points organized in another manner, called a complex dataspace. Figure 5.3 shows HDF5 dataspaces. In this tutorial, we only consider simple dataspaces.

Fig. 5.3   HDF5 dataspaces


                         +-- simple
       HDF5 dataspaces --|
                         +-- complex

The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extendible. A dataspace can also describe portions of a dataset, making it possible to do partial I/O operations on selections.
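
To make the idea of a simple dataspace concrete, the sketch below computes two quantities that follow from its rank and dimension sizes: the total number of elements, and the position of one element when the array is laid out in C (row-major) order, which is the ordering the C interface uses. It is plain C with no HDF5 calls. The function names element_count and linear_offset are hypothetical, and unsigned long long stands in for the library's hsize_t type.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of elements in a simple dataspace: the product of the
   dimension sizes. Hypothetical helper for illustration only. */
unsigned long long element_count(int rank, const unsigned long long dims[])
{
    unsigned long long count = 1;
    int i;

    assert(rank > 0 && dims != NULL);
    for (i = 0; i < rank; i++)
        count *= dims[i];
    return count;
}

/* Position of the element at coords[] when the array is stored in
   row-major order (last dimension varies fastest).
   Hypothetical helper for illustration only. */
unsigned long long linear_offset(int rank,
                                 const unsigned long long dims[],
                                 const unsigned long long coords[])
{
    unsigned long long offset = 0;
    int i;

    assert(rank > 0 && dims != NULL && coords != NULL);
    for (i = 0; i < rank; i++) {
        assert(coords[i] < dims[i]);       /* coordinate must be in range */
        offset = offset * dims[i] + coords[i];
    }
    return offset;
}
```

For a 4x6 dataspace, element_count gives 24, and the element at row 1, column 2 sits at offset 1*6 + 2 = 8.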

Dataset creation properties

When creating a dataset, HDF5 allows users to specify how raw data is organized on disk and how it is compressed. This information is stored in a dataset creation property list and passed to the dataset interface. The raw data on disk can be stored contiguously (in the same linear way that it is organized in memory), partitioned into chunks, stored externally, and so on. In this tutorial, we use the default creation property list; that is, no compression and a contiguous storage layout are used. For more information about the creation properties, see the HDF5 User's Guide.
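
As a rough illustration of what "partitioned into chunks" means, the sketch below works out how many chunks cover an array and which chunk a given element falls into. It is plain C arithmetic, not library code: the HDF5 library does this bookkeeping internally, the function names chunk_grid and chunk_of_element are hypothetical, and the 2x3 chunk shape used in the note that follows is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed along each dimension (ceiling division), so
   that chunks of size chunk_dims[] cover an array of size dims[].
   Hypothetical helper for illustration only. */
void chunk_grid(int rank,
                const unsigned long long dims[],
                const unsigned long long chunk_dims[],
                unsigned long long grid[])
{
    int i;

    assert(rank > 0 && dims != NULL && chunk_dims != NULL && grid != NULL);
    for (i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        grid[i] = (dims[i] + chunk_dims[i] - 1) / chunk_dims[i];
    }
}

/* Coordinates of the chunk that contains the element at coords[].
   Hypothetical helper for illustration only. */
void chunk_of_element(int rank,
                      const unsigned long long coords[],
                      const unsigned long long chunk_dims[],
                      unsigned long long chunk_coords[])
{
    int i;

    assert(rank > 0 && coords != NULL && chunk_dims != NULL &&
           chunk_coords != NULL);
    for (i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        chunk_coords[i] = coords[i] / chunk_dims[i];
    }
}
```

With a 4x6 array and 2x3 chunks, chunk_grid reports a 2x2 grid of chunks, and the element at row 3, column 4 falls in chunk (1, 1).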

In HDF5, data types and dataspaces are independent objects, which are created separately from any dataset that they might be attached to. Because of this, creating a dataset requires definitions of both the data type and the dataspace. In this tutorial, we use HDF5 predefined data types (integer) and consider only simple dataspaces. Hence, only the dataspace object needs to be created.

To create an empty dataset (no data written) the following steps need to be taken:

  1. Obtain the location id where the dataset is to be created.
  2. Define the dataset characteristics and creation properties.
    • define a data type
    • define a dataspace
    • specify dataset creation properties
  3. Create the dataset.
  4. Close the data type, dataspace, and the property list if necessary.
  5. Close the dataset.

To create a simple dataspace, the calling program must contain the following calls:

   dataspace_id = H5Screate_simple(rank, dims, maxdims);
   H5Sclose(dataspace_id);

To create a dataset, the calling program must contain the following calls, where loc_id, type_id, space_id, and create_plist_id are of type hid_t and name is a const char *:

   dataset_id = H5Dcreate(loc_id, name, type_id, space_id, create_plist_id);
   H5Dclose(dataset_id);

Programming Example

Description

The following example shows how to create an empty dataset. It creates a file called 'dset.h5', defines the dataset dataspace, creates a dataset which is a 4x6 integer array, and then closes the dataspace, the dataset, and the file.
[ Download h5_crtdat.c ]
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

#include <hdf5.h>
#define FILE "dset.h5"

int main(void) {

   hid_t       file_id, dataset_id, dataspace_id;  /* identifiers */
   hsize_t     dims[2];
   herr_t      status;

   /* Create a new file using default properties. */
   file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

   /* Create the data space for the dataset. */
   dims[0] = 4;
   dims[1] = 6;
   dataspace_id = H5Screate_simple(2, dims, NULL);

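   /* Create the dataset. In HDF5 1.8 and later, this five-argument form
      is named H5Dcreate1; H5Dcreate there takes seven arguments by default. */
   dataset_id = H5Dcreate(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                H5P_DEFAULT);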

   /* End access to the dataset and release resources used by it. */
   status = H5Dclose(dataset_id);

   /* Terminate access to the data space. */
   status = H5Sclose(dataspace_id);

   /* Close the file. */
   status = H5Fclose(file_id);

   return 0;
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Remarks

File Contents

The contents of 'dset.h5' are shown in Figure 5.4 and Figure 5.5.

Fig. 5.4   The contents of 'dset.h5' (figure not reproduced)

Fig. 5.5   'dset.h5' in DDL

      HDF5 "dset.h5" {
      GROUP "/" {
         DATASET "dset" {
            DATATYPE { H5T_STD_I32BE }
            DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
            DATA {
               0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0
            }
         }
      }
      }

Dataset Definition in DDL

The following is the simplified DDL dataset definition:

Fig. 5.6   HDF5 Dataset Definition

      <dataset> ::= DATASET "<dataset_name>" { <data type>
                                               <dataspace>
                                               <data>
                                               <dataset_attribute>* }

      <data type> ::= DATATYPE { <atomic_type> }

      <dataspace> ::= DATASPACE { SIMPLE <current_dims> / <max_dims> }

      <dataset_attribute> ::= <attribute>
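
As a small worked example of the <dataspace> production above, the sketch below builds the DATASPACE line for given current and maximum dimensions, using the spacing seen in the 'dset.h5' listing. It is plain C with no HDF5 calls and handles fixed dimension sizes only. The function names append_text, append_dims, and format_dataspace are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append text to buf at position *used.
   Returns 0 on success, -1 if the text does not fit. */
static int append_text(char *buf, size_t size, size_t *used, const char *text)
{
    size_t len = strlen(text);

    if (*used + len >= size)
        return -1;
    memcpy(buf + *used, text, len + 1);    /* copy including the '\0' */
    *used += len;
    return 0;
}

/* Append a dimension list of the form "( 4, 6 )". */
static int append_dims(char *buf, size_t size, size_t *used,
                       int rank, const unsigned long long dims[])
{
    char number[32];
    int i;

    if (append_text(buf, size, used, "( ") != 0)
        return -1;
    for (i = 0; i < rank; i++) {
        snprintf(number, sizeof number, "%llu", dims[i]);
        if (append_text(buf, size, used, number) != 0 ||
            append_text(buf, size, used, i + 1 < rank ? ", " : " ") != 0)
            return -1;
    }
    return append_text(buf, size, used, ")");
}

/* Write "DATASPACE { SIMPLE <current_dims> / <max_dims> }" into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_dataspace(char *buf, size_t size, int rank,
                     const unsigned long long current_dims[],
                     const unsigned long long max_dims[])
{
    size_t used = 0;

    assert(buf != NULL && size > 0 && rank > 0);
    assert(current_dims != NULL && max_dims != NULL);
    buf[0] = '\0';
    if (append_text(buf, size, &used, "DATASPACE { SIMPLE ") != 0 ||
        append_dims(buf, size, &used, rank, current_dims) != 0 ||
        append_text(buf, size, &used, " / ") != 0 ||
        append_dims(buf, size, &used, rank, max_dims) != 0 ||
        append_text(buf, size, &used, " }") != 0)
        return -1;
    return 0;
}
```

Called with rank 2 and both dimension arrays set to {4, 6}, format_dataspace produces the same DATASPACE line that appears in the DDL for 'dset.h5'.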


The National Center for Supercomputing Applications

University of Illinois at Urbana-Champaign

hdfhelp@ncsa.uiuc.edu
Last Modified: August 27, 1999