diff options
Diffstat (limited to 'doxygen/dox/IntroHDF5.dox')
-rw-r--r-- | doxygen/dox/IntroHDF5.dox | 627 |
1 files changed, 627 insertions, 0 deletions
diff --git a/doxygen/dox/IntroHDF5.dox b/doxygen/dox/IntroHDF5.dox new file mode 100644 index 0000000..ec46217 --- /dev/null +++ b/doxygen/dox/IntroHDF5.dox @@ -0,0 +1,627 @@ +/** @page IntroHDF5 Introduction to HDF5 + +Navigate back: \ref index "Main" / \ref GettingStarted +<hr> + +\section sec_intro_desc HDF5 Description +HDF5 consists of a file format for storing HDF5 data, a data model for logically organizing and accessing +HDF5 data from an application, and the software (libraries, language interfaces, and tools) for working with this format. + +\subsection subsec_intro_desc_file File Format +HDF5 consists of a file format for storing HDF5 data, a data model for logically organizing and accessing HDF5 data from an application, +and the software (libraries, language interfaces, and tools) for working with this format. + +\subsection subsec_intro_desc_dm Data Model +The HDF5 Data Model, also known as the HDF5 Abstract (or Logical) Data Model consists of +the building blocks for data organization and specification in HDF5. + +An HDF5 file (an object in itself) can be thought of as a container (or group) that holds +a variety of heterogeneous data objects (or datasets). The datasets can be images, tables, +graphs, and even documents, such as PDF or Excel: + +<table> +<tr> +<td> +\image html fileobj.png +</td> +</tr> +</table> + +The two primary objects in the HDF5 Data Model are groups and datasets. + +There are also a variety of other objects in the HDF5 Data Model that support groups and datasets, +including datatypes, dataspaces, properties and attributes. + +\subsubsection subsec_intro_desc_dm_group Groups +HDF5 groups (and links) organize data objects. Every HDF5 file contains a root group that can +contain other groups or be linked to objects in other files. + +<table> +<caption>There are two groups in the HDF5 file depicted above: Viz and SimOut. +Under the Viz group are a variety of images and a table that is shared with the SimOut group. +The SimOut group contains a 3-dimensional array, a 2-dimensional array and a link to a 2-dimensional +array in another HDF5 file.</caption> +<tr> +<td> +\image html group.png +</td> +</tr> +</table> + +Working with groups and group members is similar in many ways to working with directories and files +in UNIX. As with UNIX directories and files, objects in an HDF5 file are often described by giving +their full (or absolute) path names. +\li / signifies the root group. +\li /foo signifies a member of the root group called foo. +\li /foo/zoo signifies a member of the group foo, which in turn is a member of the root group. + +\subsubsection subsec_intro_desc_dm_dset Datasets +HDF5 datasets organize and contain the “raw” data values. A dataset consists of metadata +that describes the data, in addition to the data itself: + +<table> +<caption>In this picture, the data is stored as a three dimensional dataset of size 4 x 5 x 6 with an integer datatype. +It contains attributes, Time and Pressure, and the dataset is chunked and compressed.</caption> +<tr> +<td> +\image html dataset.png +</td> +</tr> +</table> + +Datatypes, dataspaces, properties and (optional) attributes are HDF5 objects that describe a dataset. +The datatype describes the individual data elements. + +\subsection subsec_intro_desc_props Datatypes, Dataspaces, Properties and Attributes + +\subsubsection subsec_intro_desc_prop_dtype Datatypes +The datatype describes the individual data elements in a dataset. It provides complete information for +data conversion to or from that datatype. + +<table> +<caption>In the dataset depicted, each element of the dataset is a 32-bit integer.</caption> +<tr> +<td> +\image html datatype.png +</td> +</tr> +</table> + +Datatypes in HDF5 can be grouped into: +<ul> +<li> +<b>Pre-Defined Datatypes</b>: These are datatypes that are created by HDF5. They are actually opened (and closed) +by HDF5 and can have different values from one HDF5 session to the next. There are two types of pre-defined datatypes: +<ul> +<li> +Standard datatypes are the same on all platforms and are what you see in an HDF5 file. Their names are of the form +H5T_ARCH_BASE where ARCH is an architecture name and BASE is a programming type name. For example, #H5T_IEEE_F32BE +indicates a standard Big Endian floating point type. +</li> +<li> +Native datatypes are used to simplify memory operations (reading, writing) and are NOT the same on different platforms. +For example, #H5T_NATIVE_INT indicates an int (C). +</li> +</ul> +</li> +<li> +<b>Derived Datatypes</b>: These are datatypes that are created or derived from the pre-defined datatypes. +An example of a commonly used derived datatype is a string of more than one character. Compound datatypes +are also derived types. A compound datatype can be used to create a simple table, and can also be nested, +in which it includes one more other compound datatypes. +<table> +<caption>This is an example of a dataset with a compound datatype. Each element in the dataset consists +of a 16-bit integer, a character, a 32-bit integer, and a 2x3x2 array of 32-bit floats (the datatype). +It is a 2-dimensional 5 x 3 array (the dataspace). The datatype should not be confused with the dataspace. +</caption> +<tr> +<td> +\image html cmpnddtype.png +</td> +</tr> +</table> +</li> +</ul> + +\subsubsection subsec_intro_desc_prop_dspace Dataspaces +A dataspace describes the layout of a dataset’s data elements. It can consist of no elements (NULL), +a single element (scalar), or a simple array. + +<table> +<caption>This image illustrates a dataspace that is an array with dimensions of 5 x 3 and a rank (number of dimensions) of 2.</caption> +<tr> +<td> +\image html dataspace1.png +</td> +</tr> +</table> + +A dataspace can have dimensions that are fixed (unchanging) or unlimited, which means they can grow +in size (i.e. they are extendible). + +There are two roles of a dataspace: +\li It contains the spatial information (logical layout) of a dataset stored in a file. This includes the rank and dimensions of a dataset, which are a permanent part of the dataset definition. +\li It describes an application’s data buffers and data elements participating in I/O. In other words, it can be used to select a portion or subset of a dataset. + +<table> +<caption>The dataspace is used to describe both the logical layout of a dataset and a subset of a dataset.</caption> +<tr> +<td> +\image html dataspace.png +</td> +</tr> +</table> + +\subsubsection subsec_intro_desc_prop_property Properties +A property is a characteristic or feature of an HDF5 object. There are default properties which +handle the most common needs. These default properties can be modified using the HDF5 Property +List API to take advantage of more powerful or unusual features of HDF5 objects. + +<table> +<tr> +<td> +\image html properties.png +</td> +</tr> +</table> + +For example, the data storage layout property of a dataset is contiguous by default. For better +performance, the layout can be modified to be chunked or chunked and compressed: + +\subsubsection subsec_intro_desc_prop_attr Attributes +Attributes can optionally be associated with HDF5 objects. They have two parts: a name and a value. +Attributes are accessed by opening the object that they are attached to so are not independent objects. +Typically an attribute is small in size and contains user metadata about the object that it is attached to. + +Attributes look similar to HDF5 datasets in that they have a datatype and dataspace. However, they +do not support partial I/O operations, and they cannot be compressed or extended. + +\subsection subsec_intro_desc_soft HDF5 Software +The HDF5 software is written in C and includes optional wrappers for C++, FORTRAN (90 and F2003), +and Java. The HDF5 binary distribution consists of the HDF5 libraries, include files, command-line +utilities, scripts for compiling applications, and example programs. + +\subsubsection subsec_intro_desc_soft_apis HDF5 APIs and Libraries +There are APIs for each type of object in HDF5. For example, all C routines in the HDF5 library +begin with a prefix of the form H5*, where * is one or two uppercase letters indicating the type +of object on which the function operates: +\li @ref H5A <b>A</b>ttribute Interface +\li @ref H5D <b>D</b>ataset Interface +\li @ref H5F <b>F</b>ile Interface + +The HDF5 High Level APIs simplify many of the steps required to create and access objects, as well +as providing templates for storing objects. Following is a list of the High Level APIs: +\li @ref H5LT – simplifies steps in creating datasets and attributes +\li @ref H5IM – defines a standard for storing images in HDF5 +\li @ref H5TB – condenses the steps required to create tables +\li @ref H5DS – provides a standard for dimension scale storage +\li @ref H5PT – provides a standard for storing packet data + +\subsubsection subsec_intro_desc_soft_tools Tools +Useful tools for working with HDF5 files include: +\li h5dump: A utility to dump or display the contents of an HDF5 File +\li h5cc, h5c++, h5fc: Unix scripts for compiling applications +\li HDFView: A java browser to view HDF (HDF4 and HDF5) files + +<h4>h5dump</h4> +The h5dump utility displays the contents of an HDF5 file in Data Description Language (\ref DDLBNF110). +Below is an example of h5dump output for an HDF5 file that contains no objects: +\code +$ h5dump file.h5 + HDF5 "file.h5" { + GROUP "/" { + } + } +\endcode + +With large files and datasets the output from h5dump can be overwhelming. +There are options that can be used to examine specific parts of an HDF5 file. +Some useful h5dump options are included below: +\code + -H, --header Display header information only (no data) + -d <name> Display a dataset with a specified path and name + -p Display properties + -n Display the contents of the file +\endcode + +<h4>h5cc, h5fc, h5c++</h4> +The built HDF5 binaries include the h5cc, h5fc, h5c++ compile scripts for compiling applications. +When using these scripts there is no need to specify the HDF5 libraries and include files. +Compiler options can be passed to the scripts. + +<h4>HDFView</h4> +The HDFView tool allows browsing of data in HDF (HDF4 and HDF5) files. + +\section sec_intro_pm Introduction to the HDF5 Programming Model and APIs +The HDF5 Application Programming Interface is extensive, but a few functions do most of the work. + +To introduce the programming model, examples in Python and C are included below. The Python examples +use the HDF5 Python APIs (h5py). See the Examples from "Learning the Basics" page for complete examples +that can be downloaded and run for C, FORTRAN, C++, Java and Python. + +The general paradigm for working with objects in HDF5 is to: +\li Open the object. +\li Access the object. +\li Close the object. + +The library imposes an order on the operations by argument dependencies. For example, a file must be +opened before a dataset because the dataset open call requires a file handle as an argument. Objects +can be closed in any order. However, once an object is closed it no longer can be accessed. + +Keep the following in mind when looking at the example programs included in this section: +<ul> +<li> +<ul> +<li> +C routines begin with the prefix “H5*” where * is a single letter indicating the object on which the +operation is to be performed. +</li> +<li> +FORTRAN routines are similar; they begin with “h5*” and end with “_f”. +</li> +<li> +Java routines are similar; the routine names begin with “H5*” and are prefixed with “H5.” as the class. Constants are +in the HDF5Constants class and are prefixed with "HDF5Constants.". The function arguments +are usually similar, @see @ref HDF5LIB +</li> +</ul> +For example: +<ul> +<li> +File Interface:<ul><li>#H5Fopen (C)</li><li>h5fopen_f (FORTRAN)</li><li>H5.H5Fopen (Java)</li></ul> +</li> +<li> +Dataset Interface:<ul><li>#H5Dopen (C)</li><li>h5dopen_f (FORTRAN)</li><li>H5.H5Dopen (Java)</li></ul> +</li> +<li> +Dataspace interface:<ul><li>#H5Sclose (C)</li><li>h5sclose_f (FORTRAN)</li><li>H5.H5Sclose (Java)</li></ul> +</li> +</ul> +The HDF5 Python APIs use methods associated with specific objects. +</li> +<li> +For portability, the HDF5 library has its own defined types. Some common types that you will see +in the example code are: +<ul> +<li> +#hid_t is used for object handles +</li> +<li> +hsize_t is used for dimensions +</li> +<li> +#herr_t is used for many return values +</li> +</ul> +</li> +<li> +Language specific files must be included in applications: +<ul> +<li> +Python: Add <code>"import h5py / import numpy"</code> +</li> +<li> +C: Add <code>"#include hdf5.h"</code> +</li> +<li> +FORTRAN: Add <code>"USE HDF5"</code> and call h5open_f and h5close_f to initialize and close the HDF5 FORTRAN interface +</li> +<li> +Java: Add <code>"import hdf.hdf5lib.H5; + import hdf.hdf5lib.HDF5Constants;"</code> +</li> +</ul> +</li> +</ul> + +\subsection subsec_intro_pm_file Steps to create a file +To create an HDF5 file you must: +\li Specify property lists (or use the defaults). +\li Create the file. +\li Close the file (and property lists if needed). + +Example: +<table> +<caption>The following Python and C examples create a file, file.h5, and then close it. +The resulting HDF5 file will only contain a root group:</caption> +<tr> +<td> +\image html crtf-pic.png +</td> +</tr> +</table> + +Calling h5py.File with ‘w’ for the file access flag will create a new HDF5 file and overwrite +an existing file with the same name. “file” is the file handle returned from opening the file. +When finished with the file, it must be closed. When not specifying property lists, the default +property lists are used: + +<table> +<tr> +<td> +<em>Python</em> +\code + import h5py + file = h5py.File (‘file.h5’, ‘w’) + file.close () +\endcode +</td> +</tr> +</table> + +The H5Fcreate function creates an HDF5 file. #H5F_ACC_TRUNC is the file access flag to create a new +file and overwrite an existing file with the same name, and #H5P_DEFAULT is the value specified to +use a default property list. + +<table> +<tr> +<td> +<em>C</em> +\code + #include “hdf5.h” + + int main() { + hid_t file_id; + herr_t status; + + file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); + status = H5Fclose (file_id); + } +\endcode +</td> +</tr> +</table> + +\subsection subsec_intro_pm_dataset Steps to create a dataset +As described previously, an HDF5 dataset consists of the raw data, as well as the metadata that +describes the data (datatype, spatial information, and properties). To create a dataset you must: +\li Define the dataset characteristics (datatype, dataspace, properties). +\li Decide which group to attach the dataset to. +\li Create the dataset. +\li Close the dataset handle from step 3. + +Example: +<table> +<caption>The code excerpts below show the calls that need to be made to create a 4 x 6 integer dataset dset +in a file dset.h5. The dataset will be located in the root group:</caption> +<tr> +<td> +\image html crtdset.png +</td> +</tr> +</table> + +With Python, the creation of the dataspace is included as a parameter in the dataset creation method. +Just one call will create a 4 x 6 integer dataset dset. A pre-defined Big Endian 32-bit integer datatype +is specified. The create_dataset method creates the dataset in the root group (the file object). +The dataset is close by the Python interface. + +<table> +<tr> +<td> +<em>Python</em> +\code + dataset = file.create_dataset("dset",(4, 6), h5py.h5t.STD_I32BE) +\endcode +</td> +</tr> +</table> + +To create the same dataset in C, you must specify the dataspace with the #H5Screate_simple function, +create the dataset by calling #H5Dcreate, and then close the dataspace and dataset with calls to #H5Dclose +and #H5Sclose. #H5P_DEFAULT is specified to use a default property list. Note that the file identifier +(file_id) is passed in as the first parameter to #H5Dcreate, which creates the dataset in the root group. + +<table> +<tr> +<td> +<em>C</em> +\code + // Create the dataspace for the dataset. + dims[0] = 4; + dims[1] = 6; + + dataspace_id = H5Screate_simple(2, dims, NULL); + + // Create the dataset. + dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); + + // Close the dataset and dataspace + status = H5Dclose(dataset_id); + status = H5Sclose(dataspace_id); +\endcode +</td> +</tr> +</table> + +\subsection subsec_intro_pm_write Writing to or reading from a dataset +Once you have created or opened a dataset you can write to it: + +<table> +<tr> +<td> +<em>Python</em> +\code + data = np.zeros((4,6)) + for i in range(4): + for j in range(6): + data[i][j]= i*6+j+1 + + dataset[...] = data <-- Write data to dataset + data_read = dataset[...] <-- Read data from dataset +\endcode +</td> +</tr> +</table> + +#H5S_ALL is passed in for the memory and file dataspace parameters to indicate that the entire dataspace +of the dataset is specified. These two parameters can be modified to allow subsetting of a dataset. +The native predefined datatype, #H5T_NATIVE_INT, is used for reading and writing so that HDF5 will do +any necessary integer conversions: + +<table> +<tr> +<td> +<em>C</em> +\code + status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data); + status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data); +\endcode +</td> +</tr> +</table> + +\subsection subsec_intro_pm_group Steps to create a group +An HDF5 group is a structure containing zero or more HDF5 objects. Before you can create a group you must +obtain the location identifier of where the group is to be created. Following are the steps that are required: +\li Decide where to put the group – in the “root group” (or file identifier) or in another group. Open the group if it is not already open. +\li Define properties or use the default. +\li Create the group. +\li Close the group. + +<table> +<caption>Creates attributes that are attached to the dataset dset</caption> +<tr> +<td> +\image html crtgrp.png +</td> +</tr> +</table> + +The code below opens the dataset dset.h5 with read/write permission and creates a group MyGroup in the root group. +Properties are not specified so the defaults are used: + +<table> +<tr> +<td> +<em>Python</em> +\code + import h5py + file = h5py.File('dset.h5', 'r+') + group = file.create_group ('MyGroup') + file.close() +\endcode +</td> +</tr> +</table> + +To create the group MyGroup in the root group, you must call #H5Gcreate, passing in the file identifier returned +from opening or creating the file. The default property lists are specified with #H5P_DEFAULT. The group is then +closed: + +<table> +<tr> +<td> +<em>C</em> +\code + group_id = H5Gcreate (file_id, "MyGroup", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); + status = H5Gclose (group_id); +\endcode +</td> +</tr> +</table> + +\subsection subsec_intro_pm_attr Steps to create and write to an attribute +To create an attribute you must open the object that you wish to attach the attribute to. Then you can create, +access, and close the attribute as needed: +\li Open the object that you wish to add an attribute to. +\li Create the attribute +\li Write to the attribute +\li Close the attribute and the object it is attached to. + +<table> +<caption>Creates attributes that are attached to the dataset dset</caption> +<tr> +<td> +\image html crtatt.png +</td> +</tr> +</table> + +The dataspace, datatype, and data are specified in the call to create an attribute in Python: + +<table> +<tr> +<td> +<em>Python</em> +\code + dataset.attrs["Units"] = “Meters per second” <-- Create string + attr_data = np.zeros((2,)) + attr_data[0] = 100 + attr_data[1] = 200 + dataset.attrs.create("Speed", attr_data, (2,), “i”) <-- Create Integer +\endcode +</td> +</tr> +</table> + +To create an integer attribute in C, you must create the dataspace, create the attribute, write +to it and then close it in separate steps: + +<table> +<tr> +<td> +<em>C</em> +\code + hid_t attribute_id, dataspace_id; // identifiers + hsize_t dims; + int attr_data[2]; + herr_t status; + + ... + + // Initialize the attribute data. + attr_data[0] = 100; + attr_data[1] = 200; + + // Create the data space for the attribute. + dims = 2; + dataspace_id = H5Screate_simple(1, &dims, NULL); + + // Create a dataset attribute. + attribute_id = H5Acreate2 (dataset_id, "Units", H5T_STD_I32BE, + dataspace_id, H5P_DEFAULT, H5P_DEFAULT); + + // Write the attribute data. + status = H5Awrite(attribute_id, H5T_NATIVE_INT, attr_data); + + // Close the attribute. + status = H5Aclose(attribute_id); + + // Close the dataspace. + status = H5Sclose(dataspace_id); +\endcode +</td> +</tr> +</table> + +<hr> +Navigate back: \ref index "Main" / \ref GettingStarted + + +@page HDF5Examples HDF5 Examples +Example programs of how to use HDF5 are provided below. +For HDF-EOS specific examples, see the <a href="http://hdfeos.org/zoo/index.php">examples</a> +of how to access and visualize NASA HDF-EOS files using IDL, MATLAB, and NCL on the +<a href="http://hdfeos.org/">HDF-EOS Tools and Information Center</a> page. + +\section secHDF5Examples Examples +\li \ref LBExamples +\li <a href="https://portal.hdfgroup.org/display/HDF5/Examples+by+API">Examples by API</a> +\li <a href="https://portal.hdfgroup.org/display/HDF5/Examples+in+the+Source+Code">Examples in the Source Code</a> +\li <a href="https://portal.hdfgroup.org/display/HDF5/Other+Examples">Other Examples</a> + +\section secHDF5ExamplesCompile How To Compile +For information on compiling in C, C++ and Fortran, see: \ref LBCompiling + +\section secHDF5ExamplesOther Other Examples +<a href="http://hdfeos.org/zoo/index.php">IDL, MATLAB, and NCL Examples for HDF-EOS</a> +Examples of how to access and visualize NASA HDF-EOS files using IDL, MATLAB, and NCL. + +<a href="https://support.hdfgroup.org/ftp/HDF5/examples/misc-examples/">Miscellaneous Examples</a> +These (very old) examples resulted from working with users, and are not fully tested. Most of them are in C, with a few in Fortran and Java. + +<a href="https://support.hdfgroup.org/ftp/HDF5/examples/special_values_HDF5_example.tar">Using Special Values</a> +These examples show how to create special values in an HDF5 application. + +*/ |