summaryrefslogtreecommitdiffstats
path: root/doxygen/dox/IntroHDF5.dox
blob: afe534be614c89d0e1b0ec6cac4e1580da6e0416 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
/** @page IntroHDF5 Introduction to HDF5

Navigate back: \ref index "Main" / \ref GettingStarted
<hr>

\section sec_intro_desc HDF5 Description
HDF5 consists of a file format for storing HDF5 data, a data model for logically organizing and accessing
HDF5 data from an application, and the software (libraries, language interfaces, and tools) for working with this format.

\subsection subsec_intro_desc_file File Format
HDF5 consists of a file format for storing HDF5 data, a data model for logically organizing and accessing HDF5 data from an application,
and the software (libraries, language interfaces, and tools) for working with this format.

\subsection subsec_intro_desc_dm Data Model
The HDF5 Data Model, also known as the HDF5 Abstract (or Logical) Data Model consists of
the building blocks for data organization and specification in HDF5.

An HDF5 file (an object in itself) can be thought of as a container (or group) that holds
a variety of heterogeneous data objects (or datasets). The datasets can be images, tables,
graphs, and even documents, such as PDF or Excel:

<table>
<tr>
<td>
\image html fileobj.png
</td>
</tr>
</table>

The two primary objects in the HDF5 Data Model are groups and datasets.

There are also a variety of other objects in the HDF5 Data Model that support groups and datasets,
including datatypes, dataspaces, properties and attributes.

\subsubsection subsec_intro_desc_dm_group Groups
HDF5 groups (and links) organize data objects. Every HDF5 file contains a root group that can
contain other groups or be linked to objects in other files.

<table>
<caption>There are two groups in the HDF5 file depicted above: Viz and SimOut.
Under the Viz group are a variety of images and a table that is shared with the SimOut group.
The SimOut group contains a 3-dimensional array, a 2-dimensional array and a link to a 2-dimensional
array in another HDF5 file.</caption>
<tr>
<td>
\image html group.png
</td>
</tr>
</table>

Working with groups and group members is similar in many ways to working with directories and files
in UNIX. As with UNIX directories and files, objects in an HDF5 file are often described by giving
their full (or absolute) path names.
\li / signifies the root group.
\li /foo signifies a member of the root group called foo.
\li /foo/zoo signifies a member of the group foo, which in turn is a member of the root group.

\subsubsection subsec_intro_desc_dm_dset Datasets
HDF5 datasets organize and contain the “raw” data values. A dataset consists of metadata
that describes the data, in addition to the data itself:

<table>
<caption>In this picture, the data is stored as a three dimensional dataset of size 4 x 5 x 6 with an integer datatype.
It contains attributes, Time and Pressure, and the dataset is chunked and compressed.</caption>
<tr>
<td>
\image html dataset.png
</td>
</tr>
</table>

Datatypes, dataspaces, properties and (optional) attributes are HDF5 objects that describe a dataset.
The datatype describes the individual data elements.

\subsection subsec_intro_desc_props Datatypes, Dataspaces, Properties and Attributes

\subsubsection subsec_intro_desc_prop_dtype Datatypes
The datatype describes the individual data elements in a dataset. It provides complete information for
data conversion to or from that datatype.

<table>
<caption>In the dataset depicted, each element of the dataset is a 32-bit integer.</caption>
<tr>
<td>
\image html datatype.png
</td>
</tr>
</table>

Datatypes in HDF5 can be grouped into:
<ul>
<li>
<b>Pre-Defined Datatypes</b>: These are datatypes that are created by HDF5. They are actually opened (and closed)
by HDF5 and can have different values from one HDF5 session to the next. There are two types of pre-defined datatypes:
<ul>
<li>
Standard datatypes are the same on all platforms and are what you see in an HDF5 file. Their names are of the form
H5T_ARCH_BASE where ARCH is an architecture name and BASE is a pro­gramming type name. For example, #H5T_IEEE_F32BE
indicates a standard Big Endian floating point type.
</li>
<li>
Native datatypes are used to simplify memory operations (reading, writing) and are NOT the same on different platforms.
For example, #H5T_NATIVE_INT indicates an int (C).
</li>
</ul>
</li>
<li>
<b>Derived Datatypes</b>: These are datatypes that are created or derived from the pre-defined datatypes.
An example of a commonly used derived datatype is a string of more than one character. Compound datatypes
are also derived types.  A compound datatype can be used to create a simple table, and can also be nested,
in which it includes one more other compound datatypes.
<table>
<caption>This is an example of a dataset with a compound datatype. Each element in the dataset consists
of a 16-bit integer, a character, a 32-bit integer, and a 2x3x2 array of 32-bit floats (the datatype).
It is a 2-dimensional 5 x 3 array (the dataspace). The datatype should not be confused with the dataspace.
</caption>
<tr>
<td>
\image html cmpnddtype.png
</td>
</tr>
</table>
</li>
</ul>

\subsubsection subsec_intro_desc_prop_dspace Dataspaces
A dataspace describes the layout of a dataset's data elements. It can consist of no elements (NULL),
a single element (scalar), or a simple array.

<table>
<caption>This image illustrates a dataspace that is an array with dimensions of 5 x 3 and a rank (number of dimensions) of 2.</caption>
<tr>
<td>
\image html dataspace1.png
</td>
</tr>
</table>

A dataspace can have dimensions that are fixed (unchanging) or unlimited, which means they can grow
in size (i.e. they are extendible).

There are two roles of a dataspace:
\li It contains the spatial information (logical layout) of a dataset stored in a file. This includes the rank and dimensions of a dataset, which are a permanent part of the dataset definition.
\li It describes an application's data buffers and data elements participating in I/O. In other words, it can be used to select a portion or subset of a dataset.

<table>
<caption>The dataspace is used to describe both the logical layout of a dataset and a subset of a dataset.</caption>
<tr>
<td>
\image html dataspace.png
</td>
</tr>
</table>

\subsubsection subsec_intro_desc_prop_property Properties
A property is a characteristic or feature of an HDF5 object. There are default properties which
handle the most common needs. These default properties can be modified using the HDF5 Property
List API to take advantage of more powerful or unusual features of HDF5 objects.

<table>
<tr>
<td>
\image html properties.png
</td>
</tr>
</table>

For example, the data storage layout property of a dataset is contiguous by default. For better
performance, the layout can be modified to be chunked or chunked and compressed:

\subsubsection subsec_intro_desc_prop_attr Attributes
Attributes can optionally be associated with HDF5 objects. They have two parts: a name and a value.
Attributes are accessed by opening the object that they are attached to so are not independent objects.
Typically an attribute is small in size and contains user metadata about the object that it is attached to.

Attributes look similar to HDF5 datasets in that they have a datatype and dataspace. However, they
do not support partial I/O operations, and they cannot be compressed or extended.

\subsection subsec_intro_desc_soft HDF5 Software
The HDF5 software is written in C and includes optional wrappers for C++, FORTRAN (90 and F2003),
and Java. The HDF5 binary distribution consists of the HDF5 libraries, include files, command-line
utilities, scripts for compiling applications, and example programs.

\subsubsection subsec_intro_desc_soft_apis HDF5 APIs and Libraries
There are APIs for each type of object in HDF5. For example, all C routines in the HDF5 library
begin with a prefix of the form H5*, where * is one or two uppercase letters indicating the type
of object on which the function operates:
\li @ref H5A     <b>A</b>ttribute Interface
\li @ref H5D     <b>D</b>ataset Interface
\li @ref H5F     <b>F</b>ile Interface

The HDF5 High Level APIs simplify many of the steps required to create and access objects, as well
as providing templates for storing objects. Following is a list of the High Level APIs:
\li @ref H5LT – simplifies steps in creating datasets and attributes
\li @ref H5IM – defines a standard for storing images in HDF5
\li @ref H5TB – condenses the steps required to create tables
\li @ref H5DS – provides a standard for dimension scale storage
\li @ref H5PT – provides a standard for storing packet data

\subsubsection subsec_intro_desc_soft_tools Tools
Useful tools for working with HDF5 files include:
\li h5dump:             A utility to dump or display the contents of an HDF5 File
\li h5cc, h5c++, h5fc:  Unix scripts for compiling applications
\li HDFView:            A java browser to view HDF (HDF4 and HDF5) files

<h4>h5dump</h4>
The h5dump utility displays the contents of an HDF5 file in Data Description Language (\ref DDLBNF114).
Below is an example of h5dump output for an HDF5 file that contains no objects:
\code
$ h5dump file.h5
   HDF5 "file.h5" {
   GROUP "/" {
   }
   }
\endcode

With large files and datasets the output from h5dump can be overwhelming.
There are options that can be used to examine specific parts of an HDF5 file.
Some useful h5dump options are included below:
\code
   -H, --header      Display header information only (no data)
   -d <name>      Display a dataset with a specified path and name
   -p                          Display properties
   -n                          Display the contents of the file
\endcode

<h4>h5cc, h5fc, h5c++</h4>
The built HDF5 binaries include the h5cc, h5fc, h5c++ compile scripts for compiling applications.
When using these scripts there is no need to specify the HDF5 libraries and include files.
Compiler options can be passed to the scripts.

<h4>HDFView</h4>
The HDFView tool allows browsing of data in HDF (HDF4 and HDF5) files. 

\section sec_intro_pm Introduction to the HDF5 Programming Model and APIs
The HDF5 Application Programming Interface is extensive, but a few functions do most of the work.

To introduce the programming model, examples in Python and C are included below. The Python examples
use the HDF5 Python APIs (h5py). See the Examples from "Learning the Basics" page for complete examples
that can be downloaded and run for C, FORTRAN, C++, Java and Python.

The general paradigm for working with objects in HDF5 is to:
\li Open the object.
\li Access the object.
\li Close the object.

The library imposes an order on the operations by argument dependencies. For example, a file must be
opened before a dataset because the dataset open call requires a file handle as an argument. Objects
can be closed in any order. However, once an object is closed it no longer can be accessed.

Keep the following in mind when looking at the example programs included in this section:
<ul>
<li>
<ul>
<li>
C routines begin with the prefix “H5*” where * is a single letter indicating the object on which the
operation is to be performed.
</li>
<li>
FORTRAN routines are similar; they begin with “h5*” and end with “_f”.
</li>
<li>
Java routines are similar; the routine names begin with “H5*” and are prefixed with “H5.” as the class. Constants are
in the HDF5Constants class and are prefixed with "HDF5Constants.". The function arguments
are usually similar, @see @ref HDF5LIB
</li>
</ul>
For example:
<ul>
<li>
File Interface:<ul><li>#H5Fopen (C)</li><li>h5fopen_f (FORTRAN)</li><li>H5.H5Fopen (Java)</li></ul>
</li>
<li>
Dataset Interface:<ul><li>#H5Dopen (C)</li><li>h5dopen_f (FORTRAN)</li><li>H5.H5Dopen (Java)</li></ul>
</li>
<li>
Dataspace interface:<ul><li>#H5Sclose (C)</li><li>h5sclose_f (FORTRAN)</li><li>H5.H5Sclose (Java)</li></ul>
</li>
</ul>
The HDF5 Python APIs use methods associated with specific objects.
</li>
<li>
For portability, the HDF5 library has its own defined types. Some common types that you will see
in the example code are:
<ul>
<li>
#hid_t is used for object handles
</li>
<li>
hsize_t is used for dimensions
</li>
<li>
#herr_t is used for many return values
</li>
</ul>
</li>
<li>
Language specific files must be included in applications:
<ul>
<li>
Python:   Add <code>"import h5py / import numpy"</code>
</li>
<li>
C:        Add <code>"#include hdf5.h"</code>
</li>
<li>
FORTRAN:  Add <code>"USE HDF5"</code> and call h5open_f and h5close_f to initialize and close the HDF5 FORTRAN interface
</li>
<li>
Java:     Add <code>"import hdf.hdf5lib.H5;
                     import hdf.hdf5lib.HDF5Constants;"</code>
</li>
</ul>
</li>
</ul>

\subsection subsec_intro_pm_file Steps to create a file
To create an HDF5 file you must:
\li Specify property lists (or use the defaults).
\li Create the file.
\li Close the file (and property lists if needed).

Example:
<table>
<caption>The following Python and C examples create a file, file.h5, and then close it.
The resulting HDF5 file will only contain a root group:</caption>
<tr>
<td>
\image html crtf-pic.png
</td>
</tr>
</table>

Calling h5py.File with ‘w’ for the file access flag will create a new HDF5 file and overwrite
an existing file with the same name.  “file” is the file handle returned from opening the file.
When finished with the file, it must be closed. When not specifying property lists, the default
property lists are used:

<table>
<tr>
<td>
<em>Python</em>
\code
   import h5py
   file = h5py.File (‘file.h5’, ‘w’)
   file.close ()
\endcode
</td>
</tr>
</table>

The H5Fcreate function creates an HDF5 file. #H5F_ACC_TRUNC is the file access flag to create a new
file and overwrite an existing file with the same name, and #H5P_DEFAULT is the value specified to
use a default property list.

<table>
<tr>
<td>
<em>C</em>
\code
   #include “hdf5.h”

   int main() {
       hid_t       file_id;
       herr_t      status;

       file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);   
       status = H5Fclose (file_id);
   }
\endcode
</td>
</tr>
</table>

\subsection subsec_intro_pm_dataset Steps to create a dataset
As described previously, an HDF5 dataset consists of the raw data, as well as the metadata that
describes the data (datatype, spatial information, and properties). To create a dataset you must:
\li Define the dataset characteristics (datatype, dataspace, properties).
\li Decide which group to attach the dataset to.
\li Create the dataset.
\li Close the dataset handle from step 3.

Example:
<table>
<caption>The code excerpts below show the calls that need to be made to create a 4 x 6 integer dataset dset
in a file dset.h5. The dataset will be located in the root group:</caption>
<tr>
<td>
\image html crtdset.png
</td>
</tr>
</table>

With Python, the creation of the dataspace is included as a parameter in the dataset creation method.
Just one call will create a 4 x 6 integer dataset dset. A pre-defined Big Endian 32-bit integer datatype
is specified. The create_dataset method creates the dataset in the root group (the file object).
The dataset is close by the Python interface.

<table>
<tr>
<td>
<em>Python</em>
\code
   dataset = file.create_dataset("dset",(4, 6), h5py.h5t.STD_I32BE)
\endcode
</td>
</tr>
</table>

To create the same dataset in C, you must specify the dataspace with the #H5Screate_simple function,
create the dataset by calling #H5Dcreate, and then close the dataspace and dataset with calls to #H5Dclose
and #H5Sclose. #H5P_DEFAULT is specified to use a default property list. Note that the file identifier
(file_id) is passed in as the first parameter to #H5Dcreate, which creates the dataset in the root group.

<table>
<tr>
<td>
<em>C</em>
\code
   // Create the dataspace for the dataset.
   dims[0] = 4;
   dims[1] = 6;

   dataspace_id = H5Screate_simple(2, dims, NULL);

   // Create the dataset.
   dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

   // Close the dataset and dataspace
   status = H5Dclose(dataset_id);
   status = H5Sclose(dataspace_id);
\endcode
</td>
</tr>
</table>

\subsection subsec_intro_pm_write Writing to or reading from a dataset
Once you have created or opened a dataset you can write to it:

<table>
<tr>
<td>
<em>Python</em>
\code
   data = np.zeros((4,6))
   for i in range(4):
      for j in range(6):
         data[i][j]= i*6+j+1

   dataset[...] = data       <-- Write data to dataset
   data_read = dataset[...]  <-- Read data from dataset
\endcode
</td>
</tr>
</table>

#H5S_ALL is passed in for the memory and file dataspace parameters to indicate that the entire dataspace
of the dataset is specified. These two parameters can be modified to allow subsetting of a dataset.
The native predefined datatype, #H5T_NATIVE_INT, is used for reading and writing so that HDF5 will do
any necessary integer conversions:

<table>
<tr>
<td>
<em>C</em>
\code
   status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
   status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
\endcode
</td>
</tr>
</table>

\subsection subsec_intro_pm_group Steps to create a group
An HDF5 group is a structure containing zero or more HDF5 objects. Before you can create a group you must
obtain the location identifier of where the group is to be created. Following are the steps that are required:
\li Decide where to put the group – in the “root group” (or file identifier) or in another group. Open the group if it is not already open.
\li Define properties or use the default.
\li Create the group.
\li Close the group.

<table>
<caption>Creates attributes that are attached to the dataset dset</caption>
<tr>
<td>
\image html crtgrp.png
</td>
</tr>
</table>

The code below opens the dataset dset.h5 with read/write permission and creates a group MyGroup in the root group.
Properties are not specified so the defaults are used:

<table>
<tr>
<td>
<em>Python</em>
\code
   import h5py
   file = h5py.File('dset.h5', 'r+')  
   group = file.create_group ('MyGroup')
   file.close()
\endcode
</td>
</tr>
</table>

To create the group MyGroup in the root group, you must call #H5Gcreate, passing in the file identifier returned
from opening or creating the file. The default property lists are specified with #H5P_DEFAULT. The group is then
closed:

<table>
<tr>
<td>
<em>C</em>
\code
   group_id = H5Gcreate (file_id, "MyGroup", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
   status = H5Gclose (group_id);
\endcode
</td>
</tr>
</table>

\subsection subsec_intro_pm_attr Steps to create and write to an attribute
To create an attribute you must open the object that you wish to attach the attribute to. Then you can create,
access, and close the attribute as needed:
\li Open the object that you wish to add an attribute to.
\li Create the attribute
\li Write to the attribute
\li Close the attribute and the object it is attached to.

<table>
<caption>Creates attributes that are attached to the dataset dset</caption>
<tr>
<td>
\image html crtatt.png
</td>
</tr>
</table>

The dataspace, datatype, and data are specified in the call to create an attribute in Python:

<table>
<tr>
<td>
<em>Python</em>
\code
   dataset.attrs["Units"] = “Meters per second”  <-- Create string
   attr_data = np.zeros((2,))
   attr_data[0] = 100
   attr_data[1] = 200
   dataset.attrs.create("Speed", attr_data, (2,), “i”) <-- Create Integer
\endcode
</td>
</tr>
</table>

To create an integer attribute in C, you must create the dataspace, create the attribute, write
to it and then close it in separate steps:

<table>
<tr>
<td>
<em>C</em>
\code
   hid_t       attribute_id, dataspace_id;  // identifiers
   hsize_t     dims;
   int         attr_data[2];
   herr_t      status;

      ...

   // Initialize the attribute data.
   attr_data[0] = 100;
   attr_data[1] = 200;

   // Create the data space for the attribute.
   dims = 2;
   dataspace_id = H5Screate_simple(1, &dims, NULL);

   // Create a dataset attribute.
   attribute_id = H5Acreate2 (dataset_id, "Units", H5T_STD_I32BE,
                              dataspace_id, H5P_DEFAULT, H5P_DEFAULT);

   // Write the attribute data.
   status = H5Awrite(attribute_id, H5T_NATIVE_INT, attr_data);

   // Close the attribute.
   status = H5Aclose(attribute_id);

   // Close the dataspace.
   status = H5Sclose(dataspace_id);
\endcode
</td>
</tr>
</table>

<hr>
Navigate back: \ref index "Main" / \ref GettingStarted


@page HDF5Examples HDF5 Examples
Example programs of how to use HDF5 are provided below.
For HDF-EOS specific examples, see the <a href="http://hdfeos.org/zoo/index.php">examples</a>
of how to access and visualize NASA HDF-EOS files using Python, IDL, MATLAB, and NCL
on the <a href="http://hdfeos.org/">HDF-EOS Tools and Information Center</a> page.

\section secHDF5Examples Examples
\li \ref LBExamples
\li \ref ExAPI
\li <a href="https://portal.hdfgroup.org/display/HDF5/Examples+in+the+Source+Code">Examples in the Source Code</a>
\li <a href="https://portal.hdfgroup.org/display/HDF5/Other+Examples">Other Examples</a>

\section secHDF5ExamplesCompile How To Compile
For information on compiling in C, C++ and Fortran, see: \ref LBCompiling

\section secHDF5ExamplesOther Other Examples
<a href="http://hdfeos.org/zoo/index.php">IDL, MATLAB, and NCL Examples for HDF-EOS</a>
Examples of how to access and visualize NASA HDF-EOS files using IDL, MATLAB, and NCL.

<a href="https://support.hdfgroup.org/ftp/HDF5/examples/misc-examples/">Miscellaneous Examples</a>
These (very old) examples resulted from working with users, and are not fully tested. Most of them are in C, with a few in Fortran and Java.

<a href="https://support.hdfgroup.org/ftp/HDF5/examples/special_values_HDF5_example.tar">Using Special Values</a>
These examples show how to create special values in an HDF5 application.

*/