HDF5 documents and links 
Introduction to HDF5 
HDF5 Reference Manual 
HDF5 User's Guide for Release 1.6 
And in this document, the HDF5 User's Guide from Release 1.4.5:    
Files   Datasets   Datatypes   Dataspaces   Groups  
References   Attributes   Property Lists   Error Handling  
Filters   Caching   Chunking   Mounting Files  
Performance   Debugging   Environment   DDL  

Performance Analysis and Issues

1. Introduction

This section briefly discusses performance issues in HDF5 and performance analysis tools for HDF5, or points to such discussions elsewhere.

2. Dataset Chunking

Appropriate dataset chunking can make a significant difference in HDF5 performance. This topic is discussed in Dataset Chunking Issues elsewhere in this User's Guide.

3. Freespace Management

HDF5 does not yet manage freespace as effectively as it might. While a file is open, the library actively tracks and re-uses freespace, i.e., space that is freed (or released) during the run. But the library does not yet manage freespace across the closing and reopening of a file; when a file is closed, all knowledge of available freespace is lost. What was freespace becomes an unusable hole in the file.

There are several circumstances that can result in freespace in an HDF5 file:

As stated above, freespace is not managed across the closing and reopening of an HDF5 file; file space that was known freespace while the file remained open becomes an inaccessible hole when the file is closed. Thus, if a file is often closed and reopened, datasets frequently rewritten, or groups and/or datasets frequently added and deleted, that file can develop large numbers of holes and grow unnecessarily large. This can, in turn, seriously impair application or library performance as the file ages.
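The effect described above can be illustrated with a toy model. The sketch below is not HDF5 code and does not reflect the library's actual allocator; the class ToyFile, the function rewrite_cycles, and the first-fit reuse policy are all hypothetical, chosen only to show how losing the in-memory freespace list at close time makes a file grow.

```python
class ToyFile:
    """Toy model of a file whose freespace list exists only in memory."""

    def __init__(self):
        self.size = 0          # current end-of-file offset, in bytes
        self.free_blocks = []  # sizes of freed blocks known to this session

    def allocate(self, nbytes):
        # Reuse the first known free block that is large enough (first fit).
        for i, block in enumerate(self.free_blocks):
            if block >= nbytes:
                leftover = block - nbytes
                if leftover:
                    self.free_blocks[i] = leftover
                else:
                    del self.free_blocks[i]
                return
        # No usable free block is known, so the file must grow.
        self.size += nbytes

    def free(self, nbytes):
        # Record the released space so this session can reuse it.
        self.free_blocks.append(nbytes)

    def close_and_reopen(self):
        # The freespace list is forgotten; the holes remain in the file.
        self.free_blocks = []


def rewrite_cycles(reopen_between, cycles=10, nbytes=1000):
    """Replace one object `cycles` times and return the final file size."""
    f = ToyFile()
    f.allocate(nbytes)
    for _ in range(cycles):
        f.free(nbytes)              # delete the old object
        if reopen_between:
            f.close_and_reopen()    # knowledge of the freed block is lost
        f.allocate(nbytes)          # write the replacement
    return f.size
```

In this model, rewrite_cycles(False) leaves the file at its original size because each freed block is reused, while rewrite_cycles(True) adds one unusable hole per cycle.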

An h5pack utility would enable packing a file to remove the holes, but writing such a utility to universally pack the file correctly is a complex task and the HDF5 development team has not to date had the resources to complete the task.

For application developers or researchers who find themselves working with files that become bloated in this manner, there are, at this time, two remedies:

1 This is a problem only with compressed chunks. The compression ratio of data is highly dependent on the data itself; even when the size of the uncompressed data does not change, the size of the compressed data can change substantially as the data changes. Uncompressed chunks do not vary in size, so this issue does not arise.
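The dependence of compressed size on content is easy to observe outside HDF5. The sketch below uses Python's standard zlib module directly rather than any HDF5 filter; the 64 KB chunk size, the compression level, and the helper compressed_size are assumptions made for illustration.

```python
import os
import zlib

CHUNK_BYTES = 64 * 1024  # hypothetical chunk size, chosen for illustration


def compressed_size(chunk, level=6):
    """Return the number of bytes the chunk occupies after zlib compression."""
    return len(zlib.compress(chunk, level))


# Two chunks with identical uncompressed size but very different content.
uniform = bytes(CHUNK_BYTES)     # all zeros: highly compressible
noisy = os.urandom(CHUNK_BYTES)  # random bytes: essentially incompressible

print("uncompressed sizes:", len(uniform), len(noisy))
print("compressed sizes:  ", compressed_size(uniform), compressed_size(noisy))
```

A chunk whose content changes from the first kind of data to the second keeps its uncompressed size but needs far more space once compressed, so the rewritten chunk may no longer fit where the old one was stored.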

4. Use of the Pablo Instrumentation of HDF5

Pablo HDF5 Trace software provides a means of measuring the performance of programs using HDF5.

The Pablo software consists of an instrumented copy of the HDF5 library, the Pablo Trace and Trace Extensions libraries, and some utilities for processing the output. The instrumented version of the HDF5 library has hooks inserted into the HDF5 code which call routines in the Pablo Trace library just after entry to each instrumented HDF5 routine and just prior to exit from the routine. The Pablo Trace Extensions library has programs that track the I/O activity between the entry and exit of the HDF5 routine during execution.
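The entry and exit hook pattern described above can be sketched in a few lines. This is a generic illustration only, not the Pablo API and not how the instrumented HDF5 library is implemented; the names traced, trace_records, and write_dataset are hypothetical, and an in-memory list stands in for the trace file.

```python
import functools
import time

trace_records = []  # hypothetical stand-in for a trace file


def traced(func):
    """Wrap a routine with a hook just after entry and one just before exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # entry hook: note when the routine began
        try:
            return func(*args, **kwargs)
        finally:
            # Exit hook: record which routine ran and how long it took.
            trace_records.append({
                "routine": func.__name__,
                "seconds": time.perf_counter() - start,
            })
    return wrapper


@traced
def write_dataset(values):
    """Hypothetical stand-in for an instrumented library routine."""
    return sum(values)
```

Each call to write_dataset appends one record to trace_records, which a separate analysis step could later summarize per routine.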

A few lines of code must be inserted in the user's main program to enable tracing and to specify which HDF5 procedures are to be traced. The program is then linked with the special HDF5 and Pablo libraries to produce an executable. Running this executable on a single processor produces an output file, called the trace file, which contains Pablo Self-Defining Data Format (SDDF) records. The HDF5 Analysis Utilities can then interpret the SDDF records in the trace file to produce a report describing the HDF5 I/O activity that occurred during execution.

For further instructions, see the file READ_ME in the $(toplevel)/hdf5/pablo/ subdirectory of the HDF5 source code distribution.

For further information about Pablo and the Self-Defining Data Format, visit the Pablo website at http://www-pablo.cs.uiuc.edu/.


HDF Help Desk
Describes HDF5 Release 1.4.5, February 2003
Last modified: 2 August 2001