Meta Data Caching

The HDF5 library caches two types of data: meta data and raw data. The meta data cache holds file objects such as the file header, symbol table nodes, global heap collections, and object headers and their messages in a partially decoded state. The cache has a fixed number of entries, set through the file access property list (the default is 10k), and each entry can hold a single meta data object. When two objects collide on the same entry, the older object is preempted in favor of the new one.
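
The collision behavior described above can be pictured with a small toy model. The sketch below is not the library's implementation: the names toy_cache_t, toy_cache_access() and TOY_NSLOTS are hypothetical, the slot count is deliberately tiny, and mapping an object to a slot by taking its file address modulo the slot count is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NSLOTS 8            /* hypothetical; far smaller than the real default */

typedef struct {
    int           in_use;       /* non-zero once the slot holds an object */
    unsigned long addr;         /* file address of the cached object */
    char          label[32];    /* stands in for the partially decoded object */
} toy_entry_t;

typedef struct {
    toy_entry_t   slot[TOY_NSLOTS];
    unsigned long nhits, nmisses, npreempts;
} toy_cache_t;

static void toy_cache_init(toy_cache_t *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Assumed mapping: each address has exactly one slot it can occupy. */
static size_t toy_slot_of(unsigned long addr)
{
    return (size_t)(addr % TOY_NSLOTS);
}

/* Look up an object. Returns 1 on a hit and 0 on a miss. On a miss the
 * object is loaded into its slot, preempting whatever was there before. */
static int toy_cache_access(toy_cache_t *c, unsigned long addr, const char *label)
{
    toy_entry_t *e = &c->slot[toy_slot_of(addr)];

    if (e->in_use && e->addr == addr) {
        c->nhits++;
        return 1;
    }
    c->nmisses++;
    if (e->in_use)
        c->npreempts++;         /* the older object loses its slot */
    e->in_use = 1;
    e->addr = addr;
    strncpy(e->label, label, sizeof e->label - 1);
    e->label[sizeof e->label - 1] = '\0';
    return 0;
}

/* Two objects whose addresses map to the same slot keep evicting each
 * other. Returns the number of preemptions observed. */
static unsigned long toy_cache_demo(void)
{
    toy_cache_t c;

    toy_cache_init(&c);
    toy_cache_access(&c, 3UL, "object header");      /* miss, slot empty */
    toy_cache_access(&c, 3UL, "object header");      /* hit */
    toy_cache_access(&c, 11UL, "symbol table node"); /* same slot: preempts */
    toy_cache_access(&c, 3UL, "object header");      /* miss again, preempts */
    printf("hits=%lu misses=%lu preemptions=%lu\n",
           c.nhits, c.nmisses, c.npreempts);
    return c.npreempts;
}
```

Calling toy_cache_demo() from a main() shows two objects that share a slot repeatedly pushing each other out. In this simplified model, a larger entry count makes such collisions less likely.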

Raw Data Chunk Caching

Raw data chunks are cached because I/O requests at the application level typically don't map well to chunks at the storage level. The chunk cache has a maximum size in bytes, set through the file access property list (the default is 1MB), and when the limit is reached, chunks are preempted based on the following set of heuristics.

One should choose large values for w0 if I/O requests typically do not overlap, but smaller values for w0 if the requests do overlap. For instance, reading an entire 2d array by reading from non-overlapping "windows" in row-major order would benefit from a high w0 value, while reading a diagonal across the dataset, where each request overlaps the previous request, would benefit from a small w0.

The API

The cache parameters for both caches are part of a file access property list and are set and queried with this pair of functions:

herr_t H5Pset_cache(hid_t plist, unsigned int mdc_nelmts, size_t rdcc_nbytes, double w0)
herr_t H5Pget_cache(hid_t plist, unsigned int *mdc_nelmts, size_t *rdcc_nbytes, double *w0)
Sets or queries the meta data cache and raw data chunk cache parameters. The plist is a file access property list. The number of elements (objects) in the meta data cache is mdc_nelmts. The total size of the raw data chunk cache in bytes is rdcc_nbytes, and the preemption policy is w0. For H5Pget_cache() any (or all) of the pointer arguments may be null pointers, in which case the corresponding value is not returned.

Robb Matzke
Last modified: Tue May 26 15:38:27 EDT 1998