This document describes the various ways that raw data is stored in an HDF5 file and the object header messages which contain the parameters for the storage.
Raw data storage has three components: the mapping from some logical multi-dimensional element space to the linear address space of a file, compression of the raw data on disk, and striping of raw data across multiple files. These components are orthogonal.
Some goals of the storage mechanism are to be able to efficiently store data which is:
The Sparse Large, Dynamic Size, and Subslab Access methods share so much code that they can be described with a single message. The new Indexed Storage Message (0x0008) will replace the old Chunked Object (0x0009) and Sparse Object (0x000A) Messages.
byte | byte | byte | byte |
---|---|---|---|
Address of B-tree | | | |
Number of Dimensions | Reserved | Reserved | Reserved |
Reserved (4 bytes) | | | |
Alignment for Dimension 0 (4 bytes) | | | |
Alignment for Dimension 1 (4 bytes) | | | |
... | | | |
Alignment for Dimension N (4 bytes) | | | |
The alignment fields indicate the alignment in logical space to use when allocating new storage areas on disk. For instance, writing every other element of a 100-element one-dimensional array (using one HDF5 I/O partial write operation per element) that has unit storage alignment would result in 50 single-element, discontiguous storage segments. However, using an alignment of 25 would result in only four discontiguous segments. The size of the message varies with the number of dimensions.
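The arithmetic in the example above can be checked with a short sketch. It assumes a simplified model in which every aligned block that receives at least one written element is allocated as its own storage segment; the function name `count_segments` and its parameters are hypothetical and are not part of the library.

```c
#include <stddef.h>

/*
 * Count the storage segments allocated when every `stride`-th element of an
 * `n`-element one-dimensional array is written, one element per write.
 *
 * Assumption (illustrative model only): storage is allocated in blocks of
 * `alignment` logical elements, and each block touched by a write becomes
 * one segment.
 */
size_t count_segments(size_t n, size_t stride, size_t alignment)
{
    size_t count = 0;
    size_t last_block = (size_t)-1;   /* sentinel: no block seen yet */

    if (stride == 0 || alignment == 0)
        return 0;                     /* degenerate input, nothing to count */

    for (size_t i = 0; i < n; i += stride) {
        size_t block = i / alignment; /* aligned block containing element i */
        if (block != last_block) {    /* first write into this block */
            count++;
            last_block = block;
        }
    }
    return count;
}
```

Under this model, `count_segments(100, 2, 1)` gives the 50 single-element segments described above and `count_segments(100, 2, 25)` gives four.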
A B-tree is used to point to the discontiguous portions of storage that have been allocated for the object. All keys of a particular B-tree are the same size, and that size is a function of the number of dimensions. It is therefore not possible to change the dimensionality of an indexed storage array after its B-tree is created.
byte | byte | byte | byte |
---|---|---|---|
External File Number or Zero (4 bytes) | | | |
Chunk Offset in Dimension 0 (4 bytes) | | | |
Chunk Offset in Dimension 1 (4 bytes) | | | |
... | | | |
Chunk Offset in Dimension N (4 bytes) | | | |
The keys within a B-tree obey an ordering based on the chunk offsets. If the offsets in dimension-0 are equal, then dimension-1 is used, etc. The External File Number field contains a 1-origin offset into the External File List message which contains the name of the external file in which that chunk is stored.
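The key layout and ordering rule can be sketched in C as follows. This is an illustration under stated assumptions, not the library's implementation: the names `chunk_key_t`, `chunk_key_size`, `chunk_key_cmp`, and `MAX_NDIMS` are hypothetical, the offsets are assumed to be unsigned, and the External File Number is assumed not to take part in the ordering because the text bases it on the chunk offsets only.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NDIMS 32  /* hypothetical upper bound, for this sketch only */

/* In-memory view of one B-tree key (field names are illustrative). */
typedef struct {
    uint32_t file_number;        /* External File Number, or zero */
    uint32_t offset[MAX_NDIMS];  /* chunk offset in each dimension */
} chunk_key_t;

/* On-disk key size in bytes: one 4-byte file number plus one 4-byte
 * offset per dimension, as laid out in the table above. */
size_t chunk_key_size(size_t ndims)
{
    return 4 * (1 + ndims);
}

/* Compare two keys by chunk offset, dimension 0 first, then dimension 1,
 * and so on. Returns -1, 0, or 1. */
int chunk_key_cmp(const chunk_key_t *a, const chunk_key_t *b, size_t ndims)
{
    for (size_t i = 0; i < ndims; i++) {
        if (a->offset[i] < b->offset[i])
            return -1;
        if (a->offset[i] > b->offset[i])
            return 1;
    }
    return 0;  /* all offsets equal */
}
```

Because the key size depends only on the number of dimensions, every key in a given B-tree occupies the same number of bytes, which is why the dimensionality is fixed once the tree exists.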
The indexed storage will support arbitrary striping at the chunk level; each chunk can be stored in any file. This is accomplished by using the External File Number field of an indexed storage B-tree key as a 1-origin offset into an External File List Message (0x0009) which takes the form:
byte | byte | byte | byte |
---|---|---|---|
Name Heap Address | | | |
Number of Slots Allocated (4 bytes) | | | |
Number of File Names (4 bytes) | | | |
Byte Offset of Name 1 in Heap (4 bytes) | | | |
Byte Offset of Name 2 in Heap (4 bytes) | | | |
... | | | |
Unused Slot(s) | | | |
Each indexed storage array that has all or part of its data stored in external files will contain a single external file list message. The size of the message is determined when the message is created, but it may be possible to enlarge the message on demand by moving it. At this time, it's not possible for multiple arrays to share a single external file list message.
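Resolving a key's External File Number to a file name can be sketched as below. The struct `efl_t` and the function `efl_name` are hypothetical and only mirror the fields of the message table; the sketch further assumes that names are stored NUL-terminated in the heap and that a file number of zero means the chunk is not stored in an external file.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory view of an external file list (illustrative names only). */
typedef struct {
    const char     *heap;         /* name heap contents (assumed NUL-terminated names) */
    uint32_t        nslots;       /* Number of Slots Allocated */
    uint32_t        nfiles;       /* Number of File Names */
    const uint32_t *name_offset;  /* byte offset of each name in the heap */
} efl_t;

/*
 * Map a 1-origin External File Number to its name. Returns NULL when the
 * number is zero (assumed to mean "no external file") or past the last
 * used slot.
 */
const char *efl_name(const efl_t *efl, uint32_t file_number)
{
    if (file_number == 0 || file_number > efl->nfiles)
        return NULL;
    /* Convert the 1-origin file number to a 0-origin array index. */
    return efl->heap + efl->name_offset[file_number - 1];
}
```

Keeping the slot count separate from the name count reflects the unused slots in the message, which leave room to add names without resizing it.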
H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn nslots_hint, intn heap_size_hint)
intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
herr_t H5O_efl_release (H5O_efl_t *efl)
H5O_efl_new flushes the message to disk.