Diffstat (limited to 'doc/html/storage.html')
-rw-r--r-- | doc/html/storage.html | 274 |
1 files changed, 0 insertions, 274 deletions
diff --git a/doc/html/storage.html b/doc/html/storage.html deleted file mode 100644 index 87ea54d..0000000 --- a/doc/html/storage.html +++ /dev/null @@ -1,274 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <title>Raw Data Storage in HDF5</title> - </head> - - <body> - <h1>Raw Data Storage in HDF5</h1> - - <p>This document describes the various ways that raw data is - stored in an HDF5 file and the object header messages which - contain the parameters for the storage. - - <p>Raw data storage has three components: the mapping from some - logical multi-dimensional element space to the linear address - space of a file, compression of the raw data on disk, and - striping of raw data across multiple files. These components - are orthogonal. - - <p>Some goals of the storage mechanism are to be able to - efficently store data which is: - - <dl> - <dt>Small - <dd>Small pieces of raw data can be treated as meta data and - stored in the object header. This will be achieved by storing - the raw data in the object header with message 0x0006. - Compression and striping are not supported in this case. - - <dt>Complete Large - <dd>The library should be able to store large arrays - contiguously in the file provided the user knows the final - array size a priori. The array can then be read/written in a - single I/O request. This is accomplished by describing the - storage with object header message 0x0005. Compression and - striping are not supported in this case. - - <dt>Sparse Large - <dd>A large sparse raw data array should be stored in a manner - that is space-efficient but one in which any element can still - be accessed in a reasonable amount of time. Implementation - details are below. - - <dt>Dynamic Size - <dd>One often doesn't have prior knowledge of the size of an - array. It would be nice to allow arrays to grow dynamically in - any dimension. 
It might also be nice to allow the array to - grow in the negative dimension directions if convenient to - implement. Implementation details are below. - - <dt>Subslab Access - <dd>Some multi-dimensional arrays are almost always accessed by - subslabs. For instance, a 2-d array of pixels might always be - accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k - index values. We should be able to store the array in such a - way that striding though the entire array is not necessary. - Subslab access might also be useful with compression - algorithms where each storage slab can be compressed - independently of the others. Implementation details are below. - - <dt>Compressed - <dd>Various compression algorithms can be applied to the entire - array. We're not planning to support separate algorithms (or a - single algorithm with separate parameters) for each chunk - although it would be possible to implement that in a manner - similar to the way striping across files is - implemented. - - <dt>Striped Across Files - <dd>The array access functions should support arrays stored - discontiguously across a set of files. - </dl> - - <h1>Implementation of Indexed Storage</h1> - - <p>The Sparse Large, Dynamic Size, and Subslab Access methods - share so much code that they can be described with a single - message. The new Indexed Storage Message (<code>0x0008</code>) - will replace the old Chunked Object (<code>0x0009</code>) and - Sparse Object (<code>0x000A</code>) Messages. 
- - <p> - <center> - <table border cellpadding=4 width="60%"> - <caption align=bottom> - <b>The Format of the Indexed Storage Message</b> - </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> - - <tr align=center> - <td colspan=4><br>Address of B-tree<br><br></td> - </tr> - <tr align=center> - <td>Number of Dimensions</td> - <td>Reserved</td> - <td>Reserved</td> - <td>Reserved</td> - </tr> - <tr align=center> - <td colspan=4>Reserved (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Alignment for Dimension 0 (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Alignment for Dimension 1 (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>...</td> - </tr> - <tr align=center> - <td colspan=4>Alignment for Dimension N (4 bytes)</td> - </tr> - </table> - </center> - - <p>The alignment fields indicate the alignment in logical space to - use when allocating new storage areas on disk. For instance, - writing every other element of a 100-element one-dimensional - array (using one HDF5 I/O partial write operation per element) - that has unit storage alignment would result in 50 - single-element, discontiguous storage segments. However, using - an alignment of 25 would result in only four discontiguous - segments. The size of the message varies with the number of - dimensions. - - <p>A B-tree is used to point to the discontiguous portions of - storage which has been allocated for the object. All keys of a - particular B-tree are the same size and are a function of the - number of dimensions. It is therefore not possible to change the - dimensionality of an indexed storage array after its B-tree is - created. 
- - <p> - <center> - <table border cellpadding=4 width="60%"> - <caption align=bottom> - <b>The Format of a B-Tree Key</b> - </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> - - <tr align=center> - <td colspan=4>External File Number or Zero (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>...</td> - </tr> - <tr align=center> - <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td> - </tr> - </table> - </center> - - <p>The keys within a B-tree obey an ordering based on the chunk - offsets. If the offsets in dimension-0 are equal, then - dimension-1 is used, etc. The External File Number field - contains a 1-origin offset into the External File List message - which contains the name of the external file in which that chunk - is stored. - - <h1>Implementation of Striping</h1> - - <p>The indexed storage will support arbitrary striping at the - chunk level; each chunk can be stored in any file. 
This is - accomplished by using the External File Number field of an - indexed storage B-tree key as a 1-origin offset into an External - File List Message (0x0009) which takes the form: - - <p> - <center> - <table border cellpadding=4 width="60%"> - <caption align=bottom> - <b>The Format of the External File List Message</b> - </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> - - <tr align=center> - <td colspan=4><br>Name Heap Address<br><br></td> - </tr> - <tr align=center> - <td colspan=4>Number of Slots Allocated (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Number of File Names (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td> - </tr> - <tr align=center> - <td colspan=4>...</td> - </tr> - <tr align=center> - <td colspan=4><br>Unused Slot(s)<br><br></td> - </tr> - </table> - </center> - - <p>Each indexed storage array that has all or part of its data - stored in external files will contain a single external file - list message. The size of the messages is determined when the - message is created, but it may be possible to enlarge the - message on demand by moving it. At this time, it's not possible - for multiple arrays to share a single external file list - message. - - <dl> - <dt><code> - H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn - nslots_hint, intn heap_size_hint) - </code> - <dd>Adds a new, empty external file list message to an object - header and returns a pointer to that message. The message - acts as a cache for file descriptors of external files that - are open. - - <p><dt><code> - intn H5O_efl_index (H5O_efl_t *efl, const char *filename) - </code> - <dd>Gets the external file index number for a particular file name. 
- If the name isn't in the external file list then it's added to - the H5O_efl_t struct and immediately written to the object - header to which the external file list message belongs. Name - comparison is textual. Each name should be relative to the - directory which contains the HDF5 file. - - <p><dt><code> - H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode) - </code> - <dd>Gets a low-level file descriptor for an external file. The - external file list caches file descriptors because we might - have many more external files than there are file descriptors - available to this process. The caller should not close this file. - - <p><dt><code> - herr_t H5O_efl_release (H5O_efl_t *efl) - </code> - <dd>Releases an external file list, closes all files - associated with that list, and if the list has been modified - since the call to <code>H5O_efl_new</code> flushes the message - to disk. - </dl> - - <hr> - <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address> -<!-- Created: Fri Oct 3 09:52:32 EST 1997 --> -<!-- hhmts start --> -Last modified: Tue Nov 25 12:36:50 EST 1997 -<!-- hhmts end --> - </body> -</html> |
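
Note on the alignment arithmetic in the deleted document: the text states that writing every other element of a 100-element one-dimensional array yields 50 single-element segments with unit alignment but only four segments with an alignment of 25. The sketch below reproduces that count. It is a minimal illustration, not HDF5 library code: the function name `aligned_segments` is hypothetical, and it assumes each write allocates the whole alignment-sized block of logical space that contains the written index.

```python
def aligned_segments(written_indices, alignment):
    """Return the sorted (start, stop) logical ranges allocated for the writes.

    Assumption (not from the HDF5 source): a write to index i allocates the
    half-open block [k * alignment, (k + 1) * alignment) with k = i // alignment.
    """
    if alignment < 1:
        raise ValueError("alignment must be a positive integer")
    # Each distinct block number corresponds to one allocated storage segment.
    blocks = sorted({index // alignment for index in written_indices})
    return [(b * alignment, (b + 1) * alignment) for b in blocks]


# Every other element of a 100-element array: indices 0, 2, 4, ..., 98.
every_other = range(0, 100, 2)

unit = aligned_segments(every_other, 1)     # one segment per written element
coarse = aligned_segments(every_other, 25)  # one segment per 25-element block

print(len(unit), len(coarse))  # prints "50 4"
```

With an alignment of 25 the four allocated blocks together cover the whole array, so the unwritten odd-indexed elements also occupy space on disk. That is the trade-off the alignment fields control: fewer, larger segments in exchange for storing some elements that were never written.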