summaryrefslogtreecommitdiffstats
path: root/doc/html/storage.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/html/storage.html')
-rw-r--r--doc/html/storage.html274
1 files changed, 0 insertions, 274 deletions
diff --git a/doc/html/storage.html b/doc/html/storage.html
deleted file mode 100644
index 87ea54d..0000000
--- a/doc/html/storage.html
+++ /dev/null
@@ -1,274 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
-<html>
- <head>
- <title>Raw Data Storage in HDF5</title>
- </head>
-
- <body>
- <h1>Raw Data Storage in HDF5</h1>
-
- <p>This document describes the various ways that raw data is
- stored in an HDF5 file and the object header messages which
- contain the parameters for the storage.
-
- <p>Raw data storage has three components: the mapping from some
- logical multi-dimensional element space to the linear address
- space of a file, compression of the raw data on disk, and
- striping of raw data across multiple files. These components
- are orthogonal.
-
- <p>Some goals of the storage mechanism are to be able to
- efficently store data which is:
-
- <dl>
- <dt>Small
- <dd>Small pieces of raw data can be treated as meta data and
- stored in the object header. This will be achieved by storing
- the raw data in the object header with message 0x0006.
- Compression and striping are not supported in this case.
-
- <dt>Complete Large
- <dd>The library should be able to store large arrays
- contiguously in the file provided the user knows the final
- array size a priori. The array can then be read/written in a
- single I/O request. This is accomplished by describing the
- storage with object header message 0x0005. Compression and
- striping are not supported in this case.
-
- <dt>Sparse Large
- <dd>A large sparse raw data array should be stored in a manner
- that is space-efficient but one in which any element can still
- be accessed in a reasonable amount of time. Implementation
- details are below.
-
- <dt>Dynamic Size
- <dd>One often doesn't have prior knowledge of the size of an
- array. It would be nice to allow arrays to grow dynamically in
- any dimension. It might also be nice to allow the array to
- grow in the negative dimension directions if convenient to
- implement. Implementation details are below.
-
- <dt>Subslab Access
- <dd>Some multi-dimensional arrays are almost always accessed by
- subslabs. For instance, a 2-d array of pixels might always be
- accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k
- index values. We should be able to store the array in such a
- way that striding though the entire array is not necessary.
- Subslab access might also be useful with compression
- algorithms where each storage slab can be compressed
- independently of the others. Implementation details are below.
-
- <dt>Compressed
- <dd>Various compression algorithms can be applied to the entire
- array. We're not planning to support separate algorithms (or a
- single algorithm with separate parameters) for each chunk
- although it would be possible to implement that in a manner
- similar to the way striping across files is
- implemented.
-
- <dt>Striped Across Files
- <dd>The array access functions should support arrays stored
- discontiguously across a set of files.
- </dl>
-
- <h1>Implementation of Indexed Storage</h1>
-
- <p>The Sparse Large, Dynamic Size, and Subslab Access methods
- share so much code that they can be described with a single
- message. The new Indexed Storage Message (<code>0x0008</code>)
- will replace the old Chunked Object (<code>0x0009</code>) and
- Sparse Object (<code>0x000A</code>) Messages.
-
- <p>
- <center>
- <table border cellpadding=4 width="60%">
- <caption align=bottom>
- <b>The Format of the Indexed Storage Message</b>
- </caption>
- <tr align=center>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- </tr>
-
- <tr align=center>
- <td colspan=4><br>Address of B-tree<br><br></td>
- </tr>
- <tr align=center>
- <td>Number of Dimensions</td>
- <td>Reserved</td>
- <td>Reserved</td>
- <td>Reserved</td>
- </tr>
- <tr align=center>
- <td colspan=4>Reserved (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Alignment for Dimension 0 (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Alignment for Dimension 1 (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>...</td>
- </tr>
- <tr align=center>
- <td colspan=4>Alignment for Dimension N (4 bytes)</td>
- </tr>
- </table>
- </center>
-
- <p>The alignment fields indicate the alignment in logical space to
- use when allocating new storage areas on disk. For instance,
- writing every other element of a 100-element one-dimensional
- array (using one HDF5 I/O partial write operation per element)
- that has unit storage alignment would result in 50
- single-element, discontiguous storage segments. However, using
- an alignment of 25 would result in only four discontiguous
- segments. The size of the message varies with the number of
- dimensions.
-
- <p>A B-tree is used to point to the discontiguous portions of
- storage which has been allocated for the object. All keys of a
- particular B-tree are the same size and are a function of the
- number of dimensions. It is therefore not possible to change the
- dimensionality of an indexed storage array after its B-tree is
- created.
-
- <p>
- <center>
- <table border cellpadding=4 width="60%">
- <caption align=bottom>
- <b>The Format of a B-Tree Key</b>
- </caption>
- <tr align=center>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- </tr>
-
- <tr align=center>
- <td colspan=4>External File Number or Zero (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>...</td>
- </tr>
- <tr align=center>
- <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td>
- </tr>
- </table>
- </center>
-
- <p>The keys within a B-tree obey an ordering based on the chunk
- offsets. If the offsets in dimension-0 are equal, then
- dimension-1 is used, etc. The External File Number field
- contains a 1-origin offset into the External File List message
- which contains the name of the external file in which that chunk
- is stored.
-
- <h1>Implementation of Striping</h1>
-
- <p>The indexed storage will support arbitrary striping at the
- chunk level; each chunk can be stored in any file. This is
- accomplished by using the External File Number field of an
- indexed storage B-tree key as a 1-origin offset into an External
- File List Message (0x0009) which takes the form:
-
- <p>
- <center>
- <table border cellpadding=4 width="60%">
- <caption align=bottom>
- <b>The Format of the External File List Message</b>
- </caption>
- <tr align=center>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- </tr>
-
- <tr align=center>
- <td colspan=4><br>Name Heap Address<br><br></td>
- </tr>
- <tr align=center>
- <td colspan=4>Number of Slots Allocated (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Number of File Names (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td>
- </tr>
- <tr align=center>
- <td colspan=4>...</td>
- </tr>
- <tr align=center>
- <td colspan=4><br>Unused Slot(s)<br><br></td>
- </tr>
- </table>
- </center>
-
- <p>Each indexed storage array that has all or part of its data
- stored in external files will contain a single external file
- list message. The size of the messages is determined when the
- message is created, but it may be possible to enlarge the
- message on demand by moving it. At this time, it's not possible
- for multiple arrays to share a single external file list
- message.
-
- <dl>
- <dt><code>
- H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn
- nslots_hint, intn heap_size_hint)
- </code>
- <dd>Adds a new, empty external file list message to an object
- header and returns a pointer to that message. The message
- acts as a cache for file descriptors of external files that
- are open.
-
- <p><dt><code>
- intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
- </code>
- <dd>Gets the external file index number for a particular file name.
- If the name isn't in the external file list then it's added to
- the H5O_efl_t struct and immediately written to the object
- header to which the external file list message belongs. Name
- comparison is textual. Each name should be relative to the
- directory which contains the HDF5 file.
-
- <p><dt><code>
- H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
- </code>
- <dd>Gets a low-level file descriptor for an external file. The
- external file list caches file descriptors because we might
- have many more external files than there are file descriptors
- available to this process. The caller should not close this file.
-
- <p><dt><code>
- herr_t H5O_efl_release (H5O_efl_t *efl)
- </code>
- <dd>Releases an external file list, closes all files
- associated with that list, and if the list has been modified
- since the call to <code>H5O_efl_new</code> flushes the message
- to disk.
- </dl>
-
- <hr>
- <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address>
-<!-- Created: Fri Oct 3 09:52:32 EST 1997 -->
-<!-- hhmts start -->
-Last modified: Tue Nov 25 12:36:50 EST 1997
-<!-- hhmts end -->
- </body>
-</html>