[svn-r467] Restructuring documentation.

author: Quincey Koziol <koziol@hdfgroup.org> 1998-07-08 14:54:54 (GMT)
committer: Quincey Koziol <koziol@hdfgroup.org> 1998-07-08 14:54:54 (GMT)
commit: bd1e676c521d881b3143829f493a28b5ced1294b (patch)
tree: 69c50f9fe21ce87f293d8617a6bd51b4cc1e0244 /doc/html/storage.html
parent: 73345095897d9698bb1f2f7df830bf80a56dc65a (diff)
1 files changed, 274 insertions, 0 deletions
diff --git a/doc/html/storage.html b/doc/html/storage.html
new file mode 100644
index 0000000..87ea54d
--- /dev/null
+++ b/doc/html/storage.html
@@ -0,0 +1,274 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+  <head>
+    <title>Raw Data Storage in HDF5</title>
+  </head>
+
+  <body>
+    <h1>Raw Data Storage in HDF5</h1>
+
+    <p>This document describes the various ways that raw data is
+      stored in an HDF5 file and the object header messages which
+      contain the parameters for the storage.
+
+    <p>Raw data storage has three components: the mapping from some
+      logical multi-dimensional element space to the linear address
+      space of a file, compression of the raw data on disk, and
+      striping of raw data across multiple files.  These components
+      are orthogonal.
+      
+    <p>Some goals of the storage mechanism are to be able to
+      efficiently store data that is:
+
+    <dl>
+      <dt>Small
+      <dd>Small pieces of raw data can be treated as meta data and
+	stored in the object header.  This will be achieved by storing
+	the raw data in the object header with message 0x0006.
+	Compression and striping are not supported in this case.
+
+      <dt>Complete Large
+      <dd>The library should be able to store large arrays
+	contiguously in the file provided the user knows the final
+	array size a priori.  The array can then be read/written in a
+	single I/O request.  This is accomplished by describing the
+	storage with object header message 0x0005. Compression and
+	striping are not supported in this case.
+
+      <dt>Sparse Large
+      <dd>A large sparse raw data array should be stored in a manner
+	that is space-efficient but one in which any element can still
+	be accessed in a reasonable amount of time. Implementation
+	details are below.
+	
+      <dt>Dynamic Size
+      <dd>One often doesn't have prior knowledge of the size of an
+	array. It would be nice to allow arrays to grow dynamically in
+	any dimension. It might also be nice to allow the array to
+	grow in the negative dimension directions if convenient to
+	implement. Implementation details are below.
+
+      <dt>Subslab Access
+      <dd>Some multi-dimensional arrays are almost always accessed by
+	subslabs. For instance, a 2-d array of pixels might always be
+	accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k
+	index values.  We should be able to store the array in such a
+	way that striding through the entire array is not necessary.
+	Subslab access might also be useful with compression
+	algorithms where each storage slab can be compressed
+	independently of the others. Implementation details are below.
+
+      <dt>Compressed
+      <dd>Various compression algorithms can be applied to the entire
+	array. We're not planning to support separate algorithms (or a
+	single algorithm with separate parameters) for each chunk
+	although it would be possible to implement that in a manner
+	similar to the way striping across files is
+	implemented.
+
+      <dt>Striped Across Files
+      <dd>The array access functions should support arrays stored
+	discontiguously across a set of files.
+    </dl>
+
+    <h1>Implementation of Indexed Storage</h1>
+
+    <p>The Sparse Large, Dynamic Size, and Subslab Access methods
+      share so much code that they can be described with a single
+      message.  The new Indexed Storage Message (<code>0x0008</code>)
+      will replace the old Chunked Object (<code>0x0009</code>) and
+      Sparse Object (<code>0x000A</code>) Messages.
+
+    <p>
+      <center>
+	<table border cellpadding=4 width="60%">
+	  <caption align=bottom>
+	    <b>The Format of the Indexed Storage Message</b>
+	  </caption>
+	  <tr align=center>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	  </tr>
+
+	  <tr align=center>
+	    <td colspan=4><br>Address of B-tree<br><br></td>
+	  </tr>
+	  <tr align=center>
+	    <td>Number of Dimensions</td>
+	    <td>Reserved</td>
+	    <td>Reserved</td>
+	    <td>Reserved</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Reserved (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Alignment for Dimension 0 (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Alignment for Dimension 1 (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>...</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Alignment for Dimension N (4 bytes)</td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>The alignment fields indicate the alignment in logical space to
+      use when allocating new storage areas on disk.  For instance,
+      writing every other element of a 100-element one-dimensional
+      array (using one HDF5 I/O partial write operation per element)
+      that has unit storage alignment would result in 50
+      single-element, discontiguous storage segments.  However, using
+      an alignment of 25 would result in only four discontiguous
+      segments.  The size of the message varies with the number of
+      dimensions.
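+
+    <p>The segment counts in the example above can be checked with a
+      short sketch.  It assumes that each partial write allocates the
+      whole aligned block containing the written element and that
+      adjacent blocks are never merged; the function name
+      <code>count_segments</code> and its parameters are
+      illustrative and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the storage segments allocated when every `stride`-th element of
 * a one-dimensional array of `nelmts` elements is written, one element
 * per write.  Assumption (not from the library): each write allocates
 * the aligned block of `alignment` elements that contains the element,
 * unless that block has already been allocated.
 */
static size_t count_segments(size_t nelmts, size_t stride, size_t alignment)
{
    size_t nsegments = 0;
    size_t prev_block = (size_t)-1;     /* sentinel: nothing allocated yet */

    assert(stride > 0 && alignment > 0);

    /* Elements are visited in increasing order, so block indices never
     * decrease and comparing with the previous block is sufficient. */
    for (size_t i = 0; i < nelmts; i += stride) {
        size_t block = i / alignment;   /* aligned block holding element i */
        if (block != prev_block) {
            nsegments++;
            prev_block = block;
        }
    }
    return nsegments;
}
```

+    <p>Under these assumptions,
+      <code>count_segments(100, 2, 1)</code> yields 50 and
+      <code>count_segments(100, 2, 25)</code> yields 4, matching the
+      figures given above.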
+
+    <p>A B-tree is used to point to the discontiguous portions of
+      storage that have been allocated for the object.  All keys of a
+      particular B-tree are the same size and are a function of the
+      number of dimensions. It is therefore not possible to change the
+      dimensionality of an indexed storage array after its B-tree is
+      created.
+
+    <p>
+      <center>
+	<table border cellpadding=4 width="60%">
+	  <caption align=bottom>
+	    <b>The Format of a B-Tree Key</b>
+	  </caption>
+	  <tr align=center>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	  </tr>
+
+	  <tr align=center>
+	    <td colspan=4>External File Number or Zero (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>...</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>The keys within a B-tree obey an ordering based on the chunk
+      offsets.  If the offsets in dimension-0 are equal, then
+      dimension-1 is used, etc. The External File Number field
+      contains a 1-origin offset into the External File List message
+      which contains the name of the external file in which that chunk
+      is stored.
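+
+    <p>The key ordering described above might be sketched as follows.
+      The type <code>chunk_key_t</code>, the bound
+      <code>MAX_NDIMS</code>, and both function names are
+      hypothetical.  The sketch also assumes that the External File
+      Number does not take part in the ordering, since only the chunk
+      offsets are mentioned, and it derives the key size from the
+      4-byte fields in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NDIMS 32    /* hypothetical upper bound used only by this sketch */

/* Hypothetical in-memory form of an indexed storage B-tree key. */
typedef struct chunk_key_t {
    uint32_t file_number;           /* 1-origin external file number, or 0 */
    uint32_t offset[MAX_NDIMS];     /* chunk offset in each dimension */
} chunk_key_t;

/*
 * Compare two keys by chunk offset, dimension 0 first, moving to the
 * next dimension only while the offsets are equal.  Returns a negative
 * value, zero, or a positive value when `a` sorts before, equal to, or
 * after `b`.  Assumption: file_number is ignored by the ordering.
 */
static int chunk_key_cmp(const chunk_key_t *a, const chunk_key_t *b,
                         unsigned ndims)
{
    assert(a && b && ndims <= MAX_NDIMS);

    for (unsigned i = 0; i < ndims; i++) {
        if (a->offset[i] < b->offset[i]) return -1;
        if (a->offset[i] > b->offset[i]) return 1;
    }
    return 0;
}

/* Key size in bytes as laid out in the table: one 4-byte file number
 * followed by one 4-byte offset per dimension. */
static size_t chunk_key_size(unsigned ndims)
{
    return 4 + 4 * (size_t)ndims;
}
```

+    <p>Because the key size depends only on the number of
+      dimensions, every key of a given B-tree has the same size under
+      this layout.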
+
+    <h1>Implementation of Striping</h1>
+
+    <p>The indexed storage will support arbitrary striping at the
+      chunk level; each chunk can be stored in any file.  This is
+      accomplished by using the External File Number field of an
+      indexed storage B-tree key as a 1-origin offset into an External
+      File List Message (0x0009) which takes the form:
+
+    <p>
+      <center>
+	<table border cellpadding=4 width="60%">
+	  <caption align=bottom>
+	    <b>The Format of the External File List Message</b>
+	  </caption>
+	  <tr align=center>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	    <th width="25%">byte</th>
+	  </tr>
+
+	  <tr align=center>
+	    <td colspan=4><br>Name Heap Address<br><br></td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Number of Slots Allocated (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Number of File Names (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4>...</td>
+	  </tr>
+	  <tr align=center>
+	    <td colspan=4><br>Unused Slot(s)<br><br></td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>Each indexed storage array that has all or part of its data
+      stored in external files will contain a single external file
+      list message.  The size of the message is determined when the
+      message is created, but it may be possible to enlarge the
+      message on demand by moving it.  At this time, it's not possible
+      for multiple arrays to share a single external file list
+      message.
+
+    <dl>
+      <dt><code>
+	  H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn
+	  nslots_hint, intn heap_size_hint)
+	</code>
+      <dd>Adds a new, empty external file list message to an object
+	header and returns a pointer to that message.  The message
+	acts as a cache for file descriptors of external files that
+	are open.
+
+      <p><dt><code>
+	  intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
+	</code>
+      <dd>Gets the external file index number for a particular file name.
+	If the name isn't in the external file list then it's added to
+	the H5O_efl_t struct and immediately written to the object
+	header to which the external file list message belongs. Name
+	comparison is textual.  Each name should be relative to the
+	directory which contains the HDF5 file.
+
+      <p><dt><code>
+	  H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
+	</code>
+      <dd>Gets a low-level file descriptor for an external file.  The
+	external file list caches file descriptors because we might
+	have many more external files than there are file descriptors
+	available to this process.  The caller should not close this file.
+
+      <p><dt><code>
+	  herr_t H5O_efl_release (H5O_efl_t *efl)
+	</code>
+      <dd>Releases an external file list, closes all files
+	associated with that list, and, if the list has been modified
+	since the call to <code>H5O_efl_new</code>, flushes the message
+	to disk.
+    </dl>
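+
+    <p>The lookup-or-add behaviour described for
+      <code>H5O_efl_index</code> can be illustrated with a purely
+      in-memory sketch.  The names <code>efl_sketch_t</code>,
+      <code>efl_sketch_index</code>, and <code>EFL_NSLOTS</code> are
+      hypothetical, the slot count is fixed, and nothing is written to
+      an object header or name heap; returning -1 when no slot is
+      free is also an assumption.

```c
#include <assert.h>
#include <string.h>

#define EFL_NSLOTS 8    /* hypothetical fixed number of slots */

/* Hypothetical in-memory stand-in for an external file list. */
typedef struct efl_sketch_t {
    int         nused;              /* number of file names stored */
    const char *name[EFL_NSLOTS];   /* names; the caller owns the strings */
} efl_sketch_t;

/*
 * Return the 1-origin index of `filename`, adding the name to the list
 * when it is not already present.  Names are compared textually, so
 * "a.raw" and "./a.raw" are treated as different files.  Returns -1
 * when the name is new and every slot is in use (assumed behaviour).
 */
static int efl_sketch_index(efl_sketch_t *efl, const char *filename)
{
    assert(efl && filename);

    for (int i = 0; i < efl->nused; i++) {
        if (strcmp(efl->name[i], filename) == 0)
            return i + 1;           /* convert slot number to 1-origin */
    }
    if (efl->nused >= EFL_NSLOTS)
        return -1;                  /* no unused slot left */

    efl->name[efl->nused] = filename;
    efl->nused++;
    return efl->nused;              /* 1-origin index of the new name */
}
```

+    <p>Using 1-origin indices leaves the value zero free for the
+      "External File Number or Zero" field of a B-tree key.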
+
+    <hr>
+    <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address>
+<!-- Created: Fri Oct  3 09:52:32 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Tue Nov 25 12:36:50 EST 1997
+<!-- hhmts end -->
+  </body>
+</html>