1 files changed, 409 insertions, 0 deletions
diff --git a/doc/html/Compression.html b/doc/html/Compression.html
new file mode 100644
index 0000000..c3a2a45
--- /dev/null
+++ b/doc/html/Compression.html
@@ -0,0 +1,409 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+  <head>
+    <title>Compression</title>
+  </head>
+
+  <body>
+    <h1>Compression</h1>
+
+    <h2>1. Introduction</h2>
+
+    <p>HDF5 supports compression of raw data by compression methods
+      built into the library or defined by an application.  A
+      compression method is associated with a dataset when the dataset
+      is created and is applied independently to each storage chunk of
+      the dataset.
+
+      The dataset must use the <code>H5D_CHUNKED</code> storage
+      layout. The library doesn't support compression for contiguous
+      datasets because of the difficulty of implementing random access
+      for partial I/O, and compact dataset compression is not
+      supported because it wouldn't produce significant results.
+      
+    <h2>2. Supported Compression Methods</h2>
+
+    <p>The library identifies compression methods with small
+      integers, with values less than 16 reserved for use by NCSA and
+      values between 16 and 255 (inclusive) available for general
+      use.  This range may be extended in the future if it proves to
+      be too small.
+
+    <p>
+      <center>
+	<table align=center width="80%">
+	  <tr>
+	    <th width="30%">Method Name</th>
+	    <th width="70%">Description</th>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>H5Z_NONE</code></td>
+	    <td>The default is to not use compression.  Specifying
+	      <code>H5Z_NONE</code> as the compression method results
+	      in better performance than writing a function that just
+	      copies data because the library's I/O pipeline
+	      recognizes this method and is able to short circuit
+	      parts of the pipeline.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>H5Z_DEFLATE</code></td>
+	    <td>The <em>deflate</em> method is the algorithm used by
+	      the GNU <code>gzip</code> program.  It combines a 1977
+	      Lempel-Ziv (LZ77) dictionary encoding with a subsequent
+	      Huffman encoding.  The aggressiveness of the
+	      compression can be controlled by passing an integer value
+	      to the compressor with <code>H5Pset_deflate()</code>
+	      (see below).  In order for this compression method to be
+	      used, the HDF5 library must be configured and compiled
+	      in the presence of zlib version 1.1.2 or
+	      later.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>H5Z_RES_<em>N</em></code></td>
+	    <td>These compression methods (where <em>N</em> is in the
+	      range 2 through 15, inclusive) are reserved by NCSA
+	      for future use.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Values of <em>N</em> between 16 and 255, inclusive</td>
+	    <td>These values can be used to represent application-defined 
+	      compression methods.  We recommend that methods under
+	      testing should be in the high range and when a method is
+	      about to be published it should be given a number near
+	      the low end of the range (or even below 16).  Publishing
+	      the compression method and its numeric ID will make a
+	      file sharable.</td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>Setting the compression for a dataset to a method which was
+      not compiled into the library and/or not registered by the
+      application is allowed, but writing to such a dataset will
+      silently <em>not</em> compress the data.  Reading a compressed
+      dataset for a method which is not available will result in
+      errors (specifically, <code>H5Dread()</code> will return a
+      negative value).  The errors will be displayed in the
+      compression statistics if the library was compiled with
+      debugging turned on for the &quot;z&quot; package.  See the
+      section on diagnostics below for more details.
+
+    <h2>3. Application-Defined Methods</h2>
+
+    <p>Compression methods 16 through 255 can be defined by an
+      application. As mentioned above, methods that have not been
+      released should use high numbers in that range while methods
+      that have been published will be assigned an official number in
+      the low region of the range (possibly less than 16).  Users
+      should be aware that using unpublished compression methods
+      results in unsharable files.
+
+    <p>A compression method has two halves: one half handles
+      compression and the other half handles uncompression.  The
+      halves are implemented as functions
+      <code><em>method</em>_c</code> and
+      <code><em>method</em>_u</code> respectively.  One should not use
+      the names <code>compress</code> or <code>uncompress</code> since
+      they are likely to conflict with other compression libraries
+      (such as zlib).
+
+    <p>Both the <code><em>method</em>_c</code> and
+      <code><em>method</em>_u</code> functions take the same arguments
+      and return the same values.  They are defined with the type:
+
+    <dl>
+      <dt><code>typedef size_t (*H5Z_func_t)(unsigned int
+	  <em>flags</em>, size_t <em>cd_size</em>, const void
+	  *<em>client_data</em>, size_t <em>src_nbytes</em>, const
+	  void *<em>src</em>, size_t <em>dst_nbytes</em>, void
+	  *<em>dst</em>/*out*/)</code>
+      <dd>The <em>flags</em> are an 8-bit vector which is stored in
+	the file and which is defined when the compression method is
+	defined.  The <em>client_data</em> is a pointer to
+	<em>cd_size</em> bytes of configuration data which is also
+	stored in the file.  The function compresses or uncompresses
+	<em>src_nbytes</em> from the source buffer <em>src</em> into
+	at most <em>dst_nbytes</em> of the result buffer <em>dst</em>.
+	The function returns the number of bytes written to the result
+	buffer or zero if an error occurs.  If the result would
+	overrun the buffer, the function should instead return a value
+	at least as large as <em>dst_nbytes</em> (the uncompressor will
+	see an overrun only for corrupt data).
+    </dl>
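+
+    <p>As an illustration of this calling convention, here is a
+      minimal sketch of a run-length coder written as a
+      compress/uncompress pair.  The names <code>rle_c</code> and
+      <code>rle_u</code> and the (count, byte) output format are
+      assumptions made for this example and are not part of the
+      library.  The typedef is copied from the text above so that
+      the sketch compiles without the HDF5 headers.

```c
#include <stddef.h>

/* Function type as given in the text above, reproduced locally so this
 * sketch does not depend on the HDF5 headers. */
typedef size_t (*H5Z_func_t)(unsigned int flags, size_t cd_size,
                             const void *client_data, size_t src_nbytes,
                             const void *src, size_t dst_nbytes, void *dst);

/* Hypothetical compressor: emits (count, byte) pairs, count in 1..255. */
size_t
rle_c(unsigned int flags, size_t cd_size, const void *client_data,
      size_t src_nbytes, const void *src, size_t dst_nbytes, void *dst)
{
    const unsigned char *in = src;
    unsigned char *out = dst;
    size_t i = 0, n = 0;

    (void)flags; (void)cd_size; (void)client_data;  /* unused in this sketch */

    while (i < src_nbytes) {
        size_t run = 1;
        while (i + run < src_nbytes && run < 255 && in[i + run] == in[i])
            run++;
        if (n + 2 > dst_nbytes)
            return dst_nbytes;  /* overrun: report a value >= dst_nbytes */
        out[n++] = (unsigned char)run;
        out[n++] = in[i];
        i += run;
    }
    return n;                   /* number of bytes written to dst */
}

/* Hypothetical uncompressor: expands the (count, byte) pairs. */
size_t
rle_u(unsigned int flags, size_t cd_size, const void *client_data,
      size_t src_nbytes, const void *src, size_t dst_nbytes, void *dst)
{
    const unsigned char *in = src;
    unsigned char *out = dst;
    size_t i, k, n = 0;

    (void)flags; (void)cd_size; (void)client_data;  /* unused in this sketch */

    if (src_nbytes % 2 != 0)
        return 0;               /* malformed input: signal an error */
    for (i = 0; i < src_nbytes; i += 2) {
        size_t run = in[i];
        if (run == 0)
            return 0;           /* malformed input: signal an error */
        if (run > dst_nbytes - n)
            return dst_nbytes;  /* overrun, expected only for corrupt data */
        for (k = 0; k < run; k++)
            out[n++] = in[i + 1];
    }
    return n;
}
```

+    <p>Both functions have the <code>H5Z_func_t</code> signature, so
+      under the interface described in this document they could be
+      passed as a pair to <code>H5Zregister()</code> with a method
+      number in the range 16 through 255.  Note that zero is the
+      error value, so an empty input is indistinguishable from a
+      failure in this sketch.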
+
+    <p>The application associates the pair of functions with a name
+      and a method number by calling <code>H5Zregister()</code>.  This
+      function can also be used to remove a compression method from
+      the library by supplying null pointers for the functions.
+
+    <dl>
+      <dt><code>herr_t H5Zregister (H5Z_method_t <em>method</em>,
+	  const char *<em>name</em>, H5Z_func_t <em>method_c</em>,
+	  H5Z_func_t <em>method_u</em>)</code>
+      <dd>The pair of functions to be used for compression
+	(<em>method_c</em>) and uncompression (<em>method_u</em>) are
+	associated with a short <em>name</em> used for debugging and a
+	<em>method</em> number in the range 16 through 255.  This
+	function can be called as often as desired for a particular
+	compression method with each call replacing the information
+	stored by the previous call.  Sometimes it's convenient to
+	supply only one half of the compression, for instance in an
+	application that opens files for read-only. Compression
+	statistics for the method are accumulated across calls to this
+	function.
+    </dl>
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <caption align=bottom><h4>Example: Registering an
+	      Application-Defined Compression Method</h4></caption>
+	  <tr>
+	    <td>
+	      <p>Here's a simple-minded &quot;compression&quot; method
+		that just copies the input value to the output.  It's
+		similar to the <code>H5Z_NONE</code> method but
+		slower.  Compression and uncompression are performed
+		by the same function.
+
+	      <p><code><pre>
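+#include &lt;string.h&gt;
+
+/* Copies the input to the output unchanged; serves as both halves. */
+size_t
+bogus (unsigned int flags,
+       size_t cd_size, const void *client_data,
+       size_t src_nbytes, const void *src,
+       size_t dst_nbytes, void *dst/*out*/)
+{
+    /* Report an overrun rather than writing past the end of dst. */
+    if (src_nbytes > dst_nbytes) return src_nbytes;
+    memcpy (dst, src, src_nbytes);
+    return src_nbytes;
+}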
+	      </pre></code>
+
+	      <p>The function could be registered as method 250 as
+		follows:
+
+	      <p><code><pre>
+#define H5Z_BOGUS 250
+H5Zregister (H5Z_BOGUS, "bogus", bogus, bogus);
+	      </pre></code>
+
+	      <p>The function can be unregistered by saying:
+
+	      <p><code><pre>
+H5Zregister (H5Z_BOGUS, "bogus", NULL, NULL);
+	      </pre></code>
+
+	      <p>Notice that we kept the name &quot;bogus&quot; even
+		though we unregistered the functions that perform the
+		compression and uncompression.  This makes compression
+		statistics more understandable when they're printed.
+	    </td>
+	  </tr>
+	</table>
+      </center>
+	
+    <h2>4. Enabling Compression for a Dataset</h2>
+
+    <p>If a dataset is to be compressed then the compression
+      information must be specified when the dataset is created since
+      once a dataset is created compression parameters cannot be
+      adjusted.  The compression is specified through the dataset
+      creation property list (see <code>H5Pcreate()</code>).
+
+    <dl>
+      <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, int
+	  <em>level</em>)</code>
+      <dd>The compression method for dataset creation property list
+	<em>plist</em> is set to <code>H5Z_DEFLATE</code> and the
+	aggression level is set to <em>level</em>.  The <em>level</em>
+	must be a value between one and nine, inclusive, where one
+	indicates the least (but fastest) compression and nine is the
+	most aggressive compression.
+
+	<br><br>
+      <dt><code>int H5Pget_deflate (hid_t <em>plist</em>)</code>
+      <dd>If dataset creation property list <em>plist</em> is set to
+	use <code>H5Z_DEFLATE</code> compression then this function
+	will return the aggression level, an integer between one and
+	nine inclusive.  If <em>plist</em> isn't a valid dataset
+	creation property list or it isn't set to use the deflate
+        method then a negative value is returned.
+
+	<br><br>
+      <dt><code>herr_t H5Pset_compression (hid_t <em>plist</em>,
+	  H5Z_method_t <em>method</em>, unsigned int <em>flags</em>,
+	  size_t <em>cd_size</em>, const void *<em>client_data</em>)</code>
+      <dd>This is a catch-all function for defining compression methods
+	and is intended to be called from a wrapper such as
+	<code>H5Pset_deflate()</code>. The dataset creation property
+	list <em>plist</em> is adjusted to use the specified
+	compression method.  The <em>flags</em> argument is an 8-bit
+	vector which is stored in the file as part of the compression
+	message and passed to the compress and uncompress functions.  The
+	<em>client_data</em> is a byte array of length
+	<em>cd_size</em> which is copied to the file and passed to the
+	compress and uncompress functions.
+
+	<br><br>
+      <dt><code>H5Z_method_t H5Pget_compression (hid_t <em>plist</em>,
+	  unsigned int *<em>flags</em>, size_t *<em>cd_size</em>, void
+	  *<em>client_data</em>)</code>
+      <dd>This is a catch-all function for querying the compression
+	method associated with dataset creation property list
+	<em>plist</em> and is intended to be called from a wrapper
+	function such as <code>H5Pget_deflate()</code>.  The
+	compression method (or a negative value on error) is returned
+	by value, and the compression flags and client data are returned by
+	argument.  The application should allocate the
+	<em>client_data</em> and pass its size as the
+	<em>cd_size</em>.  On return, <em>cd_size</em> will contain
+	the actual size of the client data.  If <em>client_data</em>
+	is not large enough to hold the entire client data then
+	<em>cd_size</em> bytes are copied into <em>client_data</em>
+	and <em>cd_size</em> is set to the total size of the client
+	data, a value larger than the original.
+    </dl>
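+
+    <p>The size-reporting behavior described for
+      <em>client_data</em> lends itself to a grow-and-retry pattern
+      in the caller.  The sketch below shows that pattern against a
+      hypothetical stand-in, <code>mock_get_client_data()</code>,
+      which imitates only the documented copy-and-report-size
+      behavior.  It is not an HDF5 function, and the stored bytes
+      and the initial capacity of four are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Invented client data standing in for what a file might hold. */
static const unsigned char stored_cd[] = {1, 2, 3, 4, 5, 6};

/* Hypothetical stand-in, NOT an HDF5 function.  On entry *cd_size is the
 * capacity of client_data; at most that many bytes are copied, and on
 * return *cd_size holds the total size of the stored client data. */
void
mock_get_client_data(size_t *cd_size, void *client_data)
{
    size_t ncopy = *cd_size < sizeof stored_cd ? *cd_size : sizeof stored_cd;

    memcpy(client_data, stored_cd, ncopy);
    *cd_size = sizeof stored_cd;
}

/* Caller pattern: try a small buffer first; if the reported size exceeds
 * the capacity, the copy was truncated, so grow the buffer and ask again.
 * Returns a malloc'd buffer that the caller must free, or NULL. */
unsigned char *
fetch_client_data(size_t *size_out)
{
    size_t capacity = 4;
    size_t size = capacity;
    unsigned char *buf = malloc(capacity);

    if (buf == NULL)
        return NULL;
    mock_get_client_data(&size, buf);
    if (size > capacity) {
        unsigned char *bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        capacity = size;
        mock_get_client_data(&size, buf);
    }
    *size_out = size;
    return buf;
}
```

+    <p>The comparison of the returned size against the original
+      capacity is what detects truncation, which is why the caller
+      keeps its own copy of the capacity.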
+
+    <p>It is possible to set the compression to a method which hasn't
+      been defined with <code>H5Zregister()</code> and which isn't
+      supported as a predefined method (for instance, setting the
+      method to <code>H5Z_DEFLATE</code> when zlib isn't
+      available).  If that happens then data will be written to the
+      file in its uncompressed form and the compression statistics
+      will show failures for the compression.
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <caption align=bottom><h4>Example: Statistics for an
+	      Unsupported Compression Method</h4></caption>
+	  <tr>
+	    <td>
+	      <p>If an application attempts to use an unsupported
+		method then the compression statistics will show large
+		numbers of compression errors and no data
+		uncompressed.
+
+	      <p><code><pre>
+H5Z: compression statistics accumulated over life of library:
+   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
+   ------      -----  -------  ------  ----  ------  ------- ---------
+   deflate-c  160000        0  160000  0.00    0.01     0.01 1.884e+07
+   deflate-u       0        0       0  0.00    0.00     0.00       NaN
+	      </pre></code>
+
+	      <p>This example is from a program that wrote to a dataset
+		using <code>H5Z_DEFLATE</code> and then read the result
+		back, on a system that didn't have zlib.  The read and
+		write both succeeded but the data was not compressed.
+	    </td>
+	  </tr>
+	</table>
+      </center>
+
+    <h2>5. Compression Diagnostics</h2>
+
+    <p>If the library is compiled with debugging turned on for the H5Z
+      layer (usually as a result of <code>configure --enable-debug=z</code>)
+      then statistics about data compression are printed when the
+      application exits normally or the library is closed.  The
+      statistics are written to the standard error stream and include
+      two lines for each compression method that was used:  the first
+      line shows compression statistics while the second shows
+      uncompression statistics.  The following fields are displayed:
+
+    <p>
+      <center>
+	<table align=center width="80%">
+	  <tr>
+	    <th width="30%">Field Name</th>
+	    <th width="70%">Description</th>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Method</td>
+	    <td>This is the name of the method as defined with
+	      <code>H5Zregister()</code> with the letters
+	      &quot;-c&quot; or &quot;-u&quot; appended to indicate
+	      compression or uncompression.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Total</td>
+	    <td>The total number of bytes compressed or decompressed
+	      including buffer overruns and errors.  Bytes of
+	      non-compressed data are counted.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Overrun</td>
+	    <td>During compression, if the algorithm causes the result
+	      to be at least as large as the input then a buffer
+	      overrun error occurs.  This field shows the total number
+	      of bytes from the Total column which can be attributed to
+	      overruns. Overruns for decompression can only happen if
+	      the data has been corrupted in some way and will result
+	      in failure of <code>H5Dread()</code>.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Errors</td>
+	    <td>If an error occurs during compression the data is
+	      stored in its uncompressed form, whereas an error during
+	      uncompression causes <code>H5Dread()</code> to return
+	      failure.  This field shows the number of bytes of the
+	      Total column which can be attributed to errors.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>User, System, Elapsed</td>
+	    <td>These are the amounts of user time, system time, and
+	      elapsed time in seconds spent by the library to perform
+	      compression.  Elapsed time is sensitive to system
+	      load. These times may be zero on operating systems that
+	      don't support the required operations.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Bandwidth</td>
+	    <td>This is the compression bandwidth which is the total
+	      number of bytes divided by elapsed time.  Since elapsed
+	      time is subject to system load the bandwidth numbers
+	      cannot always be trusted.  Furthermore, the bandwidth
+	      includes overrun and error bytes which may significantly
+	      taint the value.</td>
+	  </tr>
+	</table>
+      </center>
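+
+    <p>The Bandwidth field can be reproduced by hand from the Total
+      and Elapsed fields.  The helper below is a hypothetical
+      sketch, not library code, and it assumes IEEE 754 floating
+      point, under which a row with zero bytes and zero elapsed time
+      yields NaN.

```c
/* Hypothetical helper, not part of HDF5: bandwidth in bytes per second
 * as described for the Bandwidth field (total bytes / elapsed time). */
double
stats_bandwidth(double total_bytes, double elapsed_seconds)
{
    /* No guard against a zero divisor: with IEEE 754 arithmetic,
     * 0.0 / 0.0 evaluates to NaN rather than trapping. */
    return total_bytes / elapsed_seconds;
}
```

+    <p>For 160000 bytes in 1.33 seconds this gives roughly 1.203e+05
+      bytes per second.  Because the printed elapsed time is
+      rounded to two decimal places, a hand calculation can differ
+      slightly from the printed bandwidth.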
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <caption align=bottom><h4>Example: Compression
+	      Statistics</h4></caption>
+	  <tr>
+	    <td>
+	      <p><code><pre>
+H5Z: compression statistics accumulated over life of library:
+   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
+   ------      -----  -------  ------  ----  ------  ------- ---------
+   deflate-c  160000      200       0  0.62    0.74     1.33 1.204e+05
+   deflate-u  120000        0       0  0.11    0.00     0.12 9.885e+05
+	      </pre></code>
+	    </td>
+	  </tr>
+	</table>
+      </center>
+
+    <hr>
+    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Apr 17 13:39:35 EDT 1998 -->
+<!-- hhmts start -->
+Last modified: Fri Apr 17 16:15:21 EDT 1998
+<!-- hhmts end -->
+  </body>
+</html>