[svn-r570] *** empty log message ***

author: Robb Matzke <matzke@llnl.gov> 1998-08-05 22:23:51 (GMT)
committer: Robb Matzke <matzke@llnl.gov> 1998-08-05 22:23:51 (GMT)
commit: 32295ad53dd29e799cef007e05ca0368f0c1ca1d (patch)
tree: e041f9bffa6d839d4f3e13afcad4199deb486d3b /doc
parent: 002b1494b79e2fd638a0676745c340a9a9e9d8e7 (diff)
2 files changed, 463 insertions, 409 deletions
diff --git a/doc/html/Compression.html b/doc/html/Compression.html
deleted file mode 100644
index c3a2a45..0000000
--- a/doc/html/Compression.html
+++ /dev/null
@@ -1,409 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
-<html>
-  <head>
-    <title>Compression</title>
-  </head>
-
-  <body>
-    <h1>Compression</h1>
-
-    <h2>1. Introduction</h2>
-
-    <p>HDF5 supports compression of raw data by compression methods
-      built into the library or defined by an application.  A
-      compression method is associated with a dataset when the dataset
-      is created and is applied independently to each storage chunk of
-      the dataset.
-
-      The dataset must use the <code>H5D_CHUNKED</code> storage
-      layout. The library doesn't support compression for contiguous
-      datasets because of the difficulty of implementing random access
-      for partial I/O, and compact dataset compression is not
-      supported because it wouldn't produce significant results.
-      
-    <h2>2. Supported Compression Methods</h2>
-
-    <p>The library identifies compression methods with small
-      integers, with values less than 16 reserved for use by NCSA and
-      values between 16 and 255 (inclusive) available for general
-      use.  This range may be extended in the future if it proves to
-      be too small.
-
-    <p>
-      <center>
-	<table align=center width="80%">
-	  <tr>
-	    <th width="30%">Method Name</th>
-	    <th width="70%">Description</th>
-	  </tr>
-
-	  <tr valign=top>
-	    <td><code>H5Z_NONE</code></td>
-	    <td>The default is to not use compression.  Specifying
-	      <code>H5Z_NONE</code> as the compression method results
-	      in better performance than writing a function that just
-	      copies data because the library's I/O pipeline
-	      recognizes this method and is able to short circuit
-	      parts of the pipeline.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td><code>H5Z_DEFLATE</code></td>
-	    <td>The <em>deflate</em> method is the algorithm used by
-	      the GNU <code>gzip</code> program.  It's a combination of
-	      a 1977 Lempel-Ziv (LZ77) dictionary encoding followed by
-	      a Huffman encoding.  The aggressiveness of the
-	      compression can be controlled by passing an integer value
-	      to the compressor with <code>H5Pset_deflate()</code>
-	      (see below).  In order for this compression method to be
-	      used, the HDF5 library must be configured and compiled
-	      in the presence of the GNU zlib version 1.1.2 or
-	      later.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td><code>H5Z_RES_<em>N</em></code></td>
-	    <td>These compression methods (where <em>N</em> is in the
-	      range two through 15, inclusive) are reserved by NCSA
-	      for future use.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Values of <em>N</em> between 16 and 255, inclusive</td>
-	    <td>These values can be used to represent application-defined 
-	      compression methods.  We recommend that methods under
-	      testing should be in the high range and when a method is
-	      about to be published it should be given a number near
-	      the low end of the range (or even below 16).  Publishing
-	      the compression method and its numeric ID will make a
-	      file sharable.</td>
-	  </tr>
-	</table>
-      </center>
-
-    <p>Setting the compression for a dataset to a method which was
-      not compiled into the library and/or not registered by the
-      application is allowed, but writing to such a dataset will
-      silently <em>not</em> compress the data.  Reading a compressed
-      dataset for a method which is not available will result in
-      errors (specifically, <code>H5Dread()</code> will return a
-      negative value).  The errors will be displayed in the
-      compression statistics if the library was compiled with
-      debugging turned on for the &quot;z&quot; package.  See the
-      section on diagnostics below for more details.
-
-    <h2>3. Application-Defined Methods</h2>
-
-    <p>Compression methods 16 through 255 can be defined by an
-      application. As mentioned above, methods that have not been
-      released should use high numbers in that range while methods
-      that have been published will be assigned an official number in
-      the low region of the range (possibly less than 16).  Users
-      should be aware that using unpublished compression methods
-      results in unsharable files.
-
-    <p>A compression method has two halves: one half handles
-      compression and the other half handles uncompression.  The
-      halves are implemented as functions
-      <code><em>method</em>_c</code> and
-      <code><em>method</em>_u</code> respectively.  One should not use
-      the names <code>compress</code> or <code>uncompress</code> since
-      they are likely to conflict with other compression libraries
-      (like the GNU zlib).
-
-    <p>Both the <code><em>method</em>_c</code> and
-      <code><em>method</em>_u</code> functions take the same arguments
-      and return the same values.  They are defined with the type:
-
-    <dl>
-      <dt><code>typedef size_t (*H5Z_func_t)(unsigned int
-	  <em>flags</em>, size_t <em>cd_size</em>, const void
-	  *<em>client_data</em>, size_t <em>src_nbytes</em>, const
-	  void *<em>src</em>, size_t <em>dst_nbytes</em>, void
-	  *<em>dst</em>/*out*/)</code>
-      <dd>The <em>flags</em> are an 8-bit vector which is stored in
-	the file and which is defined when the compression method is
-	defined.  The <em>client_data</em> is a pointer to
-	<em>cd_size</em> bytes of configuration data which is also
-	stored in the file.  The function compresses or uncompresses
-	<em>src_nbytes</em> from the source buffer <em>src</em> into
-	at most <em>dst_nbytes</em> of the result buffer <em>dst</em>.
-	The function returns the number of bytes written to the result
-	buffer or zero if an error occurs.  But if a result buffer
-	overrun occurs the function should return a value at least as
-	large as <em>dst_nbytes</em> (the uncompressor will see an
-	overrun only for corrupt data).
-    </dl>
-
-    <p>The application associates the pair of functions with a name
-      and a method number by calling <code>H5Zregister()</code>.  This
-      function can also be used to remove a compression method from
-      the library by supplying null pointers for the functions.
-
-    <dl>
-      <dt><code>herr_t H5Zregister (H5Z_method_t <em>method</em>,
-	  const char *<em>name</em>, H5Z_func_t <em>method_c</em>,
-	  H5Z_func_t <em>method_u</em>)</code>
-      <dd>The pair of functions to be used for compression
-	(<em>method_c</em>) and uncompression (<em>method_u</em>) are
-	associated with a short <em>name</em> used for debugging and a
-	<em>method</em> number in the range 16 through 255.  This
-	function can be called as often as desired for a particular
-	compression method with each call replacing the information
-	stored by the previous call.  Sometimes it's convenient to
-	supply only one half of the compression, for instance in an
-	application that opens files for read-only. Compression
-	statistics for the method are accumulated across calls to this
-	function.
-    </dl>
-
-    <p>
-      <center>
-	<table border align=center width="100%">
-	  <caption align=bottom><h4>Example: Registering an
-	      Application-Defined Compression Method</h4></caption>
-	  <tr>
-	    <td>
-	      <p>Here's a simple-minded &quot;compression&quot; method
-		that just copies the input value to the output.  It's
-		similar to the <code>H5Z_NONE</code> method but
-		slower.  Compression and uncompression are performed
-		by the same function.
-
-	      <p><code><pre>
-size_t
-bogus (unsigned int flags,
-       size_t cd_size, const void *client_data,
-       size_t src_nbytes, const void *src,
-       size_t dst_nbytes, void *dst/*out*/)
-{
-    memcpy (dst, src, src_nbytes);
-    return src_nbytes;
-}
-	      </pre></code>
-
-	      <p>The function could be registered as method 250 as
-		follows:
-
-	      <p><code><pre>
-#define H5Z_BOGUS 250
-H5Zregister (H5Z_BOGUS, "bogus", bogus, bogus);
-	      </pre></code>
-
-	      <p>The function can be unregistered by saying:
-
-	      <p><code><pre>
-H5Zregister (H5Z_BOGUS, "bogus", NULL, NULL);
-	      </pre></code>
-
-	      <p>Notice that we kept the name &quot;bogus&quot; even
-		though we unregistered the functions that perform the
-		compression and uncompression.  This makes compression
-		statistics more understandable when they're printed.
-	    </td>
-	  </tr>
-	</table>
-      </center>
-	
-    <h2>4. Enabling Compression for a Dataset</h2>
-
-    <p>If a dataset is to be compressed then the compression
-      information must be specified when the dataset is created since
-      once a dataset is created compression parameters cannot be
-      adjusted.  The compression is specified through the dataset
-      creation property list (see <code>H5Pcreate()</code>).
-
-    <dl>
-      <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, int
-	  <em>level</em>)</code>
-      <dd>The compression method for dataset creation property list
-	<em>plist</em> is set to <code>H5Z_DEFLATE</code> and the
-	aggression level is set to <em>level</em>.  The <em>level</em>
-	must be a value between one and nine, inclusive, where one
-	indicates no (but fast) compression and nine is aggressive
-	compression.
-
-	<br><br>
-      <dt><code>int H5Pget_deflate (hid_t <em>plist</em>)</code>
-      <dd>If dataset creation property list <em>plist</em> is set to
-	use <code>H5Z_DEFLATE</code> compression then this function
-	will return the aggression level, an integer between one and
-	nine inclusive.  If <em>plist</em> isn't a valid dataset
-	creation property list or it isn't set to use the deflate
-        method then a negative value is returned.
-
-	<br><br>
-      <dt><code>herr_t H5Pset_compression (hid_t <em>plist</em>,
-	  H5Z_method_t <em>method</em>, unsigned int <em>flags</em>,
-	  size_t <em>cd_size</em>, const void *<em>client_data</em>)</code>
-      <dd>This is a catch-all function for defining compression methods
-	and is intended to be called from a wrapper such as
-	<code>H5Pset_deflate()</code>. The dataset creation property
-	list <em>plist</em> is adjusted to use the specified
-	compression method.  The <em>flags</em> is an 8-bit vector
-	which is stored in the file as part of the compression message
-	and passed to the compress and uncompress functions.  The
-	<em>client_data</em> is a byte array of length
-	<em>cd_size</em> which is copied to the file and passed to the
-	compress and uncompress methods.
-
-	<br><br>
-      <dt><code>H5Z_method_t H5Pget_compression (hid_t <em>plist</em>,
-	  unsigned int *<em>flags</em>, size_t *<em>cd_size</em>, void
-	  *<em>client_data</em>)</code>
-      <dd>This is a catch-all function for querying the compression
-	method associated with dataset creation property list
-	<em>plist</em> and is intended to be called from a wrapper
-	function such as <code>H5Pget_deflate()</code>.  The
-	compression method (or a negative value on error) is returned
-	by value, and compression flags and client data is returned by
-	argument.  The application should allocate the
-	<em>client_data</em> and pass its size as the
-	<em>cd_size</em>.  On return, <em>cd_size</em> will contain
-	the actual size of the client data.  If <em>client_data</em>
-	is not large enough to hold the entire client data then
-	<em>cd_size</em> bytes are copied into <em>client_data</em>
-	and <em>cd_size</em> is set to the total size of the client
-	data, a value larger than the original.
-    </dl>
-
-    <p>It is possible to set the compression to a method which hasn't
-      been defined with <code>H5Zregister()</code> and which isn't
-      supported as a predefined method (for instance, setting the
-      method to <code>H5Z_DEFLATE</code> when the GNU zlib isn't
-      available).  If that happens then data will be written to the
-      file in its uncompressed form and the compression statistics
-      will show failures for the compression.
-
-    <p>
-      <center>
-	<table border align=center width="100%">
-	  <caption align=bottom><h4>Example: Statistics for an
-	      Unsupported Compression Method</h4></caption>
-	  <tr>
-	    <td>
-	      <p>If an application attempts to use an unsupported
-		method then the compression statistics will show large
-		numbers of compression errors and no data
-		uncompressed.
-
-	      <p><code><pre>
-H5Z: compression statistics accumulated over life of library:
-   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
-   ------      -----  -------  ------  ----  ------  ------- ---------
-   deflate-c  160000        0  160000  0.00    0.01     0.01 1.884e+07
-   deflate-u       0        0       0  0.00    0.00     0.00       NaN
-	      </pre></code>
-
-	      <p>This example is from a program that tried to use
-		<code>H5Z_DEFLATE</code> on a system that didn't have
-		the GNU zlib to write to a dataset and then read the
-		result.  The read and write both succeeded but the
-		data was not compressed.
-	    </td>
-	  </tr>
-	</table>
-      </center>
-
-    <h2>5. Compression Diagnostics</h2>
-
-    <p>If the library is compiled with debugging turned on for the H5Z
-      layer (usually as a result of <code>configure --enable-debug=z</code>)
-      then statistics about data compression are printed when the
-      application exits normally or the library is closed.  The
-      statistics are written to the standard error stream and include
-      two lines for each compression method that was used:  the first
-      line shows compression statistics while the second shows
-      uncompression statistics.  The following fields are displayed:
-
-    <p>
-      <center>
-	<table align=center width="80%">
-	  <tr>
-	    <th width="30%">Field Name</th>
-	    <th width="70%">Description</th>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Method</td>
-	    <td>This is the name of the method as defined with
-	      <code>H5Zregister()</code> with the letters
-	      &quot;-c&quot; or &quot;-u&quot; appended to indicate
-	      compression or uncompression.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Total</td>
-	    <td>The total number of bytes compressed or decompressed
-	      including buffer overruns and errors.  Bytes of
-	      non-compressed data are counted.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Overrun</td>
-	    <td>During compression, if the algorithm causes the result
-	      to be at least as large as the input then a buffer
-	      overrun error occurs.  This field shows the total number
-	      of bytes from the Total column which can be attributed to
-	      overruns. Overruns for decompression can only happen if
-	      the data has been corrupted in some way and will result
-	      in failure of <code>H5Dread()</code>.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Errors</td>
-	    <td>If an error occurs during compression the data is
-	      stored in its uncompressed form; and an error during
-	      uncompression causes <code>H5Dread()</code> to return
-	      failure.  This field shows the number of bytes of the
-	      Total column which can be attributed to errors.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>User, System, Elapsed</td>
-	    <td>These are the amount of user time, system time, and
-	      elapsed time in seconds spent by the library to perform
-	      compression.  Elapsed time is sensitive to system
-	      load. These times may be zero on operating systems that
-	      don't support the required operations.</td>
-	  </tr>
-
-	  <tr valign=top>
-	    <td>Bandwidth</td>
-	    <td>This is the compression bandwidth which is the total
-	      number of bytes divided by elapsed time.  Since elapsed
-	      time is subject to system load the bandwidth numbers
-	      cannot always be trusted.  Furthermore, the bandwidth
-	      includes overrun and error bytes which may significantly
-	      taint the value.</td>
-	  </tr>
-	</table>
-      </center>
-
-    <p>
-      <center>
-	<table border align=center width="100%">
-	  <caption align=bottom><h4>Example: Compression
-	      Statistics</h4></caption>
-	  <tr>
-	    <td>
-	      <p><code><pre>
-H5Z: compression statistics accumulated over life of library:
-   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
-   ------      -----  -------  ------  ----  ------  ------- ---------
-   deflate-c  160000      200       0  0.62    0.74     1.33 1.204e+05
-   deflate-u  120000        0       0  0.11    0.00     0.12 9.885e+05
-	      </pre></code>
-	    </td>
-	  </tr>
-	</table>
-      </center>
-
-    <hr>
-    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
-<!-- Created: Fri Apr 17 13:39:35 EDT 1998 -->
-<!-- hhmts start -->
-Last modified: Fri Apr 17 16:15:21 EDT 1998
-<!-- hhmts end -->
-  </body>
-</html>
diff --git a/doc/html/Filters.html b/doc/html/Filters.html
new file mode 100644
index 0000000..b9785c8
--- /dev/null
+++ b/doc/html/Filters.html
@@ -0,0 +1,463 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+  <head>
+    <title>Filters</title>
+  </head>
+
+  <body>
+    <h1>Filters</h1>
+
+    <b>Note: Transient pipelines described in this document have not
+      been implemented.</b>
+
+    <h2>1. Introduction</h2>
+
+    <p>HDF5 allows chunked data to pass through user-defined filters
+      on the way to or from disk.  The filters operate on chunks of an
+      <code>H5D_CHUNKED</code> dataset and can be arranged in a pipeline
+      so that the output of one filter becomes the input of the next filter.
+
+    <p>Each filter has a two-byte identification number (type
+      <code>H5Z_filter_t</code>) allocated by NCSA and can also be
+      passed application-defined integer resources to control its
+      behavior.  Each filter also has an optional ASCII comment
+      string.
+
+    <p>
+      <center>
+	<table align=center width="80%">
+	  <caption align=top>
+	    <b>Values for <code>H5Z_filter_t</code></b>
+	  </caption>
+
+	  <tr>
+	    <th width="30%">Value</th>
+	    <th width="70%">Description</th>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>0-255</code></td>
+	    <td>These values are reserved for filters predefined and
+	      registered by the HDF5 library and of use to the general 
+	      public.  They are described in a separate section
+	      below.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>256-511</code></td>
+	    <td>Filter numbers in this range are used for testing only 
+	      and can be used temporarily by any organization.  No
+	      attempt is made to resolve numbering conflicts since all 
+	      definitions are by nature temporary.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>512-65535</code></td>
+	    <td>Reserved for future assignment.  Please contact the
+	      <a href="mailto:hdf5dev@ncsa.uiuc.edu">HDF5 development
+	      team</a> to reserve a value or range of values for
+	      use by your filters.</td>
+	  </tr>
+	</table>
+      </center>
+
+    <h2>2. Defining and Querying the Filter Pipeline</h2>
+
+    <p>Two types of filters can be applied to raw data I/O: permanent
+      filters and transient filters.  The permanent filter pipeline is
+      defined when the dataset is created while the transient pipeline
+      is defined for each I/O operation.  During an
+      <code>H5Dwrite()</code> the transient filters are applied first
+      in the order defined and then the permanent filters are applied
+      in the order defined.  For an <code>H5Dread()</code> the
+      opposite order is used: permanent filters in reverse order, then
+      transient filters in reverse order.  An <code>H5Dread()</code>
+      must result in the same amount of data for a chunk as the
+      original <code>H5Dwrite()</code>.
+
+    <p>The permanent filter pipeline is defined by calling
+      <code>H5Pset_filter()</code> for a dataset creation property
+      list while the transient filter pipeline is defined by calling
+      that function for a dataset transfer property list.
+
+    <dl>
+      <dt><code>herr_t H5Pset_filter (hid_t <em>plist</em>,
+	  H5Z_filter_t <em>filter</em>, unsigned int <em>flags</em>,
+	  size_t <em>cd_nelmts</em>, const unsigned int
+	  <em>cd_values</em>[])</code>
+      <dd>This function adds the specified <em>filter</em> and
+	corresponding properties to the end of the transient or
+	permanent output filter pipeline (depending on whether
+	<em>plist</em> is a dataset creation or dataset transfer
+	property list).  The <em>flags</em> argument specifies certain
+	general properties of the filter and is documented below. The
+	<em>cd_values</em> is an array of <em>cd_nelmts</em> integers
+	which are auxiliary data for the filter.  The integer values
+	will be stored in the dataset object header as part of the
+	filter information.
+
+	<br><br>
+      <dt><code>int H5Pget_nfilters (hid_t <em>plist</em>)</code>
+      <dd>This function returns the number of filters defined in the
+	permanent or transient filter pipeline depending on whether
+	<em>plist</em> is a dataset creation or dataset transfer
+	property list.  In each pipeline the filters are numbered from
+	0 through <em>N</em>-1 where <em>N</em> is the value returned
+	by this function. During output to the file the filters of a
+	pipeline are applied in increasing order (the inverse is true
+	for input).  Zero is returned if there are no filters in the
+	pipeline and a negative value is returned for errors.
+
+	<br><br>
+      <dt><code>H5Z_filter_t H5Pget_filter (hid_t <em>plist</em>,
+	  int <em>filter_number</em>, unsigned int *<em>flags</em>,
+	  size_t *<em>cd_nelmts</em>, unsigned int
+	  *<em>cd_values</em>)</code>
+      <dd>This is the query counterpart of
+	<code>H5Pset_filter()</code> and returns information about a
+	particular filter number in a permanent or transient pipeline
+	depending on whether <em>plist</em> is a dataset creation or
+	dataset transfer property list.  On input, <em>cd_nelmts</em>
+	indicates the number of entries in the <em>cd_values</em>
+	array allocated by the caller while on exit it contains the
+	number of values defined by the filter.  The
+	<em>filter_number</em> should be a value between zero and
+	<em>N</em>-1 as described for <code>H5Pget_nfilters()</code>
+	and the function will return failure (a negative value) if the
+	filter number is out of range.
+    </dl>
+
+    <p>The flags argument to the functions above is a bit vector of
+      the following fields:
+
+    <p>
+      <center>
+	<table align=center width="80%">
+	  <caption align=top>
+	    <b>Values for the <em>flags</em> argument</b>
+	  </caption>
+
+	  <tr>
+	    <th width="30%">Value</th>
+	    <th width="70%">Description</th>
+	  </tr>
+
+	  <tr valign=top>
+	    <td><code>H5Z_FLAG_OPTIONAL</code></td>
+	    <td>If this bit is set then the filter is optional.  If
+	      the filter fails (see below) during an
+	      <code>H5Dwrite()</code> operation then the filter is
+	      just excluded from the pipeline for the chunk for which
+	      it failed; the filter will not participate in the
+	      pipeline during an <code>H5Dread()</code> of the chunk.
+	      This is commonly used for compression filters: if the
+	      compression result would be larger than the input then
+	      the compression filter returns failure and the
+	      uncompressed data is stored in the file.  If this bit is
+	      clear and a filter fails then the
+	      <code>H5Dwrite()</code> or <code>H5Dread()</code> also
+	      fails.</td>
+	  </tr>
+	</table>
+      </center>
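The ordering rules and the effect of the optional flag can be sketched in a few lines of C. This is a toy model under stated assumptions, not the library's implementation: `toy_filter_t`, `pipeline_write`, `pipeline_read`, `TOY_FLAG_OPTIONAL` and the `excluded` bit mask are hypothetical names invented for illustration, only a single pipeline is modeled, and each filter transforms one `int` instead of a chunk buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for H5Z_FLAG_OPTIONAL; the numeric value is an assumption. */
#define TOY_FLAG_OPTIONAL 0x0001u

/* A toy filter transforms *value and returns 1 on success or 0 on failure.
 * `reverse` is nonzero when the filter runs as part of a read. */
typedef int (*toy_func_t)(int reverse, int *value);

typedef struct {
    toy_func_t   func;
    unsigned int flags;
} toy_filter_t;

/* Write path: apply filters 0..n-1 in increasing order.  An optional filter
 * that fails is skipped and remembered in *excluded (bit i for filter i);
 * a required filter that fails makes the whole write fail (returns 0). */
static int
pipeline_write(const toy_filter_t *p, size_t n, int *value,
               unsigned int *excluded)
{
    size_t i;

    assert(n <= 8 * sizeof *excluded);  /* one mask bit per filter */
    *excluded = 0;
    for (i = 0; i < n; i++) {
        int tmp = *value;               /* a failed filter must not alter *value */
        if (p[i].func(0, &tmp)) {
            *value = tmp;
        } else if (p[i].flags & TOY_FLAG_OPTIONAL) {
            *excluded |= 1u << i;
        } else {
            return 0;
        }
    }
    return 1;
}

/* Read path: apply filters n-1..0, skipping those excluded during the write.
 * Any failure of a non-excluded filter makes the read fail (returns 0). */
static int
pipeline_read(const toy_filter_t *p, size_t n, int *value,
              unsigned int excluded)
{
    size_t i;

    for (i = n; i-- > 0; ) {
        if (excluded & (1u << i))
            continue;
        if (!p[i].func(1, value))
            return 0;
    }
    return 1;
}

/* Three example filters. */
static int add_three(int reverse, int *v) { *v += reverse ? -3 : 3; return 1; }
static int times_two(int reverse, int *v)
{
    if (!reverse) { *v *= 2; return 1; }
    if (*v % 2 != 0) return 0;  /* odd input cannot have come from times_two */
    *v /= 2;
    return 1;
}
static int always_fails(int reverse, int *v) { (void)reverse; (void)v; return 0; }
```

With the pipeline `{add_three, always_fails (optional), times_two}`, writing the value 5 stores 16 and records filter 1 as excluded; reading 16 back with the same mask yields 5 again. Clearing the optional flag on `always_fails` makes the write itself fail.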
+
+    <h2>3. Defining Filters</h2>
+
+    <p>Each filter is bidirectional, handling both input and output to 
+      the file, and a flag is passed to the filter to indicate the
+      direction.  In either case the filter reads a chunk of data from 
+      a buffer, usually performs some sort of transformation on the
+      data, places the result in the same or new buffer, and returns
+      the buffer pointer and size to the caller. If something goes
+      wrong the filter should return zero to indicate a failure.
+
+    <p>During output, a filter that fails or isn't defined and is
+      marked as optional is silently excluded from the pipeline and
+      will not be used when reading that chunk of data.  A required
+      filter that fails or isn't defined causes the entire output
+      operation to fail. During input, any filter that has not been
+      excluded from the pipeline during output and fails or is not
+      defined will cause the entire input operation to fail.
+
+    <p>Filters are defined in two phases.  The first phase is to
+      define a function to act as the filter and link the function
+      into the application.  The second phase is to register the
+      function, associating the function with an
+      <code>H5Z_filter_t</code> identification number and a comment.
+
+    <dl>
+      <dt><code>typedef size_t (*H5Z_func_t)(unsigned int
+	  <em>flags</em>, size_t <em>cd_nelmts</em>, unsigned int
+	  *<em>cd_values</em>, size_t <em>nbytes</em>, size_t
+	  *<em>buf_size</em>, void **<em>buf</em>)</code>
+      <dd>The <em>flags</em>, <em>cd_nelmts</em>, and
+	<em>cd_values</em> are the same as for the
+	<code>H5Pset_filter()</code> function with the additional flag
+	<code>H5Z_FLAG_REVERSE</code> which is set when the filter is
+	called as part of the input pipeline. The input buffer is
+	pointed to by <em>*buf</em> and has a total size of
+	<em>*buf_size</em> bytes but only <em>nbytes</em> are valid
+	data. The filter should perform the transformation in place if
+	possible and return the number of valid bytes or zero for
+	failure.  If the transformation cannot be done in place then
+	the filter should allocate a new buffer with
+	<code>malloc()</code> and assign it to <em>*buf</em>,
+	assigning the allocated size of that buffer to
+	<em>*buf_size</em>. The old buffer should be freed
+	by calling <code>free()</code>.
+
+	<br><br>
+      <dt><code>herr_t H5Zregister (H5Z_filter_t <em>filter_id</em>,
+	  const char *<em>comment</em>, H5Z_func_t
+	  <em>filter</em>)</code>
+      <dd>The <em>filter</em> function is associated with a filter
+	number and a short ASCII comment which will be stored in the
+	hdf5 file if the filter is used as part of a permanent
+	pipeline during dataset creation.
+    </dl>
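As a rough illustration of the calling convention described above, here is a filter that works entirely in place. It is a sketch, not part of the library: `xor_filter` is a hypothetical name, the typedef is a local copy of the signature documented above so the block compiles without `hdf5.h`, and the numeric value given to `H5Z_FLAG_REVERSE` is a placeholder assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins so the sketch compiles on its own.  The typedef copies the
 * documented signature; the flag value is a placeholder, not taken from the
 * library headers. */
#define H5Z_FLAG_REVERSE 0x0100u
typedef size_t (*H5Z_func_t)(unsigned int flags, size_t cd_nelmts,
                             unsigned int *cd_values, size_t nbytes,
                             size_t *buf_size, void **buf);

/* XOR every valid byte with the low 8 bits of cd_values[0].  XOR is its own
 * inverse, so the same code serves both directions and `flags` is ignored.
 * The buffer is never reallocated, so *buf and *buf_size are left untouched.
 * Returns the number of valid bytes, or zero for failure. */
static size_t
xor_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
           size_t nbytes, size_t *buf_size, void **buf)
{
    unsigned char *p;
    unsigned char  key;
    size_t         i;

    (void)flags;
    assert(buf != NULL && *buf != NULL && buf_size != NULL);

    /* The key is required, and the valid bytes must fit in the buffer. */
    if (cd_nelmts < 1 || cd_values == NULL || nbytes > *buf_size)
        return 0;

    key = (unsigned char)(cd_values[0] & 0xffu);
    p = (unsigned char *)*buf;
    for (i = 0; i < nbytes; i++)
        p[i] ^= key;

    return nbytes;
}
```

Because zero is the failure value, a filter written this way cannot report a successful transformation of an empty buffer; callers of this sketch would need to avoid passing `nbytes` of zero.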
+    
+      
+    <h2>4. Predefined Filters</h2>
+
+    <p>If GNU <code>zlib</code> version 1.1.2 or later was found
+      during configuration then the library will define a filter whose
+      <code>H5Z_filter_t</code> number is
+      <code>H5Z_FILTER_DEFLATE</code>. Since this compression method
+      has the potential for generating compressed data which is larger
+      than the original, the <code>H5Z_FLAG_OPTIONAL</code> flag
+      should be turned on so such cases can be handled gracefully by
+      storing the original data instead of the compressed data.  The
+      <em>cd_nelmts</em> should be one with <em>cd_values[0]</em>
+      being a compression aggression level between zero and nine,
+      inclusive (zero is the fastest compression while nine results in
+      the best compression ratio).
+
+    <p>A convenience function for adding the
+      <code>H5Z_FILTER_DEFLATE</code> filter to a pipeline is:
+
+    <dl>
+      <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, unsigned
+	  <em>aggression</em>)</code>
+      <dd>The deflate compression method is added to the end of the
+	permanent or transient filter pipeline depending on whether
+	<em>plist</em> is a dataset creation or dataset transfer
+	property list. The <em>aggression</em> is a number between
+	zero and nine (inclusive) to indicate the tradeoff between
+	speed and compression ratio (zero is fastest, nine is best
+	ratio).
+    </dl>
+
+    <p>Even if the GNU <code>zlib</code> isn't detected during
+      configuration the application can define
+      <code>H5Z_FILTER_DEFLATE</code> as a permanent filter.  If the
+      filter is marked as optional (as with
+      <code>H5Pset_deflate()</code>) then it will always fail and be
+      automatically removed from the pipeline.  Applications that read
+      data will fail only if the data is actually compressed; they
+      won't fail if <code>H5Z_FILTER_DEFLATE</code> was part of the
+      permanent output pipeline but was automatically excluded because
+      it didn't exist when the data was written.
+
+    <h2>5. Example</h2>
+
+    <p>This example shows how to define and register a simple filter
+      that adds a checksum capability to the data stream.
+
+    <p>The function that acts as the filter always returns zero
+      (failure) if the <code>md5()</code> function was not detected at 
+      configuration time (left as an exercise for the reader).
+      Otherwise the function is broken down to an input and output
+      half.  The output half calculates a checksum, increases the size 
+      of the output buffer if necessary, and appends the checksum to
+      the end of the buffer.  The input half calculates the checksum
+      on the first part of the buffer and compares it to the checksum
+      already stored at the end of the buffer.  If the two differ then 
+      zero (failure) is returned, otherwise the buffer size is reduced 
+      to exclude the checksum.
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <tr>
+	    <td>
+	      <p><code><pre>
+
+size_t
+md5_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
+           size_t nbytes, size_t *buf_size, void **buf)
+{
+#ifdef HAVE_MD5
+    unsigned char       cksum[16];
+
+    if (flags & H5Z_FLAG_REVERSE) {
+        /* Input */
+        assert(nbytes>=16);
+        md5(nbytes-16, *buf, cksum);
+
+        /* Compare */
+        if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) {
+            return 0; /*fail*/
+        }
+
+        /* Strip off checksum */
+        return nbytes-16;
+            
+    } else {
+        /* Output */
+        md5(nbytes, *buf, cksum);
+
+        /* Increase buffer size if necessary */
+        if (nbytes+16>*buf_size) {
+            void *tmp = realloc(*buf, nbytes + 16);
+            if (!tmp) return 0; /*fail; *buf is still valid*/
+            *buf = tmp;
+            *buf_size = nbytes + 16;
+        }
+
+        /* Append checksum */
+        memcpy((char*)(*buf)+nbytes, cksum, 16);
+        return nbytes+16;
+    }
+#else
+    return 0; /*fail*/
+#endif
+}
+	      </pre></code>
+	    </td>
+	  </tr>
+	</table>
+      </center>
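The listing above depends on an external `md5()` routine, so it cannot be compiled as shown. The variant below is a sketch under stated assumptions that keeps the same structure: it substitutes a 4-byte toy checksum (`toy_sum`, a hypothetical helper that is far weaker than MD5), defines `H5Z_FLAG_REVERSE` locally with a placeholder value, and checks the result of `realloc()` before replacing `*buf`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define H5Z_FLAG_REVERSE 0x0100u  /* placeholder value, an assumption */
#define CKSUM_LEN        4        /* bytes appended to each chunk */

/* Toy 32-bit polynomial checksum standing in for md5(). */
static void
toy_sum(size_t n, const unsigned char *p, unsigned char out[CKSUM_LEN])
{
    unsigned long s = 0;
    size_t        i;

    for (i = 0; i < n; i++)
        s = (s * 31u + p[i]) & 0xffffffffUL;
    out[0] = (unsigned char)((s >> 24) & 0xff);
    out[1] = (unsigned char)((s >> 16) & 0xff);
    out[2] = (unsigned char)((s >> 8) & 0xff);
    out[3] = (unsigned char)(s & 0xff);
}

static size_t
cksum_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
             size_t nbytes, size_t *buf_size, void **buf)
{
    unsigned char cksum[CKSUM_LEN];

    (void)cd_nelmts;
    (void)cd_values;
    assert(buf != NULL && *buf != NULL && buf_size != NULL);

    if (flags & H5Z_FLAG_REVERSE) {
        /* Input: recompute the checksum and compare with the stored one. */
        if (nbytes < CKSUM_LEN)
            return 0;                        /* too short to hold a checksum */
        toy_sum(nbytes - CKSUM_LEN, *buf, cksum);
        if (memcmp(cksum, (unsigned char *)*buf + nbytes - CKSUM_LEN,
                   CKSUM_LEN) != 0)
            return 0;                        /* mismatch: fail */
        return nbytes - CKSUM_LEN;           /* strip off the checksum */
    }

    /* Output: compute the checksum, grow the buffer if needed, append. */
    toy_sum(nbytes, *buf, cksum);
    if (nbytes + CKSUM_LEN > *buf_size) {
        void *bigger = realloc(*buf, nbytes + CKSUM_LEN);
        if (bigger == NULL)
            return 0;                        /* fail; the old buffer is intact */
        *buf = bigger;
        *buf_size = nbytes + CKSUM_LEN;
    }
    memcpy((unsigned char *)*buf + nbytes, cksum, CKSUM_LEN);
    return nbytes + CKSUM_LEN;
}
```

Filtering a 3-byte chunk on output returns 7 and grows the buffer; filtering those 7 bytes on input returns 3, and flipping any bit of the first byte beforehand makes the input call return zero.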
+
+    <p>Once the filter function is defined it must be registered so
+      the HDF5 library knows about it.  Since we're testing this
+      filter we choose one of the <code>H5Z_filter_t</code> numbers
+      from the testing range (256 through 511).  We'll arbitrarily choose 305.
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <tr>
+	    <td>
+	      <p><code><pre>
+
+#define FILTER_MD5 305
+herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter);
+	      </pre></code>
+	    </td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>Now we can use the filter in a pipeline.  We could have added
+      the filter to the pipeline before defining or registering the
+      filter as long as the filter was defined and registered by the time
+      we tried to use it (if the filter is marked as optional then we
+      could have used it without defining it and the library would
+      have automatically removed it from the pipeline for each chunk
+      written before the filter was defined and registered).
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <tr>
+	    <td>
+	      <p><code><pre>
+
+hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+hsize_t chunk_size[3] = {10,10,10};
+H5Pset_chunk(dcpl, 3, chunk_size);
+H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL);
+hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl);
+	      </pre></code>
+	    </td>
+	  </tr>
+	</table>
+      </center>
+
+    <h2>6. Filter Diagnostics</h2>
+
+    <p>If the library is compiled with debugging turned on for the H5Z
+      layer (usually as a result of <code>configure
+      --enable-debug=z</code>) then filter statistics are printed when
+      the application exits normally or the library is closed.  The
+      statistics are written to the standard error stream and include
+      two lines for each filter that was used: one for input and one
+      for output.  The following fields are displayed:
+
+    <p>
+      <center>
+	<table align=center width="80%">
+	  <tr>
+	    <th width="30%">Field Name</th>
+	    <th width="70%">Description</th>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Method</td>
+	    <td>This is the name of the method as defined with
+	      <code>H5Zregister()</code> with the character
+	      &quot;&lt;&quot; or &quot;&gt;&quot; prepended to indicate
+	      input or output.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Total</td>
+	    <td>The total number of bytes processed by the filter
+	      including errors.  This is the maximum of the
+	      <em>nbytes</em> argument and the return value.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Errors</td>
+	    <td>This field shows the number of bytes of the Total
+	      column which can be attributed to errors.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>User, System, Elapsed</td>
+	    <td>These are the amount of user time, system time, and
+	      elapsed time in seconds spent in the filter function.
+	      Elapsed time is sensitive to system load. These times
+	      may be zero on operating systems that don't support the
+	      required operations.</td>
+	  </tr>
+
+	  <tr valign=top>
+	    <td>Bandwidth</td>
+	    <td>This is the filter bandwidth which is the total
+	      number of bytes processed divided by elapsed time.
+	      Since elapsed time is subject to system load the
+	      bandwidth numbers cannot always be trusted.
+	      Furthermore, the bandwidth includes bytes attributed to
+	      errors which may significantly taint the value if the
+	      function is able to detect errors without much
+	      expense.</td>
+	  </tr>
+	</table>
+      </center>
+
+    <p>
+      <center>
+	<table border align=center width="100%">
+	  <caption align=bottom>
+	    <b>Example: Filter Statistics</b>
+	  </caption>
+	  <tr>
+	    <td>
+	      <p><code><pre>
+H5Z: filter statistics accumulated over life of library:
+   Method     Total  Errors  User  System  Elapsed Bandwidth
+   ------     -----  ------  ----  ------  ------- ---------
+   &gt;deflate  160000   40000  0.62    0.74     1.33 117.5 kBs
+   &lt;deflate  120000       0  0.11    0.00     0.12 1.000 MBs
+	      </pre></code>
+	    </td>
+	  </tr>
+	</table>
+      </center>
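+
+    <p>The relationship between the Total, Errors, Elapsed, and
+      Bandwidth fields can be sketched as follows.  This is an
+      illustration, not the library's implementation:
+      <code>demo_stats_t</code> and the <code>demo_*</code> functions
+      are hypothetical, and treating a return value of zero as the
+      failure signal is an assumption carried over from the
+      <code>md5_filter</code> example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-direction accumulator mirroring the table's fields */
typedef struct {
    size_t total;    /* bytes processed, including failed calls */
    size_t errors;   /* bytes of `total` attributed to failed calls */
    double elapsed;  /* seconds spent in the filter function */
} demo_stats_t;

/* Record one filter call: nbytes went in, ret came back (0 assumed to mean failure) */
void demo_stats_add(demo_stats_t *s, size_t nbytes, size_t ret, double secs)
{
    s->total += (nbytes > ret) ? nbytes : ret;  /* maximum of nbytes and return value */
    if (ret == 0) s->errors += nbytes;
    s->elapsed += secs;
}

/* Bandwidth as described in the table: total bytes divided by elapsed time */
double demo_bandwidth(const demo_stats_t *s)
{
    return (s->elapsed > 0.0) ? (double)s->total / s->elapsed : 0.0;
}

/* The same figure with bytes attributed to errors left out */
double demo_useful_bandwidth(const demo_stats_t *s)
{
    return (s->elapsed > 0.0) ? (double)(s->total - s->errors) / s->elapsed : 0.0;
}
```

+    <p>Feeding in totals like those in the first row of the example
+      (160000 bytes, 40000 of them from failed calls, 1.33 seconds
+      elapsed) gives roughly 120000 bytes per second overall but
+      only about 90000 bytes per second once the error bytes are
+      excluded, which is why the Bandwidth column should be read
+      alongside the Errors column.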
+
+    <hr>
+    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Apr 17 13:39:35 EDT 1998 -->
+<!-- hhmts start -->
+Last modified: Tue Aug  4 16:04:43 EDT 1998
+<!-- hhmts end -->
+  </body>
+</html>