1.10 Merge doxygen changes from develop (#1543)

* Merge doxygen changes from develop * Remove 1.12 section * Minor comment cleanup
author: Allen Byrne <50328838+byrnHDF@users.noreply.github.com> 2022-03-30 17:34:48 (GMT)
committer: GitHub <noreply@github.com> 2022-03-30 17:34:48 (GMT)
commit: fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6 (patch)
tree: 73d047b9cf257c85a708eb813c12f3071ed6d67a /doxygen/examples/Filters.html
parent: d4f151fac40724b0425e697b1c3613b5d86cfa7b (diff)
download: hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.zip
hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.tar.gz
hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.tar.bz2
1 files changed, 450 insertions, 0 deletions
diff --git a/doxygen/examples/Filters.html b/doxygen/examples/Filters.html
new file mode 100644
index 0000000..7054a3b
--- /dev/null
+++ b/doxygen/examples/Filters.html
@@ -0,0 +1,450 @@
+<html>
+  <head>
+    <title>Filters</title>
+    <h1>Filters in HDF5</h1>
+
+    <b>Note: Transient pipelines described in this document have not
+      been implemented.</b>
+
+    <h2>Introduction</h2>
+
+    <p>HDF5 allows chunked data to pass through user-defined filters
+      on the way to or from disk.  The filters operate on chunks of an
+      <code>H5D_CHUNKED</code> dataset can be arranged in a pipeline
+      so output of one filter becomes the input of the next filter.
+
+    </p><p>Each filter has a two-byte identification number (type
+      <code>H5Z_filter_t</code>) allocated by The HDF Group and can also be
+      passed application-defined integer resources to control its
+      behavior.  Each filter also has an optional ASCII comment
+      string.
+
+    </p>
+    <table>
+	  <tbody><tr>
+	      <th>Values for <code>H5Z_filter_t</code></th>
+	      <th>Description</th>
+	    </tr>
+
+	    <tr valign="top">
+	      <td><code>0-255</code></td>
+	      <td>These values are reserved for filters predefined and
+	        registered by the HDF5 library and of use to the general
+	        public.  They are described in a separate section
+	        below.</td>
+	    </tr>
+
+	    <tr valign="top">
+	      <td><code>256-511</code></td>
+	      <td>Filter numbers in this range are used for testing only
+	        and can be used temporarily by any organization.  No
+	        attempt is made to resolve numbering conflicts since all
+	        definitions are by nature temporary.</td>
+	    </tr>
+
+	    <tr valign="top">
+	      <td><code>512-65535</code></td>
+	      <td>Reserved for future assignment. Please contact the
+	        <a href="mailto:help@hdfgroup.org">HDF5 development team</a>
+            to reserve a value or range of values for
+	        use by your filters.</td>
+	</tr></tbody></table>
+
+    <h2>Defining and Querying the Filter Pipeline</h2>
+
+    <p>Two types of filters can be applied to raw data I/O: permanent
+      filters and transient filters.  The permanent filter pipeline is
+      defined when the dataset is created while the transient pipeline
+      is defined for each I/O operation.  During an
+      <code>H5Dwrite()</code> the transient filters are applied first
+      in the order defined and then the permanent filters are applied
+      in the order defined.  For an <code>H5Dread()</code> the
+      opposite order is used: permanent filters in reverse order, then
+      transient filters in reverse order.  An <code>H5Dread()</code>
+      must result in the same amount of data for a chunk as the
+      original <code>H5Dwrite()</code>.
+
+    </p><p>The permanent filter pipeline is defined by calling
+      <code>H5Pset_filter()</code> for a dataset creation property
+      list while the transient filter pipeline is defined by calling
+      that function for a dataset transfer property list.
+
+    </p><dl>
+      <dt><code>herr_t H5Pset_filter (hid_t <em>plist</em>,
+	      H5Z_filter_t <em>filter</em>, unsigned int <em>flags</em>,
+	      size_t <em>cd_nelmts</em>, const unsigned int
+	      <em>cd_values</em>[])</code>
+      </dt><dd>This function adds the specified <em>filter</em> and
+	    corresponding properties to the end of the transient or
+	    permanent output filter pipeline (depending on whether
+	    <em>plist</em> is a dataset creation or dataset transfer
+	    property list).  The <em>flags</em> argument specifies certain
+	    general properties of the filter and is documented below. The
+	    <em>cd_values</em> is an array of <em>cd_nelmts</em> integers
+	    which are auxiliary data for the filter.  The integer values
+	    will be stored in the dataset object header as part of the
+	    filter information.
+      </dd><dt><code>int H5Pget_nfilters (hid_t <em>plist</em>)</code>
+      </dt><dd>This function returns the number of filters defined in the
+	    permanent or transient filter pipeline depending on whether
+	    <em>plist</em> is a dataset creation or dataset transfer
+	    property list.  In each pipeline the filters are numbered from
+	    0 through <em>N</em>-1 where <em>N</em> is the value returned
+	    by this function. During output to the file the filters of a
+	    pipeline are applied in increasing order (the inverse is true
+	    for input).  Zero is returned if there are no filters in the
+	    pipeline and a negative value is returned for errors.
+      </dd><dt><code>H5Z_filter_t H5Pget_filter (hid_t <em>plist</em>,
+	      int <em>filter_number</em>, unsigned int *<em>flags</em>,
+	      size_t *<em>cd_nelmts</em>, unsigned int
+	      *<em>cd_values</em>, size_t namelen, char name[])</code>
+      </dt><dd>This is the query counterpart of
+	    <code>H5Pset_filter()</code> and returns information about a
+	    particular filter number in a permanent or transient pipeline
+	    depending on whether <em>plist</em> is a dataset creation or
+	    dataset transfer property list.  On input, <em>cd_nelmts</em>
+	    indicates the number of entries in the <em>cd_values</em>
+	    array allocated by the caller while on exit it contains the
+	    number of values defined by the filter.  The
+	    <em>filter_number</em> should be a value between zero and
+	    <em>N</em>-1 as described for <code>H5Pget_nfilters()</code>
+	    and the function will return failure (a negative value) if the
+	    filter number is out of range.  If <em>name</em> is a pointer
+	    to an array of at least <em>namelen</em> bytes then the filter
+	    name will be copied into that array.  The name will be null
+	    terminated if the <em>namelen</em> is large enough.  The
+	    filter name returned will be the name appearing in the file or
+	    else the name registered for the filter or else an empty string.
+    </dd></dl>
+
+    <p>The flags argument to the functions above is a bit vector of
+      the following fields:
+
+    </p>
+	  <table>
+	    <tbody><tr>
+	        <th>Values for <em>flags</em></th>
+	        <th>Description</th>
+	      </tr>
+
+	      <tr valign="top">
+	        <td><code>H5Z_FLAG_OPTIONAL</code></td>
+	        <td>If this bit is set then the filter is optional.  If
+	          the filter fails (see below) during an
+	          <code>H5Dwrite()</code> operation then the filter is
+	          just excluded from the pipeline for the chunk for which
+	          it failed; the filter will not participate in the
+	          pipeline during an <code>H5Dread()</code> of the chunk.
+	          This is commonly used for compression filters: if the
+	          compression result would be larger than the input then
+	          the compression filter returns failure and the
+	          uncompressed data is stored in the file.  If this bit is
+	          clear and a filter fails then the
+	          <code>H5Dwrite()</code> or <code>H5Dread()</code> also
+	          fails.</td>
+	      </tr>
+	  </tbody></table>
+
+    <h2>Defining Filters</h2>
+
+    <p>Each filter is bidirectional, handling both input and output to
+      the file, and a flag is passed to the filter to indicate the
+      direction.  In either case the filter reads a chunk of data from
+      a buffer, usually performs some sort of transformation on the
+      data, places the result in the same or new buffer, and returns
+      the buffer pointer and size to the caller. If something goes
+      wrong the filter should return zero to indicate a failure.
+
+    </p><p>During output, a filter that fails or isn't defined and is
+      marked as optional is silently excluded from the pipeline and
+      will not be used when reading that chunk of data.  A required
+      filter that fails or isn't defined causes the entire output
+      operation to fail. During input, any filter that has not been
+      excluded from the pipeline during output and fails or is not
+      defined will cause the entire input operation to fail.
+
+    </p><p>Filters are defined in two phases.  The first phase is to
+      define a function to act as the filter and link the function
+      into the application.  The second phase is to register the
+      function, associating the function with an
+      <code>H5Z_filter_t</code> identification number and a comment.
+
+    </p><dl>
+      <dt><code>typedef size_t (*H5Z_func_t)(unsigned int
+	      <em>flags</em>, size_t <em>cd_nelmts</em>, const unsigned int
+	      <em>cd_values</em>[], size_t <em>nbytes</em>, size_t
+	      *<em>buf_size</em>, void **<em>buf</em>)</code>
+      </dt><dd>The <em>flags</em>, <em>cd_nelmts</em>, and
+	    <em>cd_values</em> are the same as for the
+	    <code>H5Pset_filter()</code> function with the additional flag
+	    <code>H5Z_FLAG_REVERSE</code> which is set when the filter is
+	    called as part of the input pipeline. The input buffer is
+	    pointed to by <em>*buf</em> and has a total size of
+	    <em>*buf_size</em> bytes but only <em>nbytes</em> are valid
+	    data. The filter should perform the transformation in place if
+	    possible and return the number of valid bytes or zero for
+	    failure.  If the transformation cannot be done in place then
+	    the filter should allocate a new buffer with
+	    <code>malloc()</code> and assign it to <em>*buf</em>,
+	    assigning the allocated size of that buffer to
+	    <em>*buf_size</em>. The old buffer should be freed
+	    by calling <code>free()</code>.
+
+	    <br><br>
+      </dd><dt><code>herr_t H5Zregister (H5Z_filter_t <em>filter_id</em>,
+	      const char *<em>comment</em>, H5Z_func_t
+	      <em>filter</em>)</code>
+      </dt><dd>The <em>filter</em> function is associated with a filter
+	    number and a short ASCII comment which will be stored in the
+	    hdf5 file if the filter is used as part of a permanent
+	    pipeline during dataset creation.
+    </dd></dl>
+
+    <h2>Predefined Filters</h2>
+
+    <p>If <code>zlib</code> version 1.1.2 or later was found
+      during configuration then the library will define a filter whose
+      <code>H5Z_filter_t</code> number is
+      <code>H5Z_FILTER_DEFLATE</code>. Since this compression method
+      has the potential for generating compressed data which is larger
+      than the original, the <code>H5Z_FLAG_OPTIONAL</code> flag
+      should be turned on so such cases can be handled gracefully by
+      storing the original data instead of the compressed data.  The
+      <em>cd_nvalues</em> should be one with <em>cd_value[0]</em>
+      being a compression aggression level between zero and nine,
+      inclusive (zero is the fastest compression while nine results in
+      the best compression ratio).
+
+    </p><p>A convenience function for adding the
+      <code>H5Z_FILTER_DEFLATE</code> filter to a pipeline is:
+
+    </p><dl>
+      <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, unsigned
+	      <em>aggression</em>)</code>
+      </dt><dd>The deflate compression method is added to the end of the
+	    permanent or transient filter pipeline depending on whether
+	    <em>plist</em> is a dataset creation or dataset transfer
+	    property list. The <em>aggression</em> is a number between
+	    zero and nine (inclusive) to indicate the tradeoff between
+	    speed and compression ratio (zero is fastest, nine is best
+	    ratio).
+    </dd></dl>
+
+    <p>Even if the <code>zlib</code> isn't detected during
+      configuration the application can define
+      <code>H5Z_FILTER_DEFLATE</code> as a permanent filter.  If the
+      filter is marked as optional (as with
+      <code>H5Pset_deflate()</code>) then it will always fail and be
+      automatically removed from the pipeline.  Applications that read
+      data will fail only if the data is actually compressed; they
+      won't fail if <code>H5Z_FILTER_DEFLATE</code> was part of the
+      permanent output pipeline but was automatically excluded because
+      it didn't exist when the data was written.
+
+    </p><p><code>zlib</code> can be acquired from
+      <code><a href="http://www.cdrom.com/pub/infozip/zlib/">
+          http://www.cdrom.com/pub/infozip/zlib/</a></code>.
+
+    </p><h2>Example</h2>
+
+    <p>This example shows how to define and register a simple filter
+      that adds a checksum capability to the data stream.
+
+    </p><p>The function that acts as the filter always returns zero
+      (failure) if the <code>md5()</code> function was not detected at
+      configuration time (left as an exercise for the reader).
+      Otherwise the function is broken down to an input and output
+      half.  The output half calculates a checksum, increases the size
+      of the output buffer if necessary, and appends the checksum to
+      the end of the buffer.  The input half calculates the checksum
+      on the first part of the buffer and compares it to the checksum
+      already stored at the end of the buffer.  If the two differ then
+      zero (failure) is returned, otherwise the buffer size is reduced
+      to exclude the checksum.
+
+    </p>
+	  <table>
+	    <tbody><tr>
+	        <td>
+	          <p><code></code></p><pre><code>
+                  size_t
+                  md5_filter(unsigned int flags, size_t cd_nelmts,
+                  const unsigned int cd_values[], size_t nbytes,
+                  size_t *buf_size, void **buf)
+                  {
+                  #ifdef HAVE_MD5
+                  unsigned char       cksum[16];
+
+                  if (flags &amp; H5Z_REVERSE) {
+                  /* Input */
+                  assert(nbytes&gt;=16);
+                  md5(nbytes-16, *buf, cksum);
+
+                  /* Compare */
+                  if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) {
+                  return 0; /*fail*/
+                  }
+
+                  /* Strip off checksum */
+                  return nbytes-16;
+
+                  } else {
+                  /* Output */
+                  md5(nbytes, *buf, cksum);
+
+                  /* Increase buffer size if necessary */
+                  if (nbytes+16&gt;*buf_size) {
+                  *buf_size = nbytes + 16;
+                  *buf = realloc(*buf, *buf_size);
+                  }
+
+                  /* Append checksum */
+                  memcpy((char*)(*buf)+nbytes, cksum, 16);
+                  return nbytes+16;
+                  }
+                  #else
+                  return 0; /*fail*/
+                  #endif
+                  }
+	          </code></pre>
+	        </td>
+	      </tr>
+	  </tbody></table>
+
+    <p>Once the filter function is defined it must be registered so
+      the HDF5 library knows about it.  Since we're testing this
+      filter we choose one of the <code>H5Z_filter_t</code> numbers
+      from the reserved range.  We'll randomly choose 305.
+
+    </p><p>
+    </p>
+	  <table>
+	    <tbody><tr>
+	        <td>
+	          <p><code></code></p><pre><code>
+                  #define FILTER_MD5 305
+                  herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter);
+	          </code></pre>
+	        </td>
+	      </tr>
+	  </tbody></table>
+
+    <p>Now we can use the filter in a pipeline.  We could have added
+      the filter to the pipeline before defining or registering the
+      filter as long as the filter was defined and registered by time
+      we tried to use it (if the filter is marked as optional then we
+      could have used it without defining it and the library would
+      have automatically removed it from the pipeline for each chunk
+      written before the filter was defined and registered).
+
+    </p><p>
+    </p>
+	  <table>
+	    <tbody><tr>
+	        <td>
+	          <p><code></code></p><pre><code>
+                  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+                  hsize_t chunk_size[3] = {10,10,10};
+                  H5Pset_chunk(dcpl, 3, chunk_size);
+                  H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL);
+                  hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl);
+	          </code></pre>
+	        </td>
+	      </tr>
+	  </tbody></table>
+
+    <h2>6. Filter Diagnostics</h2>
+
+    <p>If the library is compiled with debugging turned on for the H5Z
+      layer (usually as a result of <code>configure
+        --enable-debug=z</code>) then filter statistics are printed when
+      the application exits normally or the library is closed.  The
+      statistics are written to the standard error stream and include
+      two lines for each filter that was used: one for input and one
+      for output.  The following fields are displayed:
+
+    </p><p>
+    </p>
+	  <table>
+	    <tbody><tr>
+	        <th>Field Name</th>
+	        <th>Description</th>
+	      </tr>
+
+	      <tr valign="top">
+	        <td>Method</td>
+	        <td>This is the name of the method as defined with
+	          <code>H5Zregister()</code> with the characters
+	          "&lt; or "&gt;" prepended to indicate
+	          input or output.</td>
+	      </tr>
+
+	      <tr valign="top">
+	        <td>Total</td>
+	        <td>The total number of bytes processed by the filter
+	          including errors.  This is the maximum of the
+	          <em>nbytes</em> argument or the return value.
+	      </td></tr>
+
+	      <tr valign="top">
+	        <td>Errors</td>
+	        <td>This field shows the number of bytes of the Total
+	          column which can be attributed to errors.</td>
+	      </tr>
+
+	      <tr valign="top">
+	        <td>User, System, Elapsed</td>
+	        <td>These are the amount of user time, system time, and
+	          elapsed time in seconds spent in the filter function.
+	          Elapsed time is sensitive to system load. These times
+	          may be zero on operating systems that don't support the
+	          required operations.</td>
+	      </tr>
+
+	      <tr valign="top">
+	        <td>Bandwidth</td>
+	        <td>This is the filter bandwidth which is the total
+	          number of bytes processed divided by elapsed time.
+	          Since elapsed time is subject to system load the
+	          bandwidth numbers cannot always be trusted.
+	          Furthermore, the bandwidth includes bytes attributed to
+	          errors which may significanly taint the value if the
+	          function is able to detect errors without much
+	          expense.</td>
+	      </tr>
+	  </tbody></table>
+
+    <p>
+    </p>
+	  <table>
+	    <caption align="bottom">
+	      <b>Example: Filter Statistics</b>
+	    </caption>
+	    <tbody><tr>
+	        <td>
+	          <p><code></code></p><pre><code>H5Z: filter statistics accumulated ov=
+                  er life of library:
+                  Method     Total  Errors  User  System  Elapsed Bandwidth
+                  ------     -----  ------  ----  ------  ------- ---------
+                  &gt;deflate  160000   40000  0.62    0.74     1.33 117.5 kBs
+                  &lt;deflate  120000       0  0.11    0.00     0.12 1.000 MBs
+	          </code></pre>
+	        </td>
+	      </tr>
+	  </tbody></table>
+
+    <hr>
+
+
+    <p><a name="fn1">Footnote 1:</a> Dataset chunks can be compressed
+      through the use of filters.  Developers should be aware that
+      reading and rewriting compressed chunked data can result in holes
+      in an HDF5 file.  In time, enough such holes can increase the
+      file size enough to impair application or library performance
+      when working with that file.  See
+      <a href="https://support.hdfgroup.org/HDF5/doc1.6/Performance.html#Freespace">
+        Freespace Management</a>
+      in the chapter
+      <a href="https://support.hdfgroup.org/HDF5/doc1.6/Performance.html">
+        Performance Analysis and Issues</a>.</p>
+</html>
author	Allen Byrne <50328838+byrnHDF@users.noreply.github.com>	2022-03-30 17:34:48 (GMT)
committer	GitHub <noreply@github.com>	2022-03-30 17:34:48 (GMT)
commit	fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6 (patch)
tree	73d047b9cf257c85a708eb813c12f3071ed6d67a /doxygen/examples/Filters.html
parent	d4f151fac40724b0425e697b1c3613b5d86cfa7b (diff)
download	hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.zip hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.tar.gz hdf5-fac4cd9e667e3ec6e62e7975fbdb53ff0d70cdc6.tar.bz2