Diffstat (limited to 'doc/html/Filters.html')
-rw-r--r-- | doc/html/Filters.html | 593 |
1 files changed, 0 insertions, 593 deletions
diff --git a/doc/html/Filters.html b/doc/html/Filters.html deleted file mode 100644 index a253cfb..0000000 --- a/doc/html/Filters.html +++ /dev/null @@ -1,593 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <title>Filters</title> - -<!-- #BeginLibraryItem "/ed_libs/styles_UG.lbi" --> -<!-- - * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * - * Copyright by the Board of Trustees of the University of Illinois. * - * All rights reserved. * - * * - * This file is part of HDF5. The full HDF5 copyright notice, including * - * terms governing use, modification, and redistribution, is contained in * - * the files COPYING and Copyright.html. COPYING can be found at the root * - * of the source code distribution tree; Copyright.html can be found at the * - * root level of an installed copy of the electronic HDF5 document set and * - * is linked from the top-level documents page. It can also be found at * - * http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html. If you do not have * - * access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. 
* - * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * - --> - -<link href="ed_styles/UGelect.css" rel="stylesheet" type="text/css"> -<!-- #EndLibraryItem --></head> - - <body bgcolor="#FFFFFF"> - - -<!-- #BeginLibraryItem "/ed_libs/NavBar_UG.lbi" --><hr> -<center> -<table border=0 width=98%> -<tr><td valign=top align=left> - <a href="index.html">HDF5 documents and links</a> <br> - <a href="H5.intro.html">Introduction to HDF5</a> <br> - <a href="RM_H5Front.html">HDF5 Reference Manual</a> <br> - <a href="http://hdf.ncsa.uiuc.edu/HDF5/doc/UG/index.html">HDF5 User's Guide for Release 1.6</a> <br> - <!-- - <a href="Glossary.html">Glossary</a><br> - --> -</td> -<td valign=top align=right> - And in this document, the - <a href="H5.user.html"><strong>HDF5 User's Guide from Release 1.4.5:</strong></a> - <br> - <a href="Files.html">Files</a> - <a href="Datasets.html">Datasets</a> - <a href="Datatypes.html">Datatypes</a> - <a href="Dataspaces.html">Dataspaces</a> - <a href="Groups.html">Groups</a> - <br> - <a href="References.html">References</a> - <a href="Attributes.html">Attributes</a> - <a href="Properties.html">Property Lists</a> - <a href="Errors.html">Error Handling</a> - <br> - <a href="Filters.html">Filters</a> - <a href="Caching.html">Caching</a> - <a href="Chunking.html">Chunking</a> - <a href="MountingFiles.html">Mounting Files</a> - <br> - <a href="Performance.html">Performance</a> - <a href="Debugging.html">Debugging</a> - <a href="Environment.html">Environment</a> - <a href="ddl.html">DDL</a> -</td></tr> -</table> -</center> -<hr> -<!-- #EndLibraryItem --><h1>Filters in HDF5</h1> - - <b>Note: Transient pipelines described in this document have not - been implemented.</b> - - <h2>1. Introduction</h2> - - <p>HDF5 allows chunked data<sup><a href="#fn1">1</a></sup> - to pass through user-defined filters - on the way to or from disk. 
The filters operate on chunks of an - <code>H5D_CHUNKED</code> dataset and can be arranged in a pipeline - so the output of one filter becomes the input of the next filter. - - <p>Each filter has a two-byte identification number (type - <code>H5Z_filter_t</code>) allocated by NCSA and can also be - passed application-defined integer resources to control its - behavior. Each filter also has an optional ASCII comment - string. - - <p> - <center> - <table align=center width="80%"> - <caption alignment=top> - <b>Values for <code>H5Z_filter_t</code></b> - </caption> - - <tr> - <th width="30%">Value</th> - <th width="70%">Description</th> - </tr> - - <tr valign=top> - <td><code>0-255</code></td> - <td>These values are reserved for filters predefined and - registered by the HDF5 library and of use to the general - public. They are described in a separate section - below.</td> - </tr> - - <tr valign=top> - <td><code>256-511</code></td> - <td>Filter numbers in this range are used for testing only - and can be used temporarily by any organization. No - attempt is made to resolve numbering conflicts since all - definitions are by nature temporary.</td> - </tr> - - <tr valign=top> - <td><code>512-65535</code></td> - <td>Reserved for future assignment. Please contact the - <a href="mailto:hdf5dev@ncsa.uiuc.edu">HDF5 development - team</a> to reserve a value or range of values for - use by your filters.</td> - </tr> - </table> - </center> - - <h2>2. Defining and Querying the Filter Pipeline</h2> - - <p>Two types of filters can be applied to raw data I/O: permanent - filters and transient filters. The permanent filter pipeline is - defined when the dataset is created while the transient pipeline - is defined for each I/O operation. During an - <code>H5Dwrite()</code> the transient filters are applied first - in the order defined and then the permanent filters are applied - in the order defined. 
For an <code>H5Dread()</code> the - opposite order is used: permanent filters in reverse order, then - transient filters in reverse order. An <code>H5Dread()</code> - must result in the same amount of data for a chunk as the - original <code>H5Dwrite()</code>. - - <p>The permanent filter pipeline is defined by calling - <code>H5Pset_filter()</code> for a dataset creation property - list while the transient filter pipeline is defined by calling - that function for a dataset transfer property list. - - <dl> - <dt><code>herr_t H5Pset_filter (hid_t <em>plist</em>, - H5Z_filter_t <em>filter</em>, unsigned int <em>flags</em>, - size_t <em>cd_nelmts</em>, const unsigned int - <em>cd_values</em>[])</code> - <dd>This function adds the specified <em>filter</em> and - corresponding properties to the end of the transient or - permanent output filter pipeline (depending on whether - <em>plist</em> is a dataset creation or dataset transfer - property list). The <em>flags</em> argument specifies certain - general properties of the filter and is documented below. The - <em>cd_values</em> is an array of <em>cd_nelmts</em> integers - which are auxiliary data for the filter. The integer values - will be stored in the dataset object header as part of the - filter information. - - <br><br> - <dt><code>int H5Pget_nfilters (hid_t <em>plist</em>)</code> - <dd>This function returns the number of filters defined in the - permanent or transient filter pipeline depending on whether - <em>plist</em> is a dataset creation or dataset transfer - property list. In each pipeline the filters are numbered from - 0 through <em>N</em>-1 where <em>N</em> is the value returned - by this function. During output to the file the filters of a - pipeline are applied in increasing order (the inverse is true - for input). Zero is returned if there are no filters in the - pipeline and a negative value is returned for errors. 
- - <br><br> - <dt><code>H5Z_filter_t H5Pget_filter (hid_t <em>plist</em>, - int <em>filter_number</em>, unsigned int *<em>flags</em>, - size_t *<em>cd_nelmts</em>, unsigned int - *<em>cd_values</em>, size_t namelen, char name[])</code> - <dd>This is the query counterpart of - <code>H5Pset_filter()</code> and returns information about a - particular filter number in a permanent or transient pipeline - depending on whether <em>plist</em> is a dataset creation or - dataset transfer property list. On input, <em>cd_nelmts</em> - indicates the number of entries in the <em>cd_values</em> - array allocated by the caller while on exit it contains the - number of values defined by the filter. The - <em>filter_number</em> should be a value between zero and - <em>N</em>-1 as described for <code>H5Pget_nfilters()</code> - and the function will return failure (a negative value) if the - filter number is out of range. If <em>name</em> is a pointer - to an array of at least <em>namelen</em> bytes then the filter - name will be copied into that array. The name will be null - terminated if the <em>namelen</em> is large enough. The - filter name returned will be the name appearing in the file or - else the name registered for the filter or else an empty string. - </dl> - - <p>The flags argument to the functions above is a bit vector of - the following fields: - - <p> - <center> - <table align=center width="80%"> - <caption align=top> - <b>Values for the <em>flags</em> argument</b> - </caption> - - <tr> - <th width="30%">Value</th> - <th width="70%">Description</th> - </tr> - - <tr valign=top> - <td><code>H5Z_FLAG_OPTIONAL</code></td> - <td>If this bit is set then the filter is optional. If - the filter fails (see below) during an - <code>H5Dwrite()</code> operation then the filter is - just excluded from the pipeline for the chunk for which - it failed; the filter will not participate in the - pipeline during an <code>H5Dread()</code> of the chunk. 
- This is commonly used for compression filters: if the - compression result would be larger than the input then - the compression filter returns failure and the - uncompressed data is stored in the file. If this bit is - clear and a filter fails then the - <code>H5Dwrite()</code> or <code>H5Dread()</code> also - fails.</td> - </tr> - </table> - </center> - - <h2>3. Defining Filters</h2> - - <p>Each filter is bidirectional, handling both input and output to - the file, and a flag is passed to the filter to indicate the - direction. In either case the filter reads a chunk of data from - a buffer, usually performs some sort of transformation on the - data, places the result in the same or new buffer, and returns - the buffer pointer and size to the caller. If something goes - wrong the filter should return zero to indicate a failure. - - <p>During output, a filter that fails or isn't defined and is - marked as optional is silently excluded from the pipeline and - will not be used when reading that chunk of data. A required - filter that fails or isn't defined causes the entire output - operation to fail. During input, any filter that has not been - excluded from the pipeline during output and fails or is not - defined will cause the entire input operation to fail. - - <p>Filters are defined in two phases. The first phase is to - define a function to act as the filter and link the function - into the application. The second phase is to register the - function, associating the function with an - <code>H5Z_filter_t</code> identification number and a comment. 
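<p>The first phase can be sketched as follows. This is a minimal illustration, not code from the HDF5 library: the type <code>sketch_filter_func_t</code>, the macro <code>SKETCH_FLAG_REVERSE</code>, and the function <code>sketch_sum_filter()</code> are hypothetical stand-ins, so the block compiles without the HDF5 headers. The callback signature is assumed to be the one this document gives for <code>H5Z_func_t</code>; later HDF5 releases changed the registration interface. The filter appends a four-byte additive checksum on output and verifies and strips it on input, returning zero on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins so the sketch builds without HDF5.  With the
 * library available, H5Z_func_t and H5Z_FLAG_REVERSE would be used. */
#define SKETCH_FLAG_REVERSE 0x0100u
typedef size_t (*sketch_filter_func_t)(unsigned int flags, size_t cd_nelmts,
                                       const unsigned int cd_values[],
                                       size_t nbytes, size_t *buf_size,
                                       void **buf);

/* Sum of all bytes, modulo 2^32, stored little-endian in out[0..3]. */
static void sketch_sum(const unsigned char *p, size_t n, unsigned char out[4])
{
    unsigned long s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s = (s + p[i]) & 0xFFFFFFFFul;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((s >> (8 * i)) & 0xFFu);
}

/* Phase 1: the filter function.  Returns the number of valid bytes in
 * *buf, or 0 to signal failure. */
size_t sketch_sum_filter(unsigned int flags, size_t cd_nelmts,
                         const unsigned int cd_values[], size_t nbytes,
                         size_t *buf_size, void **buf)
{
    unsigned char cksum[4];

    (void)cd_nelmts;   /* this sketch takes no auxiliary values */
    (void)cd_values;
    assert(buf != NULL && *buf != NULL && buf_size != NULL);

    if (flags & SKETCH_FLAG_REVERSE) {
        /* Input: recompute the checksum and compare with the stored one. */
        if (nbytes < 4)
            return 0;
        sketch_sum((const unsigned char *)*buf, nbytes - 4, cksum);
        if (memcmp(cksum, (unsigned char *)*buf + nbytes - 4, 4) != 0)
            return 0;
        return nbytes - 4;          /* strip the checksum */
    }

    /* Output: grow the buffer if needed, then append the checksum. */
    sketch_sum((const unsigned char *)*buf, nbytes, cksum);
    if (nbytes + 4 > *buf_size) {
        void *bigger = realloc(*buf, nbytes + 4);
        if (bigger == NULL)
            return 0;               /* original buffer is left untouched */
        *buf = bigger;
        *buf_size = nbytes + 4;
    }
    memcpy((unsigned char *)*buf + nbytes, cksum, 4);
    return nbytes + 4;
}

/* Phase 2 (needs HDF5, shown only as a comment): following this document,
 * the function would be registered under a number from the testing range,
 * for example  H5Zregister(305, "byte sum", sketch_sum_filter);  */
```

<p>Calling the function once without the reverse flag and once with it should return the original byte count, which is the round-trip property the library relies on when it reads a chunk back.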
- - <dl> - <dt><code>typedef size_t (*H5Z_func_t)(unsigned int - <em>flags</em>, size_t <em>cd_nelmts</em>, const unsigned int - <em>cd_values</em>[], size_t <em>nbytes</em>, size_t - *<em>buf_size</em>, void **<em>buf</em>)</code> - <dd>The <em>flags</em>, <em>cd_nelmts</em>, and - <em>cd_values</em> are the same as for the - <code>H5Pset_filter()</code> function with the additional flag - <code>H5Z_FLAG_REVERSE</code> which is set when the filter is - called as part of the input pipeline. The input buffer is - pointed to by <em>*buf</em> and has a total size of - <em>*buf_size</em> bytes but only <em>nbytes</em> are valid - data. The filter should perform the transformation in place if - possible and return the number of valid bytes or zero for - failure. If the transformation cannot be done in place then - the filter should allocate a new buffer with - <code>malloc()</code> and assign it to <em>*buf</em>, - assigning the allocated size of that buffer to - <em>*buf_size</em>. The old buffer should be freed - by calling <code>free()</code>. - - <br><br> - <dt><code>herr_t H5Zregister (H5Z_filter_t <em>filter_id</em>, - const char *<em>comment</em>, H5Z_func_t - <em>filter</em>)</code> - <dd>The <em>filter</em> function is associated with a filter - number and a short ASCII comment which will be stored in the - hdf5 file if the filter is used as part of a permanent - pipeline during dataset creation. - </dl> - - - <h2>4. Predefined Filters</h2> - - <p>If <code>zlib</code> version 1.1.2 or later was found - during configuration then the library will define a filter whose - <code>H5Z_filter_t</code> number is - <code>H5Z_FILTER_DEFLATE</code>. Since this compression method - has the potential for generating compressed data which is larger - than the original, the <code>H5Z_FLAG_OPTIONAL</code> flag - should be turned on so such cases can be handled gracefully by - storing the original data instead of the compressed data. 
The - <em>cd_nelmts</em> should be one with <em>cd_values[0]</em> - being a compression aggression level between zero and nine, - inclusive (zero is the fastest compression while nine results in - the best compression ratio). - - <p>A convenience function for adding the - <code>H5Z_FILTER_DEFLATE</code> filter to a pipeline is: - - <dl> - <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, unsigned - <em>aggression</em>)</code> - <dd>The deflate compression method is added to the end of the - permanent or transient filter pipeline depending on whether - <em>plist</em> is a dataset creation or dataset transfer - property list. The <em>aggression</em> is a number between - zero and nine (inclusive) to indicate the tradeoff between - speed and compression ratio (zero is fastest, nine is best - ratio). - </dl> - - <p>Even if <code>zlib</code> isn't detected during - configuration the application can define - <code>H5Z_FILTER_DEFLATE</code> as a permanent filter. If the - filter is marked as optional (as with - <code>H5Pset_deflate()</code>) then it will always fail and be - automatically removed from the pipeline. Applications that read - data will fail only if the data is actually compressed; they - won't fail if <code>H5Z_FILTER_DEFLATE</code> was part of the - permanent output pipeline but was automatically excluded because - it didn't exist when the data was written. - - <p><code>zlib</code> can be acquired from - <code><a href="http://www.cdrom.com/pub/infozip/zlib/">http://www.cdrom.com/pub/infozip/zlib/</a></code>. - - <h2>5. Example</h2> - - <p>This example shows how to define and register a simple filter - that adds a checksum capability to the data stream. - - <p>The function that acts as the filter always returns zero - (failure) if the <code>md5()</code> function was not detected at - configuration time (left as an exercise for the reader). - Otherwise the function is broken down into an input and an output - half. 
The output half calculates a checksum, increases the size - of the output buffer if necessary, and appends the checksum to - the end of the buffer. The input half calculates the checksum - on the first part of the buffer and compares it to the checksum - already stored at the end of the buffer. If the two differ then - zero (failure) is returned, otherwise the buffer size is reduced - to exclude the checksum. - - <p> - <center> - <table border align=center width="100%"> - <tr> - <td> - <p><code><pre> - -size_t -md5_filter(unsigned int flags, size_t cd_nelmts, - const unsigned int cd_values[], size_t nbytes, - size_t *buf_size, void **buf) -{ -#ifdef HAVE_MD5 - unsigned char cksum[16]; - - if (flags & H5Z_FLAG_REVERSE) { - /* Input: a chunk shorter than the checksum cannot be valid */ - if (nbytes<16) { - return 0; /*fail*/ - } - md5(nbytes-16, *buf, cksum); - - /* Compare */ - if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) { - return 0; /*fail*/ - } - - /* Strip off checksum */ - return nbytes-16; - - } else { - /* Output */ - md5(nbytes, *buf, cksum); - - /* Increase buffer size if necessary */ - if (nbytes+16>*buf_size) { - void *bigger = realloc(*buf, nbytes + 16); - if (!bigger) { - return 0; /*fail, old buffer still valid*/ - } - *buf = bigger; - *buf_size = nbytes + 16; - } - - /* Append checksum */ - memcpy((char*)(*buf)+nbytes, cksum, 16); - return nbytes+16; - } -#else - return 0; /*fail*/ -#endif -} - </pre></code> - </td> - </tr> - </table> - </center> - - <p>Once the filter function is defined it must be registered so - the HDF5 library knows about it. Since we're testing this - filter we choose one of the <code>H5Z_filter_t</code> numbers - from the testing range (256-511). We'll arbitrarily choose 305. - - <p> - <center> - <table border align=center width="100%"> - <tr> - <td> - <p><code><pre> - -#define FILTER_MD5 305 -herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter); - </pre></code> - </td> - </tr> - </table> - </center> - - <p>Now we can use the filter in a pipeline. 
We could have added - the filter to the pipeline before defining or registering the - filter as long as the filter was defined and registered by the time - we tried to use it (if the filter is marked as optional then we - could have used it without defining it and the library would - have automatically removed it from the pipeline for each chunk - written before the filter was defined and registered). - - <p> - <center> - <table border align=center width="100%"> - <tr> - <td> - <p><code><pre> - -hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE); -hsize_t chunk_size[3] = {10,10,10}; -H5Pset_chunk(dcpl, 3, chunk_size); -H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL); -hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl); - </pre></code> - </td> - </tr> - </table> - </center> - - <h2>6. Filter Diagnostics</h2> - - <p>If the library is compiled with debugging turned on for the H5Z - layer (usually as a result of <code>configure - --enable-debug=z</code>) then filter statistics are printed when - the application exits normally or the library is closed. The - statistics are written to the standard error stream and include - two lines for each filter that was used: one for input and one - for output. The following fields are displayed: - - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> - - <tr valign=top> - <td>Method</td> - <td>This is the name of the method as defined with - <code>H5Zregister()</code> with the character - "<" or ">" prepended to indicate - input or output, respectively.</td> - </tr> - - <tr valign=top> - <td>Total</td> - <td>The total number of bytes processed by the filter - including errors. This is the maximum of the - <em>nbytes</em> argument and the return value.</td> 
- </tr> - - <tr valign=top> - <td>Errors</td> - <td>This field shows the number of bytes of the Total - column which can be attributed to errors.</td> - </tr> - - <tr valign=top> - <td>User, System, Elapsed</td> - <td>These are the amounts of user time, system time, and - elapsed time in seconds spent in the filter function. - Elapsed time is sensitive to system load. These times - may be zero on operating systems that don't support the - required operations.</td> - </tr> - - <tr valign=top> - <td>Bandwidth</td> - <td>This is the filter bandwidth which is the total - number of bytes processed divided by elapsed time. - Since elapsed time is subject to system load the - bandwidth numbers cannot always be trusted. - Furthermore, the bandwidth includes bytes attributed to - errors which may significantly distort the value if the - function is able to detect errors without much - expense.</td> - </tr> - </table> - </center> - - <p> - <center> - <table border align=center width="100%"> - <caption align=bottom> - <b>Example: Filter Statistics</b> - </caption> - <tr> - <td> - <p><code><pre> -H5Z: filter statistics accumulated over life of library: - Method Total Errors User System Elapsed Bandwidth - ------ ----- ------ ---- ------ ------- --------- - >deflate 160000 40000 0.62 0.74 1.33 117.5 kBs - <deflate 120000 0 0.11 0.00 0.12 1.000 MBs - </pre></code> - </td> - </tr> - </table> - </center> - - -<hr> - - - <p><a name="fn1">Footnote 1:</a> Dataset chunks can be compressed - through the use of filters. Developers should be aware that - reading and rewriting compressed chunked data can result in holes - in an HDF5 file. In time, enough such holes can increase the - file size enough to impair application or library performance - when working with that file. 
See - “<a href="Performance.html#Freespace">Freespace Management</a>” - in the chapter - “<a href="Performance.html">Performance Analysis and Issues</a>.”</p> - - -<!-- #BeginLibraryItem "/ed_libs/Footer.lbi" --><address> -<a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a> -<br> -Describes HDF5 Release 1.4.5, February 2003 -</address><!-- #EndLibraryItem --><!-- Created: Fri Apr 17 13:39:35 EDT 1998 --> -<!-- hhmts start --> -Last modified: 2 August 2001 -<!-- hhmts end --> - - -</body> -</html> |