author | Robb Matzke <matzke@llnl.gov> | 1998-08-05 22:23:51 (GMT) |
---|---|---|
committer | Robb Matzke <matzke@llnl.gov> | 1998-08-05 22:23:51 (GMT) |
commit | 32295ad53dd29e799cef007e05ca0368f0c1ca1d (patch) | |
tree | e041f9bffa6d839d4f3e13afcad4199deb486d3b /doc | |
parent | 002b1494b79e2fd638a0676745c340a9a9e9d8e7 (diff) | |
[svn-r570] *** empty log message ***
Diffstat (limited to 'doc')
-rw-r--r-- | doc/html/Compression.html | 409 | ||||
-rw-r--r-- | doc/html/Filters.html | 463 |
2 files changed, 463 insertions, 409 deletions
diff --git a/doc/html/Compression.html b/doc/html/Compression.html deleted file mode 100644 index c3a2a45..0000000 --- a/doc/html/Compression.html +++ /dev/null @@ -1,409 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <title>Compression</title> - </head> - - <body> - <h1>Compression</h1> - - <h2>1. Introduction</h2> - - <p>HDF5 supports compression of raw data by compression methods - built into the library or defined by an application. A - compression method is associated with a dataset when the dataset - is created and is applied independently to each storage chunk of - the dataset. - - The dataset must use the <code>H5D_CHUNKED</code> storage - layout. The library doesn't support compression for contiguous - datasets because of the difficulty of implementing random access - for partial I/O, and compact dataset compression is not - supported because it wouldn't produce significant results. - - <h2>2. Supported Compression Methods</h2> - - <p>The library identifies compression methods with small - integers, with values less than 16 reserved for use by NCSA and - values between 16 and 255 (inclusive) available for general - use. This range may be extended in the future if it proves to - be too small. - - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Method Name</th> - <th width="70%">Description</th> - </tr> - - <tr valign=top> - <td><code>H5Z_NONE</code></td> - <td>The default is to not use compression. Specifying - <code>H5Z_NONE</code> as the compression method results - in better perfomance than writing a function that just - copies data because the library's I/O pipeline - recognizes this method and is able to short circuit - parts of the pipeline.</td> - </tr> - - <tr valign=top> - <td><code>H5Z_DEFLATE</code></td> - <td>The <em>deflate</em> method is the algorithm used by - the GNU <code>gzip</code>program. 
It's a combination of - a Huffman encoding followed by a 1977 Lempel-Ziv (LZ77) - dictionary encoding. The aggressiveness of the - compression can be controlled by passing an integer value - to the compressor with <code>H5Pset_deflate()</code> - (see below). In order for this compression method to be - used, the HDF5 library must be configured and compiled - in the presence of the GNU zlib version 1.1.2 or - later.</td> - </tr> - - <tr valign=top> - <td><code>H5Z_RES_<em>N</em></code></td> - <td>These compression methods (where <em>N</em> is in the - range two through 15, inclusive) are reserved by NCSA - for future use.</td> - </tr> - - <tr valign=top> - <td>Values of <em>N</em> between 16 and 255, inclusive</td> - <td>These values can be used to represent application-defined - compression methods. We recommend that methods under - testing should be in the high range and when a method is - about to be published it should be given a number near - the low end of the range (or even below 16). Publishing - the compression method and its numeric ID will make a - file sharable.</td> - </tr> - </table> - </center> - - <p>Setting the compression for a dataset to a method which was - not compiled into the library and/or not registered by the - application is allowed, but writing to such a dataset will - silently <em>not</em> compress the data. Reading a compressed - dataset for a method which is not available will result in - errors (specifically, <code>H5Dread()</code> will return a - negative value). The errors will be displayed in the - compression statistics if the library was compiled with - debugging turned on for the "z" package. See the - section on diagnostics below for more details. - - <h2>3. Application-Defined Methods</h2> - - <p>Compression methods 16 through 255 can be defined by an - application. 
As mentioned above, methods that have not been - released should use high numbers in that range while methods - that have been published will be assigned an official number in - the low region of the range (possibly less than 16). Users - should be aware that using unpublished compression methods - results in unsharable files. - - <p>A compression method has two halves: one have handles - compression and the other half handles uncompression. The - halves are implemented as functions - <code><em>method</em>_c</code> and - <code><em>method</em>_u</code> respectively. One should not use - the names <code>compress</code> or <code>uncompress</code> since - they are likely to conflict with other compression libraries - (like the GNU zlib). - - <p>Both the <code><em>method</em>_c</code> and - <code><em>method</em>_u</code> functions take the same arguments - and return the same values. They are defined with the type: - - <dl> - <dt><code>typedef size_t (*H5Z_func_t)(unsigned int - <em>flags</em>, size_t <em>cd_size</em>, const void - *<em>client_data</em>, size_t <em>src_nbytes</em>, const - void *<em>src</em>, size_t <em>dst_nbytes</em>, void - *<em>dst</em>/*out*/)</code> - <dd>The <em>flags</em> are an 8-bit vector which is stored in - the file and which is defined when the compression method is - defined. The <em>client_data</em> is a pointer to - <em>cd_size</em> bytes of configuration data which is also - stored in the file. The function compresses or uncompresses - <em>src_nbytes</em> from the source buffer <em>src</em> into - at most <em>dst_nbytes</em> of the result buffer <em>dst</em>. - The function returns the number of bytes written to the result - buffer or zero if an error occurs. But if a result buffer - overrun occurs the function should return a value at least as - large as <em>dst_size</em> (the uncompressor will see an - overrun only for corrupt data). 
- </dl> - - <p>The application associates the pair of functions with a name - and a method number by calling <code>H5Zregister()</code>. This - function can also be used to remove a compression method from - the library by supplying null pointers for the functions. - - <dl> - <dt><code>herr_t H5Zregister (H5Z_method_t <em>method</em>, - const char *<em>name</em>, H5Z_func_t <em>method_c</em>, - H5Z_func_t <em>method_u</em>)</code> - <dd>The pair of functions to be used for compression - (<em>method_c</em>) and uncompression (<em>method_u</em>) are - associated with a short <em>name</em> used for debugging and a - <em>method</em> number in the range 16 through 255. This - function can be called as often as desired for a particular - compression method with each call replacing the information - stored by the previous call. Sometimes it's convenient to - supply only one half of the compression, for instance in an - application that opens files for read-only. Compression - statistics for the method are accumulated across calls to this - function. - </dl> - - <p> - <center> - <table border align=center width="100%"> - <caption align=bottom><h4>Example: Registering an - Application-Defined Compression Method</h4></caption> - <tr> - <td> - <p>Here's a simple-minded "compression" method - that just copies the input value to the output. It's - similar to the <code>H5Z_NONE</code> method but - slower. Compression and uncompression are performed - by the same function. 
- - <p><code><pre> -size_t -bogus (unsigned int flags, - size_t cd_size, const void *client_data, - size_t src_nbytes, const void *src, - size_t dst_nbytes, void *dst/*out*/) -{ - memcpy (dst, src, src_nbytes); - return src_nbytes; -} - </pre></code> - - <p>The function could be registered as method 250 as - follows: - - <p><code><pre> -#define H5Z_BOGUS 250 -H5Zregister (H5Z_BOGUS, "bogus", bogus, bogus); - </pre></code> - - <p>The function can be unregistered by saying: - - <p><code><pre> -H5Zregister (H5Z_BUGUS, "bogus", NULL, NULL); - </pre></code> - - <p>Notice that we kept the name "bogus" even - though we unregistered the functions that perform the - compression and uncompression. This makes compression - statistics more understandable when they're printed. - </td> - </tr> - </table> - </center> - - <h2>4. Enabling Compression for a Dataset</h2> - - <p>If a dataset is to be compressed then the compression - information must be specified when the dataset is created since - once a dataset is created compression parameters cannot be - adjusted. The compression is specified through the dataset - creation property list (see <code>H5Pcreate()</code>). - - <dl> - <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, int - <em>level</em>)</code> - <dd>The compression method for dataset creation property list - <em>plist</em> is set to <code>H5Z_DEFLATE</code> and the - aggression level is set to <em>level</em>. The <em>level</em> - must be a value between one and nine, inclusive, where one - indicates no (but fast) compression and nine is aggressive - compression. - - <br><br> - <dt><code>int H5Pget_deflate (hid_t <em>plist</em>)</code> - <dd>If dataset creation property list <em>plist</em> is set to - use <code>H5Z_DEFLATE</code> compression then this function - will return the aggression level, an integer between one and - nine inclusive. 
If <em>plist</em> isn't a valid dataset - creation property list or it isn't set to use the deflate - method then a negative value is returned. - - <br><br> - <dt><code>herr_t H5Pset_compression (hid_t <em>plist</em>, - H5Z_method_t <em>method</em>, unsigned int <em>flags</em>, - size_t <em>cd_size</em>, const void *<em>client_data</em>)</code> - <dd>This is a catch-all function for defining compresion methods - and is intended to be called from a wrapper such as - <code>H5Pset_deflate()</code>. The dataset creation property - list <em>plist</em> is adjusted to use the specified - compression method. The <em>flags</em> is an 8-bit vector - which is stored in the file as part of the compression message - and passed to the compress and uncompress functions. The - <em>client_data</em> is a byte array of length - <em>cd_size</em> which is copied to the file and passed to the - compress and uncompress methods. - - <br><br> - <dt><code>H5Z_method_t H5Pget_compression (hid_t <em>plist</em>, - unsigned int *<em>flags</em>, size_t *<em>cd_size</em>, void - *<em>client_data</em>)</code> - <dd>This is a catch-all function for querying the compression - method associated with dataset creation property list - <em>plist</em> and is intended to be called from a wrapper - function such as <code>H5Pget_deflate()</code>. The - compression method (or a negative value on error) is returned - by value, and compression flags and client data is returned by - argument. The application should allocate the - <em>client_data</em> and pass its size as the - <em>cd_size</em>. On return, <em>cd_size</em> will contain - the actual size of the client data. If <em>client_data</em> - is not large enough to hold the entire client data then - <em>cd_size</em> bytes are copied into <em>client_data</em> - and <em>cd_size</em> is set to the total size of the client - data, a value larger than the original. 
- </dl> - - <p>It is possible to set the compression to a method which hasn't - been defined with <code>H5Zregister()</code> and which isn't - supported as a predefined method (for instance, setting the - method to <code>H5Z_DEFLATE</code> when the GNU zlib isn't - available). If that happens then data will be written to the - file in its uncompressed form and the compression statistics - will show failures for the compression. - - <p> - <center> - <table border align=center width="100%"> - <caption align=bottom><h4>Example: Statistics for an - Unsupported Compression Method</h4></caption> - <tr> - <td> - <p>If an application attempts to use an unsupported - method then the compression statistics will show large - numbers of compression errors and no data - uncompressed. - - <p><code><pre> -H5Z: compression statistics accumulated over life of library: - Method Total Overrun Errors User System Elapsed Bandwidth - ------ ----- ------- ------ ---- ------ ------- --------- - deflate-c 160000 0 160000 0.00 0.01 0.01 1.884e+07 - deflate-u 0 0 0 0.00 0.00 0.00 NaN - </pre></code> - - <p>This example is from a program that tried to use - <code>H5Z_DEFLATE</code> on a system that didn't have - the GNU zlib to write to a dataset and then read the - result. The read and write both succeeded but the - data was not compressed. - </td> - </tr> - </table> - </center> - - <h2>5. Compression Diagnostics</h2> - - <p>If the library is compiled with debugging turned on for the H5Z - layer (usually as a result of <code>configure --enable-debug=z</code>) - then statistics about data compression are printed when the - application exits normally or the library is closed. The - statistics are written to the standard error stream and include - two lines for each compression method that was used: the first - line shows compression statistics while the second shows - uncompression statistics. 
The following fields are displayed: - - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> - - <tr valign=top> - <td>Method</td> - <td>This is the name of the method as defined with - <code>H5Zregister()</code> with the letters - "-c" or "-u" appended to indicate - compression or uncompression.</td> - </tr> - - <tr valign=top> - <td>Total</td> - <td>The total number of bytes compressed or decompressed - including buffer overruns and errors. Bytes of - non-compressed data are counted.</td> - </tr> - - <tr valign=top> - <td>Overrun</td> - <td>During compression, if the algorithm causes the result - to be at least as large as the input then a buffer - overrun error occurs. This field shows the total number - of bytes from the Total column which can be attributed to - overruns. Overruns for decompression can only happen if - the data has been corrupted in some way and will result - in failure of <code>H5Dread()</code>.</td> - </tr> - - <tr valign=top> - <td>Errors</td> - <td>If an error occurs during compression the data is - stored in it's uncompressed form; and an error during - uncompression causes <code>H5Dread()</code> to return - failure. This field shows the number of bytes of the - Total column which can be attributed to errors.</td> - </tr> - - <tr valign=top> - <td>User, System, Elapsed</td> - <td>These are the amount of user time, system time, and - elapsed time in seconds spent by the library to perform - compression. Elapsed time is sensitive to system - load. These times may be zero on operating systems that - don't support the required operations.</td> - </tr> - - <tr valign=top> - <td>Bandwidth</td> - <td>This is the compression bandwidth which is the total - number of bytes divided by elapsed time. Since elapsed - time is subject to system load the bandwidth numbers - cannot always be trusted. 
Furthermore, the bandwidth - includes overrun and error bytes which may significanly - taint the value.</td> - </tr> - </table> - </center> - - <p> - <center> - <table border align=center width="100%"> - <caption align=bottom><h4>Example: Compression - Statistics</h4></caption> - <tr> - <td> - <p><code><pre> -H5Z: compression statistics accumulated over life of library: - Method Total Overrun Errors User System Elapsed Bandwidth - ------ ----- ------- ------ ---- ------ ------- --------- - deflate-c 160000 200 0 0.62 0.74 1.33 1.204e+05 - deflate-u 120000 0 0 0.11 0.00 0.12 9.885e+05 - </pre></code> - </td> - </tr> - </table> - </center> - - <hr> - <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> -<!-- Created: Fri Apr 17 13:39:35 EDT 1998 --> -<!-- hhmts start --> -Last modified: Fri Apr 17 16:15:21 EDT 1998 -<!-- hhmts end --> - </body> -</html> diff --git a/doc/html/Filters.html b/doc/html/Filters.html new file mode 100644 index 0000000..b9785c8 --- /dev/null +++ b/doc/html/Filters.html @@ -0,0 +1,463 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Filters</title> + </head> + + <body> + <h1>Filters</h1> + + <b>Note: Transient pipelines described in this document have not + been implemented.</b> + + <h2>1. Introduction</h2> + + <p>HDF5 allows chunked data to pass through user-defined filters + on the way to or from disk. The filters operate on chunks of an + <code>H5D_CHUNKED</code> dataset can be arranged in a pipeline + so output of one filter becomes the input of the next filter. + + <p>Each filter has a two-byte identification number (type + <code>H5Z_filter_t</code>) allocated by NCSA and can also be + passed application-defined integer resources to control its + behavior. Each filter also has an optional ASCII comment + string. 
+ + <p> + <center> + <table align=center width="80%"> + <caption alignment=top> + <b>Values for <code>H5Z_filter_t</code></b> + </caption> + + <tr> + <th width="30%">Value</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td><code>0-255</code></td> + <td>These values are reserved for filters predefined and + registered by the HDF5 library and of use to the general + public. They are described in a separate section + below.</td> + </tr> + + <tr valign=top> + <td><code>256-511</code></td> + <td>Filter numbers in this range are used for testing only + and can be used temporarily by any organization. No + attempt is made to resolve numbering conflicts since all + definitions are by nature temporary.</td> + </tr> + + <tr valign=top> + <td><code>512-65535</code></td> + <td>Reserved for future assignment. Please contact the + <a href="mailto:hdf5dev@ncsa.uiuc.edu">HDF5 development + team</a> to reserve a value or range of values for + use by your filters.</td> + </table> + </center> + + <h2>2. Defining and Querying the Filter Pipeline</h2> + + <p>Two types of filters can be applied to raw data I/O: permanent + filters and transient filters. The permanent filter pipeline is + defned when the dataset is created while the transient pipeline + is defined for each I/O operation. During an + <code>H5Dwrite()</code> the transient filters are applied first + in the order defined and then the permanent filters are applied + in the order defined. For an <code>H5Dread()</code> the + opposite order is used: permanent filters in reverse order, then + transient filters in reverse order. An <code>H5Dread()</code> + must result in the same amount of data for a chunk as the + original <code>H5Dwrite()</code>. + + <p>The permanent filter pipeline is defined by calling + <code>H5Pset_filter()</code> for a dataset creation property + list while the transient filter pipeline is defined by calling + that function for a dataset transfer property list. 
+ + <dl> + <dt><code>herr_t H5Pset_filter (hid_t <em>plist</em>, + H5Z_filter_t <em>filter</em>, unsigned int <em>flags</em>, + size_t <em>cd_nelmts</em>, const unsigned int + <em>cd_values</em>[])</code> + <dd>This function adds the specified <em>filter</em> and + corresponding properties to the end of the transient or + permanent output filter pipeline (depending on whether + <em>plist</em> is a dataset creation or dataset transfer + property list). The <em>flags</em> argument specifies certain + general properties of the filter and is documented below. The + <em>cd_values</em> is an array of <em>cd_nelmts</em> integers + which are auxiliary data for the filter. The integer values + will be stored in the dataset object header as part of the + filter information. + + <br><br> + <dt><code>int H5Pget_nfilters (hid_t <em>plist</em>)</code> + <dd>This function returns the number of filters defined in the + permanent or transient filter pipeline depending on whether + <em>plist</em> is a dataset creation or dataset transfer + property list. In each pipeline the filters are numbered from + 0 through <em>N</em>-1 where <em>N</em> is the value returned + by this function. During output to the file the filters of a + pipeline are applied in increasing order (the inverse is true + for input). Zero is returned if there are no filters in the + pipeline and a negative value is returned for errors. + + <br><br> + <dt><code>H5Z_filter_t H5Pget_filter (hid_t <em>plist</em>, + int <em>filter_number</em>, unsigned int *<em>flags</em>, + size_t *<em>cd_nelmts</em>, unsigned int + *<em>cd_values</em>)</code> + <dd>This is the query counterpart of + <code>H5Pset_filter()</code> and returns information about a + particular filter number in a permanent or transient pipeline + depending on whether <em>plist</em> is a dataset creation or + dataset transfer property list. 
On input, <em>cd_nelmts</em> + indicates the number of entries in the <em>cd_values</em> + array allocated by the caller while on exit it contains the + number of values defined by the filter. The + <em>filter_number</em> should be a value between zero and + <em>N</em>-1 as described for <code>H5Pget_nfilters()</code> + and the function will return failure (a negative value) if the + filter number is out of range. + </dl> + + <p>The flags argument to the functions above is a bit vector of + the following fields: + + <p> + <center> + <table align=center width="80%"> + <caption align=top> + <b>Values for the <em>flags</em> argument</b> + </caption> + + <tr> + <th width="30%">Value</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td><code>H5Z_FLAG_OPTIONAL</code></td> + <td>If this bit is set then the filter is optional. If + the filter fails (see below) during an + <code>H5Dwrite()</code> operation then the filter is + just excluded from the pipeline for the chunk for which + it failed; the filter will not participate in the + pipeline during an <code>H5Dread()</code> of the chunk. + This is commonly used for compression filters: if the + compression result would be larger than the input then + the compression filter returns failure and the + uncompressed data is stored in the file. If this bit is + clear and a filter fails then the + <code>H5Dwrite()</code> or <code>H5Dread()</code> also + fails.</td> + </tr> + </table> + </center> + + <h2>3. Defining Filters</h2> + + <p>Each filter is bidirectional, handling both input and output to + the file, and a flag is passed to the filter to indicate the + direction. In either case the filter reads a chunk of data from + a buffer, usually performs some sort of transformation on the + data, places the result in the same or new buffer, and returns + the buffer pointer and size to the caller. If something goes + wrong the filter should return zero to indicate a failure. 
+ + <p>During output, a filter that fails or isn't defined and is + marked as optional is silently excluded from the pipeline and + will not be used when reading that chunk of data. A required + filter that fails or isn't defined causes the entire output + operation to fail. During input, any filter that has not been + excluded from the pipeline during output and fails or is not + defined will cause the entire input operation to fail. + + <p>Filters are defined in two phases. The first phase is to + define a function to act as the filter and link the function + into the application. The second phase is to register the + function, associating the function with an + <code>H5Z_filter_t</code> identification number and a comment. + + <dl> + <dt><code>typedef size_t (*H5Z_func_t)(unsigned int + <em>flags</em>, size_t <em>cd_nelmts</em>, unsigned int + *<em>cd_values</em>, size_t <em>nbytes</em>, size_t + *<em>buf_size</em>, void **<em>buf</em>)</code> + <dd>The <em>flags</em>, <em>cd_nelmts</em>, and + <em>cd_values</em> are the same as for the + <code>H5Pset_filter()</code> function with the additional flag + <code>H5Z_FLAG_REVERSE</code> which is set when the filter is + called as part of the input pipeline. The input buffer is + pointed to by <em>*buf</em> and has a total size of + <em>*buf_size</em> bytes but only <em>nbytes</em> are valid + data. The filter should perform the transformation in place if + possible and return the number of valid bytes or zero for + failure. If the transformation cannot be done in place then + the filter should allocate a new buffer with + <code>malloc()</code> and assign it to <em>*buf</em>, + assigning the allocated size of that buffer to + <em>*buf_size</em>. The old buffer should be freed + by calling <code>free()</code>. 
+ + <br><br> + <dt><code>herr_t H5Zregister (H5Z_filter_t <em>filter_id</em>, + const char *<em>comment</em>, H5Z_func_t + <em>filter</em>)</code> + <dd>The <em>filter</em> function is associated with a filter + number and a short ASCII comment which will be stored in the + hdf5 file if the filter is used as part of a permanent + pipeline during dataset creation. + </dl> + + + <h2>4. Predefined Filters</h2> + + <p>If GNU <code>zlib</code> version 1.1.2 or later was found + during configuration then the library will define a filter whose + <code>H5Z_filter_t</code> number is + <code>H5Z_FILTER_DEFLATE</code>. Since this compression method + has the potential for generating compressed data which is larger + than the original, the <code>H5Z_FLAG_OPTIONAL</code> flag + should be turned on so such cases can be handled gracefully by + storing the original data instead of the compressed data. The + <em>cd_nvalues</em> should be one with <em>cd_value[0]</em> + being a compression agression level between zero and nine, + inclusive (zero is the fastest compression while nine results in + the best compression ratio). + + <p>A convenience function for adding the + <code>H5Z_FILTER_DEFLATE</code> filter to a pipeline is: + + <dl> + <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, unsigned + <em>aggression</em>)</code> + <dd>The deflate compression method is added to the end of the + permanent or transient filter pipeline depending on whether + <em>plist</em> is a dataset creation or dataset transfer + property list. The <em>aggression</em> is a number between + zero and nine (inclusive) to indicate the tradeoff between + speed and compression ratio (zero is fastest, nine is best + ratio). + </dl> + + <p>Even if the GNU <code>zlib</code> isn't detected during + configuration the application can define + <code>H5Z_FILTER_DEFLATE</code> as a permanent filter. 
If the + filter is marked as optional (as with + <code>H5Pset_deflate()</code>) then it will always fail and be + automatically removed from the pipeline. Applications that read + data will fail only if the data is actually compressed; they + won't fail if <code>H5Z_FILTER_DEFLATE</code> was part of the + permanent output pipeline but was automatically excluded because + it didn't exist when the data was written. + + <h2>5. Example</h2> + + <p>This example shows how to define and register a simple filter + that adds a checksum capability to the data stream. + + <p>The function that acts as the filter always returns zero + (failure) if the <code>md5()</code> function was not detected at + configuration time (left as an excercise for the reader). + Otherwise the function is broken down to an input and output + half. The output half calculates a checksum, increases the size + of the output buffer if necessary, and appends the checksum to + the end of the buffer. The input half calculates the checksum + on the first part of the buffer and compares it to the checksum + already stored at the end of the buffer. If the two differ then + zero (failure) is returned, otherwise the buffer size is reduced + to exclude the checksum. 
+ + <p> + <center> + <table border align=center width="100%"> + <tr> + <td> + <p><code><pre> + +size_t +md5_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values, + size_t nbytes, size_t *buf_size, void **buf) +{ +#ifdef HAVE_MD5 + unsigned char cksum[16]; + + if (flags & H5Z_REVERSE) { + /* Input */ + assert(nbytes>=16); + md5(nbytes-16, *buf, cksum); + + /* Compare */ + if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) { + return 0; /*fail*/ + } + + /* Strip off checksum */ + return nbytes-16; + + } else { + /* Output */ + md5(nbytes, *buf, cksum); + + /* Increase buffer size if necessary */ + if (nbytes+16>*buf_size) { + *buf_size = nbytes + 16; + *buf = realloc(*buf, *buf_size); + } + + /* Append checksum */ + memcpy((char*)(*buf)+nbytes, cksum, 16); + return nbytes+16; + } +#else + return 0; /*fail*/ +#endif +} + </pre></code> + </td> + </tr> + </table> + </center> + + <p>Once the filter function is defined it must be registered so + the HDF5 library knows about it. Since we're testing this + filter we choose one of the <code>H5Z_filter_t</code> numbers + from the reserved range. We'll randomly choose 305. + + <p> + <center> + <table border align=center width="100%"> + <tr> + <td> + <p><code><pre> + +#define FILTER_MD5 305 +herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter); + </pre></code> + </td> + </tr> + </table> + </center> + + <p>Now we can use the filter in a pipeline. We could have added + the filter to the pipeline before defining or registering the + filter as long as the filter was defined and registered by time + we tried to use it (if the filter is marked as optional then we + could have used it without defining it and the library would + have automatically removed it from the pipeline for each chunk + written before the filter was defined and registered). 
+ + <p> + <center> + <table border align=center width="100%"> + <tr> + <td> + <p><code><pre> + +hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE); +hsize_t chunk_size[3] = {10,10,10}; +H5Pset_chunk(dcpl, 3, chunk_size); +H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL); +hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl); + </pre></code> + </td> + </tr> + </table> + </center> + + <h2>6. Filter Diagnostics</h2> + + <p>If the library is compiled with debugging turned on for the H5Z + layer (usually as a result of <code>configure + --enable-debug=z</code>) then filter statistics are printed when + the application exits normally or the library is closed. The + statistics are written to the standard error stream and include + two lines for each filter that was used: one for input and one + for output. The following fields are displayed: + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Method</td> + <td>This is the name of the method as defined with + <code>H5Zregister()</code> with the charaters + "< or ">" prepended to indicate + input or output.</td> + </tr> + + <tr valign=top> + <td>Total</td> + <td>The total number of bytes processed by the filter + including errors. This is the maximum of the + <em>nbytes</em> argument or the return value. + </tr> + + <tr valign=top> + <td>Errors</td> + <td>This field shows the number of bytes of the Total + column which can be attributed to errors.</td> + </tr> + + <tr valign=top> + <td>User, System, Elapsed</td> + <td>These are the amount of user time, system time, and + elapsed time in seconds spent in the filter function. + Elapsed time is sensitive to system load. 
These times + may be zero on operating systems that don't support the + required operations.</td> + </tr> + + <tr valign=top> + <td>Bandwidth</td> + <td>This is the filter bandwidth which is the total + number of bytes processed divided by elapsed time. + Since elapsed time is subject to system load the + bandwidth numbers cannot always be trusted. + Furthermore, the bandwidth includes bytes attributed to + errors which may significanly taint the value if the + function is able to detect errors without much + expense.</td> + </tr> + </table> + </center> + + <p> + <center> + <table border align=center width="100%"> + <caption align=bottom> + <b>Example: Filter Statistics</b> + </caption> + <tr> + <td> + <p><code><pre> +H5Z: filter statistics accumulated over life of library: + Method Total Errors User System Elapsed Bandwidth + ------ ----- ------ ---- ------ ------- --------- + >deflate 160000 40000 0.62 0.74 1.33 117.5 kBs + <deflate 120000 0 0.11 0.00 0.12 1.000 MBs + </pre></code> + </td> + </tr> + </table> + </center> + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Fri Apr 17 13:39:35 EDT 1998 --> +<!-- hhmts start --> +Last modified: Tue Aug 4 16:04:43 EDT 1998 +<!-- hhmts end --> + </body> +</html> |
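
The filter contract that the new `Filters.html` describes (one bidirectional function, a direction flag, in-place transformation where possible, a return value of zero on failure) can be tried without the library. The sketch below is a minimal stand-in, not code from this commit. It follows the `H5Z_func_t` argument list quoted in section 3 of the added file, but the numeric value of `H5Z_FLAG_REVERSE`, the helper `sum32()`, and the filter name `cksum_filter()` are assumptions made for illustration, and a 4-byte polynomial checksum replaces `md5()`. It uses the `H5Z_FLAG_REVERSE` spelling from the prose, whereas the md5 example in the diff tests `H5Z_REVERSE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the library's flag; the real value may differ. */
#define H5Z_FLAG_REVERSE 0x0100u

/* Hypothetical helper: polynomial checksum over n bytes (not md5). */
static uint32_t
sum32(const unsigned char *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = s * 31u + p[i];
    return s;
}

/* Same argument list as the H5Z_func_t typedef quoted in the diff. */
size_t
cksum_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
             size_t nbytes, size_t *buf_size, void **buf)
{
    uint32_t sum;

    (void)cd_nelmts;                 /* this filter takes no client data */
    (void)cd_values;
    assert(buf != NULL && *buf != NULL && buf_size != NULL);

    if (flags & H5Z_FLAG_REVERSE) {
        /* Input: recompute the checksum and compare with the stored one. */
        uint32_t stored;
        size_t payload;

        if (nbytes < sizeof stored)
            return 0;                /* too short to hold a checksum */
        payload = nbytes - sizeof stored;
        memcpy(&stored, (unsigned char *)*buf + payload, sizeof stored);
        sum = sum32(*buf, payload);
        if (sum != stored)
            return 0;                /* mismatch: report failure */

        /* Strip the checksum. An empty payload also yields 0 here, which
         * the caller cannot tell apart from failure under this contract. */
        return payload;
    }

    /* Output: grow the buffer if needed, then append the checksum. */
    sum = sum32(*buf, nbytes);
    if (nbytes + sizeof sum > *buf_size) {
        void *grown = realloc(*buf, nbytes + sizeof sum);
        if (grown == NULL)
            return 0;                /* *buf is still valid and unchanged */
        *buf = grown;
        *buf_size = nbytes + sizeof sum;
    }
    memcpy((unsigned char *)*buf + nbytes, &sum, sizeof sum);
    return nbytes + sizeof sum;
}
```

Calling `cksum_filter()` with `flags` of 0 on an 8-byte chunk returns 12 and may replace `*buf`; calling it again on those 12 bytes with `H5Z_FLAG_REVERSE` set returns 8, or 0 if any payload byte was changed in between. Checking the `realloc()` result before overwriting `*buf` keeps the caller's buffer valid when the allocation fails, which the contract needs because a failed optional filter is simply skipped for that chunk.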