<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/> <meta http-equiv="X-UA-Compatible" content="IE=9"/> <meta name="generator" content="Doxygen 1.10.0"/> <meta name="viewport" content="width=device-width, initial-scale=1"/> <title>HDF5: HDF5 Filters</title> <link href="tabs.css" rel="stylesheet" type="text/css"/> <script type="text/javascript" src="jquery.js"></script> <script type="text/javascript" src="dynsections.js"></script> <link href="navtree.css" rel="stylesheet" type="text/css"/> <script type="text/javascript" src="resize.js"></script> <script type="text/javascript" src="navtreedata.js"></script> <script type="text/javascript" src="navtree.js"></script> <script type="text/javascript" src="cookie.js"></script> <link href="search/search.css" rel="stylesheet" type="text/css"/> <script type="text/javascript" src="search/searchdata.js"></script> <script type="text/javascript" src="search/search.js"></script> <script type="text/javascript"> /* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */ $(function() { init_search(); }); /* @license-end */ </script> <link href="doxygen.css" rel="stylesheet" type="text/css" /> <link href="hdf5doxy.css" rel="stylesheet" type="text/css"> <!-- <link href="hdf5doxy.css" rel="stylesheet" type="text/css"/> --> <script type="text/javascript" src="hdf5_navtree_hacks.js"></script> </head> <body> <div style="background:#FFDDDD;font-size:120%;text-align:center;margin:0;padding:5px">Please, help us to better serve our user community by answering the following short survey: <a href="https://www.hdfgroup.org/website-survey/">https://www.hdfgroup.org/website-survey/</a></div> <div id="top"><!-- do not remove this div, it is closed by doxygen! 
--> <div id="titlearea"> <table cellspacing="0" cellpadding="0"> <tbody> <tr style="height: 56px;"> <td id="projectlogo"><img alt="Logo" src="HDFG-logo.png"/></td> <td id="projectalign" style="padding-left: 0.5em;"> <div id="projectname"><a href="https://www.hdfgroup.org">HDF5</a>  <span id="projectnumber">1.15.0.2908dd1</span> </div> <div id="projectbrief">API Reference</div> </td> <td> <div id="MSearchBox" class="MSearchBoxInactive"> <span class="left"> <span id="MSearchSelect" onmouseover="return searchBox.OnSearchSelectShow()" onmouseout="return searchBox.OnSearchSelectHide()"> </span> <input type="text" id="MSearchField" value="" placeholder="Search" accesskey="S" onfocus="searchBox.OnSearchFieldFocus(true)" onblur="searchBox.OnSearchFieldFocus(false)" onkeyup="searchBox.OnSearchFieldChange(event)"/> </span><span class="right"> <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.svg" alt=""/></a> </span> </div> </td> </tr> </tbody> </table> </div> <!-- end header part --> <!-- Generated by Doxygen 1.10.0 --> <script type="text/javascript"> /* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */ var searchBox = new SearchBox("searchBox", "search/",'.html'); /* @license-end */ </script> </div><!-- top --> <div id="side-nav" class="ui-resizable side-nav-resizable"> <div id="nav-tree"> <div id="nav-tree-contents"> <div id="nav-sync" class="sync"></div> </div> </div> <div id="splitbar" style="-moz-user-select:none;" class="ui-resizable-handle"> </div> </div> <script type="text/javascript"> /* @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt MIT */ $(function(){initNavTree('_f_i_l_t_e_r.html',''); initResizable(); }); /* @license-end */ </script> <div id="doc-content"> <!-- window showing the filter options --> <div id="MSearchSelectWindow" onmouseover="return searchBox.OnSearchSelectShow()" onmouseout="return 
searchBox.OnSearchSelectHide()" onkeydown="return searchBox.OnSearchSelectKey(event)"> </div> <!-- iframe showing the search results (closed by default) --> <div id="MSearchResultsWindow"> <div id="MSearchResults"> <div class="SRPage"> <div id="SRIndex"> <div id="SRResults"></div> <div class="SRStatus" id="Loading">Loading...</div> <div class="SRStatus" id="Searching">Searching...</div> <div class="SRStatus" id="NoMatches">No Matches</div> </div> </div> </div> </div> <div><div class="header"> <div class="headertitle"><div class="title">HDF5 Filters</div></div> </div><!--header--> <div class="contents"> <div class="textblock"><html> <head> <title>Filters</title> <h1>Filters in HDF5</h1> <b>Note: Transient pipelines described in this document have not been implemented.</b> <h2>Introduction</h2> <p>HDF5 allows chunked data to pass through user-defined filters on the way to or from disk. The filters operate on chunks of an <code>H5D_CHUNKED</code> dataset and can be arranged in a pipeline so that the output of one filter becomes the input of the next filter. </p><p>Each filter has a two-byte identification number (type <code>H5Z_filter_t</code>) allocated by The HDF Group and can also be passed application-defined integer resources to control its behavior. Each filter also has an optional ASCII comment string. </p> <table> <tbody><tr> <th>Values for <code>H5Z_filter_t</code></th> <th>Description</th> </tr> <tr valign="top"> <td><code>0-255</code></td> <td>These values are reserved for filters that are predefined and registered by the HDF5 library and are of use to the general public. They are described in a separate section below.</td> </tr> <tr valign="top"> <td><code>256-511</code></td> <td>Filter numbers in this range are used for testing only and can be used temporarily by any organization. No attempt is made to resolve numbering conflicts since all definitions are by nature temporary.</td> </tr> <tr valign="top"> <td><code>512-65535</code></td> <td>Reserved for future assignment. 
Please contact the <a href="mailto:help@hdfgroup.org">HDF5 development team</a> to reserve a value or range of values for use by your filters.</td> </tr></tbody></table> <h2>Defining and Querying the Filter Pipeline</h2> <p>Two types of filters can be applied to raw data I/O: permanent filters and transient filters. The permanent filter pipeline is defined when the dataset is created while the transient pipeline is defined for each I/O operation. During an <code>H5Dwrite()</code> the transient filters are applied first in the order defined and then the permanent filters are applied in the order defined. For an <code>H5Dread()</code> the opposite order is used: permanent filters in reverse order, then transient filters in reverse order. An <code>H5Dread()</code> must result in the same amount of data for a chunk as the original <code>H5Dwrite()</code>. </p><p>The permanent filter pipeline is defined by calling <code>H5Pset_filter()</code> for a dataset creation property list while the transient filter pipeline is defined by calling that function for a dataset transfer property list. </p><dl> <dt><code>herr_t H5Pset_filter (hid_t <em>plist</em>, H5Z_filter_t <em>filter</em>, unsigned int <em>flags</em>, size_t <em>cd_nelmts</em>, const unsigned int <em>cd_values</em>[])</code> </dt><dd>This function adds the specified <em>filter</em> and corresponding properties to the end of the permanent or transient output filter pipeline (depending on whether <em>plist</em> is a dataset creation or dataset transfer property list, respectively). The <em>flags</em> argument specifies certain general properties of the filter and is documented below. The <em>cd_values</em> argument is an array of <em>cd_nelmts</em> integers that serve as auxiliary data for the filter. The integer values will be stored in the dataset object header as part of the filter information. 
</dd><dt><code>int H5Pget_nfilters (hid_t <em>plist</em>)</code> </dt><dd>This function returns the number of filters defined in the permanent or transient filter pipeline depending on whether <em>plist</em> is a dataset creation or dataset transfer property list. In each pipeline the filters are numbered from 0 through <em>N</em>-1 where <em>N</em> is the value returned by this function. During output to the file the filters of a pipeline are applied in increasing order (the inverse is true for input). Zero is returned if there are no filters in the pipeline and a negative value is returned for errors. </dd><dt><code>H5Z_filter_t H5Pget_filter (hid_t <em>plist</em>, int <em>filter_number</em>, unsigned int *<em>flags</em>, size_t *<em>cd_nelmts</em>, unsigned int *<em>cd_values</em>, size_t <em>namelen</em>, char <em>name</em>[])</code> </dt><dd>This is the query counterpart of <code>H5Pset_filter()</code> and returns information about a particular filter number in a permanent or transient pipeline depending on whether <em>plist</em> is a dataset creation or dataset transfer property list. On input, <em>cd_nelmts</em> indicates the number of entries in the <em>cd_values</em> array allocated by the caller while on exit it contains the number of values defined by the filter. The <em>filter_number</em> should be a value between zero and <em>N</em>-1 as described for <code>H5Pget_nfilters()</code> and the function will return failure (a negative value) if the filter number is out of range. If <em>name</em> is a pointer to an array of at least <em>namelen</em> bytes then the filter name will be copied into that array. The name will be null terminated if <em>namelen</em> is large enough. The filter name returned will be the name appearing in the file or else the name registered for the filter or else an empty string. 
</dd></dl> <p>The <em>flags</em> argument to the functions above is a bit vector of the following fields: </p> <table> <tbody><tr> <th>Values for <em>flags</em></th> <th>Description</th> </tr> <tr valign="top"> <td><code>H5Z_FLAG_OPTIONAL</code></td> <td>If this bit is set then the filter is optional. If the filter fails (see below) during an <code>H5Dwrite()</code> operation then the filter is just excluded from the pipeline for the chunk for which it failed; the filter will not participate in the pipeline during an <code>H5Dread()</code> of the chunk. This is commonly used for compression filters: if the compression result would be larger than the input then the compression filter returns failure and the uncompressed data is stored in the file. If this bit is clear and a filter fails then the <code>H5Dwrite()</code> or <code>H5Dread()</code> also fails.</td> </tr> </tbody></table> <h2>Defining Filters</h2> <p>Each filter is bidirectional, handling both input and output to the file, and a flag is passed to the filter to indicate the direction. In either case the filter reads a chunk of data from a buffer, usually performs some sort of transformation on the data, places the result in the same or a new buffer, and returns the buffer pointer and size to the caller. If something goes wrong the filter should return zero to indicate a failure. </p><p>During output, a filter that fails or isn't defined and is marked as optional is silently excluded from the pipeline and will not be used when reading that chunk of data. A required filter that fails or isn't defined causes the entire output operation to fail. During input, any filter that has not been excluded from the pipeline during output and fails or is not defined will cause the entire input operation to fail. </p><p>Filters are defined in two phases. The first phase is to define a function to act as the filter and link the function into the application. 
The second phase is to register the function, associating the function with an <code>H5Z_filter_t</code> identification number and a comment. </p><dl> <dt><code>typedef size_t (*H5Z_func_t)(unsigned int <em>flags</em>, size_t <em>cd_nelmts</em>, const unsigned int <em>cd_values</em>[], size_t <em>nbytes</em>, size_t *<em>buf_size</em>, void **<em>buf</em>)</code> </dt><dd>The <em>flags</em>, <em>cd_nelmts</em>, and <em>cd_values</em> are the same as for the <code>H5Pset_filter()</code> function, with the additional flag <code>H5Z_FLAG_REVERSE</code>, which is set when the filter is called as part of the input pipeline. The input buffer is pointed to by <em>*buf</em> and has a total size of <em>*buf_size</em> bytes but only <em>nbytes</em> are valid data. The filter should perform the transformation in place if possible and return the number of valid bytes or zero for failure. If the transformation cannot be done in place then the filter should allocate a new buffer with <code>malloc()</code> and assign it to <em>*buf</em>, assigning the allocated size of that buffer to <em>*buf_size</em>. The old buffer should be freed by calling <code>free()</code>. </dd><dt><code>herr_t H5Zregister (H5Z_filter_t <em>filter_id</em>, const char *<em>comment</em>, H5Z_func_t <em>filter</em>)</code> </dt><dd>The <em>filter</em> function is associated with a filter number and a short ASCII comment which will be stored in the HDF5 file if the filter is used as part of a permanent pipeline during dataset creation. </dd></dl> <h2>Predefined Filters</h2> <p>If <code>zlib</code> version 1.1.2 or later was found during configuration then the library will define a filter whose <code>H5Z_filter_t</code> number is <code>H5Z_FILTER_DEFLATE</code>. 
Since this compression method has the potential for generating compressed data which is larger than the original, the <code>H5Z_FLAG_OPTIONAL</code> flag should be turned on so such cases can be handled gracefully by storing the original data instead of the compressed data. The <em>cd_nelmts</em> should be one with <em>cd_values[0]</em> being a compression aggression level between zero and nine, inclusive (zero is the fastest compression while nine results in the best compression ratio). </p><p>A convenience function for adding the <code>H5Z_FILTER_DEFLATE</code> filter to a pipeline is: </p><dl> <dt><code>herr_t H5Pset_deflate (hid_t <em>plist</em>, unsigned <em>aggression</em>)</code> </dt><dd>The deflate compression method is added to the end of the permanent or transient filter pipeline depending on whether <em>plist</em> is a dataset creation or dataset transfer property list. The <em>aggression</em> is a number between zero and nine (inclusive) to indicate the tradeoff between speed and compression ratio (zero is fastest, nine is best ratio). </dd></dl> <p>Even if <code>zlib</code> isn't detected during configuration the application can define <code>H5Z_FILTER_DEFLATE</code> as a permanent filter. If the filter is marked as optional (as with <code>H5Pset_deflate()</code>) then it will always fail and be automatically removed from the pipeline. Applications that read data will fail only if the data is actually compressed; they won't fail if <code>H5Z_FILTER_DEFLATE</code> was part of the permanent output pipeline but was automatically excluded because it didn't exist when the data was written. </p><p><code>zlib</code> can be acquired from <code><a href="https://zlib.net"> https://zlib.net</a></code>. </p><h2>Example</h2> <p>This example shows how to define and register a simple filter that adds a checksum capability to the data stream. 
</p><p>The function that acts as the filter always returns zero (failure) if the <code>md5()</code> function was not detected at configuration time (left as an exercise for the reader). Otherwise the function is split into an input half and an output half. The output half calculates a checksum, increases the size of the output buffer if necessary, and appends the checksum to the end of the buffer. The input half calculates the checksum on the first part of the buffer and compares it to the checksum already stored at the end of the buffer. If the two differ then zero (failure) is returned, otherwise the buffer size is reduced to exclude the checksum. </p> <table> <tbody><tr> <td> <p><code></code></p><pre><code>#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include "hdf5.h"

size_t
md5_filter(unsigned int flags, size_t cd_nelmts, const unsigned int cd_values[],
           size_t nbytes, size_t *buf_size, void **buf)
{
#ifdef HAVE_MD5
    unsigned char cksum[16];

    if (flags &amp; H5Z_FLAG_REVERSE) {
        /* Input: the last 16 bytes hold the stored checksum */
        assert(nbytes&gt;=16);
        md5(nbytes-16, *buf, cksum);

        /* Compare */
        if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) {
            return 0; /*fail*/
        }

        /* Strip off checksum */
        return nbytes-16;

    } else {
        /* Output */
        md5(nbytes, *buf, cksum);

        /* Increase buffer size if necessary */
        if (nbytes+16&gt;*buf_size) {
            void *tmp = realloc(*buf, nbytes + 16);
            if (!tmp) {
                return 0; /*fail; the old buffer is left untouched*/
            }
            *buf = tmp;
            *buf_size = nbytes + 16;
        }

        /* Append checksum */
        memcpy((char*)(*buf)+nbytes, cksum, 16);
        return nbytes+16;
    }
#else
    return 0; /*fail*/
#endif
}
</code></pre> </td> </tr> </tbody></table> <p>Once the filter function is defined it must be registered so the HDF5 library knows about it. Since we're testing this filter we choose one of the <code>H5Z_filter_t</code> numbers from the testing range (256-511). We'll arbitrarily choose 305. </p><p> </p> <table> <tbody><tr> <td> <p><code></code></p><pre><code>#define FILTER_MD5 305
herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter);
</code></pre> </td> </tr> </tbody></table> <p>Now we can use the filter in a pipeline. 
We could have added the filter to the pipeline before defining or registering the filter as long as the filter was defined and registered by the time we tried to use it (if the filter is marked as optional then we could have used it without defining it and the library would have automatically removed it from the pipeline for each chunk written before the filter was defined and registered). </p><p> </p> <table> <tbody><tr> <td> <p><code></code></p><pre><code>hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
hsize_t chunk_size[3] = {10,10,10};
H5Pset_chunk(dcpl, 3, chunk_size);
H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL);
hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl);
</code></pre> </td> </tr> </tbody></table> <h2>Filter Diagnostics</h2> <p>If the library is compiled with debugging turned on for the H5Z layer (usually as a result of <code>configure --enable-debug=z</code>) then filter statistics are printed when the application exits normally or the library is closed. The statistics are written to the standard error stream and include two lines for each filter that was used: one for input and one for output. The following fields are displayed: </p><p> </p> <table> <tbody><tr> <th>Field Name</th> <th>Description</th> </tr> <tr valign="top"> <td>Method</td> <td>This is the name of the method as defined with <code>H5Zregister()</code> with the characters "&lt;" or "&gt;" prepended to indicate input or output.</td> </tr> <tr valign="top"> <td>Total</td> <td>The total number of bytes processed by the filter including errors. This is the maximum of the <em>nbytes</em> argument or the return value. </td></tr> <tr valign="top"> <td>Errors</td> <td>This field shows the number of bytes of the Total column which can be attributed to errors.</td> </tr> <tr valign="top"> <td>User, System, Elapsed</td> <td>These are the amount of user time, system time, and elapsed time in seconds spent in the filter function. Elapsed time is sensitive to system load. 
These times may be zero on operating systems that don't support the required operations.</td> </tr> <tr valign="top"> <td>Bandwidth</td> <td>This is the filter bandwidth which is the total number of bytes processed divided by elapsed time. Since elapsed time is subject to system load the bandwidth numbers cannot always be trusted. Furthermore, the bandwidth includes bytes attributed to errors which may significantly taint the value if the function is able to detect errors without much expense.</td> </tr> </tbody></table> <p> </p> <table> <caption align="bottom"> <b>Example: Filter Statistics</b> </caption> <tbody><tr> <td> <p><code></code></p><pre><code>H5Z: filter statistics accumulated over life of library:
   Method     Total  Errors  User  System  Elapsed Bandwidth
   ------     -----  ------  ----  ------  ------- ---------
   &gt;deflate  160000   40000  0.62    0.74     1.33 117.5 kBs
   &lt;deflate  120000       0  0.11    0.00     0.12 1.000 MBs
</code></pre> </td> </tr> </tbody></table> <hr> <p><a name="fn1">Footnote 1:</a> Dataset chunks can be compressed through the use of filters. Developers should be aware that reading and rewriting compressed chunked data can result in holes in an HDF5 file. In time, enough such holes can increase the file size enough to impair application or library performance when working with that file. See <a href="https://support.hdfgroup.org/HDF5/doc1.6/Performance.html#Freespace"> Freespace Management</a> in the chapter <a href="https://support.hdfgroup.org/HDF5/doc1.6/Performance.html"> Performance Analysis and Issues</a>.</p> </html> </div></div><!-- contents --> </div><!-- PageDoc --> </div><!-- doc-content --> <!-- start footer part --> <div id="nav-path" class="navpath"><!-- id is needed for treeview function! --> <ul> <li class="footer">Generated by <a href="http://www.doxygen.org/index.html"> <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.10.0 </li> </ul> </div> </body> </html>