From 32295ad53dd29e799cef007e05ca0368f0c1ca1d Mon Sep 17 00:00:00 2001 From: Robb Matzke Date: Wed, 5 Aug 1998 17:23:51 -0500 Subject: [svn-r570] *** empty log message *** --- doc/html/Compression.html | 409 ---------------------------------------- doc/html/Filters.html | 463 ++++++++++++++++++++++++++++++++++++++++++++++ src/H5public.h | 2 +- 3 files changed, 464 insertions(+), 410 deletions(-) delete mode 100644 doc/html/Compression.html create mode 100644 doc/html/Filters.html diff --git a/doc/html/Compression.html b/doc/html/Compression.html deleted file mode 100644 index c3a2a45..0000000 --- a/doc/html/Compression.html +++ /dev/null @@ -1,409 +0,0 @@ - - - - Compression - - - -

Compression

- -

1. Introduction

- -

HDF5 supports compression of raw data by compression methods - built into the library or defined by an application. A - compression method is associated with a dataset when the dataset - is created and is applied independently to each storage chunk of - the dataset. - - The dataset must use the H5D_CHUNKED storage - layout. The library doesn't support compression for contiguous - datasets because of the difficulty of implementing random access - for partial I/O, and compact dataset compression is not - supported because it wouldn't produce significant results. - -

2. Supported Compression Methods

- -

The library identifies compression methods with small - integers, with values less than 16 reserved for use by NCSA and - values between 16 and 255 (inclusive) available for general - use. This range may be extended in the future if it proves to - be too small. - -

-

- - - - - - - - - - - - - - - - - - - - - - - - - -
Method Name | Description
H5Z_NONE | The default is to not use compression. Specifying H5Z_NONE as the compression method results in better performance than writing a function that just copies data because the library's I/O pipeline recognizes this method and is able to short-circuit parts of the pipeline.
H5Z_DEFLATE | The deflate method is the algorithm used by the GNU gzip program. It combines 1977 Lempel-Ziv (LZ77) dictionary encoding with a subsequent Huffman encoding. The aggressiveness of the compression can be controlled by passing an integer value to the compressor with H5Pset_deflate() (see below). In order for this compression method to be used, the HDF5 library must be configured and compiled in the presence of the GNU zlib version 1.1.2 or later.
H5Z_RES_N | These compression methods (where N is in the range two through 15, inclusive) are reserved by NCSA for future use.
Values of N between 16 and 255, inclusive | These values can be used to represent application-defined compression methods. We recommend that methods under testing use numbers in the high end of the range and that a method about to be published be given a number near the low end of the range (or even below 16). Publishing the compression method and its numeric ID will make a file sharable.
-
- -

Setting the compression for a dataset to a method which was - not compiled into the library and/or not registered by the - application is allowed, but writing to such a dataset will - silently not compress the data. Reading a compressed - dataset for a method which is not available will result in - errors (specifically, H5Dread() will return a - negative value). The errors will be displayed in the - compression statistics if the library was compiled with - debugging turned on for the "z" package. See the - section on diagnostics below for more details. - -

3. Application-Defined Methods

- -

Compression methods 16 through 255 can be defined by an - application. As mentioned above, methods that have not been - released should use high numbers in that range while methods - that have been published will be assigned an official number in - the low region of the range (possibly less than 16). Users - should be aware that using unpublished compression methods - results in unsharable files. - -

A compression method has two halves: one half handles compression and the other half handles uncompression. The halves are implemented as functions method_c and method_u respectively. One should not use the names compress or uncompress since they are likely to conflict with other compression libraries (like the GNU zlib).

Both the method_c and - method_u functions take the same arguments - and return the same values. They are defined with the type: - -

-
typedef size_t (*H5Z_func_t)(unsigned int - flags, size_t cd_size, const void - *client_data, size_t src_nbytes, const - void *src, size_t dst_nbytes, void - *dst/*out*/) -
The flags are an 8-bit vector which is stored in the file and which is defined when the compression method is defined. The client_data is a pointer to cd_size bytes of configuration data which is also stored in the file. The function compresses or uncompresses src_nbytes from the source buffer src into at most dst_nbytes of the result buffer dst. The function returns the number of bytes written to the result buffer or zero if an error occurs. But if a result buffer overrun occurs the function should return a value at least as large as dst_nbytes (the uncompressor will see an overrun only for corrupt data).
- -

The application associates the pair of functions with a name - and a method number by calling H5Zregister(). This - function can also be used to remove a compression method from - the library by supplying null pointers for the functions. - -

-
herr_t H5Zregister (H5Z_method_t method, - const char *name, H5Z_func_t method_c, - H5Z_func_t method_u) -
The pair of functions to be used for compression - (method_c) and uncompression (method_u) are - associated with a short name used for debugging and a - method number in the range 16 through 255. This - function can be called as often as desired for a particular - compression method with each call replacing the information - stored by the previous call. Sometimes it's convenient to - supply only one half of the compression, for instance in an - application that opens files for read-only. Compression - statistics for the method are accumulated across calls to this - function. -
- -

-

- - - - - -

Example: Registering an - Application-Defined Compression Method

-

Here's a simple-minded "compression" method - that just copies the input value to the output. It's - similar to the H5Z_NONE method but - slower. Compression and uncompression are performed - by the same function. - -

-size_t
-bogus (unsigned int flags,
-       size_t cd_size, const void *client_data,
-       size_t src_nbytes, const void *src,
-       size_t dst_nbytes, void *dst/*out*/)
-{
-    memcpy (dst, src, src_nbytes);
-    return src_nbytes;
-}
-	      
- -

The function could be registered as method 250 as - follows: - -

-#define H5Z_BOGUS 250
-H5Zregister (H5Z_BOGUS, "bogus", bogus, bogus);
-	      
- -

The function can be unregistered by saying: - -

-H5Zregister (H5Z_BOGUS, "bogus", NULL, NULL);
-	      
- -

Notice that we kept the name "bogus" even - though we unregistered the functions that perform the - compression and uncompression. This makes compression - statistics more understandable when they're printed. -

-
- -

4. Enabling Compression for a Dataset

- -

If a dataset is to be compressed then the compression - information must be specified when the dataset is created since - once a dataset is created compression parameters cannot be - adjusted. The compression is specified through the dataset - creation property list (see H5Pcreate()). - -

-
herr_t H5Pset_deflate (hid_t plist, int - level) -
The compression method for dataset creation property list - plist is set to H5Z_DEFLATE and the - aggression level is set to level. The level - must be a value between one and nine, inclusive, where one - indicates no (but fast) compression and nine is aggressive - compression. - -

-
int H5Pget_deflate (hid_t plist) -
If dataset creation property list plist is set to - use H5Z_DEFLATE compression then this function - will return the aggression level, an integer between one and - nine inclusive. If plist isn't a valid dataset - creation property list or it isn't set to use the deflate - method then a negative value is returned. - -

-
herr_t H5Pset_compression (hid_t plist, - H5Z_method_t method, unsigned int flags, - size_t cd_size, const void *client_data) -
This is a catch-all function for defining compression methods and is intended to be called from a wrapper such as H5Pset_deflate(). The dataset creation property list plist is adjusted to use the specified compression method. The flags argument is an 8-bit vector which is stored in the file as part of the compression message and passed to the compress and uncompress functions. The client_data is a byte array of length cd_size which is copied to the file and passed to the compress and uncompress functions.

-
H5Z_method_t H5Pget_compression (hid_t plist, - unsigned int *flags, size_t *cd_size, void - *client_data) -
This is a catch-all function for querying the compression - method associated with dataset creation property list - plist and is intended to be called from a wrapper - function such as H5Pget_deflate(). The - compression method (or a negative value on error) is returned - by value, and compression flags and client data is returned by - argument. The application should allocate the - client_data and pass its size as the - cd_size. On return, cd_size will contain - the actual size of the client data. If client_data - is not large enough to hold the entire client data then - cd_size bytes are copied into client_data - and cd_size is set to the total size of the client - data, a value larger than the original. -
- -

It is possible to set the compression to a method which hasn't - been defined with H5Zregister() and which isn't - supported as a predefined method (for instance, setting the - method to H5Z_DEFLATE when the GNU zlib isn't - available). If that happens then data will be written to the - file in its uncompressed form and the compression statistics - will show failures for the compression. - -

-

- - - - - -

Example: Statistics for an - Unsupported Compression Method

-

If an application attempts to use an unsupported - method then the compression statistics will show large - numbers of compression errors and no data - uncompressed. - -

-H5Z: compression statistics accumulated over life of library:
-   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
-   ------      -----  -------  ------  ----  ------  ------- ---------
-   deflate-c  160000        0  160000  0.00    0.01     0.01 1.884e+07
-   deflate-u       0        0       0  0.00    0.00     0.00       NaN
-	      
- -

This example is from a program that tried to use - H5Z_DEFLATE on a system that didn't have - the GNU zlib to write to a dataset and then read the - result. The read and write both succeeded but the - data was not compressed. -

-
- -

5. Compression Diagnostics

- -

If the library is compiled with debugging turned on for the H5Z - layer (usually as a result of configure --enable-debug=z) - then statistics about data compression are printed when the - application exits normally or the library is closed. The - statistics are written to the standard error stream and include - two lines for each compression method that was used: the first - line shows compression statistics while the second shows - uncompression statistics. The following fields are displayed: - -

-

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Field Name | Description
Method | This is the name of the method as defined with H5Zregister() with the letters "-c" or "-u" appended to indicate compression or uncompression.
Total | The total number of bytes compressed or decompressed including buffer overruns and errors. Bytes of non-compressed data are counted.
Overrun | During compression, if the algorithm causes the result to be at least as large as the input then a buffer overrun error occurs. This field shows the total number of bytes from the Total column which can be attributed to overruns. Overruns for decompression can only happen if the data has been corrupted in some way and will result in failure of H5Dread().
Errors | If an error occurs during compression the data is stored in its uncompressed form; an error during uncompression causes H5Dread() to return failure. This field shows the number of bytes of the Total column which can be attributed to errors.
User, System, Elapsed | These are the amounts of user time, system time, and elapsed time in seconds spent by the library to perform compression. Elapsed time is sensitive to system load. These times may be zero on operating systems that don't support the required operations.
Bandwidth | This is the compression bandwidth, which is the total number of bytes divided by elapsed time. Since elapsed time is subject to system load the bandwidth numbers cannot always be trusted. Furthermore, the bandwidth includes overrun and error bytes which may significantly taint the value.
-
- -

-

- - - - - -

Example: Compression - Statistics

-

-H5Z: compression statistics accumulated over life of library:
-   Method      Total  Overrun  Errors  User  System  Elapsed Bandwidth
-   ------      -----  -------  ------  ----  ------  ------- ---------
-   deflate-c  160000      200       0  0.62    0.74     1.33 1.204e+05
-   deflate-u  120000        0       0  0.11    0.00     0.12 9.885e+05
-	      
-
-
- -
-
Robb Matzke
- - -Last modified: Fri Apr 17 16:15:21 EDT 1998 - - - diff --git a/doc/html/Filters.html b/doc/html/Filters.html new file mode 100644 index 0000000..b9785c8 --- /dev/null +++ b/doc/html/Filters.html @@ -0,0 +1,463 @@ + + + + Filters + + + +

Filters

Note: Transient pipelines described in this document have not been implemented.

1. Introduction

+ +

HDF5 allows chunked data to pass through user-defined filters on the way to or from disk. The filters operate on chunks of an H5D_CHUNKED dataset and can be arranged in a pipeline so that the output of one filter becomes the input of the next filter.

Each filter has a two-byte identification number (type + H5Z_filter_t) allocated by NCSA and can also be + passed application-defined integer resources to control its + behavior. Each filter also has an optional ASCII comment + string. + +

+

+ + + + + + + + + + + + + + + + + + + + + +
+ Values for H5Z_filter_t +
Value | Description
0-255 | These values are reserved for filters predefined and registered by the HDF5 library and of use to the general public. They are described in a separate section below.
256-511 | Filter numbers in this range are used for testing only and can be used temporarily by any organization. No attempt is made to resolve numbering conflicts since all definitions are by nature temporary.
512-65535 | Reserved for future assignment. Please contact the HDF5 development team to reserve a value or range of values for use by your filters.
+
+ +

2. Defining and Querying the Filter Pipeline

+ +

Two types of filters can be applied to raw data I/O: permanent filters and transient filters. The permanent filter pipeline is defined when the dataset is created while the transient pipeline is defined for each I/O operation. During an H5Dwrite() the transient filters are applied first in the order defined and then the permanent filters are applied in the order defined. For an H5Dread() the opposite order is used: permanent filters in reverse order, then transient filters in reverse order. An H5Dread() must result in the same amount of data for a chunk as the original H5Dwrite().

The permanent filter pipeline is defined by calling + H5Pset_filter() for a dataset creation property + list while the transient filter pipeline is defined by calling + that function for a dataset transfer property list. + +
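The ordering rule for H5Dwrite() and H5Dread() can be illustrated with a small self-contained sketch. It does not call the HDF5 library: stage_t, run_write, and run_read are hypothetical names, and the two stages are trivial byte transformations chosen only because their order matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a filter: transforms n bytes in place.
 * reverse != 0 means the stage runs on the read (input) path. */
typedef void (*stage_t)(unsigned char *buf, size_t n, int reverse);

/* Adds 1 to every byte on write, subtracts 1 on read (modulo 256). */
static void add_one(unsigned char *buf, size_t n, int reverse)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(reverse ? buf[i] - 1 : buf[i] + 1);
}

/* Rotates every byte left by one bit on write, right on read. */
static void rotate_left(unsigned char *buf, size_t n, int reverse)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int b = buf[i];
        b = reverse ? ((b >> 1) | (b << 7)) : ((b << 1) | (b >> 7));
        buf[i] = (unsigned char)(b & 0xFFu);
    }
}

/* Write path: transient stages first, then permanent stages,
 * each group in the order it was defined. */
static void run_write(const stage_t *transient, size_t nt,
                      const stage_t *permanent, size_t np,
                      unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < nt; i++) transient[i](buf, n, 0);
    for (size_t i = 0; i < np; i++) permanent[i](buf, n, 0);
}

/* Read path: permanent stages in reverse order, then transient
 * stages in reverse order. */
static void run_read(const stage_t *transient, size_t nt,
                     const stage_t *permanent, size_t np,
                     unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = np; i > 0; i--) permanent[i - 1](buf, n, 1);
    for (size_t i = nt; i > 0; i--) transient[i - 1](buf, n, 1);
}
```

With add_one as the only transient stage and rotate_left as the only permanent stage, run_write followed by run_read restores the original bytes; undoing the stages in the same order they were applied would not.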

+
herr_t H5Pset_filter (hid_t plist, + H5Z_filter_t filter, unsigned int flags, + size_t cd_nelmts, const unsigned int + cd_values[]) +
This function adds the specified filter and corresponding properties to the end of the permanent or transient output filter pipeline (depending on whether plist is a dataset creation or dataset transfer property list). The flags argument specifies certain general properties of the filter and is documented below. The cd_values is an array of cd_nelmts integers which are auxiliary data for the filter. The integer values will be stored in the dataset object header as part of the filter information.

+
int H5Pget_nfilters (hid_t plist) +
This function returns the number of filters defined in the + permanent or transient filter pipeline depending on whether + plist is a dataset creation or dataset transfer + property list. In each pipeline the filters are numbered from + 0 through N-1 where N is the value returned + by this function. During output to the file the filters of a + pipeline are applied in increasing order (the inverse is true + for input). Zero is returned if there are no filters in the + pipeline and a negative value is returned for errors. + +

+
H5Z_filter_t H5Pget_filter (hid_t plist, + int filter_number, unsigned int *flags, + size_t *cd_nelmts, unsigned int + *cd_values) +
This is the query counterpart of + H5Pset_filter() and returns information about a + particular filter number in a permanent or transient pipeline + depending on whether plist is a dataset creation or + dataset transfer property list. On input, cd_nelmts + indicates the number of entries in the cd_values + array allocated by the caller while on exit it contains the + number of values defined by the filter. The + filter_number should be a value between zero and + N-1 as described for H5Pget_nfilters() + and the function will return failure (a negative value) if the + filter number is out of range. +
+ +

The flags argument to the functions above is a bit vector of + the following fields: + +

+

+ + + + + + + + + + + + +
+ Values for the flags argument +
Value | Description
H5Z_FLAG_OPTIONAL | If this bit is set then the filter is optional. If the filter fails (see below) during an H5Dwrite() operation then the filter is just excluded from the pipeline for the chunk for which it failed; the filter will not participate in the pipeline during an H5Dread() of the chunk. This is commonly used for compression filters: if the compression result would be larger than the input then the compression filter returns failure and the uncompressed data is stored in the file. If this bit is clear and a filter fails then the H5Dwrite() or H5Dread() also fails.
+
+ +

3. Defining Filters

+ +

Each filter is bidirectional, handling both input and output to + the file, and a flag is passed to the filter to indicate the + direction. In either case the filter reads a chunk of data from + a buffer, usually performs some sort of transformation on the + data, places the result in the same or new buffer, and returns + the buffer pointer and size to the caller. If something goes + wrong the filter should return zero to indicate a failure. + +

During output, a filter that fails or isn't defined and is + marked as optional is silently excluded from the pipeline and + will not be used when reading that chunk of data. A required + filter that fails or isn't defined causes the entire output + operation to fail. During input, any filter that has not been + excluded from the pipeline during output and fails or is not + defined will cause the entire input operation to fail. + +
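The failure rules for optional and required filters can be sketched as follows. This is a simplified model, not library code: the filter signature is reduced to a buffer and a size, SKETCH_FLAG_OPTIONAL stands in for H5Z_FLAG_OPTIONAL, and the per-chunk `excluded` bit mask is a hypothetical way of remembering which filters were skipped on output. A failing filter is assumed to leave the buffer untouched.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FLAG_OPTIONAL 0x1u  /* hypothetical stand-in for H5Z_FLAG_OPTIONAL */

/* Simplified filter: works in place, returns the new size or 0 on failure. */
typedef size_t (*sketch_filter_t)(unsigned char *buf, size_t nbytes);

typedef struct {
    sketch_filter_t fn;     /* NULL models a filter that is not defined */
    unsigned int    flags;
} sketch_entry_t;

/* A filter that always reports failure and does not touch the data. */
static size_t always_fail(unsigned char *buf, size_t nbytes)
{
    (void)buf; (void)nbytes;
    return 0;
}

/* Self-inverse filter, so the same function serves both directions. */
static size_t flip_bits(unsigned char *buf, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        buf[i] ^= 0xFF;
    return nbytes;
}

/* Output: an optional filter that fails is skipped and recorded in
 * *excluded; a required filter that fails aborts the write (returns 0). */
static size_t write_chunk(const sketch_entry_t *pipe, size_t n,
                          unsigned char *buf, size_t nbytes,
                          unsigned int *excluded)
{
    assert(n <= 8 * sizeof *excluded);
    *excluded = 0;
    for (size_t i = 0; i < n; i++) {
        size_t out = pipe[i].fn ? pipe[i].fn(buf, nbytes) : 0;
        if (out != 0) { nbytes = out; continue; }
        if (!(pipe[i].flags & SKETCH_FLAG_OPTIONAL)) return 0;
        *excluded |= 1u << i;
    }
    return nbytes;
}

/* Input: filters run in reverse order; those excluded on output are
 * skipped, and any other failure aborts the read (returns 0). */
static size_t read_chunk(const sketch_entry_t *pipe, size_t n,
                         unsigned char *buf, size_t nbytes,
                         unsigned int excluded)
{
    assert(n <= 8 * sizeof excluded);
    for (size_t i = n; i > 0; i--) {
        if (excluded & (1u << (i - 1))) continue;
        size_t out = pipe[i - 1].fn ? pipe[i - 1].fn(buf, nbytes) : 0;
        if (out == 0) return 0;
        nbytes = out;
    }
    return nbytes;
}
```

A pipeline of an optional always_fail followed by a required flip_bits writes successfully with bit 0 set in the mask, and the read path then skips the excluded filter instead of failing on it.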

Filters are defined in two phases. The first phase is to + define a function to act as the filter and link the function + into the application. The second phase is to register the + function, associating the function with an + H5Z_filter_t identification number and a comment. + +

+
typedef size_t (*H5Z_func_t)(unsigned int + flags, size_t cd_nelmts, unsigned int + *cd_values, size_t nbytes, size_t + *buf_size, void **buf) +
The flags, cd_nelmts, and + cd_values are the same as for the + H5Pset_filter() function with the additional flag + H5Z_FLAG_REVERSE which is set when the filter is + called as part of the input pipeline. The input buffer is + pointed to by *buf and has a total size of + *buf_size bytes but only nbytes are valid + data. The filter should perform the transformation in place if + possible and return the number of valid bytes or zero for + failure. If the transformation cannot be done in place then + the filter should allocate a new buffer with + malloc() and assign it to *buf, + assigning the allocated size of that buffer to + *buf_size. The old buffer should be freed + by calling free(). + +

+
herr_t H5Zregister (H5Z_filter_t filter_id, + const char *comment, H5Z_func_t + filter) +
The filter function is associated with a filter + number and a short ASCII comment which will be stored in the + hdf5 file if the filter is used as part of a permanent + pipeline during dataset creation. +
+ + +
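As a sketch of the buffer-replacement rule for a transformation that cannot be done in place, the hypothetical filter below follows the H5Z_func_t argument list but is otherwise standalone. On output it splits every byte into two nibbles, doubling the data, so it allocates a new buffer with malloc() and frees the old one; on input it folds the pairs back together in place. SKETCH_FLAG_REVERSE is a placeholder for H5Z_FLAG_REVERSE, whose real value comes from the library headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_FLAG_REVERSE 0x0100u  /* placeholder for H5Z_FLAG_REVERSE */

/* Hypothetical filter: returns the number of valid bytes, or 0 on failure. */
static size_t
nibble_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
              size_t nbytes, size_t *buf_size, void **buf)
{
    unsigned char *src = *buf;
    (void)cd_nelmts;
    (void)cd_values;
    assert(nbytes <= *buf_size);

    if (flags & SKETCH_FLAG_REVERSE) {
        /* Input: the result is smaller, so fold the pairs in place. */
        if (nbytes == 0 || nbytes % 2 != 0)
            return 0;                       /* not produced by this filter */
        for (size_t i = 0; i < nbytes / 2; i++)
            src[i] = (unsigned char)((src[2 * i] << 4) | (src[2 * i + 1] & 0x0F));
        return nbytes / 2;
    }

    /* Output: the result is twice as large, so a new buffer is needed. */
    if (nbytes == 0)
        return 0;
    size_t need = 2 * nbytes;
    unsigned char *dst = malloc(need);
    if (dst == NULL)
        return 0;                           /* old buffer is left untouched */
    for (size_t i = 0; i < nbytes; i++) {
        dst[2 * i]     = (unsigned char)(src[i] >> 4);
        dst[2 * i + 1] = (unsigned char)(src[i] & 0x0F);
    }
    free(src);                              /* release the caller's old buffer */
    *buf = dst;                             /* hand back the new buffer ... */
    *buf_size = need;                       /* ... and its allocated size */
    return need;
}
```

The sketch always allocates on output for simplicity; a filter could instead reuse the existing buffer whenever *buf_size is already large enough.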

4. Predefined Filters

+ +

If GNU zlib version 1.1.2 or later was found during configuration then the library will define a filter whose H5Z_filter_t number is H5Z_FILTER_DEFLATE. Since this compression method has the potential for generating compressed data which is larger than the original, the H5Z_FLAG_OPTIONAL flag should be turned on so such cases can be handled gracefully by storing the original data instead of the compressed data. The cd_nelmts should be one with cd_values[0] being a compression aggression level between zero and nine, inclusive (zero is the fastest compression while nine results in the best compression ratio).

A convenience function for adding the + H5Z_FILTER_DEFLATE filter to a pipeline is: + +

+
herr_t H5Pset_deflate (hid_t plist, unsigned + aggression) +
The deflate compression method is added to the end of the + permanent or transient filter pipeline depending on whether + plist is a dataset creation or dataset transfer + property list. The aggression is a number between + zero and nine (inclusive) to indicate the tradeoff between + speed and compression ratio (zero is fastest, nine is best + ratio). +
+ +

Even if the GNU zlib isn't detected during + configuration the application can define + H5Z_FILTER_DEFLATE as a permanent filter. If the + filter is marked as optional (as with + H5Pset_deflate()) then it will always fail and be + automatically removed from the pipeline. Applications that read + data will fail only if the data is actually compressed; they + won't fail if H5Z_FILTER_DEFLATE was part of the + permanent output pipeline but was automatically excluded because + it didn't exist when the data was written. + +

5. Example

+ +

This example shows how to define and register a simple filter + that adds a checksum capability to the data stream. + +

The function that acts as the filter always returns zero (failure) if the md5() function was not detected at configuration time (left as an exercise for the reader). Otherwise the function is split into an input half and an output half. The output half calculates a checksum, increases the size of the output buffer if necessary, and appends the checksum to the end of the buffer. The input half calculates the checksum on the first part of the buffer and compares it to the checksum already stored at the end of the buffer. If the two differ then zero (failure) is returned, otherwise the buffer size is reduced to exclude the checksum.

+

+ + + + +
+

+
+size_t
+md5_filter(unsigned int flags, size_t cd_nelmts, unsigned int *cd_values,
+           size_t nbytes, size_t *buf_size, void **buf)
+{
+#ifdef HAVE_MD5
+    unsigned char       cksum[16];
+
+    if (flags & H5Z_FLAG_REVERSE) {
+        /* Input */
+        if (nbytes<16) return 0; /*fail: too short to hold a checksum*/
+        md5(nbytes-16, *buf, cksum);
+
+        /* Compare */
+        if (memcmp(cksum, (char*)(*buf)+nbytes-16, 16)) {
+            return 0; /*fail*/
+        }
+
+        /* Strip off checksum */
+        return nbytes-16;
+            
+    } else {
+        /* Output */
+        md5(nbytes, *buf, cksum);
+
+        /* Increase buffer size if necessary */
+        if (nbytes+16>*buf_size) {
+            void *tmp = realloc(*buf, nbytes + 16);
+            if (!tmp) return 0; /*fail: *buf is unchanged*/
+            *buf = tmp;
+            *buf_size = nbytes + 16;
+        }
+
+        /* Append checksum */
+        memcpy((char*)(*buf)+nbytes, cksum, 16);
+        return nbytes+16;
+    }
+#else
+    return 0; /*fail*/
+#endif
+}
+	      
+
+
+ +

Once the filter function is defined it must be registered so the HDF5 library knows about it. Since we're testing this filter we choose one of the H5Z_filter_t numbers from the range set aside for testing (256-511). We'll arbitrarily choose 305.

+

+ + + + +
+

+
+#define FILTER_MD5 305
+herr_t status = H5Zregister(FILTER_MD5, "md5 checksum", md5_filter);
+	      
+
+
+ +

Now we can use the filter in a pipeline. We could have added the filter to the pipeline before defining or registering it, as long as it was defined and registered by the time we tried to use it. If the filter is marked as optional then we could even have used it without defining it: the library would have automatically removed it from the pipeline for each chunk written before the filter was defined and registered.

+

+ + + + +
+

+
+hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+hsize_t chunk_size[3] = {10,10,10};
+H5Pset_chunk(dcpl, 3, chunk_size);
+H5Pset_filter(dcpl, FILTER_MD5, 0, 0, NULL);
+hid_t dset = H5Dcreate(file, "dset", H5T_NATIVE_DOUBLE, space, dcpl);
+	      
+
+
+ +

6. Filter Diagnostics

+ +

If the library is compiled with debugging turned on for the H5Z + layer (usually as a result of configure + --enable-debug=z) then filter statistics are printed when + the application exits normally or the library is closed. The + statistics are written to the standard error stream and include + two lines for each filter that was used: one for input and one + for output. The following fields are displayed: + +

+

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Field Name | Description
Method | This is the name of the method as defined with H5Zregister() with the characters "<" or ">" prepended to indicate input or output.
Total | The total number of bytes processed by the filter including errors. This is the maximum of the nbytes argument and the return value.
Errors | This field shows the number of bytes of the Total column which can be attributed to errors.
User, System, Elapsed | These are the amounts of user time, system time, and elapsed time in seconds spent in the filter function. Elapsed time is sensitive to system load. These times may be zero on operating systems that don't support the required operations.
Bandwidth | This is the filter bandwidth, which is the total number of bytes processed divided by elapsed time. Since elapsed time is subject to system load the bandwidth numbers cannot always be trusted. Furthermore, the bandwidth includes bytes attributed to errors which may significantly taint the value if the function is able to detect errors without much expense.
+
+ +

+

+ + + + + +
+ Example: Filter Statistics +
+

+H5Z: filter statistics accumulated over life of library:
+   Method     Total  Errors  User  System  Elapsed Bandwidth
+   ------     -----  ------  ----  ------  ------- ---------
+   >deflate  160000   40000  0.62    0.74     1.33 117.5 kBs
+   <deflate  120000       0  0.11    0.00     0.12 1.000 MBs
+	      
+
+
+ +
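The Bandwidth column is described as total bytes divided by elapsed time. A minimal sketch of that arithmetic follows; filter_bandwidth is a hypothetical helper, not a library function, and returning NaN when no elapsed time was recorded is an assumption made here to avoid dividing by zero. Because the printed elapsed times are rounded and the printed bandwidth uses scaled units, recomputing from a statistics line need not reproduce the printed value exactly.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: bytes per second for one statistics line.
 * Returns NaN when the elapsed time is zero (nothing was measured). */
static double
filter_bandwidth(size_t total_bytes, double elapsed_seconds)
{
    assert(elapsed_seconds >= 0.0);
    if (elapsed_seconds == 0.0)
        return NAN;
    return (double)total_bytes / elapsed_seconds;
}
```

For the first line of the example above, 160000 bytes in 1.33 seconds comes to roughly 120 thousand bytes per second.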
+
Robb Matzke
+ + +Last modified: Tue Aug 4 16:04:43 EDT 1998 + + + diff --git a/src/H5public.h b/src/H5public.h index 136328f..5647e36 100644 --- a/src/H5public.h +++ b/src/H5public.h @@ -27,7 +27,7 @@ /* Version numbers */ #define H5_VERS_MAJOR 1 /* For major interface/format changes */ #define H5_VERS_MINOR 0 /* For minor interface/format changes */ -#define H5_VERS_RELEASE 51 /* For tweaks, bug-fixes, or development */ +#define H5_VERS_RELEASE 58 /* For tweaks, bug-fixes, or development */ #define H5check() H5vers_check(H5_VERS_MAJOR,H5_VERS_MINOR, \ H5_VERS_RELEASE) -- cgit v0.12