author | Frank Baker <fbaker@hdfgroup.org> | 2001-07-11 22:01:45 (GMT) |
---|---|---|
committer | Frank Baker <fbaker@hdfgroup.org> | 2001-07-11 22:01:45 (GMT) |
commit | 7c706d9d1447319ef86f4725344f02dae3db7bb9 (patch) | |
tree | 4ecc4d3562c53ea6e802ce0eeaf85c22cafd54eb /doc/html/Performance.html | |
parent | 4b218c6a58dade580bbe876bec1ed49d1ede1c55 (diff) | |
[svn-r4193] Purpose:
New section -- "Freespace Management"
Description:
Added "Freespace Management" section.
Minor formatting.
Platforms tested:
IE 5
Diffstat (limited to 'doc/html/Performance.html')
-rw-r--r-- | doc/html/Performance.html | 102 |
1 files changed, 95 insertions, 7 deletions
diff --git a/doc/html/Performance.html b/doc/html/Performance.html
index f3c3a28..36accbf 100644
--- a/doc/html/Performance.html
+++ b/doc/html/Performance.html
@@ -58,12 +58,100 @@
 <h2>2. Dataset Chunking</h2>
- Appropriate dataset chunking can make a siginificant difference
- in HDF5 performance. This topic is discussed in
- <a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
- in this <cite>User's Guide</cite>.
-
- <h2>3. Use of the Pablo Instrumentation of HDF5</h2>
+ Appropriate dataset chunking can make a siginificant difference
+ in HDF5 performance. This topic is discussed in
+ <a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
+ in this <cite>User's Guide</cite>.
+
+ <h2>3. Freespace Management</h2>
+
+ <p>HDF5 does not yet manage freespace as effectively as it might.
+ While a file is opened, the library actively tracks and re-uses
+ <em>freespace</em>, i.e., space that is freed (or released)
+ during the run.
+ But the library does not yet manage freespace across the
+ closing and reopening of a file; when a file is closed,
+ all knowledge of available freespace is lost.
+ What was freespace becomes an unusable <em>hole</em> in the file.
+
+ <p>There are several circumstances that can result in freespace
+ in an HDF5 file:
+ <ul>
+ <li>Reading then rewriting a dataset or compressed dataset
+ chunk.<sup><a href="#footcchunk">1</a></sup>
+ <ul>
+ <li>If the rewritten dataset or compressed chunk is the same
+ size as or smaller than the original, it will be written
+ to the same file location.
+ <li>If, however, the dataset or compressed chunk is larger
+ than the original, it will be written contiguously elsewhere
+ in the file, leaving freespace at the original location.
+ <li>If the rewritten dataset or compressed chunk is
+ substantially smaller than the original, the remaining
+ space will be released and identified as freespace.
+ </ul>
+ <li>Deleting (or unlinking) a dataset or group.
+ <ul>
+ <li>If an object, such as a dataset, group, or named datatype,
+ is deleted (normally with <code>H5Gunlink</code>),
+ the space previously occupied by the object is released
+ and identified as freespace.
+ </ul>
+ </ul>
+
+ <p>As stated above, freespace is not managed across the
+ closing and reopening of an HDF5 file; file space that was
+ known freespace while the file remained open becomes an
+ inaccessible hole when the file is closed.
+ Thus, if a file is often closed and reopened, datasets
+ frequently rewritten, or groups and/or datasets frequently
+ added and deleted, that file can develop large numbers of
+ holes and grow unnecessarily large. This can, in turn,
+ seriously impair application or library performance
+ as the file ages.
+
+ <p>An <code>h5pack</code> utility would enable <em>packing</em>
+ a file to remove the holes, but writing such a utility to
+ universally pack the file correctly is a complex task and the
+ HDF5 development team has not to date had the resources to
+ complete the task.
+
+ <p>For application developers or researchers who find themselves
+ working with files that become bloated in this manner, there
+ are, at this time, two remedies:
+ <ul>
+ <li><code>H5view</code>, an HDF5 Java tool, allows the user
+ to open a file and, using the <code>Save As...</code> feature,
+ save the file under a new filename. The new file can then
+ be closed and will be a packed version of the original file.
+ This approach is reasonably reliable, but with two caveats:
+ <ul>
+ <li>It is not automated.
+ <li>This ability is a side-effect of the tool's design;
+ it was not designed for this purpose and this approach
+ to file packing has not been exhaustively tested.
+ </ul>
+ <li>An application developer or researcher can write a utility
+ that is tuned to their data and file structures. This
+ untility can then read in a file, copy the structures and
+ datasets to a new file, and write the new file to storage.
+ This will eliminate the holes, making the new file a
+ fully-packed version of the original file.
+ </ul>
+
+ <a name="footcchunk">
+ <p></a>
+ <sup>1</sup>
+ <font size=-1>
+ This is a problem only with compressed chunks.
+ The compression ratio of data is highly dependent on the data
+ itself; regardless of whether the <em>size</em> of the data
+ changes, the size of the compressed data change substantially
+ as the data changes. Uncompressed chunks do not vary in size,
+ so this issue does not arise.
+ </font>
+
+ <h2>4. Use of the Pablo Instrumentation of HDF5</h2>
 Pablo HDF5 Trace software provides a means of measuring the performance of programs using HDF5.
@@ -147,7 +235,7 @@
 <!-- Created: Thu Oct 14 16:46:00 CDT 1999 -->
 <!-- hhmts start -->
-Last modified: 14 October 1999
+Last modified: 11 July 2001
 <!-- hhmts end -->
 <br>