author Frank Baker <fbaker@hdfgroup.org> 2001-07-11 22:01:45 (GMT)
committer Frank Baker <fbaker@hdfgroup.org> 2001-07-11 22:01:45 (GMT)
commit 7c706d9d1447319ef86f4725344f02dae3db7bb9
tree 4ecc4d3562c53ea6e802ce0eeaf85c22cafd54eb
parent 4b218c6a58dade580bbe876bec1ed49d1ede1c55
[svn-r4193] Purpose:
    New section -- "Freespace Management"
Description:
    Added "Freespace Management" section.
    Minor formatting.
Platforms tested:
    IE 5
-rw-r--r-- doc/html/Performance.html 102
1 files changed, 95 insertions, 7 deletions
diff --git a/doc/html/Performance.html b/doc/html/Performance.html
index f3c3a28..36accbf 100644
--- a/doc/html/Performance.html
+++ b/doc/html/Performance.html
@@ -58,12 +58,100 @@
<h2>2. Dataset Chunking</h2>
- Appropriate dataset chunking can make a significant difference
- in HDF5 performance. This topic is discussed in
- <a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
- in this <cite>User's Guide</cite>.
-
- <h2>3. Use of the Pablo Instrumentation of HDF5</h2>
+ Appropriate dataset chunking can make a significant difference
+ in HDF5 performance. This topic is discussed in
+ <a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
+ in this <cite>User's Guide</cite>.
+
+ <h2>3. Freespace Management</h2>
+
+ <p>HDF5 does not yet manage freespace as effectively as it might.
+ While a file is open, the library actively tracks and reuses
+ <em>freespace</em>, i.e., space that is freed (or released)
+ during the run.
+ But the library does not yet manage freespace across the
+ closing and reopening of a file; when a file is closed,
+ all knowledge of available freespace is lost.
+ What was freespace becomes an unusable <em>hole</em> in the file.
+
+ <p>There are several circumstances that can result in freespace
+ in an HDF5 file:
+ <ul>
+ <li>Reading and then rewriting a dataset or compressed dataset
+ chunk.<sup><a href="#footcchunk">1</a></sup>
+ <ul>
+ <li>If the rewritten dataset or compressed chunk is the same
+ size as or smaller than the original, it will be written
+ to the same file location.
+ <li>If, however, the dataset or compressed chunk is larger
+ than the original, it will be written contiguously elsewhere
+ in the file, leaving freespace at the original location.
+ <li>If the rewritten dataset or compressed chunk is
+ substantially smaller than the original, the unused
+ remainder of the original space will be released and
+ identified as freespace.
+ </ul>
+ <li>Deleting (or unlinking) a dataset or group.
+ <ul>
+ <li>If an object, such as a dataset, group, or named datatype,
+ is deleted (normally with <code>H5Gunlink</code>),
+ the space previously occupied by the object is released
+ and identified as freespace.
+ </ul>
+ </ul>
+
+ <p>As stated above, freespace is not managed across the
+ closing and reopening of an HDF5 file; file space that was
+ known freespace while the file remained open becomes an
+ inaccessible hole when the file is closed.
+ Thus, if a file is often closed and reopened, or if its
+ datasets are frequently rewritten or its groups and datasets
+ frequently added and deleted, that file can develop large
+ numbers of holes and grow unnecessarily large. This can, in
+ turn, seriously impair application or library performance
+ as the file ages.
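To make the mechanism above concrete, here is a minimal toy model in C, assuming a simple first-fit allocator over a single file address space. The names `toy_file_t`, `toy_alloc`, `toy_free`, and `toy_close_reopen` are hypothetical; this is an illustration of the behaviour described in this section, not how the HDF5 library actually allocates file space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of file-space allocation; NOT the HDF5
 * implementation. Freed extents are reused while the "file" is open,
 * but the free list is discarded on close, leaving holes. */

#define MAX_FREE 64

typedef struct {
    size_t addr; /* start offset of the extent */
    size_t size; /* length in bytes */
} extent_t;

typedef struct {
    size_t   eof;                 /* current end of file (file size) */
    extent_t free_list[MAX_FREE]; /* freespace known in this session */
    size_t   nfree;               /* number of entries in free_list */
} toy_file_t;

/* Allocate `size` bytes: first fit from the in-memory free list,
 * otherwise extend the file at its end. */
static size_t toy_alloc(toy_file_t *f, size_t size)
{
    for (size_t i = 0; i < f->nfree; i++) {
        extent_t *e = &f->free_list[i];
        if (e->size >= size) {
            size_t addr = e->addr;
            e->addr += size;
            e->size -= size;
            if (e->size == 0) /* drop an exhausted extent */
                f->free_list[i] = f->free_list[--f->nfree];
            return addr;
        }
    }
    size_t addr = f->eof; /* no reusable space: append */
    f->eof += size;
    return addr;
}

/* Release an extent; it is reusable only until the file is closed. */
static void toy_free(toy_file_t *f, size_t addr, size_t size)
{
    assert(f->nfree < MAX_FREE);
    f->free_list[f->nfree].addr = addr;
    f->free_list[f->nfree].size = size;
    f->nfree++;
}

/* Close and reopen: the file keeps its size, but all knowledge of
 * freespace is lost, so freed extents become permanent holes. */
static void toy_close_reopen(toy_file_t *f)
{
    f->nfree = 0;
}
```

In this model, freeing a 100-byte extent and then allocating 100 bytes returns the same address while the file stays open; after `toy_close_reopen`, the same allocation is appended at the end of the file, so the file grows by the size of the forgotten extent.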
+
+ <p>An <code>h5pack</code> utility that <em>packs</em> a file
+ to remove the holes would address this problem, but writing
+ such a utility so that it correctly packs any HDF5 file is a
+ complex task, and the HDF5 development team has not yet had
+ the resources to complete it.
+
+ <p>For application developers or researchers who find themselves
+ working with files that become bloated in this manner, there
+ are, at this time, two remedies:
+ <ul>
+ <li><code>H5view</code>, an HDF5 Java tool, allows the user
+ to open a file and, using the <code>Save As...</code> feature,
+ save the file under a new filename. The new file can then
+ be closed and will be a packed version of the original file.
+ This approach is reasonably reliable, but with two caveats:
+ <ul>
+ <li>It is not automated.
+ <li>This ability is a side-effect of the tool's design;
+ it was not designed for this purpose and this approach
+ to file packing has not been exhaustively tested.
+ </ul>
+ <li>An application developer or researcher can write a utility
+ that is tuned to their data and file structures. This
+ utility can then read in a file, copy the structures and
+ datasets to a new file, and write the new file to storage.
+ This will eliminate the holes, making the new file a
+ fully packed version of the original file.
+ </ul>
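The copy-to-a-new-file remedy works because the copy lays live objects down back to back. A real utility would do this through the HDF5 API (opening each group and dataset, creating its counterpart in the new file, and writing the data), letting the library assign new addresses. The toy below, with hypothetical names `object_extent_t` and `toy_pack` and a file modelled as a plain byte array, only illustrates why the result has no holes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each live object occupies one extent of the
 * old file image; bytes not covered by a live extent are holes. */
typedef struct {
    size_t addr; /* offset in the file image */
    size_t size; /* length in bytes */
} object_extent_t;

/* Copy every live object from old_img into new_img back to back and
 * record each object's new address. Holes are simply never copied.
 * new_img must have room for the sum of all object sizes.
 * Returns the packed file size. */
static size_t toy_pack(const unsigned char *old_img,
                       object_extent_t *objs, size_t nobjs,
                       unsigned char *new_img)
{
    assert(old_img != NULL && new_img != NULL);
    size_t out = 0;
    for (size_t i = 0; i < nobjs; i++) {
        memcpy(new_img + out, old_img + objs[i].addr, objs[i].size);
        objs[i].addr = out; /* anything referring to the object must use this */
        out += objs[i].size;
    }
    return out;
}
```

For example, a 10-byte image holding live objects at offsets 0 (2 bytes), 5 (3 bytes), and 9 (1 byte) packs into 6 bytes, with the objects moved to offsets 0, 2, and 5.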
+
+ <p><a name="footcchunk"></a>
+ <sup>1</sup>
+ <font size=-1>
+ This is a problem only with compressed chunks.
+ The compression ratio of data is highly dependent on the data
+ itself; even when the <em>size</em> of the data does not
+ change, the size of the compressed data can change substantially
+ as the data values change. Uncompressed chunks do not vary in
+ size, so this issue does not arise.
+ </font>
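The footnote's point, that compressed size depends on the data values rather than on the data's length, can be seen with any compressor. The sketch below uses byte-wise run-length encoding as a stand-in; it is not an HDF5 filter (HDF5's usual compression filter is deflate), and `rle_size` and `chunk_fits_in_place` are hypothetical names. The second function encodes only the first two rewrite cases listed above (fits in place, or must be written elsewhere).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in "compressor": byte-wise run-length encoding that emits one
 * (count, value) pair per run, with runs capped at 255. Returns the
 * encoded size in bytes without producing the encoded data. */
static size_t rle_size(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && buf[i + run] == buf[i] && run < 255)
            run++;
        out += 2; /* one (count, value) pair */
        i += run;
    }
    return out;
}

/* Toy rewrite rule: a rewritten compressed chunk can stay at its old
 * location only if it fits in the space the old version occupied;
 * otherwise it is written elsewhere and the old space becomes freespace. */
static int chunk_fits_in_place(size_t old_compressed, size_t new_compressed)
{
    return new_compressed <= old_compressed;
}
```

With this stand-in, eight identical bytes encode to 2 bytes while eight distinct bytes encode to 16, so rewriting a chunk of the first kind with data of the second kind (same uncompressed size) could not stay in place.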
+
+ <h2>4. Use of the Pablo Instrumentation of HDF5</h2>
Pablo HDF5 Trace software provides a means of measuring the
performance of programs using HDF5.
@@ -147,7 +235,7 @@
<!-- Created: Thu Oct 14 16:46:00 CDT 1999 -->
<!-- hhmts start -->
-Last modified: 14 October 1999
+Last modified: 11 July 2001
<!-- hhmts end -->
<br>