<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Performance</title>

<!-- #BeginLibraryItem "/ed_libs/styles_UG.lbi" --><link href="ed_styles/UGelect.css" rel="stylesheet" type="text/css">
<!-- #EndLibraryItem --></head>

  <body bgcolor="#FFFFFF">
  
  
<!-- #BeginLibraryItem "/ed_libs/NavBar_UG.lbi" --><hr>
<center>
<table border=0 width=98%>
<tr><td valign=top align=left>
    <a href="index.html">HDF5 documents and links</a>&nbsp;<br>
    <a href="H5.intro.html">Introduction to HDF5</a>&nbsp;<br>
    <a href="RM_H5Front.html">HDF5 Reference Manual</a>&nbsp;<br>   
    <!--
    <a href="Glossary.html">Glossary</a><br>
    -->
</td>
<td valign=top align=right>
    And in this document, the 
    <a href="H5.user.html"><strong>HDF5 User's Guide:</strong></a>&nbsp;&nbsp;&nbsp;&nbsp;
        <br>
        <a href="Files.html">Files</a>&nbsp;&nbsp;
        <a href="Datasets.html">Datasets</a>&nbsp;&nbsp;
        <a href="Datatypes.html">Datatypes</a>&nbsp;&nbsp;
        <a href="Dataspaces.html">Dataspaces</a>&nbsp;&nbsp;
        <a href="Groups.html">Groups</a>&nbsp;&nbsp;
        <br>
        <a href="References.html">References</a>&nbsp;&nbsp;
        <a href="Attributes.html">Attributes</a>&nbsp;&nbsp;
        <a href="Properties.html">Property Lists</a>&nbsp;&nbsp;
        <a href="Errors.html">Error Handling</a>&nbsp;&nbsp;
        <br>
        <a href="Filters.html">Filters</a>&nbsp;&nbsp;
        <a href="Caching.html">Caching</a>&nbsp;&nbsp;
        <a href="Chunking.html">Chunking</a>&nbsp;&nbsp;
        <a href="MountingFiles.html">Mounting Files</a>&nbsp;&nbsp;
        <br>
        <a href="Performance.html">Performance</a>&nbsp;&nbsp;
        <a href="Debugging.html">Debugging</a>&nbsp;&nbsp;
        <a href="Environment.html">Environment</a>&nbsp;&nbsp;
        <a href="ddl.html">DDL</a>&nbsp;&nbsp;
</td></tr>
</table>
</center>
<hr>
<!-- #EndLibraryItem --><h1>Performance Analysis and Issues</h1>

    <h2>1. Introduction</h2>

    <p>This section briefly discusses performance issues in HDF5 
      and performance analysis tools for HDF5, or points to 
      such discussions elsewhere.

    <h2>2. Dataset Chunking</h2>

      Appropriate dataset chunking can make a significant difference
      in HDF5 performance.  This topic is discussed in 
      <a href="Chunking.html">Dataset Chunking Issues</a> elsewhere
      in this <cite>User's Guide</cite>.

    <a name="Freespace">
    <h2>3. Freespace Management</h2>
    </a>

     <p>HDF5 does not yet manage freespace as effectively as it might.
      While a file is open, the library actively tracks and re-uses
      <em>freespace</em>, i.e., space that is freed (or released) 
      during the run.  
      But the library does not yet manage freespace across the 
      closing and reopening of a file; when a file is closed, 
      all knowledge of available freespace is lost.  
      What was freespace becomes an unusable <em>hole</em> in the file.

     <p>There are several circumstances that can result in freespace 
      in an HDF5 file:
      <ul>
      <li>Reading then rewriting a dataset or compressed dataset 
        chunk.<sup><a href="#footcchunk">1</a></sup>  
        <ul>
        <li>If the rewritten dataset or compressed chunk is the same 
          size as or smaller than the original, it will be written 
          to the same file location.  
        <li>If, however, the dataset or compressed chunk is larger 
          than the original, it will be written contiguously elsewhere 
          in the file, leaving freespace at the original location.
        <li>If the rewritten dataset or compressed chunk is 
          substantially smaller than the original, the remaining 
          space will be released and identified as freespace.
        </ul>
      <li>Deleting (or unlinking) a dataset or group.
        <ul>
        <li>If an object, such as a dataset, group, or named datatype, 
          is deleted (normally with <code>H5Gunlink</code>), 
          the space previously occupied by the object is released 
          and identified as freespace.
        </ul>
      </ul>

     <p>As stated above, freespace is not managed across the 
      closing and reopening of an HDF5 file; file space that was 
      known freespace while the file remained open becomes an 
      inaccessible hole when the file is closed.  
      Thus, if a file is often closed and reopened, datasets 
      frequently rewritten, or groups and/or datasets frequently 
      added and deleted, that file can develop large numbers of 
      holes and grow unnecessarily large.  This can, in turn, 
      seriously impair application or library performance 
      as the file ages.
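
     <p>The effect described above can be illustrated with a toy model. 
      This is only a sketch under simplifying assumptions, not the 
      library's actual allocator: the names <code>ToyFile</code>, 
      <code>close_and_reopen</code>, and <code>churn</code> are 
      hypothetical, allocation is first-fit, and the free list exists 
      only in memory, so it is discarded whenever the file is closed.

```python
class ToyFile:
    """Toy model of a file whose free-space list lives only in memory."""

    def __init__(self):
        self.size = 0    # end-of-file offset, in bytes
        self.free = []   # (offset, length) extents currently known to be free
        self.holes = 0   # bytes of freespace forgotten at close time

    def allocate(self, nbytes):
        # First fit: reuse a tracked free extent if one is large enough.
        for i, (off, length) in enumerate(self.free):
            if length >= nbytes:
                if length == nbytes:
                    del self.free[i]
                else:
                    self.free[i] = (off + nbytes, length - nbytes)
                return off
        # Nothing suitable is tracked, so extend the file instead.
        off = self.size
        self.size += nbytes
        return off

    def release(self, offset, nbytes):
        # Freed space is remembered only while the file stays open.
        self.free.append((offset, nbytes))

    def close_and_reopen(self):
        # All knowledge of freespace is lost; the extents become holes.
        self.holes += sum(length for _, length in self.free)
        self.free = []


def churn(reopen_each_cycle, cycles=10, nbytes=1000):
    """Delete and re-create one object repeatedly; return (size, holes)."""
    f = ToyFile()
    off = f.allocate(nbytes)
    for _ in range(cycles):
        f.release(off, nbytes)
        if reopen_each_cycle:
            f.close_and_reopen()
        off = f.allocate(nbytes)
    return f.size, f.holes


kept_open = churn(reopen_each_cycle=False)
reopened = churn(reopen_each_cycle=True)
print(kept_open)  # (1000, 0)
print(reopened)   # (11000, 10000)
```

     <p>In this model the file that stays open keeps re-using the same 
      1000 bytes, while the file that is closed and reopened on every 
      cycle grows by the size of the object each time and ends up 
      consisting mostly of holes.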

     <p>An <code>h5pack</code> utility would enable <em>packing</em> 
      a file to remove the holes, but writing such a utility to 
      universally pack the file correctly is a complex task and the 
      HDF5 development team has not to date had the resources to 
      complete the task.

     <p>For application developers or researchers who find themselves 
      working with files that become bloated in this manner, there 
      are, at this time, two remedies:
      <ul>
        <li><code>H5view</code>, an HDF5 Java tool, allows the user 
          to open a file and, using the <code>Save As...</code> feature, 
          save the file under a new filename.  The new file can then 
          be closed and will be a packed version of the original file.  
          This approach is reasonably reliable, but with two caveats:
        <ul>
          <li>It is not automated.
          <li>This ability is a side-effect of the tool's design;
            it was not designed for this purpose and this approach 
            to file packing has not been exhaustively tested. 
        </ul>
        <li>An application developer or researcher can write a utility 
          that is tuned to their data and file structures.  This
          utility can then read in a file, copy the structures and
          datasets to a new file, and write the new file to storage.  
          This will eliminate the holes, making the new file a 
          fully-packed version of the original file. 
      </ul>
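
     <p>The reason a copy into a new file removes the holes can be 
      sketched with the same kind of toy bookkeeping. This is an 
      illustration only, not HDF5 code: the function <code>pack</code>, 
      the object names, and the offsets are all hypothetical. Only the 
      live objects are copied, and each one is written directly after 
      the previous one, so nothing in the new file corresponds to the 
      old holes.

```python
def pack(live_objects):
    """Lay out live objects back to back, as a copy into a new file would.

    live_objects maps an object name to its (offset, length) in the old file.
    Returns (new_layout, new_size).
    """
    new_layout = {}
    cursor = 0
    # Visit objects in old-file order; holes between them are simply skipped.
    for name, (_old_offset, length) in sorted(
        live_objects.items(), key=lambda item: item[1][0]
    ):
        new_layout[name] = (cursor, length)
        cursor += length
    return new_layout, cursor


# Hypothetical old file: 9000 bytes long, but only 4000 bytes are live data.
old_size = 9000
old_layout = {"/a": (0, 1000), "/g/b": (4000, 2000), "/g/c": (8000, 1000)}

new_layout, new_size = pack(old_layout)
print(new_layout["/g/b"])   # (1000, 2000)
print(new_size)             # 4000
print(old_size - new_size)  # 5000 bytes of holes eliminated
```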

     <a name="footcchunk">
     <p></a>
      <sup>1</sup>
      <font size=-1>
        This is a problem only with compressed chunks.
        The compression ratio of data is highly dependent on the data 
        itself; regardless of whether the <em>size</em> of the data 
        changes, the size of the compressed data can change substantially 
        as the data changes.  Uncompressed chunks do not vary in size, 
        so this issue does not arise.
      </font>
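
     <p>The point made in the footnote can be checked with a few lines 
      of Python. This sketch uses the standard <code>zlib</code> module 
      as a stand-in for whatever compression a dataset uses. The chunk 
      size and the two sample buffers are arbitrary assumptions chosen 
      for illustration. Both buffers have exactly the same uncompressed 
      size, yet their compressed sizes differ widely.

```python
import random
import zlib

CHUNK_BYTES = 64 * 1024  # uncompressed chunk size; an arbitrary choice

# A chunk of identical bytes compresses extremely well.
uniform = bytes(CHUNK_BYTES)

# A chunk of pseudo-random bytes is close to incompressible.
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(CHUNK_BYTES))

uniform_z = zlib.compress(uniform)
noisy_z = zlib.compress(noisy)

print("uncompressed:", len(uniform), len(noisy))
print("compressed:  ", len(uniform_z), len(noisy_z))
```

     <p>If a chunk that once held the uniform data is rewritten with 
      the noisy data, its compressed form no longer fits in the space 
      it originally occupied, which is the situation in which a 
      rewritten chunk must be placed elsewhere in the file.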

    <h2>4. Use of the Pablo Instrumentation of HDF5</h2>

      Pablo HDF5 Trace software provides a means of measuring the 
      performance of programs using HDF5. 

    <p>The Pablo software consists 
      of an instrumented copy of the HDF5 library, the Pablo Trace and 
      Trace Extensions libraries, and some utilities for processing the 
      output.  The instrumented version of the HDF5 library has hooks 
      inserted into the HDF5 code which call routines in the Pablo Trace 
      library just after entry to each instrumented HDF5 routine and 
      just prior to exit from the routine.  The Pablo Trace Extension 
      library has programs that track the I/O activity between the 
      entry and exit of the HDF5 routine during execution.  
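
    <p>The entry and exit hooks can be pictured with a small analogy. 
      The following sketch is not the Pablo API and does not involve 
      HDF5. It is a hypothetical Python decorator, <code>traced</code>, 
      that records a timestamp just after entry to a routine and 
      another just prior to exit, and appends one record per call to 
      an in-memory list standing in for a trace file.

```python
import functools
import time

trace_records = []  # in-memory stand-in for a trace file


def traced(func):
    """Wrap func so that every call leaves an entry/exit record."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # hook just after entry
        try:
            return func(*args, **kwargs)
        finally:
            end = time.perf_counter()  # hook just prior to exit
            trace_records.append((func.__name__, start, end))

    return wrapper


@traced
def write_block(buf, data):
    """Hypothetical I/O routine: append data to an in-memory buffer."""
    buf.extend(data)
    return len(data)


buf = bytearray()
write_block(buf, b"abc")
write_block(buf, b"de")

for name, start, end in trace_records:
    print(name, f"{(end - start) * 1e6:.1f} us")
```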

    <p>A few lines of code must be inserted in the user's main program 
      to enable tracing and to specify which HDF5 procedures are to be 
      traced.  The program is linked with the special HDF5 and Pablo 
      libraries to produce an executable.  Running this executable on 
      a single processor produces an output file called the trace file, 
      which contains Pablo Self-Defining Data Format (SDDF) records.  
      The HDF5 Analysis Utilities can later be used to interpret the 
      SDDF records in the trace file and produce a report describing 
      the HDF5 I/O activity that occurred during execution.  

    <p>For further instructions, see the file <code>READ_ME</code> 
      in the <code> $(toplevel)/hdf5/pablo/ </code> subdirectory of 
      the HDF5 source code distribution. 

    <p>For further information about Pablo and the 
      Self-Defining Data Format, visit the Pablo website at
      <code><a href="http://www-pablo.cs.uiuc.edu/">http://www-pablo.cs.uiuc.edu/</a></code>.</p>


<!-- #BeginLibraryItem "/ed_libs/NavBar_UG.lbi" --><hr>
<center>
<table border=0 width=98%>
<tr><td valign=top align=left>
    <a href="index.html">HDF5 documents and links</a>&nbsp;<br>
    <a href="H5.intro.html">Introduction to HDF5</a>&nbsp;<br>
    <a href="RM_H5Front.html">HDF5 Reference Manual</a>&nbsp;<br>   
    <!--
    <a href="Glossary.html">Glossary</a><br>
    -->
</td>
<td valign=top align=right>
    And in this document, the 
    <a href="H5.user.html"><strong>HDF5 User's Guide:</strong></a>&nbsp;&nbsp;&nbsp;&nbsp;
        <br>
        <a href="Files.html">Files</a>&nbsp;&nbsp;
        <a href="Datasets.html">Datasets</a>&nbsp;&nbsp;
        <a href="Datatypes.html">Datatypes</a>&nbsp;&nbsp;
        <a href="Dataspaces.html">Dataspaces</a>&nbsp;&nbsp;
        <a href="Groups.html">Groups</a>&nbsp;&nbsp;
        <br>
        <a href="References.html">References</a>&nbsp;&nbsp;
        <a href="Attributes.html">Attributes</a>&nbsp;&nbsp;
        <a href="Properties.html">Property Lists</a>&nbsp;&nbsp;
        <a href="Errors.html">Error Handling</a>&nbsp;&nbsp;
        <br>
        <a href="Filters.html">Filters</a>&nbsp;&nbsp;
        <a href="Caching.html">Caching</a>&nbsp;&nbsp;
        <a href="Chunking.html">Chunking</a>&nbsp;&nbsp;
        <a href="MountingFiles.html">Mounting Files</a>&nbsp;&nbsp;
        <br>
        <a href="Performance.html">Performance</a>&nbsp;&nbsp;
        <a href="Debugging.html">Debugging</a>&nbsp;&nbsp;
        <a href="Environment.html">Environment</a>&nbsp;&nbsp;
        <a href="ddl.html">DDL</a>&nbsp;&nbsp;
</td></tr>
</table>
</center>
<hr>
<!-- #EndLibraryItem --><!-- #BeginLibraryItem "/ed_libs/Footer.lbi" --><address>
<a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a> 
<br>
Describes HDF5 Release 1.5, Unreleased Development Branch
</address><!-- #EndLibraryItem -->
 
<!-- Created: Thu Oct 14 16:46:00 CDT 1999 -->
<!-- hhmts start -->
Last modified: 2 August 2001 
<!-- hhmts end -->

</body>
</html>