doc/html/TechNotes/MemoryMgmt.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Memory Management in HDF5</title>
  </head>

  <body>
      <h1>Memory Management in HDF5</h1>

      <!-- ---------------------------------------------------------------- -->
      <h2>Is a Memory Manager Necessary?</h2>

      <p>Some form of memory management may be necessary in HDF5 when
	the various deletion operators are implemented so that the
	file memory is not permanently orphaned.  However, since an
	HDF5 file was designed with persistent data in mind, the
	importance of a memory manager is questionable.

      <p>On the other hand, when certain meta data containers (file glue)
	grow, they may need to be relocated in order to keep the
	container contiguous.

	<blockquote>
	  <b>Example:</b> An object header consists of up to two
	  chunks of contiguous memory.  The first chunk is a fixed
	  size at a fixed location when the header link count is
	  greater than one.  Thus, inserting additional items into an
	  object header may require the second chunk to expand.  When
	  this occurs, the second chunk may need to move to another
	  location in the file, freeing the file memory which that
	  chunk originally occupied.
      </blockquote>

      <p>The relocation of meta data containers could potentially
	orphan a significant amount of file memory if the application
	has made poor estimates for preallocation sizes.

      <!-- ---------------------------------------------------------------- -->
      <h2>Levels of Memory Management</h2>

      <p>Memory management by the library can be independent of memory
	management support by the file format.  The file format can
	support no memory management, some memory management, or full
	memory management.  Similarly with the library.

      <h3>Support in the Library</h3>

      <dl>
	<dt><b>No Support: I</b>
	<dd>When memory is deallocated it simply becomes unreferenced
	  (orphaned) in the file.  Memory allocation requests are
	  satisfied by extending the file.
	  
	<dd>A separate off-line utility can be used to detect the
	  unreferenced bytes of a file and "bubble" them up to the end
	  of the file and then truncate the file.

	<dt><b>Some Support: II</b>
	<dd>The library could support partial memory management all
	  the time, or full memory management some of the time.
	  Orphaning free blocks instead of adding them to a free list
	  should not affect the file integrity, nor should fulfilling
	  new requests by extending the file instead of using the free
	  list.

	<dt><b>Full Support: III</b>
	<dd>The library supports space-efficient memory management by
	  always fulfilling allocation requests from the free list when
	  possible, and by coalescing adjacent free blocks into a
	  single larger free block.
      </dl>

      <h3>Support in the File Format</h3>
      
      <dl>
	<dt><b>No Support: A</b>
	<dd>The file format does not support memory management; any
	  unreferenced block in the file is assumed to be free.  If
	  the library supports full memory management then it will
	  have to traverse the entire file to determine which blocks
	  are unreferenced.

	<dt><b>Some Support: B</b>
	<dd>Assuming that unreferenced blocks are free can be
	  dangerous in a situation where the file is not consistent.
	  For instance, if a directory tree becomes detached from the
	  main directory hierarchy, then the detached directory and
	  everything that is referenced only through the detached
	  directory become unreferenced.  File repair utilities will
	  be unable to determine which unreferenced blocks need to be
	  linked back into the file hierarchy.

	<dd>Therefore, it might be useful to keep an unsorted,
	  doubly-linked list of free blocks in the file.  The library
	  can add and remove blocks from the list in constant time,
	  and can generate its own internal free-block data structure
	  in time proportional to the number of free blocks instead of
	  the size of the file.  Additionally, a library can use a
	  subset of the free blocks, an alternative which is not
	  feasible if the file format doesn't support any form of
	  memory management.

	<dt><b>Full Support: C</b>
	<dd>The file format can mirror library data structures for
	  space-efficient memory management.  The free blocks are
	  linked in unsorted, doubly-linked lists with one list per
	  free block size.  The heads of the lists are pointed to by a
	  B-tree whose nodes are sorted by free block size.  At the
	  same time, all free blocks are the leaf nodes of another
	  B-tree sorted by starting and ending address.  When the
	  trees are used in combination we can deallocate and allocate
	  memory in O(log <em>N</em>) time where <em>N</em> is the
	  number of free blocks.
      </dl>

      <h3>Combinations of Library and File Format Support</h3>

      <p>We now evaluate each combination of library support with file
	support:

      <dl>
	<dt><b>I-A</b>
	<dd>If neither the library nor the file support memory
	  management, then each allocation request will come from the
	  end of the file and each deallocation request is a no-op
	  that simply leaves the free block unreferenced.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>No file overhead for allocation or deallocation.
		<li>No library overhead for allocation or
		  deallocation.
		<li>No file traversal required at time of open.
		<li>No data needs to be written back to the file when
		  it's closed.
		<li>Trivial to implement (already implemented).
	      </ul>

	    <li>Disadvantages
	      <ul>
		<li>Inefficient use of file space.
		<li>A file repair utility must reclaim lost file space.
		<li>Difficulties for file repair utilities. (Is an
		  unreferenced block a free block or orphaned data?)
	      </ul>
	  </ul>

	<dt><b>II-A</b>
	<dd>In order for the library to support memory management, it
	  will be required to build the internal free block
	  representation by traversing the entire file looking for
	  unreferenced blocks.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>No file overhead for allocation or deallocation.
		<li>Variable amount of library overhead for allocation
		  and deallocation depending on how much work the
		  library wants to do.
		<li>No data needs to be written back to the file when
		  it's closed.
		<li>Might use file space efficiently.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>Might use file space inefficiently.
		<li>File traversal required at time of open.
		<li>A file repair utility must reclaim lost file space.
		<li>Difficulties for file repair utilities.
		<li>Sharing of the free list between processes falls
		  outside the HDF5 file format documentation.
	      </ul>
	  </ul>

	<dt><b>III-A</b>
	<dd>In order for the library to support full memory
	  management, it will be required to build the internal free
	  block representation by traversing the entire file looking
	  for unreferenced blocks.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>No file overhead for allocation or deallocation.
		<li>Efficient use of file space.
		<li>No data needs to be written back to the file when
		  it's closed.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>Moderate amount of library overhead for allocation
		  and deallocation.
		<li>File traversal required at time of open.
		<li>A file repair utility must reclaim lost file space.
		<li>Difficulties for file repair utilities.
		<li>Sharing of the free list between processes falls
		  outside the HDF5 file format documentation.
	      </ul>
	  </ul>

	<dt><b>I-B</b>
	<dd>If the library doesn't support memory management but the
	  file format supports some level of management, then a file
	  repair utility will have to be run occasionally to reclaim
	  unreferenced blocks.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>No file overhead for allocation or deallocation.
		<li>No library overhead for allocation or
		  deallocation.
		<li>No file traversal required at time of open.
		<li>No data needs to be written back to the file when
		  it's closed.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>A file repair utility must reclaim lost file space.
		<li>Difficulties for file repair utilities.
	      </ul>
	  </ul>

	<dt><b>II-B</b>
	<dd>Both the library and the file format support some level
	  of memory management.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>Constant file overhead per allocation or
		  deallocation.
		<li>Variable library overhead per allocation or
		  deallocation depending on how much work the library
		  wants to do.
		<li>Traversal at file open time is on the order of the
		  free list size instead of the file size.
		<li>The library has the option of reading only part of
		  the free list.
		<li>No data needs to be written at file close time if
		  it has been amortized into the cost of allocation
		  and deallocation.
		<li>File repair utilties don't have to be run to
		  reclaim memory.
		<li>File repair utilities can detect whether an
		  unreferenced block is a free block or orphaned data.
		<li>Sharing of the free list between processes might
		  be easier.
		<li>Possible efficient use of file space.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>Possible inefficient use of file space.
	      </ul>
	  </ul>

	<dt><b>III-B</b>
	<dd>The library provides space-efficient memory management but
	  the file format only supports an unsorted list of free
	  blocks.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>Constant time file overhead per allocation or
		  deallocation.
		<li>No data needs to be written at file close time if
		  it has been amortized into the cost of allocation
		  and deallocation.
		<li>File repair utilities don't have to be run to
		  reclaim memory.
		<li>File repair utilities can detect whether an
		  unreferenced block is a free block or orphaned data.
		<li>Sharing of the free list between processes might
		  be easier.
		<li>Efficient use of file space.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>O(log <em>N</em>) library overhead per allocation or
		  deallocation where <em>N</em> is the total number of
		  free blocks.
		<li>O(<em>N</em>) time to open a file since the entire
		  free list must be read to construct the in-core
		  trees used by the library.
		<li>Library is more complicated.
	      </ul>
	  </ul>

	<dt><b>I-C</b>
	<dd>This has the same advantages and disadvantages as I-C with
	  the added disadvantage that the file format is much more
	  complicated.

	<dt><b>II-C</b>
	<dd>If the library only provides partial memory management but
	  the file requires full memory management, then this method
	  degenerates to the same as II-A with the added disadvantage
	  that the file format is much more complicated.

	<dt><b>III-C</b>
	<dd>The library and file format both provide complete data
	  structures for space-efficient memory management.

	  <ul>
	    <li>Advantages
	      <ul>
		<li>Files can be opened in constant time since the
		  free list is read on demand and amortised into the
		  allocation and deallocation requests.
		<li>No data needs to be written back to the file when
		  it's closed.
		<li>File repair utilities don't have to be run to
		  reclaim memory.
		<li>File repair utilities can detect whether an
		  unreferenced block is a free block or orphaned data.
		<li>Sharing the free list between processes is easy.
		<li>Efficient use of file space.
	      </ul>
	    <li>Disadvantages
	      <ul>
		<li>O(log <em>N</em>) file allocation and deallocation
		  cost where <em>N</em> is the total number of free
		  blocks.
		<li>O(log <em>N</em>) library allocation and
		  deallocation cost.
		<li>Much more complicated file format.
		<li>More complicated library.
	      </ul>
	  </ul>

      </dl>

      <!-- ---------------------------------------------------------------- -->
      <h2>The Algorithm for II-B</h2>

      <p>The file contains an unsorted, doubly-linked list of free
	blocks.  The address of the head of the list appears in the
	boot block.  Each free block contains the following fields:

      <center>
      <table border cellpadding=4 width="60%">
	<tr align=center>
	  <th width="25%">byte</th>
	  <th width="25%">byte</th>
	  <th width="25%">byte</th>
	  <th width="25%">byte</th>

	<tr align=center>
	  <th colspan=4>Free Block Signature</th>

	<tr align=center>
	  <th colspan=4>Total Free Block Size</th>

	<tr align=center>
	  <th colspan=4>Address of Left Sibling</th>

	<tr align=center>
	  <th colspan=4>Address of Right Sibling</th>

	<tr align=center>
	  <th colspan=4><br><br>Remainder of Free Block<br><br><br></th>
      </table>
      </center>
      
      <p>The library reads as much of the free list as convenient when
	convenient and pushes those entries onto stacks.  This can
	occur when a file is opened or any time during the life of the
	file. There is one stack for each free block size and the
	stacks are sorted by size in a balanced tree in memory.

      <p>Deallocation involves finding the correct stack or creating
	a new one (an O(log <em>K</em>) operation where <em>K</em> is
	the number of stacks), pushing the free block info onto the
	stack (a constant-time operation), and inserting the free
	block into the file free block list (a constant-time operation
	which doesn't necessarily involve any I/O since the free blocks
	can be cached like other objects).  No attempt is made to
	coalesce adjacent free blocks into larger blocks.

      <p>Allocation involves finding the correct stack (an O(log
	<em>K</em>) operation), removing the top item from the stack
	(a constant-time operation), and removing the block from the
	file free block list (a constant-time operation).  If there is
	no free block of the requested size or larger, then the file
	is extended.

      <p>To provide sharability of the free list between processes,
	the last step of an allocation will check for the free block
	signature and if it doesn't find one will repeat the process.
	Alternatively, a process can temporarily remove free blocks
	from the file and hold them in it's own private pool.

      <p>To summarize...
	<dl>
	<dt>File opening
	<dd>O(<em>N</em>) amortized over the time the file is open,
	  where <em>N</em> is the number of free blocks.  The library
	  can still function without reading any of the file free
	  block list.

	<dt>Deallocation
	<dd>O(log <em>K</em>) where <em>K</em> is the number of unique
	  sizes of free blocks.  File access is constant.

	<dt>Allocation
	<dd>O(log <em>K</em>).  File access is constant.

	<dt>File closing
	<dd>O(1) even if the library temporarily removes free
	  blocks from the file to hold them in a private pool since
	  the pool can still be a linked list on disk.
      </dl>

      <!-- ---------------------------------------------------------------- -->
      <h2>The Algorithm for III-C</h2>

      <p>The HDF5 file format supports a general B-tree mechanism
	for storing data with keys.  If we use a B-tree to represent
	all parts of the file that are free and the B-tree is indexed
	so that a free file chunk can be found if we know the starting
	or ending address, then we can efficiently determine whether a
	free chunk begins or ends at the specified address.  Call this
	the <em>Address B-Tree</em>.

      <p>If a second B-tree points to a set of stacks where the
	members of a particular stack are all free chunks of the same
	size, and the tree is indexed by chunk size, then we can
	efficiently find the best-fit chunk size for a memory request.
	Call this the <em>Size B-Tree</em>.

      <p>All free blocks of a particular size can be linked together
	with an unsorted, doubly-linked, circular list and the left
	and right sibling addresses can be stored within the free
	chunk, allowing us to remove or insert items from the list in
	constant time.

      <p>Deallocation of a block fo file memory consists of:

	<ol type="I">
	<li>Add the new free block whose address is <em>ADDR</em> to the
	  address B-tree.

	  <ol type="A">
	    <li>If the address B-tree contains an entry for a free
	      block that ends at <em>ADDR</em>-1 then remove that
	      block from the B-tree and from the linked list (if the
	      block was the first on the list then the size B-tree
	      must be updated).  Adjust the size and address of the
	      block being freed to include the block just removed from
	      the free list.  The time required to search for and
	      possibly remove the left block is O(log <em>N</em>)
	      where <em>N</em> is the number of free blocks.

	    <li>If the address B-tree contains an entry for the free
	      block that begins at <em>ADDR</em>+<em>LENGTH</em> then
	      remove that block from the B-tree and from the linked
	      list (if the block was the first on the list then the
	      size B-tree must be updated).  Adjust the size of the
	      block being freed to include the block just removed from
	      the free list.  The time required to search for and
	      possibly remove the right block is O(log <em>N</em>).

	    <li>Add the new (adjusted) block to the address B-tree.
	      The time for this operation is O(log <em>N</em>).
	  </ol>

	<li>Add the new block to the size B-tree and linked list.

	  <ol type="A">
	    <li>If the size B-tree has an entry for this particular
	      size, then add the chunk to the tail of the list. This
	      is an O(log <em>K</em>) operation where <em>K</em> is
	      the number of unique free block sizes.

	    <li>Otherwise make a new entry in the B-tree for chunks of
	      this size.  This is also O(log <em>K</em>).
	  </ol>
      </ol>

      <p>Allocation is similar to deallocation.

      <p>To summarize...

      <dl>
	<dt>File opening
	<dd>O(1)

	<dt>Deallocation
	<dd>O(log <em>N</em>) where <em>N</em> is the total number of
	  free blocks.  File access time is O(log <em>N</em>).

	<dt>Allocation
	<dd>O(log <em>N</em>).  File access time is O(log <em>N</em>).

	<dt>File closing
	<dd>O(1).
      </dl>


      <hr>
      <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Thu Jul 24 15:16:40 PDT 1997 -->
<!-- hhmts start -->
Last modified: Thu Jul 31 14:41:01 EST 
<!-- hhmts end -->
  </body>
</html>