diff options
author | Quincey Koziol <koziol@hdfgroup.org> | 1998-07-08 14:54:54 (GMT) |
---|---|---|
committer | Quincey Koziol <koziol@hdfgroup.org> | 1998-07-08 14:54:54 (GMT) |
commit | bd1e676c521d881b3143829f493a28b5ced1294b (patch) | |
tree | 69c50f9fe21ce87f293d8617a6bd51b4cc1e0244 /doc/html/MemoryManagement.html | |
parent | 73345095897d9698bb1f2f7df830bf80a56dc65a (diff) | |
download | hdf5-bd1e676c521d881b3143829f493a28b5ced1294b.zip hdf5-bd1e676c521d881b3143829f493a28b5ced1294b.tar.gz hdf5-bd1e676c521d881b3143829f493a28b5ced1294b.tar.bz2 |
[svn-r467] Restructuring documentation.
Diffstat (limited to 'doc/html/MemoryManagement.html')
-rw-r--r-- | doc/html/MemoryManagement.html | 510 |
1 files changed, 510 insertions, 0 deletions
diff --git a/doc/html/MemoryManagement.html b/doc/html/MemoryManagement.html new file mode 100644 index 0000000..93782b5 --- /dev/null +++ b/doc/html/MemoryManagement.html @@ -0,0 +1,510 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Memory Management in HDF5</title> + </head> + + <body> + <h1>Memory Management in HDF5</h1> + + <!-- ---------------------------------------------------------------- --> + <h2>Is a Memory Manager Necessary?</h2> + + <p>Some form of memory management may be necessary in HDF5 when + the various deletion operators are implemented so that the + file memory is not permanently orphaned. However, since an + HDF5 file was designed with persistent data in mind, the + importance of a memory manager is questionable. + + <p>On the other hand, when certain meta data containers (file glue) + grow, they may need to be relocated in order to keep the + container contiguous. + + <blockquote> + <b>Example:</b> An object header consists of up to two + chunks of contiguous memory. The first chunk is a fixed + size at a fixed location when the header link count is + greater than one. Thus, inserting additional items into an + object header may require the second chunk to expand. When + this occurs, the second chunk may need to move to another + location in the file, freeing the file memory which that + chunk originally occupied. + </blockquote> + + <p>The relocation of meta data containers could potentially + orphan a significant amount of file memory if the application + has made poor estimates for preallocation sizes. + + <!-- ---------------------------------------------------------------- --> + <h2>Levels of Memory Management</h2> + + <p>Memory management by the library can be independent of memory + management support by the file format. The file format can + support no memory management, some memory management, or full + memory management. Similarly with the library. + + <h3>Support in the Library</h3> + + <dl> + <dt><b>No Support: I</b> + <dd>When memory is deallocated it simply becomes unreferenced + (orphaned) in the file. Memory allocation requests are + satisfied by extending the file. + + <dd>A separate off-line utility can be used to detect the + unreferenced bytes of a file and "bubble" them up to the end + of the file and then truncate the file. + + <dt><b>Some Support: II</b> + <dd>The library could support partial memory management all + the time, or full memory management some of the time. + Orphaning free blocks instead of adding them to a free list + should not affect the file integrity, nor should fulfilling + new requests by extending the file instead of using the free + list. + + <dt><b>Full Support: III</b> + <dd>The library supports space-efficient memory management by + always fulfilling allocation requests from the free list when + possible, and by coalescing adjacent free blocks into a + single larger free block. + </dl> + + <h3>Support in the File Format</h3> + + <dl> + <dt><b>No Support: A</b> + <dd>The file format does not support memory management; any + unreferenced block in the file is assumed to be free. If + the library supports full memory management then it will + have to traverse the entire file to determine which blocks + are unreferenced. + + <dt><b>Some Support: B</b> + <dd>Assuming that unreferenced blocks are free can be + dangerous in a situation where the file is not consistent. + For instance, if a directory tree becomes detached from the + main directory hierarchy, then the detached directory and + everything that is referenced only through the detached + directory become unreferenced. File repair utilities will + be unable to determine which unreferenced blocks need to be + linked back into the file hierarchy. + + <dd>Therefore, it might be useful to keep an unsorted, + doubly-linked list of free blocks in the file. The library + can add and remove blocks from the list in constant time, + and can generate its own internal free-block data structure + in time proportional to the number of free blocks instead of + the size of the file. Additionally, a library can use a + subset of the free blocks, an alternative which is not + feasible if the file format doesn't support any form of + memory management. + + <dt><b>Full Support: C</b> + <dd>The file format can mirror library data structures for + space-efficient memory management. The free blocks are + linked in unsorted, doubly-linked lists with one list per + free block size. The heads of the lists are pointed to by a + B-tree whose nodes are sorted by free block size. At the + same time, all free blocks are the leaf nodes of another + B-tree sorted by starting and ending address. When the + trees are used in combination we can deallocate and allocate + memory in O(log <em>N</em>) time where <em>N</em> is the + number of free blocks. + </dl> + + <h3>Combinations of Library and File Format Support</h3> + + <p>We now evaluate each combination of library support with file + support: + + <dl> + <dt><b>I-A</b> + <dd>If neither the library nor the file support memory + management, then each allocation request will come from the + end of the file and each deallocation request is a no-op + that simply leaves the free block unreferenced. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>No library overhead for allocation or + deallocation. + <li>No file traversal required at time of open. + <li>No data needs to be written back to the file when + it's closed. + <li>Trivial to implement (already implemented). + </ul> + + <li>Disadvantages + <ul> + <li>Inefficient use of file space. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. (Is an + unreferenced block a free block or orphaned data?) + </ul> + </ul> + + <dt><b>II-A</b> + <dd>In order for the library to support memory management, it + will be required to build the internal free block + representation by traversing the entire file looking for + unreferenced blocks. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>Variable amount of library overhead for allocation + and deallocation depending on how much work the + library wants to do. + <li>No data needs to be written back to the file when + it's closed. + <li>Might use file space efficiently. + </ul> + <li>Disadvantages + <ul> + <li>Might use file space inefficiently. + <li>File traversal required at time of open. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + <li>Sharing of the free list between processes falls + outside the HDF5 file format documentation. + </ul> + </ul> + + <dt><b>III-A</b> + <dd>In order for the library to support full memory + management, it will be required to build the internal free + block representation by traversing the entire file looking + for unreferenced blocks. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>Efficient use of file space. + <li>No data needs to be written back to the file when + it's closed. + </ul> + <li>Disadvantages + <ul> + <li>Moderate amount of library overhead for allocation + and deallocation. + <li>File traversal required at time of open. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + <li>Sharing of the free list between processes falls + outside the HDF5 file format documentation. + </ul> + </ul> + + <dt><b>I-B</b> + <dd>If the library doesn't support memory management but the + file format supports some level of management, then a file + repair utility will have to be run occasionally to reclaim + unreferenced blocks. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>No library overhead for allocation or + deallocation. + <li>No file traversal required at time of open. + <li>No data needs to be written back to the file when + it's closed. + </ul> + <li>Disadvantages + <ul> + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + </ul> + </ul> + + <dt><b>II-B</b> + <dd>Both the library and the file format support some level + of memory management. + + <ul> + <li>Advantages + <ul> + <li>Constant file overhead per allocation or + deallocation. + <li>Variable library overhead per allocation or + deallocation depending on how much work the library + wants to do. + <li>Traversal at file open time is on the order of the + free list size instead of the file size. + <li>The library has the option of reading only part of + the free list. + <li>No data needs to be written at file close time if + it has been amortized into the cost of allocation + and deallocation. + <li>File repair utilties don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. + <li>Sharing of the free list between processes might + be easier. + <li>Possible efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>Possible inefficient use of file space. + </ul> + </ul> + + <dt><b>III-B</b> + <dd>The library provides space-efficient memory management but + the file format only supports an unsorted list of free + blocks. + + <ul> + <li>Advantages + <ul> + <li>Constant time file overhead per allocation or + deallocation. + <li>No data needs to be written at file close time if + it has been amortized into the cost of allocation + and deallocation. + <li>File repair utilities don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. + <li>Sharing of the free list between processes might + be easier. + <li>Efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>O(log <em>N</em>) library overhead per allocation or + deallocation where <em>N</em> is the total number of + free blocks. + <li>O(<em>N</em>) time to open a file since the entire + free list must be read to construct the in-core + trees used by the library. + <li>Library is more complicated. + </ul> + </ul> + + <dt><b>I-C</b> + <dd>This has the same advantages and disadvantages as I-C with + the added disadvantage that the file format is much more + complicated. + + <dt><b>II-C</b> + <dd>If the library only provides partial memory management but + the file requires full memory management, then this method + degenerates to the same as II-A with the added disadvantage + that the file format is much more complicated. + + <dt><b>III-C</b> + <dd>The library and file format both provide complete data + structures for space-efficient memory management. + + <ul> + <li>Advantages + <ul> + <li>Files can be opened in constant time since the + free list is read on demand and amortised into the + allocation and deallocation requests. + <li>No data needs to be written back to the file when + it's closed. + <li>File repair utilities don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. + <li>Sharing the free list between processes is easy. + <li>Efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>O(log <em>N</em>) file allocation and deallocation + cost where <em>N</em> is the total number of free + blocks. + <li>O(log <em>N</em>) library allocation and + deallocation cost. + <li>Much more complicated file format. + <li>More complicated library. + </ul> + </ul> + + </dl> + + <!-- ---------------------------------------------------------------- --> + <h2>The Algorithm for II-B</h2> + + <p>The file contains an unsorted, doubly-linked list of free + blocks. The address of the head of the list appears in the + boot block. Each free block contains the following fields: + + <center> + <table border cellpadding=4 width="60%"> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <th colspan=4>Free Block Signature</th> + + <tr align=center> + <th colspan=4>Total Free Block Size</th> + + <tr align=center> + <th colspan=4>Address of Left Sibling</th> + + <tr align=center> + <th colspan=4>Address of Right Sibling</th> + + <tr align=center> + <th colspan=4><br><br>Remainder of Free Block<br><br><br></th> + </table> + </center> + + <p>The library reads as much of the free list as convenient when + convenient and pushes those entries onto stacks. This can + occur when a file is opened or any time during the life of the + file. There is one stack for each free block size and the + stacks are sorted by size in a balanced tree in memory. + + <p>Deallocation involves finding the correct stack or creating + a new one (an O(log <em>K</em>) operation where <em>K</em> is + the number of stacks), pushing the free block info onto the + stack (a constant-time operation), and inserting the free + block into the file free block list (a constant-time operation + which doesn't necessarily involve any I/O since the free blocks + can be cached like other objects). No attempt is made to + coalesce adjacent free blocks into larger blocks. + + <p>Allocation involves finding the correct stack (an O(log + <em>K</em>) operation), removing the top item from the stack + (a constant-time operation), and removing the block from the + file free block list (a constant-time operation). If there is + no free block of the requested size or larger, then the file + is extended. + + <p>To provide sharability of the free list between processes, + the last step of an allocation will check for the free block + signature and if it doesn't find one will repeat the process. + Alternatively, a process can temporarily remove free blocks + from the file and hold them in it's own private pool. + + <p>To summarize... + <dl> + <dt>File opening + <dd>O(<em>N</em>) amortized over the time the file is open, + where <em>N</em> is the number of free blocks. The library + can still function without reading any of the file free + block list. + + <dt>Deallocation + <dd>O(log <em>K</em>) where <em>K</em> is the number of unique + sizes of free blocks. File access is constant. + + <dt>Allocation + <dd>O(log <em>K</em>). File access is constant. + + <dt>File closing + <dd>O(1) even if the library temporarily removes free + blocks from the file to hold them in a private pool since + the pool can still be a linked list on disk. + </dl> + + <!-- ---------------------------------------------------------------- --> + <h2>The Algorithm for III-C</h2> + + <p>The HDF5 file format supports a general B-tree mechanism + for storing data with keys. If we use a B-tree to represent + all parts of the file that are free and the B-tree is indexed + so that a free file chunk can be found if we know the starting + or ending address, then we can efficiently determine whether a + free chunk begins or ends at the specified address. Call this + the <em>Address B-Tree</em>. + + <p>If a second B-tree points to a set of stacks where the + members of a particular stack are all free chunks of the same + size, and the tree is indexed by chunk size, then we can + efficiently find the best-fit chunk size for a memory request. + Call this the <em>Size B-Tree</em>. + + <p>All free blocks of a particular size can be linked together + with an unsorted, doubly-linked, circular list and the left + and right sibling addresses can be stored within the free + chunk, allowing us to remove or insert items from the list in + constant time. + + <p>Deallocation of a block fo file memory consists of: + + <ol type="I"> + <li>Add the new free block whose address is <em>ADDR</em> to the + address B-tree. + + <ol type="A"> + <li>If the address B-tree contains an entry for a free + block that ends at <em>ADDR</em>-1 then remove that + block from the B-tree and from the linked list (if the + block was the first on the list then the size B-tree + must be updated). Adjust the size and address of the + block being freed to include the block just removed from + the free list. The time required to search for and + possibly remove the left block is O(log <em>N</em>) + where <em>N</em> is the number of free blocks. + + <li>If the address B-tree contains an entry for the free + block that begins at <em>ADDR</em>+<em>LENGTH</em> then + remove that block from the B-tree and from the linked + list (if the block was the first on the list then the + size B-tree must be updated). Adjust the size of the + block being freed to include the block just removed from + the free list. The time required to search for and + possibly remove the right block is O(log <em>N</em>). + + <li>Add the new (adjusted) block to the address B-tree. + The time for this operation is O(log <em>N</em>). + </ol> + + <li>Add the new block to the size B-tree and linked list. + + <ol type="A"> + <li>If the size B-tree has an entry for this particular + size, then add the chunk to the tail of the list. This + is an O(log <em>K</em>) operation where <em>K</em> is + the number of unique free block sizes. + + <li>Otherwise make a new entry in the B-tree for chunks of + this size. This is also O(log <em>K</em>). + </ol> + </ol> + + <p>Allocation is similar to deallocation. + + <p>To summarize... + + <dl> + <dt>File opening + <dd>O(1) + + <dt>Deallocation + <dd>O(log <em>N</em>) where <em>N</em> is the total number of + free blocks. File access time is O(log <em>N</em>). + + <dt>Allocation + <dd>O(log <em>N</em>). File access time is O(log <em>N</em>). + + <dt>File closing + <dd>O(1). + </dl> + + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Thu Jul 24 15:16:40 PDT 1997 --> +<!-- hhmts start --> +Last modified: Thu Jul 31 14:41:01 EST +<!-- hhmts end --> + </body> +</html> |