<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <title>Memory Management in HDF5</title> </head> <body> <h1>Memory Management in HDF5</h1> <!-- ---------------------------------------------------------------- --> <h2>Is a Memory Manager Necessary?</h2> <p>Some form of memory management may be necessary in HDF5 when the various deletion operators are implemented so that the file memory is not permanently orphaned. However, since an HDF5 file was designed with persistent data in mind, the importance of a memory manager is questionable. <p>On the other hand, when certain meta data containers (file glue) grow, they may need to be relocated in order to keep the container contiguous. <blockquote> <b>Example:</b> An object header consists of up to two chunks of contiguous memory. The first chunk is a fixed size at a fixed location when the header link count is greater than one. Thus, inserting additional items into an object header may require the second chunk to expand. When this occurs, the second chunk may need to move to another location in the file, freeing the file memory which that chunk originally occupied. </blockquote> <p>The relocation of meta data containers could potentially orphan a significant amount of file memory if the application has made poor estimates for preallocation sizes. <!-- ---------------------------------------------------------------- --> <h2>Levels of Memory Management</h2> <p>Memory management by the library can be independent of memory management support by the file format. The file format can support no memory management, some memory management, or full memory management. Similarly with the library. <h3>Support in the Library</h3> <dl> <dt><b>No Support: I</b> <dd>When memory is deallocated it simply becomes unreferenced (orphaned) in the file. Memory allocation requests are satisfied by extending the file. <dd>A separate off-line utility can be used to detect the unreferenced bytes of a file and "bubble" them up to the end of the file and then truncate the file. <dt><b>Some Support: II</b> <dd>The library could support partial memory management all the time, or full memory management some of the time. Orphaning free blocks instead of adding them to a free list should not affect the file integrity, nor should fulfilling new requests by extending the file instead of using the free list. <dt><b>Full Support: III</b> <dd>The library supports space-efficient memory management by always fulfilling allocation requests from the free list when possible, and by coalescing adjacent free blocks into a single larger free block. </dl> <h3>Support in the File Format</h3> <dl> <dt><b>No Support: A</b> <dd>The file format does not support memory management; any unreferenced block in the file is assumed to be free. If the library supports full memory management then it will have to traverse the entire file to determine which blocks are unreferenced. <dt><b>Some Support: B</b> <dd>Assuming that unreferenced blocks are free can be dangerous in a situation where the file is not consistent. For instance, if a directory tree becomes detached from the main directory hierarchy, then the detached directory and everything that is referenced only through the detached directory become unreferenced. File repair utilities will be unable to determine which unreferenced blocks need to be linked back into the file hierarchy. <dd>Therefore, it might be useful to keep an unsorted, doubly-linked list of free blocks in the file. The library can add and remove blocks from the list in constant time, and can generate its own internal free-block data structure in time proportional to the number of free blocks instead of the size of the file. Additionally, a library can use a subset of the free blocks, an alternative which is not feasible if the file format doesn't support any form of memory management. <dt><b>Full Support: C</b> <dd>The file format can mirror library data structures for space-efficient memory management. The free blocks are linked in unsorted, doubly-linked lists with one list per free block size. The heads of the lists are pointed to by a B-tree whose nodes are sorted by free block size. At the same time, all free blocks are the leaf nodes of another B-tree sorted by starting and ending address. When the trees are used in combination we can deallocate and allocate memory in O(log <em>N</em>) time where <em>N</em> is the number of free blocks. </dl> <h3>Combinations of Library and File Format Support</h3> <p>We now evaluate each combination of library support with file support: <dl> <dt><b>I-A</b> <dd>If neither the library nor the file support memory management, then each allocation request will come from the end of the file and each deallocation request is a no-op that simply leaves the free block unreferenced. <ul> <li>Advantages <ul> <li>No file overhead for allocation or deallocation. <li>No library overhead for allocation or deallocation. <li>No file traversal required at time of open. <li>No data needs to be written back to the file when it's closed. <li>Trivial to implement (already implemented). </ul> <li>Disadvantages <ul> <li>Inefficient use of file space. <li>A file repair utility must reclaim lost file space. <li>Difficulties for file repair utilities. (Is an unreferenced block a free block or orphaned data?) </ul> </ul> <dt><b>II-A</b> <dd>In order for the library to support memory management, it will be required to build the internal free block representation by traversing the entire file looking for unreferenced blocks. <ul> <li>Advantages <ul> <li>No file overhead for allocation or deallocation. <li>Variable amount of library overhead for allocation and deallocation depending on how much work the library wants to do. <li>No data needs to be written back to the file when it's closed. <li>Might use file space efficiently. </ul> <li>Disadvantages <ul> <li>Might use file space inefficiently. <li>File traversal required at time of open. <li>A file repair utility must reclaim lost file space. <li>Difficulties for file repair utilities. <li>Sharing of the free list between processes falls outside the HDF5 file format documentation. </ul> </ul> <dt><b>III-A</b> <dd>In order for the library to support full memory management, it will be required to build the internal free block representation by traversing the entire file looking for unreferenced blocks. <ul> <li>Advantages <ul> <li>No file overhead for allocation or deallocation. <li>Efficient use of file space. <li>No data needs to be written back to the file when it's closed. </ul> <li>Disadvantages <ul> <li>Moderate amount of library overhead for allocation and deallocation. <li>File traversal required at time of open. <li>A file repair utility must reclaim lost file space. <li>Difficulties for file repair utilities. <li>Sharing of the free list between processes falls outside the HDF5 file format documentation. </ul> </ul> <dt><b>I-B</b> <dd>If the library doesn't support memory management but the file format supports some level of management, then a file repair utility will have to be run occasionally to reclaim unreferenced blocks. <ul> <li>Advantages <ul> <li>No file overhead for allocation or deallocation. <li>No library overhead for allocation or deallocation. <li>No file traversal required at time of open. <li>No data needs to be written back to the file when it's closed. </ul> <li>Disadvantages <ul> <li>A file repair utility must reclaim lost file space. <li>Difficulties for file repair utilities. </ul> </ul> <dt><b>II-B</b> <dd>Both the library and the file format support some level of memory management. <ul> <li>Advantages <ul> <li>Constant file overhead per allocation or deallocation. <li>Variable library overhead per allocation or deallocation depending on how much work the library wants to do. <li>Traversal at file open time is on the order of the free list size instead of the file size. <li>The library has the option of reading only part of the free list. <li>No data needs to be written at file close time if it has been amortized into the cost of allocation and deallocation. <li>File repair utilties don't have to be run to reclaim memory. <li>File repair utilities can detect whether an unreferenced block is a free block or orphaned data. <li>Sharing of the free list between processes might be easier. <li>Possible efficient use of file space. </ul> <li>Disadvantages <ul> <li>Possible inefficient use of file space. </ul> </ul> <dt><b>III-B</b> <dd>The library provides space-efficient memory management but the file format only supports an unsorted list of free blocks. <ul> <li>Advantages <ul> <li>Constant time file overhead per allocation or deallocation. <li>No data needs to be written at file close time if it has been amortized into the cost of allocation and deallocation. <li>File repair utilities don't have to be run to reclaim memory. <li>File repair utilities can detect whether an unreferenced block is a free block or orphaned data. <li>Sharing of the free list between processes might be easier. <li>Efficient use of file space. </ul> <li>Disadvantages <ul> <li>O(log <em>N</em>) library overhead per allocation or deallocation where <em>N</em> is the total number of free blocks. <li>O(<em>N</em>) time to open a file since the entire free list must be read to construct the in-core trees used by the library. <li>Library is more complicated. </ul> </ul> <dt><b>I-C</b> <dd>This has the same advantages and disadvantages as I-C with the added disadvantage that the file format is much more complicated. <dt><b>II-C</b> <dd>If the library only provides partial memory management but the file requires full memory management, then this method degenerates to the same as II-A with the added disadvantage that the file format is much more complicated. <dt><b>III-C</b> <dd>The library and file format both provide complete data structures for space-efficient memory management. <ul> <li>Advantages <ul> <li>Files can be opened in constant time since the free list is read on demand and amortised into the allocation and deallocation requests. <li>No data needs to be written back to the file when it's closed. <li>File repair utilities don't have to be run to reclaim memory. <li>File repair utilities can detect whether an unreferenced block is a free block or orphaned data. <li>Sharing the free list between processes is easy. <li>Efficient use of file space. </ul> <li>Disadvantages <ul> <li>O(log <em>N</em>) file allocation and deallocation cost where <em>N</em> is the total number of free blocks. <li>O(log <em>N</em>) library allocation and deallocation cost. <li>Much more complicated file format. <li>More complicated library. </ul> </ul> </dl> <!-- ---------------------------------------------------------------- --> <h2>The Algorithm for II-B</h2> <p>The file contains an unsorted, doubly-linked list of free blocks. The address of the head of the list appears in the super block. Each free block contains the following fields: <center> <table border cellpadding=4 width="60%"> <tr align=center> <th width="25%">byte</th> <th width="25%">byte</th> <th width="25%">byte</th> <th width="25%">byte</th> <tr align=center> <th colspan=4>Free Block Signature</th> <tr align=center> <th colspan=4>Total Free Block Size</th> <tr align=center> <th colspan=4>Address of Left Sibling</th> <tr align=center> <th colspan=4>Address of Right Sibling</th> <tr align=center> <th colspan=4><br><br>Remainder of Free Block<br><br><br></th> </table> </center> <p>The library reads as much of the free list as convenient when convenient and pushes those entries onto stacks. This can occur when a file is opened or any time during the life of the file. There is one stack for each free block size and the stacks are sorted by size in a balanced tree in memory. <p>Deallocation involves finding the correct stack or creating a new one (an O(log <em>K</em>) operation where <em>K</em> is the number of stacks), pushing the free block info onto the stack (a constant-time operation), and inserting the free block into the file free block list (a constant-time operation which doesn't necessarily involve any I/O since the free blocks can be cached like other objects). No attempt is made to coalesce adjacent free blocks into larger blocks. <p>Allocation involves finding the correct stack (an O(log <em>K</em>) operation), removing the top item from the stack (a constant-time operation), and removing the block from the file free block list (a constant-time operation). If there is no free block of the requested size or larger, then the file is extended. <p>To provide sharability of the free list between processes, the last step of an allocation will check for the free block signature and if it doesn't find one will repeat the process. Alternatively, a process can temporarily remove free blocks from the file and hold them in it's own private pool. <p>To summarize... <dl> <dt>File opening <dd>O(<em>N</em>) amortized over the time the file is open, where <em>N</em> is the number of free blocks. The library can still function without reading any of the file free block list. <dt>Deallocation <dd>O(log <em>K</em>) where <em>K</em> is the number of unique sizes of free blocks. File access is constant. <dt>Allocation <dd>O(log <em>K</em>). File access is constant. <dt>File closing <dd>O(1) even if the library temporarily removes free blocks from the file to hold them in a private pool since the pool can still be a linked list on disk. </dl> <!-- ---------------------------------------------------------------- --> <h2>The Algorithm for III-C</h2> <p>The HDF5 file format supports a general B-tree mechanism for storing data with keys. If we use a B-tree to represent all parts of the file that are free and the B-tree is indexed so that a free file chunk can be found if we know the starting or ending address, then we can efficiently determine whether a free chunk begins or ends at the specified address. Call this the <em>Address B-Tree</em>. <p>If a second B-tree points to a set of stacks where the members of a particular stack are all free chunks of the same size, and the tree is indexed by chunk size, then we can efficiently find the best-fit chunk size for a memory request. Call this the <em>Size B-Tree</em>. <p>All free blocks of a particular size can be linked together with an unsorted, doubly-linked, circular list and the left and right sibling addresses can be stored within the free chunk, allowing us to remove or insert items from the list in constant time. <p>Deallocation of a block fo file memory consists of: <ol type="I"> <li>Add the new free block whose address is <em>ADDR</em> to the address B-tree. <ol type="A"> <li>If the address B-tree contains an entry for a free block that ends at <em>ADDR</em>-1 then remove that block from the B-tree and from the linked list (if the block was the first on the list then the size B-tree must be updated). Adjust the size and address of the block being freed to include the block just removed from the free list. The time required to search for and possibly remove the left block is O(log <em>N</em>) where <em>N</em> is the number of free blocks. <li>If the address B-tree contains an entry for the free block that begins at <em>ADDR</em>+<em>LENGTH</em> then remove that block from the B-tree and from the linked list (if the block was the first on the list then the size B-tree must be updated). Adjust the size of the block being freed to include the block just removed from the free list. The time required to search for and possibly remove the right block is O(log <em>N</em>). <li>Add the new (adjusted) block to the address B-tree. The time for this operation is O(log <em>N</em>). </ol> <li>Add the new block to the size B-tree and linked list. <ol type="A"> <li>If the size B-tree has an entry for this particular size, then add the chunk to the tail of the list. This is an O(log <em>K</em>) operation where <em>K</em> is the number of unique free block sizes. <li>Otherwise make a new entry in the B-tree for chunks of this size. This is also O(log <em>K</em>). </ol> </ol> <p>Allocation is similar to deallocation. <p>To summarize... <dl> <dt>File opening <dd>O(1) <dt>Deallocation <dd>O(log <em>N</em>) where <em>N</em> is the total number of free blocks. File access time is O(log <em>N</em>). <dt>Allocation <dd>O(log <em>N</em>). File access time is O(log <em>N</em>). <dt>File closing <dd>O(1). </dl> <hr> <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> <!-- Created: Thu Jul 24 15:16:40 PDT 1997 --> <!-- hhmts start --> Last modified: Thu Jul 31 14:41:01 EST <!-- hhmts end --> </body> </html>