diff options
Diffstat (limited to 'doc/html/MemoryManagement.html')
-rw-r--r-- | doc/html/MemoryManagement.html | 510 |
1 files changed, 0 insertions, 510 deletions
diff --git a/doc/html/MemoryManagement.html b/doc/html/MemoryManagement.html deleted file mode 100644 index c93dc10..0000000 --- a/doc/html/MemoryManagement.html +++ /dev/null @@ -1,510 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <title>Memory Management in HDF5</title> - </head> - - <body> - <h1>Memory Management in HDF5</h1> - - <!-- ---------------------------------------------------------------- --> - <h2>Is a Memory Manager Necessary?</h2> - - <p>Some form of memory management may be necessary in HDF5 when - the various deletion operators are implemented so that the - file memory is not permanently orphaned. However, since an - HDF5 file was designed with persistent data in mind, the - importance of a memory manager is questionable. - - <p>On the other hand, when certain meta data containers (file glue) - grow, they may need to be relocated in order to keep the - container contiguous. - - <blockquote> - <b>Example:</b> An object header consists of up to two - chunks of contiguous memory. The first chunk is a fixed - size at a fixed location when the header link count is - greater than one. Thus, inserting additional items into an - object header may require the second chunk to expand. When - this occurs, the second chunk may need to move to another - location in the file, freeing the file memory which that - chunk originally occupied. - </blockquote> - - <p>The relocation of meta data containers could potentially - orphan a significant amount of file memory if the application - has made poor estimates for preallocation sizes. - - <!-- ---------------------------------------------------------------- --> - <h2>Levels of Memory Management</h2> - - <p>Memory management by the library can be independent of memory - management support by the file format. The file format can - support no memory management, some memory management, or full - memory management. Similarly with the library. - - <h3>Support in the Library</h3> - - <dl> - <dt><b>No Support: I</b> - <dd>When memory is deallocated it simply becomes unreferenced - (orphaned) in the file. Memory allocation requests are - satisfied by extending the file. - - <dd>A separate off-line utility can be used to detect the - unreferenced bytes of a file and "bubble" them up to the end - of the file and then truncate the file. - - <dt><b>Some Support: II</b> - <dd>The library could support partial memory management all - the time, or full memory management some of the time. - Orphaning free blocks instead of adding them to a free list - should not affect the file integrity, nor should fulfilling - new requests by extending the file instead of using the free - list. - - <dt><b>Full Support: III</b> - <dd>The library supports space-efficient memory management by - always fulfilling allocation requests from the free list when - possible, and by coalescing adjacent free blocks into a - single larger free block. - </dl> - - <h3>Support in the File Format</h3> - - <dl> - <dt><b>No Support: A</b> - <dd>The file format does not support memory management; any - unreferenced block in the file is assumed to be free. If - the library supports full memory management then it will - have to traverse the entire file to determine which blocks - are unreferenced. - - <dt><b>Some Support: B</b> - <dd>Assuming that unreferenced blocks are free can be - dangerous in a situation where the file is not consistent. - For instance, if a directory tree becomes detached from the - main directory hierarchy, then the detached directory and - everything that is referenced only through the detached - directory become unreferenced. File repair utilities will - be unable to determine which unreferenced blocks need to be - linked back into the file hierarchy. - - <dd>Therefore, it might be useful to keep an unsorted, - doubly-linked list of free blocks in the file. The library - can add and remove blocks from the list in constant time, - and can generate its own internal free-block data structure - in time proportional to the number of free blocks instead of - the size of the file. Additionally, a library can use a - subset of the free blocks, an alternative which is not - feasible if the file format doesn't support any form of - memory management. - - <dt><b>Full Support: C</b> - <dd>The file format can mirror library data structures for - space-efficient memory management. The free blocks are - linked in unsorted, doubly-linked lists with one list per - free block size. The heads of the lists are pointed to by a - B-tree whose nodes are sorted by free block size. At the - same time, all free blocks are the leaf nodes of another - B-tree sorted by starting and ending address. When the - trees are used in combination we can deallocate and allocate - memory in O(log <em>N</em>) time where <em>N</em> is the - number of free blocks. - </dl> - - <h3>Combinations of Library and File Format Support</h3> - - <p>We now evaluate each combination of library support with file - support: - - <dl> - <dt><b>I-A</b> - <dd>If neither the library nor the file support memory - management, then each allocation request will come from the - end of the file and each deallocation request is a no-op - that simply leaves the free block unreferenced. - - <ul> - <li>Advantages - <ul> - <li>No file overhead for allocation or deallocation. - <li>No library overhead for allocation or - deallocation. - <li>No file traversal required at time of open. - <li>No data needs to be written back to the file when - it's closed. - <li>Trivial to implement (already implemented). - </ul> - - <li>Disadvantages - <ul> - <li>Inefficient use of file space. - <li>A file repair utility must reclaim lost file space. - <li>Difficulties for file repair utilities. (Is an - unreferenced block a free block or orphaned data?) - </ul> - </ul> - - <dt><b>II-A</b> - <dd>In order for the library to support memory management, it - will be required to build the internal free block - representation by traversing the entire file looking for - unreferenced blocks. - - <ul> - <li>Advantages - <ul> - <li>No file overhead for allocation or deallocation. - <li>Variable amount of library overhead for allocation - and deallocation depending on how much work the - library wants to do. - <li>No data needs to be written back to the file when - it's closed. - <li>Might use file space efficiently. - </ul> - <li>Disadvantages - <ul> - <li>Might use file space inefficiently. - <li>File traversal required at time of open. - <li>A file repair utility must reclaim lost file space. - <li>Difficulties for file repair utilities. - <li>Sharing of the free list between processes falls - outside the HDF5 file format documentation. - </ul> - </ul> - - <dt><b>III-A</b> - <dd>In order for the library to support full memory - management, it will be required to build the internal free - block representation by traversing the entire file looking - for unreferenced blocks. - - <ul> - <li>Advantages - <ul> - <li>No file overhead for allocation or deallocation. - <li>Efficient use of file space. - <li>No data needs to be written back to the file when - it's closed. - </ul> - <li>Disadvantages - <ul> - <li>Moderate amount of library overhead for allocation - and deallocation. - <li>File traversal required at time of open. - <li>A file repair utility must reclaim lost file space. - <li>Difficulties for file repair utilities. - <li>Sharing of the free list between processes falls - outside the HDF5 file format documentation. - </ul> - </ul> - - <dt><b>I-B</b> - <dd>If the library doesn't support memory management but the - file format supports some level of management, then a file - repair utility will have to be run occasionally to reclaim - unreferenced blocks. - - <ul> - <li>Advantages - <ul> - <li>No file overhead for allocation or deallocation. - <li>No library overhead for allocation or - deallocation. - <li>No file traversal required at time of open. - <li>No data needs to be written back to the file when - it's closed. - </ul> - <li>Disadvantages - <ul> - <li>A file repair utility must reclaim lost file space. - <li>Difficulties for file repair utilities. - </ul> - </ul> - - <dt><b>II-B</b> - <dd>Both the library and the file format support some level - of memory management. - - <ul> - <li>Advantages - <ul> - <li>Constant file overhead per allocation or - deallocation. - <li>Variable library overhead per allocation or - deallocation depending on how much work the library - wants to do. - <li>Traversal at file open time is on the order of the - free list size instead of the file size. - <li>The library has the option of reading only part of - the free list. - <li>No data needs to be written at file close time if - it has been amortized into the cost of allocation - and deallocation. - <li>File repair utilties don't have to be run to - reclaim memory. - <li>File repair utilities can detect whether an - unreferenced block is a free block or orphaned data. - <li>Sharing of the free list between processes might - be easier. - <li>Possible efficient use of file space. - </ul> - <li>Disadvantages - <ul> - <li>Possible inefficient use of file space. - </ul> - </ul> - - <dt><b>III-B</b> - <dd>The library provides space-efficient memory management but - the file format only supports an unsorted list of free - blocks. - - <ul> - <li>Advantages - <ul> - <li>Constant time file overhead per allocation or - deallocation. - <li>No data needs to be written at file close time if - it has been amortized into the cost of allocation - and deallocation. - <li>File repair utilities don't have to be run to - reclaim memory. - <li>File repair utilities can detect whether an - unreferenced block is a free block or orphaned data. - <li>Sharing of the free list between processes might - be easier. - <li>Efficient use of file space. - </ul> - <li>Disadvantages - <ul> - <li>O(log <em>N</em>) library overhead per allocation or - deallocation where <em>N</em> is the total number of - free blocks. - <li>O(<em>N</em>) time to open a file since the entire - free list must be read to construct the in-core - trees used by the library. - <li>Library is more complicated. - </ul> - </ul> - - <dt><b>I-C</b> - <dd>This has the same advantages and disadvantages as I-C with - the added disadvantage that the file format is much more - complicated. - - <dt><b>II-C</b> - <dd>If the library only provides partial memory management but - the file requires full memory management, then this method - degenerates to the same as II-A with the added disadvantage - that the file format is much more complicated. - - <dt><b>III-C</b> - <dd>The library and file format both provide complete data - structures for space-efficient memory management. - - <ul> - <li>Advantages - <ul> - <li>Files can be opened in constant time since the - free list is read on demand and amortised into the - allocation and deallocation requests. - <li>No data needs to be written back to the file when - it's closed. - <li>File repair utilities don't have to be run to - reclaim memory. - <li>File repair utilities can detect whether an - unreferenced block is a free block or orphaned data. - <li>Sharing the free list between processes is easy. - <li>Efficient use of file space. - </ul> - <li>Disadvantages - <ul> - <li>O(log <em>N</em>) file allocation and deallocation - cost where <em>N</em> is the total number of free - blocks. - <li>O(log <em>N</em>) library allocation and - deallocation cost. - <li>Much more complicated file format. - <li>More complicated library. - </ul> - </ul> - - </dl> - - <!-- ---------------------------------------------------------------- --> - <h2>The Algorithm for II-B</h2> - - <p>The file contains an unsorted, doubly-linked list of free - blocks. The address of the head of the list appears in the - super block. Each free block contains the following fields: - - <center> - <table border cellpadding=4 width="60%"> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - - <tr align=center> - <th colspan=4>Free Block Signature</th> - - <tr align=center> - <th colspan=4>Total Free Block Size</th> - - <tr align=center> - <th colspan=4>Address of Left Sibling</th> - - <tr align=center> - <th colspan=4>Address of Right Sibling</th> - - <tr align=center> - <th colspan=4><br><br>Remainder of Free Block<br><br><br></th> - </table> - </center> - - <p>The library reads as much of the free list as convenient when - convenient and pushes those entries onto stacks. This can - occur when a file is opened or any time during the life of the - file. There is one stack for each free block size and the - stacks are sorted by size in a balanced tree in memory. - - <p>Deallocation involves finding the correct stack or creating - a new one (an O(log <em>K</em>) operation where <em>K</em> is - the number of stacks), pushing the free block info onto the - stack (a constant-time operation), and inserting the free - block into the file free block list (a constant-time operation - which doesn't necessarily involve any I/O since the free blocks - can be cached like other objects). No attempt is made to - coalesce adjacent free blocks into larger blocks. - - <p>Allocation involves finding the correct stack (an O(log - <em>K</em>) operation), removing the top item from the stack - (a constant-time operation), and removing the block from the - file free block list (a constant-time operation). If there is - no free block of the requested size or larger, then the file - is extended. - - <p>To provide sharability of the free list between processes, - the last step of an allocation will check for the free block - signature and if it doesn't find one will repeat the process. - Alternatively, a process can temporarily remove free blocks - from the file and hold them in it's own private pool. - - <p>To summarize... - <dl> - <dt>File opening - <dd>O(<em>N</em>) amortized over the time the file is open, - where <em>N</em> is the number of free blocks. The library - can still function without reading any of the file free - block list. - - <dt>Deallocation - <dd>O(log <em>K</em>) where <em>K</em> is the number of unique - sizes of free blocks. File access is constant. - - <dt>Allocation - <dd>O(log <em>K</em>). File access is constant. - - <dt>File closing - <dd>O(1) even if the library temporarily removes free - blocks from the file to hold them in a private pool since - the pool can still be a linked list on disk. - </dl> - - <!-- ---------------------------------------------------------------- --> - <h2>The Algorithm for III-C</h2> - - <p>The HDF5 file format supports a general B-tree mechanism - for storing data with keys. If we use a B-tree to represent - all parts of the file that are free and the B-tree is indexed - so that a free file chunk can be found if we know the starting - or ending address, then we can efficiently determine whether a - free chunk begins or ends at the specified address. Call this - the <em>Address B-Tree</em>. - - <p>If a second B-tree points to a set of stacks where the - members of a particular stack are all free chunks of the same - size, and the tree is indexed by chunk size, then we can - efficiently find the best-fit chunk size for a memory request. - Call this the <em>Size B-Tree</em>. - - <p>All free blocks of a particular size can be linked together - with an unsorted, doubly-linked, circular list and the left - and right sibling addresses can be stored within the free - chunk, allowing us to remove or insert items from the list in - constant time. - - <p>Deallocation of a block fo file memory consists of: - - <ol type="I"> - <li>Add the new free block whose address is <em>ADDR</em> to the - address B-tree. - - <ol type="A"> - <li>If the address B-tree contains an entry for a free - block that ends at <em>ADDR</em>-1 then remove that - block from the B-tree and from the linked list (if the - block was the first on the list then the size B-tree - must be updated). Adjust the size and address of the - block being freed to include the block just removed from - the free list. The time required to search for and - possibly remove the left block is O(log <em>N</em>) - where <em>N</em> is the number of free blocks. - - <li>If the address B-tree contains an entry for the free - block that begins at <em>ADDR</em>+<em>LENGTH</em> then - remove that block from the B-tree and from the linked - list (if the block was the first on the list then the - size B-tree must be updated). Adjust the size of the - block being freed to include the block just removed from - the free list. The time required to search for and - possibly remove the right block is O(log <em>N</em>). - - <li>Add the new (adjusted) block to the address B-tree. - The time for this operation is O(log <em>N</em>). - </ol> - - <li>Add the new block to the size B-tree and linked list. - - <ol type="A"> - <li>If the size B-tree has an entry for this particular - size, then add the chunk to the tail of the list. This - is an O(log <em>K</em>) operation where <em>K</em> is - the number of unique free block sizes. - - <li>Otherwise make a new entry in the B-tree for chunks of - this size. This is also O(log <em>K</em>). - </ol> - </ol> - - <p>Allocation is similar to deallocation. - - <p>To summarize... - - <dl> - <dt>File opening - <dd>O(1) - - <dt>Deallocation - <dd>O(log <em>N</em>) where <em>N</em> is the total number of - free blocks. File access time is O(log <em>N</em>). - - <dt>Allocation - <dd>O(log <em>N</em>). File access time is O(log <em>N</em>). - - <dt>File closing - <dd>O(1). - </dl> - - - <hr> - <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> -<!-- Created: Thu Jul 24 15:16:40 PDT 1997 --> -<!-- hhmts start --> -Last modified: Thu Jul 31 14:41:01 EST -<!-- hhmts end --> - </body> -</html> |