summaryrefslogtreecommitdiffstats
path: root/doc/html/TechNotes/MemoryMgmt.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/html/TechNotes/MemoryMgmt.html')
-rw-r--r--doc/html/TechNotes/MemoryMgmt.html510
1 files changed, 0 insertions, 510 deletions
diff --git a/doc/html/TechNotes/MemoryMgmt.html b/doc/html/TechNotes/MemoryMgmt.html
deleted file mode 100644
index 93782b5..0000000
--- a/doc/html/TechNotes/MemoryMgmt.html
+++ /dev/null
@@ -1,510 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
-<html>
- <head>
- <title>Memory Management in HDF5</title>
- </head>
-
- <body>
- <h1>Memory Management in HDF5</h1>
-
- <!-- ---------------------------------------------------------------- -->
- <h2>Is a Memory Manager Necessary?</h2>
-
- <p>Some form of memory management may be necessary in HDF5 when
- the various deletion operators are implemented so that the
- file memory is not permanently orphaned. However, since an
- HDF5 file was designed with persistent data in mind, the
- importance of a memory manager is questionable.
-
- <p>On the other hand, when certain meta data containers (file glue)
- grow, they may need to be relocated in order to keep the
- container contiguous.
-
- <blockquote>
- <b>Example:</b> An object header consists of up to two
- chunks of contiguous memory. The first chunk is a fixed
- size at a fixed location when the header link count is
- greater than one. Thus, inserting additional items into an
- object header may require the second chunk to expand. When
- this occurs, the second chunk may need to move to another
- location in the file, freeing the file memory which that
- chunk originally occupied.
- </blockquote>
-
- <p>The relocation of meta data containers could potentially
- orphan a significant amount of file memory if the application
- has made poor estimates for preallocation sizes.
-
- <!-- ---------------------------------------------------------------- -->
- <h2>Levels of Memory Management</h2>
-
- <p>Memory management by the library can be independent of memory
- management support by the file format. The file format can
- support no memory management, some memory management, or full
- memory management. Similarly with the library.
-
- <h3>Support in the Library</h3>
-
- <dl>
- <dt><b>No Support: I</b>
- <dd>When memory is deallocated it simply becomes unreferenced
- (orphaned) in the file. Memory allocation requests are
- satisfied by extending the file.
-
- <dd>A separate off-line utility can be used to detect the
- unreferenced bytes of a file and "bubble" them up to the end
- of the file and then truncate the file.
-
- <dt><b>Some Support: II</b>
- <dd>The library could support partial memory management all
- the time, or full memory management some of the time.
- Orphaning free blocks instead of adding them to a free list
- should not affect the file integrity, nor should fulfilling
- new requests by extending the file instead of using the free
- list.
-
- <dt><b>Full Support: III</b>
- <dd>The library supports space-efficient memory management by
- always fulfilling allocation requests from the free list when
- possible, and by coalescing adjacent free blocks into a
- single larger free block.
- </dl>
-
- <h3>Support in the File Format</h3>
-
- <dl>
- <dt><b>No Support: A</b>
- <dd>The file format does not support memory management; any
- unreferenced block in the file is assumed to be free. If
- the library supports full memory management then it will
- have to traverse the entire file to determine which blocks
- are unreferenced.
-
- <dt><b>Some Support: B</b>
- <dd>Assuming that unreferenced blocks are free can be
- dangerous in a situation where the file is not consistent.
- For instance, if a directory tree becomes detached from the
- main directory hierarchy, then the detached directory and
- everything that is referenced only through the detached
- directory become unreferenced. File repair utilities will
- be unable to determine which unreferenced blocks need to be
- linked back into the file hierarchy.
-
- <dd>Therefore, it might be useful to keep an unsorted,
- doubly-linked list of free blocks in the file. The library
- can add and remove blocks from the list in constant time,
- and can generate its own internal free-block data structure
- in time proportional to the number of free blocks instead of
- the size of the file. Additionally, a library can use a
- subset of the free blocks, an alternative which is not
- feasible if the file format doesn't support any form of
- memory management.
-
- <dt><b>Full Support: C</b>
- <dd>The file format can mirror library data structures for
- space-efficient memory management. The free blocks are
- linked in unsorted, doubly-linked lists with one list per
- free block size. The heads of the lists are pointed to by a
- B-tree whose nodes are sorted by free block size. At the
- same time, all free blocks are the leaf nodes of another
- B-tree sorted by starting and ending address. When the
- trees are used in combination we can deallocate and allocate
- memory in O(log <em>N</em>) time where <em>N</em> is the
- number of free blocks.
- </dl>
-
- <h3>Combinations of Library and File Format Support</h3>
-
- <p>We now evaluate each combination of library support with file
- support:
-
- <dl>
- <dt><b>I-A</b>
- <dd>If neither the library nor the file support memory
- management, then each allocation request will come from the
- end of the file and each deallocation request is a no-op
- that simply leaves the free block unreferenced.
-
- <ul>
- <li>Advantages
- <ul>
- <li>No file overhead for allocation or deallocation.
- <li>No library overhead for allocation or
- deallocation.
- <li>No file traversal required at time of open.
- <li>No data needs to be written back to the file when
- it's closed.
- <li>Trivial to implement (already implemented).
- </ul>
-
- <li>Disadvantages
- <ul>
- <li>Inefficient use of file space.
- <li>A file repair utility must reclaim lost file space.
- <li>Difficulties for file repair utilities. (Is an
- unreferenced block a free block or orphaned data?)
- </ul>
- </ul>
-
- <dt><b>II-A</b>
- <dd>In order for the library to support memory management, it
- will be required to build the internal free block
- representation by traversing the entire file looking for
- unreferenced blocks.
-
- <ul>
- <li>Advantages
- <ul>
- <li>No file overhead for allocation or deallocation.
- <li>Variable amount of library overhead for allocation
- and deallocation depending on how much work the
- library wants to do.
- <li>No data needs to be written back to the file when
- it's closed.
- <li>Might use file space efficiently.
- </ul>
- <li>Disadvantages
- <ul>
- <li>Might use file space inefficiently.
- <li>File traversal required at time of open.
- <li>A file repair utility must reclaim lost file space.
- <li>Difficulties for file repair utilities.
- <li>Sharing of the free list between processes falls
- outside the HDF5 file format documentation.
- </ul>
- </ul>
-
- <dt><b>III-A</b>
- <dd>In order for the library to support full memory
- management, it will be required to build the internal free
- block representation by traversing the entire file looking
- for unreferenced blocks.
-
- <ul>
- <li>Advantages
- <ul>
- <li>No file overhead for allocation or deallocation.
- <li>Efficient use of file space.
- <li>No data needs to be written back to the file when
- it's closed.
- </ul>
- <li>Disadvantages
- <ul>
- <li>Moderate amount of library overhead for allocation
- and deallocation.
- <li>File traversal required at time of open.
- <li>A file repair utility must reclaim lost file space.
- <li>Difficulties for file repair utilities.
- <li>Sharing of the free list between processes falls
- outside the HDF5 file format documentation.
- </ul>
- </ul>
-
- <dt><b>I-B</b>
- <dd>If the library doesn't support memory management but the
- file format supports some level of management, then a file
- repair utility will have to be run occasionally to reclaim
- unreferenced blocks.
-
- <ul>
- <li>Advantages
- <ul>
- <li>No file overhead for allocation or deallocation.
- <li>No library overhead for allocation or
- deallocation.
- <li>No file traversal required at time of open.
- <li>No data needs to be written back to the file when
- it's closed.
- </ul>
- <li>Disadvantages
- <ul>
- <li>A file repair utility must reclaim lost file space.
- <li>Difficulties for file repair utilities.
- </ul>
- </ul>
-
- <dt><b>II-B</b>
- <dd>Both the library and the file format support some level
- of memory management.
-
- <ul>
- <li>Advantages
- <ul>
- <li>Constant file overhead per allocation or
- deallocation.
- <li>Variable library overhead per allocation or
- deallocation depending on how much work the library
- wants to do.
- <li>Traversal at file open time is on the order of the
- free list size instead of the file size.
- <li>The library has the option of reading only part of
- the free list.
- <li>No data needs to be written at file close time if
- it has been amortized into the cost of allocation
- and deallocation.
- <li>File repair utilties don't have to be run to
- reclaim memory.
- <li>File repair utilities can detect whether an
- unreferenced block is a free block or orphaned data.
- <li>Sharing of the free list between processes might
- be easier.
- <li>Possible efficient use of file space.
- </ul>
- <li>Disadvantages
- <ul>
- <li>Possible inefficient use of file space.
- </ul>
- </ul>
-
- <dt><b>III-B</b>
- <dd>The library provides space-efficient memory management but
- the file format only supports an unsorted list of free
- blocks.
-
- <ul>
- <li>Advantages
- <ul>
- <li>Constant time file overhead per allocation or
- deallocation.
- <li>No data needs to be written at file close time if
- it has been amortized into the cost of allocation
- and deallocation.
- <li>File repair utilities don't have to be run to
- reclaim memory.
- <li>File repair utilities can detect whether an
- unreferenced block is a free block or orphaned data.
- <li>Sharing of the free list between processes might
- be easier.
- <li>Efficient use of file space.
- </ul>
- <li>Disadvantages
- <ul>
- <li>O(log <em>N</em>) library overhead per allocation or
- deallocation where <em>N</em> is the total number of
- free blocks.
- <li>O(<em>N</em>) time to open a file since the entire
- free list must be read to construct the in-core
- trees used by the library.
- <li>Library is more complicated.
- </ul>
- </ul>
-
- <dt><b>I-C</b>
- <dd>This has the same advantages and disadvantages as I-C with
- the added disadvantage that the file format is much more
- complicated.
-
- <dt><b>II-C</b>
- <dd>If the library only provides partial memory management but
- the file requires full memory management, then this method
- degenerates to the same as II-A with the added disadvantage
- that the file format is much more complicated.
-
- <dt><b>III-C</b>
- <dd>The library and file format both provide complete data
- structures for space-efficient memory management.
-
- <ul>
- <li>Advantages
- <ul>
- <li>Files can be opened in constant time since the
- free list is read on demand and amortised into the
- allocation and deallocation requests.
- <li>No data needs to be written back to the file when
- it's closed.
- <li>File repair utilities don't have to be run to
- reclaim memory.
- <li>File repair utilities can detect whether an
- unreferenced block is a free block or orphaned data.
- <li>Sharing the free list between processes is easy.
- <li>Efficient use of file space.
- </ul>
- <li>Disadvantages
- <ul>
- <li>O(log <em>N</em>) file allocation and deallocation
- cost where <em>N</em> is the total number of free
- blocks.
- <li>O(log <em>N</em>) library allocation and
- deallocation cost.
- <li>Much more complicated file format.
- <li>More complicated library.
- </ul>
- </ul>
-
- </dl>
-
- <!-- ---------------------------------------------------------------- -->
- <h2>The Algorithm for II-B</h2>
-
- <p>The file contains an unsorted, doubly-linked list of free
- blocks. The address of the head of the list appears in the
- boot block. Each free block contains the following fields:
-
- <center>
- <table border cellpadding=4 width="60%">
- <tr align=center>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
- <th width="25%">byte</th>
-
- <tr align=center>
- <th colspan=4>Free Block Signature</th>
-
- <tr align=center>
- <th colspan=4>Total Free Block Size</th>
-
- <tr align=center>
- <th colspan=4>Address of Left Sibling</th>
-
- <tr align=center>
- <th colspan=4>Address of Right Sibling</th>
-
- <tr align=center>
- <th colspan=4><br><br>Remainder of Free Block<br><br><br></th>
- </table>
- </center>
-
- <p>The library reads as much of the free list as convenient when
- convenient and pushes those entries onto stacks. This can
- occur when a file is opened or any time during the life of the
- file. There is one stack for each free block size and the
- stacks are sorted by size in a balanced tree in memory.
-
- <p>Deallocation involves finding the correct stack or creating
- a new one (an O(log <em>K</em>) operation where <em>K</em> is
- the number of stacks), pushing the free block info onto the
- stack (a constant-time operation), and inserting the free
- block into the file free block list (a constant-time operation
- which doesn't necessarily involve any I/O since the free blocks
- can be cached like other objects). No attempt is made to
- coalesce adjacent free blocks into larger blocks.
-
- <p>Allocation involves finding the correct stack (an O(log
- <em>K</em>) operation), removing the top item from the stack
- (a constant-time operation), and removing the block from the
- file free block list (a constant-time operation). If there is
- no free block of the requested size or larger, then the file
- is extended.
-
- <p>To provide sharability of the free list between processes,
- the last step of an allocation will check for the free block
- signature and if it doesn't find one will repeat the process.
- Alternatively, a process can temporarily remove free blocks
- from the file and hold them in it's own private pool.
-
- <p>To summarize...
- <dl>
- <dt>File opening
- <dd>O(<em>N</em>) amortized over the time the file is open,
- where <em>N</em> is the number of free blocks. The library
- can still function without reading any of the file free
- block list.
-
- <dt>Deallocation
- <dd>O(log <em>K</em>) where <em>K</em> is the number of unique
- sizes of free blocks. File access is constant.
-
- <dt>Allocation
- <dd>O(log <em>K</em>). File access is constant.
-
- <dt>File closing
- <dd>O(1) even if the library temporarily removes free
- blocks from the file to hold them in a private pool since
- the pool can still be a linked list on disk.
- </dl>
-
- <!-- ---------------------------------------------------------------- -->
- <h2>The Algorithm for III-C</h2>
-
- <p>The HDF5 file format supports a general B-tree mechanism
- for storing data with keys. If we use a B-tree to represent
- all parts of the file that are free and the B-tree is indexed
- so that a free file chunk can be found if we know the starting
- or ending address, then we can efficiently determine whether a
- free chunk begins or ends at the specified address. Call this
- the <em>Address B-Tree</em>.
-
- <p>If a second B-tree points to a set of stacks where the
- members of a particular stack are all free chunks of the same
- size, and the tree is indexed by chunk size, then we can
- efficiently find the best-fit chunk size for a memory request.
- Call this the <em>Size B-Tree</em>.
-
- <p>All free blocks of a particular size can be linked together
- with an unsorted, doubly-linked, circular list and the left
- and right sibling addresses can be stored within the free
- chunk, allowing us to remove or insert items from the list in
- constant time.
-
- <p>Deallocation of a block fo file memory consists of:
-
- <ol type="I">
- <li>Add the new free block whose address is <em>ADDR</em> to the
- address B-tree.
-
- <ol type="A">
- <li>If the address B-tree contains an entry for a free
- block that ends at <em>ADDR</em>-1 then remove that
- block from the B-tree and from the linked list (if the
- block was the first on the list then the size B-tree
- must be updated). Adjust the size and address of the
- block being freed to include the block just removed from
- the free list. The time required to search for and
- possibly remove the left block is O(log <em>N</em>)
- where <em>N</em> is the number of free blocks.
-
- <li>If the address B-tree contains an entry for the free
- block that begins at <em>ADDR</em>+<em>LENGTH</em> then
- remove that block from the B-tree and from the linked
- list (if the block was the first on the list then the
- size B-tree must be updated). Adjust the size of the
- block being freed to include the block just removed from
- the free list. The time required to search for and
- possibly remove the right block is O(log <em>N</em>).
-
- <li>Add the new (adjusted) block to the address B-tree.
- The time for this operation is O(log <em>N</em>).
- </ol>
-
- <li>Add the new block to the size B-tree and linked list.
-
- <ol type="A">
- <li>If the size B-tree has an entry for this particular
- size, then add the chunk to the tail of the list. This
- is an O(log <em>K</em>) operation where <em>K</em> is
- the number of unique free block sizes.
-
- <li>Otherwise make a new entry in the B-tree for chunks of
- this size. This is also O(log <em>K</em>).
- </ol>
- </ol>
-
- <p>Allocation is similar to deallocation.
-
- <p>To summarize...
-
- <dl>
- <dt>File opening
- <dd>O(1)
-
- <dt>Deallocation
- <dd>O(log <em>N</em>) where <em>N</em> is the total number of
- free blocks. File access time is O(log <em>N</em>).
-
- <dt>Allocation
- <dd>O(log <em>N</em>). File access time is O(log <em>N</em>).
-
- <dt>File closing
- <dd>O(1).
- </dl>
-
-
- <hr>
- <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
-<!-- Created: Thu Jul 24 15:16:40 PDT 1997 -->
-<!-- hhmts start -->
-Last modified: Thu Jul 31 14:41:01 EST
-<!-- hhmts end -->
- </body>
-</html>