Diffstat (limited to 'doc/html/symtab')
-rw-r--r-- | doc/html/symtab | 313 |
1 files changed, 0 insertions, 313 deletions
diff --git a/doc/html/symtab b/doc/html/symtab
deleted file mode 100644
index a657729..0000000
--- a/doc/html/symtab
+++ /dev/null
@@ -1,313 +0,0 @@
-A number of issues involving caching of object header messages in
-symbol table entries must be resolved.
-
-What is the motivation for these changes?
-
-   If we make objects completely independent of object name it allows
-   us to refer to one object by multiple names (a concept called hard
-   links in Unix file systems), which in turn provides an easy way to
-   share data between datasets.
-
-   Every object in an HDF5 file has a unique, constant object header
-   address which serves as a handle (or OID) for the object. The
-   object header contains messages which describe the object.
-
-   HDF5 allows some of the object header messages to be cached in
-   symbol table entries so that the object header doesn't have to be
-   read from disk. For instance, an entry for a directory caches the
-   directory disk addresses required to access that directory, so the
-   object header for that directory is seldom read.
-
-   If an object has multiple names (that is, a link count greater than
-   one), then it has multiple symbol table entries which point to it.
-   All symbol table entries must agree on header messages. The
-   current mechanism is to turn off the caching of header messages in
-   symbol table entries when the header link count is more than one,
-   and to allow caching once the link count returns to one.
-
-   However, in the current implementation, a package is allowed to
-   copy a symbol table entry and use it as a private cache for the
-   object header. This doesn't work for a number of reasons (all but
-   one require a `delete symbol entry' operation).
-
-      1. If two packages hold copies of the same symbol table entry,
-         they don't notify each other of changes to the symbol table
-         entry. Eventually, one package reads a cached message and
-         gets the wrong value because the other package changed the
-         message in the object header.
-
-      2. If one package holds a copy of the symbol table entry and
-         some other part of HDF5 removes the object and replaces it
-         with some other object, then the original package will
-         continue to access the non-existent object using the new
-         object header.
-
-      3. If one package holds a copy of the symbol table entry and
-         some other part of HDF5 (re)moves the directory which
-         contains the object, then the package will be unable to
-         update the symbol table entry with the new cached
-         data. Packages that refer to the object by the new name will
-         use old cached data.
-
-
-The basic problem is that there may be multiple copies of the object
-symbol table entry floating around in the code when there should
-really be at most one per hard link.
-
-   Level 0:  A copy may exist on disk as part of a symbol table node, which
-             is a small 1d array of symbol table entries.
-
-   Level 1:  A copy may be cached in memory as part of a symbol table node
-             in the H5Gnode.c file by the H5AC layer.
-
-   Level 2a: Another package may be holding a copy so it can perform
-             fast lookup of any header messages that might be cached in
-             the symbol table entry. It can't point directly to the
-             cached symbol table node because that node can disappear
-             at any time.
-
-   Level 2b: Packages may hold more than one copy of a symbol table
-             entry. For instance, if H5D_open() is called twice for
-             the same name, then two copies of the symbol table entry
-             for the dataset exist in the H5D package.
-
-How can level 2a and 2b be combined?
-
-   If package data structures contained pointers to symbol table
-   entries instead of copies of symbol table entries and if H5G
-   allocated one symbol table entry per hard link, then it's trivial
-   for Level 2a and 2b to benefit from one another's actions since
-   they share the same cache.
-
-How does this work conceptually?
-
-   Level 2a and 2b must notify Level 1 of their intent to use (or stop
-   using) a symbol table entry to access an object header. The
-   notification of the intent to access an object header is called
-   `opening' the object and releasing the access is `closing' the
-   object.
-
-   Opening an object requires an object name which is used to locate
-   the symbol table entry to use for caching of object header
-   messages. The return value is a handle for the object. Figure 1
-   shows the state after Dataset1 opens Object with a name that maps
-   through Entry1. The open request created a copy of Entry1 called
-   Shadow1 which exists even if SymNode1 is preempted from the H5AC
-   layer.
-
-                                                   ______
-                                  Object          /      \
-    SymNode1                    +--------+                |
-   +--------+            _____\ | Header |                |
-   |        |           /     / +--------+                |
-   +--------+ +---------+                 \______/
-   | Entry1 | | Shadow1 | /____
-   +--------+ +---------+ \    \
-   :        :                   \
-   +--------+                    +----------+
-                                 | Dataset1 |
-                                 +----------+
-                      FIGURE 1
-
-
-
-   The SymNode1 can appear and disappear from the H5AC layer at any
-   time without affecting the Object Header data cached in the Shadow.
-   The rules are:
-
-   * If the SymNode1 is present and is about to disappear and the
-     Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
-     Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.
-
-   * If something requests a copy of Entry1 (for a read-only peek
-     request), and Shadow1 exists, then a copy (not pointer) of Shadow1
-     is returned instead.
-
-   * Entry1 cannot be deleted while Shadow1 exists.
-
-   * Entry1 cannot change directly if Shadow1 exists since this means
-     that some other package has opened the object and may be modifying
-     it. I haven't decided if it's useful to ever change Entry1
-     directly (except of course within the H5G layer itself).
-
-   * Shadow1 is created when Dataset1 `opens' the object through
-     Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
-     reference count is incremented.
-
-   * When Dataset1 `closes' the Object the Shadow1 reference count is
-     decremented. When the reference count reaches zero, if the
-     Shadow1 dirty bit is set, then Shadow1's contents are copied to
-     Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted
-     if its reference count is zero. This may require reading SymNode1
-     back into the H5AC layer.
-
-What happens when another Dataset opens the Object through Entry1?
-
-   If the current state is represented by the top part of Figure 2,
-   then Dataset2 will be given a pointer to Shadow1 and the Shadow1
-   reference count will be incremented to two. The Object header link
-   count remains at one so Object Header messages continue to be cached
-   by Shadow1. Dataset1 and Dataset2 benefit from one another's
-   actions. The resulting state is represented by Figure 2.
-
-                                                   _____
-    SymNode1                      Object          /     \
-   +--------+            _____\ +--------+               |
-   |        |           /     / | Header |               |
-   +--------+ +---------+       +--------+               |
-   | Entry1 | | Shadow1 | /____           \_____/
-   +--------+ +---------+ \    \
-   :        :       _           \
-   +--------+      |\            +----------+
-                     \           | Dataset1 |
-                      \________  +----------+
-                               \               \
-                                +----------+   |
-                                | Dataset2 |   |- New Dataset
-                                +----------+   |
-                                               /
-                      FIGURE 2
-
-
-What happens when the link count for Object increases while Dataset
-has the Object open?
-
-                                                      SymNode2
-                                                     +--------+
-    SymNode1                      Object             |        |
-   +--------+             ____\ +--------+  /______  +--------+
-   |        |            /    / | header |  \      `| Entry2 |
-   +--------+ +---------+       +--------+           +--------+
-   | Entry1 | | Shadow1 | /____                      :        :
-   +--------+ +---------+ \    \                     +--------+
-   :        :                   \
-   +--------+                    +----------+  \________________/
-                                 | Dataset1 |          |
-                                 +----------+       New Link
-
-                      FIGURE 3
-
-   The current state is represented by the left part of Figure 3. To
-   create a new link the Object Header had to be located by traversing
-   through Entry1/Shadow1. On the way through, the Entry1/Shadow1
-   cache is invalidated and the Object Header link count is
-   incremented. Entry2 is then added to SymNode2.
-
-   Since the Object Header link count is greater than one, Object
-   header data will not be cached in Entry1/Shadow1.
-
-   If the initial state had been all of Figure 3 and a third link is
-   being added and Object is open by Entry1 and Entry2, then creation
-   of the third link will invalidate the cache in Entry1 or Entry2. It
-   doesn't matter which since both caches are already invalidated
-   anyway.
-
-What happens if another Dataset opens the same object by another name?
-
-   If the current state is represented by Figure 3, then a Shadow2 is
-   created and associated with Entry2. However, since the Object
-   Header link count is more than one, nothing gets cached in Shadow2
-   (or Shadow1).
-
-What happens if the link count decreases?
-
-   If the current state is represented by all of Figure 3 then it isn't
-   possible to delete Entry1 because the object is currently open
-   through that entry. Therefore, the link count must have
-   decreased because Entry2 was removed.
-
-   As Dataset1 reads/writes messages in the Object header they will
-   begin to be cached in Shadow1 again because the Object header link
-   count is one.
-
-What happens if the object is removed while it's open?
-
-   That operation is not allowed.
-
-What happens if the directory containing the object is deleted?
-
-   That operation is not allowed since deleting the directory requires
-   that the directory be empty. The directory cannot be emptied
-   because the open object cannot be removed from the directory.
-
-What happens if the object is moved?
-
-   Moving an object is a process consisting of creating a new
-   hard-link with the new name and then deleting the old name.
-   This will fail if the object is open.
-
-What happens if the directory containing the entry is moved?
-
-   The entry and the shadow still exist and are associated with one
-   another.
-
-What if a file is flushed or closed when objects are open?
-
-   Flushing a symbol table with open objects writes correct information
-   to the file since Shadow is copied to Entry before the table is
-   flushed.
-
-   Closing a file with open objects will create a valid file but will
-   return failure.
-
-How is the Shadow associated with the Entry?
-
-   A symbol table is composed of one or more symbol nodes. A node is a
-   small 1-d array of symbol table entries. The entries can move
-   around within a node and from node-to-node as entries are added or
-   removed from the symbol table and nodes can move around within a
-   symbol table, being created and destroyed as necessary.
-
-   Since a symbol table has an object header with a unique and constant
-   file offset, and since H5G contains code to efficiently locate a
-   symbol table entry given its name, we use these two values as a key
-   within a shadow to associate the shadow with the symbol table
-   entry.
-
-      struct H5G_shadow_t {
-         haddr_t       stab_addr;    /*symbol table header address*/
-         char          *name;        /*entry name wrt symbol table*/
-         hbool_t       dirty;        /*out-of-date wrt stab entry?*/
-         H5G_entry_t   ent;          /*my copy of stab entry      */
-         H5G_entry_t   *main;        /*the level 1 entry or null  */
-         H5G_shadow_t  *next, *prev; /*other shadows for this stab*/
-      };
-
-   The set of shadows will be organized in a hash table of linked
-   lists. Each linked list will contain the shadows associated with a
-   particular symbol table header address and the list will be sorted
-   lexicographically.
-
-   Also, each Entry will have a pointer to the corresponding Shadow or
-   null if there is no shadow.
-
-   When a symbol table node is loaded into the main cache, we look up
-   the linked list of shadows in the shadow hash table based on the
-   address of the symbol table object header. We then traverse that
-   list matching shadows with symbol table entries.
-
-   We assume that opening/closing objects will be a relatively
-   infrequent event compared with loading/flushing symbol table
-   nodes. Therefore, if we keep the linked list of shadows sorted it
-   costs O(N) to open and close objects where N is the number of open
-   objects in that symbol table (instead of O(1)) but it costs only
-   O(N) to load a symbol table node (instead of O(N^2)).
-
-What about the root symbol entry?
-
-   Level 1 storage for the root symbol entry is always available since
-   it's stored in the hdf5_file_t struct instead of a symbol table
-   node. However, the contents of that entry can move from the file
-   handle to a symbol table node by H5G_mkroot(). Therefore, if the
-   root object is opened, we keep a shadow entry for it whose
-   `stab_addr' field is zero and whose `name' is null.
-
-   For this reason, the root object should always be read through the
-   H5G interface.
-
-One more key invariant: The H5O_STAB message in a symbol table header
-never changes. This allows symbol table entries to cache the H5O_STAB
-message for the symbol table to which it points without worrying about
-whether the cache will ever be invalidated.
-
-
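Note on the removed design: the open/close rules for shadows described in
the deleted file can be illustrated with a small C sketch. This is not HDF5
code. The names entry_t, shadow_t, shadow_open, shadow_close and entry_peek
are hypothetical, a single int stands in for the cached header messages, and
the sketch assumes the Level 1 entry is always resident in memory (the note
itself says the symbol node may have to be read back into the H5AC layer on
close).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real H5G types (assumption). */
typedef struct entry_t {
    int              cached;  /* stand-in for cached header messages   */
    bool             dirty;   /* entry differs from its on-disk copy   */
    struct shadow_t *shadow;  /* NULL while the object is not open     */
} entry_t;

typedef struct shadow_t {
    int       cached;    /* private copy of the entry's cache          */
    bool      dirty;     /* shadow is newer than the entry             */
    unsigned  refcount;  /* number of outstanding opens                */
    entry_t  *main;      /* the Level 1 entry this shadow belongs to   */
} shadow_t;

/* Open: the first open creates the shadow, later opens share it. */
static shadow_t *shadow_open(entry_t *e)
{
    if (e->shadow == NULL) {
        shadow_t *s = malloc(sizeof *s);
        if (s == NULL)
            return NULL;
        s->cached   = e->cached;
        s->dirty    = false;
        s->refcount = 0;
        s->main     = e;
        e->shadow   = s;
    }
    e->shadow->refcount++;
    return e->shadow;
}

/* Read-only peek: prefer the shadow's data when a shadow exists. */
static int entry_peek(const entry_t *e)
{
    return e->shadow != NULL ? e->shadow->cached : e->cached;
}

/* Close: drop one reference; the last close writes back and frees. */
static void shadow_close(shadow_t *s)
{
    assert(s->refcount > 0);
    if (--s->refcount > 0)
        return;
    if (s->dirty) {
        s->main->cached = s->cached;
        s->main->dirty  = true;
    }
    s->main->shadow = NULL;
    free(s);
}
```

In this sketch two opens through the same entry return the same pointer, so
a change made through one handle is visible through the other, and the entry
itself is only updated when the last handle is closed.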