Diffstat (limited to 'doc/html/symtab')
-rw-r--r-- doc/html/symtab 313
1 files changed, 0 insertions, 313 deletions
diff --git a/doc/html/symtab b/doc/html/symtab
deleted file mode 100644
index a657729..0000000
--- a/doc/html/symtab
+++ /dev/null
@@ -1,313 +0,0 @@
-A number of issues involving caching of object header messages in
-symbol table entries must be resolved.
-
-What is the motivation for these changes?
-
- Making objects completely independent of their names allows us
- to refer to one object by multiple names (a concept called hard
- links in Unix file systems), which in turn provides an easy way to
- share data between datasets.
-
- Every object in an HDF5 file has a unique, constant object header
- address which serves as a handle (or OID) for the object. The
- object header contains messages which describe the object.
-
- HDF5 allows some of the object header messages to be cached in
- symbol table entries so that the object header doesn't have to be
- read from disk. For instance, an entry for a directory caches the
- directory disk addresses required to access that directory, so the
- object header for that directory is seldom read.
-
- If an object has multiple names (that is, a link count greater than
- one), then it has multiple symbol table entries which point to it.
- All symbol table entries must agree on header messages. The
- current mechanism is to turn off the caching of header messages in
- symbol table entries when the header link count is more than one,
- and to allow caching once the link count returns to one.
-
- However, in the current implementation, a package is allowed to
- copy a symbol table entry and use it as a private cache for the
- object header. This doesn't work for a number of reasons (all but
- one require a `delete symbol entry' operation).
-
- 1. If two packages hold copies of the same symbol table entry,
- they don't notify each other of changes to the symbol table
- entry. Eventually, one package reads a cached message and
- gets the wrong value because the other package changed the
- message in the object header.
-
- 2. If one package holds a copy of the symbol table entry and
- some other part of HDF5 removes the object and replaces it
- with some other object, then the original package will
- continue to access the non-existent object using the new
- object header.
-
- 3. If one package holds a copy of the symbol table entry and
- some other part of HDF5 (re)moves the directory which
- contains the object, then the package will be unable to
- update the symbol table entry with the new cached
- data. Packages that refer to the object by the new name will
- use old cached data.
-
-
-The basic problem is that there may be multiple copies of the object
-symbol table entry floating around in the code when there should
-really be at most one per hard link.
-
- Level 0: A copy may exist on disk as part of a symbol table node, which
- is a small 1d array of symbol table entries.
-
- Level 1: A copy may be cached in memory as part of a symbol table node
- in the H5Gnode.c file by the H5AC layer.
-
- Level 2a: Another package may be holding a copy so it can perform
- fast lookup of any header messages that might be cached in
- the symbol table entry. It can't point directly to the
- cached symbol table node because that node can disappear
- at any time.
-
- Level 2b: Packages may hold more than one copy of a symbol table
- entry. For instance, if H5D_open() is called twice for
- the same name, then two copies of the symbol table entry
- for the dataset exist in the H5D package.
-
-How can level 2a and 2b be combined?
-
- If package data structures contained pointers to symbol table
- entries instead of copies of symbol table entries and if H5G
- allocated one symbol table entry per hard link, then it's trivial
- for Level 2a and 2b to benefit from one another's actions since
- they share the same cache.
-
-How does this work conceptually?
-
- Level 2a and 2b must notify Level 1 of their intent to use (or stop
- using) a symbol table entry to access an object header. The
- notification of the intent to access an object header is called
- `opening' the object and releasing the access is `closing' the
- object.
-
- Opening an object requires an object name which is used to locate
- the symbol table entry to use for caching of object header
- messages. The return value is a handle for the object. Figure 1
- shows the state after Dataset1 opens Object with a name that maps
- through Entry1. The open request created a copy of Entry1 called
- Shadow1 which exists even if SymNode1 is preempted from the H5AC
- layer.
-
- ______
- Object / \
- SymNode1 +--------+ |
- +--------+ _____\ | Header | |
- | | / / +--------+ |
- +--------+ +---------+ \______/
- | Entry1 | | Shadow1 | /____
- +--------+ +---------+ \ \
- : : \
- +--------+ +----------+
- | Dataset1 |
- +----------+
- FIGURE 1
-
-
-
- The SymNode1 can appear and disappear from the H5AC layer at any
- time without affecting the Object Header data cached in the Shadow.
- The rules are:
-
- * If the SymNode1 is present and is about to disappear and the
- Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
- Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.
-
- * If something requests a copy of Entry1 (for a read-only peek
- request), and Shadow1 exists, then a copy (not pointer) of Shadow1
- is returned instead.
-
- * Entry1 cannot be deleted while Shadow1 exists.
-
- * Entry1 cannot change directly if Shadow1 exists since this means
- that some other package has opened the object and may be modifying
- it. I haven't decided if it's useful to ever change Entry1
- directly (except of course within the H5G layer itself).
-
- * Shadow1 is created when Dataset1 `opens' the object through
- Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
- reference count is incremented.
-
- * When Dataset1 `closes' the Object the Shadow1 reference count is
- decremented. When the reference count reaches zero, if the
- Shadow1 dirty bit is set, then Shadow1's contents are copied to
- Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted
- if its reference count is zero. This may require reading SymNode1
- back into the H5AC layer.
-
-What happens when another Dataset opens the Object through Entry1?
-
- If the current state is represented by the top part of Figure 2,
- then Dataset2 will be given a pointer to Shadow1 and the Shadow1
- reference count will be incremented to two. The Object header link
- count remains at one so Object Header messages continue to be cached
- by Shadow1. Dataset1 and Dataset2 benefit from one another's
- actions. The resulting state is represented by all of Figure 2.
-
- _____
- SymNode1 Object / \
- +--------+ _____\ +--------+ |
- | | / / | Header | |
- +--------+ +---------+ +--------+ |
- | Entry1 | | Shadow1 | /____ \_____/
- +--------+ +---------+ \ \
- : : _ \
- +--------+ |\ +----------+
- \ | Dataset1 |
- \________ +----------+
- \ \
- +----------+ |
- | Dataset2 | |- New Dataset
- +----------+ |
- /
- FIGURE 2
-
-
-What happens when the link count for Object increases while Dataset
-has the Object open?
-
- SymNode2
- +--------+
- SymNode1 Object | |
- +--------+ ____\ +--------+ /______ +--------+
- | | / / | header | \ `| Entry2 |
- +--------+ +---------+ +--------+ +--------+
- | Entry1 | | Shadow1 | /____ : :
- +--------+ +---------+ \ \ +--------+
- : : \
- +--------+ +----------+ \________________/
- | Dataset1 | |
- +----------+ New Link
-
- FIGURE 3
-
- The current state is represented by the left part of Figure 3. To
- create a new link the Object Header had to be located by traversing
- through Entry1/Shadow1. On the way through, the Entry1/Shadow1
- cache is invalidated and the Object Header link count is
- incremented. Entry2 is then added to SymNode2.
-
- Since the Object Header link count is greater than one, Object
- header data will not be cached in Entry1/Shadow1.
-
- If the initial state had been all of Figure 3 and a third link is
- being added and Object is open through Entry1 and Entry2, then creation
- of the third link will invalidate the cache in Entry1 or Entry2. It
- doesn't matter which since both caches are already invalidated
- anyway.
-
-What happens if another Dataset opens the same object by another name?
-
- If the current state is represented by Figure 3, then a Shadow2 is
- created and associated with Entry2. However, since the Object
- Header link count is more than one, nothing gets cached in Shadow2
- (or Shadow1).
-
-What happens if the link count decreases?
-
- If the current state is represented by all of Figure 3 then it isn't
- possible to delete Entry1 because the object is currently open
- through that entry. Therefore, the link count must have
- decreased because Entry2 was removed.
-
- As Dataset1 reads/writes messages in the Object header they will
- begin to be cached in Shadow1 again because the Object header link
- count is one.
-
-What happens if the object is removed while it's open?
-
- That operation is not allowed.
-
-What happens if the directory containing the object is deleted?
-
- That operation is not allowed since deleting the directory requires
- that the directory be empty. The directory cannot be emptied
- because the open object cannot be removed from the directory.
-
-What happens if the object is moved?
-
- Moving an object is a process consisting of creating a new
- hard-link with the new name and then deleting the old name.
- This will fail if the object is open.
-
-What happens if the directory containing the entry is moved?
-
- The entry and the shadow still exist and are associated with one
- another.
-
-What if a file is flushed or closed when objects are open?
-
- Flushing a symbol table with open objects writes correct information
- to the file since Shadow is copied to Entry before the table is
- flushed.
-
- Closing a file with open objects will create a valid file but will
- return failure.
-
-How is the Shadow associated with the Entry?
-
- A symbol table is composed of one or more symbol nodes. A node is a
- small 1-d array of symbol table entries. The entries can move
- around within a node and from node-to-node as entries are added or
- removed from the symbol table and nodes can move around within a
- symbol table, being created and destroyed as necessary.
-
- Since a symbol table has an object header with a unique and constant
- file offset, and since H5G contains code to efficiently locate a
- symbol table entry given its name, we use these two values as a key
- within a shadow to associate the shadow with the symbol table
- entry.
-
- typedef struct H5G_shadow_t {
- haddr_t stab_addr; /*symbol table header address*/
- char *name; /*entry name wrt symbol table*/
- hbool_t dirty; /*out-of-date wrt stab entry?*/
- H5G_entry_t ent; /*my copy of stab entry */
- H5G_entry_t *main; /*the level 1 entry or null */
- struct H5G_shadow_t *next, *prev; /*other shadows for this stab*/
- } H5G_shadow_t;
-
- The set of shadows will be organized in a hash table of linked
- lists. Each linked list will contain the shadows associated with a
- particular symbol table header address and the list will be sorted
- lexicographically.
-
- Also, each Entry will have a pointer to the corresponding Shadow or
- null if there is no shadow.
-
- When a symbol table node is loaded into the main cache, we look up
- the linked list of shadows in the shadow hash table based on the
- address of the symbol table object header. We then traverse that
- list matching shadows with symbol table entries.
-
- We assume that opening/closing objects will be a relatively
- infrequent event compared with loading/flushing symbol table
- nodes. Therefore, if we keep the linked list of shadows sorted it
- costs O(N) to open and close objects where N is the number of open
- objects in that symbol table (instead of O(1)) but it costs only
- O(N) to load a symbol table node (instead of O(N^2)).
-
-What about the root symbol entry?
-
- Level 1 storage for the root symbol entry is always available since
- it's stored in the hdf5_file_t struct instead of a symbol table
- node. However, the contents of that entry can move from the file
- handle to a symbol table node by H5G_mkroot(). Therefore, if the
- root object is opened, we keep a shadow entry for it whose
- `stab_addr' field is zero and whose `name' is null.
-
- For this reason, the root object should always be read through the
- H5G interface.
-
-One more key invariant: The H5O_STAB message in a symbol table header
-never changes. This allows a symbol table entry to cache the H5O_STAB
-message for the symbol table to which it points without worrying about
-whether the cache will ever be invalidated.
-
-
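The open/close rules for shadows described in the removed document can be illustrated with a minimal C sketch. It is not the HDF5 implementation: `entry_t`, `shadow_t`, and every function name below are hypothetical, and a single `int` stands in for the cached object header messages. It assumes one shadow per entry, a reference count on the shadow, and write-back of a dirty shadow into its entry when the last reference is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct shadow_t;

/* Hypothetical stand-in for a symbol table entry (Level 1). */
typedef struct entry_t {
    int              cached_msg; /* stand-in for cached header messages */
    bool             dirty;      /* entry differs from the on-disk copy  */
    struct shadow_t *shadow;     /* corresponding shadow, or NULL        */
} entry_t;

/* Hypothetical stand-in for a shadow (shared by Level 2a/2b users). */
typedef struct shadow_t {
    entry_t  ent;        /* private copy of the entry             */
    entry_t *main_entry; /* the Level 1 entry this shadow mirrors */
    bool     dirty;      /* shadow is newer than main_entry       */
    int      nrefs;      /* number of outstanding opens           */
} shadow_t;

/* "Open": create the shadow on first use, then hand out a shared pointer. */
shadow_t *shadow_open(entry_t *e)
{
    if (e->shadow == NULL) {
        shadow_t *s = calloc(1, sizeof *s);
        if (s == NULL)
            return NULL;
        s->ent = *e;
        s->ent.shadow = NULL;
        s->main_entry = e;
        e->shadow = s;
    }
    e->shadow->nrefs++;
    return e->shadow;
}

/* Modify cached data through the shadow only; the entry is left untouched. */
void shadow_write(shadow_t *s, int msg)
{
    s->ent.cached_msg = msg;
    s->dirty = true;
}

/* Read-only peek: return a copy of the shadow if one exists, else the entry. */
entry_t entry_peek(const entry_t *e)
{
    entry_t copy = e->shadow ? e->shadow->ent : *e;
    copy.shadow = NULL;
    return copy;
}

/* An entry cannot be deleted while a shadow exists for it. */
bool entry_can_delete(const entry_t *e)
{
    return e->shadow == NULL;
}

/* "Close": drop one reference; on the last one, write back and free. */
void shadow_close(shadow_t *s)
{
    assert(s->nrefs > 0);
    if (--s->nrefs > 0)
        return;
    if (s->dirty) {
        s->main_entry->cached_msg = s->ent.cached_msg;
        s->main_entry->dirty = true;
    }
    s->main_entry->shadow = NULL;
    free(s);
}
```

In this sketch two opens of the same entry return the same pointer, so both users see each other's cached updates, and the entry only picks up the new data (and becomes deletable again) after the second close. Re-reading a preempted symbol table node, which the document says the last close may require, is deliberately left out.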