Symbol Table Caching Issues
A number of issues involving caching of object header messages in
symbol table entries must be resolved.
What is the motivation for these changes?
If we make objects completely independent of object name it allows
us to refer to one object by multiple names (a concept called hard
links in Unix file systems), which in turn provides an easy way to
share data between datasets.
Every object in an HDF5 file has a unique, constant object header
address which serves as a handle (or OID) for the object. The
object header contains messages which describe the object.
HDF5 allows some of the object header messages to be cached in
symbol table entries so that the object header doesn't have to be
read from disk. For instance, an entry for a directory caches the
directory disk addresses required to access that directory, so the
object header for that directory is seldom read.
If an object has multiple names (that is, a link count greater than
one), then it has multiple symbol table entries which point to it.
All symbol table entries must agree on header messages. The
current mechanism is to turn off the caching of header messages in
symbol table entries when the header link count is more than one,
and to allow caching once the link count returns to one.
However, in the current implementation, a package is allowed to
copy a symbol table entry and use it as a private cache for the
object header. This doesn't work for a number of reasons (all but
one require a `delete symbol entry' operation).
1. If two packages hold copies of the same symbol table entry,
they don't notify each other of changes to the symbol table
entry. Eventually, one package reads a cached message and
gets the wrong value because the other package changed the
message in the object header.
2. If one package holds a copy of the symbol table entry and
some other part of HDF5 removes the object and replaces it
with some other object, then the original package will
continue to access the non-existent object using the new
object header.
3. If one package holds a copy of the symbol table entry and
some other part of HDF5 (re)moves the directory which
contains the object, then the package will be unable to
update the symbol table entry with the new cached
data. Packages that refer to the object by the new name will
use old cached data.
The basic problem is that there may be multiple copies of the object
symbol table entry floating around in the code when there should
really be at most one per hard link.
Level 0: A copy may exist on disk as part of a symbol table node, which
is a small 1d array of symbol table entries.
Level 1: A copy may be cached in memory as part of a symbol table node
in the H5Gnode.c file by the H5AC layer.
Level 2a: Another package may be holding a copy so it can perform
fast lookup of any header messages that might be cached in
the symbol table entry. It can't point directly to the
cached symbol table node because that node can dissappear
at any time.
Level 2b: Packages may hold more than one copy of a symbol table
entry. For instance, if H5D_open() is called twice for
the same name, then two copies of the symbol table entry
for the dataset exist in the H5D package.
How can level 2a and 2b be combined?
If package data structures contained pointers to symbol table
entries instead of copies of symbol table entries and if H5G
allocated one symbol table entry per hard link, then it's trivial
for Level 2a and 2b to benefit from one another's actions since
they share the same cache.
How does this work conceptually?
Level 2a and 2b must notify Level 1 of their intent to use (or stop
using) a symbol table entry to access an object header. The
notification of the intent to access an object header is called
`opening' the object and releasing the access is `closing' the
object.
Opening an object requires an object name which is used to locate
the symbol table entry to use for caching of object header
messages. The return value is a handle for the object. Figure 1
shows the state after Dataset1 opens Object with a name that maps
through Entry1. The open request created a copy of Entry1 called
Shadow1 which exists even if SymNode1 is preempted from the H5AC
layer.
______
Object / \
SymNode1 +--------+ |
+--------+ _____\ | Header | |
| | / / +--------+ |
+--------+ +---------+ \______/
| Entry1 | | Shadow1 | /____
+--------+ +---------+ \ \
: : \
+--------+ +----------+
| Dataset1 |
+----------+
FIGURE 1
The SymNode1 can appear and disappear from the H5AC layer at any
time without affecting the Object Header data cached in the Shadow.
The rules are:
* If the SymNode1 is present and is about to disappear and the
Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.
* If something requests a copy of Entry1 (for a read-only peek
request), and Shadow1 exists, then a copy (not pointer) of Shadow1
is returned instead.
* Entry1 cannot be deleted while Shadow1 exists.
* Entry1 cannot change directly if Shadow1 exists since this means
that some other package has opened the object and may be modifying
it. I haven't decided if it's useful to ever change Entry1
directly (except of course within the H5G layer itself).
* Shadow1 is created when Dataset1 `opens' the object through
Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
reference count is incremented.
* When Dataset1 `closes' the Object the Shadow1 reference count is
decremented. When the reference count reaches zero, if the
Shadow1 dirty bit is set, then Shadow1's contents are copied to
Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted
if its reference count is zero. This may require reading SymNode1
back into the H5AC layer.
What happens when another Dataset opens the Object through Entry1?
If the current state is represented by the top part of Figure 2,
then Dataset2 will be given a pointer to Shadow1 and the Shadow1
reference count will be incremented to two. The Object header link
count remains at one so Object Header messages continue to be cached
by Shadow1. Dataset1 and Dataset2 benefit from one another
actions. The resulting state is represented by Figure 2.
_____
SymNode1 Object / \
+--------+ _____\ +--------+ |
| | / / | Header | |
+--------+ +---------+ +--------+ |
| Entry1 | | Shadow1 | /____ \_____/
+--------+ +---------+ \ \
: : _ \
+--------+ |\ +----------+
\ | Dataset1 |
\________ +----------+
\ \
+----------+ |
| Dataset2 | |- New Dataset
+----------+ |
/
FIGURE 2
What happens when the link count for Object increases while Dataset
has the Object open?
SymNode2
+--------+
SymNode1 Object | |
+--------+ ____\ +--------+ /______ +--------+
| | / / | header | \ `| Entry2 |
+--------+ +---------+ +--------+ +--------+
| Entry1 | | Shadow1 | /____ : :
+--------+ +---------+ \ \ +--------+
: : \
+--------+ +----------+ \________________/
| Dataset1 | |
+----------+ New Link
FIGURE 3
The current state is represented by the left part of Figure 3. To
create a new link the Object Header had to be located by traversing
through Entry1/Shadow1. On the way through, the Entry1/Shadow1
cache is invalidated and the Object Header link count is
incremented. Entry2 is then added to SymNode2.
Since the Object Header link count is greater than one, Object
header data will not be cached in Entry1/Shadow1.
If the initial state had been all of Figure 3 and a third link is
being added and Object is open by Entry1 and Entry2, then creation
of the third link will invalidate the cache in Entry1 or Entry2. It
doesn't matter which since both caches are already invalidated
anyway.
What happens if another Dataset opens the same object by another name?
If the current state is represented by Figure 3, then a Shadow2 is
created and associated with Entry2. However, since the Object
Header link count is more than one, nothing gets cached in Shadow2
(or Shadow1).
What happens if the link count decreases?
If the current state is represented by all of Figure 3 then it isn't
possible to delete Entry1 because the object is currently open
through that entry. Therefore, the link count must have
decreased because Entry2 was removed.
As Dataset1 reads/writes messages in the Object header they will
begin to be cached in Shadow1 again because the Object header link
count is one.
What happens if the object is removed while it's open?
That operation is not allowed.
What happens if the directory containing the object is deleted?
That operation is not allowed since deleting the directory requires
that the directory be empty. The directory cannot be emptied
because the open object cannot be removed from the directory.
What happens if the object is moved?
Moving an object is a process consisting of creating a new
hard-link with the new name and then deleting the old name.
This will fail if the object is open.
What happens if the directory containing the entry is moved?
The entry and the shadow still exist and are associated with one
another.
What if a file is flushed or closed when objects are open?
Flushing a symbol table with open objects writes correct information
to the file since Shadow is copied to Entry before the table is
flushed.
Closing a file with open objects will create a valid file but will
return failure.
How is the Shadow associated with the Entry?
A symbol table is composed of one or more symbol nodes. A node is a
small 1-d array of symbol table entries. The entries can move
around within a node and from node-to-node as entries are added or
removed from the symbol table and nodes can move around within a
symbol table, being created and destroyed as necessary.
Since a symbol table has an object header with a unique and constant
file offset, and since H5G contains code to efficiently locate a
symbol table entry given it's name, we use these two values as a key
within a shadow to associate the shadow with the symbol table
entry.
struct H5G_shadow_t {
haddr_t stab_addr; /*symbol table header address*/
char *name; /*entry name wrt symbol table*/
hbool_t dirty; /*out-of-date wrt stab entry?*/
H5G_entry_t ent; /*my copy of stab entry */
H5G_entry_t *main; /*the level 1 entry or null */
H5G_shadow_t *next, *prev; /*other shadows for this stab*/
};
The set of shadows will be organized in a hash table of linked
lists. Each linked list will contain the shadows associated with a
particular symbol table header address and the list will be sorted
lexicographically.
Also, each Entry will have a pointer to the corresponding Shadow or
null if there is no shadow.
When a symbol table node is loaded into the main cache, we look up
the linked list of shadows in the shadow hash table based on the
address of the symbol table object header. We then traverse that
list matching shadows with symbol table entries.
We assume that opening/closing objects will be a relatively
infrequent event compared with loading/flushing symbol table
nodes. Therefore, if we keep the linked list of shadows sorted it
costs O(N) to open and close objects where N is the number of open
objects in that symbol table (instead of O(1)) but it costs only
O(N) to load a symbol table node (instead of O(N^2)).
What about the root symbol entry?
Level 1 storage for the root symbol entry is always available since
it's stored in the hdf5_file_t struct instead of a symbol table
node. However, the contents of that entry can move from the file
handle to a symbol table node by H5G_mkroot(). Therefore, if the
root object is opened, we keep a shadow entry for it whose
`stab_addr' field is zero and whose `name' is null.
For this reason, the root object should always be read through the
H5G interface.
One more key invariant: The H5O_STAB message in a symbol table header
never changes. This allows symbol table entries to cache the H5O_STAB
message for the symbol table to which it points without worrying about
whether the cache will ever be invalidated.
===========================================
Last Modified: 8 July 1998 (technical content)
Last Modified: 28 April 2000 (included in HDF5 Technical Notes)
HDF Help Desk: hdfhelp@ncsa.uiuc.edu