A number of issues involving caching of object header messages in symbol table entries must be resolved. What is the motivation for these changes? If we make objects completely independent of object name it allows us to refer to one object by multiple names (a concept called hard links in Unix file systems), which in turn provides an easy way to share data between datasets. Every object in an HDF5 file has a unique, constant object header address which serves as a handle (or OID) for the object. The object header contains messages which describe the object. HDF5 allows some of the object header messages to be cached in symbol table entries so that the object header doesn't have to be read from disk. For instance, an entry for a directory caches the directory disk addresses required to access that directory, so the object header for that directory is seldom read. If an object has multiple names (that is, a link count greater than one), then it has multiple symbol table entries which point to it. All symbol table entries must agree on header messages. The current mechanism is to turn off the caching of header messages in symbol table entries when the header link count is more than one, and to allow caching once the link count returns to one. However, in the current implementation, a package is allowed to copy a symbol table entry and use it as a private cache for the object header. This doesn't work for a number of reasons (all but one require a `delete symbol entry' operation). 1. If two packages hold copies of the same symbol table entry, they don't notify each other of changes to the symbol table entry. Eventually, one package reads a cached message and gets the wrong value because the other package changed the message in the object header. 2. If one package holds a copy of the symbol table entry and some other part of HDF5 removes the object and replaces it with some other object, then the original package will continue to access the non-existent object using the new object header. 3. If one package holds a copy of the symbol table entry and some other part of HDF5 (re)moves the directory which contains the object, then the package will be unable to update the symbol table entry with the new cached data. Packages that refer to the object by the new name will use old cached data. The basic problem is that there may be multiple copies of the object symbol table entry floating around in the code when there should really be at most one per hard link. Level 0: A copy may exist on disk as part of a symbol table node, which is a small 1d array of symbol table entries. Level 1: A copy may be cached in memory as part of a symbol table node in the H5Gnode.c file by the H5AC layer. Level 2a: Another package may be holding a copy so it can perform fast lookup of any header messages that might be cached in the symbol table entry. It can't point directly to the cached symbol table node because that node can dissappear at any time. Level 2b: Packages may hold more than one copy of a symbol table entry. For instance, if H5D_open() is called twice for the same name, then two copies of the symbol table entry for the dataset exist in the H5D package. How can level 2a and 2b be combined? If package data structures contained pointers to symbol table entries instead of copies of symbol table entries and if H5G allocated one symbol table entry per hard link, then it's trivial for Level 2a and 2b to benefit from one another's actions since they share the same cache. How does this work conceptually? Level 2a and 2b must notify Level 1 of their intent to use (or stop using) a symbol table entry to access an object header. The notification of the intent to access an object header is called `opening' the object and releasing the access is `closing' the object. Opening an object requires an object name which is used to locate the symbol table entry to use for caching of object header messages. The return value is a handle for the object. Figure 1 shows the state after Dataset1 opens Object with a name that maps through Entry1. The open request created a copy of Entry1 called Shadow1 which exists even if SymNode1 is preempted from the H5AC layer. ______ Object / \ SymNode1 +--------+ | +--------+ _____\ | Header | | | | / / +--------+ | +--------+ +---------+ \______/ | Entry1 | | Shadow1 | /____ +--------+ +---------+ \ \ : : \ +--------+ +----------+ | Dataset1 | +----------+ FIGURE 1 The SymNode1 can appear and disappear from the H5AC layer at any time without affecting the Object Header data cached in the Shadow. The rules are: * If the SymNode1 is present and is about to disappear and the Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared. * If something requests a copy of Entry1 (for a read-only peek request), and Shadow1 exists, then a copy (not pointer) of Shadow1 is returned instead. * Entry1 cannot be deleted while Shadow1 exists. * Entry1 cannot change directly if Shadow1 exists since this means that some other package has opened the object and may be modifying it. I haven't decided if it's useful to ever change Entry1 directly (except of course within the H5G layer itself). * Shadow1 is created when Dataset1 `opens' the object through Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's reference count is incremented. * When Dataset1 `closes' the Object the Shadow1 reference count is decremented. When the reference count reaches zero, if the Shadow1 dirty bit is set, then Shadow1's contents are copied to Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted if its reference count is zero. This may require reading SymNode1 back into the H5AC layer. What happens when another Dataset opens the Object through Entry1? If the current state is represented by the top part of Figure 2, then Dataset2 will be given a pointer to Shadow1 and the Shadow1 reference count will be incremented to two. The Object header link count remains at one so Object Header messages continue to be cached by Shadow1. Dataset1 and Dataset2 benefit from one another actions. The resulting state is represented by Figure 2. _____ SymNode1 Object / \ +--------+ _____\ +--------+ | | | / / | Header | | +--------+ +---------+ +--------+ | | Entry1 | | Shadow1 | /____ \_____/ +--------+ +---------+ \ \ : : _ \ +--------+ |\ +----------+ \ | Dataset1 | \________ +----------+ \ \ +----------+ | | Dataset2 | |- New Dataset +----------+ | / FIGURE 2 What happens when the link count for Object increases while Dataset has the Object open? SymNode2 +--------+ SymNode1 Object | | +--------+ ____\ +--------+ /______ +--------+ | | / / | header | \ `| Entry2 | +--------+ +---------+ +--------+ +--------+ | Entry1 | | Shadow1 | /____ : : +--------+ +---------+ \ \ +--------+ : : \ +--------+ +----------+ \________________/ | Dataset1 | | +----------+ New Link FIGURE 3 The current state is represented by the left part of Figure 3. To create a new link the Object Header had to be located by traversing through Entry1/Shadow1. On the way through, the Entry1/Shadow1 cache is invalidated and the Object Header link count is incremented. Entry2 is then added to SymNode2. Since the Object Header link count is greater than one, Object header data will not be cached in Entry1/Shadow1. If the initial state had been all of Figure 3 and a third link is being added and Object is open by Entry1 and Entry2, then creation of the third link will invalidate the cache in Entry1 or Entry2. It doesn't matter which since both caches are already invalidated anyway. What happens if another Dataset opens the same object by another name? If the current state is represented by Figure 3, then a Shadow2 is created and associated with Entry2. However, since the Object Header link count is more than one, nothing gets cached in Shadow2 (or Shadow1). What happens if the link count decreases? If the current state is represented by all of Figure 3 then it isn't possible to delete Entry1 because the object is currently open through that entry. Therefore, the link count must have decreased because Entry2 was removed. As Dataset1 reads/writes messages in the Object header they will begin to be cached in Shadow1 again because the Object header link count is one. What happens if the object is removed while it's open? That operation is not allowed. What happens if the directory containing the object is deleted? That operation is not allowed since deleting the directory requires that the directory be empty. The directory cannot be emptied because the open object cannot be removed from the directory. What happens if the object is moved? Moving an object is a process consisting of creating a new hard-link with the new name and then deleting the old name. This will fail if the object is open. What happens if the directory containing the entry is moved? The entry and the shadow still exist and are associated with one another. What if a file is flushed or closed when objects are open? Flushing a symbol table with open objects writes correct information to the file since Shadow is copied to Entry before the table is flushed. Closing a file with open objects will create a valid file but will return failure. How is the Shadow associated with the Entry? A symbol table is composed of one or more symbol nodes. A node is a small 1-d array of symbol table entries. The entries can move around within a node and from node-to-node as entries are added or removed from the symbol table and nodes can move around within a symbol table, being created and destroyed as necessary. Since a symbol table has an object header with a unique and constant file offset, and since H5G contains code to efficiently locate a symbol table entry given it's name, we use these two values as a key within a shadow to associate the shadow with the symbol table entry. struct H5G_shadow_t { haddr_t stab_addr; /*symbol table header address*/ char *name; /*entry name wrt symbol table*/ hbool_t dirty; /*out-of-date wrt stab entry?*/ H5G_entry_t ent; /*my copy of stab entry */ H5G_entry_t *main; /*the level 1 entry or null */ H5G_shadow_t *next, *prev; /*other shadows for this stab*/ }; The set of shadows will be organized in a hash table of linked lists. Each linked list will contain the shadows associated with a particular symbol table header address and the list will be sorted lexicographically. Also, each Entry will have a pointer to the corresponding Shadow or null if there is no shadow. When a symbol table node is loaded into the main cache, we look up the linked list of shadows in the shadow hash table based on the address of the symbol table object header. We then traverse that list matching shadows with symbol table entries. We assume that opening/closing objects will be a relatively infrequent event compared with loading/flushing symbol table nodes. Therefore, if we keep the linked list of shadows sorted it costs O(N) to open and close objects where N is the number of open objects in that symbol table (instead of O(1)) but it costs only O(N) to load a symbol table node (instead of O(N^2)). What about the root symbol entry? Level 1 storage for the root symbol entry is always available since it's stored in the hdf5_file_t struct instead of a symbol table node. However, the contents of that entry can move from the file handle to a symbol table node by H5G_mkroot(). Therefore, if the root object is opened, we keep a shadow entry for it whose `stab_addr' field is zero and whose `name' is null. For this reason, the root object should always be read through the H5G interface. One more key invariant: The H5O_STAB message in a symbol table header never changes. This allows symbol table entries to cache the H5O_STAB message for the symbol table to which it points without worrying about whether the cache will ever be invalidated.