Directories (or now Groups) are currently implemented as a directed graph with a single entry point into the graph, which is the Root Object. The root object is usually a group. All objects have at least one predecessor (the Root Object always has the HDF5 file super block as a predecessor). The number of predecessors of a group is also known as the hard link count or just link count. Unlike Unix directories, HDF5 groups have no ".." entry since any group can have multiple predecessors. Given the handle or id of some object, returning a full name for that object would require an expensive graph traversal.
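To illustrate why a reverse lookup is expensive, here is a minimal sketch over a hypothetical in-memory model of the graph (`obj_t`, `find_path` and `full_name` are illustrative names, not H5G functions). With no ".." entry, the only way to name an object is to search downward from the root, and with multiple predecessors the search simply returns the first of possibly several valid names.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINKS 8

/* Hypothetical in-memory model of the group graph: a group holds named
 * links to other objects, an object may have several predecessors, and
 * there is no ".." entry to walk back up. */
typedef struct obj {
    int is_group;
    int nlinks;
    const char *names[MAX_LINKS];
    struct obj *targets[MAX_LINKS];
} obj_t;

/* Depth-first search below `node` for `target`, appending "/name" for
 * each link followed.  `depth` bounds the search so a cycle cannot
 * recurse forever.  On success buf holds the first path found. */
static int find_path(const obj_t *node, const obj_t *target,
                     char *buf, size_t len, size_t cap, int depth)
{
    if (node == target)
        return 1;
    if (!node->is_group || depth == 0)
        return 0;
    for (int i = 0; i < node->nlinks; i++) {
        size_t n = strlen(node->names[i]);
        if (len + n + 2 > cap)
            continue;                      /* "/", name and NUL won't fit */
        buf[len] = '/';
        memcpy(buf + len + 1, node->names[i], n + 1);
        if (find_path(node->targets[i], target, buf, len + 1 + n, cap, depth - 1))
            return 1;
        buf[len] = '\0';                   /* backtrack */
    }
    return 0;
}

/* Returns 1 and writes one full name of `target` into buf, or 0 if the
 * object is not reachable from `root`.  In the worst case this explores
 * every path from the root. */
static int full_name(const obj_t *root, const obj_t *target,
                     char *buf, size_t cap)
{
    assert(cap >= 2);                      /* room for at least "/" */
    buf[0] = '\0';
    if (!find_path(root, target, buf, 0, cap, 32))
        return 0;
    if (buf[0] == '\0')                    /* target is the root itself */
        strcpy(buf, "/");
    return 1;
}
```

A group reachable through two parents would be reported under whichever name the search happens to find first, which is one reason the design avoids promising a canonical full name.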
A special optimization is that a file may contain a single non-group object and no group(s). The object has one predecessor, which is the file super block. However, once a root group is created it never disappears (although I suppose it could if we wanted).
A special object called a Symbolic Link is simply a name. Usually the name refers to some (other) object, but that object need not exist. Symbolic links in HDF5 will have the same semantics as symbolic links in Unix.
The symbol table graph contains "entries" for each name. An entry contains the file address for the object header and possibly certain messages cached from the object header.
The H5G package understands the notion of opening an object, which means that given the name of the object, a handle to the object is returned (this isn't an API function). Objects can be opened multiple times simultaneously through the same name or, if the object has hard links, through other names. The name of an object cannot be removed from a group if the object is opened through that group (although the name can change within the group).
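The open-through-a-name rule can be sketched with a per-entry open count. This is a minimal sketch assuming each group entry tracks how many handles were opened through it; `entry_t`, `unlink_entry` and `rename_entry` are hypothetical names, not part of H5G.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical group entry: a name plus the number of handles that
 * were opened through this name in this group. */
typedef struct {
    char name[32];
    int  open_count;
    int  in_use;
} entry_t;

static void open_entry(entry_t *e)  { e->open_count++; }

static void close_entry(entry_t *e)
{
    assert(e->open_count > 0);
    e->open_count--;
}

/* Removing the name fails while the object is open through it. */
static int unlink_entry(entry_t *e)
{
    if (e->open_count > 0)
        return -1;
    e->in_use = 0;
    return 0;
}

/* Renaming within the group is allowed even while the object is open. */
static int rename_entry(entry_t *e, const char *new_name)
{
    if (strlen(new_name) >= sizeof e->name)
        return -1;                          /* name too long for this sketch */
    strcpy(e->name, new_name);
    return 0;
}
```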
Below the API, object attributes can be read without opening the object; object attributes cannot change without first opening that object. The one exception is that the contents of a group can change without opening the group.
Assuming we have a flat name space (that is, the root object is a group which contains names for all other objects in the file and none of those objects are groups), then we can build a hierarchy of groups that also refer to the objects.
The file initially contains `foo' `bar' `baz' in the root group. We wish to add groups `grp1' and `grp2' so that `grp1' contains objects `foo' and `baz' and `grp2' contains objects `bar' and `baz' (so `baz' appears in both groups).
In either case below, one might want to move the flat objects into some other group (like `flat') so their names don't interfere with the rest of the hierarchy (or move the hierarchy into a directory called `/hierarchy').
Create group `grp1' and add symbolic links called `foo' whose value is `/foo' and `baz' whose value is `/baz'. Similarly for `grp2'.
Accessing `grp1/foo' involves searching the root group for the name `grp1', then searching that group for `foo', then searching the root group for `foo'. Alternatively, one could change the working group to `grp1' and then ask for `foo', which searches `grp1' for the name `foo', then searches the root group for the name `foo'.
Deleting `/grp1/foo' deletes the symbolic link without affecting the `/foo' object. Deleting `/foo' leaves the `/grp1/foo' link dangling.
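The lookup sequence for symbolic links can be sketched as a resolver that walks one name component at a time and, on reaching a symbolic link, resolves the link's value as a fresh name. This is a minimal sketch over a hypothetical model (`link_t`, `lookup` and `resolve` are illustrative, not H5G code); the hop limit is an assumption added to stop link loops, and relative link values are resolved in the group holding the link, as in Unix.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: a link is either hard (points at an object) or
 * soft (holds a name that is looked up again when accessed). */
typedef enum { LINK_HARD, LINK_SOFT } link_kind_t;

struct obj;
typedef struct {
    const char *name;
    link_kind_t kind;
    struct obj *target;      /* used when kind == LINK_HARD */
    const char *value;       /* used when kind == LINK_SOFT, e.g. "/foo" */
} link_t;

typedef struct obj {
    int nlinks;              /* 0 for non-group objects */
    link_t links[8];
} obj_t;

/* Linear search of one group for a name of length n. */
static const link_t *lookup(const obj_t *grp, const char *name, size_t n)
{
    for (int i = 0; i < grp->nlinks; i++)
        if (strlen(grp->links[i].name) == n &&
            strncmp(grp->links[i].name, name, n) == 0)
            return &grp->links[i];
    return NULL;
}

/* Resolve an absolute path from `root` or a relative one from `cwd`.
 * Returns NULL for a missing name or a dangling symbolic link. */
static obj_t *resolve(obj_t *root, obj_t *cwd, const char *path, int hops)
{
    obj_t *cur = (*path == '/') ? root : cwd;
    while (*path) {
        while (*path == '/')
            path++;
        if (!*path)
            break;
        size_t n = strcspn(path, "/");
        const link_t *l = lookup(cur, path, n);
        if (!l)
            return NULL;
        if (l->kind == LINK_HARD) {
            cur = l->target;
        } else {
            if (hops == 0)
                return NULL;                 /* too many symbolic links */
            cur = resolve(root, cur, l->value, hops - 1);
            if (!cur)
                return NULL;                 /* dangling link */
        }
        path += n;
    }
    return cur;
}
```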
Creating the hierarchy is the same as with symbolic links.
Accessing `/grp1/foo' searches the root group for the name `grp1', then searches that group for the name `foo'. If the current working group is `/grp1' then we just search for the name `foo'.
Deleting `/grp1/foo' leaves `/foo' intact, and vice versa, since each name is an independent hard link to the same object.
Depending on the eventual API, this might be written either as
H5Gcreate (file_id, "/grp1");
H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
or as
group_id = H5Gcreate (root_id, "grp1");
H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
H5Gclose (group_id);
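The link count bookkeeping behind hard links can be sketched as follows. This is a minimal sketch assuming Unix-like semantics in which an object's storage becomes reclaimable only when its last name is removed; `header_t`, `add_hard_link` and `remove_hard_link` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical object header carrying the hard link count. */
typedef struct {
    int link_count;     /* number of names (predecessors) */
    int freed;          /* assumption: set when the last name disappears */
} header_t;

static void add_hard_link(header_t *h) { h->link_count++; }

/* Removing one name decrements the count; only when no name refers to
 * the object any more is it marked as reclaimable. */
static void remove_hard_link(header_t *h)
{
    assert(h->link_count > 0);
    if (--h->link_count == 0)
        h->freed = 1;
}
```

Creating `/grp1/foo' as a second hard link raises the count to two, so deleting either name leaves the object reachable through the other.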
Similar to above, but in this case we have to watch out that we don't get two names which are the same: what happens to `/grp1/baz' and `/grp2/baz'? If they really refer to the same object then we just have `/baz', but if they point to two different objects, what happens?
The other thing to watch out for is cycles in the graph when we traverse it to build the flat namespace.
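Both hazards can be handled during a single depth-first walk. This is a minimal sketch over a hypothetical model (`obj_t`, `flat_t`, `flatten` are illustrative names): a per-group visited flag stops cycles, a name reached twice for the same object is kept once, and the same name reached for two different objects is counted as a conflict rather than resolved, since the text leaves that policy open.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINKS 8
#define MAX_FLAT  32

/* Hypothetical model: a group holds named links; leaves are non-groups. */
typedef struct obj {
    int is_group;
    int visited;                     /* marks groups already traversed */
    int nlinks;
    const char *names[MAX_LINKS];
    struct obj *targets[MAX_LINKS];
} obj_t;

typedef struct {
    int n;
    int conflicts;                   /* same name, different objects */
    const char *names[MAX_FLAT];
    const obj_t *objs[MAX_FLAT];
} flat_t;

static void flat_add(flat_t *f, const char *name, const obj_t *o)
{
    for (int i = 0; i < f->n; i++) {
        if (strcmp(f->names[i], name) == 0) {
            if (f->objs[i] != o)
                f->conflicts++;      /* e.g. two different `baz' objects */
            return;                  /* same object reached twice: keep one */
        }
    }
    assert(f->n < MAX_FLAT);
    f->names[f->n] = name;
    f->objs[f->n] = o;
    f->n++;
}

/* Depth-first walk; the visited flag keeps cycles from looping. */
static void flatten(obj_t *grp, flat_t *f)
{
    if (grp->visited)
        return;
    grp->visited = 1;
    for (int i = 0; i < grp->nlinks; i++) {
        obj_t *t = grp->targets[i];
        if (t->is_group)
            flatten(t, f);
        else
            flat_add(f, grp->names[i], t);
    }
}
```

A real traversal would more likely keep a set of visited object addresses than write a flag into each group, but the flag keeps the sketch short.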
Two things to watch out for are that the group contents don't appear to change in a manner which would confuse the application, and that listing everything in a group is as efficient as possible.
Method A: Query the number of things in a group and then query each item
by index. A trivial implementation would be O(n*n) and wouldn't
protect the caller from changes to the group which move
entries around and therefore change their indices.
n = H5GgetNumContents (group_id);
for (i=0; i<n; i++) {
    H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
}
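Where the O(n*n) figure comes from can be sketched by assuming, purely for illustration, that a group's entries sit in a singly linked list. Fetching entry i then walks i nodes, so the loop above performs 0 + 1 + ... + (n-1) = n(n-1)/2 hops. `node_t`, `get_name_by_index` and the `steps` counter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical naive group: entries in a singly linked list, so
 * fetching entry i walks i nodes from the head. */
typedef struct node {
    const char *name;
    struct node *next;
} node_t;

static long steps;   /* counts list hops to make the cost visible */

static const char *get_name_by_index(const node_t *head, int i)
{
    while (head && i-- > 0) {
        head = head->next;
        steps++;
    }
    return head ? head->name : NULL;   /* NULL if the index is out of range */
}
```

The sketch also shows the second problem: if an entry is inserted at the head between two calls, every later index silently shifts by one.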
Method B: The API contains a single function that reads all information
from the specified group and returns that info through an array.
The caller is responsible for freeing the array allocated by the
query and the things to which it points. This also makes it
clear that the returned value is a snapshot of the group which
doesn't change if the group is modified.
Notice that it would be difficult to expand the info struct since
its definition is part of the API.
n = H5Glist (file_id, "/grp1", info, ...);
for (i=0; i<n; i++) {
    printf ("name = %s\n", info[i].name);
    free (info[i].name); /*and maybe other fields too?*/
}
free (info);
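The ownership contract of Method B can be sketched as a list call that deep-copies every name, so the caller's array is unaffected by later group changes and must be freed name by name. `group_info_t`, `group_list` and `group_list_free` are hypothetical; the `names` argument stands in for the group's symbol table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical info record; adding a field later changes this struct,
 * which is part of the API. */
typedef struct {
    char *name;
} group_info_t;

/* Copies every name into a newly allocated array and returns the count,
 * or -1 on allocation failure.  The caller owns the result. */
static int group_list(const char *const *names, int n, group_info_t **out)
{
    /* one spare slot so that n == 0 still yields a valid pointer */
    group_info_t *info = calloc((size_t)n + 1, sizeof *info);
    if (!info)
        return -1;
    for (int i = 0; i < n; i++) {
        size_t len = strlen(names[i]) + 1;
        info[i].name = malloc(len);
        if (!info[i].name) {
            while (i-- > 0)
                free(info[i].name);
            free(info);
            return -1;
        }
        memcpy(info[i].name, names[i], len);
    }
    *out = info;
    return n;
}

/* Frees each copied name, then the array itself. */
static void group_list_free(group_info_t *info, int n)
{
    for (int i = 0; i < n; i++)
        free(info[i].name);
    free(info);
}
```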
Method C: The caller asks for a snapshot of the group and then accesses
items in the snapshot through various query-by-index API
functions. When finished, the caller notifies the library that
it's done with the snapshot. The word "snapshot" makes it clear
that subsequent changes to the group will not be reflected in
the snapshot.
In fact, we could allow the user to leave off the H5Gsnapshot and
H5Grelease and use group_id in the H5GgetNumContents and
H5GgetNameByIndex so they can choose between Method A and Method
C.
snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
n = H5GgetNumContents (snapshot_id);
for (i=0; i<n; i++) {
    H5GgetNameByIndex (snapshot_id, i, ...);
}
H5Grelease (snapshot_id);
hid_t H5Gsnapshot (hid_t group_id)
Creates a snapshot of the specified group. If an object in the snapshot is renamed within the group, the name returned by H5GgetNameByIndex is changed. Adding new entries to the group doesn't affect the snapshot.
char *H5GgetNameByIndex (hid_t snapshot_id, int index)
Uses index to index into the snapshot array to get the object name. This is a constant-time operation. The name is updated automatically if the object is renamed within the group.
H5Gget<whatever>ByIndex...()
Uses index to get the snapshot item, which is just a symbol table entry, and reads the appropriate object header message(s), which might be cached in the symbol table entry. This is a constant-time operation if cached, linear in the number of messages if not cached.
H5Grelease (hid_t snapshot_id)
Notifies the library that the caller is done with the snapshot.
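The snapshot semantics described above (constant-time access by index, renames visible, later additions invisible) can be sketched by having the snapshot store pointers to entries rather than copies of names. This is a minimal sketch assuming group entries have stable addresses for the snapshot's lifetime; `entry_t`, `snapshot_t` and the `snapshot_*` functions are hypothetical stand-ins for H5Gsnapshot, H5GgetNumContents, H5GgetNameByIndex and H5Grelease.

```c
#include <stdlib.h>

/* Hypothetical group entry; assumed not to move while a snapshot exists. */
typedef struct {
    const char *name;
} entry_t;

typedef struct {
    int n;
    entry_t **items;       /* pointers, so renames show through */
} snapshot_t;

/* Records the entries present now; entries added later are not seen. */
static snapshot_t *snapshot_create(entry_t *entries, int n)
{
    snapshot_t *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->items = malloc(((size_t)n + 1) * sizeof *s->items);
    if (!s->items) {
        free(s);
        return NULL;
    }
    for (int i = 0; i < n; i++)
        s->items[i] = &entries[i];
    s->n = n;
    return s;
}

static int snapshot_count(const snapshot_t *s) { return s->n; }

/* Constant-time lookup by index; NULL when out of range. */
static const char *snapshot_name(const snapshot_t *s, int i)
{
    return (i >= 0 && i < s->n) ? s->items[i]->name : NULL;
}

static void snapshot_release(snapshot_t *s)
{
    if (s) {
        free(s->items);
        free(s);
    }
}
```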
A name could be returned as a char* or as some HDF5 string type. In either case, the caller has to release resources associated with the return value by calling free() or some HDF5 function.
Names in the current implementation of the H5G package don't contain embedded null characters and are always null terminated.
Eventually the caller probably wants a char* so it
can pass it to some non-HDF5 function. Does that require
strdup'ing the string again? Then the caller has to free() the
char* and also release the HDF5 string.
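The double-release cost of an opaque string type can be sketched as follows. `h5str_t`, `h5str_new`, `h5str_release` and `h5str_dup` are hypothetical; the point is only that obtaining a plain char* from such a type means a second allocation, and two separate releases for the caller. Since names contain no embedded nulls and are always null terminated, a plain malloc'd char* needing a single free() would carry the same information.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque library string with its own release function. */
typedef struct {
    char  *buf;
    size_t len;
} h5str_t;

static h5str_t *h5str_new(const char *s)
{
    h5str_t *h = malloc(sizeof *h);
    if (!h)
        return NULL;
    h->len = strlen(s);
    h->buf = malloc(h->len + 1);
    if (!h->buf) {
        free(h);
        return NULL;
    }
    memcpy(h->buf, s, h->len + 1);
    return h;
}

static void h5str_release(h5str_t *h)
{
    if (h) {
        free(h->buf);
        free(h);
    }
}

/* Copies out a plain char* for non-library code: a second allocation
 * that the caller must free() in addition to releasing the h5str_t. */
static char *h5str_dup(const h5str_t *h)
{
    char *c = malloc(h->len + 1);
    if (c)
        memcpy(c, h->buf, h->len + 1);
    return c;
}
```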