Group Examples

Background

Directories (or now Groups) are currently implemented as a directed graph with a single entry point into the graph which is the Root Object. The root object is usually a group. All objects have at least one predecessor (the Root Object always has the HDF5 file super block as a predecessor). The number of predecessors of a group is also known as the hard link count or just link count. Unlike Unix directories, HDF5 groups have no ".." entry since any group can have multiple predecessors. For the same reason, returning a full name for an object given only its handle or id would require an expensive graph traversal.

A special optimization is that a file may contain a single non-group object and no group(s). The object has one predecessor which is the file super block. However, once a root group is created it never disappears (although I suppose it could if we wanted).

A special object called a Symbolic Link is simply a name. Usually the name refers to some (other) object, but that object need not exist. Symbolic links in HDF5 will have the same semantics as symbolic links in Unix.

The symbol table graph contains "entries" for each name. An entry contains the file address for the object header and possibly certain messages cached from the object header.

The H5G package understands the notion of opening an object, which means that given the name of the object, a handle to the object is returned (this isn't an API function). Objects can be opened multiple times simultaneously through the same name or, if the object has hard links, through other names. The name of an object cannot be removed from a group if the object is opened through that group (although the name can change within the group).

Below the API, object attributes can be read without opening the object; object attributes cannot change without first opening that object. The one exception is that the contents of a group can change without opening the group.

Building a hierarchy from a flat namespace

Assuming we have a flat name space (that is, the root object is a group which contains names for all other objects in the file and none of those objects are groups), then we can build a hierarchy of groups that also refer to the objects.

The file initially contains `foo' `bar' `baz' in the root group. We wish to add groups `grp1' and `grp2' so that `grp1' contains objects `foo' and `baz' and `grp2' contains objects `bar' and `baz' (so `baz' appears in both groups).

In either case below, one might want to move the flat objects into some other group (like `flat') so their names don't interfere with the rest of the hierarchy (or move the hierarchy into a directory called `/hierarchy').

with symbolic links

Create group `grp1' and add symbolic links called `foo' whose value is `/foo' and `baz' whose value is `/baz'. Similarly for `grp2'.

Accessing `grp1/foo' involves searching the root group for the name `grp1', then searching that group for `foo', then searching the root group for `foo'. Alternatively, one could change working groups to the grp1 group and then ask for `foo' which searches `grp1' for the name `foo', then searches the root group for the name `foo'.

Deleting `/grp1/foo' deletes the symbolic link without affecting the `/foo' object. Deleting `/foo' leaves the `/grp1/foo' link dangling.
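
The lookup and deletion behavior described above can be sketched with a small in-memory model. This is only a toy under stated assumptions: every name prefixed `sl_` is hypothetical and is not part of the H5G package, groups are fixed-size tables, and the depth limit on link chains is an addition of this sketch rather than something the design specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory model for illustration only; none of these names are HDF5 API. */
enum sl_kind { SL_HARD, SL_SOFT };
struct sl_group;

typedef struct sl_object {
    struct sl_group *group;      /* non-NULL when the object is a group */
} sl_object;

typedef struct sl_entry {
    const char  *name;
    enum sl_kind kind;
    sl_object   *obj;            /* valid when kind == SL_HARD */
    const char  *target;         /* valid when kind == SL_SOFT: just a name */
} sl_entry;

typedef struct sl_group {
    sl_entry entries[8];
    int      n;
} sl_group;

void sl_add(sl_group *g, sl_entry e)
{
    assert(g->n < 8);
    g->entries[g->n++] = e;
}

/* Search one group for a name of the given length. */
sl_entry *sl_find(sl_group *g, const char *name, size_t len)
{
    for (int i = 0; i < g->n; i++)
        if (strlen(g->entries[i].name) == len &&
            memcmp(g->entries[i].name, name, len) == 0)
            return &g->entries[i];
    return NULL;
}

/* Resolve an absolute path, following symbolic links from the root group.
 * Returns NULL for a missing name or a dangling link. The depth limit is an
 * assumption added here so that a link pointing at itself terminates. */
sl_object *sl_resolve(sl_group *root, const char *path, int depth)
{
    sl_group  *cur = root;
    sl_object *obj = NULL;

    if (depth > 8 || path[0] != '/') return NULL;
    while (*path == '/') path++;
    while (*path) {
        size_t len = strcspn(path, "/");
        sl_entry *e = cur ? sl_find(cur, path, len) : NULL;
        if (!e) return NULL;
        obj = (e->kind == SL_HARD) ? e->obj
                                   : sl_resolve(root, e->target, depth + 1);
        if (!obj) return NULL;
        cur = obj->group;        /* NULL if the object is not a group */
        path += len;
        while (*path == '/') path++;
    }
    return obj;
}

/* Remove a name from a group without touching whatever it named. */
int sl_unlink(sl_group *g, const char *name)
{
    sl_entry *e = sl_find(g, name, strlen(name));
    if (!e) return -1;
    *e = g->entries[--g->n];
    return 0;
}

/* Returns 1 if /grp1/foo resolves before /foo is deleted and dangles after. */
int sl_demo(void)
{
    sl_group  root = { 0 }, grp1 = { 0 };
    sl_object foo = { NULL }, grp1_obj = { &grp1 };

    sl_add(&root, (sl_entry){ "foo",  SL_HARD, &foo,      NULL   });
    sl_add(&root, (sl_entry){ "grp1", SL_HARD, &grp1_obj, NULL   });
    sl_add(&grp1, (sl_entry){ "foo",  SL_SOFT, NULL,      "/foo" });

    int before = sl_resolve(&root, "/grp1/foo", 0) == &foo;
    sl_unlink(&root, "foo");
    int dangling = sl_resolve(&root, "/grp1/foo", 0) == NULL;
    int link_kept = sl_find(&grp1, "foo", 3) != NULL;
    return before && dangling && link_kept;
}
```

Note that `sl_resolve` searches the root group a second time when it meets the symbolic link, which is the extra lookup the text describes.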

with hard links

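Creating the hierarchy is the same as with symbolic links, except that the names `foo' and `baz' added to each group are hard links to the existing objects rather than symbolic links.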

Accessing `/grp1/foo' searches the root group for the name `grp1', then searches that group for the name `foo'. If the current working group is `/grp1' then we just search for the name `foo'.

Deleting `/grp1/foo' leaves `/foo' and vice versa.
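
A minimal sketch of the link-count bookkeeping behind this, again as a toy model. The `hl_` names are hypothetical, and the rule that an object with zero links is marked dead is an assumption of the sketch; the text only says that every object has at least one predecessor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model for illustration only; none of these names are HDF5 API. */
typedef struct hl_object {
    int nlink;                   /* number of names referring to the object */
    int alive;                   /* cleared when the last name is removed   */
} hl_object;

typedef struct hl_entry {
    const char *name;
    hl_object  *obj;
} hl_entry;

typedef struct hl_group {
    hl_entry entries[8];
    int      n;
} hl_group;

/* Add a hard link: one more predecessor for the object. */
void hl_link(hl_group *g, const char *name, hl_object *obj)
{
    assert(g->n < 8);
    g->entries[g->n++] = (hl_entry){ name, obj };
    obj->nlink++;
}

/* A hard link leads straight to the object; no second lookup is needed. */
hl_object *hl_lookup(const hl_group *g, const char *name)
{
    for (int i = 0; i < g->n; i++)
        if (strcmp(g->entries[i].name, name) == 0)
            return g->entries[i].obj;
    return NULL;
}

/* Remove one name. Other names for the same object keep working. */
int hl_unlink(hl_group *g, const char *name)
{
    for (int i = 0; i < g->n; i++) {
        if (strcmp(g->entries[i].name, name) == 0) {
            hl_object *obj = g->entries[i].obj;
            g->entries[i] = g->entries[--g->n];
            if (--obj->nlink == 0)
                obj->alive = 0;  /* assumption: storage could be reclaimed */
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if deleting /grp1/foo leaves /foo usable. */
int hl_demo(void)
{
    hl_group  root = { 0 }, grp1 = { 0 };
    hl_object foo = { 0, 1 };

    hl_link(&root, "foo", &foo);             /* /foo      */
    hl_link(&grp1, "foo", &foo);             /* /grp1/foo */
    int two_links = (foo.nlink == 2);

    hl_unlink(&grp1, "foo");                 /* delete /grp1/foo */
    int root_ok = hl_lookup(&root, "foo") == &foo
                  && foo.nlink == 1 && foo.alive;

    hl_unlink(&root, "foo");                 /* last name removed */
    return two_links && root_ok && foo.nlink == 0 && !foo.alive;
}
```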

the code

Depending on the eventual API...

H5Gcreate (file_id, "/grp1");
H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");

group_id = H5Gcreate (root_id, "grp1");
H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
H5Gclose (group_id);

Building a flat namespace from a hierarchy

Similar to above, but in this case we have to watch out that we don't get two names which are the same: what happens to `/grp1/baz' and `/grp2/baz'? If they really refer to the same object then we just have `/baz', but if they point to two different objects what happens?

The other thing to watch out for is cycles in the graph when we traverse it to build the flat namespace.
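
One common way to make such a traversal safe is to mark each node when it is first visited. The sketch below is a toy model with hypothetical `flat_` names, not library code; it collects each non-group object once even though `grp1' and `grp2' link to each other and both contain `baz'. It does not attempt to answer the name-collision question raised above.

```c
#include <stddef.h>

#define FLAT_MAX_LINKS 4

/* Toy graph node for illustration only; not an HDF5 structure. */
typedef struct flat_node {
    const char       *name;                   /* label for the example      */
    int               is_group;
    int               visited;                /* mark set by the traversal  */
    struct flat_node *links[FLAT_MAX_LINKS];  /* hard links out of a group  */
    int               nlinks;
} flat_node;

/* Depth-first walk. Each non-group object is appended to out[] once.
 * A node reached again, through a second link or a cycle, is skipped.
 * Returns the updated count. */
size_t flat_collect(flat_node *node, flat_node **out, size_t count, size_t cap)
{
    if (node->visited)
        return count;
    node->visited = 1;

    if (!node->is_group) {
        if (count < cap)
            out[count++] = node;
        return count;
    }
    for (int i = 0; i < node->nlinks; i++)
        count = flat_collect(node->links[i], out, count, cap);
    return count;
}

/* Returns 1 if the cyclic example graph flattens to exactly foo and baz. */
int flat_demo(void)
{
    flat_node foo  = { "foo",  0, 0, { NULL }, 0 };
    flat_node baz  = { "baz",  0, 0, { NULL }, 0 };
    flat_node grp1 = { "grp1", 1, 0, { &foo, &baz }, 2 };
    flat_node grp2 = { "grp2", 1, 0, { &baz, &grp1 }, 2 };  /* back to grp1 */
    flat_node root = { "/",    1, 0, { &grp1, &grp2 }, 2 };
    flat_node *out[8];

    grp1.links[grp1.nlinks++] = &grp2;        /* grp1 -> grp2 closes a cycle */

    size_t n = flat_collect(&root, out, 0, 8);
    return n == 2 && out[0] == &foo && out[1] == &baz;
}
```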

Listing the Group Contents

Two things to watch out for are that the group contents don't appear to change in a manner which would confuse the application, and that listing everything in a group is as efficient as possible.

Method A

Query the number of things in a group and then query each item by index. A trivial implementation would be O(n*n) and wouldn't protect the caller from changes to the directory which move entries around and therefore change their indices.

n = H5GgetNumContents (group_id);
for (i=0; i<n; i++) {
   H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
}

Method B

The API contains a single function that reads all information from the specified group and returns that info through an array. The caller is responsible for freeing the array allocated by the query and the things to which it points. This also makes it clear that the returned value is a snapshot of the group which doesn't change if the group is modified.

n = H5Glist (file_id, "/grp1", info, ...);
for (i=0; i<n; i++) {
   printf ("name = %s\n", info[i].name);
   free (info[i].name); /*and maybe other fields too?*/
}
free (info);

Notice that it would be difficult to expand the info struct since its definition is part of the API.

Method C

The caller asks for a snapshot of the group and then accesses items in the snapshot through various query-by-index API functions. When finished, the caller notifies the library that it's done with the snapshot. The word "snapshot" makes it clear that subsequent changes to the directory will not be reflected in the snapshot_id.

snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
n = H5GgetNumContents (snapshot_id);
for (i=0; i<n; i++) {
   H5GgetNameByIndex (snapshot_id, i, ...);
}
H5Grelease (snapshot_id);

In fact, we could allow the user to leave off the H5Gsnapshot and H5Grelease and use group_id in the H5GgetNumContents and H5GgetNameByIndex so they can choose between Method A and Method C.

An implementation of Method C

hid_t H5Gsnapshot (hid_t group_id): Opens every object in the specified group and stores the handles in an array managed by the library (linear-time operation). Open object handles are essentially symbol table entries with a little extra info (symbol table entries cache certain things about the object which are also found in the object header). Because the objects are open (A) they cannot be removed from the group, (B) querying the object returns the latest info even if something else has that object open, (C) if the object is renamed within the group then the name returned by H5GgetNameByIndex changes accordingly. Adding new entries to a group doesn't affect the snapshot.
char *H5GgetNameByIndex (hid_t snapshot_id, int index): Uses the open object handle from entry index of the snapshot array to get the object name. This is a constant-time operation. The name is updated automatically if the object is renamed within the group.
H5Gget<whatever>ByIndex...(): Uses the open object handle from entry index, which is just a symbol table entry, and reads the appropriate object header message(s) which might be cached in the symbol table entry. This is a constant-time operation if cached, linear in the number of messages if not cached.
H5Grelease (hid_t snapshot_id): Closes each object referred to by the snapshot and then frees the snapshot array. This is a linear-time operation.
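
The behavior listed above can be sketched in a few lines of C. This is a toy model under stated assumptions, not the library implementation: all `snap_` names are hypothetical, an "open handle" is modeled as a pointer plus an open count, and the name lives in the entry itself.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model for illustration only; none of these names are HDF5 API. */
typedef struct snap_entry {
    char name[32];
    int  open_count;             /* > 0 while some handle refers to it */
} snap_entry;

typedef struct snap_group {
    snap_entry *entries[16];
    int         n;
} snap_group;

typedef struct snap_snapshot {
    snap_entry **handles;        /* one "open handle" per entry */
    int          n;
} snap_snapshot;

/* Linear time: open every entry and remember the handles. */
snap_snapshot *snap_take(snap_group *g)
{
    snap_snapshot *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->n = g->n;
    s->handles = malloc((size_t)(g->n ? g->n : 1) * sizeof *s->handles);
    if (!s->handles) { free(s); return NULL; }
    for (int i = 0; i < g->n; i++) {
        s->handles[i] = g->entries[i];
        g->entries[i]->open_count++;
    }
    return s;
}

/* Constant time: read the current name through the open handle. */
const char *snap_name_by_index(const snap_snapshot *s, int i)
{
    return (i >= 0 && i < s->n) ? s->handles[i]->name : NULL;
}

/* Linear time: close every handle, then free the array. */
void snap_release(snap_snapshot *s)
{
    for (int i = 0; i < s->n; i++)
        s->handles[i]->open_count--;
    free(s->handles);
    free(s);
}

/* Removal is refused while the entry is open (property A). */
int snap_remove(snap_group *g, int i)
{
    if (i < 0 || i >= g->n || g->entries[i]->open_count > 0)
        return -1;
    g->entries[i] = g->entries[--g->n];
    return 0;
}

/* Returns 1 if the snapshot shows properties (A) and (C) and ignores
 * entries added after it was taken. */
int snap_demo(void)
{
    snap_entry foo = { "foo", 0 }, bar = { "bar", 0 }, late = { "late", 0 };
    snap_group g = { { &foo, &bar }, 2 };

    snap_snapshot *s = snap_take(&g);
    if (!s) return 0;

    g.entries[g.n++] = &late;                 /* added after the snapshot  */
    int ok = (s->n == 2);
    strcpy(foo.name, "foo2");                 /* rename within the group   */
    ok = ok && strcmp(snap_name_by_index(s, 0), "foo2") == 0;
    ok = ok && snap_remove(&g, 0) == -1;      /* still open: refused       */
    snap_release(s);
    ok = ok && snap_remove(&g, 0) == 0;       /* closed: removal succeeds  */
    return ok;
}
```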

To return `char*` or some HDF5 string type.

In either case, the caller has to release resources associated with the return value, calling free() or some HDF5 function.

Names in the current implementation of the H5G package don't contain embedded null characters and are always null terminated.

Eventually the caller probably wants a char* so it can pass it to some non-HDF5 function. Does that require strdup'ing the string again? If so, the caller has to free() the char* and also release the HDF5 string.
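
For the plain `char*` option, the ownership rule could look like the following sketch. The function `name_dup` is hypothetical and only illustrates the convention that the library hands back a heap copy which the caller releases with free(); it relies on names being null terminated, as stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not HDF5 API: return a heap copy of a
 * null-terminated name. The caller owns the result and must free() it. */
char *name_dup(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name) + 1;            /* include the terminator */
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, name, len);
    return copy;
}

/* Returns 1 if the copy is independent of the library's own buffer. */
int name_dup_demo(void)
{
    char internal[] = "grp1";                 /* stands in for library storage */
    char *copy = name_dup(internal);
    if (!copy) return 0;

    internal[0] = 'X';                        /* later change inside the library */
    int ok = strcmp(copy, "grp1") == 0;       /* caller's copy is unaffected */
    free(copy);                               /* caller releases its copy */
    return ok;
}
```

With this convention the caller makes exactly one free() call and has no second HDF5 string object to release, at the cost of one copy per query.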

Robb Matzke

Last modified: Fri Oct 3 09:32:10 EST 1997