Diffstat (limited to 'doc/html/review1a.html')
-rw-r--r-- | doc/html/review1a.html | 252 |
1 files changed, 252 insertions, 0 deletions
diff --git a/doc/html/review1a.html b/doc/html/review1a.html
new file mode 100644
index 0000000..78a5a84
--- /dev/null
+++ b/doc/html/review1a.html
@@ -0,0 +1,252 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Group Examples</title>
+ </head>
+ <body>
+ <center><h1>Group Examples</h1></center>
+
+ <hr>
+ <h1>Background</h1>
+
+ <p>Directories (or now <i>Groups</i>) are currently implemented as
+ a directed graph with a single entry point into the graph which
+ is the <i>Root Object</i>. The root object is usually a
+ group. All objects have at least one predecessor (the <i>Root
+ Object</i> always has the HDF5 file boot block as a
+ predecessor). The number of predecessors of a group is also
+ known as the <i>hard link count</i> or just <i>link count</i>.
+ Unlike Unix directories, HDF5 groups have no ".." entry since
+ any group can have multiple predecessors. Consequently, given the
+ handle or id of some object, returning a full name for that object
+ would require an expensive graph traversal.
+
+ <p>A special optimization is that a file may contain a single
+ non-group object and no group(s). The object has one
+ predecessor which is the file boot block. However, once a root
+ group is created it never disappears (although I suppose it
+ could if we wanted).
+
+ <p>A special object called a <i>Symbolic Link</i> is simply a
+ name. Usually the name refers to some (other) object, but that
+ object need not exist. Symbolic links in HDF5 will have the
+ same semantics as symbolic links in Unix.
+
+ <p>The symbol table graph contains "entries" for each name. An
+ entry contains the file address for the object header and
+ possibly certain messages cached from the object header.
+
+ <p>The H5G package understands the notion of <i>opening</i> an object,
+ which means that given the name of the object, a handle to the
+ object is returned (this isn't an API function).
+ Objects can be
+ opened multiple times simultaneously through the same name or,
+ if the object has hard links, through other names. The name of
+ an object cannot be removed from a group if the object is opened
+ through that group (although the name can change within the
+ group).
+
+ <p>Below the API, object attributes can be read without opening
+ the object; object attributes cannot change without first
+ opening that object. The one exception is that the contents of a
+ group can change without opening the group.
+
+ <hr>
+ <h1>Building a hierarchy from a flat namespace</h1>
+
+ <p>Assuming we have a flat namespace (that is, the root object is
+ a group which contains names for all other objects in the file
+ and none of those objects are groups), then we can build a
+ hierarchy of groups that also refer to the objects.
+
+ <p>The file initially contains `foo', `bar', and `baz' in the root
+ group. We wish to add groups `grp1' and `grp2' so that `grp1'
+ contains objects `foo' and `baz' and `grp2' contains objects
+ `bar' and `baz' (so `baz' appears in both groups).
+
+ <p>In either case below, one might want to move the flat objects
+ into some other group (like `flat') so their names don't
+ interfere with the rest of the hierarchy (or move the hierarchy
+ into a group called `/hierarchy').
+
+ <h2>with symbolic links</h2>
+
+ <p>Create group `grp1' and add symbolic links called `foo' whose
+ value is `/foo' and `baz' whose value is `/baz'. Similarly for
+ `grp2'.
+
+ <p>Accessing `grp1/foo' involves searching the root group for
+ the name `grp1', then searching that group for `foo', then
+ searching the root group for `foo'. Alternatively, one
+ could change working groups to the `grp1' group and then ask for
+ `foo', which searches `grp1' for the name `foo', then searches
+ the root group for the name `foo'.
+
+ <p>Deleting `/grp1/foo' deletes the symbolic link without
+ affecting the `/foo' object. Deleting `/foo' leaves the
+ `/grp1/foo' link dangling.
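<p>Since the note says HDF5 symbolic links will follow Unix
semantics, the dangling-link case can be checked against the Unix
filesystem itself. The sketch below is only an analogy built on POSIX
calls, not HDF5 code: the function name and the temporary directory
under /tmp are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of HDF5): builds <tmp>/foo and a
 * symbolic link <tmp>/grp1/foo that names it, deletes <tmp>/foo, and
 * reports whether the link is left dangling.
 * Returns 1 if the link dangles, 0 if not, -1 on a setup error. */
int symlink_dangles_after_delete(void)
{
    char dir[] = "/tmp/grpdemoXXXXXX";
    char target[64], group[64], link_name[64];
    struct stat sb;
    int fd, link_exists, target_missing;

    if (mkdtemp(dir) == NULL) return -1;
    snprintf(target, sizeof target, "%s/foo", dir);
    snprintf(group, sizeof group, "%s/grp1", dir);
    snprintf(link_name, sizeof link_name, "%s/grp1/foo", dir);

    fd = open(target, O_CREAT | O_WRONLY, 0600);   /* the `foo' object */
    if (fd < 0) return -1;
    close(fd);
    if (mkdir(group, 0700) != 0) return -1;        /* the `grp1' group */
    if (symlink(target, link_name) != 0) return -1;

    unlink(target);                                /* delete `foo' */

    /* lstat() looks at the link itself; stat() follows it. */
    link_exists = (lstat(link_name, &sb) == 0 && S_ISLNK(sb.st_mode));
    target_missing = (stat(link_name, &sb) != 0 && errno == ENOENT);

    unlink(link_name);
    rmdir(group);
    rmdir(dir);
    return link_exists && target_missing;
}
```

<p>The link survives as a name, but following it fails because the
name it stores no longer resolves to an object.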
+
+ <h2>with hard links</h2>
+
+ <p>Creating the hierarchy is the same as with symbolic links,
+ except that the links added to each group are hard links.
+
+ <p>Accessing `/grp1/foo' searches the root group for the name
+ `grp1', then searches that group for the name `foo'. If the
+ current working group is `/grp1' then we just search for the
+ name `foo'.
+
+ <p>Deleting `/grp1/foo' leaves `/foo' intact, and vice versa.
+
+ <h2>the code</h2>
+
+ <p>Depending on the eventual API...
+
+ <code><pre>
+H5Gcreate (file_id, "/grp1");
+H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
+ </pre></code>
+
+ or
+
+ <code><pre>
+group_id = H5Gcreate (root_id, "grp1");
+H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
+H5Gclose (group_id);
+ </pre></code>
+
+
+ <hr>
+ <h1>Building a flat namespace from a hierarchy</h1>
+
+ <p>Similar to above, but in this case we have to watch out that
+ we don't get two names which are the same: what happens to
+ `/grp1/baz' and `/grp2/baz'? If they really refer to the same
+ object then we just have `/baz', but if they point to two
+ different objects what happens?
+
+ <p>The other thing to watch out for is cycles in the graph when we
+ traverse it to build the flat namespace.
+
+ <hr>
+ <h1>Listing the Group Contents</h1>
+
+ <p>Two things to watch out for are that the group contents don't
+ appear to change in a manner which would confuse the
+ application, and that listing everything in a group is as
+ efficient as possible.
+
+ <h2>Method A</h2>
+
+ <p>Query the number of things in a group and then query each item
+ by index. A trivial implementation would be O(n*n) and wouldn't
+ protect the caller from changes to the directory which move
+ entries around and therefore change their indices.
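<p>Both weaknesses of Method A can be made concrete with a toy model.
The sketch below assumes a group stored as a singly linked list of
names, which is an illustration only and not how H5G stores entries;
every <code>toy_</code> name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy group: a singly linked list of names. */
typedef struct toy_entry {
    const char *name;
    struct toy_entry *next;
} toy_entry;

/* Walks from the head, so one lookup visits index+1 entries and
 * listing all n names by index visits n*(n+1)/2 entries in total.
 * If steps is non-NULL it accumulates the number of entries visited. */
const char *toy_name_by_index(const toy_entry *head, size_t index,
                              size_t *steps)
{
    const toy_entry *e = head;
    while (e != NULL) {
        if (steps != NULL) ++*steps;
        if (index == 0) return e->name;
        --index;
        e = e->next;
    }
    return NULL;   /* index out of range */
}

/* Unlinks the first entry called `name' and returns the new head.
 * Every later entry shifts down by one index. */
toy_entry *toy_remove(toy_entry *head, const char *name)
{
    toy_entry **pp = &head;
    while (*pp != NULL) {
        if (strcmp((*pp)->name, name) == 0) {
            *pp = (*pp)->next;
            break;
        }
        pp = &(*pp)->next;
    }
    return head;
}
```

<p>With the three names `foo', `bar', `baz', listing by index visits
1 + 2 + 3 = 6 entries. If `foo' is removed after the caller has read
index 0, index 1 now yields `baz', so a loop that continues at index 1
never sees `bar'.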
+
+ <code><pre>
+n = H5GgetNumContents (group_id);
+for (i=0; i&lt;n; i++) {
+    H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
+}
+ </pre></code>
+
+ <h2>Method B</h2>
+
+ <p>The API contains a single function that reads all information
+ from the specified group and returns that info through an array.
+ The caller is responsible for freeing the array allocated by the
+ query and the things to which it points. This also makes it
+ clear that the returned value is a snapshot of the group which
+ doesn't change if the group is modified.
+
+ <code><pre>
+n = H5Glist (file_id, "/grp1", info, ...);
+for (i=0; i&lt;n; i++) {
+    printf ("name = %s\n", info[i].name);
+    free (info[i].name); /*and maybe other fields too?*/
+}
+free (info);
+ </pre></code>
+
+ Notice that it would be difficult to expand the info struct since
+ its definition is part of the API.
+
+ <h2>Method C</h2>
+
+ <p>The caller asks for a snapshot of the group and then accesses
+ items in the snapshot through various query-by-index API
+ functions. When finished, the caller notifies the library that
+ it's done with the snapshot. The word "snapshot" makes it clear
+ that subsequent changes to the directory will not be reflected in
+ the snapshot_id.
+
+ <code><pre>
+snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
+n = H5GgetNumContents (snapshot_id);
+for (i=0; i&lt;n; i++) {
+    H5GgetNameByIndex (snapshot_id, i, ...);
+}
+H5Grelease (snapshot_id);
+ </pre></code>
+
+ In fact, we could allow the user to leave off the H5Gsnapshot and
+ H5Grelease and use group_id in the H5GgetNumContents and
+ H5GgetNameByIndex so they can choose between Method A and Method
+ C.
+
+ <hr>
+ <h1>An implementation of Method C</h1>
+
+ <dl>
+ <dt><code>hid_t H5Gsnapshot (hid_t group_id)</code>
+ <dd>Opens every object in the specified group and stores the
+ handles in an array managed by the library (linear-time
+ operation).
+ Open object handles are essentially symbol table
+ entries with a little extra info (symbol table entries cache
+ certain things about the object which are also found in the
+ object header). Because the objects are open (A) they cannot be
+ removed from the group, (B) querying the object returns the
+ latest info even if something else has that object open, (C)
+ if the object is renamed within the group then the name returned by
+ <code>H5GgetNameByIndex</code> changes with it. Adding new entries
+ to a group doesn't affect the snapshot.
+
+ <dt><code>char *H5GgetNameByIndex (hid_t snapshot_id, int
+ index)</code>
+ <dd>Uses the open object handle from entry <code>index</code> of
+ the snapshot array to get the object name. This is a
+ constant-time operation. The name is updated automatically if
+ the object is renamed within the group.
+
+ <dt><code>H5Gget&lt;whatever&gt;ByIndex...()</code>
+ <dd>Uses the open object handle from entry <code>index</code>,
+ which is just a symbol table entry, and reads the appropriate
+ object header message(s) which might be cached in the symbol
+ table entry. This is a constant-time operation if cached,
+ linear in the number of messages if not cached.
+
+ <dt><code>H5Grelease (hid_t snapshot_id)</code>
+ <dd>Closes each object referred to by the snapshot and then frees
+ the snapshot array. This is a linear-time operation.
+ </dl>
+
+ <hr>
+ <h1>To return <code>char*</code> or some HDF5 string type.</h1>
+
+ <p>In either case, the caller has to release resources associated
+ with the return value, calling free() or some HDF5 function.
+
+ <p>Names in the current implementation of the H5G package don't
+ contain embedded null characters and are always null terminated.
+
+ <p>Eventually the caller probably wants a <code>char*</code> so it
+ can pass it to some non-HDF5 function. Does that require
+ strdup'ing the string again? Then the caller has to free() the
+ char* <i>and</i> release the HDF5 string.
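<p>The ownership question can be sketched with a stand-in string
type. Everything below is hypothetical: <code>toy_string</code> and
its functions are not HDF5 API, and the borrowing accessor is just
one possible way to avoid the second copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "some HDF5 string type": length plus bytes. */
typedef struct toy_string {
    size_t len;
    char *data;   /* null terminated, no embedded nulls */
} toy_string;

/* Allocates a toy_string holding a private copy of s (NULL on failure). */
toy_string *toy_string_new(const char *s)
{
    toy_string *ts = malloc(sizeof *ts);
    if (ts == NULL) return NULL;
    ts->len = strlen(s);
    ts->data = malloc(ts->len + 1);
    if (ts->data == NULL) {
        free(ts);
        return NULL;
    }
    memcpy(ts->data, s, ts->len + 1);
    return ts;
}

/* Releases the string object and the bytes it owns. */
void toy_string_release(toy_string *ts)
{
    if (ts != NULL) {
        free(ts->data);
        free(ts);
    }
}

/* Copying accessor: the caller must free() the result and also
 * release the toy_string, which is the double cleanup noted above. */
char *toy_string_dup(const toy_string *ts)
{
    char *copy = malloc(ts->len + 1);
    if (copy != NULL) memcpy(copy, ts->data, ts->len + 1);
    return copy;
}

/* Borrowing accessor: no copy is made, but the pointer is valid only
 * until toy_string_release() is called on ts. */
const char *toy_string_peek(const toy_string *ts)
{
    return ts->data;
}
```

<p>With the copying accessor the caller performs two cleanups; with
the borrowing accessor only the release call remains, at the cost of
a pointer whose lifetime is tied to the string object.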
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Sep 26 12:03:20 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Fri Oct 3 09:32:10 EST 1997
+<!-- hhmts end -->
+ </body>
+</html>