summaryrefslogtreecommitdiffstats
path: root/doc/html/review1a.html
blob: 78a5a84c35c207ffd6283b5b27a6176c857e3906 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Group Examples</title>
  </head>
  <body>
    <center><h1>Group Examples</h1></center>

    <hr>
    <h1>Background</h1>

    <p>Directories (or now <i>Groups</i>) are currently implemented as
      a directed graph with a single entry point into the graph which
      is the <i>Root Object</i>. The root object is usually a
      group. All objects have at least one predecessor (the <i>Root
      Object</i> always has the HDF5 file boot block as a
      predecessor).  The number of predecessors of a group is also
      known as the <i>hard link count</i> or just <i>link count</i>.
      Unlike Unix directories, HDF5 groups have no ".."  entry since
      any group can have multiple predecessors.  Given the handle or
      id of some object and returning a full name for that object
      would be an expensive graph traversal.

    <p>A special optimization is that a file may contain a single
      non-group object and no group(s).  The object has one
      predecessor which is the file boot block.  However, once a root
      group is created it never dissappears (although I suppose it
      could if we wanted).

    <p>A special object called a <i>Symbolic Link</i> is simply a
      name. Usually the name refers to some (other) object, but that
      object need not exist.  Symbolic links in HDF5 will have the
      same semantics as symbolic links in Unix.

    <p>The symbol table graph contains "entries" for each name.  An
      entry contains the file address for the object header and
      possibly certain messages cached from the object header.

    <p>The H5G package understands the notion of <i>opening</i> and object
      which means that given the name of the object, a handle to the
      object is returned (this isn't an API function).  Objects can be
      opened multiple times simultaneously through the same name or,
      if the object has hard links, through other names.  The name of
      an object cannot be removed from a group if the object is opened
      through that group (although the name can change within the
      group).

    <p>Below the API, object attributes can be read without opening
      the object; object attributes cannot change without first
      opening that object. The one exception is that the contents of a
      group can change without opening the group.

    <hr>
    <h1>Building a hierarchy from a flat namespace</h1>

    <p>Assuming we have a flat name space (that is, the root object is
      a group which contains names for all other objects in the file
      and none of those objects are groups), then we can build a
      hierarchy of groups that also refer to the objects.

    <p>The file initially contains `foo' `bar' `baz' in the root
      group.  We wish to add groups `grp1' and `grp2' so that `grp1'
      contains objects `foo' and `baz' and `grp2' contains objects
      `bar' and `baz' (so `baz' appears in both groups).

    <p>In either case below, one might want to move the flat objects
      into some other group (like `flat') so their names don't
      interfere with the rest of the hierarchy (or move the hierarchy
      into a directory called `/hierarchy').

    <h2>with symbolic links</h2>

    <p>Create group `grp1' and add symbolic links called `foo' whose
      value is `/foo' and `baz' whose value is `/baz'.  Similarly for
      `grp2'.

    <p>Accessing `grp1/foo' involves searching the root group for
      the name `grp1', then searching that group for `foo', then
      searching the root directory for `foo'.  Alternatively, one
      could change working groups to the grp1 group and then ask for
      `foo' which searches `grp1' for the name `foo', then searches
      the root group for the name `foo'.

    <p>Deleting `/grp1/foo' deletes the symbolic link without
      affecting the `/foo' object.  Deleting `/foo' leaves the
      `/grp1/foo' link dangling.

    <h2>with hard links</h2>

    <p>Creating the hierarchy is the same as with symbolic links.

    <p>Accessing `/grp1/foo' searches the root group for the name
      `grp1', then searches that group for the name `foo'.  If the
      current working group is `/grp1' then we just search for the
      name `foo'.

    <p>Deleting `/grp1/foo' leaves `/foo' and vice versa.

    <h2>the code</h2>

    <p>Depending on the eventual API...

    <code><pre>
H5Gcreate (file_id, "/grp1");
H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
    </pre></code>

    or

    <code><pre>
group_id = H5Gcreate (root_id, "grp1");
H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
H5Gclose (group_id);
    </pre></code>


    <hr>
    <h1>Building a flat namespace from a hierarchy</h1>
    
    <p>Similar to abvoe, but in this case we have to watch out that
      we don't get two names which are the same: what happens to
      `/grp1/baz' and `/grp2/baz'?  If they really refer to the same
      object then we just have `/baz', but if they point to two
      different objects what happens?

    <p>The other thing to watch out for cycles in the graph when we
      traverse it to build the flat namespace.

    <hr>
    <h1>Listing the Group Contents</h1>

    <p>Two things to watch out for are that the group contents don't
      appear to change in a manner which would confuse the
      application, and that listing everything in a group is as
      efficient as possible.

    <h2>Method A</h2>

    <p>Query the number of things in a group and then query each item
      by index. A trivial implementation would be O(n*n) and wouldn't
      protect the caller from changes to the directory which move
      entries around and therefore change their indices.

      <code><pre>
n = H5GgetNumContents (group_id);
for (i=0; i&lt;n; i++) {
   H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
}
    </pre></code>

    <h2>Method B</h2>

    <p>The API contains a single function that reads all information
      from the specified group and returns that info through an array.
      The caller is responsible for freeing the array allocated by the
      query and the things to which it points.  This also makes it
      clear the the returned value is a snapshot of the group which
      doesn't change if the group is modified.

      <code><pre>
n = H5Glist (file_id, "/grp1", info, ...);
for (i=0; i&lt;n; i++) {
   printf ("name = %s\n", info[i].name);
   free (info[i].name); /*and maybe other fields too?*/
}
free (info);
    </pre></code>

    Notice that it would be difficult to expand the info struct since
    its definition is part of the API.

    <h2>Method C</h2>

    <p>The caller asks for a snapshot of the group and then accesses
      items in the snapshot through various query-by-index API
      functions.  When finished, the caller notifies the library that
      it's done with the snapshot.  The word "snapshot" makes it clear
      that subsequent changes to the directory will not be reflected in
      the shapshot_id.

      <code><pre>
snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
n = H5GgetNumContents (snapshot_id);
for (i=0; i&lt;n; i++) {
   H5GgetNameByIndex (shapshot_id, i, ...);
}
H5Grelease (shapshot_id); 
    </pre></code>

    In fact, we could allow the user to leave off the H5Gsnapshot and
    H5Grelease and use group_id in the H5GgetNumContents and
    H5GgetNameByIndex so they can choose between Method A and Method
    C.

    <hr>
    <h1>An implementation of Method C</h1>

    <dl>
      <dt><code>hid_t H5Gshapshot (hid_t group_id)</code>
      <dd>Opens every object in the specified group and stores the
	handles in an array managed by the library (linear-time
	operation).  Open object handles are essentialy symbol table
	entries with a little extra info (symbol table entries cache
	certain things about the object which are also found in the
	object header). Because the objects are open (A) they cannot be
	removed from the group, (B) querying the object returns the
	latest info even if something else has that object open, (C)
	if the object is renamed within the group then its name with
	<code>H5GgetNameByIndex</code> is changed.  Adding new entries
	to a group doesn't affect the snapshot.

      <dt><code>char *H5GgetNameByIndex (hid_t shapshot_id, int
	  index)</code>
      <dd>Uses the open object handle from entry <code>index</code> of
	the snapshot array to get the object name.  This is a
	constant-time operation.  The name is updated automatically if
	the object is renamed within the group.

      <dt><code>H5Gget&lt;whatever&gt;ByIndex...()</code>
      <dd>Uses the open object handle from entry <code>index</code>,
	which is just a symbol table entry, and reads the appropriate
	object header message(s) which might be cached in the symbol
	table entry.  This is a constant-time operation if cached,
	linear in the number of messages if not cached.

      <dt><code>H5Grelease (hid_t snapshot_id)</code>
      <dd>Closes each object refered to by the snapshot and then frees
	the snapshot array.  This is a linear-time operation.
    </dl>

    <hr>
    <h1>To return <code>char*</code> or some HDF5 string type.</h1>

    <p>In either case, the caller has to release resources associated
      with the return value, calling free() or some HDF5 function.

    <p>Names in the current implementation of the H5G package don't
      contain embedded null characters and are always null terminated.

    <p>Eventually the caller probably wants a <code>char*</code> so it
      can pass it to some non-HDF5 function, does that require
      strdup'ing the string again? Then the caller has to free() the
      the char* <i>and</i> release the DHF5 string.

    <hr>
    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Sep 26 12:03:20 EST 1997 -->
<!-- hhmts start -->
Last modified: Fri Oct  3 09:32:10 EST 1997
<!-- hhmts end -->
  </body>
</html>