<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>External Files in HDF5</title>
</head>
<body>
<center><h1>External Files in HDF5</h1></center>
<h3>Overview of Layers</h3>
<p>This table shows some of the layers of HDF5. Each layer calls
functions at the same or lower layers and never functions at
higher layers. An object identifier (OID) takes various forms
at the various layers: at layer 0 an OID is an absolute physical
file address; at layers 1 and 2 it's an absolute virtual file
address. At layers 3 through 6 it's a relative address, and at
layers 7 and above it's an object handle.
<p><center>
<table border cellpadding=4 width="60%">
<tr align=center>
<td>Layer-7</td>
<td>Groups</td>
<td>Datasets</td>
</tr>
<tr align=center>
<td>Layer-6</td>
<td>Indirect Storage</td>
<td>Symbol Tables</td>
</tr>
<tr align=center>
<td>Layer-5</td>
<td>B-trees</td>
<td>Object Hdrs</td>
<td>Heaps</td>
</tr>
<tr align=center>
<td>Layer-4</td>
<td>Caching</td>
</tr>
<tr align=center>
<td>Layer-3</td>
<td>H5F chunk I/O</td>
</tr>
<tr align=center>
<td>Layer-2</td>
<td>H5F low</td>
</tr>
<tr align=center>
<td>Layer-1</td>
<td>File Family</td>
<td>Split Meta/Raw</td>
</tr>
<tr align=center>
<td>Layer-0</td>
<td>Section-2 I/O</td>
<td>Standard I/O</td>
<td>Malloc/Free</td>
</tr>
</table>
</center>
<h3>Single Address Space</h3>
<p>The simplest form of hdf5 file is a single file containing only
    hdf5 data.  The file begins with the boot block, which is
    followed by hdf5 data until the end of the file.  The next most
complicated file allows non-hdf5 data (user defined data or
internal wrappers) to appear before the boot block and after the
end of the hdf5 data. The hdf5 data is treated as a single
linear address space in both cases.
<p>The next level of complexity comes when non-hdf5 data is
interspersed with the hdf5 data. We handle that by including
the non-hdf5 interspersed data in the hdf5 address space and
simply not referencing it (eventually we might add those
addresses to a "do-not-disturb" list using the same mechanism as
the hdf5 free list, but it's not absolutely necessary). This is
implemented except for the "do-not-disturb" list.
<p>The most complicated single address space hdf5 file is when we
allow the address space to be split among multiple physical
    files.  For instance, a >2GB file can be split into smaller
    chunks and transferred to a 32-bit machine, then accessed as a
    single logical hdf5 file.  The library already supports >32-bit
addresses, so at layer 1 we split a 64-bit address into a 32-bit
file number and a 32-bit offset (the 64 and 32 are
arbitrary). The rest of the library still operates with a linear
address space.
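<p>For example, the layer-1 address split might look something like
    the following sketch.  The macro names, the 32-bit offset width,
    and the standalone types are assumptions made for illustration,
    not the library's actual code.
<p><code><pre>
/* Sketch only: split a 64-bit virtual family address into a member
 * number and an offset within that member. */
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define FAM_OFFSET_BITS  32                       /* arbitrary, like the 64/32 above */
#define FAM_OFFSET_MASK  ((uint64_t)0xffffffffU)

int main(void)
{
    uint64_t addr   = ((uint64_t)3 &lt;&lt; FAM_OFFSET_BITS) | 0x1234U;  /* virtual address  */
    uint32_t member = (uint32_t)(addr &gt;&gt; FAM_OFFSET_BITS);         /* family member #  */
    uint32_t offset = (uint32_t)(addr &amp; FAM_OFFSET_MASK);          /* offset in member */

    printf("member=%u, offset=0x%x\n", (unsigned)member, (unsigned)offset);
    return 0;
}
</pre></code>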
<p>Another variation might be a family of two files where all the
meta data is stored in one file and all the raw data is stored
in another file to allow the HDF5 wrapper to be easily replaced
with some other wrapper.
<p>The <code>H5Fcreate</code> and <code>H5Fopen</code> functions
would need to be modified to pass file-type info down to layer 2
so the correct drivers can be called and parameters passed to
the drivers to initialize them.
<h4>Implementation</h4>
<p>I've implemented fixed-size family members. The entire hdf5
file is partitioned into members where each member is the same
size. The family scheme is used if one passes a name to
    <code>H5F_open</code> (which is called by <code>H5Fopen</code>
    and <code>H5Fcreate</code>) that contains a
<code>printf(3c)</code>-style integer format specifier.
Currently, the default low-level file driver is used for all
family members (H5F_LOW_DFLT, usually set to be Section 2 I/O or
Section 3 stdio), but we'll probably eventually want to pass
that as a parameter of the file access property list, which
hasn't been implemented yet. When creating a family, a default
    family member size is used (defined at the top of H5Ffamily.c,
currently 64MB) but that also should be settable in the file
access property list. When opening an existing family, the size
of the first member is used to determine the member size
(flushing/closing a family ensures that the first member is the
correct size) but the other family members don't have to be that
large (the local address space, however, is logically the same
size for all members).
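<p>As an illustration of the naming scheme (the format string below
    is made up; any name containing a <code>printf(3c)</code>-style
    integer format specifier would do), member names can be generated
    roughly like this:
<p><code><pre>
/* Sketch only: expanding a printf-style family name into member names. */
#include &lt;stdio.h&gt;

int main(void)
{
    const char *family_name = "data%05d.h5";   /* name passed to H5Fcreate/H5Fopen */
    char        member_name[256];
    int         membno;

    for (membno = 0; membno &lt; 3; membno++) {
        sprintf(member_name, family_name, membno);  /* data00000.h5, data00001.h5, ... */
        printf("member %d -&gt; %s\n", membno, member_name);
    }
    return 0;
}
</pre></code>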
<p>I haven't implemented a split meta/raw family yet but am rather
curious to see how it would perform. I was planning to use the
`.h5' extension for the meta data file and `.raw' for the raw
data file. The high-order bit in the address would determine
whether the address refers to meta data or raw data. If the user
passes a name that ends with `.raw' to <code>H5F_open</code>
    then we'll choose the split family and use the default low level
driver for each of the two family members. Eventually we'll
want to pass these kinds of things through the file access
property list instead of relying on naming convention.
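<p>A minimal sketch of the high-order-bit idea follows; the bit
    position, macro name, and types are assumptions made for
    illustration rather than part of the implementation.
<p><code><pre>
/* Sketch only: route an address to the meta (`.h5') or raw (`.raw')
 * member of a split family using the high-order address bit. */
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define SPLIT_RAW_BIT  ((uint64_t)1 &lt;&lt; 63)   /* set bit means the address is in the raw file */

int main(void)
{
    uint64_t addr   = SPLIT_RAW_BIT | 0x2000U;      /* an address in the raw member */
    int      is_raw = (addr &amp; SPLIT_RAW_BIT) != 0;
    uint64_t offset = addr &amp; ~SPLIT_RAW_BIT;        /* offset within that member    */

    printf("%s member, offset=0x%llx\n",
           is_raw ? "raw" : "meta", (unsigned long long)offset);
    return 0;
}
</pre></code>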
<h3>External Raw Data</h3>
<p>We also need the ability to point to raw data that isn't in the
HDF5 linear address space. For instance, a dataset might be
striped across several raw data files.
<p>Fortunately, the only two packages that need to be aware of
this are the packages for reading/writing contiguous raw data
and discontiguous raw data. Since contiguous raw data is a
special case, I'll discuss how to implement external raw data in
the discontiguous case.
<p>Discontiguous data is stored as a B-tree whose keys are the
chunk indices and whose leaf nodes point to the raw data by
storing a file address. So what we need is some way to name the
external files, and a way to efficiently store the external file
name for each chunk.
<p>I propose adding to the object header an <em>External File
List</em> message that is a 1-origin array of file names.
Then, in the B-tree, each key has an index into the External
File List (or zero for the HDF5 file) for the file where the
chunk can be found. The external file index is only used at
the leaf nodes to get to the raw data (the entire B-tree is in
the HDF5 file) but because of the way keys are copied among
the B-tree nodes, it's much easier to store the index with
every key.
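<p>In terms of data structures, the proposal might look roughly like
    the sketch below.  The type and member names are hypothetical;
    the real layouts would be defined by the object header and B-tree
    code.
<p><code><pre>
#include &lt;stdint.h&gt;

/* External File List message: a 1-origin array of external file names. */
typedef struct H5O_efl_t {
    int       nused;        /* number of names in the list                 */
    char    **name;         /* name[0] is entry 1, name[1] is entry 2, ... */
} H5O_efl_t;

/* B-tree key for a chunk of discontiguous (indexed) storage. */
typedef struct H5B_istore_key_t {
    uint64_t  offset[4];    /* chunk indices (dimensionality assumed)      */
    unsigned  file_idx;     /* 0 = the HDF5 file itself, otherwise a
                             * 1-origin index into the External File List  */
} H5B_istore_key_t;
</pre></code>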
<h3>Multiple HDF5 Files</h3>
<p>One might also want to combine two or more HDF5 files in a
manner similar to mounting file systems in Unix. That is, the
group structure and meta data from one file appear as though
they exist in the first file. One opens File-A, and then
<em>mounts</em> File-B at some point in File-A, the <em>mount
point</em>, so that traversing into the mount point actually
causes one to enter the root object of File-B. File-A and
File-B are each complete HDF5 files and can be accessed
individually without mounting them.
<p>We need a couple of additional pieces of machinery to make this
work. First, an haddr_t type (a file address) doesn't contain
any info about which HDF5 file's address space the address
belongs to. But since haddr_t is an opaque type except at
layers 2 and below, it should be quite easy to add a pointer to
the HDF5 file. This would also remove the H5F_t argument from
most of the low-level functions since it would be part of the
OID.
<p>The other thing we need is a table of mount points and some
functions that understand them. We would add the following
table to each H5F_t struct:
<p><code><pre>
struct H5F_mount_t {
H5F_t *parent; /* Parent HDF5 file if any */
struct {
H5F_t *f; /* File which is mounted */
haddr_t where; /* Address of mount point */
} *mount; /* Array sorted by mount point */
intn nmounts; /* Number of mounted files */
intn alloc; /* Size of mount table */
      };
</pre></code>
<p>The <code>H5Fmount</code> function takes the ID of an open
file or group, the name of a to-be-mounted file, the name of the mount
point, and a file access property list (like <code>H5Fopen</code>).
It opens the new file and adds a record to the parent's mount
table. The <code>H5Funmount</code> function takes the parent
file or group ID and the name of the mount point and disassociates
the mounted file from the mount point. It does not close the
mounted file. The <code>H5Fclose</code>
function closes/unmounts files recursively.
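<p>A usage sketch following the signatures described above (this is
    the proposed interface, not necessarily the final library API;
    the file names and the <code>/shared</code> mount point are made
    up for illustration):
<p><code><pre>
#include "hdf5.h"

int main(void)
{
    hid_t file_a = H5Fopen("file_a.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Mount file_b.h5 at the group "/shared" inside file_a. */
    H5Fmount(file_a, "file_b.h5", "/shared", H5P_DEFAULT);

    /* ... names that traverse into /shared now resolve in file_b ... */

    /* Detach file_b from the mount point; this does not close file_b. */
    H5Funmount(file_a, "/shared");

    H5Fclose(file_a);    /* closes/unmounts recursively */
    return 0;
}
</pre></code>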
<p>The <code>H5G_iname</code> function, which translates a name to
    a file address (<code>haddr_t</code>), looks at the mount table
at each step in the translation and switches files where
appropriate. All name-to-address translations occur through
this function.
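<p>The mount-table check itself is simple.  Below is a standalone
    sketch, using simplified stand-in types rather than the real
    <code>H5F_t</code> and <code>haddr_t</code>, of the lookup that
    <code>H5G_iname</code> would perform after resolving each name
    component:
<p><code><pre>
#include &lt;stddef.h&gt;

typedef unsigned long addr_t;                      /* stand-in for haddr_t */
typedef struct file_t file_t;                      /* stand-in for H5F_t   */
struct file_t {
    struct { file_t *f; addr_t where; } mount[8];  /* sorted by `where'    */
    int nmounts;
};

/* If `addr' is a mount point in `f', return the mounted file, else NULL.
 * A real implementation would bisect the sorted table instead of scanning. */
static file_t *
find_mount(const file_t *f, addr_t addr)
{
    int i;
    for (i = 0; i &lt; f-&gt;nmounts; i++)
        if (f-&gt;mount[i].where == addr)
            return f-&gt;mount[i].f;
    return NULL;
}
</pre></code>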
<h3>How Long?</h3>
<p>I'm expecting to be able to implement the two new flavors of
single linear address space in about two days. It took two hours
to implement the malloc/free file driver at level zero and I
don't expect this to be much more work.
<p>I'm expecting three days to implement the external raw data for
discontiguous arrays. Adding the file index to the B-tree is
quite trivial; adding the external file list message shouldn't
    be too hard since the object header message class from which this
message derives is fully implemented; and changing
<code>H5F_istore_read</code> should be trivial. Most of the
time will be spent designing a way to cache Unix file
    descriptors efficiently since the total number of open files
allowed per process could be much smaller than the total number
of HDF5 files and external raw data files.
<p>I'm expecting four days to implement being able to mount one
HDF5 file on another. I was originally planning a lot more, but
making <code>haddr_t</code> opaque turned out to be much easier
than I planned (I did it last Fri). Most of the work will
probably be removing the redundant H5F_t arguments for lots of
functions.
<h3>Conclusion</h3>
<p>The external raw data could be implemented as a single linear
address space, but doing so would require one to allocate large
    enough file addresses throughout the file (>32 bits) before the
    file was created.  It would make mixing an HDF5 file family with
    external raw data, or an external HDF5 wrapper around an HDF4
    file, a more difficult process.  So I consider the implementation of
external raw data files as a single HDF5 linear address space a
kludge.
<p>The ability to mount one HDF5 file on another might not be a
very important feature especially since each HDF5 file must be a
complete file by itself. It's not possible to stripe an array
over multiple HDF5 files because the B-tree wouldn't be complete
in any one file, so the only choice is to stripe the array
across multiple raw data files and store the B-tree in the HDF5
file. On the other hand, it might be useful if one file
contains some public data which can be mounted by other files
(e.g., a mesh topology shared among collaborators and mounted by
files that contain other fields defined on the mesh). Of course
the applications can open the two files separately, but it might
be more portable if we support it in the library.
<p>So we're looking at about two weeks to implement all three
versions. I didn't get a chance to do any of them in AIO
although we had long-term plans for the first two with a
possibility of the third. They'll be much easier to implement in
HDF5 than AIO since I've been keeping these in mind from the
start.
<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Sat Nov 8 18:08:52 EST 1997 -->
<!-- hhmts start -->
Last modified: Tue Sep 8 14:43:32 EDT 1998
<!-- hhmts end -->
</body>
</html>