author Frank Baker <fbaker@hdfgroup.org> 2000-05-01 21:31:11 (GMT)
committer Frank Baker <fbaker@hdfgroup.org> 2000-05-01 21:31:11 (GMT)
commit 7749127d803ca4f8a11220d8ddda083ca11d658a
tree a1b7bfc6bb85fb4ff7cbe3d2515170a49d4cd568 /doc/html
parent 74f1fc208d2e866a750b5069a6fd4948bca58baf
[svn-r2208]
Big.html --> BigDataSmMach.html
Coding.html --> NamingScheme.html
CodeReview.html
ExternalFiles.html
compat.html --> H4-H5Compat.html
heap.txt --> HeapMgmt.html
IOPipe.html
Lib_Maint.html --> LibMaint.html
MemoryManagement.html
move.html --> MoveDStruct.html
ObjectHeader.txt
storage.html --> RawDStorage.html
symtab --> SymbolTables.html
Version.html

Above files moved from doc/html/ to doc/html/TechNotes/ for inclusion in the new "HDF5 Technical Notes" document. Filenames changed as indicated.
Diffstat (limited to 'doc/html')
-rw-r--r-- doc/html/TechNotes/BigDataSmMach.html 122
-rw-r--r-- doc/html/TechNotes/CodeReview.html 300
-rw-r--r-- doc/html/TechNotes/ExternalFiles.html 279
-rw-r--r-- doc/html/TechNotes/H4-H5Compat.html 271
-rw-r--r-- doc/html/TechNotes/HeapMgmt.html 79
-rw-r--r-- doc/html/TechNotes/IOPipe.html 114
-rw-r--r-- doc/html/TechNotes/LibMaint.html 122
-rw-r--r-- doc/html/TechNotes/MemoryMgmt.html 510
-rw-r--r-- doc/html/TechNotes/MoveDStruct.html 66
-rw-r--r-- doc/html/TechNotes/NamingScheme.html 300
-rw-r--r-- doc/html/TechNotes/ObjectHeader.html 67
-rw-r--r-- doc/html/TechNotes/RawDStorage.html 274
-rw-r--r-- doc/html/TechNotes/SymbolTables.html 323
-rw-r--r-- doc/html/TechNotes/Version.html 137
14 files changed, 2964 insertions, 0 deletions
diff --git a/doc/html/TechNotes/BigDataSmMach.html b/doc/html/TechNotes/BigDataSmMach.html
new file mode 100644
index 0000000..fe00ff8
--- /dev/null
+++ b/doc/html/TechNotes/BigDataSmMach.html
@@ -0,0 +1,122 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Big Datasets on Small Machines</title>
+ </head>
+
+ <body>
+ <h1>Big Datasets on Small Machines</h1>
+
+ <h2>1. Introduction</h2>
+
+ <p>The HDF5 library is able to handle files larger than the
+ operating system's maximum file size, and datasets larger than
+ the maximum memory size. For instance, a machine where
+ <code>sizeof(off_t)</code>
+ and <code>sizeof(size_t)</code> are both four bytes can handle
+ datasets and files as large as 18x10^18 bytes. However, most
+ Unix systems limit the number of concurrently open files, so a
+ practical file size limit is closer to 512GB or 1TB.
+
+ <p>Two "tricks" must be employed on these small systems in order
+ to store large datasets. The first trick circumvents the
+ <code>off_t</code> file size limit and the second circumvents
+ the <code>size_t</code> main memory limit.
+
+ <h2>2. File Size Limits</h2>
+
+ <p>Systems that have 64-bit file addresses will be able to access
+ those files automatically. One should see the following output
+ from configure:
+
+ <p><code><pre>
+checking size of off_t... 8
+ </pre></code>
+
+ <p>Also, some 32-bit operating systems have special file systems
+ that can support large (&gt;2GB) files and HDF5 will detect
+ these and use them automatically. If this is the case, the
+ output from configure will show:
+
+ <p><code><pre>
+checking for lseek64... yes
+checking for fseek64... yes
+ </pre></code>
+
+ <p>Otherwise one must use an HDF5 file family. Such a family is
+ created by setting file family properties in a file access
+ property list and then supplying a file name that includes a
+ <code>printf</code>-style integer format. For instance:
+
+ <p><code><pre>
+hid_t plist, file;
+plist = H5Pcreate (H5P_FILE_ACCESS);
+H5Pset_family (plist, 1&lt;&lt;30, H5P_DEFAULT);
+file = H5Fcreate ("big%03d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist);
+ </pre></code>
+
+ <p>The second argument (<code>1&lt;&lt;30</code>) to
+ <code>H5Pset_family()</code> indicates that the family members
+ are to be 2^30 bytes (1GB) each although we could have used any
+ reasonably large value. In general, family members cannot be
+ 2GB because writes to byte number 2,147,483,647 will fail, so
+ the largest safe value for a family member is 2,147,483,647.
+ HDF5 will create family members on demand as the HDF5 address
+ space increases, but since most Unix systems limit the number of
+ concurrently open files the effective maximum size of the HDF5
+ address space will be limited (the system on which this was
+ developed allows 1024 open files, so if each family member is
+ approx 2GB then the largest HDF5 file is approx 2TB).
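+
+ <p>The arithmetic behind family members can be sketched in plain C.
+ This is only an illustration, not the library's code: the helper
+ names (<code>member_index</code>, <code>member_offset</code>,
+ <code>member_name</code>, <code>family_capacity</code>) are
+ hypothetical, and 64-bit integers are assumed to be available.

```c
#include <stdint.h>
#include <stdio.h>

/* Largest member size that is safe when file offsets are signed 32-bit. */
#define MAX_SAFE_MEMBER_SIZE 2147483647ULL   /* 2^31 - 1 */

/* Index of the family member that holds a given HDF5 address. */
static unsigned member_index(uint64_t addr, uint64_t member_size)
{
    return (unsigned)(addr / member_size);
}

/* Byte offset of that address inside its member file. */
static uint64_t member_offset(uint64_t addr, uint64_t member_size)
{
    return addr % member_size;
}

/* Expand a printf-style family name such as "big%03d.h5" for one member.
 * Returns 0 on success, -1 if the name does not fit in the buffer. */
static int member_name(char *buf, size_t buflen, const char *pattern,
                       unsigned idx)
{
    int n = snprintf(buf, buflen, pattern, (int)idx);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Upper bound on the address space if every member must stay open. */
static uint64_t family_capacity(unsigned max_open_files, uint64_t member_size)
{
    return (uint64_t)max_open_files * member_size;
}
```

+ <p>With 1GB members, address 3*2^30+5 falls at offset 5 of
+ <code>big003.h5</code>, and 1024 open files of 2^31-1 bytes each
+ give just under 2TB of address space.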
+
+ <p>If the effective HDF5 address space is limited then one may be
+ able to store datasets as external datasets each spanning
+ multiple files of any length since HDF5 opens external dataset
+ files one at a time. To arrange storage for a 5TB dataset split
+ among 1GB files one could say:
+
+ <p><code><pre>
+hid_t plist = H5Pcreate (H5P_DATASET_CREATE);
+char name[32];
+int i;
+
+for (i=0; i&lt;5*1024; i++) {
+ sprintf (name, "velocity-%04d.raw", i);
+ H5Pset_external (plist, name, 0, (size_t)1&lt;&lt;30);
+}
+ </pre></code>
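+
+ <p>How a byte position in such a dataset maps onto one of the
+ external files can be sketched as follows. This is a minimal
+ illustration, not the library's implementation: the
+ <code>ext_segment_t</code> type and <code>ext_locate</code>
+ function are hypothetical names, and each segment simply mirrors
+ the (name, offset, size) triple passed to
+ <code>H5Pset_external()</code> above.

```c
#include <stddef.h>
#include <stdint.h>

/* One external file segment: (name, offset, size). */
typedef struct {
    const char *name;
    uint64_t    offset;   /* where the dataset bytes start in the file */
    uint64_t    size;     /* number of dataset bytes stored in the file */
} ext_segment_t;

/* Locate dataset byte `pos`.  Returns the segment index and stores the
 * position inside that file in *file_pos, or returns -1 if `pos` lies
 * beyond the storage described by the segment list. */
static int ext_locate(const ext_segment_t *seg, size_t nseg,
                      uint64_t pos, uint64_t *file_pos)
{
    size_t i;

    for (i = 0; i < nseg; i++) {
        if (pos < seg[i].size) {
            *file_pos = seg[i].offset + pos;
            return (int)i;
        }
        pos -= seg[i].size;   /* skip past this segment */
    }
    return -1;
}
```

+ <p>Because only the file that holds the requested bytes has to be
+ open, a lookup of this kind needs one open file at a time.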
+
+ <h2>3. Dataset Size Limits</h2>
+
+ <p>The second limit which must be overcome is that of
+ <code>sizeof(size_t)</code>. HDF5 defines a data type called
+ <code>hsize_t</code> which is used for sizes of datasets and is,
+ by default, defined as <code>unsigned long long</code>.
+
+ <p>To create a dataset with 8*2^30 4-byte integers for a total of
+ 32GB one first creates the dataspace. We give two examples
+ here: a 4-dimensional dataset whose dimension sizes are smaller
+ than the maximum value of a <code>size_t</code>, and a
+ 1-dimensional dataset whose dimension size is too large to fit
+ in a <code>size_t</code>.
+
+ <p><code><pre>
+hsize_t size1[4] = {8, 1024, 1024, 1024};
+hid_t space1 = H5Screate_simple (4, size1, size1);
+
+hsize_t size2[1] = {8589934592LL};
+hid_t space2 = H5Screate_simple (1, size2, size2);
+ </pre></code>
+
+ <p>However, the <code>LL</code> suffix is not portable, so it may
+ be better to replace the number with
+ <code>(hsize_t)8*1024*1024*1024</code>.
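+
+ <p>The cast matters because it forces the whole product to be
+ evaluated in the wide type; without it the multiplication would be
+ done in <code>int</code> and overflow. The sketch below
+ illustrates this with a stand-in type, <code>my_hsize_t</code>
+ (an assumption: a 64-bit unsigned integer), and a hypothetical
+ helper that multiplies dimension sizes with an overflow check.

```c
#include <stdint.h>

/* Stand-in for hsize_t; assumes a 64-bit unsigned type exists. */
typedef uint64_t my_hsize_t;

/* Multiply the dimension sizes of a dataspace.  Sets *ok to 1 on
 * success; on overflow sets *ok to 0 and returns 0. */
static my_hsize_t total_elements(const my_hsize_t *dims, int rank, int *ok)
{
    my_hsize_t n = 1;
    int i;

    *ok = 1;
    for (i = 0; i < rank; i++) {
        if (dims[i] != 0 && n > UINT64_MAX / dims[i]) {
            *ok = 0;          /* product would not fit in 64 bits */
            return 0;
        }
        n *= dims[i];
    }
    return n;
}
```

+ <p>Both dataspaces in the example above describe the same number
+ of elements, 8*2^30, which at four bytes each is 32GB.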
+
+ <p>For compilers that don't support <code>long long</code>, large
+ datasets will not be possible. The library performs too much
+ arithmetic on <code>hsize_t</code> types to make the use of a
+ struct feasible.
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Apr 10 13:26:04 EDT 1998 -->
+<!-- hhmts start -->
+Last modified: Sun Jul 19 11:37:25 EDT 1998
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/CodeReview.html b/doc/html/TechNotes/CodeReview.html
new file mode 100644
index 0000000..213cbbe
--- /dev/null
+++ b/doc/html/TechNotes/CodeReview.html
@@ -0,0 +1,300 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Code Review</title>
+ </head>
+ <body>
+ <center><h1>Code Review 1</h1></center>
+
+ <h3>Some background...</h3>
+ <p>This is one of the functions exported from the
+ <code>H5B.c</code> file that implements a B-link-tree class
+ without worrying about concurrency yet (thus the `Note:' in the
+ function prologue). The <code>H5B.c</code> file provides the
+ basic machinery for operating on generic B-trees, but it isn't
+ much use by itself. Various subclasses of the B-tree (like
+ symbol tables or indirect storage) provide their own interface
+ and back end to this function. For instance,
+ <code>H5G_stab_find()</code> takes a symbol table OID and a name
+ and calls <code>H5B_find()</code> with an appropriate
+ <code>udata</code> argument that eventually gets passed to the
+ <code>H5G_stab_find()</code> function.
+
+ <p><code><pre>
+ 1 /*-------------------------------------------------------------------------
+ 2 * Function: H5B_find
+ 3 *
+ 4 * Purpose: Locate the specified information in a B-tree and return
+ 5 * that information by filling in fields of the caller-supplied
+ 6 * UDATA pointer depending on the type of leaf node
+ 7 * requested. The UDATA can point to additional data passed
+ 8 * to the key comparison function.
+ 9 *
+10 * Note: This function does not follow the left/right sibling
+11 * pointers since it assumes that all nodes can be reached
+12 * from the parent node.
+13 *
+14 * Return: Success: SUCCEED if found, values returned through the
+15 * UDATA argument.
+16 *
+17 * Failure: FAIL if not found, UDATA is undefined.
+18 *
+19 * Programmer: Robb Matzke
+20 * matzke@llnl.gov
+21 * Jun 23 1997
+22 *
+23 * Modifications:
+24 *
+25 *-------------------------------------------------------------------------
+26 */
+27 herr_t
+28 H5B_find (H5F_t *f, const H5B_class_t *type, const haddr_t *addr, void *udata)
+29 {
+30 H5B_t *bt=NULL;
+31 intn idx=-1, lt=0, rt, cmp=1;
+32 int ret_value = FAIL;
+ </pre></code>
+
+ <p>All pointer variables are initialized when defined. I don't
+ worry much about non-pointers because it's usually obvious when
+ the value isn't initialized.
+
+ <p><code><pre>
+33
+34 FUNC_ENTER (H5B_find, NULL, FAIL);
+35
+36 /*
+37 * Check arguments.
+38 */
+39 assert (f);
+40 assert (type);
+41 assert (type->decode);
+42 assert (type->cmp3);
+43 assert (type->found);
+44 assert (addr && H5F_addr_defined (addr));
+ </pre></code>
+
+ <p>I use <code>assert</code> to check invariant conditions. At
+ this level of the library, none of these assertions should fail
+ unless something is seriously wrong. The arguments should have
+ already been checked by higher layers. The assertions also
+ document which arguments are optional.
+
+ <p><code><pre>
+45
+46 /*
+47 * Perform a binary search to locate the child which contains
+48 * the thing for which we're searching.
+49 */
+50 if (NULL==(bt=H5AC_protect (f, H5AC_BT, addr, type, udata))) {
+51 HGOTO_ERROR (H5E_BTREE, H5E_CANTLOAD, FAIL);
+52 }
+ </pre></code>
+
+ <p>You'll see this quite often in the low-level stuff and it's
+ documented in the <code>H5AC.c</code> file. The
+ <code>H5AC_protect</code> ensures that the B-tree node (which
+ inherits from the H5AC package) whose OID is <code>addr</code>
+ is locked into memory for the duration of this function (see the
+ <code>H5AC_unprotect</code> on line 90). Most likely, if this
+ node has been accessed in the not-too-distant past, it will still
+ be in memory and the <code>H5AC_protect</code> is almost a
+ no-op. If cache debugging is compiled in, then the protect also
+ prevents other parts of the library from accessing the node
+ while this function is protecting it, so this function can allow
+ the node to be in an inconsistent state while calling other
+ parts of the library.
+
+ <p>The alternative is to call the slightly cheaper
+ <code>H5AC_find</code> and assume that the pointer it returns is
+ valid only until some other library function is called, but
+ since we're accessing the pointer throughout this function, I
+ chose to use the simpler protect scheme. All protected objects
+ <em>must be unprotected</em> before the file is closed, thus the
+ use of <code>HGOTO_ERROR</code> instead of
+ <code>HRETURN_ERROR</code>.
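+
+ <p>The reason for jumping to a common exit can be shown with a
+ small stand-alone sketch. Nothing here is the real H5AC
+ interface: <code>node_protect</code>,
+ <code>node_unprotect</code>, and the pin counter are hypothetical
+ stand-ins that only model "every protect must be matched by an
+ unprotect".

```c
#include <stddef.h>

typedef struct {
    int pins;    /* how many callers currently protect this node */
    int value;
} node_t;

static node_t *node_protect(node_t *n)
{
    if (n) n->pins++;
    return n;
}

static int node_unprotect(node_t *n)
{
    if (!n || n->pins <= 0) return -1;
    n->pins--;
    return 0;
}

/* Every error after the protect jumps to `done`, so the node is
 * released on all paths; a plain `return` would leave it pinned. */
static int lookup(node_t *n, int want)
{
    node_t *bt = NULL;
    int ret_value = -1;

    if (NULL == (bt = node_protect(n))) goto done;
    if (bt->value != want) goto done;   /* failure after protect */
    ret_value = 0;

done:
    if (bt && node_unprotect(bt) < 0) return -1;
    return ret_value;
}
```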
+
+ <p><code><pre>
+53 rt = bt->nchildren;
+54
+55 while (lt&lt;rt && cmp) {
+56 idx = (lt + rt) / 2;
+57 if (H5B_decode_keys (f, bt, idx)&lt;0) {
+58 HGOTO_ERROR (H5E_BTREE, H5E_CANTDECODE, FAIL);
+59 }
+60
+61 /* compare */
+62 if ((cmp=(type-&gt;cmp3)(f, bt->key[idx].nkey, udata,
+63 bt->key[idx+1].nkey))&lt;0) {
+64 rt = idx;
+65 } else {
+66 lt = idx+1;
+67 }
+68 }
+69 if (cmp) {
+70 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL);
+71 }
+ </pre></code>
+
+ <p>Code is arranged in paragraphs with a comment starting each
+ paragraph. The previous paragraph is a standard binary search
+ algorithm. The <code>(type-&gt;cmp3)()</code> is an indirect
+ function call into the subclass of the B-tree. All indirect
+ function calls have the function part in parentheses to document
+ that it's indirect (quite obvious here, but not so obvious when
+ the function is a variable).
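+
+ <p>The search loop can be tried in isolation with the sketch
+ below. It is a simplified model, not the library's code: keys are
+ plain integers, child <code>i</code> is assumed to cover the
+ half-open range (keys[i], keys[i+1]], and
+ <code>range_cmp3</code> and <code>find_child</code> are
+ hypothetical names. The indirect call keeps the function part in
+ parentheses as described above.

```c
#include <stddef.h>

/* Three-way comparison against a pair of bracketing keys:
 * negative if target is at or below the left key, zero if it lies in
 * (left, right], positive if it is above the right key. */
typedef int (*cmp3_fn)(int left_key, int target, int right_key);

static int range_cmp3(int left_key, int target, int right_key)
{
    if (target <= left_key) return -1;
    if (target > right_key) return 1;
    return 0;
}

/* `keys` holds nchildren+1 entries.  Returns the index of the child
 * whose key range contains `target`, or -1 if there is none. */
static int find_child(const int *keys, int nchildren, int target,
                      cmp3_fn cmp3)
{
    int idx = -1, lt = 0, rt = nchildren, cmp = 1;

    while (lt < rt && cmp) {
        idx = (lt + rt) / 2;
        if ((cmp = (cmp3)(keys[idx], target, keys[idx + 1])) < 0) {
            rt = idx;
        } else {
            lt = idx + 1;
        }
    }
    return cmp ? -1 : idx;
}
```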
+
+ <p>It's also my standard practice to have side effects in
+ conditional expressions because I can write code faster and it's
+ more apparent to me what the condition is testing. But if I
+ have an assignment in a conditional expr, then I use an extra
+ set of parens even if they're not required (usually they are, as
+ in this case) so it's clear that I meant <code>=</code> instead
+ of <code>==</code>.
+
+ <p><code><pre>
+72
+73 /*
+74 * Follow the link to the subtree or to the data node.
+75 */
+76 assert (idx&gt;=0 && idx&lt;bt->nchildren);
+77 if (bt->level > 0) {
+78 if ((ret_value = H5B_find (f, type, bt->child+idx, udata))&lt;0) {
+79 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL);
+80 }
+81 } else {
+82 ret_value = (type-&gt;found)(f, bt->child+idx, bt->key[idx].nkey,
+83 udata, bt->key[idx+1].nkey);
+84 if (ret_value&lt;0) {
+85 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL);
+86 }
+87 }
+ </pre></code>
+
+ <p>Here I broke the "side effect in conditional" rule, which I
+ sometimes do if the expression is so long that the
+ <code>&lt;0</code> gets lost at the end. Another thing to note is
+ that success/failure is always determined by comparing with zero
+ instead of <code>SUCCEED</code> or <code>FAIL</code>. I do this
+ because occasionally one might want to return other meaningful
+ values (always non-negative) or distinguish between various types of
+ failure (always negative).
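+
+ <p>A small sketch of that convention, using hypothetical names and
+ error codes: the function returns a meaningful non-negative value
+ on success and one of several negative values on failure, while
+ the caller tests only the sign.

```c
#include <stddef.h>
#include <string.h>

#define ERR_BADARGS  (-1)   /* hypothetical failure codes, all negative */
#define ERR_NOTFOUND (-2)

/* Returns the index (>= 0) of `name` in `table`, or a negative code. */
static int name_index(const char *const *table, int n, const char *name)
{
    int i;

    if (!table || !name || n < 0) return ERR_BADARGS;
    for (i = 0; i < n; i++) {
        if (0 == strcmp(table[i], name)) return i;
    }
    return ERR_NOTFOUND;
}

/* The caller compares with zero rather than with a specific code. */
static int has_name(const char *const *table, int n, const char *name)
{
    return name_index(table, n, name) >= 0;
}
```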
+
+ <p><code><pre>
+88
+89 done:
+90 if (bt && H5AC_unprotect (f, H5AC_BT, addr, bt)&lt;0) {
+91 HRETURN_ERROR (H5E_BTREE, H5E_PROTECT, FAIL);
+92 }
+93 FUNC_LEAVE (ret_value);
+94 }
+ </pre></code>
+
+ <p>For lack of a better way to handle errors during error cleanup,
+ I just call the <code>HRETURN_ERROR</code> macro even though it
+ will make the error stack not quite right. I also use short
+ circuiting boolean operators instead of nested <code>if</code>
+ statements since that's standard C practice.
+
+ <center><h1>Code Review 2</h1></center>
+
+
+ <p>The following code is an API function from the H5F package...
+
+ <p><code><pre>
+ 1 /*--------------------------------------------------------------------------
+ 2 NAME
+ 3 H5Fflush
+ 4
+ 5 PURPOSE
+ 6 Flush all cached data to disk and optionally invalidate all cached
+ 7 data.
+ 8
+ 9 USAGE
+10 herr_t H5Fflush(fid, invalidate)
+11 hid_t fid; IN: File ID of file to flush.
+12 hbool_t invalidate; IN: Invalidate all of the cache?
+13
+14 ERRORS
+15 ARGS BADTYPE Not a file atom.
+16 ATOM BADATOM Can't get file struct.
+17 CACHE CANTFLUSH Flush failed.
+18
+19 RETURNS
+20 SUCCEED/FAIL
+21
+22 DESCRIPTION
+23 This function flushes all cached data to disk and, if INVALIDATE
+24 is non-zero, removes cached objects from the cache so they must be
+25 re-read from the file on the next access to the object.
+26
+27 MODIFICATIONS:
+28 --------------------------------------------------------------------------*/
+ </pre></code>
+
+ <p>An API prologue is used for each API function instead of my
+ normal function prologue. I use the prologue from Code Review 1
+ for non-API functions because it's more suited to C programmers,
+ it requires less work to keep it synchronized with the code, and
+ I have better editing tools for it.
+
+ <p><code><pre>
+29 herr_t
+30 H5Fflush (hid_t fid, hbool_t invalidate)
+31 {
+32 H5F_t *file = NULL;
+33
+34 FUNC_ENTER (H5Fflush, H5F_init_interface, FAIL);
+35 H5ECLEAR;
+ </pre></code>
+
+ <p>API functions are never called internally, therefore I always
+ clear the error stack before doing anything.
+
+ <p><code><pre>
+36
+37 /* check arguments */
+38 if (H5_FILE!=H5Aatom_group (fid)) {
+39 HRETURN_ERROR (H5E_ARGS, H5E_BADTYPE, FAIL); /*not a file atom*/
+40 }
+41 if (NULL==(file=H5Aatom_object (fid))) {
+42 HRETURN_ERROR (H5E_ATOM, H5E_BADATOM, FAIL); /*can't get file struct*/
+43 }
+ </pre></code>
+
+ <p>If something is wrong with the arguments then we raise an
+ error. We never <code>assert</code> arguments at this level.
+ We also convert atoms to pointers since atoms are really just a
+ pointer-hiding mechanism. Functions that can be called
+ internally always have pointer arguments instead of atoms
+ because (1) then they don't have to always convert atoms to
+ pointers, and (2) the various pointer data types provide more
+ documentation and type checking than just an <code>hid_t</code>
+ type.
+
+ <p><code><pre>
+44
+45 /* do work */
+46 if (H5F_flush (file, invalidate)&lt;0) {
+47 HRETURN_ERROR (H5E_CACHE, H5E_CANTFLUSH, FAIL); /*flush failed*/
+48 }
+ </pre></code>
+
+ <p>An internal version of the function does the real work. That
+ internal version calls <code>assert</code> to check/document
+ its arguments and can be called from other library functions.
+
+ <p><code><pre>
+49
+50 FUNC_LEAVE (SUCCEED);
+51 }
+ </pre></code>
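+
+ <p>The split between an argument-checking API function and an
+ asserting internal function can be modelled without the library.
+ Everything in this sketch is hypothetical (the handle table,
+ <code>handle_kind</code>, <code>handle_object</code>,
+ <code>api_flush</code>, <code>file_flush</code>); it only mirrors
+ the shape of the code above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { KIND_NONE = 0, KIND_FILE, KIND_DATASET } kind_t;
typedef struct { int dirty; } file_t;
typedef struct { kind_t kind; void *obj; } slot_t;

#define NSLOTS 8
static slot_t slots[NSLOTS];     /* zero-initialized: all KIND_NONE */

/* Store a pointer in the table and hand back an integer handle. */
static int handle_register(kind_t kind, void *obj)
{
    int h;

    for (h = 0; h < NSLOTS; h++) {
        if (slots[h].kind == KIND_NONE) {
            slots[h].kind = kind;
            slots[h].obj = obj;
            return h;
        }
    }
    return -1;
}

static kind_t handle_kind(int h)
{
    return (h >= 0 && h < NSLOTS) ? slots[h].kind : KIND_NONE;
}

static void *handle_object(int h)
{
    return (h >= 0 && h < NSLOTS) ? slots[h].obj : NULL;
}

/* Internal version: pointer argument, checked with assert. */
static int file_flush(file_t *f)
{
    assert(f);
    f->dirty = 0;
    return 0;
}

/* API version: validates the handle, converts it, then delegates. */
static int api_flush(int handle)
{
    file_t *f = NULL;

    if (KIND_FILE != handle_kind(handle)) return -1;    /* wrong type */
    if (NULL == (f = handle_object(handle))) return -1; /* no object */
    if (file_flush(f) < 0) return -1;                   /* flush failed */
    return 0;
}
```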
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Sat Nov 8 17:09:33 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Mon Nov 10 15:33:33 EST 1997
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/ExternalFiles.html b/doc/html/TechNotes/ExternalFiles.html
new file mode 100644
index 0000000..c3197af
--- /dev/null
+++ b/doc/html/TechNotes/ExternalFiles.html
@@ -0,0 +1,279 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>External Files in HDF5</title>
+ </head>
+
+ <body>
+ <center><h1>External Files in HDF5</h1></center>
+
+ <h3>Overview of Layers</h3>
+
+ <p>This table shows some of the layers of HDF5. Each layer calls
+ functions at the same or lower layers and never functions at
+ higher layers. An object identifier (OID) takes various forms
+ at the various layers: at layer 0 an OID is an absolute physical
+ file address; at layers 1 and 2 it's an absolute virtual file
+ address. At layers 3 through 6 it's a relative address, and at
+ layers 7 and above it's an object handle.
+
+ <p><center>
+ <table border cellpadding=4 width="60%">
+ <tr align=center>
+ <td>Layer-7</td>
+ <td>Groups</td>
+ <td>Datasets</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-6</td>
+ <td>Indirect Storage</td>
+ <td>Symbol Tables</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-5</td>
+ <td>B-trees</td>
+ <td>Object Hdrs</td>
+ <td>Heaps</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-4</td>
+ <td>Caching</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-3</td>
+ <td>H5F chunk I/O</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-2</td>
+ <td>H5F low</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-1</td>
+ <td>File Family</td>
+ <td>Split Meta/Raw</td>
+ </tr>
+ <tr align=center>
+ <td>Layer-0</td>
+ <td>Section-2 I/O</td>
+ <td>Standard I/O</td>
+ <td>Malloc/Free</td>
+ </tr>
+ </table>
+ </center>
+
+ <h3>Single Address Space</h3>
+
+ <p>The simplest form of hdf5 file is a single file containing only
+ hdf5 data. The file begins with the boot block, which is
+ followed until the end of the file by hdf5 data. The next most
+ complicated file allows non-hdf5 data (user defined data or
+ internal wrappers) to appear before the boot block and after the
+ end of the hdf5 data. The hdf5 data is treated as a single
+ linear address space in both cases.
+
+ <p>The next level of complexity comes when non-hdf5 data is
+ interspersed with the hdf5 data. We handle that by including
+ the non-hdf5 interspersed data in the hdf5 address space and
+ simply not referencing it (eventually we might add those
+ addresses to a "do-not-disturb" list using the same mechanism as
+ the hdf5 free list, but it's not absolutely necessary). This is
+ implemented except for the "do-not-disturb" list.
+
+ <p>The most complicated single address space hdf5 file is when we
+ allow the address space to be split among multiple physical
+ files. For instance, a >2GB file can be split into smaller
+ chunks and transferred to a 32 bit machine, then accessed as a
+ single logical hdf5 file. The library already supports >32 bit
+ addresses, so at layer 1 we split a 64-bit address into a 32-bit
+ file number and a 32-bit offset (the 64 and 32 are
+ arbitrary). The rest of the library still operates with a linear
+ address space.
+
+ <p>Another variation might be a family of two files where all the
+ meta data is stored in one file and all the raw data is stored
+ in another file to allow the HDF5 wrapper to be easily replaced
+ with some other wrapper.
+
+ <p>The <code>H5Fcreate</code> and <code>H5Fopen</code> functions
+ would need to be modified to pass file-type info down to layer 2
+ so the correct drivers can be called and parameters passed to
+ the drivers to initialize them.
+
+ <h4>Implementation</h4>
+
+ <p>I've implemented fixed-size family members. The entire hdf5
+ file is partitioned into members where each member is the same
+ size. The family scheme is used if one passes a name to
+ <code>H5F_open</code> (which is called by <code>H5Fopen()</code>
+ and <code>H5Fcreate</code>) that contains a
+ <code>printf(3c)</code>-style integer format specifier.
+ Currently, the default low-level file driver is used for all
+ family members (H5F_LOW_DFLT, usually set to be Section 2 I/O or
+ Section 3 stdio), but we'll probably eventually want to pass
+ that as a parameter of the file access property list, which
+ hasn't been implemented yet. When creating a family, a default
+ family member size is used (defined at the top of H5Ffamily.c,
+ currently 64MB) but that also should be settable in the file
+ access property list. When opening an existing family, the size
+ of the first member is used to determine the member size
+ (flushing/closing a family ensures that the first member is the
+ correct size) but the other family members don't have to be that
+ large (the local address space, however, is logically the same
+ size for all members).
+
+ <p>I haven't implemented a split meta/raw family yet but am rather
+ curious to see how it would perform. I was planning to use the
+ `.h5' extension for the meta data file and `.raw' for the raw
+ data file. The high-order bit in the address would determine
+ whether the address refers to meta data or raw data. If the user
+ passes a name that ends with `.raw' to <code>H5F_open</code>
+ then we'll choose the split family and use the default low level
+ driver for each of the two family members. Eventually we'll
+ want to pass these kinds of things through the file access
+ property list instead of relying on naming convention.
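+
+ <p>A minimal sketch of the high-order-bit idea, under two
+ assumptions that the text does not fix: addresses are 64 bits
+ wide, and a set bit means raw data. The names
+ (<code>RAW_FLAG</code>, <code>addr_member</code>,
+ <code>addr_offset</code>, <code>make_addr</code>) are
+ hypothetical.

```c
#include <stdint.h>

#define RAW_FLAG ((uint64_t)1 << 63)   /* assumed: bit set = raw data */

typedef enum { MEMBER_META = 0, MEMBER_RAW = 1 } member_t;

/* Which of the two family members an address refers to. */
static member_t addr_member(uint64_t addr)
{
    return (addr & RAW_FLAG) ? MEMBER_RAW : MEMBER_META;
}

/* Offset inside that member, with the selector bit removed. */
static uint64_t addr_offset(uint64_t addr)
{
    return addr & ~RAW_FLAG;
}

/* Build an address from a member selector and an offset. */
static uint64_t make_addr(member_t m, uint64_t offset)
{
    return (m == MEMBER_RAW ? RAW_FLAG : 0) | (offset & ~RAW_FLAG);
}
```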
+
+ <h3>External Raw Data</h3>
+
+ <p>We also need the ability to point to raw data that isn't in the
+ HDF5 linear address space. For instance, a dataset might be
+ striped across several raw data files.
+
+ <p>Fortunately, the only two packages that need to be aware of
+ this are the packages for reading/writing contiguous raw data
+ and discontiguous raw data. Since contiguous raw data is a
+ special case, I'll discuss how to implement external raw data in
+ the discontiguous case.
+
+ <p>Discontiguous data is stored as a B-tree whose keys are the
+ chunk indices and whose leaf nodes point to the raw data by
+ storing a file address. So what we need is some way to name the
+ external files, and a way to efficiently store the external file
+ name for each chunk.
+
+ <p>I propose adding to the object header an <em>External File
+ List</em> message that is a 1-origin array of file names.
+ Then, in the B-tree, each key has an index into the External
+ File List (or zero for the HDF5 file) for the file where the
+ chunk can be found. The external file index is only used at
+ the leaf nodes to get to the raw data (the entire B-tree is in
+ the HDF5 file) but because of the way keys are copied among
+ the B-tree nodes, it's much easier to store the index with
+ every key.
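+
+ <p>The proposed indexing can be sketched like this. The types and
+ the <code>chunk_file</code> helper are hypothetical; the only
+ things taken from the proposal are that the list is 1-origin and
+ that index zero stands for the HDF5 file itself.

```c
#include <stddef.h>

/* External File List: names[0] is entry 1, names[1] is entry 2, ... */
typedef struct {
    size_t             nfiles;
    const char *const *names;
} ext_file_list_t;

/* Part of a B-tree key: which file holds the chunk, and where. */
typedef struct {
    unsigned           file_index;  /* 0 = the HDF5 file itself */
    unsigned long long addr;        /* address of the chunk in that file */
} chunk_key_t;

/* Name of the file that stores the chunk, or NULL if the index is
 * outside the list. */
static const char *chunk_file(const ext_file_list_t *efl,
                              const chunk_key_t *key,
                              const char *hdf5_name)
{
    if (key->file_index == 0) return hdf5_name;
    if (key->file_index > efl->nfiles) return NULL;
    return efl->names[key->file_index - 1];
}
```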
+
+ <h3>Multiple HDF5 Files</h3>
+
+ <p>One might also want to combine two or more HDF5 files in a
+ manner similar to mounting file systems in Unix. That is, the
+ group structure and meta data from one file appear as though
+ they exist in the first file. One opens File-A, and then
+ <em>mounts</em> File-B at some point in File-A, the <em>mount
+ point</em>, so that traversing into the mount point actually
+ causes one to enter the root object of File-B. File-A and
+ File-B are each complete HDF5 files and can be accessed
+ individually without mounting them.
+
+ <p>We need a couple additional pieces of machinery to make this
+ work. First, an haddr_t type (a file address) doesn't contain
+ any info about which HDF5 file's address space the address
+ belongs to. But since haddr_t is an opaque type except at
+ layers 2 and below, it should be quite easy to add a pointer to
+ the HDF5 file. This would also remove the H5F_t argument from
+ most of the low-level functions since it would be part of the
+ OID.
+
+ <p>The other thing we need is a table of mount points and some
+ functions that understand them. We would add the following
+ table to each H5F_t struct:
+
+ <p><code><pre>
+struct H5F_mount_t {
+ H5F_t *parent; /* Parent HDF5 file if any */
+ struct {
+ H5F_t *f; /* File which is mounted */
+ haddr_t where; /* Address of mount point */
+ } *mount; /* Array sorted by mount point */
+ intn nmounts; /* Number of mounted files */
+ intn alloc; /* Size of mount table */
+};
+ </pre></code>
+
+ <p>The <code>H5Fmount</code> function takes the ID of an open
+ file or group, the name of a to-be-mounted file, the name of the mount
+ point, and a file access property list (like <code>H5Fopen</code>).
+ It opens the new file and adds a record to the parent's mount
+ table. The <code>H5Funmount</code> function takes the parent
+ file or group ID and the name of the mount point and disassociates
+ the mounted file from the mount point. It does not close the
+ mounted file. The <code>H5Fclose</code>
+ function closes/unmounts files recursively.
+
+ <p>The <code>H5G_iname</code> function which translates a name to
+ a file address (<code>haddr_t</code>) looks at the mount table
+ at each step in the translation and switches files where
+ appropriate. All name-to-address translations occur through
+ this function.
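+
+ <p>Since the mount table is kept sorted by mount point, the check
+ made at each translation step could be a binary search. The
+ sketch below assumes a plain 64-bit integer in place of
+ <code>haddr_t</code> and uses hypothetical names
+ (<code>mount_lookup</code>, <code>traverse_step</code>); it is
+ not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct file_s file_t;

typedef struct {
    uint64_t where;   /* address of the mount point in the parent */
    file_t  *f;       /* file mounted there */
} mount_entry_t;

struct file_s {
    const char    *name;
    mount_entry_t *mount;    /* array sorted by `where` */
    int            nmounts;
};

/* Returns the file mounted at `addr`, or NULL if nothing is mounted. */
static file_t *mount_lookup(const file_t *parent, uint64_t addr)
{
    int lt = 0, rt = parent->nmounts;

    while (lt < rt) {
        int idx = (lt + rt) / 2;
        if (addr < parent->mount[idx].where) {
            rt = idx;
        } else if (addr > parent->mount[idx].where) {
            lt = idx + 1;
        } else {
            return parent->mount[idx].f;
        }
    }
    return NULL;
}

/* One step of name translation: switch files at a mount point. */
static const file_t *traverse_step(const file_t *cur, uint64_t addr)
{
    file_t *child = mount_lookup(cur, addr);
    return child ? child : cur;
}
```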
+
+ <h3>How Long?</h3>
+
+ <p>I'm expecting to be able to implement the two new flavors of
+ single linear address space in about two days. It took two hours
+ to implement the malloc/free file driver at level zero and I
+ don't expect this to be much more work.
+
+ <p>I'm expecting three days to implement the external raw data for
+ discontiguous arrays. Adding the file index to the B-tree is
+ quite trivial; adding the external file list message shouldn't
+ be too hard since the object header message class from which this
+ message derives is fully implemented; and changing
+ <code>H5F_istore_read</code> should be trivial. Most of the
+ time will be spent designing a way to cache Unix file
+ descriptors efficiently since the total number of open files
+ allowed per process could be much smaller than the total number
+ of HDF5 files and external raw data files.
+
+ <p>I'm expecting four days to implement being able to mount one
+ HDF5 file on another. I was originally planning a lot more, but
+ making <code>haddr_t</code> opaque turned out to be much easier
+ than I planned (I did it last Fri). Most of the work will
+ probably be removing the redundant H5F_t arguments for lots of
+ functions.
+
+ <h3>Conclusion</h3>
+
+ <p>The external raw data could be implemented as a single linear
+ address space, but doing so would require one to allocate large
+ enough file addresses throughout the file (>32bits) before the
+ file was created. It would also make it more difficult to mix an
+ HDF5 file family with external raw data, or with an external HDF5
+ wrapper around an HDF4 file. So I consider the implementation of
+ external raw data files as a single HDF5 linear address space a
+ kludge.
+
+ <p>The ability to mount one HDF5 file on another might not be a
+ very important feature especially since each HDF5 file must be a
+ complete file by itself. It's not possible to stripe an array
+ over multiple HDF5 files because the B-tree wouldn't be complete
+ in any one file, so the only choice is to stripe the array
+ across multiple raw data files and store the B-tree in the HDF5
+ file. On the other hand, it might be useful if one file
+ contains some public data which can be mounted by other files
+ (e.g., a mesh topology shared among collaborators and mounted by
+ files that contain other fields defined on the mesh). Of course
+ the applications can open the two files separately, but it might
+ be more portable if we support it in the library.
+
+ <p>So we're looking at about two weeks to implement all three
+ versions. I didn't get a chance to do any of them in AIO
+ although we had long-term plans for the first two with a
+ possibility of the third. They'll be much easier to implement in
+ HDF5 than AIO since I've been keeping these in mind from the
+ start.
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Sat Nov 8 18:08:52 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Tue Sep 8 14:43:32 EDT 1998
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/H4-H5Compat.html b/doc/html/TechNotes/H4-H5Compat.html
new file mode 100644
index 0000000..2992476
--- /dev/null
+++ b/doc/html/TechNotes/H4-H5Compat.html
@@ -0,0 +1,271 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Backward/Forward Compatibility</title>
+ </head>
+
+ <body>
+ <h1>Backward/Forward Compatibility</h1>
+
+ <p>The HDF5 development must proceed in such a manner as to
+ satisfy the following conditions:
+
+ <ol type=A>
+ <li>HDF5 applications can produce data that HDF5
+ applications can read and write and HDF4 applications can produce
+ data that HDF4 applications can read and write. The situation
+ that demands this condition is obvious.</li>
+
+ <li>HDF5 applications are able to produce data that HDF4 applications
+ can read and HDF4 applications can subsequently modify the
+ file subject to certain constraints depending on the
+ implementation. This condition is for the temporary
+ situation where a consumer has neither been relinked with a new
+ HDF4 API built on top of the HDF5 API nor recompiled with the
+ HDF5 API.</li>
+
+ <li>HDF5 applications can read existing HDF4 files and subsequently
+ modify the file subject to certain constraints depending on
+ the implementation. This condition is for the temporary
+ situation in which the producer has neither been relinked with a
+ new HDF4 API built on top of the HDF5 API nor recompiled with
+ the HDF5 API, or the permanent situation of HDF5 consumers
+ reading archived HDF4 files.</li>
+ </ol>
+
+ <p>There's at least one invariant: new object features introduced
+ in the HDF5 file format (like 2-d arrays of structs) might be
+ impossible to "translate" to a format that an old HDF4
+ application can understand either because the HDF4 file format
+ or the HDF4 API has no mechanism to describe the object.
+
+ <p>What follows is one possible implementation based on how
+ Condition B was solved in the AIO/PDB world. It also attempts
+ to satisfy these goals:
+
+ <ol type=1>
+ <li>The main HDF5 library contains as little extra baggage as
+ possible by either relying on external programs to take care
+ of compatibility issues or by incorporating the logic of such
+ programs as optional modules in the HDF5 library. Conditions B
+ and C are separate programs/modules.</li>
+
+ <li>No extra baggage not only means the library proper is small,
+ but also means it can be implemented (rather than migrated
+ from HDF4 source) from the ground up with minimal regard for
+ HDF4, thus keeping the logic straightforward.</li>
+
+ <li>Compatibility issues are handled behind the scenes when
+ necessary (and possible) but can be carried out explicitly
+ during things like data migration.</li>
+ </ol>
+
+ <hr>
+ <h2>Wrappers</h2>
+
+ <p>The proposed implementation uses <i>wrappers</i> to handle
+ compatibility issues. A Format-X file is <i>wrapped</i> in a
+ Format-Y file by creating a Format-Y skeleton that replicates
+ the Format-X meta data. The Format-Y skeleton points to the raw
+ data stored in Format-X without moving the raw data. The
+ restriction is that raw data storage methods in Format-Y are a
+ superset of raw data storage methods in Format-X (otherwise the
+ raw data must be copied to Format-Y). We're assuming that meta
+ data is small relative to the entire file.
+
+ <p>The wrapper can be a separate file that has pointers into the
+ first file or it can be contained within the first file. If
+ contained in a single file, the file can appear as a Format-Y
+ file or simultaneously a Format-Y and Format-X file.
+
+ <p>The Format-X meta-data can be thought of as the original
+ wrapper around raw data and Format-Y is a second wrapper around
+ the same data. The wrappers are independent of one another;
+ modifying the meta-data in one wrapper causes the other to
+ become out of date. Modification of raw data doesn't invalidate
+ either view as long as the meta data that describes its storage
+ isn't modified. For instance, an array element can change values
+ if storage is already allocated for the element, but if storage
+ isn't allocated then the meta data describing the storage must
+ change, invalidating all wrappers but one.
+
+ <p>It's perfectly legal to modify the meta data of one wrapper
+ without modifying the meta data in the other wrapper(s). The
+ illegal part is accessing the raw data through a wrapper which
+ is out of date.
+
+ <p>If raw data is wrapped by more than one internal wrapper
+ (<i>internal</i> means that the wrapper is in the same file as
+ the raw data) then access to that file must assume that
+ unreferenced parts of that file contain meta data for another
+ wrapper and cannot be reclaimed as free memory.
+
+ <hr>
+ <h2>Implementation of Condition B</h2>
+
+ <p>Since this is a temporary situation which can't be
+ automatically detected by the HDF5 library, we must rely
+ on the application to notify the HDF5 library whether or not it
+ must satisfy Condition B. (Even if we don't rely on the
+ application, at some point someone is going to remove the
+ Condition B constraint from the library.) So the module that
+ handles Condition B is conditionally compiled and then enabled
+ on a per-file basis.
+
+ <p>If the application desires to produce an HDF4 file (determined
+ by arguments to <code>H5Fopen</code>), and the Condition B
+ module is compiled into the library, then <code>H5Fclose</code>
+ calls the module to traverse the HDF5 wrapper and generate an
+ additional internal or external HDF4 wrapper (wrapper specifics
+ are described below). If Condition B is implemented as a module
+ then it can benefit from the metadata already cached by the main
+ library.
+
+ <p>An internal HDF4 wrapper would be used if the HDF5 file is
+ writable and the user doesn't mind that the HDF5 file is
+ modified. An external wrapper would be used if the file isn't
+ writable or if the user wants the data file to be primarily HDF5
+ but a few applications need an HDF4 view of the data.
+
+ <p>Modifying through the HDF5 library an HDF5 file that has an
+ internal HDF4 wrapper should invalidate the HDF4 wrapper (and
+ optionally regenerate it when <code>H5Fclose</code> is
+ called). The HDF5 library must understand how wrappers work, but
+ not necessarily anything about the HDF4 file format.
+
+ <p>Modifying through the HDF5 library an HDF5 file that has an
+ external HDF4 wrapper will cause the HDF4 wrapper to become out
+ of date (but possibly regenerated during <code>H5Fclose</code>).
+ <b>Note: Perhaps the next release of the HDF4 library should
+ ensure that the HDF4 wrapper file has a more recent modification
+ time than the raw data file (the HDF5 file) to which it
+ points(?)</b>
+
+ <p>Modifying through the HDF4 library an HDF5 file that has an
+ internal or external HDF4 wrapper will cause the HDF5 wrapper to
+ become out of date. However, there is no way for the old HDF4
+ library to notify the HDF5 wrapper that it's out of date.
+ Therefore the HDF5 library must be able to detect when the HDF5
+ wrapper is out of date and be able to fix it. If the HDF4
+ wrapper is complete then the easy way is to ignore the original
+ HDF5 wrapper and generate a new one from the HDF4 wrapper. The
+ other approach is to compare the HDF4 and HDF5 wrappers and
+ assume that if they differ HDF4 is the right one, if HDF4 omits
+ data then it was because HDF4 is a partial wrapper (rather than
+ assume HDF4 deleted the data), and if HDF4 has new data then
+ copy the new meta data to the HDF5 wrapper. On the other hand,
+ perhaps we don't need to allow these situations (modifying an
+ HDF5 file with the old HDF4 library and then accessing it with
+ the HDF5 library is either disallowed or causes HDF5 objects
+ that can't be described by HDF4 to be lost).
+
+ <p>To convert an HDF5 file to an HDF4 file on demand, one simply
+ opens the file with the HDF4 flag and closes it. This is also
+ how AIO implemented backward compatibility with PDB in its file
+ format.
+
+ <hr>
+ <h2>Implementation of Condition C</h2>
+
+ <p>This condition must be satisfied for all time because there
+ will always be archived HDF4 files. If a pure HDF4 file (that
+ is, one without HDF5 meta data) is opened with an HDF5 library,
+ <code>H5Fopen</code> builds an internal or external HDF5
+ wrapper and then accesses the raw data through that wrapper. If
+ the HDF5 library modifies the file then the HDF4 wrapper becomes
+ out of date. However, since the HDF5 library hasn't been
+ released, we can at least implement it to disable and/or reclaim
+ the HDF4 wrapper.
+
+ <p>If an external and temporary HDF5 wrapper is desired, the
+ wrapper is created through the cache like all other HDF5 files.
+ The data appears on disk only if a particular cached datum is
+ preempted. Instead of calling <code>H5Fclose</code> on the HDF5
+ wrapper file we call <code>H5Fabort</code> which immediately
+ releases all file resources without updating the file, and then
+ we unlink the file from Unix.
+
+ <hr>
+ <h2>What do wrappers look like?</h2>
+
+ <p>External wrappers are quite obvious: they contain only things
+ from the format specs for the wrapper and nothing from the
+ format specs of the format which they wrap.
+
+ <p>An internal HDF4 wrapper is added to an HDF5 file in such a way
+ that the file appears to be both an HDF4 file and an HDF5
+ file. HDF4 requires an HDF4 file header at file offset zero. If
+ a user block is present then we just move the user block down a
+ bit (and truncate it) and insert the minimum HDF4 signature.
+ The HDF4 <code>dd</code> list and any other data it needs are
+ appended to the end of the file and the HDF5 signature uses the
+ logical file length field to determine the beginning of the
+ trailing part of the wrapper.
+
+ <p>
+ <center>
+ <table border width="60%">
+ <tr>
+ <td>HDF4 minimal file header. Its main job is to point to
+ the <code>dd</code> list at the end of the file.</td>
+ </tr>
+ <tr>
+ <td>User-defined block which is truncated by the size of the
+ HDF4 file header so that the HDF5 boot block file address
+ doesn't change.</td>
+ </tr>
+ <tr>
+ <td>The HDF5 boot block and data, unmodified by adding the
+ HDF4 wrapper.</td>
+ </tr>
+ <tr>
+ <td>The main part of the HDF4 wrapper. The <code>dd</code>
+ list will have entries for all parts of the file so
+ hdpack(?) doesn't (re)move anything.</td>
+ </tr>
+ </table>
+ </center>
+
+ <p>When such a file is opened by the HDF5 library for
+ modification it shifts the user block back down to address zero
+ and fills with zeros, then truncates the file at the end of the
+ HDF5 data or adds the trailing HDF4 wrapper to the free
+ list. This prevents HDF4 applications from reading the file with
+ an out of date wrapper.
+
+ <p>If there is no user block then we have a problem. The HDF5
+ boot block must be moved to make room for the HDF4 file header.
+ But moving just the boot block causes problems because all file
+ addresses stored in the file are relative to the boot block
+ address. The only option is to shift the entire file contents
+ by 512 bytes to open up a user block (too bad we don't have
+ hooks into the Unix i-node stuff so we could shift the entire
+ file contents by the size of a file system page without ever
+ performing I/O on the file :-)
+
+ <p>Is it possible to place an HDF5 wrapper in an HDF4 file? I
+ don't know enough about the HDF4 format, but I would suspect it
+ might be possible to open a hole at file address 512 (and
+ possibly before) by moving some things to the end of the file
+ to make room for the HDF5 signature. The remainder of the HDF5
+ wrapper goes at the end of the file and entries are added to the
+ HDF4 <code>dd</code> list to mark the location(s) of the HDF5
+ wrapper.
+
+ <hr>
+ <h2>Other Thoughts</h2>
+
+ <p>Conversion programs that copy an entire HDF4 file to a separate,
+ self-contained HDF5 file and vice versa might be useful.
+
+
+
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Oct 3 11:52:31 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Wed Oct 8 12:34:42 EST 1997
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/HeapMgmt.html b/doc/html/TechNotes/HeapMgmt.html
new file mode 100644
index 0000000..ebf58b2
--- /dev/null
+++ b/doc/html/TechNotes/HeapMgmt.html
@@ -0,0 +1,79 @@
+<html>
+<body>
+
+<h1>Heap Management in HDF5</h1>
+
+<pre>
+
+Heap functions are in the H5H package.
+
+
+off_t
+H5H_new (hdf5_file_t *f, size_t size_hint, size_t realloc_hint);
+
+ Creates a new heap in the specified file which can efficiently
+ store at least SIZE_HINT bytes. The heap can store more than
+ that, but doing so may cause the heap to become less efficient
+ (for instance, a heap implemented as a B-tree might become
+ discontiguous). The REALLOC_HINT is the minimum number of bytes
+ by which the heap will grow when it must be resized. The hints
+ may be zero in which case reasonable (but probably not
+ optimal) values will be chosen.
+
+ The return value is the address of the new heap relative to
+ the beginning of the file boot block.
+
+off_t
+H5H_insert (hdf5_file_t *f, off_t addr, size_t size, const void *buf);
+
+ Copies SIZE bytes of data from BUF into the heap whose address
+ is ADDR in file F. BUF must be the _entire_ heap object. The
+ return value is the byte offset of the new data in the heap.
+
+void *
+H5H_read (hdf5_file_t *f, off_t addr, off_t offset, size_t size, void *buf);
+
+ Copies SIZE bytes of data, beginning at byte OFFSET of the heap
+ whose address is ADDR in file F, into BUF and then returns the
+ address of BUF. If BUF is the null pointer then a new buffer
+ will be malloc'd by this function and its address is returned.
+
+ Returns buffer address or null.
+
+const void *
+H5H_peek (hdf5_file_t *f, off_t addr, off_t offset);
+
+ A more efficient version of H5H_read that returns a pointer
+ directly into the cache; the data is not copied from the cache
+ to a buffer. The pointer is valid until the next call to an
+ H5AC function directly or indirectly.
+
+ Returns a pointer or null. Do not free the pointer.
+
+void *
+H5H_write (hdf5_file_t *f, off_t addr, off_t offset, size_t size,
+ const void *buf);
+
+ Modifies (part of) an object in the heap at address ADDR of
+ file F by copying SIZE bytes from the beginning of BUF to the
+ file. OFFSET is the address within the heap where the output
+ is to occur.
+
+ This function can fail if the combination of OFFSET and SIZE
+ would write over a boundary between two heap objects.
+
+herr_t
+H5H_remove (hdf5_file_t *f, off_t addr, off_t offset, size_t size);
+
+ Removes an object or part of an object which begins at byte
+ OFFSET within a heap whose address is ADDR in file F. SIZE
+ bytes are returned to the free list. Removing the middle of
+ an object has the side effect that one object is now split
+ into two objects.
+
+ Returns success or failure.
+
+</pre>
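+
+<p>The insert and peek contract above can be illustrated with a
+ minimal in-memory sketch. Everything here is an assumption made
+ for illustration: the <code>toy_heap_*</code> names and the flat
+ byte-array layout are hypothetical and are not the H5H
+ implementation, which works on a file through the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a heap; not the H5H package. */
typedef struct toy_heap_t {
    unsigned char *data;   /* heap storage */
    size_t used;           /* bytes currently holding objects */
    size_t alloc;          /* bytes allocated */
    size_t realloc_hint;   /* minimum growth when the heap is resized */
} toy_heap_t;

/* Create a heap able to hold size_hint bytes before it has to grow.
 * A zero hint selects an arbitrary default, as the text allows. */
static int toy_heap_new(toy_heap_t *h, size_t size_hint, size_t realloc_hint)
{
    assert(h != NULL);
    h->alloc = size_hint ? size_hint : 64;
    h->realloc_hint = realloc_hint ? realloc_hint : 64;
    h->used = 0;
    h->data = (unsigned char *)malloc(h->alloc);
    return h->data ? 0 : -1;
}

/* Copy one entire object into the heap and return its byte offset
 * within the heap, or -1 on failure. */
static long toy_heap_insert(toy_heap_t *h, size_t size, const void *buf)
{
    size_t room = h->alloc - h->used;
    if (size > room) {
        /* Grow by at least realloc_hint bytes. */
        size_t need = size - room;
        size_t grow = need > h->realloc_hint ? need : h->realloc_hint;
        unsigned char *p = (unsigned char *)realloc(h->data, h->alloc + grow);
        if (!p)
            return -1;
        h->data = p;
        h->alloc += grow;
    }
    memcpy(h->data + h->used, buf, size);
    h->used += size;
    return (long)(h->used - size);
}

/* Return a pointer directly into heap storage without copying.
 * The caller must not free it, and it is only valid until the next
 * insert, which may move the storage. */
static const void *toy_heap_peek(const toy_heap_t *h, size_t offset)
{
    return offset < h->used ? h->data + offset : NULL;
}

/* Release the heap storage. */
static void toy_heap_free(toy_heap_t *h)
{
    free(h->data);
    h->data = NULL;
    h->used = h->alloc = 0;
}
```

+<p>In this sketch the pointer returned by
+ <code>toy_heap_peek</code> goes stale when a later insert
+ reallocates the storage, which loosely mirrors the rule that an
+ <code>H5H_peek</code> pointer is valid only until the next H5AC
+ call.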
+
+</body>
+</html>
diff --git a/doc/html/TechNotes/IOPipe.html b/doc/html/TechNotes/IOPipe.html
new file mode 100644
index 0000000..7c24e2c
--- /dev/null
+++ b/doc/html/TechNotes/IOPipe.html
@@ -0,0 +1,114 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>The Raw Data I/O Pipeline</title>
+ </head>
+
+ <body>
+ <h1>The Raw Data I/O Pipeline</h1>
+
+ <p>The HDF5 raw data pipeline is a complicated beast that handles
+ all aspects of raw data storage and transfer of that data
+ between the file and the application. Data can be stored
+ contiguously (internal or external), in variable size external
+ segments, or regularly chunked; it can be sparse, extendible,
+ and/or compressible. Data transfers must be able to convert from
+ one data space to another, convert from one number type to
+ another, and perform partial I/O operations. Furthermore,
+ applications will expect their common usage of the pipeline to
+ perform well.
+
+ <p>To accomplish these goals, the pipeline has been designed in a
+ modular way so no single subroutine is overly complicated and so
+ functionality can be inserted easily at the appropriate
+ locations in the pipeline. A general pipeline was developed and
+ then certain paths through the pipeline were optimized for
+ performance.
+
+ <p>We describe only the file-to-memory side of the pipeline since
+ the memory-to-file side is a mirror image. We also assume that a
+ proper hyperslab of a simple data space is being read from the
+ file into a proper hyperslab of a simple data space in memory,
+ and that the data type is a compound type which may require
+ various number conversions on its members.
+
+ <img alt="Figure 1" src="pipe1.gif">
+
+ <p>The diagrams should be read from the top down. Line A
+ in the figure above shows that <code>H5Dread()</code> copies
+ data from a hyperslab of a file dataset to a hyperslab of an
+ application buffer by calling <code>H5D_read()</code>. And
+ <code>H5D_read()</code> calls, in a loop,
+ <code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>,
+ and <code>H5S_simp_mscat()</code>. A temporary buffer, TCONV, is
+ loaded with data points from the file, then data type conversion
+ is performed on the temporary buffer, and finally data points
+ are scattered out to application memory. Thus, data type
+ conversion is an in-place operation and data space conversion
+ consists of two steps. An additional temporary buffer, BKG, is
+ large enough to hold <em>N</em> instances of the destination
+ data type where <em>N</em> is the same number of data points
+ that can be held by the TCONV buffer (which is large enough to
+ hold either source or destination data points).
+
+ <p>The application sets an upper limit for the size of the TCONV
+ buffer and optionally supplies a buffer. If no buffer is
+ supplied then one will be created by calling
+ <code>malloc()</code> when the pipeline is executed (when
+ necessary) and freed when the pipeline exits. The size of the
+ BKG buffer depends on the size of the TCONV buffer and if the
+ application supplies a BKG buffer it should be at least as large
+ as the TCONV buffer. The default size for these buffers is one
+ megabyte but the buffer might not be used to full capacity if
+ the buffer size is not an integer multiple of the source or
+ destination data point size (whichever is larger, but only
+ destination for the BKG buffer).
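+
+ <p>The gather, convert, and scatter loop and the buffer sizing
+ rule described above can be sketched as follows. This is an
+ illustration under stated assumptions, not the library's code:
+ the names <code>tconv_capacity</code>,
+ <code>conv_int_to_double</code>, and <code>toy_read</code> are
+ hypothetical, the source type is taken to be <code>int</code>
+ and the destination <code>double</code> (with
+ <code>sizeof(int) &lt;= sizeof(double)</code>), the file
+ selection is a simple stride, and the memory selection is
+ contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of data points one pass can carry. The buffer must hold
 * either representation, so divide by the larger element size;
 * any remainder bytes go unused. */
static size_t tconv_capacity(size_t buf_size, size_t src_size, size_t dst_size)
{
    size_t larger = src_size > dst_size ? src_size : dst_size;
    assert(larger > 0);
    return buf_size / larger;
}

/* Convert n ints to doubles in place. Walking backward ensures the
 * wider destination elements never overwrite unread source elements
 * (assumes sizeof(int) <= sizeof(double)). */
static void conv_int_to_double(unsigned char *tconv, size_t n)
{
    while (n-- > 0) {
        int s;
        double d;
        memcpy(&s, tconv + n * sizeof s, sizeof s);
        d = (double)s;
        memcpy(tconv + n * sizeof d, &d, sizeof d);
    }
}

/* Read every stride-th int of `file` into consecutive doubles of
 * `mem`, using tconv (buf_size bytes) as the only scratch space.
 * Returns the number of passes through the loop. */
static size_t toy_read(const int *file, size_t stride, size_t count,
                       double *mem, unsigned char *tconv, size_t buf_size)
{
    size_t cap = tconv_capacity(buf_size, sizeof(int), sizeof(double));
    size_t done = 0, passes = 0;

    if (cap == 0)
        return 0;                 /* buffer too small for one point */
    while (done < count) {
        size_t n = count - done < cap ? count - done : cap;
        size_t i;

        /* Gather: pack selected file points into the buffer. */
        for (i = 0; i < n; i++)
            memcpy(tconv + i * sizeof(int),
                   &file[(done + i) * stride], sizeof(int));

        /* Convert: in place, source type to destination type. */
        conv_int_to_double(tconv, n);

        /* Scatter: copy converted points out to application memory. */
        memcpy(mem + done, tconv, n * sizeof(double));

        done += n;
        passes++;
    }
    return passes;
}
```

+ <p>With a 20-byte buffer and 8-byte doubles only two points fit
+ per pass and four bytes are left unused, which is the "not used
+ to full capacity" case mentioned above.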
+
+
+
+ <p>Occasionally the destination data points will be partially
+ initialized and the <code>H5Dread()</code> operation should not
+ clobber those values. For instance, the destination type might
+ be a struct with members <code>a</code> and <code>b</code> where
+ <code>a</code> is already initialized and we're reading
+ <code>b</code> from the file. An extra line, G, is added to the
+ pipeline to provide the type conversion functions with the
+ existing data.
+
+ <img alt="Figure 2" src="pipe2.gif">
+
+ <p>It will most likely be quite common that no data type
+ conversion is necessary. In such cases a temporary buffer for
+ data type conversion is not needed and data space conversion
+ can happen in a single step. In fact, when the source and
+ destination data are both contiguous (they aren't in the
+ picture) the loop degenerates to a single iteration.
+
+
+ <img alt="Figure 3" src="pipe3.gif">
+
+ <p>So far we've looked only at internal contiguous storage, but by
+ replacing Line B in Figures 1 and 2 and Line A in Figure 3 with
+ Figure 4 the pipeline is able to handle regularly chunked
+ objects. Line B of Figure 4 is executed once for each chunk
+ which contains data to be read and the chunk address is found by
+ looking at a multi-dimensional key in a chunk B-tree which has
+ one entry per chunk.
+
+ <img alt="Figure 4" src="pipe4.gif">
+
+ <p>If a single chunk is requested and the destination buffer is
+ the same size/shape as the chunk, then the CHUNK buffer is
+ bypassed and the destination buffer is used instead as shown in
+ Figure 5.
+
+ <img alt="Figure 5" src="pipe5.gif">
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Tue Mar 17 11:13:35 EST 1998 -->
+<!-- hhmts start -->
+Last modified: Wed Mar 18 10:38:30 EST 1998
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/LibMaint.html b/doc/html/TechNotes/LibMaint.html
new file mode 100644
index 0000000..5e6b222
--- /dev/null
+++ b/doc/html/TechNotes/LibMaint.html
@@ -0,0 +1,122 @@
+<html>
+<body>
+
+
+<h1>Information for HDF5 Maintainers</h1>
+
+<pre>
+
+* You can run make from any directory. However, running make in a
+ subdirectory builds only the things in that directory and below.
+ Even so, every makefile knows when its target depends on
+ something outside the local directory tree:
+
+ $ cd test
+ $ make
+ make: *** No rule to make target ../src/libhdf5.a
+
+* All Makefiles understand the following targets:
+
+ all -- build locally.
+ install -- install libs, headers, progs.
+ uninstall -- remove installed files.
+ mostlyclean -- remove temp files (eg, *.o but not *.a).
+ clean -- mostlyclean plus libs and progs.
+ distclean -- all non-distributed files.
+ maintainer-clean -- all derived files but H5config.h.in and configure.
+
+* Most Makefiles also understand:
+
+ TAGS -- build a tags table
+ dep, depend -- recalculate source dependencies
+ lib -- build just the libraries w/o programs
+
+* If you have personal preferences for which make, compiler, compiler
+ flags, preprocessor flags, etc., that you use and you don't want to
+ set environment variables, then use a site configuration file.
+
+ When configure starts, it looks in the config directory for files
+ whose name is some combination of the CPU name, vendor, and
+ operating system in this order:
+
+ CPU-VENDOR-OS
+ VENDOR-OS
+ CPU-VENDOR
+ OS
+ VENDOR
+ CPU
+
+ The first file which is found is sourced and can therefore affect
+ the behavior of the rest of configure. See config/BlankForm for the
+ template.
+
+* If you use GNU make along with gcc the Makefile will contain targets
+ that automatically maintain a list of source interdependencies; you
+ seldom have to say `make clean'. I say `seldom' because if you
+ change how one `*.h' file includes other `*.h' files you'll have
+ to force an update.
+
+ To force an update of all dependency information remove the
+ `.depend' file from each directory and type `make'. For
+ instance:
+
+ $ cd $HDF5_HOME
+ $ find . -name .depend -exec rm {} \;
+ $ make
+
+ If you're not using GNU make and gcc then dependencies come from
+ ".distdep" files in each directory. Those files are generated on
+ GNU systems and inserted into the Makefiles by running
+ config.status (which happens near the end of configure).
+
+* If you use GNU make along with gcc then the Perl script `trace' is
+ run just before dependencies are calculated to update any H5TRACE()
+ calls that might appear in the file. Otherwise, after changing the
+ type of a function (return type or argument types) one should run
+ `trace' manually on those source files (e.g., ../bin/trace *.c).
+
+* Object files stay in the directory and are added to the library as a
+ final step instead of placing the file in the library immediately
+ and removing it from the directory. The reason is three-fold:
+
+ 1. Most versions of make don't allow `$(LIB)($(SRC:.c=.o))'
+ which makes it necessary to have two lists of files, one
+ that ends with `.c' and the other that has the library
+ name wrapped around each `.o' file.
+
+ 2. Some versions of make/ar have problems with modification
+ times of archive members.
+
+ 3. Adding object files immediately causes problems on SMP
+ machines where make is doing more than one thing at a
+ time.
+
+* When using GNU make on an SMP you can cause it to compile more than
+ one thing at a time. At the top of the source tree invoke make as
+
+ $ make -j -l6
+
+ which causes make to fork as many children as possible as long as
+ the load average doesn't go above 6. In subdirectories one can say
+
+ $ make -j2
+
+ which limits the number of children to two (this doesn't work at the
+ top level because the `-j2' is not passed to recursive makes).
+
+* To create a release tarball go to the top-level directory and run
+ ./bin/release. You can optionally supply one or more of the words
+ `tar', `gzip', `bzip2' or `compress' on the command line. The
+ result will be a (compressed) tar file(s) in the `releases'
+ directory. The README file is updated to contain the release date
+ and version number.
+
+* To create a tarball of all the files which are part of HDF5 go to
+ the top-level directory and type:
+
+ tar cvf foo.tar `grep '^\.' MANIFEST |unexpand |cut -f1`
+
+</pre>
+
+</body>
+</html>
diff --git a/doc/html/TechNotes/MemoryMgmt.html b/doc/html/TechNotes/MemoryMgmt.html
new file mode 100644
index 0000000..93782b5
--- /dev/null
+++ b/doc/html/TechNotes/MemoryMgmt.html
@@ -0,0 +1,510 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Memory Management in HDF5</title>
+ </head>
+
+ <body>
+ <h1>Memory Management in HDF5</h1>
+
+ <!-- ---------------------------------------------------------------- -->
+ <h2>Is a Memory Manager Necessary?</h2>
+
+ <p>Some form of memory management may be necessary in HDF5 when
+ the various deletion operators are implemented so that the
+ file memory is not permanently orphaned. However, since an
+ HDF5 file was designed with persistent data in mind, the
+ importance of a memory manager is questionable.
+
+ <p>On the other hand, when certain meta data containers (file glue)
+ grow, they may need to be relocated in order to keep the
+ container contiguous.
+
+ <blockquote>
+ <b>Example:</b> An object header consists of up to two
+ chunks of contiguous memory. The first chunk is a fixed
+ size at a fixed location when the header link count is
+ greater than one. Thus, inserting additional items into an
+ object header may require the second chunk to expand. When
+ this occurs, the second chunk may need to move to another
+ location in the file, freeing the file memory which that
+ chunk originally occupied.
+ </blockquote>
+
+ <p>The relocation of meta data containers could potentially
+ orphan a significant amount of file memory if the application
+ has made poor estimates for preallocation sizes.
+
+ <!-- ---------------------------------------------------------------- -->
+ <h2>Levels of Memory Management</h2>
+
+ <p>Memory management by the library can be independent of memory
+ management support by the file format. The file format can
+ support no memory management, some memory management, or full
+ memory management. The same is true of the library.
+
+ <h3>Support in the Library</h3>
+
+ <dl>
+ <dt><b>No Support: I</b>
+ <dd>When memory is deallocated it simply becomes unreferenced
+ (orphaned) in the file. Memory allocation requests are
+ satisfied by extending the file.
+
+ <dd>A separate off-line utility can be used to detect the
+ unreferenced bytes of a file and "bubble" them up to the end
+ of the file and then truncate the file.
+
+ <dt><b>Some Support: II</b>
+ <dd>The library could support partial memory management all
+ the time, or full memory management some of the time.
+ Orphaning free blocks instead of adding them to a free list
+ should not affect the file integrity, nor should fulfilling
+ new requests by extending the file instead of using the free
+ list.
+
+ <dt><b>Full Support: III</b>
+ <dd>The library supports space-efficient memory management by
+ always fulfilling allocation requests from the free list when
+ possible, and by coalescing adjacent free blocks into a
+ single larger free block.
+ </dl>
+
+ <h3>Support in the File Format</h3>
+
+ <dl>
+ <dt><b>No Support: A</b>
+ <dd>The file format does not support memory management; any
+ unreferenced block in the file is assumed to be free. If
+ the library supports full memory management then it will
+ have to traverse the entire file to determine which blocks
+ are unreferenced.
+
+ <dt><b>Some Support: B</b>
+ <dd>Assuming that unreferenced blocks are free can be
+ dangerous in a situation where the file is not consistent.
+ For instance, if a directory tree becomes detached from the
+ main directory hierarchy, then the detached directory and
+ everything that is referenced only through the detached
+ directory become unreferenced. File repair utilities will
+ be unable to determine which unreferenced blocks need to be
+ linked back into the file hierarchy.
+
+ <dd>Therefore, it might be useful to keep an unsorted,
+ doubly-linked list of free blocks in the file. The library
+ can add and remove blocks from the list in constant time,
+ and can generate its own internal free-block data structure
+ in time proportional to the number of free blocks instead of
+ the size of the file. Additionally, a library can use a
+ subset of the free blocks, an alternative which is not
+ feasible if the file format doesn't support any form of
+ memory management.
+
+ <dt><b>Full Support: C</b>
+ <dd>The file format can mirror library data structures for
+ space-efficient memory management. The free blocks are
+ linked in unsorted, doubly-linked lists with one list per
+ free block size. The heads of the lists are pointed to by a
+ B-tree whose nodes are sorted by free block size. At the
+ same time, all free blocks are the leaf nodes of another
+ B-tree sorted by starting and ending address. When the
+ trees are used in combination we can deallocate and allocate
+ memory in O(log <em>N</em>) time where <em>N</em> is the
+ number of free blocks.
+ </dl>
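+
+ <p>The unsorted, doubly-linked free list described under level
+ B can be sketched in core as follows. This is only an
+ illustration: the <code>free_block_t</code> layout and the
+ <code>fl_*</code> function names are hypothetical, pointers
+ stand in for the file addresses of sibling blocks, and the
+ first-fit search is one possible policy rather than a
+ documented one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-core image of one free block. */
typedef struct free_block_t {
    unsigned long addr;           /* file address of the block */
    unsigned long size;           /* total size of the block in bytes */
    struct free_block_t *left;    /* previous block in the unsorted list */
    struct free_block_t *right;   /* next block in the unsorted list */
} free_block_t;

/* Push a block onto the head of the list: constant time. */
static void fl_add(free_block_t **head, free_block_t *b)
{
    b->left = NULL;
    b->right = *head;
    if (*head)
        (*head)->left = b;
    *head = b;
}

/* Unlink a block from anywhere in the list: constant time, because
 * both neighbors are reachable from the block itself. */
static void fl_remove(free_block_t **head, free_block_t *b)
{
    if (b->left) {
        b->left->right = b->right;
    } else {
        assert(*head == b);       /* no left sibling means b is the head */
        *head = b->right;
    }
    if (b->right)
        b->right->left = b->left;
    b->left = b->right = NULL;
}

/* First-fit search, linear in the number of free blocks. A NULL
 * result means the request should be met by extending the file. */
static free_block_t *fl_find(free_block_t *head, unsigned long size)
{
    while (head && head->size < size)
        head = head->right;
    return head;
}
```

+ <p>Because the list is unsorted, adding and removing blocks is
+ cheap but finding a suitable block is linear. The sorted
+ B-trees of level C are what bring allocation down to
+ O(log <em>N</em>).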
+
+ <h3>Combinations of Library and File Format Support</h3>
+
+ <p>We now evaluate each combination of library support with file
+ support:
+
+ <dl>
+ <dt><b>I-A</b>
+ <dd>If neither the library nor the file supports memory
+ management, then each allocation request will come from the
+ end of the file and each deallocation request is a no-op
+ that simply leaves the free block unreferenced.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>No file overhead for allocation or deallocation.
+ <li>No library overhead for allocation or
+ deallocation.
+ <li>No file traversal required at time of open.
+ <li>No data needs to be written back to the file when
+ it's closed.
+ <li>Trivial to implement (already implemented).
+ </ul>
+
+ <li>Disadvantages
+ <ul>
+ <li>Inefficient use of file space.
+ <li>A file repair utility must reclaim lost file space.
+ <li>Difficulties for file repair utilities. (Is an
+ unreferenced block a free block or orphaned data?)
+ </ul>
+ </ul>
+
+ <dt><b>II-A</b>
+ <dd>In order for the library to support memory management, it
+ will be required to build the internal free block
+ representation by traversing the entire file looking for
+ unreferenced blocks.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>No file overhead for allocation or deallocation.
+ <li>Variable amount of library overhead for allocation
+ and deallocation depending on how much work the
+ library wants to do.
+ <li>No data needs to be written back to the file when
+ it's closed.
+ <li>Might use file space efficiently.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>Might use file space inefficiently.
+ <li>File traversal required at time of open.
+ <li>A file repair utility must reclaim lost file space.
+ <li>Difficulties for file repair utilities.
+ <li>Sharing of the free list between processes falls
+ outside the HDF5 file format documentation.
+ </ul>
+ </ul>
+
+ <dt><b>III-A</b>
+ <dd>In order for the library to support full memory
+ management, it will be required to build the internal free
+ block representation by traversing the entire file looking
+ for unreferenced blocks.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>No file overhead for allocation or deallocation.
+ <li>Efficient use of file space.
+ <li>No data needs to be written back to the file when
+ it's closed.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>Moderate amount of library overhead for allocation
+ and deallocation.
+ <li>File traversal required at time of open.
+ <li>A file repair utility must reclaim lost file space.
+ <li>Difficulties for file repair utilities.
+ <li>Sharing of the free list between processes falls
+ outside the HDF5 file format documentation.
+ </ul>
+ </ul>
+
+ <dt><b>I-B</b>
+ <dd>If the library doesn't support memory management but the
+ file format supports some level of management, then a file
+ repair utility will have to be run occasionally to reclaim
+ unreferenced blocks.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>No file overhead for allocation or deallocation.
+ <li>No library overhead for allocation or
+ deallocation.
+ <li>No file traversal required at time of open.
+ <li>No data needs to be written back to the file when
+ it's closed.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>A file repair utility must reclaim lost file space.
+ <li>Difficulties for file repair utilities.
+ </ul>
+ </ul>
+
+ <dt><b>II-B</b>
+ <dd>Both the library and the file format support some level
+ of memory management.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>Constant file overhead per allocation or
+ deallocation.
+ <li>Variable library overhead per allocation or
+ deallocation depending on how much work the library
+ wants to do.
+ <li>Traversal at file open time is on the order of the
+ free list size instead of the file size.
+ <li>The library has the option of reading only part of
+ the free list.
+ <li>No data needs to be written at file close time if
+ it has been amortized into the cost of allocation
+ and deallocation.
+ <li>File repair utilities don't have to be run to
+ reclaim memory.
+ <li>File repair utilities can detect whether an
+ unreferenced block is a free block or orphaned data.
+ <li>Sharing of the free list between processes might
+ be easier.
+ <li>Possible efficient use of file space.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>Possible inefficient use of file space.
+ </ul>
+ </ul>
+
+ <dt><b>III-B</b>
+ <dd>The library provides space-efficient memory management but
+ the file format only supports an unsorted list of free
+ blocks.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>Constant time file overhead per allocation or
+ deallocation.
+ <li>No data needs to be written at file close time if
+ it has been amortized into the cost of allocation
+ and deallocation.
+ <li>File repair utilities don't have to be run to
+ reclaim memory.
+ <li>File repair utilities can detect whether an
+ unreferenced block is a free block or orphaned data.
+ <li>Sharing of the free list between processes might
+ be easier.
+ <li>Efficient use of file space.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>O(log <em>N</em>) library overhead per allocation or
+ deallocation where <em>N</em> is the total number of
+ free blocks.
+ <li>O(<em>N</em>) time to open a file since the entire
+ free list must be read to construct the in-core
+ trees used by the library.
+ <li>Library is more complicated.
+ </ul>
+ </ul>
+
+ <dt><b>I-C</b>
+ <dd>This has the same advantages and disadvantages as I-A with
+ the added disadvantage that the file format is much more
+ complicated.
+
+ <dt><b>II-C</b>
+ <dd>If the library only provides partial memory management but
+ the file requires full memory management, then this method
+ degenerates to the same as II-A with the added disadvantage
+ that the file format is much more complicated.
+
+ <dt><b>III-C</b>
+ <dd>The library and file format both provide complete data
+ structures for space-efficient memory management.
+
+ <ul>
+ <li>Advantages
+ <ul>
+ <li>Files can be opened in constant time since the
+ free list is read on demand and amortized into the
+ allocation and deallocation requests.
+ <li>No data needs to be written back to the file when
+ it's closed.
+ <li>File repair utilities don't have to be run to
+ reclaim memory.
+ <li>File repair utilities can detect whether an
+ unreferenced block is a free block or orphaned data.
+ <li>Sharing the free list between processes is easy.
+ <li>Efficient use of file space.
+ </ul>
+ <li>Disadvantages
+ <ul>
+ <li>O(log <em>N</em>) file allocation and deallocation
+ cost where <em>N</em> is the total number of free
+ blocks.
+ <li>O(log <em>N</em>) library allocation and
+ deallocation cost.
+ <li>Much more complicated file format.
+ <li>More complicated library.
+ </ul>
+ </ul>
+
+ </dl>
+
+ <!-- ---------------------------------------------------------------- -->
+ <h2>The Algorithm for II-B</h2>
+
+ <p>The file contains an unsorted, doubly-linked list of free
+ blocks. The address of the head of the list appears in the
+ boot block. Each free block contains the following fields:
+
+ <center>
+ <table border cellpadding=4 width="60%">
+ <tr align=center>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+
+ <tr align=center>
+ <th colspan=4>Free Block Signature</th>
+
+ <tr align=center>
+ <th colspan=4>Total Free Block Size</th>
+
+ <tr align=center>
+ <th colspan=4>Address of Left Sibling</th>
+
+ <tr align=center>
+ <th colspan=4>Address of Right Sibling</th>
+
+ <tr align=center>
+ <th colspan=4><br><br>Remainder of Free Block<br><br><br></th>
+ </table>
+ </center>
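+ <p>As a rough illustration of the layout in the table above, the
+ sketch below encodes and decodes the four fixed fields of a free
+ block. The 4-byte field widths follow the table as drawn; the
+ signature bytes (<code>FREE</code>), the little-endian byte order,
+ and all names are assumptions made for this example only and are
+ not taken from the file format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the fixed part of a free block: four 4-byte fields. */
#define FREE_BLOCK_HDR_SIZE 16u

/* Hypothetical signature value; the note does not specify one. */
static const uint8_t FREE_BLOCK_SIG[4] = {'F', 'R', 'E', 'E'};

/* In-memory view of the fields that follow the signature. */
typedef struct free_block_hdr {
    uint32_t size;   /* total free block size           */
    uint32_t left;   /* file address of left sibling    */
    uint32_t right;  /* file address of right sibling   */
} free_block_hdr_t;

/* Store a 32-bit value in little-endian order (assumed byte order). */
static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write signature, size, and sibling addresses into buf. */
void free_block_encode(const free_block_hdr_t *h,
                       uint8_t buf[FREE_BLOCK_HDR_SIZE])
{
    assert(h != NULL && buf != NULL);
    memcpy(buf, FREE_BLOCK_SIG, 4);
    put_u32le(buf + 4, h->size);
    put_u32le(buf + 8, h->left);
    put_u32le(buf + 12, h->right);
}

/* Returns 0 on success, -1 if buf does not start with the signature. */
int free_block_decode(const uint8_t buf[FREE_BLOCK_HDR_SIZE],
                      free_block_hdr_t *h)
{
    assert(h != NULL && buf != NULL);
    if (memcmp(buf, FREE_BLOCK_SIG, 4) != 0)
        return -1;
    h->size  = get_u32le(buf + 4);
    h->left  = get_u32le(buf + 8);
    h->right = get_u32le(buf + 12);
    return 0;
}
```

+ <p>A failed decode is how a process sharing the free list could
+ notice that another process has already taken the block.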
+
+ <p>The library reads as much of the free list as convenient when
+ convenient and pushes those entries onto stacks. This can
+ occur when a file is opened or any time during the life of the
+ file. There is one stack for each free block size and the
+ stacks are sorted by size in a balanced tree in memory.
+
+ <p>Deallocation involves finding the correct stack or creating
+ a new one (an O(log <em>K</em>) operation where <em>K</em> is
+ the number of stacks), pushing the free block info onto the
+ stack (a constant-time operation), and inserting the free
+ block into the file free block list (a constant-time operation
+ which doesn't necessarily involve any I/O since the free blocks
+ can be cached like other objects). No attempt is made to
+ coalesce adjacent free blocks into larger blocks.
+
+ <p>Allocation involves finding the correct stack (an O(log
+ <em>K</em>) operation), removing the top item from the stack
+ (a constant-time operation), and removing the block from the
+ file free block list (a constant-time operation). If there is
+ no free block of the requested size or larger, then the file
+ is extended.
+
+ <p>To provide sharability of the free list between processes,
+ the last step of an allocation will check for the free block
+ signature and if it doesn't find one will repeat the process.
+ Alternatively, a process can temporarily remove free blocks
+ from the file and hold them in its own private pool.
+
+ <p>To summarize...
+ <dl>
+ <dt>File opening
+ <dd>O(<em>N</em>) amortized over the time the file is open,
+ where <em>N</em> is the number of free blocks. The library
+ can still function without reading any of the file free
+ block list.
+
+ <dt>Deallocation
+ <dd>O(log <em>K</em>) where <em>K</em> is the number of unique
+ sizes of free blocks. File access is constant.
+
+ <dt>Allocation
+ <dd>O(log <em>K</em>). File access is constant.
+
+ <dt>File closing
+ <dd>O(1) even if the library temporarily removes free
+ blocks from the file to hold them in a private pool since
+ the pool can still be a linked list on disk.
+ </dl>
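+ <p>The in-memory side of II-B can be sketched as follows. This is
+ a minimal illustration, not the library's code: a sorted array
+ with binary search stands in for the balanced tree, so finding a
+ stack is O(log <em>K</em>) but creating a new stack costs
+ O(<em>K</em>) here. Updates to the on-disk linked list are omitted,
+ and every name (<code>free_pool_t</code>, <code>pool_free</code>,
+ <code>pool_alloc</code>, <code>NO_BLOCK</code>) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NO_BLOCK UINT64_MAX   /* returned when the file must be extended */

/* One stack per distinct free block size. */
typedef struct size_stack {
    uint64_t  size;    /* size shared by every block on this stack */
    uint64_t *addrs;   /* file addresses, last pushed is on top    */
    size_t    n, cap;
} size_stack_t;

/* All stacks, kept sorted by ascending size. */
typedef struct free_pool {
    size_stack_t *stacks;
    size_t        n, cap;
} free_pool_t;

/* Index of the first stack whose size is >= want (binary search). */
static size_t pool_lower_bound(const free_pool_t *p, uint64_t want)
{
    size_t lo = 0, hi = p->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p->stacks[mid].size < want)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Deallocation: find or create the stack for this size, then push. */
int pool_free(free_pool_t *p, uint64_t addr, uint64_t size)
{
    size_t i = pool_lower_bound(p, size);
    if (i == p->n || p->stacks[i].size != size) {
        if (p->n == p->cap) {
            size_t cap = p->cap ? p->cap * 2 : 8;
            size_stack_t *s = realloc(p->stacks, cap * sizeof *s);
            if (s == NULL)
                return -1;
            p->stacks = s;
            p->cap = cap;
        }
        /* Open a slot at i so the array stays sorted. */
        memmove(&p->stacks[i + 1], &p->stacks[i],
                (p->n - i) * sizeof p->stacks[0]);
        p->stacks[i] = (size_stack_t){ size, NULL, 0, 0 };
        p->n++;
    }
    size_stack_t *s = &p->stacks[i];
    if (s->n == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 4;
        uint64_t *a = realloc(s->addrs, cap * sizeof *a);
        if (a == NULL)
            return -1;
        s->addrs = a;
        s->cap = cap;
    }
    s->addrs[s->n++] = addr;
    return 0;
}

/* Allocation: pop from the smallest non-empty stack whose size is
 * >= the request. Blocks are never split or coalesced. */
uint64_t pool_alloc(free_pool_t *p, uint64_t size, uint64_t *got_size)
{
    size_t i = pool_lower_bound(p, size);
    while (i < p->n && p->stacks[i].n == 0)
        i++;                       /* skip stacks that have emptied */
    if (i == p->n)
        return NO_BLOCK;
    *got_size = p->stacks[i].size;
    return p->stacks[i].addrs[--p->stacks[i].n];
}
```

+ <p>A caller that receives <code>NO_BLOCK</code> would extend the
+ file, as described above.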
+
+ <!-- ---------------------------------------------------------------- -->
+ <h2>The Algorithm for III-C</h2>
+
+ <p>The HDF5 file format supports a general B-tree mechanism
+ for storing data with keys. If we use a B-tree to represent
+ all parts of the file that are free and the B-tree is indexed
+ so that a free file chunk can be found if we know the starting
+ or ending address, then we can efficiently determine whether a
+ free chunk begins or ends at the specified address. Call this
+ the <em>Address B-Tree</em>.
+
+ <p>If a second B-tree points to a set of stacks where the
+ members of a particular stack are all free chunks of the same
+ size, and the tree is indexed by chunk size, then we can
+ efficiently find the best-fit chunk size for a memory request.
+ Call this the <em>Size B-Tree</em>.
+
+ <p>All free blocks of a particular size can be linked together
+ with an unsorted, doubly-linked, circular list and the left
+ and right sibling addresses can be stored within the free
+ chunk, allowing us to remove or insert items from the list in
+ constant time.
+
+ <p>Deallocation of a block of file memory consists of:
+
+ <ol type="I">
+ <li>Add the new free block whose address is <em>ADDR</em> to the
+ address B-tree.
+
+ <ol type="A">
+ <li>If the address B-tree contains an entry for a free
+ block that ends at <em>ADDR</em>-1 then remove that
+ block from the B-tree and from the linked list (if the
+ block was the first on the list then the size B-tree
+ must be updated). Adjust the size and address of the
+ block being freed to include the block just removed from
+ the free list. The time required to search for and
+ possibly remove the left block is O(log <em>N</em>)
+ where <em>N</em> is the number of free blocks.
+
+ <li>If the address B-tree contains an entry for the free
+ block that begins at <em>ADDR</em>+<em>LENGTH</em> then
+ remove that block from the B-tree and from the linked
+ list (if the block was the first on the list then the
+ size B-tree must be updated). Adjust the size of the
+ block being freed to include the block just removed from
+ the free list. The time required to search for and
+ possibly remove the right block is O(log <em>N</em>).
+
+ <li>Add the new (adjusted) block to the address B-tree.
+ The time for this operation is O(log <em>N</em>).
+ </ol>
+
+ <li>Add the new block to the size B-tree and linked list.
+
+ <ol type="A">
+ <li>If the size B-tree has an entry for this particular
+ size, then add the chunk to the tail of the list. This
+ is an O(log <em>K</em>) operation where <em>K</em> is
+ the number of unique free block sizes.
+
+ <li>Otherwise make a new entry in the B-tree for chunks of
+ this size. This is also O(log <em>K</em>).
+ </ol>
+ </ol>
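+ <p>Step I of the procedure above (coalescing with the left and
+ right neighbors before inserting) can be sketched as follows. This
+ is an illustration under simplifying assumptions: a small sorted
+ array stands in for the address B-tree, so removal and insertion
+ are linear rather than O(log <em>N</em>), and the size B-tree and
+ per-size linked lists of step II are left out. The names
+ <code>addr_index_t</code> and <code>extent_free</code> are
+ hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FREE 64   /* arbitrary capacity for this sketch */

typedef struct extent {
    uint64_t addr;    /* first byte of the free block */
    uint64_t len;     /* length in bytes              */
} extent_t;

/* Stand-in for the address B-tree: extents sorted by address. */
typedef struct addr_index {
    extent_t e[MAX_FREE];
    size_t   n;
} addr_index_t;

/* Index of the first extent whose address is >= addr. */
static size_t addr_lower_bound(const addr_index_t *ix, uint64_t addr)
{
    size_t lo = 0, hi = ix->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->e[mid].addr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

static void addr_remove_at(addr_index_t *ix, size_t i)
{
    memmove(&ix->e[i], &ix->e[i + 1],
            (ix->n - i - 1) * sizeof ix->e[0]);
    ix->n--;
}

/* Free [addr, addr+len), merging with adjacent free extents.
 * Returns 0 on success, -1 if the index is full. */
int extent_free(addr_index_t *ix, uint64_t addr, uint64_t len)
{
    assert(ix != NULL && len > 0);
    size_t i = addr_lower_bound(ix, addr);

    /* I.A: a free block that ends at addr-1 is absorbed on the left. */
    if (i > 0 && ix->e[i - 1].addr + ix->e[i - 1].len == addr) {
        addr = ix->e[i - 1].addr;
        len += ix->e[i - 1].len;
        addr_remove_at(ix, i - 1);
        i--;
    }

    /* I.B: a free block that begins at addr+len is absorbed on the right. */
    if (i < ix->n && ix->e[i].addr == addr + len) {
        len += ix->e[i].len;
        addr_remove_at(ix, i);
    }

    /* I.C: insert the (possibly enlarged) block. */
    if (ix->n == MAX_FREE)
        return -1;
    memmove(&ix->e[i + 1], &ix->e[i], (ix->n - i) * sizeof ix->e[0]);
    ix->e[i].addr = addr;
    ix->e[i].len = len;
    ix->n++;
    return 0;
}
```

+ <p>Freeing a block that sits exactly between two free neighbors
+ leaves a single entry covering all three.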
+
+ <p>Allocation is similar to deallocation.
+
+ <p>To summarize...
+
+ <dl>
+ <dt>File opening
+ <dd>O(1)
+
+ <dt>Deallocation
+ <dd>O(log <em>N</em>) where <em>N</em> is the total number of
+ free blocks. File access time is O(log <em>N</em>).
+
+ <dt>Allocation
+ <dd>O(log <em>N</em>). File access time is O(log <em>N</em>).
+
+ <dt>File closing
+ <dd>O(1).
+ </dl>
+
+
+ <hr>
+ <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Thu Jul 24 15:16:40 PDT 1997 -->
+<!-- hhmts start -->
+Last modified: Thu Jul 31 14:41:01 EST
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/MoveDStruct.html b/doc/html/TechNotes/MoveDStruct.html
new file mode 100644
index 0000000..4576bd2
--- /dev/null
+++ b/doc/html/TechNotes/MoveDStruct.html
@@ -0,0 +1,66 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Relocating a File Data Structure</title>
+ </head>
+
+ <body>
+ <h1>Relocating a File Data Structure</h1>
+
+ <p>Since file data structures can be cached in memory by the H5AC
+ package it becomes problematic to move such a data structure in
+ the file. One cannot just copy a portion of the file from one
+ location to another because:
+
+ <ol>
+ <li>the file might not contain the latest information, and</li>
+ <li>the H5AC package might not realize that the object's
+ address has changed and attempt to write the object to disk
+ at the old address.</li>
+ </ol>
+
+ <p>Here's a correct method to move data from one location to
+ another. The example code assumes that one is moving a B-link
+ tree node from <code>old_addr</code> to <code>new_addr</code>.
+
+ <ol>
+ <li>Make sure the disk is up-to-date with respect to the
+ cache. There is no need to remove the item from the cache,
+ hence the final argument to <code>H5AC_flush</code> is
+ <code>FALSE</code>.
+ <br><br>
+ <code>
+ H5AC_flush (f, H5AC_BT, old_addr, FALSE);<br>
+ </code>
+ <br>
+ </li>
+
+ <li>Read the data from the old address and write it to the new
+ address.
+ <br><br>
+ <code>
+ H5F_block_read (f, old_addr, size, buf);<br>
+ H5F_block_write (f, new_addr, size, buf);<br>
+ </code>
+ <br>
+ </li>
+
+ <li>Notify the cache that the address of the object changed.
+ <br><br>
+ <code>
+ H5AC_rename (f, H5AC_BT, old_addr, new_addr);<br>
+ </code>
+ <br>
+ </li>
+ </ol>
+
+
+
+ <hr>
+ <address><a href="mailto:robb@maya.nuance.com">Robb Matzke</a></address>
+<!-- Created: Mon Jul 14 15:09:06 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Mon Jul 14 15:38:29 EST
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/NamingScheme.html b/doc/html/TechNotes/NamingScheme.html
new file mode 100644
index 0000000..dbf55bf
--- /dev/null
+++ b/doc/html/TechNotes/NamingScheme.html
@@ -0,0 +1,300 @@
+<HTML>
+<HEAD><TITLE>
+ HDF5 Naming Scheme
+ </TITLE> </HEAD>
+
+<BODY bgcolor="#ffffff">
+
+
+<H1>
+<FONT color="#c80028">
+ <I> <B> <CENTER> HDF5 Naming Scheme for </CENTER> </B> </I> </H1>
+</FONT>
+<P>
+<UL>
+
+<LI> <A HREF = "#01"><I> FILES </I> </A>
+<LI> <A HREF = "#02"><I> PACKAGES </I> </A>
+<LI> <A HREF = "#03"><I> PUBLIC vs PRIVATE </I> </A>
+<LI> <A HREF = "#04"><I> INTEGRAL TYPES </I> </A>
+<LI> <A HREF = "#05"><I> OTHER TYPES </I> </A>
+<LI> <A HREF = "#06"><I> GLOBAL VARIABLES </I> </A>
+<LI> <A HREF = "#07"><I> MACROS, PREPROCESSOR CONSTANTS, AND ENUM MEMBERS </I> </A>
+
+</UL>
+<P>
+<center>
+ Authors: <A HREF = "mailto:koziol@ncsa.uiuc.edu">
+ <I>Quincey Koziol</I> </A> and
+ <A HREF = "mailto:matzke@llnl.gov">
+ <I> Robb Matzke </I> </A>
+
+</center>
+<UL>
+
+<FONT color="#c80028">
+<LI> <A NAME="01"> <B> <I> FILES </I> </B> </A>
+</FONT>
+
+<UL>
+
+ <LI> Source files are named according to the package they contain (see
+ below). All files will begin with `H5' so we can stuff our
+ object files into someone else's library and not worry about file
+ name conflicts.
+ <P>For Example:
+<i><b>
+<dd> H5.c -- "Generic" library functions
+ <br>
+<dd> H5B.c -- B-link tree functions
+</i></b>
+ <p>
+ <LI> If a package is in more than one file, then another name is tacked
+ on. It's all lower case with no underscores or hyphens.
+ <P>For Example:
+<i><b>
+<dd> H5F.c -- the file for this package
+ <br>
+<dd> H5Fstdio.c -- stdio functions (just an example)
+ <br>
+<dd> H5Ffcntl.c -- fcntl functions (just an example)
+</i></b>
+ <p>
+ <LI> Each package file has a header file of API stuff (unless there is
+ no API component to the package)
+ <P>For Example:
+<i><b>
+<dd> H5F.h -- things an application would see. </i> </b>
+ <P>
+ and a header file of private stuff
+<i><b>
+ <p>
+<dd> H5Fprivate.h -- things an application wouldn't see. The
+ private header includes the public header.
+</i></b>
+ <p>
+ and a header for private prototypes
+<i><b>
+ <p>
+<dd> H5Fproto.h -- prototypes for internal functions.
+</i></b>
+ <P>
+ By splitting the prototypes into separate include files we don't
+ have to recompile everything when just one function prototype
+ changes.
+
+ <LI> The main API header file is `hdf5.h' and it includes each of the
+ public header files but none of the private header files. Or the
+ application can include just the public header files it needs.
+
+ <LI> There is no main private or prototype header file because it
+ prevents make from being efficient. Instead, each source file
+ includes only the private header and prototype files it needs
+ (first all the private headers, then all the private prototypes).
+
+ <LI> Header files should include everything they need and nothing more.
+
+</UL>
+<P>
+
+<FONT color="#c80028">
+<LI> <A NAME="02"> <B> <I> PACKAGES </I> </B> </A>
+</FONT>
+
+<P>
+Names exported beyond function scope begin with `H5' followed by zero,
+one, or two upper-case letters that describe the class of object.
+This prefix is the package name. The implementation of packages
+doesn't necessarily have to map 1:1 to the source files.
+<P>
+<i><b>
+<dd> H5 -- library functions
+<br>
+<dd> H5A -- atoms
+<br>
+<dd> H5AC -- cache
+<br>
+<dd> H5B -- B-link trees
+<br>
+<dd> H5D -- datasets
+<br>
+<dd> H5E -- error handling
+<br>
+<dd> H5F -- files
+<br>
+<dd> H5G -- groups
+<br>
+<dd> H5M -- meta data
+<br>
+<dd> H5MM -- core memory management
+<br>
+<dd> H5MF -- file memory management
+<br>
+<dd> H5O -- object headers
+<br>
+<dd> H5P -- Property Lists
+<br>
+<dd> H5S -- dataspaces
+<br>
+<dd> H5R -- relationships
+<br>
+<dd> H5T -- datatype
+</i></b>
+<p>
+Each package implements a single main class of object (e.g., the H5B
+package implements B-link trees). The main data type of a package is
+the package name followed by `_t'.
+<p>
+<i><b>
+<dd> H5F_t -- HDF5 file type
+<br>
+<dd> H5B_t -- B-link tree data type
+</i></b>
+<p>
+
+Not all packages implement a data type (H5, H5MF) and some
+packages provide access to a preexisting data type (H5MM, H5S).
+<p>
+
+
+<FONT color="#c80028">
+<LI> <A NAME="03"> <B> <I> PUBLIC vs PRIVATE </I> </B> </A>
+</FONT>
+<p>
+If the symbol is for internal use only, then the package name is
+followed by an underscore and the rest of the name. Otherwise, the
+symbol is part of the API and there is no underscore between the
+package name and the rest of the name.
+<p>
+<i><b>
+<dd> H5Fopen -- an API function.
+<br>
+<dd> H5B_find -- an internal function.
+</i></b>
+<p>
+For functions, this is important because the API functions never pass
+pointers around (they use atoms instead for hiding the implementation)
+and they perform stringent checks on their arguments. Internal
+functions, on the other hand, check arguments with assert().
+<p>
+Data types like H5B_t carry no information about whether the type is
+public or private since it doesn't matter.
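+<p>
+The split between API and internal functions might look like the
+sketch below. The <code>H5X</code> package, its functions, and the
+table of atoms are invented purely to show the naming rules; the
+<code>herr_t</code> typedef is a plain <code>int</code> stand-in
+rather than the library's definition.

```c
#include <assert.h>
#include <stddef.h>

typedef int herr_t;                  /* stand-in for the library's error type */

/* Main data type of the hypothetical H5X package: package name + `_t'. */
typedef struct H5X_t {
    int nitems;
} H5X_t;

/* Macro: same rules as other symbols, but all upper case. */
#define H5X_MAXATOMS 4

/* Global variable: package name, then a name ending in `_g'. */
H5X_t *H5X_table_g[H5X_MAXATOMS];

/* Internal function (underscore after the package name). It takes a
 * pointer and trusts its caller, so it only asserts. */
static herr_t H5X_set_nitems(H5X_t *x, int n)
{
    assert(x != NULL);
    assert(n >= 0);
    x->nitems = n;
    return 0;
}

/* API function (no underscore after the package name). It takes an
 * atom instead of a pointer and validates every argument. */
herr_t H5Xset_nitems(int atom, int n)
{
    if (atom < 0 || atom >= H5X_MAXATOMS || H5X_table_g[atom] == NULL)
        return -1;
    if (n < 0)
        return -1;
    return H5X_set_nitems(H5X_table_g[atom], n);
}
```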
+
+<p>
+
+
+<FONT color="#c80028">
+<LI> <A NAME="04"> <B> <I> INTEGRAL TYPES </I> </B> </A>
+</FONT>
+<p>
+Integral fixed-point type names are an optional `u' followed by `int'
+followed by the size in bits (8, 16,
+32, or 64). There is no trailing `_t' because these are common
+enough and follow their own naming convention.
+<p>
+<pre><H4>
+<dd> hbool_t -- boolean values (BTRUE, BFALSE, BFAIL)
+<br>
+<dd> int8 -- signed 8-bit integers
+<br>
+<dd> uint8 -- unsigned 8-bit integers
+<br>
+<dd> int16 -- signed 16-bit integers
+<br>
+<dd> uint16 -- unsigned 16-bit integers
+<br>
+<dd> int32 -- signed 32-bit integers
+<br>
+<dd> uint32 -- unsigned 32-bit integers
+<br>
+<dd> int64 -- signed 64-bit integers
+<br>
+<dd> uint64 -- unsigned 64-bit integers
+<br>
+<dd> intn -- "native" integers
+<br>
+<dd> uintn -- "native" unsigned integers
+
+</pre></H4>
+<p>
+
+<FONT color="#c80028">
+<LI> <A NAME="05"> <B> <I> OTHER TYPES </I> </B> </A>
+</FONT>
+
+<p>
+
+Other data types are always followed by `_t'.
+<p>
+<pre><H4>
+<dd> H5B_key_t -- additional data type used by H5B package.
+</pre></H4>
+<p>
+
+However, if the name is so common that it's used almost everywhere,
+then we make an alias for it by removing the package name and leading
+underscore and replacing it with an `h' (the main datatype for a
+package already has a short enough name, so we don't have aliases for
+them).
+<P>
+<pre><H4>
+<dd> typedef H5E_err_t herr_t;
+</pre> </H4>
+<p>
+
+<FONT color="#c80028">
+<LI> <A NAME="06"> <B> <I> GLOBAL VARIABLES </I> </B> </A>
+</FONT>
+<p>
+Global variables include the package name and end with `_g'.
+<p>
+<pre><H4>
+<dd> H5AC_methods_g -- global variable in the H5AC package.
+</pre> </H4>
+<p>
+
+
+<FONT color="#c80028">
+<LI> <A NAME="07">
+<I> <B>
+MACROS, PREPROCESSOR CONSTANTS, AND ENUM MEMBERS
+ </I> </B> </A>
+</FONT>
+<p>
+Same rules as other symbols except the name is all upper case. There
+are a few exceptions: <br>
+<ul>
+<li> Constants and macros defined on a system that is deficient:
+ <p><pre><H4>
+<dd> MIN(x,y), MAX(x,y) and their relatives
+ </pre></H4>
+
+<li> Platform constants :
+ <P>
+ No naming scheme; determined by OS and compiler.<br>
+ These appear only in one header file anyway.
+ <p>
+<li> Feature test constants (?)<br>
+ Always start with `HDF5_HAVE_' like HDF5_HAVE_STDARG_H for a
+ header file, or HDF5_HAVE_DEV_T for a data type, or
+ HDF5_HAVE_DIV for a function.
+</UL>
+<p>
+
+</UL>
+<p>
+<H6>
+<center>
+ This file /hdf3/web/hdf/internal/HDF_standard/HDF5.coding_standard.html is
+ maintained by Elena Pourmal <A HREF = "mailto:epourmal@ncsa.uiuc.edu">
+ <I>epourmal@ncsa.uiuc.edu</I> </A>.
+</center>
+<p>
+<center>
+ Last modified August 5, 1997
+</center>
+
+</H6>
+</BODY>
+</HTML>
+
diff --git a/doc/html/TechNotes/ObjectHeader.html b/doc/html/TechNotes/ObjectHeader.html
new file mode 100644
index 0000000..33ce711
--- /dev/null
+++ b/doc/html/TechNotes/ObjectHeader.html
@@ -0,0 +1,67 @@
+<html>
+<body>
+
+<h1>Object Headers</h1>
+
+<pre>
+
+haddr_t
+H5O_new (hdf5_file_t *f, intn nrefs, size_t size_hint)
+
+ Creates a new empty object header and returns its address.
+ The SIZE_HINT is the initial size of the data portion of the
+ object header and NREFS is the number of symbol table entries
+ that reference this object header (normally one).
+
+ If SIZE_HINT is too small, then at least some default amount
+ of space is allocated for the object header.
+
+intn /*num remaining links */
+H5O_link (hdf5_file_t *f, /*file containing header */
+ haddr_t addr, /*header file address */
+ intn adjust) /*link adjustment amount */
+
+
+size_t
+H5O_sizeof (hdf5_file_t *f, /*file containing header */
+ haddr_t addr, /*header file address */
+ H5O_class_t *type, /*message type or H5O_ANY */
+ intn sequence) /*sequence number, usually zero */
+
+ Returns the size of a particular instance of a message in an
+ object header. When an object header has more than one
+ instance of a particular message type, then SEQUENCE indicates
+ which instance to return.
+
+void *
+H5O_read (hdf5_file_t *f, /*file containing header */
+ haddr_t addr, /*header file address */
+ H5G_entry_t *ent, /*optional symbol table entry */
+ H5O_class_t *type, /*message type or H5O_ANY */
+ intn sequence, /*sequence number, usually zero */
+ size_t size, /*size of output message */
+ void *mesg) /*output buffer */
+
+ Reads a message from the object header into memory.
+
+const void *
+H5O_peek (hdf5_file_t *f, /*file containing header */
+ haddr_t addr, /*header file address */
+ H5G_entry_t *ent, /*optional symbol table entry */
+ H5O_class_t *type, /*type of message or H5O_ANY */
+ intn sequence) /*sequence number, usually zero */
+
+haddr_t /*new heap address */
+H5O_modify (hdf5_file_t *f, /*file containing header */
+ haddr_t addr, /*header file address */
+ H5G_entry_t *ent, /*optional symbol table entry */
+ hbool_t *ent_modified, /*entry modification flag */
+ H5O_class_t *type, /*message type */
+ intn overwrite, /*sequence number or -1 */
+ void *mesg) /*the message */
+
+
+</pre>
+
+</body>
+</html>
diff --git a/doc/html/TechNotes/RawDStorage.html b/doc/html/TechNotes/RawDStorage.html
new file mode 100644
index 0000000..87ea54d
--- /dev/null
+++ b/doc/html/TechNotes/RawDStorage.html
@@ -0,0 +1,274 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Raw Data Storage in HDF5</title>
+ </head>
+
+ <body>
+ <h1>Raw Data Storage in HDF5</h1>
+
+ <p>This document describes the various ways that raw data is
+ stored in an HDF5 file and the object header messages which
+ contain the parameters for the storage.
+
+ <p>Raw data storage has three components: the mapping from some
+ logical multi-dimensional element space to the linear address
+ space of a file, compression of the raw data on disk, and
+ striping of raw data across multiple files. These components
+ are orthogonal.
+
+ <p>Some goals of the storage mechanism are to be able to
+ efficiently store data which is:
+
+ <dl>
+ <dt>Small
+ <dd>Small pieces of raw data can be treated as meta data and
+ stored in the object header. This will be achieved by storing
+ the raw data in the object header with message 0x0006.
+ Compression and striping are not supported in this case.
+
+ <dt>Complete Large
+ <dd>The library should be able to store large arrays
+ contiguously in the file provided the user knows the final
+ array size a priori. The array can then be read/written in a
+ single I/O request. This is accomplished by describing the
+ storage with object header message 0x0005. Compression and
+ striping are not supported in this case.
+
+ <dt>Sparse Large
+ <dd>A large sparse raw data array should be stored in a manner
+ that is space-efficient but one in which any element can still
+ be accessed in a reasonable amount of time. Implementation
+ details are below.
+
+ <dt>Dynamic Size
+ <dd>One often doesn't have prior knowledge of the size of an
+ array. It would be nice to allow arrays to grow dynamically in
+ any dimension. It might also be nice to allow the array to
+ grow in the negative dimension directions if convenient to
+ implement. Implementation details are below.
+
+ <dt>Subslab Access
+ <dd>Some multi-dimensional arrays are almost always accessed by
+ subslabs. For instance, a 2-d array of pixels might always be
+ accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k
+ index values. We should be able to store the array in such a
+ way that striding through the entire array is not necessary.
+ Subslab access might also be useful with compression
+ algorithms where each storage slab can be compressed
+ independently of the others. Implementation details are below.
+
+ <dt>Compressed
+ <dd>Various compression algorithms can be applied to the entire
+ array. We're not planning to support separate algorithms (or a
+ single algorithm with separate parameters) for each chunk
+ although it would be possible to implement that in a manner
+ similar to the way striping across files is
+ implemented.
+
+ <dt>Striped Across Files
+ <dd>The array access functions should support arrays stored
+ discontiguously across a set of files.
+ </dl>
+
+ <h1>Implementation of Indexed Storage</h1>
+
+ <p>The Sparse Large, Dynamic Size, and Subslab Access methods
+ share so much code that they can be described with a single
+ message. The new Indexed Storage Message (<code>0x0008</code>)
+ will replace the old Chunked Object (<code>0x0009</code>) and
+ Sparse Object (<code>0x000A</code>) Messages.
+
+ <p>
+ <center>
+ <table border cellpadding=4 width="60%">
+ <caption align=bottom>
+ <b>The Format of the Indexed Storage Message</b>
+ </caption>
+ <tr align=center>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ </tr>
+
+ <tr align=center>
+ <td colspan=4><br>Address of B-tree<br><br></td>
+ </tr>
+ <tr align=center>
+ <td>Number of Dimensions</td>
+ <td>Reserved</td>
+ <td>Reserved</td>
+ <td>Reserved</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Reserved (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Alignment for Dimension 0 (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Alignment for Dimension 1 (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>...</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Alignment for Dimension N (4 bytes)</td>
+ </tr>
+ </table>
+ </center>
+
+ <p>The alignment fields indicate the alignment in logical space to
+ use when allocating new storage areas on disk. For instance,
+ writing every other element of a 100-element one-dimensional
+ array (using one HDF5 I/O partial write operation per element)
+ that has unit storage alignment would result in 50
+ single-element, discontiguous storage segments. However, using
+ an alignment of 25 would result in only four discontiguous
+ segments. The size of the message varies with the number of
+ dimensions.
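+ <p>The arithmetic in the example above can be checked with a short
+ sketch. It assumes a simple model in which storage is allocated in
+ aligned runs of <code>align</code> elements and a run is allocated
+ the first time any element in it is written; the function name is
+ hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the aligned storage segments touched when writing elements
 * 0, stride, 2*stride, ... of an nelmts-element 1-d array. */
size_t count_segments(size_t nelmts, size_t stride, size_t align)
{
    size_t count = 0;
    size_t last = (size_t)-1;   /* sentinel: no segment seen yet */

    assert(stride > 0 && align > 0);
    for (size_t i = 0; i < nelmts; i += stride) {
        size_t seg = i / align; /* aligned segment holding element i */
        if (seg != last) {
            count++;
            last = seg;
        }
    }
    return count;
}
```

+ <p>Under this model, <code>count_segments(100, 2, 1)</code> gives
+ 50 and <code>count_segments(100, 2, 25)</code> gives 4, matching
+ the figures in the text.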
+
+ <p>A B-tree is used to point to the discontiguous portions of
+ storage which has been allocated for the object. All keys of a
+ particular B-tree are the same size and are a function of the
+ number of dimensions. It is therefore not possible to change the
+ dimensionality of an indexed storage array after its B-tree is
+ created.
+
+ <p>
+ <center>
+ <table border cellpadding=4 width="60%">
+ <caption align=bottom>
+ <b>The Format of a B-Tree Key</b>
+ </caption>
+ <tr align=center>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ </tr>
+
+ <tr align=center>
+ <td colspan=4>External File Number or Zero (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>...</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td>
+ </tr>
+ </table>
+ </center>
+
+ <p>The keys within a B-tree obey an ordering based on the chunk
+ offsets. If the offsets in dimension-0 are equal, then
+ dimension-1 is used, etc. The External File Number field
+ contains a 1-origin offset into the External File List message
+ which contains the name of the external file in which that chunk
+ is stored.
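+ <p>The key ordering described above might be expressed as the
+ comparison function sketched below. The struct layout mirrors the
+ key table with 4-byte fields; <code>MAX_NDIMS</code> and the names
+ are assumptions for this example. The external file number takes
+ no part in the ordering, since the text bases it on chunk offsets
+ only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NDIMS 8   /* hypothetical upper bound for this sketch */

typedef struct istore_key {
    uint32_t file_number;        /* 1-origin external file index, or zero */
    uint32_t offset[MAX_NDIMS];  /* chunk offset in each dimension        */
} istore_key_t;

/* Returns -1, 0, or 1 as a sorts before, equal to, or after b.
 * Dimension 0 is most significant; later dimensions break ties. */
int istore_key_cmp(const istore_key_t *a, const istore_key_t *b,
                   unsigned ndims)
{
    assert(a != 0 && b != 0);
    assert(ndims <= MAX_NDIMS);
    for (unsigned d = 0; d < ndims; d++) {
        if (a->offset[d] < b->offset[d])
            return -1;
        if (a->offset[d] > b->offset[d])
            return 1;
    }
    return 0;
}
```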
+
+ <h1>Implementation of Striping</h1>
+
+ <p>The indexed storage will support arbitrary striping at the
+ chunk level; each chunk can be stored in any file. This is
+ accomplished by using the External File Number field of an
+ indexed storage B-tree key as a 1-origin offset into an External
+ File List Message (0x0009) which takes the form:
+
+ <p>
+ <center>
+ <table border cellpadding=4 width="60%">
+ <caption align=bottom>
+ <b>The Format of the External File List Message</b>
+ </caption>
+ <tr align=center>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ <th width="25%">byte</th>
+ </tr>
+
+ <tr align=center>
+ <td colspan=4><br>Name Heap Address<br><br></td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Number of Slots Allocated (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Number of File Names (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4>...</td>
+ </tr>
+ <tr align=center>
+ <td colspan=4><br>Unused Slot(s)<br><br></td>
+ </tr>
+ </table>
+ </center>
+
+ <p>Each indexed storage array that has all or part of its data
+ stored in external files will contain a single external file
+ list message. The size of the message is determined when the
+ message is created, but it may be possible to enlarge the
+ message on demand by moving it. At this time, it's not possible
+ for multiple arrays to share a single external file list
+ message.
+
+ <dl>
+ <dt><code>
+ H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn
+ nslots_hint, intn heap_size_hint)
+ </code>
+ <dd>Adds a new, empty external file list message to an object
+ header and returns a pointer to that message. The message
+ acts as a cache for file descriptors of external files that
+ are open.
+
+ <p><dt><code>
+ intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
+ </code>
+ <dd>Gets the external file index number for a particular file name.
+ If the name isn't in the external file list then it's added to
+ the H5O_efl_t struct and immediately written to the object
+ header to which the external file list message belongs. Name
+ comparison is textual. Each name should be relative to the
+ directory which contains the HDF5 file.
+
+ <p><dt><code>
+ H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
+ </code>
+ <dd>Gets a low-level file descriptor for an external file. The
+ external file list caches file descriptors because we might
+ have many more external files than there are file descriptors
+ available to this process. The caller should not close this file.
+
+ <p><dt><code>
+ herr_t H5O_efl_release (H5O_efl_t *efl)
+ </code>
+ <dd>Releases an external file list, closes all files
+ associated with that list, and if the list has been modified
+ since the call to <code>H5O_efl_new</code> flushes the message
+ to disk.
+ </dl>
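+ <p>The lookup-or-add behavior described for
+ <code>H5O_efl_index</code> can be illustrated with a small
+ in-memory stand-in. This is not the library's implementation: the
+ <code>efl_demo_t</code> type, its fixed capacities, and the
+ function name are hypothetical, nothing is written to an object
+ header, and the returned index is assumed to be 1-origin to match
+ the External File Number field of the B-tree key.

```c
#include <assert.h>
#include <string.h>

#define EFL_MAX_SLOTS 8    /* hypothetical slot count      */
#define EFL_MAX_NAME  64   /* hypothetical name buffer size */

typedef struct efl_demo {
    char names[EFL_MAX_SLOTS][EFL_MAX_NAME];
    int  nfiles;           /* number of slots in use */
} efl_demo_t;

/* Return the 1-origin index of filename, adding it if it is not yet
 * listed. Comparison is textual. Returns -1 if the name is too long
 * or no slot is free. */
int efl_demo_index(efl_demo_t *efl, const char *filename)
{
    assert(efl != NULL && filename != NULL);

    for (int i = 0; i < efl->nfiles; i++) {
        if (strcmp(efl->names[i], filename) == 0)
            return i + 1;
    }
    if (efl->nfiles == EFL_MAX_SLOTS || strlen(filename) >= EFL_MAX_NAME)
        return -1;
    strcpy(efl->names[efl->nfiles], filename);
    efl->nfiles++;
    return efl->nfiles;
}
```

+ <p>Looking up the same name twice returns the same index, so a
+ chunk's B-tree key can store the index rather than the name.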
+
+ <hr>
+ <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address>
+<!-- Created: Fri Oct 3 09:52:32 EST 1997 -->
+<!-- hhmts start -->
+Last modified: Tue Nov 25 12:36:50 EST 1997
+<!-- hhmts end -->
+ </body>
+</html>
diff --git a/doc/html/TechNotes/SymbolTables.html b/doc/html/TechNotes/SymbolTables.html
new file mode 100644
index 0000000..a05cd5a
--- /dev/null
+++ b/doc/html/TechNotes/SymbolTables.html
@@ -0,0 +1,323 @@
+<html>
+<body>
+
+<h1>Symbol Table Caching Issues</h1>
+
+<pre>
+
+A number of issues involving caching of object header messages in
+symbol table entries must be resolved.
+
+What is the motivation for these changes?
+
+ If we make objects completely independent of object name it allows
+ us to refer to one object by multiple names (a concept called hard
+ links in Unix file systems), which in turn provides an easy way to
+ share data between datasets.
+
+ Every object in an HDF5 file has a unique, constant object header
+ address which serves as a handle (or OID) for the object. The
+ object header contains messages which describe the object.
+
+ HDF5 allows some of the object header messages to be cached in
+ symbol table entries so that the object header doesn't have to be
+ read from disk. For instance, an entry for a directory caches the
+ directory disk addresses required to access that directory, so the
+ object header for that directory is seldom read.
+
+ If an object has multiple names (that is, a link count greater than
+ one), then it has multiple symbol table entries which point to it.
+ All symbol table entries must agree on header messages. The
+ current mechanism is to turn off the caching of header messages in
+ symbol table entries when the header link count is more than one,
+ and to allow caching once the link count returns to one.
+
+ However, in the current implementation, a package is allowed to
+ copy a symbol table entry and use it as a private cache for the
+ object header. This doesn't work for a number of reasons (all but
+ one require a `delete symbol entry' operation).
+
+ 1. If two packages hold copies of the same symbol table entry,
+ they don't notify each other of changes to the symbol table
+ entry. Eventually, one package reads a cached message and
+ gets the wrong value because the other package changed the
+ message in the object header.
+
+ 2. If one package holds a copy of the symbol table entry and
+ some other part of HDF5 removes the object and replaces it
+ with some other object, then the original package will
+ continue to access the non-existent object using the new
+ object header.
+
+ 3. If one package holds a copy of the symbol table entry and
+ some other part of HDF5 (re)moves the directory which
+ contains the object, then the package will be unable to
+ update the symbol table entry with the new cached
+ data. Packages that refer to the object by the new name will
+ use old cached data.
+
+
+The basic problem is that there may be multiple copies of the object
+symbol table entry floating around in the code when there should
+really be at most one per hard link.
+
+ Level 0: A copy may exist on disk as part of a symbol table node, which
+ is a small 1d array of symbol table entries.
+
+ Level 1: A copy may be cached in memory as part of a symbol table node
+ in the H5Gnode.c file by the H5AC layer.
+
+ Level 2a: Another package may be holding a copy so it can perform
+ fast lookup of any header messages that might be cached in
+ the symbol table entry. It can't point directly to the
+ cached symbol table node because that node can disappear
+ at any time.
+
+ Level 2b: Packages may hold more than one copy of a symbol table
+ entry. For instance, if H5D_open() is called twice for
+ the same name, then two copies of the symbol table entry
+ for the dataset exist in the H5D package.
+
+How can level 2a and 2b be combined?
+
+ If package data structures contained pointers to symbol table
+ entries instead of copies of symbol table entries and if H5G
+ allocated one symbol table entry per hard link, then it's trivial
+ for Level 2a and 2b to benefit from one another's actions since
+ they share the same cache.
+
+How does this work conceptually?
+
+ Level 2a and 2b must notify Level 1 of their intent to use (or stop
+ using) a symbol table entry to access an object header. The
+ notification of the intent to access an object header is called
+ `opening' the object and releasing the access is `closing' the
+ object.
+
+ Opening an object requires an object name which is used to locate
+ the symbol table entry to use for caching of object header
+ messages. The return value is a handle for the object. Figure 1
+ shows the state after Dataset1 opens Object with a name that maps
+ through Entry1. The open request created a copy of Entry1 called
+ Shadow1 which exists even if SymNode1 is preempted from the H5AC
+ layer.
+
+ ______
+ Object / \
+ SymNode1 +--------+ |
+ +--------+ _____\ | Header | |
+ | | / / +--------+ |
+ +--------+ +---------+ \______/
+ | Entry1 | | Shadow1 | /____
+ +--------+ +---------+ \ \
+ : : \
+ +--------+ +----------+
+ | Dataset1 |
+ +----------+
+ FIGURE 1
+
+
+
+ The SymNode1 can appear and disappear from the H5AC layer at any
+ time without affecting the Object Header data cached in the Shadow.
+ The rules are:
+
+ * If the SymNode1 is present and is about to disappear and the
+ Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the
+ Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared.
+
+ * If something requests a copy of Entry1 (for a read-only peek
+ request), and Shadow1 exists, then a copy (not pointer) of Shadow1
+ is returned instead.
+
+ * Entry1 cannot be deleted while Shadow1 exists.
+
+ * Entry1 cannot change directly if Shadow1 exists since this means
+ that some other package has opened the object and may be modifying
+ it. I haven't decided if it's useful to ever change Entry1
+ directly (except of course within the H5G layer itself).
+
+ * Shadow1 is created when Dataset1 `opens' the object through
+ Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's
+ reference count is incremented.
+
+ * When Dataset1 `closes' the Object the Shadow1 reference count is
+ decremented. When the reference count reaches zero, if the
+ Shadow1 dirty bit is set, then Shadow1's contents are copied to
+ Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted
+ if its reference count is zero. This may require reading SymNode1
+ back into the H5AC layer.
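The open/close rules above can be sketched as a small reference-counted structure. This is only an illustrative sketch, not the H5G implementation: the names `sym_entry_t`, `sym_shadow_t`, `shadow_open`, `shadow_close`, and `shadow_node_evict` are hypothetical, a single `int` stands in for the cached header messages, and the sketch assumes the symbol node is resident when the last handle is closed (a real implementation would first read the node back into the cache).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a symbol table entry (Entry1). */
typedef struct {
    int  cache;    /* stands in for cached object header messages */
    bool dirty;    /* entry differs from the copy on disk */
} sym_entry_t;

/* Hypothetical stand-in for a shadow (Shadow1). */
typedef struct {
    sym_entry_t  ent;     /* private copy of the entry */
    sym_entry_t *lvl1;    /* level 1 entry, or NULL if its node is not cached */
    bool         dirty;   /* shadow differs from the level 1 entry */
    unsigned     nrefs;   /* number of open handles */
    bool         in_use;  /* shadow currently exists */
} sym_shadow_t;

/* Copy a dirty shadow over its entry and move the dirty bit to the entry. */
static void shadow_sync(sym_shadow_t *s)
{
    if (s->dirty && s->lvl1) {
        s->lvl1->cache = s->ent.cache;
        s->lvl1->dirty = true;
        s->dirty = false;
    }
}

/* The symbol node is about to leave the cache: save the shadow, then detach. */
void shadow_node_evict(sym_shadow_t *s)
{
    shadow_sync(s);
    s->lvl1 = NULL;
}

/* Open: create the shadow on first use, then count the new reference. */
sym_shadow_t *shadow_open(sym_shadow_t *s, sym_entry_t *e)
{
    if (!s->in_use) {
        s->ent = *e;
        s->lvl1 = e;
        s->dirty = false;
        s->nrefs = 0;
        s->in_use = true;
    }
    s->nrefs++;
    return s;
}

/* Close: drop one reference; on the last one, write back and release.
 * Returns true when the shadow was released. */
bool shadow_close(sym_shadow_t *s)
{
    if (--s->nrefs > 0)
        return false;
    shadow_sync(s);
    s->in_use = false;
    return true;
}
```

Two opens through the same entry share one shadow, so a message cached by the first handle is visible to the second, and the entry is updated only when the last handle is closed or the node is evicted.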
+
+What happens when another Dataset opens the Object through Entry1?
+
+ If the current state is represented by the top part of Figure 2,
+ then Dataset2 will be given a pointer to Shadow1 and the Shadow1
+ reference count will be incremented to two. The Object header link
+ count remains at one so Object Header messages continue to be cached
+ by Shadow1. Dataset1 and Dataset2 benefit from one another's
+ actions. The resulting state is represented by Figure 2.
+
+ _____
+ SymNode1 Object / \
+ +--------+ _____\ +--------+ |
+ | | / / | Header | |
+ +--------+ +---------+ +--------+ |
+ | Entry1 | | Shadow1 | /____ \_____/
+ +--------+ +---------+ \ \
+ : : _ \
+ +--------+ |\ +----------+
+ \ | Dataset1 |
+ \________ +----------+
+ \ \
+ +----------+ |
+ | Dataset2 | |- New Dataset
+ +----------+ |
+ /
+ FIGURE 2
+
+
+What happens when the link count for Object increases while Dataset
+has the Object open?
+
+ SymNode2
+ +--------+
+ SymNode1 Object | |
+ +--------+ ____\ +--------+ /______ +--------+
+ | | / / | header | \ `| Entry2 |
+ +--------+ +---------+ +--------+ +--------+
+ | Entry1 | | Shadow1 | /____ : :
+ +--------+ +---------+ \ \ +--------+
+ : : \
+ +--------+ +----------+ \________________/
+ | Dataset1 | |
+ +----------+ New Link
+
+ FIGURE 3
+
+ The current state is represented by the left part of Figure 3. To
+ create a new link the Object Header had to be located by traversing
+ through Entry1/Shadow1. On the way through, the Entry1/Shadow1
+ cache is invalidated and the Object Header link count is
+ incremented. Entry2 is then added to SymNode2.
+
+ Since the Object Header link count is greater than one, Object
+ header data will not be cached in Entry1/Shadow1.
+
+ If the initial state had been all of Figure 3 and a third link is
+ being added and Object is open by Entry1 and Entry2, then creation
+ of the third link will invalidate the cache in Entry1 or Entry2. It
+ doesn't matter which since both caches are already invalidated
+ anyway.
+
+What happens if another Dataset opens the same object by another name?
+
+ If the current state is represented by Figure 3, then a Shadow2 is
+ created and associated with Entry2. However, since the Object
+ Header link count is more than one, nothing gets cached in Shadow2
+ (or Shadow1).
+
+What happens if the link count decreases?
+
+ If the current state is represented by all of Figure 3 then it isn't
+ possible to delete Entry1 because the object is currently open
+ through that entry. Therefore, the link count must have
+ decreased because Entry2 was removed.
+
+ As Dataset1 reads/writes messages in the Object header they will
+ begin to be cached in Shadow1 again because the Object header link
+ count is one.
+
+What happens if the object is removed while it's open?
+
+ That operation is not allowed.
+
+What happens if the directory containing the object is deleted?
+
+ That operation is not allowed since deleting the directory requires
+ that the directory be empty. The directory cannot be emptied
+ because the open object cannot be removed from the directory.
+
+What happens if the object is moved?
+
+ Moving an object is a process consisting of creating a new
+ hard-link with the new name and then deleting the old name.
+ This will fail if the object is open.
+
+What happens if the directory containing the entry is moved?
+
+ The entry and the shadow still exist and are associated with one
+ another.
+
+What if a file is flushed or closed when objects are open?
+
+ Flushing a symbol table with open objects writes correct information
+ to the file since Shadow is copied to Entry before the table is
+ flushed.
+
+ Closing a file with open objects will create a valid file but will
+ return failure.
+
+How is the Shadow associated with the Entry?
+
+ A symbol table is composed of one or more symbol nodes. A node is a
+ small 1-d array of symbol table entries. The entries can move
+ around within a node and from node-to-node as entries are added or
+ removed from the symbol table and nodes can move around within a
+ symbol table, being created and destroyed as necessary.
+
+ Since a symbol table has an object header with a unique and constant
+ file offset, and since H5G contains code to efficiently locate a
+ symbol table entry given its name, we use these two values as a key
+ within a shadow to associate the shadow with the symbol table
+ entry.
+
+ struct H5G_shadow_t {
+ haddr_t stab_addr; /*symbol table header address*/
+ char *name; /*entry name wrt symbol table*/
+ hbool_t dirty; /*out-of-date wrt stab entry?*/
+ H5G_entry_t ent; /*my copy of stab entry */
+ H5G_entry_t *main; /*the level 1 entry or null */
+ H5G_shadow_t *next, *prev; /*other shadows for this stab*/
+ };
+
+ The set of shadows will be organized in a hash table of linked
+ lists. Each linked list will contain the shadows associated with a
+ particular symbol table header address and the list will be sorted
+ lexicographically.
+
+ Also, each Entry will have a pointer to the corresponding Shadow or
+ null if there is no shadow.
+
+ When a symbol table node is loaded into the main cache, we look up
+ the linked list of shadows in the shadow hash table based on the
+ address of the symbol table object header. We then traverse that
+ list matching shadows with symbol table entries.
+
+ We assume that opening/closing objects will be a relatively
+ infrequent event compared with loading/flushing symbol table
+ nodes. Therefore, if we keep the linked list of shadows sorted it
+ costs O(N) to open and close objects where N is the number of open
+ objects in that symbol table (instead of O(1)) but it costs only
+ O(N) to load a symbol table node (instead of O(N^2)).
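The linear-time matching step can be sketched as a merge walk over two sorted sequences. This is a rough illustration under stated assumptions: the types `node_entry_t` and `open_shadow_t` and the function `attach_shadows` are hypothetical, and the sketch assumes the entries of a node are themselves sorted by name in the same order as the shadow list (here, `strcmp` order).

```c
#include <stddef.h>
#include <string.h>

typedef struct open_shadow_t open_shadow_t;

/* Hypothetical entry in a freshly loaded symbol table node. */
typedef struct {
    const char    *name;    /* entry name within the symbol table */
    open_shadow_t *shadow;  /* matching shadow, or NULL */
} node_entry_t;

/* Hypothetical shadow on the per-symbol-table list, sorted by name. */
struct open_shadow_t {
    const char    *name;    /* entry name within the symbol table */
    node_entry_t  *lvl1;    /* matching level 1 entry, or NULL */
    open_shadow_t *next;    /* next shadow in sorted order */
};

/* Walk both sorted sequences once, linking each shadow to the entry that
 * has the same name.  Returns the number of links made. */
size_t attach_shadows(node_entry_t *ents, size_t nents, open_shadow_t *list)
{
    size_t i = 0, matched = 0;

    while (i < nents && list != NULL) {
        int cmp = strcmp(ents[i].name, list->name);

        if (cmp < 0) {
            i++;                    /* entry has no shadow */
        } else if (cmp > 0) {
            list = list->next;      /* shadow belongs to another node */
        } else {
            ents[i].shadow = list;  /* names agree: link both ways */
            list->lvl1 = &ents[i];
            matched++;
            i++;
            list = list->next;
        }
    }
    return matched;
}
```

Each step advances at least one of the two cursors, so the walk never revisits a shadow or an entry. With an unsorted list, every entry would need its own scan of the shadow list.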
+
+What about the root symbol entry?
+
+ Level 1 storage for the root symbol entry is always available since
+ it's stored in the hdf5_file_t struct instead of a symbol table
+ node. However, the contents of that entry can move from the file
+ handle to a symbol table node by H5G_mkroot(). Therefore, if the
+ root object is opened, we keep a shadow entry for it whose
+ `stab_addr' field is zero and whose `name' is null.
+
+ For this reason, the root object should always be read through the
+ H5G interface.
+
+One more key invariant: The H5O_STAB message in a symbol table header
+never changes. This allows symbol table entries to cache the H5O_STAB
+message for the symbol table to which it points without worrying about
+whether the cache will ever be invalidated.
+
+</pre>
+
+</body>
+</html>
diff --git a/doc/html/TechNotes/Version.html b/doc/html/TechNotes/Version.html
new file mode 100644
index 0000000..0e0853b
--- /dev/null
+++ b/doc/html/TechNotes/Version.html
@@ -0,0 +1,137 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <title>Version Numbers</title>
+ </head>
+
+ <body>
+ <h1>HDF5 Release Version Numbers</h1>
+
+ <h2>1. Introduction</h2>
+
+ <p>The HDF5 version number is a set of three integer values
+ written as either <code>hdf5-1.2.3</code> or <code>hdf5 version
+ 1.2 release 3</code>.
+
+ <p>The <code>5</code> is part of the library name and will only
+ change if the entire file format and library are redesigned
+ similar in scope to the changes between HDF4 and HDF5.
+
+ <p>The <code>1</code> is the <em>major version number</em> and
+ changes when there is an extensive change to the file format or
+ library API. Such a change will likely require files to be
+ translated and applications to be modified. This number is not
+ expected to change frequently.
+
+ <p>The <code>2</code> is the <em>minor version number</em> and is
+ incremented by each public release that presents new features.
+ Even numbers are reserved for stable public versions of the
+ library while odd numbers are reserved for development
+ versions. See the diagram below for examples.
+
+ <p>The <code>3</code> is the <em>release number</em>. For public
+ versions of the library, the release number is incremented each
+ time a bug is fixed and the fix is made available to the public.
+ For development versions, the release number is incremented more
+ often (perhaps almost daily).
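The even/odd convention for the minor version number can be expressed in a few lines of C. This is only a sketch for illustration: the type `h5_version_t` and the function `version_is_stable` are hypothetical and are not part of the HDF5 API.

```c
#include <stdbool.h>

/* Hypothetical container for the three version numbers. */
typedef struct {
    unsigned majnum;  /* major version number */
    unsigned minnum;  /* minor version number */
    unsigned relnum;  /* release number */
} h5_version_t;

/* Even minor numbers denote stable public versions,
 * odd minor numbers denote development versions. */
bool version_is_stable(h5_version_t v)
{
    return v.minnum % 2 == 0;
}
```

Under this convention, 1.2.3 is a public version and 1.3.12 is a development version.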
+
+ <h2>2. Abbreviated Versions</h2>
+
+ <p>It's often convenient to drop the release number when referring
+ to a version of the library, as in version 1.2 of HDF5.
+ The release number can be any value in this case.
+
+ <h2>3. Special Versions</h2>
+
+ <p>Version 1.0.0 was released for alpha testing the first week of
+ March, 1998. The development version number was incremented to
+ 1.0.1 and remained constant until the last week of April,
+ when the release number started to increase and development
+ versions were made available to people outside the core HDF5
+ development team.
+
+ <p>Version 1.0.23 was released mid-July as a second alpha
+ version.
+
+ <p>Version 1.1.0 will be the first official beta release but the
+ 1.1 branch will also serve as a development branch since we're
+ not concerned about providing bug fixes separate from normal
+ development for the beta version.
+
+ <p>After the beta release we rolled back the version number so the
+ first release is version 1.0 and development will continue on
+ version 1.1. We felt that an initial version of 1.0 was more
+ important than continuing to increment the pre-release version
+ numbers.
+
+ <h2>4. Public versus Development</h2>
+
+ <p>The motivation for separate public and development versions is
+ that the public version will receive only bug fixes while the
+ development version will receive new features. This also allows
+ us to release bug fixes promptly without waiting for the
+ development version to reach a stable state.
+
+ <p>Eventually, the development version will near completion and a
+ new development branch will fork while the original one enters a
+ feature freeze state. When the original development branch is
+ ready for release the minor version number will be incremented
+ to an even value.
+
+ <p>
+ <center>
+ <img alt="Version Example" src="version.gif">
+ <br><b>Fig 1: Version Example</b>
+ </center>
+
+ <h2>5. Version Support from the Library</h2>
+
+ <p>The library provides a set of macros and functions to query and
+ check version numbers.
+
+ <dl>
+ <dt><code>H5_VERS_MAJOR</code>
+ <dt><code>H5_VERS_MINOR</code>
+ <dt><code>H5_VERS_RELEASE</code>
+ <dd>These preprocessor constants are defined in the public
+ include file and determine the version of the include files.
+
+ <br><br>
+ <dt><code>herr_t H5get_libversion (unsigned *<em>majnum</em>, unsigned
+ *<em>minnum</em>, unsigned *<em>relnum</em>)</code>
+ <dd>This function returns through its arguments the version
+ numbers for the library to which the application is linked.
+
+ <br><br>
+ <dt><code>void H5check(void)</code>
+ <dd>This is a macro that verifies that the version number of the
+ HDF5 include file used to compile the application matches the
+ version number of the library to which the application is
+ linked. This check occurs automatically when the first HDF5
+ file is created or opened and is important because a mismatch
+ between the include files and the library is likely to result
+ in corrupted data and/or segmentation faults. If a mismatch
+ is detected the library issues an error message on the
+ standard error stream and aborts with a core dump.
+
+ <br><br>
+ <dt><code>herr_t H5check_version (unsigned <em>majnum</em>,
+ unsigned <em>minnum</em>, unsigned <em>relnum</em>)</code>
+ <dd>This function is called by the <code>H5check()</code> macro
+ with the include file version constants. The function
+ compares its arguments to the result returned by
+ <code>H5get_libversion()</code> and if a mismatch is detected prints
+ an error message on the standard error stream and aborts.
+ </dl>
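The comparison performed by the version check can be sketched without the library itself. This is a hedged illustration, not the HDF5 source: `lib_vers`, `version_mismatch`, and `check_version` are hypothetical names, the library version is hard-coded to 1.2.3, and the sketch assumes that all three numbers must agree.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical version of the linked library (stands in for the values
 * that H5get_libversion() would report). */
static const unsigned lib_vers[3] = {1, 2, 3};

/* Returns nonzero if the include-file version differs from the library.
 * Assumption: major, minor, and release numbers must all match. */
int version_mismatch(unsigned majnum, unsigned minnum, unsigned relnum)
{
    return majnum != lib_vers[0] ||
           minnum != lib_vers[1] ||
           relnum != lib_vers[2];
}

/* Report a mismatch on the standard error stream and abort. */
void check_version(unsigned majnum, unsigned minnum, unsigned relnum)
{
    if (version_mismatch(majnum, minnum, relnum)) {
        fprintf(stderr,
                "headers are version %u.%u.%u but library is %u.%u.%u\n",
                majnum, minnum, relnum,
                lib_vers[0], lib_vers[1], lib_vers[2]);
        abort();
    }
}
```

An application built against matching headers would pass its compile-time constants, as in `check_version(1, 2, 3)`, and continue normally. Any other combination ends the program before data can be corrupted.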
+
+<hr>
+<address><a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a></address>
+<br>
+
+<!-- Created: Wed Apr 22 11:24:40 EDT 1998 -->
+<!-- hhmts start -->
+Last modified: Fri Oct 30 10:32:50 EST 1998
+<!-- hhmts end -->
+
+ </body>
+</html>