author | Frank Baker <fbaker@hdfgroup.org> | 2000-05-01 21:31:11 (GMT) |
---|---|---|
committer | Frank Baker <fbaker@hdfgroup.org> | 2000-05-01 21:31:11 (GMT) |
commit | 7749127d803ca4f8a11220d8ddda083ca11d658a (patch) | |
tree | a1b7bfc6bb85fb4ff7cbe3d2515170a49d4cd568 /doc/html | |
parent | 74f1fc208d2e866a750b5069a6fd4948bca58baf (diff) | |
[svn-r2208] Big.html --> BigDataSmMach.html
Coding.html --> NamingScheme.html
CodeReview.html
ExternalFiles.html
compat.html --> H4-H5Compat.html
heap.txt --> HeapMgmt.html
IOPipe.html
Lib_Maint.html --> LibMaint.html
MemoryManagement.html
move.html --> MoveDStruct.html
ObjectHeader.txt
storage.html --> RawDStorage.html
symtab --> SymbolTables.html
Version.html
Above files moved from doc/html/ to doc/html/TechNotes/
into the new "HDF5 Technical Notes" document.
Filenames changed as indicated.
Diffstat (limited to 'doc/html')
-rw-r--r-- | doc/html/TechNotes/BigDataSmMach.html | 122 |
-rw-r--r-- | doc/html/TechNotes/CodeReview.html | 300 |
-rw-r--r-- | doc/html/TechNotes/ExternalFiles.html | 279 |
-rw-r--r-- | doc/html/TechNotes/H4-H5Compat.html | 271 |
-rw-r--r-- | doc/html/TechNotes/HeapMgmt.html | 79 |
-rw-r--r-- | doc/html/TechNotes/IOPipe.html | 114 |
-rw-r--r-- | doc/html/TechNotes/LibMaint.html | 122 |
-rw-r--r-- | doc/html/TechNotes/MemoryMgmt.html | 510 |
-rw-r--r-- | doc/html/TechNotes/MoveDStruct.html | 66 |
-rw-r--r-- | doc/html/TechNotes/NamingScheme.html | 300 |
-rw-r--r-- | doc/html/TechNotes/ObjectHeader.html | 67 |
-rw-r--r-- | doc/html/TechNotes/RawDStorage.html | 274 |
-rw-r--r-- | doc/html/TechNotes/SymbolTables.html | 323 |
-rw-r--r-- | doc/html/TechNotes/Version.html | 137 |
14 files changed, 2964 insertions, 0 deletions
diff --git a/doc/html/TechNotes/BigDataSmMach.html b/doc/html/TechNotes/BigDataSmMach.html new file mode 100644 index 0000000..fe00ff8 --- /dev/null +++ b/doc/html/TechNotes/BigDataSmMach.html @@ -0,0 +1,122 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Big Datasets on Small Machines</title> + </head> + + <body> + <h1>Big Datasets on Small Machines</h1> + + <h2>1. Introduction</h2> + + <p>The HDF5 library is able to handle files larger than the + maximum file size, and datasets larger than the maximum memory + size. For instance, a machine where <code>sizeof(off_t)</code> + and <code>sizeof(size_t)</code> are both four bytes can handle + datasets and files as large as 18x10^18 bytes. However, most + Unix systems limit the number of concurrently open files, so a + practical file size limit is closer to 512GB or 1TB. + + <p>Two "tricks" must be employed on these small systems in order + to store large datasets. The first trick circumvents the + <code>off_t</code> file size limit and the second circumvents + the <code>size_t</code> main memory limit. + + <h2>2. File Size Limits</h2> + + <p>Systems that have 64-bit file addresses will be able to access + those files automatically. One should see the following output + from configure: + + <p><code><pre> +checking size of off_t... 8 + </pre></code> + + <p>Also, some 32-bit operating systems have special file systems + that can support large (>2GB) files and HDF5 will detect + these and use them automatically. If this is the case, the + output from configure will show: + + <p><code><pre> +checking for lseek64... yes +checking for fseek64... yes + </pre></code> + + <p>Otherwise one must use an HDF5 file family. Such a family is + created by setting file family properties in a file access + property list and then supplying a file name that includes a + <code>printf</code>-style integer format. 
For instance: + + <p><code><pre> +hid_t plist, file; +plist = H5Pcreate (H5P_FILE_ACCESS); +H5Pset_family (plist, 1<<30, H5P_DEFAULT); +file = H5Fcreate ("big%03d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist); + </code></pre> + + <p>The second argument (<code>1<<30</code>) to + <code>H5Pset_family()</code> indicates that the family members + are to be 2^30 bytes (1GB) each although we could have used any + reasonably large value. In general, family members cannot be + 2GB because writes to byte number 2,147,483,647 will fail, so + the largest safe value for a family member is 2,147,483,647. + HDF5 will create family members on demand as the HDF5 address + space increases, but since most Unix systems limit the number of + concurrently open files the effective maximum size of the HDF5 + address space will be limited (the system on which this was + developed allows 1024 open files, so if each family member is + approx 2GB then the largest HDF5 file is approx 2TB). + + <p>If the effective HDF5 address space is limited then one may be + able to store datasets as external datasets each spanning + multiple files of any length since HDF5 opens external dataset + files one at a time. To arrange storage for a 5TB dataset split + among 1GB files one could say: + + <p><code><pre> +hid_t plist = H5Pcreate (H5P_DATASET_CREATE); +for (i=0; i<5*1024; i++) { + sprintf (name, "velocity-%04d.raw", i); + H5Pset_external (plist, name, 0, (size_t)1<<30); +} + </code></pre> + + <h2>3. Dataset Size Limits</h2> + + <p>The second limit which must be overcome is that of + <code>sizeof(size_t)</code>. HDF5 defines a data type called + <code>hsize_t</code> which is used for sizes of datasets and is, + by default, defined as <code>unsigned long long</code>. + + <p>To create a dataset with 8*2^30 4-byte integers for a total of + 32GB one first creates the dataspace. 
We give two examples + here: a 4-dimensional dataset whose dimension sizes are smaller + than the maximum value of a <code>size_t</code>, and a + 1-dimensional dataset whose dimension size is too large to fit + in a <code>size_t</code>. + + <p><code><pre> +hsize_t size1[4] = {8, 1024, 1024, 1024}; +hid_t space1 = H5Screate_simple (4, size1, size1); + +hsize_t size2[1] = {8589934592LL}; +hid_t space2 = H5Screate_simple (1, size2, size2); + </pre></code> + + <p>However, the <code>LL</code> suffix is not portable, so it may + be better to replace the number with + <code>(hsize_t)8*1024*1024*1024</code>. + + <p>For compilers that don't support <code>long long</code> large + datasets will not be possible. The library performs too much + arithmetic on <code>hsize_t</code> types to make the use of a + struct feasible. + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Fri Apr 10 13:26:04 EDT 1998 --> +<!-- hhmts start --> +Last modified: Sun Jul 19 11:37:25 EDT 1998 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/CodeReview.html b/doc/html/TechNotes/CodeReview.html new file mode 100644 index 0000000..213cbbe --- /dev/null +++ b/doc/html/TechNotes/CodeReview.html @@ -0,0 +1,300 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Code Review</title> + </head> + <body> + <center><h1>Code Review 1</h1></center> + + <h3>Some background...</h3> + <p>This is one of the functions exported from the + <code>H5B.c</code> file that implements a B-link-tree class + without worrying about concurrency yet (thus the `Note:' in the + function prologue). The <code>H5B.c</code> file provides the + basic machinery for operating on generic B-trees, but it isn't + much use by itself. Various subclasses of the B-tree (like + symbol tables or indirect storage) provide their own interface + and back end to this function. 
For instance, + <code>H5G_stab_find()</code> takes a symbol table OID and a name + and calls <code>H5B_find()</code> with an appropriate + <code>udata</code> argument that eventually gets passed to the + <code>H5G_stab_find()</code> function. + + <p><code><pre> + 1 /*------------------------------------------------------------------------- + 2 * Function: H5B_find + 3 * + 4 * Purpose: Locate the specified information in a B-tree and return + 5 * that information by filling in fields of the caller-supplied + 6 * UDATA pointer depending on the type of leaf node + 7 * requested. The UDATA can point to additional data passed + 8 * to the key comparison function. + 9 * +10 * Note: This function does not follow the left/right sibling +11 * pointers since it assumes that all nodes can be reached +12 * from the parent node. +13 * +14 * Return: Success: SUCCEED if found, values returned through the +15 * UDATA argument. +16 * +17 * Failure: FAIL if not found, UDATA is undefined. +18 * +19 * Programmer: Robb Matzke +20 * matzke@llnl.gov +21 * Jun 23 1997 +22 * +23 * Modifications: +24 * +25 *------------------------------------------------------------------------- +26 */ +27 herr_t +28 H5B_find (H5F_t *f, const H5B_class_t *type, const haddr_t *addr, void *udata) +29 { +30 H5B_t *bt=NULL; +31 intn idx=-1, lt=0, rt, cmp=1; +32 int ret_value = FAIL; + </pre></code> + + <p>All pointer arguments are initialized when defined. I don't + worry much about non-pointers because it's usually obvious when + the value isn't initialized. + + <p><code><pre> +33 +34 FUNC_ENTER (H5B_find, NULL, FAIL); +35 +36 /* +37 * Check arguments. +38 */ +39 assert (f); +40 assert (type); +41 assert (type->decode); +42 assert (type->cmp3); +43 assert (type->found); +44 assert (addr && H5F_addr_defined (addr)); + </pre></code> + + <p>I use <code>assert</code> to check invariant conditions. At + this level of the library, none of these assertions should fail + unless something is majorly wrong. 
The arguments should have + already been checked by higher layers. It also provides + documentation about what arguments might be optional. + + <p><code><pre> +45 +46 /* +47 * Perform a binary search to locate the child which contains +48 * the thing for which we're searching. +49 */ +50 if (NULL==(bt=H5AC_protect (f, H5AC_BT, addr, type, udata))) { +51 HGOTO_ERROR (H5E_BTREE, H5E_CANTLOAD, FAIL); +52 } + </pre></code> + + <p>You'll see this quite often in the low-level stuff and it's + documented in the <code>H5AC.c</code> file. The + <code>H5AC_protect</code> insures that the B-tree node (which + inherits from the H5AC package) whose OID is <code>addr</code> + is locked into memory for the duration of this function (see the + <code>H5AC_unprotect</code> on line 90). Most likely, if this + node has been accessed in the not-to-distant past, it will still + be in memory and the <code>H5AC_protect</code> is almost a + no-op. If cache debugging is compiled in, then the protect also + prevents other parts of the library from accessing the node + while this function is protecting it, so this function can allow + the node to be in an inconsistent state while calling other + parts of the library. + + <p>The alternative is to call the slighlty cheaper + <code>H5AC_find</code> and assume that the pointer it returns is + valid only until some other library function is called, but + since we're accessing the pointer throughout this function, I + chose to use the simpler protect scheme. All protected objects + <em>must be unprotected</em> before the file is closed, thus the + use of <code>HGOTO_ERROR</code> instead of + <code>HRETURN_ERROR</code>. 
+ + <p><code><pre> +53 rt = bt->nchildren; +54 +55 while (lt<rt && cmp) { +56 idx = (lt + rt) / 2; +57 if (H5B_decode_keys (f, bt, idx)<0) { +58 HGOTO_ERROR (H5E_BTREE, H5E_CANTDECODE, FAIL); +59 } +60 +61 /* compare */ +62 if ((cmp=(type->cmp3)(f, bt->key[idx].nkey, udata, +63 bt->key[idx+1].nkey))<0) { +64 rt = idx; +65 } else { +66 lt = idx+1; +67 } +68 } +69 if (cmp) { +70 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL); +71 } + </pre></code> + + <p>Code is arranged in paragraphs with a comment starting each + paragraph. The previous paragraph is a standard binary search + algorithm. The <code>(type->cmp3)()</code> is an indirect + function call into the subclass of the B-tree. All indirect + function calls have the function part in parentheses to document + that it's indirect (quite obvious here, but not so obvious when + the function is a variable). + + <p>It's also my standard practice to have side effects in + conditional expressions because I can write code faster and it's + more apparent to me what the condition is testing. But if I + have an assignment in a conditional expr, then I use an extra + set of parens even if they're not required (usually they are, as + in this case) so it's clear that I meant <code>=</code> instead + of <code>==</code>. + + <p><code><pre> +72 +73 /* +74 * Follow the link to the subtree or to the data node. +75 */ +76 assert (idx>=0 && idx<bt->nchildren); +77 if (bt->level > 0) { +78 if ((ret_value = H5B_find (f, type, bt->child+idx, udata))<0) { +79 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL); +80 } +81 } else { +82 ret_value = (type->found)(f, bt->child+idx, bt->key[idx].nkey, +83 udata, bt->key[idx+1].nkey); +84 if (ret_value<0) { +85 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL); +86 } +87 } + </pre></code> + + <p>Here I broke the "side effect in conditional" rule, which I + sometimes do if the expression is so long that the + <code><0</code> gets lost at the end. 
Another thing to note is + that success/failure is always determined by comparing with zero + instead of <code>SUCCEED</code> or <code>FAIL</code>. I do this + because occasionally one might want to return other meaningful + values (always non-negative) or distinguish between various types of + failure (always negative). + + <p><code><pre> +88 +89 done: +90 if (bt && H5AC_unprotect (f, H5AC_BT, addr, bt)<0) { +91 HRETURN_ERROR (H5E_BTREE, H5E_PROTECT, FAIL); +92 } +93 FUNC_LEAVE (ret_value); +94 } + </pre></code> + + <p>For lack of a better way to handle errors during error cleanup, + I just call the <code>HRETURN_ERROR</code> macro even though it + will make the error stack not quite right. I also use short + circuiting boolean operators instead of nested <code>if</code> + statements since that's standard C practice. + + <center><h1>Code Review 2</h1></center> + + + <p>The following code is an API function from the H5F package... + + <p><code><pre> + 1 /*-------------------------------------------------------------------------- + 2 NAME + 3 H5Fflush + 4 + 5 PURPOSE + 6 Flush all cached data to disk and optionally invalidates all cached + 7 data. + 8 + 9 USAGE +10 herr_t H5Fflush(fid, invalidate) +11 hid_t fid; IN: File ID of file to close. +12 hbool_t invalidate; IN: Invalidate all of the cache? +13 +14 ERRORS +15 ARGS BADTYPE Not a file atom. +16 ATOM BADATOM Can't get file struct. +17 CACHE CANTFLUSH Flush failed. +18 +19 RETURNS +20 SUCCEED/FAIL +21 +22 DESCRIPTION +23 This function flushes all cached data to disk and, if INVALIDATE +24 is non-zero, removes cached objects from the cache so they must be +25 re-read from the file on the next access to the object. +26 +27 MODIFICATIONS: +28 --------------------------------------------------------------------------*/ + </pre></code> + + <p>An API prologue is used for each API function instead of my + normal function prologue. 
I use the prologue from Code Review 1 + for non-API functions because it's more suited to C programmers, + it requires less work to keep it synchronized with the code, and + I have better editing tools for it. + + <p><code><pre> +29 herr_t +30 H5Fflush (hid_t fid, hbool_t invalidate) +31 { +32 H5F_t *file = NULL; +33 +34 FUNC_ENTER (H5Fflush, H5F_init_interface, FAIL); +35 H5ECLEAR; + </pre></code> + + <p>API functions are never called internally, therefore I always + clear the error stack before doing anything. + + <p><code><pre> +36 +37 /* check arguments */ +38 if (H5_FILE!=H5Aatom_group (fid)) { +39 HRETURN_ERROR (H5E_ARGS, H5E_BADTYPE, FAIL); /*not a file atom*/ +40 } +41 if (NULL==(file=H5Aatom_object (fid))) { +42 HRETURN_ERROR (H5E_ATOM, H5E_BADATOM, FAIL); /*can't get file struct*/ +43 } + </pre></code> + + <p>If something is wrong with the arguments then we raise an + error. We never <code>assert</code> arguments at this level. + We also convert atoms to pointers since atoms are really just a + pointer-hiding mechanism. Functions that can be called + internally always have pointer arguments instead of atoms + because (1) then they don't have to always convert atoms to + pointers, and (2) the various pointer data types provide more + documentation and type checking than just an <code>hid_t</code> + type. + + <p><code><pre> +44 +45 /* do work */ +46 if (H5F_flush (file, invalidate)<0) { +47 HRETURN_ERROR (H5E_CACHE, H5E_CANTFLUSH, FAIL); /*flush failed*/ +48 } + </pre></code> + + <p>An internal version of the function does the real work. That + internal version calls <code>assert</code> to check/document + its arguments and can be called from other library functions. 
+ + <p><code><pre> +49 +50 FUNC_LEAVE (SUCCEED); +51 } + </pre></code> + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Sat Nov 8 17:09:33 EST 1997 --> +<!-- hhmts start --> +Last modified: Mon Nov 10 15:33:33 EST 1997 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/ExternalFiles.html b/doc/html/TechNotes/ExternalFiles.html new file mode 100644 index 0000000..c3197af --- /dev/null +++ b/doc/html/TechNotes/ExternalFiles.html @@ -0,0 +1,279 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>External Files in HDF5</title> + </head> + + <body> + <center><h1>External Files in HDF5</h1></center> + + <h3>Overview of Layers</h3> + + <p>This table shows some of the layers of HDF5. Each layer calls + functions at the same or lower layers and never functions at + higher layers. An object identifier (OID) takes various forms + at the various layers: at layer 0 an OID is an absolute physical + file address; at layers 1 and 2 it's an absolute virtual file + address. At layers 3 through 6 it's a relative address, and at + layers 7 and above it's an object handle. 
+ + <p><center> + <table border cellpadding=4 width="60%"> + <tr align=center> + <td>Layer-7</td> + <td>Groups</td> + <td>Datasets</td> + </tr> + <tr align=center> + <td>Layer-6</td> + <td>Indirect Storage</td> + <td>Symbol Tables</td> + </tr> + <tr align=center> + <td>Layer-5</td> + <td>B-trees</td> + <td>Object Hdrs</td> + <td>Heaps</td> + </tr> + <tr align=center> + <td>Layer-4</td> + <td>Caching</td> + </tr> + <tr align=center> + <td>Layer-3</td> + <td>H5F chunk I/O</td> + </tr> + <tr align=center> + <td>Layer-2</td> + <td>H5F low</td> + </tr> + <tr align=center> + <td>Layer-1</td> + <td>File Family</td> + <td>Split Meta/Raw</td> + </tr> + <tr align=center> + <td>Layer-0</td> + <td>Section-2 I/O</td> + <td>Standard I/O</td> + <td>Malloc/Free</td> + </tr> + </table> + </center> + + <h3>Single Address Space</h3> + + <p>The simplest form of hdf5 file is a single file containing only + hdf5 data. The file begins with the boot block, which is + followed until the end of the file by hdf5 data. The next most + complicated file allows non-hdf5 data (user defined data or + internal wrappers) to appear before the boot block and after the + end of the hdf5 data. The hdf5 data is treated as a single + linear address space in both cases. + + <p>The next level of complexity comes when non-hdf5 data is + interspersed with the hdf5 data. We handle that by including + the non-hdf5 interspersed data in the hdf5 address space and + simply not referencing it (eventually we might add those + addresses to a "do-not-disturb" list using the same mechanism as + the hdf5 free list, but it's not absolutely necessary). This is + implemented except for the "do-not-disturb" list. + + <p>The most complicated single address space hdf5 file is when we + allow the address space to be split among multiple physical + files. For instance, a >2GB file can be split into smaller + chunks and transferred to a 32 bit machine, then accessed as a + single logical hdf5 file. 
The library already supports >32 bit + addresses, so at layer 1 we split a 64-bit address into a 32-bit + file number and a 32-bit offset (the 64 and 32 are + arbitrary). The rest of the library still operates with a linear + address space. + + <p>Another variation might be a family of two files where all the + meta data is stored in one file and all the raw data is stored + in another file to allow the HDF5 wrapper to be easily replaced + with some other wrapper. + + <p>The <code>H5Fcreate</code> and <code>H5Fopen</code> functions + would need to be modified to pass file-type info down to layer 2 + so the correct drivers can be called and parameters passed to + the drivers to initialize them. + + <h4>Implementation</h4> + + <p>I've implemented fixed-size family members. The entire hdf5 + file is partitioned into members where each member is the same + size. The family scheme is used if one passes a name to + <code>H5F_open</code> (which is called by <code>H5Fopen()</code> + and <code>H5Fcreate</code>) that contains a + <code>printf(3c)</code>-style integer format specifier. + Currently, the default low-level file driver is used for all + family members (H5F_LOW_DFLT, usually set to be Section 2 I/O or + Section 3 stdio), but we'll probably eventually want to pass + that as a parameter of the file access property list, which + hasn't been implemented yet. When creating a family, a default + family member size is used (defined at the top of H5Ffamily.c, + currently 64MB) but that also should be settable in the file + access property list. When opening an existing family, the size + of the first member is used to determine the member size + (flushing/closing a family ensures that the first member is the + correct size) but the other family members don't have to be that + large (the local address space, however, is logically the same + size for all members). + + <p>I haven't implemented a split meta/raw family yet but am rather + curious to see how it would perform. 
I was planning to use the + `.h5' extension for the meta data file and `.raw' for the raw + data file. The high-order bit in the address would determine + whether the address refers to meta data or raw data. If the user + passes a name that ends with `.raw' to <code>H5F_open</code> + then we'll choose the split family and use the default low level + driver for each of the two family members. Eventually we'll + want to pass these kinds of things through the file access + property list instead of relying on naming convention. + + <h3>External Raw Data</h3> + + <p>We also need the ability to point to raw data that isn't in the + HDF5 linear address space. For instance, a dataset might be + striped across several raw data files. + + <p>Fortunately, the only two packages that need to be aware of + this are the packages for reading/writing contiguous raw data + and discontiguous raw data. Since contiguous raw data is a + special case, I'll discuss how to implement external raw data in + the discontiguous case. + + <p>Discontiguous data is stored as a B-tree whose keys are the + chunk indices and whose leaf nodes point to the raw data by + storing a file address. So what we need is some way to name the + external files, and a way to efficiently store the external file + name for each chunk. + + <p>I propose adding to the object header an <em>External File + List</em> message that is a 1-origin array of file names. + Then, in the B-tree, each key has an index into the External + File List (or zero for the HDF5 file) for the file where the + chunk can be found. The external file index is only used at + the leaf nodes to get to the raw data (the entire B-tree is in + the HDF5 file) but because of the way keys are copied among + the B-tree nodes, it's much easier to store the index with + every key. + + <h3>Multiple HDF5 Files</h3> + + <p>One might also want to combine two or more HDF5 files in a + manner similar to mounting file systems in Unix. 
That is, the + group structure and meta data from one file appear as though + they exist in the first file. One opens File-A, and then + <em>mounts</em> File-B at some point in File-A, the <em>mount + point</em>, so that traversing into the mount point actually + causes one to enter the root object of File-B. File-A and + File-B are each complete HDF5 files and can be accessed + individually without mounting them. + + <p>We need a couple additional pieces of machinery to make this + work. First, an haddr_t type (a file address) doesn't contain + any info about which HDF5 file's address space the address + belongs to. But since haddr_t is an opaque type except at + layers 2 and below, it should be quite easy to add a pointer to + the HDF5 file. This would also remove the H5F_t argument from + most of the low-level functions since it would be part of the + OID. + + <p>The other thing we need is a table of mount points and some + functions that understand them. We would add the following + table to each H5F_t struct: + + <p><code><pre> +struct H5F_mount_t { + H5F_t *parent; /* Parent HDF5 file if any */ + struct { + H5F_t *f; /* File which is mounted */ + haddr_t where; /* Address of mount point */ + } *mount; /* Array sorted by mount point */ + intn nmounts; /* Number of mounted files */ + intn alloc; /* Size of mount table */ +} + </pre></code> + + <p>The <code>H5Fmount</code> function takes the ID of an open + file or group, the name of a to-be-mounted file, the name of the mount + point, and a file access property list (like <code>H5Fopen</code>). + It opens the new file and adds a record to the parent's mount + table. The <code>H5Funmount</code> function takes the parent + file or group ID and the name of the mount point and disassociates + the mounted file from the mount point. It does not close the + mounted file. The <code>H5Fclose</code> + function closes/unmounts files recursively. 
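Editorial sketch (not part of the committed file): because the proposed mount array is sorted by mount-point address, checking whether an address is a mount point can be a binary search. The types below are simplified, hypothetical stand-ins (`addr_t` for `haddr_t`, an integer `file_id` for the `H5F_t` pointer), and `mount_find` is an illustrative name rather than a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t addr_t;               /* hypothetical stand-in for haddr_t */

typedef struct {
    int    file_id;                    /* stand-in for the H5F_t* of the mounted file */
    addr_t where;                      /* address of the mount point in the parent */
} mount_entry_t;

/*
 * Binary search of a table sorted by `where`.
 * Returns the index of the matching entry, or -1 if `addr` is not a mount point.
 */
static int mount_find(const mount_entry_t *tbl, int nmounts, addr_t addr)
{
    int lt = 0, rt = nmounts;

    while (lt < rt) {
        int idx = lt + (rt - lt) / 2;
        if (addr < tbl[idx].where)
            rt = idx;
        else if (addr > tbl[idx].where)
            lt = idx + 1;
        else
            return idx;
    }
    return -1;
}
```

A name-translation routine could call such a lookup at each step and switch to the mounted file when the result is non-negative; keeping the array sorted is what makes that per-step check cheap.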
+ + <p>The <code>H5G_iname</code> function which translates a name to + a file address (<code>haddr_t</code>) looks at the mount table + at each step in the translation and switches files where + appropriate. All name-to-address translations occur through + this function. + + <h3>How Long?</h3> + + <p>I'm expecting to be able to implement the two new flavors of + single linear address space in about two days. It took two hours + to implement the malloc/free file driver at level zero and I + don't expect this to be much more work. + + <p>I'm expecting three days to implement the external raw data for + discontiguous arrays. Adding the file index to the B-tree is + quite trivial; adding the external file list message shouldn't + be too hard since the object header message class from which this + message derives is fully implemented; and changing + <code>H5F_istore_read</code> should be trivial. Most of the + time will be spent designing a way to cache Unix file + descriptors efficiently since the total number of open files + allowed per process could be much smaller than the total number + of HDF5 files and external raw data files. + + <p>I'm expecting four days to implement being able to mount one + HDF5 file on another. I was originally planning a lot more, but + making <code>haddr_t</code> opaque turned out to be much easier + than I planned (I did it last Fri). Most of the work will + probably be removing the redundant H5F_t arguments for lots of + functions. + + <h3>Conclusion</h3> + + <p>The external raw data could be implemented as a single linear + address space, but doing so would require one to allocate large + enough file addresses throughout the file (>32bits) before the + file was created. It would make mixing an HDF5 file family with + external raw data, or external HDF5 wrapper around an HDF4 file + a more difficult process. So I consider the implementation of + external raw data files as a single HDF5 linear address space a + kludge. 
+ + <p>The ability to mount one HDF5 file on another might not be a + very important feature especially since each HDF5 file must be a + complete file by itself. It's not possible to stripe an array + over multiple HDF5 files because the B-tree wouldn't be complete + in any one file, so the only choice is to stripe the array + across multiple raw data files and store the B-tree in the HDF5 + file. On the other hand, it might be useful if one file + contains some public data which can be mounted by other files + (e.g., a mesh topology shared among collaborators and mounted by + files that contain other fields defined on the mesh). Of course + the applications can open the two files separately, but it might + be more portable if we support it in the library. + + <p>So we're looking at about two weeks to implement all three + versions. I didn't get a chance to do any of them in AIO + although we had long-term plans for the first two with a + possibility of the third. They'll be much easier to implement in + HDF5 than AIO since I've been keeping these in mind from the + start. + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Sat Nov 8 18:08:52 EST 1997 --> +<!-- hhmts start --> +Last modified: Tue Sep 8 14:43:32 EDT 1998 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/H4-H5Compat.html b/doc/html/TechNotes/H4-H5Compat.html new file mode 100644 index 0000000..2992476 --- /dev/null +++ b/doc/html/TechNotes/H4-H5Compat.html @@ -0,0 +1,271 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Backward/Forward Compatibility</title> + </head> + + <body> + <h1>Backward/Forward Compatibility</h1> + + <p>The HDF5 development must proceed in such a manner as to + satisfy the following conditions: + + <ol type=A> + <li>HDF5 applications can produce data that HDF5 + applications can read and write and HDF4 applications can produce + data that HDF4 applications can read and write. 
The situation + that demands this condition is obvious.</li> + + <li>HDF5 applications are able to produce data that HDF4 applications + can read and HDF4 applications can subsequently modify the + file subject to certain constraints depending on the + implementation. This condition is for the temporary + situation where a consumer has neither been relinked with a new + HDF4 API built on top of the HDF5 API nor recompiled with the + HDF5 API.</li> + + <li>HDF5 applications can read existing HDF4 files and subsequently + modify the file subject to certain constraints depending on + the implementation. This condition is for the temporary + situation in which the producer has neither been relinked with a + new HDF4 API built on top of the HDF5 API nor recompiled with + the HDF5 API, or the permanent situation of HDF5 consumers + reading archived HDF4 files.</li> + </ol> + + <p>There's at least one invariant: new object features introduced + in the HDF5 file format (like 2-d arrays of structs) might be + impossible to "translate" to a format that an old HDF4 + application can understand either because the HDF4 file format + or the HDF4 API has no mechanism to describe the object. + + <p>What follows is one possible implementation based on how + Condition B was solved in the AIO/PDB world. It also attempts + to satisfy these goals: + + <ol type=1> + <li>The main HDF5 library contains as little extra baggage as + possible by either relying on external programs to take care + of compatibility issues or by incorporating the logic of such + programs as optional modules in the HDF5 library. 
Conditions B + and C are separate programs/modules.</li> + + <li>No extra baggage not only means the library proper is small, + but also means it can be implemented (rather than migrated + from HDF4 source) from the ground up with minimal regard for + HDF4 thus keeping the logic straightforward.</li> + + <li>Compatibility issues are handled behind the scenes when + necessary (and possible) but can be carried out explicitly + during things like data migration.</li> + </ol> + + <hr> + <h2>Wrappers</h2> + + <p>The proposed implementation uses <i>wrappers</i> to handle + compatibility issues. A Format-X file is <i>wrapped</i> in a + Format-Y file by creating a Format-Y skeleton that replicates + the Format-X meta data. The Format-Y skeleton points to the raw + data stored in Format-X without moving the raw data. The + restriction is that the raw data storage methods in Format-Y are a + superset of raw data storage methods in Format-X (otherwise the + raw data must be copied to Format-Y). We're assuming that meta + data is small wrt the entire file. + + <p>The wrapper can be a separate file that has pointers into the + first file or it can be contained within the first file. If + contained in a single file, the file can appear as a Format-Y + file or simultaneously a Format-Y and Format-X file. + + <p>The Format-X meta-data can be thought of as the original + wrapper around raw data and Format-Y is a second wrapper around + the same data. The wrappers are independent of one another; + modifying the meta-data in one wrapper causes the other to + become out of date. Modification of raw data doesn't invalidate + either view as long as the meta data that describes its storage + isn't modified. For instance, an array element can change values + if storage is already allocated for the element, but if storage + isn't allocated then the meta data describing the storage must + change, invalidating all wrappers but one. 
+ + <p>It's perfectly legal to modify the meta data of one wrapper + without modifying the meta data in the other wrapper(s). The + illegal part is accessing the raw data through a wrapper which + is out of date. + + <p>If raw data is wrapped by more than one internal wrapper + (<i>internal</i> means that the wrapper is in the same file as + the raw data) then access to that file must assume that + unreferenced parts of that file contain meta data for another + wrapper and cannot be reclaimed as free memory. + + <hr> + <h2>Implementation of Condition B</h2> + + <p>Since this is a temporary situation which can't be + automatically detected by the HDF5 library, we must rely + on the application to notify the HDF5 library whether or not it + must satisfy Condition B. (Even if we don't rely on the + application, at some point someone is going to remove the + Condition B constraint from the library.) So the module that + handles Condition B is conditionally compiled and then enabled + on a per-file basis. + + <p>If the application desires to produce an HDF4 file (determined + by arguments to <code>H5Fopen</code>), and the Condition B + module is compiled into the library, then <code>H5Fclose</code> + calls the module to traverse the HDF5 wrapper and generate an + additional internal or external HDF4 wrapper (wrapper specifics + are described below). If Condition B is implemented as a module + then it can benefit from the metadata already cached by the main + library. + + <p>An internal HDF4 wrapper would be used if the HDF5 file is + writable and the user doesn't mind that the HDF5 file is + modified. An external wrapper would be used if the file isn't + writable or if the user wants the data file to be primarily HDF5 + but a few applications need an HDF4 view of the data. + + <p>Modifying through the HDF5 library an HDF5 file that has an + internal HDF4 wrapper should invalidate the HDF4 wrapper (and + optionally regenerate it when <code>H5Fclose</code> is + called). 
The HDF5 library must understand how wrappers work, but + not necessarily anything about the HDF4 file format. + + <p>Modifying through the HDF5 library an HDF5 file that has an + external HDF4 wrapper will cause the HDF4 wrapper to become out + of date (but possibly regenerated during <code>H5Fclose</code>). + <b>Note: Perhaps the next release of the HDF4 library should + ensure that the HDF4 wrapper file has a more recent modification + time than the raw data file (the HDF5 file) to which it + points(?)</b> + + <p>Modifying through the HDF4 library an HDF5 file that has an + internal or external HDF4 wrapper will cause the HDF5 wrapper to + become out of date. However, there is no way for the old HDF4 + library to notify the HDF5 wrapper that it's out of date. + Therefore the HDF5 library must be able to detect when the HDF5 + wrapper is out of date and be able to fix it. If the HDF4 + wrapper is complete then the easy way is to ignore the original + HDF5 wrapper and generate a new one from the HDF4 wrapper. The + other approach is to compare the HDF4 and HDF5 wrappers and + assume that if they differ HDF4 is the right one, if HDF4 omits + data then it was because HDF4 is a partial wrapper (rather than + assume HDF4 deleted the data), and if HDF4 has new data then + copy the new meta data to the HDF5 wrapper. On the other hand, + perhaps we don't need to allow these situations (modifying an + HDF5 file with the old HDF4 library and then accessing it with + the HDF5 library is either disallowed or causes HDF5 objects + that can't be described by HDF4 to be lost). + + <p>To convert an HDF5 file to an HDF4 file on demand, one simply + opens the file with the HDF4 flag and closes it. This is also + how AIO implemented backward compatibility with PDB in its file + format. + + <hr> + <h2>Implementation of Condition C</h2> + + <p>This condition must be satisfied for all time because there + will always be archived HDF4 files. 
If a pure HDF4 file (that + is, one without HDF5 meta data) is opened with an HDF5 library, + the <code>H5Fopen</code> builds an internal or external HDF5 + wrapper and then accesses the raw data through that wrapper. If + the HDF5 library modifies the file then the HDF4 wrapper becomes + out of date. However, since the HDF5 library hasn't been + released, we can at least implement it to disable and/or reclaim + the HDF4 wrapper. + + <p>If an external and temporary HDF5 wrapper is desired, the + wrapper is created through the cache like all other HDF5 files. + The data appears on disk only if a particular cached datum is + preempted. Instead of calling <code>H5Fclose</code> on the HDF5 + wrapper file we call <code>H5Fabort</code> which immediately + releases all file resources without updating the file, and then + we unlink the file from Unix. + + <hr> + <h2>What do wrappers look like?</h2> + + <p>External wrappers are quite obvious: they contain only things + from the format specs for the wrapper and nothing from the + format specs of the format which they wrap. + + <p>An internal HDF4 wrapper is added to an HDF5 file in such a way + that the file appears to be both an HDF4 file and an HDF5 + file. HDF4 requires an HDF4 file header at file offset zero. If + a user block is present then we just move the user block down a + bit (and truncate it) and insert the minimum HDF4 signature. + The HDF4 <code>dd</code> list and any other data it needs are + appended to the end of the file and the HDF5 signature uses the + logical file length field to determine the beginning of the + trailing part of the wrapper. + + <p> + <center> + <table border width="60%"> + <tr> + <td>HDF4 minimal file header. 
Its main job is to point to + the <code>dd</code> list at the end of the file.</td> + </tr> + <tr> + <td>User-defined block which is truncated by the size of the + HDF4 file header so that the HDF5 boot block file address + doesn't change.</td> + </tr> + <tr> + <td>The HDF5 boot block and data, unmodified by adding the + HDF4 wrapper.</td> + </tr> + <tr> + <td>The main part of the HDF4 wrapper. The <code>dd</code> + list will have entries for all parts of the file so + hdpack(?) doesn't (re)move anything.</td> + </tr> + </table> + </center> + + <p>When such a file is opened by the HDF5 library for + modification it shifts the user block back down to address zero + and fills with zeros, then truncates the file at the end of the + HDF5 data or adds the trailing HDF4 wrapper to the free + list. This prevents HDF4 applications from reading the file with + an out of date wrapper. + + <p>If there is no user block then we have a problem. The HDF5 + boot block must be moved to make room for the HDF4 file header. + But moving just the boot block causes problems because all file + addresses stored in the file are relative to the boot block + address. The only option is to shift the entire file contents + by 512 bytes to open up a user block (too bad we don't have + hooks into the Unix i-node stuff so we could shift the entire + file contents by the size of a file system page without ever + performing I/O on the file :-) + + <p>Is it possible to place an HDF5 wrapper in an HDF4 file? I + don't know enough about the HDF4 format, but I would suspect it + might be possible to open a hole at file address 512 (and + possibly before) by moving some things to the end of the file + to make room for the HDF5 signature. The remainder of the HDF5 + wrapper goes at the end of the file and entries are added to the + HDF4 <code>dd</code> list to mark the location(s) of the HDF5 + wrapper. 
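The front-of-file arithmetic for an internal HDF4 wrapper, as described in this section, can be sketched in C. This is only an illustration under stated assumptions: `front_layout_t` and `plan_front_layout` are hypothetical names, and the HDF4 header size is passed in as a parameter because this note does not give it.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of the pieces at the front of a file that must look like both
 * an HDF4 file and an HDF5 file. All names are illustrative only; they
 * are not HDF4 or HDF5 identifiers. */
typedef struct {
    size_t hdf4_header_off;  /* always 0: HDF4 wants its header at offset zero */
    size_t user_block_off;   /* user block moved down past the HDF4 header */
    size_t user_block_len;   /* user block truncated by the HDF4 header size */
    size_t boot_block_off;   /* HDF5 boot block address, left unchanged */
} front_layout_t;

/* Plan the layout for a file whose user block is user_block_size bytes
 * when an HDF4 header of hdf4_header_size bytes (an assumed parameter)
 * is inserted at offset zero.
 * Returns 0 on success, or -1 when the user block is absent or too small,
 * which is the case where the whole file would have to be shifted. */
int plan_front_layout(size_t user_block_size, size_t hdf4_header_size,
                      front_layout_t *out)
{
    assert(out != NULL);
    if (user_block_size < hdf4_header_size)
        return -1;

    out->hdf4_header_off = 0;
    out->user_block_off  = hdf4_header_size;
    out->user_block_len  = user_block_size - hdf4_header_size;
    out->boot_block_off  = user_block_size;  /* same address as before wrapping */
    return 0;
}
```

The point of the sketch is that the boot block offset equals the original user block size in every successful case, so no file address stored relative to the boot block needs to change.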
+ + <hr> + <h2>Other Thoughts</h2> + + <p>Conversion programs that copy an entire HDF4 file to a separate, + self-contained HDF5 file and vice versa might be useful. + + + + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Fri Oct 3 11:52:31 EST 1997 --> +<!-- hhmts start --> +Last modified: Wed Oct 8 12:34:42 EST 1997 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/HeapMgmt.html b/doc/html/TechNotes/HeapMgmt.html new file mode 100644 index 0000000..ebf58b2 --- /dev/null +++ b/doc/html/TechNotes/HeapMgmt.html @@ -0,0 +1,79 @@ +<html> +<body> + +<h1>Heap Management in HDF5</h1> + +<pre> + +Heap functions are in the H5H package. + + +off_t +H5H_new (hdf5_file_t *f, size_t size_hint, size_t realloc_hint); + + Creates a new heap in the specified file which can efficiently + store at least SIZE_HINT bytes. The heap can store more than + that, but doing so may cause the heap to become less efficient + (for instance, a heap implemented as a B-tree might become + discontiguous). The REALLOC_HINT is the minimum number of bytes + by which the heap will grow when it must be resized. The hints + may be zero in which case reasonable (but probably not + optimal) values will be chosen. + + The return value is the address of the new heap relative to + the beginning of the file boot block. + +off_t +H5H_insert (hdf5_file_t *f, off_t addr, size_t size, const void *buf); + + Copies SIZE bytes of data from BUF into the heap whose address + is ADDR in file F. BUF must be the _entire_ heap object. The + return value is the byte offset of the new data in the heap. + +void * +H5H_read (hdf5_file_t *f, off_t addr, off_t offset, size_t size, void *buf); + + Copies SIZE bytes of data from the heap whose address is ADDR + in file F into BUF and then returns the address of BUF. If + BUF is the null pointer then a new buffer will be malloc'd by + this function and its address is returned. + + Returns buffer address or null. 
+ +const void * +H5H_peek (hdf5_file_t *f, off_t addr, off_t offset); + + A more efficient version of H5H_read that returns a pointer + directly into the cache; the data is not copied from the cache + to a buffer. The pointer is valid until the next call to an + H5AC function directly or indirectly. + + Returns a pointer or null. Do not free the pointer. + +void * +H5H_write (hdf5_file_t *f, off_t addr, off_t offset, size_t size, + const void *buf); + + Modifies (part of) an object in the heap at address ADDR of + file F by copying SIZE bytes from the beginning of BUF to the + file. OFFSET is the address within the heap where the output + is to occur. + + This function can fail if the combination of OFFSET and SIZE + would write over a boundary between two heap objects. + +herr_t +H5H_remove (hdf5_file_t *f, off_t addr, off_t offset, size_t size); + + Removes an object or part of an object which begins at byte + OFFSET within a heap whose address is ADDR in file F. SIZE + bytes are returned to the free list. Removing the middle of + an object has the side effect that one object is now split + into two objects. + + Returns success or failure. + +</pre> + +</body> +</html> diff --git a/doc/html/TechNotes/IOPipe.html b/doc/html/TechNotes/IOPipe.html new file mode 100644 index 0000000..7c24e2c --- /dev/null +++ b/doc/html/TechNotes/IOPipe.html @@ -0,0 +1,114 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>The Raw Data I/O Pipeline</title> + </head> + + <body> + <h1>The Raw Data I/O Pipeline</h1> + + <p>The HDF5 raw data pipeline is a complicated beast that handles + all aspects of raw data storage and transfer of that data + between the file and the application. Data can be stored + contiguously (internal or external), in variable size external + segments, or regularly chunked; it can be sparse, extendible, + and/or compressible. 
Data transfers must be able to convert from + one data space to another, convert from one number type to + another, and perform partial I/O operations. Furthermore, + applications will expect their common usage of the pipeline to + perform well. + + <p>To accomplish these goals, the pipeline has been designed in a + modular way so no single subroutine is overly complicated and so + functionality can be inserted easily at the appropriate + locations in the pipeline. A general pipeline was developed and + then certain paths through the pipeline were optimized for + performance. + + <p>We describe only the file-to-memory side of the pipeline since + the memory-to-file side is a mirror image. We also assume that a + proper hyperslab of a simple data space is being read from the + file into a proper hyperslab of a simple data space in memory, + and that the data type is a compound type which may require + various number conversions on its members. + + <img alt="Figure 1" src="pipe1.gif"> + + <p>The diagrams should be read from the top down. Line A + in the figure above shows that <code>H5Dread()</code> copies + data from a hyperslab of a file dataset to a hyperslab of an + application buffer by calling <code>H5D_read()</code>. In turn, + <code>H5D_read()</code> calls, in a loop, + <code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>, + and <code>H5S_simp_mscat()</code>. A temporary buffer, TCONV, is + loaded with data points from the file, then data type conversion + is performed on the temporary buffer, and finally data points + are scattered out to application memory. Thus, data type + conversion is an in-place operation and data space conversion + consists of two steps. An additional temporary buffer, BKG, is + large enough to hold <em>N</em> instances of the destination + data type where <em>N</em> is the same number of data points + that can be held by the TCONV buffer (which is large enough to + hold either source or destination data points). 
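The gather, convert, scatter loop described above can be sketched in C. This is a simplified illustration, not the library's code: `gather`, `convert_in_place`, `scatter`, and `pipeline_read` are hypothetical stand-ins for the roles played by `H5S_simp_fgath()`, `H5T_conv_struct()`, and `H5S_simp_mscat()`. The "file" is a plain array, the hyperslab is a one-dimensional strided selection, the conversion is `int32_t` to `double`, and the BKG buffer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Gather: copy n strided int32 elements from the "file" into TCONV. */
static void gather(const int32_t *file, size_t start, size_t stride,
                   size_t n, unsigned char *tconv)
{
    for (size_t i = 0; i < n; i++)
        memcpy(tconv + i * sizeof(int32_t), &file[start + i * stride],
               sizeof(int32_t));
}

/* Convert int32 -> double in place inside TCONV. The destination type is
 * assumed to be at least as large as the source type, so the buffer is
 * walked backward: every slot that gets overwritten has already been read. */
static void convert_in_place(unsigned char *tconv, size_t n)
{
    for (size_t i = n; i-- > 0;) {
        int32_t src;
        double dst;
        memcpy(&src, tconv + i * sizeof src, sizeof src);
        dst = (double)src;
        memcpy(tconv + i * sizeof dst, &dst, sizeof dst);
    }
}

/* Scatter: copy n converted elements out to strided application memory. */
static void scatter(const unsigned char *tconv, size_t n,
                    double *mem, size_t start, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        memcpy(&mem[start + i * stride], tconv + i * sizeof(double),
               sizeof(double));
}

/* Read nelmts elements through a TCONV buffer of tconv_size bytes.
 * Each pass handles as many elements as fit when every element is
 * counted at the larger of the source and destination sizes. */
void pipeline_read(const int32_t *file, size_t file_stride,
                   double *mem, size_t mem_stride, size_t nelmts,
                   unsigned char *tconv, size_t tconv_size)
{
    size_t elmt_size = sizeof(double) > sizeof(int32_t) ? sizeof(double)
                                                        : sizeof(int32_t);
    size_t per_pass = tconv_size / elmt_size;
    assert(per_pass > 0);  /* TCONV must hold at least one element */

    for (size_t done = 0; done < nelmts; done += per_pass) {
        size_t n = nelmts - done < per_pass ? nelmts - done : per_pass;
        gather(file, done * file_stride, file_stride, n, tconv);
        convert_in_place(tconv, n);
        scatter(tconv, n, mem, done * mem_stride, mem_stride);
    }
}
```

A 20-byte TCONV buffer holds only two elements per pass here (the last four bytes go unused), which mirrors the remark in this note that a buffer may not be used to full capacity when its size is not a multiple of the element size.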
+ + <p>The application sets an upper limit for the size of the TCONV + buffer and optionally supplies a buffer. If no buffer is + supplied then one will be created by calling + <code>malloc()</code> when the pipeline is executed (when + necessary) and freed when the pipeline exits. The size of the + BKG buffer depends on the size of the TCONV buffer and if the + application supplies a BKG buffer it should be at least as large + as the TCONV buffer. The default size for these buffers is one + megabyte but the buffer might not be used to full capacity if + the buffer size is not an integer multiple of the source or + destination data point size (whichever is larger, but only + destination for the BKG buffer). + + + + <p>Occasionally the destination data points will be partially + initialized and the <code>H5Dread()</code> operation should not + clobber those values. For instance, the destination type might + be a struct with members <code>a</code> and <code>b</code> where + <code>a</code> is already initialized and we're reading + <code>b</code> from the file. An extra line, G, is added to the + pipeline to provide the type conversion functions with the + existing data. + + <img alt="Figure 2" src="pipe2.gif"> + + <p>It will most likely be quite common that no data type + conversion is necessary. In such cases a temporary buffer for + data type conversion is not needed and data space conversion + can happen in a single step. In fact, when the source and + destination data are both contiguous (they aren't in the + picture) the loop degenerates to a single iteration. + + + <img alt="Figure 3" src="pipe3.gif"> + + <p>So far we've looked only at internal contiguous storage, but by + replacing Line B in Figures 1 and 2 and Line A in Figure 3 with + Figure 4 the pipeline is able to handle regularly chunked + objects. 
Line B of Figure 4 is executed once for each chunk + which contains data to be read and the chunk address is found by + looking at a multi-dimensional key in a chunk B-tree which has + one entry per chunk. + + <img alt="Figure 4" src="pipe4.gif"> + + <p>If a single chunk is requested and the destination buffer is + the same size/shape as the chunk, then the CHUNK buffer is + bypassed and the destination buffer is used instead as shown in + Figure 5. + + <img alt="Figure 5" src="pipe5.gif"> + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Tue Mar 17 11:13:35 EST 1998 --> +<!-- hhmts start --> +Last modified: Wed Mar 18 10:38:30 EST 1998 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/LibMaint.html b/doc/html/TechNotes/LibMaint.html new file mode 100644 index 0000000..5e6b222 --- /dev/null +++ b/doc/html/TechNotes/LibMaint.html @@ -0,0 +1,122 @@ +<html> +<body> + + +<h1>Information for HDF5 Maintainers</h1> + +<pre> + +* You can run make from any directory. However, running in a + subdirectory only knows how to build things in that directory and + below. However, all makefiles know when their target depends on + something outside the local directory tree: + + $ cd test + $ make + make: *** No rule to make target ../src/libhdf5.a + +* All Makefiles understand the following targets: + + all -- build locally. + install -- install libs, headers, progs. + uninstall -- remove installed files. + mostlyclean -- remove temp files (eg, *.o but not *.a). + clean -- mostlyclean plus libs and progs. + distclean -- all non-distributed files. + maintainer-clean -- all derived files but H5config.h.in and configure. 
+ +* Most Makefiles also understand: + + TAGS -- build a tags table + dep, depend -- recalculate source dependencies + lib -- build just the libraries w/o programs + +* If you have personal preferences for which make, compiler, compiler + flags, preprocessor flags, etc., that you use and you don't want to + set environment variables, then use a site configuration file. + + When configure starts, it looks in the config directory for files + whose name is some combination of the CPU name, vendor, and + operating system in this order: + + CPU-VENDOR-OS + VENDOR-OS + CPU-VENDOR + OS + VENDOR + CPU + + The first file which is found is sourced and can therefore affect + the behavior of the rest of configure. See config/BlankForm for the + template. + +* If you use GNU make along with gcc the Makefile will contain targets + that automatically maintain a list of source interdependencies; you + seldom have to say `make clean'. I say `seldom' because if you + change how one `*.h' file includes other `*.h' files you'll have + to force an update. + + To force an update of all dependency information remove the + `.depend' file from each directory and type `make'. For + instance: + + $ cd $HDF5_HOME + $ find . -name .depend -exec rm {} \; + $ make + + If you're not using GNU make and gcc then dependencies come from + ".distdep" files in each directory. Those files are generated on + GNU systems and inserted into the Makefile's by running + config.status (which happens near the end of configure). + +* If you use GNU make along with gcc then the Perl script `trace' is + run just before dependencies are calculated to update any H5TRACE() + calls that might appear in the file. Otherwise, after changing the + type of a function (return type or argument types) one should run + `trace' manually on those source files (e.g., ../bin/trace *.c). 
+ +* Object files stay in the directory and are added to the library as a + final step instead of placing the file in the library immediately + and removing it from the directory. The reason is three-fold: + + 1. Most versions of make don't allow `$(LIB)($(SRC:.c=.o))' + which makes it necessary to have two lists of files, one + that ends with `.c' and the other that has the library + name wrapped around each `.o' file. + + 2. Some versions of make/ar have problems with modification + times of archive members. + + 3. Adding object files immediately causes problems on SMP + machines where make is doing more than one thing at a + time. + +* When using GNU make on an SMP you can cause it to compile more than + one thing at a time. At the top of the source tree invoke make as + + $ make -j -l6 + + which causes make to fork as many children as possible as long as + the load average doesn't go above 6. In subdirectories one can say + + $ make -j2 + + which limits the number of children to two (this doesn't work at the + top level because the `-j2' is not passed to recursive makes). + +* To create a release tarball go to the top-level directory and run + ./bin/release. You can optionally supply one or more of the words + `tar', `gzip', `bzip2' or `compress' on the command line. The + result will be a (compressed) tar file(s) in the `releases' + directory. The README file is updated to contain the release date + and version number. + +* To create a tarball of all the files which are part of HDF5 go to + the top-level directory and type: + + tar cvf foo.tar `grep '^\.' 
MANIFEST |unexpand |cut -f1` + +</pre> + +</body> +</html> diff --git a/doc/html/TechNotes/MemoryMgmt.html b/doc/html/TechNotes/MemoryMgmt.html new file mode 100644 index 0000000..93782b5 --- /dev/null +++ b/doc/html/TechNotes/MemoryMgmt.html @@ -0,0 +1,510 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Memory Management in HDF5</title> + </head> + + <body> + <h1>Memory Management in HDF5</h1> + + <!-- ---------------------------------------------------------------- --> + <h2>Is a Memory Manager Necessary?</h2> + + <p>Some form of memory management may be necessary in HDF5 when + the various deletion operators are implemented so that the + file memory is not permanently orphaned. However, since an + HDF5 file was designed with persistent data in mind, the + importance of a memory manager is questionable. + + <p>On the other hand, when certain meta data containers (file glue) + grow, they may need to be relocated in order to keep the + container contiguous. + + <blockquote> + <b>Example:</b> An object header consists of up to two + chunks of contiguous memory. The first chunk is a fixed + size at a fixed location when the header link count is + greater than one. Thus, inserting additional items into an + object header may require the second chunk to expand. When + this occurs, the second chunk may need to move to another + location in the file, freeing the file memory which that + chunk originally occupied. + </blockquote> + + <p>The relocation of meta data containers could potentially + orphan a significant amount of file memory if the application + has made poor estimates for preallocation sizes. + + <!-- ---------------------------------------------------------------- --> + <h2>Levels of Memory Management</h2> + + <p>Memory management by the library can be independent of memory + management support by the file format. The file format can + support no memory management, some memory management, or full + memory management. 
Similarly with the library. + + <h3>Support in the Library</h3> + + <dl> + <dt><b>No Support: I</b> + <dd>When memory is deallocated it simply becomes unreferenced + (orphaned) in the file. Memory allocation requests are + satisfied by extending the file. + + <dd>A separate off-line utility can be used to detect the + unreferenced bytes of a file and "bubble" them up to the end + of the file and then truncate the file. + + <dt><b>Some Support: II</b> + <dd>The library could support partial memory management all + the time, or full memory management some of the time. + Orphaning free blocks instead of adding them to a free list + should not affect the file integrity, nor should fulfilling + new requests by extending the file instead of using the free + list. + + <dt><b>Full Support: III</b> + <dd>The library supports space-efficient memory management by + always fulfilling allocation requests from the free list when + possible, and by coalescing adjacent free blocks into a + single larger free block. + </dl> + + <h3>Support in the File Format</h3> + + <dl> + <dt><b>No Support: A</b> + <dd>The file format does not support memory management; any + unreferenced block in the file is assumed to be free. If + the library supports full memory management then it will + have to traverse the entire file to determine which blocks + are unreferenced. + + <dt><b>Some Support: B</b> + <dd>Assuming that unreferenced blocks are free can be + dangerous in a situation where the file is not consistent. + For instance, if a directory tree becomes detached from the + main directory hierarchy, then the detached directory and + everything that is referenced only through the detached + directory become unreferenced. File repair utilities will + be unable to determine which unreferenced blocks need to be + linked back into the file hierarchy. + + <dd>Therefore, it might be useful to keep an unsorted, + doubly-linked list of free blocks in the file. 
The library + can add and remove blocks from the list in constant time, + and can generate its own internal free-block data structure + in time proportional to the number of free blocks instead of + the size of the file. Additionally, a library can use a + subset of the free blocks, an alternative which is not + feasible if the file format doesn't support any form of + memory management. + + <dt><b>Full Support: C</b> + <dd>The file format can mirror library data structures for + space-efficient memory management. The free blocks are + linked in unsorted, doubly-linked lists with one list per + free block size. The heads of the lists are pointed to by a + B-tree whose nodes are sorted by free block size. At the + same time, all free blocks are the leaf nodes of another + B-tree sorted by starting and ending address. When the + trees are used in combination we can deallocate and allocate + memory in O(log <em>N</em>) time where <em>N</em> is the + number of free blocks. + </dl> + + <h3>Combinations of Library and File Format Support</h3> + + <p>We now evaluate each combination of library support with file + support: + + <dl> + <dt><b>I-A</b> + <dd>If neither the library nor the file support memory + management, then each allocation request will come from the + end of the file and each deallocation request is a no-op + that simply leaves the free block unreferenced. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>No library overhead for allocation or + deallocation. + <li>No file traversal required at time of open. + <li>No data needs to be written back to the file when + it's closed. + <li>Trivial to implement (already implemented). + </ul> + + <li>Disadvantages + <ul> + <li>Inefficient use of file space. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. (Is an + unreferenced block a free block or orphaned data?) 
+ </ul> + </ul> + + <dt><b>II-A</b> + <dd>In order for the library to support memory management, it + will be required to build the internal free block + representation by traversing the entire file looking for + unreferenced blocks. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>Variable amount of library overhead for allocation + and deallocation depending on how much work the + library wants to do. + <li>No data needs to be written back to the file when + it's closed. + <li>Might use file space efficiently. + </ul> + <li>Disadvantages + <ul> + <li>Might use file space inefficiently. + <li>File traversal required at time of open. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + <li>Sharing of the free list between processes falls + outside the HDF5 file format documentation. + </ul> + </ul> + + <dt><b>III-A</b> + <dd>In order for the library to support full memory + management, it will be required to build the internal free + block representation by traversing the entire file looking + for unreferenced blocks. + + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>Efficient use of file space. + <li>No data needs to be written back to the file when + it's closed. + </ul> + <li>Disadvantages + <ul> + <li>Moderate amount of library overhead for allocation + and deallocation. + <li>File traversal required at time of open. + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + <li>Sharing of the free list between processes falls + outside the HDF5 file format documentation. + </ul> + </ul> + + <dt><b>I-B</b> + <dd>If the library doesn't support memory management but the + file format supports some level of management, then a file + repair utility will have to be run occasionally to reclaim + unreferenced blocks. 
+ + <ul> + <li>Advantages + <ul> + <li>No file overhead for allocation or deallocation. + <li>No library overhead for allocation or + deallocation. + <li>No file traversal required at time of open. + <li>No data needs to be written back to the file when + it's closed. + </ul> + <li>Disadvantages + <ul> + <li>A file repair utility must reclaim lost file space. + <li>Difficulties for file repair utilities. + </ul> + </ul> + + <dt><b>II-B</b> + <dd>Both the library and the file format support some level + of memory management. + + <ul> + <li>Advantages + <ul> + <li>Constant file overhead per allocation or + deallocation. + <li>Variable library overhead per allocation or + deallocation depending on how much work the library + wants to do. + <li>Traversal at file open time is on the order of the + free list size instead of the file size. + <li>The library has the option of reading only part of + the free list. + <li>No data needs to be written at file close time if + it has been amortized into the cost of allocation + and deallocation. + <li>File repair utilities don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. + <li>Sharing of the free list between processes might + be easier. + <li>Possible efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>Possible inefficient use of file space. + </ul> + </ul> + + <dt><b>III-B</b> + <dd>The library provides space-efficient memory management but + the file format only supports an unsorted list of free + blocks. + + <ul> + <li>Advantages + <ul> + <li>Constant time file overhead per allocation or + deallocation. + <li>No data needs to be written at file close time if + it has been amortized into the cost of allocation + and deallocation. + <li>File repair utilities don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. 
+ <li>Sharing of the free list between processes might + be easier. + <li>Efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>O(log <em>N</em>) library overhead per allocation or + deallocation where <em>N</em> is the total number of + free blocks. + <li>O(<em>N</em>) time to open a file since the entire + free list must be read to construct the in-core + trees used by the library. + <li>Library is more complicated. + </ul> + </ul> + + <dt><b>I-C</b> + <dd>This has the same advantages and disadvantages as I-A with + the added disadvantage that the file format is much more + complicated. + + <dt><b>II-C</b> + <dd>If the library only provides partial memory management but + the file requires full memory management, then this method + degenerates to the same as II-A with the added disadvantage + that the file format is much more complicated. + + <dt><b>III-C</b> + <dd>The library and file format both provide complete data + structures for space-efficient memory management. + + <ul> + <li>Advantages + <ul> + <li>Files can be opened in constant time since the + free list is read on demand and amortized into the + allocation and deallocation requests. + <li>No data needs to be written back to the file when + it's closed. + <li>File repair utilities don't have to be run to + reclaim memory. + <li>File repair utilities can detect whether an + unreferenced block is a free block or orphaned data. + <li>Sharing the free list between processes is easy. + <li>Efficient use of file space. + </ul> + <li>Disadvantages + <ul> + <li>O(log <em>N</em>) file allocation and deallocation + cost where <em>N</em> is the total number of free + blocks. + <li>O(log <em>N</em>) library allocation and + deallocation cost. + <li>Much more complicated file format. + <li>More complicated library. 
+ </ul> + </ul> + + </dl> + + <!-- ---------------------------------------------------------------- --> + <h2>The Algorithm for II-B</h2> + + <p>The file contains an unsorted, doubly-linked list of free + blocks. The address of the head of the list appears in the + boot block. Each free block contains the following fields: + + <center> + <table border cellpadding=4 width="60%"> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <th colspan=4>Free Block Signature</th> + + <tr align=center> + <th colspan=4>Total Free Block Size</th> + + <tr align=center> + <th colspan=4>Address of Left Sibling</th> + + <tr align=center> + <th colspan=4>Address of Right Sibling</th> + + <tr align=center> + <th colspan=4><br><br>Remainder of Free Block<br><br><br></th> + </table> + </center> + + <p>The library reads as much of the free list as convenient when + convenient and pushes those entries onto stacks. This can + occur when a file is opened or any time during the life of the + file. There is one stack for each free block size and the + stacks are sorted by size in a balanced tree in memory. + + <p>Deallocation involves finding the correct stack or creating + a new one (an O(log <em>K</em>) operation where <em>K</em> is + the number of stacks), pushing the free block info onto the + stack (a constant-time operation), and inserting the free + block into the file free block list (a constant-time operation + which doesn't necessarily involve any I/O since the free blocks + can be cached like other objects). No attempt is made to + coalesce adjacent free blocks into larger blocks. + + <p>Allocation involves finding the correct stack (an O(log + <em>K</em>) operation), removing the top item from the stack + (a constant-time operation), and removing the block from the + file free block list (a constant-time operation). 
If there is + no free block of the requested size or larger, then the file + is extended. + + <p>To provide shareability of the free list between processes, + the last step of an allocation will check for the free block + signature and if it doesn't find one will repeat the process. + Alternatively, a process can temporarily remove free blocks + from the file and hold them in its own private pool. + + <p>To summarize... + <dl> + <dt>File opening + <dd>O(<em>N</em>) amortized over the time the file is open, + where <em>N</em> is the number of free blocks. The library + can still function without reading any of the file free + block list. + + <dt>Deallocation + <dd>O(log <em>K</em>) where <em>K</em> is the number of unique + sizes of free blocks. File access is constant. + + <dt>Allocation + <dd>O(log <em>K</em>). File access is constant. + + <dt>File closing + <dd>O(1) even if the library temporarily removes free + blocks from the file to hold them in a private pool since + the pool can still be a linked list on disk. + </dl> + + <!-- ---------------------------------------------------------------- --> + <h2>The Algorithm for III-C</h2> + + <p>The HDF5 file format supports a general B-tree mechanism + for storing data with keys. If we use a B-tree to represent + all parts of the file that are free and the B-tree is indexed + so that a free file chunk can be found if we know the starting + or ending address, then we can efficiently determine whether a + free chunk begins or ends at the specified address. Call this + the <em>Address B-Tree</em>. + + <p>If a second B-tree points to a set of stacks where the + members of a particular stack are all free chunks of the same + size, and the tree is indexed by chunk size, then we can + efficiently find the best-fit chunk size for a memory request. + Call this the <em>Size B-Tree</em>. 
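<p>The best-fit lookup that a size-indexed structure makes possible can be sketched as follows. This is only an illustration under stated assumptions: a sorted in-memory array with a binary search stands in for the Size B-Tree, the per-size stacks have a fixed capacity, and the names <code>size_class_t</code>, <code>best_fit</code>, and <code>alloc_chunk</code> are hypothetical and not part of the HDF5 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the size index:
 * a stack of free chunks that all share the same size. */
typedef struct size_class_t {
    uint64_t size;      /* chunk size shared by every member            */
    uint64_t addrs[8];  /* stack of chunk addresses (fixed for the demo) */
    size_t   nused;     /* number of addresses currently on the stack    */
} size_class_t;

/* Return the index of the smallest non-empty size class whose size is
 * at least `request`, or -1 if no such class exists.  `classes` must be
 * sorted by ascending size; a real implementation would walk a B-tree. */
static int
best_fit(const size_class_t *classes, size_t nclasses, uint64_t request)
{
    size_t lo = 0, hi = nclasses;

    /* Binary search for the first class with size >= request. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (classes[mid].size < request)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Skip classes whose stacks are empty. */
    while (lo < nclasses && classes[lo].nused == 0)
        lo++;

    return lo < nclasses ? (int)lo : -1;
}

/* Pop a chunk that satisfies `request`.  Returns 1 and fills in *addr
 * and *size on success; returns 0 when nothing fits, in which case the
 * caller would extend the file instead. */
static int
alloc_chunk(size_class_t *classes, size_t nclasses, uint64_t request,
            uint64_t *addr, uint64_t *size)
{
    int i = best_fit(classes, nclasses, request);

    if (i < 0)
        return 0;
    *addr = classes[i].addrs[--classes[i].nused];
    *size = classes[i].size;
    return 1;
}
```

<p>The search is logarithmic in the number of distinct sizes and the pop is constant time, which matches the cost breakdown given for allocation; the sketch leaves out splitting an oversized chunk and all file I/O.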
+ + <p>All free blocks of a particular size can be linked together + with an unsorted, doubly-linked, circular list and the left + and right sibling addresses can be stored within the free + chunk, allowing us to remove or insert items from the list in + constant time. + + <p>Deallocation of a block of file memory consists of: + + <ol type="I"> + <li>Add the new free block whose address is <em>ADDR</em> to the + address B-tree. + + <ol type="A"> + <li>If the address B-tree contains an entry for a free + block that ends at <em>ADDR</em>-1 then remove that + block from the B-tree and from the linked list (if the + block was the first on the list then the size B-tree + must be updated). Adjust the size and address of the + block being freed to include the block just removed from + the free list. The time required to search for and + possibly remove the left block is O(log <em>N</em>) + where <em>N</em> is the number of free blocks. + + <li>If the address B-tree contains an entry for the free + block that begins at <em>ADDR</em>+<em>LENGTH</em> then + remove that block from the B-tree and from the linked + list (if the block was the first on the list then the + size B-tree must be updated). Adjust the size of the + block being freed to include the block just removed from + the free list. The time required to search for and + possibly remove the right block is O(log <em>N</em>). + + <li>Add the new (adjusted) block to the address B-tree. + The time for this operation is O(log <em>N</em>). + </ol> + + <li>Add the new block to the size B-tree and linked list. + + <ol type="A"> + <li>If the size B-tree has an entry for this particular + size, then add the chunk to the tail of the list. This + is an O(log <em>K</em>) operation where <em>K</em> is + the number of unique free block sizes. + + <li>Otherwise make a new entry in the B-tree for chunks of + this size. This is also O(log <em>K</em>). + </ol> + </ol> + + <p>Allocation is similar to deallocation. 
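<p>Step I of the deallocation procedure (merging with the left and right neighbours, then inserting the adjusted block) can be sketched as below. This is a minimal sketch under stated assumptions: a small array kept sorted by address stands in for the address B-tree, so insertions here cost O(<em>N</em>) rather than O(log <em>N</em>); the size B-tree and the on-disk linked lists of step II are left out; and <code>free_index_t</code>, <code>release_block</code>, and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FREE 32

/* A free region [addr, addr + len) of the file. */
typedef struct free_block_t {
    uint64_t addr;
    uint64_t len;
} free_block_t;

/* Hypothetical stand-in for the address index: free blocks sorted by
 * address, never overlapping and never adjacent after a release. */
typedef struct free_index_t {
    free_block_t blk[MAX_FREE];
    size_t       n;
} free_index_t;

/* Index of the first block whose address is >= addr. */
static size_t
lower_bound(const free_index_t *fi, uint64_t addr)
{
    size_t lo = 0, hi = fi->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fi->blk[mid].addr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Remove the entry at position i, closing the gap. */
static void
remove_at(free_index_t *fi, size_t i)
{
    memmove(&fi->blk[i], &fi->blk[i + 1],
            (fi->n - i - 1) * sizeof(free_block_t));
    fi->n--;
}

/* Release [addr, addr + len), coalescing with adjacent free blocks.
 * Returns 0 on success, -1 if the demo index is full. */
static int
release_block(free_index_t *fi, uint64_t addr, uint64_t len)
{
    size_t pos = lower_bound(fi, addr);

    /* Step I-A: a left neighbour that ends at addr - 1 is absorbed. */
    if (pos > 0 && fi->blk[pos - 1].addr + fi->blk[pos - 1].len == addr) {
        addr = fi->blk[pos - 1].addr;
        len += fi->blk[pos - 1].len;
        remove_at(fi, pos - 1);
        pos--;
    }

    /* Step I-B: a right neighbour that begins at addr + len is absorbed. */
    if (pos < fi->n && fi->blk[pos].addr == addr + len) {
        len += fi->blk[pos].len;
        remove_at(fi, pos);
    }

    /* Step I-C: insert the adjusted block at its sorted position.
     * The index can only be full here if no merge happened, so a
     * failure leaves the index unchanged. */
    if (fi->n == MAX_FREE)
        return -1;
    memmove(&fi->blk[pos + 1], &fi->blk[pos],
            (fi->n - pos) * sizeof(free_block_t));
    fi->blk[pos].addr = addr;
    fi->blk[pos].len  = len;
    fi->n++;
    return 0;
}
```

<p>Releasing a block that sits exactly between two free neighbours leaves a single entry covering all three, which is the coalescing behaviour that distinguishes this scheme from II-B.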
+ + <p>To summarize... + + <dl> + <dt>File opening + <dd>O(1) + + <dt>Deallocation + <dd>O(log <em>N</em>) where <em>N</em> is the total number of + free blocks. File access time is O(log <em>N</em>). + + <dt>Allocation + <dd>O(log <em>N</em>). File access time is O(log <em>N</em>). + + <dt>File closing + <dd>O(1). + </dl> + + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Thu Jul 24 15:16:40 PDT 1997 --> +<!-- hhmts start --> +Last modified: Thu Jul 31 14:41:01 EST +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/MoveDStruct.html b/doc/html/TechNotes/MoveDStruct.html new file mode 100644 index 0000000..4576bd2 --- /dev/null +++ b/doc/html/TechNotes/MoveDStruct.html @@ -0,0 +1,66 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Relocating a File Data Structure</title> + </head> + + <body> + <h1>Relocating a File Data Structure</h1> + + <p>Since file data structures can be cached in memory by the H5AC + package it becomes problematic to move such a data structure in + the file. One cannot just copy a portion of the file from one + location to another because: + + <ol> + <li>the file might not contain the latest information, and</li> + <li>the H5AC package might not realize that the object's + address has changed and attempt to write the object to disk + at the old address.</li> + </ol> + + <p>Here's a correct method to move data from one location to + another. The example code assumes that one is moving a B-link + tree node from <code>old_addr</code> to <code>new_addr</code>. + + <ol> + <li>Make sure the disk is up-to-date with respect to the + cache. There is no need to remove the item from the cache, + hence the final argument to <code>H5AC_flush</code> is + <code>FALSE</code>. + <br><br> + <code> + H5AC_flush (f, H5AC_BT, old_addr, FALSE);<br> + </code> + <br> + </li> + + <li>Read the data from the old address and write it to the new + address. 
+ <br><br> + <code> + H5F_block_read (f, old_addr, size, buf);<br> + H5F_block_write (f, new_addr, size, buf);<br> + </code> + <br> + </li> + + <li>Notify the cache that the address of the object changed. + <br><br> + <code> + H5AC_rename (f, H5AC_BT, old_addr, new_addr);<br> + </code> + <br> + </li> + </ol> + + + + <hr> + <address><a href="mailto:robb@maya.nuance.com">Robb Matzke</a></address> +<!-- Created: Mon Jul 14 15:09:06 EST 1997 --> +<!-- hhmts start --> +Last modified: Mon Jul 14 15:38:29 EST +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/NamingScheme.html b/doc/html/TechNotes/NamingScheme.html new file mode 100644 index 0000000..dbf55bf --- /dev/null +++ b/doc/html/TechNotes/NamingScheme.html @@ -0,0 +1,300 @@ +<HTML> +<HEAD><TITLE> + HDF5 Naming Scheme + </TITLE> </HEAD> + +<BODY bgcolor="#ffffff"> + + +<H1> +<FONT color="#c80028" + <I> <B> <CENTER> HDF5 Naming Scheme for </CENTER> </B> </I> </H1> +</FONT> +<P> +<UL> + +<LI> <A HREF = "#01"><I> FILES </I> </A> +<LI> <A HREF = "#02"><I> PACKAGES </I> </A> +<LI> <A HREF = "#03"><I> PUBLIC vs PRIVATE </I> </A> +<LI> <A HREF = "#04"><I> INTEGRAL TYPES </I> </A> +<LI> <A HREF = "#05"><I> OTHER TYPES </I> </A> +<LI> <A HREF = "#06"><I> GLOBAL VARIABLES </I> </A> +<LI> <A HREF = "#07"><I> MACROS, PREPROCESSOR CONSTANTS, AND ENUM MEMBERS </I> </A> + +</UL> +<P> +<center> + Authors: <A HREF = "mailto:koziol@ncsa.uiuc.edu"> + <I>Quincey Koziol</I> </A> and + <A HREF = "mailto:matzke@llnl.gov"> + <I> Robb Matzke </I> </A> + +</center> +<UL> + +<FONT color="#c80028" +<LI> <A NAME="01"> <B> <I> FILES </I> </B> </A> +</FONT> + +<UL> + + <LI> Source files are named according to the package they contain (see + below). All files will begin with `H5' so we can stuff our + object files into someone else's library and not worry about file + name conflicts. 
+ <P>For Example: +<i><b> +<dd> H5.c -- "Generic" library functions + <br> +<dd> H5B.c -- B-link tree functions +</i></b> + <p> + <LI> If a package is in more than one file, then another name is tacked + on. It's all lower case with no underscores or hyphens. + <P>For Example: +<i><b> +<dd> H5F.c -- the file for this package + <br> +<dd> H5Fstdio.c -- stdio functions (just an example) + <br> +<dd> H5Ffcntl.c -- fcntl functions (just an example) +</i></b> + <p> + <LI> Each package file has a header file of API stuff (unless there is + no API component to the package) + <P>For Example: +<i><b> +<dd> H5F.h -- things an application would see. </i> </b> + <P> + and a header file of private stuff +<i><b> + <p> +<dd> H5Fprivate.h -- things an application wouldn't see. The + private header includes the public header. +</i></b> + <p> + and a header for private prototypes +<i><b> + <p> +<dd> H5Fproto.h -- prototypes for internal functions. +</i></b> + <P> + By splitting the prototypes into separate include files we don't + have to recompile everything when just one function prototype + changes. + + <LI> The main API header file is `hdf5.h' and it includes each of the + public header files but none of the private header files. Or the + application can include just the public header files it needs. + + <LI> There is no main private or prototype header file because it + prevents make from being efficient. Instead, each source file + includes only the private header and prototype files it needs + (first all the private headers, then all the private prototypes). + + <LI> Header files should include everything they need and nothing more. + +</UL> +<P> + +<FONT color="#c80028" +<LI> <A NAME="02"> <B> <I> PACKAGES </I> </B> </A> +</FONT> + +<P> +Names exported beyond function scope begin with `H5' followed by zero, +one, or two upper-case letters that describe the class of object. +This prefix is the package name. 
The implementation of packages +doesn't necessarily have to map 1:1 to the source files. +<P> +<i><b> +<dd> H5 -- library functions +<br> +<dd> H5A -- atoms +<br> +<dd> H5AC -- cache +<br> +<dd> H5B -- B-link trees +<br> +<dd> H5D -- datasets +<br> +<dd> H5E -- error handling +<br> +<dd> H5F -- files +<br> +<dd> H5G -- groups +<br> +<dd> H5M -- meta data +<br> +<dd> H5MM -- core memory management +<br> +<dd> H5MF -- file memory management +<br> +<dd> H5O -- object headers +<br> +<dd> H5P -- Property Lists +<br> +<dd> H5S -- dataspaces +<br> +<dd> H5R -- relationships +<br> +<dd> H5T -- datatype +</i></b> +<p> +Each package implements a single main class of object (e.g., the H5B +package implements B-link trees). The main data type of a package is +the package name followed by `_t'. +<p> +<i><b> +<dd> H5F_t -- HDF5 file type +<br> +<dd> H5B_t -- B-link tree data type +</i></b> +<p> + +Not all packages implement a data type (H5, H5MF) and some +packages provide access to a preexisting data type (H5MM, H5S). +<p> + + +<FONT color="#c80028" +<LI> <A NAME="03"> <B> <I> PUBLIC vs PRIVATE </I> </B> </A> +</FONT> +<p> +If the symbol is for internal use only, then the package name is +followed by an underscore and the rest of the name. Otherwise, the +symbol is part of the API and there is no underscore between the +package name and the rest of the name. +<p> +<i><b> +<dd> H5Fopen -- an API function. +<br> +<dd> H5B_find -- an internal function. +</i></b> +<p> +For functions, this is important because the API functions never pass +pointers around (they use atoms instead for hiding the implementation) +and they perform stringent checks on their arguments. Internal +functions, on the other hand, check arguments with assert(). +<p> +Data types like H5B_t carry no information about whether the type is +public or private since it doesn't matter. 
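<p>
The split between API and internal functions might look like the sketch below. Everything in it is hypothetical: the package name <code>H5X</code>, its type, its atom table, and both functions are invented for illustration and are not HDF5 symbols. The sketch also assumes that atoms are small integer handles and that stored values are non-negative, so that a negative return can signal failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical package "H5X"; none of these names exist in HDF5. */
typedef struct H5X_t {
    int value;              /* assumed non-negative in this sketch */
} H5X_t;

#define H5X_MAX_ATOMS 4

/* Package-level table mapping atoms to objects (illustrative only). */
static H5X_t *H5X_atoms_g[H5X_MAX_ATOMS];

/* Internal function: underscore after the package name.  It trusts
 * its callers and checks its argument only with assert(). */
static int
H5X_get_value(const H5X_t *obj)
{
    assert(obj != NULL);
    return obj->value;
}

/* API function: no underscore after the package name.  It accepts an
 * atom instead of a pointer and validates it before calling the
 * internal function.  Returns the value, or -1 on a bad atom. */
int
H5Xget_value(int atom)
{
    if (atom < 0 || atom >= H5X_MAX_ATOMS)
        return -1;
    if (H5X_atoms_g[atom] == NULL)
        return -1;
    return H5X_get_value(H5X_atoms_g[atom]);
}
```

<p>
A bad atom passed to the API function produces an error return, while a bad pointer passed to the internal function is treated as a programming error and trips the assertion.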
+ +<p> + + +<FONT color="#c80028" +<LI> <A NAME="04"> <B> <I> INTEGRAL TYPES </I> </B> </A> +</FONT> +<p> +Integral fixed-point type names are an optional `u' followed by `int' +followed by the size in bits (8, 16, +32, or 64). There is no trailing `_t' because these are common +enough and follow their own naming convention. +<p> +<pre><H4> +<dd> hbool_t -- boolean values (BTRUE, BFALSE, BFAIL) +<br> +<dd> int8 -- signed 8-bit integers +<br> +<dd> uint8 -- unsigned 8-bit integers +<br> +<dd> int16 -- signed 16-bit integers +<br> +<dd> uint16 -- unsigned 16-bit integers +<br> +<dd> int32 -- signed 32-bit integers +<br> +<dd> uint32 -- unsigned 32-bit integers +<br> +<dd> int64 -- signed 64-bit integers +<br> +<dd> uint64 -- unsigned 64-bit integers +<br> +<dd> intn -- "native" integers +<br> +<dd> uintn -- "native" unsigned integers + +</pre></H4> +<p> + +<FONT color="#c80028" +<LI> <A NAME="05"> <B> <I> OTHER TYPES </I> </B> </A> +</FONT> + +<p> + +Other data types are always followed by `_t'. +<p> +<pre><H4> +<dd> H5B_key_t-- additional data type used by H5B package. +</pre></H4> +<p> + +However, if the name is so common that it's used almost everywhere, +then we make an alias for it by removing the package name and leading +underscore and replacing it with an `h' (the main datatype for a +package already has a short enough name, so we don't have aliases for +them). +<P> +<pre><H4> +<dd> typedef H5E_err_t herr_t; +</pre> </H4> +<p> + +<FONT color="#c80028" +<LI> <A NAME="06"> <B> <I> GLOBAL VARIABLES </I> </B> </A> +</FONT> +<p> +Global variables include the package name and end with `_g'. +<p> +<pre><H4> +<dd> H5AC_methods_g -- global variable in the H5AC package. +</pre> </H4> +<p> + + +<FONT color="#c80028" +<LI> <A NAME="07"> +<I> <B> +MACROS, PREPROCESSOR CONSTANTS, AND ENUM MEMBERS + </I> </B> </A> +</FONT> +<p> +Same rules as other symbols except the name is all upper case. 
There +are a few exceptions: <br> +<ul> +<li> Constants and macros defined on a system that is deficient: + <p><pre><H4> +<dd> MIN(x,y), MAX(x,y) and their relatives + </pre></H4> + +<li> Platform constants : + <P> + No naming scheme; determined by OS and compiler.<br> + These appear only in one header file anyway. + <p> +<li> Feature test constants (?)<br> + Always start with `HDF5_HAVE_' like HDF5_HAVE_STDARG_H for a + header file, or HDF5_HAVE_DEV_T for a data type, or + HDF5_HAVE_DIV for a function. +</UL> +<p> + +</UL> +<p> +<H6> +<center> + This file /hdf3/web/hdf/internal/HDF_standard/HDF5.coding_standard.html is + maintained by Elena Pourmal <A HREF = "mailto:epourmal@ncsa.uiuc.edu"> + <I>epourmal@ncsa.uiuc.edu</I> </A>. +</center> +<p> +<center> + Last modified August 5, 1997 +</center> + +</H6> +</BODY> +</HTML> + diff --git a/doc/html/TechNotes/ObjectHeader.html b/doc/html/TechNotes/ObjectHeader.html new file mode 100644 index 0000000..33ce711 --- /dev/null +++ b/doc/html/TechNotes/ObjectHeader.html @@ -0,0 +1,67 @@ +<html> +<body> + +<h1>Object Headers</h1> + +<pre> + +haddr_t +H5O_new (hdf5_file_t *f, intn nrefs, size_t size_hint) + + Creates a new empty object header and returns its address. + The SIZE_HINT is the initial size of the data portion of the + object header and NREFS is the number of symbol table entries + that reference this object header (normally one). + + If SIZE_HINT is too small, then at least some default amount + of space is allocated for the object header. + +intn /*num remaining links */ +H5O_link (hdf5_file_t *f, /*file containing header */ + haddr_t addr, /*header file address */ + intn adjust) /*link adjustment amount */ + + +size_t +H5O_sizeof (hdf5_file_t *f, /*file containing header */ + haddr_t addr, /*header file address */ + H5O_class_t *type, /*message type or H5O_ANY */ + intn sequence) /*sequence number, usually zero */ + + Returns the size of a particular instance of a message in an + object header. 
When an object header has more than one + instance of a particular message type, then SEQUENCE indicates + which instance to return. + +void * +H5O_read (hdf5_file_t *f, /*file containing header */ + haddr_t addr, /*header file address */ + H5G_entry_t *ent, /*optional symbol table entry */ + H5O_class_t *type, /*message type or H5O_ANY */ + intn sequence, /*sequence number, usually zero */ + size_t size, /*size of output message */ + void *mesg) /*output buffer */ + + Reads a message from the object header into memory. + +const void * +H5O_peek (hdf5_file_t *f, /*file containing header */ + haddr_t addr, /*header file address */ + H5G_entry_t *ent, /*optional symbol table entry */ + H5O_class_t *type, /*type of message or H5O_ANY */ + intn sequence) /*sequence number, usually zero */ + +haddr_t /*new heap address */ +H5O_modify (hdf5_file_t *f, /*file containing header */ + haddr_t addr, /*header file address */ + H5G_entry_t *ent, /*optional symbol table entry */ + hbool_t *ent_modified, /*entry modification flag */ + H5O_class_t *type, /*message type */ + intn overwrite, /*sequence number or -1 */ + void *mesg) /*the message */ + + +</pre> + +</body> +</html> diff --git a/doc/html/TechNotes/RawDStorage.html b/doc/html/TechNotes/RawDStorage.html new file mode 100644 index 0000000..87ea54d --- /dev/null +++ b/doc/html/TechNotes/RawDStorage.html @@ -0,0 +1,274 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Raw Data Storage in HDF5</title> + </head> + + <body> + <h1>Raw Data Storage in HDF5</h1> + + <p>This document describes the various ways that raw data is + stored in an HDF5 file and the object header messages which + contain the parameters for the storage. + + <p>Raw data storage has three components: the mapping from some + logical multi-dimensional element space to the linear address + space of a file, compression of the raw data on disk, and + striping of raw data across multiple files. These components + are orthogonal. 
+ + <p>Some goals of the storage mechanism are to be able to + efficiently store data which is: + + <dl> + <dt>Small + <dd>Small pieces of raw data can be treated as meta data and + stored in the object header. This will be achieved by storing + the raw data in the object header with message 0x0006. + Compression and striping are not supported in this case. + + <dt>Complete Large + <dd>The library should be able to store large arrays + contiguously in the file provided the user knows the final + array size a priori. The array can then be read/written in a + single I/O request. This is accomplished by describing the + storage with object header message 0x0005. Compression and + striping are not supported in this case. + + <dt>Sparse Large + <dd>A large sparse raw data array should be stored in a manner + that is space-efficient but one in which any element can still + be accessed in a reasonable amount of time. Implementation + details are below. + + <dt>Dynamic Size + <dd>One often doesn't have prior knowledge of the size of an + array. It would be nice to allow arrays to grow dynamically in + any dimension. It might also be nice to allow the array to + grow in the negative dimension directions if convenient to + implement. Implementation details are below. + + <dt>Subslab Access + <dd>Some multi-dimensional arrays are almost always accessed by + subslabs. For instance, a 2-d array of pixels might always be + accessed as smaller 1k-by-1k 2-d arrays always aligned on 1k + index values. We should be able to store the array in such a + way that striding through the entire array is not necessary. + Subslab access might also be useful with compression + algorithms where each storage slab can be compressed + independently of the others. Implementation details are below. + + <dt>Compressed + <dd>Various compression algorithms can be applied to the entire + array. 
We're not planning to support separate algorithms (or a + single algorithm with separate parameters) for each chunk + although it would be possible to implement that in a manner + similar to the way striping across files is + implemented. + + <dt>Striped Across Files + <dd>The array access functions should support arrays stored + discontiguously across a set of files. + </dl> + + <h1>Implementation of Indexed Storage</h1> + + <p>The Sparse Large, Dynamic Size, and Subslab Access methods + share so much code that they can be described with a single + message. The new Indexed Storage Message (<code>0x0008</code>) + will replace the old Chunked Object (<code>0x0009</code>) and + Sparse Object (<code>0x000A</code>) Messages. + + <p> + <center> + <table border cellpadding=4 width="60%"> + <caption align=bottom> + <b>The Format of the Indexed Storage Message</b> + </caption> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Address of B-tree<br><br></td> + </tr> + <tr align=center> + <td>Number of Dimensions</td> + <td>Reserved</td> + <td>Reserved</td> + <td>Reserved</td> + </tr> + <tr align=center> + <td colspan=4>Reserved (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Alignment for Dimension 0 (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Alignment for Dimension 1 (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>...</td> + </tr> + <tr align=center> + <td colspan=4>Alignment for Dimension N (4 bytes)</td> + </tr> + </table> + </center> + + <p>The alignment fields indicate the alignment in logical space to + use when allocating new storage areas on disk. For instance, + writing every other element of a 100-element one-dimensional + array (using one HDF5 I/O partial write operation per element) + that has unit storage alignment would result in 50 + single-element, discontiguous storage segments. 
However, using + an alignment of 25 would result in only four discontiguous + segments. The size of the message varies with the number of + dimensions. + + <p>A B-tree is used to point to the discontiguous portions of + storage which have been allocated for the object. All keys of a + particular B-tree are the same size and are a function of the + number of dimensions. It is therefore not possible to change the + dimensionality of an indexed storage array after its B-tree is + created. + + <p> + <center> + <table border cellpadding=4 width="60%"> + <caption align=bottom> + <b>The Format of a B-Tree Key</b> + </caption> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>External File Number or Zero (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Chunk Offset in Dimension 0 (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Chunk Offset in Dimension 1 (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>...</td> + </tr> + <tr align=center> + <td colspan=4>Chunk Offset in Dimension N (4 bytes)</td> + </tr> + </table> + </center> + + <p>The keys within a B-tree obey an ordering based on the chunk + offsets. If the offsets in dimension-0 are equal, then + dimension-1 is used, etc. The External File Number field + contains a 1-origin offset into the External File List message + which contains the name of the external file in which that chunk + is stored. + + <h1>Implementation of Striping</h1> + + <p>The indexed storage will support arbitrary striping at the + chunk level; each chunk can be stored in any file. 
This is + accomplished by using the External File Number field of an + indexed storage B-tree key as a 1-origin offset into an External + File List Message (0x0009) which takes the form: + + <p> + <center> + <table border cellpadding=4 width="60%"> + <caption align=bottom> + <b>The Format of the External File List Message</b> + </caption> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Name Heap Address<br><br></td> + </tr> + <tr align=center> + <td colspan=4>Number of Slots Allocated (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Number of File Names (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Byte Offset of Name 1 in Heap (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>Byte Offset of Name 2 in Heap (4 bytes)</td> + </tr> + <tr align=center> + <td colspan=4>...</td> + </tr> + <tr align=center> + <td colspan=4><br>Unused Slot(s)<br><br></td> + </tr> + </table> + </center> + + <p>Each indexed storage array that has all or part of its data + stored in external files will contain a single external file + list message. The size of the messages is determined when the + message is created, but it may be possible to enlarge the + message on demand by moving it. At this time, it's not possible + for multiple arrays to share a single external file list + message. + + <dl> + <dt><code> + H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn + nslots_hint, intn heap_size_hint) + </code> + <dd>Adds a new, empty external file list message to an object + header and returns a pointer to that message. The message + acts as a cache for file descriptors of external files that + are open. + + <p><dt><code> + intn H5O_efl_index (H5O_efl_t *efl, const char *filename) + </code> + <dd>Gets the external file index number for a particular file name. 
+ If the name isn't in the external file list then it's added to + the H5O_efl_t struct and immediately written to the object + header to which the external file list message belongs. Name + comparison is textual. Each name should be relative to the + directory which contains the HDF5 file. + + <p><dt><code> + H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode) + </code> + <dd>Gets a low-level file descriptor for an external file. The + external file list caches file descriptors because we might + have many more external files than there are file descriptors + available to this process. The caller should not close this file. + + <p><dt><code> + herr_t H5O_efl_release (H5O_efl_t *efl) + </code> + <dd>Releases an external file list, closes all files + associated with that list, and if the list has been modified + since the call to <code>H5O_efl_new</code> flushes the message + to disk. + </dl> + + <hr> + <address><a href="mailto:robb@arborea.spizella.com">Robb Matzke</a></address> +<!-- Created: Fri Oct 3 09:52:32 EST 1997 --> +<!-- hhmts start --> +Last modified: Tue Nov 25 12:36:50 EST 1997 +<!-- hhmts end --> + </body> +</html> diff --git a/doc/html/TechNotes/SymbolTables.html b/doc/html/TechNotes/SymbolTables.html new file mode 100644 index 0000000..a05cd5a --- /dev/null +++ b/doc/html/TechNotes/SymbolTables.html @@ -0,0 +1,323 @@ +<html> +<body> + +<h1>Symbol Table Caching Issues</h1> + +<pre> + +A number of issues involving caching of object header messages in +symbol table entries must be resolved. + +What is the motivation for these changes? + + If we make objects completely independent of object name it allows + us to refer to one object by multiple names (a concept called hard + links in Unix file systems), which in turn provides an easy way to + share data between datasets. + + Every object in an HDF5 file has a unique, constant object header + address which serves as a handle (or OID) for the object. 
The + object header contains messages which describe the object. + + HDF5 allows some of the object header messages to be cached in + symbol table entries so that the object header doesn't have to be + read from disk. For instance, an entry for a directory caches the + directory disk addresses required to access that directory, so the + object header for that directory is seldom read. + + If an object has multiple names (that is, a link count greater than + one), then it has multiple symbol table entries which point to it. + All symbol table entries must agree on header messages. The + current mechanism is to turn off the caching of header messages in + symbol table entries when the header link count is more than one, + and to allow caching once the link count returns to one. + + However, in the current implementation, a package is allowed to + copy a symbol table entry and use it as a private cache for the + object header. This doesn't work for a number of reasons (all but + one require a `delete symbol entry' operation). + + 1. If two packages hold copies of the same symbol table entry, + they don't notify each other of changes to the symbol table + entry. Eventually, one package reads a cached message and + gets the wrong value because the other package changed the + message in the object header. + + 2. If one package holds a copy of the symbol table entry and + some other part of HDF5 removes the object and replaces it + with some other object, then the original package will + continue to access the non-existent object using the new + object header. + + 3. If one package holds a copy of the symbol table entry and + some other part of HDF5 (re)moves the directory which + contains the object, then the package will be unable to + update the symbol table entry with the new cached + data. Packages that refer to the object by the new name will + use old cached data. 
+ + +The basic problem is that there may be multiple copies of the object +symbol table entry floating around in the code when there should +really be at most one per hard link. + + Level 0: A copy may exist on disk as part of a symbol table node, which + is a small 1d array of symbol table entries. + + Level 1: A copy may be cached in memory as part of a symbol table node + in the H5Gnode.c file by the H5AC layer. + + Level 2a: Another package may be holding a copy so it can perform + fast lookup of any header messages that might be cached in + the symbol table entry. It can't point directly to the + cached symbol table node because that node can disappear + at any time. + + Level 2b: Packages may hold more than one copy of a symbol table + entry. For instance, if H5D_open() is called twice for + the same name, then two copies of the symbol table entry + for the dataset exist in the H5D package. + +How can level 2a and 2b be combined? + + If package data structures contained pointers to symbol table + entries instead of copies of symbol table entries and if H5G + allocated one symbol table entry per hard link, then it's trivial + for Level 2a and 2b to benefit from one another's actions since + they share the same cache. + +How does this work conceptually? + + Level 2a and 2b must notify Level 1 of their intent to use (or stop + using) a symbol table entry to access an object header. The + notification of the intent to access an object header is called + `opening' the object and releasing the access is `closing' the + object. + + Opening an object requires an object name which is used to locate + the symbol table entry to use for caching of object header + messages. The return value is a handle for the object. Figure 1 + shows the state after Dataset1 opens Object with a name that maps + through Entry1. The open request created a copy of Entry1 called + Shadow1 which exists even if SymNode1 is preempted from the H5AC + layer. 
+ + ______ + Object / \ + SymNode1 +--------+ | + +--------+ _____\ | Header | | + | | / / +--------+ | + +--------+ +---------+ \______/ + | Entry1 | | Shadow1 | /____ + +--------+ +---------+ \ \ + : : \ + +--------+ +----------+ + | Dataset1 | + +----------+ + FIGURE 1 + + + + The SymNode1 can appear and disappear from the H5AC layer at any + time without affecting the Object Header data cached in the Shadow. + The rules are: + + * If the SymNode1 is present and is about to disappear and the + Shadow1 dirty bit is set, then Shadow1 is copied over Entry1, the + Entry1 dirty bit is set, and the Shadow1 dirty bit is cleared. + + * If something requests a copy of Entry1 (for a read-only peek + request), and Shadow1 exists, then a copy (not pointer) of Shadow1 + is returned instead. + + * Entry1 cannot be deleted while Shadow1 exists. + + * Entry1 cannot change directly if Shadow1 exists since this means + that some other package has opened the object and may be modifying + it. I haven't decided if it's useful to ever change Entry1 + directly (except of course within the H5G layer itself). + + * Shadow1 is created when Dataset1 `opens' the object through + Entry1. Dataset1 is given a pointer to Shadow1 and Shadow1's + reference count is incremented. + + * When Dataset1 `closes' the Object the Shadow1 reference count is + decremented. When the reference count reaches zero, if the + Shadow1 dirty bit is set, then Shadow1's contents are copied to + Entry1, and the Entry1 dirty bit is set. Shadow1 is then deleted + if its reference count is zero. This may require reading SymNode1 + back into the H5AC layer. + +What happens when another Dataset opens the Object through Entry1? + + If the current state is represented by the top part of Figure 2, + then Dataset2 will be given a pointer to Shadow1 and the Shadow1 + reference count will be incremented to two. The Object header link + count remains at one so Object Header messages continue to be cached + by Shadow1. 
Dataset1 and Dataset2 benefit from one another's + actions. The resulting state is represented by Figure 2. + + _____ + SymNode1 Object / \ + +--------+ _____\ +--------+ | + | | / / | Header | | + +--------+ +---------+ +--------+ | + | Entry1 | | Shadow1 | /____ \_____/ + +--------+ +---------+ \ \ + : : _ \ + +--------+ |\ +----------+ + \ | Dataset1 | + \________ +----------+ + \ \ + +----------+ | + | Dataset2 | |- New Dataset + +----------+ | + / + FIGURE 2 + + +What happens when the link count for Object increases while Dataset +has the Object open? + + SymNode2 + +--------+ + SymNode1 Object | | + +--------+ ____\ +--------+ /______ +--------+ + | | / / | header | \ `| Entry2 | + +--------+ +---------+ +--------+ +--------+ + | Entry1 | | Shadow1 | /____ : : + +--------+ +---------+ \ \ +--------+ + : : \ + +--------+ +----------+ \________________/ + | Dataset1 | | + +----------+ New Link + + FIGURE 3 + + The current state is represented by the left part of Figure 3. To + create a new link the Object Header had to be located by traversing + through Entry1/Shadow1. On the way through, the Entry1/Shadow1 + cache is invalidated and the Object Header link count is + incremented. Entry2 is then added to SymNode2. + + Since the Object Header link count is greater than one, Object + header data will not be cached in Entry1/Shadow1. + + If the initial state had been all of Figure 3 and a third link is + being added and Object is open through Entry1 and Entry2, then creation + of the third link will invalidate the cache in Entry1 or Entry2. It + doesn't matter which since both caches are already invalidated + anyway. + +What happens if another Dataset opens the same object by another name? + + If the current state is represented by Figure 3, then a Shadow2 is + created and associated with Entry2. However, since the Object + Header link count is more than one, nothing gets cached in Shadow2 + (or Shadow1). + +What happens if the link count decreases?
+ + If the current state is represented by all of Figure 3 then it isn't + possible to delete Entry1 because the object is currently open + through that entry. Therefore, the link count must have + decreased because Entry2 was removed. + + As Dataset1 reads/writes messages in the Object header they will + begin to be cached in Shadow1 again because the Object header link + count is one. + +What happens if the object is removed while it's open? + + That operation is not allowed. + +What happens if the directory containing the object is deleted? + + That operation is not allowed since deleting the directory requires + that the directory be empty. The directory cannot be emptied + because the open object cannot be removed from the directory. + +What happens if the object is moved? + + Moving an object is a process consisting of creating a new + hard-link with the new name and then deleting the old name. + This will fail if the object is open. + +What happens if the directory containing the entry is moved? + + The entry and the shadow still exist and are associated with one + another. + +What if a file is flushed or closed when objects are open? + + Flushing a symbol table with open objects writes correct information + to the file since Shadow is copied to Entry before the table is + flushed. + + Closing a file with open objects will create a valid file but will + return failure. + +How is the Shadow associated with the Entry? + + A symbol table is composed of one or more symbol nodes. A node is a + small 1-d array of symbol table entries. The entries can move + around within a node and from node-to-node as entries are added or + removed from the symbol table and nodes can move around within a + symbol table, being created and destroyed as necessary. 
+ + Since a symbol table has an object header with a unique and constant + file offset, and since H5G contains code to efficiently locate a + symbol table entry given its name, we use these two values (the + symbol table header address and the entry name) as a key + within a shadow to associate the shadow with the symbol table + entry. + + struct H5G_shadow_t { + haddr_t stab_addr; /*symbol table header address*/ + char *name; /*entry name wrt symbol table*/ + hbool_t dirty; /*out-of-date wrt stab entry?*/ + H5G_entry_t ent; /*my copy of stab entry */ + H5G_entry_t *main; /*the level 1 entry or null */ + H5G_shadow_t *next, *prev; /*other shadows for this stab*/ + }; + + The set of shadows will be organized in a hash table of linked + lists. Each linked list will contain the shadows associated with a + particular symbol table header address and the list will be sorted + lexicographically. + + Also, each Entry will have a pointer to the corresponding Shadow or + null if there is no shadow. + + When a symbol table node is loaded into the main cache, we look up + the linked list of shadows in the shadow hash table based on the + address of the symbol table object header. We then traverse that + list matching shadows with symbol table entries. + + We assume that opening/closing objects will be a relatively + infrequent event compared with loading/flushing symbol table + nodes. Therefore, if we keep the linked list of shadows sorted it + costs O(N) to open and close objects where N is the number of open + objects in that symbol table (instead of O(1)) but it costs only + O(N) to load a symbol table node (instead of O(N^2)). + +What about the root symbol entry? + + Level 1 storage for the root symbol entry is always available since + it's stored in the hdf5_file_t struct instead of a symbol table + node. However, the contents of that entry can move from the file + handle to a symbol table node by H5G_mkroot().
Therefore, if the + root object is opened, we keep a shadow entry for it whose + `stab_addr' field is zero and whose `name' is null. + + For this reason, the root object should always be read through the + H5G interface. + +One more key invariant: The H5O_STAB message in a symbol table header +never changes. This allows each symbol table entry to cache the H5O_STAB +message for the symbol table to which it points without worrying about +whether the cache will ever be invalidated. + +</pre> + +</body> +</html> diff --git a/doc/html/TechNotes/Version.html b/doc/html/TechNotes/Version.html new file mode 100644 index 0000000..0e0853b --- /dev/null +++ b/doc/html/TechNotes/Version.html @@ -0,0 +1,137 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Version Numbers</title> + </head> + + <body> + <h1>HDF5 Release Version Numbers</h1> + + <h2>1. Introduction</h2> + + <p>The HDF5 version number is a set of three integer values + written as either <code>hdf5-1.2.3</code> or <code>hdf5 version + 1.2 release 3</code>. + + <p>The <code>5</code> is part of the library name and will only + change if the entire file format and library are redesigned, + similar in scope to the changes between HDF4 and HDF5. + + <p>The <code>1</code> is the <em>major version number</em> and + changes when there is an extensive change to the file format or + library API. Such a change will likely require files to be + translated and applications to be modified. This number is not + expected to change frequently. + + <p>The <code>2</code> is the <em>minor version number</em> and is + incremented by each public release that presents new features. + Even numbers are reserved for stable public versions of the + library while odd numbers are reserved for development + versions. See the diagram below for examples. + + <p>The <code>3</code> is the <em>release number</em>.
For public + versions of the library, the release number is incremented each + time a bug is fixed and the fix is made available to the public. + For development versions, the release number is incremented more + often (perhaps almost daily). + + <h2>2. Abbreviated Versions</h2> + + <p>It's often convenient to drop the release number when referring + to a version of the library, like saying version 1.2 of HDF5. + The release number can be any value in this case. + + <h2>3. Special Versions</h2> + + <p>Version 1.0.0 was released for alpha testing the first week of + March, 1998. The development version number was incremented to + 1.0.1 and remained constant until the last week of April, + when the release number started to increase and development + versions were made available to people outside the core HDF5 + development team. + + <p>Version 1.0.23 was released mid-July as a second alpha + version. + + <p>Version 1.1.0 will be the first official beta release but the + 1.1 branch will also serve as a development branch since we're + not concerned about providing bug fixes separate from normal + development for the beta version. + + <p>After the beta release we rolled back the version number so the + first release is version 1.0 and development will continue on + version 1.1. We felt that an initial version of 1.0 was more + important than continuing to increment the pre-release version + numbers. + + <h2>4. Public versus Development</h2> + + <p>The motivation for separate public and development versions is + that the public version will receive only bug fixes while the + development version will receive new features. This also allows + us to release bug fixes expeditiously without waiting for the + development version to reach a stable state. + + <p>Eventually, the development version will near completion and a + new development branch will fork while the original one enters a + feature freeze state.
When the original development branch is + ready for release the minor version number will be incremented + to an even value. + + <p> + <center> + <img alt="Version Example" src="version.gif"> + <br><b>Fig 1: Version Example</b> + </center> + + <h2>5. Version Support from the Library</h2> + + <p>The library provides a set of macros and functions to query and + check version numbers. + + <dl> + <dt><code>H5_VERS_MAJOR</code> + <dt><code>H5_VERS_MINOR</code> + <dt><code>H5_VERS_RELEASE</code> + <dd>These preprocessor constants are defined in the public + include file and determine the version of the include files. + + <br><br> + <dt><code>herr_t H5get_libversion (unsigned *<em>majnum</em>, unsigned + *<em>minnum</em>, unsigned *<em>relnum</em>)</code> + <dd>This function returns through its arguments the version + numbers for the library to which the application is linked. + + <br><br> + <dt><code>void H5check(void)</code> + <dd>This is a macro that verifies that the version number of the + HDF5 include file used to compile the application matches the + version number of the library to which the application is + linked. This check occurs automatically when the first HDF5 + file is created or opened and is important because a mismatch + between the include files and the library is likely to result + in corrupted data and/or segmentation faults. If a mismatch + is detected the library issues an error message on the + standard error stream and aborts with a core dump. + + <br><br> + <dt><code>herr_t H5check_version (unsigned <em>majnum</em>, + unsigned <em>minnum</em>, unsigned <em>relnum</em>)</code> + <dd>This function is called by the <code>H5check()</code> macro + with the include file version constants. The function + compares its arguments to the result returned by + <code>H5get_libversion()</code> and if a mismatch is detected prints + an error message on the standard error stream and aborts. 
+ </dl> + +<hr> +<address><a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a></address> +<br> + +<!-- Created: Wed Apr 22 11:24:40 EDT 1998 --> +<!-- hhmts start --> +Last modified: Fri Oct 30 10:32:50 EST 1998 +<!-- hhmts end --> + + </body> +</html> |
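The header-versus-library check described in section 5 of Version.html can be sketched without linking against HDF5. The sketch below is an assumption-laden illustration, not the library's code: the `DEMO_*` constants and `demo_*` functions are hypothetical stand-ins for `H5_VERS_MAJOR`, `H5_VERS_MINOR`, `H5_VERS_RELEASE`, `H5get_libversion()`, `H5check_version()` and `H5check()`, and the mismatch case returns -1 where the note says the real function aborts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the H5_VERS_* constants that the public
 * include file would define at compile time. */
#define DEMO_VERS_MAJOR   1u
#define DEMO_VERS_MINOR   2u
#define DEMO_VERS_RELEASE 3u

/* Stand-in for H5get_libversion(): reports the version of the "linked
 * library" through its arguments. The values are hard-coded here. */
static int demo_get_libversion(unsigned *majnum, unsigned *minnum,
                               unsigned *relnum)
{
    assert(majnum != NULL && minnum != NULL && relnum != NULL);
    *majnum = 1u;
    *minnum = 2u;
    *relnum = 3u;
    return 0;
}

/* Stand-in for H5check_version(): compares the caller's (header)
 * version with the library version. Returns 0 on a match and -1 on a
 * mismatch after printing a message on the standard error stream. */
static int demo_check_version(unsigned majnum, unsigned minnum,
                              unsigned relnum)
{
    unsigned lmaj, lmin, lrel;

    if (demo_get_libversion(&lmaj, &lmin, &lrel) < 0)
        return -1;
    if (majnum != lmaj || minnum != lmin || relnum != lrel) {
        fprintf(stderr,
                "version mismatch: headers %u.%u.%u, library %u.%u.%u\n",
                majnum, minnum, relnum, lmaj, lmin, lrel);
        return -1;
    }
    return 0;
}

/* Stand-in for the H5check() macro: passes the compile-time constants
 * to the run-time check. */
#define DEMO_CHECK() \
    demo_check_version(DEMO_VERS_MAJOR, DEMO_VERS_MINOR, DEMO_VERS_RELEASE)
```

Passing the constants through a macro is what ties the check to the headers the application was compiled with, while the comparison itself runs inside the library that was actually linked.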