Diffstat (limited to 'doc/html/compat.html')
-rw-r--r-- | doc/html/compat.html | 271 |
1 files changed, 271 insertions, 0 deletions
diff --git a/doc/html/compat.html b/doc/html/compat.html new file mode 100644 index 0000000..2992476 --- /dev/null +++ b/doc/html/compat.html @@ -0,0 +1,271 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <title>Backward/Forward Compatibility</title> + </head> + + <body> + <h1>Backward/Forward Compatibility</h1> + + <p>The HDF5 development must proceed in such a manner as to + satisfy the following conditions: + + <ol type=A> + <li>HDF5 applications can produce data that HDF5 + applications can read and write and HDF4 applications can produce + data that HDF4 applications can read and write. The situation + that demands this condition is obvious.</li> + + <li>HDF5 applications are able to produce data that HDF4 applications + can read and HDF4 applications can subsequently modify the + file subject to certain constraints depending on the + implementation. This condition is for the temporary + situation where a consumer has neither been relinked with a new + HDF4 API built on top of the HDF5 API nor recompiled with the + HDF5 API.</li> + + <li>HDF5 applications can read existing HDF4 files and subsequently + modify the file subject to certain constraints depending on + the implementation. This condition is for the temporary + situation in which the producer has neither been relinked with a + new HDF4 API built on top of the HDF5 API nor recompiled with + the HDF5 API, or the permanent situation of HDF5 consumers + reading archived HDF4 files.</li> + </ol> + + <p>There's at least one invariant: new object features introduced + in the HDF5 file format (like 2-d arrays of structs) might be + impossible to "translate" to a format that an old HDF4 + application can understand because either the HDF4 file format + or the HDF4 API has no mechanism to describe the object. + + <p>What follows is one possible implementation based on how + Condition B was solved in the AIO/PDB world.
It also attempts + to satisfy these goals: + + <ol type=1> + <li>The main HDF5 library contains as little extra baggage as + possible by either relying on external programs to take care + of compatibility issues or by incorporating the logic of such + programs as optional modules in the HDF5 library. Conditions B + and C are separate programs/modules.</li> + + <li>No extra baggage not only means the library proper is small, + but also means it can be implemented (rather than migrated + from HDF4 source) from the ground up with minimal regard for + HDF4, thus keeping the logic straightforward.</li> + + <li>Compatibility issues are handled behind the scenes when + necessary (and possible) but can be carried out explicitly + during things like data migration.</li> + </ol> + + <hr> + <h2>Wrappers</h2> + + <p>The proposed implementation uses <i>wrappers</i> to handle + compatibility issues. A Format-X file is <i>wrapped</i> in a + Format-Y file by creating a Format-Y skeleton that replicates + the Format-X meta data. The Format-Y skeleton points to the raw + data stored in Format-X without moving the raw data. The + restriction is that raw data storage methods in Format-Y are a + superset of raw data storage methods in Format-X (otherwise the + raw data must be copied to Format-Y). We're assuming that meta + data is small relative to the entire file. + + <p>The wrapper can be a separate file that has pointers into the + first file or it can be contained within the first file. If + contained in a single file, the file can appear as a Format-Y + file or simultaneously a Format-Y and Format-X file. + + <p>The Format-X meta-data can be thought of as the original + wrapper around raw data and Format-Y is a second wrapper around + the same data. The wrappers are independent of one another; + modifying the meta-data in one wrapper causes the other to + become out of date.
Modification of raw data doesn't invalidate + either view as long as the meta data that describes its storage + isn't modified. For instance, an array element can change values + if storage is already allocated for the element, but if storage + isn't allocated then the meta data describing the storage must + change, invalidating all wrappers but one. + + <p>It's perfectly legal to modify the meta data of one wrapper + without modifying the meta data in the other wrapper(s). The + illegal part is accessing the raw data through a wrapper which + is out of date. + + <p>If raw data is wrapped by more than one internal wrapper + (<i>internal</i> means that the wrapper is in the same file as + the raw data) then access to that file must assume that + unreferenced parts of that file contain meta data for another + wrapper and cannot be reclaimed as free space. + + <hr> + <h2>Implementation of Condition B</h2> + + <p>Since this is a temporary situation which can't be + automatically detected by the HDF5 library, we must rely + on the application to notify the HDF5 library whether or not it + must satisfy Condition B. (Even if we don't rely on the + application, at some point someone is going to remove the + Condition B constraint from the library.) So the module that + handles Condition B is conditionally compiled and then enabled + on a per-file basis. + + <p>If the application desires to produce an HDF4 file (determined + by arguments to <code>H5Fopen</code>), and the Condition B + module is compiled into the library, then <code>H5Fclose</code> + calls the module to traverse the HDF5 wrapper and generate an + additional internal or external HDF4 wrapper (wrapper specifics + are described below). If Condition B is implemented as a module + then it can benefit from the metadata already cached by the main + library. + + <p>An internal HDF4 wrapper would be used if the HDF5 file is + writable and the user doesn't mind that the HDF5 file is + modified.
An external wrapper would be used if the file isn't + writable or if the user wants the data file to be primarily HDF5 + but a few applications need an HDF4 view of the data. + + <p>Modifying through the HDF5 library an HDF5 file that has an + internal HDF4 wrapper should invalidate the HDF4 wrapper (and + optionally regenerate it when <code>H5Fclose</code> is + called). The HDF5 library must understand how wrappers work, but + not necessarily anything about the HDF4 file format. + + <p>Modifying through the HDF5 library an HDF5 file that has an + external HDF4 wrapper will cause the HDF4 wrapper to become out + of date (but it may be regenerated during <code>H5Fclose</code>). + <b>Note: Perhaps the next release of the HDF4 library should + ensure that the HDF4 wrapper file has a more recent modification + time than the raw data file (the HDF5 file) to which it + points(?)</b> + + <p>Modifying through the HDF4 library an HDF5 file that has an + internal or external HDF4 wrapper will cause the HDF5 wrapper to + become out of date. However, there is no way for the old HDF4 + library to notify the HDF5 wrapper that it's out of date. + Therefore the HDF5 library must be able to detect when the HDF5 + wrapper is out of date and be able to fix it. If the HDF4 + wrapper is complete then the easy way is to ignore the original + HDF5 wrapper and generate a new one from the HDF4 wrapper. The + other approach is to compare the HDF4 and HDF5 wrappers and + assume that if they differ HDF4 is the right one, if HDF4 omits + data then it was because HDF4 is a partial wrapper (rather than + assume HDF4 deleted the data), and if HDF4 has new data then + copy the new meta data to the HDF5 wrapper. On the other hand, + perhaps we don't need to allow these situations (modifying an + HDF5 file with the old HDF4 library and then accessing it with + the HDF5 library is either disallowed or causes HDF5 objects + that can't be described by HDF4 to be lost).
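<p>The modification-time idea in the note above can be sketched as follows. This is only an illustration, written in Python for brevity rather than as part of either library. The function name <code>wrapper_is_current</code> and the file paths are hypothetical, and the sketch assumes an external wrapper whose writer always leaves it with a later modification time than the raw data file it points to.

```python
import os


def wrapper_is_current(wrapper_path, data_path):
    """Return True if the external wrapper file is newer than the data file.

    Assumption (taken from the note above): whoever writes the wrapper makes
    sure its modification time is strictly later than that of the raw data
    file. Equal timestamps are therefore treated as out of date.
    """
    wrapper_mtime = os.stat(wrapper_path).st_mtime_ns
    data_mtime = os.stat(data_path).st_mtime_ns
    return wrapper_mtime > data_mtime
```

<p>A library could run a check like this when it opens the file and regenerate the wrapper when the check fails. Timestamp granularity differs between file systems, which is why the comparison is strict.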
+ + <p>To convert an HDF5 file to an HDF4 file on demand, one simply + opens the file with the HDF4 flag and closes it. This is also + how AIO implemented backward compatibility with PDB in its file + format. + + <hr> + <h2>Implementation of Condition C</h2> + + <p>This condition must be satisfied for all time because there + will always be archived HDF4 files. If a pure HDF4 file (that + is, one without HDF5 meta data) is opened with an HDF5 library, + then <code>H5Fopen</code> builds an internal or external HDF5 + wrapper and then accesses the raw data through that wrapper. If + the HDF5 library modifies the file then the HDF4 wrapper becomes + out of date. However, since the HDF5 library hasn't been + released, we can at least implement it to disable and/or reclaim + the HDF4 wrapper. + + <p>If an external and temporary HDF5 wrapper is desired, the + wrapper is created through the cache like all other HDF5 files. + The data appears on disk only if a particular cached datum is + preempted. Instead of calling <code>H5Fclose</code> on the HDF5 + wrapper file we call <code>H5Fabort</code> which immediately + releases all file resources without updating the file, and then + we unlink the file from Unix. + + <hr> + <h2>What do wrappers look like?</h2> + + <p>External wrappers are quite obvious: they contain only things + from the format specs for the wrapper and nothing from the + format specs of the format which they wrap. + + <p>An internal HDF4 wrapper is added to an HDF5 file in such a way + that the file appears to be both an HDF4 file and an HDF5 + file. HDF4 requires an HDF4 file header at file offset zero. If + a user block is present then we just move the user block down a + bit (and truncate it) and insert the minimum HDF4 signature. + The HDF4 <code>dd</code> list and any other data it needs are + appended to the end of the file and the HDF5 signature uses the + logical file length field to determine the beginning of the + trailing part of the wrapper.
+ + <p> + <center> + <table border width="60%"> + <tr> + <td>HDF4 minimal file header. Its main job is to point to + the <code>dd</code> list at the end of the file.</td> + </tr> + <tr> + <td>User-defined block which is truncated by the size of the + HDF4 file header so that the HDF5 boot block file address + doesn't change.</td> + </tr> + <tr> + <td>The HDF5 boot block and data, unmodified by adding the + HDF4 wrapper.</td> + </tr> + <tr> + <td>The main part of the HDF4 wrapper. The <code>dd</code> + list will have entries for all parts of the file so + hdpack(?) doesn't (re)move anything.</td> + </tr> + </table> + </center> + + <p>When such a file is opened by the HDF5 library for + modification it shifts the user block back down to address zero + and fills with zeros, then truncates the file at the end of the + HDF5 data or adds the trailing HDF4 wrapper to the free + list. This prevents HDF4 applications from reading the file with + an out of date wrapper. + + <p>If there is no user block then we have a problem. The HDF5 + boot block must be moved to make room for the HDF4 file header. + But moving just the boot block causes problems because all file + addresses stored in the file are relative to the boot block + address. The only option is to shift the entire file contents + by 512 bytes to open up a user block (too bad we don't have + hooks into the Unix i-node stuff so we could shift the entire + file contents by the size of a file system page without ever + performing I/O on the file :-) + + <p>Is it possible to place an HDF5 wrapper in an HDF4 file? I + don't know enough about the HDF4 format, but I would suspect it + might be possible to open a hole at file address 512 (and + possibly before) by moving some things to the end of the file + to make room for the HDF5 signature. The remainder of the HDF5 + wrapper goes at the end of the file and entries are added to the + HDF4 <code>dd</code> list to mark the location(s) of the HDF5 + wrapper. 
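<p>The layout of an internal HDF4 wrapper inside an HDF5 file, as described in the table and the 512-byte shift above, can be sketched as simple offset arithmetic. This is a hedged illustration in Python, not library code. The function name, the region names, and all four size parameters are hypothetical, and the actual size of the minimal HDF4 header is left as an input rather than assumed.

```python
SHIFT_BYTES = 512  # size of the user block opened up when none exists


def internal_hdf4_wrapper_layout(user_block_size, hdf5_size,
                                 hdf4_header_size, hdf4_trailer_size):
    """Return (shift, regions) for an HDF5 file with an internal HDF4 wrapper.

    shift   -- how many bytes the HDF5 contents must move (0 or SHIFT_BYTES)
    regions -- (name, offset, length) tuples in file order
    """
    shift = 0
    if user_block_size == 0:
        # No user block: the whole file is shifted to make room for one.
        shift = SHIFT_BYTES
        user_block_size = SHIFT_BYTES
    if hdf4_header_size > user_block_size:
        raise ValueError("HDF4 header does not fit in the user block")

    # The boot block stays at the end of the (original-size) user block,
    # so addresses relative to the boot block are unchanged.
    boot_block_addr = user_block_size
    trailer_addr = boot_block_addr + hdf5_size
    regions = [
        ("hdf4_header", 0, hdf4_header_size),
        ("user_block", hdf4_header_size, user_block_size - hdf4_header_size),
        ("hdf5_boot_block_and_data", boot_block_addr, hdf5_size),
        ("hdf4_wrapper", trailer_addr, hdf4_trailer_size),
    ]
    return shift, regions
```

<p>The point of the sketch is that the user block absorbs the HDF4 header, so the boot block address changes only when no user block existed in the first place.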
+ + <hr> + <h2>Other Thoughts</h2> + + <p>Conversion programs that copy an entire HDF4 file to a separate, + self-contained HDF5 file and vice versa might be useful. + + + + + <hr> + <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +<!-- Created: Fri Oct 3 11:52:31 EST 1997 --> +<!-- hhmts start --> +Last modified: Wed Oct 8 12:34:42 EST 1997 +<!-- hhmts end --> + </body> +</html>