The HDF5 development must proceed in such a manner as to satisfy the following conditions:
There's at least one invariant: new object features introduced in the HDF5 file format (like 2-d arrays of structs) might be impossible to "translate" to a format that an old HDF4 application can understand, because either the HDF4 file format or the HDF4 API has no mechanism to describe the object.
What follows is one possible implementation based on how Condition B was solved in the AIO/PDB world. It also attempts to satisfy these goals:
The proposed implementation uses wrappers to handle compatibility issues. A Format-X file is wrapped in a Format-Y file by creating a Format-Y skeleton that replicates the Format-X meta data. The Format-Y skeleton points to the raw data stored in Format-X without moving the raw data. The restriction is that the raw data storage methods in Format-Y must be a superset of the raw data storage methods in Format-X (otherwise the raw data must be copied to Format-Y). We're assuming that meta data is small relative to the entire file.
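As a rough illustration of the skeleton idea, the following sketch builds a Format-Y view that replicates Format-X meta data and points at the same bytes. The names Extent, wrap, x_meta and the (offset, length) representation are assumptions made for this example; neither format defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Where one object's raw data lives: file name, byte offset, length."""
    file: str
    offset: int
    length: int


def wrap(format_x_meta, data_file):
    """Build a Format-Y skeleton from Format-X meta data.

    Each Format-X entry is replicated as an Extent that points at the
    same bytes in data_file, so no raw data is copied or moved.
    """
    return {
        name: Extent(data_file, offset, length)
        for name, (offset, length) in format_x_meta.items()
    }


# Hypothetical Format-X meta data: object name -> (offset, length).
x_meta = {"temperature": (512, 4096), "pressure": (4608, 4096)}
y_skeleton = wrap(x_meta, "data.x")
```

The skeleton holds only small records, which matches the assumption that meta data is small relative to the whole file.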
The wrapper can be a separate file that has pointers into the first file or it can be contained within the first file. If contained in a single file, the file can appear as a Format-Y file or simultaneously a Format-Y and Format-X file.
The Format-X meta-data can be thought of as the original wrapper around raw data and Format-Y is a second wrapper around the same data. The wrappers are independent of one another; modifying the meta-data in one wrapper causes the other to become out of date. Modification of raw data doesn't invalidate either view as long as the meta data that describes its storage isn't modified. For instance, an array element can change values if storage is already allocated for the element, but if storage isn't allocated then the meta data describing the storage must change, invalidating all wrappers but one.
It's perfectly legal to modify the meta data of one wrapper without modifying the meta data in the other wrapper(s). The illegal part is accessing the raw data through a wrapper which is out of date.
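The validity rules above can be modeled in a few lines. This is only an in-memory sketch: the classes WrappedData, Wrapper and StaleWrapperError and the explicit stale flag are assumptions for illustration, and real HDF4 and HDF5 libraries share no such bookkeeping.

```python
class StaleWrapperError(Exception):
    """Raised when raw data is accessed through an out-of-date wrapper."""


class Wrapper:
    def __init__(self, name, meta):
        self.name = name
        self.meta = dict(meta)  # object name -> (offset, length)
        self.stale = False


class WrappedData:
    def __init__(self, raw, meta, wrapper_names):
        self.raw = bytearray(raw)
        self.wrappers = {n: Wrapper(n, meta) for n in wrapper_names}

    def _current(self, via):
        wrapper = self.wrappers[via]
        if wrapper.stale:
            raise StaleWrapperError(via)
        return wrapper

    def read(self, via, obj):
        offset, length = self._current(via).meta[obj]
        return bytes(self.raw[offset:offset + length])

    def overwrite(self, via, obj, data):
        """Change values in already-allocated storage: no meta data changes."""
        offset, length = self._current(via).meta[obj]
        if len(data) != length:
            raise ValueError("overwrite must not change the storage size")
        self.raw[offset:offset + length] = data

    def allocate(self, via, obj, data):
        """Allocate new storage: meta data changes in one wrapper only."""
        wrapper = self._current(via)
        wrapper.meta[obj] = (len(self.raw), len(data))
        self.raw += data
        for other in self.wrappers.values():
            if other is not wrapper:
                other.stale = True  # every other view is now out of date
```

Overwriting leaves both views usable, while allocating through one wrapper makes any later read through the other raise StaleWrapperError.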
If raw data is wrapped by more than one internal wrapper (internal means that the wrapper is in the same file as the raw data) then access to that file must assume that unreferenced parts of that file contain meta data for another wrapper and cannot be reclaimed as free space.
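A minimal sketch of that rule, assuming the referenced parts of the file are known as (offset, length) pairs; the function name reclaimable and its parameters are hypothetical.

```python
def reclaimable(file_length, referenced, internal_wrappers):
    """Return the (offset, length) gaps that may be reused as free space.

    referenced lists the (offset, length) extents that the current
    wrapper knows about. With more than one internal wrapper, any gap
    may hold another wrapper's meta data, so nothing is reclaimable.
    """
    if internal_wrappers > 1:
        return []
    gaps, pos = [], 0
    for offset, length in sorted(referenced):
        if offset > pos:
            gaps.append((pos, offset - pos))
        pos = max(pos, offset + length)
    if pos < file_length:
        gaps.append((pos, file_length - pos))
    return gaps
```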
Since this is a temporary situation which can't be automatically detected by the HDF5 library, we must rely on the application to notify the HDF5 library whether or not it must satisfy Condition B. (Even if we don't rely on the application, at some point someone is going to remove the Condition B constraint from the library.) So the module that handles Condition B is conditionally compiled and then enabled on a per-file basis.
If the application desires to produce an HDF4 file (determined by arguments to H5Fopen), and the Condition B module is compiled into the library, then H5Fclose calls the module to traverse the HDF5 wrapper and generate an additional internal or external HDF4 wrapper (wrapper specifics are described below). If Condition B is implemented as a module then it can benefit from the metadata already cached by the main library.
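The control flow can be sketched as follows. This is a Python model, not the HDF5 C API: CONDITION_B_COMPILED stands in for conditional compilation, and the File class, the produce_hdf4 argument and the HDF4_KINDS set are assumptions made for the example. Objects that HDF4 cannot describe are simply left out of the generated wrapper.

```python
CONDITION_B_COMPILED = True  # stands in for conditional compilation

# Hypothetical set of object kinds that HDF4 is able to describe.
HDF4_KINDS = {"array", "attribute"}


def generate_hdf4_wrapper(hdf5_meta):
    """Traverse the HDF5 meta data and keep what HDF4 can express."""
    return {
        name: entry
        for name, entry in hdf5_meta.items()
        if entry["kind"] in HDF4_KINDS
    }


class File:
    def __init__(self, meta, produce_hdf4=False):
        self.meta = dict(meta)
        self.produce_hdf4 = produce_hdf4  # requested per file at open time
        self.hdf4_wrapper = None

    def close(self):
        """Generate the HDF4 wrapper only if requested and compiled in."""
        if self.produce_hdf4 and CONDITION_B_COMPILED:
            self.hdf4_wrapper = generate_hdf4_wrapper(self.meta)
        return self.hdf4_wrapper
```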
An internal HDF4 wrapper would be used if the HDF5 file is writable and the user doesn't mind that the HDF5 file is modified. An external wrapper would be used if the file isn't writable or if the user wants the data file to be primarily HDF5 but a few applications need an HDF4 view of the data.
Modifying through the HDF5 library an HDF5 file that has an internal HDF4 wrapper should invalidate the HDF4 wrapper (and optionally regenerate it when H5Fclose is called). The HDF5 library must understand how wrappers work, but not necessarily anything about the HDF4 file format.
Modifying through the HDF5 library an HDF5 file that has an external HDF4 wrapper will cause the HDF4 wrapper to become out of date (but possibly regenerated during H5Fclose).
Note: Perhaps the next release of the HDF4 library should ensure that the HDF4 wrapper file has a more recent modification time than the raw data file (the HDF5 file) to which it points(?)
Modifying through the HDF4 library an HDF5 file that has an internal or external HDF4 wrapper will cause the HDF5 wrapper to become out of date. However, there is no way for the old HDF4 library to notify the HDF5 wrapper that it's out of date. Therefore the HDF5 library must be able to detect when the HDF5 wrapper is out of date and be able to fix it. If the HDF4 wrapper is complete then the easy way is to ignore the original HDF5 wrapper and generate a new one from the HDF4 wrapper. The other approach is to compare the HDF4 and HDF5 wrappers and apply three rules: if they differ, assume HDF4 is the right one; if HDF4 omits data, assume it is because HDF4 is a partial wrapper (rather than assume HDF4 deleted the data); and if HDF4 has new data, copy the new meta data to the HDF5 wrapper. On the other hand, perhaps we don't need to allow these situations (modifying an HDF5 file with the old HDF4 library and then accessing it with the HDF5 library is either disallowed or causes HDF5 objects that can't be described by HDF4 to be lost).
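The compare-and-merge approach can be sketched as below, assuming each wrapper's meta data is reduced to a mapping from object name to a description. The function name reconcile and that representation are illustrative only.

```python
def reconcile(hdf5_meta, hdf4_meta):
    """Bring an out-of-date HDF5 wrapper up to date from the HDF4 wrapper.

    Rules taken from the text:
      * entries that differ: the HDF4 version wins;
      * entries missing from HDF4: kept, because HDF4 is assumed to be
        a partial wrapper rather than to have deleted the object;
      * entries only in HDF4: copied into the HDF5 wrapper.
    """
    merged = dict(hdf5_meta)   # keep objects HDF4 does not mention
    merged.update(hdf4_meta)   # HDF4 wins on conflicts and adds new ones
    return merged
```

Note that under these rules a deletion performed through HDF4 is never propagated, which is the price of treating omissions as a partial wrapper.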
To convert an HDF5 file to an HDF4 file on demand, one simply opens the file with the HDF4 flag and closes it. This is also how AIO implemented backward compatibility with PDB in its file format.
This condition must be satisfied for all time because there will always be archived HDF4 files. If a pure HDF4 file (that is, one without HDF5 meta data) is opened with an HDF5 library, then H5Fopen builds an internal or external HDF5 wrapper and then accesses the raw data through that wrapper. If the HDF5 library modifies the file then the HDF4 wrapper becomes out of date. However, since the HDF5 library hasn't been released, we can at least implement it to disable and/or reclaim the HDF4 wrapper.
If an external and temporary HDF5 wrapper is desired, the wrapper is created through the cache like all other HDF5 files. The data appears on disk only if a particular cached datum is preempted. Instead of calling H5Fclose on the HDF5 wrapper file we call H5Fabort which immediately releases all file resources without updating the file, and then we unlink the file from Unix.
External wrappers are quite obvious: they contain only things from the format specs for the wrapper and nothing from the format specs of the format which they wrap.
An internal HDF4 wrapper is added to an HDF5 file in such a way that the file appears to be both an HDF4 file and an HDF5 file. HDF4 requires an HDF4 file header at file offset zero. If a user block is present then we just move the user block down a bit (and truncate it) and insert the minimum HDF4 signature. The HDF4 dd list and any other data it needs are appended to the end of the file and the HDF5 signature uses the logical file length field to determine the beginning of the trailing part of the wrapper.
| HDF4 minimal file header. Its main job is to point to the dd list at the end of the file. |
| User-defined block, which is truncated by the size of the HDF4 file header so that the HDF5 super block file address doesn't change. |
| The HDF5 super block and data, unmodified by adding the HDF4 wrapper. |
| The main part of the HDF4 wrapper. The dd list will have entries for all parts of the file so hdpack(?) doesn't (re)move anything. |
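The four regions listed above can be turned into byte offsets as in this sketch. All sizes are parameters because the text does not fix them; the function name and the region labels are assumptions, and hdf5_length here means the number of bytes occupied by the HDF5 super block and data.

```python
def internal_wrapper_layout(hdf4_header_size, user_block_size,
                            hdf5_length, trailer_size):
    """Return (label, offset, length) for each region, in file order.

    The user block gives up hdf4_header_size bytes so that the HDF5
    super block stays at its original address, user_block_size.
    """
    if hdf4_header_size > user_block_size:
        raise ValueError("user block too small to hold the HDF4 header")
    superblock = user_block_size
    trailer = superblock + hdf5_length
    return [
        ("HDF4 file header", 0, hdf4_header_size),
        ("truncated user block", hdf4_header_size,
         user_block_size - hdf4_header_size),
        ("HDF5 super block and data", superblock, hdf5_length),
        ("HDF4 dd list and wrapper data", trailer, trailer_size),
    ]
```

The regions are contiguous, and the third one starts at the same address it had before the wrapper was added.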
When such a file is opened by the HDF5 library for modification it shifts the user block back down to address zero and fills with zeros, then truncates the file at the end of the HDF5 data or adds the trailing HDF4 wrapper to the free list. This prevents HDF4 applications from reading the file with an out of date wrapper.
If there is no user block then we have a problem. The HDF5 super block must be moved to make room for the HDF4 file header. But moving just the super block causes problems because all file addresses stored in the file are relative to the super block address. The only option is to shift the entire file contents by 512 bytes to open up a user block. (Too bad we don't have hooks into the Unix i-node stuff so we could shift the entire file contents by the size of a file system page without ever performing I/O on the file :-)
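Shifting the whole file is plain I/O and can be sketched as below, working from the end of the file so that no byte is overwritten before it has been copied. The function name shift_file and the chunk size are assumptions. Since stored addresses are relative to the super block, they need no rewriting after the shift.

```python
import os


def shift_file(path, shift=512, chunk=1 << 16):
    """Move every byte of path `shift` bytes toward the end of the file.

    Blocks are copied last-to-first, so each write lands only on bytes
    that have already been copied. The opened hole is zero-filled.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        pos = size
        while pos > 0:
            n = min(chunk, pos)
            pos -= n
            f.seek(pos)
            block = f.read(n)
            f.seek(pos + shift)
            f.write(block)
        f.seek(0)
        f.write(b"\0" * shift)  # the new, empty user block
```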
Is it possible to place an HDF5 wrapper in an HDF4 file? I don't know enough about the HDF4 format, but I would suspect it might be possible to open a hole at file address 512 (and possibly before) by moving some things to the end of the file to make room for the HDF5 signature. The remainder of the HDF5 wrapper goes at the end of the file and entries are added to the HDF4 dd list to mark the location(s) of the HDF5 wrapper.
Conversion programs that copy an entire HDF4 file to a separate, self-contained HDF5 file and vice versa might be useful.