<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Backward/Forward Compatability</title>
  </head>

  <body>
    <h1>Backward/Forward Compatability</h1>

    <p>The HDF5 development must proceed in such a manner as to
      satisfy the following conditions:

    <ol type=A>
      <li>HDF5 applications can produce data that HDF5
	applications can read and write and HDF4 applications can produce
	data that HDF4 applications can read and write. The situation
	that demands this condition is obvious.</li>

      <li>HDF5 applications are able to produce data that HDF4 applications
	can read and HDF4 applications can subsequently modify the
	file subject to certain constraints depending on the
	implementation. This condition is for the temporary
	situation where a consumer has neither been relinked with a new
	HDF4 API built on top of the HDF5 API nor recompiled with the
	HDF5 API.</li>

      <li>HDF5 applications can read existing HDF4 files and subsequently
	modify the file subject to certain constraints depending on
	the implementation. This is condition is for the temporary
	situation in which the producer has neither been relinked with a
	new HDF4 API built on top of the HDF5 API nor recompiled with
	the HDF5 API, or the permanent situation of HDF5 consumers
	reading archived HDF4 files.</li>
    </ul>

    <p>There's at least one invarient: new object features introduced
      in the HDF5 file format (like 2-d arrays of structs) might be
      impossible to "translate" to a format that an old HDF4
      application can understand either because the HDF4 file format
      or the HDF4 API has no mechanism to describe the object.

    <p>What follows is one possible implementation based on how
      Condition B was solved in the AIO/PDB world.  It also attempts
      to satisfy these goals:

    <ol type=1>
      <li>The main HDF5 library contains as little extra baggage as
	possible by either relying on external programs to take care
	of compatability issues or by incorporating the logic of such
	programs as optional modules in the HDF5 library.  Conditions B
	and C are separate programs/modules.</li>

      <li>No extra baggage not only means the library proper is small,
	but also means it can be implemented (rather than migrated
	from HDF4 source) from the ground up with minimal regard for
	HDF4 thus keeping the logic straight forward.</li>

      <li>Compatability issues are handled behind the scenes when
	necessary (and possible) but can be carried out explicitly
	during things like data migration.</li>
    </ol>

    <hr>
    <h2>Wrappers</h2>

    <p>The proposed implementation uses <i>wrappers</i> to handle
      compatability issues.  A Format-X file is <i>wrapped</i> in a
      Format-Y file by creating a Format-Y skeleton that replicates
      the Format-X meta data.  The Format-Y skeleton points to the raw
      data stored in Format-X without moving the raw data.  The
      restriction is that raw data storage methods in Format-Y is a
      superset of raw data storage methods in Format-X (otherwise the
      raw data must be copied to Format-Y).  We're assuming that meta
      data is small wrt the entire file.

    <p>The wrapper can be a separate file that has pointers into the
      first file or it can be contained within the first file.  If
      contained in a single file, the file can appear as a Format-Y
      file or simultaneously a Format-Y and Format-X file.

    <p>The Format-X meta-data can be thought of as the original
      wrapper around raw data and Format-Y is a second wrapper around
      the same data.  The wrappers are independend of one another;
      modifying the meta-data in one wrapper causes the other to
      become out of date.  Modification of raw data doesn't invalidate
      either view as long as the meta data that describes its storage
      isn't modifed. For instance, an array element can change values
      if storage is already allocated for the element, but if storage
      isn't allocated then the meta data describing the storage must
      change, invalidating all wrappers but one.

    <p>It's perfectly legal to modify the meta data of one wrapper
      without modifying the meta data in the other wrapper(s).  The
      illegal part is accessing the raw data through a wrapper which
      is out of date.

    <p>If raw data is wrapped by more than one internal wrapper
      (<i>internal</i> means that the wrapper is in the same file as
      the raw data) then access to that file must assume that
      unreferenced parts of that file contain meta data for another
      wrapper and cannot be reclaimed as free memory.

    <hr>
    <h2>Implementation of Condition B</h2>

    <p>Since this is a temporary situation which can't be
      automatically detected by the HDF5 library, we must rely
      on the application to notify the HDF5 library whether or not it
      must satisfy Condition B. (Even if we don't rely on the
      application, at some point someone is going to remove the
      Condition B constraint from the library.)  So the module that
      handles Condition B is conditionally compiled and then enabled
      on a per-file basis.

    <p>If the application desires to produce an HDF4 file (determined
      by arguments to <code>H5Fopen</code>), and the Condition B
      module is compiled into the library, then <code>H5Fclose</code>
      calls the module to traverse the HDF5 wrapper and generate an
      additional internal or external HDF4 wrapper (wrapper specifics
      are described below).  If Condition B is implemented as a module
      then it can benefit from the metadata already cached by the main
      library.

    <p>An internal HDF4 wrapper would be used if the HDF5 file is
      writable and the user doesn't mind that the HDF5 file is
      modified.  An external wrapper would be used if the file isn't
      writable or if the user wants the data file to be primarily HDF5
      but a few applications need an HDF4 view of the data.

    <p>Modifying through the HDF5 library an HDF5 file that has
      internal HDF4 wrapper should invalidate the HDF4 wrapper (and
      optionally regenerate it when <code>H5Fclose</code> is
      called). The HDF5 library must understand how wrappers work, but
      not necessarily anything about the HDF4 file format.

    <p>Modifying through the HDF5 library an HDF5 file that has an
      external HDF4 wrapper will cause the HDF4 wrapper to become out
      of date (but possibly regenerated during <code>H5Fclose</code>).
      <b>Note:  Perhaps the next release of the HDF4 library should
      insure that the HDF4 wrapper file has a more recent modification
      time than the raw data file (the HDF5 file) to which it
      points(?)</b>

    <p>Modifying through the HDF4 library an HDF5 file that has an
      internal or external HDF4 wrapper will cause the HDF5 wrapper to
      become out of date. However, there is now way for the old HDF4
      library to notify the HDF5 wrapper that it's out of date.
      Therefore the HDF5 library must be able to detect when the HDF5
      wrapper is out of date and be able to fix it. If the HDF4
      wrapper is complete then the easy way is to ignore the original
      HDF5 wrapper and generate a new one from the HDF4 wrapper. The
      other approach is to compare the HDF4 and HDF5 wrappers and
      assume that if they differ HDF4 is the right one, if HDF4 omits
      data then it was because HDF4 is a partial wrapper (rather than
      assume HDF4 deleted the data), and if HDF4 has new data then
      copy the new meta data to the HDF5 wrapper. On the other hand,
      perhaps we don't need to allow these situations (modifying an
      HDF5 file with the old HDF4 library and then accessing it with
      the HDF5 library is either disallowed or causes HDF5 objects
      that can't be described by HDF4 to be lost).

    <p>To convert an HDF5 file to an HDF4 file on demand, one simply
      opens the file with the HDF4 flag and closes it. This is also
      how AIO implemented backward compatability with PDB in its file
      format.

    <hr>
    <h2>Implementation of Condition C</h2>

    <p>This condition must be satisfied for all time because there
      will always be archived HDF4 files. If a pure HDF4 file (that
      is, one without HDF5 meta data) is opened with an HDF5 library,
      the <code>H5Fopen</code> builds an internal or external HDF5
      wrapper and then accesses the raw data through that wrapper. If
      the HDF5 library modifies the file then the HDF4 wrapper becomes
      out of date.  However, since the HDF5 library hasn't been
      released, we can at least implement it to disable and/or reclaim
      the HDF4 wrapper.

    <p>If an external and temporary HDF5 wrapper is desired, the
      wrapper is created through the cache like all other HDF5 files.
      The data appears on disk only if a particular cached datum is
      preempted. Instead of calling <code>H5Fclose</code> on the HDF5
      wrapper file we call <code>H5Fabort</code> which immediately
      releases all file resources without updating the file, and then
      we unlink the file from Unix.

    <hr>
    <h2>What do wrappers look like?</h2>

    <p>External wrappers are quite obvious: they contain only things
      from the format specs for the wrapper and nothing from the
      format specs of the format which they wrap.

    <p>An internal HDF4 wrapper is added to an HDF5 file in such a way
      that the file appears to be both an HDF4 file and an HDF5
      file. HDF4 requires an HDF4 file header at file offset zero. If
      a user block is present then we just move the user block down a
      bit (and truncate it) and insert the minimum HDF4 signature.
      The HDF4 <code>dd</code> list and any other data it needs are
      appended to the end of the file and the HDF5 signature uses the
      logical file length field to determine the beginning of the
      trailing part of the wrapper.

    <p>
      <center>
	<table border width="60%">
	  <tr>
	    <td>HDF4 minimal file header. Its main job is to point to
	      the <code>dd</code> list at the end of the file.</td>
	  </tr>
	  <tr>
	    <td>User-defined block which is truncated by the size of the
	      HDF4 file header so that the HDF5 super block file address
	      doesn't change.</td>
	  </tr>
	  <tr>
	    <td>The HDF5 super block and data, unmodified by adding the
	      HDF4 wrapper.</td>
	  </tr>
	  <tr>
	    <td>The main part of the HDF4 wrapper.  The <code>dd</code>
	      list will have entries for all parts of the file so
	      hdpack(?) doesn't (re)move anything.</td>
	  </tr>
	</table>
      </center>
    
    <p>When such a file is opened by the HDF5 library for
      modification it shifts the user block back down to address zero
      and fills with zeros, then truncates the file at the end of the
      HDF5 data or adds the trailing HDF4 wrapper to the free
      list. This prevents HDF4 applications from reading the file with
      an out of date wrapper.

    <p>If there is no user block then we have a problem.  The HDF5
      super block must be moved to make room for the HDF4 file header.
      But moving just the super block causes problems because all file
      addresses stored in the file are relative to the super block
      address.  The only option is to shift the entire file contents
      by 512 bytes to open up a user block (too bad we don't have
      hooks into the Unix i-node stuff so we could shift the entire
      file contents by the size of a file system page without ever
      performing I/O on the file :-)

    <p>Is it possible to place an HDF5 wrapper in an HDF4 file?  I
      don't know enough about the HDF4 format, but I would suspect it
      might be possible to open a hole at file address 512 (and
      possibly before) by moving some things to the end of the file
      to make room for the HDF5 signature.  The remainder of the HDF5
      wrapper goes at the end of the file and entries are added to the
      HDF4 <code>dd</code> list to mark the location(s) of the HDF5
      wrapper.

    <hr>
    <h2>Other Thoughts</h2>

    <p>Conversion programs that copy an entire HDF4 file to a separate,
      self-contained HDF5 file and vice versa might be useful.




    <hr>
    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Oct  3 11:52:31 EST 1997 -->
<!-- hhmts start -->
Last modified: Wed Oct  8 12:34:42 EST 1997
<!-- hhmts end -->
  </body>
</html>