diff options
author | Gerd Heber <gheber@hdfgroup.org> | 2021-04-26 19:07:29 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-04-26 19:07:29 (GMT) |
commit | 1d680fe04c545d601678d876ad22dea7bfc31edd (patch) | |
tree | f84a97c149df9ccf4409c5bfea04a7316c019e8a /doxygen/examples/H5.format.1.0.html | |
parent | 12082e728de7481144db7ea3cca6f38acf0a7cac (diff) | |
download | hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.zip hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.tar.gz hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.tar.bz2 |
Merge doxygen2 into develop (#553)
* Fixed warnings and started H5Epublic.h.
* Include H5FD* headers to correctly resolve references.
* Doxygen2 (#330)
* H5Eauto_is_v2.
* Added a few more calls.
* Added a few more H5E calls.
* First cut of H5E v2.
* Added the deprecated v1 calls.
* Updated spacing.
* Once more.
* Taking some inspiration from Eigen3.
* Add doxygen for the assigned functions: H5Pregister1,H5Pinsert1,H5Pen… (#352)
* Add doxygen for the assigned functions: H5Pregister1,H5Pinsert1,H5Pencode1, H5Pget_filter_by_id1,H5Pget_version, H5Pset_file_space,H5Pget_file_space. Someone already adds H5Pget_filter1. Also fixs an extra parameter 'close' call back function for HPregister2.
* doxygen work. fixs format by using clang-format.
* doxgen work for H5Pregister1 etc. Addressed Barbara and Gerd's comments.
For Quincey's comments, since we are not supposed to change the source code.
I leave this to future improvements.
* added documentation for H5P APIs (#350)
* add documenation for H5Pget_buffer,H5Pget_data_transform,H5Pget_edc_check,H5Pget_hyper_vector_size,H5Pget_preserve,H5Pget_type_conv_cb,H5Pget_vlen_mem_manager,H5Pset_btree_ratios
* format corrections
* fixed grammer
* fixed herr_t
* Better name.
* A fresh look.
* add doxygen to H5Ppublic.h
* use attention instead of warning
* Add doxygen comments in H5Ppublic.h (#375)
* Add doxygen comments in H5Ppublic.h
* H5Pset_meta_block_size
* H5Pset_metadata_read_attempts
* H5Pset_multi_type
* H5Pset_object_flush_cb
* H5Pset_sieve_buf_size
* H5Pset_small_data_block_size
* H5Pset_all_coll_metadata_ops
* H5Pget_all_coll_metadata_ops
* Add DOXYGEN_EXAMPLES_DIR to src/CMakeLists.txt
* Fix clang-format errors
* Fix filenames in doxygen/examples
* add doxygen to H5Ppublic.h (#378)
* add doxygen to H5Ppublic.h
* use attention instead of warning
Co-authored-by: Kimmy Mu <kmu@hdfgroup.org>
* Revert "add doxygen to H5Ppublic.h (#378)"
This reverts commit 2ee1821b138a5c00b15ea57ce9e950367480f5f2.
* Updated Doxygen variables.
* I forgot to copy two images.
* Enable desktop search by default.
* Add my assigned Doxygen documentation.
* Remove whitespace at EOL. Appease clang-format.
* Addressed Chris' comments.
* Added an alias for asynchronous functions.
* One space is enough for all of us.
* Slightly restructured RM page.
* address some issues
* reformatting
* Style external links.
* reformatting
* reformatting
* Added "Metadata Caching in HDF5" as a technical note example.
* Revise this soon!
* Added specification examples.
* Fixed references.
* Added H5AC cache image stuff and file format study.
* Added older FMT versions. Where did 1.0 go?
* Updated C/C++ note and replaced ambiguous labels.
* Reformat source with clang v10.0.1.
* Added the VFL technical note.
* Added what I believe might be called version 1.0 of the format.
* Added the remaining specs.
* Added H5Z callback documentation and fixed a few mistakes.
* Added dox for deprecated H5G calls and fixed a few snippet blockIDs.
* clang-format happy?
* Ok?
* Bonus track: Deprecated H5D functions.
* Carry over the more detailed group description.
* Added documentation for the missing and deprecated H5R calls.
* Life is easier and less repetitive w/ snippets. Use them!
* Eliminate the snippet block ID artifacts in the HTML rendering.
* Fixed snippet HTML artifacts and added a few missing calls.
* Under 20 H5Ps to go!
* Almost complete!
* "This is a form of pedantry up with which I will not put." (Churchill)
* Let's not waste as much space on bulleted lists!
* First complete (?) draft of the Doxygen-based RM.
* Completeness check and minor fixes along the way.
* Pedantry.
* Adding missing H5FD calls checkpoint.
* Pedantry.
* More pedantry.
* Added H5Pset_fapl_log.
* First draft of H5ES.
* Fixed warnings.
* Prep. for map module.
* First cut of the map module.
* Pedantry.
* Possible H5F introduction.
* Fix the indentation.
* Pedantry.
* Ditto.
* Thanks to the reviewers for their comments.
* Added missing images.
* Line numbers are a distraction here.
* More examples, references, and clean-up. Don't repeat yourself!
* Clang pedantry.
* Ditto.
* More reviewer comments...
* Templatized references and cleaned up \todos.
* Committing clang-format changes
* Fixed MANIFEST.
* Addressed Quincey's comments. (OCPLs)
* Fixed a few more \todo items.
* Fixed more \todo items.
* Added attribute life cycle.
* Forgot the examples file.
* Committing clang-format changes
* Pedantry.
* Live and learn!
* Added a sample H5D life cycle.
* Committing clang-format changes
* Pedantry.
Co-authored-by: kyang2014 <kyang2014@users.noreply.github.com>
Co-authored-by: Scot Breitenfeld <brtnfld@hdfgroup.org>
Co-authored-by: Kimmy Mu <kmu@hdfgroup.org>
Co-authored-by: Christopher Hogan <ChristopherHogan@users.noreply.github.com>
Co-authored-by: jya-kmu <53388330+jya-kmu@users.noreply.github.com>
Co-authored-by: David Young <dyoung@hdfgroup.org>
Co-authored-by: Larry Knox <lrknox@hdfgroup.org>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Diffstat (limited to 'doxygen/examples/H5.format.1.0.html')
-rw-r--r-- | doxygen/examples/H5.format.1.0.html | 4050 |
1 files changed, 4050 insertions, 0 deletions
diff --git a/doxygen/examples/H5.format.1.0.html b/doxygen/examples/H5.format.1.0.html new file mode 100644 index 0000000..2d3ffbe --- /dev/null +++ b/doxygen/examples/H5.format.1.0.html @@ -0,0 +1,4050 @@ +<html> + <head> + <title> + HDF5 File Format Specification + </title> + </head> + <body bgcolor="#FFFFFF"> + + <center> + <table border=0 width=90%> + <tr> + <td valign=top> + <ol type=I> + <li><a href="#Intro">Introduction</a> + <li><a href="#BootBlock">Disk Format Level 0 - File Signature and Super Block</a> + <li><a href="#Group">Disk Format Level 1 - File Infrastructure</a> + <font size=-2> + <ol type=A> + <li><a href="#Btrees">Disk Format Level 1A - B-link Trees and B-tree Nodes</a> + <li><a href="#SymbolTable">Disk Format Level 1B - Group</a> + <li><a href="#SymbolTableEntry">Disk Format Level 1C - Group Entry</a> + <li><a href="#LocalHeap">Disk Format Level 1D - Local Heaps</a> + <li><a href="#GlobalHeap">Disk Format Level 1E - Global Heap</a> + <li><a href="#FreeSpaceIndex">Disk Format Level 1F - Free-space Index</a> + </ol> + </font> + <li><a href="#DataObject">Disk Format Level 2 - Data Objects</a> + <font size=-2> + <ol type=A> + <li><a href="#ObjectHeader">Disk Format Level 2a - Data Object Headers</a> + <ol type=1> + <li><a href="#NILMessage">Name: NIL</a> <!-- 0x0000 --> + <li><a href="#SimpleDataSpace">Name: Simple Dataspace</a> <!-- 0x0001 --> +<!-- + <li><a href="#DataSpaceMessage">Name: Complex Dataspace</a> --> <!-- 0x0002 --> + <li><a href="#DataTypeMessage">Name: Datatype</a> <!-- 0x0003 --> + <li><a href="#FillValueMessage">Name: Data Storage - Fill Value</a> <!-- 0x0004 --> + <li><a href="#ReservedMessage_0005">Name: Reserved - not assigned yet</a> <!-- 0x0005 --> + </ol> + </ol> + </font> + </ol> + </td><td> </td><td valign=top> + <ol type=I> + + <li><a href="#DataObject">Disk Format Level 2 - Data Objects</a> + <font size=-2><i>(Continued)</i> + <ol type=A> + <li><a href="#ObjectHeader">Disk Format Level 2a - Data Object Headers</a><i>(Continued)</i> + <ol type=1> + <li><a href="#CompactDataStorageMessage">Name: Data Storage - Compact</a> <!-- 0x0006 --> + <li><a href="#ExternalFileListMessage">Name: Data Storage - External Data Files</a> <!-- 0x0007 --> + <li><a href="#LayoutMessage">Name: Data Storage - Layout</a> <!-- 0x0008 --> + <li><a href="#ReservedMessage_0009">Name: Reserved - not assigned yet</a> <!-- 0x0009 --> + <li><a href="#ReservedMessage_000A">Name: Reserved - not assigned yet</a> <!-- 0x000a --> + <li><a href="#FilterMessage">Name: Data Storage - Filter Pipeline</a> <!-- 0x000b --> + <li><a href="#AttributeMessage">Name: Attribute</a> <!-- 0x000c --> + <li><a href="#NameMessage">Name: Object Name</a> <!-- 0x000d --> + <li><a href="#ModifiedMessage">Name: Object Modification Date and Time</a> <!-- 0x000e --> + <li><a href="#SharedMessage">Name: Shared Object Message</a> <!-- 0x000f --> + <li><a href="#ContinuationMessage">Name: Object Header Continuation</a> <!-- 0x0010 --> + <li><a href="#SymbolTableMessage">Name: Group Message</a> <!-- 0x0011 --> + </ol> + <li><a href="#SharedObjectHeader">Disk Format: Level 2b - Shared Data Object Headers</a> + <li><a href="#DataStorage">Disk Format: Level 2c - Data Object Data Storage</a> + </ol> + </font> + </ol> +</td></tr> +</table> +</center> + +<br><br> + + + <h2>Introduction</h2> + + <table align=right width=100> + <tr><td> </td><td align=center> + <hr> + <img src="FF-IH_FileGroup.gif" alt="HDF5 Groups" hspace=15 vspace=15> + </td><td> </td></tr> + <tr><td> </td><td align=center> + <strong>Figure 1:</strong> Relationships among the HDF5 root group, other groups, and objects + <hr> + </td><td> </td></tr> + + <tr><td> </td><td align=center> + <img src="FF-IH_FileObject.gif" alt="HDF5 Objects" hspace=15 vspace=15> + </td><td> </td></tr> + <tr><td> </td><td align=center> + <strong>Figure 2:</strong> HDF5 objects -- datasets, datatypes, or dataspaces + <hr> + </td><td> </td></tr> + </table> + + + <P>The format of an HDF5 file on disk encompasses several + key ideas of the HDF4 and AIO file formats as well as + addressing some shortcomings therein. The new format is + more self-describing than the HDF4 format and is more + uniformly applied to data objects in the file. + + <P>An HDF5 file appears to the user as a directed graph. + The nodes of this graph are the higher-level HDF5 objects + that are exposed by the HDF5 APIs: + + <ul> + <li>Groups + <li>Datasets + <li>Datatypes + <li>Dataspaces + </ul> + + <P>At the lowest level, as information is actually written to the disk, + an HDF5 file is made up of the following objects: + <ul> + <li>A super block + <li>B-tree nodes (containing either symbol nodes or raw data chunks) + <li>Object headers + + <li>Collections + <li>Local heaps + <li>Free space + </ul> + + The HDF5 library uses these lower-level objects to represent the + higher-level objects that are then presented to the user or + to applications through the APIs. + For instance, a group is an object header that contains a message that + points to a local heap and to a B-tree which points to symbol nodes. + A dataset is an object header that contains messages that describe + datatype, space, layout, filters, external files, fill value, etc + with the layout message pointing to either a raw data chunk or to a + B-tree that points to raw data chunks. + + + <h3>This Document</h3> + + <p>This document describes the lower-level data objects; + the higher-level objects and their properties are described + in the <a href="H5.user.html"><cite>HDF5 User's Guide</cite></a>. + + +<!-- +<blockquote> +<pre> + +Elena> NOTE: give reference to the detailed discussion of the B-trees +Elena> when needed. Right now we do not have specification (only general one) +Elena> for the Symbol Table B-trees and B-trees used to manage chunked datasets. +Elena> B-trees +Elena> General Discussion +Elena> Object related discussions +Elena> Symbol Tables +Elena> Global heap +Elena> "Free-space object" + + +</pre> +</blockquote> +--> + + + + <P>Three levels of information comprise the file format. + Level 0 contains basic information for identifying and + defining information about the file. Level 1 information contains + the group information (stored as a B-tree) and is used as the + index for all the objects in the file. Level 2 is the rest + of the file and contains all of the data objects, with each object + partitioned into header information, also known as + <em>meta information</em>, and data. + + <p>The sizes of various fields in the following layout tables are + determined by looking at the number of columns the field spans + in the table. There are three exceptions: (1) The size may be + overridden by specifying a size in parentheses, (2) the size of + addresses is determined by the <em>Size of Offsets</em> field + in the super block, and (3) the size of size fields is determined + by the <em>Size of Lengths</em> field in the super block. + + + +<br><br> +<br><br> + + + <h2><a name="BootBlock"> + Disk Format: Level 0 - File Signature and Super Block</a></h2> + + <P>The super block may begin at certain predefined offsets within + the HDF5 file, allowing a block of unspecified content for + users to place additional information at the beginning (and + end) of the HDF5 file without limiting the HDF5 library's + ability to manage the objects within the file itself. This + feature was designed to accommodate wrapping an HDF5 file in + another file format or adding descriptive information to the + file without requiring the modification of the actual file's + information. The super block is located by searching for the + HDF5 file signature at byte offset 0, byte offset 512 and at + successive locations in the file, each a multiple of two of + the previous location, i.e. 0, 512, 1024, 2048, etc. + + <P>The super block is composed of a file signature, followed by + super block and group version numbers, information + about the sizes of offset and length values used to describe + items within the file, the size of each group page, + and a group entry for the root object in the file. + + <p> + <center> + <table border align=center cellpadding=4 width="80%"> + <caption align=top> + <B>HDF5 Super Block Layout</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>HDF5 File Signature (8 bytes)<br><br></td> + </tr> + + <tr align=center> + <td>Version # of Super Block</td> + <td>Version # of Global Free-space Storage</td> + <td>Version # of Group</td> + <td>Reserved</td> + </tr> + + <tr align=center> + <td>Version # of Shared Header Message Format</td> + <td>Size of Offsets</td> + <td>Size of Lengths</td> + <td>Reserved (zero)</td> + </tr> + + <tr align=center> + <td colspan=2>Group Leaf Node K</td> + <td colspan=2>Group Internal Node K</td> + </tr> + + <tr align=center> + <td colspan=4>File Consistency Flags</td> + </tr> + + <tr align=center> + <td colspan=4>Base Address*</td> + </tr> + + <tr align=center> + <td colspan=4>Address of Global Free-space Heap*</td> + </tr> + + <tr align=center> + <td colspan=4>End of File Address*</td> + </tr> + + <tr align=center> + <td colspan=4>Driver Information Block Address*</td> + </tr> + + <tr align=center> + <td colspan=4>Root Group Address*</td> + </tr> + </table> + + <table width="80%" border=0> + <tr><td> + <div align=right> + (Items marked with an asterisk (*) in the above table + <br> + are of the size specified in "Size of Offsets.") + </div> + </td></tr> + </table> + </center> + + <p> + <center> + <table width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>File Signature</td> + <td>This field contains a constant value and can be used to + quickly identify a file as being an HDF5 file. The + constant value is designed to allow easy identification of + an HDF5 file and to allow certain types of data corruption + to be detected. The file signature of an HDF5 file always + contains the following values: + + <br><br><center> + <table border align=center cellpadding=4 width="100%"> + <tr align=center> + <td>decimal</td> + <td width="8%">137</td> + <td width="8%">72</td> + <td width="8%">68</td> + <td width="8%">70</td> + <td width="8%">13</td> + <td width="8%">10</td> + <td width="8%">26</td> + <td width="8%">10</td> + </tr> + + <tr align=center> + <td>hexadecimal</td> + <td width="8%">89</td> + <td width="8%">48</td> + <td width="8%">44</td> + <td width="8%">46</td> + <td width="8%">0d</td> + <td width="8%">0a</td> + <td width="8%">1a</td> + <td width="8%">0a</td> + </tr> + + <tr align=center> + <td>ASCII C Notation</td> + <td width="8%">\211</td> + <td width="8%">H</td> + <td width="8%">D</td> + <td width="8%">F</td> + <td width="8%">\r</td> + <td width="8%">\n</td> + <td width="8%">\032</td> + <td width="8%">\n</td> + </tr> + </table> + </center> + <br> + + This signature both identifies the file as an HDF5 file + and provides for immediate detection of common + file-transfer problems. The first two bytes distinguish + HDF5 files on systems that expect the first two bytes to + identify the file type uniquely. The first byte is + chosen as a non-ASCII value to reduce the probability + that a text file may be misrecognized as an HDF5 file; + also, it catches bad file transfers that clear bit + 7. Bytes two through four name the format. The CR-LF + sequence catches bad file transfers that alter newline + sequences. The control-Z character stops file display + under MS-DOS. The final line feed checks for the inverse + of the CR-LF translation problem. (This is a direct + descendent of the PNG file signature.)</td> + </tr> + + <tr valign=top> + <td>Version Number of the Super Block</td> + <td>This value is used to determine the format of the + information in the super block. When the format of the + information in the super block is changed, the version number + is incremented to the next integer and can be used to + determine how the information in the super block is + formatted.</td> + </tr> + + <tr valign=top> + <td>Version Number of the Global Free-space Heap</td> + <td>This value is used to determine the format of the + information in the Global Free-space Heap.</td> + </tr> + + <tr valign=top> + <td>Version Number of the Group</td> + <td>This value is used to determine the format of the + information in the Group. When the format of + the information in the Group is changed, the + version number is incremented to the next integer and can be + used to determine how the information in the Group + is formatted.</td> + </tr> + + <tr valign=top> + <td>Version Number of the Shared Header Message Format</td> + <td>This value is used to determine the format of the + information in a shared object header message, which is + stored in the global small-data heap. Since the format + of the shared header messages differs from the private + header messages, a version number is used to identify changes + in the format.</td> + </tr> + + <tr valign=top> + <td>Size of Offsets</td> + <td>This value contains the number of bytes used to store + addresses in the file. The values for the addresses of + objects in the file are offsets relative to a base address, + usually the address of the super block signature. This + allows a wrapper to be added after the file is created + without invalidating the internal offset locations.</td> + </tr> + + <tr valign=top> + <td>Size of Lengths</td> + <td>This value contains the number of bytes used to store + the size of an object.</td> + </tr> + + <tr valign=top> + <td>Group Leaf Node K</td> + <td>Each leaf node of a group B-tree will have at + least this many entries but not more than twice this + many. If a group has a single leaf node then it + may have fewer entries.</td> + </tr> + + <tr valign=top> + <td>Group Internal Node K</td> + <td>Each internal node of a group B-tree will have + at least K pointers to other nodes but not more than 2K + pointers. If the group has only one internal + node then it might have fewer than K pointers.</td> + </tr> + + <tr valign=top> + <td>Bytes per B-tree Page</td> + <td>This value contains the number of bytes used for symbol + pairs per page of the B-trees used in the file. All + B-tree pages will have the same size per page. + <br> + For 32-bit file offsets, 340 objects is the maximum + per 4KB page; for 64-bit file offset, 254 objects will fit + per 4KB page. In general, the equation is: + <br> + <code> <<i>number of objects</i>> = + <br> + FLOOR((<<i>page size</i>> - <<i>offset size</i>>) / + <br> + (<<i>Symbol size</i>> + <<i>offset size</i>>)) + - 1 </code></td> + </tr> + + <tr valign=top> + <td>File Consistency Flags</td> + <td>This value contains flags to indicate information + about the consistency of the information contained + within the file. Currently, the following bit flags are + defined: + <ul> + <li>Bit 0 set indicates that the file is opened for + write-access. + <li>Bit 1 set indicates that the file has + been verified for consistency and is guaranteed to be + consistent with the format defined in this document. + <li>Bits 2-31 are reserved for future use. + </ul> + Bit 0 should be + set as the first action when a file is opened for write + access and should be cleared only as the final action + when closing a file. Bit 1 should be cleared during + normal access to a file and only set after the file's + consistency is guaranteed by the library or a + consistency utility.</td> + </tr> + + <tr valign=top> + <td>Base Address</td> + <td>This is the absolute file address of the first byte of + the HDF5 data within the file. The library currently + constrains this value to be the absolute file address + of the super block itself when creating new files; + future versions of the library may provide greater + flexibility. Unless otherwise noted, + all other file addresses are relative to this base + address.</td> + </tr> + + <tr valign=top> + <td>Address of Global Free-space Heap</td> + <td>Free-space management is not yet defined in the HDF5 + file format and is not handled by the library. + Currently this field always contains the + undefined address <code>0xfff...ff</code>. +<!-- + <td>This value contains the relative address of the B-tree + used to manage the blocks of data which are unused in the + file currently. The free-space heap is used to manage the + blocks of bytes at the file-level which become unused when + objects are moved within the file.</td> +--> + </tr> + + <tr valign=top> + <td>End of File Address</td> + <td>This is the relative file address of the first byte past + the end of all HDF5 data. It is used to determine whether a + file has been accidently truncated and as an address where + file data allocation can occur if the free list is not + used.</td> + </tr> + + <tr valign=top> + <td>Driver Information Block Address</td> + <td>This is the relative file address of the file driver + information block which contains driver-specific + information needed to reopen the file. If there is no + driver information block then this entry should be the + undefined address (all bits set).</td> + </tr> + + <tr valign=top> + <td>Root Group Address</td> + <td>This is the address of the root group (described later + in this document), which serves as the entry point into + the group graph.</td> + </tr> + </table> + </center> + + + <p>The <em>file driver information block</em> is an optional region of the + file which contains information needed by the file driver in + order to reopen a file. The format of the file driver information + block is: + + <p> + <center> + <table border align=center cellpadding=4 width="80%"> + <caption align=top> + <B>Driver Information Block</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td colspan=3>Reserved (zero)</td> + </tr> + + <tr align=center> + <td colspan=4>Driver Information Size (4 bytes)</td> + </tr> + + <tr align=center> + <td colspan=4><br>Driver Identification (8 bytes)<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br><br>Driver Information<br><br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version</td> + <td>The version number of the driver information block. The + file format documented here is version zero.</td> + </tr> + + <tr valign=top> + <td>Driver Information Size</td> + <td>The size in bytes of the Driver Information part of this + structure.</td> + </tr> + + <tr valign=top> + <td>Driver Identification</td> + <td>This is an eight-byte ASCII string without null + termination which identifies the driver and version number + of the Driver Information block. The predefined drivers + supplied with the HDF5 library are identified by the + letters <code>NCSA</code> followed by the first four characters of + the driver name. If the Driver Information block is not + the original version then the last letter(s) of the + identification will be replaced by a version number in + ASCII. + For example, the various versions of the <em>family driver</em> + will be identified by <code>NCSAfami</code>, <code>NCSAfam0</code>, + <code>NCSAfam1</code>, etc. + (<code>NCSAfami</code> is simply <code>NCSAfamily</code> truncated + to eight characters. Subsequent identifiers will be created by + substituting sequential numerical values for the final character, + starting with zero.) + <p> + Identification for user-defined drivers + is arbitrary but should be unique.</td> + </tr> + + <tr valign=top> + <td>Driver Information</td> + <td>Driver information is stored in a format defined by the + file driver and encoded/decoded by the driver callbacks + invoked from the <code>H5FD_sb_encode</code> and + <code>H5FD_sb_decode</code> functions.</td> + </tr> + </table> + </center> + + + <br><br> + <br><br> + + + <h2><a name="Group"> + Disk Format: Level 1 - File Infrastructure</a></h2> + <h3><a name="Btrees">Disk Format: Level 1A - B-link Trees and B-tree Nodes</a></h3> + + <p>B-link trees allow flexible storage for objects which tend to grow + in ways that cause the object to be stored discontiguously. B-trees + are described in various algorithms books including "Introduction to + Algorithms" by Thomas H. Cormen, Charles E. Leiserson, and Ronald + L. Rivest. The B-link tree, in which the sibling nodes at a + particular level in the tree are stored in a doubly-linked list, + is described in the "Efficient Locking for Concurrent Operations + on B-trees" paper by Phillip Lehman and S. Bing Yao as published + in the <em>ACM Transactions on Database Systems</em>, Vol. 6, + No. 4, December 1981. + + <p>The B-link trees implemented by the file format contain one more + key than the number of children. In other words, each child + pointer out of a B-tree node has a left key and a right key. + The pointers out of internal nodes point to sub-trees while + the pointers out of leaf nodes point to symbol nodes and + raw data chunks. + Aside from that difference, internal nodes and leaf nodes + are identical. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>B-tree Nodes</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Node Signature</td> + + <tr align=center> + <td>Node Type</td> + <td>Node Level</td> + <td colspan=2>Entries Used</td> + + <tr align=center> + <td colspan=4>Address of Left Sibling</td> + + <tr align=center> + <td colspan=4>Address of Right Sibling</td> + + <tr align=center> + <td colspan=4>Key 0 (variable size)</td> + + <tr align=center> + <td colspan=4>Address of Child 0</td> + + <tr align=center> + <td colspan=4>Key 1 (variable size)</td> + + <tr align=center> + <td colspan=4>Address of Child 1</td> + + <tr align=center> + <td colspan=4>...</td> + + <tr align=center> + <td colspan=4>Key 2<em>K</em> (variable size)</td> + + <tr align=center> + <td colspan=4>Address of Child 2<em>K</em></td> + + <tr align=center> + <td colspan=4>Key 2<em>K</em>+1 (variable size)</td> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Node Signature</td> + <td>The ASCII character string <code>TREE</code> is + used to indicate the + beginning of a B-link tree node. This gives file + consistency checking utilities a better chance of + reconstructing a damaged file.</td> + </tr> + + <tr valign=top> + <td>Node Type</td> + <td>Each B-link tree points to a particular type of data. + This field indicates the type of data as well as + implying the maximum degree <em>K</em> of the tree and + the size of each Key field. + <br> + <dl compact> + <dt>0 + <dd>This tree points to group nodes. + <dt>1 + <dd>This tree points to a new data chunk. + </dl> + </td> + </tr> + + <tr valign=top> + <td>Node Level</td> + <td>The node level indicates the level at which this node + appears in the tree (leaf nodes are at level zero). Not + only does the level indicate whether child pointers + point to sub-trees or to data, but it can also be used + to help file consistency checking utilities reconstruct + damanged trees.</td> + </tr> + + <tr valign=top> + <td>Entries Used</td> + <td>This determines the number of children to which this + node points. All nodes of a particular type of tree + have the same maximum degree, but most nodes will point + to less than that number of children. The valid child + pointers and keys appear at the beginning of the node + and the unused pointers and keys appear at the end of + the node. The unused pointers and keys have undefined + values.</td> + </tr> + + <tr valign=top> + <td>Address of Left Sibling</td> + <td>This is the file address of the left sibling of the + current node relative to the super block. If the current + node is the left-most node at this level then this field + is the undefined address (all bits set).</td> + </tr> + + <tr valign=top> + <td>Address of Right Sibling</td> + <td>This is the file address of the right sibling of the + current node relative to the super block. If the current + node is the right-most node at this level then this + field is the undefined address (all bits set).</td> + </tr> + + <tr valign=top> + <td>Keys and Child Pointers</td> + <td>Each tree has 2<em>K</em>+1 keys with 2<em>K</em> + child pointers interleaved between the keys. The number + of keys and child pointers actually containing valid + values is determined by the <em>Entries Used</em> field. If + that field is <em>N</em> then the B-link tree contains + <em>N</em> child pointers and <em>N</em>+1 keys.</td> + </tr> + + <tr valign=top> + <td>Key</td> + <td>The format and size of the key values is determined by + the type of data to which this tree points. The keys are + ordered and are boundaries for the contents of the child + pointer; that is, the key values represented by child + <em>N</em> fall between Key <em>N</em> and Key + <em>N</em>+1. Whether the interval is open or closed on + each end is determined by the type of data to which the + tree points. + <p> + The format of the key depends on the node type. + For nodes of node type 1, the key is formatted as follows: + <center> + <table> + <tr valign=top align=left> + <td width=40%>Bytes 1-4</td> + <td>Size of chunk in bytes.</td> + <tr valign=top align=left></tr> + <td>Bytes 4-8</td> + <td>Filter mask, a 32-bit bitfield indicating which + filters have been applied to that chunk.</td> + </tr><tr valign=top align=left> + <td><i>N</i> fields of 8 bytes each</td> + <td>A 64-bit index indicating the offset of the + chunk within the dataset where <i>N</i> is the number + of dimensions of the dataset. For example, if + a chunk in a 3-dimensional dataset begins at the + position <code>[5,5,5]</code>, there will be three + such 8-bit indices, each with the value of + <code>5</code>.</td> + </tr> + </table> + </center> + <p> + For nodes of node type 0, the key is formatted as follows: + <center> + <table> + <tr valign=top align=left> + <td width=40%>A single field of <i>Size of Lengths</i> + bytes</td> + <td>Indicates the byte offset into the local heap + for the first object name in the subtree which + that key describes.</td> + </tr> + </table> + </center> + </td> + </tr> + + <tr valign=top> + <td>Child Pointers</td> + <td>The tree node contains file addresses of subtrees or + data depending on the node level. Nodes at Level 0 point + to data addresses, either data chunk or group nodes. + Nodes at non-zero levels point to other nodes of the + same B-tree.</td> + </tr> + </table> + </center> + +<p> + Each B-tree node looks like this: + + <center> + <table> + <tr valign=top align=center> + <td>key[0]</td><td> </td> + <td>child[0]</td><td> </td> + <td>key[1]</td><td> </td> + <td>child[1]</td><td> </td> + <td>key[2]</td><td> </td> + <td>...</td><td> </td> + <td>...</td><td> </td> + <td>key[<i>N</i>-1]</td><td> </td> + <td>child[<i>N</i>-1]</td><td> </td> + <td>key[<i>N</i>]</td> + </tr> + </table> + </center> + + where child[<i>i</i>] is a pointer to a sub-tree (at a level + above Level 0) or to data (at Level 0). + Each key[<i>i</i>] describes an <i>item</i> stored by the B-tree + (a chunk or an object of a group node). The range of values + represented by child[<i>i</i>] are indicated by key[<i>i</i>] + and key[<i>i</i>+1]. + + + <p>The following question must next be answered: + "Is the value described by key[<i>i</i>] contained in + child[<i>i</i>-1] or in child[<i>i</i>]?" + The answer depends on the type of tree. + In trees for groups (node type 0) the object described by + key[<i>i</i>] is the greatest object contained in + child[<i>i</i>-1] while in chunk trees (node type 1) the + chunk described by key[<i>i</i>] is the least chunk in + child[<i>i</i>]. + + <p>That means that key[0] for group trees is sometimes unused; + it points to offset zero in the heap, which is always the + empty string and compares as "less-than" any valid object name. + + <p>And key[<i>N</i>] for chunk trees is sometimes unused; + it contains a chunk offset which compares as "greater-than" + any other chunk offset and has a chunk byte size of zero + to indicate that it is not actually allocated. + + + <h3><a name="SymbolTable">Disk Format: Level 1B - Group and Symbol Nodes</a></h3> + + <p>A group is an object internal to the file that allows + arbitrary nesting of objects (including other groups). + A group maps a set of names to a set of file + address relative to the base address. Certain meta data + for an object to which the group points can be duplicated + in the group symbol table in addition to the object header. + + <p>An HDF5 object name space can be stored hierarchically by + partitioning the name into components and storing each + component in a group. The group entry for a + non-ultimate component points to the group containing + the next component. The group entry for the last + component points to the object being named. + + <p>A group is a collection of group nodes pointed + to by a B-link tree. Each group node contains entries + for one or more symbols. If an attempt is made to add a + symbol to an already full group node containing + 2<em>K</em> entries, then the node is split and one node + contains <em>K</em> symbols and the other contains + <em>K</em>+1 symbols. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Group Node (A Leaf of a B-tree)</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Node Signature</td> + + <tr align=center> + <td>Version Number</td> + <td>Reserved for Future Use</td> + <td colspan=2>Number of Symbols</td> + + <tr align=center> + <td colspan=4><br><br>Group Entries<br><br><br></td> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Node Signature</td> + <td>The ASCII character string <code>SNOD</code> is + used to indicate the + beginning of a group node. This gives file + consistency checking utilities a better chance of + reconstructing a damaged file.</td> + </tr> + + <tr valign=top> + <td>Version Number</td> + <td>The version number for the group node. This + document describes version 1.</td> + </tr> + + <tr valign=top> + <td>Number of Symbols</td> + <td>Although all group nodes have the same length, + most contain fewer than the maximum possible number of + symbol entries. This field indicates how many entries + contain valid data. The valid entries are packed at the + beginning of the group node while the remaining + entries contain undefined values.</td> + </tr> + + <tr valign=top> + <td>Group Entries</td> + <td>Each symbol has an entry in the group node. + The format of the entry is described below.</td> + </tr> + </table> + </center> + + <h3><a name="SymbolTableEntry"> + Disk Format: Level 1C - Group Entry </a></h3> + + <p>Each group entry in a group node is designed + to allow for very fast browsing of stored objects. + Toward that design goal, the group entries + include space for caching certain constant meta data from the + object header. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Group Entry</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Name Offset (<size> bytes)</td> + </tr> + + <tr align=center> + <td colspan=4>Object Header Address</td> + </tr> + + <tr align=center> + <td colspan=4>Cache Type</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4><br><br>Scratch-pad Space (16 bytes)<br><br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Name Offset</td> + <td>This is the byte offset into the group local + heap for the name of the object. The name is null + terminated.</td> + </tr> + + <tr valign=top> + <td>Object Header Address</td> + <td>Every object has an object header which serves as a + permanent location for the object's meta data. In addition + to appearing in the object header, some meta data can be + cached in the scratch-pad space.</td> + </tr> + + <tr valign=top> + <td>Cache Type</td> + <td>The cache type is determined from the object header. + It also determines the format for the scratch-pad space. + <br> + <dl compact> + <dt>0 + <dd>No data is cached by the group entry. This + is guaranteed to be the case when an object header + has a link count greater than one. + + <dt>1 + <dd>Object header meta data is cached in the group + entry. This implies that the group + entry refers to another group. + + <dt>2 + <dd>The entry is a symbolic link. The first four bytes + of the scratch-pad space are the offset into the local + heap for the link value. The object header address + will be undefined. + + <dt><em>N</em> + <dd>Other cache values can be defined later and + libraries that do not understand the new values will + still work properly. + </dl> + </td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>These four bytes are present so that the scratch-pad + space is aligned on an eight-byte boundary. They are + always set to zero.</td> + </tr> + + <tr valign=top> + <td>Scratch-pad Space</td> + <td>This space is used for different purposes, depending + on the value of the Cache Type field. Any meta-data + about a dataset object represented in the scratch-pad + space is duplicated in the object header for that + dataset. This meta data can include the datatype + and the size of the dataspace for a dataset whose datatype + is atomic and whose dataspace is fixed and less than + four dimensions. + Furthermore, no data is cached in the group + entry scratch-pad space if the object header for + the group entry has a link count greater than + one.</td> + </tr> + </table> + </center> + + <h4>Format of the Scratch-pad Space</h4> + + <p>The group entry scratch-pad space is formatted + according to the value in the Cache Type field. + + <p>If the Cache Type field contains the value zero + (<code>0</code>) then no information is + stored in the scratch-pad space. + + <p>If the Cache Type field contains the value one + (<code>1</code>), then the scratch-pad space + contains cached meta data for another object header + in the following format: + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Object Header Scratch-pad Format</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Address of B-tree</td> + + <tr align=center> + <td colspan=4>Address of Name Heap</td> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Address of B-tree</td> + <td>This is the file address for the root of the + group's B-tree.</td> + </tr> + + <tr valign=top> + <td>Address of Name Heap</td> + <td>This is the file address for the group's local + heap, in which are stored the symbol names.</td> + </tr> + </table> + </center> + + + <p>If the Cache Type field contains the value two + (<code>2</code>), then the scratch-pad space + contains cached meta data for another symbolic link + in the following format: + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Symbolic Link Scratch-pad Format</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Offset to Link Value</td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Offset to Link Value</td> + <td>The value of a symbolic link (that is, the name of the + thing to which it points) is stored in the local heap. + This field is the 4-byte offset into the local heap for + the start of the link value, which is null terminated.</td> + </tr> + </table> + </center> + + <h3><a name="LocalHeap">Disk Format: Level 1D - Local Heaps</a></h3> + + <p>A heap is a collection of small heap objects. Objects can be + inserted and removed from the heap at any time. + The address of a heap does not change once the heap is created. + References to objects are stored in the group table; + the names of those objects are stored in the local heap. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Local Heaps</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Heap Signature</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved (zero)</td> + </tr> + + <tr align=center> + <td colspan=4>Data Segment Size</td> + </tr> + + <tr align=center> + <td colspan=4>Offset to Head of Free-list (<size> bytes)</td> + </tr> + + <tr align=center> + <td colspan=4>Address of Data Segment</td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Heap Signature</td> + <td>The ASCII character string <code>HEAP</code> + is used to indicate the + beginning of a heap. This gives file consistency + checking utilities a better chance of reconstructing a + damaged file.</td> + </tr> + + <tr valign=top> + <td>Data Segment Size</td> + <td>The total amount of disk memory allocated for the heap + data. This may be larger than the amount of space + required by the object stored in the heap. The extra + unused space holds a linked list of free blocks.</td> + </tr> + + <tr valign=top> + <td>Offset to Head of Free-list</td> + <td>This is the offset within the heap data segment of the + first free block (or all 0xff bytes if there is no free + block). The free block contains <size> bytes that + are the offset of the next free chunk (or all 0xff bytes + if this is the last free chunk) followed by <size> + bytes that store the size of this free chunk.</td> + </tr> + + <tr valign=top> + <td>Address of Data Segment</td> + <td>The data segment originally starts immediately after + the heap header, but if the data segment must grow as a + result of adding more objects, then the data segment may + be relocated, in its entirety, to another part of the + file.</td> + </tr> + </table> + </center> + + <p>Objects within the heap should be aligned on an 8-byte boundary. + + <h3><a name="GlobalHeap">Disk Format: Level 1E - Global Heap</a></h3> + + <p>Each HDF5 file has a global heap which stores various types of + information which is typically shared between datasets. The + global heap was designed to satisfy these goals: + + <ol type="A"> + <li>Repeated access to a heap object must be efficient without + resulting in repeated file I/O requests. Since global heap + objects will typically be shared among several datasets, it is + probable that the object will be accessed repeatedly. + + <br><br> + <li>Collections of related global heap objects should result in + fewer and larger I/O requests. For instance, a dataset of + void pointers will have a global heap object for each + pointer. Reading the entire set of void pointer objects + should result in a few large I/O requests instead of one small + I/O request for each object. + + <br><br> + <li>It should be possible to remove objects from the global heap + and the resulting file hole should be eligible to be reclaimed + for other uses. + <br><br> + </ol> + + <p>The implementation of the heap makes use of the memory + management already available at the file level and combines that + with a new top-level object called a <em>collection</em> to + achieve Goal B. The global heap is the set of all collections. + Each global heap object belongs to exactly one collection and + each collection contains one or more global heap objects. For + the purposes of disk I/O and caching, a collection is treated as + an atomic object. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>A Global Heap Collection</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Magic Number</td> + </tr> + + <tr align=center> + <td>Version</td> + <td colspan=3>Reserved</td> + </td> + + <tr align=center> + <td colspan=4>Collection Size</td> + </tr> + + <tr align=center> + <td colspan=4><br>Global Heap Object 1 + <i>(described below)</i><br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Global Heap Object 2<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>...<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Global Heap Object <em>N</em><br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Global Heap Object 0 (free space)<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Magic Number</td> + <td>The magic number for global heap collections are the + four bytes <code>G</code>, <code>C</code>, <code>O</code>, + and <code>L</code>.</td> + </tr> + + <tr valign=top> + <td>Version</td> + <td>Each collection has its own version number so that new + collections can be added to old files. This document + describes version zero of the collections. + </tr> + + <tr valign=top> + <td>Collection Data Size</td> + <td>This is the size in bytes of the entire collection + including this field. The default (and minimum) + collection size is 4096 bytes which is a typical file + system block size and which allows for 170 16-byte heap + objects plus their overhead.</td> + </tr> + + <tr valign=top> + <td>Object 1 through <em>N</em></td> + <td>The objects are stored in any order with no + intervening unused space.</td> + </tr> + + <tr valign=top> + <td>Object 0</td> + <td>Object 0 (zero), when present, represents the free space in + the collection. Free space always appears at the end of + the collection. If the free space is too small to store + the header for Object 0 (described below) then the + header is implied and the collection contains no free space. + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Global Heap Object</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=2>Object ID</td> + <td colspan=2>Reference Count</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Object Data Size</td> + </tr> + + <tr align=center> + <td colspan=4><br>Object Data<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Object ID</td> + <td>Each object has a unique identification number within a + collection. The identification numbers are chosen so that + new objects have the smallest value possible with the + exception that the identifier <code>0</code> always refers to the + object which represents all free space within the + collection.</td> + </tr> + + <tr valign=top> + <td>Reference Count</td> + <td>All heap objects have a reference count field. An + object which is referenced from some other part of the + file will have a positive reference count. The reference + count for Object 0 is always zero.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>Zero padding to align next field on an 8-byte + boundary.</td> + </tr> + + <tr valign=top> + <td>Object Size</td> <td>This is the size of the the fields + above plus the object data stored for the object. The + actual storage size is rounded up to a multiple of + eight.</td> + </tr> + + <tr valign=top> + <td>Object Data</td> + <td>The object data is treated as a one-dimensional array + of bytes to be interpreted by the caller.</td> + </tr> + </table> + </center> + + <h3><a name="FreeSpaceIndex">Disk Format: Level 1F - Free-space Heap</a></h3> + + <p>The Free-space Index is a collection of blocks of data, + dispersed throughout the file, which are currently not used by + any file objects. + + <p>The super block contains a pointer to root of the free-space description; + that pointer is currently (i.e., in HDF5 Release 1.2) required + to be the undefined address <code>0xfff...ff</code>. + + <p>The free-sapce index is not otherwise publicly defined at this time. + + + <!-- + <p>The Free-space Index is a collection of blocks of data, + dispersed throughout the file, which are currently not used by + any file objects. The blocks of data are indexed by a B-tree of + their length within the file. + + + <p>Each B-tree page is composed of the following entries and + B-tree management information, organized as follows: + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Free-space Heap Page</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Free-space Heap Signature</td> + <tr align=center> + <td colspan=4>B-tree Left-link Offset</td> + <tr align=center> + <td colspan=4><br>Length of Free-block #1<br> <br></td> + <tr align=center> + <td colspan=4><br>Offset of Free-block #1<br> <br></td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4><br>Length of Free-block #n<br> <br></td> + <tr align=center> + <td colspan=4><br>Offset of Free-block #n<br> <br></td> + <tr align=center> + <td colspan=4>"High" Offset</td> + <tr align=center> + <td colspan=4>Right-link Offset</td> + </table> + </center> + + <p> + <dl> + <dt> The elements of the free-space heap page are described below: + <dd> + <dl> + <dt>Free-space Heap Signature: (4 bytes) + <dd>The ASCII character string <code>FREE</code> + is used to indicate the + beginning of a free-space heap B-tree page. This gives + file consistency checking utilities a better chance of + reconstructing a damaged file. + + <dt>B-tree Left-link Offset: (<offset> bytes) + <dd>This value is used to indicate the offset of all offsets + in the B-link-tree which are smaller than the value of the + offset in entry #1. This value is also used to indicate a + leaf node in the B-link-tree by being set to all ones. + + <dt>Length of Free-block #n: (<length> bytes) + <dd>This value indicates the length of an unused block in + the file. + + <dt>Offset of Free-block #n: (<offset> bytes) + <dd>This value indicates the offset in the file of an + unused block in the file. + + <dt>"High" Offset: (4-bytes) + <dd>This offset is used as the upper bound on offsets + contained within a page when the page has been split. + + <dt>Right-link Offset: (<offset> bytes) + <dd>This value is used to indicate the offset of the next + child to the right of the parent of this group + page. When there is no node to the right, this value is + all zeros. + </dl> + </dl> + + <p>The algorithms for searching and inserting objects in the + B-tree pages are described fully in the Lehman and Yao paper, + which should be read to provide a full description of the + B-tree's usage. +--> + + +<br><br> +<br><br> + + + <h2><a name="DataObject">Disk Format: Level 2 - Data Objects </a></h2> + + <p>Data objects contain the real information in the file. These + objects compose the scientific data and other information which + are generally thought of as "data" by the end-user. All the + other information in the file is provided as a framework for + these data objects. + + <p>A data object is composed of header information and data + information. The header information contains the information + needed to interpret the data information for the data object as + well as additional "meta-data" or pointers to additional + "meta-data" used to describe or annotate each data object. + + <h3><a name="ObjectHeader"> + Disk Format: Level 2a - Data Object Headers</a></h3> + + <p>The header information of an object is designed to encompass + all the information about an object which would be desired to be + known, except for the data itself. This information includes + the dimensionality, number-type, information about how the data + is stored on disk (in external files, compressed, broken up in + blocks, etc.), as well as other information used by the library + to speed up access to the data objects or maintain a file's + integrity. The header of each object is not necessarily located + immediately prior to the object's data in the file and in fact + may be located in any position in the file. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Object Headers</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=1 width="25%">Version # of Object Header</td> + <td colspan=1 width="25%">Reserved</td> + <td colspan=2 width="50%">Number of Header Messages</td> + </tr> + <tr align=center> + <td colspan=4>Object Reference Count</td> + </tr> + <tr align=center> + <td colspan=4><br>Total Object Header Size<br><br></td> + </tr> + <tr align=center> + <td colspan=2>Header Message Type #1</td> + <td colspan=2>Size of Header Message Data #1</td> + </tr> + <tr align=center> + <td>Flags</td> + <td colspan=3>Reserved</td> + </tr> + <tr align=center> + <td colspan=4><br>Header Message Data #1<br><br></td> + </tr> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + </tr> + <tr align=center> + <td colspan=2>Header Message Type #n</td> + <td colspan=2>Size of Header Message Data #n</td> + </tr> + <tr align=center> + <td>Flags</td> + <td colspan=3>Reserved</td> + </tr> + <tr align=center> + <td colspan=4><br>Header Message Data #n<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version number of the object header</td> + <td>This value is used to determine the format of the + information in the object header. When the format of the + information in the object header is changed, the version number + is incremented and can be used to determine how the + information in the object header is formatted.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>Always set to zero.</td> + </tr> + + <tr valign=top> + <td>Number of header messages</td> + <td>This value determines the number of messages listed in + this object header. This provides a fast way for software + to prepare storage for the messages in the header.</td> + </tr> + + <tr valign=top> + <td>Object Reference Count</td> + <td>This value specifies the number of references to this + object within the current file. References to the + data object from external files are not tracked.</td> + </tr> + + <tr valign=top> + <td>Total Object Header Size</td> + <td>This value specifies the total number of bytes of header + message data following this length field for the current + message as well as any continuation data located elsewhere + in the file.</td> + </tr> + + <tr valign=top> + <td>Header Message Type</td> + <td>The header message type specifies the type of + information included in the header message data following + the type along with a small amount of other information. + Bit 15 of the message type is set if the message is + constant (constant messages cannot be changed since they + may be cached in group entries throughout the + file). The header message types for the pre-defined + header messages will be included in further discussion + below.</td> + </tr> + + <tr valign=top> + <td>Size of Header Message Data</td> + <td>This value specifies the number of bytes of header + message data following the header message type and length + information for the current message. The size includes + padding bytes to make the message a multiple of eight + bytes.</td> + </tr> + + <tr valign=top> + <td>Flags</td> + <td>This is a bit field with the following definition: + <dl> + <dt><code>0</code> + <dd>If set, the message data is constant. This is used + for messages like the datatype message of a dataset. + <dt><code>1</code> + <dd>If set, the message is stored in the global heap and + the Header Message Data field contains a Shared Object + message and the Size of Header Message Data field + contains the size of that Shared Object message. + <dt><code>2-7</code> + <dd>Reserved + </dl> + </td> + + <tr valign=top> + <td>Header Message Data</td> + <td>The format and length of this field is determined by the + header message type and size respectively. Some header + message types do not require any data and this information + can be eliminated by setting the length of the message to + zero. The data is padded with enough zeros to make the + size a multiple of eight.</td> + </tr> + </table> + </center> + + <p>The header message types and the message data associated with + them compose the critical "meta-data" about each object. Some + header messages are required for each object while others are + optional. Some optional header messages may also be repeated + several times in the header itself, the requirements and number + of times allowed in the header will be noted in each header + message description below. + + <P>The following is a list of currently defined header messages: + + <hr> + <h4><a name="NILMessage">Name: NIL</a></h4> + <b>Type: </b>0x0000<br> + <b>Length:</b> varies<br> + <b>Status:</b> Optional, may be repeated.<br> + <b>Purpose and Description:</b> The NIL message is used to + indicate a message + which is to be ignored when reading the header messages for a data object. + [Probably one which has been deleted for some reason.]<br> + <b>Format of Data:</b> Unspecified.<br> + +<!-- Delete examples throughout doc + <b>Examples:</b> None. +--> + + + <hr> + <h4><a name="SimpleDataSpace">Name: Simple Dataspace</a></h4> + + <b>Type: </b>0x0001<br> + <b>Length:</b> Varies according to the number of dimensions, + as described in the following table<br> + <b>Status:</b> The <em>Simple Dataspace</em> message is required + and may not be repeated. This message is currently used with + datasets and named dataspaces.<br> + + <p>The <em>Simple Dataspace</em> message describes the number + of dimensions and size of each dimension that the data object + has. This message is only used for datasets which have a + simple, rectilinear grid layout; datasets requiring a more + complex layout (irregularly structured or unstructured grids, etc.) + must use the <em>Complex Dataspace</em> message for expressing + the space the dataset inhabits. + <i>(Note: The <em>Complex Dataspace</em> functionality is + not yet implemented (as of HDF5 Release 1.2). It is not described + in this document.)</i> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Simple Dataspace Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td>Dimensionality</td> + <td>Flags</td> + <td>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Dimension Size #1 (<size> bytes)</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Dimension Size #n (<size> bytes)</td> + <tr align=center> + <td colspan=4>Dimension Maximum #1 (<size> bytes)</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Dimension Maximum #n (<size> bytes)</td> + <tr align=center> + <td colspan=4>Permutation Index #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Permutation Index #n</td> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version </td> + <td>This value is used to determine the format of the + Simple Dataspace Message. When the format of the + information in the message is changed, the version number + is incremented and can be used to determine how the + information in the object header is formatted.</td> + </tr> + + <tr valign=top> + <td>Dimensionality</td> + <td>This value is the number of dimensions that the data + object has.</td> + </tr> + + <tr valign=top> + <td>Flags</td> + <td>This field is used to store flags to indicate the + presence of parts of this message. Bit 0 (the least + significant bit) is used to indicate that maximum + dimensions are present. Bit 1 is used to indicate that + permutation indices are present for each dimension.</td> + </tr> + + <tr valign=top> + <td>Dimension Size #n (<size> bytes)</td> + <td>This value is the current size of the dimension of the + data as stored in the file. The first dimension stored in + the list of dimensions is the slowest changing dimension + and the last dimension stored is the fastest changing + dimension.</td> + </tr> + + <tr valign=top> + <td>Dimension Maximum #n (<size> bytes)</td> + <td>This value is the maximum size of the dimension of the + data as stored in the file. This value may be the special + value <UNLIMITED> (all bits set) which indicates + that the data may expand along this dimension + indefinitely. If these values are not stored, the maximum + value of each dimension is assumed to be the same as the + current size value.</td> + </tr> + + <tr valign=top> + <td>Permutation Index #n (4 bytes)</td> + <td>This value is the index permutation used to map + each dimension from the canonical representation to an + alternate axis for each dimension. If these values are + not stored, the first dimension stored in the list of + dimensions is the slowest changing dimension and the last + dimension stored is the fastest changing dimension.</td> + </tr> + </table> + </center> + +<!-- Delete examples throughout doc + <h4>Examples</h4> + <dl> + <dt> Example #1 + <dd>A sample 640 horizontally by 480 vertically raster image + dimension header. The number of dimensions would be set to 2 + and the first dimension's size and maximum would both be set + to 480. The second dimension's size and maximum would both be + set to 640 +. + <dt>Example #2 + <dd>A sample 4 dimensional scientific dataset which is composed + of 30x24x3 slabs of data being written out in an unlimited + series every several minutes as timestep data (currently there + are five slabs). The number of dimensions is 4. The first + dimension size is 5 and its maximum is <UNLIMITED>. The + second through fourth dimension's size and maximum value are + set to 3, 24, and 30 respectively. + + <dt>Example #3 + <dd>A sample unlimited length text string, currently of length + 83. The number of dimensions is 1, the size of the first + dimension is 83 and the maximum of the first dimension is set + to <UNLIMITED>, allowing further text data to be + appended to the string or possibly the string to be replaced + with another string of a different size. (This could also be + stored as a scalar dataset with number-type set to "string") + </dl> +--> + +<!-- DELETE ENTIRE DATASPACE SECTION --> +<!-- + <hr> + <h4><a name="DataSpaceMessage">Name: Complex Dataspace (Fiber Bundle?)</a></h4> + <b>Type: </b>0x0002<br> + <b>Length:</b> varies<br> + + <b>Status:</b> One of the <em>Simple Dataspace</em> or + <em>Complex Dataspace</em> messages is required (but not both) and may + not be repeated.<br> <b>Purpose and Description:</b> The + <em>Dataspace</em> message describes space that the dataset is + mapped onto in a more comprehensive way than the <em>Simple + Dimensionality</em> message is capable of handling. The + dataspace of a dataset encompasses the type of coordinate system + used to locate the dataset's elements as well as the structure and + regularity of the coordinate system. The dataspace also + describes the number of dimensions which the dataset inhabits as + well as a possible higher dimensional space in which the dataset + is located within. + + <br> + <b>Format of Data:</b> + + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Dataspace Message Layout</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Mesh Type</td> + <tr align=center> + <td colspan=4>Logical Dimensionality</td> + </table> + </center> + + <p> + <dl> + <dt>The elements of the dimensionality message are described below: + <dd> + <dl> + <dt>Mesh Type: (unsigned 32-bit integer) + <dd>This value indicates whether the grid is + polar/spherical/cartesion, + structured/unstructured and regular/irregular. <br> + The mesh type value is broken up as follows: <br> + + <P> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Mesh-type Layout</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=1>Mesh Embedding</td> + <td colspan=1>Coordinate System</td> + <td colspan=1>Structure</td> + <td colspan=1>Regularity</td> + </table> + </center> + The following are the definitions of mesh-type bytes: + <dl> + <dt>Mesh Embedding + <dd>This value indicates whether the dataset dataspace + is located within + another dataspace or not: + <dl> <dl> + <dt><STANDALONE> + <dd>The dataset mesh is self-contained and is not + embedded in another mesh. + <dt><EMBEDDED> + <dd>The dataset's dataspace is located within + another dataspace, as + described in information below. + </dl> </dl> + <dt>Coordinate System + <dd>This value defines the type of coordinate system + used for the mesh: + <dl> <dl> + <dt><POLAR> + <dd>The last two dimensions are in polar + coordinates, higher dimensions are + cartesian. + <dt><SPHERICAL> + <dd>The last three dimensions are in spherical + coordinates, higher dimensions + are cartesian. + <dt><CARTESIAN> + <dd>All dimensions are in cartesian coordinates. + </dl> </dl> + <dt>Structure + <dd>This value defines the locations of the grid-points + on the axes: + <dl> <dl> + <dt><STRUCTURED> + <dd>All grid-points are on integral, sequential + locations, starting from 0. + <dt><UNSTRUCTURED> + <dd>Grid-points locations in each dimension are + explicitly defined and + may be of any numeric datatype. + </dl> </dl> + <dt>Regularity + <dd>This value defines the locations of the dataset + points on the grid: + <dl> <dl> + <dt><REGULAR> + <dd>All dataset elements are located at the + grid-points defined. + <dt><IRREGULAR> + <dd>Each dataset element has a particular + grid-location defined. + </dl> </dl> + </dl> + <p>The following grid combinations are currently allowed: + <dl> <dl> + <dt><POLAR-STRUCTURED-REGULAR> + <dt><SPHERICAL-STRUCTURED-REGULAR> + <dt><CARTESIAN-STRUCTURED-REGULAR> + <dt><POLAR-UNSTRUCTURED-REGULAR> + <dt><SPHERICAL-UNSTRUCTURED-REGULAR> + <dt><CARTESIAN-UNSTRUCTURED-REGULAR> + <dt><CARTESIAN-UNSTRUCTURED-IRREGULAR> + </dl> </dl> + All of the above grid types can be embedded within another + dataspace. + <br> <br> + <dt>Logical Dimensionality: (unsigned 32-bit integer) + <dd>This value is the number of dimensions that the dataset occupies. + + <P> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Dataspace Embedded Dimensionality Information</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Embedded Dimensionality</td> + <tr align=center> + <td colspan=4>Embedded Dimension Size #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Embedded Dimension Size #n</td> + <tr align=center> + <td colspan=4>Embedded Origin Location #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Embedded Origin Location #n</td> + </table> + </center> + + <dt>Embedded Dimensionality: (unsigned 32-bit integer) + <dd>This value is the number of dimensions of the space the + dataset is located + within. i.e. a planar dataset located within a 3-D space, + or a 3-D dataset + which is a subset of another 3-D space, etc. + <dt>Embedded Dimension Size: (unsigned 32-bit integer) + <dd>These values are the sizes of the dimensions of the + embedded dataspace + that the dataset is located within. + <dt>Embedded Origin Location: (unsigned 32-bit integer) + <dd>These values comprise the location of the dataset's + origin within the embedded dataspace. + </dl> + </dl> + [Comment: need some way to handle different orientations of the + dataset dataspace + within the embedded dataspace]<br> + + <P> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Dataspace Structured/Regular Grid Information</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Logical Dimension Size #1</td> + <tr align=center> + <td colspan=4>Logical Dimension Maximum #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Logical Dimension Size #n</td> + <tr align=center> + <td colspan=4>Logical Dimension Maximum #n</td> + </table> + </center> + + <p> + <dl> + <dt>The elements of the dimensionality message are described below: + <dd> + <dl> + <dt>Logical Dimension Size #n: (unsigned 32-bit integer) + <dd>This value is the current size of the dimension of the + data as stored in + the file. The first dimension stored in the list of + dimensions is the slowest + changing dimension and the last dimension stored is the + fastest changing + dimension. + <dt>Logical Dimension Maximum #n: (unsigned 32-bit integer) + <dd>This value is the maximum size of the dimension of the + data as stored in + the file. This value may be the special value + <UNLIMITED> which + indicates that the data may expand along this dimension + indefinitely. + </dl> + </dl> + <P> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Dataspace Structured/Irregular Grid Information</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4># of Grid Points in Dimension #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4># of Grid Points in Dimension #n</td> + <tr align=center> + <td colspan=4>Datatype of Grid Point Locations</td> + <tr align=center> + <td colspan=4>Location of Grid Points in Dimension #1</td> + <tr align=center> + <td colspan=4>.<br>.<br>.<br></td> + <tr align=center> + <td colspan=4>Location of Grid Points in Dimension #n</td> + </table> + </center> + + <P> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Dataspace Unstructured Grid Information</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4># of Grid Points</td> + <tr align=center> + <td colspan=4>Datatype of Grid Point Locations</td> + <tr align=center> + <td colspan=4>Grid Point Locations<br>.<br>.<br></td> + </table> + </center> + + <h4><a name="DataSpaceExample">Examples:</a></h4> + Need some good examples, this is complex! +--> + + + <hr> + <h4><a name="DataTypeMessage">Name: Datatype</a></h4> + + <b>Type:</b> 0x0003<br> + <b>Length:</b> variable<br> + <b>Status:</b> One required per dataset or named datatype<br> + + <p>The datatype message defines the datatype for each data point + of a dataset. A datatype can describe an atomic type like a + fixed- or floating-point type or a compound type like a C + struct. A datatype does not, however, describe how data points + are combined to produce a dataset. Datatypes are stored on disk + as a datatype message, which is a list of datatype classes and + their associated properties. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Datatype Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Type Class and Version</td> + <td colspan=3>Class Bit Field</td> + </tr> + + <tr align=center> + <td colspan=4>Size in Bytes (4 bytes)</td> + </tr> + + <tr align=center> + <td colspan=4><br><br>Properties<br><br><br></td> + </tr> + </table> + </center> + + <p>The Class Bit Field and Properties fields vary depending + on the Type Class, which is the low-order four bits of the Type + Class and Version field (the high-order four bits are the + version, which should be set to the value one). The type class + is one of 0 (fixed-point number), 1 (floating-point number), + 2 (date and time), 3 (text string), 4 (bit field), 5 (opaque), + 6 (compound), 7 (reference), 8 (enumeration), or 9 (variable-length). + The Class Bit Field is zero and the size of the + Properties field is zero except for the cases noted here. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Fixed-point Numbers (Class 0)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0</td> + <td><b>Byte Order.</b> If zero, byte order is little-endian; + otherwise, byte order is big endian.</td> + </tr> + + <tr valign=top> + <td>1, 2</td> + <td><b>Padding type.</b> Bit 1 is the lo_pad type and bit 2 + is the hi_pad type. If a datum has unused bits at either + end, then the lo_pad or hi_pad bit is copied to those + locations.</td> + </tr> + + <tr valign=top> + <td>3</td> + <td><b>Signed.</b> If this bit is set then the fixed-point + number is in 2's complement form.</td> + </tr> + + <tr valign=top> + <td>4-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Fixed-point Numbers (Class 0)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=2>Bit Offset</td> + <td colspan=2>Bit Precision</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Floating-point Numbers (Class 1)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0</td> + <td><b>Byte Order.</b> If zero, byte order is little-endian; + otherwise, byte order is big endian.</td> + </tr> + + <tr valign=top> + <td>1, 2, 3</td> + <td><b>Padding type.</b> Bit 1 is the low bits pad type, bit 2 + is the high bits pad type, and bit 3 is the internal bits + pad type. If a datum has unused bits at either or between + the sign bit, exponent, or mantissa, then the value of bit + 1, 2, or 3 is copied to those locations.</td> + </tr> + + <tr valign=top> + <td>4-5</td> + <td><b>Normalization.</b> The value can be 0 if there is no + normalization, 1 if the most significant bit of the + mantissa is always set (except for 0.0), and 2 if the most + signficant bit of the mantissa is not stored but is + implied to be set. The value 3 is reserved and will not + appear in this field.</td> + </tr> + + <tr valign=top> + <td>6-7</td> + <td>Reserved (zero).</td> + </tr> + + <tr valign=top> + <td>8-15</td> + <td><b>Sign.</b> This is the bit position of the sign + bit.</td> + </tr> + + <tr valign=top> + <td>16-23</td> + <td>Reserved (zero).</td> + </tr> + + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Floating-point Numbers (Class 1)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=2>Bit Offset</td> + <td colspan=2>Bit Precision</td> + </tr> + + <tr align=center> + <td>Exponent Location</td> + <td>Exponent Size in Bits</td> + <td>Mantissa Location</td> + <td>Mantissa Size in Bits</td> + </tr> + + <tr align=center> + <td colspan=4>Exponent Bias</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Strings (Class 3)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-3</td> + <td><b>Padding type.</b> This four-bit value determines the + type of padding to use for the string. The values are: + + <dl> + <dt><code>0</code> Null terminate. + <dd>A zero byte marks the end of the string and is + guaranteed to be present after converting a long + string to a short string. When converting a short + string to a long string the value is padded with + additional null characters as necessary. + + <br><br> + <dt><code>1</code> Null pad. + <dd>Null characters are added to the end of the value + during conversions from short values to long values + but conversion in the opposite direction simply + truncates the value. + + <br><br> + <dt><code>2</code> Space pad. + <dd>Space characters are added to the end of the value + during conversions from short values to long values + but conversion in the opposite direction simply + truncates the value. This is the Fortran + representation of the string. + + <br><br> + <dt><code>3-15</code> Reserved. + <dd>These values are reserved for future use. + </dl> + </tr> + + <tr valign=top> + <td>4-7</td> + <td><b>Character Set.</b> The character set to use for + encoding the string. The only character set supported is + the 8-bit ASCII (zero) so no translations have been defined + yet.</td> + </tr> + + <tr valign=top> + <td>8-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Bitfield Types (Class 4)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0</td> + <td><b>Byte Order.</b> If zero, byte order is little-endian; + otherwise, byte order is big endian.</td> + </tr> + + <tr valign=top> + <td>1, 2</td> + <td><b>Padding type.</b> Bit 1 is the lo_pad type and bit 2 + is the hi_pad type. If a datum has unused bits at either + end, then the lo_pad or hi_pad bit is copied to those + locations.</td> + </tr> + + <tr valign=top> + <td>3-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Bitfield Types (Class 4)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=2>Bit Offset</td> + <td colspan=2>Bit Precision</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Opaque Types (Class 5)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Opaque Types (Class 5)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Null-terminated ASCII Tag<br> + (multiple of 8 bytes)<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Compound Types (Class 6)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-15</td> + <td><b>Number of Members.</b> This field contains the number + of members defined for the compound datatype. The member + definitions are listed in the Properties field of the data + type message. + </tr> + + <tr valign=top> + <td>15-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p>The Properties field of a compound datatype is a list of the + member definitions of the compound datatype. The member + definitions appear one after another with no intervening bytes. + The member types are described with a recursive datatype + message. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Compound Types (Class 6)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=4><br><br>Name (null terminated, multiple of + eight bytes)<br><br><br></td> + </tr> + + <tr align=center> + <td colspan=4>Byte Offset of Member in Compound Instance</td> + </tr> + + <tr align=center> + <td>Dimensionality</td> + <td colspan=3>reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Dimension Permutation</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Size of Dimension 0 (required)</td> + </tr> + + <tr align=center> + <td colspan=4>Size of Dimension 1 (required)</td> + </tr> + + <tr align=center> + <td colspan=4>Size of Dimension 2 (required)</td> + </tr> + + <tr align=center> + <td colspan=4>Size of Dimension 3 (required)</td> + </tr> + + <tr align=center> + <td colspan=4><br><br>Member Type Message<br><br><br></td> + </tr> + + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Enumeration Types (Class 8)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-15</td> + <td><b>Number of Members.</b> The number of name/value + pairs defined for the enumeration type.</td> + </tr> + + <tr valign=top> + <td>16-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Enumeration Types (Class 8)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Parent Type<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Names<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Values<br><br></td> + </tr> + + </table> + </center> + + <center> + <table border=0 cellpadding=4 width="80%"> + <tr align=left valign=top> + <td valign=top width=20%>Parent Type:</td> + <td valign=top>Each enumeration type is based on some parent type, + usually an integer. The information for that parent type is + described recursively by this field.</td> + </tr><tr align=left valign=top> + <td valign=top>Names:</td> + <td valign=top>The name for each name/value pair. Each name is + stored as a null terminated ASCII string in a multiple of + eight bytes. The names are in no particular order.</td> + </tr><tr align=left valign=top> + <td valign=top>Values:</td> + <td valign=top>The list of values in the same order as the names. + The values are packed (no inter-value padding) and the + size of each value is determined by the parent type.</td> + </tr> + </table> + </center> + + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Variable-length Types (Class 9)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-3</td> + <td><dl><dt><b>Type</b></dt> + <dt>0 Variable-length sequence</dt> + <dd>This variable-length datatype can be of any sequence + of data. Variable-length sequences do not have padding + or character set information.</dd> + <dt>1 Variable-length string</dt> + <dd>This variable-length datatype is composed of a series of + characters. Variable-length strings have padding and + character set information.</dd></dl> + </td> + </tr> + + <tr valign=top> + <td>4-7</td> + <td><dl><dt><b>Padding type</b> (variable-length string only)</dt> + <dd>This four-bit value determines the type of padding + used for variable-length strings. The values are the same + as for the string padding type, as follows:</dd> + <dt>0 Null terminate</dt> + <dd>A zero byte marks the end of a string and is guaranteed + to be present after converting a long string to a short + string. When converting a short string to a long string, + the value is padded with additional null characters + as necessary. + <dt>1 Null pad</dt> + <dd>Null characters are added to the end of the value + during conversion from a short string to a longer string. + Conversion from a long string to a shorter string + simply truncates the value.</dd> + <dt>2 Space pad</dt> + <dd>Space characters are added to the end of the value + during conversion from a short string to a longer string. + Conversion from a long string to a shorter string simply + truncates the value. + This is the Fortran representation of the string. + </dd> + <dt>3-15 Reserved</dt> + <dd>These values are reserved for future use.</dd></dl> + </td> + </tr> + + <tr valign=top> + <td>8-11</td> + <td><dl><dt><b>Character set</b> (variable-length string only)</dt> + <dd>This four-bit value specifies the character set + to be used for encoding the string.</dd> + <dt>0 8-bit ASCII</dt> + <dd>As of this writing (July 2002, Release 1.4.4), + 8-bit ASCII is the only character set supported. + Therefore, no translations have been defined.</dd></dl> + </td> + </tr> + + <tr valign=top> + <td>12-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Variable-length Types (Class 9)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Parent Type<br><br></td> + </tr> + + </table> + </center> + + <center> + <table border=0 cellpadding=4 width="80%"> + <tr align=left valign=top> + <td valign=top width=20%>Parent Type:</td> + <td valign=top>Each variable-length type is based on + some parent type. The information for that parent type is + described recursively by this field.</td> + </tr> + </table> + </center> + + + + <p> + +<!-- + <p>Datatype examples are <a href="Datatypes.html">here</a>. +--> + + + <hr> + <h4><a name="FillValueMessage">Name: Data Storage - Fill Value</a></h4> + <b>Type:</b> 0x0004<br> + <b>Length:</b> varies<br> + <b>Status:</b> Optional, may not be repeated.<br> + + <p>The fill value message stores a single data point value which + is returned to the application when an uninitialized data point + is read from the dataset. The fill value is interpretted with + the same datatype as the dataset. If no fill value message is + present then a fill value of all zero is assumed. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Fill Value Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Size (4 bytes)</td> + </tr> + + <tr align=center> + <td colspan=4><br>Fill Value<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Size (4 bytes)</td> + <td>This is the size of the Fill Value field in bytes.</td> + </tr> + + <tr valign=top> + <td>Fill Value</td> + <td>The fill value. The bytes of the fill value are + interpreted using the same datatype as for the dataset.</td> + </tr> + </table> + </center> + + <hr> + <h4><a name="ReservedMessage_0005">Name: Reserved - Not Assigned Yet</a></h4> + <b>Type:</b> 0x0005<br> + <b>Length:</b> N/A<br> + <b>Status:</b> N/A<br> + + + + <hr> + <h4><a name="CompactDataStorageMessage">Name: Data Storage - Compact</a></h4> + + <b>Type:</b> 0x0006<br> + <b>Length:</b> varies<br> + <b>Status:</b> Optional, may not be repeated.<br> + + <p>This message indicates that the data for the data object is + stored within the current HDF file by including the actual + data as the header data for this message. The data is + stored internally in + the <em>normal format</em>, i.e. in one chunk, uncompressed, etc. + + <P>Note that one and only one of the <em>Data Storage</em> headers can be + stored for each data object. + + <P><b>Format of Data:</b> The message data is actually composed + of dataset data, so the format will be determined by the dataset + format. + +<!-- Delete examples throughout doc + <h4><a name="CompactDataStorageExample">Examples:</a></h4> + [very straightforward] +--> + + <hr> + <h4><a name="ExternalFileListMessage">Name: Data Storage - + External Data Files</a></h4> + <b>Type:</b> 0x0007<BR> + <b>Length:</b> varies<BR> + <b>Status:</b> Optional, may not be repeated.<BR> + + <p><b>Purpose and Description:</b> The external object message + indicates that the data for an object is stored outside the HDF5 + file. The filename of the object is stored as a Universal + Resource Location (URL) of the actual filename containing the + data. An external file list record also contains the byte offset + of the start of the data within the file and the amount of space + reserved in the file for that data. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>External File List Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td colspan=3>Reserved</td> + </tr> + + <tr align=center> + <td colspan=2>Allocated Slots</td> + <td colspan=2>Used Slots</td> + </tr> + + <tr align=center> + <td colspan=4><br>Heap Address<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Slot Definitions...<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version </td> + <td>This value is used to determine the format of the + External File List Message. When the format of the + information in the message is changed, the version number + is incremented and can be used to determine how the + information in the object header is formatted.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>This field is reserved for future use.</td> + </tr> + + <tr valign=top> + <td>Allocated Slots</td> + <td>The total number of slots allocated in the message. Its + value must be at least as large as the value contained in + the Used Slots field.</td> + </tr> + + <tr valign=top> + <td>Used Slots</td> + <td>The number of initial slots which contain valid + information. The remaining slots are zero filled.</td> + </tr> + + <tr valign=top> + <td>Heap Address</td> + <td>This is the address of a local name heap which contains + the names for the external files. The name at offset zero + in the heap is always the empty string.</td> + </tr> + + <tr valign=top> + <td>Slot Definitions</td> + <td>The slot definitions are stored in order according to + the array addresses they represent. If more slots have + been allocated than what has been used then the defined + slots are all at the beginning of the list.</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>External File List Slot</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Name Offset (<size> bytes)<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>File Offset (<size> bytes)<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Size<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Name Offset (<size> bytes)</td> + <td>The byte offset within the local name heap for the name + of the file. File names are stored as a URL which has a + protocol name, a host name, a port number, and a file + name: + <code><em>protocol</em>:<em>port</em>//<em>host</em>/<em>file</em></code>. + If the protocol is omitted then "file:" is assumed. If + the port number is omitted then a default port for that + protocol is used. If both the protocol and the port + number are omitted then the colon can also be omitted. If + the double slash and host name are omitted then + "localhost" is assumed. The file name is the only + mandatory part, and if the leading slash is missing then + it is relative to the application's current working + directory (the use of relative names is not + recommended).</td> + </tr> + + <tr valign=top> + <td>File Offset (<size> bytes)</td> + <td>This is the byte offset to the start of the data in the + specified file. For files that contain data for a single + dataset this will usually be zero.</td> + </tr> + + <tr valign=top> + <td>Size</td> + <td>This is the total number of bytes reserved in the + specified file for raw data storage. For a file that + contains exactly one complete dataset which is not + extendable, the size will usually be the exact size of the + dataset. However, by making the size larger one allows + HDF5 to extend the dataset. The size can be set to a value + larger than the entire file since HDF5 will read zeros + past the end of the file without failing.</td> + </tr> + </table> + </center> + + + <hr> + <h4><a name="LayoutMessage">Name: Data Storage - Layout</a></h4> + + <b>Type:</b> 0x0008<BR> + <b>Length:</b> varies<BR> + <b>Status:</b> Required for datasets, may not be repeated. + + <p><b>Purpose and Description:</b> Data layout describes how the + elements of a multi-dimensional array are arranged in the linear + address space of the file. Two types of data layout are + supported: + + <ol> + <li>The array can be stored in one contiguous area of the file. + The layout requires that the size of the array be constant and + does not permit chunking, compression, checksums, encryption, + etc. The message stores the total size of the array and the + offset of an element from the beginning of the storage area is + computed as in C. + + <li>The array domain can be regularly decomposed into chunks and + each chunk is allocated separately. This layout supports + arbitrary element traversals, compression, encryption, and + checksums, and the chunks can be distributed across external + raw data files (these features are described in other + messages). The message stores the size of a chunk instead of + the size of the entire array; the size of the entire array can + be calculated by traversing the B-tree that stores the chunk + addresses. + </ol> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Data Layout Message</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td>Dimensionality</td> + <td>Layout Class</td> + <td>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4><br>Address<br><br></td> + </tr> + + <tr align=center> + <td colspan=4>Dimension 0 (4-bytes)</td> + </tr> + + <tr align=center> + <td colspan=4>Dimension 1 (4-bytes)</td> + </tr> + + <tr align=center> + <td colspan=4>...</td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version</td> + <td>A version number for the layout message. This + documentation describes version one.</td> + </tr> + + <tr valign=top> + <td>Dimensionality</td> + <td>An array has a fixed dimensionality. This field + specifies the number of dimension size fields later in the + message.</td> + </tr> + + <tr valign=top> + <td>Layout Class</td> + <td>The layout class specifies how the other fields of the + layout message are to be interpreted. A value of one + indicates contiguous storage while a value of two + indicates chunked storage. Other values will be defined + in the future.</td> + </tr> + + <tr valign=top> + <td>Address</td> + <td>For contiguous storage, this is the address of the first + byte of storage. For chunked storage this is the address + of the B-tree that is used to look up the addresses of the + chunks.</td> + </tr> + + <tr valign=top> + <td>Dimensions</td> + <td>For contiguous storage the dimensions define the entire + size of the array while for chunked storage they define + the size of a single chunk.</td> + </tr> + </table> + </center> + + + <hr> + <h4><a name="ReservedMessage_0009">Name: Reserved - Not Assigned Yet</a></h4> + <b>Type:</b> 0x0009<BR> + <b>Length:</b> N/A<BR> + <b>Status:</b> N/A<BR> + <b>Purpose and Description:</b> N/A<BR> + <b>Format of Data:</b> N/A + + <hr> + <h4><a name="ReservedMessage_000A">Name: Reserved - Not Assigned Yet</a></h4> + <b>Type:</b> 0x000A<BR> + <b>Length:</b> N/A<BR> + <b>Status:</b> N/A<BR> + <b>Purpose and Description:</b> N/A<BR> + <b>Format of Data:</b> N/A + + <hr> + <h4><a name="FilterMessage">Name: Data Storage - Filter Pipeline</a></h4> + <b>Type:</b> 0x000B<BR> + <b>Length:</b> varies<BR> + <b>Status:</b> Optional, may not be repeated. + + <p><b>Purpose and Description:</b> This message describes the + filter pipeline which should be applied to the data stream by + providing filter identification numbers, flags, a name, an + client data. + + <p> + <center> + <table border align=center cellpadding=4 witdh="80%"> + <caption align=top> + <b>Filter Pipeline Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td>Number of Filters</td> + <td colspan=2>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4><br>Filter List<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version</td> + <td>The version number for this message. This document + describes version one.</td> + </tr> + + <tr valign=top> + <td>Number of Filters</td> + <td>The total number of filters described by this + message. The maximum possible number of filters in a + message is 32.</td> + </tr> + + <tr valign=top> + <td>Filter List</td> + <td>A description of each filter. A filter description + appears in the next table.</td> + </tr> + </table> + </center> + + <p> + <center> + <table border align=center cellpadding=4 witdh="80%"> + <caption align=top> + <b>Filter Pipeline Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=2>Filter Identification</td> + <td colspan=2>Name Length</td> + </tr> + + <tr align=center> + <td colspan=2>Flags</td> + <td colspan=2>Client Data Number of Values</td> + </tr> + + <tr align=center> + <td colspan=4><br>Name<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Client Data<br><br></td> + </tr> + + <tr align=center> + <td colspan=4>Padding</td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Filter Identification</td> + <td>This is a unique (except in the case of testing) + identifier for the filter. Values from zero through 255 + are reserved for filters defined by the NCSA HDF5 + library. Values 256 through 511 have been set aside for + use when developing/testing new filters. The remaining + values are allocated to specific filters by contacting the + <a href="mailto:hdf5dev@ncsa.uiuc.edu">HDF5 Development + Team</a>.</td> + </tr> + + <tr valign=top> + <td>Name Length</td> + <td>Each filter has an optional null-terminated ASCII name + and this field holds the length of the name including the + null termination padded with nulls to be a multiple of + eight. If the filter has no name then a value of zero is + stored in this field.</td> + </tr> + + <tr valign=top> + <td>Flags</td> + <td>The flags indicate certain properties for a filter. The + bit values defined so far are: + + <dl> + <dt><code>bit 1</code> + <dd>If set then the filter is an optional filter. + During output, if an optional filter fails it will be + silently removed from the pipeline. + </dl> + </tr> + + <tr valign=top> + <td>Client Data Number of Values</td> + <td>Each filter can store a few integer values to control + how the filter operates. The number of entries in the + Client Data array is stored in this field.</td> + </tr> + + <tr valign=top> + <td>Name</td> + <td>If the Name Length field is non-zero then it will + contain the size of this field, a multiple of eight. This + field contains a null-terminated, ASCII character + string to serve as a comment/name for the filter.</td> + </tr> + + <tr valign=top> + <td>Client Data</td> + <td>This is an array of four-byte integers which will be + passed to the filter function. The Client Data Number of + Values determines the number of elements in the + array.</td> + </tr> + + <tr valign=top> + <td>Padding</td> + <td>Four bytes of zeros are added to the message at this + point if the Client Data Number of Values field contains + an odd number.</td> + </tr> + </table> + </center> + + <hr> + <h4><a name="AttributeMessage">Name: Attribute</a></h4> + <b>Type:</b> 0x000C<BR> + <b>Length:</b> varies<BR> + <b>Status:</b> Optional, may be repeated.<BR> + + <p><b>Purpose and Description:</b> The <em>Attribute</em> + message is used to list objects in the HDF file which are used + as attributes, or "meta-data" about the current object. An + attribute is a small dataset; it has a name, a datatype, a data + space, and raw data. Since attributes are stored in the object + header they must be relatively small (<64kb) and can be + associated with any type of object which has an object header + (groups, datasets, named types and spaces, etc.). + + <p> + <center> + <table border align=center cellpadding=4 width="80%"> + <caption align=top> + <b>Attribute Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td>Version</td> + <td>Reserved</td> + <td colspan=2>Name Size</td> + </tr> + + <tr align=center> + <td colspan=2>Type Size</td> + <td colspan=2>Space Size</td> + </tr> + + <tr align=center> + <td colspan=4><br>Name<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Type<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Space<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Data<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version</td> + <td>Version number for the message. This document describes + version 1 of attribute messages.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>This field is reserved for later use and is set to + zero.</td> + </tr> + + <tr valign=top> + <td>Name Size</td> + <td>The length of the attribute name in bytes including the + null terminator. Note that the Name field below may + contain additional padding not represented by this + field.</td> + </tr> + + <tr valign=top> + <td>Type Size</td> + <td>The length of the datatype description in the Type + field below. Note that the Type field may contain + additional padding not represented by this field.</td> + </tr> + + <tr valign=top> + <td>Space Size</td> + <td>The length of the dataspace description in the Space + field below. Note that the Space field may contain + additional padding not represented by this field.</td> + </tr> + + <tr valign=top> + <td>Name</td> + <td>The null-terminated attribute name. This field is + padded with additional null characters to make it a + multiple of eight bytes.</td> + </tr> + + <tr valign=top> + <td>Type</td> + <td>The datatype description follows the same format as + described for the datatype object header message. This + field is padded with additional zero bytes to make it a + multiple of eight bytes.</td> + </tr> + + <tr valign=top> + <td>Space</td> + <td>The dataspace description follows the same format as + described for the dataspace object header message. This + field is padded with additional zero bytes to make it a + multiple of eight bytes.</td> + </tr> + + <tr valign=top> + <td>Data</td> + <td>The raw data for the attribute. The size is determined + from the datatype and dataspace descriptions. This + field is <em>not</em> padded with additional zero + bytes.</td> + </tr> + </table> + </center> + + <hr> + <h4><a name="NameMessage">Name: Object Name</a></h4> + + <p><b>Type:</b> 0x000D<br> + <b>Length:</b> varies<br> + <b>Status:</b> Optional, may not be repeated. + + <p><b>Purpose and Description:</b> The object name or comment is + designed to be a short description of an object. An object name + is a sequence of non-zero (<code>\0</code>) ASCII characters with no other + formatting included by the library. + + <p> + <center> + <table border align=center cellpadding=4 width="80%"> + <caption align=top> + <b>Name Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Name<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Name</td> + <td>A null terminated ASCII character string.</td> + </tr> + </table> + </center> + + <hr> + <h4><a name="ModifiedMessage">Name: Object Modification Date & Time</a></h4> + + <p><b>Type:</b> 0x000E<br> + <b>Length:</b> fixed<br> + <b>Status:</b> Optional, may not be repeated. + + <p><b>Purpose and Description:</b> The object modification date + and time is a timestamp which indicates (using ISO-8601 date and + time format) the last modification of an object. The time is + updated when any object header message changes according to the + system clock where the change was posted. + + <p> + <center> + <table border align=center cellpadding=4 width="80%"> + <caption align=top> + <b>Modification Time Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=4>Year</td> + </tr> + + <tr align=center> + <td colspan=2>Month</td> + <td colspan=2>Day of Month</td> + </tr> + + <tr align=center> + <td colspan=2>Hour</td> + <td colspan=2>Minute</td> + </tr> + + <tr align=center> + <td colspan=2>Second</td> + <td colspan=2>Reserved</td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Year</td> + <td>The four-digit year as an ASCII string. For example, + <code>1998</code>. All fields of this message should be interpreted + as coordinated universal time (UTC)</td> + </tr> + + <tr valign=top> + <td>Month</td> + <td>The month number as a two digit ASCII string where + January is <code>01</code> and December is <code>12</code>.</td> + </tr> + + <tr valign=top> + <td>Day of Month</td> + <td>The day number within the month as a two digit ASCII + string. The first day of the month is <code>01</code>.</td> + </tr> + + <tr valign=top> + <td>Hour</td> + <td>The hour of the day as a two digit ASCII string where + midnight is <code>00</code> and 11:00pm is <code>23</code>.</td> + </tr> + + <tr valign=top> + <td>Minute</td> + <td>The minute of the hour as a two digit ASCII string where + the first minute of the hour is <code>00</code> and + the last is <code>59</code>.</td> + </tr> + + <tr valign=top> + <td>Second</td> + <td>The second of the minute as a two digit ASCII string + where the first second of the minute is <code>00</code> + and the last is <code>59</code>.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>This field is reserved and should always be zero.</td> + </tr> + </table> + </center> + + <hr> + <h4><a name="SharedMessage">Name: Shared Object Message</a></h4> + <b>Type:</b> 0x000F<br> + <b>Length:</b> 4 Bytes<br> + <b>Status:</b> Optional, may be repeated. + + <p>A constant message can be shared among several object headers + by writing that message in the global heap and having the object + headers all point to it. The pointing is accomplished with a + Shared Object message which is understood directly by the object + header layer of the library. It is also possible to have a + message of one object header point to a message in some other + object header, but care must be exercised to prevent cycles. + + <p>If a message is shared, then the message appears in the global + heap and its message ID appears in the Header Message Type + field of the object header. Also, the Flags field in the object + header for that message will have bit two set (the + <code>H5O_FLAG_SHARED</code> bit). The message body in the + object header will be that of a Shared Object message defined + here and not that of the pointed-to message. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Shared Message Message</b> + </caption> + + <tr align=center> + <th width="25%">byte</td> + <th width="25%">byte</td> + <th width="25%">byte</td> + <th width="25%">byte</td> + </tr> + + <tr align=center> + <td>Version</td> + <td>Flags</td> + <td colspan=2>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4>Reserved</td> + </tr> + + <tr align=center> + <td colspan=4><br>Pointer<br><br></td> + </tr> + </table> + </center> + + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> + + <tr valign=top> + <td>Version</td> + <td>The version number for the message. This document + describes version one of shared messages.</td> + </tr> + + <tr valign=top> + <td>Flags</td> + <td>The Shared Message message points to a message which is + shared among multiple object headers. The Flags field + describes the type of sharing: + + <dl> + <dt><code>Bit 0</code> + <dd>If this bit is clear then the actual message is the + first message in some other object header; otherwise + the actual message is stored in the global heap. + + <dt><code>Bits 2-7</code> + <dd>Reserved (always zero) + </dl> + </tr> + + <tr valign=top> + <td>Pointer</td> + <td>This field points to the actual message. The format of + the pointer depends on the value of the Flags field. If + the actual message is in the global heap then the pointer + is the file address of the global heap collection that + holds the message, and a four-byte index into that + collection. Otherwise the pointer is a group entry + that points to some other object header.</td> + </tr> + </table> + </center> + + +<hr> +<h4><a name="ContinuationMessage">Name: Object Header Continuation</a></h4> +<b>Type:</b> 0x0010<BR> +<b>Length:</b> fixed<BR> +<b>Status:</b> Optional, may be repeated.<BR> +<b>Purpose and Description:</b> The object header continuation is the location +in the file of more header messages for the current data object. This can be +used when header blocks are large, or likely to change over time.<BR> +<b>Format of Data:</b><p> + The object header continuation is formatted as follows (assuming a 4-byte +length & offset are being used in the current file): + +<P> +<center> +<table border cellpadding=4 width=60%> +<caption align=bottom> +<B>HDF5 Object Header Continuation Message Layout</B> +</caption> + +<tr align=center> +<th width=25%>byte</th> +<th width=25%>byte</th> +<th width=25%>byte</th> +<th width=25%>byte</th> + +<tr align=center> +<td colspan=4>Header Continuation Offset</td> +<tr align=center> +<td colspan=4>Header Continuation Length</td> +</table> +</center> + +<P> +<dl> +<dt>The elements of the Header Continuation Message are described below: +<dd> +<dl> +<dt>Header Continuation Offset: (<offset> bytes) +<dd>This value is the offset in bytes from the beginning of the file where the +header continuation information is located. +<dt>Header Continuation Length: (<length> bytes) +<dd>This value is the length in bytes of the header continuation information in +the file. +</dl> +</dl> + +<!-- Delete examples throughout doc +<h4><a name="ContinuationExample">Examples:</a></h4> + [straightforward] +--> + +<hr> +<h4><a name="SymbolTableMessage">Name: Group Message</a></h4> +<b>Type:</b> 0x0011<BR> +<b>Length:</b> fixed<BR> +<b>Status:</b> Required for groups, may not be repeated.<BR> +<b>Purpose and Description:</b> Each group has a B-tree and a +name heap which are pointed to by this message.<BR> +<b>Format of data:</b> +<p>The group message is formatted as follows: + +<p> +<center> +<table border cellpadding=4 width="80%"> +<caption align=bottom> +<b>HDF5 Object Header Group Message Layout</b> +</caption> + +<tr align=center> +<th width="25%">byte</th> +<th width="25%">byte</th> +<th width="25%">byte</th> +<th width="25%">byte</th> + +<tr align=center> +<td colspan=4>B-tree Address</td> + +<tr align=center> +<td colspan=4>Heap Address</td> +</table> +</center> + +<P> +<dl> +<dt>The elements of the Group Message are described below: +<dd> +<dl> +<dt>B-tree Address (<offset> bytes) +<dd>This value is the offset in bytes from the beginning of the file +where the B-tree is located. +<dt>Heap Address (<offset> bytes) +<dd>This value is the offset in bytes from the beginning of the file +where the group name heap is located. +</dl> +</dl> + +<h3><a name="SharedObjectHeader">Disk Format: Level 2b - Shared Data Object Headers</a></h3> +<P>In order to share header messages between several dataset objects, object +header messages may be placed into the global heap. Since these +messages require additional information beyond the basic object header message +information, the format of the shared message is detailed below. + +<BR> <BR> +<center> +<table border cellpadding=4 width=60%> +<caption align=bottom> +<B>HDF5 Shared Object Header Message</B> +</caption> + +<tr align=center> +<th width=25%>byte</th> +<th width=25%>byte</th> +<th width=25%>byte</th> +<th width=25%>byte</th> + +<tr align=center> +<td colspan=4>Reference Count of Shared Header Message</td> +<tr align=center> +<td colspan=4><br> Shared Object Header Message<br> <br></td> +</table> +</center> + +<p> +<dl> +<dt> The elements of the shared object header message are described below: +<dd> +<dl> +<dt>Reference Count of Shared Header Message: (32-bit unsigned integer) +<dd>This value is used to keep a count of the number of dataset objects which +refer to this message from their dataset headers. When this count reaches zero, +the shared message header may be removed from the global heap. +<dt>Shared Object Header Message: (various lengths) +<dd>The data stored for the shared object header message is formatted in the +same way as the private object header messages described in the object header +description earlier in this document and begins with the header message Type. +</dl> +</dl> + + +<h3><a name="DataStorage">Disk Format: Level 2c - Data Object Data Storage</a></h3> +<P>The data for an object is stored separately from the header +information in the file and may not actually be located in the HDF5 file +itself if the header indicates that the data is stored externally. The +information for each record in the object is stored according to the +dimensionality of the object (indicated in the dimensionality header message). +Multi-dimensional data is stored in C order [same as current scheme], i.e. the +"last" dimension changes fastest. +<P>Data whose elements are composed of simple number-types are stored in +native-endian IEEE format, unless they are specifically defined as being stored +in a different machine format with the architecture-type information from the +number-type header message. This means that each architecture will need to +[potentially] byte-swap data values into the internal representation for that +particular machine. +<P> Data with a "variable" sized number-type is stored in a data heap +internal to the HDF5 file. Global heap identifiers are stored in the +data object storage. +<P>Data whose elements are composed of pointer number-types are stored in several +different ways depending on the particular pointer type involved. Simple +pointers are just stored as the dataset offset of the object being pointed to with the +size of the pointer being the same number of bytes as offsets in the file. +Partial-object pointers are stored as a heap-ID which points to the following +information within the file-heap: an offset of the object pointed to, number-type +information (same format as header message), dimensionality information (same +format as header message), sub-set start and end information (i.e. a coordinate +location for each), and field start and end names (i.e. a [pointer to the] +string indicating the first field included and a [pointer to the] string name +for the last field). + +<P>Data of a compound datatype is stored as a contiguous stream of the items +in the structure, with each item formatted according to its datatype. + +</body> +</html> |