-rw-r--r-- | doc/html/FF-IH_FileGroup.gif | bin | 0 -> 3407 bytes
-rw-r--r-- | doc/html/FF-IH_FileObject.gif | bin | 0 -> 2136 bytes
-rw-r--r-- | doc/html/H5.format.html | 2538
3 files changed, 1413 insertions, 1125 deletions
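The super block rules touched by the H5.format.html change can be checked with a short sketch. It searches for the 8-byte file signature (decimal 137, then `HDF`, CR, LF, 26, LF) at byte offsets 0, 512, 1024, 2048, and so on. It then decodes the eight one-byte version and size fields that follow the signature in the Super Block Layout table, and evaluates the B-tree page formula. This is a minimal Python sketch, not the HDF5 library's code. The function names are hypothetical, the whole file is assumed to be in memory, and the symbol size of 8 bytes is an assumption, chosen because it reproduces both figures quoted in the text (340 objects per 4KB page for 32-bit offsets, 254 for 64-bit offsets).

```python
# Minimal sketch (not the HDF5 library): locate an HDF5 super block and
# decode the one-byte fields that follow the file signature.
# Function names are hypothetical; the file is assumed to be held in memory.

# Decimal 137, 'H', 'D', 'F', CR, LF, 26, LF.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def candidate_offsets(file_size):
    """Yield 0, 512, 1024, 2048, ... while a whole signature still fits."""
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= file_size:
        yield offset
        offset = 512 if offset == 0 else offset * 2


def find_superblock(data):
    """Return the offset of the first signature found, or None."""
    for offset in candidate_offsets(len(data)):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
    return None


def read_header_bytes(data, offset):
    """Decode the eight one-byte fields that follow the signature."""
    start = offset + len(HDF5_SIGNATURE)
    fields = data[start:start + 8]
    if len(fields) < 8:
        raise ValueError("truncated super block")
    return {
        "super_block_version": fields[0],
        "free_space_version": fields[1],
        "group_version": fields[2],
        # fields[3] is reserved
        "shared_header_version": fields[4],
        "size_of_offsets": fields[5],  # bytes per stored address
        "size_of_lengths": fields[6],  # bytes per stored object size
        # fields[7] is reserved (zero)
    }


def btree_objects_per_page(page_size, offset_size, symbol_size):
    """FLOOR((page size - offset size) / (symbol size + offset size)) - 1."""
    return (page_size - offset_size) // (symbol_size + offset_size) - 1


if __name__ == "__main__":
    # Hypothetical file: a 512-byte user block, then the signature and
    # header bytes declaring 8-byte offsets and 8-byte lengths.
    blob = b"\0" * 512 + HDF5_SIGNATURE + bytes([0, 0, 0, 0, 0, 8, 8, 0])
    where = find_superblock(blob)
    print(where)  # 512
    print(read_header_bytes(blob, where)["size_of_offsets"])  # 8
    # An assumed symbol size of 8 bytes reproduces the figures in the text.
    print(btree_objects_per_page(4096, 4, 8))  # 340
    print(btree_objects_per_page(4096, 8, 8))  # 254
```

The search doubles the offset after 512 rather than stepping by 512, following the rule stated in the patched text, and stops once a full signature could no longer fit in the file.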
diff --git a/doc/html/FF-IH_FileGroup.gif b/doc/html/FF-IH_FileGroup.gif
Binary files differ
new file mode 100644
index 0000000..b0d76f5
--- /dev/null
+++ b/doc/html/FF-IH_FileGroup.gif
diff --git a/doc/html/FF-IH_FileObject.gif b/doc/html/FF-IH_FileObject.gif
Binary files differ
new file mode 100644
index 0000000..8eba623
--- /dev/null
+++ b/doc/html/FF-IH_FileObject.gif
diff --git a/doc/html/H5.format.html b/doc/html/H5.format.html
index afcd444..8c0d8b2 100644
--- a/doc/html/H5.format.html
+++ b/doc/html/H5.format.html
@@ -1,144 +1,169 @@ <html> <head> <title> - HDF5 Disk-Format Specification + HDF5 File Format Specification </title> </head> <body bgcolor="#FFFFFF"> - <center><h1>HDF5: Disk Format Implementation</h1></center> +<hr> +<center> +<table border=0 width=98%> +<tr><td valign=top align=left> +<a href="index.html">Other HDF5 documents and links</a> <br> +<a href="H5.intro.html">Introduction to HDF5</a> <br> +</td> +<td> </td> +<td valign=top align=right> +<a href="H5.user.html">HDF5 User Guide</a> <br> +<a href="RM_H5Front.html">HDF5 Reference Manual</a> <br> +</td></tr> +</table> +</center> +<hr> + + <center><h1>HDF5 File Format Specification</h1></center> + + <center> + <table border=0 width=90%> + <tr> + <td valign=top> <ol type=I> - <li><a href="#BootBlock"> - Disk Format Level 0 - File Signature and Boot Block</a> - <li><a href="#ObjectDir"> - Disk Format Level 1 - File Infrastructure</a> + <li><a href="#Intro">Introduction</a> + <li><a href="#BootBlock">Disk Format Level 0 - File Signature and Super Block</a> + <li><a href="#Group">Disk Format Level 1 - File Infrastructure</a> + <font size=-2> <ol type=A> - <li><a href="#Btrees"> - Disk Format Level 1A - B-link Trees</a> - <li><a href="#SymbolTable"> - Disk Format Level 1B - Symbol Table</a> - <li><a href="#SymbolTableEntry"> - Disk Format Level 1C - Symbol Table Entry</a> - <li><a href="#LocalHeap"> - Disk Format Level 1D - Local Heaps</a> - <li><a href="#GlobalHeap"> - Disk Format Level 1E - Global
Heap</a> - <li><a href="#FreeSpaceIndex"> - Disk Format Level 1F - Free-Space Index</a> + <li><a href="#Btrees">Disk Format Level 1A - B-link Trees and B-tree Nodes</a> + <li><a href="#SymbolTable">Disk Format Level 1B - Group</a> + <li><a href="#SymbolTableEntry">Disk Format Level 1C - Group Entry</a> + <li><a href="#LocalHeap">Disk Format Level 1D - Local Heaps</a> + <li><a href="#GlobalHeap">Disk Format Level 1E - Global Heap</a> + <li><a href="#FreeSpaceIndex">Disk Format Level 1F - Free-space Index</a> </ol> - <li><a href="#DataObject"> - Disk Format Level 2 - Data Objects</a> + </font> + <li><a href="#DataObject">Disk Format Level 2 - Data Objects</a> + <font size=-2> <ol type=A> - <li><a href="#ObjectHeader"> - Disk Format Level 2a - Data Object Headers</a> + <li><a href="#ObjectHeader">Disk Format Level 2a - Data Object Headers</a> <ol type=1> - <li><a href="#NILMessage"> <!-- 0x0000 --> - Name: NIL</a> - <li><a href="#SimpleDataSpace"> <!-- 0x0001 --> - Name: Simple Data Space</a> - <li><a href="#DataSpaceMessage"> <!-- 0x0002 --> - Name: Data-Space</a> - <li><a href="#DataTypeMessage"> <!-- 0x0003 --> - Name: Data-Type</a> - <li><a href="#FillValueMessage"> <!-- 0x0004 --> - Name: Data Storage - Fill Value</a> - <li><a href="#ReservedMessage_0005"> <!-- 0x0005 --> - Name: Reserved - not assigned yet</a> - <li><a href="#CompactDataStorageMessage"> <!-- 0x0006 --> - Name: Data Storage - Compact</a> - <li><a href="#ExternalFileListMessage"> <!-- 0x0007 --> - Name: Data Storage - External Data Files</a> - <li><a href="#LayoutMessage"> <!-- 0x0008 --> - Name: Data Storage - Layout</a> - <li><a href="#ReservedMessage_0009"> <!-- 0x0009 --> - Name: Reserved - not assigned yet</a> - <li><a href="#ReservedMessage_000A"> <!-- 0x000a --> - Name: Reserved - not assigned yet</a> - <li><a href="#FilterMessage"> <!-- 0x000b --> - Name: Data Storage - Filter Pipeline</a> - <li><a href="#AttributeMessage"> <!-- 0x000c --> - Name: Attribute</a> - <li><a 
href="#NameMessage"> <!-- 0x000d --> - Name: Object Name</a> - <li><a href="#ModifiedMessage"> <!-- 0x000e --> - Name: Object Modification Date & Time</a> - <li><a href="#SharedMessage"> <!-- 0x000f --> - Name: Shared Object Message</a> - <li><a href="#ContinuationMessage"> <!-- 0x0010 --> - Name: Object Header Continuation</a> - <li><a href="#SymbolTableMessage"> <!-- 0x0011 --> - Name: Symbol Table Message</a> + <li><a href="#NILMessage">Name: NIL</a> <!-- 0x0000 --> + <li><a href="#SimpleDataSpace">Name: Simple Dataspace</a> <!-- 0x0001 --> +<!-- + <li><a href="#DataSpaceMessage">Name: Complex Dataspace</a> --> <!-- 0x0002 --> + <li><a href="#DataTypeMessage">Name: Datatype</a> <!-- 0x0003 --> + <li><a href="#FillValueMessage">Name: Data Storage - Fill Value</a> <!-- 0x0004 --> + <li><a href="#ReservedMessage_0005">Name: Reserved - not assigned yet</a> <!-- 0x0005 --> </ol> - <li><a href="#SharedObjectHeader"> - Disk Format: Level 2b - Shared Data Object Headers</a> - <li><a href="#DataStorage"> - Disk Format: Level 2c - Data Object Data Storage</a> + </ol> + </font> + </ol> + </td><td> </td><td valign=top> + <ol type=I> + + <li><a href="#DataObject">Disk Format Level 2 - Data Objects</a> + <font size=-2><i>(Continued)</i> + <ol type=A> + <li><a href="#ObjectHeader">Disk Format Level 2a - Data Object Headers</a><i>(Continued)</i> + <ol type=1> + <li><a href="#CompactDataStorageMessage">Name: Data Storage - Compact</a> <!-- 0x0006 --> + <li><a href="#ExternalFileListMessage">Name: Data Storage - External Data Files</a> <!-- 0x0007 --> + <li><a href="#LayoutMessage">Name: Data Storage - Layout</a> <!-- 0x0008 --> + <li><a href="#ReservedMessage_0009">Name: Reserved - not assigned yet</a> <!-- 0x0009 --> + <li><a href="#ReservedMessage_000A">Name: Reserved - not assigned yet</a> <!-- 0x000a --> + <li><a href="#FilterMessage">Name: Data Storage - Filter Pipeline</a> <!-- 0x000b --> + <li><a href="#AttributeMessage">Name: Attribute</a> <!-- 0x000c --> + <li><a 
href="#NameMessage">Name: Object Name</a> <!-- 0x000d --> + <li><a href="#ModifiedMessage">Name: Object Modification Date and Time</a> <!-- 0x000e --> + <li><a href="#SharedMessage">Name: Shared Object Message</a> <!-- 0x000f --> + <li><a href="#ContinuationMessage">Name: Object Header Continuation</a> <!-- 0x0010 --> + <li><a href="#SymbolTableMessage">Name: Group Message</a> <!-- 0x0011 --> + </ol> + <li><a href="#SharedObjectHeader">Disk Format: Level 2b - Shared Data Object Headers</a> + <li><a href="#DataStorage">Disk Format: Level 2c - Data Object Data Storage</a> </ol> + </font> </ol> +</td></tr> +</table> +</center> + +<br><br> + + <h2>Introduction</h2> - <h2>Disk Format Implementation</h2> + <table align=right width=100> + <tr><td> </td><td align=center> + <hr> + <img src="FF-IH_FileGroup.gif" alt="HDF5 Groups" hspace=15 vspace=15> + </td><td> </td></tr> + <tr><td> </td><td align=center> + <strong>Figure 1:</strong> Relationships among the HDF5 root group, other groups, and objects + <hr> + </td><td> </td></tr> - <P>The format of a HDF5 file on disk encompasses several - key ideas of the current HDF4 & AIO file formats as well as - addressing some short-comings therein. The new format is + <tr><td> </td><td align=center> + <img src="FF-IH_FileObject.gif" alt="HDF5 Objects" hspace=15 vspace=15> + </td><td> </td></tr> + <tr><td> </td><td align=center> + <strong>Figure 2:</strong> HDF5 objects -- datasets, datatypes, or dataspaces + <hr> + </td><td> </td></tr> + </table> + + + <P>The format of an HDF5 file on disk encompasses several + key ideas of the HDF4 and AIO file formats as well as + addressing some shortcomings therein. The new format is more self-describing than the HDF4 format and is more uniformly applied to data objects in the file. - <P>An HDF5 file can be thought of as a directed graph. - The nodes of this graph are the higher-level HDF5 objects, - including groups, datasets, datatypes, and dataspaces. 
- This document describes the lower-level data objects used by - the HDF5 library to represent those higher-level objects and - their properties. - - <P>At the lowest level, an HDF5 file is made up of the following - objects: + <P>An HDF5 file appears to the user as a directed graph. + The nodes of this graph are the higher-level HDF5 objects + that are exposed by the HDF5 APIs: + + <ul> + <li>Groups + <li>Datasets + <li>Datatypes + <li>Dataspaces + </ul> + + <P>At the lowest level, as information is actually written to the disk, + an HDF5 file is made up of the following objects: <ul> - <li>A bool block + <li>A super block <li>B-tree nodes (containing either symbol nodes or raw data chunks) <li>Object headers + <li>Collections <li>Local heaps <li>Free space </ul> - As indicated above, the HDF5 library uses and interprets these - low-level objects to describe the high-level HDF5 objects that - are revealed to the user, and to higher-level applications, - through the HDF5 APIs. - -<!-- -<blockquote> -<pre> ----------- Edit from here... ------------- + The HDF5 library uses these lower-level objects to represent the + higher-level objects that are then presented to the user or + to applications through the APIs. + For instance, a group is an object header that contains a message that + points to a local heap and to a B-tree which points to symbol nodes. + A dataset is an object header that contains messages that describe + datatype, space, layout, filters, external files, fill value, etc + with the layout message pointing to either a raw data chunk or to a + B-tree that points to raw data chunks. -Once you know about all these low-level objects you can build bigger -and better things, which is what most people are interested in and -which are the objects that the hdf5 library exposes in the API. 
They -are: + <h3>This Document</h3> - * Groups - * Datasets - * Datatypes - * Data spaces + <p>This document describes the lower-level data objects; + the higher-level objects and their properties are described + in the <a href="H5.user.html"><cite>HDF5 User's Guide</cite></a>. -For instance, a group is an object header that contains a message that -points to a local heap and to a B-tree which points to symbol nodes. -A dataset is an object header that contains messages that describe -data type, space, layout, filters, external files, fill value, etc -with the layout message pointing to either a raw data chunk or to a -B-tree that points to raw data chunks. -Elena> Would it be more logical to discuss things in this order? +<!-- +<blockquote> +<pre> -Elena> Intro ( What is HDF5 file, etc.) -Elena> File Header -Elena> File Body -Elena> Objects -Elena> Object Header -Elena> Object Header Message Data Elena> NOTE: give reference to the detailed discussion of the B-trees Elena> when needed. Right now we do not have specification (only general one) Elena> for the Symbol Table B-trees and B-trees used to manage chunked datasets. @@ -149,23 +174,6 @@ Elena> Symbol Tables Elena> Global heap Elena> "Free-space object" -That might be a good order for someone that's familiar with the API -but if you're trying to get all the way down to the file format level -it results in a lot of forward references in the documentation. It -might be better to do a bottom-up documentation similar to the order I -used above: - - * General file layout - * Boot block - * Format-level objects (B-trees, symbol nodes, object headers, etc). - * Object header messages - * High-level objects (datasets, groups, named types and spaces, etc) - -where "high-level objects" description mostly describes which object -header messages are required, optional, mutually exclusive, etc. for -each high-level object. - ----------- ...to here. 
------------- </pre> </blockquote> @@ -173,27 +181,33 @@ each high-level object. - <P>Three levels of information compose the file format. The level - 0 contains basic information for identifying and - "boot-strapping" the file. Level 1 information is composed of - the object directory (stored as a B-tree) and is used as the - index for all the objects in the file. The rest of the file is - composed of data-objects at level 2, with each object - partitioned into header (or "meta") information and data - information. + <P>Three levels of information comprise the file format. + Level 0 contains basic information for identifying and + defining information about the file. Level 1 information contains + the group information (stored as a B-tree) and is used as the + index for all the objects in the file. Level 2 is the rest + of the file and contains all of the data objects, with each object + partitioned into header information, also known as + <em>meta information</em>, and data. <p>The sizes of various fields in the following layout tables are determined by looking at the number of columns the field spans in the table. There are three exceptions: (1) The size may be overridden by specifying a size in parentheses, (2) the size of - addresses is determined by the <em>Size of Addresses</em> field - in the boot block, and (3) the size of size fields is determined - by the <em>Size of Sizes</em> field in the boot block. + addresses is determined by the <em>Size of Offsets</em> field + in the super block, and (3) the size of size fields is determined + by the <em>Size of Lengths</em> field in the super block. 
+ - <h3><a name="BootBlock"> - Disk Format: Level 0 - File Signature and Boot Block</a></h3> - <P>The boot block may begin at certain predefined offsets within +<br><br> +<br><br> + + + <h2><a name="BootBlock"> + Disk Format: Level 0 - File Signature and Super Block</a></h2> + + <P>The super block may begin at certain predefined offsets within the HDF5 file, allowing a block of unspecified content for users to place additional information at the beginning (and end) of the HDF5 file without limiting the HDF5 library's @@ -201,22 +215,22 @@ each high-level object. feature was designed to accommodate wrapping an HDF5 file in another file format or adding descriptive information to the file without requiring the modification of the actual file's - information. The boot-block is located by searching for the + information. The super block is located by searching for the HDF5 file signature at byte offset 0, byte offset 512 and at successive locations in the file, each a multiple of two of the previous location, i.e. 0, 512, 1024, 2048, etc. - <P>The boot-block is composed of a file signature, followed by - boot block and object directory version numbers, information + <P>The super block is composed of a file signature, followed by + super block and group version numbers, information about the sizes of offset and length values used to describe - items within the file, the size of each object directory page, - and a symbol table entry for the root object in the file. + items within the file, the size of each group page, + and a group entry for the root object in the file. <p> <center> <table border align=center cellpadding=4 width="80%"> <caption align=top> - <B>HDF5 Boot Block Layout</B> + <B>HDF5 Super Block Layout</B> </caption> <tr align=center> @@ -231,22 +245,22 @@ each high-level object. 
</tr> <tr align=center> - <td>Version # of Boot Block</td> - <td>Version # of Global Free-Space Storage</td> - <td>Version # of Object Directory</td> + <td>Version # of Super Block</td> + <td>Version # of Global Free-space Storage</td> + <td>Version # of Group</td> <td>Reserved</td> </tr> <tr align=center> <td>Version # of Shared Header Message Format</td> - <td>Size of Addresses</td> - <td>Size of Sizes</td> + <td>Size of Offsets</td> + <td>Size of Lengths</td> <td>Reserved (zero)</td> </tr> <tr align=center> - <td colspan=2>Symbol Table Leaf Node K</td> - <td colspan=2>Symbol Table Internal Node K</td> + <td colspan=2>Group Leaf Node K</td> + <td colspan=2>Group Internal Node K</td> </tr> <tr align=center> @@ -258,7 +272,7 @@ each high-level object. </tr> <tr align=center> - <td colspan=4>Address of Global Free-Space Heap</td> + <td colspan=4>Address of Global Free-space Heap</td> </tr> <tr align=center> @@ -270,14 +284,14 @@ each high-level object. </tr> <tr align=center> - <td colspan=4><br>Root Group Symbol Table Entry<br><br></td> + <td colspan=4><br>Root Group Address<br><br></td> </tr> </table> </center> <p> <center> - <table align=center width="80%"> + <table width="80%"> <tr> <th width="30%">Field Name</th> <th width="70%">Description</th> @@ -289,11 +303,11 @@ each high-level object. quickly identify a file as being an HDF5 file. The constant value is designed to allow easy identification of an HDF5 file and to allow certain types of data corruption - to be detected. The file signature of a HDF5 file always - contain the following values: + to be detected. The file signature of an HDF5 file always + contains the following values: <br><br><center> - <table border align=center cellpadding=4 width="80%"> + <table border align=center cellpadding=4 width="100%"> <tr align=center> <td>decimal</td> <td width="8%">137</td> @@ -333,13 +347,13 @@ each high-level object. 
</center> <br> - This signature both identifies the file as a HDF5 file + This signature both identifies the file as an HDF5 file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish HDF5 files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability - that a text file may be misrecognized as a HDF5 file; + that a text file may be misrecognized as an HDF5 file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline @@ -350,174 +364,188 @@ each high-level object. </tr> <tr valign=top> - <td>Version # of the Boot Block</td> + <td>Version Number of the Super Block</td> <td>This value is used to determine the format of the - information in the boot block. When the format of the - information in the boot block is changed, the version # + information in the super block. When the format of the + information in the super block is changed, the version number is incremented to the next integer and can be used to - determine how the information in the boot block is + determine how the information in the super block is formatted.</td> </tr> <tr valign=top> - <td>Version # of the Global Free-Space Storage</td> + <td>Version Number of the Global Free-space Heap</td> <td>This value is used to determine the format of the - information in the Global Free-Space Heap. Currently, - this is implemented as a B-tree of length/offset pairs - to locate free space in the file, but future advances in - the file-format could change the method of finding - global free-space. 
When the format of the information - is changed, the version # is incremented to the next - integer and can be used to determine how the information - is formatted.</td> + information in the Global Free-space Heap.</td> </tr> <tr valign=top> - <td>Version # of the Object Directory</td> + <td>Version Number of the Group</td> <td>This value is used to determine the format of the - information in the Object Directory. When the format of - the information in the Object Directory is changed, the - version # is incremented to the next integer and can be - used to determine how the information in the Object - Directory is formatted.</td> + information in the Group. When the format of + the information in the Group is changed, the + version number is incremented to the next integer and can be + used to determine how the information in the Group + is formatted.</td> </tr> <tr valign=top> - <td>Version # of the Shared Header Message Format</td> + <td>Version Number of the Shared Header Message Format</td> <td>This value is used to determine the format of the information in a shared object header message, which is stored in the global small-data heap. Since the format - of the shared header messages differ from the private - header messages, a version # is used to identify changes + of the shared header messages differs from the private + header messages, a version number is used to identify changes in the format.</td> </tr> <tr valign=top> - <td>Size of Addresses</td> - <td>This value contains the number of bytes used for + <td>Size of Offsets</td> + <td>This value contains the number of bytes used to store addresses in the file. The values for the addresses of - objects in the file are relative to a base address, - usually the address of the boot block signature. This + objects in the file are offsets relative to a base address, + usually the address of the super block signature. 
This allows a wrapper to be added after the file is created without invalidating the internal offset locations.</td> </tr> <tr valign=top> - <td>Size of Sizes</td> + <td>Size of Lengths</td> <td>This value contains the number of bytes used to store the size of an object.</td> </tr> <tr valign=top> - <td>Symbol Table Leaf Node K</td> - <td>Each leaf node of a symbol table B-tree will have at + <td>Group Leaf Node K</td> + <td>Each leaf node of a group B-tree will have at least this many entries but not more than twice this - many. If a symbol table has a single leaf node then it + many. If a group has a single leaf node then it may have fewer entries.</td> </tr> <tr valign=top> - <td>Symbol Table Internal Node K</td> - <td>Each internal node of a symbol table B-tree will have + <td>Group Internal Node K</td> + <td>Each internal node of a group B-tree will have at least K pointers to other nodes but not more than 2K - pointers. If the symbol table has only one internal + pointers. If the group has only one internal node then it might have fewer than K pointers.</td> </tr> <tr valign=top> - <td>Bytes per B-Tree Page</td> - <td>This value contains the # of bytes used for symbol - pairs per page of the B-Trees used in the file. All - B-Tree pages will have the same size per page. <br>(For - 32-bit file offsets, 340 objects is the maximum per 4KB - page, and for 64-bit file offset, 254 objects will fit - per 4KB page. In general, the equation is: <br> <# - of objects> = FLOOR((<page size>-<offset - size>)/(<Symbol size>+<offset size>))-1 )</td> - </tr> + <td>Bytes per B-tree Page</td> + <td>This value contains the number of bytes used for symbol + pairs per page of the B-trees used in the file. All + B-tree pages will have the same size per page. + <br> + For 32-bit file offsets, 340 objects is the maximum + per 4KB page; for 64-bit file offset, 254 objects will fit + per 4KB page. 
In general, the equation is: + <br> + <code> <<i>number of objects</i>> = + <br> + FLOOR((<<i>page size</i>> - <<i>offset size</i>>) / + <br> + (<<i>Symbol size</i>> + <<i>offset size</i>>)) + - 1 </code></td> + </tr> - <tr valign=top> - <td>File Consistency Flags</td> - <td>This value contains flags to indicate information - about the consistency of the information contained - within the file. Currently, the following bit flags are - defined: bit 0 set indicates that the file is opened for - write-access and bit 1 set indicates that the file has - been verified for consistency and is guaranteed to be - consistent with the format defined in this document. - Bits 2-31 are reserved for future use. Bit 0 should be - set as the first action when a file is opened for write - access and should be cleared only as the final action - when closing a file. Bit 1 should be cleared during - normal access to a file and only set after the file's - consistency is guaranteed by the library or a - consistency utility.</td> - </tr> + <tr valign=top> + <td>File Consistency Flags</td> + <td>This value contains flags to indicate information + about the consistency of the information contained + within the file. Currently, the following bit flags are + defined: + <ul> + <li>Bit 0 set indicates that the file is opened for + write-access. + <li>Bit 1 set indicates that the file has + been verified for consistency and is guaranteed to be + consistent with the format defined in this document. + <li>Bits 2-31 are reserved for future use. + </ul> + Bit 0 should be + set as the first action when a file is opened for write + access and should be cleared only as the final action + when closing a file. 
Bit 1 should be cleared during + normal access to a file and only set after the file's + consistency is guaranteed by the library or a + consistency utility.</td> + </tr> - <tr valign=top> - <td>Base Address</td> - <td>This is the absolute file address of the first byte of - the hdf5 data within the file. Unless otherwise noted, - all other file addresses are relative to this base - address.</td> - </tr> + <tr valign=top> + <td>Base Address</td> + <td>This is the absolute file address of the first byte of + the HDF5 data within the file. The library currently + constrains this value to be the absolute file address + of the super block itself when creating new files; + future versions of the library may provide greater + flexibility. Unless otherwise noted, + all other file addresses are relative to this base + address.</td> + </tr> - <tr valign=top> - <td>Address of Global Free-Space Heap</td> - <td>This value contains the relative address of the B-Tree - used to manage the blocks of data which are unused in the - file currently. The free-space heap is used to manage the - blocks of bytes at the file-level which become unused with - objects are moved within the file.</td> - </tr> + <tr valign=top> + <td>Address of Global Free-space Heap</td> + <td>Free-space management is not yet defined in the HDF5 + file format and is not handled by the library. + Currently this field always contains the + undefined address <code>0xfff...ff</code>. +<!-- + <td>This value contains the relative address of the B-tree + used to manage the blocks of data which are unused in the + file currently. The free-space heap is used to manage the + blocks of bytes at the file-level which become unused when + objects are moved within the file.</td> +--> + </tr> - <tr valign=top> - <td>End of File Address</td> - <td>This is the relative file address of the first byte past - the end of all HDF5 data. 
It is used to determine if a - file has been accidently truncated and as an address where - file memory allocation can occur if the free list is not - used.</td> - </tr> + <tr valign=top> + <td>End of File Address</td> + <td>This is the relative file address of the first byte past + the end of all HDF5 data. It is used to determine whether a + file has been accidently truncated and as an address where + file data allocation can occur if the free list is not + used.</td> + </tr> - <tr valign=top> - <td>Driver Information Block Address</td> - <td>This is the relative file address of the file driver - information block which contains driver-specific - information needed to reopen the file. If there is no - driver information block then this entry should be the - undefined address (all bits set).</td> - </tr> + <tr valign=top> + <td>Driver Information Block Address</td> + <td>This is the relative file address of the file driver + information block which contains driver-specific + information needed to reopen the file. If there is no + driver information block then this entry should be the + undefined address (all bits set).</td> + </tr> + + <tr valign=top> + <td>Root Group Address</td> + <td>This is the address of the root group (described later + in this document), which serves as the entry point into + the group graph.</td> + </tr> + </table> + </center> - <tr valign=top> - <td>Root Group Symbol Table Entry</td> - <td>This symbol-table entry (described later in this - document) refers to the entry point into the group - graph. If the file contains a single object, then that - object can be the root object and no groups are used.</td> - </tr> - </table> - </center> - <p>The file driver information block is an optional region of the + <p>The <em>file driver information block</em> is an optional region of the file which contains information needed by the file driver in - order to reopen a file. The format of the driver information + order to reopen a file. 
The format of the file driver information block is: - + <p> <center> <table border align=center cellpadding=4 width="80%"> - <caption align=top> - <B>Driver Information Block</B> - </caption> - - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> + <caption align=top> + <B>Driver Information Block</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> </tr> - + <tr align=center> <td>Version</td> <td colspan=3>Reserved (zero)</td> @@ -540,12 +568,12 @@ each high-level object. <p> <center> <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr valign=top> + <tr valign=top> <td>Version</td> <td>The version number of the driver information block. The file format documented here is version zero.</td> @@ -563,14 +591,15 @@ each high-level object. termination which identifies the driver and version number of the Driver Information block. The predefined drivers supplied with the HDF5 library are identified by the - letters "NCSA" followed by the first four characters of + letters <code>NCSA</code> followed by the first four characters of the driver name. If the Driver Information block is not the original version then the last letter(s) of the identification will be replaced by a version number in - ASCII. For example, the various versions of the "family" - driver will be identified by "NCSAfami", "NCSAfam0", - NCSAfam1", etc. Identification for user-defined drivers is - arbitrary but should be unique.</td> + ASCII. + For example, the various versions of the <em>family driver</em> + will be identified by <code>NCSAfami</code>, <code>NCSAfam0</code>, + <code>NCSAfam1</code>, etc. 
Identification for user-defined drivers + is arbitrary but should be unique.</td> </tr> <tr valign=top> @@ -584,826 +613,951 @@ each high-level object. </center> - <h3><a name="Btrees">Disk Format: Level 1A - B-link Trees</a></h3> + <br><br> + <br><br> - <p>B-link trees allow flexible storage for objects which tend to grow - in ways that cause the object to be stored discontiguously. B-trees - are described in various algorithms books including "Introduction to - Algorithms" by Thomas H. Cormen, Charles E. Leiserson, and Ronald - L. Rivest. The B-link tree, in which the sibling nodes at a - particular level in the tree are stored in a doubly-linked list, - is described in the "Efficient Locking for Concurrent Operations - on B-trees" paper by Phillip Lehman and S. Bing Yao as published - in the <em>ACM Transactions on Database Systems</em>, Vol. 6, - No. 4, December 1981. - <p>The B-link trees implemented by the file format contain one more - key than the number of children. In other words, each child - pointer out of a B-tree node has a left key and a right key. - The pointers out of internal nodes point to sub-trees while - the pointers out of leaf nodes point to other file data types. - Notwithstanding that difference, internal nodes and leaf nodes - are identical. + <h2><a name="Group"> + Disk Format: Level 1 - File Infrastructure</a></h2> + <h3><a name="Btrees">Disk Format: Level 1A - B-link Trees and B-tree Nodes</a></h3> + + <p>B-link trees allow flexible storage for objects which tend to grow + in ways that cause the object to be stored discontiguously. B-trees + are described in various algorithms books including "Introduction to + Algorithms" by Thomas H. Cormen, Charles E. Leiserson, and Ronald + L. Rivest. The B-link tree, in which the sibling nodes at a + particular level in the tree are stored in a doubly-linked list, + is described in the "Efficient Locking for Concurrent Operations + on B-trees" paper by Phillip Lehman and S. 
Bing Yao as published + in the <em>ACM Transactions on Database Systems</em>, Vol. 6, + No. 4, December 1981. + + <p>The B-link trees implemented by the file format contain one more + key than the number of children. In other words, each child + pointer out of a B-tree node has a left key and a right key. + The pointers out of internal nodes point to sub-trees while + the pointers out of leaf nodes point to symbol nodes and + raw data chunks. + Aside from that difference, internal nodes and leaf nodes + are identical. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>B-tree Nodes</B> + </caption> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>B-tree Nodes</B> - </caption> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> + <tr align=center> + <td colspan=4>Node Signature</td> - <tr align=center> - <td colspan=4>Node Signature</td> + <tr align=center> + <td>Node Type</td> + <td>Node Level</td> + <td colspan=2>Entries Used</td> - <tr align=center> - <td>Node Type</td> - <td>Node Level</td> - <td colspan=2>Entries Used</td> + <tr align=center> + <td colspan=4>Address of Left Sibling</td> - <tr align=center> - <td colspan=4>Address of Left Sibling</td> + <tr align=center> + <td colspan=4>Address of Right Sibling</td> - <tr align=center> - <td colspan=4>Address of Right Sibling</td> + <tr align=center> + <td colspan=4>Key 0 (variable size)</td> - <tr align=center> - <td colspan=4>Key 0 (variable size)</td> + <tr align=center> + <td colspan=4>Address of Child 0</td> - <tr align=center> - <td colspan=4>Address of Child 0</td> + <tr align=center> + <td colspan=4>Key 1 (variable size)</td> - <tr align=center> - <td colspan=4>Key 1 (variable size)</td> + <tr align=center> + <td 
colspan=4>Address of Child 1</td> - <tr align=center> - <td colspan=4>Address of Child 1</td> + <tr align=center> + <td colspan=4>...</td> - <tr align=center> - <td colspan=4>...</td> + <tr align=center> + <td colspan=4>Key 2<em>K</em> (variable size)</td> - <tr align=center> - <td colspan=4>Key 2<em>K</em> (variable size)</td> + <tr align=center> + <td colspan=4>Address of Child 2<em>K</em></td> - <tr align=center> - <td colspan=4>Address of Child 2<em>K</em></td> + <tr align=center> + <td colspan=4>Key 2<em>K</em>+1 (variable size)</td> + </table> + </center> - <tr align=center> - <td colspan=4>Key 2<em>K</em>+1 (variable size)</td> - </table> - </center> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr valign=top> + <td>Node Signature</td> + <td>The ASCII character string <code>TREE</code> is + used to indicate the + beginning of a B-link tree node. This gives file + consistency checking utilities a better chance of + reconstructing a damaged file.</td> + </tr> - <tr valign=top> - <td>Node Signature</td> - <td>The value ASCII 'TREE' is used to indicate the - beginning of a B-link tree node. This gives file - consistency checking utilities a better chance of - reconstructing a damaged file.</td> - </tr> + <tr valign=top> + <td>Node Type</td> + <td>Each B-link tree points to a particular type of data. + This field indicates the type of data as well as + implying the maximum degree <em>K</em> of the tree and + the size of each Key field. + <br> + <dl compact> + <dt>0 + <dd>This tree points to group nodes. + <dt>1 + <dd>This tree points to raw data chunks. + </dl> + </td> + </tr> - <tr valign=top> - <td>Node Type</td> - <td>Each B-link tree points to a particular type of data.
- This field indicates the type of data as well as - implying the maximum degree <em>K</em> of the tree and - the size of each Key field. - <br> - <dl compact> - <dt>0 - <dd>This tree points to symbol table nodes. - <dt>1 - <dd>This tree points to a (partial) linear address space. - </dl> - </td> - </tr> + <tr valign=top> + <td>Node Level</td> + <td>The node level indicates the level at which this node + appears in the tree (leaf nodes are at level zero). Not + only does the level indicate whether child pointers + point to sub-trees or to data, but it can also be used + to help file consistency checking utilities reconstruct + damaged trees.</td> + </tr> - <tr valign=top> - <td>Node Level</td> - <td>The node level indicates the level at which this node - appears in the tree (leaf nodes are at level zero). Not - only does the level indicate whether child pointers - point to sub-trees or to data, but it can also be used - to help file consistency checking utilities reconstruct - damanged trees.</td> - </tr> + <tr valign=top> + <td>Entries Used</td> + <td>This determines the number of children to which this + node points. All nodes of a particular type of tree + have the same maximum degree, but most nodes will point + to fewer than that number of children. The valid child + pointers and keys appear at the beginning of the node + and the unused pointers and keys appear at the end of + the node. The unused pointers and keys have undefined + values.</td> + </tr> - <tr valign=top> - <td>Entries Used</td> - <td>This determines the number of children to which this - node points. All nodes of a particular type of tree - have the same maximum degree, but most nodes will point - to less than that number of children. The valid child - pointers and keys appear at the beginning of the node - and the unused pointers and keys appear at the end of - the node.
The unused pointers and keys have undefined - values.</td> - </tr> + <tr valign=top> + <td>Address of Left Sibling</td> + <td>This is the file address of the left sibling of the + current node relative to the super block. If the current + node is the left-most node at this level then this field + is the undefined address (all bits set).</td> + </tr> - <tr valign=top> - <td>Address of Left Sibling</td> - <td>This is the file address of the left sibling of the - current node relative to the boot block. If the current - node is the left-most node at this level then this field - is the undefined address (all bits set).</td> - </tr> + <tr valign=top> + <td>Address of Right Sibling</td> + <td>This is the file address of the right sibling of the + current node relative to the super block. If the current + node is the right-most node at this level then this + field is the undefined address (all bits set).</td> + </tr> - <tr valign=top> - <td>Address of Right Sibling</td> - <td>This is the file address of the right sibling of the - current node relative to the boot block. If the current - node is the right-most node at this level then this - field is the undefined address (all bits set).</td> - </tr> + <tr valign=top> + <td>Keys and Child Pointers</td> + <td>Each node has 2<em>K</em>+1 keys with 2<em>K</em> + child pointers interleaved between the keys. The number + of keys and child pointers actually containing valid + values is determined by the <em>Entries Used</em> field. If + that field is <em>N</em> then the node contains + <em>N</em> child pointers and <em>N</em>+1 keys.</td> + </tr> - <tr valign=top> - <td>Keys and Child Pointers</td> - <td>Each tree has 2<em>K</em>+1 keys with 2<em>K</em> - child pointers interleaved between the keys. The number - of keys and child pointers actually containing valid - values is determined by the `Entries Used' field.
If - that field is <em>N</em> then the B-link tree contains - <em>N</em> child pointers and <em>N</em>+1 keys.</td> - </tr> + <tr valign=top> + <td>Key</td> + <td>The format and size of the key values are determined by + the type of data to which this tree points. The keys are + ordered and are boundaries for the contents of the child + pointer; that is, the key values represented by child + <em>N</em> fall between Key <em>N</em> and Key + <em>N</em>+1. Whether the interval is open or closed on + each end is determined by the type of data to which the + tree points. + <p> + The format of the key depends on the node type. + For nodes of node type 1, the key is formatted as follows: + <center> + <table> + <tr valign=top align=left> + <td width=40%>Bytes 1-4</td> + <td>Size of chunk in bytes.</td> + </tr><tr valign=top align=left> + <td>Bytes 5-8</td> + <td>Filter mask, a 32-bit bitfield indicating which + filters have been applied to that chunk.</td> + </tr><tr valign=top align=left> + <td><i>N</i> fields of 8 bytes each</td> + <td>A 64-bit index indicating the offset of the + chunk within the dataset where <i>N</i> is the number + of dimensions of the dataset. For example, if + a chunk in a 3-dimensional dataset begins at the + position <code>[5,5,5]</code>, there will be three + such 8-byte indices, each with the value of + <code>5</code>.</td> + </tr> + </table> + </center> + <p> + For nodes of node type 0, the key is formatted as follows: + <center> + <table> + <tr valign=top align=left> + <td width=40%>A single field of <i>Size of Lengths</i> + bytes</td> + <td>Indicates the byte offset into the local heap + for the first object name in the subtree which + that key describes.</td> + </tr> + </table> + </center> + </td> + </tr> - <tr valign=top> - <td>Key</td> - <td>The format and size of the key values is determined by - the type of data to which this tree points. The keys are - ordered and are boundaries for the contents of the child - pointer.
That is, the key values represented by child - <em>N</em> fall between Key <em>N</em> and Key - <em>N</em>+1. Whether the interval is open or closed on - each end is determined by the type of data to which the - tree points.</td> - </tr> + <tr valign=top> + <td>Child Pointers</td> + <td>The tree node contains file addresses of subtrees or + data depending on the node level. Nodes at Level 0 point + to data addresses, either data chunks or group nodes. + Nodes at non-zero levels point to other nodes of the + same B-tree.</td> + </tr> + </table> + </center> - <tr valign=top> - <td>Address of Children</td> - <td>The tree node contains file addresses of subtrees or - data depending on the node level (0 implies data - addresses).</td> - </tr> - </table> - </center> +<p> + Each B-tree node looks like this: + + <center> + <table> + <tr valign=top align=center> + <td>key[0]</td><td> </td> + <td>child[0]</td><td> </td> + <td>key[1]</td><td> </td> + <td>child[1]</td><td> </td> + <td>key[2]</td><td> </td> + <td>...</td><td> </td> + <td>...</td><td> </td> + <td>key[<i>N</i>-1]</td><td> </td> + <td>child[<i>N</i>-1]</td><td> </td> + <td>key[<i>N</i>]</td> + </tr> + </table> + </center> + + where child[<i>i</i>] is a pointer to a sub-tree (at a level + above Level 0) or to data (at Level 0). + Each key[<i>i</i>] describes an <i>item</i> stored by the B-tree + (a chunk or an object of a group node). The range of values + represented by child[<i>i</i>] is indicated by key[<i>i</i>] + and key[<i>i</i>+1]. + + + <p>This raises the question: + "Is the value described by key[<i>i</i>] contained in + child[<i>i</i>-1] or in child[<i>i</i>]?" + The answer depends on the type of tree. + In trees for groups (node type 0) the object described by + key[<i>i</i>] is the greatest object contained in + child[<i>i</i>-1] while in chunk trees (node type 1) the + chunk described by key[<i>i</i>] is the least chunk in + child[<i>i</i>].
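As a minimal sketch of the containment rule just described, the Python function below picks which child of a single node should hold a target value. It assumes the node's N+1 valid keys have already been decoded into comparable values. For node type 0 those values are object names resolved from their local-heap offsets. For node type 1 they are chunk-offset tuples, and treating them as lexicographically ordered is an assumption made here. The function name `find_child` and the decoding step are illustrative only. They are not part of the file format or of the HDF5 library API.

```python
from bisect import bisect_left, bisect_right

def find_child(node_type, keys, target):
    """Return the index of the child whose key range holds `target`, or None.

    `keys` holds the N+1 valid keys of one node (hypothetical pre-decoded
    values), so the node has N valid child pointers.
    """
    n = len(keys) - 1  # number of valid child pointers
    if node_type == 0:
        # Group trees: key[i] is the greatest name in child[i-1], so
        # child[i] holds names in the range key[i] < name <= key[i+1].
        p = bisect_left(keys, target)
    elif node_type == 1:
        # Chunk trees: key[i] is the least chunk in child[i], so
        # child[i] holds offsets in the range key[i] <= offset < key[i+1].
        p = bisect_right(keys, target)
    else:
        raise ValueError(f"unknown node type {node_type}")
    i = p - 1
    return i if 0 <= i < n else None
```

For example, with group-tree keys `["", "cat", "mouse"]`, the name `"cat"` lands in child 0 because key[1] is the greatest name stored there. In a chunk tree, a target equal to key[N] returns `None`, consistent with key[N] acting as an exclusive upper bound.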
+ + <p>That means that key[0] for group trees is sometimes unused; + it points to offset zero in the heap, which is always the + empty string and compares as "less-than" any valid object name. + + <p>And key[<i>N</i>] for chunk trees is sometimes unused; + it contains a chunk offset which compares as "greater-than" + any other chunk offset and has a chunk byte size of zero + to indicate that it is not actually allocated. + + + <h3><a name="SymbolTable">Disk Format: Level 1B - Group and Symbol Nodes</a></h3> + + <p>A group is an object internal to the file that allows + arbitrary nesting of objects (including other groups). + A group maps a set of names to a set of file + addresses relative to the base address. Certain meta data + for an object to which the group points can be duplicated + in the group symbol table in addition to the object header. + + <p>An HDF5 object name space can be stored hierarchically by + partitioning the name into components and storing each + component in a group. The group entry for a + non-ultimate component points to the group containing + the next component. The group entry for the last + component points to the object being named. + + <p>A group is a collection of group nodes pointed + to by a B-link tree. Each group node contains entries + for one or more symbols. If an attempt is made to add a + symbol to an already full group node containing + 2<em>K</em> entries, then the node is split and one node + contains <em>K</em> symbols and the other contains + <em>K</em>+1 symbols. + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Group Node (A Leaf of a B-tree)</B> + </caption> - <h3><a name="SymbolTable">Disk Format: Level 1B - Symbol Table</a></h3> - - <p>A symbol table is a group internal to the file that allows - arbitrary nesting of objects (including other symbol - tables). A symbol table maps a set of names to a set of file - address relative to the file boot block.
Certain meta data - for an object to which the symbol table points can be cached - in the symbol table in addition to (or in place of?) the - object header. - - <p>An HDF5 object name space can be stored hierarchically by - partitioning the name into components and storing each - component in a symbol table. The symbol table entry for a - non-ultimate component points to the symbol table containing - the next component. The symbol table entry for the last - component points to the object being named. - - <p>A symbol table is a collection of symbol table nodes pointed - to by a B-link tree. Each symbol table node contains entries - for one or more symbols. If an attempt is made to add a - symbol to an already full symbol table node containing - 2<em>K</em> entries, then the node is split and one node - contains <em>K</em> symbols and the other contains - <em>K</em>+1 symbols. + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Symbol Table Node</B> - </caption> + <tr align=center> + <td colspan=4>Node Signature</td> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> + <tr align=center> + <td>Version Number</td> + <td>Reserved for Future Use</td> + <td colspan=2>Number of Symbols</td> - <tr align=center> - <td colspan=4>Node Signature</td> + <tr align=center> + <td colspan=4><br><br>Group Entries<br><br><br></td> + </table> + </center> - <tr align=center> - <td>Version Number</td> - <td>Reserved for Future Use</td> - <td colspan=2>Number of Symbols</td> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr align=center> - <td colspan=4><br><br>Symbol Table Entries<br><br><br></td> - </table> - </center> + <tr valign=top> + <td>Node 
Signature</td> + <td>The ASCII character string <code>SNOD</code> is + used to indicate the + beginning of a group node. This gives file + consistency checking utilities a better chance of + reconstructing a damaged file.</td> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr valign=top> + <td>Version Number</td> + <td>The version number for the group node. This + document describes version 1.</td> + </tr> - <tr valign=top> - <td>Node Signature</td> - <td>The value ASCII 'SNOD' is used to indicate the - beginning of a symbol table node. This gives file - consistency checking utilities a better chance of - reconstructing a damaged file.</td> - </tr> + <tr valign=top> + <td>Number of Symbols</td> + <td>Although all group nodes have the same length, + most contain fewer than the maximum possible number of + symbol entries. This field indicates how many entries + contain valid data. The valid entries are packed at the + beginning of the group node while the remaining + entries contain undefined values.</td> + </tr> - <tr valign=top> - <td>Version Number</td> - <td>The version number for the symbol table node. This - document describes version 1.</td> - </tr> + <tr valign=top> + <td>Group Entries</td> + <td>Each symbol has an entry in the group node. + The format of the entry is described below.</td> + </tr> + </table> + </center> - <tr valign=top> - <td>Number of Symbols</td> - <td>Although all symbol table nodes have the same length, - most contain fewer than the maximum possible number of - symbol entries. This field indicates how many entries - contain valid data. 
The valid entries are packed at the - beginning of the symbol table node while the remaining - entries contain undefined values.</td> - </tr> + <h3><a name="SymbolTableEntry"> + Disk Format: Level 1C - Group Entry </a></h3> - <tr valign=top> - <td>Symbol Table Entries</td> - <td>Each symbol has an entry in the symbol table node. - The format of the entry is described below.</td> - </tr> - </table> - </center> + <p>Each group entry in a group node is designed + to allow for very fast browsing of stored objects. + Toward that design goal, the group entries + include space for caching certain constant meta data from the + object header. - <h3><a name="SymbolTableEntry"> - Disk Format: Level 1C - Symbol-Table Entry </a></h3> + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Group Entry</B> + </caption> - <p>Each symbol table entry in a symbol table node is designed to allow - for very fast browsing of commonly stored scientific objects. - Toward that design goal, the format of the symbol-table entries - includes space for caching certain constant meta data from the - object header. 
+ <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Symbol Table Entry</B> - </caption> + <tr align=center> + <td colspan=4>Name Offset (<size> bytes)</td> + </tr> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> + <tr align=center> + <td colspan=4>Object Header Address</td> + </tr> - <tr align=center> - <td colspan=4>Name Offset (<size> bytes)</td> - </tr> + <tr align=center> + <td colspan=4>Cache Type</td> + </tr> - <tr align=center> - <td colspan=4>Object Header Address</td> - </tr> + <tr align=center> + <td colspan=4>Reserved</td> + </tr> - <tr align=center> - <td colspan=4>Symbol-Type</td> - </tr> + <tr align=center> + <td colspan=4><br><br>Scratch-pad Space (16 bytes)<br><br><br></td> + </tr> + </table> + </center> - <tr align=center> - <td colspan=4>Reserved</td> - </tr> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr align=center> - <td colspan=4><br><br>Scratch-pad Space (16 bytes)<br><br><br></td> - </tr> - </table> - </center> + <tr valign=top> + <td>Name Offset</td> + <td>This is the byte offset into the group local + heap for the name of the object. The name is null + terminated.</td> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr valign=top> + <td>Object Header Address</td> + <td>Every object has an object header which serves as a + permanent location for the object's meta data. 
In addition + to appearing in the object header, some meta data can be + cached in the scratch-pad space.</td> + </tr> - <tr valign=top> - <td>Name Offset</td> - <td>This is the byte offset into the symbol table local - heap for the name of the symbol. The name is null - terminated.</td> - </tr> + <tr valign=top> + <td>Cache Type</td> + <td>The cache type is determined from the object header. + It also determines the format for the scratch-pad space. + <br> + <dl compact> + <dt>0 + <dd>No data is cached by the group entry. This + is guaranteed to be the case when an object header + has a link count greater than one. + + <dt>1 + <dd>Object header meta data is cached in the group + entry. This implies that the group + entry refers to another group. + + <dt>2 + <dd>The entry is a symbolic link. The first four bytes + of the scratch-pad space are the offset into the local + heap for the link value. The object header address + will be undefined. + + <dt><em>N</em> + <dd>Other cache values can be defined later and + libraries that do not understand the new values will + still work properly. + </dl> + </td> + </tr> - <tr valign=top> - <td>Object Header Address</td> - <td>Every object has an object header which serves as a - permanent home for the object's meta data. In addition - to appearing in the object header, the meta data can be - cached in the scratch-pad space.</td> - </tr> + <tr valign=top> + <td>Reserved</td> + <td>These four bytes are present so that the scratch-pad + space is aligned on an eight-byte boundary. They are + always set to zero.</td> + </tr> - <tr valign=top> - <td>Symbol-Type</td> - <td>The symbol type is determined from the object header. - It also determines the format for the scratch-pad space. - The value zero indicates that no object header meta data - is cached in the symbol table entry. - <br> - <dl compact> - <dt>0 - <dd>No data is cached by the symbol table entry. 
This - is guaranteed to be the case when an object header - has a link count greater than one. - - <dt>1 - <dd>Symbol table meta data is cached in the symbol - table entry. This implies that the symbol table - entry refers to another symbol table. - - <dt>2 - <dd>The entry is a symbolic link. The first four bytes - of the scratch pad space are the offset into the local - heap for the link value. The object header address - will be undefined. - - <dt><em>N</em> - <dd>Other cache values can be defined later and - libraries that don't understand the new values will - still work properly. - </dl> - </td> - </tr> + <tr valign=top> + <td>Scratch-pad Space</td> + <td>This space is used for different purposes, depending + on the value of the Cache Type field. Any meta-data + about a dataset object represented in the scratch-pad + space is duplicated in the object header for that + dataset. This meta data can include the datatype + and the size of the dataspace for a dataset whose datatype + is atomic and whose dataspace is fixed and less than + four dimensions. + Furthermore, no data is cached in the group + entry scratch-pad space if the object header for + the group entry has a link count greater than + one.</td> + </tr> + </table> + </center> - <tr valign=top> - <td>Reserved</td> - <td>These for bytes are present so that the scratch pad - space is aligned on an eight-byte boundary. They are - always set to zero.</td> - </tr> + <h4>Format of the Scratch-pad Space</h4> - <tr valign=top> - <td>Scratch-Pad Space</td> - <td>This space is used for different purposes, depending - on the value of the Symbol Type field. Any meta-data - about a dataset object represented in the scratch-pad - space is duplicated in the object header for that - dataset. 
Furthermore, no data is cached in the symbol - table entry scratch-pad space if the object header for - the symbol table entry has a link count greater than - one.</td> - </tr> - </table> - </center> + <p>The group entry scratch-pad space is formatted + according to the value in the Cache Type field. - <p>The symbol table entry scratch-pad space is formatted - according to the value of the Symbol Type field. If the - Symbol Type field has the value zero then no information is - stored in the scratch pad space. + <p>If the Cache Type field contains the value zero + (<code>0</code>) then no information is + stored in the scratch-pad space. - <p>If the Symbol Type field is one, then the scratch pad space - contains cached meta data for another symbol table with the format: + <p>If the Cache Type field contains the value one + (<code>1</code>), then the scratch-pad space + contains cached meta data for another object header + in the following format: - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Symbol Table Scratch-Pad Format</B> - </caption> + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Object Header Scratch-pad Format</B> + </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> - <tr align=center> - <td colspan=4>Address of B-tree</td> + <tr align=center> + <td colspan=4>Address of B-tree</td> - <tr align=center> - <td colspan=4>Address of Name Heap</td> - </table> - </center> + <tr align=center> + <td colspan=4>Address of Name Heap</td> + </table> + </center> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <p> + <center> + <table align=center width="80%"> + <tr> + <th 
width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr valign=top> - <td>Address of B-tree</td> - <td>This is the file address for the symbol table's - B-tree.</td> - </tr> + <tr valign=top> + <td>Address of B-tree</td> + <td>This is the file address for the root of the + group's B-tree.</td> + </tr> - <tr valign=top> - <td>Address of Name Heap</td> - <td>This is the file address for the symbol table's local - heap that stores the symbol names.</td> - </tr> - </table> - </center> + <tr valign=top> + <td>Address of Name Heap</td> + <td>This is the file address for the group's local + heap, which stores the symbol names.</td> + </tr> + </table> + </center> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Symbolic Link Scratch-Pad Format</B> - </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> + <p>If the Cache Type field contains the value two + (<code>2</code>), then the scratch-pad space + contains cached meta data for a symbolic link + in the following format: - <tr align=center> - <td colspan=4>Offset to Link Value</td> - </tr> - </table> - </center> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Symbolic Link Scratch-pad Format</B> + </caption> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> + <tr align=center> + <td colspan=4>Offset to Link Value</td> + </tr> + </table> + </center> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> - <tr valign=top> - <td>Offset to Link Value</td> - <td>The value of a symbolic link (that is, the name of the - thing to which it points) is 
stored in the local heap.
- This field is the 4-byte offset into the local heap for - the start of the link value, which is null terminated.</td> - </tr> - </table> - </center> + <tr align=center> + <td colspan=4>Offset to Link Value</td> + </tr> + </table> + </center> - <h3><a name="LocalHeap">Disk Format: Level 1D - Local Heaps</a></h3> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <p>A heap is a collection of small heap objects. Objects can be - inserted and removed from the heap at any time and the address - of a heap doesn't change once the heap is created. Note: this - is the "local" version of the heap mostly intended for the - storage of names in a symbol table. The storage of small - objects in a global heap is described below. + <tr valign=top> + <td>Offset to Link Value</td> + <td>The value of a symbolic link (that is, the name of the + thing to which it points) is stored in the local heap. + This field is the 4-byte offset into the local heap for + the start of the link value, which is null terminated.</td> + </tr> + </table> + </center> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <b>Local Heaps</b> - </caption> + <h3><a name="LocalHeap">Disk Format: Level 1D - Local Heaps</a></h3> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> + <p>A heap is a collection of small heap objects. Objects can be + inserted and removed from the heap at any time. + The address of a heap does not change once the heap is created. + References to objects are stored in the group table; + the names of those objects are stored in the local heap. 
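Group entries locate an object's name by its byte offset into the local heap's data segment. Reading a name therefore amounts to scanning forward to the terminating null byte. The Python sketch below shows that step, plus a walk of the free-block list using the Offset to Head of Free-list field and the free-block layout described in the table that follows. It makes several assumptions: fields are little-endian, `<size>` equals the super block's Size of Lengths (8 bytes by default here), names are ASCII, and the data segment has already been read into memory. The helper names `heap_name` and `free_blocks` are hypothetical.

```python
import struct

# Little-endian unsigned formats keyed by field width in bytes
# (assumption: heap offsets and sizes are "Size of Lengths" wide).
_FMT = {2: "<H", 4: "<I", 8: "<Q"}

def heap_name(data_segment: bytes, offset: int) -> str:
    """Return the null-terminated name stored at `offset` in the data segment."""
    end = data_segment.index(b"\0", offset)  # raises ValueError if unterminated
    return data_segment[offset:end].decode("ascii")  # names assumed ASCII

def free_blocks(data_segment: bytes, head: int, size_of_lengths: int = 8):
    """Follow the free list starting at `head`; return (offset, size) pairs."""
    fmt = _FMT[size_of_lengths]
    nil = (1 << (8 * size_of_lengths)) - 1  # all 0xff bytes marks the end
    blocks, seen = [], set()
    off = head
    while off != nil:
        if off in seen:  # guard against a corrupt, cyclic free list
            raise ValueError(f"free list loops back to offset {off}")
        seen.add(off)
        (next_off,) = struct.unpack_from(fmt, data_segment, off)
        (size,) = struct.unpack_from(fmt, data_segment, off + size_of_lengths)
        blocks.append((off, size))
        off = next_off
    return blocks
```

Offset zero holding the empty string matches the group-tree convention for key[0], so `heap_name(segment, 0)` is expected to return `""` for a well-formed group heap.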
- <tr align=center> - <td colspan=4>Heap Signature</td> - </tr> + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Local Heaps</b> + </caption> - <tr align=center> - <td colspan=4>Reserved (zero)</td> - </tr> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> - <tr align=center> - <td colspan=4>Data Segment Size</td> - </tr> + <tr align=center> + <td colspan=4>Heap Signature</td> + </tr> - <tr align=center> - <td colspan=4>Offset to Head of Free-list (<size> bytes)</td> - </tr> + <tr align=center> + <td colspan=4>Reserved (zero)</td> + </tr> - <tr align=center> - <td colspan=4>Address of Data Segment</td> - </tr> - </table> - </center> + <tr align=center> + <td colspan=4>Data Segment Size</td> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr align=center> + <td colspan=4>Offset to Head of Free-list (<size> bytes)</td> + </tr> - <tr valign=top> - <td>Heap Signature</td> - <td>The valid ASCII 'HEAP' is used to indicate the - beginning of a heap. This gives file consistency - checking utilities a better chance of reconstructing a - damaged file.</td> - </tr> + <tr align=center> + <td colspan=4>Address of Data Segment</td> + </tr> + </table> + </center> - <tr valign=top> - <td>Data Segment Size</td> - <td>The total amount of disk memory allocated for the heap - data. This may be larger than the amount of space - required by the object stored in the heap. The extra - unused space holds a linked list of free blocks.</td> - </tr> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr valign=top> - <td>Offset to Head of Free-list</td> - <td>This is the offset within the heap data segment of the - first free block (or all 0xff bytes if there is no free - block). 
The free block contains <size> bytes that - are the offset of the next free chunk (or all 0xff bytes - if this is the last free chunk) followed by <size> - bytes that store the size of this free chunk.</td> - </tr> + <tr valign=top> + <td>Heap Signature</td> + <td>The ASCII character string <code>HEAP</code> + is used to indicate the + beginning of a heap. This gives file consistency + checking utilities a better chance of reconstructing a + damaged file.</td> + </tr> - <tr valign=top> - <td>Address of Data Segment</td> - <td>The data segment originally starts immediately after - the heap header, but if the data segment must grow as a - result of adding more objects, then the data segment may - be relocated to another part of the file.</td> - </tr> - </table> - </center> + <tr valign=top> + <td>Data Segment Size</td> + <td>The total amount of disk memory allocated for the heap + data. This may be larger than the amount of space + required by the objects stored in the heap. The extra + unused space holds a linked list of free blocks.</td> + </tr> - <p>Objects within the heap should be aligned on an 8-byte boundary. + <tr valign=top> + <td>Offset to Head of Free-list</td> + <td>This is the offset within the heap data segment of the + first free block (or all 0xff bytes if there is no free + block).
The free block contains <size> bytes that + are the offset of the next free chunk (or all 0xff bytes + if this is the last free chunk) followed by <size> + bytes that store the size of this free chunk.</td> + </tr> - <h3><a name="GlobalHeap">Disk Format: Level 1E - Global Heap</a></h3> + <tr valign=top> + <td>Address of Data Segment</td> + <td>The data segment originally starts immediately after + the heap header, but if the data segment must grow as a + result of adding more objects, then the data segment may + be relocated, in its entirety, to another part of the + file.</td> + </tr> + </table> + </center> - <p>Each HDF5 file has a global heap which stores various types of - information which is typically shared between datasets. The - global heap was designed to satisfy these goals: + <p>Objects within the heap should be aligned on an 8-byte boundary. - <ol type="A"> - <li>Repeated access to a heap object must be efficient without - resulting in repeated file I/O requests. Since global heap - objects will typically be shared among several datasets it's - probable that the object will be accessed repeatedly. + <h3><a name="GlobalHeap">Disk Format: Level 1E - Global Heap</a></h3> - <br><br> - <li>Collections of related global heap objects should result in - fewer and larger I/O requests. For instance, a dataset of - void pointers will have a global heap object for each - pointer. Reading the entire set of void pointer objects - should result in a few large I/O requests instead of one small - I/O request for each object. + <p>Each HDF5 file has a global heap which stores various types of + information which is typically shared between datasets. The + global heap was designed to satisfy these goals: - <br><br> - <li>It should be possible to remove objects from the global heap - and the resulting file hole should be eligible to be reclaimed - for other uses. 
- <br><br> - </ol> + <ol type="A"> + <li>Repeated access to a heap object must be efficient without + resulting in repeated file I/O requests. Since global heap + objects will typically be shared among several datasets, it is + probable that the object will be accessed repeatedly. - <p>The implementation of the heap makes use of the memory - management already available at the file level and combines that - with a new top-level object called a <em>collection</em> to - achieve Goal B. The global heap is the set of all collections. - Each global heap object belongs to exactly one collection and - each collection contains one or more global heap objects. For - the purposes of disk I/O and caching, a collection is treated as - an atomic object. + <br><br> + <li>Collections of related global heap objects should result in + fewer and larger I/O requests. For instance, a dataset of + void pointers will have a global heap object for each + pointer. Reading the entire set of void pointer objects + should result in a few large I/O requests instead of one small + I/O request for each object. - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Global Heap Collection</B> - </caption> + <br><br> + <li>It should be possible to remove objects from the global heap + and the resulting file hole should be eligible to be reclaimed + for other uses. + <br><br> + </ol> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> + <p>The implementation of the heap makes use of the memory + management already available at the file level and combines that + with a new top-level object called a <em>collection</em> to + achieve Goal B. The global heap is the set of all collections. + Each global heap object belongs to exactly one collection and + each collection contains one or more global heap objects. 
For + the purposes of disk I/O and caching, a collection is treated as + an atomic object. - <tr align=center> - <td colspan=4>Magic Number</td> - </tr> - - <tr align=center> - <td>Version</td> - <td colspan=3>Reserved</td> - </td> - - <tr align=center> - <td colspan=4>Collection Size</td> - </tr> - - <tr align=center> - <td colspan=4><br>Object 1<br><br></td> - </tr> + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>A Global Heap Collection</B> + </caption> - <tr align=center> - <td colspan=4><br>Object 2<br><br></td> - </tr> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> - <tr align=center> - <td colspan=4><br>...<br><br></td> - </tr> + <tr align=center> + <td colspan=4>Magic Number</td> + </tr> + + <tr align=center> + <td>Version</td> + <td colspan=3>Reserved</td> + </td> + + <tr align=center> + <td colspan=4>Collection Size</td> + </tr> + + <tr align=center> + <td colspan=4><br>Global Heap Object 1 + <i>(described below)</i><br><br></td> + </tr> - <tr align=center> - <td colspan=4><br>Object <em>N</em><br><br></td> - </tr> + <tr align=center> + <td colspan=4><br>Global Heap Object 2<br><br></td> + </tr> - <tr align=center> - <td colspan=4><br>Object 0 (free space)<br><br></td> - </tr> - </table> - </center> + <tr align=center> + <td colspan=4><br>...<br><br></td> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr align=center> + <td colspan=4><br>Global Heap Object <em>N</em><br><br></td> + </tr> - <tr valign=top> - <td>Magic Number</td> - <td>The magic number for global heap collections are the - four bytes `G', `C', `O', `L'.</td> - </tr> - - <tr valign=top> - <td>Version</td> - <td>Each collection has its own version number so that new - collections can be added to old files. This document - describes version zero of the collections. 
- </tr> + <tr align=center> + <td colspan=4><br>Global Heap Object 0 (free space)<br><br></td> + </tr> + </table> + </center> - <tr valign=top> - <td>Collection Data Size</td> - <td>This is the size in bytes of the entire collection - including this field. The default (and minimum) - collection size is 4096 bytes which is a typical file - system block size and which allows for 170 16-byte heap - objects plus their overhead.</td> - </tr> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr valign=top> - <td>Object <em>i</em> for positive <em>i</em></td> <td>The - objects are stored in any order with no intervening unused - space.</td> - </tr> + <tr valign=top> + <td>Magic Number</td> + <td>The magic number for global heap collections is the + four bytes <code>G</code>, <code>C</code>, <code>O</code>, + and <code>L</code>.</td> + </tr> + + <tr valign=top> + <td>Version</td> + <td>Each collection has its own version number so that new + collections can be added to old files. This document + describes version zero of the collections.</td> + </tr> - <tr valign=top> - <td>Object 0</td> - <td>Object zero, when present, represents the free space in - the collection. Free space always appears at the end of - the collection. If the free space is too small to store - the header for object zero (described below) then the - header is implied. - </table> - </center> - - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=top> - <B>Global Heap Object</B> - </caption> + <tr valign=top> + <td>Collection Size</td> + <td>This is the size in bytes of the entire collection + including this field.
The default (and minimum) + collection size is 4096 bytes which is a typical file + system block size and which allows for 170 16-byte heap + objects plus their overhead.</td> + </tr> - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - </tr> - - <tr align=center> - <td colspan=2>Object ID</td> - <td colspan=2>Reference Count</td> - </tr> + <tr valign=top> + <td>Object 1 through <em>N</em></td> + <td>The objects are stored in any order with no + intervening unused space.</td> + </tr> - <tr align=center> - <td colspan=4>Reserved</td> - </tr> + <tr valign=top> + <td>Object 0</td> + <td>Object 0 (zero), when present, represents the free space in + the collection. Free space always appears at the end of + the collection. If the free space is too small to store + the header for Object 0 (described below) then the + header is implied and the collection contains no free space. + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <B>Global Heap Object</B> + </caption> - <tr align=center> - <td colspan=4>Object Data Size</td> - </tr> + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + </tr> + + <tr align=center> + <td colspan=2>Object ID</td> + <td colspan=2>Reference Count</td> + </tr> - <tr align=center> - <td colspan=4><br>Object Data<br><br></td> - </tr> - </table> - </center> + <tr align=center> + <td colspan=4>Reserved</td> + </tr> - <p> - <center> - <table align=center width="80%"> - <tr> - <th width="30%">Field Name</th> - <th width="70%">Description</th> - </tr> + <tr align=center> + <td colspan=4>Object Data Size</td> + </tr> - <tr valign=top> - <td>Object ID</td> - <td>Each object has a unique identification number within a - collection. 
The identification numbers are chosen so that - new objects have the smallest value possible with the - exception that the identifier `0' always refers to the - object which represents all free space within the - collection.</td> - </tr> + <tr align=center> + <td colspan=4><br>Object Data<br><br></td> + </tr> + </table> + </center> - <tr valign=top> - <td>Reference Count</td> - <td>All heap objects have a reference count field. An - object which is referenced from some other part of the - file will have a positive reference count. The reference - count for Object zero is always zero.</td> - </tr> + <p> + <center> + <table align=center width="80%"> + <tr> + <th width="30%">Field Name</th> + <th width="70%">Description</th> + </tr> - <tr valign=top> - <td>Reserved</td> - <td>Zero padding to align next field on an 8-byte - boundary.</td> - </tr> + <tr valign=top> + <td>Object ID</td> + <td>Each object has a unique identification number within a + collection. The identification numbers are chosen so that + new objects have the smallest value possible with the + exception that the identifier <code>0</code> always refers to the + object which represents all free space within the + collection.</td> + </tr> - <tr valign=top> - <td>Object Size</td> <td>This is the size of the the fields - above plus the object data stored for the object. The - actual storage size is rounded up to a multiple of - eight.</td> - </tr> + <tr valign=top> + <td>Reference Count</td> + <td>All heap objects have a reference count field. An + object which is referenced from some other part of the + file will have a positive reference count. 
The reference + count for Object 0 is always zero.</td> + </tr> - <tr valign=top> - <td>Object Data</td> - <td>The object data is treated as a one-dimensional array - of bytes to be interpreted by the caller.</td> - </tr> - </table> - </center> + <tr valign=top> + <td>Reserved</td> + <td>Zero padding to align next field on an 8-byte + boundary.</td> + </tr> - <h3><a name="FreeSpaceIndex">Disk Format: Level 1F - Free-Space - Index (NOT FULLY DEFINED)</a></h3> + <tr valign=top> + <td>Object Size</td> <td>This is the size of the fields + above plus the object data stored for the object. The + actual storage size is rounded up to a multiple of + eight.</td> + </tr> - <p>The Free-Space Index is a collection of blocks of data, - dispersed throughout the file, which are currently not used by - any file objects. The blocks of data are indexed by a B-tree of - their length within the file. + <tr valign=top> + <td>Object Data</td> + <td>The object data is treated as a one-dimensional array + of bytes to be interpreted by the caller.</td> + </tr> + </table> + </center> - <p>Each B-Tree page is composed of the following entries and - B-tree management information, organized as follows: + <h3><a name="FreeSpaceIndex">Disk Format: Level 1F - Free-space Heap</a></h3> - <p> - <center> - <table border cellpadding=4 width="80%"> - <caption align=bottom> - <B>HDF5 Free-Space Heap Page</B> - </caption> + <p>The Free-space Index is a collection of blocks of data, + dispersed throughout the file, which are currently not used by + any file objects. - <tr align=center> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> - <th width="25%">byte</th> + <p>The super block contains a pointer to the root of the free-space description; + that pointer is currently (i.e., in HDF5 Release 1.2) required + to be the undefined address <code>0xfff...ff</code>. + <p>The free-space index is not otherwise publicly defined at this time.
+ + + <!-- + <p>The Free-space Index is a collection of blocks of data, + dispersed throughout the file, which are currently not used by + any file objects. The blocks of data are indexed by a B-tree of + their length within the file. + + + <p>Each B-tree page is composed of the following entries and + B-tree management information, organized as follows: + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=bottom> + <B>HDF5 Free-space Heap Page</B> + </caption> + + <tr align=center> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + <th width="25%">byte</th> + + <tr align=center> + <td colspan=4>Free-space Heap Signature</td> + <tr align=center> + <td colspan=4>B-tree Left-link Offset</td> + <tr align=center> + <td colspan=4><br>Length of Free-block #1<br> <br></td> <tr align=center> - <td colspan=4>Free-Space Heap Signature</td> - <tr align=center> - <td colspan=4>B-Tree Left-Link Offset</td> - <tr align=center> - <td colspan=4><br>Length of Free-Block #1<br> <br></td> - <tr align=center> - <td colspan=4><br>Offset of Free-Block #1<br> <br></td> + <td colspan=4><br>Offset of Free-block #1<br> <br></td> <tr align=center> <td colspan=4>.<br>.<br>.<br></td> <tr align=center> - <td colspan=4><br>Length of Free-Block #n<br> <br></td> + <td colspan=4><br>Length of Free-block #n<br> <br></td> <tr align=center> - <td colspan=4><br>Offset of Free-Block #n<br> <br></td> + <td colspan=4><br>Offset of Free-block #n<br> <br></td> <tr align=center> <td colspan=4>"High" Offset</td> <tr align=center> - <td colspan=4>Right-Link Offset</td> + <td colspan=4>Right-link Offset</td> </table> </center> @@ -1412,25 +1566,26 @@ each high-level object. <dt> The elements of the free-space heap page are described below: <dd> <dl> - <dt>Free-Space Heap Signature: (4 bytes) - <dd>The value ASCII: 'FREE' is used to indicate the - beginning of a free-space heap B-Tree page. 
This gives + <dt>Free-space Heap Signature: (4 bytes) + <dd>The ASCII character string <code>FREE</code> + is used to indicate the + beginning of a free-space heap B-tree page. This gives file consistency checking utilities a better chance of reconstructing a damaged file. - <dt>B-Tree Left-Link Offset: (<offset> bytes) + <dt>B-tree Left-link Offset: (<offset> bytes) <dd>This value is used to indicate the offset of all offsets in the B-link-tree which are smaller than the value of the offset in entry #1. This value is also used to indicate a leaf node in the B-link-tree by being set to all ones. - <dt>Length of Free-Block #n: (<length> bytes) - <dd>This value indicates the length of an un-used block in + <dt>Length of Free-block #n: (<length> bytes) + <dd>This value indicates the length of an unused block in the file. - <dt>Offset of Free-Block #n: (<offset> bytes) + <dt>Offset of Free-block #n: (<offset> bytes) <dd>This value indicates the offset in the file of an - un-used block in the file. + unused block in the file. <dt>"High" Offset: (4-bytes) <dd>This offset is used as the upper bound on offsets @@ -1438,18 +1593,24 @@ each high-level object. <dt>Right-link Offset: (<offset> bytes) <dd>This value is used to indicate the offset of the next - child to the right of the parent of this object directory + child to the right of the parent of this group page. When there is no node to the right, this value is all zeros. </dl> </dl> <p>The algorithms for searching and inserting objects in the - B-tree pages are described fully in the Lehman & Yao paper, + B-tree pages are described fully in the Lehman and Yao paper, which should be read to provide a full description of the - B-Tree's usage. + B-tree's usage. +--> + - <h3><a name="DataObject">Disk Format: Level 2 - Data Objects </a></h3> +<br><br> +<br><br> + + + <h2><a name="DataObject">Disk Format: Level 2 - Data Objects </a></h2> <p>Data objects contain the real information in the file. 
These objects compose the scientific data and other information which @@ -1500,7 +1661,7 @@ each high-level object. <td colspan=4>Object Reference Count</td> </tr> <tr align=center> - <td colspan=4><br>Total Object-Header Size<br><br></td> + <td colspan=4><br>Total Object Header Size<br><br></td> </tr> <tr align=center> <td colspan=2>Header Message Type #1</td> @@ -1539,10 +1700,10 @@ each high-level object. </tr> <tr valign=top> - <td>Version # of the object header</td> + <td>Version number of the object header</td> <td>This value is used to determine the format of the information in the object header. When the format of the - information in the object header is changed, the version # + information in the object header is changed, the version number is incremented and can be used to determine how the information in the object header is formatted.</td> </tr> @@ -1563,11 +1724,11 @@ each high-level object. <td>Object Reference Count</td> <td>This value specifies the number of references to this object within the current file. References to the - data-object from external files are not tracked.</td> + data object from external files are not tracked.</td> </tr> <tr valign=top> - <td>Total Object-Header Size</td> + <td>Total Object Header Size</td> <td>This value specifies the total number of bytes of header message data following this length field for the current message as well as any continuation data located elsewhere @@ -1581,7 +1742,7 @@ each high-level object. the type along with a small amount of other information. Bit 15 of the message type is set if the message is constant (constant messages cannot be changed since they - may be cached in symbol table entries throughout the + may be cached in group entries throughout the file). The header message types for the pre-defined header messages will be included in further discussion below.</td> @@ -1602,11 +1763,11 @@ each high-level object. <dl> <dt><code>0</code> <dd>If set, the message data is constant. 
This is used - for messages like the data type message of a dataset. + for messages like the datatype message of a dataset. <dt><code>1</code> <dd>If set, the message is stored in the global heap and the Header Message Data field contains a Shared Object - message. and the Size of Header Message Data field + message and the Size of Header Message Data field contains the size of that Shared Object message. <dt><code>2-7</code> <dd>Reserved @@ -1636,7 +1797,7 @@ each high-level object. <P>The following is a list of currently defined header messages: <hr> - <h3><a name="NILMessage">Name: NIL</a></h3> + <h4><a name="NILMessage">Name: NIL</a></h4> <b>Type: </b>0x0000<br> <b>Length:</b> varies<br> <b>Status:</b> Optional, may be repeated.<br> @@ -1645,31 +1806,38 @@ each high-level object. which is to be ignored when reading the header messages for a data object. [Probably one which has been deleted for some reason.]<br> <b>Format of Data:</b> Unspecified.<br> + +<!-- Delete examples throughout doc <b>Examples:</b> None. +--> <hr> - <h3><a name="SimpleDataSpace">Name: Simple Data Space</a></h3> + <h4><a name="SimpleDataSpace">Name: Simple Dataspace</a></h4> <b>Type: </b>0x0001<br> - <b>Length:</b> varies<br> - <b>Status:</b> One of the <em>Simple Data Space</em> or - <em>Data-Space</em> messages is required (but not both) and may - not be repeated.<br> + <b>Length:</b> Varies according to the number of dimensions, + as described in the following table<br> + <b>Status:</b> The <em>Simple Dataspace</em> message is required + and may not be repeated. This message is currently used with + datasets and named dataspaces.<br> - <p>The <em>Simple Dimensionality</em> message describes the number + <p>The <em>Simple Dataspace</em> message describes the number of dimensions and size of each dimension that the data object has. 
This message is only used for datasets which have a - simple, rectilinear grid layout, datasets requiring a more - complex layout (irregularly or unstructured grids, etc) must use - the <em>Data-Space</em> message for expressing the space the - dataset inhabits. + simple, rectilinear grid layout; datasets requiring a more + complex layout (irregularly structured or unstructured grids, etc.) + must use the <em>Complex Dataspace</em> message for expressing + the space the dataset inhabits. + <i>(Note: The <em>Complex Dataspace</em> functionality is + not yet implemented (as of HDF5 Release 1.2). It is not described + in this document.)</i> <p> <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Simple Data Space Message</b> + <b>Simple Dataspace Message</b> </caption> <tr align=center> @@ -1720,6 +1888,15 @@ each high-level object. </tr> <tr valign=top> + <td>Version </td> + <td>This value is used to determine the format of the + Simple Dataspace Message. When the format of the + information in the message is changed, the version number + is incremented and can be used to determine how the + information in the object header is formatted.</td> + </tr> + + <tr valign=top> <td>Dimensionality</td> <td>This value is the number of dimensions that the data object has.</td> @@ -1766,6 +1943,7 @@ each high-level object. </table> </center> +<!-- Delete examples throughout doc <h4>Examples</h4> <dl> <dt> Example #1 @@ -1780,8 +1958,8 @@ each high-level object. of 30x24x3 slabs of data being written out in an unlimited series every several minutes as timestep data (currently there are five slabs). The number of dimensions is 4. The first - dimension size is 5 and it's maximum is <UNLIMITED>. The - second through fourth dimensions' size and maximum value are + dimension size is 5 and its maximum is <UNLIMITED>. The + second through fourth dimension's size and maximum value are set to 3, 24, and 30 respectively. 
<dt>Example #3 @@ -1793,21 +1971,24 @@ each high-level object. with another string of a different size. (This could also be stored as a scalar dataset with number-type set to "string") </dl> +--> +<!-- DELETE ENTIRE DATASPACE SECTION --> +<!-- <hr> - <h3><a name="DataSpaceMessage">Name: Data-Space (Fiber Bundle?)</a></h3> + <h4><a name="DataSpaceMessage">Name: Complex Dataspace (Fiber Bundle?)</a></h4> <b>Type: </b>0x0002<br> <b>Length:</b> varies<br> - <b>Status:</b> One of the <em>Simple Dimensionality</em> or - <em>Data-Space</em> messages is required (but not both) and may + <b>Status:</b> One of the <em>Simple Dataspace</em> or + <em>Complex Dataspace</em> messages is required (but not both) and may not be repeated.<br> <b>Purpose and Description:</b> The - <em>Data-Space</em> message describes space that the dataset is + <em>Dataspace</em> message describes space that the dataset is mapped onto in a more comprehensive way than the <em>Simple Dimensionality</em> message is capable of handling. The - data-space of a dataset encompasses the type of coordinate system + dataspace of a dataset encompasses the type of coordinate system used to locate the dataset's elements as well as the structure and - regularity of the coordinate system. The data-space also + regularity of the coordinate system. The dataspace also describes the number of dimensions which the dataset inhabits as well as a possible higher dimensional space in which the dataset is located within. @@ -1818,7 +1999,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Data-Space Message Layout</B> + <B>HDF5 Dataspace Message Layout</B> </caption> <tr align=center> @@ -1849,7 +2030,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Mesh-Type Layout</B> + <B>HDF5 Mesh-type Layout</B> </caption> <tr align=center> @@ -1868,7 +2049,7 @@ each high-level object. 
The following are the definitions of mesh-type bytes: <dl> <dt>Mesh Embedding - <dd>This value indicates whether the dataset data-space + <dd>This value indicates whether the dataset dataspace is located within another dataspace or not: <dl> <dl> @@ -1876,8 +2057,8 @@ each high-level object. <dd>The dataset mesh is self-contained and is not embedded in another mesh. <dt><EMBEDDED> - <dd>The dataset's data-space is located within - another data-space, as + <dd>The dataset's dataspace is located within + another dataspace, as described in information below. </dl> </dl> <dt>Coordinate System @@ -1905,7 +2086,7 @@ each high-level object. <dt><UNSTRUCTURED> <dd>Grid-points locations in each dimension are explicitly defined and - may be of any numeric data-type. + may be of any numeric datatype. </dl> </dl> <dt>Regularity <dd>This value defines the locations of the dataset @@ -1930,7 +2111,7 @@ each high-level object. <dt><CARTESIAN-UNSTRUCTURED-IRREGULAR> </dl> </dl> All of the above grid types can be embedded within another - data-space. + dataspace. <br> <br> <dt>Logical Dimensionality: (unsigned 32-bit integer) <dd>This value is the number of dimensions that the dataset occupies. @@ -1939,7 +2120,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Data-Space Embedded Dimensionality Information</B> + <B>HDF5 Dataspace Embedded Dimensionality Information</B> </caption> <tr align=center> @@ -1973,22 +2154,22 @@ each high-level object. which is a subset of another 3-D space, etc. <dt>Embedded Dimension Size: (unsigned 32-bit integer) <dd>These values are the sizes of the dimensions of the - embedded data-space + embedded dataspace that the dataset is located within. <dt>Embedded Origin Location: (unsigned 32-bit integer) <dd>These values comprise the location of the dataset's - origin within the embedded data-space. + origin within the embedded dataspace. 
</dl> </dl> [Comment: need some way to handle different orientations of the - dataset data-space - within the embedded data-space]<br> + dataset dataspace + within the embedded dataspace]<br> <P> <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Data-Space Structured/Regular Grid Information</B> + <B>HDF5 Dataspace Structured/Regular Grid Information</B> </caption> <tr align=center> @@ -2036,7 +2217,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Data-Space Structured/Irregular Grid Information</B> + <B>HDF5 Dataspace Structured/Irregular Grid Information</B> </caption> <tr align=center> @@ -2052,7 +2233,7 @@ each high-level object. <tr align=center> <td colspan=4># of Grid Points in Dimension #n</td> <tr align=center> - <td colspan=4>Data-Type of Grid Point Locations</td> + <td colspan=4>Datatype of Grid Point Locations</td> <tr align=center> <td colspan=4>Location of Grid Points in Dimension #1</td> <tr align=center> @@ -2066,7 +2247,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=bottom> - <B>HDF5 Data-Space Unstructured Grid Information</B> + <B>HDF5 Dataspace Unstructured Grid Information</B> </caption> <tr align=center> @@ -2078,7 +2259,7 @@ each high-level object. <tr align=center> <td colspan=4># of Grid Points</td> <tr align=center> - <td colspan=4>Data-Type of Grid Point Locations</td> + <td colspan=4>Datatype of Grid Point Locations</td> <tr align=center> <td colspan=4>Grid Point Locations<br>.<br>.<br></td> </table> @@ -2086,28 +2267,29 @@ each high-level object. <h4><a name="DataSpaceExample">Examples:</a></h4> Need some good examples, this is complex! 
+--> <hr> - <h3><a name="DataTypeMessage">Name: Data Type</a></h3> + <h4><a name="DataTypeMessage">Name: Datatype</a></h4> <b>Type:</b> 0x0003<br> <b>Length:</b> variable<br> - <b>Status:</b> One required per dataset<br> + <b>Status:</b> One required per dataset or named datatype<br> - <p>The data type message defines the data type for each data point - of a dataset. A data type can describe an atomic type like a + <p>The datatype message defines the datatype for each data point + of a dataset. A datatype can describe an atomic type like a fixed- or floating-point type or a compound type like a C - struct. A data type does not, however, describe how data points - are combined to produce a dataset. Data types are stored on disk - as a data type message, which is a list of data type classes and + struct. A datatype does not, however, describe how data points + are combined to produce a dataset. Datatypes are stored on disk + as a datatype message, which is a list of datatype classes and their associated properties. <p> <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Data Type Message</b> + <b>Datatype Message</b> </caption> <tr align=center> @@ -2136,16 +2318,17 @@ each high-level object. on the Type Class, which is the low-order four bits of the Type Class and Version field (the high-order four byte are the version which should be set to the value one). The type class - is one of: 0 (fixed-point number), 1 (floating-point number), 2 - (date and time), 3 (text string), 4 (bit field), 5 (opaque), 6 - (compound). The Class Bit Field is zero and the size of the + is one of 0 (fixed-point number), 1 (floating-point number), + 2 (date and time), 3 (text string), 4 (bit field), 5 (opaque), + 6 (compound), 7 (reference), 8 (enumeration), or 9 (variable-length). + The Class Bit Field is zero and the size of the Properties field is zero except for the cases noted here. 
<p> <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Bit Field for Fixed-Point Numbers (Class 0)</b> + <b>Bit Field for Fixed-point Numbers (Class 0)</b> </caption> <tr align=center> @@ -2184,7 +2367,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Properties for Fixed-Point Numbers (Class 0)</b> + <b>Properties for Fixed-point Numbers (Class 0)</b> </caption> <tr align=center> @@ -2205,7 +2388,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Bit Field for Floating-Point Numbers (Class 1)</b> + <b>Bit Field for Floating-point Numbers (Class 1)</b> </caption> <tr align=center> @@ -2261,7 +2444,7 @@ each high-level object. <center> <table border cellpadding=4 width="80%"> <caption align=top> - <b>Properties for Floating-Point Numbers (Class 1)</b> + <b>Properties for Floating-point Numbers (Class 1)</b> </caption> <tr align=center> @@ -2459,7 +2642,7 @@ each high-level object. <tr valign=top> <td>0-15</td> <td><b>Number of Members.</b> This field contains the number - of members defined for the compound data type. The member + of members defined for the compound datatype. The member definitions are listed in the Properties field of the data type message. </tr> @@ -2471,10 +2654,10 @@ each high-level object. </table> </center> - <p>The Properties field of a compound data type is a list of the - member definitions of the compound data type. The member + <p>The Properties field of a compound datatype is a list of the + member definitions of the compound datatype. The member definitions appear one after another with no intervening bytes. - The member types are described with a recursive data type + The member types are described with a recursive datatype message. <p> @@ -2536,11 +2719,88 @@ each high-level object. </table> </center> - <p>Data type examples are <a href="Datatypes.html">here</a>. 
+ <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Bit Field for Enumeration types (Class 8)</b> + </caption> + + <tr align=center> + <th width="10%">Bits</th> + <th width="90%">Meaning</th> + </tr> + + <tr valign=top> + <td>0-15</td> + <td><b>Number of Members.</b> The number of name/value + pairs defined for the enumeration type.</td> + </tr> + + <tr valign=top> + <td>16-23</td> + <td>Reserved (zero).</td> + </tr> + </table> + </center> + + <p> + <center> + <table border cellpadding=4 width="80%"> + <caption align=top> + <b>Properties for Enumeration types (Class 8)</b> + </caption> + + <tr align=center> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + <th width="25%">Byte</th> + </tr> + + <tr align=center> + <td colspan=4><br>Parent Type<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Names<br><br></td> + </tr> + + <tr align=center> + <td colspan=4><br>Values<br><br></td> + </tr> + + </table> + </center> + + <center> + <table border=0 cellpadding=4 width="80%"> + <tr align=left valign=top> + <td valign=top width=20%>Parent Type:</td> + <td valign=top>Each enumeration type is based on some parent type, + usually an integer. The information for that parent type is + described recursively by this field.</td> + </tr><tr align=left valign=top> + <td valign=top>Names:</td> + <td valign=top>The name for each name/value pair. Each name is + stored as a null terminated ASCII string in a multiple of + eight bytes. The names are in no particular order.</td> + </tr><tr align=left valign=top> + <td valign=top>Values:</td> + <td valign=top>The list of values in the same order as the names. + The values are packed (no inter-value padding) and the + size of each value is determined by the parent type.</td> + </tr> + </table> + </center> + +<!-- + <p>Datatype examples are <a href="Datatypes.html">here</a>. 
+--> <hr> - <h3><a name="FillValueMessage">Name: Data Storage - Fill Value</a></h3> + <h4><a name="FillValueMessage">Name: Data Storage - Fill Value</a></h4> <b>Type:</b> 0x0004<br> <b>Length:</b> varies<br> <b>Status:</b> Optional, may not be repeated.<br> @@ -2548,7 +2808,7 @@ each high-level object. <p>The fill value message stores a single data point value which is returned to the application when an uninitialized data point is read from the dataset. The fill value is interpretted with - the same data type as the dataset. If no fill value message is + the same datatype as the dataset. If no fill value message is present then a fill value of all zero is assumed. <p> @@ -2591,14 +2851,13 @@ each high-level object. <tr valign=top> <td>Fill Value</td> <td>The fill value. The bytes of the fill value are - interpreted using the same data type as for the dataset.</td> + interpreted using the same datatype as for the dataset.</td> </tr> </table> </center> <hr> - <h3><a name="ReservedMessage_0005">Name: Reserved - Not Assigned - Yet</a></h3> + <h4><a name="ReservedMessage_0005">Name: Reserved - Not Assigned Yet</a></h4> <b>Type:</b> 0x0005<br> <b>Length:</b> N/A<br> <b>Status:</b> N/A<br> @@ -2606,7 +2865,7 @@ each high-level object. <hr> - <h3><a name="CompactDataStorageMessage">Name: Data Storage - Compact</a></h3> + <h4><a name="CompactDataStorageMessage">Name: Data Storage - Compact</a></h4> <b>Type:</b> 0x0006<br> <b>Length:</b> varies<br> @@ -2614,23 +2873,25 @@ each high-level object. <p>This message indicates that the data for the data object is stored within the current HDF file by including the actual - data within the header data for this message. The data is + data as the header data for this message. The data is stored internally in - the "normal" format, i.e. in one chunk, un-compressed, etc. + the <em>normal format</em>, i.e. in one chunk, uncompressed, etc. 
- <P>Note that one and only one of the "Data Storage" headers can be + <P>Note that one and only one of the <em>Data Storage</em> headers can be stored for each data object. <P><b>Format of Data:</b> The message data is actually composed of dataset data, so the format will be determined by the dataset format. +<!-- Delete examples throughout doc <h4><a name="CompactDataStorageExample">Examples:</a></h4> [very straightforward] +--> <hr> - <h3><a name="ExternalFileListMessage">Name: Data Storage - - External Data Files</a></h3> + <h4><a name="ExternalFileListMessage">Name: Data Storage - + External Data Files</a></h4> <b>Type:</b> 0x0007<BR> <b>Length:</b> varies<BR> <b>Status:</b> Optional, may not be repeated.<BR> @@ -2686,10 +2947,17 @@ each high-level object. </tr> <tr valign=top> - <td>Heap Address</td> - <td>This is the address of a local name heap which contains - the names for the external files. The name at offset zero - in the heap is always the empty string.</td> + <td>Version </td> + <td>This value is used to determine the format of the + External File List Message. When the format of the + information in the message is changed, the version number + is incremented and can be used to determine how the + information in the object header is formatted.</td> + </tr> + + <tr valign=top> + <td>Reserved</td> + <td>This field is reserved for future use.</td> </tr> <tr valign=top> @@ -2706,8 +2974,10 @@ each high-level object. </tr> <tr valign=top> - <td>Reserved</td> - <td>This field is reserved for future use.</td> + <td>Heap Address</td> + <td>This is the address of a local name heap which contains + the names for the external files. The name at offset zero + in the heap is always the empty string.</td> </tr> <tr valign=top> @@ -2798,7 +3068,7 @@ each high-level object. 
<hr> - <h3><a name="LayoutMessage">Name: Data Storage - Layout</a></h3> + <h4><a name="LayoutMessage">Name: Data Storage - Layout</a></h4> <b>Type:</b> 0x0008<BR> <b>Length:</b> varies<BR> @@ -2920,7 +3190,7 @@ each high-level object. <hr> - <h3><a name="ReservedMessage_0009">Name: Reserved - Not Assigned Yet</a></h3> + <h4><a name="ReservedMessage_0009">Name: Reserved - Not Assigned Yet</a></h4> <b>Type:</b> 0x0009<BR> <b>Length:</b> N/A<BR> <b>Status:</b> N/A<BR> @@ -2928,7 +3198,7 @@ each high-level object. <b>Format of Data:</b> N/A <hr> - <h3><a name="ReservedMessage_000A">Name: Reserved - Not Assigned Yet</a></h3> + <h4><a name="ReservedMessage_000A">Name: Reserved - Not Assigned Yet</a></h4> <b>Type:</b> 0x000A<BR> <b>Length:</b> N/A<BR> <b>Status:</b> N/A<BR> @@ -2936,7 +3206,7 @@ each high-level object. <b>Format of Data:</b> N/A <hr> - <h3><a name="FilterMessage">Name: Data Storage - Filter Pipeline</a></h3> + <h4><a name="FilterMessage">Name: Data Storage - Filter Pipeline</a></h4> <b>Type:</b> 0x000B<BR> <b>Length:</b> varies<BR> <b>Status:</b> Optional, may not be repeated. @@ -3118,7 +3388,7 @@ each high-level object. </center> <hr> - <h3><a name="AttributeMessage">Name: Attribute</a></h3> + <h4><a name="AttributeMessage">Name: Attribute</a></h4> <b>Type:</b> 0x000C<BR> <b>Length:</b> varies<BR> <b>Status:</b> Optional, may be repeated.<BR> @@ -3126,7 +3396,7 @@ each high-level object. <p><b>Purpose and Description:</b> The <em>Attribute</em> message is used to list objects in the HDF file which are used as attributes, or "meta-data" about the current object. An - attribute is a small dataset; it has a name, a data type, a data + attribute is a small dataset; it has a name, a datatype, a data space, and raw data. Since attributes are stored in the object header they must be relatively small (<64kb) and can be associated with any type of object which has an object header @@ -3190,6 +3460,12 @@ each high-level object. 
</tr> <tr valign=top> + <td>Reserved</td> + <td>This field is reserved for later use and is set to + zero.</td> + </tr> + + <tr valign=top> <td>Name Size</td> <td>The length of the attribute name in bytes including the null terminator. Note that the Name field below may @@ -3199,25 +3475,19 @@ each high-level object. <tr valign=top> <td>Type Size</td> - <td>The length of the data type description in the Type + <td>The length of the datatype description in the Type field below. Note that the Type field may contain additional padding not represented by this field.</td> </tr> <tr valign=top> <td>Space Size</td> - <td>The length of the data space description in the Space + <td>The length of the dataspace description in the Space field below. Note that the Space field may contain additional padding not represented by this field.</td> </tr> <tr valign=top> - <td>Reserved</td> - <td>This field is reserved for later use and is set to - zero.</td> - </tr> - - <tr valign=top> <td>Name</td> <td>The null-terminated attribute name. This field is padded with additional null characters to make it a @@ -3226,16 +3496,16 @@ each high-level object. <tr valign=top> <td>Type</td> - <td>The data type description follows the same format as - described for the data type object header message. This + <td>The datatype description follows the same format as + described for the datatype object header message. This field is padded with additional zero bytes to make it a multiple of eight bytes.</td> </tr> <tr valign=top> <td>Space</td> - <td>The data space description follows the same format as - described for the data space object header message. This + <td>The dataspace description follows the same format as + described for the dataspace object header message. This field is padded with additional zero bytes to make it a multiple of eight bytes.</td> </tr> @@ -3243,7 +3513,7 @@ each high-level object. <tr valign=top> <td>Data</td> <td>The raw data for the attribute. 
The size is determined - from the data type and data space descriptions. This + from the datatype and dataspace descriptions. This field is <em>not</em> padded with additional zero bytes.</td> </tr> @@ -3251,7 +3521,7 @@ each high-level object. </center> <hr> - <h3><a name="NameMessage">Name: Object Name</a></h3> + <h4><a name="NameMessage">Name: Object Name</a></h4> <p><b>Type:</b> 0x000D<br> <b>Length:</b> varies<br> @@ -3259,7 +3529,7 @@ each high-level object. <p><b>Purpose and Description:</b> The object name or comment is designed to be a short description of an object. An object name - is a sequence of non-zero ('\0') ASCII characters with no other + is a sequence of non-zero (<code>\0</code>) ASCII characters with no other formatting included by the library. <p> @@ -3298,8 +3568,7 @@ each high-level object. </center> <hr> - <h3><a name="ModifiedMessage">Name: Object Modification Date & - Time</a></h3> + <h4><a name="ModifiedMessage">Name: Object Modification Date & Time</a></h4> <p><b>Type:</b> 0x000E<br> <b>Length:</b> fixed<br> @@ -3357,40 +3626,40 @@ each high-level object. <tr valign=top> <td>Year</td> <td>The four-digit year as an ASCII string. For example, - "1998". All fields of this message should be interpreted + <code>1998</code>. All fields of this message should be interpreted as coordinated universal time (UTC)</td> </tr> <tr valign=top> <td>Month</td> <td>The month number as a two digit ASCII string where - January is "01" and December is "12".</td> + January is <code>01</code> and December is <code>12</code>.</td> </tr> <tr valign=top> <td>Day of Month</td> <td>The day number within the month as a two digit ASCII - string. The first day of the month is "01".</td> + string. 
The first day of the month is <code>01</code>.</td> </tr> <tr valign=top> <td>Hour</td> <td>The hour of the day as a two digit ASCII string where - midnight is "00" and 11:00pm is "23".</td> + midnight is <code>00</code> and 11:00pm is <code>23</code>.</td> </tr> <tr valign=top> <td>Minute</td> <td>The minute of the hour as a two digit ASCII string where - the first minute of the hour is "00" and the last is - "59".</td> + the first minute of the hour is <code>00</code> and + the last is <code>59</code>.</td> </tr> <tr valign=top> <td>Second</td> <td>The second of the minute as a two digit ASCII string - where the first second of the minute is "00" and the last - is "59".</td> + where the first second of the minute is <code>00</code> + and the last is <code>59</code>.</td> </tr> <tr valign=top> @@ -3401,7 +3670,7 @@ each high-level object. </center> <hr> - <h3><a name="SharedMessage">Name: Shared Object Message</a></h3> + <h4><a name="SharedMessage">Name: Shared Object Message</a></h4> <b>Type:</b> 0x000F<br> <b>Length:</b> 4 Bytes<br> <b>Status:</b> Optional, may be repeated. @@ -3490,7 +3759,7 @@ each high-level object. the actual message is in the global heap then the pointer is the file address of the global heap collection that holds the message, and a four-byte index into that - collection. Otherwise the pointer is a symbol table entry + collection. Otherwise the pointer is a group entry that points to some other object header.</td> </tr> </table> @@ -3498,7 +3767,7 @@ each high-level object. <hr> -<h3><a name="ContinuationMessage">Name: Object Header Continuation</a></h3> +<h4><a name="ContinuationMessage">Name: Object Header Continuation</a></h4> <b>Type:</b> 0x0010<BR> <b>Length:</b> fixed<BR> <b>Status:</b> Optional, may be repeated.<BR> @@ -3543,24 +3812,26 @@ the file. 
</dl> </dl> +<!-- Delete examples throughout doc <h4><a name="ContinuationExample">Examples:</a></h4> [straightforward] +--> <hr> -<h3><a name="SymbolTableMessage">Name: Symbol Table Message</a></h3> +<h4><a name="SymbolTableMessage">Name: Group Message</a></h4> <b>Type:</b> 0x0011<BR> <b>Length:</b> fixed<BR> -<b>Status:</b> Required for symbol tables, may not be repeated.<BR> -<b>Purpose and Description:</b> Each symbol table has a B-tree and a +<b>Status:</b> Required for groups, may not be repeated.<BR> +<b>Purpose and Description:</b> Each group has a B-tree and a name heap which are pointed to by this message.<BR> <b>Format of data:</b> -<p>The symbol table message is formatted as follows: +<p>The group message is formatted as follows: <p> <center> <table border cellpadding=4 width="80%"> <caption align=bottom> -<b>HDF5 Object Header Symbol Table Message Layout</b> +<b>HDF5 Object Header Group Message Layout</b> </caption> <tr align=center> @@ -3570,7 +3841,7 @@ name heap which are pointed to by this message.<BR> <th width="25%">byte</th> <tr align=center> -<td colspan=4>B-Tree Address</td> +<td colspan=4>B-tree Address</td> <tr align=center> <td colspan=4>Heap Address</td> @@ -3579,7 +3850,7 @@ name heap which are pointed to by this message.<BR> <P> <dl> -<dt>The elements of the Symbol Table Message are described below: +<dt>The elements of the Group Message are described below: <dd> <dl> <dt>B-tree Address (<offset> bytes) @@ -3587,13 +3858,13 @@ name heap which are pointed to by this message.<BR> where the B-tree is located. <dt>Heap Address (<offset> bytes) <dd>This value is the offset in bytes from the beginning of the file -where the symbol table name heap is located. +where the group name heap is located. </dl> </dl> <h3><a name="SharedObjectHeader">Disk Format: Level 2b - Shared Data Object Headers</a></h3> <P>In order to share header messages between several dataset objects, object -header messages may be placed into the global small-data heap. 
Since these +header messages may be placed into the global heap. Since these messages require additional information beyond the basic object header message information, the format of the shared message is detailed below. @@ -3625,7 +3896,7 @@ information, the format of the shared message is detailed below. <dt>Reference Count of Shared Header Message: (32-bit unsigned integer) <dd>This value is used to keep a count of the number of dataset objects which refer to this message from their dataset headers. When this count reaches zero, -the shared message header may be removed from the global small-data heap. +the shared message header may be removed from the global heap. <dt>Shared Object Header Message: (various lengths) <dd>The data stored for the shared object header message is formatted in the same way as the private object header messages described in the object header @@ -3635,7 +3906,7 @@ description earlier in this document and begins with the header message Type. <h3><a name="DataStorage">Disk Format: Level 2c - Data Object Data Storage</a></h3> -<P>The data information for an object is stored separately from the header +<P>The data for an object is stored separately from the header information in the file and may not actually be located in the HDF5 file itself if the header indicates that the data is stored externally. The information for each record in the object is stored according to the @@ -3648,8 +3919,9 @@ in a different machine format with the architecture-type information from the number-type header message. This means that each architecture will need to [potentially] byte-swap data values into the internal representation for that particular machine. -<P> Data with a "variable" sized number-type is stored in an data heap -internal to the HDF file [which should not be user-modifiable]. +<P> Data with a "variable" sized number-type is stored in a data heap +internal to the HDF5 file. Global heap identifiers are stored in the +data object storage. 
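The updated text says global heap identifiers are stored in the data object storage. The Shared Object Message text describes such a pointer as the file address of a global heap collection plus a four-byte index into that collection. The Python sketch below packs and unpacks that pair. Three details are assumptions, not taken from this excerpt: little-endian byte order, an 8-byte default address width (the `<offset>` size), and the `GlobalHeapId` name.

```python
import struct
from typing import NamedTuple

class GlobalHeapId(NamedTuple):
    """Hypothetical representation of a global heap identifier."""
    collection_address: int  # file address of the global heap collection
    index: int               # four-byte index of the object in the collection

def pack_heap_id(hid: GlobalHeapId, offset_size: int = 8) -> bytes:
    """Serialize the address (offset_size bytes) followed by the index.
    Little-endian order is an assumption of this sketch."""
    address = hid.collection_address.to_bytes(offset_size, "little")
    return address + struct.pack("<I", hid.index)

def unpack_heap_id(data: bytes, offset_size: int = 8) -> GlobalHeapId:
    """Parse a heap ID produced by pack_heap_id."""
    address = int.from_bytes(data[:offset_size], "little")
    (index,) = struct.unpack_from("<I", data, offset_size)
    return GlobalHeapId(address, index)

if __name__ == "__main__":
    raw = pack_heap_id(GlobalHeapId(0x1A00, 3))
    print(len(raw), unpack_heap_id(raw))
```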
<P>Data whose elements are composed of pointer number-types are stored in several different ways depending on the particular pointer type involved. Simple pointers are just stored as the dataset offset of the object being pointed to with the @@ -3661,17 +3933,33 @@ format as header message), sub-set start and end information (i.e. a coordinate location for each), and field start and end names (i.e. a [pointer to the] string indicating the first field included and a [pointer to the] string name for the last field). -Browse pointers are stored as an heap-ID (for the name in the file-heap) -followed by a offset of the data object being referenced. -<P>Data of a compound data-type is stored as a contiguous stream of the items -in the structure, with each item formatted according to it's -data-type. +<P>Data of a compound datatype is stored as a contiguous stream of the items +in the structure, with each item formatted according to its datatype. + +<hr> +<center> +<table border=0 width=98%> +<tr><td valign=top align=left> +<a href="index.html">Other HDF5 documents and links</a> <br> +<a href="H5.intro.html">Introduction to HDF5</a> <br> +</td> +<td> </td> +<td valign=top align=right> +<a href="H5.user.html">HDF5 User Guide</a> <br> +<a href="RM_H5Front.html">HDF5 Reference Manual</a> <br> +</td></tr> +</table> +</center> <hr> + +<!-- <address><a href="mailto:koziol@ncsa.uiuc.edu">Quincey Koziol</a></address> <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> +--> +<address><a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a></address> <!-- hhmts start --> -Last modified: Tue Aug 17 10:57:50 EDT 1999 +Last modified: 8 March 2000 <!-- hhmts end --> </body> </html> |
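The last paragraph of the diff says compound data is stored as a contiguous stream of items, each formatted according to its datatype. The Python sketch below reads that sentence literally and packs each element with no padding between members. The member layout (a 32-bit integer, a 64-bit float, and a 16-bit integer) and the little-endian byte order are hypothetical choices made for illustration.

```python
import struct

# Hypothetical compound type: int32, float64, int16.
# '<' selects little-endian with no alignment padding, so each item
# directly follows the previous one in the byte stream.
POINT_FORMAT = struct.Struct("<idh")

def pack_points(points):
    """Store compound elements one after another as a single byte stream."""
    return b"".join(POINT_FORMAT.pack(*p) for p in points)

def unpack_points(data):
    """Split a contiguous stream back into (int, float, int) tuples."""
    return list(POINT_FORMAT.iter_unpack(data))

if __name__ == "__main__":
    stream = pack_points([(1, 0.5, -2), (7, 2.25, 3)])
    print(POINT_FORMAT.size, len(stream))
    print(unpack_points(stream))
```

Each element takes 4 + 8 + 2 = 14 bytes, so two elements occupy 28 bytes with no gaps.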