From bd1e676c521d881b3143829f493a28b5ced1294b Mon Sep 17 00:00:00 2001
From: Quincey Koziol

+ The attribute API (H5A) is primarily designed to easily allow small
+ datasets to be attached to primary datasets as metadata information.
+ Additional goals for the H5A interface include keeping storage
+ requirements for each attribute to a minimum and easily sharing
+ attributes among datasets.
+ Because attributes are intended to be small objects, large datasets
+ intended as additional information for a primary dataset should be
+ stored as supplemental datasets in a group with the primary dataset.
+ Attributes can then be attached to the group containing everything to
+ indicate that a particular type of dataset with supplemental datasets is
+ located in the group. How small is "small" is not defined by the
+ library and is up to the user's interpretation.
+ Attributes are not separate objects in the file; they are always
+ contained in the object header of the object they are attached to. The
+ I/O functions defined below are required to read or write attribute
+ information, not the H5D I/O routines.
+
+ Attributes are created with the H5Acreate() function, and read or
+ written in their entirety with H5Aread() and H5Awrite().
+
+ Attributes may only be written as an entire object; no partial I/O
+ is currently supported.
+
+ The iterator returns a negative value if something is wrong, the return
+ value of the last operator if it was non-zero, or zero if all attributes
+ were processed.
+ The prototype for H5A_operator_t is shown below. The operation receives
+ the ID for the group or dataset being iterated over (loc_id), the name
+ of the current attribute on the object (attr_name), and the pointer to
+ the operator data passed in to H5Aiterate (operator_data). The return
+ values from an operator are: zero to continue the iteration, a positive
+ value to short-circuit the iteration, or a negative value to
+ short-circuit the iteration and indicate failure.
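+
+     A sketch of the operator type as described above (this matches the
+     signature that survives in modern HDF5 as H5A_operator1_t):
+
+         typedef herr_t (*H5A_operator_t)(hid_t loc_id,
+                                          const char *attr_name,
+                                          void *operator_data);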
+ The HDF5 library is able to handle files larger than the maximum file
+ size, and datasets larger than the maximum memory
+ size. For instance, a machine where sizeof(off_t) and sizeof(size_t)
+ are both four bytes can still store multi-gigabyte datasets.
+ Two "tricks" must be employed on these small systems in order
+ to store large datasets: the first circumvents the file-size limit
+ and the second circumvents the memory-size limit.
+
+ Some 32-bit operating systems have special file systems that
+ can support large (>2GB) files and HDF5 will detect these and
+ use them automatically. If this is the case, the output from
+ configure will show:
+
+ Otherwise one must use an HDF5 file family. Such a family is
+ created by setting file family properties in a file access
+ property list and then supplying a file name that includes a
+ printf-style integer format specifier. The second argument to
+ H5Pset_family() indicates the size in bytes of each family member.
+
+ If the effective HDF5 address space is limited then one may be
+ able to store datasets as external datasets each spanning
+ multiple files of any length since HDF5 opens external dataset
+ files one at a time. To arrange storage for a 5TB dataset one
+ could say:
+
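+     The original example code was elided; the sketch below assumes a
+     dataset creation property list and 1GB segments stored in files
+     named velocity-0000.raw, velocity-0001.raw, and so on:
+
+         hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+         char name[64];
+         int i;
+         /* 5TB = 5*1024 segments of 1GB each; HDF5 opens the external
+          * files one at a time */
+         for (i = 0; i < 5*1024; i++) {
+             sprintf(name, "velocity-%04d.raw", i);
+             H5Pset_external(dcpl, name, 0, (hsize_t)1 << 30);
+         }
+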
+ The second limit which must be overcome is that of main memory
+ (sizeof(size_t)). To create a dataset with 8*2^30 4-byte integers
+ for a total of 32GB one first creates the dataspace. We give two
+ examples here: a 4-dimensional dataset whose dimension sizes are
+ smaller than the maximum value of a 32-bit integer, and a
+ 1-dimensional dataset whose dimension size is too large to fit in a
+ 32-bit integer. (Dataspace dimensions are of type hsize_t, which is
+ a 64-bit quantity when the compiler supports one; without 64-bit
+ support the second example cannot be expressed.)
+
+ The HDF5 library caches two types of data: meta data and raw
+ data. The meta data cache holds file objects like the file
+ header, symbol table nodes, global heap collections, object
+ headers and their messages, etc. in a partially decoded
+ state. The cache has a fixed number of entries which is set with
+ the file access property list (defaults to 10k) and each entry
+ can hold a single meta data object. Collisions between objects
+ are handled by preempting the older object in favor of the new
+ one.
+
+ Raw data chunks are cached because I/O requests at the
+ application level typically don't map well to chunks at the
+ storage level. The chunk cache has a maximum size in bytes
+ set with the file access property list (defaults to 1MB) and
+ when the limit is reached chunks are preempted based on the
+ following set of heuristics.
+
+ One should choose large values for w0 if I/O requests
+ typically do not overlap but smaller values for w0 if
+ the requests do overlap. For instance, reading an entire 2d
+ array by reading from non-overlapping "windows" in a row-major
+ order would benefit from a high w0 value while reading
+ a diagonal across the dataset where each request overlaps the
+ previous request would benefit from a small w0.
+
+ The cache parameters for both caches are part of a file access
+ property list and are set and queried with this pair of
+ functions:
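+
+     The pair of functions was elided above; as a sketch with the
+     modern signature (older releases took slightly different
+     arguments):
+
+         hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
+         /* 10k meta data entries, 521 chunk slots, 1MB chunk cache,
+          * w0 = 0.75 */
+         H5Pset_cache(fapl, 10240, 521, 1024*1024, 0.75);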
+
+ This is one of the functions exported from the B-link tree package
+ (H5B). All pointer arguments are initialized when defined. I don't
+ worry much about non-pointers because it's usually obvious when
+ the value isn't initialized.
+
+ I use the library's standard function entry and exit conventions.
+ You'll see this quite often in the low-level stuff and it's
+ documented in the private header files. The alternative is to call a
+ slightly cheaper variant that skips some of the bookkeeping.
+ Code is arranged in paragraphs with a comment starting each
+ paragraph. The previous paragraph is a standard binary search
+ algorithm. It's also my standard practice to have side effects in
+ conditional expressions because I can write code faster and it's
+ more apparent to me what the condition is testing. But if I
+ have an assignment in a conditional expr, then I use an extra
+ set of parens even if they're not required (usually they are, as
+ in this case) so it's clear that I meant assignment rather than
+ comparison.
+
+ Here I broke the "side effect in conditional" rule, which I
+ sometimes do if the expression is so long that the conditional
+ becomes hard to read otherwise.
+ For lack of a better way to handle errors during error cleanup,
+ I just call the error-handling machinery again and press on.
+
+ The following code is an API function from the H5F package...
+
+ An API prologue is used for each API function instead of my
+ normal function prologue. I use the prologue from Code Review 1
+ for non-API functions because it's more suited to C programmers,
+ it requires less work to keep it synchronized with the code, and
+ I have better editing tools for it.
+
+ API functions are never called internally, therefore I always
+ clear the error stack before doing anything.
+
+ If something is wrong with the arguments then we raise an
+ error. We never use assert() to check arguments of API functions.
+
+ An internal version of the function does the real work. That
+ internal version checks its arguments with assert() and may be
+ called from elsewhere in the library.
+
+ For example, each package consists of a source file, a public
+ header file of exported symbols, a header file of private stuff,
+ and a header for private prototypes.
+
+ By splitting the prototypes into separate include files we don't
+ have to recompile everything when just one function prototype
+ changes.
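+
+     As a hypothetical illustration (the file names are assumed,
+     following the package naming convention described below), a
+     package H5X would be split roughly as:
+
+         H5X.c          /* implementation of the package       */
+         H5Xpublic.h    /* exported API symbols                */
+         H5Xprivate.h   /* private types, macros, and data     */
+         H5Xproto.h     /* private function prototypes         */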
+
+
+
+ PACKAGES
+
+
+
+Names exported beyond function scope begin with `H5' followed by zero,
+one, or two upper-case letters that describe the class of object.
+This prefix is the package name. The implementation of packages
+doesn't necessarily have to map 1:1 to the source files.
+
+
+
+Each package implements a single main class of object (e.g., the H5B
+package implements B-link trees). The main data type of a package is
+the package name followed by `_t'.
+
+
+
+
+Not all packages implement a data type (H5, H5MF) and some
+packages provide access to a preexisting data type (H5MM, H5S).
+
+
+
+ PUBLIC vs PRIVATE
+
+
+If the symbol is for internal use only, then the package name is
+followed by an underscore and the rest of the name. Otherwise, the
+symbol is part of the API and there is no underscore between the
+package name and the rest of the name.
+
+
+
+For functions, this is important because the API functions never pass
+pointers around (they use atoms instead for hiding the implementation)
+and they perform stringent checks on their arguments. Internal
+ functions, on the other hand, check arguments with assert().
+
+Data types like H5B_t carry no information about whether the type is
+public or private since it doesn't matter.
+
+
+
+
+ INTEGRAL TYPES
+
+
+Integral fixed-point type names are an optional `u' followed by `int'
+followed by the size in bits (8, 16,
+32, or 64). There is no trailing `_t' because these are common
+enough and follow their own naming convention.
+
+
+
+ OTHER TYPES
+
+
+
+
+Other data types are always followed by `_t'.
+
+
+
+However, if the name is so common that it's used almost everywhere,
+then we make an alias for it by removing the package name and leading
+underscore and replacing it with an `h' (the main datatype for a
+package already has a short enough name, so we don't have aliases for
+them).
+
+
+
+ GLOBAL VARIABLES
+
+
+Global variables include the package name and end with `_g'.
+
+
+
+
+
+
+MACROS, PREPROCESSOR CONSTANTS, AND ENUM MEMBERS
+
+
+
+Same rules as other symbols except the name is all upper case. There
+are a few exceptions, such as symbols whose naming scheme is
+determined by the OS and compiler rather than by the library.
+
+
+
+
+
+ HDF5 supports compression of raw data by compression methods
+ built into the library or defined by an application. A
+ compression method is associated with a dataset when the dataset
+ is created and is applied independently to each storage chunk of
+ the dataset.
+
+ The dataset must use the chunked storage layout.
+
+ The library identifies compression methods with small
+ integers, with values less than 16 reserved for use by NCSA and
+ values between 16 and 255 (inclusive) available for general
+ use. This range may be extended in the future if it proves to
+ be too small.
+
+
+ Setting the compression for a dataset to a method which was
+ not compiled into the library and/or not registered by the
+ application is allowed, but writing to such a dataset will
+ silently not compress the data. Reading a compressed
+ dataset for a method which is not available will result in
+ errors (specifically, the read will fail).
+
+ Compression methods 16 through 255 can be defined by an
+ application. As mentioned above, methods that have not been
+ released should use high numbers in that range while methods
+ that have been published will be assigned an official number in
+ the low region of the range (possibly less than 16). Users
+ should be aware that using unpublished compression methods
+ results in unsharable files.
+
+ A compression method has two halves: one half handles
+ compression and the other half handles uncompression. The
+ halves are implemented as a pair of functions with identical
+ prototypes. Both the compression function and the uncompression
+ function receive an input buffer and fill in an output buffer.
+ The application associates the pair of functions with a name
+ and a method number by calling H5Zregister().
+ Here's a simple-minded "compression" method
+ that just copies the input value to the output. It's
+ similar to storing the data uncompressed. The function could be
+ registered as method 250 as
+ follows:
+
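+     The original example code was elided. As a sketch using today's
+     H5Z filter interface (this document predates it, so names and
+     prototypes differ; note also that modern HDF5 reserves IDs below
+     256, while 250 follows the text):
+
+         #define H5Z_FILTER_BOGUS 250
+
+         /* Copy-through "compression": leave the buffer untouched and
+          * report the same number of bytes. */
+         static size_t
+         bogus(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
+               size_t nbytes, size_t *buf_size, void **buf)
+         {
+             return nbytes;
+         }
+
+         const H5Z_class2_t bogus_class = {
+             H5Z_CLASS_T_VERS, H5Z_FILTER_BOGUS,
+             1, 1,                    /* encoder and decoder present */
+             "bogus",                 /* name kept for statistics    */
+             NULL, NULL, bogus
+         };
+         H5Zregister(&bogus_class);
+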
+ The function can be unregistered by saying:
+
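+         /* A modern equivalent (assumed; the original call took the
+          * method number as well): */
+         H5Zunregister(H5Z_FILTER_BOGUS);
+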
+ Notice that we kept the name "bogus" even
+ though we unregistered the functions that perform the
+ compression and uncompression. This makes compression
+ statistics more understandable when they're printed.
+ If a dataset is to be compressed then the compression
+ information must be specified when the dataset is created since
+ once a dataset is created compression parameters cannot be
+ adjusted. The compression is specified through the dataset
+ creation property list (see H5Pcreate()). It is possible to set the
+ compression to a method which hasn't
+ been defined with H5Zregister().
+ If an application attempts to use an unsupported
+ method then the compression statistics will show large
+ numbers of compression errors and no data
+ uncompressed.
+
+ This example is from a program that tried to use an unregistered
+ compression method. If the library is compiled with debugging turned
+ on for the H5Z layer (usually as a result of configuring with
+ debugging enabled), then compression statistics are reported when
+ the library closes.
+
+ The purpose of the dataset interface is to provide a mechanism
+ to describe properties of datasets and to transfer data between
+ memory and disk. A dataset is composed of a collection of raw
+ data points and four classes of meta data to describe the data
+ points. The interface is hopefully designed in such a way as to
+ allow new features to be added without disrupting current
+ applications that use the dataset interface.
+
+ The four classes of meta data are: constant meta data, persistent
+ meta data, memory meta data, and transfer meta data.
+
+ Each of these classes of meta data is handled differently by
+ the library although the same API might be used to create them.
+ For instance, the data type exists as constant meta data and as
+ memory meta data; the same API (the H5T API) is used for both.
+
+ The dataset API partitions these terms on three orthogonal axes
+ (layout, compression, and external storage) and uses a
+ dataset creation property list to hold the various
+ settings and pass them through the dataset interface. This is
+ similar to the way HDF5 files are created with a file creation
+ property list. A dataset creation property list is always
+ derived from the default dataset creation property list (use
+ H5Pcreate() to get a copy of it).
+
+ Once the general layout is defined, the user can define
+ properties of that layout. Currently, the only layout that has
+ user-settable properties is the chunked layout; the chunk
+ dimensions are set with H5Pset_chunk(), as sketched below.
+ This example shows how a two-dimensional dataset
+ is partitioned into chunks. The library can manage file
+ memory by moving the chunks around, and each chunk could be
+ compressed. The chunks are allocated in the file on demand
+ when data is written to the chunk.
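+
+     A sketch of the chunking calls (the dimension sizes are assumed
+     for illustration):
+
+         hsize_t chunk_size[2] = {100, 200};
+         hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+         H5Pset_chunk(dcpl, 2, chunk_size);   /* 2-D chunked layout */
+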
+ Although it is most efficient if I/O requests are aligned on chunk
+ boundaries, this is not a constraint. The application can perform I/O
+ on any set of data points as long as the set can be described by the
+ data space. The set on which I/O is performed is called the
+ selection.
+
+ Some types of storage layout allow data compression which is
+ defined by the functions described here. Compression is not
+ implemented yet.
+
+ Some storage formats may allow storage of data across a set of
+ non-HDF5 files. Currently, only the contiguous layout allows
+ external storage.
+
+ This example shows how a contiguous, one-dimensional dataset
+ is partitioned into three parts and each of those parts is
+ stored in a segment of an external file. The top rectangle
+ represents the logical address space of the dataset
+ while the bottom rectangle represents an external file.
+ One should note that the segments are defined in order of the
+ logical addresses they represent, not their order within the
+ external file. It would also have been possible to put the
+ segments in separate files. Care should be taken when setting
+ up segments in a single file since the library doesn't
+ automatically check for segments that overlap.
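+
+     A sketch of defining such segments (the file name, offsets, and
+     sizes are assumed; segments are listed in the order of the
+     logical addresses they represent):
+
+         hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
+         H5Pset_external(dcpl, "ext.data",     0, 1000*sizeof(int));
+         H5Pset_external(dcpl, "ext.data",  8192, 1000*sizeof(int));
+         H5Pset_external(dcpl, "ext.data", 16384, 1000*sizeof(int));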
+
+ This example shows how a contiguous, two-dimensional dataset
+ is partitioned into three parts and each of those parts is
+ stored in a separate external file. The top rectangle
+ represents the logical address space of the dataset
+ while the bottom rectangles represent external files.
+ The library maps the multi-dimensional array onto a linear
+ address space like normal, and then maps that address space
+ into the segments defined in the external file list.
+ The segments of an external file can exist beyond the end of the
+ file. The library reads that part of a segment as zeros. When writing
+ to a segment that exists beyond the end of a file, the file is
+ automatically extended. Using this feature, one can create a segment
+ (or set of segments) which is larger than the current size of the
+ dataset, which allows the dataset to be extended at a future time
+ (provided the data space also allows the extension).
+
+ All referenced external data files must exist before performing raw
+ data I/O on the dataset. This is normally not a problem since those
+ files are being managed directly by the application, or indirectly
+ through some other library.
+
+
+ Raw data has a constant data type which describes the data type
+ of the raw data stored in the file, and a memory data type that
+ describes the data type stored in application memory. Both data
+ types are manipulated with the H5T API.
+
+ The constant file data type is associated with the dataset when
+ the dataset is created in a manner described below. Once
+ assigned, the constant datatype can never be changed.
+
+ The memory data type is specified when data is transferred
+ to/from application memory. In the name of data sharability,
+ the memory data type must be specified, but can be the same
+ type identifier as the constant data type.
+
+ During dataset I/O operations, the library translates the raw
+ data from the constant data type to the memory data type or vice
+ versa. Structured data types include member offsets to allow
+ reordering of struct members and/or selection of a subset of
+ members and array data types include index permutation
+ information to allow things like transpose operations (the
+ prototype does not support array reordering). Permutations
+ are relative to some extrinsic description of the dataset.
+
+
+
+ The dataspace of a dataset defines the number of dimensions
+ and the size of each dimension and is manipulated with the H5S
+ API. The dataspace can also be used to define partial I/O
+ operations. Since I/O operations have two end-points, the raw
+ data transfer functions take two data space arguments: one which
+ describes the application memory data space or subset thereof
+ and another which describes the file data space or subset
+ thereof.
+
+
+ Each dataset has a set of constant and persistent properties
+ which describe the layout method, pre-compression
+ transformation, compression method, data type, external storage,
+ and data space. The constant properties are set as described
+ above in a dataset creation property list whose identifier is
+ passed to Constant or persistent properties can be queried with a set of
+ three functions. Each function returns an identifier for a copy
+ of the requested properties. The identifier can be passed to
+ various functions which modify the underlying object to derive a
+ new object; the original dataset is completely unchanged. The
+ return values from these functions should be properly destroyed
+ when no longer needed.
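+
+     The three functions were elided above; presumably they are the
+     usual query trio:
+
+         hid_t type  = H5Dget_type(dataset);         /* data type  */
+         hid_t space = H5Dget_space(dataset);        /* data space */
+         hid_t plist = H5Dget_create_plist(dataset); /* properties */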
+
+ A dataset also has memory properties which describe memory
+ within the application, and transfer properties that control
+ various aspects of the I/O operations. The memory can have a
+ data type different than the permanent file data type (different
+ number types, different struct member offsets, different array
+ element orderings) and can also be a different size (memory is a
+ subset of the permanent dataset elements, or vice versa). The
+ transfer properties might provide caching hints or collective
+ I/O information. Therefore, each I/O operation must specify
+ memory and transfer properties.
+
+ The memory properties are specified with type_id and
+ space_id arguments while the transfer properties are
+ specified with the transfer_id property list for the H5Dread()
+ or H5Dwrite() call.
+
+ If the maximum size of the temporary I/O pipeline buffers is
+ too small to hold the entire I/O request, then the I/O request
+ will be fragmented and the transfer operation will be strip
+ mined. However, certain restrictions apply to the strip
+ mining. For instance, when performing I/O on a hyperslab of a
+ simple data space the strip mining is in terms of the slowest
+ varying dimension. So if a 100x200x300 hyperslab is requested,
+ the temporary buffer must be large enough to hold a 1x200x300
+ sub-hyperslab.
+
+ To prevent strip mining from happening, the application should
+ use H5Pset_buffer() to make the temporary buffer large enough to
+ hold the entire request.
+
+ This example shows how to define a function that sets
+ a dataset transfer property list so that strip mining
+ does not occur. It takes an (optional) dataset transfer
+ property list, a dataset, a data space that describes
+ what data points are being transferred, and a data type
+ for the data points in memory. It returns a (new)
+ dataset transfer property list with the temporary
+ buffer size set to an appropriate value. The return
+ value should be passed as the fifth argument to H5Dread() or
+ H5Dwrite().
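+
+     A sketch of such a function (the helper name is ours; it assumes
+     the whole selection must fit in the type-conversion buffer):
+
+         hid_t
+         no_strip_mine(hid_t xfer_plist, hid_t dataset,
+                       hid_t space, hid_t mem_type)
+         {
+             hid_t    file_type = H5Dget_type(dataset);
+             size_t   elmt_size = H5Tget_size(mem_type);
+             size_t   file_size = H5Tget_size(file_type);
+             hssize_t npoints   = H5Sget_select_npoints(space);
+
+             /* buffer holds every point at the larger element size */
+             if (file_size > elmt_size) elmt_size = file_size;
+             H5Tclose(file_type);
+
+             if (xfer_plist == H5P_DEFAULT)
+                 xfer_plist = H5Pcreate(H5P_DATASET_XFER);
+             else
+                 xfer_plist = H5Pcopy(xfer_plist);
+             H5Pset_buffer(xfer_plist, (size_t)npoints * elmt_size,
+                           NULL, NULL);
+             return xfer_plist;
+         }
+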
+ Unlike constant and persistent properties, a dataset cannot be
+ queried for it's memory or transfer properties. Memory
+ properties cannot be queried because the application already
+ stores those properties separate from the buffer that holds the
+ raw data, and the buffer may hold multiple segments from various
+ datasets and thus have more than one set of memory properties.
+ The transfer properties cannot be queried from the dataset
+ because they're associated with the transfer itself and not with
+ the dataset (but one can query the transfer property list
+ directly).
+
+ All raw data I/O is accomplished through H5Dread() and H5Dwrite().
+ These functions
+ take a dataset handle, a memory data type, a memory data space,
+ a file data space, transfer properties, and an application
+ memory buffer. They translate data between the memory data type
+ and space and the file data type and space. The data spaces can
+ be used to describe partial I/O operations.
+
+ In the name of sharability, the memory datatype must be
+ supplied. However, it can be the same identifier as was used to
+ create the dataset or as was returned by H5Dget_type().
+
+ For complete reads of the dataset one may supply H5S_ALL as both
+ the memory and file data space identifiers.
+ The examples in this section illustrate some common dataset
+ practices.
+
+
+ This example shows how to create a dataset which is stored in
+ memory as a two-dimensional array of native integers; in the file
+ it is stored as a 500x600 dataset.
+
+ This example uses the file created in Example 1 and reads a
+ hyperslab of the 500x600 file dataset. The hyperslab size is
+ 100x200 and it is located beginning at element
+ <200,200>. We read the hyperslab into an 200x400 array in
+ memory beginning at element <0,0> in memory. Visually,
+ the transfer looks something like this:
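+
+     The figure is not reproduced here. A code sketch of the transfer
+     (identifiers assumed, modern signatures):
+
+         hsize_t file_offset[2] = {200, 200};
+         hsize_t count[2]       = {100, 200};
+         hsize_t mem_dims[2]    = {200, 400};
+         hsize_t mem_offset[2]  = {0, 0};
+         int buf[200][400];
+
+         hid_t file_space = H5Dget_space(dataset);
+         H5Sselect_hyperslab(file_space, H5S_SELECT_SET, file_offset,
+                             NULL, count, NULL);
+         hid_t mem_space = H5Screate_simple(2, mem_dims, NULL);
+         H5Sselect_hyperslab(mem_space, H5S_SELECT_SET, mem_offset,
+                             NULL, count, NULL);
+         H5Dread(dataset, H5T_NATIVE_INT, mem_space, file_space,
+                 H5P_DEFAULT, buf);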
+
+
+ If the file contains a compound data structure one of whose
+ members is a floating point value (call it "delta") but the
+ application is interested in reading an array of floating point
+ values which are just the "delta" values, then the application
+ should cast the floating point array as a struct with a single
+ "delta" member.
+
+
+ A dataspace describes where dataset elements are located.
+A dataspace is either a regular N-dimensional array of data points,
+called a simple dataspace, or a more general collection of data
+points organized in another manner, called a complex dataspace.
+A scalar dataspace is a special case of the simple data
+space and is defined to be a 0-dimensional single data point in size. Currently
+only scalar and simple dataspaces are supported with this version
+of the H5S interface.
+Complex dataspaces will be defined and implemented in a future
+version. Complex dataspaces are intended to be used for such structures
+which are awkward to express in simple dataspaces, such as irregularly
+gridded data or adaptive mesh refinement data. This interface provides
+functions to set and query properties of a dataspace.
+
+ Operations on a dataspace include defining or extending the extent of
+the dataspace, selecting portions of the dataspace for I/O and storing the
+dataspaces in the file. The extent of a dataspace is the range of coordinates
+over which dataset elements are defined and stored. Dataspace selections are
+subsets of the extent (up to the entire extent) which are selected for some
+operation.
+
+ For example, a 2-dimensional dataspace with an extent of 10 by 10 may have
+the following very simple selection:
+ Selections within dataspaces have an offset within the extent which is used
+to locate the selection within the extent of the dataspace. Selection offsets
+default to 0 in each dimension, but may be changed to move the selection within
+a dataspace. In example 2 above, if the offset was changed to 1,1, the selection
+would look like this:
+ Selections also have a linearization ordering of the points selected
+(defaulting to "C" order, i.e. last dimension changing fastest). The
+linearization order may be specified for each point or it may be chosen by
+the axis of the dataspace. For example, with the default "C" ordering,
+example 1's selected points are iterated through in this order: (1,1), (2,1),
+(3,1), (1,2), (2,2), etc. With "FORTRAN" ordering, example 1's selected points
+would be iterated through in this order: (1,1), (1,2), (1,3), (1,4), (1,5),
+(2,1), (2,2), etc.
+
+ A dataspace may be stored in the file as a permanent object, to allow many
+datasets to use a commonly defined dataspace. Dataspaces with extendable
+extents (i.e. unlimited dimensions) cannot be stored as permanent
+dataspaces.
+
+ Dataspaces may be created using an existing permanent dataspace as a
+container to locate the new dataspace within. These dataspaces are complete
+dataspaces and may be used to define datasets. A dataspace with a "parent"
+can be queried to determine the parent dataspace and the location within the
+parent. These dataspaces must currently be the same number of dimensions as
+the parent dataspace.
+
+ The start array determines the starting coordinates of the hyperslab
+to select. The stride array chooses array locations from the dataspace
+with each value in the stride array determining how many elements to move
+in each dimension. Setting a value in the stride array to 1 moves to
+each element in that dimension of the dataspace, setting a value of 2 in a
+location in the stride array moves to every other element in that
+dimension of the dataspace. In other words, the stride determines the
+number of elements to move from the start location in each dimension.
+Stride values of 0 are not allowed. If the stride parameter is NULL,
+a contiguous hyperslab is selected (as if each value in the stride array
+was set to all 1's). The count array determines how many blocks to
+select from the dataspace, in each dimension. The block array determines
+the size of the element block selected from the dataspace. If the block
+parameter is set to NULL, the block size defaults to a single element
+in each dimension (as if the block array was set to all 1's).
+ For example, in a 2-dimensional dataspace, setting start to [1,1],
+stride to [4,4], count to [3,7] and block to [2,2] selects
+21 2x2 blocks of array elements starting with location (1,1) and selecting
+blocks at locations (1,1), (5,1), (9,1), (1,5), (5,5), etc.
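+
+     As a sketch (modern signature; in early releases the start array
+     had a signed element type):
+
+         hsize_t start[2]  = {1, 1};
+         hsize_t stride[2] = {4, 4};
+         hsize_t count[2]  = {3, 7};
+         hsize_t block[2]  = {2, 2};
+         H5Sselect_hyperslab(space_id, H5S_SELECT_SET,
+                             start, stride, count, block);
+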
+ Regions selected with this function call default to 'C' order iteration when
+I/O is performed.
+ The selection operator op determines how the new selection is to be
+combined with the already existing selection for the dataspace. Currently,
+only the H5S_SELECT_SET operator is supported, which replaces the existing
+selection with the parameters from this call. When operators other than
+H5S_SELECT_SET are used to combine a new selection with an existing selection,
+the selection ordering is reset to 'C' array ordering.
+ The data type interface provides a mechanism to describe the
+ storage format of individual data points of a data set and is
+ hopefully designed in such a way as to allow new features to be
+ easily added without disrupting applications that use the data
+ type interface. A dataset (the H5D interface) is composed of a
+ collection or raw data points of homogeneous type organized
+ according to the data space (the H5S interface).
+
+ A data type is a collection of data type properties, all of
+ which can be stored on disk, and which when taken as a whole,
+ provide complete information for data conversion to or from that
+ data type. The interface provides functions to set and query
+ properties of a data type.
+
+ A data point is an instance of a data type,
+ which is an instance of a type class. We have defined
+ a set of type classes and properties which can be extended at a
+ later time. The atomic type classes are those which describe
+ types which cannot be decomposed at the data type interface
+ level; all other classes are compound.
+
+ The functions defined in this section operate on data types as
+ a whole. New data types can be created from scratch or copied
+ from existing data types. When a data type is no longer needed
+ its resources should be released by calling H5Tclose().
+
+ Data types come in two flavors: named data types and transient
+ data types. A named data type is stored in a file while the
+ transient flavor is independent of any file. Named data types
+ are always read-only, but transient types come in three
+ varieties: modifiable, read-only, and immutable. The difference
+ between read-only and immutable types is that immutable types
+ cannot be closed except when the entire library is closed (the
+ predefined types like H5T_NATIVE_INT are immutable).
+
+ An atomic type is a type which cannot be decomposed into
+ smaller units at the API level. All atomic types have a common
+ set of properties which are augmented by properties specific to
+ a particular type class. Some of these properties also apply to
+ compound data types, but we discuss them only as they apply to
+ atomic data types here. The properties and the functions that
+ query and set their values are:
+
+ Integer atomic types (class H5T_INTEGER) describe fixed-point
+ numbers, and the library also supports floating-point atomic types
+ (class H5T_FLOAT).
+
+ Dates and times (class H5T_TIME): I'm deferring definition until
+ later since they're probably not
+ as important as the other data types.
+
+ Fixed-length character string types (class H5T_STRING) are used to
+ store textual information, and bit field (H5T_BITFIELD) and opaque
+ (H5T_OPAQUE) atomic types are also defined.
+
+ A compound data type is similar to a C struct. Properties of
+ members of a compound data type are
+ defined when the member is added to the compound type (see
+ H5Tinsert()).
+ The library predefines a modest number of data types having
+ names of the form H5T_arch_base, where arch is an architecture name
+ (such as STD, IEEE, or NATIVE) and base is a base type name.
+
+ The base name of most types consists of a letter, a precision
+ in bits, and an indication of the byte order. The letters are:
+ I for signed integers, U for unsigned integers, B for bit fields,
+ and F for floating-point values.
+
+ The byte order is a two-letter sequence: BE for big endian or
+ LE for little endian.
+
+ The NATIVE types carry no explicit size or byte order because they
+ match whatever the C compiler uses (for example, H5T_NATIVE_INT).
+
+ To create a 128-bit, little-endian signed integer
+ type one could use the following (increasing the
+ precision of a type automatically increases the total
+ size):
+
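+     The original code was elided; a sketch:
+
+         hid_t new_type = H5Tcopy(H5T_NATIVE_INT);
+         H5Tset_precision(new_type, 128);     /* total size grows too */
+         H5Tset_order(new_type, H5T_ORDER_LE);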
+
+ To create an 80-byte null terminated string type one
+ might do this (the offset of a character string is
+ always zero and the precision is adjusted
+ automatically to match the size):
+
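+     A sketch:
+
+         hid_t str80 = H5Tcopy(H5T_C_S1);
+         H5Tset_size(str80, 80);   /* precision adjusts to match */
+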
+ Unlike atomic data types which are derived from other atomic
+ data types, compound data types are created from scratch. First,
+ one creates an empty compound data type and specifies its total
+ size. Then members are added to the compound data type in any
+ order.
+
+ Usually a C struct will be defined to hold a data point in
+ memory, and the offsets of the members in memory will be the
+ offsets of the struct members from the beginning of an instance
+ of the struct.
+
+ Each member must have a descriptive name which is the
+ key used to uniquely identify the member within the compound
+ data type. A member name in an HDF5 data type does not
+ necessarily have to be the same as the name of the member in the
+ C struct, although this is often the case. Nor does one need to
+ define all members of the C struct in the HDF5 compound data
+ type (or vice versa).
+
+
+ An HDF5 data type is created to describe complex
+ numbers whose in-memory layout is defined by a C struct,
+ complex_t. Member alignment is handled by the HOFFSET macro,
+ which computes the offset of a member within its struct.
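+
+     A sketch, assuming a double-precision complex_t:
+
+         typedef struct {
+             double re;   /* real part      */
+             double im;   /* imaginary part */
+         } complex_t;
+
+         hid_t complex_id = H5Tcreate(H5T_COMPOUND, sizeof(complex_t));
+         H5Tinsert(complex_id, "real",
+                   HOFFSET(complex_t, re), H5T_NATIVE_DOUBLE);
+         H5Tinsert(complex_id, "imaginary",
+                   HOFFSET(complex_t, im), H5T_NATIVE_DOUBLE);
+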
+ This example shows how to create a disk version of a
+ compound data type in order to store data on disk in
+ as compact a form as possible. Packed compound data
+ types should generally not be used to describe memory
+ as they may violate alignment constraints for the
+ architecture being used. Note also that using a
+ packed data type for disk storage may involve a higher
+ data conversion cost.
+
+ Compound data types that have a compound data type
+ member can be handled two ways. This example shows
+ that the compound data type can be flattened,
+ resulting in a compound type with only atomic
+ members.
+
+
+ (However, a compound data type cannot currently have a member which
+ is itself a compound data type, so the flattened form is required
+ for now; see the note in the reference section below.)
+
+ If a file has lots of datasets which have a common data type
+ then the file could be made smaller by having all the datasets
+ share a single data type. Instead of storing a copy of the data
+ type in each dataset object header, a single data type is stored
+ and the object headers point to it. The space savings is
+ probably only significant for datasets with a compound data type
+ since the simple data types can be described with just a few
+ bytes anyway.
+
+ To create a bunch of datasets that share a single data type
+ just create the datasets with a committed (named) data type.
+
+
+ To create two datasets that share a common data type
+ one just commits the data type, giving it a name, and
+ then uses that data type to create the datasets.
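+
+     A sketch with modern function names (the document predates the
+     "2"-suffixed calls, so the exact names are assumed):
+
+         hid_t t = H5Tcopy(H5T_NATIVE_INT);
+         H5Tcommit2(file, "shared_type", t, H5P_DEFAULT, H5P_DEFAULT,
+                    H5P_DEFAULT);
+         hid_t d1 = H5Dcreate2(file, "dset1", t, space, H5P_DEFAULT,
+                               H5P_DEFAULT, H5P_DEFAULT);
+         hid_t d2 = H5Dcreate2(file, "dset2", t, space, H5P_DEFAULT,
+                               H5P_DEFAULT, H5P_DEFAULT);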
+
+ And to create two additional datasets later which
+ share the same type as the first two datasets:
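+
+     Again as a sketch with modern names:
+
+         hid_t t = H5Topen2(file, "shared_type", H5P_DEFAULT);
+         hid_t d3 = H5Dcreate2(file, "dset3", t, space, H5P_DEFAULT,
+                               H5P_DEFAULT, H5P_DEFAULT);
+         hid_t d4 = H5Dcreate2(file, "dset4", t, space, H5P_DEFAULT,
+                               H5P_DEFAULT, H5P_DEFAULT);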
+
+ The library is capable of converting data from one type to
+ another and does so automatically when reading or writing the
+ raw data of a dataset. The data type interface does not provide
+ functions to the application for changing data types directly,
+ but the user is allowed a certain amount of control over the
+ conversion process.
+
+ In order to ensure that data conversion rates exceed disk I/O rates,
+ common data conversion paths can be hand-tuned and optimized for
+ performance. If a hand-tuned conversion function is not
+ available, then the library falls back to a slower but more
+ general conversion function. Although conversion paths include
+ data space conversion, only data type conversions are described
+ here. Most applications will not be concerned with data type
+ conversions since the library will contain hand-tuned conversion
+ functions for many common conversion paths. In fact, if an
+ application does define a conversion function which would be of
+ general interest, we request that the function be submitted to
+ the HDF5 development team for inclusion in the library (there
+ might be less overhead involved with calling an internal
+ conversion function than calling an application-defined
+ conversion function).
+
+ Note: The alpha version of the library does not contain
+ a full set of conversions. It can convert from one integer
+ format to another and one struct to another. It can also
+ perform byte swapping when the source and destination types are
+ otherwise the same.
+
+ A conversion path contains a source and destination data type
+ and each path contains a hard conversion function
+ and/or a soft conversion function. The only difference
+ between hard and soft functions is the way in which the library
+ chooses which function applies: A hard function applies to a
+ specific conversion path while a soft function may apply to
+ multiple paths. When both hard and soft functions apply to a
+ conversion path, then the hard function is favored and when
+ multiple soft functions apply, the one defined last is favored.
+
+ A data conversion function is of type H5T_conv_t (the typedef is
+ given in the reference section below).
+
+ The conversion function is called with the source and
+ destination data types (src_type and
+ dst_type), path-constant data (cdata), the
+ number of instances of the data type to convert
+ (nelmts), a buffer which initially contains an array of
+ data having the source type and on return will contain an array
+ of data having the destination type (buffer), and a
+ temporary or background buffer (background). Functions
+ return a negative value on failure and some other value on
+ success.
+
+ Whether a background buffer is supplied to a conversion
+ function, and whether the background buffer is initialized,
+ depends on the value of cdata->need_bkg.
+
+ Other fields of cdata can be read or written by
+ the conversion functions. Many of these contain
+ performance-measuring fields which can be printed by the
+ conversion function during the H5T_CONV_FREE call.
+
+ Once a conversion function is written it can be registered and
+ unregistered with the conversion-registration functions:
+
+
+ Here's an example application-level function that
+ converts Cray-format integers to other integer types. The
+ background argument is ignored since
+ it's generally not applicable to atomic data types.
+
+ The conversion function described in the previous
+ example applies to more than one conversion path.
+ Instead of enumerating all possible paths, we register
+ it as a soft function and allow it to decide which
+ paths it can handle.
+
+ This causes it to be consulted for any conversion
+ from an integer type to another integer type. The
+ first argument is just a short identifier which will
+ be printed with the data type conversion statistics.
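+
+     A sketch with the modern registration call (the era of this
+     document used a separate soft-registration function; "ibo" and
+     convert_ibo are assumed names):
+
+         H5Tregister(H5T_PERS_SOFT, "ibo",
+                     H5T_NATIVE_INT, H5T_NATIVE_INT, convert_ibo);
+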
+ NOTE: The idea of a master soft list and being able to
+ query conversion functions for their abilities tries to overcome
+ problems we saw with AIO. Namely, that there was a dichotomy
+ between generic conversions and specific conversions that made
+ it very difficult to write a conversion function that operated
+ on, say, integers of any size and order as long as they don't
+ have zero padding. The AIO mechanism required such a function
+ to be explicitly registered (like a hard conversion) for each
+ possible conversion path.
+ When an error occurs deep within the HDF5 library a record is
+ pushed onto an error stack and that function returns a failure
+ indication. Its caller detects the failure, pushes another
+ record onto the stack, and returns a failure indication. This
+ continues until the application-called API function returns a
+ failure indication (a negative integer or null pointer). The
+ next API function which is called (with a few exceptions) resets
+ the stack.
+
+ In normal circumstances, an error causes the stack to be
+ printed on the standard error stream. The first item, number
+ "#000" is produced by the API function itself and is usually
+ sufficient to indicate to the application programmer what went
+ wrong.
+
+
+ If an application calls H5Eprint() the stack is printed
+ explicitly. The error stack can also be printed and manipulated by
+ these functions, but if an application wishes to make explicit
+ calls to H5Eprint() then automatic printing should be turned off
+ to avoid printing the stack twice.
+
+ Sometimes an application will call a function for the sake of
+ its return value, fully expecting the function to fail. Under
+ these conditions, it would be misleading if an error message
+ were automatically printed. Automatic printing of messages is
+ controlled by the H5Eset_auto() function.
+ An application can temporarily turn off error
+ messages while "probing" a function.
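+
+     A sketch using the modern, explicitly-versioned calls:
+
+         H5E_auto2_t func;
+         void *client_data;
+         /* save the current reporting settings */
+         H5Eget_auto2(H5E_DEFAULT, &func, &client_data);
+         /* turn off automatic printing and probe */
+         H5Eset_auto2(H5E_DEFAULT, NULL, NULL);
+         hid_t fid = H5Fopen("maybe.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
+         /* restore the previous settings */
+         H5Eset_auto2(H5E_DEFAULT, func, client_data);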
+
+ Or automatic printing can be disabled altogether and
+ error messages can be explicitly printed.
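+
+     A sketch:
+
+         H5Eset_auto2(H5E_DEFAULT, NULL, NULL);  /* disable for good */
+         if (H5Fopen("maybe.h5", H5F_ACC_RDONLY, H5P_DEFAULT) < 0)
+             H5Eprint2(H5E_DEFAULT, stderr);     /* print on demand  */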
+
+ The application is allowed to define an automatic error
+ traversal function other than the default, H5Eprint().
+
+ The application defines a function to print a simple
+ error message to the standard error stream.
+
+ The function is installed as the error handler by
+ saying
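+
+     A sketch (the modern auto-callback signature is assumed):
+
+         #include <stdio.h>
+         #include <stdlib.h>
+
+         static herr_t
+         my_print(hid_t estack, void *client_data)
+         {
+             fprintf(stderr, "An HDF5 error was detected. Bye.\n");
+             exit(1);
+         }
+
+         H5Eset_auto2(H5E_DEFAULT, my_print, NULL);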
+
+ This is the implementation of the default error stack
+ traversal callback.
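+
+     A sketch of such a callback using the modern walk interface (the
+     field names are those of H5E_error2_t):
+
+         static herr_t
+         walk_cb(unsigned n, const H5E_error2_t *err, void *client_data)
+         {
+             fprintf(stderr, "  #%03u: %s line %u in %s(): %s\n",
+                     n, err->file_name, err->line, err->func_name,
+                     err->desc);
+             return 0;
+         }
+
+         /* invoked from an auto function via: */
+         H5Ewalk2(H5E_DEFAULT, H5E_WALK_DOWNWARD, walk_cb, NULL);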
+
+ This table shows some of the layers of HDF5. Each layer calls
+ functions at the same or lower layers and never functions at
+ higher layers. An object identifier (OID) takes various forms
+ at the various layers: at layer 0 an OID is an absolute physical
+ file address; at layers 1 and 2 it's an absolute virtual file
+ address. At layers 3 through 6 it's a relative address, and at
+ layers 7 and above it's an object handle.
+
+ The simplest form of hdf5 file is a single file containing only
+ hdf5 data. The file begins with the boot block, which is
+ followed until the end of the file by hdf5 data. The next most
+ complicated file allows non-hdf5 data (user defined data or
+ internal wrappers) to appear before the boot block and after the
+ end of the hdf5 data. The hdf5 data is treated as a single
+ linear address space in both cases.
+
+ The next level of complexity comes when non-hdf5 data is
+ interspersed with the hdf5 data. We handle that by including
+ the non-hdf5 interspersed data in the hdf5 address space and
+ simply not referencing it (eventually we might add those
+ addresses to a "do-not-disturb" list using the same mechanism as
+ the hdf5 free list, but it's not absolutely necessary). This is
+ implemented except for the "do-not-disturb" list.
+
+ The most complicated single address space hdf5 file is when we
+ allow the address space to be split among multiple physical
+ files. For instance, a >2GB file can be split into smaller
+ chunks and transferred to a 32-bit machine, then accessed as a
+ single logical hdf5 file. The library already supports >32 bit
+ addresses, so at layer 1 we split a 64-bit address into a 32-bit
+ file number and a 32-bit offset (the 64 and 32 are
+ arbitrary). The rest of the library still operates with a linear
+ address space.
+
+ Another variation might be a family of two files where all the
+ meta data is stored in one file and all the raw data is stored
+ in another file to allow the HDF5 wrapper to be easily replaced
+ with some other wrapper.
+
+ I've implemented fixed-size family members. The entire hdf5
+ file is partitioned into members where each member is the same
+ size. The family scheme is used if one passes a name to
+ H5Fcreate() or H5Fopen() that contains a printf-style integer
+ format specifier.
+ I haven't implemented a split meta/raw family yet but am rather
+ curious to see how it would perform. I was planning to use the
+ `.h5' extension for the meta data file and `.raw' for the raw
+ data file. The high-order bit in the address would determine
+ whether the address refers to meta data or raw data. If the user
+ passes a name that ends with `.raw' to the open call then the
+ split meta/raw scheme would be used.
+
+ We also need the ability to point to raw data that isn't in the
+ HDF5 linear address space. For instance, a dataset might be
+ striped across several raw data files.
+
+ Fortunately, the only two packages that need to be aware of
+ this are the packages for reading/writing contiguous raw data
+ and discontiguous raw data. Since contiguous raw data is a
+ special case, I'll discuss how to implement external raw data in
+ the discontiguous case.
+
+ Discontiguous data is stored as a B-tree whose keys are the
+ chunk indices and whose leaf nodes point to the raw data by
+ storing a file address. So what we need is some way to name the
+ external files, and a way to efficiently store the external file
+ name for each chunk.
+
+ I propose adding to the object header an External File
+ List message that is a 1-origin array of file names.
+ Then, in the B-tree, each key has an index into the External
+ File List (or zero for the HDF5 file) for the file where the
+ chunk can be found. The external file index is only used at
+ the leaf nodes to get to the raw data (the entire B-tree is in
+ the HDF5 file) but because of the way keys are copied among
+ the B-tree nodes, it's much easier to store the index with
+ every key.
+
+ One might also want to combine two or more HDF5 files in a
+ manner similar to mounting file systems in Unix. That is, the
+ group structure and meta data from one file appear as though
+ they exist in the first file. One opens File-A, and then
+ mounts File-B at some point in File-A, the mount
+ point, so that traversing into the mount point actually
+ causes one to enter the root object of File-B. File-A and
+ File-B are each complete HDF5 files and can be accessed
+ individually without mounting them.
+
+ We need a couple additional pieces of machinery to make this
+ work. First, an haddr_t type (a file address) doesn't contain
+ any info about which HDF5 file's address space the address
+ belongs to. But since haddr_t is an opaque type except at
+ layers 2 and below, it should be quite easy to add a pointer to
+ the HDF5 file. This would also remove the H5F_t argument from
+ most of the low-level functions since it would be part of the
+ OID.
+
+ The other thing we need is a table of mount points and some
+ functions that understand them. We would add the following
+ table to each H5F_t struct:
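+
+     Something like the following sketch (purely speculative; the
+     field names are invented for illustration):
+
+         typedef struct H5F_mount_t {
+             struct H5G_t *group;   /* mount point in the parent file */
+             struct H5F_t *file;    /* child file mounted there       */
+         } H5F_mount_t;
+
+         typedef struct H5F_mtab_t {
+             size_t       nmounts;  /* number of mounted files        */
+             H5F_mount_t *child;    /* array of mount records         */
+         } H5F_mtab_t;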
+
+ I'm expecting to be able to implement the two new flavors of
+ single linear address space in about two days. It took two hours
+ to implement the malloc/free file driver at level zero and I
+ don't expect this to be much more work.
+
+ I'm expecting three days to implement the external raw data for
+ discontiguous arrays. Adding the file index to the B-tree is
+ quite trivial; adding the external file list message shouldn't
+ be too hard since the object header message class from which this
+ message derives is fully implemented; and changing the raw data I/O
+ to consult the external file list shouldn't be difficult either.
+
+ I'm expecting four days to implement being able to mount one
+ HDF5 file on another. I was originally planning a lot more, but
+ making haddr_t aware of its file (as described above) does most
+ of the work.
+
+ The external raw data could be implemented as a single linear
+ address space, but doing so would require one to allocate large
+ enough file addresses throughout the file (>32bits) before the
+ file was created. It would make mixing an HDF5 file family with
+ external raw data, or external HDF5 wrapper around an HDF4 file
+ a more difficult process. So I consider the implementation of
+ external raw data files as a single HDF5 linear address space a
+ kludge.
+
+ The ability to mount one HDF5 file on another might not be a
+ very important feature especially since each HDF5 file must be a
+ complete file by itself. It's not possible to stripe an array
+ over multiple HDF5 files because the B-tree wouldn't be complete
+ in any one file, so the only choice is to stripe the array
+ across multiple raw data files and store the B-tree in the HDF5
+ file. On the other hand, it might be useful if one file
+ contains some public data which can be mounted by other files
+ (e.g., a mesh topology shared among collaborators and mounted by
+ files that contain other fields defined on the mesh). Of course
+ the applications can open the two files separately, but it might
+ be more portable if we support it in the library.
+
+ So we're looking at about two weeks to implement all three
+ versions. I didn't get a chance to do any of them in AIO
+ although we had long-term plans for the first two with a
+ possibility of the third. They'll be much easier to implement in
+ HDF5 than AIO since I've been keeping these in mind from the
+ start.
+
+ HDF5 files are composed of a "boot block" describing information
+ required to portably access files on multiple platforms, followed
+ by information about the groups in a file and the datasets in the
+ file. The boot block contains information about the size of offsets
+ and lengths of objects, the number of entries in symbol tables
+ (used to store groups) and additional version information for the
+ file.
+
+ The HDF5 library assumes that all files are implicitly opened for read
+ access at all times. Passing the H5F_ACC_RDWR flag to H5Fopen()
+ allows write access to a file as well.
+
+ Files are created with the H5Fcreate() function and existing files
+ are opened with H5Fopen(). Additional parameters to these calls are
+ supplied through property lists: file creation property lists apply
+ to H5Fcreate() only, while file access property lists apply to both
+ H5Fcreate() and H5Fopen().
+
+ The following example shows how to create a file with 64-bit object
+ offsets and lengths:
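+
+     A sketch:
+
+         hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
+         H5Pset_sizes(fcpl, 8, 8);   /* 8-byte offsets and lengths */
+         hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC,
+                                fcpl, H5P_DEFAULT);
+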
+ The following example shows how to open an existing file for
+ independent dataset access using MPI parallel I/O:
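+
+     A sketch with the modern MPI-IO driver call (the call in this
+     document's era was different):
+
+         hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
+         H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
+         hid_t file = H5Fopen("example.h5", H5F_ACC_RDWR, fapl);
+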
+ HDF5 is able to access its address space through various types of
+ low-level file drivers. For instance, an address space might
+ correspond to a single file on a Unix file system, multiple files on a
+ Unix file system, multiple files on a parallel file system, or a block
+ of memory within the application. Generally, an HDF5 address space is
+ referred to as an "HDF5 file" regardless of how the space is organized
+ at the storage level.
+
+ The sec2 driver uses functions from section 2 of the
+ Posix manual to access files stored on a local file system. These are
+ the open(), lseek(), read(), write(), and close() calls.
+
+ The stdio driver uses the functions declared in the stdio.h
+ header: fopen(), fseek(), fread(), fwrite(), and fclose().
+
+ The core driver uses malloc() and free() to create storage for the
+ file image in memory.
+
+ The MPI driver uses MPI I/O to provide parallel access to a file.
+
+ A single HDF5 address space may be split into multiple files which,
+ together, form a file family. Each member of the family must be the
+ same logical size although the size and disk storage reported by
+ ls(1) may be substantially smaller.
+
+ Any HDF5 file can be split into a family of files by running
+ the file through a splitting utility and, conversely, a family can
+ be recombined by concatenating the members in order.
+
+ On occasion, it might be useful to separate meta data from raw
+ data. The split driver does this by creating two files: one for
+ meta data and another for raw data. The application provides a base
+ file name, and the library derives the two actual file names from
+ it (for example by appending distinct extensions).
+
+ An object in HDF5 consists of an object header at a fixed file
+ address that contains messages describing various properties of
+ the object such as its storage location, layout, compression,
+ etc. and some of these messages point to other data such as the
+ raw data of a dataset. The address of the object header is also
+ known as an OID and HDF5 has facilities for translating
+ names to OIDs.
+
+ Every HDF5 object has at least one name and a set of names can
+ be stored together in a group. Each group implements a name
+ space where the names are any length and unique with respect to
+ other names in the group.
+
+ Since a group is a type of HDF5 object it has an object header
+ and a name which exists as a member of some other group. In this
+ way, groups can be linked together to form a directed graph.
+ One particular group is called the Root Group and is
+ the group to which the HDF5 file boot block points. Its name is
+ "/" by convention. The full name of an object is
+ created by joining component names with slashes much like Unix.
+
+
+ However, unlike Unix which arranges directories hierarchically,
+ HDF5 arranges groups in a directed graph. Therefore, there is
+ no ".." entry in a group since a group can have more than one
+ parent. There is no "." entry either but the library understands
+ it internally.
+
+ HDF5 places few restrictions on names: component names may be
+ any length except zero and may contain any character except
+ slash ("/") and the null terminator. A full name may be
+ composed of any number of component names separated by slashes,
+ with any of the component names being the special name ".". A
+ name which begins with a slash is an absolute name
+ which is looked up beginning at the root group of the file while
+ all other relative names are looked up beginning at the
+ current working group (described below) or a specified group.
+ Multiple consecutive slashes in a full name are treated as
+ single slashes and trailing slashes are not significant. A
+ special case is the name "/" (or equivalent) which refers to the
+ root group.
+
+ Functions which operate on names generally take a location
+ identifier which is either a file ID or a group ID and perform
+ the lookup with respect to that location. Some possibilities
+ are:
+
+
+ Groups are created with the H5Gcreate() function.
+
+ Each file handle has a current working group, initially the root
+ group, against which relative names are resolved.
+
+ An object (including a group) can have more than one
+ name. Creating the object gives it the first name, and then
+ functions described here can be used to give it additional
+ names. The association between a name and the object is called
+ a link and HDF5 supports two types of links: a hard
+ link is a direct association between the name and the
+ object where both exist in a single HDF5 address space, and a
+ soft link is an indirect association.
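+
+     A sketch using the modern link calls (the original interface was
+     a single H5Glink()-style function):
+
+         /* hard link: a second direct name for an existing object */
+         H5Lcreate_hard(file, "/data/dset1", file, "/alias",
+                        H5P_DEFAULT, H5P_DEFAULT);
+         /* soft link: an indirect, by-name association */
+         H5Lcreate_soft("/data/dset1", file, "/soft_alias",
+                        H5P_DEFAULT, H5P_DEFAULT);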
+
+
+
+ These functions are designed to provide access to HDF5 application/library
+behavior. They are used to get information about or change global library
+parameters.
+ These functions are designed to provide file-level access to HDF5 files.
+Further manipulation of objects inside a file is performed through one of APIs
+documented below.
+ These functions manipulate template objects so that objects which
+require many different parameters can be configured easily.
+ These functions create and manipulate dataset objects. Each dataset must
+be constructed from a datatype and a dataspace.
+ These functions create and manipulate the datatype which describes elements
+of a dataset.
+ If the precision is increased then the offset is decreased and then
+ the size is increased to ensure that significant bits do not "hang
+ over" the edge of the data type.
+ Changing the precision of an H5T_STRING automatically changes the
+ size as well. The precision must be a multiple of 8.
+ When decreasing the precision of a floating point type, set the
+ locations and sizes of the sign, mantissa, and exponent fields
+ first.
+ If the offset is incremented then the total size is
+incremented also if necessary to prevent significant bits of
+the value from hanging over the edge of the data type.
+
+ The offset of an H5T_STRING cannot be set to anything but
+zero.
+ Fields are not allowed to extend beyond the number of bits of
+ precision, nor are they allowed to overlap with one another.
+ Note: All members of a compound data type must be atomic; a
+ compound data type cannot have a member which is a compound data
+ type.
+ The type of the conversion function pointer is declared as:
+
+     typedef herr_t (*H5T_conv_t)(hid_t src_id, hid_t dst_id,
+                                  H5T_cdata_t *cdata, size_t nelmts,
+                                  void *buf, void *bkg);
+ These functions create and manipulate the dataspace in which to store the
+elements of a dataset.
+ A group associates names with objects and provides a mechanism
+which can map a name to an object. Since all objects
+appear in at least one group (with the possible exception of the root
+object) and since objects can have names in more than one group, the
+set of all objects in an HDF5 file is a directed graph. The internal
+nodes (nodes with out-degree greater than zero) must be groups while
+the leaf nodes (nodes with out-degree zero) are either empty groups or
+objects of some other type. Exactly one object in every non-empty
+file is the root object. The root object always has a positive
+in-degree because it is pointed to by the file boot block.
+
+ Every file handle returned by H5Fcreate() or H5Fopen() maintains
+its own current working group, initially the root group.
+
+ An object name consists of one or more components separated from
+one another by slashes. If the name begins with a slash then the
+object is located by looking for the first component in the root
+object, then looking for the second component in that object, etc.,
+until the entire name is traversed. If the name doesn't begin with a
+slash then the traversal begins with the current working group.
+
+ The library does not maintain the full absolute name of its current
+working group because (1) cycles in the graph can make the name length
+unbounded and (2) a group doesn't necessarily have a unique name. A
+more Unix-like hierarchical naming scheme can be implemented on top of
+the directed graph scheme by creating a ".." entry in each group that
+points to its single predecessor and then a Since many of the typedefs in the HDF5 API are not well-defined yet,
+the types below may change radically en route to a final API...
+ The format of an HDF5 file on disk encompasses several
+ key ideas of the current HDF4 & AIO file formats as well as
+ addressing some short-comings therein. The new format will be
+ more self-describing than the HDF4 format and will be more
+ uniformly applied to data objects in the file.
+
+
+ Three levels of information compose the file format. Level
+ 0 contains basic information for identifying and
+ "boot-strapping" the file. Level 1 information is composed of
+ the object directory (stored as a B-tree) and is used as the
+ index for all the objects in the file. The rest of the file is
+ composed of data-objects at level 2, with each object
+ partitioned into header (or "meta") information and data
+ information.
+
+ The sizes of various fields in the following layout tables are
+ determined by looking at the number of columns the field spans
+ in the table. There are three exceptions: (1) The size may be
+ overridden by specifying a size in parentheses, (2) the size of
+ addresses is determined by the Size of Addresses field
+ in the boot block, and (3) the size of size fields is determined
+ by the Size of Sizes field in the boot block.
+
+ The boot block may begin at certain predefined offsets within
+ the HDF5 file, allowing a block of unspecified content for
+ users to place additional information at the beginning (and
+ end) of the HDF5 file without limiting the HDF5 library's
+ ability to manage the objects within the file itself. This
+ feature was designed to accommodate wrapping an HDF5 file in
+ another file format or adding descriptive information to the
+ file without requiring the modification of the actual file's
+ information. The boot-block is located by searching for the
+ HDF5 file signature at byte offset 0, byte offset 512 and at
+ successive locations in the file, each a multiple of two of
+ the previous location, i.e. 0, 512, 1024, 2048, etc.
+
+ The boot-block is composed of a file signature, followed by
+ boot block and object directory version numbers, information
+ about the sizes of offset and length values used to describe
+ items within the file, the size of each object directory page,
+ and a symbol table entry for the root object in the file.
+
+
+
+ B-link trees allow flexible storage for objects which tend to grow
+ in ways that cause the object to be stored discontiguously. B-trees
+ are described in various algorithms books including "Introduction to
+ Algorithms" by Thomas H. Cormen, Charles E. Leiserson, and Ronald
+ L. Rivest. The B-link tree, in which the sibling nodes at a
+ particular level in the tree are stored in a doubly-linked list,
+ is described in the "Efficient Locking for Concurrent Operations
+ on B-trees" paper by Phillip Lehman and S. Bing Yao as published
+ in the ACM Transactions on Database Systems, Vol. 6,
+ No. 4, December 1981.
+
+ The B-link trees implemented by the file format contain one more
+ key than the number of children. In other words, each child
+ pointer out of a B-tree node has a left key and a right key.
+ The pointers out of internal nodes point to sub-trees while
+ the pointers out of leaf nodes point to other file data types.
+	Except for that difference, internal nodes and leaf nodes
+	have an identical format.
+
+
+
+ A symbol table is a group internal to the file that allows
+ arbitrary nesting of objects (including other symbol
+	tables). A symbol table maps a set of names to a set of file
+	addresses relative to the file boot block. Certain meta data
+ for an object to which the symbol table points can be cached
+ in the symbol table in addition to (or in place of?) the
+ object header.
+
+ An HDF5 object name space can be stored hierarchically by
+ partitioning the name into components and storing each
+ component in a symbol table. The symbol table entry for a
+ non-ultimate component points to the symbol table containing
+ the next component. The symbol table entry for the last
+ component points to the object being named.
+
+ A symbol table is a collection of symbol table nodes pointed
+ to by a B-link tree. Each symbol table node contains entries
+ for one or more symbols. If an attempt is made to add a
+ symbol to an already full symbol table node containing
+ 2K entries, then the node is split and one node
+ contains K symbols and the other contains
+ K+1 symbols.
+
+
+
+ Each symbol table entry in a symbol table node is designed to allow
+ for very fast browsing of commonly stored scientific objects.
+ Toward that design goal, the format of the symbol-table entries
+ includes space for caching certain constant meta data from the
+ object header.
+
+
+
+ The symbol table entry scratch-pad space is formatted
+ according to the value of the Symbol Type field. If the
+ Symbol Type field has the value zero then no information is
+ stored in the scratch pad space.
+
+ If the Symbol Type field is one, then the scratch pad space
+ contains cached meta data for another symbol table with the format:
+
+
+
+
+
+ A heap is a collection of small heap objects. Objects can be
+ inserted and removed from the heap at any time and the address
+ of a heap doesn't change once the heap is created. Note: this
+ is the "local" version of the heap mostly intended for the
+ storage of names in a symbol table. The storage of small
+ objects in a global heap is described below.
+
+
+
+ Objects within the heap should be aligned on an 8-byte boundary.
+
+ Each HDF5 file has a global heap which stores various types of
+ information which is typically shared between datasets. The
+ global heap was designed to satisfy these goals:
+
+ The implementation of the heap makes use of the memory
+ management already available at the file level and combines that
+ with a new top-level object called a collection to
+ achieve Goal B. The global heap is the set of all collections.
+ Each global heap object belongs to exactly one collection and
+ each collection contains one or more global heap objects. For
+ the purposes of disk I/O and caching, a collection is treated as
+ an atomic object.
+
+
+
+
+
+ The Free-Space Index is a collection of blocks of data,
+ dispersed throughout the file, which are currently not used by
+ any file objects. The blocks of data are indexed by a B-tree of
+ their length within the file.
+
+ Each B-Tree page is composed of the following entries and
+ B-tree management information, organized as follows:
+
+
+
+ The algorithms for searching and inserting objects in the
+ B-tree pages are described fully in the Lehman & Yao paper,
+ which should be read to provide a full description of the
+ B-Tree's usage.
+
+ Data objects contain the real information in the file. These
+ objects compose the scientific data and other information which
+ are generally thought of as "data" by the end-user. All the
+ other information in the file is provided as a framework for
+ these data objects.
+
+ A data object is composed of header information and data
+ information. The header information contains the information
+ needed to interpret the data information for the data object as
+ well as additional "meta-data" or pointers to additional
+ "meta-data" used to describe or annotate each data object.
+
+	The header information of an object is designed to encompass
+	all the information one would want to know about an object,
+	except for the data itself. This information includes
+ the dimensionality, number-type, information about how the data
+ is stored on disk (in external files, compressed, broken up in
+ blocks, etc.), as well as other information used by the library
+ to speed up access to the data objects or maintain a file's
+ integrity. The header of each object is not necessarily located
+ immediately prior to the object's data in the file and in fact
+ may be located in any position in the file.
+
+
+
+ The header message types and the message data associated with
+ them compose the critical "meta-data" about each object. Some
+ header messages are required for each object while others are
+ optional. Some optional header messages may also be repeated
+ several times in the header itself, the requirements and number
+ of times allowed in the header will be noted in each header
+ message description below.
+
+ The following is a list of currently defined header messages:
+
+ The Simple Dimensionality message describes the number
+ of dimensions and size of each dimension that the data object
+	has. This message is only used for datasets which have a
+	simple, rectilinear grid layout; datasets requiring a more
+	complex layout (irregular or unstructured grids, etc.) must use
+	the Data-Space message for expressing the space the
+ dataset inhabits.
+
+
+
+
+
+ The following grid combinations are currently allowed:
+
+
+
+
+
+ The data type message defines the data type for each data point
+ of a dataset. A data type can describe an atomic type like a
+ fixed- or floating-point type or a compound type like a C
+ struct. A data type does not, however, describe how data points
+ are combined to produce a dataset. Data types are stored on disk
+ as a data type message, which is a list of data type classes and
+ their associated properties.
+
+
+ The Class Bit Field and Properties fields vary depending
+ on the Type Class. The type class is one of: 0 (fixed-point
+ number), 1 (floating-point number), 2 (date and time), 3 (text
+ string), 4 (bit field), 5 (opaque), 6 (compound). The Class Bit
+ Field is zero and the size of the Properties field is zero
+ except for the cases noted here.
+
+
+
+
+
+
+ The Properties field of a compound data type is a list of the
+ member definitions of the compound data type. The member
+ definitions appear one after another with no intervening bytes.
+ The member types are described with a recursive data type
+ message.
+
+
+ Data type examples are here.
+
+
+	This message indicates that the data for the data object is
+	stored within the current HDF5 file by including the actual
+	data within the header data for this message. The data is
+	stored internally in
+	the "normal" format, i.e. in one chunk, uncompressed, etc.
+
+ Note that one and only one of the "Data Storage" headers can be
+ stored for each data object.
+
+ Format of Data: The message data is actually composed
+ of dataset data, so the format will be determined by the dataset
+ format.
+
+ Purpose and Description: The external object message
+ indicates that the data for an object is stored outside the HDF5
+	file. The filename of the object is stored as a Uniform
+	Resource Locator (URL) of the actual filename containing the
+ data. An external file list record also contains the byte offset
+ of the start of the data within the file and the amount of space
+ reserved in the file for that data.
+
+
+
+
+
+ Purpose and Description: Data layout describes how the
+ elements of a multi-dimensional array are arranged in the linear
+ address space of the file. Two types of data layout are
+ supported:
+
+
+
+ Purpose and Description: Compressed objects are
+	datasets which are stored in an HDF5 file after they have been
+ compressed. The encoding algorithm and its parameters are
+ stored in a Compression Message in the object header of the
+ dataset.
+
+
+
+ Sometimes additional redundancy can be added to the data before
+ it's compressed to result in a better compression ratio. The
+ library doesn't specifically support modeling methods to add
+ redundancy, but the effect can be achieved through the use of
+ user-defined data types.
+
+ The library uses the following compression methods.
+ The compression is applied independently to each chunk of
+ storage (after data space and data type conversions). If the
+ compression is unable to make the chunk smaller than it would
+ normally be, the chunk is stored without compression. At the
+ library's discretion, chunks which fail the compression can also
+ be stored in their raw format.
+
+
+
+ Purpose and Description: The Attribute List
+	message is used to list objects in the HDF5 file which are used
+	as attributes, or "meta-data" about the current object. Other
+	objects can be used as attributes for either the entire object
+	or portions of the current object. The attribute list is
+	composed of two lists of objects, the first being simple
+	attributes about the entire dataset, and the second being
+	pointers to attribute objects about the entire dataset. Partial
+	dataset pointers are currently unspecified and
+	unimplemented.
+
+ Format of Data:
+
+
+
+ [Note: It has been suggested that each attribute have an
+ additional "units" field, so this is being considered.]
+
+ A constant message can be shared among several object headers
+ by writing that message in the global heap and having the object
+ headers all point to it. The pointing is accomplished with a
+ Shared Object message which is understood directly by the object
+ header layer of the library and never actually appears as a
+ message in the file. It is also possible to have a message of
+ one object header point to a message in some other object
+ header, but care must be exercised to prevent cycles.
+
+ If a message is shared, then the message appears in the global
+ heap and its message ID appears in the Header Message Type
+ field of the object header. Also, the Flags field in the object
+ header for that message will have bit two set (the
+
+
+
+ The object header continuation is formatted as follows (assuming a 4-byte
+length & offset are being used in the current file):
+
+
+
+ The symbol table message is formatted as follows:
+
+
+
+ In order to share header messages between several dataset objects, object
+header messages may be placed into the global small-data heap. Since these
+messages require additional information beyond the basic object header message
+information, the format of the shared message is detailed below.
+
+
+ The data information for an object is stored separately from the header
+information in the file and may not actually be located in the HDF5 file
+itself if the header indicates that the data is stored externally. The
+information for each record in the object is stored according to the
+dimensionality of the object (indicated in the dimensionality header message).
+Multi-dimensional data is stored in C order [same as current scheme], i.e. the
+"last" dimension changes fastest.
+ Data whose elements are composed of simple number-types are stored in
+native-endian IEEE format, unless they are specifically defined as being stored
+in a different machine format with the architecture-type information from the
+number-type header message. This means that each architecture will need to
+[potentially] byte-swap data values into the internal representation for that
+particular machine.
+    Data with a "variable" sized number-type is stored in a data heap
+internal to the HDF5 file [which should not be user-modifiable].
+ Data whose elements are composed of pointer number-types are stored in several
+different ways depending on the particular pointer type involved. Simple
+pointers are just stored as the dataset offset of the object being pointed to with the
+size of the pointer being the same number of bytes as offsets in the file.
+Partial-object pointers are stored as a heap-ID which points to the following
+information within the file-heap: an offset of the object pointed to, number-type
+information (same format as header message), dimensionality information (same
+format as header message), sub-set start and end information (i.e. a coordinate
+location for each), and field start and end names (i.e. a [pointer to the]
+string indicating the first field included and a [pointer to the] string name
+for the last field).
+Browse pointers are stored as a heap-ID (for the name in the file-heap)
+followed by an offset of the data object being referenced.
+ Data of a compound data-type is stored as a contiguous stream of the items
+in the structure, with each item formatted according to its
+data-type.
+
+ Introduction to HDF5 1.0 Alpha1.0
+
+ This is a brief introduction to the HDF5 data model and
+ programming model. It is not a full user's guide, but should
+ provide enough information for you to understand how HDF5 is
+ meant to work. Knowledge of the current version of HDF should
+ make it easier to follow the text, but it is not required. For
+ further information on the topics covered here, see the HDF5
+ documentation at
+
+ HDF5 is a new, experimental version of HDF that is designed to
+ address some of the limitations of the current version of HDF
+ (HDF4.1) and to address current and anticipated requirements of
+ modern systems and applications. This HDF5 prototype is not
+ complete, but it should be sufficient to show the basic features
+ of HDF5. We urge you to look at it and give us feedback on what
+ you like or don't like about it, and what features you would
+ like to see added to it.
+
+ Why HDF5? The development of HDF5 is motivated by a number of
+ limitations in the current HDF format, as well as limitations in
+ the library. Some of these limitations are:
+
+ When complete, HDF5 will include the following improvements.
+
+ The prototype release includes most of the basic functionality
+ that is planned for the HDF5 library. However, the library does
+ not implement all of the features detailed in the format and API
+ specifications. Here is a listing of some of the limitations of
+ the current release:
+
+ See the API Specification at
+
+ Attributes
+
+ 1. Introduction
+
+ 2. Creating, Opening, Closing and Deleting Attributes
+
+ H5Acreate()
function,
+ and existing attributes can be accessed with either the
+ H5Aopen_name()
or H5Aopen_idx()
functions. All
+      three functions return an object ID which should eventually be
+      released by calling H5Aclose()
.
+
+
+
+
+ hid_t H5Acreate (hid_t loc_id, const char
+ *name, hid_t type_id, hid_t space_id,
+ hid_t create_plist_id)
+
+ hid_t H5Aopen_name (hid_t loc_id, const char
+ *name)
+
+ hid_t H5Aopen_idx (hid_t loc_id, unsigned
+ idx)
+
+ herr_t H5Aclose (hid_t attr_id)
+
+ herr_t H5Adelete (hid_t loc_id,
+ const char *name)
+ 3. Attribute I/O Functions
+
+
+
+
+ herr_t H5Awrite (hid_t attr_id,
+ hid_t mem_type_id, void *buf)
+
+ herr_t H5Aread (hid_t attr_id,
+ hid_t mem_type_id, void *buf)
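+
+      A minimal sketch using the prototypes above (the dataset
+      handle, attribute name, and value are illustrative):
+
+int     wrote = 42, got;
+hsize_t dim = 1;
+hid_t   space, attr;
+
+space = H5Screate_simple (1, &dim, &dim);
+attr  = H5Acreate (dataset, "answer", H5T_NATIVE_INT, space, H5P_DEFAULT);
+
+H5Awrite (attr, H5T_NATIVE_INT, &wrote);  /* whole-attribute write */
+H5Aread  (attr, H5T_NATIVE_INT, &got);    /* whole-attribute read  */
+
+H5Aclose (attr);
+H5Sclose (space);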
+ 4. Attribute Inquiry Functions
+
+
+
+
+ int H5Aiterate (hid_t loc_id,
+ unsigned *attr_number,
+      H5A_operator_t operator,
+ void *operator_data)
+
+ typedef herr_t (*H5A_operator_t)(hid_t loc_id,
+ const char *attr_name, void *operator_data);
+
+
+
+ hid_t H5Aget_space (hid_t attr_id)
+
+ hid_t H5Aget_type (hid_t attr_id)
+
+ size_t H5Aget_name (hid_t attr_id,
+ char *buf, size_t buf_size)
+
+ int H5Anum_attrs (hid_t loc_id)
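+
+      A sketch of an operator function and its use with
+      H5Aiterate (the names are illustrative; returning zero asks
+      the iterator to continue with the next attribute):
+
+herr_t
+print_attr (hid_t loc_id, const char *attr_name, void *operator_data)
+{
+    int *count = (int*)operator_data;
+
+    printf ("attribute %d: %s\n", (*count)++, attr_name);
+    return 0;    /* continue iterating */
+}
+
+unsigned idx = 0;   /* start with the first attribute */
+int      count = 0;
+H5Aiterate (loc_id, &idx, print_attr, &count);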
+
+ HDF Support
+
+
diff --git a/doc/html/Big.html b/doc/html/Big.html
new file mode 100644
index 0000000..080f786
--- /dev/null
+++ b/doc/html/Big.html
@@ -0,0 +1,111 @@
+
+
+
+ Big Datasets on Small Machines
+
+ 1. Introduction
+
+ sizeof(off_t)
+ and sizeof(size_t)
are both four bytes can handle
+ datasets and files as large as 18x10^18 bytes. However, most
+ Unix systems limit the number of concurrently open files, so a
+ practical file size limit is closer to 512GB or 1TB.
+
+ off_t
file size limit and the second circumvents
+ the size_t
main memory limit.
+
+ 2. File Size Limits
+
+
+
+
+checking for lseek64... yes
+checking for fseek64... yes
+
printf
-style integer format. For instance:
+
+
+
+
+hid_t plist, file;
+plist = H5Pcreate (H5P_FILE_ACCESS);
+H5Pset_family (plist, 1<<30, H5P_DEFAULT);
+file = H5Fcreate ("big%03d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist);
+
30
) to
+ H5Pset_family()
indicates that the family members
+      are to be 2^30 bytes (1GB) each. In general, family members
+      cannot be 2GB or larger because writes to byte number
+      2,147,483,647 will fail, so the largest safe size for a family
+      member is 2,147,483,647 bytes. HDF5 will create family members on demand as the
+ HDF5 address space increases, but since most Unix systems limit
+ the number of concurrently open files the effective maximum size
+ of the HDF5 address space will be limited.
+
+
+
+
+hid_t plist = H5Pcreate (H5P_DATASET_CREATE);
+char name[64];    /* buffer for the generated segment file names */
+int i;
+for (i=0; i<5*1024; i++) {
+    sprintf (name, "velocity-%04d.raw", i);
+    H5Pset_external (plist, name, 0, (hsize_t)1<<30);
+}
+
3. Dataset Size Limits
+
+ sizeof(size_t)
. HDF5 defines a new data type
+ called hsize_t
which is used for sizes of datasets
+ and is, by default, defined as unsigned long long
.
+
+ size_t
, and a
+ 1-dimensional dataset whose dimension size is too large to fit
+ in a size_t
.
+
+
+
+
+hsize_t size1[4] = {8, 1024, 1024, 1024};
+hid_t space1 = H5Screate_simple (4, size1, size1);
+
+hsize_t size2[1] = {8589934592LL};
+hid_t space2 = H5Screate_simple (1, size2, size2);
+
LL
suffix is not portable, so it may
+ be better to replace the number with
+ (hsize_t)8*1024*1024*1024
.
+
+      long long,
large
+ datasets will not be possible. The library performs too much
+ arithmetic on hsize_t
types to make the use of a
+ struct feasible.
+
+
+ Robb Matzke
+
+
+Last modified: Wed May 13 12:36:47 EDT 1998
+
+
+
diff --git a/doc/html/Caching.html b/doc/html/Caching.html
new file mode 100644
index 0000000..4e5a6ac
--- /dev/null
+++ b/doc/html/Caching.html
@@ -0,0 +1,82 @@
+
+
+
+ Meta Data Caching
+
+ Raw Data Chunk Caching
+
+
+
+
+ The API
+
+
+
+
+ herr_t H5Pset_cache(hid_t plist, unsigned int
+ mdc_nelmts, size_t rdcc_nbytes, double
+ w0)
+      herr_t H5Pget_cache(hid_t plist, unsigned int
+      *mdc_nelmts, size_t *rdcc_nbytes, double
+      *w0)
+ H5Pget_cache()
any (or all) of
+ the pointer arguments may be null pointers.
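+
+      For example (the values shown are illustrative, not
+      recommendations), an application might enlarge both caches
+      before opening a file:
+
+hid_t plist = H5Pcreate (H5P_FILE_ACCESS);
+hid_t file;
+
+/* 2000 meta data cache entries, a 4MB raw data chunk cache, and a
+ * preemption value of 0.75. */
+H5Pset_cache (plist, 2000, 4*1024*1024, 0.75);
+file = H5Fopen ("example.h5", H5F_ACC_RDONLY, plist);
+H5Pclose (plist);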
+
+ Robb Matzke
+
+
+Last modified: Tue May 26 15:38:27 EDT 1998
+
+
+
diff --git a/doc/html/CodeReview.html b/doc/html/CodeReview.html
new file mode 100644
index 0000000..213cbbe
--- /dev/null
+++ b/doc/html/CodeReview.html
@@ -0,0 +1,300 @@
+
+
+
+ Code Review 1
Some background...
+ H5B.c
file that implements a B-link-tree class
+ without worrying about concurrency yet (thus the `Note:' in the
+ function prologue). The H5B.c
file provides the
+ basic machinery for operating on generic B-trees, but it isn't
+ much use by itself. Various subclasses of the B-tree (like
+ symbol tables or indirect storage) provide their own interface
+ and back end to this function. For instance,
+ H5G_stab_find()
takes a symbol table OID and a name
+ and calls H5B_find()
with an appropriate
+ udata
argument that eventually gets passed to the
+ H5G_stab_find()
function.
+
+
+
+
+ 1 /*-------------------------------------------------------------------------
+ 2 * Function: H5B_find
+ 3 *
+ 4 * Purpose: Locate the specified information in a B-tree and return
+ 5 * that information by filling in fields of the caller-supplied
+ 6 * UDATA pointer depending on the type of leaf node
+ 7 * requested. The UDATA can point to additional data passed
+ 8 * to the key comparison function.
+ 9 *
+10 * Note: This function does not follow the left/right sibling
+11 * pointers since it assumes that all nodes can be reached
+12 * from the parent node.
+13 *
+14 * Return: Success: SUCCEED if found, values returned through the
+15 * UDATA argument.
+16 *
+17 * Failure: FAIL if not found, UDATA is undefined.
+18 *
+19 * Programmer: Robb Matzke
+20 * matzke@llnl.gov
+21 * Jun 23 1997
+22 *
+23 * Modifications:
+24 *
+25 *-------------------------------------------------------------------------
+26 */
+27 herr_t
+28 H5B_find (H5F_t *f, const H5B_class_t *type, const haddr_t *addr, void *udata)
+29 {
+30 H5B_t *bt=NULL;
+31 intn idx=-1, lt=0, rt, cmp=1;
+32 int ret_value = FAIL;
+
+
+
+33
+34 FUNC_ENTER (H5B_find, NULL, FAIL);
+35
+36 /*
+37 * Check arguments.
+38 */
+39 assert (f);
+40 assert (type);
+41 assert (type->decode);
+42 assert (type->cmp3);
+43 assert (type->found);
+44 assert (addr && H5F_addr_defined (addr));
+
assert
to check invariant conditions. At
+ this level of the library, none of these assertions should fail
+ unless something is majorly wrong. The arguments should have
+ already been checked by higher layers. It also provides
+ documentation about what arguments might be optional.
+
+
+
+
+45
+46 /*
+47 * Perform a binary search to locate the child which contains
+48 * the thing for which we're searching.
+49 */
+50 if (NULL==(bt=H5AC_protect (f, H5AC_BT, addr, type, udata))) {
+51 HGOTO_ERROR (H5E_BTREE, H5E_CANTLOAD, FAIL);
+52 }
+
H5AC.c
file. The
+ H5AC_protect
 ensures that the B-tree node (which
+ inherits from the H5AC package) whose OID is addr
+ is locked into memory for the duration of this function (see the
+ H5AC_unprotect
on line 90). Most likely, if this
+      node has been accessed in the not-too-distant past, it will still
+ be in memory and the H5AC_protect
is almost a
+ no-op. If cache debugging is compiled in, then the protect also
+ prevents other parts of the library from accessing the node
+ while this function is protecting it, so this function can allow
+ the node to be in an inconsistent state while calling other
+ parts of the library.
+
+ H5AC_find
and assume that the pointer it returns is
+ valid only until some other library function is called, but
+ since we're accessing the pointer throughout this function, I
+ chose to use the simpler protect scheme. All protected objects
+ must be unprotected before the file is closed, thus the
+ use of HGOTO_ERROR
instead of
+ HRETURN_ERROR
.
+
+
+
+
+53 rt = bt->nchildren;
+54
+55 while (lt<rt && cmp) {
+56 idx = (lt + rt) / 2;
+57 if (H5B_decode_keys (f, bt, idx)<0) {
+58 HGOTO_ERROR (H5E_BTREE, H5E_CANTDECODE, FAIL);
+59 }
+60
+61 /* compare */
+62 if ((cmp=(type->cmp3)(f, bt->key[idx].nkey, udata,
+63 bt->key[idx+1].nkey))<0) {
+64 rt = idx;
+65 } else {
+66 lt = idx+1;
+67 }
+68 }
+69 if (cmp) {
+70 HGOTO_ERROR (H5E_BTREE, H5E_NOTFOUND, FAIL);
+71 }
+
(type->cmp3)()
is an indirect
+ function call into the subclass of the B-tree. All indirect
+ function calls have the function part in parentheses to document
+ that it's indirect (quite obvious here, but not so obvious when
+ the function is a variable).
+
+      Note also the deliberate use of = instead of == in the
+      condition above; the extra parentheses around the assignment
+      document that the assignment is intentional.
+
+
+
+
+72
+73 /*
+74 * Follow the link to the subtree or to the data node.
+75 */
+76              assert (idx>=0 && idx<bt->nchildren);
+      ... the <0
+      gets lost at the end. Another thing to note is
+ that success/failure is always determined by comparing with zero
+ instead of SUCCEED
or FAIL
. I do this
+      because occasionally one might want to return other meaningful
+ values (always non-negative) or distinguish between various types of
+ failure (always negative).
+
+
+
+
+88
+89 done:
+90 if (bt && H5AC_unprotect (f, H5AC_BT, addr, bt)<0) {
+91 HRETURN_ERROR (H5E_BTREE, H5E_PROTECT, FAIL);
+92 }
+93 FUNC_LEAVE (ret_value);
+94 }
+
HRETURN_ERROR
macro even though it
+ will make the error stack not quite right. I also use short
+ circuiting boolean operators instead of nested if
+ statements since that's standard C practice.
+
+ Code Review 2
+
+
+ 1 /*--------------------------------------------------------------------------
+ 2 NAME
+ 3 H5Fflush
+ 4
+ 5 PURPOSE
+ 6 Flush all cached data to disk and optionally invalidates all cached
+ 7 data.
+ 8
+ 9 USAGE
+10 herr_t H5Fflush(fid, invalidate)
+11 hid_t fid; IN: File ID of file to close.
+12 hbool_t invalidate; IN: Invalidate all of the cache?
+13
+14 ERRORS
+15 ARGS BADTYPE Not a file atom.
+16 ATOM BADATOM Can't get file struct.
+17 CACHE CANTFLUSH Flush failed.
+18
+19 RETURNS
+20 SUCCEED/FAIL
+21
+22 DESCRIPTION
+23 This function flushes all cached data to disk and, if INVALIDATE
+24 is non-zero, removes cached objects from the cache so they must be
+25 re-read from the file on the next access to the object.
+26
+27 MODIFICATIONS:
+28 --------------------------------------------------------------------------*/
+
+
+
+29 herr_t
+30 H5Fflush (hid_t fid, hbool_t invalidate)
+31 {
+32 H5F_t *file = NULL;
+33
+34 FUNC_ENTER (H5Fflush, H5F_init_interface, FAIL);
+35 H5ECLEAR;
+
+
+
+36
+37 /* check arguments */
+38 if (H5_FILE!=H5Aatom_group (fid)) {
+39 HRETURN_ERROR (H5E_ARGS, H5E_BADTYPE, FAIL); /*not a file atom*/
+40 }
+41 if (NULL==(file=H5Aatom_object (fid))) {
+42 HRETURN_ERROR (H5E_ATOM, H5E_BADATOM, FAIL); /*can't get file struct*/
+43 }
+
 Note that this API-level function does not use assert to check
 arguments at this level; bad arguments from an application are
 expected, so errors are returned through the error stack instead.
+ We also convert atoms to pointers since atoms are really just a
+ pointer-hiding mechanism. Functions that can be called
+ internally always have pointer arguments instead of atoms
+ because (1) then they don't have to always convert atoms to
+ pointers, and (2) the various pointer data types provide more
+ documentation and type checking than just an hid_t
+ type.
+
+
+
+
+44
+45 /* do work */
+46 if (H5F_flush (file, invalidate)<0) {
+47 HRETURN_ERROR (H5E_CACHE, H5E_CANTFLUSH, FAIL); /*flush failed*/
+48 }
+
 The internal H5F_flush() function, by contrast, uses assert
 to check/document
+      its arguments and can be called from other library functions.
+
+
+
+
+49
+50 FUNC_LEAVE (SUCCEED);
+51 }
+
+ Robb Matzke
+
+
+Last modified: Mon Nov 10 15:33:33 EST 1997
+
+
+
diff --git a/doc/html/Coding.html b/doc/html/Coding.html
new file mode 100644
index 0000000..dbf55bf
--- /dev/null
+++ b/doc/html/Coding.html
@@ -0,0 +1,300 @@
+
+
+
+
+
+
+
+
+
+ FILES
+ These appear only in one header file anyway.
+
+ Always start with `HDF5_HAVE_' like HDF5_HAVE_STDARG_H for a
+ header file, or HDF5_HAVE_DEV_T for a data type, or
+ HDF5_HAVE_DIV for a function.
+
+
Compression
+
+ 1. Introduction
+
+ H5D_CHUNKED
storage
+ layout. The library doesn't support compression for contiguous
+ datasets because of the difficulty of implementing random access
+ for partial I/O, and compact dataset compression is not
+ supported because it wouldn't produce significant results.
+
+ 2. Supported Compression Methods
+
+
+
+
+
+
+ Method Name
+ Description
+
+
+
+
+ H5Z_NONE
The default is to not use compression. Specifying
+
+ H5Z_NONE
as the compression method results
+	    in better performance than writing a function that just
+ copies data because the library's I/O pipeline
+ recognizes this method and is able to short circuit
+ parts of the pipeline.
+
+
+
+ H5Z_DEFLATE
The deflate method is the algorithm used by
+ the GNU
+ gzip
program. It's a combination of
+ a Huffman encoding followed by a 1977 Lempel-Ziv (LZ77)
+ dictionary encoding. The aggressiveness of the
+ compression can be controlled by passing an integer value
+ to the compressor with H5Pset_deflate()
+ (see below). In order for this compression method to be
+ used, the HDF5 library must be configured and compiled
+ in the presence of the GNU zlib version 1.1.2 or
+ later.
+
+
+
+ H5Z_RES_N
These compression methods (where N is in the
+ range two through 15, inclusive) are reserved by NCSA
+ for future use.
+
+
+ Values of N between 16 and 255, inclusive
+ These values can be used to represent application-defined
+ compression methods. We recommend that methods under
+ testing should be in the high range and when a method is
+ about to be published it should be given a number near
+ the low end of the range (or even below 16). Publishing
+ the compression method and its numeric ID will make a
+ file sharable.
+      If the library is asked to uncompress data with a method it
+      does not support, the read fails (H5Dread() will return a
+      negative value). The errors will be displayed in the
+ compression statistics if the library was compiled with
+ debugging turned on for the "z" package. See the
+ section on diagnostics below for more details.
+
+ 3. Application-Defined Methods
+
+      An application-defined method is supplied as a pair of
+      functions, one for compression and one for uncompression,
+      called method_c and method_u
+      respectively. One should not use
+ the names compress
or uncompress
since
+ they are likely to conflict with other compression libraries
+ (like the GNU zlib).
+
+ method_c
and
+ method_u
functions take the same arguments
+ and return the same values. They are defined with the type:
+
+
+
+
+ typedef size_t (*H5Z_func_t)(unsigned int
+ flags, size_t cd_size, const void
+ *client_data, size_t src_nbytes, const
+ void *src, size_t dst_nbytes, void
+ *dst/*out*/)
+ H5Zregister()
. This
+ function can also be used to remove a compression method from
+ the library by supplying null pointers for the functions.
+
+
+
+
+ herr_t H5Zregister (H5Z_method_t method,
+ const char *name, H5Z_func_t method_c,
+ H5Z_func_t method_u)
+
+
+ Example: Registering an
+ Application-Defined Compression Method
+
+
+
+	  This example defines a method that is functionally
+	  equivalent to the H5Z_NONE method but
+	  slower. Compression and uncompression are performed
+ by the same function.
+
+
+
+
+size_t
+bogus (unsigned int flags,
+ size_t cd_size, const void *client_data,
+ size_t src_nbytes, const void *src,
+ size_t dst_nbytes, void *dst/*out*/)
+{
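+    /* Copy the input to the output unchanged; this assumes the
+     * caller supplied a destination at least src_nbytes long. */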
+ memcpy (dst, src, src_nbytes);
+ return src_nbytes;
+}
+
+
+
+#define H5Z_BOGUS 250
+H5Zregister (H5Z_BOGUS, "bogus", bogus, bogus);
+
+
+
+H5Zregister (H5Z_BOGUS, "bogus", NULL, NULL);
+
4. Enabling Compression for a Dataset
+
+ H5Pcreate()
).
+
+
+
+
+ herr_t H5Pset_deflate (hid_t plist, int
+ level)
+ H5Z_DEFLATE
and the
+ aggression level is set to level. The level
+ must be a value between one and nine, inclusive, where one
+ indicates no (but fast) compression and nine is aggressive
+ compression.
+
+
+ int H5Pget_deflate (hid_t plist)
+ H5Z_DEFLATE
compression then this function
+ will return the aggression level, an integer between one and
+ nine inclusive. If plist isn't a valid dataset
+ creation property list or it isn't set to use the deflate
+ method then a negative value is returned.
+
+
+ herr_t H5Pset_compression (hid_t plist,
+ H5Z_method_t method, unsigned int flags,
+ size_t cd_size, const void *client_data)
+ H5Pset_deflate()
. The dataset creation property
+ list plist is adjusted to use the specified
+ compression method. The flags is an 8-bit vector
+ which is stored in the file as part of the compression message
+ and passed to the compress and uncompress functions. The
+ client_data is a byte array of length
+ cd_size which is copied to the file and passed to the
+ compress and uncompress methods.
+
+
+ H5Z_method_t H5Pget_compression (hid_t plist,
+ unsigned int *flags, size_t *cd_size, void
+ *client_data)
+ H5Pget_deflate()
. The
+ compression method (or a negative value on error) is returned
+      by value, and compression flags and client data are returned by
+ argument. The application should allocate the
+ client_data and pass its size as the
+ cd_size. On return, cd_size will contain
+ the actual size of the client data. If client_data
+ is not large enough to hold the entire client data then
+ cd_size bytes are copied into client_data
+ and cd_size is set to the total size of the client
+ data, a value larger than the original.
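+
+      A sketch of setting and then querying these properties for
+      the application-defined "bogus" method registered above (the
+      values are illustrative):
+
+unsigned char  cd = 0x01;             /* one byte of client data */
+unsigned char  cd_out[8];
+size_t         cd_size = sizeof(cd_out);
+unsigned int   flags;
+H5Z_method_t   method;
+hid_t          plist = H5Pcreate (H5P_DATASET_CREATE);
+
+H5Pset_compression (plist, H5Z_BOGUS, 0, sizeof(cd), &cd);
+method = H5Pget_compression (plist, &flags, &cd_size, cd_out);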
+      It is possible to set a compression method that has not been
+      registered with H5Zregister() and which isn't
+ supported as a predefined method (for instance, setting the
+ method to H5Z_DEFLATE
when the GNU zlib isn't
+ available). If that happens then data will be written to the
+ file in its uncompressed form and the compression statistics
+ will show failures for the compression.
+
+
+
+ Example: Statistics for an
+ Unsupported Compression Method
+
+
+
+
+
+
+H5Z: compression statistics accumulated over life of library:
+ Method Total Overrun Errors User System Elapsed Bandwidth
+ ------ ----- ------- ------ ---- ------ ------- ---------
+ deflate-c 160000 0 160000 0.00 0.01 0.01 1.884e+07
+ deflate-u 0 0 0 0.00 0.00 0.00 NaN
+
 These statistics were accumulated after using H5Z_DEFLATE
on a system that didn't have
+ the GNU zlib to write to a dataset and then read the
+ result. The read and write both succeeded but the
+ data was not compressed.
+ 5. Compression Diagnostics
+
+ configure --enable-debug=z
)
+ then statistics about data compression are printed when the
+ application exits normally or the library is closed. The
+ statistics are written to the standard error stream and include
+ two lines for each compression method that was used: the first
+ line shows compression statistics while the second shows
+ uncompression statistics. The following fields are displayed:
+
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Method
+ This is the name of the method as defined with
+
+ H5Zregister()
with the letters
+ "-c" or "-u" appended to indicate
+ compression or uncompression.
+
+
+ Total
+ The total number of bytes compressed or decompressed
+ including buffer overruns and errors. Bytes of
+ non-compressed data are counted.
+
+
+
+ Overrun
+ During compression, if the algorithm causes the result
+ to be at least as large as the input then a buffer
+ overrun error occurs. This field shows the total number
+ of bytes from the Total column which can be attributed to
+ overruns. Overruns for decompression can only happen if
+ the data has been corrupted in some way and will result
+ in failure of
+ H5Dread()
.
+
+
+ Errors
+ If an error occurs during compression the data is
+	    stored in its uncompressed form, and an error during
+ uncompression causes
+ H5Dread()
to return
+ failure. This field shows the number of bytes of the
+ Total column which can be attributed to errors.
+
+
+ User, System, Elapsed
+ These are the amount of user time, system time, and
+ elapsed time in seconds spent by the library to perform
+ compression. Elapsed time is sensitive to system
+ load. These times may be zero on operating systems that
+ don't support the required operations.
+
+
+ Bandwidth
+ This is the compression bandwidth which is the total
+ number of bytes divided by elapsed time. Since elapsed
+ time is subject to system load the bandwidth numbers
+ cannot always be trusted. Furthermore, the bandwidth
+	    includes overrun and error bytes which may significantly
+ taint the value.
+
+
+ Example: Compression
+ Statistics
+
+
+
+
+
+H5Z: compression statistics accumulated over life of library:
+ Method Total Overrun Errors User System Elapsed Bandwidth
+ ------ ----- ------- ------ ---- ------ ------- ---------
+ deflate-c 160000 200 0 0.62 0.74 1.33 1.204e+05
+ deflate-u 120000 0 0 0.11 0.00 0.12 9.885e+05
+
+ Robb Matzke
+
+
+Last modified: Fri Apr 17 16:15:21 EDT 1998
+
+
+
diff --git a/doc/html/Datasets.html b/doc/html/Datasets.html
new file mode 100644
index 0000000..e0f9680
--- /dev/null
+++ b/doc/html/Datasets.html
@@ -0,0 +1,839 @@
+
+
+
+ The Dataset Interface (H5D)
+
+ 1. Introduction
+
+
+
+
+ H5T
API) is
+ used to manipulate both pieces of meta data but they're handled
+ by the dataset API (the H5D
API) in different
+ manners.
+
+
+
+ 2. Storage Layout Properties
+
+      These properties are set in a dataset creation property list
+      (use H5Pcreate() to get a copy of the default property
+      list) by modifying properties with various
+ H5Pset_property()
functions.
+
+
+
+
+ herr_t H5Pset_layout (hid_t plist_id,
+ H5D_layout_t layout)
+
+
+
+ H5D_COMPACT
+
+ H5D_CONTIGUOUS
+
+ H5D_CHUNKED
+ H5Pset_chunk()
.
+
+
+ H5D_CHUNKED
layout,
+ which needs to know the dimensionality and chunk size.
+
+
+
+
+ herr_t H5Pset_chunk (hid_t plist_id, int
+ ndims, hsize_t dim[])
+ H5D_CHUNKED
and the chunk size is set to
+ dim. The number of elements in the dim array
+ is the dimensionality, ndims. One need not call
+      H5Pset_layout()
when using this function since
+ the chunked layout is implied.
+
+
+ Example: Chunked Storage
+
+
+
+
+
+hsize_t size[2] = {1000, 1000};
+plist = H5Pcreate (H5P_DATASET_CREATE);
+H5Pset_chunk (plist, 2, size);
+
3. Compression Properties
+
+
+
+
+ herr_t H5Pset_compression (hid_t plist_id,
+ H5Z_method_t method)
+ H5Z_method_t H5Pget_compression (hid_t
+ plist_id)
+
+
+
+
+ H5Z_NONE
+
+ H5Z_DEFLATE
+ gzip
program.
+
+ herr_t H5Pset_deflate (hid_t plist_id,
+ int level)
+ int H5Pget_deflate (hid_t plist_id)
+ H5Pset_deflate()
sets the compression method to
+ H5Z_DEFLATE
and sets the compression level to
+ some integer between one and nine (inclusive). One results in
+ the fastest compression while nine results in the best
+ compression ratio. The default value is six if
+ H5Pset_deflate()
isn't called. The
+ H5Pget_deflate()
returns the compression level
+ for the deflate method, or negative if the method is not the
+ deflate method.
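+
+      For instance (a sketch; the chunk size and level are
+      illustrative), compression is enabled by combining a chunked
+      layout and a deflate level in the same creation property list:
+
+hsize_t chunk[2] = {100, 100};
+hid_t   plist = H5Pcreate (H5P_DATASET_CREATE);
+
+H5Pset_chunk (plist, 2, chunk);   /* compression requires chunked layout */
+H5Pset_deflate (plist, 6);        /* default-strength deflate */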
+ 4. External Storage Properties
+
+ H5D_CONTIGUOUS
storage
+      format allows external storage. A set of segments (offsets and sizes) in
+ one or more files is defined as an external file list, or EFL,
+ and the contiguous logical addresses of the data storage are mapped onto
+ these segments.
+
+
+
+
+ herr_t H5Pset_external (hid_t plist, const
+ char *name, off_t offset, hsize_t
+ size)
+      The size may be given as the constant H5F_UNLIMITED.
+
+
+ int H5Pget_external_count (hid_t plist)
+
+ herr_t H5Pget_external (hid_t plist, int
+ idx, size_t name_size, char *name, off_t
+ *offset, hsize_t *size)
+ H5Pset_external()
+ function. Given a dataset creation property list and a zero-based
+ index into that list, the file name, byte offset, and segment size are
+ returned through non-null arguments. At most name_size
+ characters are copied into the name argument which is not
+ null terminated if the file name is longer than the supplied name
+ buffer (this is similar to strncpy()
).
+
+
+ Example: Multiple Segments
+
+
+
+
+
+
+plist = H5Pcreate (H5P_DATASET_CREATE);
+H5Pset_external (plist, "velocity.data", 3000, 1000);
+H5Pset_external (plist, "velocity.data", 0, 2500);
+H5Pset_external (plist, "velocity.data", 4500, 1500);
+
+
+ Example: Multi-Dimensional
+
+
+
+
+
+
+plist = H5Pcreate (H5P_DATASET_CREATE);
+H5Pset_external (plist, "scan1.data", 0, 24);
+H5Pset_external (plist, "scan2.data", 0, 24);
+H5Pset_external (plist, "scan3.data", 0, 16);
+
5. Data Type
+
+ H5T
API.
+
+ 6. Data Space
+
+ H5S
API. The simple dataspace consists of
+ maximum dimension sizes and actual dimension sizes, which are
+ usually the same. However, maximum dimension sizes can be the
+ constant H5D_UNLIMITED
in which case the actual
+ dimension size can be incremented with calls to
+ H5Dextend()
. The maximium dimension sizes are
+ constant meta data while the actual dimension sizes are
+ persistent meta data. Initial actual dimension sizes are
+ supplied at the same time as the maximum dimension sizes when
+ the dataset is created.
+
+ 7. Setting Constant or Persistent Properties
+
+ H5Dcreate()
.
+
+
+
+
+
+
+ hid_t H5Dcreate (hid_t file_id, const char
+ *name, hid_t type_id, hid_t
+ space_id, hid_t create_plist_id)
+ H5Dcreate
with
+ a file identifier, a dataset name, a data type, a data space,
+ and constant properties. The data type and data space are the
+ type and space of the dataset as it will exist in the file,
+ which may be different than in application memory. The
+ create_plist_id is a H5P_DATASET_CREATE
+ property list created with H5Pcreate()
and
+ initialized with the various functions described above.
+ H5Dcreate()
returns a dataset handle for success
+ or negative for failure. The handle should eventually be
+ closed by calling H5Dclose()
to release resources
+ it uses.
+
+
+ hid_t H5Dopen (hid_t file_id, const char
+ *name)
+ H5Dclose()
to
+ release resources it uses.
+
+
+ herr_t H5Dclose (hid_t dataset_id)
+
+ herr_t H5Dextend (hid_t dataset_id,
+ hsize_t dim[])
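+
+      A sketch of creating an extendible dataset and growing it
+      (the file handle, names, and sizes are illustrative;
+      H5D_UNLIMITED is the constant named in section 6 above):
+
+hsize_t cur = 100, max = H5D_UNLIMITED, newsize = 200;
+hsize_t chunk = 50;
+hid_t   space, plist, dataset;
+
+space = H5Screate_simple (1, &cur, &max);
+plist = H5Pcreate (H5P_DATASET_CREATE);
+H5Pset_chunk (plist, 1, &chunk);  /* extendible data is stored chunked */
+
+dataset = H5Dcreate (file, "grows", H5T_NATIVE_INT, space, plist);
+H5Dextend (dataset, &newsize);    /* grow from 100 to 200 elements */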
+ 8. Querying Constant or Persistent Properties
+
+
+
+
+
+
+ hid_t H5Dget_type (hid_t dataset_id)
+ hid_t H5Dget_space (hid_t dataset_id)
+ H5Dextend()
.
+
+ hid_t H5Dget_create_plist (hid_t
+ dataset_id)
+ 9. Setting Memory and Transfer Properties
+
+ H5Dread()
and H5Dwrite()
functions
+ (these functions are described below).
+
+
+
+
+ herr_t H5Pset_buffer (hid_t xfer_plist,
+ size_t max_buf_size, void *tconv_buf, void
+ *bkg_buf)
+ size_t H5Pget_buffer (hid_t xfer_plist, void
+ **tconv_buf, void **bkg_buf)
+ H5Pget_buffer()
function returns the maximum
+ buffer size or zero on error.
+      An application can call H5Pset_buffer()
to set the size of the
+ temporary buffer so it's large enough to hold the entire
+ request.
+
+
+
+ Example
+
+
+
+ H5Dread()
or H5Dwrite()
.
+
+
+ 1 hid_t
+ 2 disable_strip_mining (hid_t xfer_plist, hid_t dataset,
+ 3 hid_t space, hid_t mem_type)
+ 4 {
+ 5 hid_t file_type; /* File data type */
+ 6 size_t type_size; /* Sizeof larger type */
+ 7 size_t size; /* Temp buffer size */
+ 8
+ 9
+10 file_type = H5Dget_type (dataset);
+11 type_size = MAX(H5Tget_size(file_type), H5Tget_size(mem_type));
+12 H5Tclose (file_type);
+13 size = H5Sget_npoints(space) * type_size;
+14 if (xfer_plist<0) xfer_plist = H5Pcreate (H5P_DATASET_XFER);
+15 H5Pset_buffer(xfer_plist, size, NULL, NULL);
+16 return xfer_plist;
+17 }
+
10. Querying Memory or Transfer Properties
+
+ H5Pget_property()
to query transfer
+      properties from a template).
+
+
+ 11. Raw Data I/O
+
+
+
+
+
+ herr_t H5Dread (hid_t dataset_id, hid_t
+ mem_type_id, hid_t mem_space_id, hid_t
+ file_space_id, hid_t xfer_plist_id,
+ void *buf/*out*/)
+
+ herr_t H5Dwrite (hid_t dataset_id, hid_t
+ mem_type_id, hid_t mem_space_id, hid_t
+ file_space_id, hid_t xfer_plist_id,
+ const void *buf)
+ H5Dget_type()
; the library will not implicitly
+ derive memory data types from constant data types.
+
+ H5S_ALL
as the argument for the file data space.
+ If H5S_ALL
is also supplied as the memory data
+ space then no data space conversion is performed. This is a
+ somewhat dangerous situation since the file data space might be
+ different than what the application expects.
+
+
+
+ 12. Examples
+
+ double
+ values but is stored in the file in Cray float
+ format using LZ77 compression. The dataset is written to the
+ HDF5 file and then read back as a two-dimensional array of
+ float
values.
+
+
+
+ Example 1
+
+
+
+
+
+ 1 hid_t file, data_space, dataset, properties;
+ 2 double dd[500][600];
+ 3 float ff[500][600];
+ 4 hsize_t dims[2], chunk_size[2];
+ 5
+ 6 /* Describe the size of the array */
+ 7 dims[0] = 500;
+ 8 dims[1] = 600;
+ 9 data_space = H5Screate_simple (2, dims, NULL);
+10
+11
+12 /*
+13  * Create a new file with read/write access,
+14 * default file creation properties, and default file
+15 * access properties.
+16 */
+17 file = H5Fcreate ("test.h5", H5F_ACC_RDWR, H5P_DEFAULT,
+18 H5P_DEFAULT);
+19
+20 /*
+21 * Set the dataset creation plist to specify that
+22 * the raw data is to be partitioned into 100x100 element
+23 * chunks and that each chunk is to be compressed with
+24 * LZ77.
+25 */
+26 chunk_size[0] = chunk_size[1] = 100;
+27 properties = H5Pcreate (H5P_DATASET_CREATE);
+28 H5Pset_chunk (properties, 2, chunk_size);
+29 H5Pset_compression (properties, H5D_COMPRESS_LZ77);
+30
+31 /*
+32 * Create a new dataset within the file. The data type
+33 * and data space describe the data on disk, which may
+34 * be different than the format used in the application's
+35 * memory.
+36 */
+37 dataset = H5Dcreate (file, "dataset", H5T_CRAY_FLOAT,
+38 data_space, properties);
+39
+40 /*
+41 * Write the array to the file. The data type and data
+42 * space describe the format of the data in the `dd'
+43 * buffer. The raw data is translated to the format
+44 * required on disk defined above. We use default raw
+45 * data transfer properties.
+46 */
+47 H5Dwrite (dataset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
+48 H5P_DEFAULT, dd);
+49
+50 /*
+51 * Read the array as floats. This is similar to writing
+52 * data except the data flows in the opposite direction.
+53 */
+54 H5Dread (dataset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
+55 H5P_DEFAULT, ff);
+56
+64 H5Dclose (dataset);
+65 H5Sclose (data_space);
+66 H5Pclose (properties);
+67 H5Fclose (file);
+
+
+ Example 2
+
+
+
+
+
+ 1 hid_t file, mem_space, file_space, dataset;
+ 2 double dd[200][400];
+ 3 hssize_t offset[2];
+ 4 hsize_t size[2];
+ 5
+ 6 /*
+ 7 * Open an existing file and its dataset.
+ 8 */
+ 9 file = H5Fopen ("test.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
+10 dataset = H5Dopen (file, "dataset");
+11
+12 /*
+13 * Describe the file data space.
+14 */
+15 offset[0] = 200; /*offset of hyperslab in file*/
+16 offset[1] = 200;
+17 size[0] = 100; /*size of hyperslab*/
+18 size[1] = 200;
+19 file_space = H5Dget_space (dataset);
+20 H5Sset_hyperslab (file_space, 2, offset, size);
+21
+22 /*
+23 * Describe the memory data space.
+24 */
+25 size[0] = 200; /*size of memory array*/
+26 size[1] = 400;
+27 mem_space = H5Screate_simple (2, size, NULL);
+28
+29 offset[0] = 0; /*offset of hyperslab in memory*/
+30 offset[1] = 0;
+31 size[0] = 100; /*size of hyperslab*/
+32 size[1] = 200;
+33 H5Sset_hyperslab (mem_space, 2, offset, size);
+34
+35 /*
+36 * Read the dataset.
+37 */
+38 H5Dread (dataset, H5T_NATIVE_DOUBLE, mem_space,
+39 file_space, H5P_DEFAULT, dd);
+40
+41 /*
+42 * Close/release resources.
+43 */
+44 H5Dclose (dataset);
+45 H5Sclose (mem_space);
+46 H5Sclose (file_space);
+47 H5Fclose (file);
+
+
+ Example 3
+
+
+
+
+
+ 1 hid_t file, dataset, type;
+ 2 double delta[200];
+ 3
+ 4 /*
+ 5 * Open an existing file and its dataset.
+ 6 */
+ 7 file = H5Fopen ("test.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
+ 8 dataset = H5Dopen (file, "dataset");
+ 9
+10 /*
+11 * Describe the memory data type, a struct with a single
+12 * "delta" member.
+13 */
+14 type = H5Tcreate (H5T_COMPOUND, sizeof(double));
+15 H5Tinsert (type, "delta", 0, H5T_NATIVE_DOUBLE);
+16
+17 /*
+18 * Read the dataset.
+19 */
+20 H5Dread (dataset, type, H5S_ALL, H5S_ALL,
+21 H5P_DEFAULT, dd);
+22
+23 /*
+24 * Close/release resources.
+25 */
+26 H5Dclose (dataset);
+27 H5Tclose (type);
+28 H5Fclose (file);
+
+ Robb Matzke
+
+
+Last modified: Wed May 13 18:57:47 EDT 1998
+
+
+
diff --git a/doc/html/Dataspaces.html b/doc/html/Dataspaces.html
new file mode 100644
index 0000000..d2579b6
--- /dev/null
+++ b/doc/html/Dataspaces.html
@@ -0,0 +1,568 @@
+
+
+
+
+The Dataspace Interface (H5S)
+
+
+1. Introduction
+The dataspace interface (H5S) provides a mechanism to describe the positions
+of the elements of a dataset and is designed in such a way as to allow
+new features to be easily added without disrupting applications that use
+the dataspace interface. A dataset (defined with the dataset interface) is
+composed of a collection of raw data points of homogeneous type, defined in the
+datatype (H5T) interface, organized according to a dataspace defined with this
+interface.
+
+
+
+
+ 0 1 2 3 4 5 6 7 8 9
+
+ 0
+ - - - - - - - - - -
+
+ 1
+ - X X X - - - - - -
+
+ 2
+ - X X X - - - - - -
+
+ 3
+ - X X X - - - - - -
+
+ 4
+ - X X X - - - - - -
+
+ 5
+ - X X X - - - - - -
+
+ 6
+ - - - - - - - - - -
+
+ 7
+ - - - - - - - - - -
+
+ 8
+ - - - - - - - - - -
+
+ 9
+ - - - - - - - - - -
+
Example 1: Contiguous rectangular selection
+
Or, a more complex selection may be defined:
+
+
+
+ 0 1 2 3 4 5 6 7 8 9
+
+ 0
+ - - - - - - - - - -
+
+ 1
+ - X X X - - X - - -
+
+ 2
+ - X - X - - - - - -
+
+ 3
+ - X - X - - X - - -
+
+ 4
+ - X - X - - - - - -
+
+ 5
+ - X X X - - X - - -
+
+ 6
+ - - - - - - - - - -
+
+ 7
+ - - X X X X - - - -
+
+ 8
+ - - - - - - - - - -
+
+ 9
+ - - - - - - - - - -
+
Example 2: Non-contiguous selection
+
+
+
+ 0 1 2 3 4 5 6 7 8 9
+
+ 0
+ - - - - - - - - - -
+
+ 1
+ - - - - - - - - - -
+
+ 2
+ - - X X X - - X - -
+
+ 3
+ - - X - X - - - - -
+
+ 4
+ - - X - X - - X - -
+
+ 5
+ - - X - X - - - - -
+
+ 6
+ - - X X X - - X - -
+
+ 7
+ - - - - - - - - - -
+
+ 8
+ - - - X X X X - - -
+
+ 9
+ - - - - - - - - - -
+
Example 3: Non-contiguous selection with 1,1 offset
+ 2. General Dataspace Operations
+The functions defined in this section operate on dataspaces as a whole.
+New dataspaces can be created from scratch or copied from existing data
+spaces. When a dataspace is no longer needed its resources should be released
+by calling H5Sclose().
+
+
+
+
+
+ 3. Dataspace Extent Operations
+These functions operate on the extent portion of a dataspace.
+
+
+
+
+ 4. Dataspace Selection Operations
+Selections are maintained separately from extents in dataspaces and operations
+on the selection of a dataspace do not affect the extent of the dataspace.
+Selections are independent of extent type and the boundaries of selections are
+reconciled with the extent at the time of the data transfer. Selection offsets
+apply a selection to a location within an extent, allowing the same selection
+to be moved within the extent without requiring a new selection to be specified.
+Offsets default to 0 when the dataspace is created. Offsets are applied when
+an I/O transfer is performed (and checked during calls to H5Sselect_valid).
+Selections have an iteration order for the points selected, which can be any
+permutation of the dimensions involved (defaulting to 'C' array order) or a
+specific order for the selected points, for selections composed of single array
+elements with H5Sselect_elements. Selections can also be copied or combined
+together in various ways with H5Sselect_op. Further methods of selecting
+portions of a dataspace may be added in the future.
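+
+As a sketch (using the H5Sset_hyperslab() call shown in the dataset
+examples elsewhere in this documentation; the offsets and sizes are
+illustrative), the rectangular region of Example 1 above could be
+selected with:
+
+hsize_t  dims[2]   = {10, 10};
+hssize_t offset[2] = {1, 1};    /* block starts at row 1, column 1 */
+hsize_t  size[2]   = {5, 3};    /* 5 rows by 3 columns             */
+hid_t    space = H5Screate_simple (2, dims, dims);
+
+H5Sset_hyperslab (space, 2, offset, size);
+/* ...use the selection in an H5Dread() or H5Dwrite() call... */
+H5Sclose (space);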
+
+
+
+
+
+
+
+ 5. Misc. Dataspace Operations
+
+
+
+
+
+
+
+Robb Matzke
+
+
+Quincey Koziol
+
+
Last
+modified: Thu May 28 15:12:04 EST 1998
+
+
diff --git a/doc/html/Datatypes.html b/doc/html/Datatypes.html
new file mode 100644
index 0000000..75bc57e
--- /dev/null
+++ b/doc/html/Datatypes.html
@@ -0,0 +1,1370 @@
+
+
+
+ The Data Type Interface (H5T)
+
+ 1. Introduction
+
+ 2. General Data Type Operations
+
+ H5Tclose()
.
+
+ H5T_NATIVE_INT
are immutable
+ transient types).
+
+
+
+
+ hid_t H5Tcreate (H5T_class_t class, size_t
+ size)
+ H5T_COMPOUND
to create a new empty compound data
+ type where size is the total size in bytes of an
+ instance of this data type. Other data types are created with
+ H5Tcopy()
. All functions that return data type
+ identifiers return a negative value for failure.
+
+
+ hid_t H5Topen (hid_t location, const char
+ *name)
+ H5Tclose()
to
+ release resources. The named data type returned by this
+ function is read-only or a negative value is returned for
+ failure. The location is either a file or group
+ handle.
+
+
+ herr_t H5Tcommit (hid_t location, const char
+ *name, hid_t type)
+
+ hbool_t H5Tcommitted (hid_t type)
+ H5Dget_type()
are able to share
+ the data type with other datasets in the same file.
+
+
+ hid_t H5Tcopy (hid_t type)
+
+ herr_t H5Tclose (hid_t type)
+
+ hbool_t H5Tequal (hid_t type1, hid_t
+ type2)
+ TRUE
, otherwise it returns FALSE
(an
+ error results in a negative return value).
+
+
+ herr_t H5Tlock (hid_t type)
+ H5close()
or by normal program termination).
+ 3. Properties of Atomic Types
+
+
+
+
+ H5T_class_t H5Tget_class (hid_t type)
+ H5T_INTEGER, H5T_FLOAT, H5T_TIME, H5T_STRING,
+ H5T_BITFIELD
, or H5T_OPAQUE
. This
+ property is read-only and is set when the datatype is
+ created or copied (see H5Tcreate()
,
+ H5Tcopy()
). If this function fails it returns
+ H5T_NO_CLASS
which has a negative value (all
+ other class constants are non-negative).
+
+
+ size_t H5Tget_size (hid_t type)
+ herr_t H5Tset_size (hid_t type, size_t
+ size)
+ offset
property is
+ decremented a bit at a time. If the offset reaches zero and
+ the significant part of the data still extends beyond the edge
+ of the data type then the precision
property is
+ decremented a bit at a time. Decreasing the size of a data
+      type may fail if the precision must be decremented and the
+ data type is of the H5T_OPAQUE
class or the
+ H5T_FLOAT
bit fields would extend beyond the
+ significant part of the type. Increasing the size of an
+ H5T_STRING
automatically increases the precision
+ as well. On error, H5Tget_size()
returns zero
+ which is never a valid size.
+
+
+ H5T_order_t H5Tget_order (hid_t type)
+ herr_t H5Tset_order (hid_t type, H5T_order_t
+ order)
+ H5T_ORDER_LE
. If the bytes are in the opposite
+ order then they are said to be big-endian or
+ H5T_ORDER_BE
. Some data types have the same byte
+ order on all machines and are H5T_ORDER_NONE
+ (like character strings). If H5Tget_order()
+ fails then it returns H5T_ORDER_ERROR
which is a
+ negative value (all successful return values are
+ non-negative).
+
+
+ size_t H5Tget_precision (hid_t type)
+ herr_t H5Tset_precision (hid_t type, size_t
+ precision)
+ short
on a Cray
+ is 32 significant bits in an eight-byte field. The
+ precision
property identifies the number of
+ significant bits of a datatype and the offset
+ property (defined below) identifies its location. The
+ size
property defined above represents the entire
+ size (in bytes) of the data type. If the precision is
+ decreased then padding bits are inserted on the MSB side of
+ the significant bits (this will fail for
+ H5T_FLOAT
types if it results in the sign,
+ mantissa, or exponent bit field extending beyond the edge of
+ the significant bit field). On the other hand, if the
+ precision is increased so that it "hangs over" the edge of the
+ total size then the offset
property is
+ decremented a bit at a time. If the offset
+ reaches zero and the significant bits still hang over the
+ edge, then the total size is increased a byte at a time. The
+ precision of an H5T_STRING
is read-only and is
+ always eight times the value returned by
+ H5Tget_size()
. H5Tget_precision()
+ returns zero on failure since zero is never a valid precision.
+
+
+ size_t H5Tget_offset (hid_t type)
+ herr_t H5Tset_offset (hid_t type, size_t
+ offset)
+ precision
property defines the number
+ of significant bits, the offset
property defines
+ the location of those bits within the entire datum. The bits
+ of the entire data are numbered beginning at zero at the least
+ significant bit of the least significant byte (the byte at the
+ lowest memory address for a little-endian type or the byte at
+ the highest address for a big-endian type). The
+ offset
property defines the bit location of the
+      least significant bit of a bit field whose length is
+ precision
. If the offset is increased so the
+ significant bits "hang over" the edge of the datum, then the
+ size
property is automatically incremented. The
+ offset is a read-only property of an H5T_STRING
+ and is always zero. H5Tget_offset()
returns zero
+ on failure which is also a valid offset, but is guaranteed to
+ succeed if a call to H5Tget_precision()
succeeds
+ with the same arguments.
+
+
+ herr_t H5Tget_pad (hid_t type, H5T_pad_t
+ *lsb, H5T_pad_t *msb)
+ herr_t H5Tset_pad (hid_t type, H5T_pad_t
+ lsb, H5T_pad_t msb)
+ precision
and offset
properties
+ are called padding. Padding falls into two
+ categories: padding in the low-numbered bits is lsb
+ padding and padding in the high-numbered bits is msb
+ padding (bits are numbered according to the description for
+ the offset
property). Padding bits can always be
+ set to zero (H5T_PAD_ZERO
) or always set to one
+ (H5T_PAD_ONE
). The current pad types are returned
+ through arguments of H5Tget_pad()
either of which
+ may be null pointers.
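+
+      A sketch combining the properties above to derive a 12-bit
+      little-endian integer type (the values are illustrative):
+
+hid_t type = H5Tcopy (H5T_NATIVE_INT);
+
+H5Tset_precision (type, 12);                   /* 12 significant bits */
+H5Tset_offset (type, 0);                       /* beginning at bit 0  */
+H5Tset_pad (type, H5T_PAD_ZERO, H5T_PAD_ZERO); /* zero-filled padding */
+H5Tset_order (type, H5T_ORDER_LE);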
+ 3.1. Properties of Integer Atomic Types
+
+ class=H5T_INTEGER
)
+ describe integer number formats. Such types include the
+ following information which describes the type completely and
+ allows conversion between various integer atomic types.
+
+
+
+
+ H5T_sign_t H5Tget_sign (hid_t type)
+ herr_t H5Tset_sign (hid_t type, H5T_sign_t
+ sign)
+ H5T_SGN_2
) or unsigned
+ (H5T_SGN_NONE
). Whether data is signed or not
+ becomes important when converting between two integer data
+ types of differing sizes as it determines how values are
+ truncated and sign extended.
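+
+      For example (a sketch), an unsigned variant of the native
+      integer type can be derived with:
+
+hid_t type = H5Tcopy (H5T_NATIVE_INT);
+H5Tset_sign (type, H5T_SGN_NONE);   /* unsigned interpretation */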
+ 3.2. Properties of Floating-point Atomic Types
+
+ class=H5T_FLOAT
) as long as the bits of the
+ exponent are contiguous and stored as a biased positive number,
+ the bits of the mantissa are contiguous and stored as a positive
+ magnitude, and a sign bit exists which is set for negative
+ values. Properties specific to floating-point types are:
+
+
+
+
+ herr_t H5Tget_fields (hid_t type, size_t
+ *spos, size_t *epos, size_t
+ *esize, size_t *mpos, size_t
+ *msize)
+ herr_t H5Tset_fields (hid_t type, size_t
+ spos, size_t epos, size_t esize,
+ size_t mpos, size_t msize)
+ precision
and offset
+ properties). The sign bit is always of length one and none of
+ the fields are allowed to overlap. When expanding a
+ floating-point type one should set the precision first; when
+ decreasing the size one should set the field positions and
+ sizes first.
+
+
+ size_t H5Tget_ebias (hid_t type)
+ herr_t H5Tset_ebias (hid_t type, size_t
+ ebias)
+      The exponent is stored as a value ebias
larger than the true exponent.
+ H5Tget_ebias()
returns zero on failure which is
+ also a valid exponent bias, but the function is guaranteed to
+ succeed if H5Tget_precision()
succeeds when
+ called with the same arguments.
+
+
+ H5T_norm_t H5Tget_norm (hid_t type)
+ herr_t H5Tset_norm (hid_t type, H5T_norm_t
+ norm)
+
+
+
+ H5T_NORM_MSBSET
then the
+ mantissa is shifted left (if non-zero) until the first bit
+ after the radix point is set and the exponent is adjusted
+ accordingly. All bits of the mantissa after the radix
+ point are stored.
+
+ H5T_NORM_IMPLIED
then the
+ mantissa is shifted left (if non-zero) until the first bit
+ after the radix point is set and the exponent is adjusted
+ accordingly. The first bit after the radix point is not stored
+ since it's always set.
+
+ H5T_NORM_NONE
then the fractional
+ part of the mantissa is stored without normalizing it.
+
+ H5T_pad_t H5Tget_inpad (hid_t type)
+ herr_t H5Tset_inpad (hid_t type, H5T_pad_t
+ inpad)
+ H5T_PAD_ZERO
if the internal
+ padding should always be set to zero, or H5T_PAD_ONE
+ if it should always be set to one.
+ H5Tget_inpad()
returns H5T_PAD_ERROR
+ on failure which is a negative value (successful return is
+ always non-negative).
+ 3.3. Properties of Date and Time Atomic Types
+
+ class=H5T_TIME
) are stored as
+ character strings in one of the ISO-8601 formats like
+ "1997-12-05 16:25:30"; as character strings using the
+ Unix asctime(3) format like "Thu Dec 05 16:25:30 1997";
+ as an integer value by juxtaposition of the year, month, and
+ day-of-month, hour, minute and second in decimal like
+ 19971205162530; as an integer value in Unix time(2)
+ format; or other variations.
+
+ 3.4. Properties of Character String Atomic Types
+
+ offset
property of a string is
+ always zero and the precision
property is eight
+ times as large as the value returned by
+ H5Tget_size()
(since precision is measured in bits
+ while size is measured in bytes). Both properties are
+ read-only.
+
+
+
+
+ H5T_cset_t H5Tget_cset (hid_t type)
+ herr_t H5Tset_cset (hid_t type, H5T_cset_t
+ cset)
+ H5T_CSET_ASCII
.
+
+
+ H5T_str_t H5Tget_strpad (hid_t type)
+ herr_t H5Tset_strpad (hid_t type, H5T_str_t
+ strpad)
+ H5T_STR_NULL
for C-style strings or
+ H5T_STR_SPACE
for Fortran-style
+ strings. H5Tget_strpad()
returns
+ H5T_STR_ERROR
on failure, a negative value (all
+ successful return values are non-negative).
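+
+      For example, a 20-character Fortran-style string type can be
+      derived from the C string type as follows (a minimal sketch):
+
+hid_t fstr20 = H5Tcopy (H5T_C_S1);
+H5Tset_size (fstr20, 20);
+H5Tset_strpad (fstr20, H5T_STR_SPACE);
+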
+ 3.5. Properties of Bit Field Atomic Types
+
+ class=H5T_BITFIELD
) from
+ one type to another simply copies the significant bits. If the
+ destination is smaller than the source then bits are truncated.
+ Otherwise new bits are filled according to the msb
+ padding type.
+
+ 3.6. Properties of Opaque Atomic Types
+
+ class=H5T_OPAQUE
) act like
+ bit fields except conversions which change the precision are not
+ allowed. However, padding can be added or removed from either
+ end and the bytes can be reordered. Opaque types can be used to
+ create novel data types not directly supported by the library,
+ but the application is responsible for data conversion of these
+ types.
+
+ 4. Properties of Compound Types
+
+ struct
in C
+ or a common block in Fortran: it is a collection of one or more
+ atomic types or small arrays of such types. Each
+ member of a compound type has a name which is unique
+ within that type, and a byte offset that determines the first
+ byte (smallest byte address) of that member in a compound datum.
+ A compound data type has the following properties:
+
+
+
+
+ H5T_class_t H5Tget_class (hid_t type)
+ H5T_COMPOUND
. This property is read-only and is
+ defined when a data type is created or copied (see
+ H5Tcreate()
or H5Tcopy()
).
+
+
+ size_t H5Tget_size (hid_t type)
+
+ int H5Tget_nmembers (hid_t type)
+ H5Tget_nmembers()
returns -1 on failure.
+
+
+ char *H5Tget_member_name (hid_t type, int
+ membno)
+ malloc()
or the null pointer on failure. The
+ caller is responsible for freeing the memory returned by this
+ function.
+
+
+ size_t H5Tget_member_offset (hid_t type, int
+ membno)
+ H5Tget_member_dims()
+ succeeds when called with the same type and
+ membno arguments.
+
+
+ int H5Tget_member_dims (hid_t type, int
+ membno, int dims[4], int
+ perm[4])
+
+ hid_t H5Tget_member_type (hid_t type, int
+ membno)
+ H5Tclose()
on that type.
+ H5Tinsert()
) and cannot be subsequently modified.
+ This makes it impossible to define recursive data structures.
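+
+      As a sketch, the member query functions above can be combined to
+      print a summary of a compound type (type is assumed to be a
+      valid compound data type ID):
+
+int i, nmembs = H5Tget_nmembers (type);
+for (i=0; i<nmembs; i++) {
+    char   *name   = H5Tget_member_name (type, i);
+    size_t  offset = H5Tget_member_offset (type, i);
+    hid_t   membt  = H5Tget_member_type (type, i);
+    printf ("member %d: %s at byte %lu\n", i, name, (unsigned long)offset);
+    H5Tclose (membt);
+    free (name);
+}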
+
+ 5. Predefined Atomic Data Types
+
+ H5T_arch_base
where
+ arch is an architecture name and base is a
+ programming type name. New types can be derived from the
+ predefined types by copying the predefined type (see
+ H5Tcopy()
) and then modifying the result.
+
+
+
+
+
+
+ Architecture Name
+ Description
+
+
+
+
+ IEEE
This architecture defines standard floating point
+ types in various byte orders.
+
+
+
+
+ STD
This is an architecture that contains semi-standard
+ datatypes like signed two's complement integers,
+ unsigned integers, and bitfields in various byte
+ orders.
+
+
+
+
+ UNIX
Types which are specific to Unix operating systems are
+ defined in this architecture. The only types currently
+ defined are the Unix date and time types
+ (
+ time_t
).
+
+
+
+ C
FORTRAN
Types which are specific to the C or Fortran
+ programming languages are defined in these
+ architectures. For instance,
+ H5T_C_STRING
+ defines a base string type with null termination which
+ can be used to derive string types of other
+ lengths.
+
+
+
+ NATIVE
This architecture contains C-like data types for the
+ machine on which the library was compiled. The types
+ were actually defined by running the
+
+ H5detect
program when the library was
+ compiled. In order to be portable, applications should
+ almost always use this architecture to describe things
+ in memory.
+
+
+
+ CRAY
Cray architectures. These are word-addressable,
+ big-endian systems with non-IEEE floating point.
+
+
+
+
+ INTEL
All Intel and compatible CPU's including 80286, 80386,
+ 80486, Pentium, Pentium-Pro, and Pentium-II. These are
+ little-endian systems with IEEE floating-point.
+
+
+
+
+ MIPS
All MIPS CPU's commonly used in SGI systems. These
+ are big-endian systems with IEEE floating-point.
+
+
+
+ ALPHA
All DEC Alpha CPU's, little-endian systems with IEEE
+ floating-point.
+
+
+
+
+ B
+ Bitfield
+
+
+ D
+ Date and time
+
+
+ F
+ Floating point
+
+
+ I
+ Signed integer
+
+
+ S
+ Character string
+
+
+ U
+ Unsigned integer
+
+
+
+
+ BE
+ Big endian
+
+
+ LE
+ Little endian
+
+
+ VX
+ Vax order
+
+
+
+
+
+
+
Example
+
Description
+
+
+ H5T_IEEE_F64LE
Eight-byte, little-endian, IEEE floating-point
+
+
+
+ H5T_IEEE_F32BE
Four-byte, big-endian, IEEE floating point
+
+
+
+ H5T_STD_I32LE
Four-byte, little-endian, signed two's complement integer
+
+
+
+ H5T_STD_U16BE
Two-byte, big-endian, unsigned integer
+
+
+
+ H5T_UNIX_D32LE
Four-byte, little-endian, time_t
+
+
+
+ H5T_C_S1
One-byte, null-terminated string of eight-bit characters
+
+
+
+ H5T_INTEL_B64
Eight-byte bit field on an Intel CPU
+
+
+
+ H5T_CRAY_F64
Eight-byte Cray floating point
+ NATIVE
architecture has base names which don't
+ follow the same rules as the others. Instead, native type names
+ are similar to the C type names. Here are some examples:
+
+
+
+
+
+
+
Example
+
Corresponding C Type
+
+
+ H5T_NATIVE_CHAR
+ signed char
+
+
+ H5T_NATIVE_UCHAR
+ unsigned char
+
+
+ H5T_NATIVE_SHORT
+ short
+
+
+ H5T_NATIVE_USHORT
+ unsigned short
+
+
+ H5T_NATIVE_INT
+ int
+
+
+ H5T_NATIVE_UINT
+ unsigned
+
+
+ H5T_NATIVE_LONG
+ long
+
+
+ H5T_NATIVE_ULONG
+ unsigned long
+
+
+ H5T_NATIVE_LLONG
+ long long
+
+
+ H5T_NATIVE_ULLONG
+ unsigned long long
+
+
+ H5T_NATIVE_FLOAT
+ float
+
+
+ H5T_NATIVE_DOUBLE
+ double
+
+
+ H5T_NATIVE_LDOUBLE
+ long double
+
+ Example: A 128-bit
+ integer
+
+
+
+
+
+hid_t new_type = H5Tcopy (H5T_NATIVE_INT);
+H5Tset_precision (new_type, 128);
+H5Tset_order (new_type, H5T_ORDER_LE);
+
+
+ Example: An 80-character
+ string
+
+
+
+
+
+hid_t str80 = H5Tcopy (H5T_C_S1);
+H5Tset_size (str80, 80);
+
6. Defining Compound Data Types
+
+
+
+
+ HOFFSET(s,m)
+ offsetof(s,m)
+ stddef.h
does
+ exactly the same thing as the HOFFSET()
macro.
+
+
+ Example: A simple struct
+
+
+
+ complex_t
struct.
+
+
+
+typedef struct {
+ double re; /*real part*/
+ double im; /*imaginary part*/
+} complex_t;
+
+hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof(complex_t));
+H5Tinsert (complex_id, "real", HOFFSET(complex_t,re),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (complex_id, "imaginary", HOFFSET(complex_t,im),
+ H5T_NATIVE_DOUBLE);
+
HOFFSET
+ macro. However, data stored on disk does not require alignment,
+ so unaligned versions of compound data structures can be created
+ to improve space efficiency on disk. These unaligned compound
+ data types can be created by computing offsets by hand to
+ eliminate inter-member padding, or the members can be packed by
+ calling H5Tpack()
(which modifies a data type
+ directly, so it is usually preceded by a call to
+ H5Tcopy()
):
+
+
+
+ Example: A packed struct
+
+
+
+
+
+hid_t complex_disk_id = H5Tcopy (complex_id);
+H5Tpack (complex_disk_id);
+
+
+ Example: A flattened struct
+
+
+
+
+
+typedef struct {
+ complex_t x;
+ complex_t y;
+} surf_t;
+
+hid_t surf_id = H5Tcreate (H5T_COMPOUND, sizeof(surf_t));
+H5Tinsert (surf_id, "x-re", HOFFSET(surf_t,x.re),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (surf_id, "x-im", HOFFSET(surf_t,x.im),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (surf_id, "y-re", HOFFSET(surf_t,y.re),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (surf_id, "y-im", HOFFSET(surf_t,y.im),
+ H5T_NATIVE_DOUBLE);
+
+
+ Example: A nested struct
+
+
+
+ complex_t
is used
+ often it becomes inconvenient to list its members over
+ and over again. So the alternative approach to
+ flattening is to define a compound data type and then
+ use it as the type of the compound members, as is done
+ here (the typedefs are defined in the previous
+ examples).
+
+
+
+hid_t complex_id, surf_id; /*hdf5 data types*/
+
+complex_id = H5Tcreate (H5T_COMPOUND, sizeof(complex_t));
+H5Tinsert (complex_id, "re", HOFFSET(complex_t,re),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (complex_id, "im", HOFFSET(complex_t,im),
+ H5T_NATIVE_DOUBLE);
+
+surf_id = H5Tcreate (H5T_COMPOUND, sizeof(surf_t));
+H5Tinsert (surf_id, "x", HOFFSET(surf_t,x), complex_id);
+H5Tinsert (surf_id, "y", HOFFSET(surf_t,y), complex_id);
+
7. Sharing Data Types among Datasets
+
+
+
+ Example: Shared Types
+
+
+
+
+
+
+hid_t t1 = ...some transient type...;
+H5Tcommit (file, "shared_type", t1);
+hid_t dset1 = H5Dcreate (file, "dset1", t1, space, H5P_DEFAULT);
+hid_t dset2 = H5Dcreate (file, "dset2", t1, space, H5P_DEFAULT);
+
+
+hid_t dset1 = H5Dopen (file, "dset1");
+hid_t t2 = H5Dget_type (dset1);
+hid_t dset3 = H5Dcreate (file, "dset3", t2, space, H5P_DEFAULT);
+hid_t dset4 = H5Dcreate (file, "dset4", t2, space, H5P_DEFAULT);
+
8. Data Conversion
+
+ H5T_conv_t
+ which is defined as:
+
+
+
+
+typedef herr_t (*H5T_conv_t)(hid_t src_type,
+ hid_t dest_type,
+ H5T_cdata_t *cdata,
+ size_t nelmts,
+ void *buffer,
+ void *background);
+
command
field of the cdata argument
+ determines what happens within the conversion function. Its
+ values can be:
+
+
+
+
+
+ H5T_CONV_INIT
+ priv
field of cdata (or private data can
+ be initialized later). It should also initialize the
+ need_bkg
field described below. The buf
+ and background pointers will be null pointers.
+
+
+ H5T_CONV_CONV
+ priv
field of
+ cdata if it wasn't initialized during the
+ H5T_CONV_INIT
command and then convert
+ nelmts instances of the src_type to the
+ dst_type. The buffer serves as both input
+ and output. The background buffer is supplied
+ according to the value of the need_bkg
field of
+ cdata (the values are described below).
+
+
+ H5T_CONV_FREE
+ cdata->priv
pointer) should be freed and
+ set to null. All other pointer arguments are null and the
+ nelmts argument is zero.
+
+
+ cdata->need_bkg
+ which the conversion function should have initialized during the
+ H5T_CONV_INIT command. It can have one of these values:
+
+
+
+
+ H5T_BKG_NONE
+
+ H5T_BKG_TEMP
+
+ H5T_BKG_YES
+ H5T_CONV_FREE
+ command which is issued whenever the function is removed from a
+ conversion path.
+
+
+
+
+
+
+ hbool_t recalc
+
+ unsigned long ncalls
+ H5T_CONV_CONV
. It is updated automatically by
+ the library.
+
+
+ unsigned long nelmts
+
+
+
+ herr_t H5Tregister_hard (const char *name,
+ hid_t src_type, hid_t dest_type,
+ H5T_conv_t func)
+ H5Tregister_hard()
, displacing any previous hard
+ conversion for those paths. The name is used only
+ for debugging but must be supplied.
+
+
+ herr_t H5Tregister_soft (const char *name,
+ H5T_class_t src_class, H5T_class_t dest_class,
+ H5T_conv_t func)
+
+ herr_t H5Tunregister (H5T_conv_t func)
+ H5Tunregister()
. The
+ function is removed from all conversion paths.
+
+
+ Example: A conversion
+ function
+
+
+
+ unsigned short
to any other
+ 16-bit unsigned big-endian integer. A cray
+ short
is a big-endian value which has 32
+ bits of precision in the high-order bits of a 64-bit
+ word.
+
+
+
+
+typedef struct {
+    size_t dst_size;
+    int direction;
+} cray_ushort2be_t;
+
+herr_t
+cray_ushort2be (hid_t src, hid_t dst,
+                H5T_cdata_t *cdata,
+                size_t nelmts, void *buf,
+                void *background)
+{
+    unsigned char *s = (unsigned char *)buf;
+    unsigned char *d = s;
+    cray_ushort2be_t *priv = NULL;
+    size_t i, j;
+
+    switch (cdata->command) {
+    case H5T_CONV_INIT:
+        /*
+         * We are being queried to see if we handle this
+         * conversion.  We can handle conversion from
+         * Cray unsigned short to any other big-endian
+         * unsigned integer that doesn't have padding.
+         */
+        if (!H5Tequal (src, H5T_CRAY_USHORT) ||
+            H5T_ORDER_BE != H5Tget_order (dst) ||
+            H5T_SGN_NONE != H5Tget_sign (dst) ||
+            8*H5Tget_size (dst) != H5Tget_precision (dst)) {
+            return -1;
+        }
+
+        /*
+         * Initialize private data.  If the destination size
+         * is larger than the source size, then we must
+         * process the elements from right to left.
+         */
+        cdata->priv = priv = malloc (sizeof(cray_ushort2be_t));
+        priv->dst_size = H5Tget_size (dst);
+        if (priv->dst_size>8) {
+            priv->direction = -1;
+        } else {
+            priv->direction = 1;
+        }
+        break;
+
+    case H5T_CONV_FREE:
+        /*
+         * Free private data.
+         */
+        free (cdata->priv);
+        cdata->priv = NULL;
+        break;
+
+    case H5T_CONV_CONV:
+        /*
+         * Convert each element, watching out for overlap of
+         * source and destination on the left-most element
+         * of the buffer.
+         */
+        priv = (cray_ushort2be_t *)(cdata->priv);
+        if (priv->direction<0) {
+            s += (nelmts - 1) * 8;
+            d += (nelmts - 1) * priv->dst_size;
+        }
+        for (i=0; i<nelmts; i++) {
+            if (s==d && priv->dst_size<4) {
+                for (j=0; j<priv->dst_size; j++) {
+                    d[j] = s[j+4-priv->dst_size];
+                }
+            } else {
+                for (j=0; j<4 && j<priv->dst_size; j++) {
+                    d[priv->dst_size-(j+1)] = s[3-j];
+                }
+                for (j=4; j<priv->dst_size; j++) {
+                    d[priv->dst_size-(j+1)] = 0;
+                }
+            }
+            s += 8 * priv->direction;
+            d += priv->dst_size * priv->direction;
+        }
+        break;
+
+    default:
+        /*
+         * Unknown command.
+         */
+        return -1;
+    }
+    return 0;
+}
+
+
+ Example: Soft
+ Registration
+
+
+
+
+
+
+H5Tregister_soft ("cus2be", H5T_INTEGER, H5T_INTEGER, cray_ushort2be);
+
H5Tregister_hard()
) for each and every possible
+ conversion path whether that conversion path was actually used
+ or not.
+
+
+ Robb Matzke
+ Quincey Koziol
+
+
+Last modified: Thu Jun 18 13:59:12 EDT 1998
+
+
+
diff --git a/doc/html/Errors.html b/doc/html/Errors.html
new file mode 100644
index 0000000..4c3637d
--- /dev/null
+++ b/doc/html/Errors.html
@@ -0,0 +1,281 @@
+
+
+
+ The Error Handling Interface (H5E)
+
+ 1. Introduction
+
+
+
+ Example: An Error Message
+
+
+
+ H5Tclose
on a
+ predefined data type then the following message is
+ printed on the standard error stream. This is a
+ simple error that has only one component, the API
+ function; other errors may have many components.
+
+
+
+HDF5-DIAG: Error detected in thread 0. Back trace follows.
+ #000: H5T.c line 462 in H5Tclose(): predefined data type
+ major(01): Function argument
+ minor(05): Bad value
+
H5Eprint()
then the automatic printing should be
+ turned off to prevent error messages from being displayed twice
+ (see H5Eset_auto()
below).
+
+
+
+
+ herr_t H5Eprint (FILE *stream)
+ HDF5-DIAG: Error detected in thread 0.
+
+
+ herr_t H5Eclear (void)
+ H5Eprint()
).
+ H5Eset_auto()
function:
+
+
+
+
+ herr_t H5Eset_auto (herr_t(*func)(void*),
+ void *client_data)
+ H5Eprint()
(cast appropriately) and
+ client_data is the standard error stream pointer,
+ stderr
.
+
+
+ herr_t H5Eget_auto (herr_t(**func)(void*),
+ void **client_data)
+
+
+ Example: Error Control
+
+
+
+
+
+
+/* Save old error handler */
+herr_t (*old_func)(void*);
+void *old_client_data;
+H5Eget_auto(&old_func, &old_client_data);
+
+/* Turn off error handling */
+H5Eset_auto(NULL, NULL);
+
+/* Probe. Likely to fail, but that's okay */
+status = H5Fopen (......);
+
+/* Restore previous error handler */
+H5Eset_auto(old_func, old_client_data);
+
+
+/* Turn off error handling permanently */
+H5Eset_auto (NULL, NULL);
+
+/* If failure, print error message */
+if (H5Fopen (....)<0) {
+ H5Eprint (stderr);
+ exit (1);
+}
+
H5Eprint()
. For instance, one could define a
+ function that prints a simple, one-line error message to the
+ standard error stream and then exits.
+
+
+
+ Example: Simple Messages
+
+
+
+
+
+
+herr_t
+my_hdf5_error_handler (void *unused)
+{
+ fprintf (stderr, "An HDF5 error was detected. Bye.\n");
+ exit (1);
+}
+
+
+H5Eset_auto (my_hdf5_error_handler, NULL);
+
H5Eprint()
function is actually just a wrapper
+ around the more complex H5Ewalk()
function which
+ traverses an error stack and calls a user-defined function for
+ each member of the stack.
+
+
+
+
+ herr_t H5Ewalk (H5E_direction_t direction,
+ H5E_walk_t func, void *client_data)
+ H5E_WALK_UPWARD then traversal begins at the
+ inner-most function that detected the error and concludes with
+ the API function. The opposite order is
+
H5E_WALK_DOWNWARD
.
+
+
+ typedef herr_t (*H5E_walk_t)(int n,
+ H5E_error_t *eptr, void
+ *client_data)
+ H5Ewalk()
.
+
+
+
+ typedef struct {
+ H5E_major_t maj_num;
+ H5E_minor_t min_num;
+ const char *func_name;
+ const char *file_name;
+ unsigned line;
+ const char *desc;
+} H5E_error_t;
+ const char *H5Eget_major (H5E_major_t num)
+ const char *H5Eget_minor (H5E_minor_t num)
+
+
+ Example: H5Ewalk_cb
+
+
+
+
+
+herr_t
+H5Ewalk_cb(int n, H5E_error_t *err_desc, void *client_data)
+{
+ FILE *stream = (FILE *)client_data;
+ const char *maj_str = NULL;
+ const char *min_str = NULL;
+ const int indent = 2;
+
+ /* Check arguments */
+ assert (err_desc);
+    if (!stream) stream = stderr;
+
+ /* Get descriptions for the major and minor error numbers */
+ maj_str = H5Eget_major (err_desc->maj_num);
+ min_str = H5Eget_minor (err_desc->min_num);
+
+ /* Print error message */
+ fprintf (stream, "%*s#%03d: %s line %u in %s(): %s\n",
+ indent, "", n, err_desc->file_name, err_desc->line,
+ err_desc->func_name, err_desc->desc);
+ fprintf (stream, "%*smajor(%02d): %s\n",
+ indent*2, "", err_desc->maj_num, maj_str);
+ fprintf (stream, "%*sminor(%02d): %s\n",
+ indent*2, "", err_desc->min_num, min_str);
+
+ return 0;
+}
+
+ Robb Matzke
+
+
+Last modified: Wed Mar 4 10:06:17 EST 1998
+
+
+
diff --git a/doc/html/ExternalFiles.html b/doc/html/ExternalFiles.html
new file mode 100644
index 0000000..39ebd2b
--- /dev/null
+++ b/doc/html/ExternalFiles.html
@@ -0,0 +1,278 @@
+
+
+
+ External Files in HDF5
Overview of Layers
+
+
+
+
+
+ Layer-7
+ Groups
+ Datasets
+
+
+ Layer-6
+ Indirect Storage
+ Symbol Tables
+
+
+ Layer-5
+ B-trees
+ Object Hdrs
+ Heaps
+
+
+ Layer-4
+ Caching
+
+
+ Layer-3
+ H5F chunk I/O
+
+
+ Layer-2
+ H5F low
+
+
+ Layer-1
+ File Family
+ Split Meta/Raw
+
+
+ Layer-0
+ Section-2 I/O
+ Standard I/O
+ Malloc/Free
+ Single Address Space
+
+ H5Fcreate
and H5Fopen
functions
+ would need to be modified to pass file-type info down to layer 2
+ so the correct drivers can be called and parameters passed to
+ the drivers to initialize them.
+
+ Implementation
+
+ H5F_open
(which is called by H5Fopen()
+ and H5Fcreate
) that contains a
+ printf(3c)
-style integer format specifier.
+ Currently, the default low-level file driver is used for all
+ family members (H5F_LOW_DFLT, usually set to be Section 2 I/O or
+ Section 3 stdio), but we'll probably eventually want to pass
+ that as a parameter of the file access template, which hasn't
+ been implemented yet. When creating a family, a default family
+ member size is used (defined at the top of H5Ffamily.c, currently
+ 64MB) but that also should be settable in the file access
+ template. When opening an existing family, the size of the first
+ member is used to determine the member size (flushing/closing a
+ family ensures that the first member is the correct size) but
+ the other family members don't have to be that large (the local
+ address space, however, is logically the same size for all
+ members).
+
+ H5F_open
+ then we'll choose the split family and use the default low-level
+ driver for each of the two family members. Eventually we'll
+ want to pass these kinds of things through the file access
+ template instead of relying on naming convention.
+
+ External Raw Data
+
+ Multiple HDF5 Files
+
+
+
+
+struct H5F_mount_t {
+ H5F_t *parent; /* Parent HDF5 file if any */
+ struct {
+ H5F_t *f; /* File which is mounted */
+ haddr_t where; /* Address of mount point */
+ } *mount; /* Array sorted by mount point */
+ intn nmounts; /* Number of mounted files */
+ intn alloc; /* Size of mount table */
+}
+
H5Fmount
function takes the ID of an open
+ file, the name of a to-be-mounted file, the name of the mount
+ point, and a file access template (like H5Fopen
).
+ It opens the new file and adds a record to the parent's mount
+ table. The H5Funmount
function takes the parent
+ file ID and the name of the mount point and closes the file
+ that's mounted at that point. The H5Fclose
+ function closes/unmounts files recursively.
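+
+      As a sketch of the proposed calls (the file name and mount point
+      below are hypothetical):
+
+H5Fmount (parent, "child.h5", "/mnt", H5P_DEFAULT);
+/* ...access objects below /mnt in the parent's name space... */
+H5Funmount (parent, "/mnt");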
+
+ H5G_iname
function which translates a name to
+ a file address (haddr_t
) looks at the mount table
+ at each step in the translation and switches files where
+ appropriate. All name-to-address translations occur through
+ this function.
+
+ How Long?
+
+ H5F_istore_read
should be trivial. Most of the
+ time will be spent designing a way to cache Unix file
+ descriptors efficiently since the total number open files
+ allowed per process could be much smaller than the total number
+ of HDF5 files and external raw data files.
+
+ haddr_t
opaque turned out to be much easier
+ than I planned (I did it last Fri). Most of the work will
+ probably be removing the redundant H5F_t arguments for lots of
+ functions.
+
+ Conclusion
+
+
+ Robb Matzke
+
+
+Last modified: Wed Nov 12 15:01:14 EST 1997
+
+
+
diff --git a/doc/html/Files.html b/doc/html/Files.html
new file mode 100644
index 0000000..791cc1f
--- /dev/null
+++ b/doc/html/Files.html
@@ -0,0 +1,529 @@
+
+
+
+ Files
+
+ 1. Introduction
+
+ 2. File access modes
+
+ H5F_ACC_RDWR
+ parameter to H5Fopen()
allows write access to a
+ file also. H5Fcreate()
assumes write access as
+ well as read access, passing H5F_ACC_TRUNC
forces
+ the truncation of an existing file, otherwise H5Fcreate will
+ fail to overwrite an existing file.
+
+ 3. Creating, Opening, and Closing Files
+
+ H5Fcreate()
function,
+ and existing files can be accessed with H5Fopen()
. Both
+ functions return an object ID which should be eventually released by
+ calling H5Fclose()
.
+
+
+
+
+ hid_t H5Fcreate (const char *name, uintn
+ flags, hid_t create_properties, hid_t
+ access_properties)
+ H5F_ACC_TRUNC
flag is set,
+ any current file is truncated when the new file is created.
+ If a file of the same name exists and the
+ H5F_ACC_TRUNC
flag is not set (or the
+ H5F_ACC_EXCL
bit is set), this function will
+ fail. Passing H5P_DEFAULT
for the creation
+ and/or access property lists uses the library's default
+ values for those properties. Creating and changing the
+ values of a property list is documented further below. The
+ return value is an ID for the open file and it should be
+ closed by calling H5Fclose()
when it's no longer
+ needed. A negative value is returned for failure.
+
+
+ hid_t H5Fopen (const char *name, uintn
+ flags, hid_t access_properties)
+ H5F_ACC_RDWR
flag is
+ set. The access_properties is a file access property
+ list ID or H5P_DEFAULT
for the default I/O access
+ parameters. Creating and changing the parameters for access
+ templates is documented further below. Files which are opened
+ more than once return a unique identifier for each
+ H5Fopen()
call and can be accessed through all
+ file IDs. The return value is an ID for the open file and it
+ should be closed by calling H5Fclose()
when it's
+ no longer needed. A negative value is returned for failure.
+
+
+ herr_t H5Fclose (hid_t file_id)
+ H5Fcreate()
or H5Fopen()
. After
+ closing a file the file_id should not be used again. This
+ function returns zero for success or a negative value for failure.
+ 4. File Property Lists
+
+ H5Fcreate()
or
+ H5Fopen()
are passed through property list
+ objects, which are created with the H5Pcreate()
+ function. These objects allow many parameters of a file's
+ creation or access to be changed from the default values.
+ Property lists are used as a portable and extensible method of
+ modifying multiple parameter values with simple API functions.
+ There are two kinds of file-related property lists,
+ namely file creation properties and file access properties.
+
+ 4.1. File Creation Properties
+
+ H5Fcreate()
only
+ and are used to control the file meta-data which is maintained
+ in the boot block of the file. The parameters which can be
+ modified are:
+
+
+
+
+ H5Pset_userblock()
and
+ H5Pget_userblock()
calls.
+
+
+ H5Pset_sizes()
and
+ H5Pget_sizes()
calls.
+
+
+ H5Pset_sym_k()
and H5Pget_sym_k()
calls.
+
+
+ H5Pset_istore_k()
and H5Pget_istore_k()
+ calls.
+ 4.2. File Access Property Lists
+
+ H5Fcreate()
or
+ H5Fopen()
and are used to control different methods of
+ performing I/O on files.
+
+
+
+
+ open()
,
+ lseek()
, read()
, write()
, and
+ close()
. The lseek64()
function is used
+ on operating systems that support it. This driver is enabled and
+ configured with H5Pset_sec2()
, and queried with
+ H5Pget_sec2()
.
+
+
+ stdio.h
header file, namely
+ fopen()
, fseek()
, fread()
,
+ fwrite()
, and fclose()
. The
+ fseek64()
function is used on operating systems that
+ support it. This driver is enabled and configured with
+ H5Pset_stdio()
, and queried with
+ H5Pget_stdio()
.
+
+
+ malloc()
and free()
to create storage
+ space for the file. The total size of the file must be small enough
+ to fit in virtual memory. The name supplied to
+ H5Fcreate()
is irrelevant, and H5Fopen()
+ will always fail.
+
+
+ MPI_File_open()
during file creation or open.
+ The access_mode controls the kind of parallel access the application
+ intends. (Note that it is likely that the next API revision will
+ remove the access_mode parameter and have access control specified
+ via the raw data transfer property list of H5Dread()
+ and H5Dwrite()
.) These parameters are set and queried
+ with the H5Pset_mpi()
and H5Pget_mpi()
+ calls.
+
+
+ H5Pset_alignment()
function. Any allocation
+ request at least as large as some threshold will be aligned on
+ an address which is a multiple of some number.
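+
+      For example, to align every allocation of at least 1MB on a 64kB
+      boundary (a minimal sketch; the threshold-then-alignment argument
+      order follows the description above):
+
+   hid_t access_template = H5Pcreate(H5P_FILE_ACCESS);
+   H5Pset_alignment(access_template, 1024*1024, 65536);
+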
+ 5. Examples of using file templates
+
+ 5.1. Example of using file creation templates
+
+
+
+ hid_t create_template;
+ hid_t file_id;
+
+ create_template = H5Pcreate(H5P_FILE_CREATE);
+ H5Pset_sizes(create_template, 8, 8);
+
+ file_id = H5Fcreate("test.h5", H5F_ACC_TRUNC,
+ create_template, H5P_DEFAULT);
+ .
+ .
+ .
+ H5Fclose(file_id);
+
+
+ 5.2. Example of using file access templates
+
+
+
+ hid_t access_template;
+ hid_t file_id;
+
+ access_template = H5Pcreate(H5P_FILE_ACCESS);
+ H5Pset_mpi(access_template, MPI_COMM_WORLD, MPI_INFO_NULL);
+
+ /* H5Fopen must be called collectively */
+ file_id = H5Fopen("test.h5", H5F_ACC_RDWR, access_template);
+ .
+ .
+ .
+ /* H5Fclose must be called collectively */
+ H5Fclose(file_id);
+
+
+
+ 6. Low-level File Drivers
+
+ 6.1 Unbuffered Permanent Files
+
+ open()
, close()
, read()
,
+ write()
, and lseek()
functions. If the
+ operating system supports lseek64()
then it is used instead
+ of lseek()
. The library buffers meta data regardless of
+ the low-level driver, but using this driver prevents data from being
+ buffered again by the lowest layers of the HDF5 library.
+
+
+
+
+ H5F_driver_t H5Pget_driver (hid_t
+ access_properties)
+ H5F_LOW_SEC2
if the
+ sec2 driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_sec2 (hid_t access_properties)
+
+ herr_t H5Pget_sec2 (hid_t access_properties)
+ H5Pset_sec2()
.
+ 6.2 Buffered Permanent Files
+
+ stdio.h
header file to access permanent files in a local
+ file system. These are the fopen()
, fclose()
,
+ fread()
, fwrite()
, and fseek()
+ functions. If the operating system supports fseek64()
then
+ it is used instead of fseek()
. Use of this driver
+ introduces an additional layer of buffering beneath the HDF5 library.
+
+
+
+
+ H5F_driver_t H5Pget_driver(hid_t
+ access_properties)
+ H5F_LOW_STDIO
if the
+ stdio driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_stdio (hid_t access_properties)
+
+ herr_t H5Pget_stdio (hid_t access_properties)
+ H5Pset_stdio()
.
+ 6.3 Buffered Temporary Files
+
+ malloc()
and
+ free()
to allocated space for a file in the heap. Reading
+ and writing to a file of this type results in mem-to-mem copies instead
+ of disk I/O and as a result is somewhat faster. However, the total file
+ size must not exceed the amount of available virtual memory, and only
+ one HDF5 file handle can access the file (because the name of such a
+ file is insignificant and H5Fopen()
always fails).
+
+
+
+
+ H5F_driver_t H5Pget_driver (hid_t
+ access_properties)
+ H5F_LOW_CORE
if the
+ core driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_core (hid_t access_properties, size_t
+ block_size)
+
+ herr_t H5Pget_core (hid_t access_properties, size_t
+ *block_size)
+ H5Pset_core()
.
+ 6.4 Parallel Files
+
+
+
+
+ H5F_driver_t H5Pget_driver (hid_t
+ access_properties)
+ H5F_LOW_MPI
if the
+ mpi driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_mpi (hid_t access_properties, MPI_Comm
+ comm, MPI_info info)
+
+ herr_t H5Pget_mpi (hid_t access_properties, MPI_Comm
+ *comm, MPI_info *info)
+ H5Pset_mpi()
.
+ 6.4 File Families
+
+ ls
(1) may be substantially smaller. The name passed to
+ H5Fcreate()
or H5Fopen()
should include a
+ printf(3c)
style integer format specifier which will be
+ replaced with the family member number (the first family member is
+ zero).
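+
+      For example, a family with 100MB members could be created as
+      follows (a minimal sketch; the file name and member size are
+      arbitrary):
+
+   hid_t access_template, file_id;
+
+   access_template = H5Pcreate(H5P_FILE_ACCESS);
+   H5Pset_family(access_template, (hsize_t)100*1024*1024, H5P_DEFAULT);
+   file_id = H5Fcreate("big%05d.h5", H5F_ACC_TRUNC,
+                       H5P_DEFAULT, access_template);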
+
+ split
(1) and numbering the output
+ files. However, because HDF5 is lazy about extending the size
+ of family members, a valid file cannot generally be created by
+ concatenation of the family members. Additionally,
+ split
and cat
don't attempt to
+ generate files with holes. The h5repart
program
+ can be used to repartition an HDF5 file or family into another
+ file or family and preserves holes in the files.
+
+
+
+
+ h5repart
[-v
] [-b
+ block_size[suffix]] [-m
+ member_size[suffix]] source
+ destination
+ printf
-style integer format such as "%d". The
+ -v
switch prints input and output file names on
+ the standard error stream for progress monitoring,
+ -b
sets the I/O block size (the default is 1kB),
+ and -m
sets the output member size if the
+ destination is a family name (the default is 1GB). The block
+ and member sizes may be suffixed with the letters
+ g
, m
, or k
for GB, MB,
+ or kB respectively.
+
+
+ H5F_driver_t H5Pget_driver (hid_t
+ access_properties)
+ H5F_LOW_FAMILY
if
+ the family driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_family (hid_t access_properties,
+ hsize_t memb_size, hid_t member_properties)
+ off_t
type is
+ four bytes then the maximum family member size is usually
+ 2^31-1 because the byte at offset 2,147,483,647 is generally
+ inaccessible. Additional parameters may be added to this
+ function in the future.
+
+
+ herr_t H5Pget_family (hid_t access_properties,
+ hsize_t *memb_size, hid_t
+ *member_properties)
+ H5Pclose() when the application is finished with
+ it. If memb_size is non-null then it will contain
+ the logical size in bytes of each family member. In the
+ future, additional arguments may be added to this function to
+ match those added to
H5Pset_family()
.
+ 6.5 Split Meta/Raw Files
+
+ H5Fcreate()
or H5Fopen()
and this
+ driver appends a file extension which defaults to ".meta" for the meta
+ data file and ".raw" for the raw data file. Each file can have its own
+ file access property list which allows, for instance, a split file with
+ meta data stored with the core driver and raw data stored with
+ the sec2 driver.
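+
+      For example (a minimal sketch using the default-style extensions
+      and default member access properties):
+
+   hid_t access_template, file_id;
+
+   access_template = H5Pcreate(H5P_FILE_ACCESS);
+   H5Pset_split(access_template, ".meta", H5P_DEFAULT,
+                ".raw", H5P_DEFAULT);
+   file_id = H5Fcreate("data", H5F_ACC_TRUNC,
+                       H5P_DEFAULT, access_template);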
+
+
+
+
+
+ H5F_driver_t H5Pget_driver (hid_t
+ access_properties)
+ H5F_LOW_SPLIT
if
+ the split driver is defined as the low-level driver for the
+ specified access property list.
+
+
+ herr_t H5Pset_split (hid_t access_properties,
+ const char *meta_extension, hid_t
+ meta_properties, const char *raw_extension, hid_t
+ raw_properties)
+
+ herr_t H5Pget_split (hid_t access_properties,
+ size_t meta_ext_size, const char *meta_extension,
+ hid_t meta_properties, size_t raw_ext_size, const
+ char *raw_extension, hid_t *raw_properties)
+ H5Pclose() when
+ the application is finished with them, but if the meta and/or
+ raw file has no property list then a negative value is
+ returned for that property list handle. Also, if
+ meta_extension and/or raw_extension are
+ non-null pointers, at most meta_ext_size or
+ raw_ext_size characters of the meta or raw file name
+ extension will be copied to the specified buffer. If the
+ actual name is longer than what was requested then the result
+ will not be null terminated (similar to
+
strncpy()
). In the future, additional arguments
+ may be added to this function to match those added to
+ H5Pset_split()
.
+
+ Quincey Koziol
+ Robb Matzke
+
+
+Last modified: Tue Jun 9 15:03:44 EDT 1998
+
+
+
diff --git a/doc/html/Groups.html b/doc/html/Groups.html
new file mode 100644
index 0000000..b1be2f1
--- /dev/null
+++ b/doc/html/Groups.html
@@ -0,0 +1,288 @@
+
+
+
+ Groups
+
+ 1. Introduction
+
+ 2. Names
+
+
+
+
+
+
+ Location Type
+ Object Name
+ Description
+
+
+
+ File ID
+
+ /foo/bar
The object
+ bar
in group foo
+ in the root group of the specified file.
+
+
+ Group ID
+
+ /foo/bar
The object
+ bar
in group foo
+ in the root group of the file containing the specified
+ group. In other words, the group ID's only purpose is
+ to supply a file.
+
+
+ File ID
+
+ /
The root group of the specified file.
+
+
+
+ Group ID
+
+ /
The root group of the file containing the specified
+ group.
+
+
+
+ File ID
+
+ foo/bar
The object
+ bar
in group foo
+ in the current working group of the specified file. The
+ initial current working group is the root group of the
+ file as described below.
+
+
+ Group ID
+
+ foo/bar
The object
+ bar
in group foo
+ in the specified group.
+
+
+ File ID
+
+ .
The current working group of the specified file.
+
+
+
+ Group ID
+
+ .
The specified group.
+
+
+
+ Other ID
+
+ .
The specified object.
+ 3. Creating, Opening, and Closing Groups
+
+ H5Gcreate()
function,
+ and existing groups can be accessed with
+ H5Gopen()
. Both functions return an object ID which
+ should be eventually released by calling
+ H5Gclose()
.
+
+
+
+
+ hid_t H5Gcreate (hid_t location_id, const char
+ *name, size_t size_hint)
+ H5Gclose()
+ when it's no longer needed. A negative value is returned for
+ failure.
+
+
+ hid_t H5Gopen (hid_t location_id, const char
+ *name)
+ H5Gclose()
when it is no
+ longer needed. A negative value is returned for failure.
+
+
+ herr_t H5Gclose (hid_t group_id)
+ H5Gcreate()
or
+ H5Gopen()
. After closing a group the
+ group_id should not be used again. This function
+ returns zero for success or a negative value for failure.
+ 4. Current Working Group
+
+ hid_t file_id
) has a
+ current working group, initially the root group of the file.
+ Names which do not begin with a slash are relative to the
+ specified group or to the current working group as described
+ above. For instance, the name "/Foo/Bar/Baz" is resolved by
+ first looking up "Foo" in the root group. But the name
+ "Foo/Bar/Baz" is resolved by first looking up "Foo" in the
+ current working group.
+
+
+
+
+ herr_t H5Gset (hid_t location_id, const char
+ *name)
+
+ herr_t H5Gpush (hid_t location_id, const char
+ *name)
+
+ herr_t H5Gpop (hid_t location_id)
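+
+      As a sketch, a relative name is resolved with respect to the
+      current working group established by H5Gset (the group names
+      here are hypothetical):
+
+H5Gset (file_id, "/Foo/Bar");
+hid_t baz_id = H5Gopen (file_id, "Baz");   /* opens /Foo/Bar/Baz */
+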
+ 5. Objects with Multiple Names
+
+
+
+
+
+ herr_t H5Glink (hid_t file_id, H5G_link_t
+ link_type, const char *current_name,
+ const char *new_name)
+ H5G_LINK_HARD
then a new
+ hard link is created. Otherwise if link_type is
+ H5T_LINK_SOFT
a soft link is created which is an
+ alias for the current_name. When creating a soft
+ link the object need not exist. This function returns zero
+ for success or negative for failure. This function is not
+ part of the prototype API.
+
+
+ herr_t H5Gunlink (hid_t file_id, const char
+ *name)
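+
+      As a sketch, an object can be given a second name and the
+      original name then removed (the names are hypothetical):
+
+H5Glink (file_id, H5G_LINK_HARD, "/Foo/Bar", "/Baz");
+H5Gunlink (file_id, "/Foo/Bar");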
+
+ Robb Matzke
+
+
+Last modified: Tue Mar 24 15:52:14 EST 1998
+
+
+
diff --git a/doc/html/H5.api.html b/doc/html/H5.api.html
new file mode 100644
index 0000000..b2402a5
--- /dev/null
+++ b/doc/html/H5.api.html
@@ -0,0 +1,4611 @@
+HDF5: API Specification
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Library API Functions
+
+
+
+
+
+
+
+H5dont_atexit
(void)
+
+
+
+
+H5close
(void)
+
+
+
+
+H5version
(uintn *majversion
,
+ uintn *minversion
,
+ uintn *relversion
,
+ uintn *patversion
+ )
+
+
+majversion
+ minversion
+ relversion
+ patversion
+
+File API Functions
+
+
+
+
+
+
+
+H5Fopen
(const char *name
,
+ uintn flags
,
+ hid_t access_template
+ )
+flags
parameter determines the file access mode.
+ There is no read flag; all open files are implicitly opened for
+ read access.
+ All flags may be combined with the '|' (boolean OR operator) to
+ change the behavior of the file open call.
+ The access_template
parameter is a template containing
+ additional information required for specific methods of access,
+ parallel I/O for example. The parameters for access templates are
+ described in the H5P API documentation.
+
+
+name
+ flags
+
+
+
access_template
+
+
+
+
+H5Fcreate
(const char *name
,
+ uintn flags
,
+ hid_t create_template
,
+ hid_t access_template
+ )
+flags
parameter determines whether an existing
+ file will be overwritten or not. All newly created files are opened
+ for both reading and writing.
+ All flags may be combined with the '|' (boolean OR operator) to
+ change the behavior of the file open call.
+ The create_template
and access_template
+ parameters are templates containing additional information required
+ for specific methods of access or particular aspects of the file
+ to set when creating a file.
+ The parameters for creation and access templates are
+ described in the H5P API documentation.
+
+
+name
+ flags
+
+
+
create_template
+ access_template
+
+
+
+
+H5Fis_hdf5
(const char *name
+ )
+
+
+name
+
+
+
+
+H5Fget_create_template
(hid_t file_id
+ )
+
+
+file_id
+
+
+
+
+H5Fclose
(hid_t file_id
+ )
+
+
+file_id
+
+Template API Functions
+
+
+
+
+
+
+
+H5Pcreate
(H5P_class_t type
+ )
+
+
+
+
+
+
+
+type
+
+
+
+
+H5Pclose
(hid_t template_id
+ )
+
+
+template_id
+
+
+
+
+H5Pget_class
(hid_t template_id
+ )
+
+
+template_id
+
+
+
+
+H5Pcopy
(hid_t template_id
+ )
+
+
+template_id
+
+
+
+
+H5Pget_version
(hid_t template_id
,
+ int * boot
,
+ int * freelist
,
+ int * stab
,
+ int * shhdr
+ )
+
+
+template_id
+ boot
+ freelist
+ stab
+ shhdr
+
+
+
+
+H5Pset_userblock
(hid_t template_id
,
+ hsize_t size
+ )
+
+
+template_id
+ size
+
+
+
+
+H5Pget_userblock
(hid_t template_id
,
+ hsize_t * size
+ )
+
+
+template_id
+ size
+
+
+
+
+H5Pset_sizes
(hid_t template_id
,
+ size_t sizeof_addr
,
+ size_t sizeof_size
+ )
+
+
+template_id
+ sizeof_addr
+ sizeof_size
+
+
+
+
+H5Pget_sizes
(hid_t template_id
,
+ size_t * sizeof_addr
,
+ size_t * sizeof_size
+ )
+
+
+template_id
+ size
+ size
+
+
+
+
+H5Pset_mpi
(hid_t tid
,
+ MPI_Comm comm
,
+ MPI_Info info
+ )
+
+
+tid
+ comm
+ info
+
+
+
+
+H5Pget_mpi
(hid_t tid
,
+ MPI_Comm *comm
,
+ MPI_Info *info
+ )
+
+
+tid
+ comm
+ info
+
+
+
+
+H5Pset_xfer
(hid_t tid
,
+ H5D_transfer_t data_xfer_mode
+ )
+
+
+tid
+ data_xfer_mode
+
+
+
+
+
+
+H5Pget_xfer
(hid_t tid
,
+ H5D_transfer_t * data_xfer_mode
+ )
+
+
+tid
+ data_xfer_mode
+
+
+
+
+H5Pset_sym_k
(hid_t template_id
,
+ size_t ik
,
+ size_t lk
+ )
+ik
is one half the rank of a tree that stores a symbol
+ table for a group. Internal nodes of the symbol table are on
+ average 75% full. That is, the average rank of the tree is
+ 1.5 times the value of ik
.
+ lk
is one half of the number of symbols that can be stored in
+ a symbol table node. A symbol table node is the leaf of a
+ symbol table tree which is used to store a group. When
+ symbols are inserted randomly into a group, the group's
+ symbol table nodes are 75% full on average. That is, they
+ contain 1.5 times the number of symbols specified by lk
.
+
+
+template_id
+ ik
+ lk
+
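+
+A minimal sketch (the values are illustrative only):
+
+hid_t create_template = H5Pcreate(H5P_FILE_CREATE);
+H5Pset_sym_k(create_template, 16, 4);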
+
+
+
+H5Pget_sym_k
(hid_t template_id
,
+ size_t * ik
,
+ size_t * lk
+ )
+
+
+template_id
+ ik
+ size
+
+
+
+
+H5Pset_istore_k
(hid_t template_id
,
+ size_t ik
+ )
+ik
is one half the rank of a tree that stores chunked raw
+ data. On average, such a tree will be 75% full, or have an
+ average rank of 1.5 times the value of ik
.
+
+
+template_id
+ ik
+
+
+
+
+H5Pget_istore_k
(hid_t template_id
,
+ size_t * ik
+ )
+ik
may be the null pointer. This
+ function is only valid for file creation templates.
+
+
+template_id
+ ik
+
+
+
+
+H5Pset_layout
(hid_t template_id
,
+ H5D_layout_t layout
+ )
+layout
are:
+
+
+
+
+template_id
+ layout
+
+
+
+
+H5Pget_layout
(hid_t template_id
,
+ H5D_layout_t * layout
+ )
+layout
are:
+
+
+
+
+template_id
+ layout
+
+
+
+
+H5Pset_chunk
(hid_t template_id
,
+ int ndims
,
+ const hsize_t * dim
+ )
+ndims
parameter currently must be the
+ same as the rank of the dataset. The values of the
+ dim
array define the size of the chunks to store the
+ dataset's raw data. As a side-effect, the layout of the dataset is
+ changed to H5D_CHUNKED, if it isn't already.
+
+
+template_id
+ ndims
+ dim
+
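+
+A minimal sketch storing a two-dimensional dataset in 64x64 chunks
+(a dataset creation template of class H5P_DATASET_CREATE is assumed):
+
+hsize_t chunk_size[2] = {64, 64};
+hid_t create_template = H5Pcreate(H5P_DATASET_CREATE);
+H5Pset_chunk(create_template, 2, chunk_size);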
+
+
+
+
+
+
+
+H5Pget_chunk
(hid_t template_id
,
+ int max_ndims
+ hsize_t * dims
+ )
+
+
+template_id
+ max_ndims
+ dims
array.
+ dims
+
+Dataset Object API Functions
+
+
+
+
+
+
+
+H5Dcreate
(hid_t file_id
,
+ const char *name
,
+ hid_ttype_id
,
+ hid_tspace_id
,
+ hid_ttemplate_id
+ )
+file_id
. The type_id
and space_id
+ are the IDs of the datatype and dataspace used to construct the
+ framework of the dataset. The datatype and dataspace parameters
+ describe the dataset as it will exist in the file, which is not
+ necessarily the same as it exists in memory. The template_id
+ contains either the default template (H5P_DEFAULT) or a template_id
+ with particular constant properties used to create the dataset. The
+ name
is used to identify the dataset in a group and must
+ be unique within that group.
+
+
+file_id
+ name
+ type_id
+ space_id
+ template_id
+
+
+
+
+H5Dopen
(hid_t file_id
,
+ const char *name
+ )
+file_id
. The name
is
+ used to identify the dataset in the file.
+
+
+file_id
+ name
+
+
+
+
+H5Dget_space
(hid_t dataset_id
+ )
+
+
+dataset_id
+
+
+
+
+H5Dget_type
(hid_t dataset_id
+ )
+
+
+dataset_id
+
+
+
+
+H5Dget_create_plist
(hid_t dataset_id
+ )
+
+
+dataset_id
+
+
+
+
+H5Dread
(hid_t dataset_id
,
+ hid_t mem_type_id
,
+ hid_t mem_space_id
,
+ hid_t file_space_id
,
+ hid_t transfer_template_id
,
+ void * buf
+ )
+buf
,
+ converting from the file datatype of the dataset into the memory
+ datatype specified in mem_type_id
. The portion of the
+ dataset to read from disk is specified with the file_space_id
+ which can contain a dataspace with a hyperslab selected or the constant
+ H5S_ALL, which indicates the entire dataset is to be read. The portion
+ of the dataset read into the memory buffer is specified with the
+ mem_space_id
which can also be a hyperslab of the same
+ size or the H5S_ALL parameter to store the entire dataset. The
+ transfer_template_id
is a dataset transfer template ID which
+ is used to provide additional parameters for the I/O operation or can
+ be H5P_DEFAULT for the default library behavior.
+
+
+dataset_id
+ mem_type_id
+ mem_space_id
+ file_space_id
+ transfer_template_id
+ buf
+
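+
+A minimal sketch reading an entire dataset of 100 integers into memory
+(the dataset's type and size are assumptions):
+
+int buf[100];
+H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
+        H5P_DEFAULT, buf);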
+
+
+
+H5Dwrite
(hid_t dataset_id
,
+ hid_t mem_type_id
,
+ hid_t mem_space_id
,
+ hid_t file_space_id
,
+ hid_t transfer_template_id
,
+ const void * buf
+ )
+mem_type_id
into the file datatype.
+ The portion of the
+ dataset to be written to disk is specified with the file_space_id
+ which can contain a dataspace with a hyperslab selected or the constant
+ H5S_ALL, which indicates the entire dataset is to be written. The portion
+ of the dataset written from the memory buffer is specified with the
+ mem_space_id
which can also be a hyperslab of the same
+ size or the H5S_ALL parameter to store the entire dataset. The
+ transfer_template_id
is a dataset transfer template ID which
+ is used to provide additional parameters for the I/O operation or can
+ be H5P_DEFAULT for the default library behavior.
+
+
+dataset_id
+ mem_type_id
+ mem_space_id
+ file_space_id
+ transfer_template_id
+ buf
+
+
+
+
+H5Dextend
(hid_t dataset_id
,
+ const hsize_t * size
+ )
+size
array must have
+ the same number of entries as the rank of the dataset's dataspace.
+
+
+dataset_id
+ size
+
+
+
+
+H5Dclose
(hid_t dataset_id
+ )
+
+
+dataset_id
+
+Datatype Object API Functions
+
+
+
+
+
+
+
+H5Tcreate
(H5T_class_t class
,
+ size_tsize
+ )
+H5T_COMPOUND
+ datatype class is supported with this function, use H5Tcopy
+ to create integer or floating-point datatypes. The datatype ID
+ returned from this function should be released with H5Tclose or resource
+ leaks will result.
+
+
+class
+ size
+
+
+
+
+H5Tcopy
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tequal
(hid_t type_id1
,
+ hid_ttype_id2
+ )
+
+
+type_id1
+ type_id2
+
+
+
+
+H5Tlock
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tget_class
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tget_size
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tset_size
(hid_t type_id
,
+ size_tsize
+ )
+
+
+type_id
+ size
+
+
+
+
+H5Tget_order
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tset_order
(hid_t type_id
,
+ H5T_order_torder
+ )
+
+
+
+
+type_id
+ order
+
+
+
+
+H5Tget_precision
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tset_precision
(hid_t type_id
,
+ size_tprecision
+ )
+
+
+type_id
+ precision
+
+
+
+
+H5Tget_offset
(hid_t type_id
+ )
+
+
+
+
+
+
+
+ Byte Position
+ Big-Endian Offset=0
+ Big-Endian Offset=16
+ Little-Endian Offset=0
+ Little-Endian Offset=16
+
+
+ 0:
+ [ pad]
+ [0x11]
+ [0x22]
+ [ pad]
+
+
+ 1:
+ [ pad]
+ [0x22]
+ [0x11]
+ [ pad]
+
+
+ 2:
+ [0x11]
+ [ pad]
+ [ pad]
+ [0x22]
+
+
+ 3:
+ [0x22]
+ [ pad]
+ [ pad]
+ [0x11]
+
+
+type_id
+
+
+
+
+H5Tset_offset
(hid_t type_id
,
+ size_t offset
+ )
+
+
+
+
+
+
+
+
+ Byte Position
+ Big-Endian Offset=0
+ Big-Endian Offset=16
+ Little-Endian Offset=0
+ Little-Endian Offset=16
+
+
+ 0:
+ [ pad]
+ [0x11]
+ [0x22]
+ [ pad]
+
+
+ 1:
+ [ pad]
+ [0x22]
+ [0x11]
+ [ pad]
+
+
+ 2:
+ [0x11]
+ [ pad]
+ [ pad]
+ [0x22]
+
+
+ 3:
+ [0x22]
+ [ pad]
+ [ pad]
+ [0x11]
+
+
+type_id
+ offset
+
+
+
+
+H5Tget_pad
(hid_t type_id
,
+ H5T_pad_t * lsb
,
+ H5T_pad_t * msb
+ )
+
+
+
+
+type_id
+ lsb
+ msb
+
+
+
+
+H5Tset_pad
(hid_t type_id
,
+ H5T_pad_t lsb
,
+ H5T_pad_t msb
+ )
+
+
+
+
+type_id
+ lsb
+ msb
+
+
+
+
+H5Tget_sign
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tset_sign
(hid_t type_id
,
+ H5T_sign_t sign
+ )
+
+
+
+
+type_id
+ sign
+
+
+
+
+H5Tget_fields
(hid_t type_id
,
+ size_t * epos
,
+ size_t * esize
,
+ size_t * mpos
,
+ size_t * msize
+ )
+
+
+type_id
+ epos
+ esize
+ mpos
+ msize
+
+
+
+
+H5Tset_fields
(hid_t type_id
,
+ size_t epos
,
+ size_t esize
,
+ size_t mpos
,
+ size_t msize
+ )
+
+
+type_id
+ epos
+ esize
+ mpos
+ msize
+
+
+
+
+H5Tget_ebias
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tset_ebias
(hid_t type_id
,
+ size_t ebias
+ )
+
+
+type_id
+ ebias
+
+
+
+
+H5Tget_norm
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tset_norm
(hid_t type_id
,
+ H5T_norm_t norm
+ )
+
+
+
+
+type_id
+ norm
+
+
+
+
+H5Tget_inpad
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tset_inpad
(hid_t type_id
,
+ H5T_pad_t inpad
+ )
+
+
+
+
+type_id
+ pad
+
+
+
+
+H5Tget_cset
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tset_cset
(hid_t type_id
,
+ H5T_cset_t cset
+ )
+
+
+
+
+type_id
+ cset
+
+
+
+
+H5Tget_strpad
(hid_t type_id
+ )
+
+
+
+
+type_id
+
+
+
+
+H5Tset_strpad
(hid_t type_id
,
+ H5T_str_t strpad
+ )
+
+
+
+
+type_id
+ strpad
+
+
+
+
+H5Tget_nmembers
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tget_member_name
(hid_t type_id
,
+ intn fieldno
+ )
+
+
+type_id
+ fieldno
+
+
+
+
+H5Tget_member_dims
(hid_t type_id
,
+ intn fieldno
,
+ size_t * dims
,
+ int * perm
+ )
+dims
+ and perm
, both arrays of at least four elements. Either
+ (or even both) may be null pointers.
+
+
+type_id
+ fieldno
+ dims
+ perm
+
+
+
+
+H5Tget_member_type
(hid_t type_id
,
+ intn fieldno
+ )
+
+
+type_id
+ fieldno
+
+
+
+
+H5Tinsert
(hid_t type_id
,
+ const char * name
,
+ off_t offset
,
+ hid_t field_id
+ )
+type_id
. The new member has a name
which
+ must be unique within the compound data type. The offset
+ argument defines the start of the member in an instance of the compound
+ data type, and field_id
is the type of the new member.
+
+
+
+type_id
+ name
+ offset
+ field_id
+
+
+
+
+H5Tpack
(hid_t type_id
+ )
+
+
+type_id
+
+
+
+
+H5Tregister_hard
(const char
+ * name
, hid_t src_id
,
+ hid_t dst_id
,
+ H5T_conv_t func
+ )
+src_id
and dst_id
. A conversion
+ path can only have one hard function, so func
replaces any
+ previous hard function.
+ func
is the null pointer then any hard function
+ registered for this path is removed. The soft functions
+ are then used when determining which conversion function is appropriate
+ for this path. The name
argument is used only
+ for debugging and should be a short identifier for the function.
+
+
+name
+ src_id
+ dst_id
+ func
+
+
+
+
+H5Tregister_soft
(const char
+ * name
, hid_t src_id
,
+ hid_t dst_id
,
+ H5T_conv_t func
+ )
+name
+ is used only for debugging and should be a short identifier
+ for the function.
+
+
+name
+ src_id
+ dst_id
+ func
+
+
+
+
+H5Tunregister
(H5T_conv_t func
+ )
+
+
+func
+
+
+
+
+
+H5Tclose
(hid_t type_id
+ )
+
+
+type_id
+
+Dataspace Object API Functions
+
+
+
+
+
+
+
+H5Screate_simple
(int rank
,
+ const hsize_t * dims
,
+ const hsize_t * maxdims
+ )
+rank
is the number of dimensions used in the
+ dataspace. The dims
argument is the size of the simple
+ dataset and the maxdims
argument is the upper limit on the
+ size of the dataset. maxdims
may be the null pointer in
+ which case the upper limit is the same as dims
. If an
+ element of maxdims
is zero then the corresponding dimension
+ is unlimited, otherwise no element of maxdims
should be
+ smaller than the corresponding element of dims
. The
+ dataspace ID returned from this function should be released with
+ H5Sclose or resource leaks will occur.
+
+
+rank
+ dims
+ maxdims
+
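+
+A minimal sketch creating a 100x200 dataspace whose first dimension is
+unlimited:
+
+hsize_t dims[2]    = {100, 200};
+hsize_t maxdims[2] = {0, 200};   /* zero means unlimited */
+hid_t space_id = H5Screate_simple(2, dims, maxdims);
+/* ... use the dataspace ... */
+H5Sclose(space_id);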
+
+
+
+H5Scopy
(hid_t space_id
+ )
+
+
+space_id
+
+
+
+
+H5Sget_npoints
(hid_t space_id
)
+
+
+space_id
+
+
+
+
+H5Sget_ndims
(hid_t space_id
)
+
+
+space_id
+
+
+
+
+H5Sget_dims
(hid_t space_id
,
+ hsize_t *dims
,
+ hsize_t *maxdims
+ )
+dims
parameter.
+
+
+space_id
+ dims
+ maxdims
+
+
+
+
+H5Sis_simple
(hid_t space_id
)
+
+
+space_id
+
+
+
+
+H5Sset_space
(hid_t space_id
,
+ uint32 rank
,
+ uint32 *dims
+ )
+dims
+ array (i.e. 'C' order). Setting the size of a dimension to zero
+ indicates that the dimension is of unlimited size and should be allowed
+ to expand. Currently, only the first dimension in the array (the
+ slowest) may be unlimited in size.
+ [Currently, all dataspace objects are simple
+ dataspaces; complex dataspace support will be added in the future]
+
+
+space_id
+ rank
+ dims
+
+
+
+
+H5Sset_hyperslab
(hid_t space_id
,
+ const hssize_t *start
,
+ const hsize_t *count
,
+ const hsize_t *stride
+ )
+
+
+space_id
+ start
+ count
+ stride
+
+
+
+
+H5Sget_hyperslab
(hid_t space_id
,
+ hssize_t *start
,
+ hsize_t *count
,
+ hsize_t *stride
+ )
+
+
+space_id
+ start
+ count
+ stride
+
+
+
+
+H5Sclose
(hid_t space_id
+ )
+
+
+space_id
+
+Group Object API Functions
+
+H5Fcreate
or
+H5Fopen
maintains an independent current working group
+stack, the top item of which is the current working group (the root
+object is the current working group if the stack is empty). The stack
+can be manipulated with H5Gset
, H5Gpush
, and
+H5Gpop
.
+
+getcwd
+function would be trivial.
+
+
+
+
+
+
+
+
+H5Gset
(hid_t
+ file
, const char *name
,
+ size_t size_hint
)
+
+
+ file
+ H5Fcreate
or
+ H5Fopen
.
+ name
+ size_hint
+
+
+
+
+H5Gopen
(hid_t file_id
,
+ const char *name
+ )
+
+
+file_id
+ name
+
+
+
+
+H5Gset
(hid_t
+ file
, const char *name
)
+
+
+ file
+ H5Fcreate
or
+ H5Fopen
.
+ name
+
+
+
+
+H5Gpush
(hid_t
+ file
, const char *name
)
+
+
+ file
+ H5Fcreate
or
+ H5Fopen
.
+ name
+
+
+
+
+H5Gpop
(hid_t
+ file
)
+
+
+ file
+ H5Fcreate
or
+ H5Fopen
.
+
+
+
+
+
+
+
+
+H5Gclose
(hid_t group_id
+ )
+
+
+group_id
+
+Glossary of data-types used
+
+
+
+Basic Types:
+
+
+
+Complex Types:
+
+
+
+Disk I/O Types:
+
+
+
diff --git a/doc/html/H5.api_map.html b/doc/html/H5.api_map.html
new file mode 100644
index 0000000..c35102a
--- /dev/null
+++ b/doc/html/H5.api_map.html
@@ -0,0 +1,849 @@
+HDF5: API Mapping to legacy APIs
+
+
+
diff --git a/doc/html/H5.format.html b/doc/html/H5.format.html
new file mode 100644
index 0000000..a3c9a7c
--- /dev/null
+++ b/doc/html/H5.format.html
@@ -0,0 +1,3183 @@
+
+
+
+
+
+Functionality
+netCDF
+SD
+AIO
+HDF5
+Comments
+
+
+
+Open existing file for read/write
+ncopen
+SDstart
+AIO_open
+H5Fopen
+
+
+
+Creates new file for read/write.
+nccreate
+
+
+H5Fcreate
+SD API handles this with SDopen
+
+
+
+Close file
+ncclose
+SDend
+AIO_close
+H5Fclose
+
+
+
+Redefine parameters
+ncredef
+
+
+
+Unnecessary under SD & HDF5 data-models
+
+
+
+End "define" mode
+ncendef
+
+
+
+Unnecessary under SD & HDF5 data-models
+
+
+
+Query the number of datasets, dimensions and attributes in a file
+ncinquire
+SDfileinfo
+
+H5Dget_info
+
H5Rget_num_relations
H5Gget_num_contents
HDF5 interface is more granular and flexible
+
+
+
+Update a writeable file with current changes
+ncsync
+
+AIO_flush
+H5Mflush
+HDF5 interface is more flexible because it can be applied to parts of the
+file hierarchy instead of the whole file at once. The SD interface does not
+have this feature, although most of the lower HDF library supports it.
+
+
+
+Close file access without applying recent changes
+ncabort
+
+
+
+How useful is this feature?
+
+
+
+Create new dimension
+ncdimdef
+SDsetdimname
+
+H5Mcreate
+SD interface actually creates dimensions with datasets, this just allows
+naming them
+
+
+
+Get ID of existing dimension
+ncdimid
+SDgetdimid
+
+H5Maccess
+SD interface looks up dimensions by index and the netCDF interface uses
+names, but they are close enough. The HDF5 interface does not currently allow
+access to particular dimensions, only the dataspace as a whole.
+
+
+
+Get size & name of dimension
+ncdiminq
+SDdiminfo
+
+H5Mget_name
+
H5Sget_lrank
Only a rough match
+
+
+
+Rename dimension
+ncdimrename
+SDsetdimname
+
+H5Mset_name
+
+
+
+
+Create a new dataset
+ncvardef
+SDcreate
+AIO_mkarray
+H5Mcreate
+
+
+
+
+Attach to an existing dataset
+ncvarid
+SDselect
+AIO_arr_load
+H5Maccess
+
+
+
+
+Get basic information about a dataset
+ncvarinq
+SDgetinfo
+AIO_arr_get_btype
+
AIO_arr_get_nelmts
AIO_arr_get_nbdims
AIO_arr_get_bdims
AIO_arr_get_slab
H5Dget_info
+All interfaces have different levels of information that they return, some
+use of auxiliary functions is required to get an equivalent amount of information
+
+
+
+Write a single value to a dataset
+ncvarput1
+SDwritedata
+AIO_write
+H5Dwrite
+What is this useful for?
+
+
+
+Read a single value from a dataset
+ncvarget1
+SDreaddata
+AIO_read
+H5Dread
+What is this useful for?
+
+
+
+Write a solid hyperslab of data (i.e. subset) to a dataset
+ncvarput
+SDwritedata
+AIO_write
+H5Dwrite
+
+
+
+
+Read a solid hyperslab of data (i.e. subset) from a dataset
+ncvarget
+SDreaddata
+AIO_read
+H5Dread
+
+
+
+
+Write a general hyperslab of data (i.e. possibly subsampled) to a dataset
+ncvarputg
+SDwritedata
+AIO_write
+H5Dwrite
+
+
+
+
+Read a general hyperslab of data (i.e. possibly subsampled) from a dataset
+ncvargetg
+SDreaddata
+AIO_read
+H5Dread
+
+
+
+
+Rename a dataset variable
+ncvarrename
+
+
+H5Mset_name
+
+
+
+
+Add an attribute to a dataset
+ncattput
+SDsetattr
+
+H5Rattach_oid
+HDF5 requires creating a separate object to attach to a dataset, but it also
+allows objects to be attributes of any other object, even nested.
+
+
+
+Get attribute information
+ncattinq
+SDattrinfo
+
+H5Dget_info
+HDF5 has no specific function for attributes; they are treated like all other
+objects in the file.
+
+
+
+Retrieve attribute for a dataset
+ncattget
+SDreadattr
+
+H5Dread
+HDF5 uses general dataset I/O for attributes.
+
+
+
+Copy attribute from one dataset to another
+ncattcopy
+
+
+
+What is this used for?
+
+
+
+Get name of attribute
+ncattname
+SDattrinfo
+
+H5Mget_name
+
+
+
+
+Rename attribute
+ncattrename
+
+
+H5Mset_name
+
+
+
+
+Delete attribute
+ncattdel
+
+
+H5Mdelete
+This can be faked in current HDF interface with lower-level calls
+
+
+
+Compute # of bytes to store a number-type
+nctypelen
+DFKNTsize
+
+
+Hmm, the HDF5 Datatype interface needs this functionality.
+
+
+
+Indicate that fill-values are to be written to dataset
+ncsetfill
+SDsetfillmode
+
+
+HDF5 Datatype interface should work on this functionality
+
+
+
+Get information about "record" variables (those datasets which share the
+same unlimited dimension)
+ncrecinq
+
+
+
+This should probably be wrapped in a higher layer interface, if it's
+needed for HDF5.
+
+
+
+Get a record from each dataset sharing the unlimited dimension
+ncrecget
+
+
+
+This is somewhat equivalent to reading a vdata with non-interlaced
+fields, only in a dataset oriented way. This should also be wrapped in a
+higher layer interface if it's necessary for HDF5.
+
+
+
+Put a record from each dataset sharing the unlimited dimension
+ncrecput
+
+
+
+This is somewhat equivalent to writing a vdata with non-interlaced
+fields, only in a dataset oriented way. This should also be wrapped in a
+higher layer interface if it's necessary for HDF5.
+
+
+
+Map a dataset's name to an index to reference it with
+
+SDnametoindex
+
+H5Mfind_name
+Equivalent functionality except HDF5 call returns an OID instead of an
+index.
+
+
+
+Get the valid range of values for data in a dataset
+
+SDgetrange
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Release access to a dataset
+
+SDendaccess
+AIO_arr_destroy
+H5Mrelease
+Odd that the netCDF API doesn't have this...
+
+
+
+Set the valid range of data in a dataset
+
+SDsetrange
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Set the label, units, format, etc. of the data values in a dataset
+
+SDsetdatastrs
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Get the label, units, format, etc. of the data values in a dataset
+
+SDgetdatastrs
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Set the label, units, format, etc. of the dimensions in a dataset
+
+SDsetdimstrs
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Get the label, units, format, etc. of the dimensions in a dataset
+
+SDgetdimstrs
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Set the scale of the dimensions in a dataset
+
+SDsetdimscale
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Get the scale of the dimensions in a dataset
+
+SDgetdimscale
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Set the calibration parameters of the data values in a dataset
+
+SDsetcal
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Get the calibration parameters of the data values in a dataset
+
+SDgetcal
+
+
+Easily implemented with attributes at a higher level for HDF5.
+
+
+
+Set the fill value for the data values in a dataset
+
+SDsetfillvalue
+
+
+HDF5 needs something like this, I'm not certain where to put it.
+
+
+
+Get the fill value for the data values in a dataset
+
+SDgetfillvalue
+
+
+HDF5 needs something like this, I'm not certain where to put it.
+
+
+
+Move/Set the dataset to be in an 'external' file
+
+SDsetexternalfile
+
+H5Dset_storage
+HDF5 has simple functions for this, but needs an API for setting up the
+storage flow.
+
+
+
+Move/Set the dataset to be stored using only certain bits from the dataset
+
+SDsetnbitdataset
+
+H5Dset_storage
+HDF5 has simple functions for this, but needs an API for setting up the
+storage flow.
+
+
+
+Move/Set the dataset to be stored in compressed form
+
+SDsetcompress
+
+H5Dset_storage
+HDF5 has simple functions for this, but needs an API for setting up the
+storage flow.
+
+
+
+Search for a dataset attribute with a particular name
+
+SDfindattr
+
+H5Mfind_name
+H5Mwild_search
+HDF5 can handle wildcard searches for this feature.
+
+
+
+Map a run-time dataset handle to a persistent disk reference
+
+SDidtoref
+
+
+I'm not certain this is needed for HDF5.
+
+
+
+Map a persistent disk reference for a dataset to an index in a group
+
+SDreftoindex
+
+
+I'm not certain this is needed for HDF5.
+
+
+
+Determine if a dataset is a 'record' variable (i.e. it has an unlimited dimension)
+
+SDisrecord
+
+
+Easily implemented by querying the dimensionality at a higher level for HDF5.
+
+
+
+Determine if a dataset is a 'coordinate' variable (i.e. it is used as a dimension)
+
+SDiscoord
+
+
+I'm not certain this is needed for HDF5.
+
+
+
+Set the access type (i.e. parallel or serial) for dataset I/O
+
+SDsetaccesstype
+
+
+HDF5 has functions for reading the information about this, but needs a better
+API for setting up the storage flow.
+
+
+
+Set the size of blocks used to store a dataset with unlimited dimensions
+
+SDsetblocksize
+
+
+HDF5 has functions for reading the information about this, but needs a better
+API for setting up the storage flow.
+
+
+
+Sets backward compatibility of dimensions created.
+
+SDsetdimval_comp
+
+
+Unnecessary in HDF5.
+
+
+
+Checks backward compatibility of dimensions created.
+
+SDisdimval_comp
+
+
+Unnecessary in HDF5.
+
+
+
+Move/Set the dataset to be stored in chunked form
+
+SDsetchunk
+
+H5Dset_storage
+HDF5 has simple functions for this, but needs an API for setting up the
+storage flow.
+
+
+
+Get the chunking information for a dataset stored in chunked form
+
+SDgetchunkinfo
+
+H5Dstorage_detail
+
+
+
+
+Read/Write chunks of a dataset using a chunk index
+
+SDreadchunk
+SDwritechunk
+
+I'm not certain that HDF5 needs something like this.
+
+
+
+Tune chunk caching parameters for chunked datasets
+
+SDsetchunkcache
+
+
+HDF5 needs something like this.
+
+
+
+Change some default behavior of the library
+
+
+AIO_defaults
+
+Something like this would be useful in HDF5, to tune I/O pipelines, etc.
+
+
+
+Flush and close all open files
+
+
+AIO_exit
+
+Something like this might be useful in HDF5, although it could be
+ encapsulated with a higher-level function.
+
+
+
+Target an architecture for data-type storage
+
+
+AIO_target
+
+There are some rough parallels with using the data-type in HDF5 to create
+ data-type objects which can be used to write out future datasets.
+
+
+
+Map a filename to a file ID
+
+
+AIO_filename
+H5Mget_name
+
+
+
+
+Get the active directory (where new datasets are created)
+
+
+AIO_getcwd
+
+HDF5 allows attaching to multiple directories (groups) at a time, any of which
+ can have new datasets created within it.
+
+
+
+Change active directory
+
+
+AIO_chdir
+
+Since HDF5 has a slightly different access method for directories (groups),
+ this functionality can be wrapped around calls to H5Gget_oid_by_name.
+
+
+
+Create directory
+
+
+AIO_mkdir
+H5Mcreate
+
+
+
+
+Return detailed information about an object
+
+
+AIO_stat
+H5Dget_info
+H5Dstorage_detail
+Perhaps more information should be provided through another function in
+ HDF5?
+
+
+
+Get "flag" information
+
+
+AIO_getflags
+
+Not required in HDF5.
+
+
+
+Set "flag" information
+
+
+AIO_setflags
+
+Not required in HDF5.
+
+
+
+Get detailed information about all objects in a directory
+
+
+AIO_ls
+H5Gget_content_info_mult
+H5Dget_info
+H5Dstorage_detail
+Only roughly equivalent functionality in HDF5; perhaps more should be
+ added?
+
+
+
+Get base type of object
+
+
+AIO_BASIC
+H5Gget_content_info
+
+
+
+
+Set base type of dataset
+
+
+AIO_arr_set_btype
+H5Mcreate(DATATYPE)
+
+
+
+
+Set dimensionality of dataset
+
+
+AIO_arr_set_bdims
+H5Mcreate(DATASPACE)
+
+
+
+
+Set slab of dataset to write
+
+
+AIO_arr_set_slab
+
+This is similar to the process of creating a dataspace for use when
+ performing I/O on an HDF5 dataset
+
+
+
+Describe chunking of dataset to write
+
+
+AIO_arr_set_chunk
+H5Dset_storage
+
+
+
+
+Describe array index permutation of dataset to write
+
+
+AIO_arr_set_perm
+H5Dset_storage
+
+
+
+
+Create a new dataset with dataspace and datatype information from an
+ existing dataset.
+
+
+AIO_arr_copy
+
+This can be mimicked in HDF5 by attaching to the datatype and dataspace of
+an existing dataset and using the IDs to create new datasets.
+
+
+
+Create a new directory to group objects within
+
+
+AIO_mkgroup
+H5Mcreate(GROUP)
+
+
+
+
+Read name of objects in directory
+
+
+AIO_read_group
+H5Gget_content_info_mult
+
+
+
+
+Add objects to directory
+
+
+AIO_write_group
+H5Ginsert_item_mult
+
+
+
+
+Combine an architecture and numeric type to derive the format's datatype
+
+
+AIO_COMBINE
+
+This is a nice feature to add to HDF5.
+
+
+
+Derive an architecture from the format's datatype
+
+
+AIO_ARCH
+
+This is a nice feature to add to HDF5.
+
+
+
+Derive a numeric type from the format's datatype
+
+
+AIO_PNT
+
+This is a nice feature to add to HDF5.
+
+
+
+Register error handling function for library to call when errors occur
+
+
+AIO_error_handler
+
+This should be added to HDF5.
+HDF5: Disk Format Implementation
+
+
+
+
+
+
+
+ Disk Format Implementation
+
+
+ Disk Format: Level 0 - File Signature and Boot Block
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+
+
HDF5 File Signature (8 bytes)
+
+
+ Version # of Boot Block
+ Version # of Global Free-Space Storage
+ Version # of Object Directory
+ Reserved
+
+
+
+ Version # of Shared Header Message Format
+ Size of Addresses
+ Size of Sizes
+ Reserved (zero)
+
+
+
+ Symbol Table Leaf Node K
+ Symbol Table Internal Node K
+
+
+
+ File Consistency Flags
+
+
+
+ Base Address
+
+
+
+ Address of Global Free-Space Heap
+
+
+
+ End of File Address
+
+
+
+
+ Symbol-Table Entry of the "Root Object"
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ File Signature
+ This field contains a constant value and can be used to
+ quickly identify a file as being an HDF5 file. The
+ constant value is designed to allow easy identification of
+ an HDF5 file and to allow certain types of data corruption
+ to be detected. The file signature of an HDF5 file always
+ contains the following values:
+
+
+
+
+
+
+
+ decimal
+ 137
+ 72
+ 68
+ 70
+ 13
+ 10
+ 26
+ 10
+
+
+
+ hexadecimal
+ 89
+ 48
+ 44
+ 46
+ 0d
+ 0a
+ 1a
+ 0a
+
+
+ ASCII C Notation
+ \211
+ H
+ D
+ F
+ \r
+ \n
+ \032
+ \n
+
+
+ This signature both identifies the file as an HDF5 file
+ and provides for immediate detection of common
+ file-transfer problems. The first two bytes distinguish
+ HDF5 files on systems that expect the first two bytes to
+ identify the file type uniquely. The first byte is
+ chosen as a non-ASCII value to reduce the probability
+ that a text file may be misrecognized as an HDF5 file;
+ also, it catches bad file transfers that clear bit
+ 7. Bytes two through four name the format. The CR-LF
+ sequence catches bad file transfers that alter newline
+ sequences. The control-Z character stops file display
+ under MS-DOS. The final line feed checks for the inverse
+ of the CR-LF translation problem. (This is a direct
+ descendent of the PNG file signature.)
+
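+ The following is a minimal sketch (our own, not part of the
+ HDF5 library) of how an application might check these eight
+ signature bytes:
+
+ #include <stdio.h>
+ #include <string.h>
+
+ static const unsigned char hdf5_signature[8] =
+     {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
+
+ int is_hdf5_file(const char *name)
+ {
+     unsigned char buf[8];
+     FILE *f = fopen(name, "rb");
+     int ok = f != NULL && fread(buf, 1, 8, f) == 8
+              && memcmp(buf, hdf5_signature, 8) == 0;
+     if (f) fclose(f);
+     return ok;
+ }
+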
+
+ Version # of the Boot Block
+ This value is used to determine the format of the
+ information in the boot block. When the format of the
+ information in the boot block is changed, the version #
+ is incremented to the next integer and can be used to
+ determine how the information in the boot block is
+ formatted.
+
+
+
+ Version # of the Global Free-Space Storage
+ This value is used to determine the format of the
+ information in the Global Free-Space Heap. Currently,
+ this is implemented as a B-tree of length/offset pairs
+ to locate free space in the file, but future advances in
+ the file-format could change the method of finding
+ global free-space. When the format of the information
+ is changed, the version # is incremented to the next
+ integer and can be used to determine how the information
+ is formatted.
+
+
+
+ Version # of the Object Directory
+ This value is used to determine the format of the
+ information in the Object Directory. When the format of
+ the information in the Object Directory is changed, the
+ version # is incremented to the next integer and can be
+ used to determine how the information in the Object
+ Directory is formatted.
+
+
+
+ Version # of the Shared Header Message Format
+ This value is used to determine the format of the
+ information in a shared object header message, which is
+ stored in the global small-data heap. Since the format
+ of the shared header messages differ from the private
+ header messages, a version # is used to identify changes
+ in the format.
+
+
+
+ Size of Addresses
+ This value contains the number of bytes used for
+ addresses in the file. The values for the addresses of
+ objects in the file are relative to a base address,
+ usually the address of the boot block signature. This
+ allows a wrapper to be added after the file is created
+ without invalidating the internal offset locations.
+
+
+
+ Size of Sizes
+ This value contains the number of bytes used to store
+ the size of an object.
+
+
+
+ Symbol Table Leaf Node K
+ Each leaf node of a symbol table B-tree will have at
+ least this many entries but not more than twice this
+ many. If a symbol table has a single leaf node then it
+ may have fewer entries.
+
+
+
+ Symbol Table Internal Node K
+ Each internal node of a symbol table B-tree will have
+ at least K pointers to other nodes but not more than 2K
+ pointers. If the symbol table has only one internal
+ node then it might have fewer than K pointers.
+
+
+
+ Bytes per B-Tree Page
+ This value contains the number of bytes used for symbol
+ pairs per page of the B-Trees used in the file. All
+ B-Tree pages have the same size. (For 32-bit file
+ offsets, 340 objects is the maximum per 4KB page, and
+ for 64-bit file offsets, 254 objects will fit per 4KB
+ page. In general, the equation is:
+ <# of objects> = FLOOR((<page size> - <offset size>) /
+ (<symbol size> + <offset size>)) - 1 )
+
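+ As a quick sanity check of that equation, the following
+ sketch (our own helper, assuming 8-byte symbol pairs)
+ reproduces the numbers quoted above: 340 objects for
+ 4-byte offsets and 254 for 8-byte offsets on a 4KB page.
+
+ static long max_btree_objects(long page, long offset, long symbol)
+ {
+     return (page - offset) / (symbol + offset) - 1;
+ }
+
+ /* max_btree_objects(4096, 4, 8) == 340  (32-bit offsets)
+    max_btree_objects(4096, 8, 8) == 254  (64-bit offsets) */
+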
+
+ File Consistency Flags
+ This value contains flags to indicate information
+ about the consistency of the information contained
+ within the file. Currently, the following bit flags are
+ defined: bit 0 set indicates that the file is opened for
+ write-access and bit 1 set indicates that the file has
+ been verified for consistency and is guaranteed to be
+ consistent with the format defined in this document.
+ Bits 2-31 are reserved for future use. Bit 0 should be
+ set as the first action when a file is opened for write
+ access and should be cleared only as the final action
+ when closing a file. Bit 1 should be cleared during
+ normal access to a file and only set after the file's
+ consistency is guaranteed by the library or a
+ consistency utility.
+
+
+
+ Base Address
+ This is the absolute file address of the first byte of
+ the hdf5 data within the file. Unless otherwise noted,
+ all other file addresses are relative to this base
+ address.
+
+
+
+ Address of Global Free-Space Heap
+ This value contains the relative address of the B-Tree
+ used to manage the blocks of data which are unused in the
+ file currently. The free-space heap is used to manage the
+ blocks of bytes at the file level which become unused when
+ objects are moved within the file.
+
+
+
+ End of File Address
+ This is the relative file address of the first byte past
+ the end of all HDF5 data. It is used to determine if a
+ file has been accidentally truncated and as an address where
+ file memory allocation can occur if the free list is not
+ used.
+
+
+ Symbol-Table Entry of the Root Object
+ This symbol-table entry (described later in this
+ document) refers to the entry point into the group
+ graph. If the file contains a single object, then that
+ object can be the root object and no groups are used.
+ Disk Format: Level 1A - B-link Trees
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Node Signature
+
+
+ Node Type
+ Node Level
+ Entries Used
+
+
+ Address of Left Sibling
+
+
+ Address of Right Sibling
+
+
+ Key 0 (variable size)
+
+
+ Address of Child 0
+
+
+ Key 1 (variable size)
+
+
+ Address of Child 1
+
+
+ ...
+
+
+ Key 2K (variable size)
+
+
+ Address of Child 2K
+
+
+ Key 2K+1 (variable size)
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Node Signature
+ The value ASCII 'TREE' is used to indicate the
+ beginning of a B-link tree node. This gives file
+ consistency checking utilities a better chance of
+ reconstructing a damaged file.
+
+
+
+ Node Type
+ Each B-link tree points to a particular type of data.
+ This field indicates the type of data as well as
+ implying the maximum degree K of the tree and
+ the size of each Key field.
+
+
+
+
+
+
+
+ Node Level
+ The node level indicates the level at which this node
+ appears in the tree (leaf nodes are at level zero). Not
+ only does the level indicate whether child pointers
+ point to sub-trees or to data, but it can also be used
+ to help file consistency checking utilities reconstruct
+ damaged trees.
+
+
+
+ Entries Used
+ This determines the number of children to which this
+ node points. All nodes of a particular type of tree
+ have the same maximum degree, but most nodes will point
+ to fewer than that number of children. The valid child
+ pointers and keys appear at the beginning of the node
+ and the unused pointers and keys appear at the end of
+ the node. The unused pointers and keys have undefined
+ values.
+
+
+
+ Address of Left Sibling
+ This is the file address of the left sibling of the
+ current node relative to the boot block. If the current
+ node is the left-most node at this level then this field
+ is the undefined address (all bits set).
+
+
+
+ Address of Right Sibling
+ This is the file address of the right sibling of the
+ current node relative to the boot block. If the current
+ node is the right-most node at this level then this
+ field is the undefined address (all bits set).
+
+
+
+ Keys and Child Pointers
+ Each tree has 2K+1 keys with 2K
+ child pointers interleaved between the keys. The number
+ of keys and child pointers actually containing valid
+ values is determined by the `Entries Used' field. If
+ that field is N then the B-link tree contains
+ N child pointers and N+1 keys.
+
+
+
+ Key
+ The format and size of the key values is determined by
+ the type of data to which this tree points. The keys are
+ ordered and are boundaries for the contents of the child
+ pointer. That is, the key values represented by child
+ N fall between Key N and Key
+ N+1. Whether the interval is open or closed on
+ each end is determined by the type of data to which the
+ tree points.
+
+
+ Address of Children
+ The tree node contains file addresses of subtrees or
+ data depending on the node level (0 implies data
+ addresses).
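+ As an informal illustration (our own sketch, not library
+ code), the fixed-size part of a B-link tree node described
+ above could be modeled in C as follows; the sibling address
+ width really depends on the boot block's "Size of Addresses"
+ field, which we assume here to be 8 bytes.
+
+ #include <stdint.h>
+
+ typedef struct {
+     char     signature[4];   /* the ASCII bytes 'TREE'          */
+     uint8_t  node_type;      /* type of data the tree points to */
+     uint8_t  node_level;     /* zero for leaf nodes             */
+     uint16_t entries_used;   /* number of valid child pointers  */
+     uint64_t left_sibling;   /* all bits set if left-most       */
+     uint64_t right_sibling;  /* all bits set if right-most      */
+     /* interleaved keys and child addresses follow on disk */
+ } btree_node_header;
+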
+ Disk Format: Level 1B - Symbol Table
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Node Signature
+
+
+ Version Number
+ Reserved for Future Use
+ Number of Symbols
+
+
+
+
Symbol Table Entries
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Node Signature
+ The value ASCII 'SNOD' is used to indicate the
+ beginning of a symbol table node. This gives file
+ consistency checking utilities a better chance of
+ reconstructing a damaged file.
+
+
+
+ Version Number
+ The version number for the symbol table node. This
+ document describes version 1.
+
+
+
+ Number of Symbols
+ Although all symbol table nodes have the same length,
+ most contain fewer than the maximum possible number of
+ symbol entries. This field indicates how many entries
+ contain valid data. The valid entries are packed at the
+ beginning of the symbol table node while the remaining
+ entries contain undefined values.
+
+
+ Symbol Table Entries
+ Each symbol has an entry in the symbol table node.
+ The format of the entry is described below.
+
+ Disk Format: Level 1C - Symbol-Table Entry
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Name Offset (<size> bytes)
+
+
+ Object Header Address
+
+
+ Symbol-Type
+
+
+
+
Scratch-pad Space (24 bytes)
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Name Offset
+ This is the byte offset into the symbol table local
+ heap for the name of the symbol. The name is null
+ terminated.
+
+
+
+ Object Header Address
+ Every object has an object header which serves as a
+ permanent home for the object's meta data. In addition
+ to appearing in the object header, the meta data can be
+ cached in the scratch-pad space.
+
+
+
+ Symbol-Type
+ The symbol type is determined from the object header.
+ It also determines the format for the scratch-pad space.
+ The value zero indicates that no object header meta data
+ is cached in the symbol table entry.
+
+
+
+
+
+
+ Scratch-Pad Space
+ This space is used for different purposes, depending
+ on the value of the Symbol Type field. Any meta-data
+ about a dataset object represented in the scratch-pad
+ space is duplicated in the object header for that
+ dataset. Furthermore, no data is cached in the symbol
+ table entry scratch-pad space if the object header for
+ the symbol table entry has a link count greater than
+ one.
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Address of B-tree
+
+
+ Address of Name Heap
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Address of B-tree
+ This is the file address for the symbol table's
+ B-tree.
+
+
+ Address of Name Heap
+ This is the file address for the symbol table's local
+ heap that stores the symbol names.
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Offset to Link Value
+
+
+
+
+
+ Field Name
+ Description
+
+
+ Offset to Link Value
+ The value of a symbolic link (that is, the name of the
+ thing to which it points) is stored in the local heap.
+ This field is the 4-byte offset into the local heap for
+ the start of the link value, which is null terminated.
+ Disk Format: Level 1D - Local Heaps
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Heap Signature
+
+
+
+ Reserved (zero)
+
+
+
+ Data Segment Size
+
+
+
+ Offset to Head of Free-list (<size> bytes)
+
+
+ Address of Data Segment
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Heap Signature
+ The value ASCII 'HEAP' is used to indicate the
+ beginning of a heap. This gives file consistency
+ checking utilities a better chance of reconstructing a
+ damaged file.
+
+
+
+ Data Segment Size
+ The total amount of disk memory allocated for the heap
+ data. This may be larger than the amount of space
+ required by the object stored in the heap. The extra
+ unused space holds a linked list of free blocks.
+
+
+
+ Offset to Head of Free-list
+ This is the offset within the heap data segment of the
+ first free block (or all 0xff bytes if there is no free
+ block). The free block contains <size> bytes that
+ are the offset of the next free chunk (or all 0xff bytes
+ if this is the last free chunk) followed by <size>
+ bytes that store the size of this free chunk.
+
+
+ Address of Data Segment
+ The data segment originally starts immediately after
+ the heap header, but if the data segment must grow as a
+ result of adding more objects, then the data segment may
+ be relocated to another part of the file.
+ Disk Format: Level 1E - Global Heap
+
+
+
+
+
+
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Magic Number
+
+ Version
+ Reserved
+
+
+
+
+
+ Collection Size
+
+
+
+
+
Object 1
+
+
+
+
Object 2
+
+
+
+
...
+
+
+
+
Object N
+
+
+
Object 0 (free space)
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Magic Number
+ The magic number for global heap collections is the
+ four bytes `G', `C', `O', `L'.
+
+
+
+ Version
+ Each collection has its own version number so that new
+ collections can be added to old files. This document
+ describes version zero of the collections.
+
+
+
+ Collection Size
+ This is the size in bytes of the entire collection
+ including this field. The default (and minimum)
+ collection size is 4096 bytes which is a typical file
+ system block size and which allows for 170 16-byte heap
+ objects plus their overhead.
+
+
+
+ Object i, for positive i
+ The objects are stored in any order with no intervening
+ unused space.
+
+ Object 0
+ Object zero, when present, represents the free space in
+ the collection. Free space always appears at the end of
+ the collection. If the free space is too small to store
+ the header for object zero (described below) then the
+ header is implied.
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Object ID
+ Reference Count
+
+
+
+ Object Total Size
+
+
+
+
Object Data
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Object ID
+ Each object has a unique identification number within a
+ collection. The identification numbers are chosen so that
+ new objects have the smallest value possible with the
+ exception that the identifier `0' always refers to the
+ object which represents all free space within the
+ collection.
+
+
+
+ Reference Count
+ All heap objects have a reference count field. An
+ object which is referenced from some other part of the
+ file will have a positive reference count. The reference
+ count for Object zero is always zero.
+
+
+
+ Object Total Size
+ This is the total size in bytes of the object. It
+ includes all fields listed in this table.
+
+
+ Object Data
+ The object data is treated as a one-dimensional array
+ of bytes to be interpreted by the caller.
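+ Read informally (our own sketch, with field widths as we
+ understand the layout above), a global heap object header
+ might look like:
+
+ #include <stdint.h>
+
+ typedef struct {
+     uint16_t object_id;   /* zero names the free-space object    */
+     uint16_t ref_count;   /* zero for the free-space object      */
+     uint32_t total_size;  /* whole object, including this header */
+     /* object data (total_size minus header size) follows */
+ } gheap_object_header;
+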
+ Disk Format: Level 1F - Free-Space
+ Index (NOT FULLY DEFINED)
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Free-Space Heap Signature
+
+ B-Tree Left-Link Offset
+
+
+
+ Length of Free-Block #1
+
+ Offset of Free-Block #1
+
+ ...
+
+ Length of Free-Block #n
+
+ Offset of Free-Block #n
+ "High" Offset
+
+ Right-Link Offset
+
+
+
+
+
+ Disk Format: Level 2 - Data Objects
+
+
+ Disk Format: Level 2a - Data Object Headers
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Version # of Object Header
+ Alignment of Object Header Messages
+ Number of Header Messages
+
+
+ Object Reference Count
+
+
+
+
Total Object-Header Size
+
+ Header Message Type #1
+ Size of Header Message Data #1
+
+
+ Flags
+ Reserved
+
+
+ Header Message Data #1 (variable size)
+
+
+ ...
+
+ Header Message Type #n
+ Size of Header Message Data #n
+
+
+ Flags
+ Reserved
+
+
+ Header Message Data #n (variable)
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Version # of the object header
+ This value is used to determine the format of the
+ information in the object header. When the format of the
+ information in the object header is changed, the version #
+ is incremented and can be used to determine how the
+ information in the object header is formatted.
+
+
+
+ Alignment of object header messages
+ This value is used to determine the byte-alignment of
+ messages in the object header. Typically set to 4, which
+ aligns new messages on a 4-byte boundary in the object
+ header.
+
+
+
+ Number of header messages
+ This value determines the number of messages listed in
+ this object header. This provides a fast way for software
+ to prepare storage for the messages in the header.
+
+
+
+ Object Reference Count
+ This value specifies the number of references to this
+ object within the current file. References to the
+ data-object from external files are not tracked.
+
+
+
+ Total Object-Header Size
+ This value specifies the total number of bytes of header
+ message data following this length field for the current
+ message as well as any continuation data located elsewhere
+ in the file.
+
+
+
+ Header Message Type
+ The header message type specifies the type of
+ information included in the header message data following
+ the type along with a small amount of other information.
+ Bit 15 of the message type is set if the message is
+ constant (constant messages cannot be changed since they
+ may be cached in symbol table entries throughout the
+ file). The header message types for the pre-defined
+ header messages will be included in further discussion
+ below.
+
+
+
+ Size of Header Message Data
+ This value specifies the number of bytes of header
+ message data following the header message type and length
+ information for the current message.
+
+ Flags
+ This is a bit field with the following definition:
+
+
+
+
+ 0
+ 1
+ 2-7
+
+
+ Header Message Data
+ The format and length of this field is determined by the
+ header message type and size respectively. Some header
+ message types do not require any data and this information
+ can be eliminated by setting the length of the message to
+ zero.
+
+ Name: NIL
+ Type: 0x0000
+ Length: varies
+ Status: Optional, may be repeated.
+ Purpose and Description: The NIL message is used to
+ indicate a message
+ which is to be ignored when reading the header messages for a data object.
+ [Probably one which has been deleted for some reason.]
+ Format of Data: Unspecified.
+ Examples: None.
+
+
+
+ Name: Simple Data Space
+
+ Type: 0x0001
+ Length: varies
+ Status: One of the Simple Data Space or
+ Data-Space messages is required (but not both) and may
+ not be repeated.
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Dimensionality
+
+ Dimension Flags
+
+ Dimension Size #1 (<size> bytes)
+
+ ...
+
+ Dimension Size #n (<size> bytes)
+
+ Dimension Maximum #1 (<size> bytes)
+
+ ...
+
+ Dimension Maximum #n (<size> bytes)
+
+ Permutation Index #1
+
+ ...
+
+ Permutation Index #n
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Dimensionality
+ This value is the number of dimensions that the data
+ object has.
+
+
+
+ Dimension Flags
+ This field is used to store flags to indicate the
+ presence of parts of this message. Bit 0 (counting from
+ the right) is used to indicate that maximum dimensions are
+ present. Bit 1 is used to indicate that permutation
+ indices are present for each dimension.
+
+
+
+ Dimension Size #n (<size> bytes)
+ This value is the current size of the dimension of the
+ data as stored in the file. The first dimension stored in
+ the list of dimensions is the slowest changing dimension
+ and the last dimension stored is the fastest changing
+ dimension.
+
+
+
+ Dimension Maximum #n (<size> bytes)
+ This value is the maximum size of the dimension of the
+ data as stored in the file. This value may be the special
+ value <UNLIMITED> (0xffffffff) which indicates that
+ the data may expand along this dimension indefinitely. If
+ these values are not stored, the maximum value of each
+ dimension is assumed to be the same as the current size
+ value.
+
+
+ Permutation Index #n (4 bytes)
+ This value is the index permutation used to map
+ each dimension from the canonical representation to an
+ alternate axis for each dimension. If these values are
+ not stored, the first dimension stored in the list of
+ dimensions is the slowest changing dimension and the last
+ dimension stored is the fastest changing dimension.
+ Examples
+
+
+
+
+ Name: Data-Space (Fiber Bundle?)
+ Type: 0x0002
+ Length: varies
+
+ Status: One of the Simple Dimensionality or
+ Data-Space messages is required (but not both) and may
+ not be repeated.
Purpose and Description: The
+ Data-Space message describes space that the dataset is
+ mapped onto in a more comprehensive way than the Simple
+ Dimensionality message is capable of handling. The
+ data-space of a dataset encompasses the type of coordinate system
+ used to locate the dataset's elements as well as the structure and
+ regularity of the coordinate system. The data-space also
+ describes the number of dimensions which the dataset inhabits as
+ well as a possible higher dimensional space in which the dataset
+ is located within.
+
+
+ Format of Data:
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Mesh Type
+
+ Logical Dimensionality
+
+
+ [Comment: need some way to handle different orientations of the
+ dataset data-space
+ within the embedded data-space]
+
+
+ The mesh type value is broken up as follows:
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Mesh Embedding
+ Coordinate System
+ Structure
+ Regularity
+
+
+
+
+
+
+
+
+
+
+
+ All of the above grid types can be embedded within another
+ data-space.
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Embedded Dimensionality
+
+ Embedded Dimension Size #1
+
+ ...
+ Embedded Dimension Size #n
+
+ Embedded Origin Location #1
+
+ ...
+ Embedded Origin Location #n
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ Logical Dimension Size #1
+
+ Logical Dimension Maximum #1
+
+ ...
+ Logical Dimension Size #n
+
+ Logical Dimension Maximum #n
+
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ # of Grid Points in Dimension #1
+
+ ...
+ # of Grid Points in Dimension #n
+
+ Data-Type of Grid Point Locations
+
+ Location of Grid Points in Dimension #1
+
+ ...
+ Location of Grid Points in Dimension #n
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+ # of Grid Points
+
+ Data-Type of Grid Point Locations
+
+ Grid Point Locations
+
+ ...
+
+ Examples:
+ Need some good examples; this is complex!
+
+
+
+ Name: Data Type
+
+ Type: 0x0003
+ Length: variable
+ Status: One required per dataset
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Type Class
+ Class Bit Field
+
+
+
+ Size in Bytes (4 bytes)
+
+
+
+
Properties
+
+
+
+
+ Bits
+ Meaning
+
+
+
+ 0
+ Byte Order. If zero, byte order is little-endian;
+ otherwise, byte order is big endian.
+
+
+
+ 1, 2
+ Padding type. Bit 1 is the lo_pad type and bit 2
+ is the hi_pad type. If a datum has unused bits at either
+ end, then the lo_pad or hi_pad bit is copied to those
+ locations.
+
+
+
+ 3
+ Signed. If this bit is set then the fixed-point
+ number is in 2's complement form.
+
+
+ 4-23
+ Reserved (zero).
+
+
+
+
+
+ Byte
+ Byte
+ Byte
+ Byte
+
+
+ Bit Offset
+ Bit Precision
+
+
+
+
+
+ Bits
+ Meaning
+
+
+
+ 0
+ Byte Order. If zero, byte order is little-endian;
+ otherwise, byte order is big endian.
+
+
+
+ 1, 2, 3
+ Padding type. Bit 1 is the low bits pad type, bit 2
+ is the high bits pad type, and bit 3 is the internal bits
+ pad type. If a datum has unused bits at either end or between
+ the sign bit, exponent, or mantissa, then the value of bit
+ 1, 2, or 3 is copied to those locations.
+
+
+
+ 4-5
+ Normalization. The value can be 0 if there is no
+ normalization, 1 if the most significant bit of the
+ mantissa is always set (except for 0.0), and 2 if the most
+ significant bit of the mantissa is not stored but is
+ implied to be set. The value 3 is reserved and will not
+ appear in this field.
+
+
+
+ 6-7
+ Reserved (zero).
+
+
+
+ 8-15
+ Sign. This is the bit position of the sign
+ bit.
+
+
+
+ 16-23
+ Reserved (zero).
+
+
+
+
+
+ Byte
+ Byte
+ Byte
+ Byte
+
+
+
+ Bit Offset
+ Bit Precision
+
+
+
+ Exponent Location
+ Exponent Size in Bits
+ Mantissa Location
+ Mantissa Size in Bits
+
+
+ Exponent Bias
+
+
+
+
+
+ Bits
+ Meaning
+
+
+
+ 0-15
+ Number of Members. This field contains the number
+ of members defined for the compound data type. The member
+ definitions are listed in the Properties field of the data
+ type message.
+
+
+ 16-23
+ Reserved (zero).
+
+
+
+
+
+ Byte
+ Byte
+ Byte
+ Byte
+
+
+
+
+
Name (null terminated, multiple of
+ four bytes)
+
+
+ Byte Offset of Member in Compound Instance
+
+
+
+ Dimensionality
+ reserved
+
+
+
+ Size of Dimension 0 (optional)
+
+
+
+ Size of Dimension 1 (optional)
+
+
+
+ Size of Dimension 2 (optional)
+
+
+
+ Size of Dimension 3 (optional)
+
+
+
+ Dimension Permutation
+
+
+
+
+
Member Type Message
+ Name: Reserved - Not Assigned
+ Yet
+ Type: 0x0004
+ Length: N/A
+ Status: N/A
+
+
+
+ Name: Reserved - Not Assigned
+ Yet
+ Type: 0x0005
+ Length: N/A
+ Status: N/A
+
+
+
+
+ Name: Data Storage - Compact
+
+ Type: 0x0006
+ Length: varies
+ Status: Optional, may not be repeated.
+
+ Examples:
+ [very straightforward]
+
+
+ Name: Data Storage -
+ External Data Files
+ Type: 0x0007
+ Length: varies
+ Status: Optional, may not be repeated.
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+
+
Heap Address
+
+
+ Allocated Slots
+ Used Slots
+
+
+
+ Reserved
+
+
+
+
Slot Definitions...
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Heap Address
+ This is the address of a local name heap which contains
+ the names for the external files. The name at offset zero
+ in the heap is always the empty string.
+
+
+
+ Allocated Slots
+ The total number of slots allocated in the message. Its
+ value must be at least as large as the value contained in
+ the Used Slots field.
+
+
+
+ Used Slots
+ The number of initial slots which contain valid
+ information. The remaining slots are zero filled.
+
+
+
+ Reserved
+ This field is reserved for future use.
+
+
+ Slot Definitions
+ The slot definitions are stored in order according to
+ the array addresses they represent. If more slots have
+ been allocated than what has been used then the defined
+ slots are all at the beginning of the list.
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+
+
Name Offset (<size> bytes)
+
+
+
+
File Offset (<size> bytes)
+
+
+
Size
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Name Offset (<size> bytes)
+ The byte offset within the local name heap for the name
+ of the file. File names are stored as a URL which has a
+ protocol name, a host name, a port number, and a file
+ name:
+
+ protocol:port//host/file.
+ If the protocol is omitted then "file:" is assumed. If
+ the port number is omitted then a default port for that
+ protocol is used. If both the protocol and the port
+ number are omitted then the colon can also be omitted. If
+ the double slash and host name are omitted then
+ "localhost" is assumed. The file name is the only
+ mandatory part, and if the leading slash is missing then
+ it is relative to the application's current working
+ directory (the use of relative names is not
+ recommended).
+
+
+ File Offset (<size> bytes)
+ This is the byte offset to the start of the data in the
+ specified file. For files that contain data for a single
+ dataset this will usually be zero.
+
+
+ Size
+ This is the total number of bytes reserved in the
+ specified file for raw data storage. For a file that
+ contains exactly one complete dataset which is not
+ extendable, the size will usually be the exact size of the
+ dataset. However, by making the size larger one allows
+ HDF5 to extend the dataset. The size can be set to a value
+ larger than the entire file since HDF5 will read zeros
+ past the end of the file without failing.
+
+ Name: Data Storage - Layout
+
+ Type: 0x0008
+ Length: varies
+ Status: Required for datasets, may not be repeated.
+
+
+
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+
+
Address
+
+
+ Dimensionality
+ Layout Class
+ Reserved
+
+
+
+ Reserved (4-bytes)
+
+
+
+ Dimension 0 (4-bytes)
+
+
+
+ Dimension 1 (4-bytes)
+
+
+ ...
+
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Address
+ For contiguous storage, this is the address of the first
+ byte of storage. For chunked storage this is the address
+ of the B-tree that is used to look up the addresses of the
+ chunks.
+
+
+
+ Dimensionality
+ An array has a fixed dimensionality. This field
+ specifies the number of dimension size fields later in the
+ message.
+
+
+
+ Layout Class
+ The layout class specifies how the other fields of the
+ layout message are to be interpreted. A value of one
+ indicates contiguous storage while a value of two
+ indicates chunked storage. Other values will be defined
+ in the future.
+
+
+ Dimensions
+ For contiguous storage the dimensions define the entire
+ size of the array while for chunked storage they define
+ the size of a single chunk.
+
+ Name: Reserved - Not Assigned Yet
+ Type: 0x0009
+ Length: N/A
+ Status: N/A
+ Purpose and Description: N/A
+ Format of Data: N/A
+
+
+ Name: Reserved - Not Assigned Yet
+ Type: 0x000A
+ Length: N/A
+ Status: N/A
+ Purpose and Description: N/A
+ Format of Data: N/A
+
+
+ Name: Data Storage - Compressed
+ Type: 0x000B
+ Length: varies
+ Status: Optional, may not be repeated.
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Method
+ Flags
+ Client Data Size
+
+
+
+
Client Data
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Method
+ The compression method is a value between zero and 255,
+ inclusive, that is used as an index into a compression
+ method lookup table. The value zero indicates no
+ compression. The values one through 15, inclusive, are
+ reserved for methods defined by NCSA. All other values
+ are user-defined compression methods.
+
+
+
+ Flags
+ Eight bits of flags which are passed to the compression
+ algorithm. Their meaning depends on the compression
+ method.
+
+
+
+ Client Data Size
+ The size in bytes of the optional Client Data
+ field.
+
+
+ Client Data
+ Additional information needed by the compression method
+ can be stored in this field. The data will be passed to
+ the compression algorithm as a void pointer.
+
+
+
+
+
+
+ 0: No compression. The blocks of data are stored in
+ their raw format.
+
+ 1: Deflation. This is the same algorithm used by
+ GNU gzip, which is a combination Huffman and LZ77
+ dictionary encoder. The libz library, version 1.1.2 or
+ later, must be available.
+
+ 2: Run-length encoding. Not implemented yet.
+
+ 3: Adaptive Huffman. Not implemented yet.
+
+ 4: Adaptive arithmetic. Not implemented yet.
+
+ 5: LZ78 dictionary encoding. Not implemented yet.
+
+ 6: Adaptive Lempel-Ziv. Similar to UNIX compress.
+ Not implemented yet.
+
+ 7-15: Reserved for future use.
+
+ 16-255: User-defined.
+
+ Name: Attribute List
+ Type: 0x000C
+ Length: varies
+ Status: Optional, may be repeated.
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Attribute List Flags
+
+
+ # of Simple Attributes
+
+
+ Simple Attribute #1 Name Offset
+
+
+ Simple Attribute #1 Data-Type
+
+
+ Simple Attribute #1 Rank
+
+
+ Simple Attribute #1 Dim #1 Size
+
+
+ Simple Attribute #1 Dim #2 Size
+
+
+ Simple Attribute #1 Dim #3 Size
+
+
+ Simple Attribute #1 Dim #4 Size
+
+
+ Simple Attribute #1 Data Offset
+
+
+ ...
+
+ Simple Attribute #n Name Offset
+
+
+ Simple Attribute #n Data-Type
+
+
+ Simple Attribute #n Rank
+
+
+ Simple Attribute #n Dim #1 Size
+
+
+ Simple Attribute #n Dim #2 Size
+
+
+ Simple Attribute #n Dim #3 Size
+
+
+ Simple Attribute #n Dim #4 Size
+
+
+ Simple Attribute #n Data Offset
+
+
+ # of Complex Attributes
+
+
+ Pointer to Complex Attribute #1
+
+
+ ...
+
+ Pointer to Complex Attribute #n
+
+
+
+
+
+ Examples:
+ [Comment: need examples.]
+
+
+ Name: Object Name
+ Type: 0x000D
+ Length: varies
+ Status: Optional [required?], may not be repeated.
+ Purpose and Description: The object name is designed to be a short
+ description of the instance of the data object (the class may be a short
+ description of the "type" of the object). An object name is a sequence of
+ non-null (no '\0') ASCII characters with no other formatting included by the
+ library.
+ Format of Data:The data for the object name is just a sequence of ASCII
+ characters with no special formatting.
+
+
+ Name: Object Modification Date & Time
+ Type: 0x000E
+ Length: fixed
+ Status: Required?, may not be repeated.
+ Purpose and Description: The object modification date and time is a
+ timestamp which indicates (using ISO8601 date and time format) the last
+ modification of a data object.
+ Format of Data:
+ The date is represented as a fixed length ASCII string according to the
+ "complete calendar date representation, without hyphens" listed in the ISO8601
+ standard.
+ The time of day is represented as a fixed length ASCII string according
+ to the "complete local time of day representation, without hyphens"
+ listed in the ISO8601 standard.
+
+ Examples:
+ "February 14, 1993, 1:10pm and 30 seconds" is represented as "19930214131030" in
+ the ISO standard format.
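+ As an informal illustration (our own sketch, using only the
+ standard C library), the timestamp format above can be
+ produced with strftime:
+
+ #include <stdio.h>
+ #include <time.h>
+
+ void print_mod_time(time_t t)
+ {
+     char buf[15]; /* YYYYMMDDhhmmss plus the terminating NUL */
+     strftime(buf, sizeof buf, "%Y%m%d%H%M%S", localtime(&t));
+     printf("%s\n", buf); /* e.g. 19930214131030 */
+ }
+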
+
+
+ Name: Shared Object Message
+ Type: 0x000F
+ Length: 4 Bytes
+ Status: Optional, may be repeated.
+
+ Purpose and Description: A message that is shared among several
+ object headers is flagged (with the H5O_FLAG_SHARED
+ bit). The message body in the
+ object header will be that of a Shared Object message defined
+ here and not that of the pointed-to message.
+
+
+
+
+
+
+ byte
+ byte
+ byte
+ byte
+
+
+
+ Flags
+
+
+
+
Pointer
+
+
+
+
+ Field Name
+ Description
+
+
+
+ Flags
+ The Shared Object message is a pointer to a shared
+ message. The actual shared message can appear in either
+ the global heap or in some other object header and this
+ field specifies which form is used. If the value is zero
+ then the actual message is the first such message in some
+ other object header; otherwise the actual message is
+ stored in the global heap.
+
+
+ Pointer
+ This field points to the actual message. The format of
+ the pointer depends on the value of the Flags field. If
+ the actual message is in the global heap then the pointer
+ is the file address of the global heap collection that
+ holds the message, and a four-byte index into that
+ collection. Otherwise the pointer is a symbol table entry
+ that points to some other object header.
+
+Name: Object Header Continuation
+Type: 0x0010
+Length: fixed
+Status: Optional, may be repeated.
+Purpose and Description: The object header continuation is the location
+in the file of more header messages for the current data object. This can be
+used when header blocks are large, or likely to change over time.
+Format of Data:
+
+
+ byte
+byte
+byte
+byte
+
+
+ Header Continuation Offset
+
+ Header Continuation Length
+
+
+
+
+
+Examples:
+ [straightforward]
+
+
+Name: Symbol Table Message
+Type: 0x0011
+Length: fixed
+Status: Required for symbol tables, may not be repeated.
+Purpose and Description: Each symbol table has a B-tree and a
+name heap which are pointed to by this message.
+Format of data:
+
+
+
+ byte
+byte
+byte
+byte
+
+
+ B-Tree Address
+
+
+ Heap Address
+
+
+
+
+
+Disk Format: Level 2b - Shared Data Object Headers
+
+
+
+
+ byte
+byte
+byte
+byte
+
+
+ Reference Count of Shared Header Message
+
+
+
Shared Object Header Message
+
+
+
+
+
+Disk Format: Level 2c - Data Object Data Storage
+
+Quincey Koziol
+Robb Matzke
+
+Last modified: Mon Jun 1 21:44:38 EDT 1998
+
+
+
diff --git a/doc/html/H5.intro.html b/doc/html/H5.intro.html
new file mode 100644
index 0000000..e7d5a50
--- /dev/null
+++ b/doc/html/H5.intro.html
@@ -0,0 +1,997 @@
+
+
+
+
+What is the HDF5 prototype?
+
+
+
Limitations of the current prototype
+
+
+HDF5 files are organized in a hierarchical structure, with two primary structures: "groups" and "datasets."
+Working with groups and group members is similar in many ways to working with directories and files in UNIX. As with UNIX directories and files, objects in an HDF5 file are often described by giving their full path names. "/" signifies the root group. "/foo" signifies a member of the root group called "foo." "/foo/zoo" signifies a member of the group "foo," which in turn is a member of the root group.
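+As an informal illustration (a sketch of ours, not from the original text; "file" stands for an already-open file handle, and H5Gcreate's final argument is a size hint, left at 0 for the default), the groups named above could be created like this:
+hid_t foo = H5Gcreate(file, "/foo", 0);
+hid_t zoo = H5Gcreate(file, "/foo/zoo", 0);
+H5Gclose(zoo);
+H5Gclose(foo);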
+Any HDF5 group or dataset may have an associated attribute list. An HDF5 attribute is a user-defined HDF5 structure that provides extra information about an HDF5 object. Attributes are described in more detail below. (Note: attributes are not supported in the current prototype.)
+An HDF5 group is a structure containing zero or more HDF5 objects. A group has two parts: a group header, which contains the group's name and attributes, and a group symbol table, which lists the HDF5 objects that belong to the group.
+A dataset is stored in a file in two parts: a header and a data array.
+The header contains information that is needed to interpret the array portion of the dataset, as well as metadata, or pointers to metadata, that describes or annotates the dataset. Header information includes the name of the object, its dimensionality, its number-type, information about how the data itself is stored on disk, and other information used by the library to speed up access to the dataset or maintain the file's integrity.
+There are four essential classes of information in any header: name, datatype, dataspace, and storage layout:
+Name. A dataset name is a sequence of alphanumeric ASCII characters.
+Datatype. HDF5 allows one to define many different kinds of datatypes. There are two basic categories of datatypes: "atomic" types and "compound" types. Atomic types are those that are not decomposed at the datatype interface level, such as integers and floats. Compound types are made up of atomic types.
+Atomic datatypes include integers and floating-point numbers. Each atomic type belongs to a particular class and has several properties: size, order, precision, and offset. In this introduction, we consider only a few of these properties.
+Atomic datatypes include integer, float, date and time, string, bit field, and opaque. (Note: Only integer and float classes are available in the current implementation.)
+Properties of integer types include size, order (endian-ness), and signed-ness (signed/unsigned).
+Properties of float types include the size and location of the exponent and mantissa, and the location of the sign bit.
+The datatypes that are supported in the current implementation are the predefined types listed later in this introduction.
+A compound datatype is one in which a collection of simple datatypes is represented as a single unit, similar to a "struct" in C. The parts of a compound datatype are called members. The members of a compound datatype may be of any datatype, including another compound datatype. It is possible to read members from a compound type without reading the whole type.
+Dataspace. A dataset dataspace describes the dimensionality of the dataset. The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extendible (i.e. they can grow larger).
+Properties of a dataspace consist of the rank (number of dimensions) of the data array, the actual sizes of the dimensions of the array, and the maximum sizes of the dimensions of the array. For a fixed-dimension dataset, the actual size is the same as the maximum size of a dimension. When a dimension is unlimited, the maximum size is set to the value H5S_UNLIMITED. (An example below shows how to create extendible datasets.)
+A dataspace can also describe portions of a dataset, making it possible to do partial I/O (hyperslab) operations.
+Since I/O operations have two end-points, the raw data transfer functions require two dataspace arguments: one describes the application memory dataspace or subset thereof, and the other describes the file dataspace or subset thereof.
+Storage layout. The HDF5 format makes it possible to store data in a variety of ways. The default storage layout format is contiguous, meaning that data is stored in the same linear way that it is organized in memory. Two other storage layout formats are currently defined for HDF5: compact, and chunked. In the future, other storage layouts may be added.
+Compact storage is used when the amount of data is small and can be stored directly in the object header. (Note: Compact storage is not supported in this prototype.)
+Chunked storage involves dividing the dataset into equal-sized "chunks" that are stored separately. Chunking has three important benefits.
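+As an informal illustration (our sketch; H5Pcreate and H5Pset_chunk are assumed to behave as in later HDF5 releases, and the handles file, datatype, dataspace, and DATASETNAME come from the examples below), a chunked dataset is created by setting the chunk shape in a dataset-creation property list:
+hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
+hsize_t chunk_dims[2] = {100, 100}; /* example chunk shape */
+H5Pset_chunk(plist, 2, chunk_dims);
+dataset = H5Dcreate(file, DATASETNAME, datatype, dataspace, plist);
+H5Pclose(plist);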
+An attribute list for a dataset or group is a listing of objects in the HDF file that are used as attributes, or metadata, for the object. The attribute list is composed of two lists of objects, the first being simple attributes about the object, and the second being pointers to attribute objects. (Note: Attributes are not supported in this prototype.)
+The current HDF5 API is implemented only in C. The API provides routines for creating HDF5 files, creating and writing groups, datasets, and their attributes to HDF5 files, and reading groups, datasets, and their attributes from HDF5 files.
+All C routines in the HDF5 library begin with a prefix of the form "H5*", where "*" is a single letter indicating the object on which the operation is to be performed: H5F routines operate on files, H5G on groups, H5D on datasets, H5S on dataspaces, H5T on datatypes, H5A on attributes, and H5P on property lists.
+There are a number of definitions and declarations that should be included with any HDF5 program. These definitions and declarations are contained in several "include" files. The main include file is hdf5.h. This file includes all of the other files that your program is likely to need. Be sure to include hdf5.h in any program that accesses HDF5.
+The HDF5 prototype currently supports simple signed and unsigned 8-bit, 16-bit, 32-bit, and 64-bit integers, and floating-point numbers. The naming scheme for type definitions uses the following conventions: "int" names an integer type and "float" a floating-point type; a leading "u" marks an unsigned type; and the trailing number gives the size in bits.
+ +For example, "uint16" indicates an unsigned 16-bit integer. Datatypes that are supported in this prototype are:
+char
+int8
+uint8
+int16
+uint16
+int32
+uint32
+int64
+uint64
+float32
+float64
+These datatypes are defined in the file H5public.h together with keywords used to refer to them. H5public.h is included by the file hdf5.h described earlier. These datatypes should be used whenever you declare a variable to be used with an HDF5 routine. For instance, a 32-bit floating point variable should always be declared using a declaration such as
+float32 x;
+
+In this section we describe how to program some basic operations on files, including how to create a file and how to close a file.
+This programming model shows how to create a file and also how to close the file.
+The following code fragment implements the specified model. If there is a possibility that the file already exists, the user must add the flag H5F_ACC_TRUNC to the access mode to overwrite the previous file's information.
+hid_t file; /* handle */
+/*
+ * Create a new file using H5F_ACC_TRUNC access,
+ * default file creation properties, and default file
+ * access properties.
+ * Then close the file.
+ */
+file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+status = H5Fclose(file);
+
+Recall that datatypes and dimensionality (dataspace) are independent objects, which are created separately from any dataset that they might be attached to. Because of this, the creation of a dataset requires, at a minimum, separate definitions of datatype, dimensionality, and dataset. Hence, to create a dataset the following steps need to be taken:
+1. Create and initialize a dataspace for the dataset.
+2. Define a datatype for the dataset.
+3. Create and initialize the dataset itself.
+
+The following code illustrates the creation of these three components of a dataset object.
+hid_t dataset, datatype, dataspace; /* declare handles */
+
+/*
+ * 1. Create dataspace: Describe the size of the array and
+ * create the data space for fixed size dataset.
+ */
+dimsf[0] = NX;
+dimsf[1] = NY;
+dataspace = H5Screate_simple(RANK, dimsf, NULL);
+/*
+ * 2. Define datatype for the data in the file.
+ * We will store little endian INT32 numbers.
+ */
+datatype = H5Tcopy(H5T_NATIVE_INT32);
+status = H5Tset_order(datatype, H5T_ORDER_LE);
+/*
+ * 3. Create a new dataset within the file using defined
+ * dataspace and datatype and default dataset creation
+ * properties.
+ * NOTE: H5T_NATIVE_INT32 can be used as datatype if conversion
+ * to little endian is not needed.
+ */
+dataset = H5Dcreate(file, DATASETNAME, datatype, dataspace, H5P_DEFAULT);
+
+
+
+The type, dataspace, and dataset objects should be released once they are no longer needed by a program. Since each is an independent object, they must be released ("closed") separately. The following lines of code close the datatype, dataspace, and dataset that were created in the preceding section.
+H5Tclose(datatype);
+H5Dclose(dataset);
+H5Sclose(dataspace);
+
+
+
+Having defined the datatype, dataset, and dataspace parameters, you write out the data with a call to H5Dwrite.
+/*
+ * Write the data to the dataset using default transfer
+ * properties.
+ */
+status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
+ H5P_DEFAULT, data);
+
+The third and fourth parameters of H5Dwrite in the example describe the dataspaces in memory and in the file, respectively. They are set to the value H5S_ALL to indicate that an entire dataset is to be written. In a later section we look at how we would access a portion of a dataset.
+Example 1 contains a program that creates a file and a dataset, and writes the dataset to the file.
+Reading is analogous to writing. If, in the previous example, we wish to read an entire dataset, we would use the same basic calls with the same parameters. Of course, the routine H5Dread would replace H5Dwrite.
+Although reading is analogous to writing, it is often necessary to query a file to obtain information about a dataset. For instance, we often need to know about the datatype associated with a dataset, as well as dataspace information (e.g. rank and dimensions). There are several "get" routines for obtaining this information. The following code segment illustrates how we would get this kind of information:
+/*
+ * Get datatype and dataspace handles and then query
+ * dataset class, order, size, rank and dimensions.
+ */
+
+datatype = H5Dget_type(dataset); /* datatype handle */
+class = H5Tget_class(datatype);
+if (class == H5T_INTEGER) printf("Data set has INTEGER type \n");
+order = H5Tget_order(datatype);
+if (order == H5T_ORDER_LE) printf("Little endian order \n");
+
+size = H5Tget_size(datatype);
+printf(" Data size is %d \n", size);
+
+dataspace = H5Dget_space(dataset); /* dataspace handle */
+rank = H5Sget_ndims(dataspace);
+status_n = H5Sget_dims(dataspace, dims_out, NULL);
+printf("rank %d, dimensions %d x %d \n", rank, dims_out[0], dims_out[1]);
+
In the previous discussion, we describe how to access an entire dataset with one write (or read) operation. To read or write a portion of a dataset, we need to provide more contextual information.
+Consider the following example. Suppose there is a 500x600 dataset in a file, and we wish to read from the dataset a 100x200 hyperslab located beginning at element <200,200>. In addition, suppose we wish to read the hyperslab into a 200x400 array in memory beginning at element <0,0> in memory. Visually, the transfer looks something like this:
+[Figure: the 100x200 hyperslab at <200,200> in the 500x600 file dataset is transferred to a 100x200 region at <0,0> of the 200x400 memory array.]
As the example illustrates, whenever we read part of a dataset from a file we must provide two dataspaces: the dataspace of the object in the file as well as the dataspace of the object in memory into which we read. There are dataspace routines (
H5S...) for doing this. +For example, suppose we want to read a 3x4 hyperslab from a dataset in a file beginning at the element <1,2> in the dataset. In order to do this, we must create a dataspace that describes the overall rank and dimensions of the dataset in the file, as well as the position and size of the hyperslab that we are extracting from that dataset. The following code illustrates how this would be done.
+/*
+ * Get overall rank and dimensions of dataspace.
+ */
+dataspace = H5Dget_space(dataset); /* get dataspace handle */
+rank = H5Sget_ndims(dataspace);
+status_n = H5Sget_dims(dataspace, dims_out, NULL);
+
+/*
+ * Define hyperslab in the dataset.
+ */
+offset[0] = 1;
+offset[1] = 2;
+count[0] = 3;
+count[1] = 4;
+status = H5Sset_hyperslab(dataspace, offset, count, NULL);
+
This describes the dataspace from which we wish to read. We need to define the dataspace in memory analogously. Suppose, for instance, that we have in memory a three-dimensional 7x7x3 array into which we wish to read the 3x4 hyperslab described above beginning at the element <3,0,0>. Since the in-memory dataspace has three dimensions, we have to describe the hyperslab as an array with three dimensions, with the last dimension being 1: <3,4,1>.
+Notice that now we must describe two things: the dimensions of the in-memory array, and the size and position of the hyperslab that we wish to read in. The following code illustrates how this would be done.
+/*
+ * Define the memory dataspace.
+ */
+dimsm[0] = 7;
+dimsm[1] = 7;
+dimsm[2] = 3;
+memspace = H5Screate_simple(RANK_OUT,dimsm,NULL);
+
+/*
+ * Define memory hyperslab.
+ */
+offset_out[0] = 3;
+offset_out[1] = 0;
+offset_out[2] = 0;
+count_out[0] = 3;
+count_out[1] = 4;
+count_out[2] = 1;
+status = H5Sset_hyperslab(memspace, offset_out, count_out, NULL);
+
Example 2 contains a complete program that performs these operations.
+Properties of compound datatypes. A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of one or more atomic types or small arrays of such types. To create and use a compound datatype, you need to refer to various properties of the compound datatype:
+Properties of members of a compound data type are defined when the member is added to the compound type and cannot be subsequently modified.
+Defining compound datatypes.
+Compound datatypes must be built out of other datatypes. First, one creates an empty compound data type and specifies its total size. Then members are added to the compound data type in any order.
+Member names. Each member must have a descriptive name, which is the key used to uniquely identify the member within the compound data type. A member name in an HDF5 data type does not necessarily have to be the same as the name of the corresponding member in the C struct in memory, although this is often the case. Nor does one need to define all members of the C struct in the HDF5 compound data type (or vice versa).
+Offsets. Usually a C struct will be defined to hold a data point in memory, and the offsets of the members in memory will be the offsets of the struct members from the beginning of an instance of the struct. The library defines two macros to compute the offset of a member within a struct; the only difference between the two is that one uses s.m as the struct member while the other uses p->m:
+HOFFSET(s,m).
This macro computes the offset of member m within a struct variable s.
+HPOFFSET(p,m).
This macro computes the offset of member m from a pointer to a struct p.
+Here is an example in which a compound data type is created to describe complex numbers whose type is defined by the
complex_t
struct.
+typedef struct {
+ double re; /*real part */
+ double im; /*imaginary part */
+} complex_t;
+
+complex_t tmp; /*used only to compute offsets */
+hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof tmp);
+H5Tinsert (complex_id, "real", HOFFSET(tmp,re),
+ H5T_NATIVE_DOUBLE);
+H5Tinsert (complex_id, "imaginary", HOFFSET(tmp,im),
+ H5T_NATIVE_DOUBLE);
+
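+Having built complex_id, it can be used wherever a datatype handle is expected. The following sketch writes an array of complex_t values (the dataset name, extent, and buffer cdata are illustrative assumptions):
+hsize_t cdims[1] = {100};   /* illustrative extent */
+complex_t cdata[100];       /* data to write (assumed initialized) */
+hid_t cspace = H5Screate_simple (1, cdims, NULL);
+hid_t cdset = H5Dcreate (file, "complex_data", complex_id, cspace,
+                         H5P_DEFAULT);
+status = H5Dwrite (cdset, complex_id, H5S_ALL, H5S_ALL,
+                   H5P_DEFAULT, cdata);
+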
+Example 3 shows how to create a compound data type, write an array that has the compound data type to the file, and read back subsets of the members.
+Creating and writing extendible datasets.
An extendible dataset is one whose dimensions can grow. In HDF5, it is possible to define a dataset to have certain initial dimensions, then later to increase the size of any of the initial dimensions.
+For example, you can create and store the following 3x3 HDF5 dataset:
+1 1 1
+1 1 1
+1 1 1
+
+
+then later extend it into a 10x3 dataset by adding 7 rows, such as this:
+1 1 1
+1 1 1
+1 1 1
+2 2 2
+2 2 2
+2 2 2
+2 2 2
+2 2 2
+2 2 2
+2 2 2
+
+
+then further extend it to a 10x5 dataset by adding two columns, such as this:
+1 1 1 3 3
+1 1 1 3 3
+1 1 1 3 3
+2 2 2 3 3
+2 2 2 3 3
+2 2 2 3 3
+2 2 2 3 3
+2 2 2 3 3
+2 2 2 3 3
+2 2 2 3 3
+
+
+The current version of HDF5 requires you to use chunking in order to define extendible datasets. Chunking makes it possible to extend datasets efficiently, without having to reorganize storage excessively.
+Three operations are required in order to write an extendible dataset:
+   1. Declare the dataspace of the dataset to have unlimited dimensions.
+   2. Set dataset creation properties to enable chunking.
+   3. Extend the size of the dataset.
+For example, suppose we wish to create a dataset similar to the one shown above. We want to start with a 3x3 dataset, then later extend it in both directions.
+Declaring unlimited dimensions. We could declare the dataspace to have unlimited dimensions with the following code, which uses the predefined constant H5S_UNLIMITED to specify unlimited dimensions.
+hsize_t dims[2] = { 3, 3}; /* dataset dimensions at the creation time */
+hsize_t maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
+
+/*
+* 1. Create the data space with unlimited dimensions.
+*/
+dataspace = H5Screate_simple(RANK, dims, maxdims);
+
+
+Enabling chunking. We can then modify the dataset storage layout properties to
+ enable chunking. We do this using the routine H5Pset_chunk:
+hid_t cparms;
+hsize_t chunk_dims[2] = {2, 5};
+
+/*
+ * 2. Modify dataset creation properties to enable chunking.
+ */
+cparms = H5Pcreate (H5P_DATASET_CREATE);
+status = H5Pset_chunk (cparms, RANK, chunk_dims);
+Extending the dataset. Finally, the dataset can be enlarged with H5Dextend. For example, the following code extends the dataset by seven rows, making it 10x3:
+/*
+* Extend the dataset. Dataset becomes 10 x 3.
+*/
+dims[0] = dims[0] + 7;
+size[0] = dims[0];
+size[1] = dims[1];
+status = H5Dextend (dataset, size);
+
+
+Example 4 shows how to create a 3x3 extendible dataset, to extend the dataset to 10x3, then to extend it again to 10x5.
+Groups provide a mechanism for organizing datasets in an HDF5 file in meaningful ways. The H5G API contains routines for working with groups.
+To create a group, use H5Gcreate. For example, the following code creates two groups that are members of the root group. They are called "/IntData" and "/FloatData." The return value ("dir") is the group ID.
+/*
+ * Create two groups in a file.
+ */
+dir = H5Gcreate(file, "/IntData", 0);
+status = H5Gclose(dir);
+dir = H5Gcreate(file,"/FloatData", 0);
+status = H5Gclose(dir);
+The third parameter in H5Gcreate
optionally specifies how much file space to reserve to store the names that will appear in this group. If a non-positive value is supplied then a default size is chosen. H5Gclose
closes the group and releases the group ID.
Creating an object in a particular group. Except for single-object HDF5 files, every object in an HDF5 file must belong to a group, and hence has a path name. We therefore put an object in a particular group by giving its path name when we create it. For example, the following code creates a dataset "IntArray" in the group "/IntData":
+
+/*
+ * Create dataset in the /IntData group by specifying full path.
+ */
+dims[0] = 2;
+dims[1] = 3;
+dataspace = H5Screate_simple(2, dims, NULL);
+dataset = H5Dcreate(file, "/IntData/IntArray", H5T_NATIVE_INT, dataspace, H5P_DEFAULT);
+
+Changing the current group. The HDF5 Group API supports the idea of a "current" group, analogous to the "current working directory" in UNIX. You can set the current group in HDF5 with the routine H5Gset. The following code shows how to set the current group, then create a dataset ("FloatArray") in that group.
+/*
+ * Set current group to /FloatData.
+ */
+status = H5Gset (file, "/FloatData");
+
+/*
+ * Create two datasets
+ */
+dims[0] = 5;
+dims[1] = 10;
+dataspace = H5Screate_simple(2, dims, NULL);
+dataset = H5Dcreate(file, "FloatArray", H5T_NATIVE_FLOAT, dataspace, H5P_DEFAULT);
+
+Example 5 shows how to create an HDF5 file with two groups, and to place some datasets within those groups.
+
+Example code
+Example 1: How to create a homogeneous multi-dimensional dataset and write it to a file.
+This example creates a 2-dimensional HDF5 dataset of little endian 32-bit integers.
+/*
+* This example writes data to an HDF5 file.
+* Data conversion is performed during the write operation.
+*/
+#include "hdf5.h"
+#define FILE "SDS.h5"
+#define DATASETNAME "IntArray"
+#define NX 5 /* dataset dimensions */
+#define NY 6
+#define RANK 2
+main ()
+{
+hid_t file, dataset; /* file and dataset handles */
+hid_t datatype, dataspace; /* handles */
+hsize_t dimsf[2]; /* dataset dimensions */
+herr_t status;
+int32 data[NX][NY]; /* data to write */
+int i, j;
+/*
+* Data and output buffer initialization.
+*/
+for (j = 0; j < NX; j++) {
+for (i = 0; i < NY; i++)
+data[j][i] = i + j;
+}
+/* 0 1 2 3 4 5
+1 2 3 4 5 6
+2 3 4 5 6 7
+3 4 5 6 7 8
+4 5 6 7 8 9 */
+/*
+* Create a new file using H5F_ACC_TRUNC access,
+* default file creation properties, and default file
+* access properties.
+*/
+file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+/*
+* Describe the size of the array and create the data space for fixed
+* size dataset.
+*/
+dimsf[0] = NX;
+dimsf[1] = NY;
+dataspace = H5Screate_simple(RANK, dimsf, NULL);
+/*
+* Define datatype for the data in the file.
+* We will store little endian INT32 numbers.
+*/
+datatype = H5Tcopy(H5T_NATIVE_INT32);
+status = H5Tset_order(datatype, H5T_ORDER_LE);
+/*
+* Create a new dataset within the file using defined dataspace and
+* datatype and default dataset creation properties.
+*/
+dataset = H5Dcreate(file, DATASETNAME, datatype, dataspace,
+H5P_DEFAULT);
+/*
+* Write the data to the dataset using default transfer properties.
+*/
+status = H5Dwrite(dataset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL,
+H5P_DEFAULT, data);
+/*
+* Close/release resources.
+*/
+H5Sclose(dataspace);
+H5Tclose(datatype);
+H5Dclose(dataset);
+H5Fclose(file);
+}
+Example 2: How to read a hyperslab from a dataset in a file.
This example reads a hyperslab from a 2-d HDF5 dataset into a 3-d dataset in memory.
+/*
+* This example reads a hyperslab from the SDS.h5 file
+* created by the h5_write.c program into a two-dimensional
+* plane of a three-dimensional array.
+* Information about the dataset in the SDS.h5 file is obtained.
+*/
+#include "hdf5.h"
+#define FILE "SDS.h5"
+#define DATASETNAME "IntArray"
+#define NX_SUB 3 /* hyperslab dimensions */
+#define NY_SUB 4
+#define NX 7 /* output buffer dimensions */
+#define NY 7
+#define NZ 3
+#define RANK 2
+#define RANK_OUT 3
+main ()
+{
+hid_t file, dataset; /* handles */
+hid_t datatype, dataspace;
+hid_t memspace;
+H5T_class_t class; /* data type class */
+H5T_order_t order; /* data order */
+size_t size; /* size of the data element
+stored in file */
+hsize_t dimsm[3]; /* memory space dimensions */
+hsize_t dims_out[2]; /* dataset dimensions */
+herr_t status;
+int data_out[NX][NY][NZ ]; /* output buffer */
+hsize_t count[2]; /* size of the hyperslab in the file */
+hssize_t offset[2]; /* hyperslab offset in the file */
+hsize_t count_out[3]; /* size of the hyperslab in memory */
+hssize_t offset_out[3]; /* hyperslab offset in memory */
+int i, j, k, status_n, rank;
+for (j = 0; j < NX; j++) {
+for (i = 0; i < NY; i++) {
+for (k = 0; k < NZ ; k++)
+data_out[j][i][k] = 0;
+}
+}
+/*
+* Open the file and the dataset.
+*/
+file = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
+dataset = H5Dopen(file, DATASETNAME);
+/*
+* Get datatype and dataspace handles and then query
+* dataset class, order, size, rank and dimensions.
+*/
+datatype = H5Dget_type(dataset); /* datatype handle */
+class = H5Tget_class(datatype);
+if (class == H5T_INTEGER) printf("Data set has INTEGER type \n");
+order = H5Tget_order(datatype);
+if (order == H5T_ORDER_LE) printf("Little endian order \n");
+size = H5Tget_size(datatype);
+printf(" Data size is %d \n", size);
+dataspace = H5Dget_space(dataset); /* dataspace handle */
+rank = H5Sget_ndims(dataspace);
+status_n = H5Sget_dims(dataspace, dims_out, NULL);
+printf("rank %d, dimensions %d x %d \n", rank, dims_out[0], dims_out[1]);
+/*
+* Define hyperslab in the dataset.
+*/
+offset[0] = 1;
+offset[1] = 2;
+count[0] = NX_SUB;
+count[1] = NY_SUB;
+status = H5Sset_hyperslab(dataspace, offset, count, NULL);
+/*
+* Define the memory dataspace.
+*/
+dimsm[0] = NX;
+dimsm[1] = NY;
+dimsm[2] = NZ ;
+memspace = H5Screate_simple(RANK_OUT,dimsm,NULL);
+/*
+* Define memory hyperslab.
+*/
+offset_out[0] = 3;
+offset_out[1] = 0;
+offset_out[2] = 0;
+count_out[0] = NX_SUB;
+count_out[1] = NY_SUB;
+count_out[2] = 1;
+status = H5Sset_hyperslab(memspace, offset_out, count_out, NULL);
+/*
+* Read data from hyperslab in the file into the hyperslab in
+* memory and display.
+*/
+status = H5Dread(dataset, H5T_NATIVE_INT, memspace, dataspace,
+H5P_DEFAULT, data_out);
+for (j = 0; j < NX; j++) {
+for (i = 0; i < NY; i++) printf("%d ", data_out[j][i][0]);
+printf("\n");
+}
+/* 0 0 0 0 0 0 0
+0 0 0 0 0 0 0
+0 0 0 0 0 0 0
+3 4 5 6 0 0 0
+4 5 6 7 0 0 0
+5 6 7 8 0 0 0
+0 0 0 0 0 0 0 */
+/*
+* Close/release resources.
+*/
+H5Tclose(datatype);
+H5Dclose(dataset);
+H5Sclose(dataspace);
+H5Sclose(memspace);
+H5Fclose(file);
+}
+
+Example 3: Working with compound datatypes.
This example shows how to create a compound data type, write an array which has the compound data type to the file, and read back subsets of fields.
+/*
+* This example shows how to create a compound data type,
+* write an array which has the compound data type to the file,
+* and read back fields' subsets.
+*/
+#include "hdf5.h"
+#define FILE "SDScompound.h5"
+#define DATASETNAME "ArrayOfStructures"
+#define LENGTH 10
+#define RANK 1
+main()
+{
+/* First structure and dataset*/
+typedef struct s1_t {
+int a;
+float b;
+double c;
+} s1_t;
+s1_t s1[LENGTH];
+hid_t s1_tid; /* File datatype handle */
+/* Second structure (subset of s1_t) and dataset*/
+typedef struct s2_t {
+double c;
+int a;
+} s2_t;
+s2_t s2[LENGTH];
+hid_t s2_tid; /* Memory datatype handle */
+/* Third "structure" (will be used to read the float field of s1) */
+hid_t s3_tid; /* Memory datatype handle */
+float s3[LENGTH];
+int i;
+hid_t file, datatype, dataset, space; /* Handles */
+herr_t status;
+hsize_t dim[] = {LENGTH}; /* Dataspace dimensions */
+H5T_class_t class;
+size_t size;
+/*
+* Initialize the data
+*/
+for (i = 0; i< LENGTH; i++) {
+s1[i].a = i;
+s1[i].b = i*i;
+s1[i].c = 1./(i+1);
+}
+/*
+* Create the data space.
+*/
+space = H5Screate_simple(RANK, dim, NULL);
+/*
+* Create the file.
+*/
+file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+/*
+* Create the memory data type.
+*/
+s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
+status = H5Tinsert(s1_tid, "a_name", HPOFFSET(s1, a), H5T_NATIVE_INT);
+status = H5Tinsert(s1_tid, "c_name", HPOFFSET(s1, c), H5T_NATIVE_DOUBLE);
+status = H5Tinsert(s1_tid, "b_name", HPOFFSET(s1, b), H5T_NATIVE_FLOAT);
+/*
+* Create the dataset.
+*/
+dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT);
+/*
+* Write data to the dataset;
+*/
+status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1);
+/*
+* Release resources
+*/
+H5Tclose(s1_tid);
+H5Sclose(space);
+H5Dclose(dataset);
+H5Fclose(file);
+/*
+* Open the file and the dataset.
+*/
+file = H5Fopen(FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
+dataset = H5Dopen(file, DATASETNAME);
+/*
+* Create a data type for s2
+*/
+s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
+status = H5Tinsert(s2_tid, "c_name", HPOFFSET(s2, c), H5T_NATIVE_DOUBLE);
+status = H5Tinsert(s2_tid, "a_name", HPOFFSET(s2, a), H5T_NATIVE_INT);
+/*
+* Read two fields c and a from s1 dataset. Fields in the file
+* are found by their names "c_name" and "a_name".
+*/
+status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2);
+/*
+* Display the fields
+*/
+printf("\n");
+printf("Field c : \n");
+for( i = 0; i < LENGTH; i++) printf("%.4f ", s2[i].c);
+printf("\n");
+printf("\n");
+printf("Field a : \n");
+for( i = 0; i < LENGTH; i++) printf("%d ", s2[i].a);
+printf("\n");
+/*
+* Create a data type for s3.
+*/
+s3_tid = H5Tcreate(H5T_COMPOUND, sizeof(float));
+status = H5Tinsert(s3_tid, "b_name", 0, H5T_NATIVE_FLOAT);
+/*
+* Read field b from s1 dataset. Field in the file is found by its name.
+*/
+status = H5Dread(dataset, s3_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s3);
+/*
+* Display the field
+*/
+printf("\n");
+printf("Field b : \n");
+for( i = 0; i < LENGTH; i++) printf("%.4f ", s3[i]);
+printf("\n");
+/*
+* Release resources
+*/
+H5Tclose(s2_tid);
+H5Tclose(s3_tid);
+H5Dclose(dataset);
+H5Sclose(space);
+H5Fclose(file);
+}
+
+Example 4: Creating and writing an extendible dataset.
This example shows how to create a 3x3 extendible dataset, to extend the dataset to 10x3, then to extend it again to 10x5.
+/*
+* This example shows how to work with an extendible dataset.
+* In the current version of the library the dataset MUST be
+* chunked.
+*
+*/
+#include "hdf5.h"
+#define FILE "SDSextendible.h5"
+#define DATASETNAME "ExtendibleArray"
+#define RANK 2
+#define NX 10
+#define NY 5
+main ()
+{
+hid_t file; /* handles */
+hid_t datatype, dataspace, dataset;
+hid_t filespace;
+hid_t cparms;
+hsize_t dims[2] = { 3, 3}; /* dataset dimensions
+at the creation time */
+hsize_t dims1[2] = { 3, 3}; /* data1 dimensions */
+hsize_t dims2[2] = { 7, 1}; /* data2 dimensions */
+hsize_t dims3[2] = { 2, 2}; /* data3 dimensions */
+hsize_t maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
+hsize_t chunk_dims[2] ={2, 5};
+hsize_t size[2];
+hssize_t offset[2];
+herr_t status;
+int data1[3][3] = { 1, 1, 1, /* data to write */
+1, 1, 1,
+1, 1, 1 };
+int data2[7] = { 2, 2, 2, 2, 2, 2, 2};
+int data3[2][2] = { 3, 3,
+3, 3};
+/*
+* Create the data space with unlimited dimensions.
+*/
+dataspace = H5Screate_simple(RANK, dims, maxdims);
+/*
+* Create a new file. If the file exists, its contents will be overwritten.
+*/
+file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+/*
+* Modify dataset creation properties, i.e. enable chunking.
+*/
+cparms = H5Pcreate (H5P_DATASET_CREATE);
+status = H5Pset_chunk( cparms, RANK, chunk_dims);
+/*
+* Create a new dataset within the file using cparms
+* creation properties.
+*/
+dataset = H5Dcreate(file, DATASETNAME, H5T_NATIVE_INT, dataspace,
+cparms);
+/*
+* Extend the dataset. This call assures that dataset is at least 3 x 3.
+*/
+size[0] = 3;
+size[1] = 3;
+status = H5Dextend (dataset, size);
+/*
+* Select a hyperslab.
+*/
+filespace = H5Dget_space (dataset);
+offset[0] = 0;
+offset[1] = 0;
+status = H5Sset_hyperslab(filespace, offset, dims1, NULL);
+/*
+* Write the data to the hyperslab.
+*/
+status = H5Dwrite(dataset, H5T_NATIVE_INT, dataspace, filespace,
+H5P_DEFAULT, data1);
+/*
+* Extend the dataset. Dataset becomes 10 x 3.
+*/
+dims[0] = dims1[0] + dims2[0];
+size[0] = dims[0];
+size[1] = dims[1];
+status = H5Dextend (dataset, size);
+/*
+* Select a hyperslab.
+*/
+filespace = H5Dget_space (dataset);
+offset[0] = 3;
+offset[1] = 0;
+status = H5Sset_hyperslab(filespace, offset, dims2, NULL);
+/*
+* Define memory space
+*/
+dataspace = H5Screate_simple(RANK, dims2, NULL);
+/*
+* Write the data to the hyperslab.
+*/
+status = H5Dwrite(dataset, H5T_NATIVE_INT, dataspace, filespace,
+H5P_DEFAULT, data2);
+/*
+* Extend the dataset. Dataset becomes 10 x 5.
+*/
+dims[1] = dims1[1] + dims3[1];
+size[0] = dims[0];
+size[1] = dims[1];
+status = H5Dextend (dataset, size);
+/*
+* Select a hyperslab
+*/
+filespace = H5Dget_space (dataset);
+offset[0] = 0;
+offset[1] = 3;
+status = H5Sset_hyperslab(filespace, offset, dims3, NULL);
+/*
+* Define memory space.
+*/
+dataspace = H5Screate_simple(RANK, dims3, NULL);
+/*
+* Write the data to the hyperslab.
+*/
+status = H5Dwrite(dataset, H5T_NATIVE_INT, dataspace, filespace,
+H5P_DEFAULT, data3);
+/*
+* Resulting dataset
+*
+1 1 1 3 3
+1 1 1 3 3
+1 1 1 0 0
+2 0 0 0 0
+2 0 0 0 0
+2 0 0 0 0
+2 0 0 0 0
+2 0 0 0 0
+2 0 0 0 0
+2 0 0 0 0
+*/
+/*
+* Close/release resources.
+*/
+H5Dclose(dataset);
+H5Sclose(dataspace);
+H5Sclose(filespace);
+H5Fclose(file);
+}
+
+Example 5: Creating groups and datasets within groups.
This example shows how to create an HDF5 file with two groups, and to place some datasets within those groups.
+/*
+* This example shows how to create groups within the file and
+* datasets within the file and groups.
+*/
+
+#include "hdf5.h"
+
+#define FILE "DIR.h5"
+#define RANK 2
+main()
+{
+hid_t file, dir;
+hid_t dataset, dataspace;
+herr_t status;
+hsize_t dims[2];
+hsize_t size[1];
+/*
+* Create a file.
+*/
+file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+/*
+* Create two groups in a file.
+*/
+dir = H5Gcreate(file, "/IntData", 0);
+status = H5Gclose(dir);
+dir = H5Gcreate(file,"/FloatData", 0);
+status = H5Gclose(dir);
+/*
+* Create dataspace for the character string
+*/
+size[0] = 80;
+dataspace = H5Screate_simple(1, size, NULL);
+/*
+* Create dataset "String" in the root group.
+*/
+dataset = H5Dcreate(file, "String", H5T_NATIVE_CHAR, dataspace, H5P_DEFAULT);
+H5Dclose(dataset);
+/*
+* Create dataset "String" in the /IntData group.
+*/
+dataset = H5Dcreate(file, "/IntData/String", H5T_NATIVE_CHAR, dataspace,
+H5P_DEFAULT);
+H5Dclose(dataset);
+/*
+* Create dataset "String" in the /FloatData group.
+*/
+dataset = H5Dcreate(file, "/FloatData/String", H5T_NATIVE_CHAR, dataspace,
+H5P_DEFAULT);
+H5Sclose(dataspace);
+H5Dclose(dataset);
+/*
+* Create IntArray dataset in the /IntData group by specifying full path.
+*/
+dims[0] = 2;
+dims[1] = 3;
+dataspace = H5Screate_simple(RANK, dims, NULL);
+dataset = H5Dcreate(file, "/IntData/IntArray", H5T_NATIVE_INT, dataspace,
+H5P_DEFAULT);
+H5Sclose(dataspace);
+H5Dclose(dataset);
+/*
+* Set current group to /IntData and attach to the dataset String.
+*/
+status = H5Gset (file, "/IntData");
+dataset = H5Dopen(file, "String");
+if (dataset > 0) printf("String dataset in /IntData group is found\n");
+H5Dclose(dataset);
+/*
+* Set current group to /FloatData.
+*/
+status = H5Gset (file, "/FloatData");
+/*
+* Create two datasets FloatArray and DoubleArray.
+*/
+dims[0] = 5;
+dims[1] = 10;
+dataspace = H5Screate_simple(RANK, dims, NULL);
+dataset = H5Dcreate(file, "FloatArray", H5T_NATIVE_FLOAT, dataspace, H5P_DEFAULT);
+H5Sclose(dataspace);
+H5Dclose(dataset);
+dims[0] = 4;
+dims[1] = 6;
+dataspace = H5Screate_simple(RANK, dims, NULL);
+dataset = H5Dcreate(file, "DoubleArray", H5T_NATIVE_DOUBLE, dataspace,
+H5P_DEFAULT);
+H5Sclose(dataspace);
+H5Dclose(dataset);
+/*
+* Attach to /FloatData/String dataset.
+*/
+dataset = H5Dopen(file, "/FloatData/String");
+if (dataset > 0) printf("/FloatData/String dataset is found\n");
+H5Dclose(dataset);
+H5Fclose(file);
+}
+
+
diff --git a/doc/html/H5.sample_code.html b/doc/html/H5.sample_code.html
new file mode 100644
index 0000000..b3e5336
--- /dev/null
+++ b/doc/html/H5.sample_code.html
@@ -0,0 +1,123 @@
+Example programs/sections of code below: +
Notes:
+This example creates a new HDF5 file and allows write access.
+If the file exists already, the H5F_ACC_TRUNC flag would also be necessary to
+overwrite the previous file's information.
+
+
Code:
+
+
+
+
+ hid_t file_id;
+
+ file_id=H5Fcreate("example1.h5",H5F_ACC_EXCL,H5P_DEFAULT_TEMPLATE,H5P_DEFAULT_TEMPLATE);
+
+ H5Fclose(file_id);
+
+
Notes:
+This example creates a 4-dimensional dataset of 32-bit floating-point
+numbers, corresponding to the current Scientific Dataset functionality.
+
+
Code:
+
+
+
+
+ hid_t file_id;              /* new file's ID */
+ hid_t dim_id;               /* new dimensionality's ID */
+ int rank=4;                 /* the number of dimensions */
+ hsize_t dims[4]={6,5,4,3};  /* the size of each dimension */
+ hid_t dataset_id;           /* new dataset's ID */
+ float buf[6][5][4][3];      /* storage for the dataset's data */
+ herr_t status;              /* function return status */
+
+ file_id = H5Fcreate ("example3.h5", H5F_ACC_TRUNC, H5P_DEFAULT,
+     H5P_DEFAULT);
+ assert (file_id >= 0);
+
+ /* Create & initialize a dimensionality object */
+ dim_id = H5Screate_simple (rank, dims, NULL);
+ assert (dim_id >= 0);
+
+ /* Create & initialize the dataset object */
+ dataset_id = H5Dcreate (file_id, "Simple Object", H5T_NATIVE_FLOAT,
+     dim_id, H5P_DEFAULT);
+ assert (dataset_id >= 0);
+
+ <initialize data array>
+
+ /* Write the entire dataset out */
+ status = H5Dwrite (dataset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
+     H5P_DEFAULT, buf);
+ assert (status >= 0);
+
+ /* Release the IDs we've created */
+ H5Sclose (dim_id);
+ H5Dclose (dataset_id);
+ H5Fclose (file_id);
+
Notes:
+This example shows how to get the information for and display a generic
+dataset.
+
+
Code:
+
+
diff --git a/doc/html/H5.user.html b/doc/html/H5.user.html
new file mode 100644
index 0000000..3c16553
--- /dev/null
+++ b/doc/html/H5.user.html
@@ -0,0 +1,71 @@
+
+
+ hid_t file_id;     /* file's ID */
+ hid_t dataset_id;  /* dataset's ID in memory */
+ hid_t space_id;    /* dataspace's ID in memory */
+ uintn nelems;      /* number of elements in array */
+ double *buf;       /* pointer to the dataset's data */
+ herr_t status;     /* function return value */
+
+ file_id = H5Fopen ("example6.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
+ assert (file_id >= 0);
+
+ /* Attach to a datatype object */
+ dataset_id = H5Dopen (file_id, "dataset1");
+ assert (dataset_id >= 0);
+
+ /* Get the OID for the dataspace */
+ space_id = H5Dget_space (dataset_id);
+ assert (space_id >= 0);
+
+ /* Allocate space for the data */
+ nelems = H5Sget_npoints (space_id);
+ buf = malloc (nelems * sizeof(double));
+
+ /* Read in the dataset */
+ status = H5Dread (dataset_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
+     H5P_DEFAULT, buf);
+ assert (status >= 0);
+
+ /* Release the IDs we've accessed */
+ H5Sclose (space_id);
+ H5Dclose (dataset_id);
+ H5Fclose (file_id);
+
The following documents form a loosely organized user's guide + to the HDF5 library. + +
The following documents form a loosely organized developer's guide to + aspects of the HDF5 library. (Some of the following documents + may be rather out of date as they were working papers for design + goals.) + +
The HDF5 raw data pipeline is a complicated beast that handles + all aspects of raw data storage and transfer of that data + between the file and the application. Data can be stored + contiguously (internal or external), in variable size external + segments, or regularly chunked; it can be sparse, extendible, + and/or compressible. Data transfers must be able to convert from + one data space to another, convert from one number type to + another, and perform partial I/O operations. Furthermore, + applications will expect their common usage of the pipeline to + perform well. + +
To accomplish these goals, the pipeline has been designed in a + modular way so no single subroutine is overly complicated and so + functionality can be inserted easily at the appropriate + locations in the pipeline. A general pipeline was developed and + then certain paths through the pipeline were optimized for + performance. + +
We describe only the file-to-memory side of the pipeline since + the memory-to-file side is a mirror image. We also assume that a + proper hyperslab of a simple data space is being read from the + file into a proper hyperslab of a simple data space in memory, + and that the data type is a compound type which may require + various number conversions on its members. + + + +
The diagrams should be read from the top down. The Line A
+ in the figure above shows that H5Dread()
copies
+ data from a hyperslab of a file dataset to a hyperslab of an
+ application buffer by calling H5D_read()
. And
+ H5D_read()
calls, in a loop,
+ H5S_simp_fgath()
, H5T_conv_struct()
,
+ and H5S_simp_mscat()
. A temporary buffer, TCONV, is
+ loaded with data points from the file, then data type conversion
+ is performed on the temporary buffer, and finally data points
+ are scattered out to application memory. Thus, data type
+ conversion is an in-place operation and data space conversion
+ consists of two steps. An additional temporary buffer, BKG, is
+ large enough to hold N instances of the destination
+ data type where N is the same number of data points
+ that can be held by the TCONV buffer (which is large enough to
+ hold either source or destination data points).
+
+
The application sets an upper limit for the size of the TCONV
+ buffer and optionally supplies a buffer. If no buffer is
+ supplied then one will be created by calling
+ malloc()
when the pipeline is executed (when
+ necessary) and freed when the pipeline exits. The size of the
+ BKG buffer depends on the size of the TCONV buffer and if the
+ application supplies a BKG buffer it should be at least as large
+ as the TCONV buffer. The default size for these buffers is one
+ megabyte but the buffer might not be used to full capacity if
+ the buffer size is not an integer multiple of the source or
+ destination data point size (whichever is larger, but only
+ destination for the BKG buffer).
+
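+ As a concrete instance of that sizing rule (the point sizes below
+ are illustrative assumptions; only the one-megabyte default comes
+ from the text above):
+
+ size_t tconv_size = 1024 * 1024;      /* 1 MB TCONV buffer        */
+ size_t src_size = 16, dst_size = 24;  /* data point sizes         */
+ size_t larger = dst_size > src_size ? dst_size : src_size;
+ size_t npoints = tconv_size / larger; /* 43690 points fit         */
+ size_t used = npoints * larger;       /* 1048560 bytes            */
+ /* 16 bytes of the TCONV buffer go unused; the BKG buffer must    */
+ /* hold npoints * dst_size = 1048560 bytes of destination points. */
+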
+
+
+
Occasionally the destination data points will be partially
+ initialized and the H5Dread()
operation should not
+ clobber those values. For instance, the destination type might
+ be a struct with members a
and b
where
+ a
is already initialized and we're reading
+ b
from the file. An extra line, G, is added to the
+ pipeline to provide the type conversion functions with the
+ existing data.
+
+
+
+
It will most likely be quite common that no data type + conversion is necessary. In such cases a temporary buffer for + data type conversion is not needed and data space conversion + can happen in a single step. In fact, when the source and + destination data are both contiguous (they aren't in the + picture) the loop degenerates to a single iteration. + + + + +
So far we've looked only at internal contiguous storage, but by + replacing Line B in Figures 1 and 2 and Line A in Figure 3 with + Figure 4 the pipeline is able to handle regularly chunked + objects. Line B of Figure 4 is executed once for each chunk + which contains data to be read and the chunk address is found by + looking at a multi-dimensional key in a chunk B-tree which has + one entry per chunk. + + + +
If a single chunk is requested and the destination buffer is + the same size/shape as the chunk, then the CHUNK buffer is + bypassed and the destination buffer is used instead as shown in + Figure 5. + + + +
Some form of memory management may be necessary in HDF5 when + the various deletion operators are implemented so that the + file memory is not permanently orphaned. However, since an + HDF5 file was designed with persistent data in mind, the + importance of a memory manager is questionable. + +
On the other hand, when certain meta data containers (file glue) + grow, they may need to be relocated in order to keep the + container contiguous. + +
+ Example: An object header consists of up to two + chunks of contiguous memory. The first chunk is a fixed + size at a fixed location when the header link count is + greater than one. Thus, inserting additional items into an + object header may require the second chunk to expand. When + this occurs, the second chunk may need to move to another + location in the file, freeing the file memory which that + chunk originally occupied. ++ +
The relocation of meta data containers could potentially + orphan a significant amount of file memory if the application + has made poor estimates for preallocation sizes. + + +
Memory management by the library can be independent of memory + management support by the file format. The file format can + support no memory management, some memory management, or full + memory management. Similarly with the library. + +
We now evaluate each combination of library support with file + support: + +
The file contains an unsorted, doubly-linked list of free + blocks. The address of the head of the list appears in the + boot block. Each free block contains the following fields: + +
        byte        byte        byte        byte
     ------------------------------------------------
                  Free Block Signature
     ------------------------------------------------
                 Total Free Block Size
     ------------------------------------------------
                Address of Left Sibling
     ------------------------------------------------
               Address of Right Sibling
     ------------------------------------------------
                Remainder of Free Block
     ------------------------------------------------
+
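+ A rough C rendering of that record (a sketch only: the type and
+ field names are invented, and the four-byte widths simply follow
+ the columns shown above rather than the file format spec):
+
+ #include <stdint.h>
+ typedef struct free_block_disk_t {
+     uint8_t  signature[4];  /* free block signature               */
+     uint32_t total_size;    /* total free block size, in bytes    */
+     uint32_t left_sibling;  /* file address of left sibling       */
+     uint32_t right_sibling; /* file address of right sibling      */
+     /* remainder of the free block follows on disk */
+ } free_block_disk_t;
+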
The library reads as much of the free list as convenient when + convenient and pushes those entries onto stacks. This can + occur when a file is opened or any time during the life of the + file. There is one stack for each free block size and the + stacks are sorted by size in a balanced tree in memory. + +
Deallocation involves finding the correct stack or creating + a new one (an O(log K) operation where K is + the number of stacks), pushing the free block info onto the + stack (a constant-time operation), and inserting the free + block into the file free block list (a constant-time operation + which doesn't necessarily involve any I/O since the free blocks + can be cached like other objects). No attempt is made to + coalesce adjacent free blocks into larger blocks. + +
Allocation involves finding the correct stack (an O(log + K) operation), removing the top item from the stack + (a constant-time operation), and removing the block from the + file free block list (a constant-time operation). If there is + no free block of the requested size or larger, then the file + is extended. + +
To provide sharability of the free list between processes, + the last step of an allocation will check for the free block + signature and if it doesn't find one will repeat the process. + Alternatively, a process can temporarily remove free blocks + from the file and hold them in its own private pool. +
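+ A minimal sketch of the in-memory bookkeeping this implies (all
+ names are invented for illustration): one stack of free blocks
+ per block size, with the stacks ordered by size in a balanced
+ tree.
+
+ typedef struct free_entry_t {      /* one cached free block        */
+     uint64_t addr;                 /* file address of the block    */
+     struct free_entry_t *next;     /* next entry in this stack     */
+ } free_entry_t;
+ typedef struct size_node_t {       /* one node per distinct size   */
+     uint64_t size;                 /* block size this stack holds  */
+     free_entry_t *top;             /* stack top: constant-time ops */
+     struct size_node_t *left, *right; /* tree: O(log K) lookup     */
+ } size_node_t;
+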
To summarize... +
The HDF5 file format supports a general B-tree mechanism + for storing data with keys. If we use a B-tree to represent + all parts of the file that are free and the B-tree is indexed + so that a free file chunk can be found if we know the starting + or ending address, then we can efficiently determine whether a + free chunk begins or ends at the specified address. Call this + the Address B-Tree. + +
If a second B-tree points to a set of stacks where the + members of a particular stack are all free chunks of the same + size, and the tree is indexed by chunk size, then we can + efficiently find the best-fit chunk size for a memory request. + Call this the Size B-Tree. + +
All free blocks of a particular size can be linked together + with an unsorted, doubly-linked, circular list and the left + and right sibling addresses can be stored within the free + chunk, allowing us to remove or insert items from the list in + constant time. + +
Deallocation of a block of file memory consists of: + +
Allocation is similar to deallocation. + +
To summarize... + +
The property list (a.k.a., template) interface provides a + mechanism for default named arguments for a C function + interface. A property list is a collection of name/value pairs + which can be passed to various other HDF5 functions to control + features that are typically unimportant or whose default values + are usually used. + +
For instance, file creation needs to know various things such
+ as the size of the user-block at the beginning of the file, or
+ the size of various file data structures. Wrapping this
+ information in a property list simplifies the API by reducing
+ the number of arguments to H5Fcreate()
.
+
+
Property lists follow the same create/open/close paradigm as + the rest of the library. + +
hid_t H5Pcreate (H5P_class_t class)
+ H5P_FILE_CREATE
+ H5P_FILE_ACCESS
+ H5P_DATASET_CREATE
+ H5P_DATASET_XFER
+ hid_t H5Pcopy (hid_t plist)
+ herr_t H5Pclose (hid_t plist)
+ H5P_class_t H5Pget_class (hid_t plist)
+ H5Pcreate()
.
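+ As a minimal sketch of that life cycle for file creation (the
+ setter H5Pset_userblock and the values shown are assumptions for
+ illustration only):
+
+ hid_t plist = H5Pcreate (H5P_FILE_CREATE);
+ H5Pset_userblock (plist, 512);  /* reserve a 512-byte user block */
+ hid_t file = H5Fcreate ("example.h5", H5F_ACC_TRUNC,
+                         plist, H5P_DEFAULT);
+ H5Pclose (plist);               /* the file keeps its own copy   */
+ H5Fclose (file);
+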
+ The HDF5 version number is a set of three integer values and
+ one lower-case letter written as, for example,
+ hdf5-1.2.0a
.
+
+
The 5
is part of the library name and will only
+ change if the entire file format and library are redesigned
+ similar in scope to the changes between HDF4 and HDF5.
+
+
The 1
is the major version number and
+ changes when there is an extensive change to the file format or
+ library. Such a change will likely require files to be
+ translated and applications to be modified. This number is not
+ expected to change frequently.
+
+
The 2
is the minor version number and is
+ incremented by each public release that presents new features.
+ Even numbers are reserved for stable public versions of the
+ library while odd numbers are reserved for development
+ versions. See the diagram below for examples.
+
+
The 0
is the release number. For public
+ versions of the library, the release number is incremented each
+ time one or more bugs are fixed and the fixes are made available
+ to the public. For development versions, the release number is
+ incremented automatically each time a CVS commit occurs anywhere
+ in the source tree.
+
+
The a
is the patch level and is used only
+ for public versions. It's incremented only for very minor
+ changes that don't affect the usability of the library. For
+ instance, fixing spelling errors, changing warning messages, or
+ updating documentation.
+
+
It's often convenient to drop the patch level and release + number when referring to a version of the library, like saying + version 1.2 of HDF5. The release number and patch level can be + any value in this case. + +
Version 1.0.0 was released for alpha testing the first week of + March, 1998. The development version number was incremented to + 1.0.1 and remained constant until the last week of April, + when the release number started to increase and development + versions were made available to people outside the core HDF5 + development team. +
Version 1.1.0 will be the first official beta release but the + 1.1 branch will also serve as a development branch since we're + not concerned about providing bug fixes separate from normal + development for the beta version. + +
Version 1.2 will be the first official HDF5 version. The + version tree will fork at this point with public bug fixes + provided on the 1.2 branch and development will continue on the + 1.3 branch. + +
The motivation for separate public and development versions is + that the public version will receive only bug fixes while the + development version will receive new features. + +
Eventually, the development version will near completion and a + new development branch will fork while the original one enters a + feature freeze state. When the original development branch is + ready for release the minor version number will be incremented + to an even value. + +
+
The library provides a set of macros and functions to query and + check version numbers. + +
H5_VERS_MAJOR
+ H5_VERS_MINOR
+ H5_VERS_RELEASE
+ H5_VERS_PATCH
+ herr_t H5version (unsigned *majnum, unsigned
+ *minnum, unsigned *relnum, unsigned
+ *patnum)
+ void H5check(void)
+ herr_t H5vers_check (unsigned majnum,
+ unsigned minnum, unsigned relnum, unsigned
+ patnum)
+ H5check()
macro
+ with the include file version constants. The function
+ compares its arguments to the result returned by
+ H5version()
and if a mismatch is detected prints
+ an error message on the standard error stream and aborts.
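+ For instance, an application might verify at startup that the
+ headers it compiled against match the library it linked against
+ (a sketch using only the calls listed above; stdio.h is assumed):
+
+ unsigned majnum, minnum, relnum, patnum;
+ H5version (&majnum, &minnum, &relnum, &patnum);
+ printf ("using hdf5-%u.%u.%u\n", majnum, minnum, relnum);
+ H5check ();  /* aborts with a message on a header/library mismatch */
+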
+ The HDF5 development must proceed in such a manner as to + satisfy the following conditions: + +
There's at least one invariant: new object features introduced + in the HDF5 file format (like 2-d arrays of structs) might be + impossible to "translate" to a format that an old HDF4 + application can understand either because the HDF4 file format + or the HDF4 API has no mechanism to describe the object. +
What follows is one possible implementation based on how + Condition B was solved in the AIO/PDB world. It also attempts + to satisfy these goals: + +
The proposed implementation uses wrappers to handle + compatibility issues. A Format-X file is wrapped in a + Format-Y file by creating a Format-Y skeleton that replicates + the Format-X meta data. The Format-Y skeleton points to the raw + data stored in Format-X without moving the raw data. The + restriction is that raw data storage methods in Format-Y are a + superset of raw data storage methods in Format-X (otherwise the + raw data must be copied to Format-Y). We're assuming that meta + data is small relative to the entire file. +
The wrapper can be a separate file that has pointers into the + first file or it can be contained within the first file. If + contained in a single file, the file can appear as a Format-Y + file or simultaneously a Format-Y and Format-X file. + +
The Format-X meta-data can be thought of as the original + wrapper around raw data and Format-Y is a second wrapper around + the same data. The wrappers are independent of one another; + modifying the meta-data in one wrapper causes the other to + become out of date. Modification of raw data doesn't invalidate + either view as long as the meta data that describes its storage + isn't modified. For instance, an array element can change values + if storage is already allocated for the element, but if storage + isn't allocated then the meta data describing the storage must + change, invalidating all wrappers but one. +
It's perfectly legal to modify the meta data of one wrapper + without modifying the meta data in the other wrapper(s). The + illegal part is accessing the raw data through a wrapper which + is out of date. + +
If raw data is wrapped by more than one internal wrapper + (internal means that the wrapper is in the same file as + the raw data) then access to that file must assume that + unreferenced parts of that file contain meta data for another + wrapper and cannot be reclaimed as free memory. + +
Since this is a temporary situation which can't be + automatically detected by the HDF5 library, we must rely + on the application to notify the HDF5 library whether or not it + must satisfy Condition B. (Even if we don't rely on the + application, at some point someone is going to remove the + Condition B constraint from the library.) So the module that + handles Condition B is conditionally compiled and then enabled + on a per-file basis. + +
If the application desires to produce an HDF4 file (determined
+ by arguments to H5Fopen
), and the Condition B
+ module is compiled into the library, then H5Fclose
+ calls the module to traverse the HDF5 wrapper and generate an
+ additional internal or external HDF4 wrapper (wrapper specifics
+ are described below). If Condition B is implemented as a module
+ then it can benefit from the metadata already cached by the main
+ library.
+
+
An internal HDF4 wrapper would be used if the HDF5 file is + writable and the user doesn't mind that the HDF5 file is + modified. An external wrapper would be used if the file isn't + writable or if the user wants the data file to be primarily HDF5 + but a few applications need an HDF4 view of the data. + +
Modifying through the HDF5 library an HDF5 file that has
+ an internal HDF4 wrapper should invalidate the HDF4 wrapper (and
+ optionally regenerate it when H5Fclose
is
+ called). The HDF5 library must understand how wrappers work, but
+ not necessarily anything about the HDF4 file format.
+
+
Modifying through the HDF5 library an HDF5 file that has an
+ external HDF4 wrapper will cause the HDF4 wrapper to become out
+ of date (but possibly regenerated during H5Fclose
).
+ Note: Perhaps the next release of the HDF4 library should
+ ensure that the HDF4 wrapper file has a more recent modification
+ time than the raw data file (the HDF5 file) to which it
+ points(?)
+
+
Modifying through the HDF4 library an HDF5 file that has an + internal or external HDF4 wrapper will cause the HDF5 wrapper to + become out of date. However, there is no way for the old HDF4 + library to notify the HDF5 wrapper that it's out of date. + Therefore the HDF5 library must be able to detect when the HDF5 + wrapper is out of date and be able to fix it. If the HDF4 + wrapper is complete then the easy way is to ignore the original + HDF5 wrapper and generate a new one from the HDF4 wrapper. The + other approach is to compare the HDF4 and HDF5 wrappers and + assume that if they differ HDF4 is the right one, if HDF4 omits + data then it was because HDF4 is a partial wrapper (rather than + assume HDF4 deleted the data), and if HDF4 has new data then + copy the new meta data to the HDF5 wrapper. On the other hand, + perhaps we don't need to allow these situations (modifying an + HDF5 file with the old HDF4 library and then accessing it with + the HDF5 library is either disallowed or causes HDF5 objects + that can't be described by HDF4 to be lost). +
To convert an HDF5 file to an HDF4 file on demand, one simply + opens the file with the HDF4 flag and closes it. This is also + how AIO implemented backward compatibility with PDB in its file + format. +
This condition must be satisfied for all time because there
+ will always be archived HDF4 files. If a pure HDF4 file (that
+ is, one without HDF5 meta data) is opened with an HDF5 library,
+ H5Fopen builds an internal or external HDF5
builds an internal or external HDF5
+ wrapper and then accesses the raw data through that wrapper. If
+ the HDF5 library modifies the file then the HDF4 wrapper becomes
+ out of date. However, since the HDF5 library hasn't been
+ released, we can at least implement it to disable and/or reclaim
+ the HDF4 wrapper.
+
+
If an external and temporary HDF5 wrapper is desired, the
+ wrapper is created through the cache like all other HDF5 files.
+ The data appears on disk only if a particular cached datum is
+ preempted. Instead of calling H5Fclose
on the HDF5
+ wrapper file we call H5Fabort
which immediately
+ releases all file resources without updating the file, and then
+ we unlink the file from Unix.
+
+
External wrappers are quite obvious: they contain only things + from the format specs for the wrapper and nothing from the + format specs of the format which they wrap. + +
An internal HDF4 wrapper is added to an HDF5 file in such a way
+ that the file appears to be both an HDF4 file and an HDF5
+ file. HDF4 requires an HDF4 file header at file offset zero. If
+ a user block is present then we just move the user block down a
+ bit (and truncate it) and insert the minimum HDF4 signature.
+ The HDF4 dd
list and any other data it needs are
+ appended to the end of the file and the HDF5 signature uses the
+ logical file length field to determine the beginning of the
+ trailing part of the wrapper.
+
+
+
The layout of such a file, from the beginning:
   1. HDF4 minimal file header. Its main job is to point to the
      dd list at the end of the file.
   2. User-defined block which is truncated by the size of the HDF4
      file header so that the HDF5 boot block file address doesn't
      change.
   3. The HDF5 boot block and data, unmodified by adding the HDF4
      wrapper.
   4. The main part of the HDF4 wrapper. The dd list will have
      entries for all parts of the file so hdpack(?) doesn't
      (re)move anything.
+
When such a file is opened by the HDF5 library for + modification it shifts the user block back down to address zero + and fills with zeros, then truncates the file at the end of the + HDF5 data or adds the trailing HDF4 wrapper to the free + list. This prevents HDF4 applications from reading the file with + an out of date wrapper. + +
If there is no user block then we have a problem. The HDF5 + boot block must be moved to make room for the HDF4 file header. + But moving just the boot block causes problems because all file + addresses stored in the file are relative to the boot block + address. The only option is to shift the entire file contents + by 512 bytes to open up a user block (too bad we don't have + hooks into the Unix i-node stuff so we could shift the entire + file contents by the size of a file system page without ever + performing I/O on the file :-) + +
Is it possible to place an HDF5 wrapper in an HDF4 file? I
+ don't know enough about the HDF4 format, but I would suspect it
+ might be possible to open a hole at file address 512 (and
+ possibly before) by moving some things to the end of the file
+ to make room for the HDF5 signature. The remainder of the HDF5
+ wrapper goes at the end of the file and entries are added to the
+ HDF4 dd
list to mark the location(s) of the HDF5
+ wrapper.
+
+
Conversion programs that copy an entire HDF4 file to a separate, + self-contained HDF5 file and vice versa might be useful. + + + + +
Since file data structures can be cached in memory by the H5AC + package it becomes problematic to move such a data structure in + the file. One cannot just copy a portion of the file from one + location to another because: + +
Here's a correct method to move data from one location to
+ another. The example code assumes that one is moving a B-link
+ tree node from old_addr
to new_addr
.
+
+
Flush the data structure to disk without preempting it from the cache; the last argument to H5AC_flush is FALSE.
+
+ H5AC_flush (f, H5AC_BT, old_addr, FALSE);
+
+
+ H5F_block_read (f, old_addr, size, buf);
+ H5F_block_write (f, new_addr, size, buf);
+
+
+ H5AC_rename (f, H5AC_BT, old_addr, new_addr);
+
+ Parallel HDF5 Design
++
In this section, I first describe the functional requirements of the Parallel HDF5 (PHDF5) software and the assumed system requirements. Section 2 describes the programming model of the PHDF5 interface. Section 3 shows an example PHDF5 program.
HDF5 uses an optional access template object to control the file access +mechanism. The general model for accessing an HDF5 file in parallel +contains the following steps (a sketch of the first two steps follows the list):
Each process of the MPI communicator creates an access template and sets +it up with MPI parallel access information (communicator, info object, +access-mode).
All processes of the MPI communicator open an HDF5 file by a collective call +(H5Fcreate or H5Fopen) with the access template.
All processes of the MPI communicator open a dataset by a collective call (H5Dcreate or H5Dopen). This version supports only collective dataset open. A future version may support datasets opened by a subset of the processes that have opened the file.
Each process may perform an arbitrary number of independent data I/O accesses by independent calls (H5Dread or H5Dwrite) to the dataset with the transfer template set for independent access. (The default transfer mode is independent transfer.) If the dataset has an unlimited dimension and an H5Dwrite is writing data beyond the current dimension size of the dataset, all processes that have opened the dataset must make a collective call (H5Dallocate) to allocate more space for the dataset BEFORE the independent H5Dwrite call.
+All processes that have opened the dataset may do collective data I/O access by collective calls (H5Dread or H5Dwrite) to the dataset with the transfer template set for collective access. Pre-allocation (H5Dallocate) is not needed for unlimited dimension datasets since the H5Dallocate call, if needed, is done internally by the collective data access call.
Changes to attributes can only occur at the "main process" (process 0). Read-only access to attributes can occur independently in each process that has opened the dataset. (API to be defined later.)
+
All processes that have opened the dataset must close the dataset by a collective call (H5Dclose).
+All processes that have opened the file must close the file by a collective call (H5Fclose).
+
+
+
Example code
+
+Send comments to
+hdfparallel@ncsa.uiuc.edu
Example programs/sections of code below: +
Notes:
+This example creates a new HDF5 file and allows write access.
+If the file exists already, the H5F_ACC_TRUNC flag would also be necessary to
+overwrite the previous file's information.
+
+
Code:
+
+
+
+
+ hid_t file_id;
+
+ file_id=H5Fcreate("example1.h5",0);
+
+ H5Fclose(file_id);
+
+
Notes:
+This example checks if a file is an HDF5 file and lists the contents of the top
+level (file level) group.
+
+
Code:
+
+
+
+
+ hid_t file_id; /* File ID */
+ uint32 num_items; /* number of items in top-level group */
+ intn i; /* counter */
+ char *obj_name; /* object's name as string atom */
+ uintn name_len; /* object name's length in chars */
+ uintn buf_len=0; /* buffer length for names */
+ char *buf=NULL; /* buffer for names */
+
+ if(H5Fis_hdf5("example2.h5")==TRUE)
+ {
+ file_id=H5Fopen("example2.h5",H5F_ACC_RDWR|H5ACC_CREATE);
+ num_items=H5GgetNumContents(file_id);
+ for(i=0; i<num_items; i++)
+ {
+ obj_name=H5GgetNameByIndex(file_id,i,NULL,0);
+ printf("object #%d is: %s\n",i,buf);
+ HDfree(obj_name);
+ }
+ H5Fclose(file_id);
+ }
+
+
Notes:
+This example creates a 4-dimensional dataset of 32-bit floating-point
+numbers, corresponding to the current Scientific Dataset functionality.
+This example assumes that the datatype and dataspace of the dataset will not
+be re-used.
+
+
Code:
+
+
+
+
+ hid_t file_id; /* File's ID */
+ uint32 dims[4]={6,5,4,3}; /* the size of each dimension */
+ intn rank=4; /* the number of dimensions */
+ hid_t dataset_id; /* new object's ID */
+ float32 obj_data[6][5][4][3]; /* storage for the dataset's data */
+
+ if((file_id=H5Fcreate("example3.h5",H5F_ACC_TRUNC))>=0)
+ {
+ /* Create & initialize the dataset object */
+ dataset_id=H5Mcreate(file_id,H5OBJ_DATASET,"Simple Object");
+
+ /* Create & initialize a datatype object */
+ H5TsetType(dataset_id,H5TYPE_FLOAT,4,H5T_BIGENDIAN);
+
+ /* Initialize dimensionality of dataset */
+ H5SsetSpace(dataset_id,rank,dims);
+
+ <initialize data array>
+
+ /* Write the entire dataset out */
+ H5Dwrite(dataset_id,H5S_SCALAR,obj_data);
+ <or>
+ H5Dwrite(dataset_id,dataset_id,obj_data);
+
+ /* Release the atoms we've created */
+ H5Mrelease(dataset_id);
+
+ /* close the file */
+ H5Fclose(file_id);
+ }
+
Notes:
+This example creates a 1-dimensional dataset of compound datatype records,
+corresponding to the current Vdata functionality. This example also assumes
+that the datatype and dataspace will not be re-used.
+
+
Code:
+
+
+
+
+ hid_t file_id; /* File's ID */
+ uint32 dims[1]={45}; /* the size of the dimension */
+ hid_t dataset_id; /* object's ID */
+ void *obj_data; /* pointer to the dataset's data */
+
+ if((file_id=H5Fcreate("example4.h5",H5F_ACC_TRUNC))>=0)
+ {
+ /* Create & initialize the dataset object */
+ dataset_id=H5Mcreate(file_id,H5OBJ_DATASET,"Compound Object");
+
+ /* Initialize datatype */
+ H5TsetType(dataset_id,H5TYPE_STRUCT);
+ H5TaddField(dataset_id,H5TYPE_FLOAT32,"Float32 Scalar Field",H5SPACE_SCALAR);
+ H5TaddField(dataset_id,H5TYPE_CHAR,"Char Field",H5SPACE_SCALAR);
+ H5TaddField(dataset_id,H5TYPE_UINT16,"UInt16 Field",H5SPACE_SCALAR);
+ H5TendDefine(dataset_id);
+
+ /* Initialize dimensionality */
+ H5SsetSpace(dataset_id,1,dims);
+
+ <initialize data array>
+
+ /* Write the entire dataset out */
+ H5Dwrite(dataset_id,H5S_SCALAR,obj_data);
+
+ /* Release the atoms we've created */
+ H5Mrelease(dataset_id);
+
+ /* close the file */
+ H5Fclose(file_id);
+ }
+
Notes:
+This example creates a 3-dimensional dataset of compound datatype records,
+roughly corresponding to a multi-dimensional Vdata functionality. This
+example also shows the use of multi-dimensional fields in the compound datatype.
+This example uses "stand-alone" datatypes and dataspaces.
+
+
Code:
+
+
+
+
+ hid_t file_id; /* File's ID */
+ hid_t type_id; /* datatype's ID */
+ hid_t dim_id; /* dimensionality's ID */
+ uint32 dims[3]={95,67,5}; /* the size of the dimensions */
+ hid_t field_dim_id; /* dimensionality ID for fields in the structure */
+ uint32 field_dims[4]; /* array for field dimensions */
+ hid_t dataset_id; /* object's ID */
+ void *obj_data; /* pointer to the dataset's data */
+
+ if((file_id=H5Fcreate("example5.h5",H5F_ACC_TRUNC))>=0)
+ {
+ /* Create & initialize a datatype object */
+ type_id=H5Mcreate(file_id,H5OBJ_DATATYPE,"Compound Type #1");
+ H5TsetType(type_id,H5TYPE_STRUCT);
+
+ /* Create each multi-dimensional field in structure */
+ field_dim_id=H5Mcreate(file_id,H5OBJ_DATASPACE,"Lat/Long Dims");
+ field_dims[0]=360;
+ field_dims[1]=720;
+ H5SsetSpace(field_dim_id,2,field_dims);
+ H5TaddField(type_id,H5TYPE_FLOAT32,"Lat/Long Locations",field_dim_id);
+ H5Mrelease(field_dim_id);
+
+ field_dim_id=H5Mcreate(file_id,H5OBJ_DATASPACE,"Browse Dims");
+ field_dims[0]=40;
+ field_dims[1]=40;
+ H5SsetSpace(field_dim_id,2,field_dims);
+ H5TaddField(type_id,H5TYPE_CHAR,"Browse Image",field_dim_id);
+ H5Mrelease(field_dim_id);
+
+ field_dim_id=H5Mcreate(file_id,H5OBJ_DATASPACE,"Multispectral Dims");
+ field_dims[0]=80;
+ field_dims[1]=60;
+ field_dims[2]=40;
+ H5SsetSpace(field_dim_id,3,field_dims);
+ H5TaddField(type_id,H5TYPE_UINT16,"Multispectral Scans",field_dim_id);
+ H5Mrelease(field_dim_id);
+ H5TendDefine(type_id);
+
+ /* Create & initialize a dimensionality object */
+ dim_id=H5Mcreate(file_id,H5OBJ_DATASPACE,"3-D Dim");
+ H5SsetSpace(dim_id,3,dims);
+
+ /* Create & initialize the dataset object */
+ dataset_id=H5Mcreate(file_id,H5OBJ_DATASET,"Compound Multi-Dim Object");
+ H5DsetInfo(dataset_id,type_id,dim_id);
+
+ <initialize data array>
+
+ /* Write the entire dataset out */
+ H5Dwrite(dataset_id,H5S_SCALAR,obj_data);
+
+ /* Release the atoms we've created */
+ H5Mrelease(type_id);
+ H5Mrelease(dim_id);
+ H5Mrelease(dataset_id);
+
+ /* close the file */
+ H5Fclose(file_id);
+ }
+
Notes:
+This example shows how to get the information for and display a generic
+dataset.
+
+
Code:
+
+
diff --git a/doc/html/review1a.html b/doc/html/review1a.html
new file mode 100644
index 0000000..78a5a84
--- /dev/null
+++ b/doc/html/review1a.html
@@ -0,0 +1,252 @@
+
+
+
+ hid_t file_id; /* File's ID */
+ hid_t dataset_id; /* dataset's ID in memory */
+ uintn elem_size; /* size of each element */
+ uintn nelems; /* number of elements in array */
+ void *obj_data; /* pointer to the dataset's data */
+
+ if((file_id=H5Fopen("example6.h5",0))>=0)
+ {
+ /* Attach to the dataset object */
+ dataset_id=H5MaccessByIndex(file_id,0);
+
+ if(H5TbaseType(dataset_id)==H5T_COMPOUND)
+ {
+ <set up for compound object>
+ }
+ else
+ {
+ <set up for homogeneous object>
+ }
+
+ elem_size=H5Tsize(dataset_id);
+ nelems=H5Snelem(dataset_id);
+ <allocate space based on element size and number of elements>
+
+ /* Read in the dataset */
+ H5Dread(dataset_id,H5S_SCALAR,obj_data);
+ <or>
+ H5Dread(dataset_id,dataset_id,obj_data);
+
+ /* Release the atoms we've accessed */
+ H5Mrelease(dataset_id);
+
+ /* close the file */
+ H5Fclose(file_id);
+ }
+
Directories (or now Groups) are currently implemented as
+ a directed graph with a single entry point into the graph which
+ is the Root Object. The root object is usually a
+ group. All objects have at least one predecessor (the Root
+ Object always has the HDF5 file boot block as a
+ predecessor). The number of predecessors of a group is also
+ known as the hard link count or just link count.
+ Unlike Unix directories, HDF5 groups have no ".." entry since
+ any group can have multiple predecessors. Given the handle or
+ id of some object, producing a full name for that object would
+ require an expensive graph traversal.
+
A special optimization is that a file may contain a single
+ non-group object and no group(s). The object has one
+ predecessor which is the file boot block. However, once a root
+ group is created it never disappears (although I suppose it
+ could if we wanted).
+
A special object called a Symbolic Link is simply a
+ name. Usually the name refers to some (other) object, but that
+ object need not exist. Symbolic links in HDF5 will have the
+ same semantics as symbolic links in Unix.
+
The symbol table graph contains "entries" for each name. An
+ entry contains the file address for the object header and
+ possibly certain messages cached from the object header.
+
The H5G package understands the notion of opening an object,
+ which means that given the name of the object, a handle to the
+ object is returned (this isn't an API function). Objects can be
+ opened multiple times simultaneously through the same name or,
+ if the object has hard links, through other names. The name of
+ an object cannot be removed from a group if the object is opened
+ through that group (although the name can change within the
+ group).
+
Below the API, object attributes can be read without opening
+ the object; object attributes cannot change without first
+ opening that object. The one exception is that the contents of a
+ group can change without opening the group.
+
Assuming we have a flat name space (that is, the root object is
+ a group which contains names for all other objects in the file
+ and none of those objects are groups), then we can build a
+ hierarchy of groups that also refer to the objects.
+
The file initially contains `foo' `bar' `baz' in the root
+ group. We wish to add groups `grp1' and `grp2' so that `grp1'
+ contains objects `foo' and `baz' and `grp2' contains objects
+ `bar' and `baz' (so `baz' appears in both groups).
+
In either case below, one might want to move the flat objects
+ into some other group (like `flat') so their names don't
+ interfere with the rest of the hierarchy (or move the hierarchy
+ into a directory called `/hierarchy').
+
Create group `grp1' and add symbolic links called `foo' whose
+ value is `/foo' and `baz' whose value is `/baz'. Similarly for
+ `grp2'.
+
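A sketch of what that might look like with the link call used
+ later in this document; H5G_SOFT is assumed here as the
+ symbolic-link counterpart of H5G_HARD:
+
+H5Gcreate (file_id, "/grp1");
+H5Glink (file_id, H5G_SOFT, "/foo", "/grp1/foo");
+H5Glink (file_id, H5G_SOFT, "/baz", "/grp1/baz");
+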
Accessing `grp1/foo' involves searching the root group for
+ the name `grp1', then searching that group for `foo', then
+ searching the root group for `foo'. Alternatively, one
+ could change the current working group to `grp1' and then ask
+ for `foo', which searches `grp1' for the name `foo', then
+ searches the root group for the name `foo'.
+
Deleting `/grp1/foo' deletes the symbolic link without
+ affecting the `/foo' object. Deleting `/foo' leaves the
+ `/grp1/foo' link dangling.
+
Creating the hierarchy is the same as with symbolic links.
+
Accessing `/grp1/foo' searches the root group for the name
+ `grp1', then searches that group for the name `foo'. If the
+ current working group is `/grp1' then we just search for the
+ name `foo'.
+
Deleting `/grp1/foo' leaves `/foo' and vice versa.
+
Depending on the eventual API...
+
+
+H5Gcreate (file_id, "/grp1");
+H5Glink (file_id, H5G_HARD, "/foo", "/grp1/foo");
+
+ or
+
+group_id = H5Gcreate (root_id, "grp1");
+H5Glink (file_id, H5G_HARD, root_id, "foo", group_id, "foo");
+H5Gclose (group_id);
+
Similar to above, but in this case we have to watch out that
+ we don't get two names which are the same: what happens to
+ `/grp1/baz' and `/grp2/baz'? If they really refer to the same
+ object then we just have `/baz', but if they point to two
+ different objects what happens?
+
The other thing to watch out for is cycles in the graph when we
+ traverse it to build the flat namespace.
+
Two things to watch out for are that the group contents don't
+ appear to change in a manner which would confuse the
+ application, and that listing everything in a group is as
+ efficient as possible.
+
Query the number of things in a group and then query each item
+ by index. A trivial implementation would be O(n*n), since each
+ by-index lookup rescans the group's entries, and wouldn't
+ protect the caller from changes to the directory which move
+ entries around and therefore change their indices.
+
+
+
+
+n = H5GgetNumContents (group_id);
+for (i=0; i<n; i++) {
+ H5GgetNameByIndex (group_id, i, ...); /*don't worry about args yet*/
+}
+
The API contains a single function that reads all information
+ from the specified group and returns that info through an array.
+ The caller is responsible for freeing the array allocated by the
+ query and the things to which it points. This also makes it
+ clear that the returned value is a snapshot of the group which
+ doesn't change if the group is modified.
+
+
+
+ Notice that it would be difficult to expand the info struct since
+ its definition is part of the API.
+
+
+n = H5Glist (file_id, "/grp1", info, ...);
+for (i=0; i<n; i++) {
+ printf ("name = %s\n", info[i].name);
+ free (info[i].name); /*and maybe other fields too?*/
+}
+free (info);
+
The caller asks for a snapshot of the group and then accesses
+ items in the snapshot through various query-by-index API
+ functions. When finished, the caller notifies the library that
+ it's done with the snapshot. The word "snapshot" makes it clear
+ that subsequent changes to the directory will not be reflected in
+ the snapshot_id.
+
+
+
+ In fact, we could allow the user to leave off the H5Gsnapshot and
+ H5Grelease and use group_id in the H5GgetNumContents and
+ H5GgetNameByIndex so they can choose between Method A and Method
+ C.
+
+
+snapshot_id = H5Gsnapshot (group_id); /*or perhaps group_name */
+n = H5GgetNumContents (snapshot_id);
+for (i=0; i<n; i++) {
+ H5GgetNameByIndex (snapshot_id, i, ...);
+}
+H5Grelease (snapshot_id);
+
hid_t H5Gsnapshot (hid_t group_id)
+     Takes a snapshot of the group's contents. The snapshot stays
+     consistent if a name returned by
+     H5GgetNameByIndex is changed. Adding new entries
+     to a group doesn't affect the snapshot.
+
+ char *H5GgetNameByIndex (hid_t snapshot_id, int index)
+     Uses index to select an entry of
+     the snapshot array to get the object name. This is a
+     constant-time operation. The name is updated automatically if
+     the object is renamed within the group.
+
+ H5Gget<whatever>ByIndex...()
+     Uses index to select an entry of the snapshot array,
+     which is just a symbol table entry, and reads the appropriate
+     object header message(s) which might be cached in the symbol
+     table entry. This is a constant-time operation if cached,
+     linear in the number of messages if not cached.
+
+ H5Grelease (hid_t snapshot_id)
+     Releases the snapshot.
+
Should these functions return a
+ char* or some HDF5 string type? In either case, the caller has
+ to release resources associated with the return value, calling
+ free() or some HDF5 function.
+
Names in the current implementation of the H5G package don't
+ contain embedded null characters and are always null terminated.
+
Eventually the caller probably wants a char*
+ so it can pass it to some non-HDF5 function; does that require
+ strdup'ing the string again? Then the caller has to free() the
+ char* and release the HDF5 string.
+
+
This document describes the various ways that raw data is
+ stored in an HDF5 file and the object header messages which
+ contain the parameters for the storage.
+
Raw data storage has three components: the mapping from some
+ logical multi-dimensional element space to the linear address
+ space of a file, compression of the raw data on disk, and
+ striping of raw data across multiple files. These components
+ are orthogonal.
+
Some goals of the storage mechanism are to be able to
+ efficiently store data which is:
+
+     Sparse
+     Large
+     Dynamic Size (the data grows or shrinks over time)
+     Subslab Access (read or written a small piece at a time)
+
The Sparse, Large, Dynamic Size, and Subslab Access methods
+ share so much code that they can be described with a single
+ message. The new Indexed Storage Message (0x0008)
+ will replace the old Chunked Object (0x0009) and
+ Sparse Object (0x000A) Messages.
+
+
+
+ +------------+------------+------------+------------+
+ |    byte    |    byte    |    byte    |    byte    |
+ +------------+------------+------------+------------+
+ |                 Address of B-tree                 |
+ +------------+------------+------------+------------+
+ | Number of  |  Reserved  |  Reserved  |  Reserved  |
+ | Dimensions |            |            |            |
+ +------------+------------+------------+------------+
+ |                Reserved (4 bytes)                 |
+ +---------------------------------------------------+
+ |       Alignment for Dimension 0 (4 bytes)         |
+ +---------------------------------------------------+
+ |       Alignment for Dimension 1 (4 bytes)         |
+ +---------------------------------------------------+
+ |                       ...                         |
+ +---------------------------------------------------+
+ |       Alignment for Dimension N (4 bytes)         |
+ +---------------------------------------------------+
The alignment fields indicate the alignment in logical space to
+ use when allocating new storage areas on disk. For instance,
+ writing every other element of a 100-element one-dimensional
+ array (using one HDF5 I/O partial write operation per element)
+ that has unit storage alignment would result in 50
+ single-element, discontiguous storage segments. However, using
+ an alignment of 25 would result in only four discontiguous
+ segments. The size of the message varies with the number of
+ dimensions.
+
A B-tree is used to point to the discontiguous portions of
+ storage which has been allocated for the object. All keys of a
+ particular B-tree are the same size and are a function of the
+ number of dimensions. It is therefore not possible to change the
+ dimensionality of an indexed storage array after its B-tree is
+ created.
+
+
+ +------------+------------+------------+------------+
+ |    byte    |    byte    |    byte    |    byte    |
+ +------------+------------+------------+------------+
+ |      External File Number or Zero (4 bytes)       |
+ +---------------------------------------------------+
+ |       Chunk Offset in Dimension 0 (4 bytes)       |
+ +---------------------------------------------------+
+ |       Chunk Offset in Dimension 1 (4 bytes)       |
+ +---------------------------------------------------+
+ |                       ...                         |
+ +---------------------------------------------------+
+ |       Chunk Offset in Dimension N (4 bytes)       |
+ +---------------------------------------------------+
The keys within a B-tree obey an ordering based on the chunk
+ offsets. If the offsets in dimension-0 are equal, then
+ dimension-1 is used, etc. The External File Number field
+ contains a 1-origin offset into the External File List message
+ which contains the name of the external file in which that chunk
+ is stored.
+
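The comparison is therefore lexicographic over the chunk
+ offsets. A minimal sketch of such a key and its comparison
+ function follows; the names and layout here are hypothetical,
+ not the library's actual declarations:
+
+#include <stdint.h>
+
+#define NDIMS 3                    /* fixed when the B-tree is created */
+
+typedef struct {
+    uint32_t file_number;          /* 1-origin EFL offset, or zero */
+    uint32_t offset[NDIMS];        /* chunk offset in each dimension */
+} istore_key_t;
+
+/* Compare dimension 0 first, then dimension 1, etc. */
+static int
+istore_key_cmp (const istore_key_t *a, const istore_key_t *b)
+{
+    int i;
+    for (i = 0; i < NDIMS; i++) {
+        if (a->offset[i] < b->offset[i]) return -1;
+        if (a->offset[i] > b->offset[i]) return 1;
+    }
+    return 0;                      /* same chunk */
+}
+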
The indexed storage will support arbitrary striping at the
+ chunk level; each chunk can be stored in any file. This is
+ accomplished by using the External File Number field of an
+ indexed storage B-tree key as a 1-origin offset into an External
+ File List Message (0x0009) which takes the form:
+
+
+ +------------+------------+------------+------------+
+ |    byte    |    byte    |    byte    |    byte    |
+ +------------+------------+------------+------------+
+ |                 Name Heap Address                 |
+ +---------------------------------------------------+
+ |        Number of Slots Allocated (4 bytes)        |
+ +---------------------------------------------------+
+ |          Number of File Names (4 bytes)           |
+ +---------------------------------------------------+
+ |      Byte Offset of Name 1 in Heap (4 bytes)      |
+ +---------------------------------------------------+
+ |      Byte Offset of Name 2 in Heap (4 bytes)      |
+ +---------------------------------------------------+
+ |                       ...                         |
+ +---------------------------------------------------+
+ |                  Unused Slot(s)                   |
+ +---------------------------------------------------+
Each indexed storage array that has all or part of its data
+ stored in external files will contain a single external file
+ list message. The size of the messages is determined when the
+ message is created, but it may be possible to enlarge the
+ message on demand by moving it. At this time, it's not possible
+ for multiple arrays to share a single external file list
+ message.
+
+ H5O_efl_t *H5O_efl_new (H5G_entry_t *object, intn nslots_hint,
+         intn heap_size_hint)
+
+ intn H5O_efl_index (H5O_efl_t *efl, const char *filename)
+
+ H5F_low_t *H5O_efl_open (H5O_efl_t *efl, intn index, uintn mode)
+
+ herr_t H5O_efl_release (H5O_efl_t *efl)
+     Releasing a message which was modified since the call to
+     H5O_efl_new flushes the message
+     to disk.
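+
A sketch of how these calls might fit together when writing a
+ chunk to an external file; the file name, the mode value, and
+ the elided write step are hypothetical stand-ins:
+
+ intn idx; /* 1-origin index of the file in the EFL */
+ H5F_low_t *lf; /* low-level file handle */
+
+ idx = H5O_efl_index (efl, "scan0001.raw");
+ lf = H5O_efl_open (efl, idx, 1 /* write mode; actual flag unknown */);
+ <write the chunk through the low-level file interface>
+ H5O_efl_release (efl);
+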
These are the results of studying the chunked layout policy in
+ HDF5. A 1000 by 1000 array of integers was written to a file
+ dataset, extending the dataset with each write to create, in the
+ end, a 5000 by 5000 array of 4-byte integers for a total data
+ storage size of 100 million bytes.
+
+
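For reference, a minimal sketch of how such a chunked,
+ extendible dataset can be created with the dataset-creation
+ property interface (error checking omitted):
+
+ hid_t file_id, space_id, dcpl_id, dataset_id;
+ hsize_t cur_dims[2] = {1000, 1000}; /* size of the first write */
+ hsize_t max_dims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
+ hsize_t chunk_dims[2] = {1000, 1000}; /* the chunk size under study */
+
+ file_id = H5Fcreate ("study.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+ space_id = H5Screate_simple (2, cur_dims, max_dims);
+ dcpl_id = H5Pcreate (H5P_DATASET_CREATE);
+ H5Pset_chunk (dcpl_id, 2, chunk_dims);
+ dataset_id = H5Dcreate (file_id, "ints", H5T_NATIVE_INT, space_id, dcpl_id);
+ <H5Dextend() and H5Dwrite() for each 1000 by 1000 block>
+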
After the array was written, it was read back in blocks that
+ were 500 by 500 elements in row major order (that is, the
+ top-left quadrant of output block one, then the top-right
+ quadrant of output block one, then the top-left quadrant of
+ output block two, etc.).
+
I tried to answer two questions:
+
+     1. How does the chunk size affect the amount of meta data and
+        raw data storage overhead?
+     2. How does the chunk size affect I/O performance when the
+        request size doesn't match the chunk size?
+
I started with chunk sizes that were multiples of the read
+ block size or k*(500, 500).
+
+
+ Chunk Size (elements)   Meta Data Overhead (ppm)   Raw Data Overhead (ppm)
+ ---------------------   ------------------------   -----------------------
+ 500 by 500                        85.84                        0.00
+ 1000 by 1000                      23.08                        0.00
+ 5000 by 1000                      23.08                        0.00
+ 250 by 250                       253.30                        0.00
+ 499 by 499                        85.84                   205164.84
+
The first half of Figure 2 shows output to the file while the
+ second half shows input. Each dot represents a file-level I/O
+ request and the lines that connect the dots are for visual
+ clarity. The size of the request is not indicated in the
+ graph. The output block size is four times the chunk size which
+ results in four file-level write requests per block for a total
+ of 100 requests. Since file space for the chunks was allocated
+ in output order, and the input block size is 1/4 the output
+ block size, the input shows a staircase effect. Each input
+ request results in one file-level read request. The downward
+ spike at about the 60-millionth byte is probably the result of a
+ cache miss for the B-tree and the downward spike at the end is
+ probably a cache flush or file boot block update.
+
+
In this test I increased the chunk size to match the output
+ block size and one can see from the first half of the graph that
+ 25 file-level write requests were issued, one for each output
+ block. The read half of the test shows that four times the
+ amount of data was read as written. This results from the fact
+ that HDF5 must read the entire chunk for any request that falls
+ within that chunk, which is done because (1) if the data is
+ compressed the entire chunk must be decompressed, and (2) the
+ library assumes that a chunk size was chosen to optimize disk
+ performance.
+
+
Increasing the chunk size further results in even worse
+ performance since both the read and write halves of the test are
+ re-reading and re-writing vast amounts of data. This shows
+ that one should be careful that chunk sizes are not much larger
+ than the typical partial I/O request.
+
+
If the chunk size is decreased then the amount of data
+ transferred between the disk and library is optimal for no
+ caching, but the amount of meta data required to describe the
+ chunk locations increases to 250 parts per million. One can
+ also see that the final downward spike contains more file-level
+ write requests as the meta data is flushed to disk just before
+ the file is closed.
+
+
This test shows the result of choosing a chunk size which is
+ close to the I/O block size. Because the total size of the
+ array isn't a multiple of the chunk size, the library allocates
+ an extra zone of chunks around the top and right edges of the
+ array which are only partially filled. This results in
+ 20,516,484 extra bytes of storage, a 20% increase in the total
+ raw data storage size. But the amount of meta data overhead is
+ the same as for the 500 by 500 test. In addition, the mismatch
+ causes entire chunks to be read in order to update a few
+ elements along the edge of the chunk, which results in a
+ 3.6-fold increase in the amount of data transferred.
+
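The arithmetic bears this out: covering 5000 elements with
+ chunks of 499 takes 11 chunks in each dimension, so the
+ allocated storage is 11 * 11 * 499 * 499 * 4 = 120,516,484
+ bytes for 100,000,000 bytes of data, which is exactly the
+ 20,516,484-byte overhead reported above.
+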
The HDF5 library is now able to trace API calls by printing the
+ function name, the argument names and their values, and the
+ return value. Some people like to see lots of output during
+ program execution instead of using a good symbolic debugger, and
+ this feature is intended for their consumption. For example,
+ the output from h5ls foo
+ after turning on tracing
+ includes lines of roughly this shape (illustrative; the actual
+ handle values vary from run to run):
+
+     H5Fopen(filename="foo", flags=0, access=H5P_DEFAULT) = 16777216;
+
This all happens with some magic in the configuration script,
+ the makefiles, and macros. First, from the end-user point of
+ view, the library must be configured with the
+ --enable-tracing
switch. This causes the library to
+ include the support necessary for API tracing.
+
+
+     $ ./configure --enable-tracing
+
In order to actually get tracing output one must turn tracing
+ on and specify a file descriptor where the tracing output should
+ be written. This is done by assigning a file descriptor number
+ to the HDF5_TRACE
environment variable.
+
+
+
To display the trace on the standard error stream (file
+ descriptor 2), with a Bourne-style shell:
+
+     $ HDF5_TRACE=2 ./a.out
+
To send the trace to a file, redirect another descriptor to
+ that file and name the descriptor in the variable:
+
+     $ HDF5_TRACE=255 ./a.out 255>trace-output
+
If the library was not configured for tracing then there is no
+ unnecessary overhead since all tracing code is
+ excluded.
+
However, if tracing is enabled but not used there is a
+ small penalty. First, code size is larger because of extra
+ statically-declared character strings used to store argument
+ types and names and an extra automatic pointer variable in each
+ function. Also, execution is slower because each function sets
+ and tests a local variable and each API function calls the
+ H5_trace()
function.
+
+
If tracing is enabled and turned on then the penalties from the
+ previous paragraph apply plus the time required to format each
+ line of tracing information. There is also an extra call to
+ H5_trace() for each API function to print the return value.
+
The tracing mechanism is invoked for each API function before
+ arguments are checked for validity. If bad arguments are passed
+ to an API function it could result in a segmentation fault.
+ However, the tracing output is line-buffered so all previous
+ output will appear.
+
There are two API functions that don't participate in
+ tracing. They are H5Eprint()
and
+ H5Eprint_cb()
because their participation would
+ mess up output during automatic error reporting.
+
+
On the other hand, a number of API functions are called during
+ library initialization and they print tracing information.
+
For those interested in the implementation here is a
+ description. Each API function should have a call to one of the
+ H5TRACE()
macros immediately after the
+ FUNC_ENTER()
macro. The first argument is the
+ return type encoded as a string. The second argument is the
+ types of all the function arguments encoded as a string. The
+ remaining arguments are the function arguments. This macro was
+ designed to be as terse and unobtrusive as possible.
+
+
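For example, an instrumented API function might begin like
+ this; the encoding strings shown ("e" for herr_t, "i" for
+ hid_t) are illustrative:
+
+herr_t
+H5Fclose (hid_t file_id)
+{
+    FUNC_ENTER (H5Fclose, FAIL);
+    H5TRACE1 ("e", "i", file_id); /* return type, argument types, argument */
+    ...
+}
+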
In order to keep the H5TRACE()
calls synchronized
+ with the source code we've written a perl script which gets
+ called automatically just before Makefile dependencies are
+ calculated for the file. However, this only works when one is
+ using GNU make. To reinstrument the tracing explicitly, invoke
+ the trace
program from the hdf5 bin directory with
+ the names of the source files that need to be updated. If any
+ file needs to be modified then a backup is created by appending
+ a tilde to the file name.
+
+
+ For example, a run might look like this (illustrative):
+
+     $ ../bin/trace src/H5F.c
+     src/H5F.c: in function `H5Fflush':
+     src/H5F.c:932: warning: trace info was not inserted
+
Note: The warning message is the result of a comment of the
+ form /*NO TRACE*/ somewhere in the function
+ body. Tracing information will not be updated or inserted if
+ such a comment exists.
+
+
Error messages have the same format as a compiler so that they
+ can be parsed from program development environments like
+ Emacs. Any function which generates an error will not be
+ modified.
+