The Datatype interface, H5T, provides a mechanism to describe the storage format of individual data points of a data set. It is designed so that new features can be added without disrupting applications that use the data type interface. A dataset (the H5D interface) is composed of a collection of raw data points of homogeneous type organized according to the data space (the H5S interface).
A data type is a collection of data type properties, all of which can be stored on disk and which, taken as a whole, provide complete information for data conversion to or from that data type. The interface provides functions to set and query properties of a data type.
A data point is an instance of a datatype, which is an instance of a type class. We have defined a set of type classes and properties which can be extended at a later time. The atomic type classes are those which describe types which cannot be decomposed at the data type interface level; all other classes are compound.
See The Datatype Interface (H5T) in the HDF5 User's Guide for further information.
H5Topen(hid_t loc_id, const char *name)
H5Topen opens a named datatype at the location specified by loc_id and returns an identifier for the data type. The identifier should eventually be closed by calling H5Tclose() to release resources. loc_id is either a file or group identifier.
loc_id
name
H5Tcommit(hid_t loc_id, const char *name, hid_t type)
H5Tcommit commits a transient datatype (not immutable) to a file, turning it into a named datatype. The loc_id is either a file or group identifier which, when combined with name, refers to a new named data type. After the call, the type identifier refers to a named, immutable type.
loc_id
name
type
H5Tcommitted(hid_t type)
H5Tcommitted queries a type to determine whether it is a named type or a transient type. If this function returns a positive value, then the type is named (that is, it has been committed, perhaps by some other application). Datasets which return committed data types with H5Dget_type() are able to share the data type with other datasets in the same file.
type
H5Tinsert_array(hid_t parent_id, const char *name, size_t offset, int ndims, const size_t *dim, const int *perm, hid_t member_id)
H5Tinsert_array adds a new member to the compound data type parent_id. The new member's name, name, must be unique within the compound data type. The offset argument defines the start of the member in an instance of the compound data type and member_id is the type of the new member. The member is an array of ndims dimensions whose sizes are given by dim. The total member size should be relatively small.
parent_id
name
offset
ndims
dim
perm
member_id
H5Tfind(hid_t src_id, hid_t dst_id, H5T_cdata_t **pcdata)
H5Tfind finds a conversion function that can handle a conversion from type src_id to type dst_id. The pcdata argument is a pointer to a pointer to type conversion data which was created and initialized by the soft type conversion function of this path when the conversion function was installed on the path.
src_id
dst_id
pcdata
H5Tconvert(hid_t src_id, hid_t dst_id, size_t nelmts, void *buf, void *background)
H5Tconvert converts nelmts elements from type src_id to type dst_id. The source elements are packed in buf and on return the destination will be packed in buf. That is, the conversion is performed in place. The optional background buffer is an array of nelmts values of destination type which are merged with the converted values to fill in cracks (for instance, background might be an array of structs with the a and b fields already initialized and the conversion of buf supplies the c and d field values).
src_id
dst_id
nelmts
buf
background
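The in-place conversion and the role of the background buffer can be modelled in plain C. The sketch below is not the library's implementation and does not call HDF5; the element layouts `src_elem_t` and `dst_elem_t` and the function `convert_in_place` are hypothetical names chosen to mirror the a/b/c/d example above. It assumes the destination element is at least as large as the source element, so it walks the buffer from the last element to the first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: the source supplies c and d, the destination
 * also carries a and b, which come from the background buffer. */
typedef struct { int c; int d; } src_elem_t;
typedef struct { int a; int b; int c; int d; } dst_elem_t;

/*
 * On entry buf holds nelmts packed src_elem_t values; on return it holds
 * nelmts packed dst_elem_t values. buf must have room for nelmts
 * destination elements. background may be NULL, in which case the
 * fields not supplied by the source are zeroed (an assumption of this
 * sketch, not documented library behaviour).
 */
void convert_in_place(size_t nelmts, void *buf, const dst_elem_t *background)
{
    unsigned char *bytes = buf;
    assert(buf != NULL || nelmts == 0);

    /* Walk backwards: destination element i starts at or after source
     * element i, so writing it never clobbers an unread source element. */
    for (size_t i = nelmts; i-- > 0; ) {
        src_elem_t src;
        dst_elem_t dst;

        memcpy(&src, bytes + i * sizeof src, sizeof src);
        if (background != NULL)
            dst = background[i];          /* a and b come from here */
        else
            memset(&dst, 0, sizeof dst);
        dst.c = src.c;                    /* converted values fill the cracks */
        dst.d = src.d;
        memcpy(bytes + i * sizeof dst, &dst, sizeof dst);
    }
}
```

The caller sizes the buffer for the larger of the two element types, which is why a single buffer can serve as both input and output.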
H5Tset_overflow(H5T_overflow_t func)
H5Tset_overflow sets the overflow handler to be the function specified by func. func will be called for all data type conversions that result in an overflow. See the definition of H5T_overflow_t in H5Tpublic.h for documentation of arguments and return values. The prototype for H5T_overflow_t is as follows:
typedef herr_t (*H5T_overflow_t)(hid_t src_id, hid_t dst_id, void *src_buf, void *dst_buf);
The NULL pointer may be passed to remove the overflow handler.
func
H5Tget_overflow(void)
H5Tget_overflow returns a pointer to the current global overflow function. This is an application-defined function that is called whenever a data type conversion causes an overflow.
H5Tcreate(H5T_class_t class, size_t size)
H5Tcreate creates a new datatype of the specified class with the specified number of bytes. Currently, only the H5T_COMPOUND datatype class is supported with this function; use H5Tcopy to create integer or floating-point datatypes. The datatype ID returned from this function should be released with H5Tclose or resource leaks will result.
class
size
H5Tcopy(hid_t type_id)
H5Tcopy copies an existing datatype. The datatype ID returned should be released with H5Tclose or resource leaks will occur. Native datatypes supported by the library are:
type_id
H5Tequal(hid_t type_id1, hid_t type_id2)
H5Tequal determines whether two datatype identifiers refer to the same datatype.
type_id1
type_id2
H5Tlock(hid_t type_id)
H5Tlock locks a type, making it read-only and non-destructible. This is normally done by the library for predefined data types so the application doesn't inadvertently change or delete a predefined type. Once a data type is locked it can never be unlocked.
type_id
H5Tget_class(hid_t type_id)
H5Tget_class returns the base class of a datatype.
type_id
H5Tget_size(hid_t type_id)
H5Tget_size returns the size of a datatype in bytes.
type_id
H5Tset_size(hid_t type_id, size_t size)
H5Tset_size sets the total size in bytes for an atomic data type (this operation is not permitted on compound data types). If the size is decreased so that the significant bits of the data type extend beyond the edge of the new size, then the `offset' property is decreased toward zero. If the `offset' becomes zero and the significant bits of the data type still hang over the edge of the new size, then the number of significant bits is decreased. Adjusting the size of an H5T_STRING automatically sets the precision to 8*size. All data types have a positive size.
type_id
size
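The size-reduction rule described for H5Tset_size can be written out as a small model in plain C. This is a sketch of the rule as stated, not the library's code: the struct `atomic_props_t` and the function `model_set_size` are hypothetical, and the sketch assumes the offset is reduced only by as much as needed before the precision is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the three properties involved. */
typedef struct {
    size_t size;       /* total size in bytes */
    size_t precision;  /* number of significant bits */
    size_t offset;     /* padding bits to the right of the value */
} atomic_props_t;

/* Model of the described rule: shrink offset first, then precision. */
void model_set_size(atomic_props_t *t, size_t new_size)
{
    assert(t != NULL && new_size > 0);   /* all data types have a positive size */
    size_t bits = 8 * new_size;

    if (t->offset + t->precision > bits) {
        /* Significant bits would hang over the edge: decrease the
         * offset toward zero just far enough to fit, if possible. */
        t->offset = (t->precision >= bits) ? 0 : bits - t->precision;
        /* Offset is zero and the bits still hang over: drop precision. */
        if (t->precision > bits)
            t->precision = bits;
    }
    t->size = new_size;
}
```

For a 4-byte type with 16 bits of precision at offset 16, shrinking to 3 bytes moves the offset to 8 and keeps the precision; shrinking to 1 byte forces the offset to 0 and the precision to 8.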
H5Tget_order(hid_t type_id)
H5Tget_order returns the byte order of an atomic datatype.
type_id
H5Tset_order(hid_t type_id, H5T_order_t order)
H5Tset_order sets the byte ordering of an atomic datatype. Byte orderings currently supported are:
type_id
order
H5Tget_precision(hid_t type_id)
H5Tget_precision returns the precision of an atomic data type. The precision is the number of significant bits which, unless padding is present, is 8 times larger than the value returned by H5Tget_size().
type_id
H5Tset_precision(hid_t type_id, size_t precision)
H5Tset_precision sets the precision of an atomic data type. The precision is the number of significant bits which, unless padding is present, is 8 times larger than the value returned by H5Tget_size().
If the precision is increased then the offset is decreased and then the size is increased to ensure that significant bits do not "hang over" the edge of the data type.
Changing the precision of an H5T_STRING automatically changes the size as well. The precision must be a multiple of 8.
When decreasing the precision of a floating point type, set the locations and sizes of the sign, mantissa, and exponent fields first.
type_id
precision
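The adjustment made when the precision grows can be modelled the same way. The following plain C sketch follows the order given in the description (offset decreased first, then size increased); `atomic_props_t` and `model_set_precision` are hypothetical names, and the exact amounts by which offset and size change are assumptions of the sketch rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the three properties involved. */
typedef struct {
    size_t size;       /* total size in bytes */
    size_t precision;  /* number of significant bits */
    size_t offset;     /* padding bits to the right of the value */
} atomic_props_t;

/* Model of the described rule: reduce offset first, then grow size. */
void model_set_precision(atomic_props_t *t, size_t new_precision)
{
    assert(t != NULL && new_precision > 0);
    size_t bits = 8 * t->size;

    if (new_precision > bits) {
        /* Even with no padding the bits do not fit: drop the offset
         * to zero and grow the size to the next whole byte. */
        t->offset = 0;
        t->size = (new_precision + 7) / 8;
    } else if (t->offset + new_precision > bits) {
        /* The bits fit once the offset is reduced far enough. */
        t->offset = bits - new_precision;
    }
    t->precision = new_precision;
}
```

Starting from a 4-byte type with precision 16 and offset 16, raising the precision to 24 only lowers the offset to 8, while raising it to 40 clears the offset and grows the size to 5 bytes.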
H5Tget_offset(hid_t type_id)
H5Tget_offset retrieves the bit offset of the first significant bit. The significant bits of an atomic datum can be offset from the beginning of the memory for that datum by an amount of padding. The `offset' property specifies the number of bits of padding that appear to the "right of" the value. That is, if we have a 32-bit datum with 16 bits of precision having the value 0x1122, then it will be laid out in memory as follows (from small byte address toward larger byte addresses):
Byte Position | Big-Endian Offset=0 | Big-Endian Offset=16 | Little-Endian Offset=0 | Little-Endian Offset=16 |
---|---|---|---|---|
0: | [ pad] | [0x11] | [0x22] | [ pad] |
1: | [ pad] | [0x22] | [0x11] | [ pad] |
2: | [0x11] | [ pad] | [ pad] | [0x22] |
3: | [0x22] | [ pad] | [ pad] | [0x11] |
type_id
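The four columns of the layout table can be reproduced with a few lines of plain C. This sketch does not use the library; `layout_datum` is a hypothetical helper, it handles data of at most 8 bytes, and it assumes padding bits are zero (the table only marks them as "pad").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write a datum of `size` bytes into out[]. The datum carries `precision`
 * significant bits of `value`, with `offset` padding bits to the right
 * of the value. out[0] is the smallest byte address.
 */
void layout_datum(uint64_t value, size_t precision, size_t offset,
                  size_t size, int big_endian, unsigned char *out)
{
    assert(out != NULL && size >= 1 && size <= 8);
    assert(precision >= 1 && precision + offset <= 8 * size);

    uint64_t mask = (precision == 64) ? UINT64_MAX
                                      : ((UINT64_C(1) << precision) - 1);
    uint64_t datum = (value & mask) << offset;   /* padding bits stay zero */

    for (size_t i = 0; i < size; i++) {
        /* i counts bytes from the least significant end of the datum. */
        unsigned char byte = (unsigned char)((datum >> (8 * i)) & 0xff);
        out[big_endian ? size - 1 - i : i] = byte;
    }
}
```

Calling it with value 0x1122, precision 16, size 4 and offsets 0 and 16 in both byte orders yields the four byte sequences shown in the table, with 0x00 in the pad positions.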
H5Tset_offset(hid_t type_id, size_t offset)
H5Tset_offset sets the bit offset of the first significant bit. The significant bits of an atomic datum can be offset from the beginning of the memory for that datum by an amount of padding. The `offset' property specifies the number of bits of padding that appear to the "right of" the value. That is, if we have a 32-bit datum with 16 bits of precision having the value 0x1122, then it will be laid out in memory as follows (from small byte address toward larger byte addresses):
Byte Position | Big-Endian Offset=0 | Big-Endian Offset=16 | Little-Endian Offset=0 | Little-Endian Offset=16 |
---|---|---|---|---|
0: | [ pad] | [0x11] | [0x22] | [ pad] |
1: | [ pad] | [0x22] | [0x11] | [ pad] |
2: | [0x11] | [ pad] | [ pad] | [0x22] |
3: | [0x22] | [ pad] | [ pad] | [0x11] |
If the offset is incremented then the total size is incremented also if necessary to prevent significant bits of the value from hanging over the edge of the data type.
The offset of an H5T_STRING cannot be set to anything but zero.
type_id
offset
H5Tget_pad(hid_t type_id, H5T_pad_t *lsb, H5T_pad_t *msb)
H5Tget_pad retrieves the padding type of the least-significant and most-significant bit padding. Valid types are:
type_id
lsb
msb
H5Tset_pad(hid_t type_id, H5T_pad_t lsb, H5T_pad_t msb)
H5Tset_pad sets the padding types of the least-significant and most-significant bits.
type_id
lsb
msb
H5Tget_sign(hid_t type_id)
H5Tget_sign retrieves the sign type for an integer type. Valid types are:
type_id
H5Tset_sign(hid_t type_id, H5T_sign_t sign)
H5Tset_sign sets the sign property for an integer type.
type_id
sign
H5Tget_fields(hid_t type_id, size_t *epos, size_t *esize, size_t *mpos, size_t *msize)
H5Tget_fields retrieves information about the locations of the various bit fields of a floating point data type. The field positions are bit positions in the significant region of the data type. Bits are numbered with the least significant bit as number zero. Any (or even all) of the arguments can be null pointers.
type_id
epos
esize
mpos
msize
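To make the bit numbering concrete, the sketch below pulls a field out of a 32-bit pattern given its position and size, with bit zero as the least significant bit. It is plain C and independent of the library; `extract_field` is a hypothetical helper, and the values used in the usage note (exponent at position 23 with size 8, mantissa at position 0 with size 23) are those of IEEE 754 single precision, chosen here only as a familiar example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the `size`-bit field that starts at bit `pos` (bit 0 = LSB). */
uint32_t extract_field(uint32_t bits, size_t pos, size_t size)
{
    assert(size >= 1 && pos + size <= 32);
    uint32_t mask = (size == 32) ? UINT32_MAX
                                 : ((UINT32_C(1) << size) - 1);
    return (bits >> pos) & mask;
}
```

For the single-precision pattern 0x3F800000 (the value 1.0), `extract_field(bits, 23, 8)` gives the stored exponent 127 and `extract_field(bits, 0, 23)` gives a mantissa of 0. Subtracting the exponent bias (127 for this format) gives the true exponent 0.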
H5Tset_fields(hid_t type_id, size_t epos, size_t esize, size_t mpos, size_t msize)
H5Tset_fields sets the locations and sizes of the various floating point bit fields. The field positions are bit positions in the significant region of the data type. Bits are numbered with the least significant bit as number zero.
Fields are not allowed to extend beyond the number of bits of precision, nor are they allowed to overlap with one another.
type_id
epos
esize
mpos
msize
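The two constraints on field placement (stay within the precision, do not overlap) can be checked before calling the function. The plain C sketch below covers only the exponent and mantissa fields listed in the signature above; `fields_valid` is a hypothetical helper and says nothing about how the library itself reports a violation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the exponent field [epos, epos+esize) and the mantissa
 * field [mpos, mpos+msize) both lie within `precision` bits and do not
 * overlap, and 0 otherwise. Bit 0 is the least significant bit.
 */
int fields_valid(size_t precision, size_t epos, size_t esize,
                 size_t mpos, size_t msize)
{
    assert(precision > 0);

    /* Neither field may extend beyond the bits of precision. */
    if (epos + esize > precision || mpos + msize > precision)
        return 0;

    /* Two ranges are disjoint when one ends at or before the other starts. */
    if (epos + esize <= mpos || mpos + msize <= epos)
        return 1;

    return 0;
}
```

With 32 bits of precision, an exponent at position 23 of size 8 and a mantissa at position 0 of size 23 pass the check; moving the exponent down to position 22 makes it overlap the mantissa, and moving it up to position 25 pushes it past the precision.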
H5Tget_ebias(hid_t type_id)
H5Tget_ebias retrieves the exponent bias of a floating-point type.
type_id
H5Tset_ebias(hid_t type_id, size_t ebias)
H5Tset_ebias sets the exponent bias of a floating-point type.
type_id
ebias
H5Tget_norm(hid_t type_id)
H5Tget_norm retrieves the mantissa normalization of a floating-point datatype. Valid normalization values are:
type_id
H5Tset_norm(hid_t type_id, H5T_norm_t norm)
H5Tset_norm sets the mantissa normalization of a floating-point datatype. Valid normalization values are:
type_id
norm
H5Tget_inpad(hid_t type_id)
H5Tget_inpad retrieves the internal padding type for unused bits in floating-point datatypes. Valid padding values are:
type_id
H5Tset_inpad(hid_t type_id, H5T_pad_t inpad)
H5Tset_inpad sets how the unused internal bits of a floating-point datatype will be filled, according to the value of the padding value property inpad. Valid padding values are:
type_id
inpad
H5Tget_cset(hid_t type_id)
H5Tget_cset retrieves the character set type of a string datatype. Valid character set values are:
type_id
H5Tset_cset(hid_t type_id, H5T_cset_t cset)
type_id
cset
H5Tget_strpad(hid_t type_id)
H5Tget_strpad retrieves the string padding method for a string datatype. Valid string padding values are:
type_id
H5Tset_strpad(hid_t type_id, H5T_str_t strpad)
H5Tset_strpad defines the storage mechanism for the string. Valid string padding values are:
type_id
strpad
H5Tget_nmembers(hid_t type_id)
H5Tget_nmembers retrieves the number of fields in a compound datatype.
type_id
H5Tget_member_name(hid_t type_id, intn fieldno)
H5Tget_member_name retrieves the name of a field of a compound data type. Fields are stored in no particular order with numbers 0 through N-1 where N is the value returned by H5Tget_nmembers(). The name of the field is allocated with malloc() and the caller is responsible for freeing the memory used by the name.
type_id
fieldno
H5Tget_member_dims(hid_t type_id, intn fieldno, size_t *dims, int *perm)
H5Tget_member_dims returns the dimensionality of the field. The dimensions and permutation vector are returned through arguments dims and perm, both arrays of at least four elements. Either (or even both) may be null pointers.
type_id
fieldno
dims
perm
H5Tget_member_type(hid_t type_id, intn fieldno)
H5Tget_member_type returns the data type of the specified member. The caller should invoke H5Tclose() to release resources associated with the type.
type_id
fieldno
H5Tinsert(hid_t type_id, const char *name, off_t offset, hid_t field_id)
H5Tinsert adds another member to the compound data type type_id. The new member has a name, name, which must be unique within the compound data type. The offset argument defines the start of the member in an instance of the compound data type, and field_id is the type of the new member.
Note: All members of a compound data type must be atomic; a compound data type cannot have a member which is a compound data type.
type_id
name
offset
field_id
H5Tpack(hid_t type_id)
H5Tpack recursively removes padding from within a compound datatype to make it more efficient (space-wise) to store that data.
type_id
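The effect of removing padding can be illustrated with a flat list of members, each described by an offset and a size. The plain C sketch below is a model only: `member_t` and `pack_members` are hypothetical, the members are assumed to be listed in order of increasing offset, and nesting is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one member of a compound type. */
typedef struct {
    const char *name;
    size_t offset;   /* byte offset of the member within an instance */
    size_t size;     /* size of the member in bytes */
} member_t;

/*
 * Reassign offsets so that each member starts where the previous one
 * ends, and return the packed size of the whole compound type.
 */
size_t pack_members(member_t *members, size_t n)
{
    assert(members != NULL || n == 0);
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        members[i].offset = next;     /* no gap before this member */
        next += members[i].size;
    }
    return next;
}
```

Three members of sizes 1, 4 and 2 bytes that originally sat at offsets 0, 4 and 8 end up at offsets 0, 1 and 5, and the packed type occupies 7 bytes.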
H5Tregister_hard(const char *name, hid_t src_id, hid_t dst_id, H5T_conv_t func)
H5Tregister_hard registers a hard conversion function for a data type conversion path. The path is specified by the source and destination datatypes src_id and dst_id. A conversion path can only have one hard function, so func replaces any previous hard function.
If func is the null pointer then any hard function registered for this path is removed from this path. The soft functions are then used when determining which conversion function is appropriate for this path. The name argument is used only for debugging and should be a short identifier for the function.
The type of the conversion function pointer is declared as:
typedef herr_t (*H5T_conv_t) (hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts, void *buf, void *bkg);
name
src_id
dst_id
func
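The relationship between the single hard function on a path and the soft functions behind it can be pictured with a small model. The plain C sketch below is not the library's data structure: `conv_func_t` is a simplified stand-in for H5T_conv_t, `conv_path_t`, `path_set_hard` and `path_select` are hypothetical names, and the two converters do nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for H5T_conv_t (the real type takes more arguments). */
typedef int (*conv_func_t)(size_t nelmts, void *buf);

/* Hypothetical model of one conversion path. */
typedef struct {
    conv_func_t hard;   /* at most one hard function, or NULL */
    conv_func_t soft;   /* soft function applicable to this path, or NULL */
} conv_path_t;

/* Installing a hard function replaces the previous one; NULL removes it. */
void path_set_hard(conv_path_t *path, conv_func_t func)
{
    assert(path != NULL);
    path->hard = func;
}

/* The hard function is used when present, otherwise the soft one. */
conv_func_t path_select(const conv_path_t *path)
{
    assert(path != NULL);
    return (path->hard != NULL) ? path->hard : path->soft;
}

/* Placeholder converters used only to tell the two cases apart. */
int conv_generic(size_t nelmts, void *buf) { (void)nelmts; (void)buf; return 0; }
int conv_fast(size_t nelmts, void *buf)    { (void)nelmts; (void)buf; return 0; }
```

A path that starts with only `conv_generic` as its soft function selects `conv_fast` once that is installed as the hard function, and falls back to `conv_generic` after the hard function is removed with a null pointer.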
H5Tregister_soft(const char *name, hid_t src_id, hid_t dst_id, H5T_conv_t func)
H5Tregister_soft registers a soft conversion function by adding it to the end of the master soft list and replacing the soft function in all applicable existing conversion paths. The name is used only for debugging and should be a short identifier for the function.
The type of the conversion function pointer is declared as:
typedef herr_t (*H5T_conv_t) (hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts, void *buf, void *bkg);
name
src_id
dst_id
func
H5Tunregister(H5T_conv_t func)
H5Tunregister removes a conversion function from all conversion paths.
The type of the conversion function pointer is declared as:
typedef herr_t (*H5T_conv_t) (hid_t src_id, hid_t dst_id, H5T_cdata_t *cdata, size_t nelmts, void *buf, void *bkg);
func
H5Tclose(hid_t type_id)
H5Tclose releases a datatype. Further access through the datatype ID is illegal. Failure to release a datatype with this call will result in resource leaks.
type_id
H5Tshare
Elena> Datatype Interface:
Elena> Do we have description of the named datatypes somewhere?

From Datatypes.html:

html> 7. Sharing Data Types among Datasets
html>
html> If a file has lots of datasets which have a common data type then the file could be made smaller by having all the datasets share a single data type. Instead of storing a copy of the data type in each dataset object header, a single data type is stored and the object headers point to it. The space savings is probably only significant for datasets with a compound data type since the simple data types can be described with just a few bytes anyway.
html>
html> To create a bunch of datasets that share a single data type just create the datasets with a committed (named) data type.
html>
html> To create two datasets that share a common data type one just commits the data type, giving it a name, and then uses that data type to create the datasets.
html>
html>     hid_t t1 = ...some transient type...;
html>     H5Tcommit (file, "shared_type", t1);
html>     hid_t dset1 = H5Dcreate (file, "dset1", t1, space, H5P_DEFAULT);
html>     hid_t dset2 = H5Dcreate (file, "dset2", t1, space, H5P_DEFAULT);
html>
html> And to create two additional datasets later which share the same type as the first two datasets:
html>
html>     hid_t dset1 = H5Dopen (file, "dset1");
html>     hid_t t2 = H5Dget_type (dset1);
html>     hid_t dset3 = H5Dcreate (file, "dset3", t2, space, H5P_DEFAULT);
html>     hid_t dset4 = H5Dcreate (file, "dset4", t2, space, H5P_DEFAULT);
html>
html> Example: Shared Types

Mail from Quincey summarizing shared data types:

Quincey> Hi Robb,
Quincey> Everything looks good, I just have a couple of minor comments below:
Quincey>
Quincey> > A very quick data types summary (so I can remember it next week :-)
Quincey> >
Quincey> > * Handles to named types are immutable.
Quincey> >
Quincey> > * A transient type handle can be converted to a named type handle by calling H5Tcommit(). This can only be called for transient types which are not locked or predefined.
Quincey> >
Quincey> > * H5Topen() returns a handle to a named immutable type.
Quincey> >
Quincey> > * H5Tcopy() returns a handle to a transient type.
Quincey> H5Tcreate also returns a handle to a transient type.
Quincey>
Quincey> > * Using a named type in H5Dcreate() causes the dataset object header to point to the named type (shared). The link count on the named type is incremented.
Quincey> >
Quincey> > * Using a transient type in H5Dcreate() causes the type to be copied and stored in the dataset header (unshared).
Quincey> >
Quincey> > * Type handles returned from H5Dget_type() are immutable.
Quincey> >
Quincey> > * If the dataset is using a shared type (dataset object header points to some other object header with a type message, e.g., a named type) then H5Dget_type() returns a handle to that named type.
Quincey> >
Quincey> > * If the dataset has a private type (data type is stored in the dataset object header) then H5Dget_type() returns a handle to a transient immutable type.
Quincey> >
Quincey> > * The name of a data type can be removed from a group, but unless the reference count becomes zero the type continues to exist. (Other objects work this way too).
Quincey> >
Quincey> > * H5Tcopy() applied to a dataset returns a transient, modifiable copy of that dataset's data type.
Quincey> >
Quincey> > * H5Topen() applied to a dataset returns either a transient immutable or named immutable data type depending on whether the dataset has a shared data type.
Quincey> Hmm, do we want to allow this? It makes a certain amount of sense, but is a little unusual... :-)
Quincey> Elena, we decided not to allow H5Topen() on a dataset.
Quincey>
Quincey> > * The H5Tshare() and H5Tis_shared() will be removed. Data types will not be stored in the global heap. A new type of shared message header will be added to the object headers that points to another object header instead of the global heap.
Quincey>
Quincey> > * Still to discuss: Attributes on named data types?
Quincey> I think we should allow them.
Quincey> Elena, attributes work for named data types just like they do for datasets.
Quincey>
Quincey> > * Still to discuss: compound types whose members point to other types.
Quincey> I like this concept a lot and think we should figure out a way to do it.
Quincey> This allows the "is a" relationship to be used very nicely for named datatypes.
Quincey>
Quincey> > * Still to discuss: datasets that point to data types in other files by symbolic link.
Quincey> Probably a good idea also, just ugly to implement.