References

1. Introduction

This document discusses the kinds of references implemented (and planned) in HDF5 and the functions implemented (and planned) to support them.

2. References

This section contains an overview of the kinds of references implemented, or planned for implementation, in HDF5.

Object reference
Reference to an entire object in the current HDF5 file.
The only kind of reference currently implemented.

An object reference points to an entire object in the current HDF5 file by storing the relative file address (OID) of the object header for the object pointed to. The relative file address of an object header is constant for the life of the object. An object reference is of a fixed size in the file.
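
As a rough illustration of why an object reference can have a fixed size, the sketch below packs an object header address into a fixed-width buffer and reads it back. The names obj_ref_encode and obj_ref_decode, the 8-byte width, and the little-endian layout are assumptions made for this example; they are not taken from the HDF5 file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBJ_REF_SIZE 8  /* assumed width of a stored address, in bytes */

/* Pack a relative file address into a fixed-size little-endian buffer. */
void obj_ref_encode(uint64_t rel_addr, unsigned char out[OBJ_REF_SIZE])
{
    assert(out != NULL);
    for (size_t i = 0; i < OBJ_REF_SIZE; i++)
        out[i] = (unsigned char)((rel_addr >> (8 * i)) & 0xff);
}

/* Recover the relative file address from the buffer. */
uint64_t obj_ref_decode(const unsigned char in[OBJ_REF_SIZE])
{
    uint64_t rel_addr = 0;
    assert(in != NULL);
    for (size_t i = 0; i < OBJ_REF_SIZE; i++)
        rel_addr |= (uint64_t)in[i] << (8 * i);
    return rel_addr;
}
```

Because every encoded reference occupies the same number of bytes, an array of such references can be indexed directly.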

Dataset region reference
Reference to a specific dataset region.
Not yet implemented.

A dataset region reference points to a region of a dataset in the current HDF5 file by storing the OID of the dataset and the global heap offset of the region referenced. The region referenced is located by retrieving the coordinates of the areas in the region from the global heap. A dataset region reference is of a variable size in the file.
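
The two-step lookup described above can be sketched with a toy heap. Everything here is hypothetical: area_t, heap_object_t, region_ref_t, and region_lookup are names invented for the example, the heap is a plain array, and rank 2 is assumed for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One rectangular area of a region: start and count per dimension. */
typedef struct {
    uint64_t start[2];
    uint64_t count[2];
} area_t;

/* Toy stand-in for a global heap object: a variable-length list of areas. */
typedef struct {
    size_t        n_areas;
    const area_t *areas;
} heap_object_t;

/* What the reference itself stores: the dataset OID plus a heap offset. */
typedef struct {
    uint64_t dataset_oid;
    size_t   heap_offset;   /* index into the toy heap */
} region_ref_t;

/* Locate the areas of the referenced region by reading the heap.
 * Returns NULL when the offset does not name a heap object. */
const heap_object_t *region_lookup(const heap_object_t *heap, size_t heap_len,
                                   const region_ref_t *ref)
{
    assert(heap != NULL && ref != NULL);
    if (ref->heap_offset >= heap_len)
        return NULL;
    return &heap[ref->heap_offset];
}
```

The number of areas lives in the heap object, which is why the total storage for a region reference varies with the region it describes.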

Internal dataset region reference
Reference to a region within the current dataset.
Not yet implemented.

An internal dataset region reference points to a region of the current dataset by storing the coordinates of the region. An internal dataset region reference is of a fixed size in the file.

Note: All references are treated as soft links for the purposes of reference counting. The library does not keep track of reference links and they may dangle if the object they refer to is deleted, moved, or not yet available.

3. Reference Types

This section lists valid HDF5 reference types for use in the H5R functions.
Reference Type        Value   Description
H5R_BADTYPE           -1      Invalid reference type
H5R_OBJECT             0      Object reference
H5R_DATASET_REGION     1      Dataset region reference
H5R_INTERNAL           2      Internal reference
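
For readers who want the values from the table above in code form, here is a standalone sketch. It mirrors the table only; it is not the library's header and should not be compiled together with it. The helper names ref_type_is_valid and ref_type_description are made up for this example.

```c
/* Standalone mirror of the reference type table; not the library header. */
typedef enum {
    H5R_BADTYPE        = -1,  /* Invalid reference type   */
    H5R_OBJECT         =  0,  /* Object reference         */
    H5R_DATASET_REGION =  1,  /* Dataset region reference */
    H5R_INTERNAL       =  2   /* Internal reference       */
} H5R_type_t;

/* Returns 1 for the three usable reference types, 0 otherwise. */
int ref_type_is_valid(int value)
{
    return value >= H5R_OBJECT && value <= H5R_INTERNAL;
}

/* Printable description; any unlisted value maps to the invalid type. */
const char *ref_type_description(int value)
{
    switch (value) {
    case H5R_OBJECT:         return "Object reference";
    case H5R_DATASET_REGION: return "Dataset region reference";
    case H5R_INTERNAL:       return "Internal reference";
    default:                 return "Invalid reference type";
    }
}
```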

4. Functions

Four functions support references: three in the H5R interface and one in the H5I interface. Of these, H5Rget_region is not yet implemented. The H5I function is also useful outside the context of references.

herr_t H5Rcreate(href_t *reference, hid_t loc_id, const char *name, H5R_type_t type, hid_t space_id)
H5Rcreate creates a reference of the kind specified with the type parameter. The reference points to the file object identified by name, relative to the location loc_id. For dataset region references, space_id identifies a dataspace whose selection is the portion of the dataset that will be referred to.

Currently only object references which point to entire datasets can be created.

hid_t H5Rdereference(hid_t dset, H5R_type_t rtype, href_t *ref)
H5Rdereference opens the object referenced and returns an identifier for that object. The parameter ref specifies a reference of type rtype that is stored in the dataset dset.

hid_t H5Rget_region(hid_t dataset, H5R_type_t type, href_t *reference)
H5Rget_region creates a copy of the dataspace of the dataset that is pointed to and defines a selection in the copy which is the location (or region) pointed to. The parameter reference specifies a reference of type type that is stored in the dataset dataset.

This function is not yet implemented.

H5I_type_t H5Iget_type(hid_t id)
Returns the type of object referred to by the identifier id. Valid return values appear in the following list:
H5I_BADID Invalid ID
H5I_FILE File objects
H5I_GROUP Group objects
H5I_DATATYPE Data type objects
H5I_DATASPACE Dataspace objects
H5I_DATASET Dataset objects
H5I_ATTR Attribute objects

This function was inspired by the need of users to figure out which type of object closing function (H5Dclose, H5Gclose, etc.) to call after a call to H5Rdereference, but it is also of general use.
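
The choice of closing function can be pictured as a lookup keyed on the identifier type. The sketch below is standalone and only names the routine to call; the enum mirrors the list above with placeholder numeric values, and close_function_for is a hypothetical helper, not part of the H5I interface.

```c
#include <stddef.h>

/* Standalone mirror of the list above; numeric values are placeholders. */
typedef enum {
    H5I_BADID = -1,
    H5I_FILE,
    H5I_GROUP,
    H5I_DATATYPE,
    H5I_DATASPACE,
    H5I_DATASET,
    H5I_ATTR
} H5I_type_t;

/* Name of the closing function for an identifier of the given type,
 * or NULL when the identifier is invalid and there is nothing to close. */
const char *close_function_for(H5I_type_t type)
{
    switch (type) {
    case H5I_FILE:      return "H5Fclose";
    case H5I_GROUP:     return "H5Gclose";
    case H5I_DATATYPE:  return "H5Tclose";
    case H5I_DATASPACE: return "H5Sclose";
    case H5I_DATASET:   return "H5Dclose";
    case H5I_ATTR:      return "H5Aclose";
    default:            return NULL;
    }
}
```

In a real program the same switch would call the closing function directly on the identifier instead of returning its name.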

5. Examples

Object Reference Writing Example
Create a dataset which has links to other datasets as part of its raw data and write the dataset to the file.

{
    hid_t file1;
    hid_t dataset1;
    hid_t datatype, dataspace;
    char buf[128];
    href_t link;
    href_t data[10][10];
    int rank;
    hsize_t dimsf[2];
    int i, j;

    /* Open the file */
    file1=H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Describe the size of the array and create the data space */
    rank=2;
    dimsf[0] = 10;
    dimsf[1] = 10;
    dataspace = H5Screate_simple(rank, dimsf, NULL); 

    /* Define datatype */
    datatype = H5Tcopy(H5T_STD_REF_OBJ);

    /* Create a dataset */
    dataset1=H5Dcreate(file1,"Dataset One",datatype,dataspace,H5P_DEFAULT);

    /* Construct array of OIDs for other datasets in the file */
    /* somewhat hokey and artificial, but demonstrates the point */
    for(i=0; i<10; i++)
        for(j=0; j<10; j++)
          {
            sprintf(buf,"/Group/Linked Set %d-%d",i,j);
            if(H5Rcreate(&link,file1,buf,H5R_OBJECT,-1)>=0)
                data[i][j]=link;
          } /* end for */

    /* Write the data to the dataset using default transfer properties.  */
    H5Dwrite(dataset1, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* Close everything */
    H5Sclose(dataspace);
    H5Tclose(datatype);
    H5Dclose(dataset1);
    H5Fclose(file1);
}

Object Reference Reading Example
Open a dataset which has links to other datasets as part of its raw data and read in those links.

{
    hid_t file1;
    hid_t dataset1, tmp_dset;
    href_t data[10][10];
    int i, j;

    /* Open the file */
    file1=H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Open the dataset */
    dataset1=H5Dopen(file1,"Dataset One",H5P_DEFAULT);

    /* 
     * Read the data from the dataset using default transfer properties.
     * (we are assuming the dataset is the same and not querying the
     *  dimensions, etc.)
     */
    H5Dread(dataset1, H5T_STD_REF_OBJ, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* Analyze array of OIDs of linked datasets in the file */
    /* somewhat hokey and artificial, but demonstrates the point */
    for(i=0; i<10; i++)
        for(j=0; j<10; i++)
          {
            if((tmp_dset=H5Rdereference(dataset1, H5R_OBJECT, &data[i][j]))>0)
              {
                  /* Use the dereferenced dataset here */

                  /* Close it only if it was opened successfully */
                  H5Dclose(tmp_dset);
              } /* end if */
          } /* end for */


    /* Close everything */
    H5Dclose(dataset1);
    H5Fclose(file1);
}

Dataset Region Reference Writing Example
Create a dataset which has links to regions of other datasets (a small hyperslab of each dataset in this case) as part of its raw data and write the dataset to the file.

{
    hid_t file1;
    hid_t dataset1, dataset2;
    hid_t datatype, dataspace1, dataspace2;
    char buf[128];
    href_t link;
    href_t data[10][10];     /* HDF5 reference type */
    int rank;
    hsize_t dimsf[2];
    hssize_t start[3],count[3];
    int i, j;

    /* Open the file */
    file1=H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Describe the size of the array and create the data space */
    rank=2;
    dimsf[0] = 10;
    dimsf[1] = 10;
    dataspace1 = H5Screate_simple(rank, dimsf, NULL); 

    /* Define Dataset Region Reference datatype */
    datatype = H5Tcopy(H5T_STD_REF_DATAREG);

    /* Create a dataset */
    dataset1=H5Dcreate(file1,"Dataset One",datatype,dataspace1,H5P_DEFAULT);

    /* Construct array of OIDs for other datasets in the file */
    /* (somewhat artificial, but demonstrates the point) */
    for(i=0; i<10; i++)
        for(j=0; j<10; i++)
          {
            sprintf(buf,"/Group/Linked Set %d-%d",i,j);
            
            /* Get the dataspace for the object to point to */
            dataset2=H5Dopen(file1,buf,H5P_DEFAULT);
            dataspace2=H5Dget_space(dataset2);

            /* Select the region to point to */
            /* (could be different region for each pointer) */
            start[0]=5; start[1]=4; start[2]=3;
            count[0]=2; count[1]=4; count[2]=1;
            H5Sselect_hyperslab(dataspace2,H5S_SELECT_SET,start,NULL,count,NULL);

            if(H5Rcreate(&link,file1,buf,H5R_DATASET_REGION,dataspace2)>=0)
                /* Store the reference */
                data[i][j]=link;

            H5Sclose(dataspace2);
            H5Dclose(dataset2);
          } /* end for */

    /* Write the data to the dataset using default transfer properties.  */
    H5Dwrite(dataset1, H5T_STD_REF_DATAREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* Close everything */
    H5Sclose(dataspace1);
    H5Tclose(datatype);
    H5Dclose(dataset1);
    H5Fclose(file1);
}

Dataset Region Reference Reading Example
Open a dataset which has links to regions of other datasets (a small hyperslab of each dataset in this case) as part of its raw data and read in those links.

{
    hid_t file1;
    hid_t dataset1, tmp_dset;
    hid_t dataspace;
    href_t data[10][10];     /* HDF5 reference type */
    int i, j;

    /* Open the file */
    file1=H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Open the dataset */
    dataset1=H5Dopen(file1,"Dataset One",H5P_DEFAULT);

    /* 
     * Read the data from the dataset using default transfer properties.
     * (we are assuming the dataset is the same and not querying the
     *  dimensions, etc.)
     */
    H5Dread(dataset1, H5T_STD_REF_DATAREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* Analyze array of OIDs of linked datasets in the file */
    /* (somewhat artificial, but demonstrates the point) */
    for(i=0; i<10; i++)
        for(j=0; j<10; i++)
          {
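            if((tmp_dset=H5Rdereference(dataset1, H5R_DATASET_REGION, &data[i][j]))>0)
              {
                  /* Get the dataspace with the pointed to region selected */
                  dataspace=H5Rget_region(dataset1, H5R_DATASET_REGION, &data[i][j]);

                  /* Use the dataset and the selected region here */

                  H5Sclose(dataspace);

                  /* Close the dataset only if it was opened successfully */
                  H5Dclose(tmp_dset);
              } /* end if */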
          } /* end for */


    /* Close everything */
    H5Dclose(dataset1);
    H5Fclose(file1);
}

HDF Help Desk
Last modified: 28 October 1998

Material to Be Omitted!!!

Additional material above will also need to be deleted or commented out.

"Kinds of Reference" Information

Dataset Offset Reference
Reference to a specific byte sequence in a dataset (3)
Disk Offset Reference
Reference to a specific byte sequence in a file (3)
External Object Reference
Reference to an entire object in another HDF5 file (3)
External Dataset Region Reference
Reference to a specific dataset region in another HDF5 file (3)
External Dataset Offset Reference
Reference to a specific byte sequence in a dataset in another HDF5 file (3)
External Disk Offset Reference
Reference to a specific byte sequence in another HDF5 file (3)
Generic Reference
A reference which may be any of the types defined above. (3)
Notes:

Comments

    Reference types are atomic types and may be included as fields in compound
        data types.

    There are (at least) four levels of reference strength:
        Weak - We allow the user to store any type of reference in an array
            of references.  (i.e., the array of references in the example above
            could be a mix of Object, Dataset Region and Internal references)
        Medium - We force the user to stick with a particular type of
            reference within a dataset, but the datasets pointed to (with
            Object and Dataset Region references) may be of any data type
            or dataspace.
        Strong - We force the user to stick with a particular type of
            reference and Object and Dataset Region references must point to
            datasets with the same data type.
        Extra Strong - We force the user to stick with a particular type of
            reference and Object and Dataset Region references must point to
            datasets with the same data type _and_ dataspace.

    The library is currently implemented with "medium" strength references.
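
The "medium" rule amounts to a uniformity check over the reference types stored in one dataset. A minimal sketch follows, assuming each reference's type is available as an integer code such as the values listed in section 3; refs_share_one_type is a hypothetical helper and not a library function.

```c
#include <stddef.h>

/* Returns 1 if every reference type in the array matches the first one
 * (one kind of reference per dataset), otherwise 0.
 * An empty array trivially satisfies the rule. */
int refs_share_one_type(const int *types, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (types[i] != types[0])
            return 0;
    return 1;
}
```

A "weak" scheme would skip this check entirely, while the "strong" levels would add comparisons of the data type (and dataspace) of each target.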

Reference Type

H5R_MAXTYPE 3   Highest type in group (invalid as true type)

Information Regarding Specific Kinds of References

Dataset Offset Reference
Points to a sequence of bytes within a dataset in the current HDF5 file by storing the OID of the dataset and the byte length and offset of the sequence within the dataset. The offset is the logical byte offset within the dataset, meaning that the data is de-compressed before returning the sequence of bytes requested. No interpretation of the data at that location is provided. However, if the dataset is extendible and the sizes of the dimensions are changed, the element(s) within which the sequence is located may vary. Fixed size in file.
Disk Offset Reference
Points to a sequence of bytes in the current HDF5 file by storing the byte length and offset of the sequence within the file, relative to the boot-block (as are all the other high-level addresses used in the file). The offset is the absolute byte offset within the file; no interpretation of the data at that location is provided. Fixed size in file.
External Object Reference
Points to an entire object in another HDF5 file by storing a global heap offset which points to the URL of the external file and the OID of the object pointed to. Variable size in file.
External Dataset Region Reference
Points to a region of a dataset in another HDF5 file by storing a global heap offset which points to the URL of the external file, OID of the dataset and the coordinates of the region. Variable size in file.
External Dataset Offset Reference
Points to a sequence of bytes within a dataset in another HDF5 file by storing a global heap offset which points to the URL of the external file, the OID of the dataset and the byte length and offset of the sequence within the dataset. The offset is the logical byte offset within the dataset, meaning that the data is de-compressed before returning the sequence of bytes requested. However, if the dataset is not stored contiguously and the sizes of the dimensions are changed, the element(s) within which the sequence is located may vary. Variable size in file.
External Disk Offset Reference
Points to a sequence of bytes in another HDF5 file by storing a global heap reference which points to the URL of the external file and the byte length and offset of the sequence within the file. The offset is the absolute byte offset within the file; no interpretation of the data at that location is provided. Variable size in file.
Generic Reference
A reference which may contain any of the other references defined above. (Mostly useful for implementing "weak" strength pointers within the medium strength model we are using) Variable size in file.
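
The remark under Dataset Offset Reference, that the element containing a byte sequence may change when dimensions change, can be checked with a little arithmetic. The sketch below assumes a rank-2 dataset laid out in row-major order with the logical offset counted from the first element; offset_to_coords is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Map a logical byte offset to row-major coordinates in a rank-2 dataset.
 * dims[] is the current extent; elem_size is the element size in bytes. */
void offset_to_coords(size_t byte_offset, size_t elem_size,
                      const size_t dims[2], size_t coords[2])
{
    assert(elem_size > 0 && dims[1] > 0);
    size_t elem_index = byte_offset / elem_size;  /* element holding the byte */
    coords[0] = elem_index / dims[1];             /* row    */
    coords[1] = elem_index % dims[1];             /* column */
}
```

With 4-byte elements, byte offset 100 falls in element (2, 5) of a 10 x 10 dataset, but in element (1, 5) once the second dimension is extended to 20.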

Implementation Details

File Storage
In order to efficiently index an array, each element must be the same size when stored in the dataset on disk. Fixed-sized references will be stored directly in the dataset array on disk; variable-sized references will have a fixed-size heap offset stored in the array on disk, with a file heap used to store the actual variable-sized information.
Memory Storage
Each href_t object in memory is a struct containing a pointer type and a union of the information required for each pointer type. Information in this structure is not designed for users to view. Non-C APIs may have to mangle this structure in some way, in order to provide users with access to references in a language-appropriate way.
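
One way to picture the "type plus union" layout is the tagged union below. It is a guess at the general shape only: sketch_href_t, its field names, and the helper functions are invented for this example and do not describe the library's actual href_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REF_OBJECT = 0,
    REF_DATASET_REGION = 1,
    REF_INTERNAL = 2
} ref_kind_t;

/* Hypothetical in-memory reference: a type tag plus per-type information. */
typedef struct {
    ref_kind_t kind;
    union {
        struct { uint64_t oid; } object;
        struct { uint64_t oid; uint64_t heap_offset; } region;
        struct { uint64_t start[2]; uint64_t count[2]; } internal;
    } u;
} sketch_href_t;

/* Build an object reference from an object header address. */
sketch_href_t make_object_ref(uint64_t oid)
{
    sketch_href_t r;
    r.kind = REF_OBJECT;
    r.u.object.oid = oid;
    return r;
}

/* Return the OID of the target, or 0 for kinds that carry none. */
uint64_t ref_target_oid(const sketch_href_t *r)
{
    assert(r != NULL);
    switch (r->kind) {
    case REF_OBJECT:         return r->u.object.oid;
    case REF_DATASET_REGION: return r->u.region.oid;
    default:                 return 0;
    }
}
```

Code must read the tag before touching the union, which is one reason the structure is better hidden behind accessor functions than exposed to users.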