Ragged Arrays

The H5RA interface is strictly experimental at this time; the interface may change dramatically, or support for ragged arrays may be removed in future releases. As a result, future releases may be unable to retrieve data stored with this interface.

Use these functions at your own risk!
Do not create any archives using this interface!

1. Introduction

Ragged arrays should be considered alpha quality. They were added to HDF5 to satisfy the needs of the ASCI/DMF vector bundle project; the interface and storage methods are likely to change in the future in ways that are not backward compatible.

A two-dimensional ragged array has been added to the library and built on top of other existing functionality. A ragged array is a one-dimensional array of rows where the length of any row is independent of the lengths of the other rows. The number of rows and the length of each row can be changed at any time (the current version does not support truncating an array by removing rows). All elements of the ragged array have the same data type and, as with datasets, the data is type-converted between memory buffers and files.

The current implementation works best when most of the rows are approximately the same length. A two-dimensional dataset is created to hold a nominal number of elements from each row, and any additional elements are stored in a separate dataset that implements a heap.

A ragged array is a composite object implemented as a group with three datasets. The name of the group is the name of the ragged array. The raw dataset is a two-dimensional array that contains the first N elements of each row, where N is determined by the application when the array is created. If most rows have fewer than N elements, much of the raw dataset is left unused (internal fragmentation).

The over dataset is a one-dimensional array that contains elements from each row that don't fit in the raw dataset.
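The split between raw and over can be illustrated with a small in-memory model. This is only a sketch of the arithmetic described above, not library code: the function names ra_split_row and ra_wasted_slots are hypothetical, and the sketch assumes that the first N elements of a row always go to raw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given a row of `nelmts` elements and a raw width
 * of `n`, return how many elements land in the raw dataset and store the
 * number that spill into the over dataset in *n_over. */
size_t ra_split_row(size_t nelmts, size_t n, size_t *n_over)
{
    assert(n_over != NULL);
    size_t in_raw = nelmts < n ? nelmts : n;
    *n_over = nelmts - in_raw;
    return in_raw;
}

/* Hypothetical helper: count the raw slots left unused by rows shorter
 * than `n`. This is the internal fragmentation of the raw dataset. */
size_t ra_wasted_slots(const size_t *row_len, size_t nrows, size_t n)
{
    size_t wasted = 0;
    for (size_t i = 0; i < nrows; i++) {
        if (row_len[i] < n) {
            wasted += n - row_len[i];
        }
    }
    return wasted;
}
```

With N = 4, a row of 7 elements keeps 4 in raw and sends 3 to over, while a row of 1 element leaves 3 raw slots unused.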

The meta dataset maintains information about each row such as the number of elements in the row, the location of the overflow elements in the over dataset (if any), and the amount of space reserved in over for the row. The meta dataset has one entry per row and is where most of the storage overhead is concentrated when rows are relatively short.
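The per-row bookkeeping can be pictured as one small record per row. The sketch below is a model only: the struct ra_meta_t, its field names, and the function ra_build_meta are hypothetical and do not reflect the library's actual storage format. It also assumes that overflow regions are laid out back to back in over and that each row reserves exactly the space it needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of one meta entry (not the real on-disk layout). */
typedef struct {
    size_t nelmts;       /* number of elements in the row               */
    long   over_offset;  /* start of the row's overflow in `over`,
                            or -1 when the row has no overflow          */
    size_t over_nalloc;  /* elements reserved in `over` for this row    */
} ra_meta_t;

/* Fill meta[0..nrows-1] for rows of the given lengths and a raw width
 * of `n`. Returns the total number of elements the over dataset would
 * need under the back-to-back assumption. */
size_t ra_build_meta(const size_t *row_len, size_t nrows, size_t n,
                     ra_meta_t *meta)
{
    assert(row_len != NULL && meta != NULL);
    size_t next = 0; /* next free position in `over` */
    for (size_t i = 0; i < nrows; i++) {
        size_t extra = row_len[i] > n ? row_len[i] - n : 0;
        meta[i].nelmts = row_len[i];
        meta[i].over_offset = extra > 0 ? (long)next : -1;
        meta[i].over_nalloc = extra;
        next += extra;
    }
    return next;
}
```

Because every row gets an entry whether or not it overflows, an array of many short rows pays the fixed cost of one record per row, which is the overhead described above.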

2. Opening and Closing

hid_t H5RAcreate (hid_t location, const char *name, hid_t type, hid_t plist)
This function creates a new ragged array by creating the group with the specified name and populating it with the component datasets (which should not be accessed independently). The dataset creation property list plist defines the width of the raw dataset; a nominal row is considered to be the width of a chunk. The type argument defines the data type which will be stored in the file. A negative value is returned if the array cannot be created.

hid_t H5RAopen (hid_t location, const char *name)
This function opens a ragged array by opening the specified group and the component datasets (which should not be accessed independently). A negative value is returned if the array cannot be opened.

herr_t H5RAclose (hid_t array)
All ragged arrays should be closed by calling this function. The group and component datasets will be closed automatically by the library.

3. Reading and Writing

To be as efficient as possible, the ragged array layer operates on sets of contiguous rows, so it is to the application's advantage to perform I/O on as many rows at a time as possible. These functions take a starting row number and the number of rows on which to operate.
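An application that needs a scattered set of rows can reduce the number of calls by grouping consecutive row numbers into runs first. The following is a minimal sketch of that grouping step; row_run_t and coalesce_rows are hypothetical names, and the input is assumed to be sorted in strictly ascending order.

```c
#include <assert.h>
#include <stddef.h>

/* One batched request: a run of consecutive rows. */
typedef struct {
    long   start_row; /* first row of the run        */
    size_t nrows;     /* number of consecutive rows  */
} row_run_t;

/* Coalesce a strictly ascending list of row numbers into contiguous
 * runs. `runs` must have room for `n` entries. Returns the number of
 * runs written. */
size_t coalesce_rows(const long *rows, size_t n, row_run_t *runs)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            assert(rows[i] > rows[i - 1]); /* input must be ascending */
        }
        if (nruns > 0 &&
            rows[i] == runs[nruns - 1].start_row +
                       (long)runs[nruns - 1].nrows) {
            runs[nruns - 1].nrows++;       /* extends the current run */
        } else {
            runs[nruns].start_row = rows[i];
            runs[nruns].nrows = 1;
            nruns++;
        }
    }
    return nruns;
}
```

Each resulting run supplies one (start_row, nrows) pair, so rows 0, 1, 2, 5, 6 and 9 would need three calls instead of six.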

herr_t H5RAwrite (hid_t array_id, hssize_t start_row, hsize_t nrows, hid_t type, hsize_t size[], void *buf[])
A set of ragged array rows beginning at start_row and continuing for nrows is written to the file, converting the memory data type type to the file data type which was defined when the array was created. The number of elements to write from each row is specified in the size array and the data for each row is pointed to from the buf array. The size and buf arrays are indexed so that their first elements correspond to start_row, the first row on which to operate.
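The parallel size and buf arrays can be prepared from any in-memory representation of the rows. The sketch below assumes the application keeps its rows packed back to back in one buffer with an offsets table; build_row_args is a hypothetical helper, and unsigned long long stands in for hsize_t so that the example compiles without the library headers.

```c
#include <assert.h>
#include <stddef.h>

/* Fill size[i] and buf[i] for `nrows` rows packed back to back in
 * `data`, where row i occupies data[offsets[i]] through
 * data[offsets[i + 1] - 1]. `offsets` has nrows + 1 entries.
 * Index 0 of size[] and buf[] describes the first row of the request. */
void build_row_args(int *data, const size_t *offsets, size_t nrows,
                    unsigned long long size[], void *buf[])
{
    for (size_t i = 0; i < nrows; i++) {
        assert(offsets[i] <= offsets[i + 1]);
        size[i] = offsets[i + 1] - offsets[i]; /* elements in row i   */
        buf[i]  = data + offsets[i];           /* where row i begins  */
    }
}
```

The resulting arrays have the shape that the size and buf parameters in the signature above expect; no data is copied, since buf only points into the packed buffer.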

herr_t H5RAread (hid_t array_id, hssize_t start_row, hsize_t nrows, hid_t type, hsize_t size[], void *buf[])
A set of ragged array rows beginning at start_row and continuing for nrows is read from the file, converting from the file data type which was defined when the array was created to the memory data type type. The number of elements to read from each row is specified in the size array and the buffers in which to place the results are pointed to by the buf array. On return, the size array will contain the actual size of each row, which may be different from the requested size. When the requested size is smaller than the actual size, the row will be truncated; otherwise the remainder of the output buffer will be zero-filled. If a pointer in the buf array is null, the library will ignore the corresponding size value and allocate a buffer large enough to hold the entire row. On failure, this function returns a negative value and leaves buf containing its original input values.
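The per-row rules for truncation, zero filling and automatic allocation can be modeled without the library. The following sketch applies them to a single row of int elements held in memory; model_read_row is a hypothetical function, and the use of malloc for the automatically allocated buffer is an assumption, since the text above does not say how such a buffer is allocated or released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of the read rules for one row. `stored` holds the row's
 * `actual` elements. On entry *size is the requested element count and
 * *buf is the caller's buffer or NULL. On success returns 0 and sets
 * *size to `actual`. On failure returns -1 and leaves *size and *buf
 * unchanged. */
int model_read_row(const int *stored, size_t actual,
                   size_t *size, int **buf)
{
    assert(stored != NULL && size != NULL && buf != NULL);
    if (*buf == NULL) {
        /* Null buffer: ignore *size and allocate room for the whole row
         * (at least one element so the allocation is never zero-sized). */
        int *p = malloc((actual > 0 ? actual : 1) * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        memcpy(p, stored, actual * sizeof *p);
        *buf = p;
    } else {
        size_t requested = *size;
        size_t ncopy = requested < actual ? requested : actual;
        /* Truncate when the request is smaller than the row... */
        memcpy(*buf, stored, ncopy * sizeof **buf);
        /* ...and zero-fill whatever remains of a larger buffer. */
        memset(*buf + ncopy, 0, (requested - ncopy) * sizeof **buf);
    }
    *size = actual; /* report the row's true length */
    return 0;
}
```

A caller can compare the returned size with the size it requested to detect truncation, and in this model releases an automatically allocated buffer with free.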

HDF Help Desk
Last modified: 21 October 1998