Files

1. Introduction

HDF5 files are composed of a "boot block" describing information required to portably access files on multiple platforms, followed by information about the groups in a file and the datasets in the file. The boot block contains information about the size of offsets and lengths of objects, the number of entries in symbol tables (used to store groups) and additional version information for the file.

2. File access modes

The HDF5 library assumes that all files are implicitly opened for read access at all times. Passing the H5F_ACC_RDWR parameter to H5Fopen() also allows write access to a file. H5Fcreate() assumes write access as well as read access; passing H5F_ACC_TRUNC forces the truncation of an existing file, otherwise H5Fcreate() will fail rather than overwrite an existing file.

3. Creating, Opening, and Closing Files

Files are created with the H5Fcreate() function, and existing files can be accessed with H5Fopen(). Both functions return an object ID which should eventually be released by calling H5Fclose().

hid_t H5Fcreate (const char *name, uintn flags, hid_t create_properties, hid_t access_properties)
This function creates a new file with the specified name in the current directory. The file is opened with read and write permission, and if the H5F_ACC_TRUNC flag is set, any current file is truncated when the new file is created. If a file of the same name exists and the H5F_ACC_TRUNC flag is not set (or the H5F_ACC_EXCL bit is set), this function will fail. Passing H5P_DEFAULT for the creation and/or access property lists uses the library's default values for those properties. Creating and changing the values of a property list is documented further below. The return value is an ID for the open file and it should be closed by calling H5Fclose() when it's no longer needed. A negative value is returned for failure.

hid_t H5Fopen (const char *name, uintn flags, hid_t access_properties)
This function opens an existing file with read permission and write permission if the H5F_ACC_RDWR flag is set. The access_properties is a file access property list ID or H5P_DEFAULT for the default I/O access parameters. Creating and changing the parameters for access templates is documented further below. Files which are opened more than once return a unique identifier for each H5Fopen() call and can be accessed through all file IDs. The return value is an ID for the open file and it should be closed by calling H5Fclose() when it's no longer needed. A negative value is returned for failure.

herr_t H5Fclose (hid_t file_id)
This function releases resources used by a file which was opened by H5Fcreate() or H5Fopen(). After closing a file the file_id should not be used again. This function returns zero for success or a negative value for failure.

herr_t H5Fflush (hid_t object_id)
This function will cause all buffers associated with a file to be immediately flushed to the file. The object_id can be any object which is associated with a file, including the file itself.

4. File Property Lists

Additional parameters to H5Fcreate() or H5Fopen() are passed through property list objects, which are created with the H5Pcreate() function. These objects allow many parameters of a file's creation or access to be changed from the default values. Property lists are used as a portable and extensible method of modifying multiple parameter values with simple API functions. There are two kinds of file-related property lists, namely file creation properties and file access properties.

4.1. File Creation Properties

File creation property lists apply to H5Fcreate() only and are used to control the file meta-data which is maintained in the boot block of the file. The parameters which can be modified are:

User-Block Size
The "user-block" is a fixed-length block of data located at the beginning of the file which is ignored by the HDF5 library and may be used to store any information found to be useful to applications. This value may be set to any power of two equal to 512 or greater (i.e. 512, 1024, 2048, etc). This parameter is set and queried with the H5Pset_userblock() and H5Pget_userblock() calls.

Offset and Length Sizes
The number of bytes used to store the offset and length of objects in the HDF5 file can be controlled with this parameter. Values of 2, 4 and 8 bytes are currently supported to allow 16-bit, 32-bit and 64-bit files to be addressed. These parameters are set and queried with the H5Pset_sizes() and H5Pget_sizes() calls.

Symbol Table Parameters
The size of symbol table B-trees can be controlled by setting the 1/2 rank and 1/2 node size parameters of the B-tree. These parameters are set and queried with the H5Pset_sym_k() and H5Pget_sym_k() calls.

Indexed Storage Parameters
The size of indexed storage B-trees can be controlled by setting the 1/2 rank and 1/2 node size parameters of the B-tree. These parameters are set and queried with the H5Pset_istore_k() and H5Pget_istore_k() calls.
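The user-block and offset/length rules above can be checked before a property list is ever built. The following is a minimal sketch in plain C, not part of the HDF5 API: the helper names userblock_size_ok(), offset_size_ok(), and max_offset() are hypothetical, and the checks follow only the rules as worded in this section (a power of two of at least 512 for the user-block, and widths of 2, 4, or 8 bytes for offsets and lengths).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `size` is a power of two that is at
 * least 512, the rule stated above for a user-block size. */
bool userblock_size_ok(uint64_t size)
{
    return size >= 512 && (size & (size - 1)) == 0;
}

/* Hypothetical helper: true if `nbytes` is one of the offset/length
 * widths listed above (2, 4 or 8 bytes). */
bool offset_size_ok(size_t nbytes)
{
    return nbytes == 2 || nbytes == 4 || nbytes == 8;
}

/* Largest value that fits in an unsigned field of `nbytes` bytes,
 * i.e. the highest offset a 16-, 32- or 64-bit file could record. */
uint64_t max_offset(size_t nbytes)
{
    assert(offset_size_ok(nbytes));
    if (nbytes == 8)
        return UINT64_MAX;          /* avoid shifting by the full width */
    return ((uint64_t)1 << (8 * nbytes)) - 1;
}
```

An application might call userblock_size_ok() on a requested size before passing it to H5Pset_userblock(), and use max_offset() to judge whether 4-byte offsets are enough for the data it expects to write.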

4.2. File Access Property Lists

File access property lists apply to H5Fcreate() or H5Fopen() and are used to control different methods of performing I/O on files.

Unbuffered I/O
Local permanent files can be accessed with the functions described in Section 2 of the Posix manual, namely open(), lseek(), read(), write(), and close(). The lseek64() function is used on operating systems that support it. This driver is enabled and configured with H5Pset_sec2(), and queried with H5Pget_sec2().

Buffered I/O
Local permanent files can be accessed with the functions declared in the stdio.h header file, namely fopen(), fseek(), fread(), fwrite(), and fclose(). The fseek64() function is used on operating systems that support it. This driver is enabled and configured with H5Pset_stdio(), and queried with H5Pget_stdio().

Memory I/O
Local temporary files can be created and accessed directly from memory without ever creating permanent storage. The library uses malloc() and free() to create storage space for the file. The total size of the file must be small enough to fit in virtual memory. The name supplied to H5Fcreate() is irrelevant, and H5Fopen() will always fail.

Parallel Files using MPI I/O
This driver allows parallel access to a file through the MPI I/O library. The parameters which can be modified are the MPI communicator, the info object, and the access mode. The communicator and info object are saved and then passed to MPI_File_open() during file creation or open. The access_mode controls the kind of parallel access the application intends. (Note that it is likely that the next API revision will remove the access_mode parameter and have access control specified via the raw data transfer property list of H5Dread() and H5Dwrite().) These parameters are set and queried with the H5Pset_mpi() and H5Pget_mpi() calls.

Data Alignment
Sometimes file access is faster if certain things are aligned on file blocks. This can be controlled by setting alignment properties of a file access property list with the H5Pset_alignment() function. Any allocation request at least as large as some threshold will be aligned on an address which is a multiple of some number.
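The threshold-and-multiple rule can be written down as a small address calculation. This is only a sketch of the rule as described above, not the library's allocator: place_allocation() is a hypothetical name, and it assumes that requests smaller than the threshold are placed at the next free address unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an HDF5 function.  Returns the address at
 * which a request of `size` bytes would start, given the next free
 * address, the alignment threshold and the alignment itself. */
uint64_t place_allocation(uint64_t next_free, uint64_t size,
                          uint64_t threshold, uint64_t alignment)
{
    assert(alignment > 0);

    /* Requests below the threshold are not aligned (assumption). */
    if (size < threshold)
        return next_free;

    /* Round next_free up to the nearest multiple of alignment. */
    uint64_t rem = next_free % alignment;
    return rem == 0 ? next_free : next_free + (alignment - rem);
}
```

With a threshold and alignment of 4096, a 100-byte request at address 1000 stays at 1000, while a 4096-byte request at the same address is moved up to 4096, leaving a gap in front of it.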

5. Examples of using file templates

5.1. Example of using file creation templates

The following example shows how to create a file with 64-bit object offsets and lengths:

        hid_t create_template;
        hid_t file_id;

        create_template = H5Pcreate(H5P_FILE_CREATE);
        H5Pset_sizes(create_template, 8, 8);

        file_id = H5Fcreate("test.h5", H5F_ACC_TRUNC,
                             create_template, H5P_DEFAULT);
        .
        .
        .
        H5Fclose(file_id);
    

5.2. Example of using file access templates

The following example shows how to open an existing file for independent dataset access using MPI parallel I/O:

        hid_t access_template;
        hid_t file_id;

        access_template = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_mpi(access_template, MPI_COMM_WORLD, MPI_INFO_NULL);

        /* H5Fopen must be called collectively */
        file_id = H5Fopen("test.h5", H5F_ACC_RDWR, access_template);
        .
        .
        .
        /* H5Fclose must be called collectively */
        H5Fclose(file_id);
        

6. Low-level File Drivers

HDF5 is able to access its address space through various types of low-level file drivers. For instance, an address space might correspond to a single file on a Unix file system, multiple files on a Unix file system, multiple files on a parallel file system, or a block of memory within the application. Generally, an HDF5 address space is referred to as an "HDF5 file" regardless of how the space is organized at the storage level.

6.1. Unbuffered Permanent Files

The sec2 driver uses functions from section 2 of the Posix manual to access files stored on a local file system. These are the open(), close(), read(), write(), and lseek() functions. If the operating system supports lseek64() then it is used instead of lseek(). The library buffers meta data regardless of the low-level driver, but using this driver prevents data from being buffered again by the lowest layers of the HDF5 library.

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_SEC2 if the sec2 driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_sec2 (hid_t access_properties)
The file access properties are set to use the sec2 driver. Any previously defined driver properties are erased from the property list. Additional parameters may be added to this function in the future.

herr_t H5Pget_sec2 (hid_t access_properties)
If the file access property list is set to the sec2 driver then this function returns zero; otherwise it returns a negative value. In the future, additional arguments may be added to this function to match those added to H5Pset_sec2().

6.2. Buffered Permanent Files

The stdio driver uses the functions declared in the stdio.h header file to access permanent files in a local file system. These are the fopen(), fclose(), fread(), fwrite(), and fseek() functions. If the operating system supports fseek64() then it is used instead of fseek(). Use of this driver introduces an additional layer of buffering beneath the HDF5 library.

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_STDIO if the stdio driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_stdio (hid_t access_properties)
The file access properties are set to use the stdio driver. Any previously defined driver properties are erased from the property list. Additional parameters may be added to this function in the future.

herr_t H5Pget_stdio (hid_t access_properties)
If the file access property list is set to the stdio driver then this function returns zero; otherwise it returns a negative value. In the future, additional arguments may be added to this function to match those added to H5Pset_stdio().

6.3. Buffered Temporary Files

The core driver uses malloc() and free() to allocate space for a file in the heap. Reading and writing to a file of this type results in mem-to-mem copies instead of disk I/O and as a result is somewhat faster. However, the total file size must not exceed the amount of available virtual memory, and only one HDF5 file handle can access the file (because the name of such a file is insignificant and H5Fopen() always fails).

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_CORE if the core driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_core (hid_t access_properties, size_t block_size)
The file access properties are set to use the core driver and any previously defined driver properties are erased from the property list. Memory for the file will always be allocated in units of the specified block_size. Additional parameters may be added to this function in the future.

herr_t H5Pget_core (hid_t access_properties, size_t *block_size)
If the file access property list is set to the core driver then this function returns zero and block_size is set to the block size used for the file; otherwise it returns a negative value. In the future, additional arguments may be added to this function to match those added to H5Pset_core().
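To illustrate what allocating memory "in units of the specified block_size" means, here is a minimal sketch of a heap-backed file that grows in whole blocks. It is not the core driver's implementation: the mem_file type and its functions are hypothetical, the sketch uses realloc() for brevity where the text mentions malloc() and free(), and it does not guard against overflow of offset + n.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory "file"; not the library's own data structure. */
typedef struct {
    unsigned char *buf;   /* heap storage, NULL until the first write  */
    size_t alloc;         /* bytes allocated, a multiple of block_size */
    size_t eof;           /* logical end of file                       */
    size_t block_size;    /* allocation unit                           */
} mem_file;

void mem_file_init(mem_file *f, size_t block_size)
{
    assert(block_size > 0);
    f->buf = NULL;
    f->alloc = 0;
    f->eof = 0;
    f->block_size = block_size;
}

/* Write n bytes at offset; returns 0 on success, -1 if out of memory. */
int mem_file_write(mem_file *f, size_t offset, const void *data, size_t n)
{
    if (n == 0)
        return 0;

    size_t end = offset + n;
    if (end > f->alloc) {
        /* Round the required size up to a whole number of blocks. */
        size_t blocks = (end + f->block_size - 1) / f->block_size;
        size_t new_alloc = blocks * f->block_size;
        unsigned char *p = realloc(f->buf, new_alloc);
        if (p == NULL)
            return -1;
        /* Zero the new region so unwritten gaps read back as zeros. */
        memset(p + f->alloc, 0, new_alloc - f->alloc);
        f->buf = p;
        f->alloc = new_alloc;
    }
    memcpy(f->buf + offset, data, n);   /* mem-to-mem copy, no disk I/O */
    if (end > f->eof)
        f->eof = end;
    return 0;
}

void mem_file_close(mem_file *f)
{
    free(f->buf);
    f->buf = NULL;
    f->alloc = 0;
    f->eof = 0;
}
```

A larger block size means fewer reallocations as the file grows, at the cost of more unused memory at the end of the buffer.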

6.4. Parallel Files

This driver uses MPI I/O to provide parallel access to a file.

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_MPI if the mpi driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_mpi (hid_t access_properties, MPI_Comm comm, MPI_Info info)
The file access properties are set to use the mpi driver and any previously defined driver properties are erased from the property list. Additional parameters may be added to this function in the future.

herr_t H5Pget_mpi (hid_t access_properties, MPI_Comm *comm, MPI_Info *info)
If the file access property list is set to the mpi driver then this function returns zero and comm and info are set to the values stored in the property list; otherwise the function returns a negative value. In the future, additional arguments may be added to this function to match those added to H5Pset_mpi().

6.5. File Families

A single HDF5 address space may be split into multiple files which, together, form a file family. Each member of the family must be the same logical size although the size and disk storage reported by ls(1) may be substantially smaller. The name passed to H5Fcreate() or H5Fopen() should include a printf(3c) style integer format specifier which will be replaced with the family member number (the first family member is zero).
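The naming scheme and the equal logical member size suggest a simple mapping from an address in the HDF5 address space to a member file and an offset within it. The sketch below is an illustration only: family_member_name() and family_locate() are hypothetical helpers, the file name pattern used with them is made up, and the linear mapping is an assumption drawn from the description above rather than a statement of how the library is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a family name pattern holding a single
 * printf-style integer conversion (for example "%d") into the name of
 * member number `member`.  Returns 0 on success, -1 if the pattern has
 * no conversion or the result does not fit in `out`. */
int family_member_name(char *out, size_t out_size,
                       const char *pattern, int member)
{
    if (strchr(pattern, '%') == NULL)
        return -1;                      /* not a family name */
    int n = snprintf(out, out_size, pattern, member);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: assuming members are laid end to end, find the
 * member number and the offset inside that member for an address. */
void family_locate(uint64_t addr, uint64_t memb_size,
                   uint64_t *member, uint64_t *offset)
{
    assert(memb_size > 0);
    *member = addr / memb_size;   /* the first family member is zero */
    *offset = addr % memb_size;
}
```

For a pattern such as "data-%05d.h5" and a member size of 2048 bytes, address 5000 would fall in member 2 (named "data-00002.h5") at offset 904.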

Any HDF5 file can be split into a family of files by running the file through split(1) and numbering the output files. However, because HDF5 is lazy about extending the size of family members, a valid file cannot generally be created by concatenation of the family members. Additionally, split and cat don't attempt to generate files with holes. The h5repart program can be used to repartition an HDF5 file or family into another file or family and preserves holes in the files.

h5repart [-v] [-b block_size[suffix]] [-m member_size[suffix]] source destination
This program repartitions an HDF5 file by copying the source file or family to the destination file or family preserving holes in the underlying Unix files. Families are used for the source and/or destination if the name includes a printf-style integer format such as "%d". The -v switch prints input and output file names on the standard error stream for progress monitoring, -b sets the I/O block size (the default is 1kB), and -m sets the output member size if the destination is a family name (the default is 1GB). The block and member sizes may be suffixed with the letters g, m, or k for GB, MB, or kB respectively.

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_FAMILY if the family driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_family (hid_t access_properties, hsize_t memb_size, hid_t member_properties)
The file access properties are set to use the family driver and any previously defined driver properties are erased from the property list. Each member of the file family will use member_properties as its file access property list. The memb_size argument gives the logical size in bytes of each family member but the actual size could be smaller depending on whether the file contains holes. The member size is only used when creating a new file or truncating an existing file; otherwise the member size comes from the size of the first member of the family being opened. Note: if the size of the off_t type is four bytes then the maximum family member size is usually 2^31-1 because the byte at offset 2,147,483,647 is generally inaccessible. Additional parameters may be added to this function in the future.

herr_t H5Pget_family (hid_t access_properties, hsize_t *memb_size, hid_t *member_properties)
If the file access property list is set to the family driver then this function returns zero; otherwise the function returns a negative value. On successful return, member_properties will point to a copy of the member access property list which should be closed by calling H5Pclose() when the application is finished with it. If memb_size is non-null then it will contain the logical size in bytes of each family member. In the future, additional arguments may be added to this function to match those added to H5Pset_family().
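The note about a four-byte off_t can be turned into a small check on a requested member size. This is a sketch under the assumption that the limit is the largest positive value of a signed offset type of the given width; max_member_size() and clamp_member_size() are hypothetical helpers, not HDF5 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest positive value of a signed offset type
 * that is `off_t_bytes` bytes wide, e.g. 2^31-1 for four bytes. */
uint64_t max_member_size(size_t off_t_bytes)
{
    assert(off_t_bytes >= 1 && off_t_bytes <= 8);
    return ((uint64_t)1 << (8 * off_t_bytes - 1)) - 1;
}

/* Hypothetical helper: reduce a requested member size to that limit. */
uint64_t clamp_member_size(uint64_t requested, size_t off_t_bytes)
{
    uint64_t limit = max_member_size(off_t_bytes);
    return requested > limit ? limit : requested;
}
```

An application could pass sizeof(off_t) as the width and hand the clamped value to H5Pset_family() as memb_size.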

6.6. Split Meta/Raw Files

On occasion, it might be useful to separate meta data from raw data. The split driver does this by creating two files: one for meta data and another for raw data. The application provides a base file name to H5Fcreate() or H5Fopen() and this driver appends a file extension which defaults to ".meta" for the meta data file and ".raw" for the raw data file. Each file can have its own file access property list which allows, for instance, a split file with meta data stored with the core driver and raw data stored with the sec2 driver.
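The way the two file names are derived from the base name can be sketched in a few lines of C. split_file_name() is a hypothetical helper, and treating a null extension as a request for the default is an assumption made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "<base><ext>" in `out`, falling back to
 * `default_ext` when `ext` is NULL (assumed convention).  Returns 0 on
 * success, -1 if the result, with its terminator, does not fit. */
int split_file_name(char *out, size_t out_size, const char *base,
                    const char *ext, const char *default_ext)
{
    const char *e = (ext != NULL) ? ext : default_ext;
    size_t blen = strlen(base);
    size_t elen = strlen(e);

    if (blen + elen + 1 > out_size)
        return -1;
    memcpy(out, base, blen);
    memcpy(out + blen, e, elen + 1);    /* copies the terminator too */
    return 0;
}
```

Called with the base name "run42", a null extension, and the default ".meta", the helper produces "run42.meta"; with the default ".raw" it produces "run42.raw".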

H5F_driver_t H5Pget_driver (hid_t access_properties)
This function returns the constant H5F_LOW_SPLIT if the split driver is defined as the low-level driver for the specified access property list.

herr_t H5Pset_split (hid_t access_properties, const char *meta_extension, hid_t meta_properties, const char *raw_extension, hid_t raw_properties)
The file access properties are set to use the split driver and any previously defined driver properties are erased from the property list. The meta file will have a name which is formed by adding meta_extension (or ".meta") to the end of the base name and will be accessed according to the meta_properties. The raw file will have a name which is formed by appending raw_extension (or ".raw") to the base name and will be accessed according to the raw_properties. Additional parameters may be added to this function in the future.

herr_t H5Pget_split (hid_t access_properties, size_t meta_ext_size, char *meta_extension, hid_t *meta_properties, size_t raw_ext_size, char *raw_extension, hid_t *raw_properties)
If the file access property list is set to the split driver then this function returns zero; otherwise the function returns a negative value. On successful return, meta_properties and raw_properties will point to copies of the meta and raw access property lists which should be closed by calling H5Pclose() when the application is finished with them, but if the meta and/or raw file has no property list then a negative value is returned for that property list handle. Also, if meta_extension and/or raw_extension are non-null pointers, at most meta_ext_size or raw_ext_size characters of the meta or raw file name extension will be copied to the specified buffer. If the actual name is longer than what was requested then the result will not be null terminated (similar to strncpy()). In the future, additional arguments may be added to this function to match those added to H5Pset_split().
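Because the extension is copied with strncpy()-like semantics, a caller that wants a proper C string has to add the terminator itself. The sketch below uses the standard strncpy() to show both the unterminated result and one way to guard against it; copy_extension() and copy_extension_terminated() are hypothetical stand-ins and do not call H5Pget_split().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the copy rule described above: at most
 * buf_size characters are copied, and the result is NOT terminated
 * when the extension is buf_size characters or longer. */
void copy_extension(char *buf, size_t buf_size, const char *ext)
{
    if (buf != NULL && buf_size > 0)
        strncpy(buf, ext, buf_size);
}

/* Caller-side guard: copy at most buf_size - 1 characters and always
 * write a terminator into the last byte of the buffer. */
void copy_extension_terminated(char *buf, size_t buf_size, const char *ext)
{
    assert(buf != NULL && buf_size > 0);
    strncpy(buf, ext, buf_size - 1);
    buf[buf_size - 1] = '\0';
}
```

With the real function, the same effect could be had by passing a size one smaller than the buffer and setting the final byte to zero beforehand.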

Quincey Koziol
Robb Matzke
Last modified: Thu Aug 6 16:17:08 EDT 1998