diff options
Diffstat (limited to 'doc/html/Big.html')
-rw-r--r-- | doc/html/Big.html | 122 |
1 files changed, 0 insertions, 122 deletions
diff --git a/doc/html/Big.html b/doc/html/Big.html deleted file mode 100644 index fe00ff8..0000000 --- a/doc/html/Big.html +++ /dev/null @@ -1,122 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -<html> - <head> - <title>Big Datasets on Small Machines</title> - </head> - - <body> - <h1>Big Datasets on Small Machines</h1> - - <h2>1. Introduction</h2> - - <p>The HDF5 library is able to handle files larger than the - maximum file size, and datasets larger than the maximum memory - size. For instance, a machine where <code>sizeof(off_t)</code> - and <code>sizeof(size_t)</code> are both four bytes can handle - datasets and files as large as 18x10^18 bytes. However, most - Unix systems limit the number of concurrently open files, so a - practical file size limit is closer to 512GB or 1TB. - - <p>Two "tricks" must be imployed on these small systems in order - to store large datasets. The first trick circumvents the - <code>off_t</code> file size limit and the second circumvents - the <code>size_t</code> main memory limit. - - <h2>2. File Size Limits</h2> - - <p>Systems that have 64-bit file addresses will be able to access - those files automatically. One should see the following output - from configure: - - <p><code><pre> -checking size of off_t... 8 - </pre></code> - - <p>Also, some 32-bit operating systems have special file systems - that can support large (>2GB) files and HDF5 will detect - these and use them automatically. If this is the case, the - output from configure will show: - - <p><code><pre> -checking for lseek64... yes -checking for fseek64... yes - </pre></code> - - <p>Otherwise one must use an HDF5 file family. Such a family is - created by setting file family properties in a file access - property list and then supplying a file name that includes a - <code>printf</code>-style integer format. For instance: - - <p><code><pre> -hid_t plist, file; -plist = H5Pcreate (H5P_FILE_ACCESS); -H5Pset_family (plist, 1<<30, H5P_DEFAULT); -file = H5Fcreate ("big%03d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist); - </code></pre> - - <p>The second argument (<code>1<<30</code>) to - <code>H5Pset_family()</code> indicates that the family members - are to be 2^30 bytes (1GB) each although we could have used any - reasonably large value. In general, family members cannot be - 2GB because writes to byte number 2,147,483,647 will fail, so - the largest safe value for a family member is 2,147,483,647. - HDF5 will create family members on demand as the HDF5 address - space increases, but since most Unix systems limit the number of - concurrently open files the effective maximum size of the HDF5 - address space will be limited (the system on which this was - developed allows 1024 open files, so if each family member is - approx 2GB then the largest HDF5 file is approx 2TB). - - <p>If the effective HDF5 address space is limited then one may be - able to store datasets as external datasets each spanning - multiple files of any length since HDF5 opens external dataset - files one at a time. To arrange storage for a 5TB dataset split - among 1GB files one could say: - - <p><code><pre> -hid_t plist = H5Pcreate (H5P_DATASET_CREATE); -for (i=0; i<5*1024; i++) { - sprintf (name, "velocity-%04d.raw", i); - H5Pset_external (plist, name, 0, (size_t)1<<30); -} - </code></pre> - - <h2>3. Dataset Size Limits</h2> - - <p>The second limit which must be overcome is that of - <code>sizeof(size_t)</code>. HDF5 defines a data type called - <code>hsize_t</code> which is used for sizes of datasets and is, - by default, defined as <code>unsigned long long</code>. - - <p>To create a dataset with 8*2^30 4-byte integers for a total of - 32GB one first creates the dataspace. We give two examples - here: a 4-dimensional dataset whose dimension sizes are smaller - than the maximum value of a <code>size_t</code>, and a - 1-dimensional dataset whose dimension size is too large to fit - in a <code>size_t</code>. - - <p><code><pre> -hsize_t size1[4] = {8, 1024, 1024, 1024}; -hid_t space1 = H5Screate_simple (4, size1, size1); - -hsize_t size2[1] = {8589934592LL}; -hid_t space2 = H5Screate_simple (1, size2, size2}; - </pre></code> - - <p>However, the <code>LL</code> suffix is not portable, so it may - be better to replace the number with - <code>(hsize_t)8*1024*1024*1024</code>. - - <p>For compilers that don't support <code>long long</code> large - datasets will not be possible. The library performs too much - arithmetic on <code>hsize_t</code> types to make the use of a - struct feasible. - - <hr> - <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address> -<!-- Created: Fri Apr 10 13:26:04 EDT 1998 --> -<!-- hhmts start --> -Last modified: Sun Jul 19 11:37:25 EDT 1998 -<!-- hhmts end --> - </body> -</html> |