1 files changed, 122 insertions, 0 deletions
diff --git a/doc/html/TechNotes/BigDataSmMach.html b/doc/html/TechNotes/BigDataSmMach.html
new file mode 100644
index 0000000..fe00ff8
--- /dev/null
+++ b/doc/html/TechNotes/BigDataSmMach.html
@@ -0,0 +1,122 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+  <head>
+    <title>Big Datasets on Small Machines</title>
+  </head>
+
+  <body>
+    <h1>Big Datasets on Small Machines</h1>
+
+    <h2>1. Introduction</h2>
+
+    <p>The HDF5 library is able to handle files larger than the
+      maximum file size, and datasets larger than the maximum memory
+      size.  For instance, a machine where <code>sizeof(off_t)</code>
+      and <code>sizeof(size_t)</code> are both four bytes can handle
+      datasets and files as large as 18x10^18 bytes.  However, most
+      Unix systems limit the number of concurrently open files, so a
+      practical file size limit is closer to 512GB or 1TB.
+
+    <p>Two "tricks" must be imployed on these small systems in order
+      to store large datasets.  The first trick circumvents the
+      <code>off_t</code> file size limit and the second circumvents
+      the <code>size_t</code> main memory limit.
+
+    <h2>2. File Size Limits</h2>
+
+    <p>Systems that have 64-bit file addresses will be able to access
+      those files automatically.  One should see the following output
+      from configure:
+
+    <p><code><pre>
+checking size of off_t... 8
+    </pre></code>
+
+    <p>Also, some 32-bit operating systems have special file systems
+      that can support large (&gt;2GB) files and HDF5 will detect
+      these and use them automatically.  If this is the case, the
+      output from configure will show:
+
+    <p><code><pre>
+checking for lseek64... yes
+checking for fseek64... yes
+    </pre></code>
+
+    <p>Otherwise one must use an HDF5 file family.  Such a family is
+      created by setting file family properties in a file access
+      property list and then supplying a file name that includes a
+      <code>printf</code>-style integer format.  For instance:
+
+    <p><code><pre>
+hid_t plist, file;
+plist = H5Pcreate (H5P_FILE_ACCESS);
+H5Pset_family (plist, 1&lt;&lt;30, H5P_DEFAULT);
+file = H5Fcreate ("big%03d.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist);
+    </code></pre>
+
+    <p>The second argument (<code>1&lt;&lt;30</code>) to
+      <code>H5Pset_family()</code> indicates that the family members
+      are to be 2^30 bytes (1GB) each although we could have used any
+      reasonably large value.  In general, family members cannot be
+      2GB because writes to byte number 2,147,483,647 will fail, so
+      the largest safe value for a family member is 2,147,483,647.
+      HDF5 will create family members on demand as the HDF5 address
+      space increases, but since most Unix systems limit the number of
+      concurrently open files the effective maximum size of the HDF5
+      address space will be limited (the system on which this was
+      developed allows 1024 open files, so if each family member is
+      approx 2GB then the largest HDF5 file is approx 2TB).
+
+    <p>If the effective HDF5 address space is limited then one may be
+      able to store datasets as external datasets each spanning
+      multiple files of any length since HDF5 opens external dataset
+      files one at a time.  To arrange storage for a 5TB dataset split
+      among 1GB files one could say:
+
+    <p><code><pre>
+hid_t plist = H5Pcreate (H5P_DATASET_CREATE);
+for (i=0; i&lt;5*1024; i++) {
+   sprintf (name, "velocity-%04d.raw", i);
+   H5Pset_external (plist, name, 0, (size_t)1&lt;&lt;30);
+}
+    </code></pre>
+
+    <h2>3. Dataset Size Limits</h2>
+
+    <p>The second limit which must be overcome is that of
+      <code>sizeof(size_t)</code>.  HDF5 defines a data type called
+      <code>hsize_t</code> which is used for sizes of datasets and is,
+      by default, defined as <code>unsigned long long</code>.
+
+    <p>To create a dataset with 8*2^30 4-byte integers for a total of
+      32GB one first creates the dataspace.  We give two examples
+      here: a 4-dimensional dataset whose dimension sizes are smaller
+      than the maximum value of a <code>size_t</code>, and a
+      1-dimensional dataset whose dimension size is too large to fit
+      in a <code>size_t</code>.
+
+    <p><code><pre>
+hsize_t size1[4] = {8, 1024, 1024, 1024};
+hid_t space1 = H5Screate_simple (4, size1, size1);
+
+hsize_t size2[1] = {8589934592LL};
+hid_t space2 = H5Screate_simple (1, size2, size2};
+    </pre></code>
+
+    <p>However, the <code>LL</code> suffix is not portable, so it may
+      be better to replace the number with
+      <code>(hsize_t)8*1024*1024*1024</code>.
+
+    <p>For compilers that don't support <code>long long</code> large
+      datasets will not be possible.  The library performs too much
+      arithmetic on <code>hsize_t</code> types to make the use of a
+      struct feasible.
+
+    <hr>
+    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
+<!-- Created: Fri Apr 10 13:26:04 EDT 1998 -->
+<!-- hhmts start -->
+Last modified: Sun Jul 19 11:37:25 EDT 1998
+<!-- hhmts end -->
+  </body>
+</html>