Diffstat (limited to 'doc/html/TechNotes/VLTypes.html')
-rw-r--r-- doc/html/TechNotes/VLTypes.html 150
1 files changed, 0 insertions, 150 deletions
diff --git a/doc/html/TechNotes/VLTypes.html b/doc/html/TechNotes/VLTypes.html
deleted file mode 100644
index 8a41c10..0000000
--- a/doc/html/TechNotes/VLTypes.html
+++ /dev/null
@@ -1,150 +0,0 @@
-<html>
- <head>
- <title>
- Variable-Length Datatypes in HDF5
- </title>
-
- <STYLE TYPE="text/css">
-
- P { text-indent: 2em}
- P.item { margin-left: 2em; text-indent: -2em}
- P.item2 { margin-left: 2em; text-indent: 2em}
-
- TABLE.format { border:solid; border-collapse:collapse; caption-side:top; text-align:center; width:80%;}
- TABLE.format TH { border:ridge; padding:4px; width:25%;}
- TABLE.format TD { border:ridge; padding:4px; }
- TABLE.format CAPTION { font-weight:bold; font-size:larger;}
-
- TABLE.note {border:none; text-align:right; width:80%;}
-
- TABLE.desc { border:solid; border-collapse:collapse; caption-side:top; text-align:left; width:80%;}
- TABLE.desc TR { vertical-align:top;}
- TABLE.desc TH { border-style:ridge; font-size:larger; padding:4px; text-decoration:underline;}
- TABLE.desc TD { border-style:ridge; padding:4px; }
- TABLE.desc CAPTION { font-weight:bold; font-size:larger;}
-
- TABLE.list { border:none; }
- TABLE.list TR { vertical-align:top;}
- TABLE.list TH { border:none; text-decoration:underline;}
- TABLE.list TD { border:none; }
-
- </STYLE>
-
- <!-- #BeginLibraryItem "/ed_libs/styles_Format.lbi" -->
-<!--
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
- * Copyright by the Board of Trustees of the University of Illinois. *
- * All rights reserved. *
- * *
- * This file is part of HDF5. The full HDF5 copyright notice, including *
- * terms governing use, modification, and redistribution, is contained in *
- * the files COPYING and Copyright.html. COPYING can be found at the root *
- * of the source code distribution tree; Copyright.html can be found at the *
- * root level of an installed copy of the electronic HDF5 document set and *
- * is linked from the top-level documents page. It can also be found at *
- * http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html. If you do not have *
- * access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. *
- * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
- -->
-
-<link href="../ed_styles/FormatElect.css" rel="stylesheet" type="text/css">
-<!-- #EndLibraryItem --></head>
-
- <body bgcolor="#FFFFFF">
- <H3>Introduction</H3>
- <P>Variable-length (VL) datatypes are very flexible, but they can
- be overused or misused. VL datatypes are ideal for capturing the notion
- that elements in an HDF5 dataset (or attribute) can hold different
- amounts of information (VL strings are the canonical example),
- but they also have drawbacks, which this document attempts
- to address.
- </P>
-
- <H3>Background</H3>
- <P>Because fast random access to dataset elements requires that each
- element be a fixed size, the value stored for each VL datatype element
- is actually information used to locate the VL data elsewhere in the file,
- not the data itself.
- </P>
-
- <H3>When to use VL datatypes</H3>
- <P>VL datatypes are designed to allow the amount of data stored in each
- element of a dataset to vary. The variation can be
- over time, as new values with different lengths are written to the
- element. Or the variation can be over "space" (the dataset's dataspace),
- with each element in the dataset having the same fundamental type but
- a different length. "Ragged arrays" are the classic example of elements
- that vary over the "space" of the dataset. If the elements of a
- dataset are not going to vary in length over "space" or time, a VL datatype
- should probably not be used.
- </P>
-
- <H3>Access Time Penalty</H3>
- <P>Accessing VL information requires reading the element in the file, then
- using that element's location information to retrieve the VL
- information itself.
- In the worst case, this doubles the number of disk accesses
- required to access the VL information.
- </P>
- <P>However, to reduce this extra disk access overhead, the HDF5
- library groups VL information together into larger blocks on disk and
- performs I/O only on those larger blocks. Additionally, these blocks of
- information are cached in memory as long as possible. For most access
- patterns, this amortizes the extra disk accesses over enough pieces of
- VL information to hide the extra overhead involved.
- </P>
-
- <H3>Storage Space Penalty</H3>
- <P>Because VL information is stored elsewhere in the file and must be
- retrieved from that location, extra information must be stored in
- the file to locate each item of VL information
- (i.e., each element in a dataset, each VL field in a
- compound datatype, etc.).
- Currently, that extra information amounts to 32 bytes per VL item.
- </P>
- <P>
- With some judicious re-architecting of the library and file format,
- this could be reduced to 18 bytes per VL item with no loss in
- functionality and no additional time penalty. With further
- effort, the overhead could perhaps be pushed down as low as 8-10
- bytes per VL item, again with no loss in functionality, but possibly with a
- small time penalty.
- </P>
-
- <H3>Chunking and Filters</H3>
- <P>Storing data as VL information affects chunked storage and
- the filters that can be applied to chunked data. Because each chunk
- stores only the information needed to locate the VL information,
- the actual VL information is not broken up into chunks the way
- other chunked data is. Additionally, because the
- actual VL information is not stored in the chunk, any filters that
- operate on a chunk (compression, for example) will operate on the information to
- locate the VL information, not on the VL information itself.
- </P>
-
- <H3>File Drivers</H3>
- <P>Because the parallel I/O file drivers (MPI-I/O and MPI-posix) don't
- allow objects with varying sizes to be created in the file, attempting
- to create
- a dataset or attribute with a VL datatype in a file managed by those
- drivers will cause the creation call to fail.
- </P>
- <P>Additionally, VL datatypes may not
- behave as desired with the 'multi' and 'split' file drivers.
- The HDF5 library currently categorizes the
- "blocks of VL information" stored in the file as a type of metadata,
- which means that they may be stored with the file's metadata rather than
- with the other raw data for the file.
- </P>
-
- <H3>Rewriting</H3>
- <P>When VL information in the file is rewritten, the old VL information
- must be released, space for the new VL information must be allocated, and
- the new VL information must be written to the file. This may cause
- additional I/O accesses.
- </P>
-
- </body>
-
-</html>
-
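The Background and "When to use VL datatypes" sections of the deleted note describe a fixed-size element that only locates variable-length data, with ragged arrays as the classic case. The C sketch below models that idea in memory only; it does not call HDF5. `vl_item_t` is a hypothetical local struct shaped like HDF5's in-memory `hvl_t` (a `len` and a `p` pointer), and `make_ragged`, `free_ragged` and `ragged_get` are illustrative names, not library functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for HDF5's in-memory hvl_t (a length plus a
 * pointer), declared here so the sketch compiles without HDF5 headers. */
typedef struct {
    size_t len; /* number of base-type values in this VL item */
    void  *p;   /* where the variable-length values actually live */
} vl_item_t;

/* Release every payload, then the fixed-size element array itself. */
void free_ragged(vl_item_t *rows, size_t nrows)
{
    for (size_t i = 0; i < nrows; i++)
        free(rows[i].p);
    free(rows);
}

/* Build a ragged array in which row i holds i + 1 ints (value 10*i + j).
 * Each element is a fixed-size vl_item_t, so element i is always at
 * rows[i]; only the payload it points to varies in length. */
vl_item_t *make_ragged(size_t nrows)
{
    vl_item_t *rows = calloc(nrows, sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < nrows; i++) {
        int *vals = malloc((i + 1) * sizeof *vals);
        if (vals == NULL) {
            free_ragged(rows, i); /* only rows 0..i-1 were filled in */
            return NULL;
        }
        for (size_t j = 0; j <= i; j++)
            vals[j] = (int)(10 * i + j);
        rows[i].len = i + 1;
        rows[i].p = vals;
    }
    return rows;
}

/* Two-step access: index the fixed-size element directly, then follow
 * its locator to the value. Returns -1 if col is out of range. */
int ragged_get(const vl_item_t *rows, size_t row, size_t col)
{
    const vl_item_t *elem = &rows[row]; /* step 1: locate the element */
    if (col >= elem->len)
        return -1;
    return ((const int *)elem->p)[col]; /* step 2: fetch the VL data */
}
```

With `make_ragged(4)`, `ragged_get(rows, 2, 1)` returns 21: the first step indexes `rows[2]` directly because every element has the same size, and the second step dereferences that element's pointer. In the file, the stored element holds a locator rather than a memory pointer, which is the source of the extra access discussed under Access Time Penalty.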
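The Access Time Penalty section argues that grouping VL information into larger blocks, and caching those blocks, amortizes the second disk access. The toy C model below counts accesses under assumptions made only for this sketch, not taken from HDF5's actual I/O path: each element read costs one access, VL items are packed `items_per_block` to a block, and each block is read once and then stays cached. `vl_disk_accesses` is a hypothetical helper.

```c
#include <stdint.h>

/* Toy access-count model (assumptions, not HDF5 internals):
 *   - reading each fixed-size element costs one disk access;
 *   - VL payloads are packed items_per_block to a block on disk;
 *   - each block is read once and then remains cached.
 * items_per_block == 1 reproduces the worst case: twice the accesses. */
uint64_t vl_disk_accesses(uint64_t n_elems, uint64_t items_per_block)
{
    if (items_per_block == 0)
        items_per_block = 1; /* treat a zero block size as "no grouping" */
    /* ceiling division: number of distinct VL blocks that must be read */
    uint64_t block_reads = (n_elems + items_per_block - 1) / items_per_block;
    return n_elems + block_reads;
}
```

Under these assumptions, 1000 elements cost 2000 accesses with no grouping but 1010 with 100 items per block, which is the sense in which the extra overhead is hidden. Access patterns that touch a different block for nearly every element, or that evict blocks from the cache, drift back toward the worst case.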
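The Storage Space Penalty section gives per-item locator overheads of 32 bytes currently, 18 bytes after re-architecting, and perhaps 8-10 bytes with more effort. The short C calculation below shows what those figures mean relative to the payload. The per-item sizes come from the note; the example workload (one million VL strings averaging 10 bytes) is hypothetical, and the helper names are illustrative.

```c
#include <stdint.h>

/* Per-item locator sizes quoted in the note: the current 32 bytes, the
 * 18 bytes a re-architected format could reach, and the low end of the
 * speculative 8-10 byte estimate. */
enum {
    VL_OVERHEAD_CURRENT    = 32,
    VL_OVERHEAD_REDESIGNED = 18,
    VL_OVERHEAD_MINIMAL    = 8
};

/* Total locator bytes for n_items VL items. */
uint64_t vl_overhead_bytes(uint64_t n_items, uint64_t per_item)
{
    return n_items * per_item;
}

/* Locator bytes as a percentage of payload bytes (integer, rounded down).
 * Returns 0 when there is no payload. */
uint64_t vl_overhead_percent(uint64_t n_items, uint64_t avg_payload_bytes,
                             uint64_t per_item)
{
    uint64_t payload = n_items * avg_payload_bytes;
    if (payload == 0)
        return 0;
    return vl_overhead_bytes(n_items, per_item) * 100 / payload;
}
```

For that workload, `vl_overhead_bytes(1000000, VL_OVERHEAD_CURRENT)` is 32,000,000 bytes of locators for 10,000,000 bytes of strings, so `vl_overhead_percent` reports 320; the 18-byte and 8-byte figures give 180 and 80. The penalty matters most for short VL items; for items of a few kilobytes each, the same 32 bytes is a small fraction.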