diff options
Diffstat (limited to 'doc/html/TechNotes/VLTypes.html')
-rw-r--r-- | doc/html/TechNotes/VLTypes.html | 150 |
1 files changed, 0 insertions, 150 deletions
diff --git a/doc/html/TechNotes/VLTypes.html b/doc/html/TechNotes/VLTypes.html deleted file mode 100644 index 8a41c10..0000000 --- a/doc/html/TechNotes/VLTypes.html +++ /dev/null @@ -1,150 +0,0 @@ -<html> - <head> - <title> - Variable-Length Datatypes in HDF5 - </title> - - <STYLE TYPE="text/css"> - - P { text-indent: 2em} - P.item { margin-left: 2em; text-indent: -2em} - P.item2 { margin-left: 2em; text-indent: 2em} - - TABLE.format { border:solid; border-collapse:collapse; caption-side:top; text-align:center; width:80%;} - TABLE.format TH { border:ridge; padding:4px; width:25%;} - TABLE.format TD { border:ridge; padding:4px; } - TABLE.format CAPTION { font-weight:bold; font-size:larger;} - - TABLE.note {border:none; text-align:right; width:80%;} - - TABLE.desc { border:solid; border-collapse:collapse; caption-size:top; text-align:left; width:80%;} - TABLE.desc TR { vertical-align:top;} - TABLE.desc TH { border-style:ridge; font-size:larger; padding:4px; text-decoration:underline;} - TABLE.desc TD { border-style:ridge; padding:4px; } - TABLE.desc CAPTION { font-weight:bold; font-size:larger;} - - TABLE.list { border:none; } - TABLE.list TR { vertical-align:top;} - TABLE.list TH { border:none; text-decoration:underline;} - TABLE.list TD { border:none; } - - </STYLE> - - <!-- #BeginLibraryItem "/ed_libs/styles_Format.lbi" --> -<!-- - * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * - * Copyright by the Board of Trustees of the University of Illinois. * - * All rights reserved. * - * * - * This file is part of HDF5. The full HDF5 copyright notice, including * - * terms governing use, modification, and redistribution, is contained in * - * the files COPYING and Copyright.html. COPYING can be found at the root * - * of the source code distribution tree; Copyright.html can be found at the * - * root level of an installed copy of the electronic HDF5 document set and * - * is linked from the top-level documents page. It can also be found at * - * http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html. If you do not have * - * access to either file, you may request a copy from hdfhelp@ncsa.uiuc.edu. * - * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * - --> - -<link href="../ed_styles/FormatElect.css" rel="stylesheet" type="text/css"> -<!-- #EndLibraryItem --></head> - - <body bgcolor="#FFFFFF"> - <H3>Introduction</H3> - <P>Variable-length (VL) datatypes have a great deal of flexibility, but can - be over- or mis-used. VL datatypes are ideal at capturing the notion - that elements in an HDF5 dataset (or attribute) can have different - amounts of information (VL strings are the canonical example), - but they have some drawbacks that this document attempts - to address. - </P> - - <H3>Background</H3> - <P>Because fast random access to dataset elements requires that each - element be a fixed size, the information stored for VL datatype elements - is actually information to locate the VL information, not - the information itself. - </P> - - <H3>When to use VL datatypes</H3> - <P>VL datatypes are designed allow the amount of data stored in each - element of a dataset to vary. This change could be - over time as new values, with different lengths, were written to the - element. Or, the change can be over "space" - the dataset's space, - with each element in the dataset having the same fundamental type, but - different lengths. "Ragged arrays" are the classic example of elements - that change over the "space" of the dataset. If the elements of a - dataset are not going to change over "space" or time, a VL datatype - should probably not be used. - </P> - - <H3>Access Time Penalty</H3> - <P>Accessing VL information requires reading the element in the file, then - using that element's location information to retrieve the VL - information itself. - In the worst case, this obviously doubles the number of disk accesses - required to access the VL information. - </P> - <P>However, in order to avoid this extra disk access overhead, the HDF5 - library groups VL information together into larger blocks on disk and - performs I/O only on those larger blocks. Additionally, these blocks of - information are cached in memory as long as possible. For most access - patterns, this amortizes the extra disk accesses over enough pieces of - VL information to hide the extra overhead involved. - </P> - - <H3>Storage Space Penalty</H3> - <P>Because VL information must be located and retrieved from another - location in the file, extra information must be stored in the file to - locate - each item of VL information (i.e. each element in a dataset or each - VL field in a compound datatype, etc.). - Currently, that extra information amounts to 32 bytes per VL item. - </P> - <P> - With some judicious re-architecting of the library and file format, - this could be reduced to 18 bytes per VL item with no loss in - functionality or additional time penalties. With some additional - effort, the space could perhaps could be pushed down as low as 8-10 - bytes per VL item with no loss in functionality, but potentially a - small time penalty. - </P> - - <H3>Chunking and Filters</H3> - <P>Storing data as VL information has some affects on chunked storage and - the filters that can be applied to chunked data. Because the data that - is stored in each chunk is the location to access the VL information, - the actual VL information is not broken up into chunks in the same way - as other data stored in chunks. Additionally, because the - actual VL information is not stored in the chunk, any filters which - operate on a chunk will operate on the information to - locate the VL information, not the VL information itself. - </P> - - <H3>File Drivers</H3> - <P>Because the parallel I/O file drivers (MPI-I/O and MPI-posix) don't - allow objects with varying sizes to be created in the file, attemping - to create - a dataset or attribute with a VL datatype in a file managed by those - drivers will cause the creation call to fail. - </P> - <P>Additionally, using - VL datatypes and the 'multi' and 'split' file drivers may not operate - in the manner desired. The HDF5 library currently categorizes the - "blocks of VL information" stored in the file as a type of metadata, - which means that they may not be stored with the other raw data for - the file. - </P> - - <H3>Rewriting</H3> - <P>When VL information in the file is re-written, the old VL information - must be releases, space for the new VL information allocated and - the new VL information must be written to the file. This may cause - additional I/O accesses. - </P> - - </body> - -</html> - |