-rwxr-xr-x doc/VFD_SWMR_RFC_200916.docx bin 0 -> 168539 bytes
-rwxr-xr-x doc/VFD_SWMR_RFC_200916.pdf bin 0 -> 539166 bytes
-rw-r--r-- doc/vfd-swmr-user-guide.md 211
3 files changed, 138 insertions, 73 deletions
diff --git a/doc/VFD_SWMR_RFC_200916.docx b/doc/VFD_SWMR_RFC_200916.docx
new file mode 100755
index 0000000..a2c2d12
--- /dev/null
+++ b/doc/VFD_SWMR_RFC_200916.docx
Binary files differ
diff --git a/doc/VFD_SWMR_RFC_200916.pdf b/doc/VFD_SWMR_RFC_200916.pdf
new file mode 100755
index 0000000..dd0cad9
--- /dev/null
+++ b/doc/VFD_SWMR_RFC_200916.pdf
Binary files differ
diff --git a/doc/vfd-swmr-user-guide.md b/doc/vfd-swmr-user-guide.md
index cfb6848..9d798cf 100644
--- a/doc/vfd-swmr-user-guide.md
+++ b/doc/vfd-swmr-user-guide.md
@@ -8,30 +8,75 @@ while one or more processes read the file. Use cases range from
monitoring data collection and/or steering experiments in progress
to financial applications.
-The following diagram illustrates how SWMR works.
+The following diagram illustrates the original version of SWMR.
<img src = SWMRdataflow.png width=400 />
-
-VFD SWMR is designed to be a more flexible, more modular,
-better-performing replacement for the existing SWMR feature.
-
-* VFD SWMR allows HDF5 objects (groups, datasets, attributes) to be
- created and destroyed in the course of a reader-writer session.
- Creating objects is not possible using the existing SWMR feature.
-* It compartmentalizes much of the SWMR functionality in a virtual-file
- driver (VFD), thus easing The HDF Group's software-maintenance burden.
-* And it makes guarantees for the maximum time from write to availability
- of data for read, provided that the reading and writing systems and
- their interconnections can keep up with the data flow.
-
-For details on how VFD SWMR is implemented, see [TBD: LINK to RFC].
+The original version of SWMR functions by ordering metadata writes to
+the HDF5 file so as to always maintain a consistent view of metadata
+in the HDF5 file -- which requires SWMR specific modifications to
+all code that maintains on disk metadata.
+
+VFD SWMR is designed to be a more maintainable and more modular
+replacement for the existing SWMR feature. It functions by taking
+regular snapshots of HDF5 file metadata on the writer side, and using
+a specialized virtual file driver (VFD) on the reader side to
+intercept metadata read requests and satisfy them from the
+snapshots where appropriate -- thus assuring that the readers
+see a consistent view of HDF5 file metadata.
+
+This design allowed us to implement VFD SWMR with only minor
+modifications to the HDF5 library above the metadata cache and page
+buffer. As a result, not only is VFD SWMR more modular and
+easier to maintain, it is also almost "full SWMR" -- that is, it
+allows use of almost all HDF5 capabilities by VFD SWMR writers,
+with results that become visible to the VFD SWMR readers.
+
+In particular, VFD SWMR allows the writer to create and delete
+both groups and datasets, and to create and delete attributes on
+both groups and datasets while operating in VFD SWMR mode --
+which is not possible in the original SWMR.
+
+We say that VFD SWMR is almost "full SWMR" because there are a
+few limitations -- most notably:
+
+* The current implementation of variable length data in datasets
+ is fundamentally incompatible with VFD SWMR, as it stores variable
+ length data as metadata. This shouldn't be a major issue, as the
+ current implementation of variable length data has very poor performance,
+ and thus is not suitable for most SWMR applications. A new
+ implementation of variable length data is in the works, and should
+ both offer better performance and be compatible with VFD SWMR.
+ However, there is no ETA for delivery. Variable length attributes
+ on datasets and groups should work, but are currently untested.
+
+* At present the Virtual Data Set (VDS) feature is not
+ well integrated with VFD SWMR. While we have a workaround that
+ allowed us to test for more fundamental issues (see below), a proper
+ solution is on hold pending the availability of the original developer.
+
+* VFD SWMR is only tested with, and should only be used with
+ the latest HDF5 file format. Theoretically, there is no functional
+ reason why it will not work with earlier versions of the file format.
+ However, it is possible to construct very large pieces of metadata
+ in early versions of the HDF5 file format, which has the potential to
+ cause major performance issues.
+
+Due to its regular snapshots of metadata, VFD SWMR provides guarantees
+on the maximum time from write to visibility to the readers -- with
+the provisos that the underlying file system is fast enough, that
+the writer makes HDF5 library API calls with sufficient regularity, and
+that both reader and writer avoid long running HDF5 API calls.
+
+For further details on VFD SWMR design and implementation, see
+VFD_SWMR_RFC_200916.pdf or VFD_SWMR_RFC_200916.docx in the
+doc directory.
# Quick start
Follow these instructions to download, configure, and build the
-VFD SWMR project in a jiffy. Then install the HDF5 library and
-utilites built by the VFD SWMR project.
+VFD SWMR project in a jiffy. Then install the HDF5 library and
+utilities built by the VFD SWMR project.
## Download
@@ -49,12 +94,6 @@ Clone the repository in a new directory, then switch to the VFD SWMR branch:
## Build
-Setup for autotools:
-
-```
-% sh ./autogen.sh
-```
-
Create a build directory, change to that directory, and run the
configure script:
@@ -81,6 +120,21 @@ SWMR works correctly on your system. To test the library, utilities, run
If the tests don't pass, please let the developers know!
+Note that due to the reader and writer processes drifting out of sync, you
+will likely see several messages such as:
+
+```
+ vfd_swmr_zoo_reader: tend_zoo: vrfy_ns_grp_c: H5Gopen2() failed
+```
+or
+```
+ vfd_swmr_zoo_reader: tend_zoo: H5Lexists unexpectedly true.
+```
+
+These are expected. In addition, there will be expected errors
+in the variable length data tests until we are able to re-implement
+variable length data storage in HDF5.
+
# Sample programs
## Extensible datasets
@@ -105,15 +159,15 @@ command-line parameters as the "bigset" writer. The reader and writer
may run concurrently; the reader "polls" the content until it is just
shy of complete, given the number of steps expected.
-To run a bigset test, I open a couple of terminal windows, one for the
-reader and one for the writer. I change to the `test` directory under
-my build directory, and I run the writer in one window:
+To run a bigset test, open a couple of terminal windows, one for the
+reader and one for the writer. Change to the `test` directory under
+your build directory, and run the writer in one window:
```
% ./vfd_swmr_bigset_writer -n 50 -d 2
```
-and in the other window, I run the reader:
+and in the other window, run the reader:
```
% ./vfd_swmr_bigset_reader -n 50 -d 2 -W
@@ -154,7 +208,8 @@ able to `make` and `make clean` the demos.
Under `gaussians/`, two programs are built, `wgaussians` and
`rgaussians`. If you start both from the same directory in different
terminals, you should see the "bouncing 2-D Gaussian distributions"
-in the `rgaussians` terminal.
+in the `rgaussians` terminal. This demo uses curses, so you may need
+to install the curses developers library to build.
The creation-deletion (`credel`) demo is also run in two terminals.
The two command lines are given in `credel/README.md`. You need
@@ -273,7 +328,7 @@ If a reader spends longer than `max_lag - 1` ticks (2400ms with
the example configuration) inside the HDF5 API, then its snapshot
may expire, resulting in undefined behavior. When a snapshot
expires while the reader is using it, we say that the writer has
-"overrun" the reader. The writer cannot currently detect overruns.
+"overrun" the reader. The writer cannot detect overruns.
Frequently the reader will detect an overrun and force the program
to exit with a diagnostic assertion failure.
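
As a rough check of the `max_lag - 1` figure above, the following Python sketch computes the time budget a reader has inside a single HDF5 API call. It assumes that the tick length is configured in tenths of a second and that the example configuration uses a tick length of 4 and a `max_lag` of 7. Neither value is stated in this section; they are simply one combination that reproduces the 2400ms quoted above. The function name is hypothetical and is not part of the HDF5 API.

```python
def snapshot_budget_ms(tick_len_tenths: int, max_lag: int) -> int:
    """Return the longest time, in milliseconds, that a reader can spend
    inside one HDF5 API call before its snapshot may expire.

    Assumption: tick_len_tenths is the tick length in tenths of a second.
    """
    if tick_len_tenths < 1 or max_lag < 1:
        raise ValueError("tick length and max_lag must be positive")
    tick_ms = tick_len_tenths * 100
    # A snapshot stays valid for max_lag - 1 ticks.
    return (max_lag - 1) * tick_ms


if __name__ == "__main__":
    # Assumed example configuration: 400ms ticks, max_lag of 7.
    print(snapshot_budget_ms(4, 7))
```

A reader whose longest HDF5 call approaches this budget probably needs a longer tick or a larger `max_lag`.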
@@ -281,18 +336,18 @@ The application tells VFD SWMR whether or not to configure for
reading or writing a file by setting the `writer` parameter to
`true` for writing or `false` for reading.
-VFD SWMR snapshots are stored in a "shadow file" that is shared
-between writer and readers. On a POSIX system, the shadow file
+VFD SWMR snapshots are stored in a "metadata file" that is shared
+between writer and readers. On a POSIX system, the metadata file
may be placed on any *local* filesystem that the reader and writer
-share. The `md_file_path` parameter tells where to put the shadow
+share. The `md_file_path` parameter tells where to put the metadata
file.
The `md_pages_reserved` parameter tells how many pages to reserve
-at the beginning of the shadow file for the shadow-file header
-and the shadow index. The header has an entire page to itself.
+at the beginning of the metadata file for the metadata-file header
+and the metadata index. The header has an entire page to itself.
The remaining `md_pages_reserved - 1` pages are reserved for the
-shadow index. If the index grows larger than its initial
-allocation, then it will move to a new location in the shadow file,
+metadata index. If the index grows larger than its initial
+allocation, then it will move to a new location in the metadata file,
and the initial allocation will be reclaimed. `md_pages_reserved`
must be at least 2.
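
The reserved-page rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the 4096-byte page size is a hypothetical value, the function name is not part of the HDF5 API, and placing the index immediately after the header page is an assumption about the layout.

```python
def reserved_layout(md_pages_reserved: int, page_size: int = 4096) -> dict:
    """Split the reserved pages into one header page plus index pages.

    page_size is a hypothetical value chosen for illustration.
    """
    if md_pages_reserved < 2:
        raise ValueError("md_pages_reserved must be at least 2")
    # The header has one page to itself; the rest go to the index.
    index_pages = md_pages_reserved - 1
    return {
        "header_pages": 1,
        "index_pages": index_pages,
        # Assumption: the index starts right after the header page.
        "index_offset": page_size,
        "index_bytes": index_pages * page_size,
    }


if __name__ == "__main__":
    print(reserved_layout(128))
```

Reserving more pages up front postpones the point at which the index outgrows its initial allocation and has to move.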
@@ -332,8 +387,8 @@ inside. If a virtual dataset resides on file `v.h5`, and one of
its source datasets resides on a second file, `s1.h5`, then the
virtual dataset will try to open `s1.h5` using the same file-access
properties as `v.h5`. Thus, if `v.h5` is open with VFD SWMR with
-shadow file `v.shadow`, then the virtual dataset will try to open
-`s1.h5` with the same shadow file, which will fail.
+metadata file `v.shadow`, then the virtual dataset will try to open
+`s1.h5` with the same metadata file, which will fail.
Suppose that `v.h5` is *not* open with VFD SWMR, but it was opened
with default file-access properties. Then the virtual dataset will
@@ -343,42 +398,46 @@ helpful to the application that wants to use VFD SWMR to read or
write source datasets.
To use VFD SWMR with VDS, an application should *pre-open* each file
-using its preferred file-access properties, including independent shadow
+using its preferred file-access properties, including independent metadata
filenames for each source file. As long as the virtual dataset remains
in use, the application should leave each of the pre-opened files open.
In this way the library, when it tries to open the source files, will
always find them already open and re-use the already-open files with the
file-access properties established on first open.
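
One way to plan the pre-opens is to give every HDF5 file its own metadata filename before opening anything. The Python sketch below only builds and checks such a plan; it does not call HDF5. The `.md` suffix, the `/tmp` directory, and the function name are hypothetical choices for illustration, and the directory is assumed to be on a local filesystem shared by the reader and writer.

```python
from pathlib import PurePosixPath


def metadata_paths(h5_files, md_dir="/tmp"):
    """Map each HDF5 file to an independent metadata file path.

    Raises ValueError if two HDF5 files would share a metadata file,
    which is the situation the pre-open approach is meant to avoid.
    """
    plan = {}
    for name in h5_files:
        # Hypothetical naming scheme: <md_dir>/<basename>.md
        md = str(PurePosixPath(md_dir) / (PurePosixPath(name).name + ".md"))
        if md in plan.values():
            raise ValueError(f"metadata file {md} would be shared")
        plan[name] = md
    return plan


if __name__ == "__main__":
    # v.h5 holds the virtual dataset; s1.h5 holds a source dataset.
    for h5, md in metadata_paths(["v.h5", "s1.h5"]).items():
        print(h5, "->", md)
```

Each entry in the plan corresponds to one pre-open: the application would open the HDF5 file with a file-access property list whose `md_file_path` is the mapped value, and keep it open while the virtual dataset is in use.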
-## Pushing HDF5 content to reader visibility
+## Pushing HDF5 raw data to reader visibility
-With VFD SWMR, ordinarily it should not be necessary to call
-H5Fflush(). In fact, when VFD SWMR is active, calling H5Fflush()
-may slow down your program considerably because the call will not
-return until after `max_lag` ticks have passed.
+At present, VFD SWMR is hard coded to flush raw data at the end of
+each tick. While this imposes additional overhead, it simplifies testing,
+and is probably desirable for applications that do not require the best
+possible raw data throughput. We plan to upgrade our tests and make this
+user configurable in the first production release.
-A writer can make its last changes to an HDF5 file visible to all
-readers immediately using the new call, `H5Fvfd_swmr_end_tick()`.
-A writer should use `H5Fvfd_swmr_end_tick()` carefully: by calling
-it more frequently than once a tick, a writer may corrupt a reader's
-view of the HDF5 file.
+With the currently hard coded flush of raw data at the end of each tick,
+it should not be necessary to call H5Fflush(). In fact, when VFD SWMR is
+active, H5Fflush() may require up to 'max_lag' ticks to complete due to
+metadata consistency issues.
-When VFD SWMR is enabled, raw data is not cached in the page buffer. On
-each tick, the content of chunk caches and other unwritten raw data is
-flushed directly to the HDF5 file, so that raw data is always available
-before the HDF5 structural metadata that describes it.
+Instead, a writer can make its last changes to the HDF5 file visible to all
+readers immediately using the new call, `H5Fvfd_swmr_end_tick()`. Note
+that this call should be used sparingly, as it terminates the current
+tick early, thus effectively reducing 'max_lag'. Repeated calls in
+quick succession can force a reader to overrun 'max_lag', and
+read stale metadata.
+
+When the flush of raw data at end of tick is disabled (not possible at present),
+the `H5Fvfd_swmr_end_tick()` call will make the writer's current view of metadata
+visible to the reader -- which may refer to raw data that hasn't been written to
+the HDF5 file yet.
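
The warning about calling `H5Fvfd_swmr_end_tick()` in quick succession can be illustrated with a simplified model. The Python sketch below assumes that a snapshot remains valid for the `max_lag - 1` ticks that follow it, so its wall-clock lifetime is the sum of those ticks' durations. The tick durations, the `max_lag` of 7, and the function name are all illustrative assumptions; the real library's bookkeeping may differ.

```python
def snapshot_lifetime_ms(tick_durations_ms, published_at, max_lag):
    """Wall-clock lifetime of the snapshot published at the end of tick
    number `published_at`, under the simplified model that it survives
    the next max_lag - 1 ticks."""
    following = tick_durations_ms[published_at + 1 : published_at + max_lag]
    if len(following) < max_lag - 1:
        raise ValueError("not enough ticks to evaluate the lifetime")
    return sum(following)


if __name__ == "__main__":
    # Regular 400ms ticks.
    regular = [400] * 10
    # Six ticks cut short to 50ms by repeated early end-of-tick calls.
    hurried = [400] + [50] * 6 + [400] * 3
    print(snapshot_lifetime_ms(regular, 0, 7))
    print(snapshot_lifetime_ms(hurried, 0, 7))
```

In this model the hurried writer leaves a reader far less wall-clock time to finish an HDF5 call than the regular one does, which is why the call should be used sparingly.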
## Reading up-to-date content
-The HDF Group (THG) expects that in one class of VFD SWMR application,
-instruments on a particle accelerator will continuously generate
-2-dimensional data frames and add them to HDF5 datasets while an
-experiment is ongoing. The datasets will be written to an HDF5
-file opened in VFD SWMR mode. Experimenters will monitor a real-time
-display of the datasets while the experiment takes place. A second
-program, possibly running on a second computer, will generate the
-display. The second program will open the HDF5 file in VFD SWMR
-mode, too.
+One expected use case for VFD SWMR involves an experiment in which instruments
+continuously generate 2-dimensional data frames. These data frames are recorded
+in datasets in an HDF5 file that has been opened in VFD SWMR writer mode. In this
+use case, the HDF5 file is opened in VFD SWMR reader mode by a second program
+that generates a real time display of the data as it is being collected -- thus
+allowing the experimenters to steer the experiment.
The HDF Group (THG) developed a demonstration program for this class of
application, and we have some advice based on that experience.
@@ -405,6 +464,8 @@ SWMR's clock across both of the calls. The
`H5Fvfd_swmr_disable_end_of_tick()` call takes a file identifier
and stops new snapshots from being taken on the given file until
`H5Fvfd_swmr_enable_end_of_tick()` is called on the same file.
+Needless to say, end of tick processing should only be disabled
+briefly.
# Known issues
@@ -442,10 +503,12 @@ and read back like this,
may produce either an error return from `H5Dread` (`ret < 0`) or
a `NULL` pointer (`data == NULL`).
-Planned improvements to the HDF5 *global heap* may alleviate this
-problem. There is no schedule for those improvements.
-
-Improvements to VFD SWMR may also alleviate the problem.
+As discussed above, this is caused by a fundamental incompatibility
+between VFD SWMR and the current variable length data implementation in
+HDF5, which stores variable length data as metadata. It is possible we may
+be able to mitigate the issue, but the most likely solution is the
+re-implementation of variable length data that is currently in the planning
+stage. Unfortunately, we have no ETA for this re-implementation.
## Iteration
@@ -478,11 +541,12 @@ NFS, et al.).
## Supported filesystems
A VFD SWMR writer and readers share a couple of files, the HDF5 (`.h5`)
-file and the shadow file. VFD SWMR relies on writes to the files to
-take effect in the order described in the POSIX documentation for
-`read(2)` and `write(2)` system calls. If the VFD SWMR readers and the
-writer run on the same POSIX host, this ordering should take effect,
-regardless of the underlying filesystem.
+file and the metadata file -- which is used to communicate snapshots of
+the HDF5 file metadata from the writer to the readers. VFD SWMR relies
+on writes to the metadata file to take effect in the order described in
+the POSIX documentation for `read(2)` and `write(2)` system calls. If
+the VFD SWMR readers and the writer run on the same POSIX host, this
+ordering should take effect, regardless of the underlying filesystem.
If the VFD SWMR reader and the writer run on *different* hosts, then
the write-ordering rules depend on the shared filesystem. VFD SWMR is
@@ -503,7 +567,8 @@ seconds.
# Reporting bugs
-VFD SWMR is still under construction, so I think that you will find some
-bugs. Please do not hesitate to report them.
+VFD SWMR is still under development, so we expect that you will encounter
+bugs. Please report them, along with performance or design issues you
+encounter.
To contact the VFD SWMR developers, email vfdswmr@hdfgroup.org.