-rwxr-xr-x | doc/VFD_SWMR_RFC_200916.docx | bin | 0 -> 168539 bytes
-rwxr-xr-x | doc/VFD_SWMR_RFC_200916.pdf | bin | 0 -> 539166 bytes
-rw-r--r-- | doc/vfd-swmr-user-guide.md | 211
3 files changed, 138 insertions, 73 deletions
diff --git a/doc/VFD_SWMR_RFC_200916.docx b/doc/VFD_SWMR_RFC_200916.docx
new file mode 100755
index 0000000..a2c2d12
--- /dev/null
+++ b/doc/VFD_SWMR_RFC_200916.docx
Binary files differ
diff --git a/doc/VFD_SWMR_RFC_200916.pdf b/doc/VFD_SWMR_RFC_200916.pdf
new file mode 100755
index 0000000..dd0cad9
--- /dev/null
+++ b/doc/VFD_SWMR_RFC_200916.pdf
Binary files differ
diff --git a/doc/vfd-swmr-user-guide.md b/doc/vfd-swmr-user-guide.md
index cfb6848..9d798cf 100644
--- a/doc/vfd-swmr-user-guide.md
+++ b/doc/vfd-swmr-user-guide.md
@@ -8,30 +8,75 @@ while one or more processes read the file. Use cases range from
 monitoring data collection and/or steering experiments in progress to
 financial applications.

-The following diagram illustrates how SWMR works.
+The following diagram illustrates the original version of SWMR.

 <img src = SWMRdataflow.png width=400 />

-
-VFD SWMR is designed to be a more flexible, more modular,
-better-performing replacement for the existing SWMR feature.
-
-* VFD SWMR allows HDF5 objects (groups, datasets, attributes) to be
-  created and destroyed in the course of a reader-writer session.
-  Creating objects is not possible using the existing SWMR feature.
-* It compartmentalizes much of the SWMR functionality in a virtual-file
-  driver (VFD), thus easing The HDF Group's software-maintenance burden.
-* And it makes guarantees for the maximum time from write to availability
-  of data for read, provided that the reading and writing systems and
-  their interconnections can keep up with the data flow.
-
-For details on how VFD SWMR is implemented, see [TBD: LINK to RFC].
+The original version of SWMR functions by ordering metadata writes to
+the HDF5 file so as to always maintain a consistent view of metadata
+in the HDF5 file -- which requires SWMR-specific modifications to
+all code that maintains on-disk metadata.
+
+VFD SWMR is designed to be a more maintainable and more modular
+replacement for the existing SWMR feature. It functions by taking
+regular snapshots of HDF5 file metadata on the writer side, and using
+a specialized virtual file driver (VFD) on the reader side to
+intercept metadata read requests and satisfy them from the
+snapshots where appropriate -- thus assuring that the readers
+see a consistent view of HDF5 file metadata.
+
+This design allowed us to implement VFD SWMR with only minor
+modifications to the HDF5 library above the metadata cache and page
+buffer. As a result, not only is VFD SWMR more modular and
+easier to maintain, it is also almost "full SWMR" -- that is, it
+allows use of almost all HDF5 capabilities by VFD SWMR writers,
+with results that become visible to the VFD SWMR readers.
+
+In particular, VFD SWMR allows the writer to create and delete
+both groups and datasets, and to create and delete attributes on
+both groups and datasets while operating in VFD SWMR mode --
+which is not possible in the original SWMR.
+
+We say that VFD SWMR is almost "full SWMR" because there are a
+few limitations -- most notably:
+
+* The current implementation of variable length data in datasets
+  is fundamentally incompatible with VFD SWMR, as it stores variable
+  length data as metadata. This shouldn't be a major issue, as the
+  current implementation of variable length data has very poor performance,
+  and thus is not suitable for most SWMR applications. A new
+  implementation of variable length data is in the works, and should
+  both offer better performance and be compatible with VFD SWMR.
+  However, there is no ETA for delivery. Variable length attributes
+  on datasets and groups should work, but are currently untested.
+
+* At present the Virtual Data Set (VDS) feature is not
+  well integrated with VFD SWMR. While we have a workaround that
+  allowed us to test for more fundamental issues (see below), a proper
+  solution is on hold pending the availability of the original developer.
+
+* VFD SWMR is only tested with, and should only be used with
+  the latest HDF5 file format. Theoretically, there is no functional
+  reason why it will not work with earlier versions of the file format.
+  However, it is possible to construct very large pieces of metadata
+  in earlier versions of the HDF5 file format, which has the potential to
+  cause major performance issues.
+
+Due to its regular snapshots of metadata, VFD SWMR provides guarantees
+on the maximum time from write to visibility to the readers -- with
+the provisos that the underlying file system is fast enough, that
+the writer makes HDF5 library API calls with sufficient regularity, and
+that both reader and writer avoid long-running HDF5 API calls.
+
+For further details on VFD SWMR design and implementation, see
+VFD_SWMR_RFC_200916.pdf or VFD_SWMR_RFC_200916.docx in the
+doc directory.

 # Quick start

 Follow these instructions to download, configure, and build the
-VFD SWMR project in a jiffy. Then install the HDF5 library and
-utilites built by the VFD SWMR project.
+VFD SWMR project in a jiffy. Then install the HDF5 library and
+utilities built by the VFD SWMR project.

 ## Download

@@ -49,12 +94,6 @@ Clone the repository in a new directory, then switch to the VFD SWMR branch:

 ## Build

-Setup for autotools:
-
-```
-% sh ./autogen.sh
-```
-
 Create a build directory, change to that directory, and run the
 configure script:

@@ -81,6 +120,21 @@ SWMR works correctly on your system. To test the library, utilities, run

 If the tests don't pass, please let the developers know!

+Note that because the reader and writer processes drift out of sync, you
+will likely see several messages such as:
+
+```
+ vfd_swmr_zoo_reader: tend_zoo: vrfy_ns_grp_c: H5Gopen2() failed
+```
+or
+```
+ vfd_swmr_zoo_reader: tend_zoo: H5Lexists unexpectedly true.
+```
+
+These are expected. In addition, there will be expected errors
+in the variable length data tests until we are able to re-implement
+variable length data storage in HDF5.
+
 # Sample programs

 ## Extensible datasets
@@ -105,15 +159,15 @@ command-line parameters as the "bigset" writer. The reader and writer
 may run concurrently; the reader "polls" the content until it is just
 shy of complete, given the number of steps expected.

-To run a bigset test, I open a couple of terminal windows, one for the
-reader and one for the writer. I change to the `test` directory under
-my build directory, and I run the writer in one window:
+To run a bigset test, open a couple of terminal windows, one for the
+reader and one for the writer. Change to the `test` directory under
+your build directory, and run the writer in one window:

 ```
 % ./vfd_swmr_bigset_writer -n 50 -d 2
 ```

-and in the other window, I run the reader:
+and in the other window, run the reader:

 ```
 % ./vfd_swmr_bigset_reader -n 50 -d 2 -W
@@ -154,7 +208,8 @@ able to `make` and `make clean` the demos.
 Under `gaussians/`, two programs are built, `wgaussians` and
 `rgaussians`. If you start both from the same directory in different
 terminals, you should see the "bouncing 2-D Gaussian distributions"
-in the `rgaussians` terminal.
+in the `rgaussians` terminal. This demo uses curses, so you may need
+to install the curses development library to build it.

 The creation-deletion (`credel`) demo is also run in two terminals.
 The two command lines are given in `credel/README.md`. You need
@@ -273,7 +328,7 @@ If a reader spends longer than `max_lag - 1` ticks (2400ms with the
 example configuration) inside the HDF5 API, then its snapshot may
 expire, resulting in undefined behavior. When a snapshot expires
 while the reader is using it, we say that the writer has
-"overrun" the reader. The writer cannot currently detect overruns.
+"overrun" the reader. The writer cannot detect overruns.
 Frequently the reader will detect an overrun and force the program to
 exit with a diagnostic assertion failure.

@@ -281,18 +336,18 @@ The application tells VFD SWMR whether or not to configure for reading
 or writing a file by setting the `writer` parameter to `true` for
 writing or `false` for reading.

-VFD SWMR snapshots are stored in a "shadow file" that is shared
-between writer and readers. On a POSIX system, the shadow file
+VFD SWMR snapshots are stored in a "metadata file" that is shared
+between writer and readers. On a POSIX system, the metadata file
 may be placed on any *local* filesystem that the reader and writer
-share. The `md_file_path` parameter tells where to put the shadow
+share. The `md_file_path` parameter tells where to put the metadata
 file.

 The `md_pages_reserved` parameter tells how many pages to reserve
-at the beginning of the shadow file for the shadow-file header
-and the shadow index. The header has an entire page to itself.
+at the beginning of the metadata file for the metadata-file header
+and the metadata index. The header has an entire page to itself.
 The remaining `md_pages_reserved - 1` pages are reserved for the
-shadow index. If the index grows larger than its initial
-allocation, then it will move to a new location in the shadow file,
+metadata index. If the index grows larger than its initial
+allocation, then it will move to a new location in the metadata file,
 and the initial allocation will be reclaimed. `md_pages_reserved`
 must be at least 2.

@@ -332,8 +387,8 @@ inside. If a virtual dataset resides on file `v.h5`, and one of its
 source datasets resides on a second file, `s1.h5`, then the virtual
 dataset will try to open `s1.h5` using the same file-access properties
 as `v.h5`. Thus, if `v.h5` is open with VFD SWMR with
-shadow file `v.shadow`, then the virtual dataset will try to open
-`s1.h5` with the same shadow file, which will fail.
+metadata file `v.shadow`, then the virtual dataset will try to open
+`s1.h5` with the same metadata file, which will fail.

 Suppose that `v.h5` is *not* open with VFD SWMR, but it was opened
 with default file-access properties. Then the virtual dataset will
@@ -343,42 +398,46 @@ helpful to the application that wants to use VFD SWMR to read or write
 source datasets.

 To use VFD SWMR with VDS, an application should *pre-open* each file
-using its preferred file-access properties, including independent shadow
+using its preferred file-access properties, including independent metadata
 filenames for each source file. As long as the virtual dataset remains
 in use, the application should leave each of the pre-opened files
 open. In this way the library, when it tries to open the source files,
 will always find them already open and re-use the already-open files
 with the file-access properties established on first open.

-## Pushing HDF5 content to reader visibility
+## Pushing HDF5 raw data to reader visibility

-With VFD SWMR, ordinarily it should not be necessary to call
-H5Fflush(). In fact, when VFD SWMR is active, calling H5Fflush()
-may slow down your program considerably because the call will not
-return until after `max_lag` ticks have passed.
+At present, VFD SWMR is hard-coded to flush raw data at the end of
+each tick. While this imposes additional overhead, it simplifies testing,
+and is probably desirable for applications that do not require the best
+possible raw data throughput. We plan to upgrade our tests and make this
+user-configurable in the first production release.

-A writer can make its last changes to an HDF5 file visible to all
-readers immediately using the new call, `H5Fvfd_swmr_end_tick()`.
-A writer should use `H5Fvfd_swmr_end_tick()` carefully: by calling
-it more frequently than once a tick, a writer may corrupt a reader's
-view of the HDF5 file.
+With the current hard-coded flush of raw data at the end of each tick,
+it should not be necessary to call H5Fflush(). In fact, when VFD SWMR is
+active, H5Fflush() may require up to `max_lag` ticks to complete due to
+metadata consistency issues.

-When VFD SWMR is enabled, raw data is not cached in the page buffer. On
-each tick, the content of chunk caches and other unwritten raw data is
-flushed directly to the HDF5 file, so that raw data is always available
-before the HDF5 structural metadata that describes it.
+Instead, a writer can make its last changes to the HDF5 file visible to all
+readers immediately using the new call, `H5Fvfd_swmr_end_tick()`. Note
+that this call should be used sparingly, as it terminates the current
+tick early, thus effectively reducing `max_lag`. Repeated calls in
+quick succession can cause a reader to exceed `max_lag` and
+read stale metadata.
+
+When the flush of raw data at end of tick is disabled (not possible at present),
+the `H5Fvfd_swmr_end_tick()` call will make the writer's current view of metadata
+visible to the readers -- which may refer to raw data that hasn't been written to
+the HDF5 file yet.

 ## Reading up-to-date content

-The HDF Group (THG) expects that in one class of VFD SWMR application,
-instruments on a particle accelerator will continuously generate
-2-dimensional data frames and add them to HDF5 datasets while an
-experiment is ongoing. The datasets will be written to an HDF5
-file opened in VFD SWMR mode. Experimenters will monitor a real-time
-display of the datasets while the experiment takes place. A second
-program, possibly running on a second computer, will generate the
-display. The second program will open the HDF5 file in VFD SWMR
-mode, too.
+One expected use case for VFD SWMR involves an experiment in which instruments
+continuously generate 2-dimensional data frames. These data frames are recorded
+in datasets in an HDF5 file that has been opened in VFD SWMR writer mode. In this
+use case, the HDF5 file is opened in VFD SWMR reader mode by a second program
+that generates a real-time display of the data as it is being collected -- thus
+allowing the experimenters to steer the experiment.

 THG developed a demonstration program for class of
 application, and we have some advice based on that experience.
@@ -405,6 +464,8 @@ SWMR's clock across both of the calls.
 The `H5Fvfd_swmr_disable_end_of_tick()` call takes a file identifier
 and stops new snapshots from being taken on the given file until
 `H5Fvfd_swmr_enable_end_of_tick()` is called on the same file.
+Since no new snapshots reach the readers while it is disabled, end of
+tick processing should only be disabled briefly.

 # Known issues

@@ -442,10 +503,12 @@ and read back like this,
 may produce either an error return from `H5Dread` (`ret < 0`) or a
 `NULL` pointer (`data == NULL`).

-Planned improvements to the HDF5 *global heap* may alleviate this
-problem. There is no schedule for those improvements.
-
-Improvements to VFD SWMR may also alleviate the problem.
+As discussed above, this is caused by a fundamental incompatibility
+between VFD SWMR and the current variable length data implementation
+in HDF5, which stores variable length data as metadata. We may be able
+to mitigate the issue, but the most likely solution is the planned
+re-implementation of variable length data that is currently in the planning
+stage. Unfortunately, we have no ETA for this re-implementation.

 ## Iteration

@@ -478,11 +541,12 @@ NFS, et al.).
 ## Supported filesystems

 A VFD SWMR writer and readers share a couple of files, the HDF5 (`.h5`)
-file and the shadow file. VFD SWMR relies on writes to the files to
-take effect in the order described in the POSIX documentation for
-`read(2)` and `write(2)` system calls. If the VFD SWMR readers and the
-writer run on the same POSIX host, this ordering should take effect,
-regardless of the underlying filesystem.
+file and the metadata file -- which is used to communicate snapshots of
+the HDF5 file metadata from the writer to the readers. VFD SWMR relies
+on writes to the metadata file to take effect in the order described in
+the POSIX documentation for `read(2)` and `write(2)` system calls. If
+the VFD SWMR readers and the writer run on the same POSIX host, this
+ordering should take effect, regardless of the underlying filesystem.

 If the VFD SWMR reader and the writer run on *different* hosts, then
 the write-ordering rules depend on the shared filesystem. VFD SWMR is
@@ -503,7 +567,8 @@ seconds.

 # Reporting bugs

-VFD SWMR is still under construction, so I think that you will find some
-bugs. Please do not hesitate to report them.
+VFD SWMR is still under development, so we expect that you will encounter
+bugs. Please report them, along with any performance or design issues you
+find.

 To contact the VFD SWMR developers, email vfdswmr@hdfgroup.org.
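The timing and layout rules stated in the updated guide reduce to simple arithmetic. A reader that stays inside the HDF5 API for more than `max_lag - 1` ticks may be overrun. The metadata file reserves one header page plus `md_pages_reserved - 1` index pages, and `md_pages_reserved` must be at least 2. The C sketch below works through that arithmetic. It is illustrative only: `demo_vfd_swmr_params` and the `demo_*` functions are hypothetical names, not part of the HDF5 API. Expressing the tick length in milliseconds is an assumption of the sketch, not necessarily the unit used by the library's own configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter bundle for this sketch only. It borrows the
 * parameter names used in the guide but is NOT the HDF5 config struct.
 * The tick length is given in milliseconds (an assumption). */
typedef struct {
    bool     writer;            /* true for the writer, false for a reader;
                                   carried for completeness, unused below */
    uint32_t tick_len_ms;       /* length of one tick, in milliseconds    */
    uint32_t max_lag;           /* maximum lag, in ticks                  */
    uint32_t md_pages_reserved; /* pages reserved at the start of the
                                   metadata file (header + index)         */
} demo_vfd_swmr_params;

/* The guide requires md_pages_reserved >= 2 (one page for the header,
 * at least one for the index). Requiring a positive tick length and
 * max_lag is this sketch's own sanity check. */
bool demo_params_valid(const demo_vfd_swmr_params *p)
{
    return p->md_pages_reserved >= 2 && p->tick_len_ms > 0 && p->max_lag > 0;
}

/* Pages initially available to the metadata index: the header takes a
 * whole page, so md_pages_reserved - 1 pages remain for the index. */
uint32_t demo_index_pages(const demo_vfd_swmr_params *p)
{
    assert(p->md_pages_reserved >= 2);
    return p->md_pages_reserved - 1;
}

/* Longest time, in ms, a reader may spend inside the HDF5 API before its
 * snapshot may expire: max_lag - 1 ticks. */
uint64_t demo_reader_budget_ms(const demo_vfd_swmr_params *p)
{
    assert(p->max_lag > 0);
    return (uint64_t)(p->max_lag - 1) * p->tick_len_ms;
}

/* True if a reader that spent elapsed_ms inside the API risks being
 * overrun by the writer. */
bool demo_reader_may_be_overrun(const demo_vfd_swmr_params *p,
                                uint64_t elapsed_ms)
{
    return elapsed_ms > demo_reader_budget_ms(p);
}
```

For example, a 400 ms tick with `max_lag = 7` gives a 2400 ms budget, the figure quoted for the guide's example configuration. This diff does not show the actual example values, so treat those numbers as an assumption. Note also that calling `H5Fvfd_swmr_end_tick()` shortens the current tick, so the real budget can be smaller than this calculation suggests.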