author: Dana Robinson <43805+derobins@users.noreply.github.com> 2022-09-14 12:53:35 (GMT)
committer: GitHub <noreply@github.com> 2022-09-14 12:53:35 (GMT)
commit: fe9c07fd90d7ccf91776672fdc75cc5675d6ed58
tree: 63878e0e5459bde70bf449b92d0d50e5f65d3123 /doc
parent: 05a0411140688722a39855dcfa902c64bf6215a9
Adds file locking documentation (#2084)
* Added initial (partial) file locking document
* Almost done with file locking document
* Fix intro
* Cleaned up text
* Updated environment variable version info
* Fix typo
* Fix typos
Diffstat (limited to 'doc')
-rw-r--r-- doc/file-locking.md 366
1 files changed, 366 insertions, 0 deletions
diff --git a/doc/file-locking.md b/doc/file-locking.md
new file mode 100644
index 0000000..4f7fb39
--- /dev/null
+++ b/doc/file-locking.md
@@ -0,0 +1,366 @@
+# File Locking in HDF5
+
+This document describes the file locking scheme that was added to HDF5 in
+version 1.10.0 and how you can work around it, if you choose to do so. I'll
+try to keep it understandable for everyone, though diving into technical
+details is unavoidable, given the complexity of the material. We're in the
+process of converting the HDF5 user guide (UG) to Doxygen and this document
+will eventually be rolled up into those files as we update things.
+
+**Parallel HDF5 Note**
+
+Everything written here is from the perspective of serial HDF5. When we say
+that you can't access a file for write access from more than one process, we
+mean "from more than one independent, serial process". Parallel HDF5 can
+obviously write to a file from more than one process, but that involves
+IPC and multiple processes working together, not independent processes with
+no knowledge of each other, which is what the file locks are for.
+
+
+## Why file locks?
+
+The short answer is: "To prevent you from corrupting your HDF5 files and/or
+crashing your reader processes."
+
+The long answer is more complicated.
+
+An HDF5 file's state exists in two places when it is open for writing:
+
+1. The HDF5 file itself
+2. The HDF5 library's various caches
+
+One of those caches is the metadata cache, which stores things like B-tree
+nodes that we use to locate data in the file. Problems arise when parent
+objects are flushed to storage before child objects. If a reader tries to
+load unflushed children, the object's file offset could point at garbage
+and it will encounter library failures as it tries to access the non-existent
+objects.
+
+Keep in mind that the HDF5 library is not analogous to a database server. The
+HDF5 library is just a simple shared library, like libjpeg. Library state is
+maintained per-library-instance and there is no IPC between HDF5 libraries
+loaded by different processes (exception: collective operations in parallel
+HDF5, but that's not what we're talking about here).
+
+Prior to HDF5 1.10.0, concurrent access to an HDF5 file by multiple processes,
+when one or more processes is a writer, was not supported. There was no
+enforcement mechanism for this. We simply told people not to do it.
+
+In HDF5 1.10.0, we updated the library to allow the single-writer / multiple-readers
+(SWMR - pronounced "swimmer") access pattern. This setup allows one writer and
+multiple readers to access the same file, as long as a certain protocol is
+followed concerning file opening order and setting the right flags. Since
+synchronization might be tricky to pull off and getting it wrong could
+result in corrupt files or crashed readers, we decided to add
+a file locking scheme to help users get it right. Since this would also help
+prevent harmful accesses when SWMR is not in use, we decided to switch the
+file locking scheme on by default. This scheme has been carried forward into
+HDF5 1.12 and 1.13 (soon to be 1.14).
+
+Note that the current implementation of SWMR is only useful for appending to chunked
+datasets. Creating file objects like groups and datasets is not supported
+in the current SWMR implementation.
+
+Unfortunately, this file locking scheme has caused problems for some users.
+These are usually people working on network file systems like NFS or
+on parallel file systems, especially when file locks have been disabled on
+the file system, which often causes lock calls to fail. As a result, we've
+added work-arounds to disable the file locking scheme over the years.
+
+## The existing scheme
+
+There are two parts to the file locking scheme. One is the file lock itself.
+The second is a mark we make in the HDF5 file's superblock. The superblock
+mark isn't really that important for understanding the file locking, but since
+it's entwined with the file locking scheme, we'll cover it in the
+algorithm below. The lower-level details of file lock implementations are
+described in the appendix, but the semantics are straightforward: Locks are
+always taken unless disabled, always cover the entire file, and are
+non-blocking. They are also not required for SWMR operations and simply exist
+to help you set up SWMR and prevent dangerous file access.
+
+Here's how it all works:
+
+1. The first thing we do is check if we're using file locks
+
+ - We first check the file locking property in the file access property list
+ (fapl). The default value of this property is set at configure time when
+ the library is built.
+ - Next we check the value of the `HDF5_USE_FILE_LOCKING` environment variable,
+ which was previously parsed at library startup. If this is set,
+ we use the value to override the property list setting.
+
+ The particulars of the ways you can disable file locks are described in a
+ separate section below.
+
+ If we are not using file locking, no further file locking operations will
+ take place.
+
+2. We also check whether to ignore file locks when they are disabled on the file system.
+
+ - The environment variable setting for this is checked at VFD initialization
+ time for all library VFDs.
+ - We check the value in the fapl in the `open` callback. The default value for
+ this property was set at configure time when the library was built.
+
+3. When we open a file, we lock it based on the file access flags:
+
+ - If the `H5F_ACC_RDWR` flag is set, use an exclusive lock
+ - Otherwise use a shared lock
+
+ If we are ignoring disabled file locks (see below), we will silently swallow
+ lock API call failures when locks are not implemented on the file system.
+
+4. If the VFD supports locking and the file is open for writing, we mark the
+ file consistency flags in the file's superblock to indicate this.
+
+ **NOTE!**
+
+ - The VFD has to have a lock callback for this to happen. It doesn't matter if
+ the locking was disabled - the check is simply for the callback.
+ - We mark the superblock in **ANY** write case - both SWMR and non-SWMR.
+ - Only the latest version of the superblock is marked in this way. If you
+ open up a file that wasn't created with the 1.10.0 or later file format,
+ it won't get the superblock mark, even if it's been opened for writing.
+
+ According to the file format document and H5Fpkg.h:
+
+ - Bit 0 is set if the file is open for writing (`H5F_SUPER_WRITE_ACCESS`)
+ - Bit 2 is set if the file is open for SWMR writing (`H5F_SUPER_SWMR_WRITE_ACCESS`)
+
+ We check these superblock flags on file open and error out if they are
+ unsuitable.
+
+ - If the file is already opened for non-SWMR writing, no other process can open
+ it.
+ - If the file is open for SWMR writing, only SWMR readers can open the file.
+ - If you try to open a file for reading with `H5F_ACC_SWMR_READ` set and the
+ file does not have the SWMR writer bits set in the superblock, the open
+ call will fail.
+
+ This scheme is often confused with the file locking, so it's included here,
+ even though it's a bit tangential to the locks themselves.
+
+5. If the file is open for SWMR writing (`H5F_ACC_SWMR_WRITE` is set), we
+ remove the file lock just before the open call completes.
+
+6. We normally don't explicitly unlock the file on file close. We let the OS
+ handle it when the file descriptors are closed since file locks don't
+ normally survive closing the underlying file descriptor.
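Steps 3, 5, and 6 can be sketched with plain `flock(2)`. This is an illustration under stated assumptions, not the library's code: `open_and_lock()`, `release_lock()`, and the `read_write` parameter are hypothetical stand-ins for the VFD open path and the `H5F_ACC_RDWR` flag.

```c
#define _DEFAULT_SOURCE /* exposes flock() under strict -std= modes on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and take a whole-file, non-blocking lock.
 * read_write == true stands in for H5F_ACC_RDWR (exclusive lock);
 * otherwise a shared lock is taken.
 * Returns the locked descriptor, or -1 if the open or the lock failed. */
static int
open_and_lock(const char *path, bool read_write)
{
    int fd = open(path, read_write ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return -1;

    int op = (read_write ? LOCK_EX : LOCK_SH) | LOCK_NB;
    if (flock(fd, op) < 0) {
        /* Typically EWOULDBLOCK: another opener holds a conflicting lock */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper for step 5: drop the lock but keep the file open,
 * as happens at the end of a SWMR-write open. */
static int
release_lock(int fd)
{
    return flock(fd, LOCK_UN);
}
```

Step 6 needs no code of its own in this sketch: calling `close()` on the descriptor returned by `open_and_lock()` releases the lock.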
+
+**TL;DR**
+
+When locks are available, HDF5 files will be locked while they are in use:
+exclusively when opened for writing, shared when opened read-only. The
+exception is files that are opened for SWMR writing, which are unlocked.
+Files that are open for any kind of writing get a "writing" superblock mark
+that HDF5 1.10.0+ will respect and refuse to open outside of SWMR.
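The superblock rules from step 4 can be written as a small decision function. This is a sketch of the rules as stated in this document, not the library's actual check: the function name, its parameters, and the `SB_`-prefixed macros are hypothetical local names, and only the bit positions (bit 0 and bit 2) come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as given in the text. The SB_ prefix marks these as local
 * copies for illustration, not the library's own macros. */
#define SB_WRITE_ACCESS      (1u << 0)
#define SB_SWMR_WRITE_ACCESS (1u << 2)

/* Hypothetical check: may a new opener proceed, given the consistency flags
 * currently stored in the superblock? */
static bool
superblock_allows_open(uint8_t flags, bool want_write, bool want_swmr_read)
{
    if (flags & SB_SWMR_WRITE_ACCESS)
        /* A SWMR writer is active: only SWMR readers may join */
        return want_swmr_read && !want_write;

    if (flags & SB_WRITE_ACCESS)
        /* A non-SWMR writer is active: nobody else may open the file */
        return false;

    /* No writer mark: ordinary opens succeed, SWMR-read opens fail */
    return !want_swmr_read;
}
```

Keep in mind that, per the notes in step 4, this mark only exists in the latest superblock version, so a check like this says nothing about files in older formats.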
+
+## `H5Fstart_swmr_write()`
+
+This API call can be used to switch an open file to "SWMR writing" mode as
+if it had been opened with the `H5F_ACC_SWMR_WRITE` flag set. This is used
+when code needs to perform SWMR-forbidden operations like creating groups
+and datasets before appending data to datasets using SWMR.
+
+Most of the work of this API call involves flushing out the library caches
+in preparation for SWMR access, but there are a few locking operations that
+take place under the hood:
+
+- The file's superblock is marked as in the SWMR writer case, above.
+- For a brief period of time in the call, we convert the exclusive lock to
+ a shared lock. It's unclear why this was done and we'll look into removing
+ this.
+- At the end of the call, the lock is removed, as in the SWMR write open
+ case described above.
+
+## Disabling the locks
+
+There are several ways to disable the locks, depending on which version of the
+HDF5 library you are working with. This section will describe the file lock
+disable schemes as they exist in late 2022. The current library versions at
+this time are 1.10.9, 1.12.2, and 1.13.2. File locks are not present in HDF5
+1.8. The lock feature matrix later in this document will describe the
+limitations of earlier versions.
+
+### Configure option
+
+You can set the file locking defaults at configure time. This sets the defaults
+for the associated properties in the fapl. Users can override the configure
+defaults using `H5Pset_file_locking()` or the `HDF5_USE_FILE_LOCKING`
+environment variable.
+
+- Autotools
+
+ `--enable-file-locking=(yes|no|best-effort)` sets the file locking behavior.
+ `yes` and `no` should be self-explanatory. `best-effort` turns file locking
+ on but ignores file locks when they are disabled (default: `best-effort`).
+
+- CMake
+
+ - set `IGNORE_DISABLED_FILE_LOCK` to `ON` to ignore file locks when they
+ are disabled on the file system (default: `ON`).
+ - set `HDF5_USE_FILE_LOCKING` to `OFF` to disable file locks (default: `ON`)
+
+### `H5Pset_file_locking()`
+
+This API call can be used to override the configure defaults. It takes two
+`hbool_t` parameters: one that turns file locking on or off and one that sets
+the "ignore file locks when disabled on the file system" behavior. The values
+set here can be overridden by the file locking environment variable.
+
+There is a corresponding `H5Pget_file_locking()` call that can be used to check
+the currently set values of both properties in the fapl. **NOTE** that this
+call just checks the property list values. It does **NOT** check the
+environment variables!
+
+### Environment variables
+
+The `HDF5_USE_FILE_LOCKING` environment variable overrides all other file
+locking settings.
+
+HDF5 1.10.0:
+- No file locking environment variable
+
+HDF5 1.10.1 - 1.10.6, 1.12.0:
+- `FALSE` turns file locking off
+- Anything else turns file locking on
+- Neither of these values ignores disabled file locks
+- Environment variable parsed at file create/open time
+
+HDF5 1.10.7+, 1.12.1+, 1.13.x:
+- `FALSE` or `0` disables file locking
+- `TRUE` or `1` enables file locking
+- `BEST_EFFORT` enables file locking and ignores disabled file locks
+- Anything else gives you the defaults
+- Environment variable parsed at library startup
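The 1.10.7+ rules above amount to a small lookup. The sketch below is one way to express them, not the library's parser: `lock_env_setting` and `parse_lock_env()` are hypothetical names, and exact, case-sensitive matching is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: what the environment variable asks for */
typedef struct {
    bool overrides;       /* false: unset or unrecognized, keep the defaults */
    bool use_locks;       /* meaningful only when overrides is true */
    bool ignore_disabled; /* meaningful only when overrides is true */
} lock_env_setting;

/* Interpret a HDF5_USE_FILE_LOCKING value using the 1.10.7+ rules.
 * Assumption: exact, case-sensitive matching. */
static lock_env_setting
parse_lock_env(const char *value)
{
    lock_env_setting s = {false, false, false};

    if (value == NULL)
        return s;

    if (strcmp(value, "FALSE") == 0 || strcmp(value, "0") == 0) {
        s.overrides = true; /* locking off */
    }
    else if (strcmp(value, "TRUE") == 0 || strcmp(value, "1") == 0) {
        s.overrides = true;
        s.use_locks = true;
    }
    else if (strcmp(value, "BEST_EFFORT") == 0) {
        s.overrides       = true;
        s.use_locks       = true;
        s.ignore_disabled = true;
    }
    /* Anything else: leave overrides false so the defaults apply */
    return s;
}
```

A caller would typically pass in the result of `getenv("HDF5_USE_FILE_LOCKING")` (from `<stdlib.h>`), which is `NULL` when the variable is unset.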
+
+### Lock disable scheme interactions
+
+As mentioned above and reiterated here:
+- Configure-time settings set fapl defaults
+- `H5Pset_file_locking()` overrides configure-time defaults
+- The environment variable setting overrides all
+
+If you want to check that file locking is on, you'll need to check the fapl
+setting AND check the environment variable, which can override the fapl.
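The precedence order can be modeled as "last setter wins". The types and the function below are hypothetical and exist only to make the ordering concrete; they are not part of the HDF5 API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of file locking properties */
typedef struct {
    bool use_locks;
    bool ignore_disabled;
} lock_setting;

/* Hypothetical "maybe set" wrapper for the fapl and environment layers */
typedef struct {
    bool         is_set;
    lock_setting value;
} optional_lock_setting;

/* Resolve the effective setting: configure-time default first, then the
 * fapl value if the application set one, then the environment variable
 * if it holds a recognized value. */
static lock_setting
effective_lock_setting(lock_setting configure_default, optional_lock_setting fapl,
                       optional_lock_setting env)
{
    lock_setting result = configure_default;

    if (fapl.is_set)
        result = fapl.value;
    if (env.is_set)
        result = env.value;

    return result;
}
```

This also shows why reading the fapl alone is not enough: an environment override is applied after the fapl value and replaces it.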
+
+**!!! WARNING !!!**
+
+Disabling the file locks is at your own risk. If more than one writer process
+modifies an HDF5 file at the same time, the file could be corrupted. If a
+reader process reads a file that is being modified by a writer, the reader
+process might attempt to read garbage and encounter errors or even crash.
+
+In either of these cases:
+
+- A single process accessing a file with write access
+- Any number of processes accessing a file read-only
+
+you can safely disable the file locking scheme.
+
+If you are trying to set up SWMR without the benefit of the file locks, you'll
+just need to be extra careful that you hold to the rules for SWMR access.
+
+## Feature Matrix
+
+The following table indicates which versions of the library support which file
+lock features. 1.13.0 and 1.13.1 are experimental releases (basically glorified
+release candidates) so they are not included here.
+
+**Locks**
+
+- P = POSIX locks only, Windows was a no-op that always succeeded
+- WP = POSIX and Windows locks
+- (-) = POSIX no-op lock fails
+- (+) = POSIX no-op lock passes
+
+**Configure Option and Environment Variable**
+
+- on/off = sets file locks on/off
+- try = can also set "best effort", where locks are on but ignored if disabled
+
+|Version|Has locks|Configure option|`H5Pset_file_locking()`|`HDF5_USE_FILE_LOCKING`|
+|-------|---------|----------------|-----------------------|-----------------------|
+|1.8.x|No|-|-|-|
+|1.10.0|P(-)|-|-|-|
+|1.10.1|P(-)|-|-|on/off|
+|1.10.2|P(-)|-|-|on/off|
+|1.10.3|P(-)|-|-|on/off|
+|1.10.4|P(-)|-|-|on/off|
+|1.10.5|P(-)|-|-|on/off|
+|1.10.6|P(-)|-|-|on/off|
+|1.10.7|P(+)|try|Y|try|
+|1.10.8|WP(+)|try|Y|try|
+|1.10.9|WP(+)|try|Y|try|
+|1.12.0|P(-)|-|-|on/off|
+|1.12.1|WP(+)|try|Y|try|
+|1.12.2|WP(+)|try|Y|try|
+|1.13.2|WP(+)|try|Y|try|
+
+
+## Appendix: File lock implementation
+
+The file lock system is implemented with `flock(2)` as the archetype since it
+has simple semantics and we don't need range locking. Locks are advisory on many
+systems, but this shouldn't be a problem for most users since the HDF5 library
+always respects them. If you have a program that parses or modifies HDF5 files
+independently of the HDF5 library, you'll want to be mindful of any potential
+for concurrent access across processes.
+
+On Unix systems, we call `flock()` directly when it's available and pass
+`LOCK_SH` (shared lock), `LOCK_EX` (exclusive lock), and `LOCK_UN` (unlock) as
+described in the algorithm section. All locks are non-blocking, so we set the
+`LOCK_NB` flag. Sadly, `flock(2)` is not POSIX and it doesn't lock files over
+NFS. We didn't consider a lack of NFS support a problem since SWMR isn't
+supported on networked file systems like NFS (write order preservation isn't
+guaranteed) and `flock(2)` usually doesn't fail when you attempt to lock NFS
+files.
+
+On Unix systems without `flock(2)`, we implement a scheme based on `fcntl(2)`
+(`Pflock()` in `H5system.c`). On these systems we use `F_SETLK` (non-blocking)
+as the operation and set `l_type` in `struct flock` to be:
+
+- `F_UNLCK` for `LOCK_UN`
+- `F_WRLCK` for `LOCK_EX`
+- `F_RDLCK` for `LOCK_SH`
+
+We set the range to be the entire file. Most Unix-like systems have `flock()`
+these days, so this system probably isn't very well tested.
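A `flock()`-style wrapper over `fcntl(2)` along these lines might look like the sketch below. It is not the code in `H5system.c`: `fcntl_flock()` is a hypothetical name, and the `LOCK_*` constants are taken from `<sys/file.h>` for convenience, which a system without `flock()` might not provide.

```c
#define _DEFAULT_SOURCE /* exposes the LOCK_* constants under strict -std= modes on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h> /* LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB */
#include <unistd.h>

/* Hypothetical flock()-style wrapper built on fcntl(2) record locks.
 * Returns 0 on success, -1 with errno set on failure. */
static int
fcntl_flock(int fd, int operation)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));

    if (operation & LOCK_UN)
        fl.l_type = F_UNLCK;
    else if (operation & LOCK_SH)
        fl.l_type = F_RDLCK;
    else if (operation & LOCK_EX)
        fl.l_type = F_WRLCK;
    else {
        errno = EINVAL;
        return -1;
    }

    /* Start at offset 0; a length of 0 means "through end of file",
     * so the lock covers the entire file. */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;

    /* F_SETLK never blocks, so LOCK_NB is implied */
    return fcntl(fd, F_SETLK, &fl) < 0 ? -1 : 0;
}
```

Note that `fcntl(2)` record locks belong to the process rather than to the open file, so two descriptors in the same process never conflict with each other. This is one of the semantic differences from `flock(2)`.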
+
+We don't use `fcntl`-based open file description locks or mandatory locking
+anywhere. The former scheme is non-POSIX and the latter is deprecated.
+
+On Windows, we use `LockFileEx()` and `UnlockFileEx()` to lock the entire file
+(`Wflock()` in `H5system.c`). We set `LOCKFILE_FAIL_IMMEDIATELY` to get
+non-blocking locks and set `LOCKFILE_EXCLUSIVE_LOCK` when we want an exclusive
+lock. SWMR isn't well-tested on Windows, so this scheme hasn't been as
+thoroughly vetted as the `flock`-based scheme.
+
+On non-Windows systems where neither `flock(2)` nor `fcntl(2)` is available,
+we substitute a no-op stub that always succeeds (`Nflock()` in `H5system.c`).
+In the past, the stub always failed (see the matrix for when we made the switch).
+We currently know of no non-Windows systems where neither call is available
+so this scheme is not well-tested.
+
+One thing that should be immediately apparent to anyone familiar with file
+locking is that all of these schemes have subtly different semantics. We're
+using file locking in a fairly crude manner, though, and lock use has always
+been optional, so we consider this a lower-order concern.
+
+Locks are implemented at the VFD level via `lock` and `unlock` callbacks. The
+VFDs that implement file locks are: core (w/ backing store), direct, log, sec2,
+and stdio (`flock(2)` locks only). The family, multi, and splitter VFDs invoke
+the lock callback of their underlying sub-files. The onion and MPI-IO VFDs do NOT
+use locks, even though they create normal, on-disk native HDF5 files. The
+read-only S3 and HDFS VFDs do not use file locking since they use
+alternative storage schemes.
+
+Disabled locks are detected by checking whether a failed lock call set `errno`
+to `ENOSYS`. This is not particularly sophisticated and was implemented as a
+way of working around disabled locks on popular parallel file systems.
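The "ignore disabled locks" behavior can be sketched as a thin wrapper around whichever lock call is in use. Everything named here (`lock_fn`, `lock_best_effort()`, and the three stub functions) is hypothetical; the stubs exist only so the behavior can be exercised without a file system that lacks lock support.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical signature shared by flock()-style lock calls */
typedef int (*lock_fn)(int fd, int operation);

/* Try to lock. If the call fails with ENOSYS (locks not implemented on the
 * file system) and the caller asked to ignore disabled locks, report success.
 * Any other failure, such as a conflicting lock, is still an error. */
static int
lock_best_effort(lock_fn do_lock, int fd, int operation, bool ignore_disabled)
{
    if (do_lock(fd, operation) == 0)
        return 0;

    if (ignore_disabled && errno == ENOSYS) {
        errno = 0;
        return 0;
    }
    return -1;
}

/* Demonstration stubs that simulate the three interesting outcomes */
static int
lock_ok(int fd, int operation)
{
    (void)fd;
    (void)operation;
    return 0;
}

static int
lock_unsupported(int fd, int operation)
{
    (void)fd;
    (void)operation;
    errno = ENOSYS;
    return -1;
}

static int
lock_contended(int fd, int operation)
{
    (void)fd;
    (void)operation;
    errno = EWOULDBLOCK;
    return -1;
}
```

In real use, `flock` itself (or an `fcntl`-based equivalent) would be passed as `do_lock`.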
+
+One other thing to note here is that, in all of the locking schemes we use, the
+file locks do not survive process termination, so you don't have to worry
+about files being locked forever if a process exits abnormally. If a writer
+crashes and the library doesn't clear the superblock mark, you can remove it
+with the `h5clear` command-line tool, which is built with the library.