diff options
Diffstat (limited to 'doc')
-rw-r--r-- | doc/file-locking.md | 366 |
1 files changed, 366 insertions, 0 deletions
diff --git a/doc/file-locking.md b/doc/file-locking.md new file mode 100644 index 0000000..4f7fb39 --- /dev/null +++ b/doc/file-locking.md @@ -0,0 +1,366 @@ +# File Locking in HDF5 + +This document describes the file locking scheme that was added to HDF5 in +version 1.10.0 and how you can work around it, if you choose to do so. I'll +try to keep it understandable for everyone, though diving into technical +details is unavoidable, given the complexity of the material. We're in the +process of converting the HDF5 user guide (UG) to Doxygen and this document +will eventually be rolled up into those files as we update things. + +**Parallel HDF5 Note** + +Everything written here is from the perspective of serial HDF5. When we say +that you can't access a file for write access from more than one process, we +mean "from more than one independent, serial process". Parallel HDF5 can +obviously write to a file from more than one process, but that involves +IPC and multiple processes working together, not independent processes with +no knowledge of each other, which is what the file locks are for. + + +## Why file locks? + +The short answer is: "To prevent you from corrupting your HDF5 files and/or +crashing your reader processes." + +The long answer is more complicated. + +An HDF5 file's state exists in two places when it is open for writing: + +1. The HDF5 file itself +2. The HDF5 library's various caches + +One of those caches is the metadata cache, which stores things like B-tree +nodes that we use to locate data in the file. Problems arise when parent +objects are flushed to storage before child objects. If a reader tries to +load unflushed children, the object's file offset could point at garbage +and it will encounter library failures as it tries to access the non-existent +objects. + +Keep in mind that the HDF5 library is not analogous to a database server. The +HDF5 library is just a simple shared library, like libjpeg. Library state is +maintained per-library-instance and there is no IPC between HDF5 libraries +loaded by different processes (exception: collective operations in parallel +HDF5, but that's not what were talking about here). + +Prior to HDF5 1.10.0, concurrent access to an HDF5 file by multiple processes, +when one or more processes is a writer, was not supported. There was no +enforcement mechanism for this. We simply told people not to do it. + +In HDF5 1.10.0, we updated the library to allow the single-writer / multiple-readers +(SWMR - pronounced "swimmer") access pattern. This setup allows one writer and +multiple readers to access the same file, as long as a certain protocol is +followed concerning file opening order and setting the right flags. Since +synchronization might be tricky to pull off and the consequences of getting +it wrong could result in corrupt files or crashed readers, we decided to add +a file locking scheme to help users get it right. Since this would also help +prevent harmful accesses when SWMR is not in use, we decided to switch the +file locking scheme on by default. This scheme has been carried forward into +HDF5 1.12 and 1.13 (soon to be 1.14). + +Note that the current implementation of SWMR is only useful for appending to chunked +datasets. Creating file objects like groups and datasets is not supported +in the current SWMR implementation. + +Unfortunately, this file locking scheme has caused problems for some users. +This is usually people who are working on network file systems like NFS or +on parallel file systems, especially when file locks have been disabled, which +often causes lock calls to fail. As a result of this, we've added work-arounds +to disable the file locking scheme over the years. + +## The existing scheme + +There are two parts to the file locking scheme. One is the file lock itself. +The second is a mark we make in the HDF5 file's superblock. The superblock +mark isn't really that important for understanding the file locking, but since +it's entwined with the file locking scheme, we'll cover it in the +algorithm below. The lower-level details of file lock implementations are +described in the appendix, but the semantics are straightforward: Locks are +mandatory unless disabled, always for the entire file, and non-blocking. They +are also not required for SWMR operations and simply exist to help you set up +SWMR and prevent dangerous file access. + +Here's how it all works: + +1. The first thing we do is check if we're using file locks + + - We first check the file locking property in the file access property list + (fapl). The default value of this property is set at configure time when + the library is built. + - Next we check the value of the `HDF5_USE_FILE_LOCKING` environment variable, + which was previously parsed at library startup. If this is set, + we use the value to override the property list setting. + + The particulars of the ways you can disable file locks are described in a + separate section below. + + If we are not using file locking, no further file locking operations will + take place. + +2. We also check for ignoring file locks when they are disabled on the file system. + + - The environment variable setting for this is checked at VFD initialization + time for all library VFDs. + - We check the value in the fapl in the `open` callback. The default value for + this property was set at configure time when the library was built. + +3. When we open a file, we lock it based on the file access flags: + + - If the `H5F_ACC_RDWR` flag is set, use an exclusive lock + - Otherwise use a shared lock + + If we are ignoring disabled file locks (see below), we will silently swallow + lock API call failure when locks are not implemented on the file system. + +4. If the VFD supports locking and the file is open for writing, we mark the + file consistency flags in the file's superblock to indicate this. + + **NOTE!** + + - The VFD has to have a lock callback for this to happen. It doesn't matter if + the locking was disabled - the check is simply for the callback. + - We mark the superblock in **ANY** write case - both SWMR and non-SWMR. + - Only the latest version of the superblock is marked in this way. If you + open up a file that wasn't created with the 1.10.0 or later file format, + it won't get the superblock mark, even if it's been opened for writing. + + According to the file format document and H5Fpkg.h: + + - Bit 0 is set if the file is open for writing (`H5F_SUPER_WRITE_ACCESS`) + - Bit 2 is set if the file is open for SWMR writing (`H5F_SUPER_SWMR_WRITE_ACCESS`) + + We check these superblock flags on file open and error out if they are + unsuitable. + + - If the file is already opened for non-SWMR writing, no other process can open + it. + - If the file is open for SWMR writing, only SWMR readers can open the file. + - If you try to open a file for reading with `H5F_ACC_SWMR_READ` set and the + file does not have the SWMR writer bits set in the superblock, the open + call will fail. + + This scheme is often confused with the file locking, so it's included here, + even though it's a bit tangential to the locks themselves. + +5. If the file is open for SWMR writing (`H5F_ACC_SWMR_WRITE` is set), we + remove the file lock just before the open call completes. + +6. We normally don't explicitly unlock the file on file close. We let the OS + handle it when the file descriptors are closed since file locks don't + normally surivive closing the underlying file descriptor. + +**TL;DR** + +When locks are available, HDF5 files will be exclusively locked while they are +in use. The exception to this are files that are opened for SWMR writing, which +are unlocked. Files that are open for any kind of writing get a "writing" +superblock mark that HDF5 1.10.0+ will respect and refuse to open outside of SWMR. + +## `H5Fstart_swmr_write()` + +This API call can be used to switch an open file to "SWMR writing" mode as +if it had been opened with the `H5F_ACC_SWMR_WRITE` flag set. This is used +when code needs to perform SWMR-forbidden operations like creating groups +and datasets before appending data to datasets using SWMR. + +Most of the work of this API call involves flushing out the library caches +in preparation for SWMR access, but there are a few locking operations that +take place under the hood: + +- The file's superblock is marked as in the SWMR writer case, above. +- For a brief period of time in the call, we convert the exclusive lock to + a shared lock. It's unclear why this was done and we'll look into removing + this. +- At the end of the call, the lock is removed, as in the SWMR write open + case described above. + +## Disabling the locks + +There are several ways to disable the locks, depending on which version of the +HDF5 library you are working with. This section will describe the file lock +disable schemes as they exist in late 2022. The current library versions at +this time are 1.10.9, 1.12.3, and 1.13.2. File locks are not present in HDF5 +1.8. The lock feature matrix later in this document will describe the +limitations of earlier versions. + +### Configure option + +You can set the file locking defaults at configure time. This sets the defaults +for the associated properties in the fapl. Users can override the configure +defaults using `H5Pset_file_locking()` or the `HDF5_USE_FILE_LOCKING` +environment variable. + +- Autotools + + `--enable-file-locking=(yes|no|best-effort)` sets the file locking behavior. + `on` and `off` should be self-explanatory. `best-effort` turns file locking + on but ignores file locks when they are disabled (default: `best-effort`). + +- CMake + + - set `IGNORE_DISABLED_FILE_LOCK` to `ON` to ignore file locks when they + are disabled on the file system (default: `ON`). + - set `HDF5_USE_FILE_LOCKING` to `OFF` to disable file locks (default: `ON`) + +### `H5Pset_file_locking()` + +This API call can be used to override the configure defaults. It takes +`hbool_t` parameters for both the file locking and "ignore file locks when +disabled on the file system" parameters. The values set here can be +overridden by the file locking environment variable. + +There is a corresponding `H5Pget_file_locking()` call that can be used to check +the currently set values of both properties in the fapl. **NOTE** that this +call just checks the property list values. It does **NOT** check the +environment variables! + +### Environment variables + +The `HDF5_USE_FILE_LOCKING` environment variable overrides all other file +locking settings. + +HDF5 1.10.0 +- No file locking environment variable + +HDF5 1.10.1 - 1.10.6, 1.12.0: +- `FALSE` turns file locking off +- Anything else turns file locking on +- Neither of these values ignores disabled file locks +- Environment variable parsed at file create/open time + +HDF5 1.10.7+, 1.12.1+, 1.13.x: +- `FALSE` or `0` disables file locking +- `TRUE` or `1` enables file locking +- `BEST_EFFORT` enables file locking and ignores disabled file locks +- Anything else gives you the defaults +- Environment variable parsed at library startup + +### Lock disable scheme interactions + +As mentioned above and reiterated here: +- Configure-time settings set fapl defaults +- `H5Pset_file_locking()` overrides configure-time defaults +- The environment variable setting overrides all + +If you want to check that file locking is on, you'll need to check the fapl +setting AND check the environment variable, which can override the fapl. + +**!!! WARNING !!!** + +Disabling the file locks is at your own risk. If more than one writer process +modifies an HDF5 file at the same time, the file could be corrupted. If a +reader process reads a file that is being modified by a writer, the writer +process might attempt to read garbage and encounter errors or even crash. + +In the case of: + +- A single process accessing a file with write access +- Any number of processes accessing a file read-only + +You can safely disable the file locking scheme. + +If you are trying to set up SWMR without the benefit of the file locks, you'll +just need to be extra careful that you hold to rules for SWMR access. + +## Feature Matrix + +The following table indicates which versions of the library support which file +lock features. 1.13.0 and 1.13.1 are experimental releases (basically glorified +release candidates) so they are not included here. + +**Locks** + +- P = POSIX locks only, Windows was a no-op that always succeeded +- WP = POSIX and Windows locks +- (-) = POSIX no-op lock fails +- (+) = POSIX no-op lock passes + +**Configure Option and Environment Variable** + +- on/off = sets file locks on/off +- try = can also set "best effort", where locks are on but ignored if disabled + +|Version|Has locks|Configure option|`H5Pset_file_locking()`|`HDF5_USE_FILE_LOCKING`| +|-------|---------|----------------|-----------------------|-----------------------| +|1.8.x|No|-|-|-| +|1.10.0|P(-)|-|-|-| +|1.10.1|P(-)|-|-|on/off| +|1.10.2|P(-)|-|-|on/off| +|1.10.3|P(-)|-|-|on/off| +|1.10.4|P(-)|-|-|on/off| +|1.10.5|P(-)|-|-|on/off| +|1.10.6|P(-)|-|-|on/off| +|1.10.7|P(+)|try|Y|try| +|1.10.8|WP(+)|try|Y|try| +|1.10.9|WP(+)|try|Y|try| +|1.12.0|P(-)|-|-|on/off| +|1.12.1|WP(+)|try|Y|try| +|1.12.2|WP(+)|try|Y|try| +|1.13.2|WP(+)|try|Y|try| + + +## Appendix: File lock implementation + +The file lock system is implemented with `flock(2)` as the archetype since it +has simple semantics and we don't need range locking. Locks are advisory on many +systems, but this shouldn't be a problem for most users since the HDF5 library +always respects them. If you have a program that parses or modifies HDF5 files +independently of the HDF5 library, you'll want to be mindful of any potential +for concurrent access across processes. + +On Unix systems, we call `flock()` directly when it's available and pass +`LOCK_SH` (shared lock), `LOCK_EX` (exclusive lock), and `LOCK_UN` (unlock) as +described in the algorithm section. All locks are non-blocking, so we set the +`LOCK_NB` flag. Sadly, `flock(2)` is not POSIX and it doesn't lock files over +NFS. We didn't consider a lack of NFS support a problem since SWMR isn't +supported on networked file systems like NFS (write order preservation isn't +guaranteed) and `flock(2)` usually doesn't fail when you attempt to lock NFS +files. + +On Unix systems without `flock(2)`, we implement a scheme based on `fcntl(2)` +(`Pflock()` in `H5system.c`). On these systems we use `F_SETLK` (non-blocking) +as the operation and set `l_type` in `struct flock` to be: + +- `F_UNLOCK` for `LOCK_UN` +- `F_WRLOCK` for `LOCK_EX` +- `F_RDLOCK` for `LOCK_SH` + +We set the range to be the entire file. Most Unix-like systems have `flock()` +these days, so this system probably isn't very well tested. + +We don't use `fcntl`-based open file locks or mandatory locking anywhere. The +former scheme is non-POSIX and the latter is deprecated. + +On Windows, we use `LockFileEx()` and `UnlockFileEx()` to lock the entire file +(`Wflock()` in `H5system.c`). We set `LOCKFILE_FAIL_IMMEDIATELY` to get +non-blocking locks and set `LOCKFILE_EXCLUSIVE_LOCK` when we want an exclusive +lock. SWMR isn't well-tested on Windows, so this scheme hasn't been as +thoroughly vetted as the `flock`-based scheme. + +On non-Windows systems where neither `flock(2)` nor `fcntl(2)` is available, +we substitute a no-op stub that always succeeds (`Nflock()` in `H5system.c`). +In the past, the stub always failed (see the matrix for when we made the switch). +We currently know of no non-Windows systems where neither call is available +so this scheme is not well-tested. + +One thing that should be immediately apparent to anyone familiar with file +locking, is that all of these schemes have subtly different semantics. We're +using file locking in a fairly crude manner, though, and lock use has always +been optional, so we consider this a lower-order concern. + +Locks are implemented at the VFD level via `lock` and `unlock` callbacks. The +VFDs that implement file locks are: core (w/ backing store), direct, log, sec2, +and stdio (`flock(2)` locks only). The family, multi, and splitter VFDs invoke +the lock callback of their underlying sub-files. The onion and MPI-IO VFDs do NOT +use locks, even though they create normal, on-disk native HDF5 files. The +read-only S3 VFD and HDFS VFDs do not use file locking since they use +alternative storage schemes. + +Lock failures are detected by checking to see if `errno` is set to `ENOSYS`. +This is not particularly sophisticated and was implemented as a way of working +around disabled locks on popular parallel file systems. + +One other thing to note here is that, in all of the locking schemes we use, the +file locks do not survive process termination, so you don't have to worry +about files being locked forever if a process exits abnormally. If a writer +crashed and the library didn't clear the superblock mark, you can remove it with +the h5clear command-line tool, which is built with the library. |