author: jhendersonHDF <jhenderson@hdfgroup.org>, 2022-08-09 23:05:37 (GMT)
committer: GitHub <noreply@github.com>, 2022-08-09 23:05:37 (GMT)
commit: ef33ac8bac5fd201b41d1a3084f03834f47729a2
tree: ad4756b872abff6d16f11d9a6c6c949e8f359cad /src/H5FDsubfiling/H5FDsubfile_int.c
parent: b84241e57a97309b15846da4cc74611a66d92f6d
Subfiling VFD - tidying up and fixing a few new testing failures (#1977)
* Rename Subfiling IOC "thread_pool_count" field to "thread_pool_size"
* Add simple HDF5 example for Subfiling VFD
* Subfiling VFD - never cache app topology as it may change
* Subfiling VFD - cleanup unused functionality and tidy up some TODOs
* Subfiling VFD - tidy up subfiling error handling in H5subfiling_common.c
* Subfiling VFD - show number of failed I/O requests on close
* Subfiling VFD - Update file cmp callback after switching to MPI I/O VFD
* Amend RELEASE.txt with info about h5fuse.sh and Subfiling limitations
* Subfiling VFD - switch to using H5_basename and H5_dirname
Diffstat (limited to 'src/H5FDsubfiling/H5FDsubfile_int.c')
-rw-r--r-- src/H5FDsubfiling/H5FDsubfile_int.c | 53
1 files changed, 53 insertions, 0 deletions
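The SUBFILING NOTE added by the diff below describes the per-IOC EOF bookkeeping in prose. The following is a minimal sketch of that arithmetic, not the code from H5FDsubfile_int.c. The struct `ioc_eof_state_t` and the functions `candidate_eof`, `record_write` and `virtual_eof` are hypothetical names introduced for illustration. The note does not spell out how `stripe_id` is derived, so the sketch assumes it is the integer quotient of (file_offset + data_size) by sf_stripe_size, with file_offset taken as an offset within the IOC's own sub-file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-IOC state; field names follow the comment in the diff. */
typedef struct {
    int64_t sf_stripe_size;      /* bytes per stripe                        */
    int64_t n_ioc_concentrators; /* total number of IOCs (N)                */
    int64_t subfile_rank;        /* rank of this IOC, 0 to N-1              */
    int64_t current_sf_eof;      /* largest EOF seen so far by this IOC     */
} ioc_eof_state_t;

/* Step 3 of the note: compute a potential new EOF for one write. */
static int64_t
candidate_eof(const ioc_eof_state_t *s, int64_t file_offset, int64_t data_size)
{
    assert(s->sf_stripe_size > 0 && s->n_ioc_concentrators > 0);

    /* Step 1 of the note: values fixed at file creation. */
    int64_t sf_base_addr            = s->subfile_rank * s->sf_stripe_size;
    int64_t sf_blocksize_per_stripe = s->sf_stripe_size * s->n_ioc_concentrators;

    int64_t end_offset = file_offset + data_size;

    /* ASSUMPTION: the note only says end_offset is "used to create" the
     * stripe_id; integer division by the stripe size is a guess. */
    int64_t stripe_id = end_offset / s->sf_stripe_size;

    return (stripe_id * sf_blocksize_per_stripe) + sf_base_addr +
           (end_offset % s->sf_stripe_size);
}

/* Step 4 of the note: keep the larger of the old and new local EOF. */
static void
record_write(ioc_eof_state_t *s, int64_t file_offset, int64_t data_size)
{
    int64_t sf_eof = candidate_eof(s, file_offset, data_size);

    if (sf_eof > s->current_sf_eof)
        s->current_sf_eof = sf_eof;
}

/* The virtual file EOF is the maximum over all IOC-local EOF values. */
static int64_t
virtual_eof(const ioc_eof_state_t *iocs, size_t n_iocs)
{
    int64_t max_eof = 0;

    for (size_t i = 0; i < n_iocs; i++)
        if (iocs[i].current_sf_eof > max_eof)
            max_eof = iocs[i].current_sf_eof;

    return max_eof;
}
```

With two IOCs and a 1024-byte stripe, a 512-byte write at sub-file offset 1024 on IOC 0 gives stripe_id 1 and a candidate EOF of 1 * 2048 + 0 + 512 = 2560 under these assumptions. In the real driver the maximum would be gathered across MPI ranks rather than read from a local array.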
diff --git a/src/H5FDsubfiling/H5FDsubfile_int.c b/src/H5FDsubfiling/H5FDsubfile_int.c
index af14db3..22a5bd0 100644
--- a/src/H5FDsubfiling/H5FDsubfile_int.c
+++ b/src/H5FDsubfiling/H5FDsubfile_int.c
@@ -192,6 +192,59 @@ done:
* invalid data if other ranks perform writes while this
* operation is in progress.
*
+ * SUBFILING NOTE:
+ * The EOF calculation for subfiling is somewhat different
+ * than for the more traditional HDF5 file implementations.
+ * This statement derives from the fact that unlike "normal"
+ * HDF5 files, subfiling introduces a multi-file representation
+ * of a single HDF5 file. The plurality of sub-files represents
+ * a software RAID-0 based HDF5 file. As such, each sub-file
+ * contains a designated portion of the address space of the
+ * virtual HDF5 storage. We have no notion of HDF5 datatypes,
+ * datasets, metadata, or other HDF5 structures; only BYTES.
+ *
+ * The organization of the bytes within sub-files is consistent
+ * with the RAID-0 striping, i.e. there are IO Concentrators
+ * (IOCs) which correspond to a stripe-count (in Lustre) as
+ * well as a stripe_size. The combination of these two
+ * variables determines the "address" (a combination of IOC
+ * and a file offset) of any storage operation.
+ *
+ * Having a defined storage layout, the virtual file EOF
+ * calculation should be the MAXIMUM value returned by the
+ * collection of IOCs. Every MPI rank which hosts an IOC
+ * maintains its own EOF by updating that value for each
+ * WRITE operation that completes, i.e. if a new local EOF
+ * is greater than the existing local EOF, the new EOF
+ * will replace the old. The local EOF calculation is as
+ * follows.
+ * 1. At file creation, each IOC is assigned a rank value
+ * (0 to N-1, where N is the total number of IOCs) and
+ * a 'sf_base_addr' = 'subfile_rank' * 'sf_stripe_size')
+ * we also determine the 'sf_blocksize_per_stripe' which
+ * is simply the 'sf_stripe_size' * 'n_ioc_concentrators'
+ *
+ * 2. For every write operation, the IOC receives a message
+ * containing a file_offset and the data_size.
+ *
+ * 3. The file_offset + data_size are in turn used to
+ * create a stripe_id:
+ *                IOC-(ioc_rank)       IOC-(ioc_rank+1)
+ *                |<- sf_base_address  |<- sf_base_address  |
+ *           ID   +--------------------+--------------------+
+ *            0:  |<- sf_stripe_size ->|<- sf_stripe_size ->|
+ *            1:  |<- sf_stripe_size ->|<- sf_stripe_size ->|
+ *                ~                    ~                    ~
+ *            N:  |<- sf_stripe_size ->|<- sf_stripe_size ->|
+ *                +--------------------+--------------------+
+ *
+ * The new 'stripe_id' is then used to calculate a
+ * potential new EOF:
+ * sf_eof = (stripe_id * sf_blocksize_per_stripe) + sf_base_addr
+ * + ((file_offset + data_size) % sf_stripe_size)
+ *
+ * 4. If (sf_eof > current_sf_eof), then current_sf_eof = sf_eof.
+ *
* Return: SUCCEED/FAIL
*
* Programmer: JRM -- 1/18/22