Modified testpar/t_vfd.c to test the subfiling VFD with the default configuration.
Must update this code to run with a variety of configurations -- most particularly
multiple I/O concentrators, and a stripe depth small enough to exercise the other
I/O concentrators.
testpar/t_vfd.c exposed a large number of race conditions -- symptoms included:
1) Crashes (usually seg faults)
2) Heap corruption
3) Stack corruption
4) Double frees of heap space
5) Hangs
6) Out of order execution of I/O requests / violations of POSIX semantics
7) Swapped write requests
Items 1 - 4 turned out to be primarily caused by file close issues --
specifically, the main I/O concentrator thread and its pool of worker threads
were not being shut down properly on file close. Fixing the shutdown, in
combination with some other minor fixes, seems to have resolved these symptoms.
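For reference, an orderly shutdown of this kind can be sketched as below. This is a minimal sketch using pthreads, not the Mercury thread pool the code currently uses; pool_t, pool_start(), pool_stop(), and the integer "pending" counter standing in for a real work queue are all hypothetical. The point is only the order of operations on close: set the shutdown flag under the lock, wake every worker, let the workers drain the queue, and join them all before tearing down any shared state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_WORKERS 4 /* hypothetical pool size */

/* Hypothetical pool state; "pending" stands in for a real work queue. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            shutdown;
    int             pending;   /* work items not yet picked up */
    int             completed; /* work items finished */
    int             started;   /* worker threads actually created */
    pthread_t       workers[NUM_WORKERS];
} pool_t;

static void *worker_main(void *arg)
{
    pool_t *p = arg;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        /* Sleep until there is work or shutdown has been requested. */
        while (p->pending == 0 && !p->shutdown)
            pthread_cond_wait(&p->wake, &p->lock);
        if (p->pending == 0)
            break; /* shutdown requested and queue drained */
        p->pending--;
        p->completed++; /* the "work" is done under the lock for simplicity */
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static int pool_start(pool_t *p, int initial_work)
{
    assert(initial_work >= 0);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    p->shutdown  = false;
    p->pending   = initial_work;
    p->completed = 0;
    p->started   = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&p->workers[i], NULL, worker_main, p) != 0)
            return -1;
        p->started++;
    }
    return 0;
}

/* Called on file close: nothing shared is destroyed until every worker has exited. */
static void pool_stop(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutdown = true;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);

    for (int i = 0; i < p->started; i++)
        pthread_join(p->workers[i], NULL);

    pthread_cond_destroy(&p->wake);
    pthread_mutex_destroy(&p->lock);
}
```

Freeing queue or file state before the joins complete is the kind of mistake that would produce the crashes and double frees listed above.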
Items 5 & 6 appear to have been caused by issuing I/O requests to the
thread pool in an order that did not maintain POSIX semantics. A rewrite of
the I/O request dispatch code appears to have solved these issues.
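As a rough illustration of the ordering constraint (not the actual dispatch code), the sketch below holds a queued request back while any earlier request still in the queue overlaps its byte range and at least one of the two is a write. The io_req_t type and the can_dispatch() name are hypothetical, and the sketch assumes that completed requests are removed from the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request record; not the actual I/O concentrator queue entry. */
typedef enum { REQ_READ, REQ_WRITE } req_kind_t;

typedef struct {
    req_kind_t kind;
    int64_t    offset;      /* first byte touched */
    int64_t    length;      /* number of bytes, > 0 */
    bool       in_progress; /* already handed to a worker thread */
} io_req_t;

/* Half-open ranges [offset, offset + length) intersect. */
static bool ranges_overlap(const io_req_t *a, const io_req_t *b)
{
    return a->offset < b->offset + b->length &&
           b->offset < a->offset + a->length;
}

/*
 * queue[0..n-1] is in arrival order. Entry idx may be dispatched only if
 * it is not already in progress and no earlier entry conflicts with it.
 * Two reads never conflict; any overlap involving a write does.
 */
static bool can_dispatch(const io_req_t *queue, size_t n, size_t idx)
{
    assert(queue != NULL);
    if (idx >= n || queue[idx].in_progress)
        return false;
    for (size_t i = 0; i < idx; i++) {
        bool both_reads = queue[i].kind == REQ_READ &&
                          queue[idx].kind == REQ_READ;
        if (!both_reads && ranges_overlap(&queue[i], &queue[idx]))
            return false;
    }
    return true;
}
```

Non-overlapping requests and overlapping reads can still run concurrently under this rule, so the thread pool is not serialized more than the ordering requirement demands.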
Item 7 seems to have been caused by multiple write requests from a given
rank being read by the wrong worker thread. Code to issue "unique" tags for
each write request via the ACK message appears to have cleaned this up.
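The tag idea can be sketched as follows. This is a minimal sketch, not the code in this commit: tag_source_t, next_write_tag(), and the TAG_BASE value are hypothetical. It assumes that only the main I/O concentrator thread hands out tags (so no locking is shown) and that the tag is returned to the rank in the ACK, so the rank sends its data with that tag and the worker thread receives on the same tag. Tags repeat after wraparound, which is why "unique" is in quotes: they are distinct only while fewer than TAG_LIMIT - TAG_BASE + 1 writes are outstanding.

```c
#include <assert.h>

#define TAG_BASE  1000  /* hypothetical: lower tags reserved for control messages */
#define TAG_LIMIT 32767 /* MPI guarantees MPI_TAG_UB is at least 32767 */

/* Hypothetical per-concentrator tag counter. */
typedef struct {
    int next;
} tag_source_t;

static void tag_source_init(tag_source_t *ts)
{
    ts->next = TAG_BASE;
}

/* Return the tag for the next write request and advance, wrapping at TAG_LIMIT. */
static int next_write_tag(tag_source_t *ts)
{
    assert(ts->next >= TAG_BASE && ts->next <= TAG_LIMIT);
    int tag  = ts->next;
    ts->next = (tag == TAG_LIMIT) ? TAG_BASE : tag + 1;
    return tag;
}
```

With a distinct tag per request, two back-to-back writes from the same rank can no longer be matched by whichever worker happens to post its receive first.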
Note that the code is still in poor condition. A partial list of known
defects includes:
a) Race condition on file close that allows superblock writes to arrive
at the I/O concentrator after it has been shut down. This defect is
most evident when testpar/t_subfiling_vfd is run with 8 ranks.
b) No error reporting from I/O concentrators -- must design and implement
this. For now, mostly just asserts, which suggests that it should be
run in debug mode.
c) Much commented-out and/or unused code.
d) Poor code organization.
e) Build system with bits of Mercury is awkward -- think of shifting
to pthreads with our own thread pool code.
f) Need to add native support for vector and selection I/O to the subfiling
VFD.
g) Need to review, and possibly rework, configuration code.
h) Need to store subfile configuration data in a superblock extension message,
and add code to use this data on file open.
i) Test code is inadequate -- expect more issues as it is extended.
In particular, there is no unit test code for the I/O request dispatch code.
While I think it is correct at present, we need test code to verify this.
Similarly, we need to test with multiple I/O concentrators and much smaller
stripe depth.
My actual code changes were limited to:
src/H5FDioc.c
src/H5FDioc_threads.c
src/H5FDsubfile_int.c
src/H5FDsubfile_mpi.c
src/H5FDsubfiling.c
src/H5FDsubfiling.h
src/H5FDsubfiling_priv.h
testpar/t_subfiling_vfd.c
testpar/t_vfd.c
I'm not sure what is going on with the deletions in src/mercury/src/util.
Tested parallel/debug on Charis and Jelly