From a2f7a64aa0351970054ff1334e1d82efe365a8ac Mon Sep 17 00:00:00 2001
From: Albert Cheng
Date: Tue, 1 Mar 2005 21:30:43 -0500
Subject: [svn-r10116] Purpose: Bug fix.

Description:
"testphdf5 -p" would fail with data verification errors. The reason was
that the MPIPOSIX driver's file open and close routines, especially the
close routine, provided no coordination between processes.

The testphdf5 tests reuse the same data file for every test, opening it
with H5Fcreate and the H5F_ACC_TRUNC flag. The test routines contain no
code to ensure that all processes have finished one test before any
process moves on to the next. A "faster" process could finish verifying
its portion of the data and move on to the next test, which opens the
same file with truncation and so truncates the previous data file.
Meanwhile, "slower" processes were still verifying the previous data
file, which was suddenly truncated out from under them by the faster
process.

Solution:
Technically, the test program should be fixed to ensure that all
processes have finished one test before any is allowed to move on to the
next. On the other hand, the MPIO VFD has no problem with this test
because MPI-IO requires file open and close to be called collectively,
and the MPI-IO layer coordinates the processes so that each call returns
properly.

I chose to fix the MPIPOSIX close routine to provide coordination
between processes by requiring all processes to complete the close of a
file before the call returns to user space. This makes the MPIPOSIX
close routine behave more like the MPIO close routine, thus providing
more protection for user applications that fail to code in the
coordination themselves.

However, having the barrier in the MPIPOSIX close routine penalizes
applications in which it is "okay" for some processes to close their
file handles and race ahead to other work: those processes will not
access the file again, so whether other processes are still using it is
immaterial.
Maybe this protective coordination should be optional, so that confident
users who do not need this sort of protection can turn it off.

Platforms tested:
"h5committested" and tested on modi4 and tesla.

Misc. update:
---
 src/H5FDmpiposix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/H5FDmpiposix.c b/src/H5FDmpiposix.c
index de5c444..294b13e 100644
--- a/src/H5FDmpiposix.c
+++ b/src/H5FDmpiposix.c
@@ -943,6 +943,8 @@ H5FD_mpiposix_close(H5FD_t *_file)
     if (HDclose(file->fd)<0)
         HGOTO_ERROR(H5E_IO, H5E_CANTCLOSEFILE, FAIL, "unable to close file")
 
+    /* make sure all processes have closed the file before returning. */
+    MPI_Barrier(file->comm);
     /* Clean up other stuff */
     MPI_Comm_free(&file->comm);
     H5MM_xfree(file);
-- 
cgit v0.12