From 99ef5765f5e25402d97f1355c0e049dc2b746228 Mon Sep 17 00:00:00 2001
From: Jonathan Kim
Date: Wed, 1 Aug 2012 12:07:46 -0500
Subject: [svn-r22618] Purpose:
    HDFFV-8003 - ph5diff (parallel h5diff): daily test fails intermittently
    on ember during non-comparable test file comparison
    HDFFV-7755 - parallel h5diff: hangs intermittently on koala during
    non-comparable test file comparison

Description:
    The non-comparable test hung intermittently on koala and ember, but
    not on jam. The hang did not occur until -np reached 4 or more, and
    then only once in many repeated runs of the same test.
    Incorrectly (mistakenly?) duplicated code in the MPI section caused
    the hang under certain conditions. The test used more processes than
    other tests, which increased the chance of leaving more unfinished
    processes; such a process could enter the incorrect code section and
    wait on the wrong send/receive pairing. That explains why the hang
    occurred intermittently, depending on machine condition and on the
    use of a certain feature.
    Removed the incorrect code, which blocked the correct code.

Tested:
    Manually repeated tests on jam (linux32-LE), koala (linux64-LE),
    ostrich (linuxppc64-BE)
---
 release_docs/RELEASE.txt   | 4 ++++
 tools/h5diff/testh5diff.sh | 9 +--------
 tools/lib/h5diff.c         | 8 --------
 3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/release_docs/RELEASE.txt b/release_docs/RELEASE.txt
index a6b5d96..33599b1 100644
--- a/release_docs/RELEASE.txt
+++ b/release_docs/RELEASE.txt
@@ -705,6 +705,10 @@ Bug Fixes since HDF5-1.8.0 release
 
     Tools
     -----
+    - ph5diff: Fixed intermittent hang issue on a certain operation in
+      parallel mode. It was detected by daily test for comparing
+      non-comparable objects, but it could have occurred in other
+      operations depend on machine condition. HDFFV-8003 (JKM 2012/08/01)
     - h5diff: Fixed test failure for "make check" due to failure of
       copying test files when performed in HDF5 source tree. Also applied
       to other tools.
diff --git a/tools/h5diff/testh5diff.sh b/tools/h5diff/testh5diff.sh
index 86a0c9d..7e95e80 100755
--- a/tools/h5diff/testh5diff.sh
+++ b/tools/h5diff/testh5diff.sh
@@ -836,14 +836,7 @@ TOOLTEST h5diff_221.txt -c non_comparables1.h5 non_comparables2.h5 /g2
 
 # entire file
 # All the comparables should display differences.
-if test -n "$pmode"; then
-    # parallel mode:
-    # skip due to ph5diff hangs on koala (linux64-LE) and ember intermittently.
-    # (HDFFV-8003 - TBD)
-    SKIP -c non_comparables1.h5 non_comparables2.h5
-else
-    TOOLTEST h5diff_222.txt -c non_comparables1.h5 non_comparables2.h5
-fi
+TOOLTEST h5diff_222.txt -c non_comparables1.h5 non_comparables2.h5
 
 # non-comparable test for common objects (same name) with different object types
 # (HDFFV-7644)
diff --git a/tools/lib/h5diff.c b/tools/lib/h5diff.c
index bcd63f1..0c1f3d3 100644
--- a/tools/lib/h5diff.c
+++ b/tools/lib/h5diff.c
@@ -1411,14 +1411,6 @@ hsize_t diff_match(hid_t file1_id, const char *grp1, trav_info_t *info1,
                     options->not_cmp = options->not_cmp | nFoundbyWorker.not_cmp;
                     busyTasks--;
                 } /* end if */
-                else if(Status.MPI_TAG == MPI_TAG_TOK_RETURN)
-                {
-                    MPI_Recv(&nFoundbyWorker, sizeof(nFoundbyWorker), MPI_BYTE, Status.MPI_SOURCE, MPI_TAG_DONE, MPI_COMM_WORLD, &Status);
-                    nfound += nFoundbyWorker.nfound;
-                    options->not_cmp = options->not_cmp | nFoundbyWorker.not_cmp;
-                    busyTasks--;
-                    havePrintToken = 1;
-                } /* end else-if */
                 else if(Status.MPI_TAG == MPI_TAG_TOK_REQUEST)
                 {
                     MPI_Recv(NULL, 0, MPI_BYTE, Status.MPI_SOURCE, MPI_TAG_TOK_REQUEST, MPI_COMM_WORLD, &Status);
-- 
cgit v0.12