From a65c53731fca816274bdc1419ec3f872cc094387 Mon Sep 17 00:00:00 2001
From: Neil Fortner
Date: Tue, 27 Dec 2022 17:28:38 -0600
Subject: Add note to known problems about MPI implementations derived from buggy MPICH or OpenMPI versions (#2368)

* Add note to known problems about MPI implementations derived from buggy MPICH or OpenMPI versions. Fix typo.

* Add note about increasing thread limit with OpenMPI and subfiling
---
 release_docs/RELEASE.txt | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/release_docs/RELEASE.txt b/release_docs/RELEASE.txt
index 198f157..2262afb 100644
--- a/release_docs/RELEASE.txt
+++ b/release_docs/RELEASE.txt
@@ -2147,7 +2147,7 @@ Known Problems
     * at the National Energy Research Scientific Computing     *
     * Center, NERSC.                                           *
     * Doing so may cause a system disruption due to subfiling  *
-    * crashing Lustre. The sytem's Lustre bug is expected      *
+    * crashing Lustre. The system's Lustre bug is expected     *
     * to be resolved by 2023.                                  *
     *                                                          *
     ************************************************************
@@ -2166,11 +2166,21 @@ Known Problems
 
     https://www.hdfgroup.org/2022/11/workarounds-for-openmpi-bug-exposed-by-make-check-in-hdf5-1-13-3/
 
+    When using the subfiling feature with OpenMPI it is often necessary to
+    increase the maximum number of threads:
+
+      --mca common_pami_max_threads 4096
+
     There is a bug in MPICH 4.0.0-4.0.3 where using device=ch4:ofi (the
     default) can cause failures in the testphdf5 test program. Using ch4:ucx
     or ch3 allows the test to pass. The bug appears to be fixed in the
     upcoming 4.1 release.
 
+    These MPI implementation bugs may also be present in implementations derived
+    from OpenMPI or MPICH. The workarounds listed above may need to be adjusted
+    to match the derived implementation, or in some cases, there may be no
+    workaround.
+
     The accum test fails on MacOS 12.6.2 (Monterey) with clang 14.0.0.
     The reason for this failure and its impact are unknown.
 
-- 
cgit v0.12