Diffstat (limited to 'release_docs/INSTALL_parallel')
-rw-r--r--   release_docs/INSTALL_parallel   211
1 file changed, 74 insertions, 137 deletions
diff --git a/release_docs/INSTALL_parallel b/release_docs/INSTALL_parallel
index d771c0b..d3d7830 100644
--- a/release_docs/INSTALL_parallel
+++ b/release_docs/INSTALL_parallel
@@ -1,12 +1,24 @@
 Installation instructions for Parallel HDF5
 -------------------------------------------
 
+0. Use Build Scripts
+--------------------
+The HDF Group is accumulating build scripts to handle building parallel HDF5
+on various platforms (Cray, IBM, SGI, etc.). These scripts are maintained and
+updated continuously for current and future systems. The reader is strongly
+encouraged to consult the repository at
+
+https://github.com/HDFGroup/build_hdf5
+
+for building parallel HDF5 on these systems. All contributions, additions
+and fixes to the repository are welcomed and encouraged.
+
 1. Overview
 -----------
 This file contains instructions for the installation of parallel HDF5 (PHDF5).
 It is assumed that you are familiar with the general installation steps as
-described in the INSATLL file. Get familiar with that file before trying
+described in the INSTALL file. Get familiar with that file before trying
 the parallel HDF5 installation.
 
 The remainder of this section explains the requirements to run PHDF5.
@@ -17,19 +29,22 @@ of running the parallel test suites.
 
 1.1. Requirements
 -----------------
-PHDF5 requires an MPI compiler with MPI-IO support and a parallel file system.
-If you don't know yet, you should first consult with your system support staff
-of information how to compile an MPI program, how to run an MPI application,
-and how to access the parallel file system. There are sample MPI-IO C and
-Fortran programs in the appendix section of "Sample programs". You can use
-them to run simple tests of your MPI compilers and the parallel file system.
+PHDF5 requires an MPI compiler with MPI-IO support and a POSIX compliant
+(Ref. 1) parallel file system. If you are not sure, first consult your system
+support staff for information on how to compile an MPI program, how to run an
+MPI application, and how to access the parallel file system. There are sample
+MPI-IO C and Fortran programs in the appendix section "Sample programs". You
+can use them to run simple tests of your MPI compilers and the parallel file
+system.
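
A quick way to verify these requirements is to compile and launch one of the
sample programs from Appendix A. The command names below are assumptions
(mpicc and mpiexec are common wrapper names, but your site may use cc, srun,
aprun, etc.), and sample_mpio.c is a hypothetical name for a saved copy of
the C sample:

    $ mpicc -o sample_mpio sample_mpio.c
    $ mpiexec -n 2 ./sample_mpio
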
 
 1.2. Further Help
 -----------------
-If you still have difficulties installing PHDF5 in your system, please send
-mail to
-    help@hdfgroup.org
+
+For help with installation, questions can be posted to the HDF Forum or sent
+to the HDF Helpdesk:
+
+   HDF Forum:     https://forum.hdfgroup.org/
+   HDF Helpdesk:  https://portal.hdfgroup.org/display/support/The+HDF+Help+Desk
 
 In your mail, please include the output of "uname -a". If you have run the
 "configure" command, attach the output of the command and the content of
@@ -48,54 +63,15 @@ more detailed explanations.
 ----------------------------
 HDF5 knows several parallel compilers: mpicc, hcc, mpcc, mpcc_r. To build
 parallel HDF5 with one of the above, just set CC to it and run configure.
-The "--enable-parallel" is optional in this case.
 
-    $ CC=/usr/local/mpi/bin/mpicc ./configure --prefix=<install-directory>
+    $ CC=/usr/local/mpi/bin/mpicc ./configure --enable-parallel --prefix=<install-directory>
     $ make              # build the library
     $ make check        # verify the correctness
                         # Read the Details section about parallel tests.
     $ make install
 
-2.2. IBM SP
------------
-During the build stage, the H5detect is compiled and executed to generate
-the source file H5Tinit.c which is compiled as part of the HDF5 library.  In
-parallel mode, make sure your environment variables are set correctly to
-execute a single process mpi application. Otherwise, multiple processes
-attempt to write to the same H5Tinit.c file, resulting in a scrambled
-source file. Unfortunately, the setting varies from machine to machine.
-E.g., the following works for the IBM SP machine at LLNL.
-
-    setenv MP_PROCS 1
-    setenv MP_NODES 1
-    setenv MP_LABELIO no
-    setenv MP_RMPOOL 0
-    setenv LLNL_COMPILE_SINGLE_THREADED TRUE   # for LLNL site only
-
-The shared library configuration is problematic. So, only static library
-is supported.
-
-Then do the following steps:
-
-    $ ./configure --disable-shared --prefix=<install-directory>
-    $ make              # build the library
-    $ make check        # verify the correctness
-                        # Read the Details section about parallel tests.
-    $ make install
-
-We also suggest that you add "-qxlf90=autodealloc" to FFLAGS when building
-parallel with fortran enabled. This can be done by invoking:
-
-    setenv FFLAGS -qxlf90=autodealloc           # 32 bit build
-or
-    setenv FFLAGS "-q64 -qxlf90=autodealloc"    # 64 bit build
-
-prior to running configure. Recall that the "-q64" is necessary for 64
-bit builds.
-
-
-2.3. Linux 2.4 and greater
+2.2. Linux 2.4 and greater
 --------------------------
 Be sure that your installation of MPICH was configured with the following
 configuration command-line option:
@@ -106,83 +82,39 @@ This allows for >2GB sized files on Linux systems and is only available
 with Linux kernels 2.4 and greater.
 
 
-2.4. Red Storm (Cray XT3)  (for v1.8 and later)
+2.3. Hopper (Cray XE6)  (for v1.8 and later)
 -------------------------
-Both serial and parallel HDF5 are supported in Red Storm.
 
-2.4.1 Building serial HDF5 for Red Storm
-------------------------------------------
-The following steps are for building the serial HDF5 for the Red Storm
-compute nodes. They would probably work for other Cray XT3 systems but have
+The following steps are for building HDF5 for the Hopper compute
+nodes. They would probably work for other Cray systems but have
 not been verified.
 
-# Assume you already have a copy of HDF5 source code in directory `hdf5' and
-# want to install the binary in directory `/project/hdf5/hdf5'.
+Obtain the HDF5 source code:
+   https://portal.hdfgroup.org/display/support/Downloads
 
-$ cd hdf5
-$ bin/yodconfigure configure
-$ env RUNSERIAL="yod -sz 1" \
-      CC=cc FC=ftn CXX=CC \
-      ./configure --prefix=/project/hdf5/hdf5
-$ make
-$ make check
+The entire build process should be done on a MOM node in an interactive
+allocation, on a file system accessible by all compute nodes.
+Request an interactive allocation with qsub:
+   qsub -I -q debug -l mppwidth=8
 
-# if all is well, install the binary.
-$ make install
+- Create a build directory build-hdf5:
+   mkdir build-hdf5; cd build-hdf5/
 
-2.4.2 Building parallel HDF5 for Red Storm
-------------------------------------------
-The following steps are for building the Parallel HDF5 for the Red Storm
-compute nodes. They would probably work for other Cray XT3 systems but have
-not been verified.
+- Configure HDF5:
+   RUNSERIAL="aprun -q -n 1" RUNPARALLEL="aprun -q -n 6" FC=ftn CC=cc \
+   /path/to/source/configure --enable-fortran --enable-parallel --disable-shared
+
+  RUNSERIAL and RUNPARALLEL tell the library how it should launch programs
+  that are part of the build procedure.
+
+- Compile HDF5:
+   gmake
+
+- Check HDF5:
+   gmake check
 
-# Assume you already have a copy of HDF5 source code in directory `hdf5' and
-# want to install the binary in directory `/project/hdf5/phdf5'.  You also
-# have done the proper setup to have mpicc and mpif90 as the compiler commands.
-
-$ cd hdf5
-$ bin/yodconfigure configure
-$ env RUNSERIAL="yod -sz 1" RUNPARALLEL="yod -sz 3" \
-      CC=cc FC=ftn \
-      ./configure --enable-parallel --prefix=/project/hdf5/phdf5
-$ make
-$ make check
-
-# if all is well, install the binary.
-$ make install
-
-2.4.3 Red Storm known problems
-------------------------------
-For Red Storm, a Cray XT3 system, the yod command sometimes gives the
-message, "yod allocation delayed for node recovery". This interferes with
-test suites that do not expect seeing this message. To bypass this problem,
-I launch the executables with a command shell script called "myyod" which
-consists of the following lines. (You should set $RUNSERIAL and $RUNPARALLEL
-to use myyod instead of yod.)
-==== myyod =======
-#!/bin/sh
-# sleep 2 seconds to allow time for the node recovery else it pops the
-# message,
-# yod allocation delayed for node recovery
-sleep 2
-yod $*
-==== end of myyod =======
-
-For Red Storm, a Cray XT3 system, the tools/h5ls/testh5ls.sh will fail on
-the test "Testing h5ls -w80 -r -g tgroup.h5" fails. This test is
-expected to fail and exit with a non-zero code but the yod command does
-not propagate the exit code of the executables. Yod always returns 0 if it
-can launch the executable. The test suite shell expects a non-zero for
-this particular test, therefore it concludes the test has failed when it
-receives 0 from yod. To bypass this problem for now, change the following
-lines in the tools/h5ls/testh5ls.sh.
-======== Original =========
-# The following combination of arguments is expected to return an error message
-# and return value 1
-TOOLTEST tgroup-1.ls 1 -w80 -r -g tgroup.h5
-======== Skip the test =========
-echo SKIP TOOLTEST tgroup-1.ls 1 -w80 -r -g tgroup.h5
-======== end of bypass ========
+- Install HDF5:
+   gmake install
+
+The build will be in build-hdf5/hdf5/ (or whatever you specify in --prefix).
+To compile other HDF5 applications, use the wrappers created by the build
+(build-hdf5/hdf5/bin/h5pcc or h5fc).
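
To illustrate the last step, a minimal sketch of building and launching an
application against this install (my_mpi_app.c is a hypothetical name, and
the paths assume the build directory used above):

    $ ./build-hdf5/hdf5/bin/h5pcc -o my_mpi_app my_mpi_app.c
    $ aprun -n 4 ./my_mpi_app
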
 
 
 3. Detail explanation
@@ -215,28 +147,24 @@ a properly installed parallel compiler (e.g., MPICH's mpicc or IBM's
 mpcc_r) and supply the compiler name as the value of the CC environment
 variable. For example,
 
-    $ CC=mpcc_r ./configure
-    $ CC=/usr/local/mpi/bin/mpicc ./configure
-
-If no such a compiler command is available then you must use your normal
-C compiler along with the location(s) of MPI/MPI-IO files to be used.
-For example,
-
-    $ CPPFLAGS=-I/usr/local/mpi/include \
-      LDFLAGS=-L/usr/local/mpi/lib/LINUX/ch_p4 \
-      ./configure --enable-parallel=mpich
+    $ CC=mpcc_r ./configure --enable-parallel
+    $ CC=/usr/local/mpi/bin/mpicc ./configure --enable-parallel
 
 If a parallel library is being built then configure attempts to determine
 how to run a parallel application on one processor and on many processors.
 If the compiler is `mpicc' and the user hasn't specified values for
 RUNSERIAL and RUNPARALLEL then configure chooses `mpiexec' from the same
 directory as `mpicc':
 
-    RUNSERIAL:    /usr/local/mpi/bin/mpiexec -np 1
-    RUNPARALLEL:  /usr/local/mpi/bin/mpiexec -np $${NPROCS:=6}
+    RUNSERIAL:    mpiexec -n 1
+    RUNPARALLEL:  mpiexec -n $${NPROCS:=6}
 
 The `$${NPROCS:=6}' will be substituted with the value of the NPROCS
 environment variable at the time `make check' is run (or the value 6).
+
+Note that some MPI implementations (e.g. OpenMPI 4.0) disallow oversubscribing
+nodes by default, so you will have to either set NPROCS to the number of
+processors available (or fewer) or redefine RUNPARALLEL with the appropriate
+flag(s) (--oversubscribe in OpenMPI).
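
For instance, the process count can be supplied when the tests are run, or
RUNPARALLEL can be redefined when configuring. Both lines below are sketches;
the count 8 and the mpiexec launcher are assumptions for a generic MPICH- or
OpenMPI-style installation:

    $ NPROCS=8 make check
    $ RUNPARALLEL="mpiexec --oversubscribe -n 8" ./configure --enable-parallel
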
 
 
 4. Parallel test suite
 ----------------------
@@ -253,11 +181,6 @@ non-zero code. Failure to support file size greater than 2GB is not a fatal
 error for HDF5 because HDF5 can use other file-drivers such as families of
 files to bypass the file size limit.
 
-The t_posix_compliant tests if the file system is POSIX compliant when POSIX
-and MPI IO APIs are used. This is for information only and it always exits
-with 0 even when non-compliance errors have occurred. This is to prevent
-the test from aborting the remaining parallel HDF5 tests unnecessarily.
-
 The t_cache test makes many small I/O requests and may not run well on a slow
 file system such as an NFS disk. If it takes a long time to run, try setting
 the environment variable $HDF5_PARAPREFIX to a file system more suitable
@@ -274,6 +197,20 @@ if the tests should use directory /PFS/user/me, do
 shell initial files like .profile, .cshrc, etc.)
 
 
+Reference
+---------
+1. POSIX Compliant. A good explanation is given by Donald Lewin:
+       After a write() to a regular file has successfully returned, any
+       successful read() from each byte position in the file that was
+       modified by that write() will return the data that was written by
+       the write(). A subsequent write() to the same byte will overwrite
+       the file data. If a read() of file data can be proven by any means
+       [e.g., MPI_Barrier()] to occur after a write() of that data, it
+       must reflect that write(), even if the calls are made by a
+       different process.
+   Lewin, D. (1994). "POSIX Programmer's Guide" (pp. 513-514). O'Reilly
+   & Associates.
+
+
 Appendix A. Sample programs
 ---------------------------
 Here are sample MPI-IO C and Fortran programs. You may use them to run simple
 tests of your MPI compilers and the parallel file system.
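
The appendix programs themselves are unchanged by this commit and so are not
shown in the diff. A minimal MPI-IO check in their spirit (illustrative only,
not the file's actual sample code) could look like:

    /* Each process writes its rank at its own offset in a shared file and
     * reads it back.  On a POSIX compliant parallel file system (Ref. 1),
     * the read that follows the write must return the written data. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int        rank, value = -1;
        MPI_File   fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        offset = (MPI_Offset)rank * (MPI_Offset)sizeof(int);

        MPI_File_open(MPI_COMM_WORLD, "mpiio_test.dat",
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, offset, &rank, 1, MPI_INT, MPI_STATUS_IGNORE);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_File_read_at(fh, offset, &value, 1, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        printf("Process %d: readback %s\n", rank,
               value == rank ? "OK" : "FAILED");
        MPI_Finalize();
        return 0;
    }

Compile and run it with your MPI compiler wrappers, e.g.:

    $ mpicc -o mpiio_test mpiio_test.c
    $ mpiexec -n 4 ./mpiio_test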
