1. Title: User Guide for SWMR Use Case Programs 2. Purpose: This is a User Guide of the SWMR Use Case programs. It describes the use case program and explain how to run them. 2.1. Author and Dates: Version 2: By Albert Cheng, 2013/06/18. Version 1: By Albert Cheng, 2013/06/01. %%%%Use Case 1.7%%%% 3. Use Case [1.7]: Appending a single chunk 3.1. Program name: use_append_chunk 3.2. Description: Appending a single chunk of raw data to a dataset along an unlimited dimension within a pre-created file and reading the new data back. It first creates one 3d dataset using chunked storage, each chunk is a (1, chunksize, chunksize) square. The dataset is (unlimited, chunksize, chunksize). Data type is 2 bytes integer. It starts out "empty", i.e., first dimension is 0. The writer then appends planes, each of (1,chunksize,chunksize) to the dataset. Fills each plan with plane number and then writes it at the nth plane. Increases the plane number and repeats till the end of dataset, when it reaches chunksize long. End product is a chunksize^3 cube. The reader is a separated process, running in parallel with the writer. It reads planes from the dataset. It expects the dataset is being changed (growing). It checks the unlimited dimension (dimension[0]). When it increases, it will read in the new planes, one by one, and verify the data correctness. (The nth plan should contain all "n".) When the unlimited dimension grows to the chunksize (it becomes a cube), that is the expected end of data, the reader exits. 3.3. How to run the program: Simplest way is $ use_append_chunk It creates a skeleton dataset (0,256,256) of shorts. Then fork off a process, which becomes the reader process to read planes from the dataset, while the original process continues as the writer process to append planes onto the dataset. Other possible options: 1. -z option: different chunksize. Default is 256. $ use_append_chunk -z 1024 It uses (1,1024,1024) chunks to produce a 1024^3 cube, about 2GB big. 2. -f filename: different dataset file name $ use_append_chunk -f /gpfs/tmp/append_data.h5 The data file is /gpfs/tmp/append_data.h5. This allows two independent processes in separated compute nodes to access the datafile on the shared /gpfs file system. 3. -l option: launch only the reader or writer process. $ use_append_chunk -f /gpfs/tmp/append_data.h5 -l w # in node X $ use_append_chunk -f /gpfs/tmp/append_data.h5 -l r # in node Y In node X, launch the writer process, which creates the data file and appends to it. In node Y, launch the read process to read the data file. Note that you need to time the read process to start AFTER the write process has created the skeleton data file. Otherwise, the reader will encounter errors such as data file not found. 4. -n option: number of planes to write/read. Default is same as the chunk size as specified by option -z. $ use_append_chunk -n 1000 # 1000 planes are writtern and read. 5. -s option: use SWMR file access mode or not. Default is yes. $ use_append_chunk -s 0 It opens the HDF5 data file without the SWMR access mode (0 means off). This likely will result in error. This option is provided for users to see the effect of the needed SWMR access mode for concurrent access. 3.4. Test Shell Script: The Use Case program is installed in the test/ directory and is compiled as part of the make process. A test script (test_usecases.sh) is installed in the same directory to test the use case programs. The test script is rather basic and is more for demonstrating how to use the program. %%%%Use Case 1.8%%%% 4. Use Case [1.8]: Appending a hyperslab of multiple chunks. 4.1. Program name: use_append_mchunks 4.2. Description: Appending a hyperslab that spans several chunks of a dataset with unlimited dimensions within a pre-created file and reading the new data back. It first creates one 3d dataset using chunked storage, each chunk is a (1, chunksize, chunksize) square. The dataset is (unlimited, 2*chunksize, 2*chunksize). Data type is 2 bytes integer. Therefore, each plane consists of 4 chunks. It starts out "empty", i.e., first dimension is 0. The writer then appends planes, each of (1,2*chunksize,2*chunksize) to the dataset. Fills each plan with plane number and then writes it at the nth plane. Increases the plane number and repeats till the end of dataset, when it reaches chunksize long. End product is a (2*chunksize)^3 cube. The reader is a separated process, running in parallel with the writer. It reads planes from the dataset. It expects the dataset is being changed (growing). It checks the unlimited dimension (dimension[0]). When it increases, it will read in the new planes, one by one, and verify the data correctness. (The nth plan should contain all "n".) When the unlimited dimension grows to the 2*chunksize (it becomes a cube), that is the expected end of data, the reader exits. 4.3. How to run the program: Simplest way is $ use_append_mchunks It creates a skeleton dataset (0,512,512) of shorts. Then fork off a process, which becomes the reader process to read planes from the dataset, while the original process continues as the writer process to append planes onto the dataset. Other possible options: 1. -z option: different chunksize. Default is 256. $ use_append_mchunks -z 512 It uses (1,512,512) chunks to produce a 1024^3 cube, about 2GB big. 2. -f filename: different dataset file name $ use_append_mchunks -f /gpfs/tmp/append_data.h5 The data file is /gpfs/tmp/append_data.h5. This allows two independent processes in separated compute nodes to access the datafile on the shared /gpfs file system. 3. -l option: launch only the reader or writer process. $ use_append_mchunks -f /gpfs/tmp/append_data.h5 -l w # in node X $ use_append_mchunks -f /gpfs/tmp/append_data.h5 -l r # in node Y In node X, launch the writer process, which creates the data file and appends to it. In node Y, launch the read process to read the data file. Note that you need to time the read process to start AFTER the write process has created the skeleton data file. Otherwise, the reader will encounter errors such as data file not found. 4. -n option: number of planes to write/read. Default is same as the chunk size as specified by option -z. $ use_append_mchunks -n 1000 # 1000 planes are writtern and read. 5. -s option: use SWMR file access mode or not. Default is yes. $ use_append_mchunks -s 0 It opens the HDF5 data file without the SWMR access mode (0 means off). This likely will result in error. This option is provided for users to see the effect of the needed SWMR access mode for concurrent access. 4.4. Test Shell Script: The Use Case program is installed in the test/ directory and is compiled as part of the make process. A test script (test_usecases.sh) is installed in the same directory to test the use case programs. The test script is rather basic and is more for demonstrating how to use the program. %%%%Use Case 1.9%%%% 5. Use Case [1.9]: Appending n-1 dimensional planes 5.1. Program names: use_append_chunk and use_append_mchunks 5.2. Description: Appending n-1 dimensional planes or regions to a chunked dataset where the data does not fill the chunk. This means the chunks have multiple planes and when a plane is written, only one of the planes in each chunk is written. This use case is achieved by extending the previous use cases 1.7 and 1.8 by defining the chunks to have more than 1 plane. The -y option is implemented for both use_append_chunk and use_append_mchunks. 5.3. How to run the program: Simplest way is $ use_append_mchunks -y 5 It creates a skeleton dataset (0,512,512), with storage chunks (5,512,512) of shorts. It then proceeds like use case 1.8 by forking off a reader process. The original process continues as the writer process that writes 1 plane at a time, updating parts of the chunks involved. The reader reads 1 plane at a time, retrieving data from partial chunks. The other possible options will work just like the two use cases. 5.4. Test Shell Script: Commands are added with -y options to demonstrate how the two use case programs can be used as for this use case.