diff options
author | Gerd Heber <gheber@hdfgroup.org> | 2021-04-26 19:07:29 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-04-26 19:07:29 (GMT) |
commit | 1d680fe04c545d601678d876ad22dea7bfc31edd (patch) | |
tree | f84a97c149df9ccf4409c5bfea04a7316c019e8a /doxygen/dox/MetadataCachingInHDF5.dox | |
parent | 12082e728de7481144db7ea3cca6f38acf0a7cac (diff) | |
download | hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.zip hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.tar.gz hdf5-1d680fe04c545d601678d876ad22dea7bfc31edd.tar.bz2 |
Merge doxygen2 into develop (#553)
* Fixed warnings and started H5Epublic.h.
* Include H5FD* headers to correctly resolve references.
* Doxygen2 (#330)
* H5Eauto_is_v2.
* Added a few more calls.
* Added a few more H5E calls.
* First cut of H5E v2.
* Added the deprecated v1 calls.
* Updated spacing.
* Once more.
* Taking some inspiration from Eigen3.
* Add doxygen for the assigned functions: H5Pregister1,H5Pinsert1,H5Pen… (#352)
* Add doxygen for the assigned functions: H5Pregister1,H5Pinsert1,H5Pencode1, H5Pget_filter_by_id1,H5Pget_version, H5Pset_file_space,H5Pget_file_space. Someone already adds H5Pget_filter1. Also fixs an extra parameter 'close' call back function for HPregister2.
* doxygen work. fixs format by using clang-format.
* doxgen work for H5Pregister1 etc. Addressed Barbara and Gerd's comments.
For Quincey's comments, since we are not supposed to change the source code.
I leave this to future improvements.
* added documentation for H5P APIs (#350)
* add documenation for H5Pget_buffer,H5Pget_data_transform,H5Pget_edc_check,H5Pget_hyper_vector_size,H5Pget_preserve,H5Pget_type_conv_cb,H5Pget_vlen_mem_manager,H5Pset_btree_ratios
* format corrections
* fixed grammer
* fixed herr_t
* Better name.
* A fresh look.
* add doxygen to H5Ppublic.h
* use attention instead of warning
* Add doxygen comments in H5Ppublic.h (#375)
* Add doxygen comments in H5Ppublic.h
* H5Pset_meta_block_size
* H5Pset_metadata_read_attempts
* H5Pset_multi_type
* H5Pset_object_flush_cb
* H5Pset_sieve_buf_size
* H5Pset_small_data_block_size
* H5Pset_all_coll_metadata_ops
* H5Pget_all_coll_metadata_ops
* Add DOXYGEN_EXAMPLES_DIR to src/CMakeLists.txt
* Fix clang-format errors
* Fix filenames in doxygen/examples
* add doxygen to H5Ppublic.h (#378)
* add doxygen to H5Ppublic.h
* use attention instead of warning
Co-authored-by: Kimmy Mu <kmu@hdfgroup.org>
* Revert "add doxygen to H5Ppublic.h (#378)"
This reverts commit 2ee1821b138a5c00b15ea57ce9e950367480f5f2.
* Updated Doxygen variables.
* I forgot to copy two images.
* Enable desktop search by default.
* Add my assigned Doxygen documentation.
* Remove whitespace at EOL. Appease clang-format.
* Addressed Chris' comments.
* Added an alias for asynchronous functions.
* One space is enough for all of us.
* Slightly restructured RM page.
* address some issues
* reformatting
* Style external links.
* reformatting
* reformatting
* Added "Metadata Caching in HDF5" as a technical note example.
* Revise this soon!
* Added specification examples.
* Fixed references.
* Added H5AC cache image stuff and file format study.
* Added older FMT versions. Where did 1.0 go?
* Updated C/C++ note and replaced ambiguous labels.
* Reformat source with clang v10.0.1.
* Added the VFL technical note.
* Added what I believe might be called version 1.0 of the format.
* Added the remaining specs.
* Added H5Z callback documentation and fixed a few mistakes.
* Added dox for deprecated H5G calls and fixed a few snippet blockIDs.
* clang-format happy?
* Ok?
* Bonus track: Deprecated H5D functions.
* Carry over the more detailed group description.
* Added documentation for the missing and deprecated H5R calls.
* Life is easier and less repetitive w/ snippets. Use them!
* Eliminate the snippet block ID artifacts in the HTML rendering.
* Fixed snippet HTML artifacts and added a few missing calls.
* Under 20 H5Ps to go!
* Almost complete!
* "This is a form of pedantry up with which I will not put." (Churchill)
* Let's not waste as much space on bulleted lists!
* First complete (?) draft of the Doxygen-based RM.
* Completeness check and minor fixes along the way.
* Pedantry.
* Adding missing H5FD calls checkpoint.
* Pedantry.
* More pedantry.
* Added H5Pset_fapl_log.
* First draft of H5ES.
* Fixed warnings.
* Prep. for map module.
* First cut of the map module.
* Pedantry.
* Possible H5F introduction.
* Fix the indentation.
* Pedantry.
* Ditto.
* Thanks to the reviewers for their comments.
* Added missing images.
* Line numbers are a distraction here.
* More examples, references, and clean-up. Don't repeat yourself!
* Clang pedantry.
* Ditto.
* More reviewer comments...
* Templatized references and cleaned up \todos.
* Committing clang-format changes
* Fixed MANIFEST.
* Addressed Quincey's comments. (OCPLs)
* Fixed a few more \todo items.
* Fixed more \todo items.
* Added attribute life cycle.
* Forgot the examples file.
* Committing clang-format changes
* Pedantry.
* Live and learn!
* Added a sample H5D life cycle.
* Committing clang-format changes
* Pedantry.
Co-authored-by: kyang2014 <kyang2014@users.noreply.github.com>
Co-authored-by: Scot Breitenfeld <brtnfld@hdfgroup.org>
Co-authored-by: Kimmy Mu <kmu@hdfgroup.org>
Co-authored-by: Christopher Hogan <ChristopherHogan@users.noreply.github.com>
Co-authored-by: jya-kmu <53388330+jya-kmu@users.noreply.github.com>
Co-authored-by: David Young <dyoung@hdfgroup.org>
Co-authored-by: Larry Knox <lrknox@hdfgroup.org>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Diffstat (limited to 'doxygen/dox/MetadataCachingInHDF5.dox')
-rw-r--r-- | doxygen/dox/MetadataCachingInHDF5.dox | 1020 |
1 files changed, 1020 insertions, 0 deletions
diff --git a/doxygen/dox/MetadataCachingInHDF5.dox b/doxygen/dox/MetadataCachingInHDF5.dox new file mode 100644 index 0000000..9ba0fab --- /dev/null +++ b/doxygen/dox/MetadataCachingInHDF5.dox @@ -0,0 +1,1020 @@ +/** \page TNMDC Metadata Caching in HDF5 + +\todo Revise this! + +\section intro Introduction + +In the 1.6.4 release, we introduced a re-implementation of the metadata +cache. That release contained an incomplete version of the cache which could not +be controlled via the API. The version in the 1.8 release is more mature and +includes new API calls that allow the user program to configure the metadata +cache both on file open and at run time. + +From the user perspective, the most striking effect of the new cache should be a +large reduction in the cache memory requirements when working with complex HDF5 +files. + +Those working with such files may also notice a reduction in file close time. + +Those working with HDF5 files with a simple structure shouldn't notice any +particular changes in most cases. In rare cases, there may be a significant +improvement in performance. + +The remainder of this document contains an architectural overview of the old and +new metadata caches, a discussion of algorithms used to automatically adjust +cache size to circumstances, and a high-level discussion of the cache +configuration controls. It can be safely skipped by anyone who works only with +HDF5 files with relatively simple structure (i.e. no huge groups, no datasets +with large numbers of chunks, and no objects with large numbers of attributes.) + +On the other hand, it is mandatory reading if you want to use something other +than the default metadata cache configuration. The documentation on the metadata +cache-related API calls will not make much sense without this background. + +\section oldnew Old and New Metadata Cache + +\subsection old The Old Metadata Cache + +The old metadata cache indexed the cache with a hash table with no provision for +collisions. Instead, collisions were handled by evicting the existing entry to +make room for the new entry. Aside from flushes, there was no other mechanism +for evicting entries, so the replacement policy could best be described as +"Evict on Collision". + +As a result, if two frequently used entries hashed to the same location, they +would evict each other regularly. To decrease the likelihood of this situation, +the default hash table size was set fairly large -- slightly more than +10,000. This worked well, but since the size of metadata entries is not bounded, +and since entries were only evicted on collision, the large hash table size +allowed the cache size to explode when working with HDF5 files with complex +structure. + +The "Evict on Collision" replacement policy also caused problems with the +parallel version of the HDF5 library, as a collision with a dirty entry could +force a write in response to a metadata read. Since all metadata writes must be +collective in the parallel case while reads need not be, this could cause the +library to hang if only some of the processes participated in a metadata read +that forced a write. Prior to the implementation of the new metadata cache, we +dealt with this issue by maintaining a shadow cache for dirty entries evicted by +a read. + +\subsection new The New Metadata Cache + +The new metadata cache was designed to address the above issues. After +implementation, it became evident that the working set size for HDF5 files +varies widely depending on both structure and access patterns. Thus it was +necessary to add support for cache size adjustment under either automatic or +user program control (see section 2.3 for details). + +When the cache is operating under direct user program control, it is also +possible to temporarily disable evictions from the metadata cache so as to +maximize raw data throughput at the expense of allowing the cache to grow +without bound until evictions are enabled again. + +Structurally, the new metadata cache can be thought of as a heavily modified +version of the UNIX buffer cache as described in chapter three of M. J. Bach's +"The Design of the UNIX Operating System" In essence, the UNIX buffer cache uses +a hash table with chaining to index a pool of fixed-size buffers. It uses the +LRU replacement policy to select candidates for eviction. + +Since HDF5 metadata entries are not of fixed size and may grow arbitrarily +large, the size of the new metadata cache cannot be controlled by setting a +maximum number of entries. Instead, the new cache keeps a running sum of the +sizes of all entries and attempts to evict entries as necessary to stay within a +user-specified maximum size. (Note the use of the word "attempts" here -- as +will be seen, it is possible for the cache to exceed its currently specified +maximum size.) At present, the LRU replacement policy is the only option for +selecting candidates for eviction. + +Per the standard Unix buffer cache, dirty entries are given two passes through +the LRU list before being evicted. The first time they reach the end of the LRU +list, they are flushed, marked as clean, and moved to the head of the LRU +list. When a clean entry reaches the end of the LRU list, it is simply evicted +if space is needed. + +The cache cannot evict entries that are locked, and thus it will temporarily +grow beyond its maximum size if there are insufficient unlocked entries +available for eviction. + +In the parallel version of the library, only the cache running under process 0 +of the file communicator is allowed to write metadata to file. All the other +caches must retain dirty metadata until the process 0 cache tells them that the +metadata is clean. + +Since all operations modifying metadata must be collective, all caches see the +same stream of dirty metadata. This fact is used to allow them to synchronize +every n bytes of dirty metadata, where n is a user-configurable value that +defaults to 256 KB. + +To avoid sending the other caches messages from the future, process 0 must not +write any dirty entries until it reaches a synchronization point. When it +reaches a synchronization point, it writes entries as needed, and then +broadcasts the list of flushed entries to the other caches. The caches on the +other processes use this list to mark entries clean before they leave the +synchronization point, allowing them to evict those entries as needed. + +The caches will also synchronize on a user-initiated flush. + +To minimize overhead when running in parallel, the cache maintains a "clean" LRU +list in addition to the regular LRU list. This list contains only clean entries +and is used as a source of candidates for eviction when flushing dirty entries +is not allowed. + +Since flushing entries is forbidden most of the time when running in parallel, +the caches can be forced to exceed their maximum sizes if they run out of clean +entries to evict. + +To decrease the likelihood of this event, the new cache allows the user to +specify a minimum clean size -- which is a minimum total size of all the entries +on the clean LRU plus all unused space in the cache. + +While the clean LRU list is only maintained in the parallel version of the HDF5 +library, the notion of a minimum clean size still applies in the serial +case. Here it is used to force a mix of clean and dirty entries in the cache +even in the write-only case. + +This, in turn, reduces the number of redundant flushes by avoiding the case in +which the cache fills with dirty metadata and all entries must be flushed before +a clean entry can be evicted to make room for a new entry. + +Observe that in both the serial and parallel cases, the maintenance of a minimum +clean size modifies the replacement policy, as dirty entries may be flushed +earlier than would otherwise be the case so as to maintain the desired amount of +clean and/or empty space in the cache. + +While the new metadata cache only supports the LRU replacement policy at +present, that may change. Support for multiple replacement policies was very +much in mind when the cache was designed, as was the ability to switch +replacement policies at run time. The situation has been complicated by the +later addition of the adaptive cache resizing requirement, as two of the +resizing algorithms piggyback on the LRU list. However, if there is a need for +additional replacement policies, it shouldn't be too hard to implement them. + +\section adapt Adaptive Cache Resizing in the New Metadata Cache + +As mentioned earlier, the metadata working set size for an HDF5 file varies +wildly depending on the structure of the file and the access pattern. For +example, a 2MB limit on metadata cache size is excessive for an H5repack of +almost all HDF5 files we have tested. However, I have a file submitted by one of +our users that will run a 13% hit rate with this cache size and will lock up one +of our Linux boxes using the old metadata cache. Increase the new metadata cache +size to 4 MB, and the hit rate exceeds 99%. + +In this case, the main culprit is a root group with more than 20,000 entries in +it. As a result, the root group heap exceeds 1 MB, which tends to crowd out the +rest of the metadata in a 2 MB cache + +This case and a number of synthetic tests convinced us that we needed to modify +the new metadata cache to expand and contract according to need within +user-specified bounds. + +I was unable to find any previous work on this problem, so I invented solutions +as I went along. If you are aware of prior work, please send me references. The +closest I was able to come was a group of embedded CPU designers who were +turning off sections of their cache to conserve power. + +\subsection increasing Increasing the Cache Size + +In the context of the HDF5 library, the problem of increasing the cache size as +necessary to contain the current working set turns out to involve two rather +different issues. + +The first of these, which was recognized immediately, is the problem of +recognizing long term changes in working set size, and increasing the cache size +accordingly, while not reacting to transients. + +The second, which I recognized the hard way, is to adjust the cache size for +sudden, dramatic increases in working set size caused by requests for large +pieces of metadata which may be larger than the current metadata cache size. + +The algorithms for handling these situations are discussed below. These problems +are largely orthogonal to each other, so both algorithms may be used +simultaneously. + +\subsubsection hrtcsi Hit Rate Threshold Cache Size Increment + +Perhaps the most obvious heuristic for identifying cases in which the cache is +too small involves monitoring the hit rate. If the hit rate is low for a while, +and the cache is at its current maximum size, the current maximum cache size is +probably too small. + +The hit rate threshold algorithm for increasing cache size applies this +intuition directly. + +Hit rate statistics are collected over a user-specified number of cache +accesses. This period is known as an epoch. + +At the end of each epoch, the hit rate is computed, and the counters are +reset. If the hit rate is below a user-specified threshold and the cache is at +its current maximum size, the maximum size of the cache is increased by a +user-specified multiple. If required, the new cache maximum size is clipped to +stay within the user-specified upper bound on the maximum cache size, and +optionally, within a user-specified maximum increment. + +My tests indicate that this algorithm works well in most cases. However, in a +synthetic test in which hit rate increased slowly with cache size, and load +remained steady for many epochs, I observed a case in which cache size increased +until the hit rate just exceeded the specified minimum and then stalled. This is +a problem, as to avoid volatility, it is necessary to set the minimum hit rate +threshold well below the desired hit rate. Thus we may find ourselves with a +cache running with a 91% hit rate when we really want it to increase its size +until the hit rate is about 99%. + +If this case occurs frequently in actual use, I will have to come up with an +improved algorithm. Please let me know if you see this behavior. However, I had +to work rather hard to create it in my synthetic tests, so I would expect it to +be uncommon. + +\subsubsection fcsi Flash Cache Size Increment + +A fundamental problem with the above algorithm is that contains the hidden +assumption that cache entries are relatively small in comparison to the cache +itself. While I knew this assumption was not generally true when I developed the +algorithm, I thought that cases, where it failed, would be so rare as to not be +worth considering, as even if they did occur, the above algorithm would rectify +the situation within an epoch or two. + +While it is true that such occurrences are rare, and it is true that the hit +rate threshold cache size increment algorithm will rectify the situation +eventually, the performance degradation experienced by users while waiting for +the epoch to end was so extreme that some way of accelerating response to such +situations was essential. + +To understand the problem, consider the following use case: + +Suppose we create a group, and then repeatedly create a new data set in the +group, write some data to it and then close it. + +In some versions of the HDF5 file format, the names of the datasets will be +stored in a local heap associated with the group, and the space for that heap +will be allocated in a single, contiguous chunk. When this local heap is full, +we allocate a new chunk twice the size of the old, copy the data from the old +local heap into the new, and discard the old local heap. + +By default, the minimum metadata cache size is set to 2 MB. Thus in this use +case, our hit rate will be fine as long as the local heap is no larger than a +little less than 2 MB, as the group related metadata is accessed frequently and +never evicted, and the data set related metadata is never accessed once the data +set is closed, and thus is evicted smoothly to make room for new data sets. + +All this changes abruptly when the local heap finally doubles in size to a value +above the slightly less than 2 MB limit. All of a sudden, the local heap is the +size of the metadata cache, and the cache must constantly swap it in to access +it, and then swap it out to make room for other metadata. + +The hit rate threshold-based algorithm for increasing the cache size will fix +this problem eventually, but performance will be very bad until it does, as the +metadata cache will largely ineffective until its size is increased. + +An obvious heuristic for addressing this "big rock in a small pond" issue is to +watch for large "incoming rocks", and increase the size of the "pond" if the +rock is so big that it will force most of the "water" out of the "pond". + +The add space flash cache size increment algorithm applies this intuition +directly: + +Let x be either the size of a newly inserted entry, a newly loaded entry, or the +number of bytes by which the size of an existing entry has been increased +(i.e. the size of the "rock"). + +If x is greater than some user-specified fraction of the current maximum cache +size, increase the current maximum cache size by x times some user-specified +multiple, less any free space that was in the cache, to begin with. Further, to +avoid confusing the other cache size increment/decrement code, start a new +epoch. + +At present, this algorithm pays no attention to any user-specified limit on the +maximum size of any single cache size increase, but it DOES stay within the +user-specified upper bound on the maximum cache size. + +While it should be easy to see how this algorithm could be fooled into +inactivity by a large number of entries that were not quite large enough to +cross the threshold, in practice it seems to work reasonably well. + +Needless to say, I will revisit the issue should this cease to be the case. + +\subsection decreasing Decreasing the Cache Size + +Identifying cases in which the maximum cache size is larger than necessary +turned out to be more difficult. + +\subsubsection hrtcsr Hit Rate Threshold Cache Size Reduction + +One obvious heuristic is to monitor the hit rate and guess that we can safely +decrease cache size if the hit rate exceeds some user-supplied threshold (say +.99995). The hit rate threshold size decrement algorithm implemented in the new +metadata cache implements this intuition as follows: + +At the end of each epoch (this is the same epoch that is used in the cache size +increment algorithm), the hit rate is compared with the user-specified +threshold. If the hit rate exceeds that threshold, the current maximum cache +size is decreased by a user-specified factor. If required, the size of the +reduction is clipped to stay within a user-specified lower bound on the maximum +cache size, and optionally, within a user-specified maximum decrement. + +In my synthetic tests, this algorithm works poorly. Even with a very high +threshold and a small maximum reduction, it results in cache size +oscillations. The size increment code typically increments the maximum cache +size above the working set size. This results in a high hit rate, which causes +the threshold size decrement code to reduce the maximum cache size below the +working set size, which causes the hit rate to crash causing the cycle to +repeat. The resulting average hit rate is poor. + +It remains to be seen if this behavior will be seen in the field. The algorithm +is available for use, but it wouldn't be my first choice. If you use it, please +report back. + +\subsubsection acsr Ageout Cache Size Reduction + +Another heuristic for dealing with oversized cache conditions is to look for +entries that haven't been accessed for a long time, evict them, and reduce the +cache size accordingly. + +The age out cache size reduction applies this intuition as follows: At the end +of each epoch (again the same epoch as used in the cache size increment +algorithm), all entries that haven't been accessed for a user-configurable +number of epochs (1 - 10 at present) are evicted. The maximum cache size is then +reduced to equal the sum of the sizes of the remaining entries. The size of the +reduction is clipped to stay within a user-specified lower bound on maximum +cache size, and optionally, within a user-specified maximum decrement. + +In addition, the user may specify a minimum fraction of the cache which must be +empty before the cache size is reduced. Thus if an empty reserve of 0.1 was +specified on a 10 MB cache, there would be no cache size reduction unless the +eviction of aged out entries resulted in more than 1 MB of empty space. Further, +even after the reduction, the cache would be one-tenth empty. + +In my synthetic tests, the age out algorithm works rather well, although it is +somewhat sensitive to the epoch length and age out period selection. + +\subsubsection awhrtcsr Ageout With Hit Rate Threshold Cache Size Reduction + +To address these issues, I combined the hit rate threshold and age out +heuristics. + +Age out with threshold works just like age out, except that the algorithm is not +run unless the hit rate exceeded a user-specified threshold in the previous +epoch. + +In my synthetic tests, age out with threshold seems to work nicely, with no +observed oscillation. Thus I have selected it as the default cache size +reduction algorithm. + +For those interested in such things, the age out algorithm is implemented by +inserting a marker entry at the head of the LRU list at the beginning of each +epoch. Entries that haven't been accessed for at least n epochs are simply +entries that appear in the LRU list after the n-th marker at the end of an +epoch. + +\section configuring Configuring the New Metadata Cache + +Due to a lack of resources, the design work on the automatic cache size +adjustment algorithms was done hastily, using primarily synthetic tests. I don't +think I spent more than a couple weeks writing and running performance tests -- +most time went into coding and functional testing. + +As a result, while I think the algorithms provided for adaptive cache resizing +will work well in actual use, I don't really know (although preliminary results +from the field are promising). Fortunately, the issue shouldn't arise for the +vast majority of HDF5 users, and those for whom it may arise should be savvy +enough to recognize problems and deal with them. + +For this latter class of users, I have implemented a number of new API calls +allowing the user to select and configure the cache resize algorithms, or to +turn them off and control cache size directly from the user program. There are +also API calls that allow the user program to monitor hit rate and cache size. + +From the user perspective, all the cache configuration data for a given file is +contained in an instance of the \ref H5AC_cache_config_t structure -- the definition +of which is given below: + +\snippet H5ACpublic.h H5AC_cache_config_t_snip + +This structure is defined in \c H5ACpublic.h. Each field is discussed below and in +the associated header comment. + +The C API allows you to get and set this structure directly. Unfortunately, the +Fortran API has to do this with individual parameters for each of the fields +(with the exception of version). + +While the API calls are discussed individually in the reference manual, the +following high-level discussion of what fields to change for different purposes +should be useful. + +\subsection gconfig General Configuration + +The \c version field is intended to allow \THG to change the \c +H5AC_cache_config_t structure without breaking old code. For now, this field +should always be set to \c H5AC__CURR_CACHE_CONFIG_VERSION, even when you are +getting the current configuration data from the cache. The library needs the +version number to know where fields are located with reference to the supplied +base address. + +The \ref H5AC_cache_config_t.rpt_fcn_enabled "rpt_fcn_enabled" field is a +boolean flag that allows you to turn on and off the resize reporting function +that reports the activities of the adaptive cache resize code at the end of each +epoch -- assuming that it is enabled. + +The report function is unsupported, so you are on your own if you use it. Since +it dumps status data to stdout, you should not attempt to use it with Windows +unless you modify the source. You may find it useful if you want to experiment +with different adaptive resize configurations. It is also a convenient way of +diagnosing poor cache configuration. Finally, if you do lots of runs with +identical behavior, you can use it to determine the metadata cache size needed +in each phase of your program so you can set the required cache sizes manually. + +The trace file fields are also unsupported. They allow one to open and close a +trace file in which all calls to the metadata cache are logged in a +user-specified file for later analysis. The feature is intended primarily for +THG use in debugging or optimizing the metadata cache in cases where users in +the field observe obscure failures or poor performance that we cannot re-create +in the lab. The trace file will allow us to re-create the exact sequence of +cache operations that are triggering the problem. + +At present we do not have a playback utility for trace files, although I imagine +that we will write one quickly when and if we need it. + +To enable the trace file, you load the full path of the desired trace file into +\ref H5AC_cache_config_t.trace_file_name "trace_file_name", and set \ref +H5AC_cache_config_t.open_trace_file "open_trace_file" to \c TRUE. In the +parallel case, an ASCII representation of the rank of each process is appended +to the supplied trace file name to create a unique trace file name for that +process. + +To close an open trace file, set \ref H5AC_cache_config_t.close_trace_file +"close_trace_file" to \c TRUE. + +It must be emphasized that you are on your own if you play with the trace file +feature absent a request from \THG. Needless to say, the trace file feature is +disabled by default. If you enable it, you will take a large performance hit and +generate huge trace files. + +The \ref H5AC_cache_config_t.evictions_enabled "evictions_enabled" field is a +boolean flag allowing the user to disable the eviction of entries from the +metadata cache. Under normal operation conditions, this field will always be set +to \c TRUE. + +In rare circumstances, the raw data throughput requirements may be so high that +the user wishes to postpone metadata writes so as to reserve I/O throughput for +raw data. The \ref H5AC_cache_config_t.evictions_enabled "evictions_enabled" +field exists to allow this -- although the user is to be warned that the +metadata cache will grow without bound while evictions are disabled. Thus +evictions should be re-enabled as soon as possible, and it may be wise to +monitor cache size and statistics (to see how to enable statistics, see the +debugging facilities section below). + +Evictions may only be disabled when the automatic cache resize code is disabled +as well. Thus to disable evictions, not only must the user set the \ref +H5AC_cache_config_t.evictions_enabled "evictions_enabled" field to \c FALSE, but +he must also set \ref H5AC_cache_config_t.incr_mode "incr_mode" to +#H5C_incr__off, set \ref H5AC_cache_config_t.flash_incr_mode "flash_incr_mode" +to #H5C_flash_incr__off, and set \ref H5AC_cache_config_t.decr_mode "decr_mode" +to #H5C_decr__off. + +To re-enable evictions, just set \ref H5AC_cache_config_t.evictions_enabled +"evictions_enabled" back to \c TRUE. + +Before passing on to other subjects, it is worth re-iterating that disabling +evictions is an extreme step. Before attempting it, you might consider setting a +large cache size manually, and flushing the cache just before high raw data +throughput is required. This may yield the desired results without the risks +inherent in disabling evictions. + +The \ref H5AC_cache_config_t.set_initial_size "set_initial_size" and \ref +H5AC_cache_config_t.initial_size "initial_size" fields allow you to specify an +initial maximum cache size. If \ref H5AC_cache_config_t.set_initial_size +"set_initial_size" is \c TRUE, \ref H5AC_cache_config_t.initial_size +"initial_size" must lie in the interval [\ref H5AC_cache_config_t.min_size +"min_size", \ref H5AC_cache_config_t.max_size "max_size"] (see below for a +discussion of the \ref H5AC_cache_config_t.min_size "min_size" and \ref +H5AC_cache_config_t.max_size "max_size" fields). + +If you disable the adaptive cache resizing code (done by setting \ref +H5AC_cache_config_t.incr_mode "incr_mode" to #H5C_incr__off, \ref +H5AC_cache_config_t.flash_incr_mode "flash_incr_mode" to #H5C_flash_incr__off, +and \ref H5AC_cache_config_t.decr_mode "decr_mode" to #H5C_decr__off), you can +use these fields to control maximum cache size manually, as the maximum cache +size will remain at the initial size. + +Note, that the maximum cache size is only modified when \ref +H5AC_cache_config_t.set_initial_size "set_initial_size" is \c TRUE. This allows +the use of configurations specified at compile time to change resize +configuration without altering the current maximum size of the cache. Without +this feature, an additional call would be required to get the current maximum +cache size so as to set the \ref H5AC_cache_config_t.initial_size "initial_size" +to the current maximum cache size, and thereby avoid changing it. + +The \ref H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" sets the +current minimum clean size as a fraction of the current max cache size. While +this field was originally used only in the parallel version of the library, it +now applies to the serial version as well. Its value must lie in the range +\Code{[0.0, 1.0]}. 0.01 is reasonable in the serial case, and 0.3 in the +parallel. + +A potential interaction, discovered at release 1.8.3, between the enforcement of +the \ref H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" and the +adaptive cache resize code can severely degrade performance. While this +interaction is easily dealt with in the serial case by setting \ref +H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" to 0.01, the problem +is more difficult in the parallel case. Please see the Interactions section +below for further details. + +The \ref H5AC_cache_config_t.max_size "max_size" and \ref +H5AC_cache_config_t.min_size "min_size" fields specify the range of maximum +sizes that may be set for the cache by the automatic resize code. \ref +H5AC_cache_config_t.min_size "min_size" must be less than or equal to +\ref H5AC_cache_config_t.max_size "max_size", and both must lie in the range +\Code{[H5C__MIN_MAX_CACHE_SIZE, H5C__MAX_MAX_CACHE_SIZE]} -- currently [1 KB, +128 MB]. If you routinely run a cache size in the top half of this range, you +should increase the hash table size. To do this, modify the \c +H5C__HASH_TABLE_LEN \Code{\#define} in \c H5Cpkg.h and re-compile. At present, +\c H5C__HASH_TABLE_LEN must be a power of two. + +The \c epoch_length is the number of cache accesses between runs of the adaptive +cache size control algorithms. It is ignored if these algorithms are turned +off. It must lie in the range \Code{[H5C__MIN_AR_EPOCH_LENGTH, +H5C__MAX_AR_EPOCH_LENGTH]} -- currently [100, 1000000]. The above constants are +defined in \c H5Cprivate.h. 50000 is a reasonable value. + +\subsection increment Increment Configuration + +The \ref H5AC_cache_config_t.incr_mode "incr_mode" field specifies the cache +size increment algorithm used. Its value must be a member of the \ref +H5C_cache_incr_mode enum type -- currently either #H5C_incr__off or +#H5C_incr__threshold (note the double underscores after \c "incr"). This type is +defined in H5Cpublic.h. + +If \ref H5AC_cache_config_t.incr_mode "incr_mode" is set to #H5C_incr__off, +regular automatic cache size increases are disabled, and the \ref +H5AC_cache_config_t.lower_hr_threshold "lower_hr_threshold", \ref +H5AC_cache_config_t.increment "increment", \ref +H5AC_cache_config_t.apply_max_increment "apply_max_increment", and \ref +H5AC_cache_config_t.max_increment "max_increment", fields are ignored. + +The \ref H5AC_cache_config_t.flash_incr_mode "flash_incr_mode" field specifies +the flash cache size increment algorithm used. Its value must be a member of the +\ref H5C_cache_flash_incr_mode enum type -- currently either +#H5C_flash_incr__off or #H5C_flash_incr__add_space (note the double underscores +after \c "incr"). This type is defined in H5Cpublic.h. + +If \ref H5AC_cache_config_t.flash_incr_mode "flash_incr_mode" is set to +#H5C_flash_incr__off, flash cache size increases are disabled, and the \ref +H5AC_cache_config_t.flash_multiple "flash_multiple", and \ref +H5AC_cache_config_t.flash_threshold "flash_threshold", fields are ignored. + +\subsubsection hrtcsic Hit Rate Threshold Cache Size Increase Configuration + +If \ref H5AC_cache_config_t.incr_mode "incr_mode" is #H5C_incr__threshold, the +cache size is increased via the hit rate threshold algorithm. The remaining +fields in the section are then used as follows: + +\ref H5AC_cache_config_t.lower_hr_threshold "lower_hr_threshold" is the +threshold below which the hit rate must fall to trigger an increase. The value +must lie in the range \Code{[0.0 - 1.0]}. In my tests, a relatively high value +seems to work best -- 0.9 for example. + +\ref H5AC_cache_config_t.increment "increment" is the factor by which the old +maximum cache size is multiplied to obtain an initial new maximum cache size +when an increment is needed. The actual change in size may be smaller as +required by \ref H5AC_cache_config_t.max_size "max_size" (above) and \c +max_increment (discussed below). increment must be greater than or equal to +1.0. If you set it to 1.0, you will effectively turn off the increment code. 2.0 +is a reasonable value. + +\ref H5AC_cache_config_t.apply_max_increment "apply_max_increment" and \ref +H5AC_cache_config_t.max_increment "max_increment" allow the user to specify a +maximum increment. If \ref H5AC_cache_config_t.apply_max_increment +"apply_max_increment" is \c TRUE, the cache size will never be increased by more +than the number of bytes specified in \ref H5AC_cache_config_t.max_increment +"max_increment" in any single increase. + +\subsubsection fcsic Flash Cache Size Increase Configuration + +If \ref H5AC_cache_config_t.flash_incr_mode "flash_incr_mode" is set to +#H5C_flash_incr__add_space, flash cache size increases are enabled. The size of +the cache will be increased under the following circumstances: + +Let \c t be the current maximum cache size times the value of the \ref +H5AC_cache_config_t.flash_threshold "flash_threshold" field. + +Let \c x be either the size of the newly inserted entry, the size of the newly +loaded entry, or the number of bytes added to the size of the entry under +consideration for triggering a flash cache size increase. + +If \Code{t < x}, the basic condition for a flash cache size increase is met, and +we proceed as follows: + +Let \c space_needed equal \c x less the amount of free space in the cache. + +Further, let \ref H5AC_cache_config_t.increment "increment" equal \c +space_needed times the value of the \ref H5AC_cache_config_t.flash_multiple +"flash_multiple" field. If \ref H5AC_cache_config_t.increment "increment" plus +the current cache size is greater than \ref H5AC_cache_config_t.max_size +"max_size" (discussed above), reduce \ref H5AC_cache_config_t.increment +"increment" so that \ref H5AC_cache_config_t.increment "increment" plus the +current cache size is equal to \ref H5AC_cache_config_t.max_size "max_size". + +If the increment is greater than zero, increase the current cache size by \ref +H5AC_cache_config_t.increment "increment". To avoid confusing the other cache +size increment or decrement algorithms, start a new epoch. Note, however, that +we do not cycle the epoch markers if some variant of the age out algorithm is in +use. + +The use of the \ref H5AC_cache_config_t.flash_threshold "flash_threshold" field +is discussed above. It must be a floating-point value in the range of +\Code{[0.1, 1.0]}. 0.25 is a reasonable value. + +The use of the \ref H5AC_cache_config_t.flash_multiple "flash_multiple" field is +also discussed above. It must be a floating-point value in the range of +\Code{[0.1, 10.0]}. 1.4 is a reasonable value. + +\subsection decrement Decrement Configuration + +The \ref H5AC_cache_config_t.decr_mode "decr_mode" field specifies the cache +size decrement algorithm used. Its value must be a member of the \ref +H5C_cache_decr_mode enum type -- currently either #H5C_decr__off, +#H5C_decr__threshold, #H5C_decr__age_out, or #H5C_decr__age_out_with_threshold +(note the double underscores after \c "decr"). This type is defined in +H5Cpublic.h. + +If \ref H5AC_cache_config_t.decr_mode "decr_mode" is set to #H5C_decr__off, +automatic cache size decreases are disabled, and the remaining fields in the +cache size decrease control section are ignored. + +\subsubsection hrtcsdc Hit Rate Threshold Cache Size Decrease Configuration + +If \ref H5AC_cache_config_t.decr_mode "decr_mode" is #H5C_decr__threshold, the +cache size is decreased by the threshold algorithm, and the remaining fields of +the decrement section are used as follows: + +\ref H5AC_cache_config_t.upper_hr_threshold "upper_hr_threshold" is the +threshold above which the hit rate must rise to trigger cache size reduction. It +must be in the range \Code{[0.0, 1.0]}. In my synthetic tests, very high values +like .9995 or .99995 seemed to work best. + +\ref H5AC_cache_config_t.decrement "decrement" is the factor by which the +current maximum cache size is multiplied to obtain a tentative new maximum cache +size. It must lie in the range \Code{[0.0, 1.0]}. Relatively large values like +.9 seem to work best in my synthetic tests. Note that the actual size reduction +may be smaller as required by \ref H5AC_cache_config_t.min_size "min_size" and +\ref H5AC_cache_config_t.max_decrement "max_decrement" (discussed below). \ref +H5AC_cache_config_t.apply_max_decrement "apply_max_decrement" and \ref +H5AC_cache_config_t.max_decrement "max_decrement" allow the user to specify a +maximum decrement. If \ref H5AC_cache_config_t.apply_max_decrement +"apply_max_decrement" is \c TRUE, the cache size will never be reduced by more +than \ref H5AC_cache_config_t.max_decrement "max_decrement" bytes in any single +reduction. + +With the hit rate threshold cache size decrement algorithm, the remaining fields +in the section are ignored. + +\subsubsection acsr Ageout Cache Size Reduction + +If \ref H5AC_cache_config_t.decr_mode "decr_mode" is #H5C_decr__age_out the +cache size is decreased by the ageout algorithm, and the remaining fields of the +decrement section are used as follows: + +\ref H5AC_cache_config_t.epochs_before_eviction "epochs_before_eviction" is the +number of epochs an entry must reside unaccessed in the cache before it is +evicted. This value must lie in the range \Code{[1, H5C__MAX_EPOCH_MARKERS]}. \c +H5C__MAX_EPOCH_MARKERS is defined in H5Cprivate.h, and is currently set to 10. + +\ref H5AC_cache_config_t.apply_max_decrement "apply_max_decrement" and \ref +H5AC_cache_config_t.max_decrement "max_decrement" are used as in section +2.4.3.1. + +\ref H5AC_cache_config_t.apply_empty_reserve "apply_emty_reserve" and \ref +H5AC_cache_config_t.empty_reserve "empty_reserve" allow the user to specify a +minimum empty reserve as discussed in section 2.3.2.2. An empty reserve of 0.05 +or 0.1 seems to work well. + +The \ref H5AC_cache_config_t.decrement "decrement" and \ref +H5AC_cache_config_t.upper_hr_threshold "upper_hr_threshold" fields are ignored +in this case. + +\subsubsection awhrtcsr Ageout With Hit Rate Threshold Cache Size Reduction + +If \ref H5AC_cache_config_t.decr_mode "decr_mode" is +#H5C_decr__age_out_with_threshold, the cache size is decreased by the ageout +with hit rate threshold algorithm, and the fields of decrement section are used +as per the Ageout algorithm (see 5.3.2) with the exception of \ref +H5AC_cache_config_t.upper_hr_threshold "upper_hr_threshold". + +Here, \ref H5AC_cache_config_t.upper_hr_threshold "upper_hr_threshold" is the +threshold above which the hit rate must rise to trigger cache size reduction. It +must be in the range \Code{[0.0, 1.0]}. In my synthetic tests, high values like +.999 seemed to work well. + +\subsection parallel Parallel Configuration + +This section is a catch-all for parallel specific configuration data. At +present, it has only one field -- +\ref H5AC_cache_config_t.dirty_bytes_threshold "dirty_bytes_threshold". + +In PHDF5, all operations that modify metadata must be executed collectively. We +used to think that this was enough to ensure consistency across the metadata +caches, but since we allow processes to read metadata individually, the order of +dirty entries in the LRU list can vary across processes. This, in turn, can +change the order in which dirty metadata cache entries reach the bottom of the +LRU and are flushed to disk -- opening the door to messages from the past and +messages from the future bugs. + +To prevent this, only the metadata cache on process 0 of the file communicator +is allowed to write to file, and then only after entering a sync point with the +other caches. After it writes entries to file, it sends the base addresses of +the now clean entries to the other caches, so they can mark these entries clean +as well, and then leaves the sync point. The other caches mark the specified +entries as clean before they leave the synch point as well. (Observe, that since +all caches see the same stream of dirty metadata, they will all have the same +set of dirty entries upon sync point entry and exit.) + +The different caches know when to synchronize by counting the number of bytes of +dirty metadata created by the collective operations modifying metadata. Whenever +this count exceeds the value specified in the \ref +H5AC_cache_config_t.dirty_bytes_threshold "dirty_bytes_threshold", they all +enter the sync point, and process 0 flushes down to its minimum clean size and +sends the list of newly cleaned entries to the other caches. + +Needless to say, the value of the \ref H5AC_cache_config_t.dirty_bytes_threshold +"dirty_bytes_threshold" field must be consistent across all the caches operating +on a given file. + +All dirty metadata can also by flushed under programmatic control via the +H5Fflush() call. This call must be collective and will reset the dirty data +counts on each metadata cache. + +Absent calls to H5Fflush(), dirty metadata will only be flushed when the \ref +H5AC_cache_config_t.dirty_bytes_threshold "dirty_bytes_threshold" is exceeded, +and then only down to the H5AC_cache_config_t.min_clean_fraction +"min_clean_fraction". Thus, if a program does all its metadata modifications in +one phase, and then doesn't modify metadata thereafter, a residue of dirty +metadata will be frozen in the metadata caches for the remainder of the +computation -- effectively reducing the sizes of the caches. + +In the default configuration, the caches will eventually resize themselves to +maintain an acceptable hit rate. However, this will take time, and it will +increase the application's footprint in memory. + +If your application behaves in this manner, you can avoid this by a collective +call to H5Fflush() immediately after the metadata modification phase. + +\subsection interactions Interactions + +Evictions may not be disabled unless the automatic cache resize code is disabled +as well (by setting \ref H5AC_cache_config_t.decr_mode "decr_mode" to +#H5C_decr__off, \c flash_decr_mode to #H5C_flash_incr__add_space, and \ref +H5AC_cache_config_t.incr_mode "incr_mode" to #H5C_incr__off) -- thus placing the +cache size under the direct control of the user program. + +There is no logical necessity for this restriction. It is imposed because it +simplifies testing greatly and because I can't see any reason why one would want +to disable evictions while the automatic cache size adjustment code was +enabled. This restriction can be relaxed if anyone can come up with a good +reason to do so. + +At present, there are two interactions between the increment and decrement +sections of the configuration. + +If \ref H5AC_cache_config_t.incr_mode "incr_mode" is #H5C_incr__threshold, and +\ref H5AC_cache_config_t.decr_mode "decr_mode" is either #H5C_decr__threshold or +#H5C_decr__age_out_with_threshold, then \ref +H5AC_cache_config_t.lower_hr_threshold "lower_hr_threshold" must be strictly +less than \ref H5AC_cache_config_t.upper_hr_threshold "upper_hr_threshold". + +Also, if the flash cache size increment code is enabled and is triggered, it +will restart the current epoch without calling any other cache size increment or +decrement code. + +In both the serial and parallel cases, there is the potential for an interaction +between the \ref H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" and +the cache size increment code that can severely degrade +performance. Specifically, if the \ref H5AC_cache_config_t.min_clean_fraction +"min_clean_fraction" is large enough, it is possible that keeping the specified +fraction of the cache clean may generate enough flushes to seriously degrade +performance even though the hit rate is excellent. + +In the serial case, this is easily dealt with by selecting a very small \ref +H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" -- 0.01 for example +-- as this still avoids the "metadata blizzard" phenomenon that appears when the +cache fills with dirty metadata and must then flush all of it before evicting an +entry to make space for a new entry. + +The problem is more difficult in the parallel case, as the \ref +H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" is used to ensure +that the cache contains clean entries that can be evicted to make space for new +entries when metadata writes are forbidden -- i.e. between sync points. + +This issue was discovered shortly before release 1.8.3 and an automated solution +has not been implemented. Should it become an issue for an application, try +manually setting the cache size to ~1.5 times the maximum working set size for +the application, and leave \ref H5AC_cache_config_t.min_clean_fraction +"min_clean_fraction" set to 0.3. + +You can approximate the working set size of your application via repeated calls +to H5Fget_mdc_size() and H5Fget_mdc_hit_rate() while running your program with +the cache resize code enabled. The maximum value returned by H5Fget_mdc_size() +should be a reasonable approximation -- particularly if the associated hit rate +is good. In the parallel case, there is also an interaction between \c +min_clean_fraction and \ref H5AC_cache_config_t.dirty_bytes_threshold +"dirty_bytes_threshold". Absent calls to H5Fflush() (discussed above), the upper +bound on the amount of dirty data in the metadata caches will oscillate between +(1 - \ref H5AC_cache_config_t.min_clean_fraction "min_clean_fraction") times +current maximum cache size, and that value plus the \ref +H5AC_cache_config_t.dirty_bytes_threshold "dirty_bytes_threshold". Needless to +say, it will be best if the \ref H5AC_cache_config_t.min_size "min_size", \ref +H5AC_cache_config_t.min_clean_fraction "min_clean_fraction", and the \ref +H5AC_cache_config_t.dirty_bytes_threshold "dirty_bytes_threshold" +are chosen so that the cache can't fill with dirty data. + +\subsection defaults Default Metadata Cache Configuration + +Starting with release 1.8.3, HDF5 provides different default metadata cache +configurations depending on whether the library is compiled for serial or +parallel. + +The default configuration for the serial case is as follows: + +\code{.c} +{ + /* int version = */ H5C__CURR_AUTO_SIZE_CTL_VER, + /* hbool_t rpt_fcn_enabled = */ FALSE, + /* hbool_t open_trace_file = */ FALSE, + /* hbool_t close_trace_file = */ FALSE, + /* char trace_file_name[] = */ "", + /* hbool_t evictions_enabled = */ TRUE, + /* hbool_t set_initial_size = */ TRUE, + /* size_t initial_size = */ ( 2 * 1024 * 1024), + /* double min_clean_fraction = */ 0.01, + /* size_t max_size = */ (32 * 1024 * 1024), + /* size_t min_size = */ ( 1 * 1024 * 1024), + /* long int epoch_length = */ 50000, + /* enum H5C_cache_incr_mode incr_mode = */ H5C_incr__threshold, + /* double lower_hr_threshold = */ 0.9, + /* double increment = */ 2.0, + /* hbool_t apply_max_increment = */ TRUE, + /* size_t max_increment = */ (4 * 1024 * 1024), + /* enum H5C_cache_flash_incr_mode */ + /* flash_incr_mode = */ H5C_flash_incr__add_space, + /* double flash_multiple = */ 1.4, + /* double flash_threshold = */ 0.25, + /* enum H5C_cache_decr_mode decr_mode = */ H5C_decr__age_out_with_threshold, + /* double upper_hr_threshold = */ 0.999, + /* double decrement = */ 0.9, + /* hbool_t apply_max_decrement = */ TRUE, + /* size_t max_decrement = */ (1 * 1024 * 1024), + /* int epochs_before_eviction = */ 3, + /* hbool_t apply_empty_reserve = */ TRUE, + /* double empty_reserve = */ 0.1, + /* int dirty_bytes_threshold = */ (256 * 1024) +} +\endcode + +The default configuration for the parallel case is as follows: + +\code{.c} +{ + /* int version = */ H5C__CURR_AUTO_SIZE_CTL_VER, + /* hbool_t rpt_fcn_enabled = */ FALSE, + /* hbool_t open_trace_file = */ FALSE, + /* hbool_t close_trace_file = */ FALSE, + /* char trace_file_name[] = */ "", + /* hbool_t evictions_enabled = */ TRUE, + /* hbool_t set_initial_size = */ TRUE, + /* size_t initial_size = */ ( 2 * 1024 * 1024), + /* double min_clean_fraction = */ 0.3, + /* size_t max_size = */ (32 * 1024 * 1024), + /* size_t min_size = */ ( 1 * 1024 * 1024), + /* long int epoch_length = */ 50000, + /* enum H5C_cache_incr_mode incr_mode = */ H5C_incr__threshold, + /* double lower_hr_threshold = */ 0.9, + /* double increment = */ 2.0, + /* hbool_t apply_max_increment = */ TRUE, + /* size_t max_increment = */ (4 * 1024 * 1024), + /* enum H5C_cache_flash_incr_mode */ + /* flash_incr_mode = */ H5C_flash_incr__add_space, + /* double flash_multiple = */ 1.0, + /* double flash_threshold = */ 0.25, + /* enum H5C_cache_decr_mode decr_mode = */ H5C_decr__age_out_with_threshold, + /* double upper_hr_threshold = */ 0.999, + /* double decrement = */ 0.9, + /* hbool_t apply_max_decrement = */ TRUE, + /* size_t max_decrement = */ (1 * 1024 * 1024), + /* int epochs_before_eviction = */ 3, + /* hbool_t apply_empty_reserve = */ TRUE, + /* double empty_reserve = */ 0.1, + /* int dirty_bytes_threshold = */ (256 * 1024) +} +\endcode + +The default serial configuration should be adequate for most serial HDF5 users. + +The same may not be true for the default parallel configuration due to the +interaction between the \ref H5AC_cache_config_t.min_clean_fraction "min_clean_fraction" and the cache size increase code. See +the Interactions section for further details. + +Should you need to change the default configuration, it can be found in +H5ACprivate.h. Look for the definition of H5AC__DEFAULT_RESIZE_CONFIG. + +\section controlling Controlling the New Metadata Cache Size From Your Program + +You have already seen how \ref H5AC_cache_config_t has facilities that allow you +to control the metadata cache size directly. Use H5Fget_mdc_config() and +H5Fset_mdc_config() to get and set the metadata cache configuration on an open +file. Use H5Pget_mdc_config() and H5Pset_mdc_config() to get and set the initial +metadata cache configuration in a file access property list. Recall that this +list contains configuration data used when opening a file. + +Use H5Fget_mdc_hit_rate() to get the average hit rate since the last time the +hit rate stats were reset. This happens automatically at the beginning of each +epoch if the adaptive cache resize code is enabled. You can also do it manually +with H5Freset_mdc_hit_rate_stats(). Be careful about doing this if the adaptive +cache resize code is enabled, as you may confuse it. + +Use H5Fget_mdc_size() to get metadata cache size data on an open file. + +Finally, note that cache size and cache footprint are two different things -- in +my tests, the cache footprint (as inferred from the UNIX \c top command) is +typically about three times the maximum cache size. I haven't tracked it down +yet, but I would guess that most of this is due to the very small typical cache +entry size combined with the rather large size of the cache entry header +structure. This should be investigated further, but there are other matters of +higher priority. + +\section news New Metadata Cache Debugging Facilities + +The new metadata cache has a variety of debugging facilities that may be of +use. I doubt that any other than the report function and the trace file will +ever be accessible via the API, but they are relatively easy to turn on in the +source code. + +Note that none of this should be viewed as supported -- it is described here on +the off chance that you want to use it, but you are on your own if you do. Also, +there are no promises as to consistency between versions. + +As mentioned above, you can use the \ref H5AC_cache_config_t.rpt_fcn_enabled "rpt_fcn_enabled" field of the +configuration structure to enable the default reporting function +(H5C_def_auto_resize_rpt_fcn() in H5C.c). If this function doesn't work for you, +you will have to write your own. In particular, remember that it uses \c stdout, +so it will probably be unhappy under Windows. + +Again, remember that this facility is not supported. Further, it is likely to +change every time I do any serious work on the cache. + +There is also an extensive statistics collection code. Use +H5C_COLLECT_CACHE_STATS and H5C_COLLECT_CACHE_ENTRY_STATS in H5Cprivate.h to +turn this on. If you also turn on H5AC_DUMP_STATS_ON_CLOSE in H5ACprivate.h, +stats will be dumped when you close a file. Alternatively you can call +H5C_stats() and H5C_stats__reset() within the library to dump and reset +stats. Both of these functions are defined in H5C.c. + +Finally, the cache also contains an extensive sanity checking code. Much of this +is turned on when you compile in debug mode, but to enable the full suite, turn +on H5C_DO_SANITY_CHECKS in H5Cprivate.h. + +\section trouble Trouble Shooting + +Absent major bugs in the cache, the only troubleshooting you should have to do +is diagnosing and fixing problems with your cache configuration. + +Assuming it runs on your platform (I've only used it under Linux), the reporting +function is probably the most convenient diagnosis tool. However, since it is +unsupported code, I will not discuss it further beyond directing you to the +source (H5C_def_auto_resize_rpt_fcn() in H5C.c). + +Absent the reporting function, regular calls to H5Fget_mdc_hit_rate() should +give you a good idea of the hit rate over time. Remember that the hit rate stats +are reset at the end of each epoch (when adaptive cache resizing is enabled), so +you should expect some jitter. + +Similar calls to H5Fget_mdc_size() should allow you to monitor cache size and +the fraction of the current maximum cache size that is actually in use. + +If the hit rate is consistently low, and the cache it at its current maximum +size, increasing the maximum size is an obvious fix. + +If you see hit rate and cache size oscillations, try disabling adaptive cache +resizing and setting a fixed cache size a bit greater than the high end of the +cache size oscillations you observed. + +If the hit rate oscillations don't go away, you are probably looking at a +feature of your application that can't be helped without major changes to the +cache. Please send along a description of the situation. + +If the oscillations do go away, you may be able to come up with a configuration +that deals with the situation. If that fails, control the cache size manually, +and write to me, so I can try to develop an adaptive resize algorithm that works +in your case. + +Needless to say, you should give the cache a few epochs to adapt to +circumstances. If that is too slow for you, try manual cache size control. + +If you find it necessary to disable evictions, you may find it useful to enable +the internal statistics collection code mentioned above in the section on +debugging facilities. + +Amongst many other things, the stats code will report the maximum cache size, +and the average successful and unsuccessful search depths in the hash table. If +these latter figures are significantly above 1, you should increase the size of +the hash table. + + */
\ No newline at end of file |