| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Other file recovery or Journaling related documents can be kept in this file
for now. Will get them more organized later.
Tested: eyeballed.
| |
Bring changes from trunk from the time the branch was created (r14280)
up to the 1.8.0 release (r14525) back into the metadata journaling branch.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
| |
value into
journal entries to be used by the recovery tool.
This value is only really needed once per transaction, and only when
the EOA changes, so rather than putting it into each journal entry,
this should be moved into its own transaction type. However, in order
to speed testing along, this quick fix has been implemented for the
time being.
Modified the h5recover tool to use the EOA value, and updated the journaling
tests accordingly.
Tested: kagiso
| |
* How to use this:
* ./enable_journaling # create JournalEG.h5 file
* ./enable_journaling -r # reopen JournalEG.h5 with Journaling on and
* # add more rows, then crash.
* ./h5recover -j JournalEG.h5.jnl JournalEG.h5 # to recover the file.
* ./enable_journaling -p # patch it with metadata of the added rows.
* Then JournalEG.h5 should have all the expected written rows and data.
Tested: kagiso. (-r failed with a library assertion error.)
| |
H5Ppublic.h.
Added H5AC2public.h to hdf5.h.
Tested: kagiso.
| |
Correct 'serialize' callback to add file pointer.
Tested on:
Linux/32 2.6 (kagiso) w/parallel
| |
Sun does not like a variable and a function sharing the common name H5DIFF.
Renamed the function to MYH5DIFF.
trecover_writer.c:
Changed the dataset datatype to the machine-independent type H5T_STD_I32LE.
This makes the h5dump output easier to match against the expected output.
Tested: smirom, linew, kagiso (serial passed).
| |
Added a pointer to the cache that an entry is contained within to the
cache entry structure. This allows us to remove the file pointer from some of
the H5AC2 calls, easing the conversion of some of the cache clients (the free
space section info and fractal heap direct blocks, and probably others).
Removed file pointer from the H5AC2_unpin_entry() call.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.5.2 (amazon) in debug mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
| |
Description: Changed H5C2_jb__journal_entry function to make a copy of the
incoming journal entry before doing anything with it. I was seeing
errors in the journals produced by using the pointer passed to me,
so copying the data beforehand looks to solve the problem.
Also made a quick change to h5recover.c to use generated fapl
when opening the recovered HDF5 file. (was previously using
H5P_DEFAULT).
Tested: kagiso
| |
Updated to start using the real h5recover tool. The tests are not passing yet,
so the script had to be patched not to exit 1 at all.
Also, the h5diff tool has errors, so I made my own diff using h5dump output.
trecover_main.c:
Changed the default to generate chunked storage datasets only, since that
is the only kind the journal code can handle.
default.txt:
async_crash.txt:
updated them to use current output which is not right anyway.
Doing all these so that other team members can work on their code.
Tested: kagiso.
This line, and those below, will be ignored--
M h5recover/testh5recover.sh.in
M h5recover/testfiles/default.txt
M h5recover/testfiles/async_crash.txt
M h5recover/trecover_main.c
| |
Tested: kagiso.
| |
Has a temporary patch option (-p) to add datasets back in until the object
header code works.
Makefile.am: also added more cleaning of temporary generated files.
tested: kagiso.
| |
Has a temporary patch option (-p) to add datasets back in until the object
header code works.
tested: kagiso.
| |
Convert the symbol table node metadata cache client to use the new
journaling cache callbacks.
Also added a 'H5F_t *' parameter to the 'serialize' callback for the
journaling cache, which makes the client's job much easier.
Various minor coding cleanups, etc. also.
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.5.3 (amazon) in debug mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
| |
Description: Adding recovery tool to the metadata_journaling repository. The
tool still needs to go through some tweaks, especially regarding
syntax changes, grabbing the journal name from an hdf5 file,
confirming successful uncorruption of file, et cetera, but this
should be enough to give Albert a chance to start using it in
the trecover tests so we can work through additional debugging
issues together to get that to run.
Tested: kagiso
| |
Remove all test files before creating them anew. (This was
partly because the journal code does not handle an existing
journal file well.)
Changed testh5recover.sh to exit 1 if errors are encountered.
The old way of "exit $nerrors" could report success if $nerrors happened
to be a multiple of the exit code limit (usually 1 byte = 256 values).
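The wrap-around described above can be sketched in C. This is only an illustration of the arithmetic, not code from the test script: on POSIX systems a parent process sees just the low 8 bits of a child's exit status, and both function names below are hypothetical.

```c
/* Exit status a parent process would observe through wait():
 * only the low 8 bits of the requested value survive. */
int observed_exit_status(int requested)
{
    return requested & 0xFF;
}

/* Hypothetical helper mirroring the fix: collapse any positive
 * error count to 1 so that a failure can never wrap around to 0. */
int safe_exit_status(int nerrors)
{
    return nerrors > 0 ? 1 : 0;
}
```

With 256 errors, observed_exit_status(256) is 0, which looks like success, while safe_exit_status(256) is 1.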
| |
Rename H5Pset_fapl_journal as H5Pset_journal.
Use the public constant of H5AC2__CURR_CACHE_CONFIG_VERSION.
Tested: h5committested. (tools/h5recover/testh5recover.sh had failures, but
that was because of ../h5diff/h5diff's failure.)
| |
* Updates to test opening of a file created with journaling,
along with associated debugging modifications.
(Mike M.: To get journal deletion to work correctly, I
had to modify H5C2_jb__init() to allocate a buffer for the
journal file name and copy it into the buffer. Similarly,
I had to modify H5C2_jb__takedown() to free the buffer.
The fix was hurried, and should be reviewed. Also, a
similar fix is probably in order for the HDF5 file name.)
* Fix for the bug Albert reported on Linew.
* An attempt to apply the changes Quincey requested to the
loc_id parameters to the FUNC_ENTER_API_META macro calls
in:
H5Gmove2(), (src_loc_id --> dst_loc_id)
H5Lcopy(), (src_loc_id --> dst_loc_id)
H5Lmove(), (src_loc_id --> dst_loc_id)
H5Glink2(), (cur_loc_id --> new_loc_id)
H5Lmove() (cur_loc_id --> new_loc_id)
However, with the exception of the requested change to
H5Gmove2(), all these changes caused us to fail the
regression tests. Thus only the H5Gmove2() change is
made.
Several caveats and warnings:
* If you build and test this checkin, it will fail on
the test for trecover.
This showed up after I updated my project, so initially
I thought I had broken something. However, after examining
the problem for a while, I thought to checkout the version
prior to this checkin, and test to see if the problem appeared.
It did (under serial on phoenix, and parallel on Kagiso),
so I am going ahead with this checkin regardless, under the
assumption that it is orthogonal to my changes.
* Low level testing for the journaling feature of the metadata
cache is not complete. The coverage of the existing tests
is good enough that I don't expect anything major, but don't
be surprised if you run into problems around the edges.
In particular, enabling and disabling journaling while the
file is open has not been tested at all. Suggest we stay
away from this until it gets at least a once over.
* The metadata journaling smoke check tests in cache2_journal.c
are still configured to generate the archetype files used to
check journal output. This can be turned off any time, but
given Quincey's constraints on test file size, I have to write
code to skip the tests if the archetype files are missing,
and then put compressed versions of the archetype files in
svn before I do so. Unfortunately, there is no time before
I leave.
* I left a good bit of debugging code in both the journaling
code proper, and the associated test code. It should all
be #if 0'ed out at present, but if you run into it, you
know what is going on. Needless to say, I'll delete it when
I finish testing.
* I was not able to reproduce the bug Albert observed on RedStorm
locally, so I don't have a fix for it. That said, I touched
some things that could have caused it, so it is possible that
I fixed it by accident.
Testing:
Before I updated, I was able to build and test serial on Phoenix
and Linew, and parallel on Kagiso without errors in the regression
tests.
As discussed above, after the update, I failed in the test for
trecover in a serial build and test on Phoenix, and parallel build
and test on Kagiso. Linew is slow, so I didn't attempt a test there.
Since the same failures appear in the version prior to this checkin,
I am going ahead with the checkin regardless, on the assumption that
the problem is orthogonal to my changes.
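The journal file name fix mentioned in this checkin (copy the name into an owned buffer at init time, free it at takedown) follows a common pattern. A minimal sketch, assuming a simplified stand-in structure: struct jbuf_sketch, jbuf_init() and jbuf_takedown() are hypothetical names, not the H5C2_jb__init() / H5C2_jb__takedown() code itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the journal buffer structure. */
struct jbuf_sketch {
    char *journal_file_name; /* owned copy, released in jbuf_takedown() */
};

/* Copy the caller's name so the buffer never depends on the caller's
 * storage outliving it. Returns 0 on success, -1 on failure. */
int jbuf_init(struct jbuf_sketch *jb, const char *name)
{
    size_t n;

    if (jb == NULL || name == NULL)
        return -1;
    n = strlen(name) + 1; /* include the terminating NUL */
    jb->journal_file_name = malloc(n);
    if (jb->journal_file_name == NULL)
        return -1;
    memcpy(jb->journal_file_name, name, n);
    return 0;
}

/* Release the owned copy and clear the pointer. */
void jbuf_takedown(struct jbuf_sketch *jb)
{
    if (jb == NULL)
        return;
    free(jb->journal_file_name);
    jb->journal_file_name = NULL;
}
```

Because the structure owns its copy, the name is still valid when the journal file has to be deleted, even if the caller's string has been changed or freed in the meantime.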
| |
new feature.
Description:
Added H5Pset_fapl_journal() to provide an API to turn journal on. This is
just a preliminary implementation. More features will be added later when
they are available.
Tested:
h5committest.
| |
Description: Removing a line of what looks to be debugging code
from H5AC2.c, which sets the transaction number to 1 before
leaving the begin_transaction routine.
Tested: kagiso
| |
Description: Fixing a couple of bugs in the journal logging code. Primarily,
H5C2_jb__bin2hex was WAY inefficient. I've cleaned it up a bit.
cache2_journal now finishes in less than a minute on kagiso. Also,
__DATE__ preprocessor macros are no longer used to generate
journal file headers.
Tested: kagiso
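For readers unfamiliar with the conversion being discussed, a table-driven binary-to-hex routine can be sketched as follows. This is an illustrative assumption about one efficient approach, not the actual H5C2_jb__bin2hex() code; the name bin2hex_sketch and its signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: write len bytes of buf as lowercase hex into out.
 * out must hold 2 * len + 1 characters. Returns 0 on success, or -1 if
 * the output buffer is too small. One table lookup per nibble, with no
 * per-byte formatting calls. */
int bin2hex_sketch(const unsigned char *buf, size_t len,
                   char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    if (out_size == 0 || len > (out_size - 1) / 2)
        return -1;
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[buf[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[buf[i] & 0x0F]; /* low nibble */
    }
    out[2 * len] = '\0';
    return 0;
}
```

The work is linear in the input length, which is usually what matters when every journal entry passes through the conversion.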
| |
I now have substantial tests for this code -- enough (I hope) for
Mike M. to get started. However, the code is by no means fully tested.
I don't expect any obvious problems, but there are probably quite a few
relatively subtle bugs remaining. I'll be chasing these in the next
week.
For an example of setting up the cache to journal, see
setup_cache_for_journaling() in test/cache2_journal.c
Warnings:
1) For now, only enable journaling at file creation time -- code to
do this after the file is opened exists, but it hasn't been tested.
2) Right now the journal logging code is very inefficient, so expect
things to run slowly until Mike M. checks in his changes to address
this problem.
3) I have not checked in exemplar journal output files pending a fix for
another minor bug in the journal logging code. Until then, the
journal tests create exemplars and then test against them -- a poor
way to find errors.
4) The USE_CORE_DRIVER has been moved to cache2_common.h.
5) When USE_CORE_DRIVER is FALSE, cache2_journal runs VERY slowly on
some systems (e.g. 4 hours on Phoenix) -- but it runs fast on Kagiso
(~10 minutes). Don't know why, but would guess that the quantity
of RAM on the system has much to do with it.
Tested serial debug on Phoenix, and parallel debug on Kagiso
| |
While this code doesn't break any of the existing tests, it HAS NOT
been tested beyond that.
Also mods needed to integrate the journaling code with Quincey's latest
mods, and to adapt existing test code to slight changes caused by the
addition of journaling.
Finally, fixed an undefined variable bug in the HL code exposed by the
journaling mods.
Tested serial under Linux (Phoenix) and parallel under Linux (Kagiso).
| |
Add "_META" suffix to FUNC_ENTER/FUNC_LEAVE API routines that can modify
metadata in the file. This will give us a single place to change in order to
record the beginning & ending of transactions.
Tested on:
FreeBSD/32 6.2 (duty)
Too simple to require h5committest
| |
Correctly initialize cache_info struct for cloned B-tree node
(correcting memory corruption issue on Linux machines)
Remove "//" comments in cache2_common.c and replace them with "/* */"
comments.
Back out h5diff hack in testh5recover.sh.in, Albert already fixed it.
Still building on tg-login3 for parallel testing...
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.5.2 (amazon) in debug mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
| |
Misc. cleanups found while compiling in other environments.
Still failing on linux machines with a memory corruption error and
not finished building in parallel yet either...
Tested on:
FreeBSD/32 6.2 (duty) in debug mode
FreeBSD/64 6.2 (liberty) w/C++ & FORTRAN, in debug mode
Linux/32 2.6 (kagiso) w/PGI compilers, w/C++ & FORTRAN, w/threadsafe,
in debug mode
Linux/64-amd64 2.6 (smirom) w/default API=1.6.x, w/C++ & FORTRAN,
in production mode
Linux/64-ia64 2.6 (cobalt) w/Intel compilers, w/C++ & FORTRAN,
in production mode
Solaris/32 2.10 (linew) w/deprecated symbols disabled, w/C++ & FORTRAN,
w/szip filter, in production mode
Mac OS X/32 10.4.10 (amazon) in debug mode
Linux/64-ia64 2.4 (tg-login3) w/parallel, w/FORTRAN, in production mode
| |
Switch v1 B-tree nodes from using previous cache to use the new journaling
cache code. This is a major switch for the cache callbacks.
Switched the metadata caching code from having a pointer to the file it's
in to receiving a pointer to the file, when needed. This was necessary in
order to avoid crashing when two file IDs were open on the same underlying
file and one of those files was closed with cache entries using the file
pointers. Also took out the check in the caching code for reading off the
end of the file, which didn't play nicely with the multi-file VFD.
Switching the cache from having a pointer internally to requiring one
externally meant tweaking almost all the test code. :-/
Added correct exit codes to cache2 tests that didn't have them already,
so the 'make check' will stop when they fail.
Use the path to the h5diff in this build in the tools/h5recover testing
script, since we can't guarantee a user has HDF5 already installed.
Assorted minor tweaks to get everything to compile more cleanly and pass
all the tests (on my Mac :-).
Tested on:
Mac OS X (10.5.2) w/C++
(more testing coming up shortly, I just didn't have my "rsync testbed" set
up for this branch when I started making changes to the code)
| |
got interrupted by ALARM.
Eliminate all printf() calls during signal handling. Also, _exit() sometimes hangs the
Red Storm processes, so replaced it with SIGTERM.
Moved trecover away from TEST_PROG because it now always exits with non-zero status,
being ended by SIGTERM.
Tested: Kagiso and Red Storm.
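As background for the printf() removal: printf() is not async-signal-safe, so a handler that calls it can deadlock or corrupt stdio state if the signal arrives while the main program is inside stdio. A minimal sketch of the usual alternative, in which the handler only records that the signal arrived; the function names are hypothetical and this is not the trecover code.

```c
#include <signal.h>

/* Set from the handler, read from normal program flow. */
static volatile sig_atomic_t got_signal = 0;

/* Handler does nothing except set a flag, which is safe in a
 * signal context. Any reporting happens later, outside the handler. */
static void record_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo. Returns 0 on success, -1 on failure. */
int install_flag_handler(int signo)
{
    return signal(signo, record_signal) == SIG_ERR ? -1 : 0;
}

/* Non-zero once the handler has run at least once. */
int signal_was_received(void)
{
    return got_signal != 0;
}
```

The main loop can poll signal_was_received() and do its printing there, where stdio calls are safe.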
| |
Red Storm did not like using "touch" as the fake h5recover tool.
Added h5recover.c to provide a dummy executable to make RS happy.
Tested:
Kagiso, smirom, linew, Red Storm. (But Red Storm now fails when trecover uses
Async crash.)
| |
The h5diff on linew could not handle files created by this version of
the library. Changed the test script to use the h5diff generated by this
version itself.
Tested:
Linew and smirom. (Did not test on kagiso since the library is failing
there somewhere else.)
| |
Also added entry for ./test/cache2_journal.c.
| |
1) Code to read and write the metadata journaling configuration block
and associated test code. (Quincey: Just recalled that I have not
converted the memory type of the metadata journaling configuration
block to H5FD_MEM_SUPER per our email conversation. It will be in
the next checkin.)
2) Dummy begin/end transaction calls on the off chance that Quincey
gets to working on this before I check in the real ones.
3) Updates to cache2 in test to reduce the size of the test according
to the value of the HDF5TestExpress environment variable. Run times
on Phoenix using the core file driver are as follows:
HDF5TestExpress = 0 20:20
HDF5TestExpress = 1 4:56
HDF5TestExpress = 2 3:18
HDF5TestExpress = 3 0:25
With HDF5TestExpress = 3, I skip the smoke checks entirely. With
HDF5TestExpress = 2, I set the number of iterations as low as it
can go without a major re-write. (Albert: I hope running with
HDF5TestExpress = 2 will work for you. If it doesn't, we will just
have to run with HDF5TestExpress = 3 on RedStorm, and skip the
smoke checks for now.)
Tested serial on Phoenix and parallel on kagiso.
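The scaling described in item 3 can be pictured with a small sketch. Everything here is an assumption for illustration: the function names are hypothetical, the treatment of an unset or invalid variable as level 0 is a guess, and the level 1 divisor is made up. Only the behaviour at levels 2 and 3 (minimal iterations, and skipping the smoke checks) follows the description above.

```c
#include <stdlib.h>

/* Parse an express level string into 0..3.
 * Assumption: NULL, empty, or invalid input means level 0 (full test). */
int parse_express_level(const char *value)
{
    long level;
    char *end;

    if (value == NULL || *value == '\0')
        return 0;
    level = strtol(value, &end, 10);
    if (*end != '\0' || level < 0)
        return 0;
    return level > 3 ? 3 : (int)level;
}

/* Read the level from the HDF5TestExpress environment variable. */
int current_express_level(void)
{
    return parse_express_level(getenv("HDF5TestExpress"));
}

/* Hypothetical mapping from level to smoke-check iteration count. */
unsigned smoke_check_iterations(unsigned full_count, int level)
{
    switch (level) {
    case 0:  return full_count;              /* full test */
    case 1:  return full_count / 4;          /* made-up reduction */
    case 2:  return full_count > 0 ? 1 : 0;  /* as low as it can go */
    default: return 0;                       /* skip smoke checks */
    }
}
```

A test driver would call current_express_level() once at startup and pass the result to each smoke check.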
| |
tests.
Description: Journal entry logging code has been upgraded. The code
now accurately tracks transactions that have made it
to disk, and converts data from binary to hex for
entry into the journal file. Various other minor
code modifications have been made based on suggestions
by John Mainzer.
More tests have been added to verify the functionality
of the logging code. Specifically, tests have been
written to verify functionality of the ring buffer, to
check validity of journal messages produced, and to
verify that transaction tracking is functioning
appropriately.
Tested: kagiso, smirom, linew, duty
| |
journaling
superblock extension message. Created the file src/H5Omdj_msg.c and
moved journaling superblock extension message to it.
Tested serial on Phoenix, and parallel on Kagiso.
| |
non-crashed control
file. Also, writer() can now generate different datasets as directed.
Tested: kagiso.
| |
and h5recover
tools. (note, h5recover is not available yet. Used "touch" as a dummy
program.)
Tested: kagiso.
| |
Tested: kagiso.
| |
Tested on kagiso.
| |
bug fix.
Description:
Functions H5O_mdj_conf_decode and H5O_mdj_conf_debug had the wrong
function names in their FUNC_ENTER_NOAPI_NOINIT() lines. This caused
linew (SunOS10) to fail linking, as it was looking for those mistyped
function names in its own module.
Solution:
Fixed the mistyped names.
Tested:
It took way too long to run make check: cache2 ran for an hour
on linew and was still not done. At least all three machines can build
the binary now.
| |
Tested: None, comment modifications only.
| |
the journaling branch. Added several new errors, and made many edits
to Mike's code (don't worry Mike -- changes I made were on items I
neglected to discuss with you).
Serial test on Phoenix only.
| |
note that both the H5C and H5C2 code have been updated.
Also checked in code to track journaling status in the super block.
Note that this code has not been tested -- but as best I can tell,
it does not break the existing regression tests.
Tested serial (debug and production) on Phoenix. Also tested parallel
on kagiso.
Note that regression test fails on kagiso (but not on phoenix) if
the cache2 serial tests are configured to use the core file driver.
Thus this code is checked in with the core file driver optimization
of the cache2 tests disabled. To turn it on, set the USE_CORE_DRIVER
#define to TRUE.
| |
Added async crash support.
Added command line option support.
Tested:
kagiso, smirom, linew.
| |
file recovery after a crash.
Tested:
linux (kagiso), linux64 (smirom), Solaris 10 (linew, in progress).
| |
the checkout.
They seemed to be cosmetic changes, such as reformatting the column width.
I went ahead and committed them so that they will not get mixed up with the
changes I am planning for adding tools/h5recover.
| |
which has a lot
of nice new features.
(ported revisions through r14312 from trunk)
| |
driver.
This is disabled at present due to failures on Kagiso, but can be turned
on by setting USE_CORE_DRIVER to TRUE at the top of cache2_common.c
Also modified cache2.c to turn on the full set of tests.
| |
branch -- this commit is needed as I forgot to svn add the new files
created in support of metadata journaling.
Again, this version may not compile.