| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
to clarify it includes potentially buffered data.
|
|\
| |
| | |
merge lz4opt.h into lz4hc.c
|
| |
| | |
Having a dedicated file for the optimal parser
made sense when it was created:
it allowed Przemyslaw to work more freely on lz4opt, with less dependency on lz4hc.
Moreover, the optimal parser was more complex then, with its own search functions.
Since the optimal parser was rewritten last year, it is now a lot lighter.
It makes more sense now to integrate it directly into lz4hc.c.
This makes it easier to edit (editors are a bit "lost" inside a `*.h` that depends on its #include position)
and reduces the number of files in the project,
which fits pretty well with lz4 objectives
(adding lz4hc requires "just" lz4hc.h and lz4hc.c).
|
| |
| |
| |
| | |
updated NEWS with current progress
|
|/
|
|
|
| |
notably regarding LZ4_saveDict() speed advantage,
answering #477.
|
| |
The LZ4 block format specification
states that the last match must start
at a minimum distance of 12 bytes from the end of the block.
However, out of an abundance of caution,
the reference implementation would actually stop searching matches
at 13 bytes from the end of the block.
This patch fixes this small detail.
The new version is now able to properly compress a limit case
such as `aaaaaaaabaaa\n`
as reported by Gao Xiang (@hsiangkao).
Obviously, it doesn't change a lot of things.
This is just one additional match candidate per block, with a maximum match length of 7 (since last 5 bytes must remain literals).
With the default policy, blocks are 4 MB long, so it doesn't happen too often.
Compressing silesia.tar at default level 1 saves 5 bytes (100930101 -> 100930096).
At max level 12, it saves a grand 16 bytes (77389871 -> 77389855).
The impact is a bit more visible when blocks are smaller, hence more numerous.
For example, compressing silesia with blocks of 64 KB (using -12 -B4D) saves 543 bytes (77304583 -> 77304040).
So the smaller the packet size, the more visible the impact.
And it happens we have a ton of scenarios with little blocks using LZ4 compression ...
And a useless "hooray" sidenote :
the patch improves the LZ4 compression record of silesia (using -12 -B7D --no-frame-crc) by 16 bytes (77270672 -> 77270656)
and the record on enwik9 by 44 bytes (371680396 -> 371680352) (previously claimed by [smallz4](http://create.stephan-brumme.com/smallz4/) ).
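The end-of-block arithmetic described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the two constants carry the values quoted in the message (12 and 5), while the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the commit message; the names are illustrative only. */
#define MATCH_START_MARGIN 12  /* last match starts at least 12 bytes before block end */
#define LAST_LITERALS       5  /* the last 5 bytes of a block must remain literals */

/* Highest position (0-based) at which a match may still start. */
size_t last_match_start(size_t blockSize)
{
    assert(blockSize >= MATCH_START_MARGIN);
    return blockSize - MATCH_START_MARGIN;
}

/* Longest match allowed when it starts at position `pos`. */
size_t max_match_length(size_t blockSize, size_t pos)
{
    assert(pos + LAST_LITERALS <= blockSize);
    return blockSize - LAST_LITERALS - pos;
}
```

For the 13-byte input `aaaaaaaabaaa\n`, `last_match_start(13)` is 1 and `max_match_length(13, 1)` is 7. With a 13-byte margin, the only eligible start would be position 0, where an independent block has no earlier data to match against.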
|
|\
| |
| | |
Faster HC
|
| |
| |
| |
| | |
suggested by @terrelln
|
| |
| |
| |
| | |
by optimizing countback
|
| |
| |
| |
| | |
by removing bad candidates faster.
|
| |
| |
| |
| | |
answering question #473
|
| |
| |
| |
| | |
better use memcpy() directly
|
|/
|
|
| |
by making shortcut slightly more common
|
|
|
|
|
|
|
| |
On Windows, the Intel compiler is closer to MSVC than to GCC and
does not support the GCC attribute syntax.
Fixes #468
|
|
|
|
|
| |
Also clarified a few API code comments
and updated associated html documentation
|
| |
|
|\ |
|
| | |
|
| |
| |
| |
| |
| |
| | |
When using clang++ with -std=c++14 or -std=c++17, including "lz4.h" would fail with the error "an attribute list cannot appear here", because the visibility attribute came before the C++ attribute.
This ensures that the [[deprecated]] C++ attribute comes before everything
else in the function declarations.
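The ordering constraint can be sketched with a pair of macros. All names here (`DEMO_DEPRECATED`, `DEMO_API`, `demo_add`, `demo_add_old`) are hypothetical and are not the ones used in lz4.h; the point is only that the deprecation marker is placed first in the declaration, ahead of the visibility attribute.

```c
#include <assert.h>

/* Hypothetical macro names, for illustration only. */
#if defined(__cplusplus) && (__cplusplus >= 201402L)
#  define DEMO_DEPRECATED(msg) [[deprecated(msg)]]              /* C++14 attribute */
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_DEPRECATED(msg) __attribute__((deprecated(msg))) /* GCC-style syntax */
#elif defined(_MSC_VER)
#  define DEMO_DEPRECATED(msg) __declspec(deprecated(msg))      /* MSVC-style syntax */
#else
#  define DEMO_DEPRECATED(msg)                                  /* unknown compiler: no-op */
#endif

#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_API __attribute__((visibility("default")))
#else
#  define DEMO_API
#endif

/* Deprecation marker first, visibility attribute second. */
DEMO_DEPRECATED("use demo_add() instead") DEMO_API int demo_add_old(int a, int b);
DEMO_API int demo_add(int a, int b);

int demo_add(int a, int b) { return a + b; }
int demo_add_old(int a, int b) { return demo_add(a, b); }
```

Calling `demo_add_old()` should produce a deprecation warning on compilers that support one of the branches above, while `demo_add()` compiles silently.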
|
|/
|
|
| |
to better reflect LZ4F API usage.
|
| |
|
|
|
|
|
|
|
| |
- Replace U+00A0 by space
- Fix build failure of archivers/py-borgbackup in FreeBSD
Reference: https://bugs.FreeBSD.org/225235
|
|
|
|
|
| |
ensure some strange jump cases are not possible
(they were already impossible, but the static analyzer couldn't understand it).
|
|
|
|
|
| |
with an assert()
to help the static analyzer understand this condition.
|
|
|
|
|
|
| |
The previous version used an intentional overflow,
which is well defined since it uses an unsigned type,
but the static analyzer complains about it.
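The message does not show which expression was changed, so the following is only a generic illustration of the idiom, with hypothetical function names. It contrasts a range check that relies on unsigned wraparound with an explicit form that a static analyzer has no reason to flag.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound form: when x < lo, the unsigned subtraction wraps to a very
 * large value, so one comparison rejects both x < lo and x >= hi. */
int in_range_wrapping(uint32_t x, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return (uint32_t)(x - lo) < (uint32_t)(hi - lo);
}

/* Explicit form: two comparisons, no wraparound involved. */
int in_range_explicit(uint32_t x, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return (x >= lo) && (x < hi);
}
```

Both functions return the same result for every input with `lo <= hi`; the explicit form trades one extra comparison for code that is easier for tools (and readers) to verify.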
|
| |
|
| |
|
|\
| |
| | |
[lz4f] Skip memcpy() on empty dictionary
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In some contexts, *cough*like at facebook*cough*, dynamic linking is used in
contexts which aren't truly dynamic. That is, the guarantee is maintained that
a program will only ever execute against the library version it was compiled
to interact with.
For those situations, introduce a compile-time flag that overrides hiding
these unstable APIs in shared objects.
|
|\ \
| | |
| | | |
conditional pattern analysis
|
| |/
| |
| | |
Pattern analysis (currently limited to long ranges of identical bytes)
is actually detrimental to performance
when `nbSearches` is low.
The reason is that `nbSearches` provides a built-in protection for these cases.
The problem with patterns is that they dramatically increase the number of candidates to visit.
But with a low `nbSearches`, the match finder just aborts early.
In such cases, pattern analysis adds some complexity without reducing the total number of candidates.
It actually increases compression ratio a little bit, by filtering only "good" candidates,
but at a measurable speed cost, so it's not a good trade-off.
This patch makes pattern analysis optional.
It's enabled for levels 8+ only.
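The interplay between the search budget and the level gate can be sketched as below. This is a simplified model under stated assumptions, not the lz4hc match finder: the chain layout, `NO_CANDIDATE`, and both function names are hypothetical; only the `nbSearches` budget and the "levels 8+" threshold come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CANDIDATE ((size_t)-1)   /* hypothetical end-of-chain marker */

/* Walks a chain of candidate positions, where next[i] is the candidate
 * following i. Stops when the chain ends or the budget is exhausted.
 * Returns the number of candidates visited. */
size_t visit_candidates(const size_t* next, size_t first, int nbSearches)
{
    size_t visited = 0;
    size_t cur = first;
    while (cur != NO_CANDIDATE && nbSearches > 0) {
        visited++;          /* a real match finder would compare bytes here */
        nbSearches--;
        cur = next[cur];
    }
    return visited;
}

/* Threshold taken from the commit message: pattern analysis for levels 8+. */
int pattern_analysis_enabled(int level)
{
    return level >= 8;
}
```

With a small budget, the walk ends after a handful of candidates no matter how long the chain is, which is why extra filtering brings little benefit at low levels.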
|
|/
|
|
| |
no longer limited to level 9
|
| |
lz4opt is only competitive vs lz4hc level 10.
Below that level, it doesn't match the speed / compression effectiveness of the regular hc parser.
This patch proposes to extend lz4opt to levels 10-12.
The new level 10 tends to compress a bit better and a bit faster than the previous one (mileage varies depending on the file).
The only downside is that `limitedDestSize` mode is now limited to max level 9 (vs 10),
since it's only compatible with the regular HC parser.
(Note : I suspect it's possible to convert lz4opt to support it too, but haven't spent time on it).
|
|
|
|
|
| |
deprecated in newer C++ versions,
and of dubious utility
|
|
|
|
|
| |
updated relevant doc.
This patch has no impact on ABI/API, nor on binary generation.
|
|\
| |
| | |
Improve Optimal parser
|
| | |
|
| |
| |
| |
| |
| | |
which is more explicit than its value `3`.
reported by @terrelln
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
for multi-bytes patterns
(which is not useful for the time being)
|
| |
| | |
The first byte used to be skipped
to avoid an infinite self-comparison.
This is no longer necessary, since init() ensures that index starts at 64K.
The first byte is also useless to search when each block is independent,
but that is no longer the case when blocks are linked.
Removing the first-byte-skip saves
about 10 bytes / MB on files compressed with -BD4 (linked blocks of 64 KB),
which feels correct as each MB has 16 blocks of 64 KB.
|
| | |
|
| | |
|
| |
| |
| |
| | |
as reported by @terrelln
|
| |
| |
| |
| |
| | |
works for any repetitive pattern of length 1, 2 or 4 (but not 3!)
works for any endianness
|
| |
| |
| |
| |
| |
| |
| | |
- works with byte values other than `0`
- works for any repetitive pattern of length 1, 2 or 4 (but not 3!)
- works for little and big endian systems
- preserves the speed of the previous implementation
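Why periods of 1, 2 or 4 work but 3 does not can be seen in a byte-wise sketch. This is an assumption-laden illustration with a hypothetical function name; it compares one byte at a time for clarity, which keeps it independent of endianness but does not reflect the speed of the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many bytes, starting at `p`, keep repeating the 4-byte sample
 * found at `p`. Any period that divides 4 (1, 2 or 4) is covered by the
 * 4-byte sample; a period of 3 is not. Requires at least 4 readable bytes. */
size_t count_pattern4(const unsigned char* p, const unsigned char* end)
{
    size_t n = 0;
    assert(end - p >= 4);
    while (p + n < end && p[n] == p[n & 3]) n++;
    return n;
}
```

On `abababXY` the count is 6 (period 2), on `abcdabcdab` it is 10 (period 4), while on `abcabcabc` it stops at 4, the sample itself, because a period of 3 does not line up with a 4-byte sample.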
|
| |
| |
| |
| | |
dead assignment
|
| | |
|