From 44793b8be9f18bb51f524b3a210de11bb0df6654 Mon Sep 17 00:00:00 2001
From: Yann Collet
Date: Mon, 30 Mar 2015 18:32:21 +0100
Subject: Updated documentation

---
 README.md           | 46 +++++++++++++---------------------------------
 lz4_Block_format.md | 25 ++++++++++++++-----------
 2 files changed, 27 insertions(+), 44 deletions(-)

diff --git a/README.md b/README.md
index f960e7d..275085e 100644
--- a/README.md
+++ b/README.md
@@ -20,41 +20,21 @@ A high compression derivative, called LZ4_HC, is also provided. It trades CPU ti
 
 Benchmarks
 -------------------------
-The benchmark uses the [Open-Source Benchmark program by m^2 (v0.14.2)](http://encode.ru/threads/1371-Filesystem-benchmark?p=33548&viewfull=1#post33548) compiled with GCC v4.6.1 on Linux Ubuntu 64-bits v11.10,
-The reference system uses a Core i5-3340M @2.7GHz.
+The benchmark uses the [Open-Source Benchmark program by m^2 (v0.14.3)](http://encode.ru/threads/1371-Filesystem-benchmark?p=33548&viewfull=1#post33548) compiled with GCC v4.8.2 on Linux Mint 64-bits v17.
+The reference system uses a Core i5-4300U @1.9GHz.
 
 Benchmark evaluates the compression of reference [Silesia Corpus](http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia) in single-thread mode.
-Compressor | Ratio | Compression | Decompression
-LZ4 (r101) | 2.084 | 422 MB/s | 1820 MB/s
-LZO 2.06 | 2.106 | 414 MB/s | 600 MB/s
-QuickLZ 1.5.1b6 | 2.237 | 373 MB/s | 420 MB/s
-Snappy 1.1.0 | 2.091 | 323 MB/s | 1070 MB/s
-LZF | 2.077 | 270 MB/s | 570 MB/s
-zlib 1.2.8 -1 | 2.730 | 65 MB/s | 280 MB/s
-LZ4 HC (r101) | 2.720 | 25 MB/s | 2080 MB/s
-zlib 1.2.8 -6 | 3.099 | 21 MB/s | 300 MB/s
-
-The LZ4 block compression format is detailed within [lz4_block_format.txt](lz4_block_format.txt).
+| Compressor | Ratio | Compression | Decompression |
+| ---------- | ----- | ----------- | ------------- |
+|**LZ4 (r129)** | 2.101 |**385 MB/s** |**1850 MB/s** |
+| LZO 2.06 | 2.108 | 350 MB/s | 510 MB/s |
+| QuickLZ 1.5.1.b6 | 2.238 | 320 MB/s | 380 MB/s |
+| Snappy 1.1.0 | 2.091 | 250 MB/s | 960 MB/s |
+| zlib 1.2.8 -1 | 2.730 | 59 MB/s | 250 MB/s |
+|**LZ4 HC (r129)** |**2.720**| 22 MB/s |**1830 MB/s** |
+| zlib 1.2.8 -6 | 3.099 | 18 MB/s | 270 MB/s |
+
+The LZ4 block compression format is detailed within [lz4_Block_format](lz4_Block_format.md).
 
 For streaming unknown amount of data and compress files of any size, a frame format has been published, and can be consulted within the file LZ4_Frame_Format.html .
diff --git a/lz4_Block_format.md b/lz4_Block_format.md
index e248fd9..b933a6a 100644
--- a/lz4_Block_format.md
+++ b/lz4_Block_format.md
@@ -1,10 +1,9 @@
 LZ4 Block Format Description
 ============================
-Last revised: 2015-03-26;
+Last revised: 2015-03-26.
 Author : Yann Collet
 
-
 This small specification intents to provide enough information
 to anyone willing to produce LZ4-compatible compressed data blocks
 using any programming language.
@@ -26,7 +25,8 @@ on implementation details of the compressor, and vice versa.
 Compressed block format
 -----------------------
 An LZ4 compressed block is composed of sequences.
-Schematically, a sequence is a suite of literals, followed by a match copy.
+A sequence is a suite of literals (not-compressed bytes),
+followed by a match copy.
 
 Each sequence starts with a token.
 The token is a one byte value, separated into two 4-bits fields.
@@ -35,14 +35,14 @@ Therefore each field ranges from 0 to 15.
 
 The first field uses the 4 high-bits of the token.
 It provides the length of literals to follow.
-(Note : a literal is a not-compressed byte).
+
 If the field value is 0, then there is no literal.
 If it is 15, then we need to add some more bytes to indicate the full length.
-Each additionnal byte then represent a value from 0 to 255,
+Each additional byte then represent a value from 0 to 255,
 which is added to the previous value to produce a total length.
 When the byte value is 255, another byte is output.
 There can be any number of bytes following the token. There is no "size limit".
-(Sidenote this is why a not-compressible input block is expanded by 0.4%).
+(Side note : this is why a not-compressible input block is expanded by 0.4%).
 
 Example 1 : A length of 48 will be represented as :
 - 15 : value for the 4-bits High field
@@ -65,7 +65,8 @@ It's possible that there are zero literal.
 Following the literals is the match copy operation.
 
 It starts by the offset.
-This is a 2 bytes value, in little endian format.
+This is a 2 bytes value, in little endian format
+(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
 
 The offset represents the position of the match to be copied from.
 1 means "current position - 1 byte".
@@ -95,9 +96,12 @@ Parsing restrictions
 -----------------------
 There are specific parsing rules to respect in order to remain compatible
 with assumptions made by the decoder :
-1) The last 5 bytes are always literals
-2) The last match must start at least 12 bytes before end of block
-Consequently, a block with less than 13 bytes cannot be compressed.
+
+1. The last 5 bytes are always literals
+2. The last match must start at least 12 bytes before end of block.
+
+   Consequently, a block with less than 13 bytes cannot be compressed.
+
 These rules are in place to ensure that the decoder
 will never read beyond the input buffer, nor write beyond the output buffer.
 
@@ -118,4 +122,3 @@ or full optimal parsing.
 All these trade-off offer distinctive speed/memory/compression advantages.
 Whatever the method used by the compressor, its result will be decodable
 by any LZ4 decoder if it follows the format specification described above.
-
-- 
cgit v0.12
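
Reviewer note: the literal-length rule and the little-endian offset described in the patched `lz4_Block_format.md` can be checked with a minimal Python sketch. It is an illustration under stated assumptions, not the reference implementation: the function names `encode_literal_length`, `decode_literal_length` and `read_offset` are hypothetical, and the sketch covers only the high 4-bit field of the token and the 2-byte offset as worded in the hunks above.

```python
def encode_literal_length(length):
    """Return (high_field, extra_bytes) for a run of `length` literals.

    Hypothetical helper: follows the rule quoted in the patch, where a
    field value of 15 means more bytes follow, and a byte of 255 means
    yet another byte is output.
    """
    if length < 15:
        return length, b""
    extra = bytearray()
    remaining = length - 15
    while remaining >= 255:
        extra.append(255)
        remaining -= 255
    extra.append(remaining)  # final byte is always below 255
    return 15, bytes(extra)


def decode_literal_length(token, data, pos):
    """Return (length, new_pos); data[pos:] holds the bytes after the token."""
    length = token >> 4  # the first field uses the 4 high bits
    if length == 15:
        while True:
            byte = data[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos


def read_offset(data, pos):
    """Read the 2-byte match offset: 1st byte is low, 2nd byte is high."""
    return data[pos] | (data[pos + 1] << 8)


# Example 1 from the spec: a length of 48 is 15 in the high field, then 33.
field, extra = encode_literal_length(48)
print(field, list(extra))
print(decode_literal_length(field << 4, extra, 0))
print(read_offset(bytes([0x01, 0x00]), 0))
```

The encoder always emits a terminating byte below 255, so a length of exactly 15 or 270 ends with a `0` byte; the decoder relies on that to know where the length stops.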