author Yann Collet <yann.collet.73@gmail.com> 2015-03-30 17:32:21 (GMT)
committer Yann Collet <yann.collet.73@gmail.com> 2015-03-30 17:32:21 (GMT)
commit 44793b8be9f18bb51f524b3a210de11bb0df6654
tree 5452e812fe46d117ceba25514e891d35287b1a4e
parent b93f629681ad3245a09add28e4d0b2e43bcde58a
Updated documentation
Diffstat (limited to 'lz4_Block_format.md')
-rw-r--r-- lz4_Block_format.md 25
1 files changed, 14 insertions, 11 deletions
diff --git a/lz4_Block_format.md b/lz4_Block_format.md
index e248fd9..b933a6a 100644
--- a/lz4_Block_format.md
+++ b/lz4_Block_format.md
@@ -1,10 +1,9 @@
LZ4 Block Format Description
============================
-Last revised: 2015-03-26;
+Last revised: 2015-03-26.
Author : Yann Collet
-
This small specification intents to provide enough information
to anyone willing to produce LZ4-compatible compressed data blocks
using any programming language.
@@ -26,7 +25,8 @@ on implementation details of the compressor, and vice versa.
Compressed block format
-----------------------
An LZ4 compressed block is composed of sequences.
-Schematically, a sequence is a suite of literals, followed by a match copy.
+A sequence is a suite of literals (not-compressed bytes),
+followed by a match copy.
Each sequence starts with a token.
The token is a one byte value, separated into two 4-bits fields.
@@ -35,14 +35,14 @@ Therefore each field ranges from 0 to 15.
The first field uses the 4 high-bits of the token.
It provides the length of literals to follow.
-(Note : a literal is a not-compressed byte).
+
If the field value is 0, then there is no literal.
If it is 15, then we need to add some more bytes to indicate the full length.
-Each additionnal byte then represent a value from 0 to 255,
+Each additional byte then represent a value from 0 to 255,
which is added to the previous value to produce a total length.
When the byte value is 255, another byte is output.
There can be any number of bytes following the token. There is no "size limit".
-(Sidenote this is why a not-compressible input block is expanded by 0.4%).
+(Side note : this is why a not-compressible input block is expanded by 0.4%).
Example 1 : A length of 48 will be represented as :
- 15 : value for the 4-bits High field
@@ -65,7 +65,8 @@ It's possible that there are zero literal.
Following the literals is the match copy operation.
It starts by the offset.
-This is a 2 bytes value, in little endian format.
+This is a 2 bytes value, in little endian format
+(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
The offset represents the position of the match to be copied from.
1 means "current position - 1 byte".
@@ -95,9 +96,12 @@ Parsing restrictions
-----------------------
There are specific parsing rules to respect in order to remain compatible
with assumptions made by the decoder :
-1) The last 5 bytes are always literals
-2) The last match must start at least 12 bytes before end of block
-Consequently, a block with less than 13 bytes cannot be compressed.
+
+1. The last 5 bytes are always literals
+2. The last match must start at least 12 bytes before end of block.
+
+ Consequently, a block with less than 13 bytes cannot be compressed.
+
These rules are in place to ensure that the decoder
will never read beyond the input buffer, nor write beyond the output buffer.
@@ -118,4 +122,3 @@ or full optimal parsing.
All these trade-off offer distinctive speed/memory/compression advantages.
Whatever the method used by the compressor, its result will be decodable
by any LZ4 decoder if it follows the format specification described above.
-
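For readers following the patched text, the literal-length rule and the little-endian offset described in the hunks above can be sketched in Python as follows. This is only an illustrative sketch, not code from the LZ4 project: the helper names `read_length`, `encode_literal_length`, and `read_offset` are hypothetical, and the input is assumed to be a `bytes` object positioned just after the token (or just after the literals, for the offset).

```python
def read_length(field, data, pos):
    """Expand a 4-bit token field into a full length.

    `field` is the 4-bit value taken from the token (0..15).
    If it is 15, additional bytes follow; each one is added to the
    running total, and a byte equal to 255 means another byte follows.
    Returns (length, new_pos).
    """
    length = field
    if field == 15:
        while True:
            byte = data[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos


def encode_literal_length(length):
    """Inverse of read_length: return (high_field, extra_bytes)."""
    if length < 15:
        return length, b""
    rest = length - 15
    extra = bytearray()
    while rest >= 255:
        extra.append(255)
        rest -= 255
    extra.append(rest)  # final byte is always < 255, possibly 0
    return 15, bytes(extra)


def read_offset(data, pos):
    """Read the 2-byte match offset: 1st byte is low, 2nd byte is high.

    Returns (offset, new_pos).
    """
    offset = data[pos] | (data[pos + 1] << 8)
    return offset, pos + 2
```

With these helpers, a literal length of 48 encodes as a high field of 15 followed by a single extra byte of 33, and the byte pair `01 00` reads back as offset 1, that is, "current position - 1 byte".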