From 88cef262ec50cf905da8498b7af2d8f167fbf61d Mon Sep 17 00:00:00 2001
From: Przemyslaw Skibinski <inikep@gmail.com>
Date: Thu, 3 Nov 2016 13:25:20 +0100
Subject: documentation moved to doc/

---
 README.md               |   4 +-
 doc/lz4_Block_format.md | 127 ++++++++++++++++
 doc/lz4_Frame_format.md | 385 ++++++++++++++++++++++++++++++++++++++++++++++++
 lib/README.md           |   2 +-
 lib/lz4.h               |   2 +-
 lz4_Block_format.md     | 127 ----------------
 lz4_Frame_format.md     | 385 ------------------------------------------------
 7 files changed, 516 insertions(+), 516 deletions(-)
 create mode 100644 doc/lz4_Block_format.md
 create mode 100644 doc/lz4_Frame_format.md
 delete mode 100644 lz4_Block_format.md
 delete mode 100644 lz4_Frame_format.md

diff --git a/README.md b/README.md
index 753f11d..3f9cba6 100644
--- a/README.md
+++ b/README.md
@@ -89,6 +89,6 @@ A list of known source ports is maintained on the [LZ4 Homepage].
 
 [Open-Source Benchmark program by m^2 (v0.14.3)]: http://encode.ru/threads/1371-Filesystem-benchmark?p=34029&viewfull=1#post34029
 [Silesia Corpus]: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
-[lz4_Block_format]: lz4_Block_format.md
-[lz4_Frame_format]: lz4_Frame_format.md
+[lz4_Block_format]: doc/lz4_Block_format.md
+[lz4_Frame_format]: doc/lz4_Frame_format.md
 [LZ4 Homepage]: http://www.lz4.org
diff --git a/doc/lz4_Block_format.md b/doc/lz4_Block_format.md
new file mode 100644
index 0000000..0f6a5ba
--- /dev/null
+++ b/doc/lz4_Block_format.md
@@ -0,0 +1,127 @@
+LZ4 Block Format Description
+============================
+Last revised: 2015-05-07.
+Author : Yann Collet
+
+
+This specification is intended for developers
+willing to produce LZ4-compatible compressed data blocks
+using any programming language.
+
+LZ4 is an LZ77-type compressor with a fixed, byte-oriented encoding.
+There is no entropy encoder back-end nor framing layer.
+The latter is assumed to be handled by other parts of the system (see [LZ4 Frame format]).
+This design is assumed to favor simplicity and speed.
+It helps later on for optimizations, compactness, and features.
+
+This document describes only the block format,
+not how the compressor nor decompressor actually work.
+The correctness of the decompressor should not depend
+on implementation details of the compressor, and vice versa.
+
+[LZ4 Frame format]: lz4_Frame_format.md
+
+
+
+Compressed block format
+-----------------------
+An LZ4 compressed block is composed of sequences.
+A sequence is a suite of literals (not-compressed bytes),
+followed by a match copy.
+
+Each sequence starts with a token.
+The token is a one byte value, separated into two 4-bits fields.
+Therefore each field ranges from 0 to 15.
+
+
+The first field uses the 4 high-bits of the token.
+It provides the length of literals to follow.
+
+If the field value is 0, then there is no literal.
+If it is 15, then we need to add some more bytes to indicate the full length.
+Each additional byte then represent a value from 0 to 255,
+which is added to the previous value to produce a total length.
+When the byte value is 255, another byte is output.
+There can be any number of bytes following the token. There is no "size limit".
+(Side note : this is why a not-compressible input block is expanded by 0.4%).
+
+Example 1 : A length of 48 will be represented as :
+
+  - 15 : value for the 4-bits High field
+  - 33 : (=48-15) remaining length to reach 48
+
+Example 2 : A length of 280 will be represented as :
+
+  - 15  : value for the 4-bits High field
+  - 255 : following byte is maxed, since 280-15 >= 255
+  - 10  : (=280 - 15 - 255) ) remaining length to reach 280
+
+Example 3 : A length of 15 will be represented as :
+
+  - 15 : value for the 4-bits High field
+  - 0  : (=15-15) yes, the zero must be output
+
+Following the token and optional length bytes, are the literals themselves.
+They are exactly as numerous as previously decoded (length of literals).
+It's possible that there are zero literal.
+
+
+Following the literals is the match copy operation.
+
+It starts by the offset.
+This is a 2 bytes value, in little endian format
+(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
+
+The offset represents the position of the match to be copied from.
+1 means "current position - 1 byte".
+The maximum offset value is 65535, 65536 cannot be coded.
+Note that 0 is an invalid value, not used.
+
+Then we need to extract the match length.
+For this, we use the second token field, the low 4-bits.
+Value, obviously, ranges from 0 to 15.
+However here, 0 means that the copy operation will be minimal.
+The minimum length of a match, called minmatch, is 4.
+As a consequence, a 0 value means 4 bytes, and a value of 15 means 19+ bytes.
+Similar to literal length, on reaching the highest possible value (15),
+we output additional bytes, one at a time, with values ranging from 0 to 255.
+They are added to total to provide the final match length.
+A 255 value means there is another byte to read and add.
+There is no limit to the number of optional bytes that can be output this way.
+(This points towards a maximum achievable compression ratio of about 250).
+
+With the offset and the matchlength,
+the decoder can now proceed to copy the data from the already decoded buffer.
+On decoding the matchlength, we reach the end of the compressed sequence,
+and therefore start another one.
+
+
+Parsing restrictions
+-----------------------
+There are specific parsing rules to respect in order to remain compatible
+with assumptions made by the decoder :
+
+1. The last 5 bytes are always literals
+2. The last match must start at least 12 bytes before end of block.
+   Consequently, a block with less than 13 bytes cannot be compressed.
+
+These rules are in place to ensure that the decoder
+will never read beyond the input buffer, nor write beyond the output buffer.
+
+Note that the last sequence is also incomplete,
+and stops right after literals.
+
+
+Additional notes
+-----------------------
+There is no assumption nor limits to the way the compressor
+searches and selects matches within the source data block.
+It could be a fast scan, a multi-probe, a full search using BST,
+standard hash chains or MMC, well whatever.
+
+Advanced parsing strategies can also be implemented, such as lazy match,
+or full optimal parsing.
+
+All these trade-off offer distinctive speed/memory/compression advantages.
+Whatever the method used by the compressor, its result will be decodable
+by any LZ4 decoder if it follows the format specification described above.
diff --git a/doc/lz4_Frame_format.md b/doc/lz4_Frame_format.md
new file mode 100644
index 0000000..2ea1a86
--- /dev/null
+++ b/doc/lz4_Frame_format.md
@@ -0,0 +1,385 @@
+LZ4 Frame Format Description
+============================
+
+###Notices
+
+Copyright (c) 2013-2015 Yann Collet
+
+Permission is granted to copy and distribute this document
+for any  purpose and without charge,
+including translations into other  languages
+and incorporation into compilations,
+provided that the copyright notice and this notice are preserved,
+and that any substantive changes or deletions from the original
+are clearly marked.
+Distribution of this document is unlimited.
+
+###Version
+
+1.5.1 (31/03/2015)
+
+
+Introduction
+------------
+
+The purpose of this document is to define a lossless compressed data format,
+that is independent of CPU type, operating system,
+file system and character set, suitable for
+File compression, Pipe and streaming compression
+using the [LZ4 algorithm](http://www.lz4.org).
+
+The data can be produced or consumed,
+even for an arbitrarily long sequentially presented input data stream,
+using only an a priori bounded amount of intermediate storage,
+and hence can be used in data communications.
+The format uses the LZ4 compression method,
+and optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash),
+for detection of data corruption.
+
+The data format defined by this specification
+does not attempt to allow random access to compressed data.
+
+This specification is intended for use by implementers of software
+to compress data into LZ4 format and/or decompress data from LZ4 format.
+The text of the specification assumes a basic background in programming
+at the level of bits and other primitive data representations.
+
+Unless otherwise indicated below,
+a compliant compressor must produce data sets
+that conform to the specifications presented here.
+It doesn’t need to support all options though.
+
+A compliant decompressor must be able to decompress
+at least one working set of parameters
+that conforms to the specifications presented here.
+It may also ignore checksums.
+Whenever it does not support a specific parameter within the compressed stream,
+it must produce a non-ambiguous error code
+and associated error message explaining which parameter is unsupported.
+
+
+General Structure of LZ4 Frame format
+-------------------------------------
+
+| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
+|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
+| 4 bytes |  3-11 bytes   |       |       | 4 bytes | 0-4 bytes   |
+
+__Magic Number__
+
+4 Bytes, Little endian format.
+Value : 0x184D2204
+
+__Frame Descriptor__
+
+3 to 11 Bytes, to be detailed in the next part.
+Most important part of the spec.
+
+__Data Blocks__
+
+To be detailed later on.
+That’s where compressed data is stored.
+
+__EndMark__
+
+The flow of blocks ends when the last data block has a size of “0”.
+The size is expressed as a 32-bits value.
+
+__Content Checksum__
+
+Content Checksum verify that the full content has been decoded correctly.
+The content checksum is the result
+of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
+digesting the original (decoded) data as input, and a seed of zero.
+Content checksum is only present when its associated flag
+is set in the frame descriptor.
+Content Checksum validates the result,
+that all blocks were fully transmitted in the correct order and without error,
+and also that the encoding/decoding process itself generated no distortion.
+Its usage is recommended.
+
+__Frame Concatenation__
+
+In some circumstances, it may be preferable to append multiple frames,
+for example in order to add new data to an existing compressed file
+without re-framing it.
+
+In such case, each frame has its own set of descriptor flags.
+Each frame is considered independent.
+The only relation between frames is their sequential order.
+
+The ability to decode multiple concatenated frames
+within a single stream or file
+is left outside of this specification.
+As an example, the reference lz4 command line utility behavior is
+to decode all concatenated frames in their sequential order.
+
+
+Frame Descriptor
+----------------
+
+| FLG     | BD      | (Content Size) | HC      |
+| ------- | ------- |:--------------:| ------- |
+| 1 byte  | 1 byte  |  0 - 8 bytes   | 1 byte  |
+
+The descriptor uses a minimum of 3 bytes,
+and up to 11 bytes depending on optional parameters.
+
+__FLG byte__
+
+|  BitNb  |   7-6   |    5    |     4     |   3     |     2     |    1-0   |
+| ------- | ------- | ------- | --------- | ------- | --------- | -------- |
+|FieldName| Version | B.Indep | B.Checksum| C.Size  | C.Checksum|*Reserved*|
+
+
+__BD byte__
+
+|  BitNb  |     7    |     6-5-4    |  3-2-1-0 |
+| ------- | -------- | ------------ | -------- |
+|FieldName|*Reserved*| Block MaxSize|*Reserved*|
+
+In the tables, bit 7 is highest bit, while bit 0 is lowest.
+
+__Version Number__
+
+2-bits field, must be set to “01”.
+Any other value cannot be decoded by this version of the specification.
+Other version numbers will use different flag layouts.
+
+__Block Independence flag__
+
+If this flag is set to “1”, blocks are independent.
+If this flag is set to “0”, each block depends on previous ones
+(up to LZ4 window size, which is 64 KB).
+In such case, it’s necessary to decode all blocks in sequence.
+
+Block dependency improves compression ratio, especially for small blocks.
+On the other hand, it makes direct jumps or multi-threaded decoding impossible.
+
+__Block checksum flag__
+
+If this flag is set, each data block will be followed by a 4-bytes checksum,
+calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
+The intention is to detect data corruption (storage or transmission errors)
+immediately, before decoding.
+Block checksum usage is optional.
+
+__Content Size flag__
+
+If this flag is set, the uncompressed size of data included within the frame
+will be present as an 8 bytes unsigned little endian value, after the flags.
+Content Size usage is optional.
+
+__Content checksum flag__
+
+If this flag is set, a content checksum will be appended after the EndMark.
+
+Recommended value : “1” (content checksum is present)
+
+__Block Maximum Size__
+
+This information is intended to help the decoder allocate memory.
+Size here refers to the original (uncompressed) data size.
+Block Maximum Size is one value among the following table :
+
+|  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
+| --- | --- | --- | --- | ----- | ------ | ---- | ---- |
+| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |
+
+The decoder may refuse to allocate block sizes above a (system-specific) size.
+Unused values may be used in a future revision of the spec.
+A decoder conformant to the current version of the spec
+is only able to decode blocksizes defined in this spec.
+
+__Reserved bits__
+
+Value of reserved bits **must** be 0 (zero).
+Reserved bit might be used in a future version of the specification,
+typically enabling new optional features.
+If this happens, a decoder respecting the current version of the specification
+shall not be able to decode such a frame.
+
+__Content Size__
+
+This is the original (uncompressed) size.
+This information is optional, and only present if the associated flag is set.
+Content size is provided using unsigned 8 Bytes, for a maximum of 16 HexaBytes.
+Format is Little endian.
+This value is informational, typically for display or memory allocation.
+It can be skipped by a decoder, or used to validate content correctness.
+
+__Header Checksum__
+
+One-byte checksum of combined descriptor fields, including optional ones.
+The value is the second byte of xxh32() : ` (xxh32()>>8) & 0xFF `
+using zero as a seed,
+and the full Frame Descriptor as an input
+(including optional fields when they are present).
+A wrong checksum indicates an error in the descriptor.
+Header checksum is informational and can be skipped.
+
+
+Data Blocks
+-----------
+
+| Block Size |  data  | (Block Checksum) |
+|:----------:| ------ |:----------------:|
+|  4 bytes   |        |   0 - 4 bytes    |
+
+
+__Block Size__
+
+This field uses 4-bytes, format is little-endian.
+
+The highest bit is “1” if data in the block is uncompressed.
+
+The highest bit is “0” if data in the block is compressed by LZ4.
+
+All other bits give the size, in bytes, of the following data block
+(the size does not include the block checksum if present).
+
+Block Size shall never be larger than Block Maximum Size.
+Such a thing could happen for incompressible source data.
+In such case, such a data block shall be passed in uncompressed format.
+
+__Data__
+
+Where the actual data to decode stands.
+It might be compressed or not, depending on previous field indications.
+Uncompressed size of Data can be any size, up to “block maximum size”.
+Note that data block is not necessarily full :
+an arbitrary “flush” may happen anytime. Any block can be “partially filled”.
+
+__Block checksum__
+
+Only present if the associated flag is set.
+This is a 4-bytes checksum value, in little endian format,
+calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
+and a seed of zero.
+The intention is to detect data corruption (storage or transmission errors)
+before decoding.
+
+Block checksum is cumulative with Content checksum.
+
+
+Skippable Frames
+----------------
+
+| Magic Number | Frame Size | User Data |
+|:------------:|:----------:| --------- |
+|   4 bytes    |  4 bytes   |           |
+
+Skippable frames allow the integration of user-defined data
+into a flow of concatenated frames.
+Its design is pretty straightforward,
+with the sole objective to allow the decoder to quickly skip
+over user-defined data and continue decoding.
+
+For the purpose of facilitating identification,
+it is discouraged to start a flow of concatenated frames with a skippable frame.
+If there is a need to start such a flow with some user data
+encapsulated into a skippable frame,
+it’s recommended to start with a zero-byte LZ4 frame
+followed by a skippable frame.
+This will make it easier for file type identifiers.
+
+
+__Magic Number__
+
+4 Bytes, Little endian format.
+Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
+All 16 values are valid to identify a skippable frame.
+
+__Frame Size__
+
+This is the size, in bytes, of the following User Data
+(without including the magic number nor the size field itself).
+4 Bytes, Little endian format, unsigned 32-bits.
+This means User Data can’t be bigger than (2^32-1) Bytes.
+
+__User Data__
+
+User Data can be anything. Data will just be skipped by the decoder.
+
+
+Legacy frame
+------------
+
+The Legacy frame format was defined into the initial versions of “LZ4Demo”.
+Newer compressors should not use this format anymore, as it is too restrictive.
+
+Main characteristics of the legacy format :
+
+- Fixed block size : 8 MB.
+- All blocks must be completely filled, except the last one.
+- All blocks are always compressed, even when compression is detrimental.
+- The last block is detected either because
+  it is followed by the “EOF” (End of File) mark,
+  or because it is followed by a known Frame Magic Number.
+- No checksum
+- Convention is Little endian
+
+| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
+| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
+| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |
+
+
+__Magic Number__
+
+4 Bytes, Little endian format.
+Value : 0x184C2102
+
+__Block Compressed Size__
+
+This is the size, in bytes, of the following compressed data block.
+4 Bytes, Little endian format.
+
+__Data__
+
+Where the actual compressed data stands.
+Data is always compressed, even when compression is detrimental.
+
+__EndMark__
+
+End of legacy frame is implicit only.
+It must be followed by a standard EOF (End Of File) signal,
+wether it is a file or a stream.
+
+Alternatively, if the frame is followed by a valid Frame Magic Number,
+it is considered completed.
+It makes legacy frames compatible with frame concatenation.
+
+Any other value will be interpreted as a block size,
+and trigger an error if it does not fit within acceptable range.
+
+
+Version changes
+---------------
+
+1.5.1 : changed format to MarkDown compatible
+
+1.5 : removed Dictionary ID from specification
+
+1.4.1 : changed wording from “stream” to “frame”
+
+1.4 : added skippable streams, re-added stream checksum
+
+1.3 : modified header checksum
+
+1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.
+
+1.1 : optional fields are now part of the descriptor
+
+1.0 : changed “block size” specification, adding a compressed/uncompressed flag
+
+0.9 : reduced scale of “block maximum size” table
+
+0.8 : removed : high compression flag
+
+0.7 : removed : stream checksum
+
+0.6 : settled : stream size uses 8 bytes, endian convention is little endian
+
+0.5: added copyright notice
+
+0.4 : changed format to Google Doc compatible OpenDocument
diff --git a/lib/README.md b/lib/README.md
index f932d42..209cac1 100644
--- a/lib/README.md
+++ b/lib/README.md
@@ -37,4 +37,4 @@ Other files present in the directory are not source code. There are :
  - liblz4.pc.in : for pkg-config (make install)
  - README.md : this file
 
-[official interoperable frame format]: ../lz4_Frame_format.md
+[official interoperable frame format]: ../doc/lz4_Frame_format.md
diff --git a/lib/lz4.h b/lib/lz4.h
index 3ece4dd..0f17c7c 100644
--- a/lib/lz4.h
+++ b/lib/lz4.h
@@ -44,7 +44,7 @@ extern "C" {
  * Block compression functions are not-enough to send information,
  * since it's still necessary to provide metadata (such as compressed size),
  * and each application can do it in whichever way it wants.
- * For interoperability, there is LZ4 frame specification (lz4_Frame_format.md).
+ * For interoperability, there is LZ4 frame specification (doc/lz4_Frame_format.md).
  * A library is provided to take care of it, see lz4frame.h.
 */
 
diff --git a/lz4_Block_format.md b/lz4_Block_format.md
deleted file mode 100644
index 0f6a5ba..0000000
--- a/lz4_Block_format.md
+++ /dev/null
@@ -1,127 +0,0 @@
-LZ4 Block Format Description
-============================
-Last revised: 2015-05-07.
-Author : Yann Collet
-
-
-This specification is intended for developers
-willing to produce LZ4-compatible compressed data blocks
-using any programming language.
-
-LZ4 is an LZ77-type compressor with a fixed, byte-oriented encoding.
-There is no entropy encoder back-end nor framing layer.
-The latter is assumed to be handled by other parts of the system (see [LZ4 Frame format]).
-This design is assumed to favor simplicity and speed.
-It helps later on for optimizations, compactness, and features.
-
-This document describes only the block format,
-not how the compressor nor decompressor actually work.
-The correctness of the decompressor should not depend
-on implementation details of the compressor, and vice versa.
-
-[LZ4 Frame format]: lz4_Frame_format.md
-
-
-
-Compressed block format
------------------------
-An LZ4 compressed block is composed of sequences.
-A sequence is a suite of literals (not-compressed bytes),
-followed by a match copy.
-
-Each sequence starts with a token.
-The token is a one byte value, separated into two 4-bits fields.
-Therefore each field ranges from 0 to 15.
-
-
-The first field uses the 4 high-bits of the token.
-It provides the length of literals to follow.
-
-If the field value is 0, then there is no literal.
-If it is 15, then we need to add some more bytes to indicate the full length.
-Each additional byte then represent a value from 0 to 255,
-which is added to the previous value to produce a total length.
-When the byte value is 255, another byte is output.
-There can be any number of bytes following the token. There is no "size limit".
-(Side note : this is why a not-compressible input block is expanded by 0.4%).
-
-Example 1 : A length of 48 will be represented as :
-
-  - 15 : value for the 4-bits High field
-  - 33 : (=48-15) remaining length to reach 48
-
-Example 2 : A length of 280 will be represented as :
-
-  - 15  : value for the 4-bits High field
-  - 255 : following byte is maxed, since 280-15 >= 255
-  - 10  : (=280 - 15 - 255) ) remaining length to reach 280
-
-Example 3 : A length of 15 will be represented as :
-
-  - 15 : value for the 4-bits High field
-  - 0  : (=15-15) yes, the zero must be output
-
-Following the token and optional length bytes, are the literals themselves.
-They are exactly as numerous as previously decoded (length of literals).
-It's possible that there are zero literal.
-
-
-Following the literals is the match copy operation.
-
-It starts by the offset.
-This is a 2 bytes value, in little endian format
-(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
-
-The offset represents the position of the match to be copied from.
-1 means "current position - 1 byte".
-The maximum offset value is 65535, 65536 cannot be coded.
-Note that 0 is an invalid value, not used.
-
-Then we need to extract the match length.
-For this, we use the second token field, the low 4-bits.
-Value, obviously, ranges from 0 to 15.
-However here, 0 means that the copy operation will be minimal.
-The minimum length of a match, called minmatch, is 4.
-As a consequence, a 0 value means 4 bytes, and a value of 15 means 19+ bytes.
-Similar to literal length, on reaching the highest possible value (15),
-we output additional bytes, one at a time, with values ranging from 0 to 255.
-They are added to total to provide the final match length.
-A 255 value means there is another byte to read and add.
-There is no limit to the number of optional bytes that can be output this way.
-(This points towards a maximum achievable compression ratio of about 250).
-
-With the offset and the matchlength,
-the decoder can now proceed to copy the data from the already decoded buffer.
-On decoding the matchlength, we reach the end of the compressed sequence,
-and therefore start another one.
-
-
-Parsing restrictions
------------------------
-There are specific parsing rules to respect in order to remain compatible
-with assumptions made by the decoder :
-
-1. The last 5 bytes are always literals
-2. The last match must start at least 12 bytes before end of block.
-   Consequently, a block with less than 13 bytes cannot be compressed.
-
-These rules are in place to ensure that the decoder
-will never read beyond the input buffer, nor write beyond the output buffer.
-
-Note that the last sequence is also incomplete,
-and stops right after literals.
-
-
-Additional notes
------------------------
-There is no assumption nor limits to the way the compressor
-searches and selects matches within the source data block.
-It could be a fast scan, a multi-probe, a full search using BST,
-standard hash chains or MMC, well whatever.
-
-Advanced parsing strategies can also be implemented, such as lazy match,
-or full optimal parsing.
-
-All these trade-off offer distinctive speed/memory/compression advantages.
-Whatever the method used by the compressor, its result will be decodable
-by any LZ4 decoder if it follows the format specification described above.
diff --git a/lz4_Frame_format.md b/lz4_Frame_format.md
deleted file mode 100644
index 2ea1a86..0000000
--- a/lz4_Frame_format.md
+++ /dev/null
@@ -1,385 +0,0 @@
-LZ4 Frame Format Description
-============================
-
-###Notices
-
-Copyright (c) 2013-2015 Yann Collet
-
-Permission is granted to copy and distribute this document
-for any  purpose and without charge,
-including translations into other  languages
-and incorporation into compilations,
-provided that the copyright notice and this notice are preserved,
-and that any substantive changes or deletions from the original
-are clearly marked.
-Distribution of this document is unlimited.
-
-###Version
-
-1.5.1 (31/03/2015)
-
-
-Introduction
-------------
-
-The purpose of this document is to define a lossless compressed data format,
-that is independent of CPU type, operating system,
-file system and character set, suitable for
-File compression, Pipe and streaming compression
-using the [LZ4 algorithm](http://www.lz4.org).
-
-The data can be produced or consumed,
-even for an arbitrarily long sequentially presented input data stream,
-using only an a priori bounded amount of intermediate storage,
-and hence can be used in data communications.
-The format uses the LZ4 compression method,
-and optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash),
-for detection of data corruption.
-
-The data format defined by this specification
-does not attempt to allow random access to compressed data.
-
-This specification is intended for use by implementers of software
-to compress data into LZ4 format and/or decompress data from LZ4 format.
-The text of the specification assumes a basic background in programming
-at the level of bits and other primitive data representations.
-
-Unless otherwise indicated below,
-a compliant compressor must produce data sets
-that conform to the specifications presented here.
-It doesn’t need to support all options though.
-
-A compliant decompressor must be able to decompress
-at least one working set of parameters
-that conforms to the specifications presented here.
-It may also ignore checksums.
-Whenever it does not support a specific parameter within the compressed stream,
-it must produce a non-ambiguous error code
-and associated error message explaining which parameter is unsupported.
-
-
-General Structure of LZ4 Frame format
--------------------------------------
-
-| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
-|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
-| 4 bytes |  3-11 bytes   |       |       | 4 bytes | 0-4 bytes   |
-
-__Magic Number__
-
-4 Bytes, Little endian format.
-Value : 0x184D2204
-
-__Frame Descriptor__
-
-3 to 11 Bytes, to be detailed in the next part.
-Most important part of the spec.
-
-__Data Blocks__
-
-To be detailed later on.
-That’s where compressed data is stored.
-
-__EndMark__
-
-The flow of blocks ends when the last data block has a size of “0”.
-The size is expressed as a 32-bits value.
-
-__Content Checksum__
-
-Content Checksum verify that the full content has been decoded correctly.
-The content checksum is the result
-of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
-digesting the original (decoded) data as input, and a seed of zero.
-Content checksum is only present when its associated flag
-is set in the frame descriptor.
-Content Checksum validates the result,
-that all blocks were fully transmitted in the correct order and without error,
-and also that the encoding/decoding process itself generated no distortion.
-Its usage is recommended.
-
-__Frame Concatenation__
-
-In some circumstances, it may be preferable to append multiple frames,
-for example in order to add new data to an existing compressed file
-without re-framing it.
-
-In such case, each frame has its own set of descriptor flags.
-Each frame is considered independent.
-The only relation between frames is their sequential order.
-
-The ability to decode multiple concatenated frames
-within a single stream or file
-is left outside of this specification.
-As an example, the reference lz4 command line utility behavior is
-to decode all concatenated frames in their sequential order.
-
-
-Frame Descriptor
-----------------
-
-| FLG     | BD      | (Content Size) | HC      |
-| ------- | ------- |:--------------:| ------- |
-| 1 byte  | 1 byte  |  0 - 8 bytes   | 1 byte  |
-
-The descriptor uses a minimum of 3 bytes,
-and up to 11 bytes depending on optional parameters.
-
-__FLG byte__
-
-|  BitNb  |   7-6   |    5    |     4     |   3     |     2     |    1-0   |
-| ------- | ------- | ------- | --------- | ------- | --------- | -------- |
-|FieldName| Version | B.Indep | B.Checksum| C.Size  | C.Checksum|*Reserved*|
-
-
-__BD byte__
-
-|  BitNb  |     7    |     6-5-4    |  3-2-1-0 |
-| ------- | -------- | ------------ | -------- |
-|FieldName|*Reserved*| Block MaxSize|*Reserved*|
-
-In the tables, bit 7 is highest bit, while bit 0 is lowest.
-
-__Version Number__
-
-2-bits field, must be set to “01”.
-Any other value cannot be decoded by this version of the specification.
-Other version numbers will use different flag layouts.
-
-__Block Independence flag__
-
-If this flag is set to “1”, blocks are independent.
-If this flag is set to “0”, each block depends on previous ones
-(up to LZ4 window size, which is 64 KB).
-In such case, it’s necessary to decode all blocks in sequence.
-
-Block dependency improves compression ratio, especially for small blocks.
-On the other hand, it makes direct jumps or multi-threaded decoding impossible.
-
-__Block checksum flag__
-
-If this flag is set, each data block will be followed by a 4-bytes checksum,
-calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
-The intention is to detect data corruption (storage or transmission errors)
-immediately, before decoding.
-Block checksum usage is optional.
-
-__Content Size flag__
-
-If this flag is set, the uncompressed size of data included within the frame
-will be present as an 8 bytes unsigned little endian value, after the flags.
-Content Size usage is optional.
-
-__Content checksum flag__
-
-If this flag is set, a content checksum will be appended after the EndMark.
-
-Recommended value : “1” (content checksum is present)
-
-__Block Maximum Size__
-
-This information is intended to help the decoder allocate memory.
-Size here refers to the original (uncompressed) data size.
-Block Maximum Size is one value among the following table :
-
-|  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
-| --- | --- | --- | --- | ----- | ------ | ---- | ---- |
-| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |
-
-The decoder may refuse to allocate block sizes above a (system-specific) size.
-Unused values may be used in a future revision of the spec.
-A decoder conformant to the current version of the spec
-is only able to decode blocksizes defined in this spec.
-
-__Reserved bits__
-
-Value of reserved bits **must** be 0 (zero).
-Reserved bit might be used in a future version of the specification,
-typically enabling new optional features.
-If this happens, a decoder respecting the current version of the specification
-shall not be able to decode such a frame.
-
-__Content Size__
-
-This is the original (uncompressed) size.
-This information is optional, and only present if the associated flag is set.
-Content size is provided using unsigned 8 Bytes, for a maximum of 16 HexaBytes.
-Format is Little endian.
-This value is informational, typically for display or memory allocation.
-It can be skipped by a decoder, or used to validate content correctness.
-
-__Header Checksum__
-
-One-byte checksum of combined descriptor fields, including optional ones.
-The value is the second byte of xxh32() : ` (xxh32()>>8) & 0xFF `
-using zero as a seed,
-and the full Frame Descriptor as an input
-(including optional fields when they are present).
-A wrong checksum indicates an error in the descriptor.
-Header checksum is informational and can be skipped.
-
-
-Data Blocks
------------
-
-| Block Size |  data  | (Block Checksum) |
-|:----------:| ------ |:----------------:|
-|  4 bytes   |        |   0 - 4 bytes    |
-
-
-__Block Size__
-
-This field uses 4-bytes, format is little-endian.
-
-The highest bit is “1” if data in the block is uncompressed.
-
-The highest bit is “0” if data in the block is compressed by LZ4.
-
-All other bits give the size, in bytes, of the following data block
-(the size does not include the block checksum if present).
-
-Block Size shall never be larger than Block Maximum Size.
-Such a thing could happen for incompressible source data.
-In such case, such a data block shall be passed in uncompressed format.
-
-__Data__
-
-Where the actual data to decode stands.
-It might be compressed or not, depending on previous field indications.
-Uncompressed size of Data can be any size, up to “block maximum size”.
-Note that data block is not necessarily full :
-an arbitrary “flush” may happen anytime. Any block can be “partially filled”.
-
-__Block checksum__
-
-Only present if the associated flag is set.
-This is a 4-bytes checksum value, in little endian format,
-calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
-and a seed of zero.
-The intention is to detect data corruption (storage or transmission errors)
-before decoding.
-
-Block checksum is cumulative with Content checksum.
-
-
-Skippable Frames
-----------------
-
-| Magic Number | Frame Size | User Data |
-|:------------:|:----------:| --------- |
-|   4 bytes    |  4 bytes   |           |
-
-Skippable frames allow the integration of user-defined data
-into a flow of concatenated frames.
-Its design is pretty straightforward,
-with the sole objective to allow the decoder to quickly skip
-over user-defined data and continue decoding.
-
-For the purpose of facilitating identification,
-it is discouraged to start a flow of concatenated frames with a skippable frame.
-If there is a need to start such a flow with some user data
-encapsulated into a skippable frame,
-it’s recommended to start with a zero-byte LZ4 frame
-followed by a skippable frame.
-This will make it easier for file type identifiers.
-
-
-__Magic Number__
-
-4 Bytes, Little endian format.
-Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
-All 16 values are valid to identify a skippable frame.
-
-__Frame Size__
-
-This is the size, in bytes, of the following User Data
-(without including the magic number nor the size field itself).
-4 Bytes, Little endian format, unsigned 32-bits.
-This means User Data can’t be bigger than (2^32-1) Bytes.
-
-__User Data__
-
-User Data can be anything. Data will just be skipped by the decoder.
-
-
-Legacy frame
-------------
-
-The Legacy frame format was defined into the initial versions of “LZ4Demo”.
-Newer compressors should not use this format anymore, as it is too restrictive.
-
-Main characteristics of the legacy format :
-
-- Fixed block size : 8 MB.
-- All blocks must be completely filled, except the last one.
-- All blocks are always compressed, even when compression is detrimental.
-- The last block is detected either because
-  it is followed by the “EOF” (End of File) mark,
-  or because it is followed by a known Frame Magic Number.
-- No checksum
-- Convention is Little endian
-
-| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
-| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
-| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |
-
-
-__Magic Number__
-
-4 Bytes, Little endian format.
-Value : 0x184C2102
-
-__Block Compressed Size__
-
-This is the size, in bytes, of the following compressed data block.
-4 Bytes, Little endian format.
-
-__Data__
-
-Where the actual compressed data stands.
-Data is always compressed, even when compression is detrimental.
-
-__EndMark__
-
-End of legacy frame is implicit only.
-It must be followed by a standard EOF (End Of File) signal,
-wether it is a file or a stream.
-
-Alternatively, if the frame is followed by a valid Frame Magic Number,
-it is considered completed.
-It makes legacy frames compatible with frame concatenation.
-
-Any other value will be interpreted as a block size,
-and trigger an error if it does not fit within acceptable range.
-
-
-Version changes
----------------
-
-1.5.1 : changed format to MarkDown compatible
-
-1.5 : removed Dictionary ID from specification
-
-1.4.1 : changed wording from “stream” to “frame”
-
-1.4 : added skippable streams, re-added stream checksum
-
-1.3 : modified header checksum
-
-1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.
-
-1.1 : optional fields are now part of the descriptor
-
-1.0 : changed “block size” specification, adding a compressed/uncompressed flag
-
-0.9 : reduced scale of “block maximum size” table
-
-0.8 : removed : high compression flag
-
-0.7 : removed : stream checksum
-
-0.6 : settled : stream size uses 8 bytes, endian convention is little endian
-
-0.5: added copyright notice
-
-0.4 : changed format to Google Doc compatible OpenDocument
-- 
cgit v0.12