author: Yann Collet <Cyan4973@users.noreply.github.com> 2018-05-07 21:38:45 (GMT)
committer: GitHub <noreply@github.com> 2018-05-07 21:38:45 (GMT)
commit: b3692db46d2b23a7c0af2d5e69988c94f126e10a
tree: d9aa14d72978e78a1ca2f1aee99511ce2d04293c /doc
parent: dfed9fa1d77f0434306d377c4da1f7191d3ba08a
parent: bf6fd938e522150e2a30bee978102769a51f4b3e
Merge pull request #531 from lz4/dev (tag: v1.8.2)
Preparing v1.8.2
Diffstat (limited to 'doc')
-rw-r--r-- doc/images/usingCDict_1_8_2.png  bin 0 -> 81858 bytes
-rw-r--r-- doc/lz4_Block_format.md          63
-rw-r--r-- doc/lz4_Frame_format.md          16
-rw-r--r-- doc/lz4_manual.html              254
-rw-r--r-- doc/lz4frame_manual.html         156
5 files changed, 336 insertions, 153 deletions
diff --git a/doc/images/usingCDict_1_8_2.png b/doc/images/usingCDict_1_8_2.png
new file mode 100644
index 0000000..9434198
--- /dev/null
+++ b/doc/images/usingCDict_1_8_2.png
Binary files differ
diff --git a/doc/lz4_Block_format.md b/doc/lz4_Block_format.md
index 4e39b41..5438730 100644
--- a/doc/lz4_Block_format.md
+++ b/doc/lz4_Block_format.md
@@ -1,6 +1,6 @@
LZ4 Block Format Description
============================
-Last revised: 2015-05-07.
+Last revised: 2018-04-25.
Author : Yann Collet
@@ -29,8 +29,8 @@ An LZ4 compressed block is composed of sequences.
A sequence is a suite of literals (not-compressed bytes),
followed by a match copy.
-Each sequence starts with a token.
-The token is a one byte value, separated into two 4-bits fields.
+Each sequence starts with a `token`.
+The `token` is a one byte value, separated into two 4-bits fields.
Therefore each field ranges from 0 to 15.
@@ -42,46 +42,46 @@ If it is 15, then we need to add some more bytes to indicate the full length.
Each additional byte then represent a value from 0 to 255,
which is added to the previous value to produce a total length.
When the byte value is 255, another byte is output.
-There can be any number of bytes following the token. There is no "size limit".
+There can be any number of bytes following `token`. There is no "size limit".
(Side note : this is why a not-compressible input block is expanded by 0.4%).
-Example 1 : A length of 48 will be represented as :
+Example 1 : A literal length of 48 will be represented as :
- 15 : value for the 4-bits High field
- 33 : (=48-15) remaining length to reach 48
-Example 2 : A length of 280 will be represented as :
+Example 2 : A literal length of 280 will be represented as :
- 15 : value for the 4-bits High field
- 255 : following byte is maxed, since 280-15 >= 255
- 10 : (=280 - 15 - 255) ) remaining length to reach 280
-Example 3 : A length of 15 will be represented as :
+Example 3 : A literal length of 15 will be represented as :
- 15 : value for the 4-bits High field
- 0 : (=15-15) yes, the zero must be output
-Following the token and optional length bytes, are the literals themselves.
+Following `token` and optional length bytes, are the literals themselves.
They are exactly as numerous as previously decoded (length of literals).
It's possible that there are zero literal.
Following the literals is the match copy operation.
-It starts by the offset.
+It starts by the `offset`.
This is a 2 bytes value, in little endian format
(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
-The offset represents the position of the match to be copied from.
+The `offset` represents the position of the match to be copied from.
1 means "current position - 1 byte".
-The maximum offset value is 65535, 65536 cannot be coded.
+The maximum `offset` value is 65535, 65536 cannot be coded.
Note that 0 is an invalid value, not used.
-Then we need to extract the match length.
+Then we need to extract the `matchlength`.
For this, we use the second token field, the low 4-bits.
Value, obviously, ranges from 0 to 15.
However here, 0 means that the copy operation will be minimal.
-The minimum length of a match, called minmatch, is 4.
+The minimum length of a match, called `minmatch`, is 4.
As a consequence, a 0 value means 4 bytes, and a value of 15 means 19+ bytes.
Similar to literal length, on reaching the highest possible value (15),
we output additional bytes, one at a time, with values ranging from 0 to 255.
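Reviewer sketch (not part of the commit): the length scheme described in this hunk, a 4-bit field that saturates at 15 and is then followed by bytes that are summed until one is below 255, can be written as the pair of helpers below. The names `sketch_encode_length` and `sketch_decode_length` are hypothetical and do not exist in lz4. The decoder assumes well-formed, trusted input and does no bounds checking.

```c
#include <stddef.h>

/* Hypothetical helper, not an lz4 function.
 * Splits `length` into the 4-bit token field and the optional extra bytes.
 * `extra` must have room for (length / 255) + 1 bytes.
 * Returns the number of extra bytes written (0 when length < 15). */
static size_t sketch_encode_length(size_t length, unsigned char* field,
                                   unsigned char* extra)
{
    size_t n = 0;
    if (length < 15) {              /* fits in the 4-bit field */
        *field = (unsigned char)length;
        return 0;
    }
    *field = 15;                    /* saturated field: extra bytes follow */
    length -= 15;
    while (length >= 255) {         /* 255 means "another byte follows" */
        extra[n++] = 255;
        length -= 255;
    }
    extra[n++] = (unsigned char)length;   /* final byte, may be 0 */
    return n;
}

/* Hypothetical inverse: rebuilds the length from the field and extra bytes.
 * No bounds checking: only suitable for trusted, well-formed input. */
static size_t sketch_decode_length(unsigned char field,
                                   const unsigned char* extra)
{
    size_t length = field;
    if (field == 15) {
        unsigned char b;
        do {
            b = *extra++;
            length += b;
        } while (b == 255);
    }
    return length;
}
```

For a literal length, the value passed in is the length itself. For a match, the same scheme applies to `matchlength - 4`, since `minmatch` is 4.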
@@ -90,18 +90,18 @@ A 255 value means there is another byte to read and add.
There is no limit to the number of optional bytes that can be output this way.
(This points towards a maximum achievable compression ratio of about 250).
-Decoding the matchlength reaches the end of current sequence.
+Decoding the `matchlength` reaches the end of current sequence.
Next byte will be the start of another sequence.
But before moving to next sequence,
it's time to use the decoded match position and length.
-The decoder copies matchlength bytes from match position to current position.
+The decoder copies `matchlength` bytes from match position to current position.
-In some cases, matchlength is larger than offset.
-Therefore, match pos + match length > current pos,
+In some cases, `matchlength` is larger than `offset`.
+Therefore, `match_pos + matchlength > current_pos`,
which means that later bytes to copy are not yet decoded.
This is called an "overlap match", and must be handled with special care.
-The most common case is an offset of 1,
-meaning the last byte is repeated matchlength times.
+A common case is an offset of 1,
+meaning the last byte is repeated `matchlength` times.
Parsing restrictions
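Reviewer sketch (not part of the commit): one simple way to handle the "overlap match" case described in this hunk is a forward byte-by-byte copy, so that bytes written early in the copy become readable later in the same copy. The helper name `sketch_copy_match` is hypothetical. It assumes the caller has already checked that `offset` is non-zero, that at least `offset` bytes were produced before `op`, and that the output buffer has room for `matchlength` more bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not an lz4 function.
 * Copies `matchlength` bytes starting `offset` bytes behind `op`.
 * The copy runs forward one byte at a time on purpose:
 * memcpy() is undefined for overlapping ranges, and memmove() would
 * preserve the original source bytes instead of repeating the pattern.
 * Returns the new write position. */
static unsigned char* sketch_copy_match(unsigned char* op, size_t offset,
                                        size_t matchlength)
{
    const unsigned char* match = op - offset;
    while (matchlength--) {
        *op++ = *match++;
    }
    return op;
}
```

With an offset of 1, the loop repeats the last byte `matchlength` times. With an offset of 2, it repeats the last two bytes as a pattern. Production decoders typically use wider copies with special handling for small offsets; this version only illustrates the semantics.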
@@ -109,15 +109,28 @@ Parsing restrictions
There are specific parsing rules to respect in order to remain compatible
with assumptions made by the decoder :
-1. The last 5 bytes are always literals
+1. The last 5 bytes are always literals. In other words, the last five bytes
+ from the uncompressed input (or all bytes, if the input has less than five
+ bytes) must be encoded as literals on behalf of the last sequence.
+ The last sequence is incomplete, and stops right after the literals.
2. The last match must start at least 12 bytes before end of block.
- Consequently, a block with less than 13 bytes cannot be compressed.
+ The last match is part of the penultimate sequence,
+ since the last sequence stops right after literals.
+ Note that, as a consequence, blocks < 13 bytes cannot be compressed.
These rules are in place to ensure that the decoder
-will never read beyond the input buffer, nor write beyond the output buffer.
-
-Note that the last sequence is also incomplete,
-and stops right after literals.
+can speculatively execute copy instructions
+without ever reading nor writing beyond provided I/O buffers.
+
+1. To copy literals from a non-last sequence, an 8-byte copy instruction
+ can always be safely issued (without reading past the input),
+ because literals are followed by a 2-byte offset,
+ and last sequence is at least 1+5 bytes long.
+2. Similarly, a match operation can speculatively copy up to 12 bytes
+ while remaining within output buffer boundaries.
+
+Empty inputs can be represented with a zero byte,
+interpreted as a token without literals and without a match.
Additional notes
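Reviewer sketch (not part of the commit): the sequence layout described in this file (token, optional literal-length bytes, literals, 2-byte little-endian offset, optional match-length bytes) can be walked with a small bounds-checked loop. The function `sketch_decode_block` is hypothetical and is not how lz4 implements decoding. It copies one byte at a time instead of using the speculative 8-byte and 12-byte copies that the parsing restrictions make possible, and it does not verify that the encoder respected those restrictions.

```c
#include <stddef.h>

/* Hypothetical reference-style decoder, not an lz4 function.
 * Returns the number of bytes written to dst, or -1 on malformed input
 * or insufficient output capacity. */
static long sketch_decode_block(const unsigned char* src, size_t srcSize,
                                unsigned char* dst, size_t dstCapacity)
{
    const unsigned char* ip = src;
    const unsigned char* const iend = src + srcSize;
    unsigned char* op = dst;
    unsigned char* const oend = dst + dstCapacity;

    while (ip < iend) {
        unsigned token = *ip++;
        size_t len = token >> 4;            /* literal length: high 4 bits */
        size_t offset;
        const unsigned char* match;

        if (len == 15) {                    /* extra literal-length bytes */
            unsigned b;
            do {
                if (ip >= iend) return -1;
                b = *ip++;
                len += b;
            } while (b == 255);
        }
        if (len > (size_t)(iend - ip)) return -1;
        if (len > (size_t)(oend - op)) return -1;
        while (len--) *op++ = *ip++;        /* copy literals */

        if (ip == iend) break;              /* last sequence: literals only */

        if ((size_t)(iend - ip) < 2) return -1;
        offset = (size_t)ip[0] | ((size_t)ip[1] << 8);   /* little endian */
        ip += 2;
        if (offset == 0) return -1;                     /* 0 is invalid */
        if (offset > (size_t)(op - dst)) return -1;     /* before output start */

        len = token & 15;                   /* match length: low 4 bits */
        if (len == 15) {                    /* extra match-length bytes */
            unsigned b;
            do {
                if (ip >= iend) return -1;
                b = *ip++;
                len += b;
            } while (b == 255);
        }
        len += 4;                           /* minmatch */
        if (len > (size_t)(oend - op)) return -1;
        match = op - offset;
        while (len--) *op++ = *match++;     /* forward copy handles overlap */
    }
    return (long)(op - dst);
}
```

A single zero byte decodes to an empty output here, which agrees with the note on empty inputs: the token announces no literals, and the input ends before any offset is read.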
diff --git a/doc/lz4_Frame_format.md b/doc/lz4_Frame_format.md
index 77454b2..0c98df1 100644
--- a/doc/lz4_Frame_format.md
+++ b/doc/lz4_Frame_format.md
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version
-1.6.0 (08/08/2017)
+1.6.1 (30/01/2018)
Introduction
@@ -72,12 +72,15 @@ Value : 0x184D2204
__Frame Descriptor__
-3 to 15 Bytes, to be detailed in the next part.
-Most important part of the spec.
+3 to 15 Bytes, to be detailed in its own paragraph,
+as it is the most important part of the spec.
+
+The combined __Magic Number__ and __Frame Descriptor__ fields are sometimes
+called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
__Data Blocks__
-To be detailed later on.
+To be detailed in its own paragraph.
That’s where compressed data is stored.
__EndMark__
@@ -98,6 +101,9 @@ that all blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no distortion.
Its usage is recommended.
+The combined __EndMark__ and __Content Checksum__ fields might sometimes be
+referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
+
__Frame Concatenation__
In some circumstances, it may be preferable to append multiple frames,
@@ -380,6 +386,8 @@ and trigger an error if it does not fit within acceptable range.
Version changes
---------------
+1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"
+
1.6.0 : restored Dictionary ID field in Frame header
1.5.1 : changed document format to MarkDown
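Reviewer sketch (not part of the commit): the size ranges quoted for the new terms (header 7 to 19 bytes, footer 4 to 8 bytes) follow from which optional fields are present. The field breakdown used below (FLG, BD and HC at one byte each, an optional 8-byte Content Size, an optional 4-byte Dictionary ID) comes from the Frame Descriptor section of the same specification, which is not shown in this diff. The function names are hypothetical.

```c
#include <stddef.h>

enum {
    SKETCH_MAGIC_SIZE   = 4,   /* Magic Number, 0x184D2204 */
    SKETCH_ENDMARK_SIZE = 4    /* EndMark */
};

/* Hypothetical helper: size of "LZ4 Frame Header"
 * (Magic Number + Frame Descriptor). */
static size_t sketch_frame_header_size(int hasContentSize, int hasDictID)
{
    size_t descriptor = 1 /* FLG */ + 1 /* BD */ + 1 /* HC */;
    if (hasContentSize) descriptor += 8;   /* optional Content Size */
    if (hasDictID)      descriptor += 4;   /* optional Dictionary ID */
    return SKETCH_MAGIC_SIZE + descriptor;
}

/* Hypothetical helper: size of "LZ4 Frame Footer"
 * (EndMark + optional Content Checksum). */
static size_t sketch_frame_footer_size(int hasContentChecksum)
{
    return SKETCH_ENDMARK_SIZE + (hasContentChecksum ? 4u : 0u);
}
```

With no optional fields the header is 4 + 3 = 7 bytes, and with both it is 4 + 15 = 19 bytes, matching the bounds stated in the added text.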
diff --git a/doc/lz4_manual.html b/doc/lz4_manual.html
index 6b7935d..e5044fe 100644
--- a/doc/lz4_manual.html
+++ b/doc/lz4_manual.html
@@ -1,10 +1,10 @@
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-<title>1.8.1 Manual</title>
+<title>1.8.2 Manual</title>
</head>
<body>
-<h1>1.8.1 Manual</h1>
+<h1>1.8.2 Manual</h1>
<hr>
<a name="Contents"></a><h2>Contents</h2>
<ol>
@@ -15,8 +15,9 @@
<li><a href="#Chapter5">Advanced Functions</a></li>
<li><a href="#Chapter6">Streaming Compression Functions</a></li>
<li><a href="#Chapter7">Streaming Decompression Functions</a></li>
-<li><a href="#Chapter8">Private definitions</a></li>
-<li><a href="#Chapter9">Obsolete Functions</a></li>
+<li><a href="#Chapter8">Unstable declarations</a></li>
+<li><a href="#Chapter9">Private definitions</a></li>
+<li><a href="#Chapter10">Obsolete Functions</a></li>
</ol>
<hr>
<a name="Chapter1"></a><h2>Introduction</h2><pre>
@@ -42,9 +43,9 @@
<a name="Chapter2"></a><h2>Version</h2><pre></pre>
-<pre><b>int LZ4_versionNumber (void); </b>/**< library version number; to be used when checking dll version */<b>
+<pre><b>int LZ4_versionNumber (void); </b>/**< library version number; useful to check dll version */<b>
</b></pre><BR>
-<pre><b>const char* LZ4_versionString (void); </b>/**< library version string; to be used when checking dll version */<b>
+<pre><b>const char* LZ4_versionString (void); </b>/**< library version string; unseful to check dll version */<b>
</b></pre><BR>
<a name="Chapter3"></a><h2>Tuning parameter</h2><pre></pre>
@@ -53,7 +54,7 @@
#endif
</b><p> Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
Increasing memory usage improves compression ratio
- Reduced memory usage can improve speed, due to cache effect
+ Reduced memory usage may improve speed, thanks to cache effect
Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
</p></pre><BR>
@@ -65,12 +66,12 @@
into already allocated 'dst' buffer of size 'dstCapacity'.
Compression is guaranteed to succeed if 'dstCapacity' >= LZ4_compressBound(srcSize).
It also runs faster, so it's a recommended setting.
- If the function cannot compress 'src' into a limited 'dst' budget,
+ If the function cannot compress 'src' into a more limited 'dst' budget,
compression stops *immediately*, and the function result is zero.
- As a consequence, 'dst' content is not valid.
- This function never writes outside 'dst' buffer, nor read outside 'source' buffer.
- srcSize : supported max value is LZ4_MAX_INPUT_VALUE
- dstCapacity : full or partial size of buffer 'dst' (which must be already allocated)
+ Note : as a consequence, 'dst' content is not valid.
+ Note 2 : This function is protected against buffer overflow scenarios (never writes outside 'dst' buffer, nor read outside 'source' buffer).
+ srcSize : max supported value is LZ4_MAX_INPUT_SIZE.
+ dstCapacity : size of buffer 'dst' (which must be already allocated)
return : the number of bytes written into buffer 'dst' (necessarily <= dstCapacity)
or 0 if compression fails
</p></pre><BR>
@@ -81,8 +82,7 @@
return : the number of bytes decompressed into destination buffer (necessarily <= dstCapacity)
If destination buffer is not large enough, decoding will stop and output an error code (negative value).
If the source stream is detected malformed, the function will stop decoding and return a negative result.
- This function is protected against buffer overflow exploits, including malicious data packets.
- It never writes outside output buffer, nor reads outside input buffer.
+ This function is protected against malicious data packets.
</p></pre><BR>
<a name="Chapter5"></a><h2>Advanced Functions</h2><pre></pre>
@@ -91,18 +91,18 @@
</b><p> Provides the maximum size that LZ4 compression may output in a "worst case" scenario (input data not compressible)
This function is primarily useful for memory allocation purposes (destination buffer size).
Macro LZ4_COMPRESSBOUND() is also provided for compilation-time evaluation (stack memory allocation for example).
- Note that LZ4_compress_default() compress faster when dest buffer size is >= LZ4_compressBound(srcSize)
+ Note that LZ4_compress_default() compresses faster when dstCapacity is >= LZ4_compressBound(srcSize)
inputSize : max supported value is LZ4_MAX_INPUT_SIZE
return : maximum output size in a "worst case" scenario
- or 0, if input size is too large ( > LZ4_MAX_INPUT_SIZE)
+ or 0, if input size is incorrect (too large or negative)
</p></pre><BR>
<pre><b>int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
-</b><p> Same as LZ4_compress_default(), but allows to select an "acceleration" factor.
+</b><p> Same as LZ4_compress_default(), but allows selection of "acceleration" factor.
The larger the acceleration value, the faster the algorithm, but also the lesser the compression.
It's a trade-off. It can be fine tuned, with each successive value providing roughly +~3% to speed.
An acceleration value of "1" is the same as regular LZ4_compress_default()
- Values <= 0 will be replaced by ACCELERATION_DEFAULT (see lz4.c), which is 1.
+ Values <= 0 will be replaced by ACCELERATION_DEFAULT (currently == 1, see lz4.c).
</p></pre><BR>
<pre><b>int LZ4_sizeofState(void);
@@ -125,26 +125,30 @@ int LZ4_compress_fast_extState (void* state, const char* src, char* dst, int src
</p></pre><BR>
<pre><b>int LZ4_decompress_fast (const char* src, char* dst, int originalSize);
-</b><p> originalSize : is the original uncompressed size
- return : the number of bytes read from the source buffer (in other words, the compressed size)
- If the source stream is detected malformed, the function will stop decoding and return a negative result.
- Destination buffer must be already allocated. Its size must be >= 'originalSize' bytes.
- note : This function respects memory boundaries for *properly formed* compressed data.
- It is a bit faster than LZ4_decompress_safe().
- However, it does not provide any protection against intentionally modified data stream (malicious input).
- Use this function in trusted environment only (data to decode comes from a trusted source).
+</b><p>This function is a bit faster than LZ4_decompress_safe(),
+but it may misbehave on malformed input because it doesn't perform full validation of compressed data.
+ originalSize : is the uncompressed size to regenerate
+ Destination buffer must be already allocated, and its size must be >= 'originalSize' bytes.
+ return : number of bytes read from source buffer (== compressed size).
+ If the source stream is detected malformed, the function stops decoding and return a negative result.
+ note : This function is only usable if the originalSize of uncompressed data is known in advance.
+ The caller should also check that all the compressed input has been consumed properly,
+ i.e. that the return value matches the size of the buffer with compressed input.
+ The function never writes past the output buffer. However, since it doesn't know its 'src' size,
+ it may read past the intended input. Also, because match offsets are not validated during decoding,
+ reads from 'src' may underflow. Use this function in trusted environment **only**.
</p></pre><BR>
<pre><b>int LZ4_decompress_safe_partial (const char* src, char* dst, int srcSize, int targetOutputSize, int dstCapacity);
</b><p> This function decompress a compressed block of size 'srcSize' at position 'src'
into destination buffer 'dst' of size 'dstCapacity'.
The function will decompress a minimum of 'targetOutputSize' bytes, and stop after that.
- However, it's not accurate, and may write more than 'targetOutputSize' (but <= dstCapacity).
+ However, it's not accurate, and may write more than 'targetOutputSize' (but always <= dstCapacity).
@return : the number of bytes decoded in the destination buffer (necessarily <= dstCapacity)
- Note : this number can be < 'targetOutputSize' should the compressed block contain less data.
- Always control how many bytes were decoded.
- If the source stream is detected malformed, the function will stop decoding and return a negative result.
- This function never writes outside of output buffer, and never reads outside of input buffer. It is therefore protected against malicious data packets.
+ Note : this number can also be < targetOutputSize, if compressed block contains less data.
+ Therefore, always control how many bytes were decoded.
+ If source stream is detected malformed, function returns a negative result.
+ This function is protected against malicious data packets.
</p></pre><BR>
<a name="Chapter6"></a><h2>Streaming Compression Functions</h2><pre></pre>
@@ -171,25 +175,29 @@ int LZ4_freeStream (LZ4_stream_t* streamPtr);
</p></pre><BR>
<pre><b>int LZ4_compress_fast_continue (LZ4_stream_t* streamPtr, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
-</b><p> Compress content into 'src' using data from previously compressed blocks, improving compression ratio.
+</b><p> Compress 'src' content using data from previously compressed blocks, for better compression ratio.
'dst' buffer must be already allocated.
If dstCapacity >= LZ4_compressBound(srcSize), compression is guaranteed to succeed, and runs faster.
- Important : Up to 64KB of previously compressed data is assumed to remain present and unmodified in memory !
- Special 1 : If input buffer is a double-buffer, it can have any size, including < 64 KB.
- Special 2 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
+ Important : The previous 64KB of compressed data is assumed to remain present and unmodified in memory!
+
+ Special 1 : When input is a double-buffer, they can have any size, including < 64 KB.
+ Make sure that buffers are separated by at least one byte.
+ This way, each block only depends on previous block.
+ Special 2 : If input buffer is a ring-buffer, it can have any size, including < 64 KB.
@return : size of compressed block
- or 0 if there is an error (typically, compressed data cannot fit into 'dst')
+ or 0 if there is an error (typically, cannot fit into 'dst').
After an error, the stream status is invalid, it can only be reset or freed.
</p></pre><BR>
-<pre><b>int LZ4_saveDict (LZ4_stream_t* streamPtr, char* safeBuffer, int dictSize);
-</b><p> If previously compressed data block is not guaranteed to remain available at its current memory location,
+<pre><b>int LZ4_saveDict (LZ4_stream_t* streamPtr, char* safeBuffer, int maxDictSize);
+</b><p> If last 64KB data cannot be guaranteed to remain available at its current memory location,
save it into a safer place (char* safeBuffer).
- Note : it's not necessary to call LZ4_loadDict() after LZ4_saveDict(), dictionary is immediately usable.
- @return : saved dictionary size in bytes (necessarily <= dictSize), or 0 if error.
+ This is schematically equivalent to a memcpy() followed by LZ4_loadDict(),
+ but is much faster, because LZ4_saveDict() doesn't need to rebuild tables.
+ @return : saved dictionary size in bytes (necessarily <= maxDictSize), or 0 if error.
</p></pre><BR>
@@ -198,37 +206,59 @@ int LZ4_freeStream (LZ4_stream_t* streamPtr);
<pre><b>LZ4_streamDecode_t* LZ4_createStreamDecode(void);
int LZ4_freeStreamDecode (LZ4_streamDecode_t* LZ4_stream);
-</b><p> creation / destruction of streaming decompression tracking structure.
- A tracking structure can be re-used multiple times sequentially.
+</b><p> creation / destruction of streaming decompression tracking context.
+ A tracking context can be re-used multiple times.
+
</p></pre><BR>
<pre><b>int LZ4_setStreamDecode (LZ4_streamDecode_t* LZ4_streamDecode, const char* dictionary, int dictSize);
-</b><p> An LZ4_streamDecode_t structure can be allocated once and re-used multiple times.
+</b><p> An LZ4_streamDecode_t context can be allocated once and re-used multiple times.
Use this function to start decompression of a new stream of blocks.
- A dictionary can optionnally be set. Use NULL or size 0 for a simple reset order.
+ A dictionary can optionnally be set. Use NULL or size 0 for a reset order.
+ Dictionary is presumed stable : it must remain accessible and unmodified during next decompression.
@return : 1 if OK, 0 if error
</p></pre><BR>
+<pre><b>int LZ4_decoderRingBufferSize(int maxBlockSize);
+#define LZ4_DECODER_RING_BUFFER_SIZE(mbs) (65536 + 14 + (mbs)) </b>/* for static allocation; mbs presumed valid */<b>
+</b><p> Note : in a ring buffer scenario (optional),
+ blocks are presumed decompressed next to each other
+ up to the moment there is not enough remaining space for next block (remainingSize < maxBlockSize),
+ at which stage it resumes from beginning of ring buffer.
+ When setting such a ring buffer for streaming decompression,
+ provides the minimum size of this ring buffer
+ to be compatible with any source respecting maxBlockSize condition.
+ @return : minimum ring buffer size,
+ or 0 if there is an error (invalid maxBlockSize).
+
+</p></pre><BR>
+
<pre><b>int LZ4_decompress_safe_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int srcSize, int dstCapacity);
int LZ4_decompress_fast_continue (LZ4_streamDecode_t* LZ4_streamDecode, const char* src, char* dst, int originalSize);
</b><p> These decoding functions allow decompression of consecutive blocks in "streaming" mode.
A block is an unsplittable entity, it must be presented entirely to a decompression function.
- Decompression functions only accept one block at a time.
- Previously decoded blocks *must* remain available at the memory position where they were decoded (up to 64 KB).
-
- Special : if application sets a ring buffer for decompression, it must respect one of the following conditions :
- - Exactly same size as encoding buffer, with same update rule (block boundaries at same positions)
- In which case, the decoding & encoding ring buffer can have any size, including very small ones ( < 64 KB).
- - Larger than encoding buffer, by a minimum of maxBlockSize more bytes.
- maxBlockSize is implementation dependent. It's the maximum size of any single block.
+ Decompression functions only accepts one block at a time.
+ The last 64KB of previously decoded data *must* remain available and unmodified at the memory position where they were decoded.
+ If less than 64KB of data has been decoded, all the data must be present.
+
+ Special : if decompression side sets a ring buffer, it must respect one of the following conditions :
+ - Decompression buffer size is _at least_ LZ4_decoderRingBufferSize(maxBlockSize).
+ maxBlockSize is the maximum size of any single block. It can have any value > 16 bytes.
+ In which case, encoding and decoding buffers do not need to be synchronized.
+ Actually, data can be produced by any source compliant with LZ4 format specification, and respecting maxBlockSize.
+ - Synchronized mode :
+ Decompression buffer size is _exactly_ the same as compression buffer size,
+ and follows exactly same update rule (block boundaries at same positions),
+ and decoding function is provided with exact decompressed size of each block (exception for last block of the stream),
+ _then_ decoding & encoding ring buffer can have any size, including small ones ( < 64 KB).
+ - Decompression buffer is larger than encoding buffer, by a minimum of maxBlockSize more bytes.
In which case, encoding and decoding buffers do not need to be synchronized,
and encoding ring buffer can have any size, including small ones ( < 64 KB).
- - _At least_ 64 KB + 8 bytes + maxBlockSize.
- In which case, encoding and decoding buffers do not need to be synchronized,
- and encoding ring buffer can have any size, including larger than decoding buffer.
- Whenever these conditions are not possible, save the last 64KB of decoded data into a safe buffer,
- and indicate where it is saved using LZ4_setStreamDecode() before decompressing next block.
+
+ Whenever these conditions are not possible,
+ save the last 64KB of decoded data into a safe buffer where it can't be modified during decompression,
+ then indicate where this data is saved using LZ4_setStreamDecode(), before decompressing next block.
</p></pre><BR>
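Reviewer sketch (not part of the commit): the ring buffer sizing and wrap-around rule documented above can be mirrored in plain C as follows. The macro body is copied from the `LZ4_DECODER_RING_BUFFER_SIZE` definition added in this hunk. Everything else is an assumption: the `sketch_` names are hypothetical, and the validity check (negative or above `0x7E000000`, the value of `LZ4_MAX_INPUT_SIZE` in lz4.h) is a guess at what "invalid maxBlockSize" means, not the library's actual logic.

```c
#include <stddef.h>

/* Same arithmetic as the macro added in this commit. */
#define SKETCH_DECODER_RING_BUFFER_SIZE(mbs) (65536 + 14 + (mbs))

/* Assumed upper bound, equal to LZ4_MAX_INPUT_SIZE in lz4.h. */
#define SKETCH_MAX_INPUT_SIZE 0x7E000000

/* Hypothetical stand-in for LZ4_decoderRingBufferSize():
 * returns the minimum ring buffer size, or 0 for an invalid maxBlockSize. */
static int sketch_decoder_ring_buffer_size(int maxBlockSize)
{
    if (maxBlockSize < 0) return 0;
    if (maxBlockSize > SKETCH_MAX_INPUT_SIZE) return 0;
    return SKETCH_DECODER_RING_BUFFER_SIZE(maxBlockSize);
}

/* Wrap rule from the note above: blocks are decompressed next to each
 * other until the remaining space is smaller than maxBlockSize, at which
 * point decoding resumes from the beginning of the ring buffer.
 * Returns the position at which the next block should be decoded. */
static size_t sketch_next_write_pos(size_t pos, size_t ringSize,
                                    size_t maxBlockSize)
{
    return (ringSize - pos < maxBlockSize) ? 0 : pos;
}
```

For 64 KB blocks this gives a ring of 65536 + 14 + 65536 = 131086 bytes. A caller would apply `sketch_next_write_pos()` to the current write position before each call to a `*_continue()` decoding function.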
<pre><b>int LZ4_decompress_safe_usingDict (const char* src, char* dst, int srcSize, int dstCapcity, const char* dictStart, int dictSize);
@@ -236,25 +266,98 @@ int LZ4_decompress_fast_usingDict (const char* src, char* dst, int originalSize,
</b><p> These decoding functions work the same as
a combination of LZ4_setStreamDecode() followed by LZ4_decompress_*_continue()
They are stand-alone, and don't need an LZ4_streamDecode_t structure.
+ Dictionary is presumed stable : it must remain accessible and unmodified during next decompression.
+
+</p></pre><BR>
+
+<a name="Chapter8"></a><h2>Unstable declarations</h2><pre>
+ Declarations in this section should be considered unstable.
+ Use at your own peril, etc., etc.
+ They may be removed in the future.
+ Their signatures may change.
+<BR></pre>
+
+<pre><b>void LZ4_resetStream_fast (LZ4_stream_t* streamPtr);
+</b><p> Use this, like LZ4_resetStream(), to prepare a context for a new chain of
+ calls to a streaming API (e.g., LZ4_compress_fast_continue()).
+
+ Note:
+ Using this in advance of a non- streaming-compression function is redundant,
+ and potentially bad for performance, since they all perform their own custom
+ reset internally.
+
+ Differences from LZ4_resetStream():
+ When an LZ4_stream_t is known to be in a internally coherent state,
+ it can often be prepared for a new compression with almost no work, only
+ sometimes falling back to the full, expensive reset that is always required
+ when the stream is in an indeterminate state (i.e., the reset performed by
+ LZ4_resetStream()).
+
+ LZ4_streams are guaranteed to be in a valid state when:
+ - returned from LZ4_createStream()
+ - reset by LZ4_resetStream()
+ - memset(stream, 0, sizeof(LZ4_stream_t)), though this is discouraged
+ - the stream was in a valid state and was reset by LZ4_resetStream_fast()
+ - the stream was in a valid state and was then used in any compression call
+ that returned success
+ - the stream was in an indeterminate state and was used in a compression
+ call that fully reset the state (e.g., LZ4_compress_fast_extState()) and
+ that returned success
+
+ When a stream isn't known to be in a valid state, it is not safe to pass to
+ any fastReset or streaming function. It must first be cleansed by the full
+ LZ4_resetStream().
</p></pre><BR>
-<a name="Chapter8"></a><h2>Private definitions</h2><pre>
+<pre><b>int LZ4_compress_fast_extState_fastReset (void* state, const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);
+</b><p> A variant of LZ4_compress_fast_extState().
+
+ Using this variant avoids an expensive initialization step. It is only safe
+ to call if the state buffer is known to be correctly initialized already
+ (see above comment on LZ4_resetStream_fast() for a definition of "correctly
+ initialized"). From a high level, the difference is that this function
+ initializes the provided state with a call to something like
+ LZ4_resetStream_fast() while LZ4_compress_fast_extState() starts with a
+ call to LZ4_resetStream().
+
+</p></pre><BR>
+
+<pre><b>void LZ4_attach_dictionary(LZ4_stream_t *working_stream, const LZ4_stream_t *dictionary_stream);
+</b><p> This is an experimental API that allows for the efficient use of a
+ static dictionary many times.
+
+ Rather than re-loading the dictionary buffer into a working context before
+ each compression, or copying a pre-loaded dictionary's LZ4_stream_t into a
+ working LZ4_stream_t, this function introduces a no-copy setup mechanism,
+ in which the working stream references the dictionary stream in-place.
+
+ Several assumptions are made about the state of the dictionary stream.
+ Currently, only streams which have been prepared by LZ4_loadDict() should
+ be expected to work.
+
+ Alternatively, the provided dictionary stream pointer may be NULL, in which
+ case any existing dictionary stream is unset.
+
+ If a dictionary is provided, it replaces any pre-existing stream history.
+ The dictionary contents are the only history that can be referenced and
+ logically immediately precede the data compressed in the first subsequent
+ compression call.
+
+ The dictionary will only remain attached to the working stream through the
+ first compression call, at the end of which it is cleared. The dictionary
+ stream (and source buffer) must remain in-place / accessible / unchanged
+ through the completion of the first compression call on the stream.
+
+</p></pre><BR>
+
+<a name="Chapter9"></a><h2>Private definitions</h2><pre>
Do not use these definitions.
They are exposed to allow static allocation of `LZ4_stream_t` and `LZ4_streamDecode_t`.
Using these definitions will expose code to API and/or ABI break in future versions of the library.
<BR></pre>
<pre><b>typedef struct {
- uint32_t hashTable[LZ4_HASH_SIZE_U32];
- uint32_t currentOffset;
- uint32_t initCheck;
- const uint8_t* dictionary;
- uint8_t* bufferStart; </b>/* obsolete, used for slideInputBuffer */<b>
- uint32_t dictSize;
-} LZ4_stream_t_internal;
-</b></pre><BR>
-<pre><b>typedef struct {
const uint8_t* externalDict;
size_t extDictSize;
const uint8_t* prefixEnd;
@@ -262,15 +365,6 @@ int LZ4_decompress_fast_usingDict (const char* src, char* dst, int originalSize,
} LZ4_streamDecode_t_internal;
</b></pre><BR>
<pre><b>typedef struct {
- unsigned int hashTable[LZ4_HASH_SIZE_U32];
- unsigned int currentOffset;
- unsigned int initCheck;
- const unsigned char* dictionary;
- unsigned char* bufferStart; </b>/* obsolete, used for slideInputBuffer */<b>
- unsigned int dictSize;
-} LZ4_stream_t_internal;
-</b></pre><BR>
-<pre><b>typedef struct {
const unsigned char* externalDict;
size_t extDictSize;
const unsigned char* prefixEnd;
@@ -305,17 +399,15 @@ union LZ4_streamDecode_u {
</p></pre><BR>
-<a name="Chapter9"></a><h2>Obsolete Functions</h2><pre></pre>
+<a name="Chapter10"></a><h2>Obsolete Functions</h2><pre></pre>
<pre><b>#ifdef LZ4_DISABLE_DEPRECATE_WARNINGS
# define LZ4_DEPRECATED(message) </b>/* disable deprecation warnings */<b>
#else
# define LZ4_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
-# if defined(__clang__) </b>/* clang doesn't handle mixed C++11 and CNU attributes */<b>
-# define LZ4_DEPRECATED(message) __attribute__((deprecated(message)))
-# elif defined (__cplusplus) && (__cplusplus >= 201402) </b>/* C++14 or greater */<b>
+# if defined (__cplusplus) && (__cplusplus >= 201402) </b>/* C++14 or greater */<b>
# define LZ4_DEPRECATED(message) [[deprecated(message)]]
-# elif (LZ4_GCC_VERSION >= 405)
+# elif (LZ4_GCC_VERSION >= 405) || defined(__clang__)
# define LZ4_DEPRECATED(message) __attribute__((deprecated(message)))
# elif (LZ4_GCC_VERSION >= 301)
# define LZ4_DEPRECATED(message) __attribute__((deprecated))
diff --git a/doc/lz4frame_manual.html b/doc/lz4frame_manual.html
index 590c632..53ea7eb 100644
--- a/doc/lz4frame_manual.html
+++ b/doc/lz4frame_manual.html
@@ -1,10 +1,10 @@
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
-<title>1.8.1 Manual</title>
+<title>1.8.2 Manual</title>
</head>
<body>
-<h1>1.8.1 Manual</h1>
+<h1>1.8.2 Manual</h1>
<hr>
<a name="Contents"></a><h2>Contents</h2>
<ol>
@@ -18,6 +18,7 @@
<li><a href="#Chapter8">Compression</a></li>
<li><a href="#Chapter9">Decompression functions</a></li>
<li><a href="#Chapter10">Streaming decompression functions</a></li>
+<li><a href="#Chapter11">Bulk processing dictionary API</a></li>
</ol>
<hr>
<a name="Chapter1"></a><h2>Introduction</h2><pre>
@@ -30,9 +31,9 @@
<a name="Chapter3"></a><h2>Error management</h2><pre></pre>
-<pre><b>unsigned LZ4F_isError(LZ4F_errorCode_t code); </b>/**< tells if a `LZ4F_errorCode_t` function result is an error code */<b>
+<pre><b>unsigned LZ4F_isError(LZ4F_errorCode_t code); </b>/**< tells when a function result is an error code */<b>
</b></pre><BR>
-<pre><b>const char* LZ4F_getErrorName(LZ4F_errorCode_t code); </b>/**< return error code string; useful for debugging */<b>
+<pre><b>const char* LZ4F_getErrorName(LZ4F_errorCode_t code); </b>/**< return error code string; for debugging */<b>
</b></pre><BR>
<a name="Chapter4"></a><h2>Frame compression types</h2><pre></pre>
@@ -74,13 +75,13 @@
} LZ4F_frameType_t;
</b></pre><BR>
<pre><b>typedef struct {
- LZ4F_blockSizeID_t blockSizeID; </b>/* max64KB, max256KB, max1MB, max4MB ; 0 == default */<b>
- LZ4F_blockMode_t blockMode; </b>/* LZ4F_blockLinked, LZ4F_blockIndependent ; 0 == default */<b>
- LZ4F_contentChecksum_t contentChecksumFlag; </b>/* if enabled, frame is terminated with a 32-bits checksum of decompressed data ; 0 == disabled (default) */<b>
- LZ4F_frameType_t frameType; </b>/* read-only field : LZ4F_frame or LZ4F_skippableFrame */<b>
- unsigned long long contentSize; </b>/* Size of uncompressed content ; 0 == unknown */<b>
- unsigned dictID; </b>/* Dictionary ID, sent by the compressor to help decoder select the correct dictionary; 0 == no dictID provided */<b>
- LZ4F_blockChecksum_t blockChecksumFlag; </b>/* if enabled, each block is followed by a checksum of block's compressed data ; 0 == disabled (default) */<b>
+ LZ4F_blockSizeID_t blockSizeID; </b>/* max64KB, max256KB, max1MB, max4MB; 0 == default */<b>
+ LZ4F_blockMode_t blockMode; </b>/* LZ4F_blockLinked, LZ4F_blockIndependent; 0 == default */<b>
+ LZ4F_contentChecksum_t contentChecksumFlag; </b>/* 1: frame terminated with 32-bit checksum of decompressed data; 0: disabled (default) */<b>
+ LZ4F_frameType_t frameType; </b>/* read-only field : LZ4F_frame or LZ4F_skippableFrame */<b>
+ unsigned long long contentSize; </b>/* Size of uncompressed content ; 0 == unknown */<b>
+ unsigned dictID; </b>/* Dictionary ID, sent by compressor to help decoder select correct dictionary; 0 == no dictID provided */<b>
+ LZ4F_blockChecksum_t blockChecksumFlag; </b>/* 1: each block followed by a checksum of block's compressed data; 0: disabled (default) */<b>
} LZ4F_frameInfo_t;
</b><p> makes it possible to set or read frame parameters.
It's not required to set all fields, as long as the structure was initially memset() to zero.
@@ -89,20 +90,23 @@
<pre><b>typedef struct {
LZ4F_frameInfo_t frameInfo;
- int compressionLevel; </b>/* 0 == default (fast mode); values above LZ4HC_CLEVEL_MAX count as LZ4HC_CLEVEL_MAX; values below 0 trigger "fast acceleration", proportional to value */<b>
- unsigned autoFlush; </b>/* 1 == always flush, to reduce usage of internal buffers */<b>
- unsigned reserved[4]; </b>/* must be zero for forward compatibility */<b>
+ int compressionLevel; </b>/* 0: default (fast mode); values > LZ4HC_CLEVEL_MAX count as LZ4HC_CLEVEL_MAX; values < 0 trigger "fast acceleration" */<b>
+ unsigned autoFlush; </b>/* 1: always flush, to reduce usage of internal buffers */<b>
+ unsigned favorDecSpeed; </b>/* 1: parser favors decompression speed vs compression ratio. Only works for high compression modes (>= LZ4HC_CLEVEL_OPT_MIN) */ /* >= v1.8.2 */<b>
+ unsigned reserved[3]; </b>/* must be zero for forward compatibility */<b>
} LZ4F_preferences_t;
</b><p> makes it possible to supply detailed compression parameters to the stream interface.
- It's not required to set all fields, as long as the structure was initially memset() to zero.
+ Structure is presumed initially memset() to zero, representing default settings.
All reserved fields must be set to zero.
</p></pre><BR>
<a name="Chapter5"></a><h2>Simple compression function</h2><pre></pre>
<pre><b>size_t LZ4F_compressFrameBound(size_t srcSize, const LZ4F_preferences_t* preferencesPtr);
-</b><p> Returns the maximum possible size of a frame compressed with LZ4F_compressFrame() given srcSize content and preferences.
- Note : this result is only usable with LZ4F_compressFrame(), not with multi-segments compression.
+</b><p> Returns the maximum possible compressed size with LZ4F_compressFrame() given srcSize and preferences.
+ `preferencesPtr` is optional. It can be replaced by NULL, in which case, the function will assume default preferences.
+ Note : this result is only usable with LZ4F_compressFrame().
+ It may also be used with LZ4F_compressUpdate() _if no flush() operation_ is performed.
</p></pre><BR>
@@ -151,41 +155,54 @@ LZ4F_errorCode_t LZ4F_freeCompressionContext(LZ4F_cctx* cctx);
</p></pre><BR>
<pre><b>size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t* prefsPtr);
-</b><p> Provides dstCapacity given a srcSize to guarantee operation success in worst case situations.
- prefsPtr is optional : you can provide NULL as argument, preferences will be set to cover worst case scenario.
- Result is always the same for a srcSize and prefsPtr, so it can be trusted to size reusable buffers.
- When srcSize==0, LZ4F_compressBound() provides an upper bound for LZ4F_flush() and LZ4F_compressEnd() operations.
+</b><p> Provides minimum dstCapacity required to guarantee compression success
+ given a srcSize and preferences, covering worst case scenario.
+ prefsPtr is optional : when NULL is provided, preferences will be set to cover worst case scenario.
+ Estimation is valid for either LZ4F_compressUpdate(), LZ4F_flush() or LZ4F_compressEnd().
+ Estimation includes the possibility that internal buffer might already be filled by up to (blockSize-1) bytes.
+ It also includes frame footer (ending + checksum), which would have to be generated by LZ4F_compressEnd().
+ Estimation doesn't include frame header, as it was already generated by LZ4F_compressBegin().
+ Result is always the same for a srcSize and prefsPtr, so it can be trusted to size reusable buffers.
+ When srcSize==0, LZ4F_compressBound() provides an upper bound for LZ4F_flush() and LZ4F_compressEnd() operations.
</p></pre><BR>
-<pre><b>size_t LZ4F_compressUpdate(LZ4F_cctx* cctx, void* dstBuffer, size_t dstCapacity, const void* srcBuffer, size_t srcSize, const LZ4F_compressOptions_t* cOptPtr);
-</b><p> LZ4F_compressUpdate() can be called repetitively to compress as much data as necessary.
- An important rule is that dstCapacity MUST be large enough to ensure operation success even in worst case situations.
- This value is provided by LZ4F_compressBound().
- If this condition is not respected, LZ4F_compress() will fail (result is an errorCode).
- LZ4F_compressUpdate() doesn't guarantee error recovery. When an error occurs, compression context must be freed or resized.
+<pre><b>size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
+ void* dstBuffer, size_t dstCapacity,
+ const void* srcBuffer, size_t srcSize,
+ const LZ4F_compressOptions_t* cOptPtr);
+</b><p> LZ4F_compressUpdate() can be called repeatedly to compress as much data as necessary.
+ Important rule: dstCapacity MUST be large enough to ensure operation success even in worst case situations.
+ This value is provided by LZ4F_compressBound().
+ If this condition is not respected, LZ4F_compressUpdate() will fail (result is an errorCode).
+ LZ4F_compressUpdate() doesn't guarantee error recovery.
+ When an error occurs, compression context must be freed or resized.
`cOptPtr` is optional : NULL can be provided, in which case all options are set to default.
@return : number of bytes written into `dstBuffer` (it can be zero, meaning input data was just buffered).
or an error code if it fails (which can be tested using LZ4F_isError())
</p></pre><BR>
-<pre><b>size_t LZ4F_flush(LZ4F_cctx* cctx, void* dstBuffer, size_t dstCapacity, const LZ4F_compressOptions_t* cOptPtr);
-</b><p> When data must be generated and sent immediately, without waiting for a block to be completely filled,
- it's possible to call LZ4_flush(). It will immediately compress any data buffered within cctx.
+<pre><b>size_t LZ4F_flush(LZ4F_cctx* cctx,
+ void* dstBuffer, size_t dstCapacity,
+ const LZ4F_compressOptions_t* cOptPtr);
+</b><p> When data must be generated and sent immediately, without waiting for a block to be completely filled,
+ it's possible to call LZ4F_flush(). It will immediately compress any data buffered within cctx.
`dstCapacity` must be large enough to ensure the operation will be successful.
`cOptPtr` is optional : it's possible to provide NULL, all options will be set to default.
- @return : number of bytes written into dstBuffer (it can be zero, which means there was no data stored within cctx)
+ @return : nb of bytes written into dstBuffer (can be zero, when there is no data stored within cctx)
or an error code if it fails (which can be tested using LZ4F_isError())
</p></pre><BR>
-<pre><b>size_t LZ4F_compressEnd(LZ4F_cctx* cctx, void* dstBuffer, size_t dstCapacity, const LZ4F_compressOptions_t* cOptPtr);
+<pre><b>size_t LZ4F_compressEnd(LZ4F_cctx* cctx,
+ void* dstBuffer, size_t dstCapacity,
+ const LZ4F_compressOptions_t* cOptPtr);
</b><p> To properly finish an LZ4 frame, invoke LZ4F_compressEnd().
It will flush whatever data remained within `cctx` (like LZ4F_flush())
and properly finalize the frame, with an endMark and a checksum.
`cOptPtr` is optional : NULL can be provided, in which case all options will be set to default.
- @return : number of bytes written into dstBuffer (necessarily >= 4 (endMark), or 8 if optional frame checksum is enabled)
+ @return : nb of bytes written into dstBuffer, necessarily >= 4 (endMark),
or an error code if it fails (which can be tested using LZ4F_isError())
A successful call to LZ4F_compressEnd() makes `cctx` available again for another compression task.
@@ -194,19 +211,19 @@ LZ4F_errorCode_t LZ4F_freeCompressionContext(LZ4F_cctx* cctx);
<a name="Chapter9"></a><h2>Decompression functions</h2><pre></pre>
<pre><b>typedef struct {
- unsigned stableDst; </b>/* pledge that at least 64KB+64Bytes of previously decompressed data remain unmodifed where it was decoded. This optimization skips storage operations in tmp buffers */<b>
+ unsigned stableDst; </b>/* pledges that last 64KB decompressed data will remain available unmodified. This optimization skips storage operations in tmp buffers. */<b>
unsigned reserved[3]; </b>/* must be set to zero for forward compatibility */<b>
} LZ4F_decompressOptions_t;
</b></pre><BR>
<pre><b>LZ4F_errorCode_t LZ4F_createDecompressionContext(LZ4F_dctx** dctxPtr, unsigned version);
LZ4F_errorCode_t LZ4F_freeDecompressionContext(LZ4F_dctx* dctx);
-</b><p> Create an LZ4F_dctx object, to track all decompression operations.
- The version provided MUST be LZ4F_VERSION.
- The function provides a pointer to an allocated and initialized LZ4F_dctx object.
- The result is an errorCode, which can be tested using LZ4F_isError().
- dctx memory can be released using LZ4F_freeDecompressionContext();
- The result of LZ4F_freeDecompressionContext() is indicative of the current state of decompressionContext when being released.
- That is, it should be == 0 if decompression has been completed fully and correctly.
+</b><p> Create an LZ4F_dctx object, to track all decompression operations.
+ The version provided MUST be LZ4F_VERSION.
+ The function provides a pointer to an allocated and initialized LZ4F_dctx object.
+ The result is an errorCode, which can be tested using LZ4F_isError().
+ dctx memory can be released using LZ4F_freeDecompressionContext();
+ Result of LZ4F_freeDecompressionContext() indicates current state of decompressionContext when being released.
+ That is, it should be == 0 if decompression has been completed fully and correctly.
</p></pre><BR>
@@ -245,8 +262,8 @@ LZ4F_errorCode_t LZ4F_freeDecompressionContext(LZ4F_dctx* dctx);
The function will read up to *srcSizePtr bytes from srcBuffer,
and decompress data into dstBuffer, of capacity *dstSizePtr.
- The number of bytes consumed from srcBuffer will be written into *srcSizePtr (necessarily <= original value).
- The number of bytes decompressed into dstBuffer will be written into *dstSizePtr (necessarily <= original value).
+ The nb of bytes consumed from srcBuffer will be written into *srcSizePtr (necessarily <= original value).
+ The nb of bytes decompressed into dstBuffer will be written into *dstSizePtr (necessarily <= original value).
The function does not necessarily read all input bytes, so always check value in *srcSizePtr.
Unconsumed source data must be presented again in subsequent invocations.
@@ -278,5 +295,58 @@ LZ4F_errorCode_t LZ4F_freeDecompressionContext(LZ4F_dctx* dctx);
and start a new one using same context resources.
</p></pre><BR>
+<pre><b>typedef enum { LZ4F_LIST_ERRORS(LZ4F_GENERATE_ENUM) } LZ4F_errorCodes;
+</b></pre><BR>
+<a name="Chapter11"></a><h2>Bulk processing dictionary API</h2><pre></pre>
+
+<pre><b>LZ4FLIB_STATIC_API LZ4F_CDict* LZ4F_createCDict(const void* dictBuffer, size_t dictSize);
+LZ4FLIB_STATIC_API void LZ4F_freeCDict(LZ4F_CDict* CDict);
+</b><p> When compressing multiple messages / blocks with the same dictionary, it's recommended to load it just once.
+ LZ4F_createCDict() will create a digested dictionary, ready to start future compression operations without startup delay.
+ LZ4F_CDict can be created once and shared by multiple threads concurrently, since its usage is read-only.
+ `dictBuffer` can be released after LZ4F_CDict creation, since its content is copied within CDict.
+</p></pre><BR>
+
+<pre><b>LZ4FLIB_STATIC_API size_t LZ4F_compressFrame_usingCDict(
+ LZ4F_cctx* cctx,
+ void* dst, size_t dstCapacity,
+ const void* src, size_t srcSize,
+ const LZ4F_CDict* cdict,
+ const LZ4F_preferences_t* preferencesPtr);
+</b><p> Compress an entire srcBuffer into a valid LZ4 frame using a digested Dictionary.
+ cctx must point to a context created by LZ4F_createCompressionContext().
+ If cdict==NULL, compress without a dictionary.
+ dstCapacity MUST be >= LZ4F_compressFrameBound(srcSize, preferencesPtr).
+ If this condition is not respected, function will fail (@return an errorCode).
+ The LZ4F_preferences_t structure is optional : you may provide NULL as argument,
+ but it's not recommended, as it's the only way to provide dictID in the frame header.
+ @return : number of bytes written into dstBuffer,
+ or an error code if it fails (can be tested using LZ4F_isError())
+</p></pre><BR>
+
+<pre><b>LZ4FLIB_STATIC_API size_t LZ4F_compressBegin_usingCDict(
+ LZ4F_cctx* cctx,
+ void* dstBuffer, size_t dstCapacity,
+ const LZ4F_CDict* cdict,
+ const LZ4F_preferences_t* prefsPtr);
+</b><p> Inits streaming dictionary compression, and writes the frame header into dstBuffer.
+ dstCapacity must be >= LZ4F_HEADER_SIZE_MAX bytes.
+ `prefsPtr` is optional : you may provide NULL as argument,
+ however, it's the only way to provide dictID in the frame header.
+ @return : number of bytes written into dstBuffer for the header,
+ or an error code (which can be tested using LZ4F_isError())
+</p></pre><BR>
+
+<pre><b>LZ4FLIB_STATIC_API size_t LZ4F_decompress_usingDict(
+ LZ4F_dctx* dctxPtr,
+ void* dstBuffer, size_t* dstSizePtr,
+ const void* srcBuffer, size_t* srcSizePtr,
+ const void* dict, size_t dictSize,
+ const LZ4F_decompressOptions_t* decompressOptionsPtr);
+</b><p> Same as LZ4F_decompress(), using a predefined dictionary.
+ Dictionary is used "in place", without any preprocessing.
+ It must remain accessible throughout the entire frame decoding.
+</p></pre><BR>
+
</html>
</body>