Merge pull request #899 from lz4/endMark

Clarifies and fix EndMark
author: Yann Collet <Cyan4973@users.noreply.github.com> 2020-08-14 22:48:21 (GMT)
committer: GitHub <noreply@github.com> 2020-08-14 22:48:21 (GMT)
commit: 9a6e93859d8241643831994572f41c21b6887470 (patch)
tree: 53c505c4a36cb917e9b60b1cd3a83744f49595cb
parent: f328e329b3cec38ec8316d454279b79d19c36fdd (diff)
parent: 5ab7d22fa5622ab0a02bc627e6ec8742a8e3707c (diff)
3 files changed, 55 insertions, 30 deletions
diff --git a/doc/lz4_Frame_format.md b/doc/lz4_Frame_format.md
index a0514e0..e7cbdbf 100644
--- a/doc/lz4_Frame_format.md
+++ b/doc/lz4_Frame_format.md
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
 
 ### Version
 
-1.6.1 (30/01/2018)
+1.6.2 (12/08/2020)
 
 
 Introduction
@@ -75,7 +75,7 @@ __Frame Descriptor__
 3 to 15 Bytes, to be detailed in its own paragraph,
 as it is the most important part of the spec.
 
-The combined __Magic Number__ and __Frame Descriptor__ fields are sometimes
+The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
 called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
 
 __Data Blocks__
@@ -85,14 +85,13 @@ That’s where compressed data is stored.
 
 __EndMark__
 
-The flow of blocks ends when the last data block has a size of “0”.
-The size is expressed as a 32-bits value.
+The flow of blocks ends when the last data block is followed by
+the 32-bit value `0x00000000`.
 
 __Content Checksum__
 
-Content Checksum verify that the full content has been decoded correctly.
-The content checksum is the result
-of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
+_Content_Checksum_ verify that the full content has been decoded correctly.
+The content checksum is the result of [xxHash-32 algorithm]
 digesting the original (decoded) data as input, and a seed of zero.
 Content checksum is only present when its associated flag
 is set in the frame descriptor.
@@ -101,7 +100,7 @@ that all blocks were fully transmitted in the correct order and without error,
 and also that the encoding/decoding process itself generated no distortion.
 Its usage is recommended.
 
-The combined __EndMark__ and __Content Checksum__ fields might sometimes be
+The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
 referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
 
 __Frame Concatenation__
@@ -261,16 +260,24 @@ __Block Size__
 
 This field uses 4-bytes, format is little-endian.
 
-The highest bit is “1” if data in the block is uncompressed.
+If the highest bit is set (`1`), the block is uncompressed.
 
-The highest bit is “0” if data in the block is compressed by LZ4.
+If the highest bit is not set (`0`), the block is LZ4-compressed,
+using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
 
-All other bits give the size, in bytes, of the following data block.
+All other bits give the size, in bytes, of the data section.
 The size does not include the block checksum if present.
 
-Block Size shall never be larger than Block Maximum Size.
-Such a thing could potentially happen for non-compressible sources.
-In such a case, such data block shall be passed using uncompressed format.
+_Block_Size_ shall never be larger than _Block_Maximum_Size_.
+Such an outcome could potentially happen for non-compressible sources.
+In such a case, such data block must be passed using uncompressed format.
+
+A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
+Note that this is different from a value of `0x80000000` (highest bit set),
+which is an uncompressed block of size 0 (empty),
+which is valid, and therefore doesn't end a frame.
+Note that, if _Block_checksum_ is enabled,
+even an empty block must be followed by a 32-bit block checksum.
 
 __Data__
 
@@ -279,20 +286,22 @@ It might be compressed or not, depending on previous field indications.
 
 When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
 
-Note that the block is not necessarily full.
-Uncompressed size of data can be any size, up to "Block Maximum Size”,
+Note that a block is not necessarily full.
+Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
 so it may contain less data than the maximum block size.
 
 __Block checksum__
 
 Only present if the associated flag is set.
 This is a 4-bytes checksum value, in little endian format,
-calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
+calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
 and a seed of zero.
 The intention is to detect data corruption (storage or transmission errors)
 before decoding.
 
-Block checksum is cumulative with Content checksum.
+_Block_checksum_ can be cumulative with _Content_checksum_.
+
+[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md
 
 
 Skippable Frames
@@ -389,6 +398,8 @@ and trigger an error if it does not fit within acceptable range.
 Version changes
 ---------------
 
+1.6.2 : clarifies specification of _EndMark_
+
 1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"
 
 1.6.0 : restored Dictionary ID field in Frame header
diff --git a/lib/lz4frame.c b/lib/lz4frame.c
index 5d716ea..e11f1c8 100644
--- a/lib/lz4frame.c
+++ b/lib/lz4frame.c
@@ -1483,14 +1483,16 @@ size_t LZ4F_decompress(LZ4F_dctx* dctx,
             }   /* if (dctx->dStage == dstage_storeBlockHeader) */
 
         /* decode block header */
-            {   size_t const nextCBlockSize = LZ4F_readLE32(selectedIn) & 0x7FFFFFFFU;
+            {   U32 const blockHeader = LZ4F_readLE32(selectedIn);
+                size_t const nextCBlockSize = blockHeader & 0x7FFFFFFFU;
                 size_t const crcSize = dctx->frameInfo.blockChecksumFlag * BFSize;
-                if (nextCBlockSize==0) {  /* frameEnd signal, no more block */
+                if (blockHeader==0) {  /* frameEnd signal, no more block */
                     dctx->dStage = dstage_getSuffix;
                     break;
                 }
-                if (nextCBlockSize > dctx->maxBlockSize)
+                if (nextCBlockSize > dctx->maxBlockSize) {
                     return err0r(LZ4F_ERROR_maxBlockSize_invalid);
+                }
                 if (LZ4F_readLE32(selectedIn) & LZ4F_BLOCKUNCOMPRESSED_FLAG) {
                     /* next block is uncompressed */
                     dctx->tmpInTarget = nextCBlockSize;
diff --git a/tests/frametest.c b/tests/frametest.c
index f891530..236a98c 100644
--- a/tests/frametest.c
+++ b/tests/frametest.c
@@ -995,13 +995,13 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
             BYTE* op = (BYTE*)compressedBuffer;
             BYTE* const oend = op + (neverFlush ? LZ4F_compressFrameBound(srcSize, prefsPtr) : compressedBufferSize);  /* when flushes are possible, can't guarantee a max compressed size */
             unsigned const maxBits = FUZ_highbit((U32)srcSize);
-            size_t cSegmentSize;
             LZ4F_compressOptions_t cOptions;
             memset(&cOptions, 0, sizeof(cOptions));
-            cSegmentSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr);
-            CHECK(LZ4F_isError(cSegmentSize), "Compression header failed (error %i)",
-                                            (int)cSegmentSize);
-            op += cSegmentSize;
+            {   size_t const fhSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr);
+                CHECK(LZ4F_isError(fhSize), "Compression header failed (error %i)",
+                                            (int)fhSize);
+                op += fhSize;
+            }
             while (ip < iend) {
                 unsigned const nbBitsSeg = FUZ_rand(&randState) % maxBits;
                 size_t const sampleMax = (FUZ_rand(&randState) & ((1<<nbBitsSeg)-1)) + 1;
@@ -1024,8 +1024,20 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
                         DISPLAYLEVEL(6,"flushing %u bytes \n", (unsigned)flushSize);
                         CHECK(LZ4F_isError(flushSize), "Compression failed (error %i)", (int)flushSize);
                         op += flushSize;
-                }   }
-            }
+                        if ((FUZ_rand(&randState) % 1024) == 3) {
+                            /* add an empty block (requires uncompressed flag) */
+                            op[0] = op[1] = op[2] = 0;
+                            op[3] = 0x80; /* 0x80000000U in little-endian format */
+                            op += 4;
+                            if ((prefsPtr!= NULL) && prefsPtr->frameInfo.blockChecksumFlag) {
+                                U32 const bc32 = XXH32(op, 0, 0);
+                                op[0] = (BYTE)bc32; /* little endian format */
+                                op[1] = (BYTE)(bc32>>8);
+                                op[2] = (BYTE)(bc32>>16);
+                                op[3] = (BYTE)(bc32>>24);
+                                op += 4;
+                }   }   }   }
+            }  /* while (ip<iend) */
             CHECK(op>=oend, "LZ4F_compressFrameBound overflow");
             {   size_t const dstEndSafeSize = LZ4F_compressBound(0, prefsPtr);
                 int const tooSmallDstEnd = ((FUZ_rand(&randState) & 31) == 3);
@@ -1086,8 +1098,8 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
         DISPLAYLEVEL(6, "noisy decompression \n");
         test_lz4f_decompression(compressedBuffer, cSize, srcStart, srcSize, crcOrig, &randState, dCtxNoise, seed, testNb);
         /* note : we don't analyze result here : it probably failed, which is expected.
-         * We just check for potential out-of-bound reads and writes. */
-         LZ4F_resetDecompressionContext(dCtxNoise);  /* context must be reset after an error */
+         * The sole purpose is to catch potential out-of-bound reads and writes. */
+        LZ4F_resetDecompressionContext(dCtxNoise);  /* context must be reset after an error */
 #endif
 
 }   /* for ( ; (testNb < nbTests) ; ) */
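
The rule clarified by this patch can be sketched outside the library as follows. This is a minimal illustration, not code from lz4: `bh_kind`, `bh_info`, `classify_block_header` and `write_empty_block` are hypothetical names introduced here. It assumes only what the diff states: a full 32-bit header of `0x00000000` is an _EndMark_, `0x80000000` is a valid empty uncompressed block, and an empty block is still followed by a block checksum when that flag is set. The checksum of zero bytes with a seed of zero is hard-coded to avoid a dependency on `xxhash.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not part of the lz4 API. */
typedef enum { BH_END_MARK, BH_COMPRESSED, BH_UNCOMPRESSED } bh_kind;

typedef struct {
    bh_kind  kind;
    uint32_t dataSize;  /* size of the data section, block checksum excluded */
} bh_info;

/* XXH32 of zero input bytes with seed 0 (hard-coded, see lead-in). */
#define XXH32_EMPTY_SEED0 0x02CC5D05U

/* Read a 32-bit little-endian value from 4 bytes. */
static uint32_t read_le32(const unsigned char* p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as 4 little-endian bytes. */
static void write_le32(unsigned char* p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Classify a 4-byte block header.
 * Only the full value 0x00000000 ends the frame; testing the masked
 * size alone would also treat 0x80000000 as an end of frame. */
static bh_info classify_block_header(const unsigned char* p)
{
    uint32_t const header = read_le32(p);
    bh_info info;
    info.dataSize = header & 0x7FFFFFFFU;
    if (header == 0)                info.kind = BH_END_MARK;
    else if (header & 0x80000000U)  info.kind = BH_UNCOMPRESSED;
    else                            info.kind = BH_COMPRESSED;
    return info;
}

/* Emit an empty uncompressed block, optionally followed by its block
 * checksum. Returns the number of bytes written, or 0 if `capacity`
 * is too small. */
static size_t write_empty_block(unsigned char* dst, size_t capacity,
                                int blockChecksum)
{
    size_t const needed = blockChecksum ? 8 : 4;
    assert(dst != NULL);
    if (capacity < needed) return 0;
    write_le32(dst, 0x80000000U);
    if (blockChecksum) write_le32(dst + 4, XXH32_EMPTY_SEED0);
    return needed;
}
```

Comparing the whole header against zero, rather than the masked size, mirrors the change made in `LZ4F_decompress()` above. The empty block written by the second helper is the same byte sequence that the fuzzer test occasionally injects.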