From 19665c93ea4a25e02970a53ec5d2cd1af9303c43 Mon Sep 17 00:00:00 2001 From: Takayuki MATSUOKA Date: Wed, 25 Mar 2015 07:52:35 +0900 Subject: Add document for "Line by Line Text Compression" example --- examples/blockStreaming_lineByLine.md | 121 ++++++++++++++++++++++++++++++++++ 1 file changed, 121 insertions(+) create mode 100644 examples/blockStreaming_lineByLine.md diff --git a/examples/blockStreaming_lineByLine.md b/examples/blockStreaming_lineByLine.md new file mode 100644 index 0000000..1d379a0 --- /dev/null +++ b/examples/blockStreaming_lineByLine.md @@ -0,0 +1,121 @@ +# LZ4 Streaming API Example : Line by Line Text Compression + +`blockStreaming_lineByLine.c` is LZ4 Straming API example which implements line by line incremental (de)compression. + +Please note the following restrictions : + + - Firstly, read "LZ4 Streaming API Basics". + - This is relatively advanced application example. + - Output file is not compatible with lz4frame and platform dependent. + + +## What's the point of this example ? + + - Line by line incremental (de)compression. + - Handle huge file in small amount of memory + - Generally better compression ratio than Block API + - Non-uniform block size + + +## How the compression works + +First of all, allocate "Ring Buffer" for input and LZ4 compressed data buffer for output. + +``` +(1) + Ring Buffer + + +--------+ + | Line#1 | + +---+----+ + | + v + {Out#1} + + +(2) + Prefix Mode Dependency + +----+ + | | + v | + +--------+-+------+ + | Line#1 | Line#2 | + +--------+---+----+ + | + v + {Out#2} + + +(3) + Prefix Prefix + +----+ +----+ + | | | | + v | v | + +--------+-+------+-+------+ + | Line#1 | Line#2 | Line#3 | + +--------+--------+---+----+ + | + v + {Out#3} + + +(4) + External Dictionary Mode + +----+ +----+ + | | | | + v | v | + ------+--------+-+------+-+--------+ + | .... | Line#X | Line#X+1 | + ------+--------+--------+-----+----+ + ^ | + | v + | {Out#X+1} + | + Reset + + +(5) + Prefix + +-----+ + | | + v | + ------+--------+--------+----------+--+-------+ + | .... | Line#X | Line#X+1 | Line#X+2 | + ------+--------+--------+----------+-----+----+ + ^ | + | v + | {Out#X+2} + | + Reset +``` + +Next (see (1)), read first line to ringbuffer and compress it by `LZ4_compress_continue()`. +For the first time, LZ4 doesn't know any previous dependencies, +so it just compress the line without dependencies and generates compressed line {Out#1} to LZ4 compressed data buffer. +After that, write {Out#1} to the file and forward ringbuffer offset. + +Do the same things to second line (see (2)). +But in this time, LZ4 can use dependency to Line#1 to improve compression ratio. +This dependency is called "Prefix mode". + +Eventually, we'll reach end of ringbuffer at Line#X (see (4)). +This time, we should reset ringbuffer offset. +After resetting, at Line#X+1 pointer is not adjacent, but LZ4 still maintain its memory. +This is called "External Dictionary Mode". + +In Line#X+2 (see (5)), finally LZ4 forget almost all memories but still remains Line#X+1. +This is the same situation as Line#2. + +Continue these procedure to the end of text file. + + +## How the decompression works + +Decompression will do reverse order. + + - Read compressed line from the file to buffer. + - Decompress it to the ringbuffer. + - Output decompressed plain text line to the file. + - Forward ringbuffer offset. If offset exceedes end of the ringbuffer, reset it. + +Continue these procedure to the end of the compressed file. -- cgit v0.12