diff options
Diffstat (limited to 'Doc/library/lzma.rst')
-rw-r--r-- | Doc/library/lzma.rst | 382 |
1 files changed, 382 insertions, 0 deletions
diff --git a/Doc/library/lzma.rst b/Doc/library/lzma.rst new file mode 100644 index 0000000..402fae2 --- /dev/null +++ b/Doc/library/lzma.rst @@ -0,0 +1,382 @@ +:mod:`lzma` --- Compression using the LZMA algorithm +==================================================== + +.. module:: lzma + :synopsis: A Python wrapper for the liblzma compression library. +.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com> +.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com> + +.. versionadded:: 3.3 + + +This module provides classes and convenience functions for compressing and +decompressing data using the LZMA compression algorithm. Also included is a file +interface supporting the ``.xz`` and legacy ``.lzma`` file formats used by the +:program:`xz` utility, as well as raw compressed streams. + +The interface provided by this module is very similar to that of the :mod:`bz2` +module. However, note that :class:`LZMAFile` is *not* thread-safe, unlike +:class:`bz2.BZ2File`, so if you need to use a single :class:`LZMAFile` instance +from multiple threads, it is necessary to protect it with a lock. + + +.. exception:: LZMAError + + This exception is raised when an error occurs during compression or + decompression, or while initializing the compressor/decompressor state. + + +Reading and writing compressed files +------------------------------------ + +.. function:: open(filename, mode="rb", \*, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None) + + Open an LZMA-compressed file in binary or text mode, returning a :term:`file + object`. + + The *filename* argument can be either an actual file name (given as a + :class:`str` or :class:`bytes` object), in which case the named file is + opened, or it can be an existing file object to read from or write to. + + The *mode* argument can be any of ``"r"``, ``"rb"``, ``"w"``, ``"wb"``, + ``"a"`` or ``"ab"`` for binary mode, or ``"rt"``, ``"wt"``, or ``"at"`` for + text mode. The default is ``"rb"``. + + When opening a file for reading, the *format* and *filters* arguments have + the same meanings as for :class:`LZMADecompressor`. In this case, the *check* + and *preset* arguments should not be used. + + When opening a file for writing, the *format*, *check*, *preset* and + *filters* arguments have the same meanings as for :class:`LZMACompressor`. + + For binary mode, this function is equivalent to the :class:`LZMAFile` + constructor: ``LZMAFile(filename, mode, ...)``. In this case, the *encoding*, + *errors* and *newline* arguments must not be provided. + + For text mode, a :class:`LZMAFile` object is created, and wrapped in an + :class:`io.TextIOWrapper` instance with the specified encoding, error + handling behavior, and line ending(s). + + +.. class:: LZMAFile(filename=None, mode="r", \*, format=None, check=-1, preset=None, filters=None) + + Open an LZMA-compressed file in binary mode. + + An :class:`LZMAFile` can wrap an already-open :term:`file object`, or operate + directly on a named file. The *filename* argument specifies either the file + object to wrap, or the name of the file to open (as a :class:`str` or + :class:`bytes` object). When wrapping an existing file object, the wrapped + file will not be closed when the :class:`LZMAFile` is closed. + + The *mode* argument can be either ``"r"`` for reading (default), ``"w"`` for + overwriting, or ``"a"`` for appending. These can equivalently be given as + ``"rb"``, ``"wb"``, and ``"ab"`` respectively. + + If *filename* is a file object (rather than an actual file name), a mode of + ``"w"`` does not truncate the file, and is instead equivalent to ``"a"``. + + When opening a file for reading, the input file may be the concatenation of + multiple separate compressed streams. These are transparently decoded as a + single logical stream. + + When opening a file for reading, the *format* and *filters* arguments have + the same meanings as for :class:`LZMADecompressor`. In this case, the *check* + and *preset* arguments should not be used. + + When opening a file for writing, the *format*, *check*, *preset* and + *filters* arguments have the same meanings as for :class:`LZMACompressor`. + + :class:`LZMAFile` supports all the members specified by + :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`. + Iteration and the :keyword:`with` statement are supported. + + The following method is also provided: + + .. method:: peek(size=-1) + + Return buffered data without advancing the file position. At least one + byte of data will be returned, unless EOF has been reached. The exact + number of bytes returned is unspecified (the *size* argument is ignored). + + +Compressing and decompressing data in memory +-------------------------------------------- + +.. class:: LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None) + + Create a compressor object, which can be used to compress data incrementally. + + For a more convenient way of compressing a single chunk of data, see + :func:`compress`. + + The *format* argument specifies what container format should be used. + Possible values are: + + * :const:`FORMAT_XZ`: The ``.xz`` container format. + This is the default format. + + * :const:`FORMAT_ALONE`: The legacy ``.lzma`` container format. + This format is more limited than ``.xz`` -- it does not support integrity + checks or multiple filters. + + * :const:`FORMAT_RAW`: A raw data stream, not using any container format. + This format specifier does not support integrity checks, and requires that + you always specify a custom filter chain (for both compression and + decompression). Additionally, data compressed in this manner cannot be + decompressed using :const:`FORMAT_AUTO` (see :class:`LZMADecompressor`). + + The *check* argument specifies the type of integrity check to include in the + compressed data. This check is used when decompressing, to ensure that the + data has not been corrupted. Possible values are: + + * :const:`CHECK_NONE`: No integrity check. + This is the default (and the only acceptable value) for + :const:`FORMAT_ALONE` and :const:`FORMAT_RAW`. + + * :const:`CHECK_CRC32`: 32-bit Cyclic Redundancy Check. + + * :const:`CHECK_CRC64`: 64-bit Cyclic Redundancy Check. + This is the default for :const:`FORMAT_XZ`. + + * :const:`CHECK_SHA256`: 256-bit Secure Hash Algorithm. + + If the specified check is not supported, an :class:`LZMAError` is raised. + + The compression settings can be specified either as a preset compression + level (with the *preset* argument), or in detail as a custom filter chain + (with the *filters* argument). + + The *preset* argument (if provided) should be an integer between ``0`` and + ``9`` (inclusive), optionally OR-ed with the constant + :const:`PRESET_EXTREME`. If neither *preset* nor *filters* are given, the + default behavior is to use :const:`PRESET_DEFAULT` (preset level ``6``). + Higher presets produce smaller output, but make the compression process + slower. + + .. note:: + + In addition to being more CPU-intensive, compression with higher presets + also requires much more memory (and produces output that needs more memory + to decompress). With preset ``9`` for example, the overhead for an + :class:`LZMACompressor` object can be as high as 800MiB. For this reason, + it is generally best to stick with the default preset. + + The *filters* argument (if provided) should be a filter chain specifier. + See :ref:`filter-chain-specs` for details. + + .. method:: compress(data) + + Compress *data* (a :class:`bytes` object), returning a :class:`bytes` + object containing compressed data for at least part of the input. Some of + *data* may be buffered internally, for use in later calls to + :meth:`compress` and :meth:`flush`. The returned data should be + concatenated with the output of any previous calls to :meth:`compress`. + + .. method:: flush() + + Finish the compression process, returning a :class:`bytes` object + containing any data stored in the compressor's internal buffers. + + The compressor cannot be used after this method has been called. + + +.. class:: LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None) + + Create a decompressor object, which can be used to decompress data + incrementally. + + For a more convenient way of decompressing an entire compressed stream at + once, see :func:`decompress`. + + The *format* argument specifies the container format that should be used. The + default is :const:`FORMAT_AUTO`, which can decompress both ``.xz`` and + ``.lzma`` files. Other possible values are :const:`FORMAT_XZ`, + :const:`FORMAT_ALONE`, and :const:`FORMAT_RAW`. + + The *memlimit* argument specifies a limit (in bytes) on the amount of memory + that the decompressor can use. When this argument is used, decompression will + fail with an :class:`LZMAError` if it is not possible to decompress the input + within the given memory limit. + + The *filters* argument specifies the filter chain that was used to create + the stream being decompressed. This argument is required if *format* is + :const:`FORMAT_RAW`, but should not be used for other formats. + See :ref:`filter-chain-specs` for more information about filter chains. + + .. note:: + This class does not transparently handle inputs containing multiple + compressed streams, unlike :func:`decompress` and :class:`LZMAFile`. To + decompress a multi-stream input with :class:`LZMADecompressor`, you must + create a new decompressor for each stream. + + .. method:: decompress(data) + + Decompress *data* (a :class:`bytes` object), returning a :class:`bytes` + object containing the decompressed data for at least part of the input. + Some of *data* may be buffered internally, for use in later calls to + :meth:`decompress`. The returned data should be concatenated with the + output of any previous calls to :meth:`decompress`. + + .. attribute:: check + + The ID of the integrity check used by the input stream. This may be + :const:`CHECK_UNKNOWN` until enough of the input has been decoded to + determine what integrity check it uses. + + .. attribute:: eof + + True if the end-of-stream marker has been reached. + + .. attribute:: unused_data + + Data found after the end of the compressed stream. + + Before the end of the stream is reached, this will be ``b""``. + + +.. function:: compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None) + + Compress *data* (a :class:`bytes` object), returning the compressed data as a + :class:`bytes` object. + + See :class:`LZMACompressor` above for a description of the *format*, *check*, + *preset* and *filters* arguments. + + +.. function:: decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None) + + Decompress *data* (a :class:`bytes` object), returning the uncompressed data + as a :class:`bytes` object. + + If *data* is the concatenation of multiple distinct compressed streams, + decompress all of these streams, and return the concatenation of the results. + + See :class:`LZMADecompressor` above for a description of the *format*, + *memlimit* and *filters* arguments. + + +Miscellaneous +------------- + +.. function:: is_check_supported(check) + + Returns true if the given integrity check is supported on this system. + + :const:`CHECK_NONE` and :const:`CHECK_CRC32` are always supported. + :const:`CHECK_CRC64` and :const:`CHECK_SHA256` may be unavailable if you are + using a version of :program:`liblzma` that was compiled with a limited + feature set. + + +.. _filter-chain-specs: + +Specifying custom filter chains +------------------------------- + +A filter chain specifier is a sequence of dictionaries, where each dictionary +contains the ID and options for a single filter. Each dictionary must contain +the key ``"id"``, and may contain additional keys to specify filter-dependent +options. Valid filter IDs are as follows: + +* Compression filters: + * :const:`FILTER_LZMA1` (for use with :const:`FORMAT_ALONE`) + * :const:`FILTER_LZMA2` (for use with :const:`FORMAT_XZ` and :const:`FORMAT_RAW`) + +* Delta filter: + * :const:`FILTER_DELTA` + +* Branch-Call-Jump (BCJ) filters: + * :const:`FILTER_X86` + * :const:`FILTER_IA64` + * :const:`FILTER_ARM` + * :const:`FILTER_ARMTHUMB` + * :const:`FILTER_POWERPC` + * :const:`FILTER_SPARC` + +A filter chain can consist of up to 4 filters, and cannot be empty. The last +filter in the chain must be a compression filter, and any other filters must be +delta or BCJ filters. + +Compression filters support the following options (specified as additional +entries in the dictionary representing the filter): + + * ``preset``: A compression preset to use as a source of default values for + options that are not specified explicitly. + * ``dict_size``: Dictionary size in bytes. This should be between 4KiB and + 1.5GiB (inclusive). + * ``lc``: Number of literal context bits. + * ``lp``: Number of literal position bits. The sum ``lc + lp`` must be at + most 4. + * ``pb``: Number of position bits; must be at most 4. + * ``mode``: :const:`MODE_FAST` or :const:`MODE_NORMAL`. + * ``nice_len``: What should be considered a "nice length" for a match. + This should be 273 or less. + * ``mf``: What match finder to use -- :const:`MF_HC3`, :const:`MF_HC4`, + :const:`MF_BT2`, :const:`MF_BT3`, or :const:`MF_BT4`. + * ``depth``: Maximum search depth used by match finder. 0 (default) means to + select automatically based on other filter options. + +The delta filter stores the differences between bytes, producing more repetitive +input for the compressor in certain circumstances. It only supports a single +The delta filter supports only one option, ``dist``. This indicates the distance +between bytes to be subtracted. The default is 1, i.e. take the differences +between adjacent bytes. + +The BCJ filters are intended to be applied to machine code. They convert +relative branches, calls and jumps in the code to use absolute addressing, with +the aim of increasing the redundancy that can be exploited by the compressor. +These filters support one option, ``start_offset``. This specifies the address +that should be mapped to the beginning of the input data. The default is 0. + + +Examples +-------- + +Reading in a compressed file:: + + import lzma + with lzma.LZMAFile("file.xz") as f: + file_content = f.read() + +Creating a compressed file:: + + import lzma + data = b"Insert Data Here" + with lzma.LZMAFile("file.xz", "w") as f: + f.write(data) + +Compressing data in memory:: + + import lzma + data_in = b"Insert Data Here" + data_out = lzma.compress(data_in) + +Incremental compression:: + + import lzma + lzc = lzma.LZMACompressor() + out1 = lzc.compress(b"Some data\n") + out2 = lzc.compress(b"Another piece of data\n") + out3 = lzc.compress(b"Even more data\n") + out4 = lzc.flush() + # Concatenate all the partial results: + result = b"".join([out1, out2, out3, out4]) + +Writing compressed data to an already-open file:: + + import lzma + with open("file.xz", "wb") as f: + f.write(b"This data will not be compressed\n") + with lzma.LZMAFile(f, "w") as lzf: + lzf.write(b"This *will* be compressed\n") + f.write(b"Not compressed\n") + +Creating a compressed file using a custom filter chain:: + + import lzma + my_filters = [ + {"id": lzma.FILTER_DELTA, "dist": 5}, + {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME}, + ] + with lzma.LZMAFile("file.xz", "w", filters=my_filters) as f: + f.write(b"blah blah blah") |