-rw-r--r-- | Doc/ACKS.txt | 1
-rw-r--r-- | Doc/library/bz2.rst | 204
-rw-r--r-- | Lib/bz2.py | 392
-rw-r--r-- | Lib/test/test_bz2.py | 142
-rw-r--r-- | Misc/NEWS | 4
-rw-r--r-- | Modules/_bz2module.c | 583
-rw-r--r-- | Modules/bz2module.c | 2180
-rw-r--r-- | PCbuild/_bz2.vcproj (renamed from PCbuild/bz2.vcproj) | 4
-rw-r--r-- | PCbuild/pcbuild.sln | 2
-rw-r--r-- | PCbuild/readme.txt | 6
-rw-r--r-- | setup.py | 4
11 files changed, 1192 insertions, 2330 deletions
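Most of the added lines are the new pure-Python `Lib/bz2.py`, whose `BZ2File` emulates seeking by rewinding and re-reading. The sketch below is not part of the patch. It only exercises the documented `BZ2File` interface (`write`, `peek`, `readline`, `tell`, `seek`, `read`) against a temporary file. The sample lines and the file name `passwd.bz2` are illustrative assumptions.

```python
import bz2
import os
import tempfile

# Illustrative sample data (an assumption, not taken from the patch itself).
lines = [b"root:x:0:0:root:/root:/bin/bash\n", b"bin:x:1:1:bin:/bin:\n"]
text = b"".join(lines)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "passwd.bz2")  # hypothetical file name

    # Write compressed data; the stream is finalized when the file is closed.
    with bz2.BZ2File(path, "w") as f:
        f.write(text)

    with bz2.BZ2File(path) as f:
        head = f.peek()        # buffered data, file position unchanged
        first = f.readline()   # first line, newline retained
        pos = f.tell()         # position counted in uncompressed bytes
        f.seek(0)              # emulated: rewinds and decompresses again
        again = f.read()       # everything from the start
```

Because backward seeks restart decompression from the beginning of the stream, they can be slow on large files, as the `seek()` docstring in the patch warns.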
diff --git a/Doc/ACKS.txt b/Doc/ACKS.txt index 7f67d36..755f647 100644 --- a/Doc/ACKS.txt +++ b/Doc/ACKS.txt @@ -202,6 +202,7 @@ docs@python.org), and we'll be glad to correct the problem. * Jim Tittsler * David Turner * Ville Vainio + * Nadeem Vawda * Martijn Vries * Charles G. Waldman * Greg Ward diff --git a/Doc/library/bz2.rst b/Doc/library/bz2.rst index d9a2bad..2ccdb51 100644 --- a/Doc/library/bz2.rst +++ b/Doc/library/bz2.rst @@ -1,189 +1,149 @@ -:mod:`bz2` --- Compression compatible with :program:`bzip2` -=========================================================== +:mod:`bz2` --- Support for :program:`bzip2` compression +======================================================= .. module:: bz2 - :synopsis: Interface to compression and decompression routines - compatible with bzip2. + :synopsis: Interfaces for bzip2 compression and decompression. .. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com> +.. moduleauthor:: Nadeem Vawda <nadeem.vawda@gmail.com> .. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com> +.. sectionauthor:: Nadeem Vawda <nadeem.vawda@gmail.com> -This module provides a comprehensive interface for the bz2 compression library. -It implements a complete file interface, one-shot (de)compression functions, and -types for sequential (de)compression. +This module provides a comprehensive interface for compressing and +decompressing data using the bzip2 compression algorithm. -For other archive formats, see the :mod:`gzip`, :mod:`zipfile`, and +For related file formats, see the :mod:`gzip`, :mod:`zipfile`, and :mod:`tarfile` modules. -Here is a summary of the features offered by the bz2 module: +The :mod:`bz2` module contains: -* :class:`BZ2File` class implements a complete file interface, including - :meth:`~BZ2File.readline`, :meth:`~BZ2File.readlines`, - :meth:`~BZ2File.writelines`, :meth:`~BZ2File.seek`, etc; +* The :class:`BZ2File` class for reading and writing compressed files. 
+* The :class:`BZ2Compressor` and :class:`BZ2Decompressor` classes for + incremental (de)compression. +* The :func:`compress` and :func:`decompress` functions for one-shot + (de)compression. -* :class:`BZ2File` class implements emulated :meth:`~BZ2File.seek` support; - -* :class:`BZ2File` class implements universal newline support; - -* :class:`BZ2File` class offers an optimized line iteration using a readahead - algorithm; - -* Sequential (de)compression supported by :class:`BZ2Compressor` and - :class:`BZ2Decompressor` classes; - -* One-shot (de)compression supported by :func:`compress` and :func:`decompress` - functions; - -* Thread safety uses individual locking mechanism. +All of the classes in this module may safely be accessed from multiple threads. (De)compression of files ------------------------ -Handling of compressed files is offered by the :class:`BZ2File` class. +.. class:: BZ2File(filename=None, mode='r', buffering=None, compresslevel=9, fileobj=None) + Open a bzip2-compressed file. -.. class:: BZ2File(filename, mode='r', buffering=0, compresslevel=9) + The :class:`BZ2File` can wrap an existing :term:`file object` (given by + *fileobj*), or operate directly on a named file (named by *filename*). + Exactly one of these two parameters should be provided. - Open a bz2 file. Mode can be either ``'r'`` or ``'w'``, for reading (default) - or writing. When opened for writing, the file will be created if it doesn't - exist, and truncated otherwise. If *buffering* is given, ``0`` means - unbuffered, and larger numbers specify the buffer size; the default is - ``0``. If *compresslevel* is given, it must be a number between ``1`` and - ``9``; the default is ``9``. Add a ``'U'`` to mode to open the file for input - with universal newline support. Any line ending in the input file will be - seen as a ``'\n'`` in Python. 
Also, a file so opened gains the attribute - :attr:`newlines`; the value for this attribute is one of ``None`` (no newline - read yet), ``'\r'``, ``'\n'``, ``'\r\n'`` or a tuple containing all the - newline types seen. Universal newlines are available only when - reading. Instances support iteration in the same way as normal :class:`file` - instances. - - :class:`BZ2File` supports the :keyword:`with` statement. - - .. versionchanged:: 3.1 - Support for the :keyword:`with` statement was added. + The *mode* argument can be either ``'r'`` for reading (default), or ``'w'`` + for writing. + The *buffering* argument is ignored. Its use is deprecated. - .. method:: close() + If *mode* is ``'w'``, *compresslevel* can be a number between ``1`` and + ``9`` specifying the level of compression: ``1`` produces the least + compression, and ``9`` (default) produces the most compression. - Close the file. Sets data attribute :attr:`closed` to true. A closed file - cannot be used for further I/O operations. :meth:`close` may be called - more than once without error. + :class:`BZ2File` provides all of the members specified by the + :class:`io.BufferedIOBase`, except for :meth:`detach` and :meth:`truncate`. + Iteration and the :keyword:`with` statement are supported. + :class:`BZ2File` also provides the following method: - .. method:: read([size]) + .. method:: peek([n]) - Read at most *size* uncompressed bytes, returned as a byte string. If the - *size* argument is negative or omitted, read until EOF is reached. + Return buffered data without advancing the file position. At least one + byte of data will be returned (unless at EOF). The exact number of bytes + returned is unspecified. + .. versionadded:: 3.3 - .. method:: readline([size]) - - Return the next line from the file, as a byte string, retaining newline. - A non-negative *size* argument limits the maximum number of bytes to - return (an incomplete line may be returned then). Return an empty byte - string at EOF. - - - .. 
method:: readlines([size]) - - Return a list of lines read. The optional *size* argument, if given, is an - approximate bound on the total number of bytes in the lines returned. - - - .. method:: seek(offset[, whence]) + .. versionchanged:: 3.1 + Support for the :keyword:`with` statement was added. - Move to new file position. Argument *offset* is a byte count. Optional - argument *whence* defaults to ``os.SEEK_SET`` or ``0`` (offset from start - of file; offset should be ``>= 0``); other values are ``os.SEEK_CUR`` or - ``1`` (move relative to current position; offset can be positive or - negative), and ``os.SEEK_END`` or ``2`` (move relative to end of file; - offset is usually negative, although many platforms allow seeking beyond - the end of a file). + .. versionchanged:: 3.3 + The :meth:`fileno`, :meth:`readable`, :meth:`seekable`, :meth:`writable`, + :meth:`read1` and :meth:`readinto` methods were added. - Note that seeking of bz2 files is emulated, and depending on the - parameters the operation may be extremely slow. + .. versionchanged:: 3.3 + The *fileobj* argument to the constructor was added. - .. method:: tell() +Incremental (de)compression +--------------------------- - Return the current file position, an integer. +.. class:: BZ2Compressor(compresslevel=9) + Create a new compressor object. This object may be used to compress data + incrementally. For one-shot compression, use the :func:`compress` function + instead. - .. method:: write(data) + *compresslevel*, if given, must be a number between ``1`` and ``9``. The + default is ``9``. - Write the byte string *data* to file. Note that due to buffering, - :meth:`close` may be needed before the file on disk reflects the data - written. + .. method:: compress(data) + Provide data to the compressor object. Returns a chunk of compressed data + if possible, or an empty byte string otherwise. - .. 
method:: writelines(sequence_of_byte_strings) + When you have finished providing data to the compressor, call the + :meth:`flush` method to finish the compression process. - Write the sequence of byte strings to the file. Note that newlines are not - added. The sequence can be any iterable object producing byte strings. - This is equivalent to calling write() for each byte string. + .. method:: flush() -Sequential (de)compression --------------------------- + Finish the compression process. Returns the compressed data left in + internal buffers. -Sequential compression and decompression is done using the classes -:class:`BZ2Compressor` and :class:`BZ2Decompressor`. + The compressor object may not be used after this method has been called. -.. class:: BZ2Compressor(compresslevel=9) +.. class:: BZ2Decompressor() - Create a new compressor object. This object may be used to compress data - sequentially. If you want to compress data in one shot, use the - :func:`compress` function instead. The *compresslevel* parameter, if given, - must be a number between ``1`` and ``9``; the default is ``9``. + Create a new decompressor object. This object may be used to decompress data + incrementally. For one-shot compression, use the :func:`decompress` function + instead. - .. method:: compress(data) + .. method:: decompress(data) - Provide more data to the compressor object. It will return chunks of - compressed data whenever possible. When you've finished providing data to - compress, call the :meth:`flush` method to finish the compression process, - and return what is left in internal buffers. + Provide data to the decompressor object. Returns a chunk of decompressed + data if possible, or an empty byte string otherwise. + Attempting to decompress data after the end of stream is reached raises + an :exc:`EOFError`. If any data is found after the end of the stream, it + is ignored and saved in the :attr:`unused_data` attribute. - .. 
method:: flush() - Finish the compression process and return what is left in internal - buffers. You must not use the compressor object after calling this method. + .. attribute:: eof + True if the end-of-stream marker has been reached. -.. class:: BZ2Decompressor() + .. versionadded:: 3.3 - Create a new decompressor object. This object may be used to decompress data - sequentially. If you want to decompress data in one shot, use the - :func:`decompress` function instead. - .. method:: decompress(data) + .. attribute:: unused_data - Provide more data to the decompressor object. It will return chunks of - decompressed data whenever possible. If you try to decompress data after - the end of stream is found, :exc:`EOFError` will be raised. If any data - was found after the end of stream, it'll be ignored and saved in - :attr:`unused_data` attribute. + Data found after the end of the compressed stream. One-shot (de)compression ------------------------ -One-shot compression and decompression is provided through the :func:`compress` -and :func:`decompress` functions. +.. function:: compress(data, compresslevel=9) + Compress *data*. -.. function:: compress(data, compresslevel=9) + *compresslevel*, if given, must be a number between ``1`` and ``9``. The + default is ``9``. - Compress *data* in one shot. If you want to compress data sequentially, use - an instance of :class:`BZ2Compressor` instead. The *compresslevel* parameter, - if given, must be a number between ``1`` and ``9``; the default is ``9``. + For incremental compression, use a :class:`BZ2Compressor` instead. .. function:: decompress(data) - Decompress *data* in one shot. If you want to decompress data sequentially, - use an instance of :class:`BZ2Decompressor` instead. + Decompress *data*. + + For incremental decompression, use a :class:`BZ2Decompressor` instead. 
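The incremental interface documented above can be sketched as follows. This is an illustration rather than code from the patch: the helper `compress_chunks`, the sample chunks, and the `b"trailing"` suffix are assumptions made for the example. It uses only `BZ2Compressor.compress`/`flush`, `BZ2Decompressor.decompress`, and the `eof` and `unused_data` attributes described in the revised documentation.

```python
import bz2


def compress_chunks(chunks, compresslevel=9):
    """Hypothetical helper: compress an iterable of byte strings incrementally."""
    comp = bz2.BZ2Compressor(compresslevel)
    # compress() may return b"" while data is still being buffered internally.
    parts = [comp.compress(chunk) for chunk in chunks]
    # flush() ends the stream; the compressor must not be used afterwards.
    parts.append(comp.flush())
    return b"".join(parts)


chunks = [b"root:x:0:0:root:/root:/bin/bash\n", b"bin:x:1:1:bin:/bin:\n"]
payload = compress_chunks(chunks)

decomp = bz2.BZ2Decompressor()
half = len(payload) // 2

# Feed the stream in two pieces; the stream is incomplete after the first.
text = decomp.decompress(payload[:half])
eof_midway = decomp.eof

# Bytes following the end-of-stream marker are not decompressed; they are
# kept in unused_data instead.
text += decomp.decompress(payload[half:] + b"trailing")
```

After the second call, `decomp.eof` is true and `decomp.unused_data` holds the bytes that followed the stream. Feeding more data at that point raises `EOFError`, as the documentation above states.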
diff --git a/Lib/bz2.py b/Lib/bz2.py new file mode 100644 index 0000000..667fffd --- /dev/null +++ b/Lib/bz2.py @@ -0,0 +1,392 @@ +"""Interface to the libbzip2 compression library. + +This module provides a file interface, classes for incremental +(de)compression, and functions for one-shot (de)compression. +""" + +__all__ = ["BZ2File", "BZ2Compressor", "BZ2Decompressor", "compress", + "decompress"] + +__author__ = "Nadeem Vawda <nadeem.vawda@gmail.com>" + +import io +import threading +import warnings + +from _bz2 import BZ2Compressor, BZ2Decompressor + + +_MODE_CLOSED = 0 +_MODE_READ = 1 +_MODE_READ_EOF = 2 +_MODE_WRITE = 3 + +_BUFFER_SIZE = 8192 + + +class BZ2File(io.BufferedIOBase): + + """A file object providing transparent bzip2 (de)compression. + + A BZ2File can act as a wrapper for an existing file object, or refer + directly to a named file on disk. + + Note that BZ2File provides a *binary* file interface - data read is + returned as bytes, and data to be written should be given as bytes. + """ + + def __init__(self, filename=None, mode="r", buffering=None, + compresslevel=9, fileobj=None): + """Open a bzip2-compressed file. + + If filename is given, open the named file. Otherwise, operate on + the file object given by fileobj. Exactly one of these two + parameters should be provided. + + mode can be 'r' for reading (default), or 'w' for writing. + + buffering is ignored. Its use is deprecated. + + If mode is 'w', compresslevel can be a number between 1 and 9 + specifying the level of compression: 1 produces the least + compression, and 9 (default) produces the most compression. + """ + # This lock must be recursive, so that BufferedIOBase's + # readline(), readlines() and writelines() don't deadlock. 
+ self._lock = threading.RLock() + self._fp = None + self._closefp = False + self._mode = _MODE_CLOSED + self._pos = 0 + self._size = -1 + + if buffering is not None: + warnings.warn("Use of 'buffering' argument is deprecated", + DeprecationWarning) + + if not (1 <= compresslevel <= 9): + raise ValueError("compresslevel must be between 1 and 9") + + if mode in ("", "r", "rb"): + mode = "rb" + mode_code = _MODE_READ + self._decompressor = BZ2Decompressor() + self._buffer = None + elif mode in ("w", "wb"): + mode = "wb" + mode_code = _MODE_WRITE + self._compressor = BZ2Compressor() + else: + raise ValueError("Invalid mode: {!r}".format(mode)) + + if filename is not None and fileobj is None: + self._fp = open(filename, mode) + self._closefp = True + self._mode = mode_code + elif fileobj is not None and filename is None: + self._fp = fileobj + self._mode = mode_code + else: + raise ValueError("Must give exactly one of filename and fileobj") + + def close(self): + """Flush and close the file. + + May be called more than once without error. Once the file is + closed, any other operation on it will raise a ValueError. 
+ """ + with self._lock: + if self._mode == _MODE_CLOSED: + return + try: + if self._mode in (_MODE_READ, _MODE_READ_EOF): + self._decompressor = None + elif self._mode == _MODE_WRITE: + self._fp.write(self._compressor.flush()) + self._compressor = None + finally: + try: + if self._closefp: + self._fp.close() + finally: + self._fp = None + self._closefp = False + self._mode = _MODE_CLOSED + self._buffer = None + + @property + def closed(self): + """True if this file is closed.""" + return self._mode == _MODE_CLOSED + + def fileno(self): + """Return the file descriptor for the underlying file.""" + return self._fp.fileno() + + def seekable(self): + """Return whether the file supports seeking.""" + return self.readable() + + def readable(self): + """Return whether the file was opened for reading.""" + return self._mode in (_MODE_READ, _MODE_READ_EOF) + + def writable(self): + """Return whether the file was opened for writing.""" + return self._mode == _MODE_WRITE + + # Mode-checking helper functions. + + def _check_not_closed(self): + if self.closed: + raise ValueError("I/O operation on closed file") + + def _check_can_read(self): + if not self.readable(): + self._check_not_closed() + raise io.UnsupportedOperation("File not open for reading") + + def _check_can_write(self): + if not self.writable(): + self._check_not_closed() + raise io.UnsupportedOperation("File not open for writing") + + def _check_can_seek(self): + if not self.seekable(): + self._check_not_closed() + raise io.UnsupportedOperation("Seeking is only supported " + "on files opening for reading") + + # Fill the readahead buffer if it is empty. Returns False on EOF. 
+ def _fill_buffer(self): + if self._buffer: + return True + if self._decompressor.eof: + self._mode = _MODE_READ_EOF + self._size = self._pos + return False + rawblock = self._fp.read(_BUFFER_SIZE) + if not rawblock: + raise EOFError("Compressed file ended before the " + "end-of-stream marker was reached") + self._buffer = self._decompressor.decompress(rawblock) + return True + + # Read data until EOF. + # If return_data is false, consume the data without returning it. + def _read_all(self, return_data=True): + blocks = [] + while self._fill_buffer(): + if return_data: + blocks.append(self._buffer) + self._pos += len(self._buffer) + self._buffer = None + if return_data: + return b"".join(blocks) + + # Read a block of up to n bytes. + # If return_data is false, consume the data without returning it. + def _read_block(self, n, return_data=True): + blocks = [] + while n > 0 and self._fill_buffer(): + if n < len(self._buffer): + data = self._buffer[:n] + self._buffer = self._buffer[n:] + else: + data = self._buffer + self._buffer = None + if return_data: + blocks.append(data) + self._pos += len(data) + n -= len(data) + if return_data: + return b"".join(blocks) + + def peek(self, n=0): + """Return buffered data without advancing the file position. + + Always returns at least one byte of data, unless at EOF. + The exact number of bytes returned is unspecified. + """ + with self._lock: + self._check_can_read() + if self._mode == _MODE_READ_EOF or not self._fill_buffer(): + return b"" + return self._buffer + + def read(self, size=-1): + """Read up to size uncompressed bytes from the file. + + If size is negative or omitted, read until EOF is reached. + Returns b'' if the file is already at EOF. 
+ """ + with self._lock: + self._check_can_read() + if self._mode == _MODE_READ_EOF or size == 0: + return b"" + elif size < 0: + return self._read_all() + else: + return self._read_block(size) + + def read1(self, size=-1): + """Read up to size uncompressed bytes with at most one read + from the underlying stream. + + Returns b'' if the file is at EOF. + """ + with self._lock: + self._check_can_read() + if (size == 0 or self._mode == _MODE_READ_EOF or + not self._fill_buffer()): + return b"" + if 0 < size < len(self._buffer): + data = self._buffer[:size] + self._buffer = self._buffer[size:] + else: + data = self._buffer + self._buffer = None + self._pos += len(data) + return data + + def readinto(self, b): + """Read up to len(b) bytes into b. + + Returns the number of bytes read (0 for EOF). + """ + with self._lock: + return io.BufferedIOBase.readinto(self, b) + + def readline(self, size=-1): + """Read a line of uncompressed bytes from the file. + + The terminating newline (if present) is retained. If size is + non-negative, no more than size bytes will be read (in which + case the line may be incomplete). Returns b'' if already at EOF. + """ + if not hasattr(size, "__index__"): + raise TypeError("Integer argument expected") + size = size.__index__() + with self._lock: + return io.BufferedIOBase.readline(self, size) + + def readlines(self, size=-1): + """Read a list of lines of uncompressed bytes from the file. + + size can be specified to control the number of lines read: no + further lines will be read once the total size of the lines read + so far equals or exceeds size. + """ + if not hasattr(size, "__index__"): + raise TypeError("Integer argument expected") + size = size.__index__() + with self._lock: + return io.BufferedIOBase.readlines(self, size) + + def write(self, data): + """Write a byte string to the file. + + Returns the number of uncompressed bytes written, which is + always len(data). 
Note that due to buffering, the file on disk + may not reflect the data written until close() is called. + """ + with self._lock: + self._check_can_write() + compressed = self._compressor.compress(data) + self._fp.write(compressed) + self._pos += len(data) + return len(data) + + def writelines(self, seq): + """Write a sequence of byte strings to the file. + + Returns the number of uncompressed bytes written. + seq can be any iterable yielding byte strings. + + Line separators are not added between the written byte strings. + """ + with self._lock: + return io.BufferedIOBase.writelines(self, seq) + + # Rewind the file to the beginning of the data stream. + def _rewind(self): + self._fp.seek(0, 0) + self._mode = _MODE_READ + self._pos = 0 + self._decompressor = BZ2Decompressor() + self._buffer = None + + def seek(self, offset, whence=0): + """Change the file position. + + The new position is specified by offset, relative to the + position indicated by whence. Values for whence are: + + 0: start of stream (default); offset must not be negative + 1: current stream position + 2: end of stream; offset must not be positive + + Returns the new file position. + + Note that seeking is emulated, so depending on the parameters, + this operation may be extremely slow. + """ + with self._lock: + self._check_can_seek() + + # Recalculate offset as an absolute file position. + if whence == 0: + pass + elif whence == 1: + offset = self._pos + offset + elif whence == 2: + # Seeking relative to EOF - we need to know the file's size. + if self._size < 0: + self._read_all(return_data=False) + offset = self._size + offset + else: + raise ValueError("Invalid value for whence: {}".format(whence)) + + # Make it so that offset is the number of bytes to skip forward. + if offset < self._pos: + self._rewind() + else: + offset -= self._pos + + # Read and discard data until we reach the desired position. 
+ if self._mode != _MODE_READ_EOF: + self._read_block(offset, return_data=False) + + return self._pos + + def tell(self): + """Return the current file position.""" + with self._lock: + self._check_not_closed() + return self._pos + + +def compress(data, compresslevel=9): + """Compress a block of data. + + compresslevel, if given, must be a number between 1 and 9. + + For incremental compression, use a BZ2Compressor object instead. + """ + comp = BZ2Compressor(compresslevel) + return comp.compress(data) + comp.flush() + + +def decompress(data): + """Decompress a block of data. + + For incremental decompression, use a BZ2Decompressor object instead. + """ + if len(data) == 0: + return b"" + decomp = BZ2Decompressor() + result = decomp.decompress(data) + if not decomp.eof: + raise ValueError("Compressed data ended before the " + "end-of-stream marker was reached") + return result diff --git a/Lib/test/test_bz2.py b/Lib/test/test_bz2.py index be35580..cee38e0 100644 --- a/Lib/test/test_bz2.py +++ b/Lib/test/test_bz2.py @@ -21,7 +21,30 @@ has_cmdline_bunzip2 = sys.platform not in ("win32", "os2emx") class BaseTest(unittest.TestCase): "Base for other testcases." 
- TEXT = b'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:\ndaemon:x:2:2:daemon:/sbin:\nadm:x:3:4:adm:/var/adm:\nlp:x:4:7:lp:/var/spool/lpd:\nsync:x:5:0:sync:/sbin:/bin/sync\nshutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\nhalt:x:7:0:halt:/sbin:/sbin/halt\nmail:x:8:12:mail:/var/spool/mail:\nnews:x:9:13:news:/var/spool/news:\nuucp:x:10:14:uucp:/var/spool/uucp:\noperator:x:11:0:operator:/root:\ngames:x:12:100:games:/usr/games:\ngopher:x:13:30:gopher:/usr/lib/gopher-data:\nftp:x:14:50:FTP User:/var/ftp:/bin/bash\nnobody:x:65534:65534:Nobody:/home:\npostfix:x:100:101:postfix:/var/spool/postfix:\nniemeyer:x:500:500::/home/niemeyer:/bin/bash\npostgres:x:101:102:PostgreSQL Server:/var/lib/pgsql:/bin/bash\nmysql:x:102:103:MySQL server:/var/lib/mysql:/bin/bash\nwww:x:103:104::/var/www:/bin/false\n' + TEXT_LINES = [ + b'root:x:0:0:root:/root:/bin/bash\n', + b'bin:x:1:1:bin:/bin:\n', + b'daemon:x:2:2:daemon:/sbin:\n', + b'adm:x:3:4:adm:/var/adm:\n', + b'lp:x:4:7:lp:/var/spool/lpd:\n', + b'sync:x:5:0:sync:/sbin:/bin/sync\n', + b'shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\n', + b'halt:x:7:0:halt:/sbin:/sbin/halt\n', + b'mail:x:8:12:mail:/var/spool/mail:\n', + b'news:x:9:13:news:/var/spool/news:\n', + b'uucp:x:10:14:uucp:/var/spool/uucp:\n', + b'operator:x:11:0:operator:/root:\n', + b'games:x:12:100:games:/usr/games:\n', + b'gopher:x:13:30:gopher:/usr/lib/gopher-data:\n', + b'ftp:x:14:50:FTP User:/var/ftp:/bin/bash\n', + b'nobody:x:65534:65534:Nobody:/home:\n', + b'postfix:x:100:101:postfix:/var/spool/postfix:\n', + b'niemeyer:x:500:500::/home/niemeyer:/bin/bash\n', + b'postgres:x:101:102:PostgreSQL Server:/var/lib/pgsql:/bin/bash\n', + b'mysql:x:102:103:MySQL server:/var/lib/mysql:/bin/bash\n', + b'www:x:103:104::/var/www:/bin/false\n', + ] + TEXT = b''.join(TEXT_LINES) DATA = 
b'BZh91AY&SY.\xc8N\x18\x00\x01>_\x80\x00\x10@\x02\xff\xf0\x01\x07n\x00?\xe7\xff\xe00\x01\x99\xaa\x00\xc0\x03F\x86\x8c#&\x83F\x9a\x03\x06\xa6\xd0\xa6\x93M\x0fQ\xa7\xa8\x06\x804hh\x12$\x11\xa4i4\xf14S\xd2<Q\xb5\x0fH\xd3\xd4\xdd\xd5\x87\xbb\xf8\x94\r\x8f\xafI\x12\xe1\xc9\xf8/E\x00pu\x89\x12]\xc9\xbbDL\nQ\x0e\t1\x12\xdf\xa0\xc0\x97\xac2O9\x89\x13\x94\x0e\x1c7\x0ed\x95I\x0c\xaaJ\xa4\x18L\x10\x05#\x9c\xaf\xba\xbc/\x97\x8a#C\xc8\xe1\x8cW\xf9\xe2\xd0\xd6M\xa7\x8bXa<e\x84t\xcbL\xb3\xa7\xd9\xcd\xd1\xcb\x84.\xaf\xb3\xab\xab\xad`n}\xa0lh\tE,\x8eZ\x15\x17VH>\x88\xe5\xcd9gd6\x0b\n\xe9\x9b\xd5\x8a\x99\xf7\x08.K\x8ev\xfb\xf7xw\xbb\xdf\xa1\x92\xf1\xdd|/";\xa2\xba\x9f\xd5\xb1#A\xb6\xf6\xb3o\xc9\xc5y\\\xebO\xe7\x85\x9a\xbc\xb6f8\x952\xd5\xd7"%\x89>V,\xf7\xa6z\xe2\x9f\xa3\xdf\x11\x11"\xd6E)I\xa9\x13^\xca\xf3r\xd0\x03U\x922\xf26\xec\xb6\xed\x8b\xc3U\x13\x9d\xc5\x170\xa4\xfa^\x92\xacDF\x8a\x97\xd6\x19\xfe\xdd\xb8\xbd\x1a\x9a\x19\xa3\x80ankR\x8b\xe5\xd83]\xa9\xc6\x08\x82f\xf6\xb9"6l$\xb8j@\xc0\x8a\xb0l1..\xbak\x83ls\x15\xbc\xf4\xc1\x13\xbe\xf8E\xb8\x9d\r\xa8\x9dk\x84\xd3n\xfa\xacQ\x07\xb1%y\xaav\xb4\x08\xe0z\x1b\x16\xf5\x04\xe9\xcc\xb9\x08z\x1en7.G\xfc]\xc9\x14\xe1B@\xbb!8`' DATA_CRLF = b'BZh91AY&SY\xaez\xbbN\x00\x01H\xdf\x80\x00\x12@\x02\xff\xf0\x01\x07n\x00?\xe7\xff\xe0@\x01\xbc\xc6`\x86*\x8d=M\xa9\x9a\x86\xd0L@\x0fI\xa6!\xa1\x13\xc8\x88jdi\x8d@\x03@\x1a\x1a\x0c\x0c\x83 
\x00\xc4h2\x19\x01\x82D\x84e\t\xe8\x99\x89\x19\x1ah\x00\r\x1a\x11\xaf\x9b\x0fG\xf5(\x1b\x1f?\t\x12\xcf\xb5\xfc\x95E\x00ps\x89\x12^\xa4\xdd\xa2&\x05(\x87\x04\x98\x89u\xe40%\xb6\x19\'\x8c\xc4\x89\xca\x07\x0e\x1b!\x91UIFU%C\x994!DI\xd2\xfa\xf0\xf1N8W\xde\x13A\xf5\x9cr%?\x9f3;I45A\xd1\x8bT\xb1<l\xba\xcb_\xc00xY\x17r\x17\x88\x08\x08@\xa0\ry@\x10\x04$)`\xf2\xce\x89z\xb0s\xec\x9b.iW\x9d\x81\xb5-+t\x9f\x1a\'\x97dB\xf5x\xb5\xbe.[.\xd7\x0e\x81\xe7\x08\x1cN`\x88\x10\xca\x87\xc3!"\x80\x92R\xa1/\xd1\xc0\xe6mf\xac\xbd\x99\xcca\xb3\x8780>\xa4\xc7\x8d\x1a\\"\xad\xa1\xabyBg\x15\xb9l\x88\x88\x91k"\x94\xa4\xd4\x89\xae*\xa6\x0b\x10\x0c\xd6\xd4m\xe86\xec\xb5j\x8a\x86j\';\xca.\x01I\xf2\xaaJ\xe8\x88\x8cU+t3\xfb\x0c\n\xa33\x13r2\r\x16\xe0\xb3(\xbf\x1d\x83r\xe7M\xf0D\x1365\xd8\x88\xd3\xa4\x92\xcb2\x06\x04\\\xc1\xb0\xea//\xbek&\xd8\xe6+t\xe5\xa1\x13\xada\x16\xder5"w]\xa2i\xb7[\x97R \xe2IT\xcd;Z\x04dk4\xad\x8a\t\xd3\x81z\x10\xf1:^`\xab\x1f\xc5\xdc\x91N\x14$+\x9e\xae\xd3\x80' @@ -54,13 +77,15 @@ class BZ2FileTest(BaseTest): if os.path.isfile(self.filename): os.unlink(self.filename) - def createTempFile(self, crlf=0): + def getData(self, crlf=False): + if crlf: + return self.DATA_CRLF + else: + return self.DATA + + def createTempFile(self, crlf=False): with open(self.filename, "wb") as f: - if crlf: - data = self.DATA_CRLF - else: - data = self.DATA - f.write(data) + f.write(self.getData(crlf)) def testRead(self): # "Test BZ2File.read()" @@ -70,7 +95,7 @@ class BZ2FileTest(BaseTest): self.assertEqual(bz2f.read(), self.TEXT) def testRead0(self): - # Test BBZ2File.read(0)" + # "Test BBZ2File.read(0)" self.createTempFile() with BZ2File(self.filename) as bz2f: self.assertRaises(TypeError, bz2f.read, None) @@ -94,6 +119,28 @@ class BZ2FileTest(BaseTest): with BZ2File(self.filename) as bz2f: self.assertEqual(bz2f.read(100), self.TEXT[:100]) + def testPeek(self): + # "Test BZ2File.peek()" + self.createTempFile() + with BZ2File(self.filename) as bz2f: + pdata = bz2f.peek() + 
self.assertNotEqual(len(pdata), 0) + self.assertTrue(self.TEXT.startswith(pdata)) + self.assertEqual(bz2f.read(), self.TEXT) + + def testReadInto(self): + # "Test BZ2File.readinto()" + self.createTempFile() + with BZ2File(self.filename) as bz2f: + n = 128 + b = bytearray(n) + self.assertEqual(bz2f.readinto(b), n) + self.assertEqual(b, self.TEXT[:n]) + n = len(self.TEXT) - n + b = bytearray(len(self.TEXT)) + self.assertEqual(bz2f.readinto(b), n) + self.assertEqual(b[:n], self.TEXT[-n:]) + def testReadLine(self): # "Test BZ2File.readline()" self.createTempFile() @@ -125,7 +172,7 @@ class BZ2FileTest(BaseTest): bz2f = BZ2File(self.filename) bz2f.close() self.assertRaises(ValueError, bz2f.__next__) - # This call will deadlock of the above .__next__ call failed to + # This call will deadlock if the above .__next__ call failed to # release the lock. self.assertRaises(ValueError, bz2f.readlines) @@ -217,6 +264,13 @@ class BZ2FileTest(BaseTest): self.assertEqual(bz2f.tell(), 0) self.assertEqual(bz2f.read(), self.TEXT) + def testFileno(self): + # "Test BZ2File.fileno()" + self.createTempFile() + with open(self.filename) as rawf: + with BZ2File(fileobj=rawf) as bz2f: + self.assertEqual(bz2f.fileno(), rawf.fileno()) + def testOpenDel(self): # "Test opening and deleting a file many times" self.createTempFile() @@ -278,17 +332,65 @@ class BZ2FileTest(BaseTest): t.join() def testMixedIterationReads(self): - # Issue #8397: mixed iteration and reads should be forbidden. - with bz2.BZ2File(self.filename, 'wb') as f: - # The internal buffer size is hard-wired to 8192 bytes, we must - # write out more than that for the test to stop half through - # the buffer. - f.write(self.TEXT * 100) - with bz2.BZ2File(self.filename, 'rb') as f: - next(f) - self.assertRaises(ValueError, f.read) - self.assertRaises(ValueError, f.readline) - self.assertRaises(ValueError, f.readlines) + # "Test mixed iteration and reads." 
+ self.createTempFile() + linelen = len(self.TEXT_LINES[0]) + halflen = linelen // 2 + with bz2.BZ2File(self.filename) as bz2f: + bz2f.read(halflen) + self.assertEqual(next(bz2f), self.TEXT_LINES[0][halflen:]) + self.assertEqual(bz2f.read(), self.TEXT[linelen:]) + with bz2.BZ2File(self.filename) as bz2f: + bz2f.readline() + self.assertEqual(next(bz2f), self.TEXT_LINES[1]) + self.assertEqual(bz2f.readline(), self.TEXT_LINES[2]) + with bz2.BZ2File(self.filename) as bz2f: + bz2f.readlines() + with self.assertRaises(StopIteration): + next(bz2f) + self.assertEqual(bz2f.readlines(), []) + + def testReadBytesIO(self): + # "Test BZ2File.read() with BytesIO source" + with BytesIO(self.getData()) as bio: + with BZ2File(fileobj=bio) as bz2f: + self.assertRaises(TypeError, bz2f.read, None) + self.assertEqual(bz2f.read(), self.TEXT) + self.assertFalse(bio.closed) + + def testPeekBytesIO(self): + # "Test BZ2File.peek() with BytesIO source" + with BytesIO(self.getData()) as bio: + with BZ2File(fileobj=bio) as bz2f: + pdata = bz2f.peek() + self.assertNotEqual(len(pdata), 0) + self.assertTrue(self.TEXT.startswith(pdata)) + self.assertEqual(bz2f.read(), self.TEXT) + + def testWriteBytesIO(self): + # "Test BZ2File.write() with BytesIO destination" + with BytesIO() as bio: + with BZ2File(fileobj=bio, mode="w") as bz2f: + self.assertRaises(TypeError, bz2f.write) + bz2f.write(self.TEXT) + self.assertEqual(self.decompress(bio.getvalue()), self.TEXT) + self.assertFalse(bio.closed) + + def testSeekForwardBytesIO(self): + # "Test BZ2File.seek(150, 0) with BytesIO source" + with BytesIO(self.getData()) as bio: + with BZ2File(fileobj=bio) as bz2f: + self.assertRaises(TypeError, bz2f.seek) + bz2f.seek(150) + self.assertEqual(bz2f.read(), self.TEXT[150:]) + + def testSeekBackwardsBytesIO(self): + # "Test BZ2File.seek(-150, 1) with BytesIO source" + with BytesIO(self.getData()) as bio: + with BZ2File(fileobj=bio) as bz2f: + bz2f.read(500) + bz2f.seek(-150, 1) + self.assertEqual(bz2f.read(), 
self.TEXT[500-150:])
 
 class BZ2CompressorTest(BaseTest):
     def testCompress(self):
@@ -87,6 +87,10 @@ Core and Builtins
 Library
 -------
 
+- Issue #5863: Rewrite BZ2File in pure Python, and allow it to accept
+  file-like objects using a new ``fileobj`` constructor argument. Patch by
+  Nadeem Vawda.
+
 - unittest.TestCase.assertSameElements has been removed.
 
 - sys.getfilesystemencoding() raises a RuntimeError if initfsencoding() was not
diff --git a/Modules/_bz2module.c b/Modules/_bz2module.c
new file mode 100644
index 0000000..522b3e5
--- /dev/null
+++ b/Modules/_bz2module.c
@@ -0,0 +1,583 @@
+/* _bz2 - Low-level Python interface to libbzip2. */
+
+#define PY_SSIZE_T_CLEAN
+
+#include "Python.h"
+#include "structmember.h"
+
+#ifdef WITH_THREAD
+#include "pythread.h"
+#endif
+
+#include <bzlib.h>
+#include <stdio.h>
+
+
+#ifndef BZ_CONFIG_ERROR
+#define BZ2_bzCompress bzCompress
+#define BZ2_bzCompressInit bzCompressInit
+#define BZ2_bzCompressEnd bzCompressEnd
+#define BZ2_bzDecompress bzDecompress
+#define BZ2_bzDecompressInit bzDecompressInit
+#define BZ2_bzDecompressEnd bzDecompressEnd
+#endif  /* ! BZ_CONFIG_ERROR */
+
+
+#ifdef WITH_THREAD
+#define ACQUIRE_LOCK(obj) do { \
+    if (!PyThread_acquire_lock((obj)->lock, 0)) { \
+        Py_BEGIN_ALLOW_THREADS \
+        PyThread_acquire_lock((obj)->lock, 1); \
+        Py_END_ALLOW_THREADS \
+    } } while (0)
+#define RELEASE_LOCK(obj) PyThread_release_lock((obj)->lock)
+#else
+#define ACQUIRE_LOCK(obj)
+#define RELEASE_LOCK(obj)
+#endif
+
+
+typedef struct {
+    PyObject_HEAD
+    bz_stream bzs;
+    int flushed;
+#ifdef WITH_THREAD
+    PyThread_type_lock lock;
+#endif
+} BZ2Compressor;
+
+typedef struct {
+    PyObject_HEAD
+    bz_stream bzs;
+    char eof;           /* T_BOOL expects a char */
+    PyObject *unused_data;
+#ifdef WITH_THREAD
+    PyThread_type_lock lock;
+#endif
+} BZ2Decompressor;
+
+
+/* Helper functions.
*/
+
+static int
+catch_bz2_error(int bzerror)
+{
+    switch(bzerror) {
+        case BZ_OK:
+        case BZ_RUN_OK:
+        case BZ_FLUSH_OK:
+        case BZ_FINISH_OK:
+        case BZ_STREAM_END:
+            return 0;
+
+#ifdef BZ_CONFIG_ERROR
+        case BZ_CONFIG_ERROR:
+            PyErr_SetString(PyExc_SystemError,
+                            "libbzip2 was not compiled correctly");
+            return 1;
+#endif
+        case BZ_PARAM_ERROR:
+            PyErr_SetString(PyExc_ValueError,
+                            "Internal error - "
+                            "invalid parameters passed to libbzip2");
+            return 1;
+        case BZ_MEM_ERROR:
+            PyErr_NoMemory();
+            return 1;
+        case BZ_DATA_ERROR:
+        case BZ_DATA_ERROR_MAGIC:
+            PyErr_SetString(PyExc_IOError, "Invalid data stream");
+            return 1;
+        case BZ_IO_ERROR:
+            PyErr_SetString(PyExc_IOError, "Unknown I/O error");
+            return 1;
+        case BZ_UNEXPECTED_EOF:
+            PyErr_SetString(PyExc_EOFError,
+                            "Compressed file ended before the logical "
+                            "end-of-stream was detected");
+            return 1;
+        case BZ_SEQUENCE_ERROR:
+            PyErr_SetString(PyExc_RuntimeError,
+                            "Internal error - "
+                            "Invalid sequence of commands sent to libbzip2");
+            return 1;
+        default:
+            PyErr_Format(PyExc_IOError,
+                         "Unrecognized error from libbzip2: %d", bzerror);
+            return 1;
+    }
+}
+
+#if BUFSIZ < 8192
+#define SMALLCHUNK 8192
+#else
+#define SMALLCHUNK BUFSIZ
+#endif
+
+#if SIZEOF_INT < 4
+#define BIGCHUNK (512 * 32)
+#else
+#define BIGCHUNK (512 * 1024)
+#endif
+
+static int
+grow_buffer(PyObject **buf)
+{
+    size_t size = PyBytes_GET_SIZE(*buf);
+    if (size <= SMALLCHUNK)
+        return _PyBytes_Resize(buf, size + SMALLCHUNK);
+    else if (size <= BIGCHUNK)
+        return _PyBytes_Resize(buf, size * 2);
+    else
+        return _PyBytes_Resize(buf, size + BIGCHUNK);
+}
+
+
+/* BZ2Compressor class. */
+
+static PyObject *
+compress(BZ2Compressor *c, char *data, size_t len, int action)
+{
+    size_t data_size = 0;
+    PyObject *result;
+
+    result = PyBytes_FromStringAndSize(NULL, SMALLCHUNK);
+    if (result == NULL)
+        return NULL;
+    c->bzs.next_in = data;
+    /* FIXME This is not 64-bit clean - avail_in is an int.
*/
+    c->bzs.avail_in = len;
+    c->bzs.next_out = PyBytes_AS_STRING(result);
+    c->bzs.avail_out = PyBytes_GET_SIZE(result);
+    for (;;) {
+        char *this_out;
+        int bzerror;
+
+        Py_BEGIN_ALLOW_THREADS
+        this_out = c->bzs.next_out;
+        bzerror = BZ2_bzCompress(&c->bzs, action);
+        data_size += c->bzs.next_out - this_out;
+        Py_END_ALLOW_THREADS
+        if (catch_bz2_error(bzerror))
+            goto error;
+
+        /* In regular compression mode, stop when input data is exhausted.
+           In flushing mode, stop when all buffered data has been flushed. */
+        if ((action == BZ_RUN && c->bzs.avail_in == 0) ||
+            (action == BZ_FINISH && bzerror == BZ_STREAM_END))
+            break;
+
+        if (c->bzs.avail_out == 0) {
+            if (grow_buffer(&result) < 0)
+                goto error;
+            c->bzs.next_out = PyBytes_AS_STRING(result) + data_size;
+            c->bzs.avail_out = PyBytes_GET_SIZE(result) - data_size;
+        }
+    }
+    if (data_size != PyBytes_GET_SIZE(result))
+        if (_PyBytes_Resize(&result, data_size) < 0)
+            goto error;
+    return result;
+
+error:
+    Py_XDECREF(result);
+    return NULL;
+}
+
+PyDoc_STRVAR(BZ2Compressor_compress__doc__,
+"compress(data) -> bytes\n"
+"\n"
+"Provide data to the compressor object. Returns a chunk of\n"
+"compressed data if possible, or b'' otherwise.\n"
+"\n"
+"When you have finished providing data to the compressor, call the\n"
+"flush() method to finish the compression process.\n");
+
+static PyObject *
+BZ2Compressor_compress(BZ2Compressor *self, PyObject *args)
+{
+    Py_buffer buffer;
+    PyObject *result = NULL;
+
+    if (!PyArg_ParseTuple(args, "y*:compress", &buffer))
+        return NULL;
+
+    ACQUIRE_LOCK(self);
+    if (self->flushed)
+        PyErr_SetString(PyExc_ValueError, "Compressor has been flushed");
+    else
+        result = compress(self, buffer.buf, buffer.len, BZ_RUN);
+    RELEASE_LOCK(self);
+    PyBuffer_Release(&buffer);
+    return result;
+}
+
+PyDoc_STRVAR(BZ2Compressor_flush__doc__,
+"flush() -> bytes\n"
+"\n"
+"Finish the compression process.
Returns the compressed data left\n"
+"in internal buffers.\n"
+"\n"
+"The compressor object may not be used after this method is called.\n");
+
+static PyObject *
+BZ2Compressor_flush(BZ2Compressor *self, PyObject *noargs)
+{
+    PyObject *result = NULL;
+
+    ACQUIRE_LOCK(self);
+    if (self->flushed)
+        PyErr_SetString(PyExc_ValueError, "Repeated call to flush()");
+    else {
+        self->flushed = 1;
+        result = compress(self, NULL, 0, BZ_FINISH);
+    }
+    RELEASE_LOCK(self);
+    return result;
+}
+
+static int
+BZ2Compressor_init(BZ2Compressor *self, PyObject *args, PyObject *kwargs)
+{
+    int compresslevel = 9;
+    int bzerror;
+
+    if (!PyArg_ParseTuple(args, "|i:BZ2Compressor", &compresslevel))
+        return -1;
+    if (!(1 <= compresslevel && compresslevel <= 9)) {
+        PyErr_SetString(PyExc_ValueError,
+                        "compresslevel must be between 1 and 9");
+        return -1;
+    }
+
+#ifdef WITH_THREAD
+    self->lock = PyThread_allocate_lock();
+    if (self->lock == NULL) {
+        PyErr_SetString(PyExc_MemoryError, "Unable to allocate lock");
+        return -1;
+    }
+#endif
+
+    bzerror = BZ2_bzCompressInit(&self->bzs, compresslevel, 0, 0);
+    if (catch_bz2_error(bzerror))
+        goto error;
+
+    return 0;
+
+error:
+#ifdef WITH_THREAD
+    PyThread_free_lock(self->lock);
+    self->lock = NULL;
+#endif
+    return -1;
+}
+
+static void
+BZ2Compressor_dealloc(BZ2Compressor *self)
+{
+    BZ2_bzCompressEnd(&self->bzs);
+#ifdef WITH_THREAD
+    if (self->lock != NULL)
+        PyThread_free_lock(self->lock);
+#endif
+    Py_TYPE(self)->tp_free((PyObject *)self);
+}
+
+static PyMethodDef BZ2Compressor_methods[] = {
+    {"compress", (PyCFunction)BZ2Compressor_compress, METH_VARARGS,
+     BZ2Compressor_compress__doc__},
+    {"flush", (PyCFunction)BZ2Compressor_flush, METH_NOARGS,
+     BZ2Compressor_flush__doc__},
+    {NULL}
+};
+
+PyDoc_STRVAR(BZ2Compressor__doc__,
+"BZ2Compressor(compresslevel=9)\n"
+"\n"
+"Create a compressor object for compressing data incrementally.\n"
+"\n"
+"compresslevel, if given, must be a number between 1 and 9.\n"
+"\n"
+"For one-shot
compression, use the compress() function instead.\n");
+
+static PyTypeObject BZ2Compressor_Type = {
+    PyVarObject_HEAD_INIT(NULL, 0)
+    "_bz2.BZ2Compressor",               /* tp_name */
+    sizeof(BZ2Compressor),              /* tp_basicsize */
+    0,                                  /* tp_itemsize */
+    (destructor)BZ2Compressor_dealloc,  /* tp_dealloc */
+    0,                                  /* tp_print */
+    0,                                  /* tp_getattr */
+    0,                                  /* tp_setattr */
+    0,                                  /* tp_reserved */
+    0,                                  /* tp_repr */
+    0,                                  /* tp_as_number */
+    0,                                  /* tp_as_sequence */
+    0,                                  /* tp_as_mapping */
+    0,                                  /* tp_hash */
+    0,                                  /* tp_call */
+    0,                                  /* tp_str */
+    0,                                  /* tp_getattro */
+    0,                                  /* tp_setattro */
+    0,                                  /* tp_as_buffer */
+    Py_TPFLAGS_DEFAULT,                 /* tp_flags */
+    BZ2Compressor__doc__,               /* tp_doc */
+    0,                                  /* tp_traverse */
+    0,                                  /* tp_clear */
+    0,                                  /* tp_richcompare */
+    0,                                  /* tp_weaklistoffset */
+    0,                                  /* tp_iter */
+    0,                                  /* tp_iternext */
+    BZ2Compressor_methods,              /* tp_methods */
+    0,                                  /* tp_members */
+    0,                                  /* tp_getset */
+    0,                                  /* tp_base */
+    0,                                  /* tp_dict */
+    0,                                  /* tp_descr_get */
+    0,                                  /* tp_descr_set */
+    0,                                  /* tp_dictoffset */
+    (initproc)BZ2Compressor_init,       /* tp_init */
+    0,                                  /* tp_alloc */
+    PyType_GenericNew,                  /* tp_new */
+};
+
+
+/* BZ2Decompressor class. */
+
+static PyObject *
+decompress(BZ2Decompressor *d, char *data, size_t len)
+{
+    size_t data_size = 0;
+    PyObject *result;
+
+    result = PyBytes_FromStringAndSize(NULL, SMALLCHUNK);
+    if (result == NULL)
+        return result;
+    d->bzs.next_in = data;
+    /* FIXME This is not 64-bit clean - avail_in is an int.
*/
+    d->bzs.avail_in = len;
+    d->bzs.next_out = PyBytes_AS_STRING(result);
+    d->bzs.avail_out = PyBytes_GET_SIZE(result);
+    for (;;) {
+        char *this_out;
+        int bzerror;
+
+        Py_BEGIN_ALLOW_THREADS
+        this_out = d->bzs.next_out;
+        bzerror = BZ2_bzDecompress(&d->bzs);
+        data_size += d->bzs.next_out - this_out;
+        Py_END_ALLOW_THREADS
+        if (catch_bz2_error(bzerror))
+            goto error;
+        if (bzerror == BZ_STREAM_END) {
+            d->eof = 1;
+            if (d->bzs.avail_in > 0) { /* Save leftover input to unused_data */
+                Py_CLEAR(d->unused_data);
+                d->unused_data = PyBytes_FromStringAndSize(d->bzs.next_in,
+                                                           d->bzs.avail_in);
+                if (d->unused_data == NULL)
+                    goto error;
+            }
+            break;
+        }
+        if (d->bzs.avail_in == 0)
+            break;
+        if (d->bzs.avail_out == 0) {
+            if (grow_buffer(&result) < 0)
+                goto error;
+            d->bzs.next_out = PyBytes_AS_STRING(result) + data_size;
+            d->bzs.avail_out = PyBytes_GET_SIZE(result) - data_size;
+        }
+    }
+    if (data_size != PyBytes_GET_SIZE(result))
+        if (_PyBytes_Resize(&result, data_size) < 0)
+            goto error;
+    return result;
+
+error:
+    Py_XDECREF(result);
+    return NULL;
+}
+
+PyDoc_STRVAR(BZ2Decompressor_decompress__doc__,
+"decompress(data) -> bytes\n"
+"\n"
+"Provide data to the decompressor object. Returns a chunk of\n"
+"decompressed data if possible, or b'' otherwise.\n"
+"\n"
+"Attempting to decompress data after the end of stream is reached\n"
+"raises an EOFError.
Any data found after the end of the stream\n"
+"is ignored and saved in the unused_data attribute.\n");
+
+static PyObject *
+BZ2Decompressor_decompress(BZ2Decompressor *self, PyObject *args)
+{
+    Py_buffer buffer;
+    PyObject *result = NULL;
+
+    if (!PyArg_ParseTuple(args, "y*:decompress", &buffer))
+        return NULL;
+
+    ACQUIRE_LOCK(self);
+    if (self->eof)
+        PyErr_SetString(PyExc_EOFError, "End of stream already reached");
+    else
+        result = decompress(self, buffer.buf, buffer.len);
+    RELEASE_LOCK(self);
+    PyBuffer_Release(&buffer);
+    return result;
+}
+
+static int
+BZ2Decompressor_init(BZ2Decompressor *self, PyObject *args, PyObject *kwargs)
+{
+    int bzerror;
+
+    if (!PyArg_ParseTuple(args, ":BZ2Decompressor"))
+        return -1;
+
+#ifdef WITH_THREAD
+    self->lock = PyThread_allocate_lock();
+    if (self->lock == NULL) {
+        PyErr_SetString(PyExc_MemoryError, "Unable to allocate lock");
+        return -1;
+    }
+#endif
+
+    self->unused_data = PyBytes_FromStringAndSize("", 0);
+    if (self->unused_data == NULL)
+        goto error;
+
+    bzerror = BZ2_bzDecompressInit(&self->bzs, 0, 0);
+    if (catch_bz2_error(bzerror))
+        goto error;
+
+    return 0;
+
+error:
+    Py_CLEAR(self->unused_data);
+#ifdef WITH_THREAD
+    PyThread_free_lock(self->lock);
+    self->lock = NULL;
+#endif
+    return -1;
+}
+
+static void
+BZ2Decompressor_dealloc(BZ2Decompressor *self)
+{
+    BZ2_bzDecompressEnd(&self->bzs);
+    Py_CLEAR(self->unused_data);
+#ifdef WITH_THREAD
+    if (self->lock != NULL)
+        PyThread_free_lock(self->lock);
+#endif
+    Py_TYPE(self)->tp_free((PyObject *)self);
+}
+
+static PyMethodDef BZ2Decompressor_methods[] = {
+    {"decompress", (PyCFunction)BZ2Decompressor_decompress, METH_VARARGS,
+     BZ2Decompressor_decompress__doc__},
+    {NULL}
+};
+
+PyDoc_STRVAR(BZ2Decompressor_eof__doc__,
+"True if the end-of-stream marker has been reached.");
+
+PyDoc_STRVAR(BZ2Decompressor_unused_data__doc__,
+"Data found after the end of the compressed stream.");
+
+static PyMemberDef BZ2Decompressor_members[] = {
+    {"eof", T_BOOL,
offsetof(BZ2Decompressor, eof),
+     READONLY, BZ2Decompressor_eof__doc__},
+    {"unused_data", T_OBJECT_EX, offsetof(BZ2Decompressor, unused_data),
+     READONLY, BZ2Decompressor_unused_data__doc__},
+    {NULL}
+};
+
+PyDoc_STRVAR(BZ2Decompressor__doc__,
+"BZ2Decompressor()\n"
+"\n"
+"Create a decompressor object for decompressing data incrementally.\n"
+"\n"
+"For one-shot decompression, use the decompress() function instead.\n");
+
+static PyTypeObject BZ2Decompressor_Type = {
+    PyVarObject_HEAD_INIT(NULL, 0)
+    "_bz2.BZ2Decompressor",             /* tp_name */
+    sizeof(BZ2Decompressor),            /* tp_basicsize */
+    0,                                  /* tp_itemsize */
+    (destructor)BZ2Decompressor_dealloc,/* tp_dealloc */
+    0,                                  /* tp_print */
+    0,                                  /* tp_getattr */
+    0,                                  /* tp_setattr */
+    0,                                  /* tp_reserved */
+    0,                                  /* tp_repr */
+    0,                                  /* tp_as_number */
+    0,                                  /* tp_as_sequence */
+    0,                                  /* tp_as_mapping */
+    0,                                  /* tp_hash */
+    0,                                  /* tp_call */
+    0,                                  /* tp_str */
+    0,                                  /* tp_getattro */
+    0,                                  /* tp_setattro */
+    0,                                  /* tp_as_buffer */
+    Py_TPFLAGS_DEFAULT,                 /* tp_flags */
+    BZ2Decompressor__doc__,             /* tp_doc */
+    0,                                  /* tp_traverse */
+    0,                                  /* tp_clear */
+    0,                                  /* tp_richcompare */
+    0,                                  /* tp_weaklistoffset */
+    0,                                  /* tp_iter */
+    0,                                  /* tp_iternext */
+    BZ2Decompressor_methods,            /* tp_methods */
+    BZ2Decompressor_members,            /* tp_members */
+    0,                                  /* tp_getset */
+    0,                                  /* tp_base */
+    0,                                  /* tp_dict */
+    0,                                  /* tp_descr_get */
+    0,                                  /* tp_descr_set */
+    0,                                  /* tp_dictoffset */
+    (initproc)BZ2Decompressor_init,     /* tp_init */
+    0,                                  /* tp_alloc */
+    PyType_GenericNew,                  /* tp_new */
+};
+
+
+/* Module initialization.
*/
+
+static struct PyModuleDef _bz2module = {
+    PyModuleDef_HEAD_INIT,
+    "_bz2",
+    NULL,
+    -1,
+    NULL,
+    NULL,
+    NULL,
+    NULL,
+    NULL
+};
+
+PyMODINIT_FUNC
+PyInit__bz2(void)
+{
+    PyObject *m;
+
+    if (PyType_Ready(&BZ2Compressor_Type) < 0)
+        return NULL;
+    if (PyType_Ready(&BZ2Decompressor_Type) < 0)
+        return NULL;
+
+    m = PyModule_Create(&_bz2module);
+    if (m == NULL)
+        return NULL;
+
+    Py_INCREF(&BZ2Compressor_Type);
+    PyModule_AddObject(m, "BZ2Compressor", (PyObject *)&BZ2Compressor_Type);
+
+    Py_INCREF(&BZ2Decompressor_Type);
+    PyModule_AddObject(m, "BZ2Decompressor",
+                       (PyObject *)&BZ2Decompressor_Type);
+
+    return m;
+}
diff --git a/Modules/bz2module.c b/Modules/bz2module.c
deleted file mode 100644
index 3e55202..0000000
--- a/Modules/bz2module.c
+++ /dev/null
@@ -1,2180 +0,0 @@
-/*
-
-python-bz2 - python bz2 library interface
-
-Copyright (c) 2002 Gustavo Niemeyer <niemeyer@conectiva.com>
-Copyright (c) 2002 Python Software Foundation; All Rights Reserved
-
-*/
-
-#include "Python.h"
-#include <stdio.h>
-#include <bzlib.h>
-#include "structmember.h"
-
-#ifdef WITH_THREAD
-#include "pythread.h"
-#endif
-
-static char __author__[] =
-"The bz2 python module was written by:\n\
-\n\
-    Gustavo Niemeyer <niemeyer@conectiva.com>\n\
-";
-
-/* Our very own off_t-like type, 64-bit if possible */
-/* copied from Objects/fileobject.c */
-#if !defined(HAVE_LARGEFILE_SUPPORT)
-typedef off_t Py_off_t;
-#elif SIZEOF_OFF_T >= 8
-typedef off_t Py_off_t;
-#elif SIZEOF_FPOS_T >= 8
-typedef fpos_t Py_off_t;
-#else
-#error "Large file support, but neither off_t nor fpos_t is large enough."
-#endif
-
-#define BUF(v) PyBytes_AS_STRING(v)
-
-#define MODE_CLOSED   0
-#define MODE_READ     1
-#define MODE_READ_EOF 2
-#define MODE_WRITE    3
-
-#define BZ2FileObject_Check(v)  (Py_TYPE(v) == &BZ2File_Type)
-
-
-#ifdef BZ_CONFIG_ERROR
-
-#if SIZEOF_LONG >= 8
-#define BZS_TOTAL_OUT(bzs) \
-    (((long)bzs->total_out_hi32 << 32) + bzs->total_out_lo32)
-#elif SIZEOF_LONG_LONG >= 8
-#define BZS_TOTAL_OUT(bzs) \
-    (((PY_LONG_LONG)bzs->total_out_hi32 << 32) + bzs->total_out_lo32)
-#else
-#define BZS_TOTAL_OUT(bzs) \
-    bzs->total_out_lo32
-#endif
-
-#else /* ! BZ_CONFIG_ERROR */
-
-#define BZ2_bzRead bzRead
-#define BZ2_bzReadOpen bzReadOpen
-#define BZ2_bzReadClose bzReadClose
-#define BZ2_bzWrite bzWrite
-#define BZ2_bzWriteOpen bzWriteOpen
-#define BZ2_bzWriteClose bzWriteClose
-#define BZ2_bzCompress bzCompress
-#define BZ2_bzCompressInit bzCompressInit
-#define BZ2_bzCompressEnd bzCompressEnd
-#define BZ2_bzDecompress bzDecompress
-#define BZ2_bzDecompressInit bzDecompressInit
-#define BZ2_bzDecompressEnd bzDecompressEnd
-
-#define BZS_TOTAL_OUT(bzs) bzs->total_out
-
-#endif /* ! BZ_CONFIG_ERROR */
-
-
-#ifdef WITH_THREAD
-#define ACQUIRE_LOCK(obj) do { \
-    if (!PyThread_acquire_lock(obj->lock, 0)) { \
-        Py_BEGIN_ALLOW_THREADS \
-        PyThread_acquire_lock(obj->lock, 1); \
-        Py_END_ALLOW_THREADS \
-    } } while(0)
-#define RELEASE_LOCK(obj) PyThread_release_lock(obj->lock)
-#else
-#define ACQUIRE_LOCK(obj)
-#define RELEASE_LOCK(obj)
-#endif
-
-/* Bits in f_newlinetypes */
-#define NEWLINE_UNKNOWN 0       /* No newline seen, yet */
-#define NEWLINE_CR 1            /* \r newline seen */
-#define NEWLINE_LF 2            /* \n newline seen */
-#define NEWLINE_CRLF 4          /* \r\n newline seen */
-
-/* ===================================================================== */
-/* Structure definitions.
*/
-
-typedef struct {
-    PyObject_HEAD
-    FILE *rawfp;
-
-    char* f_buf;                /* Allocated readahead buffer */
-    char* f_bufend;             /* Points after last occupied position */
-    char* f_bufptr;             /* Current buffer position */
-
-    BZFILE *fp;
-    int mode;
-    Py_off_t pos;
-    Py_off_t size;
-#ifdef WITH_THREAD
-    PyThread_type_lock lock;
-#endif
-} BZ2FileObject;
-
-typedef struct {
-    PyObject_HEAD
-    bz_stream bzs;
-    int running;
-#ifdef WITH_THREAD
-    PyThread_type_lock lock;
-#endif
-} BZ2CompObject;
-
-typedef struct {
-    PyObject_HEAD
-    bz_stream bzs;
-    int running;
-    PyObject *unused_data;
-#ifdef WITH_THREAD
-    PyThread_type_lock lock;
-#endif
-} BZ2DecompObject;
-
-/* ===================================================================== */
-/* Utility functions. */
-
-/* Refuse regular I/O if there's data in the iteration-buffer.
- * Mixing them would cause data to arrive out of order, as the read*
- * methods don't use the iteration buffer. */
-static int
-check_iterbuffered(BZ2FileObject *f)
-{
-    if (f->f_buf != NULL &&
-        (f->f_bufend - f->f_bufptr) > 0 &&
-        f->f_buf[0] != '\0') {
-        PyErr_SetString(PyExc_ValueError,
-            "Mixing iteration and read methods would lose data");
-        return -1;
-    }
-    return 0;
-}
-
-static int
-Util_CatchBZ2Error(int bzerror)
-{
-    int ret = 0;
-    switch(bzerror) {
-        case BZ_OK:
-        case BZ_STREAM_END:
-            break;
-
-#ifdef BZ_CONFIG_ERROR
-        case BZ_CONFIG_ERROR:
-            PyErr_SetString(PyExc_SystemError,
-                            "the bz2 library was not compiled "
-                            "correctly");
-            ret = 1;
-            break;
-#endif
-
-        case BZ_PARAM_ERROR:
-            PyErr_SetString(PyExc_ValueError,
-                            "the bz2 library has received wrong "
-                            "parameters");
-            ret = 1;
-            break;
-
-        case BZ_MEM_ERROR:
-            PyErr_NoMemory();
-            ret = 1;
-            break;
-
-        case BZ_DATA_ERROR:
-        case BZ_DATA_ERROR_MAGIC:
-            PyErr_SetString(PyExc_IOError, "invalid data stream");
-            ret = 1;
-            break;
-
-        case BZ_IO_ERROR:
-            PyErr_SetString(PyExc_IOError, "unknown IO error");
-            ret = 1;
-            break;
-
-        case BZ_UNEXPECTED_EOF:
-            PyErr_SetString(PyExc_EOFError,
-
"compressed file ended before the " - "logical end-of-stream was detected"); - ret = 1; - break; - - case BZ_SEQUENCE_ERROR: - PyErr_SetString(PyExc_RuntimeError, - "wrong sequence of bz2 library " - "commands used"); - ret = 1; - break; - } - return ret; -} - -#if BUFSIZ < 8192 -#define SMALLCHUNK 8192 -#else -#define SMALLCHUNK BUFSIZ -#endif - -#if SIZEOF_INT < 4 -#define BIGCHUNK (512 * 32) -#else -#define BIGCHUNK (512 * 1024) -#endif - -/* This is a hacked version of Python's fileobject.c:new_buffersize(). */ -static size_t -Util_NewBufferSize(size_t currentsize) -{ - if (currentsize > SMALLCHUNK) { - /* Keep doubling until we reach BIGCHUNK; - then keep adding BIGCHUNK. */ - if (currentsize <= BIGCHUNK) - return currentsize + currentsize; - else - return currentsize + BIGCHUNK; - } - return currentsize + SMALLCHUNK; -} - -/* This is a hacked version of Python's fileobject.c:get_line(). */ -static PyObject * -Util_GetLine(BZ2FileObject *f, int n) -{ - char c; - char *buf, *end; - size_t total_v_size; /* total # of slots in buffer */ - size_t used_v_size; /* # used slots in buffer */ - size_t increment; /* amount to increment the buffer */ - PyObject *v; - int bzerror; - int bytes_read; - - total_v_size = n > 0 ? 
n : 100;
-    v = PyBytes_FromStringAndSize((char *)NULL, total_v_size);
-    if (v == NULL)
-        return NULL;
-
-    buf = BUF(v);
-    end = buf + total_v_size;
-
-    for (;;) {
-        Py_BEGIN_ALLOW_THREADS
-        do {
-            bytes_read = BZ2_bzRead(&bzerror, f->fp, &c, 1);
-            f->pos++;
-            if (bytes_read == 0)
-                break;
-            *buf++ = c;
-        } while (bzerror == BZ_OK && c != '\n' && buf != end);
-        Py_END_ALLOW_THREADS
-        if (bzerror == BZ_STREAM_END) {
-            f->size = f->pos;
-            f->mode = MODE_READ_EOF;
-            break;
-        } else if (bzerror != BZ_OK) {
-            Util_CatchBZ2Error(bzerror);
-            Py_DECREF(v);
-            return NULL;
-        }
-        if (c == '\n')
-            break;
-        /* Must be because buf == end */
-        if (n > 0)
-            break;
-        used_v_size = total_v_size;
-        increment = total_v_size >> 2; /* mild exponential growth */
-        total_v_size += increment;
-        if (total_v_size > INT_MAX) {
-            PyErr_SetString(PyExc_OverflowError,
-                "line is longer than a Python string can hold");
-            Py_DECREF(v);
-            return NULL;
-        }
-        if (_PyBytes_Resize(&v, total_v_size) < 0) {
-            return NULL;
-        }
-        buf = BUF(v) + used_v_size;
-        end = BUF(v) + total_v_size;
-    }
-
-    used_v_size = buf - BUF(v);
-    if (used_v_size != total_v_size) {
-        if (_PyBytes_Resize(&v, used_v_size) < 0) {
-            v = NULL;
-        }
-    }
-    return v;
-}
-
-/* This is a hacked version of Python's fileobject.c:drop_readahead(). */
-static void
-Util_DropReadAhead(BZ2FileObject *f)
-{
-    if (f->f_buf != NULL) {
-        PyMem_Free(f->f_buf);
-        f->f_buf = NULL;
-    }
-}
-
-/* This is a hacked version of Python's fileobject.c:readahead().
*/
-static int
-Util_ReadAhead(BZ2FileObject *f, int bufsize)
-{
-    int chunksize;
-    int bzerror;
-
-    if (f->f_buf != NULL) {
-        if((f->f_bufend - f->f_bufptr) >= 1)
-            return 0;
-        else
-            Util_DropReadAhead(f);
-    }
-    if (f->mode == MODE_READ_EOF) {
-        f->f_bufptr = f->f_buf;
-        f->f_bufend = f->f_buf;
-        return 0;
-    }
-    if ((f->f_buf = PyMem_Malloc(bufsize)) == NULL) {
-        PyErr_NoMemory();
-        return -1;
-    }
-    Py_BEGIN_ALLOW_THREADS
-    chunksize = BZ2_bzRead(&bzerror, f->fp, f->f_buf, bufsize);
-    Py_END_ALLOW_THREADS
-    f->pos += chunksize;
-    if (bzerror == BZ_STREAM_END) {
-        f->size = f->pos;
-        f->mode = MODE_READ_EOF;
-    } else if (bzerror != BZ_OK) {
-        Util_CatchBZ2Error(bzerror);
-        Util_DropReadAhead(f);
-        return -1;
-    }
-    f->f_bufptr = f->f_buf;
-    f->f_bufend = f->f_buf + chunksize;
-    return 0;
-}
-
-/* This is a hacked version of Python's
- * fileobject.c:readahead_get_line_skip(). */
-static PyBytesObject *
-Util_ReadAheadGetLineSkip(BZ2FileObject *f, int skip, int bufsize)
-{
-    PyBytesObject* s;
-    char *bufptr;
-    char *buf;
-    int len;
-
-    if (f->f_buf == NULL)
-        if (Util_ReadAhead(f, bufsize) < 0)
-            return NULL;
-
-    len = f->f_bufend - f->f_bufptr;
-    if (len == 0)
-        return (PyBytesObject *)
-            PyBytes_FromStringAndSize(NULL, skip);
-    bufptr = memchr(f->f_bufptr, '\n', len);
-    if (bufptr != NULL) {
-        bufptr++;                               /* Count the '\n' */
-        len = bufptr - f->f_bufptr;
-        s = (PyBytesObject *)
-            PyBytes_FromStringAndSize(NULL, skip+len);
-        if (s == NULL)
-            return NULL;
-        memcpy(PyBytes_AS_STRING(s)+skip, f->f_bufptr, len);
-        f->f_bufptr = bufptr;
-        if (bufptr == f->f_bufend)
-            Util_DropReadAhead(f);
-    } else {
-        bufptr = f->f_bufptr;
-        buf = f->f_buf;
-        f->f_buf = NULL;                /* Force new readahead buffer */
-        s = Util_ReadAheadGetLineSkip(f, skip+len,
-                                      bufsize + (bufsize>>2));
-        if (s == NULL) {
-            PyMem_Free(buf);
-            return NULL;
-        }
-        memcpy(PyBytes_AS_STRING(s)+skip, bufptr, len);
-        PyMem_Free(buf);
-    }
-    return s;
-}
-
-/*
===================================================================== */
-/* Methods of BZ2File. */
-
-PyDoc_STRVAR(BZ2File_read__doc__,
-"read([size]) -> string\n\
-\n\
-Read at most size uncompressed bytes, returned as a string. If the size\n\
-argument is negative or omitted, read until EOF is reached.\n\
-");
-
-/* This is a hacked version of Python's fileobject.c:file_read(). */
-static PyObject *
-BZ2File_read(BZ2FileObject *self, PyObject *args)
-{
-    long bytesrequested = -1;
-    size_t bytesread, buffersize, chunksize;
-    int bzerror;
-    PyObject *ret = NULL;
-
-    if (!PyArg_ParseTuple(args, "|l:read", &bytesrequested))
-        return NULL;
-
-    ACQUIRE_LOCK(self);
-    switch (self->mode) {
-        case MODE_READ:
-            break;
-        case MODE_READ_EOF:
-            ret = PyBytes_FromStringAndSize("", 0);
-            goto cleanup;
-        case MODE_CLOSED:
-            PyErr_SetString(PyExc_ValueError,
-                            "I/O operation on closed file");
-            goto cleanup;
-        default:
-            PyErr_SetString(PyExc_IOError,
-                            "file is not ready for reading");
-            goto cleanup;
-    }
-
-    /* refuse to mix with f.next() */
-    if (check_iterbuffered(self))
-        goto cleanup;
-
-    if (bytesrequested < 0)
-        buffersize = Util_NewBufferSize((size_t)0);
-    else
-        buffersize = bytesrequested;
-    if (buffersize > INT_MAX) {
-        PyErr_SetString(PyExc_OverflowError,
-                        "requested number of bytes is "
-                        "more than a Python string can hold");
-        goto cleanup;
-    }
-    ret = PyBytes_FromStringAndSize((char *)NULL, buffersize);
-    if (ret == NULL || buffersize == 0)
-        goto cleanup;
-    bytesread = 0;
-
-    for (;;) {
-        Py_BEGIN_ALLOW_THREADS
-        chunksize = BZ2_bzRead(&bzerror, self->fp,
-                               BUF(ret)+bytesread,
-                               buffersize-bytesread);
-        self->pos += chunksize;
-        Py_END_ALLOW_THREADS
-        bytesread += chunksize;
-        if (bzerror == BZ_STREAM_END) {
-            self->size = self->pos;
-            self->mode = MODE_READ_EOF;
-            break;
-        } else if (bzerror != BZ_OK) {
-            Util_CatchBZ2Error(bzerror);
-            Py_DECREF(ret);
-            ret = NULL;
-            goto cleanup;
-        }
-        if (bytesrequested < 0) {
-            buffersize = Util_NewBufferSize(buffersize);
-            if (_PyBytes_Resize(&ret, buffersize) < 0) {
-                ret = NULL;
-                goto cleanup;
-            }
-        } else {
-            break;
-        }
-    }
-    if (bytesread != buffersize) {
-        if (_PyBytes_Resize(&ret, bytesread) < 0) {
-            ret = NULL;
-        }
-    }
-
-cleanup:
-    RELEASE_LOCK(self);
-    return ret;
-}
-
-PyDoc_STRVAR(BZ2File_readline__doc__,
-"readline([size]) -> string\n\
-\n\
-Return the next line from the file, as a string, retaining newline.\n\
-A non-negative size argument will limit the maximum number of bytes to\n\
-return (an incomplete line may be returned then). Return an empty\n\
-string at EOF.\n\
-");
-
-static PyObject *
-BZ2File_readline(BZ2FileObject *self, PyObject *args)
-{
-    PyObject *ret = NULL;
-    int sizehint = -1;
-
-    if (!PyArg_ParseTuple(args, "|i:readline", &sizehint))
-        return NULL;
-
-    ACQUIRE_LOCK(self);
-    switch (self->mode) {
-        case MODE_READ:
-            break;
-        case MODE_READ_EOF:
-            ret = PyBytes_FromStringAndSize("", 0);
-            goto cleanup;
-        case MODE_CLOSED:
-            PyErr_SetString(PyExc_ValueError,
-                            "I/O operation on closed file");
-            goto cleanup;
-        default:
-            PyErr_SetString(PyExc_IOError,
-                            "file is not ready for reading");
-            goto cleanup;
-    }
-
-    /* refuse to mix with f.next() */
-    if (check_iterbuffered(self))
-        goto cleanup;
-
-    if (sizehint == 0)
-        ret = PyBytes_FromStringAndSize("", 0);
-    else
-        ret = Util_GetLine(self, (sizehint < 0) ? 0 : sizehint);
-
-cleanup:
-    RELEASE_LOCK(self);
-    return ret;
-}
-
-PyDoc_STRVAR(BZ2File_readlines__doc__,
-"readlines([size]) -> list\n\
-\n\
-Call readline() repeatedly and return a list of lines read.\n\
-The optional size argument, if given, is an approximate bound on the\n\
-total number of bytes in the lines returned.\n\
-");
-
-/* This is a hacked version of Python's fileobject.c:file_readlines().
*/
-static PyObject *
-BZ2File_readlines(BZ2FileObject *self, PyObject *args)
-{
-    long sizehint = 0;
-    PyObject *list = NULL;
-    PyObject *line;
-    char small_buffer[SMALLCHUNK];
-    char *buffer = small_buffer;
-    size_t buffersize = SMALLCHUNK;
-    PyObject *big_buffer = NULL;
-    size_t nfilled = 0;
-    size_t nread;
-    size_t totalread = 0;
-    char *p, *q, *end;
-    int err;
-    int shortread = 0;
-    int bzerror;
-
-    if (!PyArg_ParseTuple(args, "|l:readlines", &sizehint))
-        return NULL;
-
-    ACQUIRE_LOCK(self);
-    switch (self->mode) {
-        case MODE_READ:
-            break;
-        case MODE_READ_EOF:
-            list = PyList_New(0);
-            goto cleanup;
-        case MODE_CLOSED:
-            PyErr_SetString(PyExc_ValueError,
-                            "I/O operation on closed file");
-            goto cleanup;
-        default:
-            PyErr_SetString(PyExc_IOError,
-                            "file is not ready for reading");
-            goto cleanup;
-    }
-
-    /* refuse to mix with f.next() */
-    if (check_iterbuffered(self))
-        goto cleanup;
-
-    if ((list = PyList_New(0)) == NULL)
-        goto cleanup;
-
-    for (;;) {
-        Py_BEGIN_ALLOW_THREADS
-        nread = BZ2_bzRead(&bzerror, self->fp,
-                           buffer+nfilled, buffersize-nfilled);
-        self->pos += nread;
-        Py_END_ALLOW_THREADS
-        if (bzerror == BZ_STREAM_END) {
-            self->size = self->pos;
-            self->mode = MODE_READ_EOF;
-            if (nread == 0) {
-                sizehint = 0;
-                break;
-            }
-            shortread = 1;
-        } else if (bzerror != BZ_OK) {
-            Util_CatchBZ2Error(bzerror);
-          error:
-            Py_DECREF(list);
-            list = NULL;
-            goto cleanup;
-        }
-        totalread += nread;
-        p = memchr(buffer+nfilled, '\n', nread);
-        if (!shortread && p == NULL) {
-            /* Need a larger buffer to fit this line */
-            nfilled += nread;
-            buffersize *= 2;
-            if (buffersize > INT_MAX) {
-                PyErr_SetString(PyExc_OverflowError,
-                "line is longer than a Python string can hold");
-                goto error;
-            }
-            if (big_buffer == NULL) {
-                /* Create the big buffer */
-                big_buffer = PyBytes_FromStringAndSize(
-                    NULL, buffersize);
-                if (big_buffer == NULL)
-                    goto error;
-                buffer = PyBytes_AS_STRING(big_buffer);
-                memcpy(buffer, small_buffer, nfilled);
-            }
-            else {
-                /*
Grow the big buffer */
-                if (_PyBytes_Resize(&big_buffer, buffersize) < 0){
-                    big_buffer = NULL;
-                    goto error;
-                }
-                buffer = PyBytes_AS_STRING(big_buffer);
-            }
-            continue;
-        }
-        end = buffer+nfilled+nread;
-        q = buffer;
-        while (p != NULL) {
-            /* Process complete lines */
-            p++;
-            line = PyBytes_FromStringAndSize(q, p-q);
-            if (line == NULL)
-                goto error;
-            err = PyList_Append(list, line);
-            Py_DECREF(line);
-            if (err != 0)
-                goto error;
-            q = p;
-            p = memchr(q, '\n', end-q);
-        }
-        /* Move the remaining incomplete line to the start */
-        nfilled = end-q;
-        memmove(buffer, q, nfilled);
-        if (sizehint > 0)
-            if (totalread >= (size_t)sizehint)
-                break;
-        if (shortread) {
-            sizehint = 0;
-            break;
-        }
-    }
-    if (nfilled != 0) {
-        /* Partial last line */
-        line = PyBytes_FromStringAndSize(buffer, nfilled);
-        if (line == NULL)
-            goto error;
-        if (sizehint > 0) {
-            /* Need to complete the last line */
-            PyObject *rest = Util_GetLine(self, 0);
-            if (rest == NULL) {
-                Py_DECREF(line);
-                goto error;
-            }
-            PyBytes_Concat(&line, rest);
-            Py_DECREF(rest);
-            if (line == NULL)
-                goto error;
-        }
-        err = PyList_Append(list, line);
-        Py_DECREF(line);
-        if (err != 0)
-            goto error;
-    }
-
-  cleanup:
-    RELEASE_LOCK(self);
-    if (big_buffer) {
-        Py_DECREF(big_buffer);
-    }
-    return list;
-}
-
-PyDoc_STRVAR(BZ2File_write__doc__,
-"write(data) -> None\n\
-\n\
-Write the 'data' string to file. Note that due to buffering, close() may\n\
-be needed before the file on disk reflects the data written.\n\
-");
-
-/* This is a hacked version of Python's fileobject.c:file_write().
*/
-static PyObject *
-BZ2File_write(BZ2FileObject *self, PyObject *args)
-{
-    PyObject *ret = NULL;
-    Py_buffer pbuf;
-    char *buf;
-    int len;
-    int bzerror;
-
-    if (!PyArg_ParseTuple(args, "y*:write", &pbuf))
-        return NULL;
-    buf = pbuf.buf;
-    len = pbuf.len;
-
-    ACQUIRE_LOCK(self);
-    switch (self->mode) {
-        case MODE_WRITE:
-            break;
-
-        case MODE_CLOSED:
-            PyErr_SetString(PyExc_ValueError,
-                            "I/O operation on closed file");
-            goto cleanup;
-
-        default:
-            PyErr_SetString(PyExc_IOError,
-                            "file is not ready for writing");
-            goto cleanup;
-    }
-
-    Py_BEGIN_ALLOW_THREADS
-    BZ2_bzWrite (&bzerror, self->fp, buf, len);
-    self->pos += len;
-    Py_END_ALLOW_THREADS
-
-    if (bzerror != BZ_OK) {
-        Util_CatchBZ2Error(bzerror);
-        goto cleanup;
-    }
-
-    Py_INCREF(Py_None);
-    ret = Py_None;
-
-cleanup:
-    PyBuffer_Release(&pbuf);
-    RELEASE_LOCK(self);
-    return ret;
-}
-
-PyDoc_STRVAR(BZ2File_writelines__doc__,
-"writelines(sequence_of_strings) -> None\n\
-\n\
-Write the sequence of strings to the file. Note that newlines are not\n\
-added. The sequence can be any iterable object producing strings. This is\n\
-equivalent to calling write() for each string.\n\
-");
-
-/* This is a hacked version of Python's fileobject.c:file_writelines().
*/ -static PyObject * -BZ2File_writelines(BZ2FileObject *self, PyObject *seq) -{ -#define CHUNKSIZE 1000 - PyObject *list = NULL; - PyObject *iter = NULL; - PyObject *ret = NULL; - PyObject *line; - int i, j, index, len, islist; - int bzerror; - - ACQUIRE_LOCK(self); - switch (self->mode) { - case MODE_WRITE: - break; - - case MODE_CLOSED: - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - goto error; - - default: - PyErr_SetString(PyExc_IOError, - "file is not ready for writing"); - goto error; - } - - islist = PyList_Check(seq); - if (!islist) { - iter = PyObject_GetIter(seq); - if (iter == NULL) { - PyErr_SetString(PyExc_TypeError, - "writelines() requires an iterable argument"); - goto error; - } - list = PyList_New(CHUNKSIZE); - if (list == NULL) - goto error; - } - - /* Strategy: slurp CHUNKSIZE lines into a private list, - checking that they are all strings, then write that list - without holding the interpreter lock, then come back for more. */ - for (index = 0; ; index += CHUNKSIZE) { - if (islist) { - Py_XDECREF(list); - list = PyList_GetSlice(seq, index, index+CHUNKSIZE); - if (list == NULL) - goto error; - j = PyList_GET_SIZE(list); - } - else { - for (j = 0; j < CHUNKSIZE; j++) { - line = PyIter_Next(iter); - if (line == NULL) { - if (PyErr_Occurred()) - goto error; - break; - } - PyList_SetItem(list, j, line); - } - } - if (j == 0) - break; - - /* Check that all entries are indeed byte strings. If not, - apply the same rules as for file.write() and - convert the rets to strings. This is slow, but - seems to be the only way since all conversion APIs - could potentially execute Python code. 
*/ - for (i = 0; i < j; i++) { - PyObject *v = PyList_GET_ITEM(list, i); - if (!PyBytes_Check(v)) { - const char *buffer; - Py_ssize_t len; - if (PyObject_AsCharBuffer(v, &buffer, &len)) { - PyErr_SetString(PyExc_TypeError, - "writelines() " - "argument must be " - "a sequence of " - "bytes objects"); - goto error; - } - line = PyBytes_FromStringAndSize(buffer, - len); - if (line == NULL) - goto error; - Py_DECREF(v); - PyList_SET_ITEM(list, i, line); - } - } - - /* Since we are releasing the global lock, the - following code may *not* execute Python code. */ - Py_BEGIN_ALLOW_THREADS - for (i = 0; i < j; i++) { - line = PyList_GET_ITEM(list, i); - len = PyBytes_GET_SIZE(line); - BZ2_bzWrite (&bzerror, self->fp, - PyBytes_AS_STRING(line), len); - if (bzerror != BZ_OK) { - Py_BLOCK_THREADS - Util_CatchBZ2Error(bzerror); - goto error; - } - } - Py_END_ALLOW_THREADS - - if (j < CHUNKSIZE) - break; - } - - Py_INCREF(Py_None); - ret = Py_None; - - error: - RELEASE_LOCK(self); - Py_XDECREF(list); - Py_XDECREF(iter); - return ret; -#undef CHUNKSIZE -} - -PyDoc_STRVAR(BZ2File_seek__doc__, -"seek(offset [, whence]) -> None\n\ -\n\ -Move to new file position. Argument offset is a byte count. 
Optional\n\ -argument whence defaults to 0 (offset from start of file, offset\n\ -should be >= 0); other values are 1 (move relative to current position,\n\ -positive or negative), and 2 (move relative to end of file, usually\n\ -negative, although many platforms allow seeking beyond the end of a file).\n\ -\n\ -Note that seeking of bz2 files is emulated, and depending on the parameters\n\ -the operation may be extremely slow.\n\ -"); - -static PyObject * -BZ2File_seek(BZ2FileObject *self, PyObject *args) -{ - int where = 0; - PyObject *offobj; - Py_off_t offset; - char small_buffer[SMALLCHUNK]; - char *buffer = small_buffer; - size_t buffersize = SMALLCHUNK; - Py_off_t bytesread = 0; - size_t readsize; - int chunksize; - int bzerror; - PyObject *ret = NULL; - - if (!PyArg_ParseTuple(args, "O|i:seek", &offobj, &where)) - return NULL; -#if !defined(HAVE_LARGEFILE_SUPPORT) - offset = PyLong_AsLong(offobj); -#else - offset = PyLong_Check(offobj) ? - PyLong_AsLongLong(offobj) : PyLong_AsLong(offobj); -#endif - if (PyErr_Occurred()) - return NULL; - - ACQUIRE_LOCK(self); - Util_DropReadAhead(self); - switch (self->mode) { - case MODE_READ: - case MODE_READ_EOF: - break; - - case MODE_CLOSED: - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - goto cleanup; - - default: - PyErr_SetString(PyExc_IOError, - "seek works only while reading"); - goto cleanup; - } - - if (where == 2) { - if (self->size == -1) { - assert(self->mode != MODE_READ_EOF); - for (;;) { - Py_BEGIN_ALLOW_THREADS - chunksize = BZ2_bzRead(&bzerror, self->fp, - buffer, buffersize); - self->pos += chunksize; - Py_END_ALLOW_THREADS - - bytesread += chunksize; - if (bzerror == BZ_STREAM_END) { - break; - } else if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto cleanup; - } - } - self->mode = MODE_READ_EOF; - self->size = self->pos; - bytesread = 0; - } - offset = self->size + offset; - } else if (where == 1) { - offset = self->pos + offset; - } - - /* Before getting here, 
offset must be the absolute position the file - * pointer should be set to. */ - - if (offset >= self->pos) { - /* we can move forward */ - offset -= self->pos; - } else { - /* we cannot move back, so rewind the stream */ - BZ2_bzReadClose(&bzerror, self->fp); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto cleanup; - } - rewind(self->rawfp); - self->pos = 0; - self->fp = BZ2_bzReadOpen(&bzerror, self->rawfp, - 0, 0, NULL, 0); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto cleanup; - } - self->mode = MODE_READ; - } - - if (offset <= 0 || self->mode == MODE_READ_EOF) - goto exit; - - /* Before getting here, offset must be set to the number of bytes - * to walk forward. */ - for (;;) { - if (offset-bytesread > buffersize) - readsize = buffersize; - else - /* offset might be wider that readsize, but the result - * of the subtraction is bound by buffersize (see the - * condition above). buffersize is 8192. */ - readsize = (size_t)(offset-bytesread); - Py_BEGIN_ALLOW_THREADS - chunksize = BZ2_bzRead(&bzerror, self->fp, buffer, readsize); - self->pos += chunksize; - Py_END_ALLOW_THREADS - bytesread += chunksize; - if (bzerror == BZ_STREAM_END) { - self->size = self->pos; - self->mode = MODE_READ_EOF; - break; - } else if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto cleanup; - } - if (bytesread == offset) - break; - } - -exit: - Py_INCREF(Py_None); - ret = Py_None; - -cleanup: - RELEASE_LOCK(self); - return ret; -} - -PyDoc_STRVAR(BZ2File_tell__doc__, -"tell() -> int\n\ -\n\ -Return the current file position, an integer (may be a long integer).\n\ -"); - -static PyObject * -BZ2File_tell(BZ2FileObject *self, PyObject *args) -{ - PyObject *ret = NULL; - - if (self->mode == MODE_CLOSED) { - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - goto cleanup; - } - -#if !defined(HAVE_LARGEFILE_SUPPORT) - ret = PyLong_FromLong(self->pos); -#else - ret = PyLong_FromLongLong(self->pos); -#endif - -cleanup: - return 
ret; -} - -PyDoc_STRVAR(BZ2File_close__doc__, -"close() -> None or (perhaps) an integer\n\ -\n\ -Close the file. Sets data attribute .closed to true. A closed file\n\ -cannot be used for further I/O operations. close() may be called more\n\ -than once without error.\n\ -"); - -static PyObject * -BZ2File_close(BZ2FileObject *self) -{ - PyObject *ret = NULL; - int bzerror = BZ_OK; - - if (self->mode == MODE_CLOSED) { - Py_RETURN_NONE; - } - - ACQUIRE_LOCK(self); - switch (self->mode) { - case MODE_READ: - case MODE_READ_EOF: - BZ2_bzReadClose(&bzerror, self->fp); - break; - case MODE_WRITE: - BZ2_bzWriteClose(&bzerror, self->fp, - 0, NULL, NULL); - break; - } - self->mode = MODE_CLOSED; - fclose(self->rawfp); - self->rawfp = NULL; - if (bzerror == BZ_OK) { - Py_INCREF(Py_None); - ret = Py_None; - } - else { - Util_CatchBZ2Error(bzerror); - } - - RELEASE_LOCK(self); - return ret; -} - -PyDoc_STRVAR(BZ2File_enter_doc, -"__enter__() -> self."); - -static PyObject * -BZ2File_enter(BZ2FileObject *self) -{ - if (self->mode == MODE_CLOSED) { - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - return NULL; - } - Py_INCREF(self); - return (PyObject *) self; -} - -PyDoc_STRVAR(BZ2File_exit_doc, -"__exit__(*excinfo) -> None. 
Closes the file."); - -static PyObject * -BZ2File_exit(BZ2FileObject *self, PyObject *args) -{ - PyObject *ret = PyObject_CallMethod((PyObject *) self, "close", NULL); - if (!ret) - /* If error occurred, pass through */ - return NULL; - Py_DECREF(ret); - Py_RETURN_NONE; -} - - -static PyObject *BZ2File_getiter(BZ2FileObject *self); - -static PyMethodDef BZ2File_methods[] = { - {"read", (PyCFunction)BZ2File_read, METH_VARARGS, BZ2File_read__doc__}, - {"readline", (PyCFunction)BZ2File_readline, METH_VARARGS, BZ2File_readline__doc__}, - {"readlines", (PyCFunction)BZ2File_readlines, METH_VARARGS, BZ2File_readlines__doc__}, - {"write", (PyCFunction)BZ2File_write, METH_VARARGS, BZ2File_write__doc__}, - {"writelines", (PyCFunction)BZ2File_writelines, METH_O, BZ2File_writelines__doc__}, - {"seek", (PyCFunction)BZ2File_seek, METH_VARARGS, BZ2File_seek__doc__}, - {"tell", (PyCFunction)BZ2File_tell, METH_NOARGS, BZ2File_tell__doc__}, - {"close", (PyCFunction)BZ2File_close, METH_NOARGS, BZ2File_close__doc__}, - {"__enter__", (PyCFunction)BZ2File_enter, METH_NOARGS, BZ2File_enter_doc}, - {"__exit__", (PyCFunction)BZ2File_exit, METH_VARARGS, BZ2File_exit_doc}, - {NULL, NULL} /* sentinel */ -}; - - -/* ===================================================================== */ -/* Getters and setters of BZ2File. */ - -static PyObject * -BZ2File_get_closed(BZ2FileObject *self, void *closure) -{ - return PyLong_FromLong(self->mode == MODE_CLOSED); -} - -static PyGetSetDef BZ2File_getset[] = { - {"closed", (getter)BZ2File_get_closed, NULL, - "True if the file is closed"}, - {NULL} /* Sentinel */ -}; - - -/* ===================================================================== */ -/* Slot definitions for BZ2File_Type. 
*/ - -static int -BZ2File_init(BZ2FileObject *self, PyObject *args, PyObject *kwargs) -{ - static char *kwlist[] = {"filename", "mode", "buffering", - "compresslevel", 0}; - PyObject *name_obj = NULL; - char *name; - char *mode = "r"; - int buffering = -1; - int compresslevel = 9; - int bzerror; - int mode_char = 0; - - self->size = -1; - - if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O&|sii:BZ2File", - kwlist, PyUnicode_FSConverter, &name_obj, - &mode, &buffering, - &compresslevel)) - return -1; - - name = PyBytes_AsString(name_obj); - if (compresslevel < 1 || compresslevel > 9) { - PyErr_SetString(PyExc_ValueError, - "compresslevel must be between 1 and 9"); - Py_DECREF(name_obj); - return -1; - } - - for (;;) { - int error = 0; - switch (*mode) { - case 'r': - case 'w': - if (mode_char) - error = 1; - mode_char = *mode; - break; - - case 'b': - break; - - default: - error = 1; - break; - } - if (error) { - PyErr_Format(PyExc_ValueError, - "invalid mode char %c", *mode); - Py_DECREF(name_obj); - return -1; - } - mode++; - if (*mode == '\0') - break; - } - - if (mode_char == 0) { - mode_char = 'r'; - } - - mode = (mode_char == 'r') ? "rb" : "wb"; - - self->rawfp = fopen(name, mode); - Py_DECREF(name_obj); - if (self->rawfp == NULL) { - PyErr_SetFromErrno(PyExc_IOError); - return -1; - } - /* XXX Ignore buffering */ - - /* From now on, we have stuff to dealloc, so jump to error label - * instead of returning */ - -#ifdef WITH_THREAD - self->lock = PyThread_allocate_lock(); - if (!self->lock) { - PyErr_SetString(PyExc_MemoryError, "unable to allocate lock"); - goto error; - } -#endif - - if (mode_char == 'r') - self->fp = BZ2_bzReadOpen(&bzerror, self->rawfp, - 0, 0, NULL, 0); - else - self->fp = BZ2_bzWriteOpen(&bzerror, self->rawfp, - compresslevel, 0, 0); - - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - - self->mode = (mode_char == 'r') ? 
MODE_READ : MODE_WRITE; - - return 0; - -error: - fclose(self->rawfp); - self->rawfp = NULL; -#ifdef WITH_THREAD - if (self->lock) { - PyThread_free_lock(self->lock); - self->lock = NULL; - } -#endif - return -1; -} - -static void -BZ2File_dealloc(BZ2FileObject *self) -{ - int bzerror; -#ifdef WITH_THREAD - if (self->lock) - PyThread_free_lock(self->lock); -#endif - switch (self->mode) { - case MODE_READ: - case MODE_READ_EOF: - BZ2_bzReadClose(&bzerror, self->fp); - break; - case MODE_WRITE: - BZ2_bzWriteClose(&bzerror, self->fp, - 0, NULL, NULL); - break; - } - Util_DropReadAhead(self); - if (self->rawfp != NULL) - fclose(self->rawfp); - Py_TYPE(self)->tp_free((PyObject *)self); -} - -/* This is a hacked version of Python's fileobject.c:file_getiter(). */ -static PyObject * -BZ2File_getiter(BZ2FileObject *self) -{ - if (self->mode == MODE_CLOSED) { - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - return NULL; - } - Py_INCREF((PyObject*)self); - return (PyObject *)self; -} - -/* This is a hacked version of Python's fileobject.c:file_iternext(). */ -#define READAHEAD_BUFSIZE 8192 -static PyObject * -BZ2File_iternext(BZ2FileObject *self) -{ - PyBytesObject* ret; - ACQUIRE_LOCK(self); - if (self->mode == MODE_CLOSED) { - RELEASE_LOCK(self); - PyErr_SetString(PyExc_ValueError, - "I/O operation on closed file"); - return NULL; - } - ret = Util_ReadAheadGetLineSkip(self, 0, READAHEAD_BUFSIZE); - RELEASE_LOCK(self); - if (ret == NULL || PyBytes_GET_SIZE(ret) == 0) { - Py_XDECREF(ret); - return NULL; - } - return (PyObject *)ret; -} - -/* ===================================================================== */ -/* BZ2File_Type definition. */ - -PyDoc_VAR(BZ2File__doc__) = -PyDoc_STR( -"BZ2File(name [, mode='r', buffering=0, compresslevel=9]) -> file object\n\ -\n\ -Open a bz2 file. The mode can be 'r' or 'w', for reading (default) or\n\ -writing. When opened for writing, the file will be created if it doesn't\n\ -exist, and truncated otherwise. 
If the buffering argument is given, 0 means\n\ -unbuffered, and larger numbers specify the buffer size. If compresslevel\n\ -is given, must be a number between 1 and 9.\n\ -Data read is always returned in bytes; data written ought to be bytes.\n\ -"); - -static PyTypeObject BZ2File_Type = { - PyVarObject_HEAD_INIT(NULL, 0) - "bz2.BZ2File", /*tp_name*/ - sizeof(BZ2FileObject), /*tp_basicsize*/ - 0, /*tp_itemsize*/ - (destructor)BZ2File_dealloc, /*tp_dealloc*/ - 0, /*tp_print*/ - 0, /*tp_getattr*/ - 0, /*tp_setattr*/ - 0, /*tp_reserved*/ - 0, /*tp_repr*/ - 0, /*tp_as_number*/ - 0, /*tp_as_sequence*/ - 0, /*tp_as_mapping*/ - 0, /*tp_hash*/ - 0, /*tp_call*/ - 0, /*tp_str*/ - PyObject_GenericGetAttr,/*tp_getattro*/ - PyObject_GenericSetAttr,/*tp_setattro*/ - 0, /*tp_as_buffer*/ - Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE, /*tp_flags*/ - BZ2File__doc__, /*tp_doc*/ - 0, /*tp_traverse*/ - 0, /*tp_clear*/ - 0, /*tp_richcompare*/ - 0, /*tp_weaklistoffset*/ - (getiterfunc)BZ2File_getiter, /*tp_iter*/ - (iternextfunc)BZ2File_iternext, /*tp_iternext*/ - BZ2File_methods, /*tp_methods*/ - 0, /*tp_members*/ - BZ2File_getset, /*tp_getset*/ - 0, /*tp_base*/ - 0, /*tp_dict*/ - 0, /*tp_descr_get*/ - 0, /*tp_descr_set*/ - 0, /*tp_dictoffset*/ - (initproc)BZ2File_init, /*tp_init*/ - PyType_GenericAlloc, /*tp_alloc*/ - PyType_GenericNew, /*tp_new*/ - PyObject_Free, /*tp_free*/ - 0, /*tp_is_gc*/ -}; - - -/* ===================================================================== */ -/* Methods of BZ2Comp. */ - -PyDoc_STRVAR(BZ2Comp_compress__doc__, -"compress(data) -> string\n\ -\n\ -Provide more data to the compressor object. It will return chunks of\n\ -compressed data whenever possible. 
When you've finished providing data\n\ -to compress, call the flush() method to finish the compression process,\n\ -and return what is left in the internal buffers.\n\ -"); - -static PyObject * -BZ2Comp_compress(BZ2CompObject *self, PyObject *args) -{ - Py_buffer pdata; - char *data; - int datasize; - int bufsize = SMALLCHUNK; - PY_LONG_LONG totalout; - PyObject *ret = NULL; - bz_stream *bzs = &self->bzs; - int bzerror; - - if (!PyArg_ParseTuple(args, "y*:compress", &pdata)) - return NULL; - data = pdata.buf; - datasize = pdata.len; - - if (datasize == 0) { - PyBuffer_Release(&pdata); - return PyBytes_FromStringAndSize("", 0); - } - - ACQUIRE_LOCK(self); - if (!self->running) { - PyErr_SetString(PyExc_ValueError, - "this object was already flushed"); - goto error; - } - - ret = PyBytes_FromStringAndSize(NULL, bufsize); - if (!ret) - goto error; - - bzs->next_in = data; - bzs->avail_in = datasize; - bzs->next_out = BUF(ret); - bzs->avail_out = bufsize; - - totalout = BZS_TOTAL_OUT(bzs); - - for (;;) { - Py_BEGIN_ALLOW_THREADS - bzerror = BZ2_bzCompress(bzs, BZ_RUN); - Py_END_ALLOW_THREADS - if (bzerror != BZ_RUN_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - if (bzs->avail_in == 0) - break; /* no more input data */ - if (bzs->avail_out == 0) { - bufsize = Util_NewBufferSize(bufsize); - if (_PyBytes_Resize(&ret, bufsize) < 0) { - BZ2_bzCompressEnd(bzs); - goto error; - } - bzs->next_out = BUF(ret) + (BZS_TOTAL_OUT(bzs) - - totalout); - bzs->avail_out = bufsize - (bzs->next_out - BUF(ret)); - } - } - - if (_PyBytes_Resize(&ret, - (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) - goto error; - - RELEASE_LOCK(self); - PyBuffer_Release(&pdata); - return ret; - -error: - RELEASE_LOCK(self); - PyBuffer_Release(&pdata); - Py_XDECREF(ret); - return NULL; -} - -PyDoc_STRVAR(BZ2Comp_flush__doc__, -"flush() -> string\n\ -\n\ -Finish the compression process and return what is left in internal buffers.\n\ -You must not use the compressor object after calling this 
method.\n\ -"); - -static PyObject * -BZ2Comp_flush(BZ2CompObject *self) -{ - int bufsize = SMALLCHUNK; - PyObject *ret = NULL; - bz_stream *bzs = &self->bzs; - PY_LONG_LONG totalout; - int bzerror; - - ACQUIRE_LOCK(self); - if (!self->running) { - PyErr_SetString(PyExc_ValueError, "object was already " - "flushed"); - goto error; - } - self->running = 0; - - ret = PyBytes_FromStringAndSize(NULL, bufsize); - if (!ret) - goto error; - - bzs->next_out = BUF(ret); - bzs->avail_out = bufsize; - - totalout = BZS_TOTAL_OUT(bzs); - - for (;;) { - Py_BEGIN_ALLOW_THREADS - bzerror = BZ2_bzCompress(bzs, BZ_FINISH); - Py_END_ALLOW_THREADS - if (bzerror == BZ_STREAM_END) { - break; - } else if (bzerror != BZ_FINISH_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - if (bzs->avail_out == 0) { - bufsize = Util_NewBufferSize(bufsize); - if (_PyBytes_Resize(&ret, bufsize) < 0) - goto error; - bzs->next_out = BUF(ret); - bzs->next_out = BUF(ret) + (BZS_TOTAL_OUT(bzs) - - totalout); - bzs->avail_out = bufsize - (bzs->next_out - BUF(ret)); - } - } - - if (bzs->avail_out != 0) { - if (_PyBytes_Resize(&ret, - (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) - goto error; - } - - RELEASE_LOCK(self); - return ret; - -error: - RELEASE_LOCK(self); - Py_XDECREF(ret); - return NULL; -} - -static PyMethodDef BZ2Comp_methods[] = { - {"compress", (PyCFunction)BZ2Comp_compress, METH_VARARGS, - BZ2Comp_compress__doc__}, - {"flush", (PyCFunction)BZ2Comp_flush, METH_NOARGS, - BZ2Comp_flush__doc__}, - {NULL, NULL} /* sentinel */ -}; - - -/* ===================================================================== */ -/* Slot definitions for BZ2Comp_Type. 
*/ - -static int -BZ2Comp_init(BZ2CompObject *self, PyObject *args, PyObject *kwargs) -{ - int compresslevel = 9; - int bzerror; - static char *kwlist[] = {"compresslevel", 0}; - - if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|i:BZ2Compressor", - kwlist, &compresslevel)) - return -1; - - if (compresslevel < 1 || compresslevel > 9) { - PyErr_SetString(PyExc_ValueError, - "compresslevel must be between 1 and 9"); - goto error; - } - -#ifdef WITH_THREAD - self->lock = PyThread_allocate_lock(); - if (!self->lock) { - PyErr_SetString(PyExc_MemoryError, "unable to allocate lock"); - goto error; - } -#endif - - memset(&self->bzs, 0, sizeof(bz_stream)); - bzerror = BZ2_bzCompressInit(&self->bzs, compresslevel, 0, 0); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - - self->running = 1; - - return 0; -error: -#ifdef WITH_THREAD - if (self->lock) { - PyThread_free_lock(self->lock); - self->lock = NULL; - } -#endif - return -1; -} - -static void -BZ2Comp_dealloc(BZ2CompObject *self) -{ -#ifdef WITH_THREAD - if (self->lock) - PyThread_free_lock(self->lock); -#endif - BZ2_bzCompressEnd(&self->bzs); - Py_TYPE(self)->tp_free((PyObject *)self); -} - - -/* ===================================================================== */ -/* BZ2Comp_Type definition. */ - -PyDoc_STRVAR(BZ2Comp__doc__, -"BZ2Compressor([compresslevel=9]) -> compressor object\n\ -\n\ -Create a new compressor object. This object may be used to compress\n\ -data sequentially. If you want to compress data in one shot, use the\n\ -compress() function instead. 
The compresslevel parameter, if given,\n\ -must be a number between 1 and 9.\n\ -"); - -static PyTypeObject BZ2Comp_Type = { - PyVarObject_HEAD_INIT(NULL, 0) - "bz2.BZ2Compressor", /*tp_name*/ - sizeof(BZ2CompObject), /*tp_basicsize*/ - 0, /*tp_itemsize*/ - (destructor)BZ2Comp_dealloc, /*tp_dealloc*/ - 0, /*tp_print*/ - 0, /*tp_getattr*/ - 0, /*tp_setattr*/ - 0, /*tp_reserved*/ - 0, /*tp_repr*/ - 0, /*tp_as_number*/ - 0, /*tp_as_sequence*/ - 0, /*tp_as_mapping*/ - 0, /*tp_hash*/ - 0, /*tp_call*/ - 0, /*tp_str*/ - PyObject_GenericGetAttr,/*tp_getattro*/ - PyObject_GenericSetAttr,/*tp_setattro*/ - 0, /*tp_as_buffer*/ - Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE, /*tp_flags*/ - BZ2Comp__doc__, /*tp_doc*/ - 0, /*tp_traverse*/ - 0, /*tp_clear*/ - 0, /*tp_richcompare*/ - 0, /*tp_weaklistoffset*/ - 0, /*tp_iter*/ - 0, /*tp_iternext*/ - BZ2Comp_methods, /*tp_methods*/ - 0, /*tp_members*/ - 0, /*tp_getset*/ - 0, /*tp_base*/ - 0, /*tp_dict*/ - 0, /*tp_descr_get*/ - 0, /*tp_descr_set*/ - 0, /*tp_dictoffset*/ - (initproc)BZ2Comp_init, /*tp_init*/ - PyType_GenericAlloc, /*tp_alloc*/ - PyType_GenericNew, /*tp_new*/ - PyObject_Free, /*tp_free*/ - 0, /*tp_is_gc*/ -}; - - -/* ===================================================================== */ -/* Members of BZ2Decomp. */ - -#undef OFF -#define OFF(x) offsetof(BZ2DecompObject, x) - -static PyMemberDef BZ2Decomp_members[] = { - {"unused_data", T_OBJECT, OFF(unused_data), READONLY}, - {NULL} /* Sentinel */ -}; - - -/* ===================================================================== */ -/* Methods of BZ2Decomp. */ - -PyDoc_STRVAR(BZ2Decomp_decompress__doc__, -"decompress(data) -> string\n\ -\n\ -Provide more data to the decompressor object. It will return chunks\n\ -of decompressed data whenever possible. If you try to decompress data\n\ -after the end of stream is found, EOFError will be raised. 
If any data\n\ -was found after the end of stream, it'll be ignored and saved in\n\ -unused_data attribute.\n\ -"); - -static PyObject * -BZ2Decomp_decompress(BZ2DecompObject *self, PyObject *args) -{ - Py_buffer pdata; - char *data; - int datasize; - int bufsize = SMALLCHUNK; - PY_LONG_LONG totalout; - PyObject *ret = NULL; - bz_stream *bzs = &self->bzs; - int bzerror; - - if (!PyArg_ParseTuple(args, "y*:decompress", &pdata)) - return NULL; - data = pdata.buf; - datasize = pdata.len; - - ACQUIRE_LOCK(self); - if (!self->running) { - PyErr_SetString(PyExc_EOFError, "end of stream was " - "already found"); - goto error; - } - - ret = PyBytes_FromStringAndSize(NULL, bufsize); - if (!ret) - goto error; - - bzs->next_in = data; - bzs->avail_in = datasize; - bzs->next_out = BUF(ret); - bzs->avail_out = bufsize; - - totalout = BZS_TOTAL_OUT(bzs); - - for (;;) { - Py_BEGIN_ALLOW_THREADS - bzerror = BZ2_bzDecompress(bzs); - Py_END_ALLOW_THREADS - if (bzerror == BZ_STREAM_END) { - if (bzs->avail_in != 0) { - Py_DECREF(self->unused_data); - self->unused_data = - PyBytes_FromStringAndSize(bzs->next_in, - bzs->avail_in); - } - self->running = 0; - break; - } - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - if (bzs->avail_in == 0) - break; /* no more input data */ - if (bzs->avail_out == 0) { - bufsize = Util_NewBufferSize(bufsize); - if (_PyBytes_Resize(&ret, bufsize) < 0) { - BZ2_bzDecompressEnd(bzs); - goto error; - } - bzs->next_out = BUF(ret); - bzs->next_out = BUF(ret) + (BZS_TOTAL_OUT(bzs) - - totalout); - bzs->avail_out = bufsize - (bzs->next_out - BUF(ret)); - } - } - - if (bzs->avail_out != 0) { - if (_PyBytes_Resize(&ret, - (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) - goto error; - } - - RELEASE_LOCK(self); - PyBuffer_Release(&pdata); - return ret; - -error: - RELEASE_LOCK(self); - PyBuffer_Release(&pdata); - Py_XDECREF(ret); - return NULL; -} - -static PyMethodDef BZ2Decomp_methods[] = { - {"decompress", 
(PyCFunction)BZ2Decomp_decompress, METH_VARARGS, BZ2Decomp_decompress__doc__}, - {NULL, NULL} /* sentinel */ -}; - - -/* ===================================================================== */ -/* Slot definitions for BZ2Decomp_Type. */ - -static int -BZ2Decomp_init(BZ2DecompObject *self, PyObject *args, PyObject *kwargs) -{ - int bzerror; - - if (!PyArg_ParseTuple(args, ":BZ2Decompressor")) - return -1; - -#ifdef WITH_THREAD - self->lock = PyThread_allocate_lock(); - if (!self->lock) { - PyErr_SetString(PyExc_MemoryError, "unable to allocate lock"); - goto error; - } -#endif - - self->unused_data = PyBytes_FromStringAndSize("", 0); - if (!self->unused_data) - goto error; - - memset(&self->bzs, 0, sizeof(bz_stream)); - bzerror = BZ2_bzDecompressInit(&self->bzs, 0, 0); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - goto error; - } - - self->running = 1; - - return 0; - -error: -#ifdef WITH_THREAD - if (self->lock) { - PyThread_free_lock(self->lock); - self->lock = NULL; - } -#endif - Py_CLEAR(self->unused_data); - return -1; -} - -static void -BZ2Decomp_dealloc(BZ2DecompObject *self) -{ -#ifdef WITH_THREAD - if (self->lock) - PyThread_free_lock(self->lock); -#endif - Py_XDECREF(self->unused_data); - BZ2_bzDecompressEnd(&self->bzs); - Py_TYPE(self)->tp_free((PyObject *)self); -} - - -/* ===================================================================== */ -/* BZ2Decomp_Type definition. */ - -PyDoc_STRVAR(BZ2Decomp__doc__, -"BZ2Decompressor() -> decompressor object\n\ -\n\ -Create a new decompressor object. This object may be used to decompress\n\ -data sequentially. 
If you want to decompress data in one shot, use the\n\ -decompress() function instead.\n\ -"); - -static PyTypeObject BZ2Decomp_Type = { - PyVarObject_HEAD_INIT(NULL, 0) - "bz2.BZ2Decompressor", /*tp_name*/ - sizeof(BZ2DecompObject), /*tp_basicsize*/ - 0, /*tp_itemsize*/ - (destructor)BZ2Decomp_dealloc, /*tp_dealloc*/ - 0, /*tp_print*/ - 0, /*tp_getattr*/ - 0, /*tp_setattr*/ - 0, /*tp_reserved*/ - 0, /*tp_repr*/ - 0, /*tp_as_number*/ - 0, /*tp_as_sequence*/ - 0, /*tp_as_mapping*/ - 0, /*tp_hash*/ - 0, /*tp_call*/ - 0, /*tp_str*/ - PyObject_GenericGetAttr,/*tp_getattro*/ - PyObject_GenericSetAttr,/*tp_setattro*/ - 0, /*tp_as_buffer*/ - Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE, /*tp_flags*/ - BZ2Decomp__doc__, /*tp_doc*/ - 0, /*tp_traverse*/ - 0, /*tp_clear*/ - 0, /*tp_richcompare*/ - 0, /*tp_weaklistoffset*/ - 0, /*tp_iter*/ - 0, /*tp_iternext*/ - BZ2Decomp_methods, /*tp_methods*/ - BZ2Decomp_members, /*tp_members*/ - 0, /*tp_getset*/ - 0, /*tp_base*/ - 0, /*tp_dict*/ - 0, /*tp_descr_get*/ - 0, /*tp_descr_set*/ - 0, /*tp_dictoffset*/ - (initproc)BZ2Decomp_init, /*tp_init*/ - PyType_GenericAlloc, /*tp_alloc*/ - PyType_GenericNew, /*tp_new*/ - PyObject_Free, /*tp_free*/ - 0, /*tp_is_gc*/ -}; - - -/* ===================================================================== */ -/* Module functions. */ - -PyDoc_STRVAR(bz2_compress__doc__, -"compress(data [, compresslevel=9]) -> string\n\ -\n\ -Compress data in one shot. If you want to compress data sequentially,\n\ -use an instance of BZ2Compressor instead. 
The compresslevel parameter, if\n\ -given, must be a number between 1 and 9.\n\ -"); - -static PyObject * -bz2_compress(PyObject *self, PyObject *args, PyObject *kwargs) -{ - int compresslevel=9; - Py_buffer pdata; - char *data; - int datasize; - int bufsize; - PyObject *ret = NULL; - bz_stream _bzs; - bz_stream *bzs = &_bzs; - int bzerror; - static char *kwlist[] = {"data", "compresslevel", 0}; - - if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y*|i", - kwlist, &pdata, - &compresslevel)) - return NULL; - data = pdata.buf; - datasize = pdata.len; - - if (compresslevel < 1 || compresslevel > 9) { - PyErr_SetString(PyExc_ValueError, - "compresslevel must be between 1 and 9"); - PyBuffer_Release(&pdata); - return NULL; - } - - /* Conforming to bz2 manual, this is large enough to fit compressed - * data in one shot. We will check it later anyway. */ - bufsize = datasize + (datasize/100+1) + 600; - - ret = PyBytes_FromStringAndSize(NULL, bufsize); - if (!ret) { - PyBuffer_Release(&pdata); - return NULL; - } - - memset(bzs, 0, sizeof(bz_stream)); - - bzs->next_in = data; - bzs->avail_in = datasize; - bzs->next_out = BUF(ret); - bzs->avail_out = bufsize; - - bzerror = BZ2_bzCompressInit(bzs, compresslevel, 0, 0); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - PyBuffer_Release(&pdata); - Py_DECREF(ret); - return NULL; - } - - for (;;) { - Py_BEGIN_ALLOW_THREADS - bzerror = BZ2_bzCompress(bzs, BZ_FINISH); - Py_END_ALLOW_THREADS - if (bzerror == BZ_STREAM_END) { - break; - } else if (bzerror != BZ_FINISH_OK) { - BZ2_bzCompressEnd(bzs); - Util_CatchBZ2Error(bzerror); - PyBuffer_Release(&pdata); - Py_DECREF(ret); - return NULL; - } - if (bzs->avail_out == 0) { - bufsize = Util_NewBufferSize(bufsize); - if (_PyBytes_Resize(&ret, bufsize) < 0) { - BZ2_bzCompressEnd(bzs); - PyBuffer_Release(&pdata); - return NULL; - } - bzs->next_out = BUF(ret) + BZS_TOTAL_OUT(bzs); - bzs->avail_out = bufsize - (bzs->next_out - BUF(ret)); - } - } - - if (bzs->avail_out != 0) { - if 
(_PyBytes_Resize(&ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { - ret = NULL; - } - } - BZ2_bzCompressEnd(bzs); - - PyBuffer_Release(&pdata); - return ret; -} - -PyDoc_STRVAR(bz2_decompress__doc__, -"decompress(data) -> decompressed data\n\ -\n\ -Decompress data in one shot. If you want to decompress data sequentially,\n\ -use an instance of BZ2Decompressor instead.\n\ -"); - -static PyObject * -bz2_decompress(PyObject *self, PyObject *args) -{ - Py_buffer pdata; - char *data; - int datasize; - int bufsize = SMALLCHUNK; - PyObject *ret; - bz_stream _bzs; - bz_stream *bzs = &_bzs; - int bzerror; - - if (!PyArg_ParseTuple(args, "y*:decompress", &pdata)) - return NULL; - data = pdata.buf; - datasize = pdata.len; - - if (datasize == 0) { - PyBuffer_Release(&pdata); - return PyBytes_FromStringAndSize("", 0); - } - - ret = PyBytes_FromStringAndSize(NULL, bufsize); - if (!ret) { - PyBuffer_Release(&pdata); - return NULL; - } - - memset(bzs, 0, sizeof(bz_stream)); - - bzs->next_in = data; - bzs->avail_in = datasize; - bzs->next_out = BUF(ret); - bzs->avail_out = bufsize; - - bzerror = BZ2_bzDecompressInit(bzs, 0, 0); - if (bzerror != BZ_OK) { - Util_CatchBZ2Error(bzerror); - Py_DECREF(ret); - PyBuffer_Release(&pdata); - return NULL; - } - - for (;;) { - Py_BEGIN_ALLOW_THREADS - bzerror = BZ2_bzDecompress(bzs); - Py_END_ALLOW_THREADS - if (bzerror == BZ_STREAM_END) { - break; - } else if (bzerror != BZ_OK) { - BZ2_bzDecompressEnd(bzs); - Util_CatchBZ2Error(bzerror); - PyBuffer_Release(&pdata); - Py_DECREF(ret); - return NULL; - } - if (bzs->avail_in == 0) { - BZ2_bzDecompressEnd(bzs); - PyErr_SetString(PyExc_ValueError, - "couldn't find end of stream"); - PyBuffer_Release(&pdata); - Py_DECREF(ret); - return NULL; - } - if (bzs->avail_out == 0) { - bufsize = Util_NewBufferSize(bufsize); - if (_PyBytes_Resize(&ret, bufsize) < 0) { - BZ2_bzDecompressEnd(bzs); - PyBuffer_Release(&pdata); - return NULL; - } - bzs->next_out = BUF(ret) + BZS_TOTAL_OUT(bzs); - bzs->avail_out = 
bufsize - (bzs->next_out - BUF(ret)); - } - } - - if (bzs->avail_out != 0) { - if (_PyBytes_Resize(&ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { - ret = NULL; - } - } - BZ2_bzDecompressEnd(bzs); - PyBuffer_Release(&pdata); - - return ret; -} - -static PyMethodDef bz2_methods[] = { - {"compress", (PyCFunction) bz2_compress, METH_VARARGS|METH_KEYWORDS, - bz2_compress__doc__}, - {"decompress", (PyCFunction) bz2_decompress, METH_VARARGS, - bz2_decompress__doc__}, - {NULL, NULL} /* sentinel */ -}; - -/* ===================================================================== */ -/* Initialization function. */ - -PyDoc_STRVAR(bz2__doc__, -"The python bz2 module provides a comprehensive interface for\n\ -the bz2 compression library. It implements a complete file\n\ -interface, one shot (de)compression functions, and types for\n\ -sequential (de)compression.\n\ -"); - - -static struct PyModuleDef bz2module = { - PyModuleDef_HEAD_INIT, - "bz2", - bz2__doc__, - -1, - bz2_methods, - NULL, - NULL, - NULL, - NULL -}; - -PyMODINIT_FUNC -PyInit_bz2(void) -{ - PyObject *m; - - if (PyType_Ready(&BZ2File_Type) < 0) - return NULL; - if (PyType_Ready(&BZ2Comp_Type) < 0) - return NULL; - if (PyType_Ready(&BZ2Decomp_Type) < 0) - return NULL; - - m = PyModule_Create(&bz2module); - if (m == NULL) - return NULL; - - PyModule_AddObject(m, "__author__", PyUnicode_FromString(__author__)); - - Py_INCREF(&BZ2File_Type); - PyModule_AddObject(m, "BZ2File", (PyObject *)&BZ2File_Type); - - Py_INCREF(&BZ2Comp_Type); - PyModule_AddObject(m, "BZ2Compressor", (PyObject *)&BZ2Comp_Type); - - Py_INCREF(&BZ2Decomp_Type); - PyModule_AddObject(m, "BZ2Decompressor", (PyObject *)&BZ2Decomp_Type); - return m; -} diff --git a/PCbuild/bz2.vcproj b/PCbuild/_bz2.vcproj index 035736e..e6b9c88 100644 --- a/PCbuild/bz2.vcproj +++ b/PCbuild/_bz2.vcproj @@ -2,7 +2,7 @@ <VisualStudioProject
ProjectType="Visual C++"
Version="9,00"
- Name="bz2"
+ Name="_bz2"
ProjectGUID="{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}"
RootNamespace="bz2"
Keyword="Win32Proj"
@@ -527,7 +527,7 @@ Name="Source Files"
>
<File
- RelativePath="..\Modules\bz2module.c"
+ RelativePath="..\Modules\_bz2module.c"
>
</File>
</Filter>
diff --git a/PCbuild/pcbuild.sln b/PCbuild/pcbuild.sln
index 1de4ea1..ed3a7a0 100644
--- a/PCbuild/pcbuild.sln
+++ b/PCbuild/pcbuild.sln
@@ -87,7 +87,7 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "_tkinter", "_tkinter.vcproj
{CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26} = {CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26}
EndProjectSection
EndProject
-Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "bz2", "bz2.vcproj", "{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}"
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "_bz2", "_bz2.vcproj", "{73FCD2BD-F133-46B7-8EC1-144CD82A59D5}"
ProjectSection(ProjectDependencies) = postProject
{CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26} = {CF7AC3D1-E2DF-41D2-BEA6-1E2556CDEA26}
EndProjectSection
diff --git a/PCbuild/readme.txt b/PCbuild/readme.txt
index af432c4..11ab127 100644
--- a/PCbuild/readme.txt
+++ b/PCbuild/readme.txt
@@ -112,9 +112,9 @@ _tkinter
pre-built Tcl/Tk in either ..\..\tcltk for 32-bit or ..\..\tcltk64 for 64-bit (relative to this directory). See below for instructions to build Tcl/Tk.
-bz2
-    Python wrapper for the libbz2 compression library. Homepage
-    http://sources.redhat.com/bzip2/
+_bz2
+    Python wrapper for the libbzip2 compression library. Homepage
+    http://www.bzip.org/
Download the source from the python.org copy into the dist directory:
@@ -1233,11 +1233,11 @@ class PyBuildExt(build_ext):
                 bz2_extra_link_args = ('-Wl,-search_paths_first',)
             else:
                 bz2_extra_link_args = ()
-            exts.append( Extension('bz2', ['bz2module.c'],
+            exts.append( Extension('_bz2', ['_bz2module.c'],
                                    libraries = ['bz2'],
                                    extra_link_args = bz2_extra_link_args) )
         else:
-            missing.append('bz2')
+            missing.append('_bz2')

         # Interface to the Expat XML parser
         #