author | Guido van Rossum <guido@python.org> | 2007-11-06 21:34:58 (GMT) |
---|---|---|
committer | Guido van Rossum <guido@python.org> | 2007-11-06 21:34:58 (GMT) |
commit | 98297ee7815939b124156e438b22bd652d67b5db (patch) | |
tree | a9d239ebd87c73af2571ab48003984c4e18e27e5 | |
parent | a19f80c6df2df5e8a5d0cff37131097835ef971e (diff) | |
Merging the py3k-pep3137 branch back into the py3k branch.
No detailed change log; just check out the change log for the py3k-pep3137
branch. The most obvious changes:
- str8 renamed to bytes (PyString at the C level);
- bytes renamed to buffer (PyBytes at the C level);
- PyString and PyUnicode are no longer compatible.
I.e. we now have an immutable bytes type and a mutable bytes type.
The behavior of PyString was modified quite a bit, to make it more
bytes-like. Some changes are still on the to-do list.
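
As an illustration of the immutable/mutable split described above, here is a minimal sketch that is not part of the commit. It assumes a current Python 3 interpreter, where the mutable type this commit calls buffer is spelled bytearray. The helper join_labels and all variable names are hypothetical; the helper only mirrors the accumulate-then-freeze pattern the diff applies in several modules (build in the mutable type, return bytes).

```python
# Assumption: run on a current Python 3, where the mutable byte type
# is named `bytearray` (this commit spells it `buffer`).

immutable = bytes(b"abc")
mutable = bytearray(b"abc")

# The mutable type supports in-place modification.
mutable[0] = ord("x")
mutable.extend(b"!")

# The immutable type rejects item assignment.
try:
    immutable[0] = ord("x")
    assignment_error = None
except TypeError as exc:
    assignment_error = type(exc).__name__

# bytes is hashable, so it can be used as a dict key.
lookup = {immutable: "value"}

# Text and binary data do not mix; conversion must be explicit.
try:
    b"abc" + "abc"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

text = immutable.decode("ascii")      # bytes -> str
encoded = text.encode("ascii")        # str -> bytes


def join_labels(labels):
    """Hypothetical helper: accumulate in a mutable buffer, return bytes."""
    result = bytearray()
    for label in labels:
        if result:
            result.extend(b".")
        result.extend(label)
    return bytes(result)


joined = join_labels([b"www", b"python", b"org"])
```

Accumulating in the mutable type avoids creating a new immutable object on every append, and converting once at the end gives callers a hashable, unchangeable result.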
148 files changed, 2528 insertions, 3512 deletions
diff --git a/Doc/library/array.rst b/Doc/library/array.rst index c2b7a44..4747b63 100644 --- a/Doc/library/array.rst +++ b/Doc/library/array.rst @@ -56,8 +56,9 @@ The module defines the following type: .. function:: array(typecode[, initializer]) Return a new array whose items are restricted by *typecode*, and initialized - from the optional *initializer* value, which must be a list, string, or iterable - over elements of the appropriate type. + from the optional *initializer* value, which must be a list, object + supporting the buffer interface, or iterable over elements of the + appropriate type. If given a list or string, the initializer is passed to the new array's :meth:`fromlist`, :meth:`fromstring`, or :meth:`fromunicode` method (see below) @@ -69,6 +70,10 @@ The module defines the following type: Obsolete alias for :func:`array`. +.. data:: typecodes + + A string with all available type codes. + Array objects support the ordinary sequence operations of indexing, slicing, concatenation, and multiplication. When using slice assignment, the assigned value must be an array object with the same type code; in all other cases, diff --git a/Doc/library/exceptions.rst b/Doc/library/exceptions.rst index 34fb429..9453b7a 100644 --- a/Doc/library/exceptions.rst +++ b/Doc/library/exceptions.rst @@ -405,7 +405,11 @@ module for more information. Base class for warnings related to Unicode. -The class hierarchy for built-in exceptions is: +.. exception:: BytesWarning + + Base class for warnings related to :class:`bytes` and :class:`buffer`. +The class hierarchy for built-in exceptions is: + .. literalinclude:: ../../Lib/test/exception_hierarchy.txt diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst index 63f2c33..d554a08 100644 --- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -118,18 +118,19 @@ available. They are listed here in alphabetical order. .. index:: pair: Boolean; type -.. function:: bytes([arg[, encoding[, errors]]]) +.. 
function:: buffer([arg[, encoding[, errors]]]) - Return a new array of bytes. The :class:`bytes` type is a mutable sequence + Return a new array of bytes. The :class:`buffer` type is an immutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of - mutable sequences, described in :ref:`typesseq-mutable`, as well as a few - methods borrowed from strings, described in :ref:`bytes-methods`. + mutable sequences, described in :ref:`typesseq-mutable`, as well as most methods + that the :class:`str` type has, see :ref:`bytes-methods`. The optional *arg* parameter can be used to initialize the array in a few different ways: * If it is a *string*, you must also give the *encoding* (and optionally, - *errors*) parameters; :func:`bytes` then acts like :meth:`str.encode`. + *errors*) parameters; :func:`buffer` then converts the Unicode string to + bytes using :meth:`str.encode`. * If it is an *integer*, the array will have that size and will be initialized with null bytes. @@ -137,12 +138,24 @@ available. They are listed here in alphabetical order. * If it is an object conforming to the *buffer* interface, a read-only buffer of the object will be used to initialize the bytes array. - * If it is an *iterable*, it must be an iterable of integers in the range 0 - <= x < 256, which are used as the initial contents of the array. + * If it is an *iterable*, it must be an iterable of integers in the range + ``0 <= x < 256``, which are used as the initial contents of the array. Without an argument, an array of size 0 is created. +.. function:: bytes([arg[, encoding[, errors]]]) + + Return a new "bytes" object, which is an immutable sequence of integers in + the range ``0 <= x < 256``. :class:`bytes` is an immutable version of + :class:`buffer` -- it has the same non-mutating methods and the same indexing + and slicing behavior. + + Accordingly, constructor arguments are interpreted as for :func:`buffer`. 
+ + Bytes objects can also be created with literals, see :ref:`strings`. + + .. function:: chr(i) Return the string of one character whose Unicode codepoint is the integer diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst index f557b1f..9073bca 100644 --- a/Doc/library/stdtypes.rst +++ b/Doc/library/stdtypes.rst @@ -1313,9 +1313,11 @@ Bytes and Buffer Methods Bytes and buffer objects, being "strings of bytes", have all methods found on strings, with the exception of :func:`encode`, :func:`format` and -:func:`isidentifier`, which do not make sense with these types. Wherever one of -these methods needs to interpret the bytes as characters (e.g. the :func:`is...` -methods), the ASCII character set is assumed. +:func:`isidentifier`, which do not make sense with these types. For converting +the objects to strings, they have a :func:`decode` method. + +Wherever one of these methods needs to interpret the bytes as characters +(e.g. the :func:`is...` methods), the ASCII character set is assumed. .. note:: diff --git a/Doc/library/warnings.rst b/Doc/library/warnings.rst index 684209f..9a10385 100644 --- a/Doc/library/warnings.rst +++ b/Doc/library/warnings.rst @@ -80,6 +80,10 @@ following warnings category classes are currently defined: | :exc:`UnicodeWarning` | Base category for warnings related to | | | Unicode. | +----------------------------------+-----------------------------------------------+ +| :exc:`BytesWarning` | Base category for warnings related to | +| | :class:`bytes` and :class:`buffer`. | ++----------------------------------+-----------------------------------------------+ + While these are technically built-in exceptions, they are documented here, because conceptually they belong to the warnings mechanism. diff --git a/Doc/whatsnew/3.0.rst b/Doc/whatsnew/3.0.rst index afe842d..8d6babd 100644 --- a/Doc/whatsnew/3.0.rst +++ b/Doc/whatsnew/3.0.rst @@ -131,11 +131,6 @@ changes to rarely used features.) 
that if a file is opened using an incorrect mode or encoding, I/O will likely fail. -* Bytes aren't hashable, and don't support certain operations like - ``b.lower()``, ``b.strip()`` or ``b.split()``. - For the latter two, use ``b.strip(b" \t\r\n\f")`` or - ``b.split(b" \t\r\n\f")``. - * ``map()`` and ``filter()`` return iterators. A quick fix is e.g. ``list(map(...))``, but a better fix is often to use a list comprehension (especially when the original code uses ``lambda``). @@ -158,13 +153,11 @@ Strings and Bytes * There is only one string type; its name is ``str`` but its behavior and implementation are more like ``unicode`` in 2.x. -* PEP 358: There is a new type, ``bytes``, to represent binary data +* PEP 3137: There is a new type, ``bytes``, to represent binary data (and encoded text, which is treated as binary data until you decide to decode it). The ``str`` and ``bytes`` types cannot be mixed; you must always explicitly convert between them, using the ``.encode()`` - (str -> bytes) or ``.decode()`` (bytes -> str) methods. Comparing a - bytes and a str instance for equality raises a TypeError; this - catches common mistakes. + (str -> bytes) or ``.decode()`` (bytes -> str) methods. * PEP 3112: Bytes literals. E.g. b"abc". diff --git a/Include/abstract.h b/Include/abstract.h index 38628bb..dfef938 100644 --- a/Include/abstract.h +++ b/Include/abstract.h @@ -259,7 +259,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/ string representation on success, NULL on failure. This is the equivalent of the Python expression: repr(o). - Called by the repr() built-in function and by reverse quotes. + Called by the repr() built-in function. */ @@ -271,20 +271,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/ string representation on success, NULL on failure. This is the equivalent of the Python expression: str(o).) - Called by the str() built-in function and by the print - statement. 
- - */ - - /* Implemented elsewhere: - - PyObject *PyObject_Unicode(PyObject *o); - - Compute the unicode representation of object, o. Returns the - unicode representation on success, NULL on failure. This is - the equivalent of the Python expression: unistr(o).) - - Called by the unistr() built-in function. + Called by the str() and print() built-in functions. */ diff --git a/Include/object.h b/Include/object.h index 061fe11..e96e6a7 100644 --- a/Include/object.h +++ b/Include/object.h @@ -431,10 +431,8 @@ PyAPI_FUNC(int) PyObject_Print(PyObject *, FILE *, int); PyAPI_FUNC(void) _Py_BreakPoint(void); PyAPI_FUNC(void) _PyObject_Dump(PyObject *); PyAPI_FUNC(PyObject *) PyObject_Repr(PyObject *); -PyAPI_FUNC(PyObject *) PyObject_ReprStr8(PyObject *); -PyAPI_FUNC(PyObject *) _PyObject_Str(PyObject *); PyAPI_FUNC(PyObject *) PyObject_Str(PyObject *); -PyAPI_FUNC(PyObject *) PyObject_Unicode(PyObject *); +#define PyObject_Unicode PyObject_Str /* Compatibility */ PyAPI_FUNC(int) PyObject_Compare(PyObject *, PyObject *); PyAPI_FUNC(PyObject *) PyObject_RichCompare(PyObject *, PyObject *, int); PyAPI_FUNC(int) PyObject_RichCompareBool(PyObject *, PyObject *, int); @@ -478,7 +476,7 @@ PyAPI_FUNC(long) _Py_HashDouble(double); PyAPI_FUNC(long) _Py_HashPointer(void*); /* Helper for passing objects to printf and the like */ -#define PyObject_REPR(obj) PyString_AS_STRING(PyObject_ReprStr8(obj)) +#define PyObject_REPR(obj) PyUnicode_AsString(PyObject_Repr(obj)) /* Flag bits for printing: */ #define Py_PRINT_RAW 1 /* No string quotes etc. 
*/ diff --git a/Include/opcode.h b/Include/opcode.h index f2abab3..100262a 100644 --- a/Include/opcode.h +++ b/Include/opcode.h @@ -65,7 +65,7 @@ extern "C" { #define RETURN_VALUE 83 #define IMPORT_STAR 84 -#define MAKE_BYTES 85 + #define YIELD_VALUE 86 #define POP_BLOCK 87 #define END_FINALLY 88 diff --git a/Include/pydebug.h b/Include/pydebug.h index 3b5e34d..756f1e6 100644 --- a/Include/pydebug.h +++ b/Include/pydebug.h @@ -11,6 +11,7 @@ PyAPI_DATA(int) Py_InteractiveFlag; PyAPI_DATA(int) Py_InspectFlag; PyAPI_DATA(int) Py_OptimizeFlag; PyAPI_DATA(int) Py_NoSiteFlag; +PyAPI_DATA(int) Py_BytesWarningFlag; PyAPI_DATA(int) Py_UseClassExceptionsFlag; PyAPI_DATA(int) Py_FrozenFlag; PyAPI_DATA(int) Py_TabcheckFlag; diff --git a/Include/pyerrors.h b/Include/pyerrors.h index 6e2f5cf..b499778 100644 --- a/Include/pyerrors.h +++ b/Include/pyerrors.h @@ -165,6 +165,7 @@ PyAPI_DATA(PyObject *) PyExc_RuntimeWarning; PyAPI_DATA(PyObject *) PyExc_FutureWarning; PyAPI_DATA(PyObject *) PyExc_ImportWarning; PyAPI_DATA(PyObject *) PyExc_UnicodeWarning; +PyAPI_DATA(PyObject *) PyExc_BytesWarning; /* Convenience functions */ diff --git a/Include/stringobject.h b/Include/stringobject.h index 223d382..8241f1e 100644 --- a/Include/stringobject.h +++ b/Include/stringobject.h @@ -25,26 +25,17 @@ functions should be applied to nil objects. */ /* Caching the hash (ob_shash) saves recalculation of a string's hash value. - Interning strings (ob_sstate) tries to ensure that only one string - object with a given value exists, so equality tests can be one pointer - comparison. This is generally restricted to strings that "look like" - Python identifiers, although the sys.intern() function can be used to force - interning of any string. - Together, these sped the interpreter by up to 20%. */ + This significantly speeds up dict lookups. */ typedef struct { PyObject_VAR_HEAD long ob_shash; - int ob_sstate; char ob_sval[1]; /* Invariants: * ob_sval contains space for 'ob_size+1' elements. 
* ob_sval[ob_size] == 0. * ob_shash is the hash of the string or -1 if not computed yet. - * ob_sstate != 0 iff the string object is in stringobject.c's - * 'interned' dictionary; in this case the two references - * from 'interned' to this object are *not counted* in ob_refcnt. */ } PyStringObject; @@ -74,86 +65,20 @@ PyAPI_FUNC(PyObject *) PyString_DecodeEscape(const char *, Py_ssize_t, const char *, Py_ssize_t, const char *); -PyAPI_FUNC(void) PyString_InternInPlace(PyObject **); -PyAPI_FUNC(void) PyString_InternImmortal(PyObject **); -PyAPI_FUNC(PyObject *) PyString_InternFromString(const char *); -PyAPI_FUNC(void) _Py_ReleaseInternedStrings(void); - -/* Use only if you know it's a string */ -#define PyString_CHECK_INTERNED(op) (((PyStringObject *)(op))->ob_sstate) - /* Macro, trading safety for speed */ -#define PyString_AS_STRING(op) (assert(PyString_Check(op)),(((PyStringObject *)(op))->ob_sval)) +#define PyString_AS_STRING(op) (assert(PyString_Check(op)), \ + (((PyStringObject *)(op))->ob_sval)) #define PyString_GET_SIZE(op) (assert(PyString_Check(op)),Py_Size(op)) /* _PyString_Join(sep, x) is like sep.join(x). sep must be PyStringObject*, x must be an iterable object. */ PyAPI_FUNC(PyObject *) _PyString_Join(PyObject *sep, PyObject *x); -/* --- Generic Codecs ----------------------------------------------------- */ - -/* Create an object by decoding the encoded string s of the - given size. */ - -PyAPI_FUNC(PyObject*) PyString_Decode( - const char *s, /* encoded string */ - Py_ssize_t size, /* size of buffer */ - const char *encoding, /* encoding */ - const char *errors /* error handling */ - ); - -/* Encodes a string object and returns the result as Python - object. */ - -PyAPI_FUNC(PyObject*) PyString_AsEncodedObject( - PyObject *str, /* string object */ - const char *encoding, /* encoding */ - const char *errors /* error handling */ - ); - -/* Encodes a string object and returns the result as Python string - object. 
- - If the codec returns an Unicode object, the object is converted - back to a string using the default encoding. - - DEPRECATED - use PyString_AsEncodedObject() instead. */ - -PyAPI_FUNC(PyObject*) PyString_AsEncodedString( - PyObject *str, /* string object */ - const char *encoding, /* encoding */ - const char *errors /* error handling */ - ); - -/* Decodes a string object and returns the result as Python - object. */ - -PyAPI_FUNC(PyObject*) PyString_AsDecodedObject( - PyObject *str, /* string object */ - const char *encoding, /* encoding */ - const char *errors /* error handling */ - ); - -/* Decodes a string object and returns the result as Python string - object. - - If the codec returns an Unicode object, the object is converted - back to a string using the default encoding. - - DEPRECATED - use PyString_AsDecodedObject() instead. */ - -PyAPI_FUNC(PyObject*) PyString_AsDecodedString( - PyObject *str, /* string object */ - const char *encoding, /* encoding */ - const char *errors /* error handling */ - ); - /* Provides access to the internal data buffer and size of a string object or the default encoded version of an Unicode object. Passing NULL as *len parameter will force the string buffer to be 0-terminated (passing a string with embedded NULL characters will cause an exception). 
*/ - PyAPI_FUNC(int) PyString_AsStringAndSize( register PyObject *obj, /* string or Unicode object */ register char **s, /* pointer to buffer variable */ @@ -162,6 +87,12 @@ PyAPI_FUNC(int) PyString_AsStringAndSize( strings) */ ); +/* Flags used by string formatting */ +#define F_LJUST (1<<0) +#define F_SIGN (1<<1) +#define F_BLANK (1<<2) +#define F_ALT (1<<3) +#define F_ZERO (1<<4) #ifdef __cplusplus } diff --git a/Lib/_abcoll.py b/Lib/_abcoll.py index 8f630bf..ec3e2f8 100644 --- a/Lib/_abcoll.py +++ b/Lib/_abcoll.py @@ -489,7 +489,7 @@ class Sequence(metaclass=ABCMeta): Sequence.register(tuple) Sequence.register(str) -Sequence.register(str8) +Sequence.register(bytes) Sequence.register(memoryview) diff --git a/Lib/base64.py b/Lib/base64.py index e100e0f..18beffc 100755 --- a/Lib/base64.py +++ b/Lib/base64.py @@ -27,10 +27,13 @@ __all__ = [ ] +bytes_buffer = (bytes, buffer) # Types acceptable as binary data + + def _translate(s, altchars): - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): raise TypeError("expected bytes, not %s" % s.__class__.__name__) - translation = bytes(range(256)) + translation = buffer(range(256)) for k, v in altchars.items(): translation[ord(k)] = v[0] return s.translate(translation) @@ -49,12 +52,12 @@ def b64encode(s, altchars=None): The encoded byte string is returned. """ - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): s = bytes(s, "ascii") # Strip off the trailing newline encoded = binascii.b2a_base64(s)[:-1] if altchars is not None: - if not isinstance(altchars, bytes): + if not isinstance(altchars, bytes_buffer): altchars = bytes(altchars, "ascii") assert len(altchars) == 2, repr(altchars) return _translate(encoded, {'+': altchars[0:1], '/': altchars[1:2]}) @@ -72,10 +75,10 @@ def b64decode(s, altchars=None): s were incorrectly padded or if there are non-alphabet characters present in the string. 
""" - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): s = bytes(s) if altchars is not None: - if not isinstance(altchars, bytes): + if not isinstance(altchars, bytes_buffer): altchars = bytes(altchars, "ascii") assert len(altchars) == 2, repr(altchars) s = _translate(s, {chr(altchars[0]): b'+', chr(altchars[1]): b'/'}) @@ -144,7 +147,7 @@ def b32encode(s): s is the byte string to encode. The encoded byte string is returned. """ - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): s = bytes(s) quanta, leftover = divmod(len(s), 5) # Pad the last quantum with zero bits if necessary @@ -201,7 +204,7 @@ def b32decode(s, casefold=False, map01=None): the input is incorrectly padded or if there are non-alphabet characters present in the input. """ - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): s = bytes(s) quanta, leftover = divmod(len(s), 8) if leftover: @@ -210,12 +213,12 @@ def b32decode(s, casefold=False, map01=None): # False, or the character to map the digit 1 (one) to. It should be # either L (el) or I (eye). if map01: - if not isinstance(map01, bytes): + if not isinstance(map01, bytes_buffer): map01 = bytes(map01) assert len(map01) == 1, repr(map01) - s = _translate(s, {'0': b'O', '1': map01}) + s = _translate(s, {b'0': b'O', b'1': map01}) if casefold: - s = bytes(str(s, "ascii").upper(), "ascii") + s = s.upper() # Strip off pad characters from the right. We need to count the pad # characters because this will tell us how many null bytes to remove from # the end of the decoded string. @@ -266,7 +269,7 @@ def b16encode(s): s is the byte string to encode. The encoded byte string is returned. """ - return bytes(str(binascii.hexlify(s), "ascii").upper(), "ascii") + return binascii.hexlify(s).upper() def b16decode(s, casefold=False): @@ -280,10 +283,10 @@ def b16decode(s, casefold=False): s were incorrectly padded or if there are non-alphabet characters present in the string. 
""" - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): s = bytes(s) if casefold: - s = bytes(str(s, "ascii").upper(), "ascii") + s = s.upper() if re.search('[^0-9A-F]', s): raise binascii.Error('Non-base16 digit found') return binascii.unhexlify(s) @@ -327,7 +330,7 @@ def encodestring(s): Argument and return value are bytes. """ - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): raise TypeError("expected bytes, not %s" % s.__class__.__name__) pieces = [] for i in range(0, len(s), MAXBINSIZE): @@ -341,7 +344,7 @@ def decodestring(s): Argument and return value are bytes. """ - if not isinstance(s, bytes): + if not isinstance(s, bytes_buffer): raise TypeError("expected bytes, not %s" % s.__class__.__name__) return binascii.a2b_base64(s) diff --git a/Lib/copy.py b/Lib/copy.py index fa75daa..1a14f0e 100644 --- a/Lib/copy.py +++ b/Lib/copy.py @@ -187,7 +187,7 @@ try: d[complex] = _deepcopy_atomic except NameError: pass -d[str8] = _deepcopy_atomic +d[bytes] = _deepcopy_atomic d[str] = _deepcopy_atomic try: d[types.CodeType] = _deepcopy_atomic diff --git a/Lib/ctypes/test/test_array_in_pointer.py b/Lib/ctypes/test/test_array_in_pointer.py index 2b939f0..6bed1f1 100644 --- a/Lib/ctypes/test/test_array_in_pointer.py +++ b/Lib/ctypes/test/test_array_in_pointer.py @@ -6,7 +6,7 @@ import re def dump(obj): # helper function to dump memory contents in hex, with a hyphen # between the bytes. 
- h = str(hexlify(memoryview(obj))) + h = hexlify(memoryview(obj)).decode() return re.sub(r"(..)", r"\1-", h)[:-1] diff --git a/Lib/ctypes/test/test_byteswap.py b/Lib/ctypes/test/test_byteswap.py index dab9722..67fa44b 100644 --- a/Lib/ctypes/test/test_byteswap.py +++ b/Lib/ctypes/test/test_byteswap.py @@ -4,7 +4,7 @@ from binascii import hexlify from ctypes import * def bin(s): - return str(hexlify(memoryview(s))).upper() + return hexlify(memoryview(s)).decode().upper() # Each *simple* type that supports different byte orders has an # __ctype_be__ attribute that specifies the same type in BIG ENDIAN diff --git a/Lib/ctypes/test/test_slicing.py b/Lib/ctypes/test/test_slicing.py index 28e66da..4974302 100644 --- a/Lib/ctypes/test/test_slicing.py +++ b/Lib/ctypes/test/test_slicing.py @@ -115,7 +115,7 @@ class SlicesTestCase(unittest.TestCase): dll.my_strdup.errcheck = errcheck try: res = dll.my_strdup(s) - self.failUnlessEqual(res, str(s)) + self.failUnlessEqual(res, s.decode()) finally: del dll.my_strdup.errcheck diff --git a/Lib/dumbdbm.py b/Lib/dumbdbm.py index b47b7bd..78e4999 100644 --- a/Lib/dumbdbm.py +++ b/Lib/dumbdbm.py @@ -163,7 +163,7 @@ class _Database(UserDict.DictMixin): if not isinstance(key, bytes): raise TypeError("keys must be bytes") key = key.decode("latin-1") # hashable bytes - if not isinstance(val, (str8, bytes)): + if not isinstance(val, (buffer, bytes)): raise TypeError("values must be byte strings") if key not in self._index: self._addkey(key, self._addval(val)) diff --git a/Lib/email/base64mime.py b/Lib/email/base64mime.py index 369bf10..cff558e 100644 --- a/Lib/email/base64mime.py +++ b/Lib/email/base64mime.py @@ -70,7 +70,7 @@ def header_encode(header_bytes, charset='iso-8859-1'): # Return empty headers unchanged if not header_bytes: return str(header_bytes) - encoded = b64encode(header_bytes) + encoded = b64encode(header_bytes).decode("ascii") return '=?%s?b?%s?=' % (charset, encoded) @@ -93,7 +93,7 @@ def body_encode(s, maxlinelen=76, 
eol=NL): for i in range(0, len(s), max_unencoded): # BAW: should encode() inherit b2a_base64()'s dubious behavior in # adding a newline to the encoded string? - enc = str(b2a_base64(s[i:i + max_unencoded])) + enc = b2a_base64(s[i:i + max_unencoded]).decode("ascii") if enc.endswith(NL) and eol != NL: enc = enc[:-1] + eol encvec.append(enc) diff --git a/Lib/email/test/test_email.py b/Lib/email/test/test_email.py index 74a3c9d..c544004 100644 --- a/Lib/email/test/test_email.py +++ b/Lib/email/test/test_email.py @@ -2448,9 +2448,7 @@ Here's the message body def test_crlf_separation(self): eq = self.assertEqual - # XXX When Guido fixes TextIOWrapper.read() to act just like - # .readlines(), open this in 'rb' mode with newlines='\n'. - with openfile('msg_26.txt', mode='rb') as fp: + with openfile('msg_26.txt', newline='\n') as fp: msg = Parser().parse(fp) eq(len(msg.get_payload()), 2) part1 = msg.get_payload(0) diff --git a/Lib/encodings/__init__.py b/Lib/encodings/__init__.py index 87e5745..d72eae9 100644 --- a/Lib/encodings/__init__.py +++ b/Lib/encodings/__init__.py @@ -52,7 +52,7 @@ def normalize_encoding(encoding): non-ASCII characters, these must be Latin-1 compatible. """ - if isinstance(encoding, str8): + if isinstance(encoding, bytes): encoding = str(encoding, "ascii") chars = [] punct = False diff --git a/Lib/encodings/idna.py b/Lib/encodings/idna.py index b81e5fa..30f507a 100644 --- a/Lib/encodings/idna.py +++ b/Lib/encodings/idna.py @@ -151,9 +151,9 @@ class Codec(codecs.Codec): raise UnicodeError("unsupported error handling "+errors) if not input: - return b"", 0 + return b'', 0 - result = b"" + result = buffer() labels = dots.split(input) if labels and not labels[-1]: trailing_dot = b'.' 
@@ -165,7 +165,7 @@ class Codec(codecs.Codec): # Join with U+002E result.extend(b'.') result.extend(ToASCII(label)) - return result+trailing_dot, len(input) + return bytes(result+trailing_dot), len(input) def decode(self, input, errors='strict'): @@ -216,7 +216,7 @@ class IncrementalEncoder(codecs.BufferedIncrementalEncoder): if labels: trailing_dot = b'.' - result = b"" + result = buffer() size = 0 for label in labels: if size: @@ -228,7 +228,7 @@ class IncrementalEncoder(codecs.BufferedIncrementalEncoder): result += trailing_dot size += len(trailing_dot) - return (result, size) + return (bytes(result), size) class IncrementalDecoder(codecs.BufferedIncrementalDecoder): def _buffer_decode(self, input, errors, final): diff --git a/Lib/encodings/punycode.py b/Lib/encodings/punycode.py index f08c807..56e6958 100644 --- a/Lib/encodings/punycode.py +++ b/Lib/encodings/punycode.py @@ -10,7 +10,7 @@ import codecs def segregate(str): """3.1 Basic code point segregation""" - base = b"" + base = buffer() extended = set() for c in str: if ord(c) < 128: @@ -18,7 +18,7 @@ def segregate(str): else: extended.add(c) extended = sorted(extended) - return (base, extended) + return bytes(base), extended def selective_len(str, max): """Return the length of str, considering only characters below max.""" @@ -78,13 +78,13 @@ def T(j, bias): digits = b"abcdefghijklmnopqrstuvwxyz0123456789" def generate_generalized_integer(N, bias): """3.3 Generalized variable-length integers""" - result = b"" + result = buffer() j = 0 while 1: t = T(j, bias) if N < t: result.append(digits[N]) - return result + return bytes(result) result.append(digits[t + ((N - t) % (36 - t))]) N = (N - t) // (36 - t) j += 1 @@ -107,13 +107,13 @@ def adapt(delta, first, numchars): def generate_integers(baselen, deltas): """3.4 Bias adaptation""" # Punycode parameters: initial bias = 72, damp = 700, skew = 38 - result = b"" + result = buffer() bias = 72 for points, delta in enumerate(deltas): s = 
generate_generalized_integer(delta, bias) result.extend(s) bias = adapt(delta, points==0, baselen+points+1) - return result + return bytes(result) def punycode_encode(text): base, extended = segregate(text) diff --git a/Lib/gettext.py b/Lib/gettext.py index 8ff0a80..be24f1d 100644 --- a/Lib/gettext.py +++ b/Lib/gettext.py @@ -292,7 +292,7 @@ class GNUTranslations(NullTranslations): # Catalog description lastk = k = None for b_item in tmsg.split('\n'.encode("ascii")): - item = str(b_item).strip() + item = b_item.decode().strip() if not item: continue if ':' in item: diff --git a/Lib/httplib.py b/Lib/httplib.py index e891883..dc8bd6b 100644 --- a/Lib/httplib.py +++ b/Lib/httplib.py @@ -827,6 +827,7 @@ class HTTPConnection: if self.port == HTTP_PORT: self.putheader('Host', host_enc) else: + host_enc = host_enc.decode("ascii") self.putheader('Host', "%s:%s" % (host_enc, self.port)) # note: we are assuming that clients will not attempt to set these @@ -860,8 +861,12 @@ class HTTPConnection: if self.__state != _CS_REQ_STARTED: raise CannotSendHeader() - header = '%s: %s' % (header, value) - self._output(header.encode('ascii')) + if hasattr(header, 'encode'): + header = header.encode('ascii') + if hasattr(value, 'encode'): + value = value.encode('ascii') + header = header + b': ' + value + self._output(header) def endheaders(self): """Indicate that the last header line has been sent to the server.""" diff --git a/Lib/idlelib/OutputWindow.py b/Lib/idlelib/OutputWindow.py index ac1361b..42aa77e 100644 --- a/Lib/idlelib/OutputWindow.py +++ b/Lib/idlelib/OutputWindow.py @@ -35,7 +35,7 @@ class OutputWindow(EditorWindow): # Act as output file def write(self, s, tags=(), mark="insert"): - if isinstance(s, (bytes, str8)): + if isinstance(s, (bytes, bytes)): s = s.decode(IOBinding.encoding, "replace") self.text.insert(mark, s, tags) self.text.see(mark) @@ -391,7 +391,7 @@ class IOBase(metaclass=abc.ABCMeta): return 1 if limit is None: limit = -1 - res = bytes() + res = buffer() 
while limit < 0 or len(res) < limit: b = self.read(nreadahead()) if not b: @@ -399,7 +399,7 @@ class IOBase(metaclass=abc.ABCMeta): res += b if res.endswith(b"\n"): break - return res + return bytes(res) def __iter__(self): self._checkClosed() @@ -454,20 +454,20 @@ class RawIOBase(IOBase): n = -1 if n < 0: return self.readall() - b = bytes(n.__index__()) + b = buffer(n.__index__()) n = self.readinto(b) del b[n:] - return b + return bytes(b) def readall(self): """readall() -> bytes. Read until EOF, using multiple read() call.""" - res = bytes() + res = buffer() while True: data = self.read(DEFAULT_BUFFER_SIZE) if not data: break res += data - return res + return bytes(res) def readinto(self, b: bytes) -> int: """readinto(b: bytes) -> int. Read up to len(b) bytes into b. @@ -655,14 +655,14 @@ class BytesIO(BufferedIOBase): # XXX More docs def __init__(self, initial_bytes=None): - buffer = b"" + buf = buffer() if initial_bytes is not None: - buffer += initial_bytes - self._buffer = buffer + buf += initial_bytes + self._buffer = buf self._pos = 0 def getvalue(self): - return self._buffer + return bytes(self._buffer) def read(self, n=None): if n is None: @@ -672,7 +672,7 @@ class BytesIO(BufferedIOBase): newpos = min(len(self._buffer), self._pos + n) b = self._buffer[self._pos : newpos] self._pos = newpos - return b + return bytes(b) def read1(self, n): return self.read(n) @@ -819,7 +819,7 @@ class BufferedWriter(_BufferedIOMixin): self.max_buffer_size = (2*buffer_size if max_buffer_size is None else max_buffer_size) - self._write_buf = b"" + self._write_buf = buffer() def write(self, b): if self.closed: @@ -1186,7 +1186,7 @@ class TextIOWrapper(TextIOBase): try: decoder.setstate((b"", decoder_state)) n = 0 - bb = bytes(1) + bb = buffer(1) for i, bb[0] in enumerate(readahead): n += len(decoder.decode(bb)) if n >= needed: @@ -1266,7 +1266,9 @@ class TextIOWrapper(TextIOBase): return line def readline(self, limit=None): - if limit is not None: + if limit is None: + limit 
= -1 + if limit >= 0: # XXX Hack to support limit argument, for backwards compatibility line = self.readline() if len(line) <= limit: diff --git a/Lib/mailbox.py b/Lib/mailbox.py index a37bec9..13e3eb7 100755 --- a/Lib/mailbox.py +++ b/Lib/mailbox.py @@ -333,7 +333,7 @@ class Maildir(Mailbox): def get_file(self, key): """Return a file-like representation or raise a KeyError.""" - f = open(os.path.join(self._path, self._lookup(key)), 'rb') + f = open(os.path.join(self._path, self._lookup(key)), 'r') return _ProxyFile(f) def iterkeys(self): @@ -936,7 +936,7 @@ class MH(Mailbox): def get_file(self, key): """Return a file-like representation or raise a KeyError.""" try: - f = open(os.path.join(self._path, str(key)), 'rb') + f = open(os.path.join(self._path, str(key)), 'r') except IOError as e: if e.errno == errno.ENOENT: raise KeyError('No message with key: %s' % key) @@ -1762,11 +1762,11 @@ class _ProxyFile: def read(self, size=None): """Read bytes.""" - return str(self._read(size, self._file.read)) + return self._read(size, self._file.read) def readline(self, size=None): """Read a line.""" - return str(self._read(size, self._file.readline)) + return self._read(size, self._file.readline) def readlines(self, sizehint=None): """Read multiple lines.""" diff --git a/Lib/modulefinder.py b/Lib/modulefinder.py index cc5ad19..c345a33 100644 --- a/Lib/modulefinder.py +++ b/Lib/modulefinder.py @@ -17,12 +17,12 @@ else: READ_MODE = "r" # XXX Clean up once str8's cstor matches bytes. 
-LOAD_CONST = str8([dis.opname.index('LOAD_CONST')]) -IMPORT_NAME = str8([dis.opname.index('IMPORT_NAME')]) -STORE_NAME = str8([dis.opname.index('STORE_NAME')]) -STORE_GLOBAL = str8([dis.opname.index('STORE_GLOBAL')]) +LOAD_CONST = bytes([dis.opname.index('LOAD_CONST')]) +IMPORT_NAME = bytes([dis.opname.index('IMPORT_NAME')]) +STORE_NAME = bytes([dis.opname.index('STORE_NAME')]) +STORE_GLOBAL = bytes([dis.opname.index('STORE_GLOBAL')]) STORE_OPS = [STORE_NAME, STORE_GLOBAL] -HAVE_ARGUMENT = str8([dis.HAVE_ARGUMENT]) +HAVE_ARGUMENT = bytes([dis.HAVE_ARGUMENT]) # Modulefinder does a good job at simulating Python's, but it can not # handle __path__ modifications packages make at runtime. Therefore there @@ -368,7 +368,7 @@ class ModuleFinder: consts = co.co_consts LOAD_LOAD_AND_IMPORT = LOAD_CONST + LOAD_CONST + IMPORT_NAME while code: - c = str8([code[0]]) + c = bytes([code[0]]) if c in STORE_OPS: oparg, = unpack('<H', code[1:3]) yield "store", (names[oparg],) diff --git a/Lib/pickle.py b/Lib/pickle.py index 18ad210..d7bf24e 100644 --- a/Lib/pickle.py +++ b/Lib/pickle.py @@ -38,6 +38,9 @@ import codecs __all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler", "Unpickler", "dump", "dumps", "load", "loads"] +# Shortcut for use in isinstance testing +bytes_types = (bytes, buffer, memoryview) + # These are purely informational; no code uses these. format_version = "2.0" # File format version we write compatible_formats = ["1.0", # Original protocol 0 @@ -499,10 +502,10 @@ class Pickler: else: self.write(BINSTRING + pack("<i", n) + bytes(obj)) else: - # Strip leading 's' due to repr() of str8() returning s'...' - self.write(STRING + repr(obj).lstrip("s").encode("ascii") + b'\n') + # Strip leading 'b' due to repr() of bytes() returning b'...' 
+ self.write(STRING + repr(obj).lstrip("b").encode("ascii") + b'\n') self.memoize(obj) - dispatch[str8] = save_string + dispatch[bytes] = save_string def save_unicode(self, obj, pack=struct.pack): if self.bin: @@ -804,7 +807,7 @@ class Unpickler: key = read(1) if not key: raise EOFError - assert isinstance(key, bytes) + assert isinstance(key, bytes_types) dispatch[key[0]](self) except _Stop as stopinst: return stopinst.value @@ -906,7 +909,8 @@ class Unpickler: dispatch[BINFLOAT[0]] = load_binfloat def load_string(self): - rep = self.readline()[:-1] + orig = self.readline() + rep = orig[:-1] for q in (b'"', b"'"): # double or single quote if rep.startswith(q): if not rep.endswith(q): @@ -914,13 +918,13 @@ class Unpickler: rep = rep[len(q):-len(q)] break else: - raise ValueError("insecure string pickle") - self.append(str(codecs.escape_decode(rep)[0], "latin-1")) + raise ValueError("insecure string pickle: %r" % orig) + self.append(codecs.escape_decode(rep)[0]) dispatch[STRING[0]] = load_string def load_binstring(self): len = mloads(b'i' + self.read(4)) - self.append(str(self.read(len), "latin-1")) + self.append(self.read(len)) dispatch[BINSTRING[0]] = load_binstring def load_unicode(self): @@ -934,7 +938,7 @@ class Unpickler: def load_short_binstring(self): len = ord(self.read(1)) - self.append(str(self.read(len), "latin-1")) + self.append(bytes(self.read(len))) dispatch[SHORT_BINSTRING[0]] = load_short_binstring def load_tuple(self): @@ -1063,9 +1067,9 @@ class Unpickler: def find_class(self, module, name): # Subclasses may override this - if isinstance(module, bytes): + if isinstance(module, bytes_types): module = module.decode("utf-8") - if isinstance(name, bytes): + if isinstance(name, bytes_types): name = name.decode("utf-8") __import__(module) mod = sys.modules[module] @@ -1099,7 +1103,7 @@ class Unpickler: dispatch[DUP[0]] = load_dup def load_get(self): - self.append(self.memo[str(self.readline())[:-1]]) + 
self.append(self.memo[self.readline()[:-1].decode("ascii")]) dispatch[GET[0]] = load_get def load_binget(self): @@ -1113,7 +1117,7 @@ class Unpickler: dispatch[LONG_BINGET[0]] = load_long_binget def load_put(self): - self.memo[str(self.readline()[:-1])] = self.stack[-1] + self.memo[self.readline()[:-1].decode("ascii")] = self.stack[-1] dispatch[PUT[0]] = load_put def load_binput(self): @@ -1298,7 +1302,7 @@ def dumps(obj, protocol=None): f = io.BytesIO() Pickler(f, protocol).dump(obj) res = f.getvalue() - assert isinstance(res, bytes) + assert isinstance(res, bytes_types) return res def load(file): diff --git a/Lib/pickletools.py b/Lib/pickletools.py index b1337c4..af84c1f 100644 --- a/Lib/pickletools.py +++ b/Lib/pickletools.py @@ -11,11 +11,15 @@ dis(pickle, out=None, memo=None, indentlevel=4) ''' import codecs +import pickle +import re __all__ = ['dis', 'genops', ] +bytes_types = pickle.bytes_types + # Other ideas: # # - A pickle verifier: read a pickle and check it exhaustively for @@ -307,7 +311,7 @@ def read_stringnl(f, decode=True, stripquotes=True): raise ValueError("no string quotes around %r" % data) if decode: - data = str(codecs.escape_decode(data)[0]) + data = codecs.escape_decode(data)[0].decode("ascii") return data stringnl = ArgumentDescriptor( @@ -321,7 +325,7 @@ stringnl = ArgumentDescriptor( """) def read_stringnl_noescape(f): - return read_stringnl(f, decode=False, stripquotes=False) + return read_stringnl(f, stripquotes=False) stringnl_noescape = ArgumentDescriptor( name='stringnl_noescape', @@ -744,14 +748,14 @@ pyfloat = StackObject( doc="A Python float object.") pystring = StackObject( - name='str', - obtype=str, - doc="A Python string object.") + name='bytes', + obtype=bytes, + doc="A Python bytes object.") pyunicode = StackObject( - name='unicode', + name='str', obtype=str, - doc="A Python Unicode string object.") + doc="A Python string object.") pynone = StackObject( name="None", @@ -1735,7 +1739,6 @@ for d in opcodes: del d def 
assure_pickle_consistency(verbose=False): - import pickle, re copy = code2op.copy() for name in pickle.__all__: @@ -1803,7 +1806,7 @@ def genops(pickle): to query its current position) pos is None. """ - if isinstance(pickle, bytes): + if isinstance(pickle, bytes_types): import io pickle = io.BytesIO(pickle) @@ -1978,7 +1981,7 @@ class _Example: _dis_test = r""" >>> import pickle ->>> x = [1, 2, (3, 4), {str8(b'abc'): "def"}] +>>> x = [1, 2, (3, 4), {bytes(b'abc'): "def"}] >>> pkl = pickle.dumps(x, 0) >>> dis(pkl) 0: ( MARK diff --git a/Lib/plat-mac/aepack.py b/Lib/plat-mac/aepack.py index 3caf2f5..e958b85 100644 --- a/Lib/plat-mac/aepack.py +++ b/Lib/plat-mac/aepack.py @@ -98,7 +98,7 @@ def pack(x, forcetype = None): return AE.AECreateDesc(b'long', struct.pack('l', x)) if isinstance(x, float): return AE.AECreateDesc(b'doub', struct.pack('d', x)) - if isinstance(x, (bytes, str8)): + if isinstance(x, (bytes, buffer)): return AE.AECreateDesc(b'TEXT', x) if isinstance(x, str): # See http://developer.apple.com/documentation/Carbon/Reference/Apple_Event_Manager/Reference/reference.html#//apple_ref/doc/constant_group/typeUnicodeText diff --git a/Lib/plat-mac/aetypes.py b/Lib/plat-mac/aetypes.py index cf6e3b9..d29ea97 100644 --- a/Lib/plat-mac/aetypes.py +++ b/Lib/plat-mac/aetypes.py @@ -22,7 +22,18 @@ def _four_char_code(four_chars): four_chars must contain only ASCII characters. """ - return ("%-4.4s" % str(four_chars)).encode("latin-1") + if isinstance(four_chars, (bytes, buffer)): + b = bytes(four_chars[:4]) + n = len(b) + if n < 4: + b += b' ' * (4 - n) + return b + else: + s = str(four_chars)[:4] + n = len(s) + if n < 4: + s += ' ' * (4 - n) + return bytes(s, "latin-1") # MacRoman? 
class Unknown: """An uninterpreted AE object""" @@ -47,7 +58,7 @@ class Enum: return "Enum(%r)" % (self.enum,) def __str__(self): - return self.enum.strip(b' ') + return self.enum.decode("latin-1").strip(" ") def __aepack__(self): return pack(self.enum, typeEnumeration) @@ -559,7 +570,7 @@ class DelayedComponentItem: return "selector for element %s of %s"%(self.__class__.__name__, str(self.fr)) template = """ -class %s(ComponentItem): want = '%s' +class %s(ComponentItem): want = %r """ exec(template % ("Text", b'text')) diff --git a/Lib/plat-mac/plistlib.py b/Lib/plat-mac/plistlib.py index e0e01ff..72dfa2e 100644 --- a/Lib/plat-mac/plistlib.py +++ b/Lib/plat-mac/plistlib.py @@ -164,7 +164,7 @@ class DumbXMLWriter: def simpleElement(self, element, value=None): if value is not None: - value = _escapeAndEncode(value) + value = _escape(value) self.writeln("<%s>%s</%s>" % (element, value, element)) else: self.writeln("<%s/>" % element) @@ -207,7 +207,7 @@ _controlCharPat = re.compile( r"[\x00\x01\x02\x03\x04\x05\x06\x07\x08\x0b\x0c\x0e\x0f" r"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f]") -def _escapeAndEncode(text): +def _escape(text): m = _controlCharPat.search(text) if m is not None: raise ValueError("strings can't contains control characters; " @@ -217,7 +217,7 @@ def _escapeAndEncode(text): text = text.replace("&", "&amp;") # escape '&' text = text.replace("<", "&lt;") # escape '<' text = text.replace(">", "&gt;") # escape '>' - return text.encode("utf-8") # encode as UTF-8 + return text PLISTHEADER = b"""\ diff --git a/Lib/sqlite3/dbapi2.py b/Lib/sqlite3/dbapi2.py index 52fb4ae..d051f04 100644 --- a/Lib/sqlite3/dbapi2.py +++ b/Lib/sqlite3/dbapi2.py @@ -60,13 +60,13 @@ def register_adapters_and_converters(): return val.isoformat(" ") def convert_date(val): - return datetime.date(*map(int, val.split("-"))) + return datetime.date(*map(int, val.split(b"-"))) def convert_timestamp(val): - datepart, timepart = val.split(" ") - year, month, day = map(int,
datepart.split("-")) - timepart_full = timepart.split(".") - hours, minutes, seconds = map(int, timepart_full[0].split(":")) + datepart, timepart = val.split(b" ") + year, month, day = map(int, datepart.split(b"-")) + timepart_full = timepart.split(b".") + hours, minutes, seconds = map(int, timepart_full[0].split(b":")) if len(timepart_full) == 2: microseconds = int(timepart_full[1]) else: diff --git a/Lib/sqlite3/test/factory.py b/Lib/sqlite3/test/factory.py index f20848f..a9a828f 100644 --- a/Lib/sqlite3/test/factory.py +++ b/Lib/sqlite3/test/factory.py @@ -163,8 +163,8 @@ class TextFactoryTests(unittest.TestCase): germany = "Deutchland" a_row = self.con.execute("select ?", (austria,)).fetchone() d_row = self.con.execute("select ?", (germany,)).fetchone() - self.failUnless(type(a_row[0]) == str, "type of non-ASCII row must be unicode") - self.failUnless(type(d_row[0]) == str8, "type of ASCII-only row must be str8") + self.failUnless(type(a_row[0]) == str, "type of non-ASCII row must be str") + self.failUnless(type(d_row[0]) == str, "type of ASCII-only row must be str") def tearDown(self): self.con.close() diff --git a/Lib/sqlite3/test/types.py b/Lib/sqlite3/test/types.py index 4ff948d..8845e0c 100644 --- a/Lib/sqlite3/test/types.py +++ b/Lib/sqlite3/test/types.py @@ -62,11 +62,12 @@ class SqliteTypeTests(unittest.TestCase): self.failUnlessEqual(row[0], val) def CheckBlob(self): - val = memoryview(b"Guglhupf") + sample = b"Guglhupf" + val = memoryview(sample) self.cur.execute("insert into test(b) values (?)", (val,)) self.cur.execute("select b from test") row = self.cur.fetchone() - self.failUnlessEqual(row[0], val) + self.failUnlessEqual(row[0], sample) def CheckUnicodeExecute(self): self.cur.execute("select 'Österreich'") @@ -76,8 +77,8 @@ class SqliteTypeTests(unittest.TestCase): class DeclTypesTests(unittest.TestCase): class Foo: def __init__(self, _val): - if isinstance(_val, str8): - # sqlite3 always calls __init__ with a str8 created from a + if 
isinstance(_val, bytes): + # sqlite3 always calls __init__ with a bytes created from a # UTF-8 string when __conform__ was used to store the object. _val = _val.decode('utf8') self.val = _val @@ -207,11 +208,12 @@ class DeclTypesTests(unittest.TestCase): def CheckBlob(self): # default - val = memoryview(b"Guglhupf") + sample = b"Guglhupf" + val = memoryview(sample) self.cur.execute("insert into test(bin) values (?)", (val,)) self.cur.execute("select bin from test") row = self.cur.fetchone() - self.failUnlessEqual(row[0], val) + self.failUnlessEqual(row[0], sample) class ColNamesTests(unittest.TestCase): def setUp(self): @@ -219,13 +221,11 @@ class ColNamesTests(unittest.TestCase): self.cur = self.con.cursor() self.cur.execute("create table test(x foo)") - sqlite.converters["FOO"] = lambda x: "[%s]" % x - sqlite.converters["BAR"] = lambda x: "<%s>" % x + sqlite.converters["BAR"] = lambda x: b"<" + x + b">" sqlite.converters["EXC"] = lambda x: 5/0 sqlite.converters["B1B1"] = lambda x: "MARKER" def tearDown(self): - del sqlite.converters["FOO"] del sqlite.converters["BAR"] del sqlite.converters["EXC"] del sqlite.converters["B1B1"] @@ -252,14 +252,14 @@ class ColNamesTests(unittest.TestCase): self.cur.execute("insert into test(x) values (?)", ("xxx",)) self.cur.execute('select x as "x [bar]" from test') val = self.cur.fetchone()[0] - self.failUnlessEqual(val, "<xxx>") + self.failUnlessEqual(val, b"<xxx>") # Check if the stripping of colnames works. Everything after the first # whitespace should be stripped. 
self.failUnlessEqual(self.cur.description[0][0], "x") def CheckCaseInConverterName(self): - self.cur.execute("""select 'other' as "x [b1b1]\"""") + self.cur.execute("select 'other' as \"x [b1b1]\"") val = self.cur.fetchone()[0] self.failUnlessEqual(val, "MARKER") diff --git a/Lib/sqlite3/test/userfunctions.py b/Lib/sqlite3/test/userfunctions.py index 994057e..dc3a709 100644 --- a/Lib/sqlite3/test/userfunctions.py +++ b/Lib/sqlite3/test/userfunctions.py @@ -198,7 +198,7 @@ class FunctionTests(unittest.TestCase): cur.execute("select returnblob()") val = cur.fetchone()[0] self.failUnlessEqual(type(val), bytes) - self.failUnlessEqual(val, memoryview(b"blob")) + self.failUnlessEqual(val, b"blob") def CheckFuncException(self): cur = self.con.cursor() diff --git a/Lib/sre_parse.py b/Lib/sre_parse.py index bf3e23f..fa0b8aa 100644 --- a/Lib/sre_parse.py +++ b/Lib/sre_parse.py @@ -192,14 +192,14 @@ class Tokenizer: char = self.string[self.index:self.index+1] # Special case for the str8, since indexing returns a integer # XXX This is only needed for test_bug_926075 in test_re.py - if isinstance(self.string, str8): + if isinstance(self.string, bytes): char = chr(char) if char == "\\": try: c = self.string[self.index + 1] except IndexError: raise error("bogus escape (end of line)") - if isinstance(self.string, str8): + if isinstance(self.string, bytes): char = chr(c) char = char + c self.index = self.index + len(char) diff --git a/Lib/string.py b/Lib/string.py index 03179fb..6117ac0 100644 --- a/Lib/string.py +++ b/Lib/string.py @@ -41,7 +41,7 @@ def capwords(s, sep=None): # Construct a translation map for bytes.translate -def maketrans(frm, to): +def maketrans(frm: bytes, to: bytes) -> bytes: """maketrans(frm, to) -> bytes Return a translation table (a bytes object of length 256) @@ -53,10 +53,10 @@ def maketrans(frm, to): raise ValueError("maketrans arguments must have same length") if not (isinstance(frm, bytes) and isinstance(to, bytes)): raise TypeError("maketrans 
arguments must be bytes objects") - L = bytes(range(256)) + L = buffer(range(256)) for i, c in enumerate(frm): L[c] = to[i] - return L + return bytes(L) #################################################################### diff --git a/Lib/struct.py b/Lib/struct.py index 10085b7..45f6729 100644 --- a/Lib/struct.py +++ b/Lib/struct.py @@ -26,8 +26,6 @@ Whitespace between formats is ignored. The variable struct.error is an exception raised on errors. """ -# XXX Move the bytes and str8 casts into the _struct module - __version__ = '3.0' @@ -36,7 +34,9 @@ from _struct import Struct as _Struct, error class Struct(_Struct): def __init__(self, fmt): if isinstance(fmt, str): - fmt = str8(fmt, 'latin1') + fmt = bytes(fmt, 'ascii') + elif isinstance(fmt, buffer): + fmt = bytes(fmt) _Struct.__init__(self, fmt) _MAXCACHE = 100 diff --git a/Lib/subprocess.py b/Lib/subprocess.py index 6eb9385..d134c3a 100644 --- a/Lib/subprocess.py +++ b/Lib/subprocess.py @@ -552,10 +552,9 @@ class Popen(object): self.stderr = io.TextIOWrapper(self.stderr) - def _translate_newlines(self, data): - data = data.replace(b"\r\n", b"\n") - data = data.replace(b"\r", b"\n") - return str(data) + def _translate_newlines(self, data, encoding): + data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n") + return data.decode(encoding) def __del__(self, sys=sys): @@ -825,16 +824,6 @@ class Popen(object): if stderr is not None: stderr = stderr[0] - # Translate newlines, if requested. We cannot let the file - # object do the translation: It is based on stdio, which is - # impossible to combine with select (unless forcing no - # buffering). 
- if self.universal_newlines: - if stdout is not None: - stdout = self._translate_newlines(stdout) - if stderr is not None: - stderr = self._translate_newlines(stderr) - self.wait() return (stdout, stderr) @@ -960,7 +949,8 @@ class Popen(object): os.close(p2cread) if c2pwrite is not None and c2pwrite not in (p2cread, 1): os.close(c2pwrite) - if errwrite is not None and errwrite not in (p2cread, c2pwrite, 2): + if (errwrite is not None and + errwrite not in (p2cread, c2pwrite, 2)): os.close(errwrite) # Close all other fds, if asked for @@ -1046,8 +1036,7 @@ class Popen(object): if self.stdin: if isinstance(input, str): # Unicode input = input.encode("utf-8") # XXX What else? - if not isinstance(input, (bytes, str8)): - input = bytes(input) + input = bytes(input) read_set = [] write_set = [] stdout = None # Return @@ -1072,6 +1061,9 @@ class Popen(object): while read_set or write_set: rlist, wlist, xlist = select.select(read_set, write_set, []) + # XXX Rewrite these to use non-blocking I/O on the + # file objects; they are no longer using C stdio! + if self.stdin in wlist: # When select has indicated that the file is writable, # we can write up to PIPE_BUF bytes without risk @@ -1099,19 +1091,19 @@ class Popen(object): # All data exchanged. Translate lists into strings. if stdout is not None: - stdout = b''.join(stdout) + stdout = b"".join(stdout) if stderr is not None: - stderr = b''.join(stderr) + stderr = b"".join(stderr) - # Translate newlines, if requested. We cannot let the file - # object do the translation: It is based on stdio, which is - # impossible to combine with select (unless forcing no - # buffering). + # Translate newlines, if requested. + # This also turns bytes into strings. 
if self.universal_newlines: if stdout is not None: - stdout = self._translate_newlines(stdout) + stdout = self._translate_newlines(stdout, + self.stdout.encoding) if stderr is not None: - stderr = self._translate_newlines(stderr) + stderr = self._translate_newlines(stderr, + self.stderr.encoding) self.wait() return (stdout, stderr) diff --git a/Lib/tarfile.py b/Lib/tarfile.py index 4864d97..aef8f94 100644 --- a/Lib/tarfile.py +++ b/Lib/tarfile.py @@ -30,13 +30,12 @@ """ __version__ = "$Revision$" -# $Source$ version = "0.9.0" -__author__ = "Lars Gust\xe4bel (lars@gustaebel.de)" +__author__ = "Lars Gust\u00e4bel (lars@gustaebel.de)" __date__ = "$Date$" __cvsid__ = "$Id$" -__credits__ = "Gustavo Niemeyer, Niels Gust\xe4bel, Richard Townsend." +__credits__ = "Gustavo Niemeyer, Niels Gust\u00e4bel, Richard Townsend." #--------- # Imports @@ -223,7 +222,7 @@ def itn(n, digits=8, format=DEFAULT_FORMAT): # this could raise OverflowError. n = struct.unpack("L", struct.pack("l", n))[0] - s = b"" + s = buffer() for i in range(digits - 1): s.insert(0, n & 0o377) n >>= 8 diff --git a/Lib/tempfile.py b/Lib/tempfile.py index 85d58e6..d725a9d 100644 --- a/Lib/tempfile.py +++ b/Lib/tempfile.py @@ -497,7 +497,7 @@ class SpooledTemporaryFile: else: # Setting newline="\n" avoids newline translation; # this is important because otherwise on Windows we'd - # get double newline translation upon rollover(). + # hget double newline translation upon rollover(). self._file = _io.StringIO(encoding=encoding, newline="\n") self._max_size = max_size self._rolled = False diff --git a/Lib/test/buffer_tests.py b/Lib/test/buffer_tests.py index 01ac3c5..db27759 100644 --- a/Lib/test/buffer_tests.py +++ b/Lib/test/buffer_tests.py @@ -1,11 +1,11 @@ -# Tests that work for both str8 (bytes) and bytes (buffer) objects. +# Tests that work for both bytes and buffer objects. # See PEP 3137. 
import struct import sys class MixinBytesBufferCommonTests(object): - """Tests that work for both str8 (bytes) and bytes (buffer) objects. + """Tests that work for both bytes and buffer objects. See PEP 3137. """ diff --git a/Lib/test/exception_hierarchy.txt b/Lib/test/exception_hierarchy.txt index 3714a41..965252c 100644 --- a/Lib/test/exception_hierarchy.txt +++ b/Lib/test/exception_hierarchy.txt @@ -44,5 +44,6 @@ BaseException +-- SyntaxWarning +-- UserWarning +-- FutureWarning - +-- ImportWarning - +-- UnicodeWarning + +-- ImportWarning + +-- UnicodeWarning + +-- BytesWarning diff --git a/Lib/test/pickletester.py b/Lib/test/pickletester.py index 0b1cf5a..6db3572 100644 --- a/Lib/test/pickletester.py +++ b/Lib/test/pickletester.py @@ -5,6 +5,8 @@ import copy_reg from test.test_support import TestFailed, TESTFN, run_with_locale +from pickle import bytes_types + # Tests that try a number of pickle protocols should have a # for proto in protocols: # kind of outer loop. @@ -87,149 +89,137 @@ class use_metaclass(object, metaclass=metaclass): # DATA0 .. DATA2 are the pickles we expect under the various protocols, for # the object returned by create_data(). -# break into multiple strings to avoid confusing font-lock-mode -DATA0 = b"""(lp1 -I0 -aL1L -aF2 -ac__builtin__ -complex -p2 -""" + \ -b"""(F3 -F0 -tRp3 -aI1 -aI-1 -aI255 -aI-255 -aI-256 -aI65535 -aI-65535 -aI-65536 -aI2147483647 -aI-2147483647 -aI-2147483648 -a""" + \ -b"""(S'abc' -p4 -g4 -""" + \ -b"""(i__main__ -C -p5 -""" + \ -b"""(dp6 -S'foo' -p7 -I1 -sS'bar' -p8 -I2 -sbg5 -tp9 -ag9 -aI5 -a. -""" - -# Disassembly of DATA0. 
+DATA0 = ( + b'(lp0\nL0\naL1\naF2.0\nac' + b'__builtin__\ncomplex\n' + b'p1\n(F3.0\nF0.0\ntp2\nRp' + b'3\naL1\naL-1\naL255\naL-' + b'255\naL-256\naL65535\na' + b'L-65535\naL-65536\naL2' + b'147483647\naL-2147483' + b'647\naL-2147483648\na(' + b'Vabc\np4\ng4\nccopy_reg' + b'\n_reconstructor\np5\n(' + b'c__main__\nC\np6\nc__bu' + b'iltin__\nobject\np7\nNt' + b'p8\nRp9\n(dp10\nVfoo\np1' + b'1\nL1\nsVbar\np12\nL2\nsb' + b'g9\ntp13\nag13\naL5\na.' +) + +# Disassembly of DATA0 DATA0_DIS = """\ 0: ( MARK 1: l LIST (MARK at 0) - 2: p PUT 1 - 5: I INT 0 + 2: p PUT 0 + 5: L LONG 0 8: a APPEND - 9: L LONG 1L - 13: a APPEND - 14: F FLOAT 2.0 - 17: a APPEND - 18: c GLOBAL '__builtin__ complex' - 39: p PUT 2 - 42: ( MARK - 43: F FLOAT 3.0 - 46: F FLOAT 0.0 - 49: t TUPLE (MARK at 42) - 50: R REDUCE - 51: p PUT 3 - 54: a APPEND - 55: I INT 1 - 58: a APPEND - 59: I INT -1 - 63: a APPEND - 64: I INT 255 - 69: a APPEND - 70: I INT -255 - 76: a APPEND - 77: I INT -256 - 83: a APPEND - 84: I INT 65535 + 9: L LONG 1 + 12: a APPEND + 13: F FLOAT 2.0 + 18: a APPEND + 19: c GLOBAL '__builtin__ complex' + 40: p PUT 1 + 43: ( MARK + 44: F FLOAT 3.0 + 49: F FLOAT 0.0 + 54: t TUPLE (MARK at 43) + 55: p PUT 2 + 58: R REDUCE + 59: p PUT 3 + 62: a APPEND + 63: L LONG 1 + 66: a APPEND + 67: L LONG -1 + 71: a APPEND + 72: L LONG 255 + 77: a APPEND + 78: L LONG -255 + 84: a APPEND + 85: L LONG -256 91: a APPEND - 92: I INT -65535 - 100: a APPEND - 101: I INT -65536 - 109: a APPEND - 110: I INT 2147483647 - 122: a APPEND - 123: I INT -2147483647 - 136: a APPEND - 137: I INT -2147483648 - 150: a APPEND - 151: ( MARK - 152: S STRING 'abc' - 159: p PUT 4 - 162: g GET 4 - 165: ( MARK - 166: i INST '__main__ C' (MARK at 165) - 178: p PUT 5 - 181: ( MARK - 182: d DICT (MARK at 181) - 183: p PUT 6 - 186: S STRING 'foo' - 193: p PUT 7 - 196: I INT 1 - 199: s SETITEM - 200: S STRING 'bar' - 207: p PUT 8 - 210: I INT 2 - 213: s SETITEM - 214: b BUILD - 215: g GET 5 - 218: t TUPLE (MARK at 151) - 219: p PUT 9 - 
222: a APPEND - 223: g GET 9 - 226: a APPEND - 227: I INT 5 - 230: a APPEND - 231: . STOP + 92: L LONG 65535 + 99: a APPEND + 100: L LONG -65535 + 108: a APPEND + 109: L LONG -65536 + 117: a APPEND + 118: L LONG 2147483647 + 130: a APPEND + 131: L LONG -2147483647 + 144: a APPEND + 145: L LONG -2147483648 + 158: a APPEND + 159: ( MARK + 160: V UNICODE 'abc' + 165: p PUT 4 + 168: g GET 4 + 171: c GLOBAL 'copy_reg _reconstructor' + 196: p PUT 5 + 199: ( MARK + 200: c GLOBAL '__main__ C' + 212: p PUT 6 + 215: c GLOBAL '__builtin__ object' + 235: p PUT 7 + 238: N NONE + 239: t TUPLE (MARK at 199) + 240: p PUT 8 + 243: R REDUCE + 244: p PUT 9 + 247: ( MARK + 248: d DICT (MARK at 247) + 249: p PUT 10 + 253: V UNICODE 'foo' + 258: p PUT 11 + 262: L LONG 1 + 265: s SETITEM + 266: V UNICODE 'bar' + 271: p PUT 12 + 275: L LONG 2 + 278: s SETITEM + 279: b BUILD + 280: g GET 9 + 283: t TUPLE (MARK at 159) + 284: p PUT 13 + 288: a APPEND + 289: g GET 13 + 293: a APPEND + 294: L LONG 5 + 297: a APPEND + 298: . STOP highest protocol among opcodes = 0 """ -DATA1 = (b']q\x01(K\x00L1L\nG@\x00\x00\x00\x00\x00\x00\x00' - b'c__builtin__\ncomplex\nq\x02(G@\x08\x00\x00\x00\x00\x00' - b'\x00G\x00\x00\x00\x00\x00\x00\x00\x00tRq\x03K\x01J\xff\xff' - b'\xff\xffK\xffJ\x01\xff\xff\xffJ\x00\xff\xff\xffM\xff\xff' - b'J\x01\x00\xff\xffJ\x00\x00\xff\xffJ\xff\xff\xff\x7fJ\x01\x00' - b'\x00\x80J\x00\x00\x00\x80(U\x03abcq\x04h\x04(c__main__\n' - b'C\nq\x05oq\x06}q\x07(U\x03fooq\x08K\x01U\x03barq\tK\x02ubh' - b'\x06tq\nh\nK\x05e.' - ) - -# Disassembly of DATA1. 
+DATA1 = ( + b']q\x00(K\x00K\x01G@\x00\x00\x00\x00\x00\x00\x00c__' + b'builtin__\ncomplex\nq\x01' + b'(G@\x08\x00\x00\x00\x00\x00\x00G\x00\x00\x00\x00\x00\x00\x00\x00t' + b'q\x02Rq\x03K\x01J\xff\xff\xff\xffK\xffJ\x01\xff\xff\xffJ' + b'\x00\xff\xff\xffM\xff\xffJ\x01\x00\xff\xffJ\x00\x00\xff\xffJ\xff\xff' + b'\xff\x7fJ\x01\x00\x00\x80J\x00\x00\x00\x80(X\x03\x00\x00\x00ab' + b'cq\x04h\x04ccopy_reg\n_reco' + b'nstructor\nq\x05(c__main' + b'__\nC\nq\x06c__builtin__\n' + b'object\nq\x07Ntq\x08Rq\t}q\n(' + b'X\x03\x00\x00\x00fooq\x0bK\x01X\x03\x00\x00\x00bar' + b'q\x0cK\x02ubh\ttq\rh\rK\x05e.' +) + +# Disassembly of DATA1 DATA1_DIS = """\ 0: ] EMPTY_LIST - 1: q BINPUT 1 + 1: q BINPUT 0 3: ( MARK 4: K BININT1 0 - 6: L LONG 1L - 10: G BINFLOAT 2.0 - 19: c GLOBAL '__builtin__ complex' - 40: q BINPUT 2 - 42: ( MARK - 43: G BINFLOAT 3.0 - 52: G BINFLOAT 0.0 - 61: t TUPLE (MARK at 42) + 6: K BININT1 1 + 8: G BINFLOAT 2.0 + 17: c GLOBAL '__builtin__ complex' + 38: q BINPUT 1 + 40: ( MARK + 41: G BINFLOAT 3.0 + 50: G BINFLOAT 0.0 + 59: t TUPLE (MARK at 40) + 60: q BINPUT 2 62: R REDUCE 63: q BINPUT 3 65: K BININT1 1 @@ -244,97 +234,110 @@ DATA1_DIS = """\ 102: J BININT -2147483647 107: J BININT -2147483648 112: ( MARK - 113: U SHORT_BINSTRING 'abc' - 118: q BINPUT 4 - 120: h BINGET 4 - 122: ( MARK - 123: c GLOBAL '__main__ C' - 135: q BINPUT 5 - 137: o OBJ (MARK at 122) - 138: q BINPUT 6 - 140: } EMPTY_DICT - 141: q BINPUT 7 - 143: ( MARK - 144: U SHORT_BINSTRING 'foo' - 149: q BINPUT 8 - 151: K BININT1 1 - 153: U SHORT_BINSTRING 'bar' - 158: q BINPUT 9 - 160: K BININT1 2 - 162: u SETITEMS (MARK at 143) - 163: b BUILD - 164: h BINGET 6 - 166: t TUPLE (MARK at 112) - 167: q BINPUT 10 - 169: h BINGET 10 - 171: K BININT1 5 - 173: e APPENDS (MARK at 3) - 174: . 
STOP + 113: X BINUNICODE 'abc' + 121: q BINPUT 4 + 123: h BINGET 4 + 125: c GLOBAL 'copy_reg _reconstructor' + 150: q BINPUT 5 + 152: ( MARK + 153: c GLOBAL '__main__ C' + 165: q BINPUT 6 + 167: c GLOBAL '__builtin__ object' + 187: q BINPUT 7 + 189: N NONE + 190: t TUPLE (MARK at 152) + 191: q BINPUT 8 + 193: R REDUCE + 194: q BINPUT 9 + 196: } EMPTY_DICT + 197: q BINPUT 10 + 199: ( MARK + 200: X BINUNICODE 'foo' + 208: q BINPUT 11 + 210: K BININT1 1 + 212: X BINUNICODE 'bar' + 220: q BINPUT 12 + 222: K BININT1 2 + 224: u SETITEMS (MARK at 199) + 225: b BUILD + 226: h BINGET 9 + 228: t TUPLE (MARK at 112) + 229: q BINPUT 13 + 231: h BINGET 13 + 233: K BININT1 5 + 235: e APPENDS (MARK at 3) + 236: . STOP highest protocol among opcodes = 1 """ -DATA2 = (b'\x80\x02]q\x01(K\x00\x8a\x01\x01G@\x00\x00\x00\x00\x00\x00\x00' - b'c__builtin__\ncomplex\nq\x02G@\x08\x00\x00\x00\x00\x00\x00G\x00' - b'\x00\x00\x00\x00\x00\x00\x00\x86Rq\x03K\x01J\xff\xff\xff\xffK' - b'\xffJ\x01\xff\xff\xffJ\x00\xff\xff\xffM\xff\xffJ\x01\x00\xff\xff' - b'J\x00\x00\xff\xffJ\xff\xff\xff\x7fJ\x01\x00\x00\x80J\x00\x00\x00' - b'\x80(U\x03abcq\x04h\x04(c__main__\nC\nq\x05oq\x06}q\x07(U\x03foo' - b'q\x08K\x01U\x03barq\tK\x02ubh\x06tq\nh\nK\x05e.') - -# Disassembly of DATA2. +DATA2 = ( + b'\x80\x02]q\x00(K\x00K\x01G@\x00\x00\x00\x00\x00\x00\x00c' + b'__builtin__\ncomplex\n' + b'q\x01G@\x08\x00\x00\x00\x00\x00\x00G\x00\x00\x00\x00\x00\x00\x00\x00' + b'\x86q\x02Rq\x03K\x01J\xff\xff\xff\xffK\xffJ\x01\xff\xff\xff' + b'J\x00\xff\xff\xffM\xff\xffJ\x01\x00\xff\xffJ\x00\x00\xff\xffJ\xff' + b'\xff\xff\x7fJ\x01\x00\x00\x80J\x00\x00\x00\x80(X\x03\x00\x00\x00a' + b'bcq\x04h\x04c__main__\nC\nq\x05' + b')\x81q\x06}q\x07(X\x03\x00\x00\x00fooq\x08K\x01' + b'X\x03\x00\x00\x00barq\tK\x02ubh\x06tq\nh' + b'\nK\x05e.' 
+) + +# Disassembly of DATA2 DATA2_DIS = """\ 0: \x80 PROTO 2 2: ] EMPTY_LIST - 3: q BINPUT 1 + 3: q BINPUT 0 5: ( MARK 6: K BININT1 0 - 8: \x8a LONG1 1L - 11: G BINFLOAT 2.0 - 20: c GLOBAL '__builtin__ complex' - 41: q BINPUT 2 - 43: G BINFLOAT 3.0 - 52: G BINFLOAT 0.0 - 61: \x86 TUPLE2 - 62: R REDUCE - 63: q BINPUT 3 - 65: K BININT1 1 - 67: J BININT -1 - 72: K BININT1 255 - 74: J BININT -255 - 79: J BININT -256 - 84: M BININT2 65535 - 87: J BININT -65535 - 92: J BININT -65536 - 97: J BININT 2147483647 - 102: J BININT -2147483647 - 107: J BININT -2147483648 - 112: ( MARK - 113: U SHORT_BINSTRING 'abc' - 118: q BINPUT 4 - 120: h BINGET 4 - 122: ( MARK - 123: c GLOBAL '__main__ C' - 135: q BINPUT 5 - 137: o OBJ (MARK at 122) - 138: q BINPUT 6 - 140: } EMPTY_DICT - 141: q BINPUT 7 - 143: ( MARK - 144: U SHORT_BINSTRING 'foo' - 149: q BINPUT 8 - 151: K BININT1 1 - 153: U SHORT_BINSTRING 'bar' - 158: q BINPUT 9 - 160: K BININT1 2 - 162: u SETITEMS (MARK at 143) - 163: b BUILD - 164: h BINGET 6 - 166: t TUPLE (MARK at 112) - 167: q BINPUT 10 - 169: h BINGET 10 - 171: K BININT1 5 - 173: e APPENDS (MARK at 5) - 174: . 
STOP + 8: K BININT1 1 + 10: G BINFLOAT 2.0 + 19: c GLOBAL '__builtin__ complex' + 40: q BINPUT 1 + 42: G BINFLOAT 3.0 + 51: G BINFLOAT 0.0 + 60: \x86 TUPLE2 + 61: q BINPUT 2 + 63: R REDUCE + 64: q BINPUT 3 + 66: K BININT1 1 + 68: J BININT -1 + 73: K BININT1 255 + 75: J BININT -255 + 80: J BININT -256 + 85: M BININT2 65535 + 88: J BININT -65535 + 93: J BININT -65536 + 98: J BININT 2147483647 + 103: J BININT -2147483647 + 108: J BININT -2147483648 + 113: ( MARK + 114: X BINUNICODE 'abc' + 122: q BINPUT 4 + 124: h BINGET 4 + 126: c GLOBAL '__main__ C' + 138: q BINPUT 5 + 140: ) EMPTY_TUPLE + 141: \x81 NEWOBJ + 142: q BINPUT 6 + 144: } EMPTY_DICT + 145: q BINPUT 7 + 147: ( MARK + 148: X BINUNICODE 'foo' + 156: q BINPUT 8 + 158: K BININT1 1 + 160: X BINUNICODE 'bar' + 168: q BINPUT 9 + 170: K BININT1 2 + 172: u SETITEMS (MARK at 147) + 173: b BUILD + 174: h BINGET 6 + 176: t TUPLE (MARK at 113) + 177: q BINPUT 10 + 179: h BINGET 10 + 181: K BININT1 5 + 183: e APPENDS (MARK at 5) + 184: . STOP highest protocol among opcodes = 2 """ @@ -393,11 +396,14 @@ class AbstractPickleTests(unittest.TestCase): got = self.loads(s) self.assertEqual(expected, got) - def test_load_from_canned_string(self): - expected = self._testdata - for canned in DATA0, DATA1, DATA2: - got = self.loads(canned) - self.assertEqual(expected, got) + def test_load_from_data0(self): + self.assertEqual(self._testdata, self.loads(DATA0)) + + def test_load_from_data1(self): + self.assertEqual(self._testdata, self.loads(DATA1)) + + def test_load_from_data2(self): + self.assertEqual(self._testdata, self.loads(DATA2)) # There are gratuitous differences between pickles produced by # pickle and cPickle, largely because cPickle starts PUT indices at @@ -762,7 +768,7 @@ class AbstractPickleTests(unittest.TestCase): x = dict.fromkeys(range(n)) for proto in protocols: s = self.dumps(x, proto) - assert isinstance(s, bytes) + assert isinstance(s, bytes_types) y = self.loads(s) self.assertEqual(x, y) num_setitems = 
count_opcode(pickle.SETITEMS, s) @@ -996,3 +1002,21 @@ class AbstractPersistentPicklerTests(unittest.TestCase): self.assertEqual(self.loads(self.dumps(L, 1)), L) self.assertEqual(self.id_count, 5) self.assertEqual(self.load_count, 5) + +if __name__ == "__main__": + # Print some stuff that can be used to rewrite DATA{0,1,2} + from pickletools import dis + x = create_data() + for i in range(3): + p = pickle.dumps(x, i) + print("DATA{0} = (".format(i)) + for j in range(0, len(p), 20): + b = bytes(p[j:j+20]) + print(" {0!r}".format(b)) + print(")") + print() + print("# Disassembly of DATA{0}".format(i)) + print("DATA{0}_DIS = \"\"\"\\".format(i)) + dis(p) + print("\"\"\"") + print() diff --git a/Lib/test/regrtest.py b/Lib/test/regrtest.py index d612022..cd92511 100755 --- a/Lib/test/regrtest.py +++ b/Lib/test/regrtest.py @@ -278,6 +278,9 @@ def main(tests=None, testdir=None, verbose=0, quiet=False, generate=False, huntrleaks[1] = int(huntrleaks[1]) if len(huntrleaks) == 2 or not huntrleaks[2]: huntrleaks[2:] = ["reflog.txt"] + # Avoid false positives due to the character cache in + # stringobject.c filling slowly with random data + warm_char_cache() elif o in ('-M', '--memlimit'): test_support.set_memlimit(a) elif o in ('-u', '--use'): @@ -357,9 +360,9 @@ def main(tests=None, testdir=None, verbose=0, quiet=False, generate=False, # Strip .py extensions. if args: - args = map(removepy, args) + args = list(map(removepy, args)) if tests: - tests = map(removepy, tests) + tests = list(map(removepy, tests)) stdtests = STDTESTS[:] nottests = NOTTESTS.copy() @@ -768,6 +771,11 @@ def dash_R_cleanup(fs, ps, pic, abcs): # Collect cyclic trash. 
     gc.collect()
 
+def warm_char_cache():
+    s = bytes(range(256))
+    for i in range(256):
+        s[i:i+1]
+
 def reportdiff(expected, output):
     import difflib
     print("*" * 70)
diff --git a/Lib/test/string_tests.py b/Lib/test/string_tests.py
index 9da062e..a789515 100644
--- a/Lib/test/string_tests.py
+++ b/Lib/test/string_tests.py
@@ -558,10 +558,10 @@ class CommonTest(BaseTest):
         a = self.type2test('DNSSEC')
         b = self.type2test('')
         for c in a:
-            # Special case for the str8, since indexing returns a integer
-            # XXX Maybe it would be a good idea to seperate str8's tests...
-            if self.type2test == str8:
-                c = chr(c)
+##            # Special case for the str8, since indexing returns a integer
+##            # XXX Maybe it would be a good idea to seperate str8's tests...
+##            if self.type2test == str8:
+##                c = chr(c)
             b += c
             hash(b)
         self.assertEqual(hash(a), hash(b))
@@ -992,14 +992,14 @@ class MixinStrUnicodeUserStringTest:
         self.checkequal('abc', 'a', 'join', ('abc',))
         self.checkequal('z', 'a', 'join', UserList(['z']))
         self.checkequal('a.b.c', '.', 'join', ['a', 'b', 'c'])
-        self.checkequal('a.b.3', '.', 'join', ['a', 'b', 3])
+        self.assertRaises(TypeError, '.'.join, ['a', 'b', 3])
         for i in [5, 25, 125]:
             self.checkequal(((('a' * i) + '-') * i)[:-1], '-', 'join',
                  ['a' * i] * i)
             self.checkequal(((('a' * i) + '-') * i)[:-1], '-', 'join',
                  ('a' * i,) * i)
 
-        self.checkequal(str(BadSeq1()), ' ', 'join', BadSeq1())
+        #self.checkequal(str(BadSeq1()), ' ', 'join', BadSeq1())
         self.checkequal('a b c', ' ', 'join', BadSeq2())
 
         self.checkraises(TypeError, ' ', 'join')
@@ -1147,16 +1147,16 @@ class MixinStrUnicodeTest:
             s2 = "".join([s1])
             self.assert_(s1 is s2)
 
-        elif t is str8:
-            s1 = subclass("abcd")
-            s2 = "".join([s1])
-            self.assert_(s1 is not s2)
-            self.assert_(type(s2) is str) # promotes!
+##        elif t is str8:
+##            s1 = subclass("abcd")
+##            s2 = "".join([s1])
+##            self.assert_(s1 is not s2)
+##            self.assert_(type(s2) is str) # promotes!
 
-            s1 = t("abcd")
-            s2 = "".join([s1])
-            self.assert_(s1 is not s2)
-            self.assert_(type(s2) is str) # promotes!
+##            s1 = t("abcd")
+##            s2 = "".join([s1])
+##            self.assert_(s1 is not s2)
+##            self.assert_(type(s2) is str) # promotes!
         else:
             self.fail("unexpected type for MixinStrUnicodeTest %r" % t)
 
diff --git a/Lib/test/test_asynchat.py b/Lib/test/test_asynchat.py
index 78f5868..83b0972 100644
--- a/Lib/test/test_asynchat.py
+++ b/Lib/test/test_asynchat.py
@@ -105,17 +105,17 @@ class TestAsynchat(unittest.TestCase):
     def test_line_terminator1(self):
         # test one-character terminator
         for l in (1,2,3):
-            self.line_terminator_check(b'\n', l)
+            self.line_terminator_check('\n', l)
 
     def test_line_terminator2(self):
         # test two-character terminator
         for l in (1,2,3):
-            self.line_terminator_check(b'\r\n', l)
+            self.line_terminator_check('\r\n', l)
 
     def test_line_terminator3(self):
         # test three-character terminator
         for l in (1,2,3):
-            self.line_terminator_check(b'qqq', l)
+            self.line_terminator_check('qqq', l)
 
     def numeric_terminator_check(self, termlen):
         # Try reading a fixed number of bytes
diff --git a/Lib/test/test_asyncore.py b/Lib/test/test_asyncore.py
index 33c2fb2..6dc73ad 100644
--- a/Lib/test/test_asyncore.py
+++ b/Lib/test/test_asyncore.py
@@ -70,7 +70,6 @@ def capture_server(evt, buf):
             r, w, e = select.select([conn], [], [])
             if r:
                 data = conn.recv(10)
-                assert isinstance(data, bytes)
                 # keep everything except for the newline terminator
                 buf.write(data.replace(b'\n', b''))
                 if b'\n' in data:
diff --git a/Lib/test/test_audioop.py b/Lib/test/test_audioop.py
index 194d783..fada40c 100644
--- a/Lib/test/test_audioop.py
+++ b/Lib/test/test_audioop.py
@@ -87,7 +87,7 @@ def testadd(data):
         print('add')
     data2 = []
     for d in data:
-        str = bytes(len(d))
+        str = buffer(len(d))
         for i,b in enumerate(d):
             str[i] = 2*b
         data2.append(str)
@@ -177,7 +177,7 @@ def testmul(data):
         print('mul')
     data2 = []
     for d in data:
-        str = bytes(len(d))
+        str = buffer(len(d))
         for i,b in enumerate(d):
             str[i] = 2*b
         data2.append(str)
@@ -207,7 +207,7 @@ def testreverse(data):
 def testtomono(data):
     if verbose:
         print('tomono')
-    data2 = b''
+    data2 = buffer()
     for d in data[0]:
         data2.append(d)
         data2.append(d)
@@ -218,7 +218,7 @@ def testtomono(data):
 def testtostereo(data):
     if verbose:
         print('tostereo')
-    data2 = b''
+    data2 = buffer()
     for d in data[0]:
         data2.append(d)
         data2.append(d)
diff --git a/Lib/test/test_binascii.py b/Lib/test/test_binascii.py
index 9229f38..fa13563 100755
--- a/Lib/test/test_binascii.py
+++ b/Lib/test/test_binascii.py
@@ -56,7 +56,7 @@ class BinASCIITest(unittest.TestCase):
             a = binascii.b2a_base64(b)
             lines.append(a)
 
-        fillers = bytes()
+        fillers = buffer()
         valid = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"
         for i in range(256):
             if i not in valid:
@@ -64,7 +64,7 @@ class BinASCIITest(unittest.TestCase):
         def addnoise(line):
             noise = fillers
             ratio = len(line) // len(noise)
-            res = bytes()
+            res = buffer()
             while line and noise:
                 if len(line) // len(noise) > ratio:
                     c, line = line[0], line[1:]
@@ -72,7 +72,7 @@ class BinASCIITest(unittest.TestCase):
                     c, noise = noise[0], noise[1:]
                 res.append(c)
             return res + noise + line
-        res = bytes()
+        res = buffer()
         for line in map(addnoise, lines):
             b = binascii.a2b_base64(line)
             res += b
diff --git a/Lib/test/test_builtin.py b/Lib/test/test_builtin.py
index 9670be0..4f84328 100644
--- a/Lib/test/test_builtin.py
+++ b/Lib/test/test_builtin.py
@@ -580,8 +580,7 @@ class BuiltinTest(unittest.TestCase):
         self.assertEqual(hash(1), hash(1))
         self.assertEqual(hash(1), hash(1.0))
         hash('spam')
-        self.assertEqual(hash('spam'), hash(str8(b'spam'))) # remove str8()
-                                                            # when b"" is immutable
+        self.assertEqual(hash('spam'), hash(b'spam'))
         hash((0,1,2,3))
         def f(): pass
         self.assertRaises(TypeError, hash, [])
diff --git a/Lib/test/test_bytes.py b/Lib/test/test_bytes.py
index 932fa44..b3c13b3 100644
--- a/Lib/test/test_bytes.py
+++ b/Lib/test/test_bytes.py
@@ -1,4 +1,9 @@
-"""Unit tests for the bytes type."""
+"""Unit tests
for the bytes and buffer types. + +XXX This is a mess. Common tests should be moved to buffer_tests.py, +which itself ought to be unified with string_tests.py (and the latter +should be modernized). +""" import os import re @@ -7,6 +12,7 @@ import copy import pickle import tempfile import unittest +import warnings import test.test_support import test.string_tests import test.buffer_tests @@ -14,13 +20,19 @@ import test.buffer_tests class BytesTest(unittest.TestCase): + def setUp(self): + self.warning_filters = warnings.filters[:] + + def tearDown(self): + warnings.filters = self.warning_filters + def test_basics(self): - b = bytes() - self.assertEqual(type(b), bytes) - self.assertEqual(b.__class__, bytes) + b = buffer() + self.assertEqual(type(b), buffer) + self.assertEqual(b.__class__, buffer) def test_empty_sequence(self): - b = bytes() + b = buffer() self.assertEqual(len(b), 0) self.assertRaises(IndexError, lambda: b[0]) self.assertRaises(IndexError, lambda: b[1]) @@ -36,7 +48,7 @@ class BytesTest(unittest.TestCase): def test_from_list(self): ints = list(range(256)) - b = bytes(i for i in ints) + b = buffer(i for i in ints) self.assertEqual(len(b), 256) self.assertEqual(list(b), ints) @@ -46,44 +58,57 @@ class BytesTest(unittest.TestCase): self.i = i def __index__(self): return self.i - b = bytes([C(), C(1), C(254), C(255)]) + b = buffer([C(), C(1), C(254), C(255)]) self.assertEqual(list(b), [0, 1, 254, 255]) - self.assertRaises(ValueError, bytes, [C(-1)]) - self.assertRaises(ValueError, bytes, [C(256)]) + self.assertRaises(ValueError, buffer, [C(-1)]) + self.assertRaises(ValueError, buffer, [C(256)]) + + def test_from_ssize(self): + self.assertEqual(buffer(0), b'') + self.assertEqual(buffer(1), b'\x00') + self.assertEqual(buffer(5), b'\x00\x00\x00\x00\x00') + self.assertRaises(ValueError, buffer, -1) + + self.assertEqual(buffer('0', 'ascii'), b'0') + self.assertEqual(buffer(b'0'), b'0') def test_constructor_type_errors(self): - self.assertRaises(TypeError, 
bytes, 0.0) + self.assertRaises(TypeError, buffer, 0.0) class C: pass - self.assertRaises(TypeError, bytes, ["0"]) - self.assertRaises(TypeError, bytes, [0.0]) - self.assertRaises(TypeError, bytes, [None]) - self.assertRaises(TypeError, bytes, [C()]) + self.assertRaises(TypeError, buffer, ["0"]) + self.assertRaises(TypeError, buffer, [0.0]) + self.assertRaises(TypeError, buffer, [None]) + self.assertRaises(TypeError, buffer, [C()]) def test_constructor_value_errors(self): - self.assertRaises(ValueError, bytes, [-1]) - self.assertRaises(ValueError, bytes, [-sys.maxint]) - self.assertRaises(ValueError, bytes, [-sys.maxint-1]) - self.assertRaises(ValueError, bytes, [-sys.maxint-2]) - self.assertRaises(ValueError, bytes, [-10**100]) - self.assertRaises(ValueError, bytes, [256]) - self.assertRaises(ValueError, bytes, [257]) - self.assertRaises(ValueError, bytes, [sys.maxint]) - self.assertRaises(ValueError, bytes, [sys.maxint+1]) - self.assertRaises(ValueError, bytes, [10**100]) - - def test_repr(self): - self.assertEqual(repr(bytes()), "b''") - self.assertEqual(repr(bytes([0])), "b'\\x00'") - self.assertEqual(repr(bytes([0, 1, 254, 255])), - "b'\\x00\\x01\\xfe\\xff'") - self.assertEqual(repr(b"abc"), "b'abc'") - self.assertEqual(repr(b"'"), "b'\\''") + self.assertRaises(ValueError, buffer, [-1]) + self.assertRaises(ValueError, buffer, [-sys.maxint]) + self.assertRaises(ValueError, buffer, [-sys.maxint-1]) + self.assertRaises(ValueError, buffer, [-sys.maxint-2]) + self.assertRaises(ValueError, buffer, [-10**100]) + self.assertRaises(ValueError, buffer, [256]) + self.assertRaises(ValueError, buffer, [257]) + self.assertRaises(ValueError, buffer, [sys.maxint]) + self.assertRaises(ValueError, buffer, [sys.maxint+1]) + self.assertRaises(ValueError, buffer, [10**100]) + + def test_repr_str(self): + warnings.simplefilter('ignore', BytesWarning) + for f in str, repr: + self.assertEqual(f(buffer()), "buffer(b'')") + self.assertEqual(f(buffer([0])), "buffer(b'\\x00')") + 
self.assertEqual(f(buffer([0, 1, 254, 255])), + "buffer(b'\\x00\\x01\\xfe\\xff')") + self.assertEqual(f(b"abc"), "b'abc'") + self.assertEqual(f(b"'"), '''b"'"''') + self.assertEqual(f(b"'\""), r"""b'\'"'""") + def test_compare(self): - b1 = bytes([1, 2, 3]) - b2 = bytes([1, 2, 3]) - b3 = bytes([1, 3]) + b1 = buffer([1, 2, 3]) + b2 = buffer([1, 2, 3]) + b3 = buffer([1, 3]) self.assertEqual(b1, b2) self.failUnless(b2 != b3) @@ -103,54 +128,58 @@ class BytesTest(unittest.TestCase): self.failIf(b3 < b2) self.failIf(b3 <= b2) - def test_compare_to_str(self): - self.assertEqual(b"abc" == str8(b"abc"), True) - self.assertEqual(b"ab" != str8(b"abc"), True) - self.assertEqual(b"ab" <= str8(b"abc"), True) - self.assertEqual(b"ab" < str8(b"abc"), True) - self.assertEqual(b"abc" >= str8(b"ab"), True) - self.assertEqual(b"abc" > str8(b"ab"), True) - - self.assertEqual(b"abc" != str8(b"abc"), False) - self.assertEqual(b"ab" == str8(b"abc"), False) - self.assertEqual(b"ab" > str8(b"abc"), False) - self.assertEqual(b"ab" >= str8(b"abc"), False) - self.assertEqual(b"abc" < str8(b"ab"), False) - self.assertEqual(b"abc" <= str8(b"ab"), False) - - self.assertEqual(str8(b"abc") == b"abc", True) - self.assertEqual(str8(b"ab") != b"abc", True) - self.assertEqual(str8(b"ab") <= b"abc", True) - self.assertEqual(str8(b"ab") < b"abc", True) - self.assertEqual(str8(b"abc") >= b"ab", True) - self.assertEqual(str8(b"abc") > b"ab", True) - - self.assertEqual(str8(b"abc") != b"abc", False) - self.assertEqual(str8(b"ab") == b"abc", False) - self.assertEqual(str8(b"ab") > b"abc", False) - self.assertEqual(str8(b"ab") >= b"abc", False) - self.assertEqual(str8(b"abc") < b"ab", False) - self.assertEqual(str8(b"abc") <= b"ab", False) + def test_compare_bytes_to_buffer(self): + self.assertEqual(b"abc" == bytes(b"abc"), True) + self.assertEqual(b"ab" != bytes(b"abc"), True) + self.assertEqual(b"ab" <= bytes(b"abc"), True) + self.assertEqual(b"ab" < bytes(b"abc"), True) + self.assertEqual(b"abc" >= 
bytes(b"ab"), True) + self.assertEqual(b"abc" > bytes(b"ab"), True) + + self.assertEqual(b"abc" != bytes(b"abc"), False) + self.assertEqual(b"ab" == bytes(b"abc"), False) + self.assertEqual(b"ab" > bytes(b"abc"), False) + self.assertEqual(b"ab" >= bytes(b"abc"), False) + self.assertEqual(b"abc" < bytes(b"ab"), False) + self.assertEqual(b"abc" <= bytes(b"ab"), False) + + self.assertEqual(bytes(b"abc") == b"abc", True) + self.assertEqual(bytes(b"ab") != b"abc", True) + self.assertEqual(bytes(b"ab") <= b"abc", True) + self.assertEqual(bytes(b"ab") < b"abc", True) + self.assertEqual(bytes(b"abc") >= b"ab", True) + self.assertEqual(bytes(b"abc") > b"ab", True) + + self.assertEqual(bytes(b"abc") != b"abc", False) + self.assertEqual(bytes(b"ab") == b"abc", False) + self.assertEqual(bytes(b"ab") > b"abc", False) + self.assertEqual(bytes(b"ab") >= b"abc", False) + self.assertEqual(bytes(b"abc") < b"ab", False) + self.assertEqual(bytes(b"abc") <= b"ab", False) + def test_compare_to_str(self): + warnings.simplefilter('ignore', BytesWarning) # Byte comparisons with unicode should always fail! 
# Test this for all expected byte orders and Unicode character sizes self.assertEqual(b"\0a\0b\0c" == "abc", False) self.assertEqual(b"\0\0\0a\0\0\0b\0\0\0c" == "abc", False) self.assertEqual(b"a\0b\0c\0" == "abc", False) self.assertEqual(b"a\0\0\0b\0\0\0c\0\0\0" == "abc", False) - self.assertEqual(bytes() == str(), False) - self.assertEqual(bytes() != str(), True) + self.assertEqual(buffer() == str(), False) + self.assertEqual(buffer() != str(), True) def test_nohash(self): - self.assertRaises(TypeError, hash, bytes()) + self.assertRaises(TypeError, hash, buffer()) def test_doc(self): + self.failUnless(buffer.__doc__ != None) + self.failUnless(buffer.__doc__.startswith("buffer("), buffer.__doc__) self.failUnless(bytes.__doc__ != None) - self.failUnless(bytes.__doc__.startswith("bytes(")) + self.failUnless(bytes.__doc__.startswith("bytes("), bytes.__doc__) def test_buffer_api(self): short_sample = b"Hello world\n" - sample = short_sample + b"x"*(20 - len(short_sample)) + sample = short_sample + b"\0"*(20 - len(short_sample)) tfn = tempfile.mktemp() try: # Prepare @@ -158,7 +187,7 @@ class BytesTest(unittest.TestCase): f.write(short_sample) # Test readinto with open(tfn, "rb") as f: - b = b"x"*20 + b = buffer(20) n = f.readinto(b) self.assertEqual(n, len(short_sample)) self.assertEqual(list(b), list(sample)) @@ -176,25 +205,25 @@ class BytesTest(unittest.TestCase): def test_reversed(self): input = list(map(ord, "Hello")) - b = bytes(input) + b = buffer(input) output = list(reversed(b)) input.reverse() self.assertEqual(output, input) def test_reverse(self): - b = b'hello' + b = buffer(b'hello') self.assertEqual(b.reverse(), None) self.assertEqual(b, b'olleh') - b = b'hello1' # test even number of items + b = buffer(b'hello1') # test even number of items b.reverse() self.assertEqual(b, b'1olleh') - b = bytes() + b = buffer() b.reverse() self.assertFalse(b) def test_getslice(self): def by(s): - return bytes(map(ord, s)) + return buffer(map(ord, s)) b = by("Hello, 
world") self.assertEqual(b[:5], by("Hello")) @@ -215,33 +244,33 @@ class BytesTest(unittest.TestCase): def test_extended_getslice(self): # Test extended slicing by comparing with list slicing. L = list(range(255)) - b = bytes(L) + b = buffer(L) indices = (0, None, 1, 3, 19, 100, -1, -2, -31, -100) for start in indices: for stop in indices: # Skip step 0 (invalid) for step in indices[1:]: - self.assertEqual(b[start:stop:step], bytes(L[start:stop:step])) + self.assertEqual(b[start:stop:step], buffer(L[start:stop:step])) def test_regexps(self): def by(s): - return bytes(map(ord, s)) + return buffer(map(ord, s)) b = by("Hello, world") self.assertEqual(re.findall(r"\w+", b), [by("Hello"), by("world")]) def test_setitem(self): - b = bytes([1, 2, 3]) + b = buffer([1, 2, 3]) b[1] = 100 - self.assertEqual(b, bytes([1, 100, 3])) + self.assertEqual(b, buffer([1, 100, 3])) b[-1] = 200 - self.assertEqual(b, bytes([1, 100, 200])) + self.assertEqual(b, buffer([1, 100, 200])) class C: def __init__(self, i=0): self.i = i def __index__(self): return self.i b[0] = C(10) - self.assertEqual(b, bytes([10, 100, 200])) + self.assertEqual(b, buffer([10, 100, 200])) try: b[3] = 0 self.fail("Didn't raise IndexError") @@ -269,35 +298,35 @@ class BytesTest(unittest.TestCase): pass def test_delitem(self): - b = bytes(range(10)) + b = buffer(range(10)) del b[0] - self.assertEqual(b, bytes(range(1, 10))) + self.assertEqual(b, buffer(range(1, 10))) del b[-1] - self.assertEqual(b, bytes(range(1, 9))) + self.assertEqual(b, buffer(range(1, 9))) del b[4] - self.assertEqual(b, bytes([1, 2, 3, 4, 6, 7, 8])) + self.assertEqual(b, buffer([1, 2, 3, 4, 6, 7, 8])) def test_setslice(self): - b = bytes(range(10)) + b = buffer(range(10)) self.assertEqual(list(b), list(range(10))) - b[0:5] = bytes([1, 1, 1, 1, 1]) - self.assertEqual(b, bytes([1, 1, 1, 1, 1, 5, 6, 7, 8, 9])) + b[0:5] = buffer([1, 1, 1, 1, 1]) + self.assertEqual(b, buffer([1, 1, 1, 1, 1, 5, 6, 7, 8, 9])) del b[0:-5] - self.assertEqual(b, bytes([5, 
6, 7, 8, 9])) + self.assertEqual(b, buffer([5, 6, 7, 8, 9])) - b[0:0] = bytes([0, 1, 2, 3, 4]) - self.assertEqual(b, bytes(range(10))) + b[0:0] = buffer([0, 1, 2, 3, 4]) + self.assertEqual(b, buffer(range(10))) - b[-7:-3] = bytes([100, 101]) - self.assertEqual(b, bytes([0, 1, 2, 100, 101, 7, 8, 9])) + b[-7:-3] = buffer([100, 101]) + self.assertEqual(b, buffer([0, 1, 2, 100, 101, 7, 8, 9])) b[3:5] = [3, 4, 5, 6] - self.assertEqual(b, bytes(range(10))) + self.assertEqual(b, buffer(range(10))) b[3:0] = [42, 42, 42] - self.assertEqual(b, bytes([0, 1, 2, 42, 42, 42, 3, 4, 5, 6, 7, 8, 9])) + self.assertEqual(b, buffer([0, 1, 2, 42, 42, 42, 3, 4, 5, 6, 7, 8, 9])) def test_extended_set_del_slice(self): indices = (0, None, 1, 3, 19, 300, -1, -2, -31, -300) @@ -306,93 +335,96 @@ class BytesTest(unittest.TestCase): # Skip invalid step 0 for step in indices[1:]: L = list(range(255)) - b = bytes(L) + b = buffer(L) # Make sure we have a slice of exactly the right length, # but with different data. data = L[start:stop:step] data.reverse() L[start:stop:step] = data b[start:stop:step] = data - self.assertEquals(b, bytes(L)) + self.assertEquals(b, buffer(L)) del L[start:stop:step] del b[start:stop:step] - self.assertEquals(b, bytes(L)) + self.assertEquals(b, buffer(L)) def test_setslice_trap(self): # This test verifies that we correctly handle assigning self # to a slice of self (the old Lambert Meertens trap). 
- b = bytes(range(256)) + b = buffer(range(256)) b[8:] = b - self.assertEqual(b, bytes(list(range(8)) + list(range(256)))) + self.assertEqual(b, buffer(list(range(8)) + list(range(256)))) def test_encoding(self): sample = "Hello world\n\u1234\u5678\u9abc\udef0" for enc in ("utf8", "utf16"): - b = bytes(sample, enc) - self.assertEqual(b, bytes(sample.encode(enc))) - self.assertRaises(UnicodeEncodeError, bytes, sample, "latin1") - b = bytes(sample, "latin1", "ignore") - self.assertEqual(b, bytes(sample[:-4], "utf-8")) + b = buffer(sample, enc) + self.assertEqual(b, buffer(sample.encode(enc))) + self.assertRaises(UnicodeEncodeError, buffer, sample, "latin1") + b = buffer(sample, "latin1", "ignore") + self.assertEqual(b, buffer(sample[:-4], "utf-8")) def test_decode(self): sample = "Hello world\n\u1234\u5678\u9abc\def0\def0" for enc in ("utf8", "utf16"): - b = bytes(sample, enc) + b = buffer(sample, enc) self.assertEqual(b.decode(enc), sample) sample = "Hello world\n\x80\x81\xfe\xff" - b = bytes(sample, "latin1") + b = buffer(sample, "latin1") self.assertRaises(UnicodeDecodeError, b.decode, "utf8") self.assertEqual(b.decode("utf8", "ignore"), "Hello world\n") def test_from_buffer(self): - sample = str8(b"Hello world\n\x80\x81\xfe\xff") + sample = bytes(b"Hello world\n\x80\x81\xfe\xff") buf = memoryview(sample) - b = bytes(buf) - self.assertEqual(b, bytes(sample)) + b = buffer(buf) + self.assertEqual(b, buffer(sample)) def test_to_str(self): - sample = "Hello world\n\x80\x81\xfe\xff" - b = bytes(sample, "utf-8") - self.assertEqual(str(b), sample) + warnings.simplefilter('ignore', BytesWarning) + self.assertEqual(str(b''), "b''") + self.assertEqual(str(b'x'), "b'x'") + self.assertEqual(str(b'\x80'), "b'\\x80'") def test_from_int(self): - b = bytes(0) - self.assertEqual(b, bytes()) - b = bytes(10) - self.assertEqual(b, bytes([0]*10)) - b = bytes(10000) - self.assertEqual(b, bytes([0]*10000)) + b = buffer(0) + self.assertEqual(b, buffer()) + b = buffer(10) + 
self.assertEqual(b, buffer([0]*10)) + b = buffer(10000) + self.assertEqual(b, buffer([0]*10000)) def test_concat(self): b1 = b"abc" b2 = b"def" self.assertEqual(b1 + b2, b"abcdef") - self.assertEqual(b1 + str8(b"def"), b"abcdef") - self.assertEqual(str8(b"def") + b1, b"defabc") + self.assertEqual(b1 + bytes(b"def"), b"abcdef") + self.assertEqual(bytes(b"def") + b1, b"defabc") self.assertRaises(TypeError, lambda: b1 + "def") self.assertRaises(TypeError, lambda: "abc" + b2) def test_repeat(self): - b = b"abc" - self.assertEqual(b * 3, b"abcabcabc") - self.assertEqual(b * 0, bytes()) - self.assertEqual(b * -1, bytes()) - self.assertRaises(TypeError, lambda: b * 3.14) - self.assertRaises(TypeError, lambda: 3.14 * b) - self.assertRaises(MemoryError, lambda: b * sys.maxint) + for b in b"abc", buffer(b"abc"): + self.assertEqual(b * 3, b"abcabcabc") + self.assertEqual(b * 0, b"") + self.assertEqual(b * -1, b"") + self.assertRaises(TypeError, lambda: b * 3.14) + self.assertRaises(TypeError, lambda: 3.14 * b) + # XXX Shouldn't bytes and buffer agree on what to raise? 
+ self.assertRaises((OverflowError, MemoryError), + lambda: b * sys.maxint) def test_repeat_1char(self): - self.assertEqual(b'x'*100, bytes([ord('x')]*100)) + self.assertEqual(b'x'*100, buffer([ord('x')]*100)) def test_iconcat(self): - b = b"abc" + b = buffer(b"abc") b1 = b b += b"def" self.assertEqual(b, b"abcdef") self.assertEqual(b, b1) self.failUnless(b is b1) - b += str8(b"xyz") + b += b"xyz" self.assertEqual(b, b"abcdefxyz") try: b += "" @@ -402,7 +434,7 @@ class BytesTest(unittest.TestCase): self.fail("bytes += unicode didn't raise TypeError") def test_irepeat(self): - b = b"abc" + b = buffer(b"abc") b1 = b b *= 3 self.assertEqual(b, b"abcabcabc") @@ -410,38 +442,39 @@ class BytesTest(unittest.TestCase): self.failUnless(b is b1) def test_irepeat_1char(self): - b = b"x" + b = buffer(b"x") b1 = b b *= 100 - self.assertEqual(b, bytes([ord("x")]*100)) + self.assertEqual(b, b"x"*100) self.assertEqual(b, b1) self.failUnless(b is b1) def test_contains(self): - b = b"abc" - self.failUnless(ord('a') in b) - self.failUnless(int(ord('a')) in b) - self.failIf(200 in b) - self.failIf(200 in b) - self.assertRaises(ValueError, lambda: 300 in b) - self.assertRaises(ValueError, lambda: -1 in b) - self.assertRaises(TypeError, lambda: None in b) - self.assertRaises(TypeError, lambda: float(ord('a')) in b) - self.assertRaises(TypeError, lambda: "a" in b) - self.failUnless(b"" in b) - self.failUnless(b"a" in b) - self.failUnless(b"b" in b) - self.failUnless(b"c" in b) - self.failUnless(b"ab" in b) - self.failUnless(b"bc" in b) - self.failUnless(b"abc" in b) - self.failIf(b"ac" in b) - self.failIf(b"d" in b) - self.failIf(b"dab" in b) - self.failIf(b"abd" in b) + for b in b"abc", buffer(b"abc"): + self.failUnless(ord('a') in b) + self.failUnless(int(ord('a')) in b) + self.failIf(200 in b) + self.failIf(200 in b) + self.assertRaises(ValueError, lambda: 300 in b) + self.assertRaises(ValueError, lambda: -1 in b) + self.assertRaises(TypeError, lambda: None in b) + 
self.assertRaises(TypeError, lambda: float(ord('a')) in b) + self.assertRaises(TypeError, lambda: "a" in b) + for f in bytes, buffer: + self.failUnless(f(b"") in b) + self.failUnless(f(b"a") in b) + self.failUnless(f(b"b") in b) + self.failUnless(f(b"c") in b) + self.failUnless(f(b"ab") in b) + self.failUnless(f(b"bc") in b) + self.failUnless(f(b"abc") in b) + self.failIf(f(b"ac") in b) + self.failIf(f(b"d") in b) + self.failIf(f(b"dab") in b) + self.failIf(f(b"abd") in b) def test_alloc(self): - b = bytes() + b = buffer() alloc = b.__alloc__() self.assert_(alloc >= 0) seq = [alloc] @@ -453,23 +486,23 @@ class BytesTest(unittest.TestCase): seq.append(alloc) def test_fromhex(self): - self.assertRaises(TypeError, bytes.fromhex) - self.assertRaises(TypeError, bytes.fromhex, 1) - self.assertEquals(bytes.fromhex(''), bytes()) - b = bytes([0x1a, 0x2b, 0x30]) - self.assertEquals(bytes.fromhex('1a2B30'), b) - self.assertEquals(bytes.fromhex(' 1A 2B 30 '), b) - self.assertEquals(bytes.fromhex('0000'), b'\0\0') - self.assertRaises(TypeError, bytes.fromhex, b'1B') - self.assertRaises(ValueError, bytes.fromhex, 'a') - self.assertRaises(ValueError, bytes.fromhex, 'rt') - self.assertRaises(ValueError, bytes.fromhex, '1a b cd') - self.assertRaises(ValueError, bytes.fromhex, '\x00') - self.assertRaises(ValueError, bytes.fromhex, '12 \x00 34') + self.assertRaises(TypeError, buffer.fromhex) + self.assertRaises(TypeError, buffer.fromhex, 1) + self.assertEquals(buffer.fromhex(''), buffer()) + b = buffer([0x1a, 0x2b, 0x30]) + self.assertEquals(buffer.fromhex('1a2B30'), b) + self.assertEquals(buffer.fromhex(' 1A 2B 30 '), b) + self.assertEquals(buffer.fromhex('0000'), b'\0\0') + self.assertRaises(TypeError, buffer.fromhex, b'1B') + self.assertRaises(ValueError, buffer.fromhex, 'a') + self.assertRaises(ValueError, buffer.fromhex, 'rt') + self.assertRaises(ValueError, buffer.fromhex, '1a b cd') + self.assertRaises(ValueError, buffer.fromhex, '\x00') + self.assertRaises(ValueError, 
buffer.fromhex, '12 \x00 34') def test_join(self): - self.assertEqual(b"".join([]), bytes()) - self.assertEqual(b"".join([bytes()]), bytes()) + self.assertEqual(b"".join([]), b"") + self.assertEqual(b"".join([b""]), b"") for lst in [[b"abc"], [b"a", b"bc"], [b"ab", b"c"], [b"a", b"b", b"c"]]: self.assertEqual(b"".join(lst), b"abc") self.assertEqual(b"".join(tuple(lst)), b"abc") @@ -485,20 +518,20 @@ class BytesTest(unittest.TestCase): (br"\xaa\x00\000\200", r"\xaa\x00\000\200"), ] for b, s in tests: - self.assertEqual(b, bytes(s, 'latin-1')) + self.assertEqual(b, buffer(s, 'latin-1')) for c in range(128, 256): self.assertRaises(SyntaxError, eval, 'b"%s"' % chr(c)) def test_extend(self): orig = b'hello' - a = bytes(orig) + a = buffer(orig) a.extend(a) self.assertEqual(a, orig + orig) self.assertEqual(a[5:], orig) def test_remove(self): - b = b'hello' + b = buffer(b'hello') b.remove(ord('l')) self.assertEqual(b, b'helo') b.remove(ord('l')) @@ -513,15 +546,15 @@ class BytesTest(unittest.TestCase): self.assertRaises(TypeError, lambda: b.remove(b'e')) def test_pop(self): - b = b'world' + b = buffer(b'world') self.assertEqual(b.pop(), ord('d')) self.assertEqual(b.pop(0), ord('w')) self.assertEqual(b.pop(-2), ord('r')) self.assertRaises(IndexError, lambda: b.pop(10)) - self.assertRaises(OverflowError, lambda: bytes().pop()) + self.assertRaises(OverflowError, lambda: buffer().pop()) def test_nosort(self): - self.assertRaises(AttributeError, lambda: bytes().sort()) + self.assertRaises(AttributeError, lambda: buffer().sort()) def test_index(self): b = b'parrot' @@ -537,17 +570,17 @@ class BytesTest(unittest.TestCase): self.assertEqual(b.count(b'w'), 0) def test_append(self): - b = b'hell' + b = buffer(b'hell') b.append(ord('o')) self.assertEqual(b, b'hello') self.assertEqual(b.append(100), None) - b = bytes() + b = buffer() b.append(ord('A')) self.assertEqual(len(b), 1) self.assertRaises(TypeError, lambda: b.append(b'o')) def test_insert(self): - b = b'msssspp' + b = 
buffer(b'msssspp') b.insert(1, ord('i')) b.insert(4, ord('i')) b.insert(-2, ord('i')) @@ -557,7 +590,7 @@ class BytesTest(unittest.TestCase): def test_startswith(self): b = b'hello' - self.assertFalse(bytes().startswith(b"anything")) + self.assertFalse(buffer().startswith(b"anything")) self.assertTrue(b.startswith(b"hello")) self.assertTrue(b.startswith(b"hel")) self.assertTrue(b.startswith(b"h")) @@ -566,7 +599,7 @@ class BytesTest(unittest.TestCase): def test_endswith(self): b = b'hello' - self.assertFalse(bytes().endswith(b"anything")) + self.assertFalse(buffer().endswith(b"anything")) self.assertTrue(b.endswith(b"hello")) self.assertTrue(b.endswith(b"llo")) self.assertTrue(b.endswith(b"o")) @@ -612,7 +645,7 @@ class BytesTest(unittest.TestCase): def test_translate(self): b = b'hello' - rosetta = bytes(range(0, 256)) + rosetta = buffer(range(0, 256)) rosetta[ord('o')] = ord('e') c = b.translate(rosetta, b'l') self.assertEqual(b, b'hello') @@ -658,10 +691,10 @@ class BytesTest(unittest.TestCase): self.assertEqual(b' a bb c '.rsplit(None,2), [b' a', b'bb', b'c']) self.assertEqual(b' a bb c '.rsplit(None, 3), [b'a', b'bb', b'c']) - def test_rplit_buffer(self): + def test_rsplit_buffer(self): self.assertEqual(b'a b'.rsplit(memoryview(b' ')), [b'a', b'b']) - def test_rplit_string_error(self): + def test_rsplit_string_error(self): self.assertRaises(TypeError, b'a b'.rsplit, ' ') def test_partition(self): @@ -727,6 +760,28 @@ class BytesTest(unittest.TestCase): self.assertEqual([ord(b[i:i+1]) for i in range(len(b))], [0, 65, 127, 128, 255]) + def test_partition_buffer_doesnt_share_nullstring(self): + a, b, c = buffer(b"x").partition(b"y") + self.assertEqual(b, b"") + self.assertEqual(c, b"") + self.assert_(b is not c) + b += b"!" 
+ self.assertEqual(c, b"") + a, b, c = buffer(b"x").partition(b"y") + self.assertEqual(b, b"") + self.assertEqual(c, b"") + # Same for rpartition + b, c, a = buffer(b"x").rpartition(b"y") + self.assertEqual(b, b"") + self.assertEqual(c, b"") + self.assert_(b is not c) + b += b"!" + self.assertEqual(c, b"") + c, b, a = buffer(b"x").rpartition(b"y") + self.assertEqual(b, b"") + self.assertEqual(c, b"") + + # Optimizations: # __iter__? (optimization) # __reversed__? (optimization) @@ -745,7 +800,7 @@ class BytesTest(unittest.TestCase): class BufferPEP3137Test(unittest.TestCase, test.buffer_tests.MixinBytesBufferCommonTests): def marshal(self, x): - return bytes(x) + return buffer(x) # TODO this should become: #return buffer(x) # once the bytes -> buffer and str8 -> bytes rename happens @@ -763,7 +818,7 @@ class BufferPEP3137Test(unittest.TestCase, class BytesAsStringTest(test.string_tests.BaseTest): - type2test = bytes + type2test = buffer def fixtype(self, obj): if isinstance(obj, str): @@ -783,17 +838,17 @@ class BytesAsStringTest(test.string_tests.BaseTest): pass -class BytesSubclass(bytes): +class BufferSubclass(buffer): pass -class BytesSubclassTest(unittest.TestCase): +class BufferSubclassTest(unittest.TestCase): def test_basic(self): - self.assert_(issubclass(BytesSubclass, bytes)) - self.assert_(isinstance(BytesSubclass(), bytes)) + self.assert_(issubclass(BufferSubclass, buffer)) + self.assert_(isinstance(BufferSubclass(), buffer)) a, b = b"abcd", b"efgh" - _a, _b = BytesSubclass(a), BytesSubclass(b) + _a, _b = BufferSubclass(a), BufferSubclass(b) # test comparison operators with subclass instances self.assert_(_a == _a) @@ -816,19 +871,19 @@ class BytesSubclassTest(unittest.TestCase): # Make sure join returns a NEW object for single item sequences # involving a subclass. # Make sure that it is of the appropriate type. 
-        s1 = BytesSubclass(b"abcd")
-        s2 = b"".join([s1])
+        s1 = BufferSubclass(b"abcd")
+        s2 = buffer().join([s1])
         self.assert_(s1 is not s2)
-        self.assert_(type(s2) is bytes)
+        self.assert_(type(s2) is buffer, type(s2))
 
         # Test reverse, calling join on subclass
         s3 = s1.join([b"abcd"])
-        self.assert_(type(s3) is bytes)
+        self.assert_(type(s3) is buffer)
 
     def test_pickle(self):
-        a = BytesSubclass(b"abcd")
+        a = BufferSubclass(b"abcd")
         a.x = 10
-        a.y = BytesSubclass(b"efgh")
+        a.y = BufferSubclass(b"efgh")
         for proto in range(pickle.HIGHEST_PROTOCOL):
             b = pickle.loads(pickle.dumps(a, proto))
             self.assertNotEqual(id(a), id(b))
@@ -839,9 +894,9 @@ class BytesSubclassTest(unittest.TestCase):
             self.assertEqual(type(a.y), type(b.y))
 
     def test_copy(self):
-        a = BytesSubclass(b"abcd")
+        a = BufferSubclass(b"abcd")
         a.x = 10
-        a.y = BytesSubclass(b"efgh")
+        a.y = BufferSubclass(b"efgh")
         for copy_method in (copy.copy, copy.deepcopy):
             b = copy_method(a)
             self.assertNotEqual(id(a), id(b))
@@ -852,9 +907,9 @@ class BytesSubclassTest(unittest.TestCase):
             self.assertEqual(type(a.y), type(b.y))
 
     def test_init_override(self):
-        class subclass(bytes):
+        class subclass(buffer):
             def __init__(self, newarg=1, *args, **kwargs):
-                bytes.__init__(self, *args, **kwargs)
+                buffer.__init__(self, *args, **kwargs)
         x = subclass(4, source=b"abcd")
         self.assertEqual(x, b"abcd")
         x = subclass(newarg=4, source=b"abcd")
@@ -864,7 +919,7 @@ class BytesSubclassTest(unittest.TestCase):
 def test_main():
     test.test_support.run_unittest(BytesTest)
     test.test_support.run_unittest(BytesAsStringTest)
-    test.test_support.run_unittest(BytesSubclassTest)
+    test.test_support.run_unittest(BufferSubclassTest)
     test.test_support.run_unittest(BufferPEP3137Test)
 
 if __name__ == "__main__":
diff --git a/Lib/test/test_bz2.py b/Lib/test/test_bz2.py
index 2233f84..39bf19c 100644
--- a/Lib/test/test_bz2.py
+++ b/Lib/test/test_bz2.py
@@ -160,12 +160,12 @@ class BZ2FileTest(BaseTest):
 
     def testWriteMethodsOnReadOnlyFile(self):
         bz2f = BZ2File(self.filename, "w")
-        bz2f.write("abc")
+        bz2f.write(b"abc")
         bz2f.close()
 
         bz2f = BZ2File(self.filename, "r")
-        self.assertRaises(IOError, bz2f.write, "a")
-        self.assertRaises(IOError, bz2f.writelines, ["a"])
+        self.assertRaises(IOError, bz2f.write, b"a")
+        self.assertRaises(IOError, bz2f.writelines, [b"a"])
 
     def testSeekForward(self):
         # "Test BZ2File.seek(150, 0)"
@@ -307,7 +307,7 @@ class BZ2DecompressorTest(BaseTest):
         # "Calling BZ2Decompressor.decompress() after EOS must raise EOFError"
         bz2d = BZ2Decompressor()
         text = bz2d.decompress(self.DATA)
-        self.assertRaises(EOFError, bz2d.decompress, "anything")
+        self.assertRaises(EOFError, bz2d.decompress, b"anything")
 
 
 class FuncTest(BaseTest):
diff --git a/Lib/test/test_codeccallbacks.py b/Lib/test/test_codeccallbacks.py
index 9cf43a5..218bfc5 100644
--- a/Lib/test/test_codeccallbacks.py
+++ b/Lib/test/test_codeccallbacks.py
@@ -33,13 +33,13 @@ class BadObjectUnicodeEncodeError(UnicodeEncodeError):
 # A UnicodeDecodeError object without an end attribute
 class NoEndUnicodeDecodeError(UnicodeDecodeError):
     def __init__(self):
-        UnicodeDecodeError.__init__(self, "ascii", b"", 0, 1, "bad")
+        UnicodeDecodeError.__init__(self, "ascii", buffer(b""), 0, 1, "bad")
         del self.end
 
 # A UnicodeDecodeError object with a bad object attribute
 class BadObjectUnicodeDecodeError(UnicodeDecodeError):
     def __init__(self):
-        UnicodeDecodeError.__init__(self, "ascii", b"", 0, 1, "bad")
+        UnicodeDecodeError.__init__(self, "ascii", buffer(b""), 0, 1, "bad")
         self.object = []
 
 # A UnicodeTranslateError object without a start attribute
@@ -181,7 +181,7 @@ class CodecCallbackTest(unittest.TestCase):
         # mapped through the encoding again. This means, that
         # to be able to use e.g. the "replace" handler, the
         # charmap has to have a mapping for "?".
-        charmap = dict((ord(c), str8(2*c.upper(), 'ascii')) for c in "abcdefgh")
+        charmap = dict((ord(c), bytes(2*c.upper(), 'ascii')) for c in "abcdefgh")
         sin = "abc"
         sout = b"AABBCC"
         self.assertEquals(codecs.charmap_encode(sin, "strict", charmap)[0], sout)
@@ -189,7 +189,7 @@ class CodecCallbackTest(unittest.TestCase):
         sin = "abcA"
         self.assertRaises(UnicodeError, codecs.charmap_encode, sin, "strict", charmap)
 
-        charmap[ord("?")] = str8(b"XYZ")
+        charmap[ord("?")] = b"XYZ"
         sin = "abcDEF"
         sout = b"AABBCCXYZXYZXYZ"
         self.assertEquals(codecs.charmap_encode(sin, "replace", charmap)[0], sout)
@@ -309,7 +309,7 @@ class CodecCallbackTest(unittest.TestCase):
         # check with one argument too much
         self.assertRaises(TypeError, exctype, *(args + ["too much"]))
         # check with one argument of the wrong type
-        wrongargs = [ "spam", str8(b"eggs"), b"spam", 42, 1.0, None ]
+        wrongargs = [ "spam", b"eggs", b"spam", 42, 1.0, None ]
         for i in range(len(args)):
             for wrongarg in wrongargs:
                 if type(wrongarg) is type(args[i]):
@@ -363,12 +363,12 @@ class CodecCallbackTest(unittest.TestCase):
     def test_unicodedecodeerror(self):
         self.check_exceptionobjectargs(
             UnicodeDecodeError,
-            ["ascii", b"g\xfcrk", 1, 2, "ouch"],
+            ["ascii", buffer(b"g\xfcrk"), 1, 2, "ouch"],
             "'ascii' codec can't decode byte 0xfc in position 1: ouch"
         )
         self.check_exceptionobjectargs(
             UnicodeDecodeError,
-            ["ascii", b"g\xfcrk", 1, 3, "ouch"],
+            ["ascii", buffer(b"g\xfcrk"), 1, 3, "ouch"],
             "'ascii' codec can't decode bytes in position 1-2: ouch"
         )
 
@@ -442,7 +442,7 @@ class CodecCallbackTest(unittest.TestCase):
         )
         self.assertEquals(
             codecs.ignore_errors(
-                UnicodeDecodeError("ascii", b"\xff", 0, 1, "ouch")),
+                UnicodeDecodeError("ascii", buffer(b"\xff"), 0, 1, "ouch")),
             ("", 1)
         )
         self.assertEquals(
@@ -482,7 +482,7 @@ class CodecCallbackTest(unittest.TestCase):
         )
         self.assertEquals(
             codecs.replace_errors(
-                UnicodeDecodeError("ascii", b"\xff", 0, 1, "ouch")),
+                UnicodeDecodeError("ascii", buffer(b"\xff"), 0, 1, "ouch")),
             ("\ufffd", 1)
         )
         self.assertEquals(
@@ -508,7 +508,7 @@ class CodecCallbackTest(unittest.TestCase):
         self.assertRaises(
             TypeError,
             codecs.xmlcharrefreplace_errors,
-            UnicodeDecodeError("ascii", b"\xff", 0, 1, "ouch")
+            UnicodeDecodeError("ascii", buffer(b"\xff"), 0, 1, "ouch")
         )
         self.assertRaises(
             TypeError,
@@ -542,7 +542,7 @@ class CodecCallbackTest(unittest.TestCase):
         self.assertRaises(
             TypeError,
             codecs.backslashreplace_errors,
-            UnicodeDecodeError("ascii", b"\xff", 0, 1, "ouch")
+            UnicodeDecodeError("ascii", buffer(b"\xff"), 0, 1, "ouch")
         )
         self.assertRaises(
             TypeError,
diff --git a/Lib/test/test_codecs.py b/Lib/test/test_codecs.py
index 22db2ca..5833c6d 100644
--- a/Lib/test/test_codecs.py
+++ b/Lib/test/test_codecs.py
@@ -802,7 +802,7 @@ class UnicodeInternalTest(unittest.TestCase):
         if sys.maxunicode > 0xffff:
             codecs.register_error("UnicodeInternalTest", codecs.ignore_errors)
             decoder = codecs.getdecoder("unicode_internal")
-            ab = "ab".encode("unicode_internal")
+            ab = "ab".encode("unicode_internal").decode()
             ignored = decoder(bytes("%s\x22\x22\x22\x22%s" % (ab[:4], ab[4:]),
                                     "ascii"),
                               "UnicodeInternalTest")
@@ -1265,7 +1265,9 @@ class BasicUnicodeTest(unittest.TestCase, MixInCheckStateHandling):
                 encodedresult = b""
                 for c in s:
                     writer.write(c)
-                    encodedresult += q.read()
+                    chunk = q.read()
+                    self.assert_(type(chunk) is bytes, type(chunk))
+                    encodedresult += chunk
                 q = Queue(b"")
                 reader = codecs.getreader(encoding)(q)
                 decodedresult = ""
diff --git a/Lib/test/test_collections.py b/Lib/test/test_collections.py
index f92eb6f..86b47de 100644
--- a/Lib/test/test_collections.py
+++ b/Lib/test/test_collections.py
@@ -91,7 +91,7 @@ class TestOneTrickPonyABCs(unittest.TestCase):
 
     def test_Hashable(self):
         # Check some non-hashables
-        non_samples = [bytes(), list(), set(), dict()]
+        non_samples = [buffer(), list(), set(), dict()]
         for x in non_samples:
             self.failIf(isinstance(x, Hashable), repr(x))
             self.failIf(issubclass(type(x), Hashable), repr(type(x)))
@@ -100,7 +100,7 @@ class TestOneTrickPonyABCs(unittest.TestCase):
                    int(), float(), complex(),
                    str(),
                    tuple(), frozenset(),
-                   int, list, object, type,
+                   int, list, object, type, bytes()
                    ]
         for x in samples:
             self.failUnless(isinstance(x, Hashable), repr(x))
diff --git a/Lib/test/test_compile.py b/Lib/test/test_compile.py
index 1d54953..e7ffc02 100644
--- a/Lib/test/test_compile.py
+++ b/Lib/test/test_compile.py
@@ -157,7 +157,7 @@ if 1:
         s256 = "".join(["\n"] * 256 + ["spam"])
         co = compile(s256, 'fn', 'exec')
         self.assertEqual(co.co_firstlineno, 257)
-        self.assertEqual(co.co_lnotab, str8())
+        self.assertEqual(co.co_lnotab, bytes())
 
     def test_literals_with_leading_zeroes(self):
         for arg in ["077787", "0xj", "0x.", "0e", "090000000000000",
diff --git a/Lib/test/test_datetime.py b/Lib/test/test_datetime.py
index b1a1b38..6e5b990 100644
--- a/Lib/test/test_datetime.py
+++ b/Lib/test/test_datetime.py
@@ -1111,7 +1111,7 @@ class TestDate(HarmlessMixedComparison, unittest.TestCase):
             # This shouldn't blow up because of the month byte alone. If
             # the implementation changes to do more-careful checking, it may
             # blow up because other fields are insane.
-            self.theclass(bytes(base[:2] + chr(ord_byte) + base[3:], "ascii"))
+            self.theclass(buffer(base[:2] + chr(ord_byte) + base[3:], "ascii"))
 
 #############################################################################
 # datetime tests
diff --git a/Lib/test/test_descr.py b/Lib/test/test_descr.py
index 67ae239..961369f 100644
--- a/Lib/test/test_descr.py
+++ b/Lib/test/test_descr.py
@@ -3145,7 +3145,7 @@ def str_of_str_subclass():
 
     class octetstring(str):
         def __str__(self):
-            return str(binascii.b2a_hex(self))
+            return binascii.b2a_hex(self).decode("ascii")
         def __repr__(self):
             return self + " repr"
 
diff --git a/Lib/test/test_dumbdbm.py b/Lib/test/test_dumbdbm.py
index 3553c19..9a2cb68 100644
--- a/Lib/test/test_dumbdbm.py
+++ b/Lib/test/test_dumbdbm.py
@@ -115,7 +115,7 @@ class DumbDBMTestCase(unittest.TestCase):
 
         # Mangle the file by changing the line separator to Windows or Unix
         data = io.open(_fname + '.dir', 'rb').read()
-        if os.linesep == b'\n':
+        if os.linesep == '\n':
             data = data.replace(b'\n', b'\r\n')
         else:
             data = data.replace(b'\r\n', b'\n')
diff --git a/Lib/test/test_exceptions.py b/Lib/test/test_exceptions.py
index d2a2191..c405ac9 100644
--- a/Lib/test/test_exceptions.py
+++ b/Lib/test/test_exceptions.py
@@ -253,6 +253,12 @@ class ExceptionTests(unittest.TestCase):
                                            'ordinal not in range'),
                  'encoding' : 'ascii', 'object' : 'a',
                  'start' : 0, 'reason' : 'ordinal not in range'}),
+            (UnicodeDecodeError, ('ascii', buffer(b'\xff'), 0, 1,
+                                  'ordinal not in range'),
+                {'args' : ('ascii', buffer(b'\xff'), 0, 1,
+                                           'ordinal not in range'),
+                 'encoding' : 'ascii', 'object' : b'\xff',
+                 'start' : 0, 'reason' : 'ordinal not in range'}),
             (UnicodeDecodeError, ('ascii', b'\xff', 0, 1,
                                   'ordinal not in range'),
                 {'args' : ('ascii', b'\xff', 0, 1,
@@ -278,7 +284,7 @@ class ExceptionTests(unittest.TestCase):
             try:
                 e = exc(*args)
             except:
-                print("\nexc=%r, args=%r" % (exc, args))
+                print("\nexc=%r, args=%r" % (exc, args), file=sys.stderr)
                 raise
             else:
                 # Verify module name
diff --git a/Lib/test/test_float.py b/Lib/test/test_float.py
index 4360c54..ca5e537 100644
--- a/Lib/test/test_float.py
+++ b/Lib/test/test_float.py
@@ -40,14 +40,14 @@ class FormatFunctionsTestCase(unittest.TestCase):
                           'chicken', 'unknown')
 
 BE_DOUBLE_INF = b'\x7f\xf0\x00\x00\x00\x00\x00\x00'
-LE_DOUBLE_INF = bytes(reversed(BE_DOUBLE_INF))
+LE_DOUBLE_INF = bytes(reversed(buffer(BE_DOUBLE_INF)))
 BE_DOUBLE_NAN = b'\x7f\xf8\x00\x00\x00\x00\x00\x00'
-LE_DOUBLE_NAN = bytes(reversed(BE_DOUBLE_NAN))
+LE_DOUBLE_NAN = bytes(reversed(buffer(BE_DOUBLE_NAN)))
 
 BE_FLOAT_INF = b'\x7f\x80\x00\x00'
-LE_FLOAT_INF = bytes(reversed(BE_FLOAT_INF))
+LE_FLOAT_INF = bytes(reversed(buffer(BE_FLOAT_INF)))
 BE_FLOAT_NAN = b'\x7f\xc0\x00\x00'
-LE_FLOAT_NAN = bytes(reversed(BE_FLOAT_NAN))
+LE_FLOAT_NAN = bytes(reversed(buffer(BE_FLOAT_NAN)))
 
 # on non-IEEE platforms, attempting to unpack a bit pattern
 # representing an infinity or a NaN should raise an exception.
diff --git a/Lib/test/test_httplib.py b/Lib/test/test_httplib.py
index 2216a99..7e5a8a5 100644
--- a/Lib/test/test_httplib.py
+++ b/Lib/test/test_httplib.py
@@ -157,7 +157,8 @@ class BasicTest(TestCase):
         sock = FakeSocket(body)
         conn.sock = sock
         conn.request('GET', '/foo', body)
-        self.assertTrue(sock.data.startswith(expected))
+        self.assertTrue(sock.data.startswith(expected), '%r != %r' %
+                        (sock.data[:len(expected)], expected))
 
 class OfflineTest(TestCase):
     def test_responses(self):
diff --git a/Lib/test/test_io.py b/Lib/test/test_io.py
index 9d4163e..6091e89 100644
--- a/Lib/test/test_io.py
+++ b/Lib/test/test_io.py
@@ -88,7 +88,7 @@ class IOTest(unittest.TestCase):
         self.assertEqual(f.tell(), 6)
         self.assertEqual(f.seek(-1, 1), 5)
         self.assertEqual(f.tell(), 5)
-        self.assertEqual(f.write(str8(b" world\n\n\n")), 9)
+        self.assertEqual(f.write(buffer(b" world\n\n\n")), 9)
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.write(b"h"), 1)
         self.assertEqual(f.seek(-1, 2), 13)
@@ -99,6 +99,7 @@ class IOTest(unittest.TestCase):
     def read_ops(self, f, buffered=False):
         data = f.read(5)
         self.assertEqual(data, b"hello")
+        data = buffer(data)
         self.assertEqual(f.readinto(data), 5)
         self.assertEqual(data, b" worl")
         self.assertEqual(f.readinto(data), 2)
@@ -107,11 +108,11 @@ class IOTest(unittest.TestCase):
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.read(20), b"hello world\n")
         self.assertEqual(f.read(1), b"")
-        self.assertEqual(f.readinto(b"x"), 0)
+        self.assertEqual(f.readinto(buffer(b"x")), 0)
         self.assertEqual(f.seek(-6, 2), 6)
         self.assertEqual(f.read(5), b"world")
         self.assertEqual(f.read(0), b"")
-        self.assertEqual(f.readinto(b""), 0)
+        self.assertEqual(f.readinto(buffer()), 0)
         self.assertEqual(f.seek(-6, 1), 5)
         self.assertEqual(f.read(5), b" worl")
         self.assertEqual(f.tell(), 10)
@@ -687,7 +688,7 @@ class TextIOWrapperTest(unittest.TestCase):
         f.close()
         f = io.open(test_support.TESTFN, "r", encoding="utf-8")
         s = f.read(prefix_size)
-        self.assertEquals(s, str(prefix))
+        self.assertEquals(s, str(prefix, "ascii"))
         self.assertEquals(f.tell(), prefix_size)
         self.assertEquals(f.readline(), u_suffix)
 
diff --git a/Lib/test/test_mailbox.py b/Lib/test/test_mailbox.py
index 4345be7..a714255 100644
--- a/Lib/test/test_mailbox.py
+++ b/Lib/test/test_mailbox.py
@@ -168,9 +168,11 @@ class TestMailbox(TestBase):
         # Get file representations of messages
         key0 = self._box.add(self._template % 0)
         key1 = self._box.add(_sample_message)
-        self.assertEqual(self._box.get_file(key0).read().replace(os.linesep, '\n'),
+        data0 = self._box.get_file(key0).read()
+        data1 = self._box.get_file(key1).read()
+        self.assertEqual(data0.replace(os.linesep, '\n'),
                          self._template % 0)
-        self.assertEqual(self._box.get_file(key1).read().replace(os.linesep, '\n'),
+        self.assertEqual(data1.replace(os.linesep, '\n'),
                          _sample_message)
 
     def test_iterkeys(self):
@@ -1488,69 +1490,73 @@ class TestProxyFileBase(TestBase):
     def _test_read(self, proxy):
         # Read by byte
         proxy.seek(0)
-        self.assertEqual(proxy.read(), 'bar')
+        self.assertEqual(proxy.read(), b'bar')
         proxy.seek(1)
-        self.assertEqual(proxy.read(), 'ar')
+        self.assertEqual(proxy.read(), b'ar')
         proxy.seek(0)
-        self.assertEqual(proxy.read(2), 'ba')
+        self.assertEqual(proxy.read(2), b'ba')
         proxy.seek(1)
-        self.assertEqual(proxy.read(-1), 'ar')
+        self.assertEqual(proxy.read(-1), b'ar')
         proxy.seek(2)
-        self.assertEqual(proxy.read(1000), 'r')
+        self.assertEqual(proxy.read(1000), b'r')
 
     def _test_readline(self, proxy):
         # Read by line
+        linesep = os.linesep.encode()
         proxy.seek(0)
-        self.assertEqual(proxy.readline(), 'foo' + os.linesep)
-        self.assertEqual(proxy.readline(), 'bar' + os.linesep)
-        self.assertEqual(proxy.readline(), 'fred' + os.linesep)
-        self.assertEqual(proxy.readline(), 'bob')
+        self.assertEqual(proxy.readline(), b'foo' + linesep)
+        self.assertEqual(proxy.readline(), b'bar' + linesep)
+        self.assertEqual(proxy.readline(), b'fred' + linesep)
+        self.assertEqual(proxy.readline(), b'bob')
         proxy.seek(2)
-        self.assertEqual(proxy.readline(), 'o' + os.linesep)
+        self.assertEqual(proxy.readline(), b'o' + linesep)
         proxy.seek(6 + 2 * len(os.linesep))
-        self.assertEqual(proxy.readline(), 'fred' + os.linesep)
+        self.assertEqual(proxy.readline(), b'fred' + linesep)
         proxy.seek(6 + 2 * len(os.linesep))
-        self.assertEqual(proxy.readline(2), 'fr')
-        self.assertEqual(proxy.readline(-10), 'ed' + os.linesep)
+        self.assertEqual(proxy.readline(2), b'fr')
+        self.assertEqual(proxy.readline(-10), b'ed' + linesep)
 
     def _test_readlines(self, proxy):
         # Read multiple lines
+        linesep = os.linesep.encode()
         proxy.seek(0)
-        self.assertEqual(proxy.readlines(), ['foo' + os.linesep,
-                                             'bar' + os.linesep,
-                                             'fred' + os.linesep, 'bob'])
+        self.assertEqual(proxy.readlines(), [b'foo' + linesep,
+                                             b'bar' + linesep,
+                                             b'fred' + linesep, b'bob'])
         proxy.seek(0)
-        self.assertEqual(proxy.readlines(2), ['foo' + os.linesep])
-        proxy.seek(3 + len(os.linesep))
-        self.assertEqual(proxy.readlines(4 + len(os.linesep)),
-                         ['bar' + os.linesep, 'fred' + os.linesep])
+        self.assertEqual(proxy.readlines(2), [b'foo' + linesep])
+        proxy.seek(3 + len(linesep))
+        self.assertEqual(proxy.readlines(4 + len(linesep)),
+                         [b'bar' + linesep, b'fred' + linesep])
         proxy.seek(3)
-        self.assertEqual(proxy.readlines(1000), [os.linesep, 'bar' + os.linesep,
-                                                 'fred' + os.linesep, 'bob'])
+        self.assertEqual(proxy.readlines(1000), [linesep, b'bar' + linesep,
+                                                 b'fred' + linesep, b'bob'])
 
     def _test_iteration(self, proxy):
         # Iterate by line
+        linesep = os.linesep.encode()
         proxy.seek(0)
         iterator = iter(proxy)
-        self.assertEqual(next(iterator), 'foo' + os.linesep)
-        self.assertEqual(next(iterator), 'bar' + os.linesep)
-        self.assertEqual(next(iterator), 'fred' + os.linesep)
-        self.assertEqual(next(iterator), 'bob')
+        self.assertEqual(next(iterator), b'foo' + linesep)
+        self.assertEqual(next(iterator), b'bar' + linesep)
+        self.assertEqual(next(iterator), b'fred' + linesep)
+        self.assertEqual(next(iterator), b'bob')
         self.assertRaises(StopIteration, next, iterator)
 
     def _test_seek_and_tell(self, proxy):
         # Seek and use tell to check position
+        linesep = os.linesep.encode()
         proxy.seek(3)
         self.assertEqual(proxy.tell(), 3)
-        self.assertEqual(proxy.read(len(os.linesep)), os.linesep)
+        self.assertEqual(proxy.read(len(linesep)), linesep)
         proxy.seek(2, 1)
-        self.assertEqual(proxy.read(1 + len(os.linesep)), 'r' + os.linesep)
-        proxy.seek(-3 - len(os.linesep), 2)
-        self.assertEqual(proxy.read(3), 'bar')
+        self.assertEqual(proxy.read(1 + len(linesep)), b'r' + linesep)
+        proxy.seek(-3 - len(linesep), 2)
+        self.assertEqual(proxy.read(3), b'bar')
         proxy.seek(2, 0)
-        self.assertEqual(proxy.read(), 'o' + os.linesep + 'bar' + os.linesep)
+        self.assertEqual(proxy.read(), b'o' + linesep + b'bar' + linesep)
         proxy.seek(100)
-        self.assertEqual(proxy.read(), '')
+        self.failIf(proxy.read())
 
     def _test_close(self, proxy):
         # Close a file
diff --git a/Lib/test/test_marshal.py b/Lib/test/test_marshal.py
index 3e44886..1e3520f 100644
--- a/Lib/test/test_marshal.py
+++ b/Lib/test/test_marshal.py
@@ -39,7 +39,7 @@ class IntTestCase(unittest.TestCase, HelperMixin):
         # we're running the test on a 32-bit box, of course.
 
         def to_little_endian_string(value, nbytes):
-            b = bytes()
+            b = buffer()
             for i in range(nbytes):
                 b.append(value & 0xff)
                 value >>= 8
diff --git a/Lib/test/test_mmap.py b/Lib/test/test_mmap.py
index 974fde2..3d30109 100644
--- a/Lib/test/test_mmap.py
+++ b/Lib/test/test_mmap.py
@@ -39,15 +39,15 @@ class MmapTests(unittest.TestCase):
 
         self.assertEqual(len(m), 2*PAGESIZE)
 
-        self.assertEqual(m[0], b'\0')
+        self.assertEqual(m[0], 0)
         self.assertEqual(m[0:3], b'\0\0\0')
 
         # Modify the file's content
-        m[0] = b'3'
+        m[0] = b'3'[0]
         m[PAGESIZE +3: PAGESIZE +3+3] = b'bar'
 
         # Check that the modification worked
-        self.assertEqual(m[0], b'3')
+        self.assertEqual(m[0], b'3'[0])
         self.assertEqual(m[0:3], b'3\0\0')
         self.assertEqual(m[PAGESIZE-1 : PAGESIZE + 7], b'\0foobar\0')
 
@@ -297,11 +297,11 @@ class MmapTests(unittest.TestCase):
         # anonymous mmap.mmap(-1, PAGE)
         m = mmap.mmap(-1, PAGESIZE)
         for x in range(PAGESIZE):
-            self.assertEqual(m[x], b'\0', "anonymously mmap'ed contents should be zero")
+            self.assertEqual(m[x], 0,
+                             "anonymously mmap'ed contents should be zero")
 
-        b = bytes(1)
         for x in range(PAGESIZE):
-            b[0] = x & 255
+            b = x & 0xff
             m[x] = b
             self.assertEqual(m[x], b)
 
diff --git a/Lib/test/test_multibytecodec_support.py b/Lib/test/test_multibytecodec_support.py
index e51be04..957b9fc 100644
--- a/Lib/test/test_multibytecodec_support.py
+++ b/Lib/test/test_multibytecodec_support.py
@@ -52,6 +52,10 @@ class TestBase:
                 func = self.encode
             if expected:
                 result = func(source, scheme)[0]
+                if func is self.decode:
+                    self.assert_(type(result) is str, type(result))
+                else:
+                    self.assert_(type(result) is bytes, type(result))
                 self.assertEqual(result, expected)
             else:
                 self.assertRaises(UnicodeError, func, source, scheme)
diff --git a/Lib/test/test_pickle.py b/Lib/test/test_pickle.py
index 11254f4..f54381f 100644
--- a/Lib/test/test_pickle.py
+++ b/Lib/test/test_pickle.py
@@ -10,6 +10,9 @@ from test.pickletester import AbstractPersistentPicklerTests
 
 class PickleTests(AbstractPickleTests, AbstractPickleModuleTests):
 
+    module = pickle
+    error = KeyError
+
     def dumps(self, arg, proto=0, fast=0):
         # Ignore fast
         return pickle.dumps(arg, proto)
@@ -18,9 +21,6 @@ class PickleTests(AbstractPickleTests, AbstractPickleModuleTests):
         # Ignore fast
         return pickle.loads(buf)
 
-    module = pickle
-    error = KeyError
-
 class PicklerTests(AbstractPickleTests):
 
     error = KeyError
diff --git a/Lib/test/test_posix.py b/Lib/test/test_posix.py
index 3569453..efd5fb0 100644
--- a/Lib/test/test_posix.py
+++ b/Lib/test/test_posix.py
@@ -193,6 +193,11 @@ class PosixTester(unittest.TestCase):
             if hasattr(st, 'st_flags'):
                 posix.lchflags(test_support.TESTFN, st.st_flags)
 
+    def test_environ(self):
+        for k, v in posix.environ.items():
+            self.assertEqual(type(k), str)
+            self.assertEqual(type(v), str)
+
 def test_main():
     test_support.run_unittest(PosixTester)
 
diff --git a/Lib/test/test_struct.py b/Lib/test/test_struct.py
index d2b5643..db6a97d 100644
--- a/Lib/test/test_struct.py
+++ b/Lib/test/test_struct.py
@@ -96,12 +96,12 @@ simple_err(struct.pack, 'iii', 3)
 simple_err(struct.pack, 'i', 3, 3, 3)
 simple_err(struct.pack, 'i', 'foo')
 simple_err(struct.pack, 'P', 'foo')
-simple_err(struct.unpack, 'd', 'flap')
+simple_err(struct.unpack, 'd', b'flap')
 s = struct.pack('ii', 1, 2)
 simple_err(struct.unpack, 'iii', s)
 simple_err(struct.unpack, 'i', s)
 
-c = str8(b'a')
+c = b'a'
 b = 1
 h = 255
 i = 65535
@@ -184,9 +184,9 @@ for fmt, arg, big, lil, asy in tests:
                 xfmt, n, len(res)))
         rev = struct.unpack(xfmt, res)[0]
         if isinstance(arg, str):
-            # Strings are returned as str8 since you can't know the encoding of
+            # Strings are returned as bytes since you can't know the encoding of
             # the string when packed.
-            arg = str8(arg, 'latin1')
+            arg = bytes(arg, 'latin1')
         if rev != arg and not asy:
             raise TestFailed("unpack(%r, %r) -> (%r,) # expected (%r,)" % (
                 fmt, res, rev, arg))
@@ -428,14 +428,14 @@ for args in [("bB", 1),
 
 def test_p_code():
     for code, input, expected, expectedback in [
-            ('p','abc', '\x00', str8()),
-            ('1p', 'abc', '\x00', str8()),
-            ('2p', 'abc', '\x01a', str8(b'a')),
-            ('3p', 'abc', '\x02ab', str8(b'ab')),
-            ('4p', 'abc', '\x03abc', str8(b'abc')),
-            ('5p', 'abc', '\x03abc\x00', str8(b'abc')),
-            ('6p', 'abc', '\x03abc\x00\x00', str8(b'abc')),
-            ('1000p', 'x'*1000, '\xff' + 'x'*999, str8(b'x'*255))]:
+            ('p','abc', '\x00', b''),
+            ('1p', 'abc', '\x00', b''),
+            ('2p', 'abc', '\x01a', b'a'),
+            ('3p', 'abc', '\x02ab', b'ab'),
+            ('4p', 'abc', '\x03abc', b'abc'),
+            ('5p', 'abc', '\x03abc\x00', b'abc'),
+            ('6p', 'abc', '\x03abc\x00\x00', b'abc'),
+            ('1000p', 'x'*1000, '\xff' + 'x'*999, b'x'*255)]:
         expected = bytes(expected, "latin-1")
         got = struct.pack(code, input)
         if got != expected:
@@ -560,26 +560,26 @@ def test_unpack_from():
     test_string = b'abcd01234'
     fmt = '4s'
     s = struct.Struct(fmt)
-    for cls in (str, str8, bytes): # XXX + memoryview
+    for cls in (buffer, bytes):
         if verbose:
             print("test_unpack_from using", cls.__name__)
         data = cls(test_string)
-        if not isinstance(data, (str8, bytes)):
-            bytes_data = str8(data, 'latin1')
+        if not isinstance(data, (buffer, bytes)):
+            bytes_data = bytes(data, 'latin1')
         else:
             bytes_data = data
-        vereq(s.unpack_from(data), (str8(b'abcd'),))
-        vereq(s.unpack_from(data, 2), (str8(b'cd01'),))
-        vereq(s.unpack_from(data, 4), (str8(b'0123'),))
+        vereq(s.unpack_from(data), (b'abcd',))
+        vereq(s.unpack_from(data, 2), (b'cd01',))
+        vereq(s.unpack_from(data, 4), (b'0123',))
         for i in range(6):
             vereq(s.unpack_from(data, i), (bytes_data[i:i+4],))
         for i in range(6, len(test_string) + 1):
             simple_err(s.unpack_from, data, i)
-    for cls in (str, str8, bytes): # XXX + memoryview
+    for cls in (buffer, bytes):
         data = cls(test_string)
-        vereq(struct.unpack_from(fmt, data), (str8(b'abcd'),))
-        vereq(struct.unpack_from(fmt, data, 2), (str8(b'cd01'),))
-        vereq(struct.unpack_from(fmt, data, 4), (str8(b'0123'),))
+        vereq(struct.unpack_from(fmt, data), (b'abcd',))
+        vereq(struct.unpack_from(fmt, data, 2), (b'cd01',))
+        vereq(struct.unpack_from(fmt, data, 4), (b'0123',))
         for i in range(6):
             vereq(struct.unpack_from(fmt, data, i), (bytes_data[i:i+4],))
         for i in range(6, len(test_string) + 1):
diff --git a/Lib/test/test_subprocess.py b/Lib/test/test_subprocess.py
index 39a889d..806791b 100644
--- a/Lib/test/test_subprocess.py
+++ b/Lib/test/test_subprocess.py
@@ -24,7 +24,8 @@ else:
 # shutdown time. That frustrates tests trying to check stderr produced
 # from a spawned Python process.
 def remove_stderr_debug_decorations(stderr):
-    return re.sub(r"\[\d+ refs\]\r?\n?$", "", str(stderr))
+    return re.sub("\[\d+ refs\]\r?\n?$", "", stderr.decode()).encode()
+    #return re.sub(r"\[\d+ refs\]\r?\n?$", "", stderr)
 
 class ProcessTestCase(unittest.TestCase):
     def setUp(self):
@@ -77,9 +78,9 @@ class ProcessTestCase(unittest.TestCase):
         newenv = os.environ.copy()
         newenv["FRUIT"] = "banana"
         rc = subprocess.call([sys.executable, "-c",
-                          'import sys, os;' \
-                          'sys.exit(os.getenv("FRUIT")=="banana")'],
-                        env=newenv)
+                              'import sys, os;'
+                              'sys.exit(os.getenv("FRUIT")=="banana")'],
+                             env=newenv)
         self.assertEqual(rc, 1)
 
     def test_stdin_none(self):
@@ -180,7 +181,7 @@ class ProcessTestCase(unittest.TestCase):
                           'import sys; sys.stderr.write("strawberry")'],
                          stderr=subprocess.PIPE)
         self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
-                         "strawberry")
+                         b"strawberry")
 
     def test_stderr_filedes(self):
         # stderr is set to open file descriptor
@@ -192,7 +193,7 @@ class ProcessTestCase(unittest.TestCase):
         p.wait()
         os.lseek(d, 0, 0)
         self.assertEqual(remove_stderr_debug_decorations(os.read(d, 1024)),
-                         "strawberry")
+                         b"strawberry")
 
     def test_stderr_fileobj(self):
         # stderr is set to open file object
@@ -203,36 +204,36 @@ class ProcessTestCase(unittest.TestCase):
         p.wait()
         tf.seek(0)
         self.assertEqual(remove_stderr_debug_decorations(tf.read()),
-                         "strawberry")
+                         b"strawberry")
 
     def test_stdout_stderr_pipe(self):
         # capture stdout and stderr to the same pipe
         p = subprocess.Popen([sys.executable, "-c",
-                          'import sys;' \
-                          'sys.stdout.write("apple");' \
-                          'sys.stdout.flush();' \
-                          'sys.stderr.write("orange")'],
-                         stdout=subprocess.PIPE,
-                         stderr=subprocess.STDOUT)
+                              'import sys;'
+                              'sys.stdout.write("apple");'
+                              'sys.stdout.flush();'
+                              'sys.stderr.write("orange")'],
+                             stdout=subprocess.PIPE,
+                             stderr=subprocess.STDOUT)
         output = p.stdout.read()
         stripped = remove_stderr_debug_decorations(output)
-        self.assertEqual(stripped, "appleorange")
+        self.assertEqual(stripped, b"appleorange")
 
     def test_stdout_stderr_file(self):
         # capture stdout and stderr to the same open file
         tf = tempfile.TemporaryFile()
         p = subprocess.Popen([sys.executable, "-c",
-                          'import sys;' \
-                          'sys.stdout.write("apple");' \
-                          'sys.stdout.flush();' \
-                          'sys.stderr.write("orange")'],
-                         stdout=tf,
-                         stderr=tf)
+                              'import sys;'
+                              'sys.stdout.write("apple");'
+                              'sys.stdout.flush();'
+                              'sys.stderr.write("orange")'],
+                             stdout=tf,
+                             stderr=tf)
         p.wait()
         tf.seek(0)
         output = tf.read()
         stripped = remove_stderr_debug_decorations(output)
-        self.assertEqual(stripped, "appleorange")
+        self.assertEqual(stripped, b"appleorange")
 
     def test_stdout_filedes_of_stdout(self):
         # stdout is set to 1 (#1531862).
@@ -249,10 +250,10 @@ class ProcessTestCase(unittest.TestCase): tmpdir = os.getcwd() os.chdir(cwd) p = subprocess.Popen([sys.executable, "-c", - 'import sys,os;' \ - 'sys.stdout.write(os.getcwd())'], - stdout=subprocess.PIPE, - cwd=tmpdir) + 'import sys,os;' + 'sys.stdout.write(os.getcwd())'], + stdout=subprocess.PIPE, + cwd=tmpdir) normcase = os.path.normcase self.assertEqual(normcase(p.stdout.read().decode("utf-8")), normcase(tmpdir)) @@ -261,15 +262,16 @@ class ProcessTestCase(unittest.TestCase): newenv = os.environ.copy() newenv["FRUIT"] = "orange" p = subprocess.Popen([sys.executable, "-c", - 'import sys,os;' \ - 'sys.stdout.write(os.getenv("FRUIT"))'], - stdout=subprocess.PIPE, - env=newenv) + 'import sys,os;' + 'sys.stdout.write(os.getenv("FRUIT"))'], + stdout=subprocess.PIPE, + env=newenv) self.assertEqual(p.stdout.read(), b"orange") def test_communicate_stdin(self): p = subprocess.Popen([sys.executable, "-c", - 'import sys; sys.exit(sys.stdin.read() == "pear")'], + 'import sys;' + 'sys.exit(sys.stdin.read() == "pear")'], stdin=subprocess.PIPE) p.communicate(b"pear") self.assertEqual(p.returncode, 1) @@ -294,16 +296,16 @@ class ProcessTestCase(unittest.TestCase): def test_communicate(self): p = subprocess.Popen([sys.executable, "-c", - 'import sys,os;' \ - 'sys.stderr.write("pineapple");' \ - 'sys.stdout.write(sys.stdin.read())'], - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE) + 'import sys,os;' + 'sys.stderr.write("pineapple");' + 'sys.stdout.write(sys.stdin.read())'], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE) (stdout, stderr) = p.communicate("banana") self.assertEqual(stdout, b"banana") self.assertEqual(remove_stderr_debug_decorations(stderr), - "pineapple") + b"pineapple") def test_communicate_returns(self): # communicate() should return None if no redirection is active @@ -325,13 +327,13 @@ class ProcessTestCase(unittest.TestCase): os.close(x) os.close(y) p = 
subprocess.Popen([sys.executable, "-c", - 'import sys,os;' - 'sys.stdout.write(sys.stdin.read(47));' \ - 'sys.stderr.write("xyz"*%d);' \ - 'sys.stdout.write(sys.stdin.read())' % pipe_buf], - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE) + 'import sys,os;' + 'sys.stdout.write(sys.stdin.read(47));' + 'sys.stderr.write("xyz"*%d);' + 'sys.stdout.write(sys.stdin.read())' % pipe_buf], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE) string_to_write = b"abc"*pipe_buf (stdout, stderr) = p.communicate(string_to_write) self.assertEqual(stdout, string_to_write) @@ -339,68 +341,69 @@ class ProcessTestCase(unittest.TestCase): def test_writes_before_communicate(self): # stdin.write before communicate() p = subprocess.Popen([sys.executable, "-c", - 'import sys,os;' \ - 'sys.stdout.write(sys.stdin.read())'], - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE) + 'import sys,os;' + 'sys.stdout.write(sys.stdin.read())'], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE) p.stdin.write(b"banana") (stdout, stderr) = p.communicate(b"split") self.assertEqual(stdout, b"bananasplit") - self.assertEqual(remove_stderr_debug_decorations(stderr), "") + self.assertEqual(remove_stderr_debug_decorations(stderr), b"") def test_universal_newlines(self): p = subprocess.Popen([sys.executable, "-c", - 'import sys,os;' + SETBINARY + - 'sys.stdout.write("line1\\n");' - 'sys.stdout.flush();' - 'sys.stdout.write("line2\\n");' - 'sys.stdout.flush();' - 'sys.stdout.write("line3\\r\\n");' - 'sys.stdout.flush();' - 'sys.stdout.write("line4\\r");' - 'sys.stdout.flush();' - 'sys.stdout.write("\\nline5");' - 'sys.stdout.flush();' - 'sys.stdout.write("\\nline6");'], - stdout=subprocess.PIPE, - universal_newlines=1) + 'import sys,os;' + SETBINARY + + 'sys.stdout.write("line1\\n");' + 'sys.stdout.flush();' + 'sys.stdout.write("line2\\n");' + 'sys.stdout.flush();' + 'sys.stdout.write("line3\\r\\n");' + 
'sys.stdout.flush();'
+                              'sys.stdout.write("line4\\r");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("\\nline5");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("\\nline6");'],
+                             stdout=subprocess.PIPE,
+                             universal_newlines=1)
         stdout = p.stdout.read()
         self.assertEqual(stdout, "line1\nline2\nline3\nline4\nline5\nline6")
 
     def test_universal_newlines_communicate(self):
         # universal newlines through communicate()
         p = subprocess.Popen([sys.executable, "-c",
-                          'import sys,os;' + SETBINARY +
-                          'sys.stdout.write("line1\\n");'
-                          'sys.stdout.flush();'
-                          'sys.stdout.write("line2\\n");'
-                          'sys.stdout.flush();'
-                          'sys.stdout.write("line3\\r\\n");'
-                          'sys.stdout.flush();'
-                          'sys.stdout.write("line4\\r");'
-                          'sys.stdout.flush();'
-                          'sys.stdout.write("\\nline5");'
-                          'sys.stdout.flush();'
-                          'sys.stdout.write("\\nline6");'],
-                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
-                         universal_newlines=1)
+                              'import sys,os;' + SETBINARY +
+                              'sys.stdout.write("line1\\n");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("line2\\n");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("line3\\r\\n");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("line4\\r");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("\\nline5");'
+                              'sys.stdout.flush();'
+                              'sys.stdout.write("\\nline6");'],
+                             stdout=subprocess.PIPE, stderr=subprocess.PIPE,
+                             universal_newlines=1)
         (stdout, stderr) = p.communicate()
         self.assertEqual(stdout, "line1\nline2\nline3\nline4\nline5\nline6")
 
     def test_no_leaking(self):
         # Make sure we leak no resources
-        if not hasattr(test_support, "is_resource_enabled") \
-               or test_support.is_resource_enabled("subprocess") and not mswindows:
+        if (not hasattr(test_support, "is_resource_enabled") or
+            test_support.is_resource_enabled("subprocess") and not mswindows):
             max_handles = 1026 # too much for most UNIX systems
         else:
             max_handles = 65
         for i in range(max_handles):
             p = subprocess.Popen([sys.executable, "-c",
-                    "import sys;sys.stdout.write(sys.stdin.read())"],
-                    stdin=subprocess.PIPE,
-                    stdout=subprocess.PIPE,
-
stderr=subprocess.PIPE)
+                                  "import sys;"
+                                  "sys.stdout.write(sys.stdin.read())"],
+                                 stdin=subprocess.PIPE,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.PIPE)
             data = p.communicate("lime")[0]
             self.assertEqual(data, b"lime")
 
@@ -516,10 +519,11 @@ class ProcessTestCase(unittest.TestCase):
     def test_preexec(self):
         # preexec function
         p = subprocess.Popen([sys.executable, "-c",
-                          'import sys,os;' \
-                          'sys.stdout.write(os.getenv("FRUIT"))'],
-                         stdout=subprocess.PIPE,
-                         preexec_fn=lambda: os.putenv("FRUIT", "apple"))
+                              'import sys,os;'
+                              'sys.stdout.write(os.getenv("FRUIT"))'],
+                             stdout=subprocess.PIPE,
+                             preexec_fn=lambda: os.putenv("FRUIT",
+                                                          "apple"))
         self.assertEqual(p.stdout.read(), b"apple")
 
     def test_args_string(self):
@@ -654,4 +658,4 @@ def test_main():
         test_support.reap_children()
 
 if __name__ == "__main__":
-    test_main()
+    unittest.main()  # XXX test_main()
diff --git a/Lib/test/test_sys.py b/Lib/test/test_sys.py
index 8741830..ad7082e 100644
--- a/Lib/test/test_sys.py
+++ b/Lib/test/test_sys.py
@@ -300,7 +300,7 @@ class SysModuleTest(unittest.TestCase):
 
     def test_intern(self):
         self.assertRaises(TypeError, sys.intern)
-        s = str8(b"never interned before")
+        s = "never interned before"
         self.assert_(sys.intern(s) is s)
         s2 = s.swapcase().swapcase()
         self.assert_(sys.intern(s2) is s)
@@ -310,28 +310,11 @@ class SysModuleTest(unittest.TestCase):
         # We don't want them in the interned dict and if they aren't
         # actually interned, we don't want to create the appearance
         # that they are by allowing intern() to succeeed.
-        class S(str8):
+        class S(str):
             def __hash__(self):
                 return 123
 
-        self.assertRaises(TypeError, sys.intern, S(b"abc"))
-
-        s = "never interned as unicode before"
-        self.assert_(sys.intern(s) is s)
-        s2 = s.swapcase().swapcase()
-        self.assert_(sys.intern(s2) is s)
-
-        class U(str):
-            def __hash__(self):
-                return 123
-
-        self.assertRaises(TypeError, sys.intern, U("abc"))
-
-        # It's still safe to pass these strings to routines that
-        # call intern internally, e.g. 
PyObject_SetAttr().
-        s = U("abc")
-        setattr(s, s, s)
-        self.assertEqual(getattr(s, s), s)
+        self.assertRaises(TypeError, sys.intern, S("abc"))
 
 
 def test_main():
diff --git a/Lib/test/test_unicode.py b/Lib/test/test_unicode.py
index 4970845..d53317f 100644
--- a/Lib/test/test_unicode.py
+++ b/Lib/test/test_unicode.py
@@ -6,7 +6,11 @@ Written by Marc-Andre Lemburg (mal@lemburg.com).
 (c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
 
 """#"
-import unittest, sys, struct, codecs, new
+import codecs
+import struct
+import sys
+import unittest
+import warnings
 from test import test_support, string_tests
 
 # Error handling (bad decoder return)
@@ -34,6 +38,12 @@ class UnicodeTest(
     ):
     type2test = str
 
+    def setUp(self):
+        self.warning_filters = warnings.filters[:]
+
+    def tearDown(self):
+        warnings.filters = self.warning_filters
+
     def checkequalnofix(self, result, object, methodname, *args):
         method = getattr(object, methodname)
         realresult = method(*args)
@@ -192,8 +202,10 @@ class UnicodeTest(
         self.checkequalnofix('a b c d', ' ', 'join', ['a', 'b', 'c', 'd'])
         self.checkequalnofix('abcd', '', 'join', ('a', 'b', 'c', 'd'))
         self.checkequalnofix('w x y z', ' ', 'join', string_tests.Sequence('wxyz'))
-        self.checkequalnofix('1 2 foo', ' ', 'join', [1, 2, MyWrapper('foo')])
-        self.checkraises(TypeError, ' ', 'join', [1, 2, 3, bytes()])
+        self.checkraises(TypeError, ' ', 'join', ['1', '2', MyWrapper('foo')])
+        self.checkraises(TypeError, ' ', 'join', ['1', '2', '3', bytes()])
+        self.checkraises(TypeError, ' ', 'join', [1, 2, 3])
+        self.checkraises(TypeError, ' ', 'join', ['1', '2', 3])
 
     def test_replace(self):
         string_tests.CommonTest.test_replace(self)
@@ -202,9 +214,12 @@ class UnicodeTest(
         self.checkequalnofix('one@two!three!', 'one!two!three!', 'replace', '!', '@', 1)
         self.assertRaises(TypeError, 'replace'.replace, "r", 42)
 
-    def test_str8_comparison(self):
-        self.assertEqual('abc' == str8(b'abc'), False)
-        self.assertEqual('abc' != str8(b'abc'), True)
+    def 
test_bytes_comparison(self):
+        warnings.simplefilter('ignore', BytesWarning)
+        self.assertEqual('abc' == b'abc', False)
+        self.assertEqual('abc' != b'abc', True)
+        self.assertEqual('abc' == buffer(b'abc'), False)
+        self.assertEqual('abc' != buffer(b'abc'), True)
 
     def test_comparison(self):
         # Comparisons:
@@ -661,16 +676,6 @@ class UnicodeTest(
             'strings are converted to unicode'
         )
 
-        class UnicodeCompat:
-            def __init__(self, x):
-                self.x = x
-            def __unicode__(self):
-                return self.x
-
-        self.assertEqual(
-            str(UnicodeCompat('__unicode__ compatible objects are recognized')),
-            '__unicode__ compatible objects are recognized')
-
         class StringCompat:
             def __init__(self, x):
                 self.x = x
@@ -688,14 +693,6 @@ class UnicodeTest(
         self.assertEqual(str(o), 'unicode(obj) is compatible to str()')
         self.assertEqual(str(o), 'unicode(obj) is compatible to str()')
 
-        # %-formatting and .__unicode__()
-        self.assertEqual('%s' %
-                         UnicodeCompat("u'%s' % obj uses obj.__unicode__()"),
-                         "u'%s' % obj uses obj.__unicode__()")
-        self.assertEqual('%s' %
-                         UnicodeCompat("u'%s' % obj falls back to obj.__str__()"),
-                         "u'%s' % obj falls back to obj.__str__()")
-
         for obj in (123, 123.45, 123):
             self.assertEqual(str(obj), str(str(obj)))
 
@@ -970,48 +967,46 @@ class UnicodeTest(
                 return "foo"
 
         class Foo1:
-            def __unicode__(self):
+            def __str__(self):
                 return "foo"
 
         class Foo2(object):
-            def __unicode__(self):
+            def __str__(self):
                 return "foo"
 
         class Foo3(object):
-            def __unicode__(self):
+            def __str__(self):
                 return "foo"
 
         class Foo4(str):
-            def __unicode__(self):
+            def __str__(self):
                 return "foo"
 
         class Foo5(str):
-            def __unicode__(self):
+            def __str__(self):
                 return "foo"
 
         class Foo6(str):
             def __str__(self):
                 return "foos"
 
-            def __unicode__(self):
+            def __str__(self):
                 return "foou"
 
         class Foo7(str):
             def __str__(self):
                 return "foos"
-            def __unicode__(self):
+            def __str__(self):
                 return "foou"
 
         class Foo8(str):
             def __new__(cls, content=""):
                 return str.__new__(cls, 2*content)
-            def __unicode__(self):
+            def __str__(self):
                 return self
 
         class Foo9(str):
             def __str__(self):
-                return "string"
-            def __unicode__(self):
                 return "not unicode"
 
         self.assertEqual(str(Foo0()), "foo")
diff --git a/Lib/test/test_unicodedata.py b/Lib/test/test_unicodedata.py
index ff2dcf5..ba97e5d 100644
--- a/Lib/test/test_unicodedata.py
+++ b/Lib/test/test_unicodedata.py
@@ -176,7 +176,8 @@ class UnicodeFunctionsTest(UnicodeDatabaseTest):
 
     def test_east_asian_width(self):
         eaw = self.db.east_asian_width
-        self.assertRaises(TypeError, eaw, str8(b'a'))
+        self.assertRaises(TypeError, eaw, b'a')
+        self.assertRaises(TypeError, eaw, buffer())
         self.assertRaises(TypeError, eaw, '')
         self.assertRaises(TypeError, eaw, 'ra')
         self.assertEqual(eaw('\x1e'), 'N')
diff --git a/Lib/test/test_urllib2.py b/Lib/test/test_urllib2.py
index 5cbc652..393e997 100644
--- a/Lib/test/test_urllib2.py
+++ b/Lib/test/test_urllib2.py
@@ -999,7 +999,8 @@ class HandlerTests(unittest.TestCase):
         self.assertEqual(len(http_handler.requests), 2)
         self.assertFalse(http_handler.requests[0].has_header(auth_header))
         userpass = bytes('%s:%s' % (user, password), "ascii")
-        auth_hdr_value = 'Basic ' + str(base64.encodestring(userpass)).strip()
+        auth_hdr_value = ('Basic ' +
+            base64.encodestring(userpass).strip().decode())
         self.assertEqual(http_handler.requests[1].get_header(auth_header),
                          auth_hdr_value)
 
diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py
index 4788d3a..1a6b7f3 100644
--- a/Lib/test/test_xml_etree.py
+++ b/Lib/test/test_xml_etree.py
@@ -184,7 +184,7 @@ def parseliteral():
     >>> print(ET.tostring(element))
     <html><body>text</body></html>
     >>> print(repr(ET.tostring(element, "ascii")))
-    b'<?xml version=\'1.0\' encoding=\'ascii\'?>\n<html><body>text</body></html>'
+    b"<?xml version='1.0' encoding='ascii'?>\n<html><body>text</body></html>"
     >>> _, ids = ET.XMLID("<html><body>text</body></html>")
     >>> len(ids)
     0
diff --git a/Lib/test/test_xml_etree_c.py b/Lib/test/test_xml_etree_c.py
index 86f1853..c8eec40 100644
--- 
a/Lib/test/test_xml_etree_c.py
+++ b/Lib/test/test_xml_etree_c.py
@@ -176,7 +176,7 @@ def parseliteral():
     >>> print(ET.tostring(element))
     <html><body>text</body></html>
     >>> print(repr(ET.tostring(element, "ascii")))
-    b'<?xml version=\'1.0\' encoding=\'ascii\'?>\n<html><body>text</body></html>'
+    b"<?xml version='1.0' encoding='ascii'?>\n<html><body>text</body></html>"
     >>> _, ids = ET.XMLID("<html><body>text</body></html>")
     >>> len(ids)
     0
diff --git a/Lib/test/test_zipimport.py b/Lib/test/test_zipimport.py
index 58935b7..cb20222 100644
--- a/Lib/test/test_zipimport.py
+++ b/Lib/test/test_zipimport.py
@@ -153,7 +153,7 @@ class UncompressedZipImportTestCase(ImportHooksBaseTestCase):
 
     def testBadMagic(self):
         # make pyc magic word invalid, forcing loading from .py
-        badmagic_pyc = bytes(test_pyc)
+        badmagic_pyc = buffer(test_pyc)
         badmagic_pyc[0] ^= 0x04  # flip an arbitrary bit
         files = {TESTMOD + ".py": (NOW, test_src),
                  TESTMOD + pyc_ext: (NOW, badmagic_pyc)}
@@ -161,7 +161,7 @@ class UncompressedZipImportTestCase(ImportHooksBaseTestCase):
 
     def testBadMagic2(self):
         # make pyc magic word invalid, causing an ImportError
-        badmagic_pyc = bytes(test_pyc)
+        badmagic_pyc = buffer(test_pyc)
         badmagic_pyc[0] ^= 0x04  # flip an arbitrary bit
         files = {TESTMOD + pyc_ext: (NOW, badmagic_pyc)}
         try:
@@ -172,7 +172,7 @@ class UncompressedZipImportTestCase(ImportHooksBaseTestCase):
             self.fail("expected ImportError; import from bad pyc")
 
     def testBadMTime(self):
-        badtime_pyc = bytes(test_pyc)
+        badtime_pyc = buffer(test_pyc)
         badtime_pyc[7] ^= 0x02  # flip the second bit -- not the first as that one
                                 # isn't stored in the .py's mtime in the zip archive.
         files = {TESTMOD + ".py": (NOW, test_src),
diff --git a/Lib/test/testcodec.py b/Lib/test/testcodec.py
index 0db18c1..77f408b 100644
--- a/Lib/test/testcodec.py
+++ b/Lib/test/testcodec.py
@@ -36,7 +36,7 @@ def getregentry():
 decoding_map = codecs.make_identity_dict(range(256))
 decoding_map.update({
         0x78: "abc", # 1-n decoding mapping
-        str8(b"abc"): 0x0078,# 1-n encoding mapping
+        b"abc": 0x0078,# 1-n encoding mapping
         0x01: None, # decoding mapping to <undefined>
         0x79: "", # decoding mapping to <remove character>
 })
diff --git a/Lib/urllib.py b/Lib/urllib.py
index b2542fc..81a8cd6 100644
--- a/Lib/urllib.py
+++ b/Lib/urllib.py
@@ -925,22 +925,14 @@ class addinfourl(addbase):
 # unquote('abc%20def') -> 'abc def'
 # quote('abc def') -> 'abc%20def')
 
-try:
-    str
-except NameError:
-    def _is_unicode(x):
-        return 0
-else:
-    def _is_unicode(x):
-        return isinstance(x, str)
-
 def toBytes(url):
     """toBytes(u"URL") --> 'URL'."""
     # Most URL schemes require ASCII. If that changes, the conversion
-    # can be relaxed
-    if _is_unicode(url):
+    # can be relaxed.
+    # XXX get rid of toBytes()
+    if isinstance(url, str):
         try:
-            url = url.encode("ASCII")
+            url = url.encode("ASCII").decode()
         except UnicodeError:
             raise UnicodeError("URL " + repr(url) +
                                " contains non-ASCII characters")
@@ -1203,7 +1195,7 @@ def urlencode(query,doseq=0):
             if isinstance(v, str):
                 v = quote_plus(v)
                 l.append(k + '=' + v)
-            elif _is_unicode(v):
+            elif isinstance(v, str):
                 # is there a reasonable way to convert to ASCII?
                 # encode generates a string, but "replace" or "ignore"
                 # lose information and "strict" can raise UnicodeError
diff --git a/Lib/urllib2.py b/Lib/urllib2.py
index d7679fc..fb2c303 100644
--- a/Lib/urllib2.py
+++ b/Lib/urllib2.py
@@ -802,7 +802,7 @@ class AbstractBasicAuthHandler:
         user, pw = self.passwd.find_user_password(realm, host)
         if pw is not None:
             raw = "%s:%s" % (user, pw)
-            auth = 'Basic %s' % str(base64.b64encode(raw)).strip()
+            auth = 'Basic %s' % base64.b64encode(raw).strip().decode()
             if req.headers.get(self.auth_header, None) == auth:
                 return None
             req.add_header(self.auth_header, auth)
diff --git a/Lib/uuid.py b/Lib/uuid.py
index 74d4a7a..06115c7 100644
--- a/Lib/uuid.py
+++ b/Lib/uuid.py
@@ -234,7 +234,7 @@ class UUID(object):
 
     @property
     def bytes(self):
-        bytes = b''
+        bytes = buffer()
         for shift in range(0, 128, 8):
             bytes.insert(0, (self.int >> shift) & 0xff)
         return bytes
@@ -548,7 +548,7 @@ def uuid4():
         return UUID(bytes=os.urandom(16), version=4)
     except:
         import random
-        bytes = [chr(random.randrange(256)) for i in range(16)]
+        bytes = bytes_(random.randrange(256) for i in range(16))
         return UUID(bytes=bytes, version=4)
 
 def uuid5(namespace, name):
diff --git a/Lib/xmlrpclib.py b/Lib/xmlrpclib.py
index efb0a7b..da96420 100644
--- a/Lib/xmlrpclib.py
+++ b/Lib/xmlrpclib.py
@@ -622,7 +622,7 @@ class Marshaller:
         write("<value><string>")
         write(escape(value))
         write("</string></value>\n")
-    dispatch[str8] = dump_string
+    dispatch[bytes] = dump_string
 
     def dump_unicode(self, value, write, escape=escape):
         write("<value><string>")
diff --git a/Lib/zipfile.py b/Lib/zipfile.py
index 088f4a0..97f639d 100644
--- a/Lib/zipfile.py
+++ b/Lib/zipfile.py
@@ -678,7 +678,7 @@ class ZipFile:
                 print(centdir)
             filename = fp.read(centdir[_CD_FILENAME_LENGTH])
             # Create ZipInfo instance to store file information
-            x = ZipInfo(str(filename))
+            x = ZipInfo(filename.decode("utf-8"))
             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
             x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
             total = 
(total + centdir[_CD_FILENAME_LENGTH] diff --git a/Mac/Modules/ae/_AEmodule.c b/Mac/Modules/ae/_AEmodule.c index 3442619..37c0b7c 100644 --- a/Mac/Modules/ae/_AEmodule.c +++ b/Mac/Modules/ae/_AEmodule.c @@ -835,9 +835,9 @@ static PyObject *AEDesc_get_data(AEDescObject *self, void *closure) OSErr err; size = AEGetDescDataSize(&self->ob_itself); - if ( (res = PyBytes_FromStringAndSize(NULL, size)) == NULL ) + if ( (res = PyString_FromStringAndSize(NULL, size)) == NULL ) return NULL; - if ( (ptr = PyBytes_AsString(res)) == NULL ) + if ( (ptr = PyString_AS_STRING(res)) == NULL ) return NULL; if ( (err=AEGetDescData(&self->ob_itself, ptr, size)) < 0 ) return PyMac_Error(err); @@ -214,6 +214,7 @@ Nils Fischbeck Frederik Fix Matt Fleming Hernán Martínez Foffani +Amaury Forgeot d'Arc Doug Fort John Fouhy Martin Franklin @@ -31,6 +31,9 @@ Core and Builtins - io.open() and _fileio.FileIO have grown a new argument closefd. A false value disables the closing of the file descriptor. +- Added a new option -b to issues warnings (-bb for errors) about certain + operations between bytes/buffer and str like str(b'') and comparsion. 
+ Extension Modules ----------------- diff --git a/Modules/_bsddb.c b/Modules/_bsddb.c index bd1c271..0587071 100644 --- a/Modules/_bsddb.c +++ b/Modules/_bsddb.c @@ -1171,13 +1171,16 @@ _db_associateCallback(DB* db, const DBT* priKey, const DBT* priData, else if (PyInt_Check(result)) { retval = PyInt_AsLong(result); } - else if (PyBytes_Check(result)) { + else if (PyBytes_Check(result) || PyString_Check(result)) { char* data; Py_ssize_t size; CLEAR_DBT(*secKey); - size = PyBytes_Size(result); - data = PyBytes_AsString(result); + size = Py_Size(result); + if (PyBytes_Check(result)) + data = PyBytes_AS_STRING(result); + else + data = PyString_AS_STRING(result); secKey->flags = DB_DBT_APPMALLOC; /* DB will free */ secKey->data = malloc(size); /* TODO, check this */ if (secKey->data) { @@ -1517,7 +1520,7 @@ DB_get(DBObject* self, PyObject* args, PyObject* kwargs) retval = Py_BuildValue("y#y#", key.data, key.size, data.data, data.size); else /* return just the data */ - retval = PyBytes_FromStringAndSize((char*)data.data, data.size); + retval = PyString_FromStringAndSize((char*)data.data, data.size); free_dbt(&data); } FREE_DBT_VIEW(key, keyobj, key_buf_view); @@ -1587,13 +1590,13 @@ DB_pget(DBObject* self, PyObject* args, PyObject* kwargs) else if (!err) { PyObject *pkeyObj; PyObject *dataObj; - dataObj = PyBytes_FromStringAndSize(data.data, data.size); + dataObj = PyString_FromStringAndSize(data.data, data.size); if (self->primaryDBType == DB_RECNO || self->primaryDBType == DB_QUEUE) pkeyObj = PyInt_FromLong(*(int *)pkey.data); else - pkeyObj = PyBytes_FromStringAndSize(pkey.data, pkey.size); + pkeyObj = PyString_FromStringAndSize(pkey.data, pkey.size); if (flags & DB_SET_RECNO) /* return key , pkey and data */ { @@ -1602,7 +1605,7 @@ DB_pget(DBObject* self, PyObject* args, PyObject* kwargs) if (type == DB_RECNO || type == DB_QUEUE) keyObj = PyInt_FromLong(*(int *)key.data); else - keyObj = PyBytes_FromStringAndSize(key.data, key.size); + keyObj = 
PyString_FromStringAndSize(key.data, key.size); #if (PY_VERSION_HEX >= 0x02040000) retval = PyTuple_Pack(3, keyObj, pkeyObj, dataObj); #else @@ -1729,7 +1732,8 @@ DB_get_both(DBObject* self, PyObject* args, PyObject* kwargs) else if (!err) { /* XXX(nnorwitz): can we do: retval = dataobj; Py_INCREF(retval); */ /* XXX(gps) I think not: buffer API input vs. bytes object output. */ - retval = PyBytes_FromStringAndSize((char*)data.data, data.size); + /* XXX(guido) But what if the input is PyString? */ + retval = PyString_FromStringAndSize((char*)data.data, data.size); /* Even though the flags require DB_DBT_MALLOC, data is not always allocated. 4.4: allocated, 4.5: *not* allocated. :-( */ @@ -2773,7 +2777,7 @@ PyObject* DB_subscript(DBObject* self, PyObject* keyobj) retval = NULL; } else { - retval = PyBytes_FromStringAndSize((char*)data.data, data.size); + retval = PyString_FromStringAndSize((char*)data.data, data.size); free_dbt(&data); } @@ -2928,7 +2932,7 @@ _DB_make_list(DBObject* self, DB_TXN* txn, int type) case DB_BTREE: case DB_HASH: default: - item = PyBytes_FromStringAndSize((char*)key.data, key.size); + item = PyString_FromStringAndSize((char*)key.data, key.size); break; case DB_RECNO: case DB_QUEUE: @@ -2938,7 +2942,7 @@ _DB_make_list(DBObject* self, DB_TXN* txn, int type) break; case _VALUES_LIST: - item = PyBytes_FromStringAndSize((char*)data.data, data.size); + item = PyString_FromStringAndSize((char*)data.data, data.size); break; case _ITEMS_LIST: @@ -3286,13 +3290,13 @@ DBC_pget(DBCursorObject* self, PyObject* args, PyObject *kwargs) else { PyObject *pkeyObj; PyObject *dataObj; - dataObj = PyBytes_FromStringAndSize(data.data, data.size); + dataObj = PyString_FromStringAndSize(data.data, data.size); if (self->mydb->primaryDBType == DB_RECNO || self->mydb->primaryDBType == DB_QUEUE) pkeyObj = PyInt_FromLong(*(int *)pkey.data); else - pkeyObj = PyBytes_FromStringAndSize(pkey.data, pkey.size); + pkeyObj = PyString_FromStringAndSize(pkey.data, pkey.size); 
if (key.data && key.size) /* return key, pkey and data */ { @@ -3301,7 +3305,7 @@ DBC_pget(DBCursorObject* self, PyObject* args, PyObject *kwargs) if (type == DB_RECNO || type == DB_QUEUE) keyObj = PyInt_FromLong(*(int *)key.data); else - keyObj = PyBytes_FromStringAndSize(key.data, key.size); + keyObj = PyString_FromStringAndSize(key.data, key.size); retval = PyTuple_Pack(3, keyObj, pkeyObj, dataObj); Py_DECREF(keyObj); } @@ -4909,7 +4913,7 @@ DBSequence_get_key(DBSequenceObject* self, PyObject* args) MYDB_END_ALLOW_THREADS if (!err) - retval = PyBytes_FromStringAndSize(key.data, key.size); + retval = PyString_FromStringAndSize(key.data, key.size); free_dbt(&key); RETURN_IF_ERR(); diff --git a/Modules/_codecsmodule.c b/Modules/_codecsmodule.c index e3933e7..caee3fd 100644 --- a/Modules/_codecsmodule.c +++ b/Modules/_codecsmodule.c @@ -180,7 +180,7 @@ escape_encode(PyObject *self, "string is too large to encode"); return NULL; } - v = PyBytes_FromStringAndSize(NULL, newsize); + v = PyString_FromStringAndSize(NULL, newsize); if (v == NULL) { return NULL; @@ -188,11 +188,11 @@ escape_encode(PyObject *self, else { register Py_ssize_t i; register char c; - register char *p = PyBytes_AS_STRING(v); + register char *p = PyString_AS_STRING(v); for (i = 0; i < size; i++) { /* There's at least enough room for a hex escape */ - assert(newsize - (p - PyBytes_AS_STRING(v)) >= 4); + assert(newsize - (p - PyString_AS_STRING(v)) >= 4); c = PyString_AS_STRING(str)[i]; if (c == '\'' || c == '\\') *p++ = '\\', *p++ = c; @@ -212,13 +212,12 @@ escape_encode(PyObject *self, *p++ = c; } *p = '\0'; - if (PyBytes_Resize(v, (p - PyBytes_AS_STRING(v)))) { - Py_DECREF(v); + if (_PyString_Resize(&v, (p - PyString_AS_STRING(v)))) { return NULL; } } - return codec_tuple(v, PyBytes_Size(v)); + return codec_tuple(v, PyString_Size(v)); } /* --- Decoder ------------------------------------------------------------ */ @@ -654,7 +653,7 @@ readbuffer_encode(PyObject *self, &data, &size, &errors)) return 
NULL; - return codec_tuple(PyBytes_FromStringAndSize(data, size), size); + return codec_tuple(PyString_FromStringAndSize(data, size), size); } static PyObject * @@ -669,7 +668,7 @@ charbuffer_encode(PyObject *self, &data, &size, &errors)) return NULL; - return codec_tuple(PyBytes_FromStringAndSize(data, size), size); + return codec_tuple(PyString_FromStringAndSize(data, size), size); } static PyObject * @@ -688,12 +687,12 @@ unicode_internal_encode(PyObject *self, if (PyUnicode_Check(obj)) { data = PyUnicode_AS_DATA(obj); size = PyUnicode_GET_DATA_SIZE(obj); - return codec_tuple(PyBytes_FromStringAndSize(data, size), size); + return codec_tuple(PyString_FromStringAndSize(data, size), size); } else { if (PyObject_AsReadBuffer(obj, (const void **)&data, &size)) return NULL; - return codec_tuple(PyBytes_FromStringAndSize(data, size), size); + return codec_tuple(PyString_FromStringAndSize(data, size), size); } } diff --git a/Modules/_ctypes/_ctypes.c b/Modules/_ctypes/_ctypes.c index 81276fa..39dfdef 100644 --- a/Modules/_ctypes/_ctypes.c +++ b/Modules/_ctypes/_ctypes.c @@ -763,7 +763,7 @@ CharArray_set_raw(CDataObject *self, PyObject *value) static PyObject * CharArray_get_raw(CDataObject *self) { - return PyBytes_FromStringAndSize(self->b_ptr, self->b_size); + return PyString_FromStringAndSize(self->b_ptr, self->b_size); } static PyObject * @@ -774,7 +774,7 @@ CharArray_get_value(CDataObject *self) for (i = 0; i < self->b_size; ++i) if (*ptr++ == '\0') break; - return PyBytes_FromStringAndSize(self->b_ptr, i); + return PyString_FromStringAndSize(self->b_ptr, i); } static int @@ -789,14 +789,14 @@ CharArray_set_value(CDataObject *self, PyObject *value) conversion_mode_errors); if (!value) return -1; - } else if (!PyBytes_Check(value)) { + } else if (!PyString_Check(value)) { PyErr_Format(PyExc_TypeError, "str/bytes expected instead of %s instance", Py_Type(value)->tp_name); return -1; } else Py_INCREF(value); - size = PyBytes_GET_SIZE(value); + size = 
PyString_GET_SIZE(value); if (size > self->b_size) { PyErr_SetString(PyExc_ValueError, "string too long"); @@ -804,7 +804,7 @@ CharArray_set_value(CDataObject *self, PyObject *value) return -1; } - ptr = PyBytes_AS_STRING(value); + ptr = PyString_AS_STRING(value); memcpy(self->b_ptr, ptr, size); if (size < self->b_size) self->b_ptr[size] = '\0'; @@ -838,7 +838,7 @@ WCharArray_set_value(CDataObject *self, PyObject *value) { Py_ssize_t result = 0; - if (PyBytes_Check(value)) { + if (PyString_Check(value)) { value = PyUnicode_FromEncodedObject(value, conversion_mode_encoding, conversion_mode_errors); @@ -1106,7 +1106,7 @@ c_wchar_p_from_param(PyObject *type, PyObject *value) Py_INCREF(Py_None); return Py_None; } - if (PyUnicode_Check(value) || PyBytes_Check(value)) { + if (PyUnicode_Check(value) || PyString_Check(value)) { PyCArgObject *parg; struct fielddesc *fd = getentry("Z"); @@ -1167,7 +1167,7 @@ c_char_p_from_param(PyObject *type, PyObject *value) Py_INCREF(Py_None); return Py_None; } - if (PyBytes_Check(value) || PyUnicode_Check(value)) { + if (PyString_Check(value) || PyUnicode_Check(value)) { PyCArgObject *parg; struct fielddesc *fd = getentry("z"); @@ -1251,7 +1251,7 @@ c_void_p_from_param(PyObject *type, PyObject *value) } /* XXX struni: remove later */ /* string */ - if (PyBytes_Check(value)) { + if (PyString_Check(value)) { PyCArgObject *parg; struct fielddesc *fd = getentry("z"); @@ -2705,8 +2705,8 @@ _get_name(PyObject *obj, char **pname) return 1; } #endif - if (PyBytes_Check(obj)) { - *pname = PyBytes_AS_STRING(obj); + if (PyString_Check(obj)) { + *pname = PyString_AS_STRING(obj); return *pname ? 
1 : 0; } if (PyUnicode_Check(obj)) { @@ -3734,9 +3734,9 @@ Array_subscript(PyObject *_self, PyObject *item) char *dest; if (slicelen <= 0) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); if (step == 1) { - return PyBytes_FromStringAndSize(ptr + start, + return PyString_FromStringAndSize(ptr + start, slicelen); } dest = (char *)PyMem_Malloc(slicelen); @@ -3749,7 +3749,7 @@ Array_subscript(PyObject *_self, PyObject *item) dest[i] = ptr[cur]; } - np = PyBytes_FromStringAndSize(dest, slicelen); + np = PyString_FromStringAndSize(dest, slicelen); PyMem_Free(dest); return np; } @@ -4411,9 +4411,9 @@ Pointer_subscript(PyObject *_self, PyObject *item) char *dest; if (len <= 0) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); if (step == 1) { - return PyBytes_FromStringAndSize(ptr + start, + return PyString_FromStringAndSize(ptr + start, len); } dest = (char *)PyMem_Malloc(len); @@ -4422,7 +4422,7 @@ Pointer_subscript(PyObject *_self, PyObject *item) for (cur = start, i = 0; i < len; cur += step, i++) { dest[i] = ptr[cur]; } - np = PyBytes_FromStringAndSize(dest, len); + np = PyString_FromStringAndSize(dest, len); PyMem_Free(dest); return np; } @@ -4658,8 +4658,8 @@ static PyObject * string_at(const char *ptr, int size) { if (size == -1) - return PyBytes_FromStringAndSize(ptr, strlen(ptr)); - return PyBytes_FromStringAndSize(ptr, size); + return PyString_FromStringAndSize(ptr, strlen(ptr)); + return PyString_FromStringAndSize(ptr, size); } static int diff --git a/Modules/_ctypes/callproc.c b/Modules/_ctypes/callproc.c index bc524f7..69129f7 100644 --- a/Modules/_ctypes/callproc.c +++ b/Modules/_ctypes/callproc.c @@ -507,9 +507,9 @@ static int ConvParam(PyObject *obj, Py_ssize_t index, struct argument *pa) return 0; } - if (PyBytes_Check(obj)) { + if (PyString_Check(obj)) { pa->ffi_type = &ffi_type_pointer; - pa->value.p = PyBytes_AsString(obj); + pa->value.p = PyString_AsString(obj); Py_INCREF(obj); 
pa->keep = obj; return 0; diff --git a/Modules/_ctypes/cfield.c b/Modules/_ctypes/cfield.c index 910470a..2ec7b3a 100644 --- a/Modules/_ctypes/cfield.c +++ b/Modules/_ctypes/cfield.c @@ -1157,16 +1157,20 @@ c_set(void *ptr, PyObject *value, Py_ssize_t size) conversion_mode_errors); if (value == NULL) return NULL; - if (PyBytes_GET_SIZE(value) != 1) { + if (PyString_GET_SIZE(value) != 1) { Py_DECREF(value); goto error; } - *(char *)ptr = PyBytes_AsString(value)[0]; + *(char *)ptr = PyString_AS_STRING(value)[0]; Py_DECREF(value); _RET(value); } + if (PyString_Check(value) && PyString_GET_SIZE(value) == 1) { + *(char *)ptr = PyString_AS_STRING(value)[0]; + _RET(value); + } if (PyBytes_Check(value) && PyBytes_GET_SIZE(value) == 1) { - *(char *)ptr = PyBytes_AsString(value)[0]; + *(char *)ptr = PyBytes_AS_STRING(value)[0]; _RET(value); } if (PyInt_Check(value)) @@ -1187,7 +1191,7 @@ c_set(void *ptr, PyObject *value, Py_ssize_t size) static PyObject * c_get(void *ptr, Py_ssize_t size) { - return PyBytes_FromStringAndSize((char *)ptr, 1); + return PyString_FromStringAndSize((char *)ptr, 1); } #ifdef CTYPES_UNICODE @@ -1196,7 +1200,7 @@ static PyObject * u_set(void *ptr, PyObject *value, Py_ssize_t size) { Py_ssize_t len; - if (PyBytes_Check(value)) { + if (PyString_Check(value)) { value = PyUnicode_FromEncodedObject(value, conversion_mode_encoding, conversion_mode_errors); @@ -1271,7 +1275,7 @@ U_set(void *ptr, PyObject *value, Py_ssize_t length) /* It's easier to calculate in characters than in bytes */ length /= sizeof(wchar_t); - if (PyBytes_Check(value)) { + if (PyString_Check(value)) { value = PyUnicode_FromEncodedObject(value, conversion_mode_encoding, conversion_mode_errors); @@ -1327,8 +1331,8 @@ s_set(void *ptr, PyObject *value, Py_ssize_t length) conversion_mode_errors); if (value == NULL) return NULL; - assert(PyBytes_Check(value)); - } else if(PyBytes_Check(value)) { + assert(PyString_Check(value)); + } else if(PyString_Check(value)) { Py_INCREF(value); } else 
{ PyErr_Format(PyExc_TypeError, @@ -1337,10 +1341,10 @@ s_set(void *ptr, PyObject *value, Py_ssize_t length) return NULL; } - data = PyBytes_AsString(value); + data = PyString_AS_STRING(value); if (!data) return NULL; - size = strlen(data); + size = strlen(data); /* XXX Why not Py_Size(value)? */ if (size < length) { /* This will copy the leading NUL character * if there is space for it. @@ -1368,8 +1372,8 @@ z_set(void *ptr, PyObject *value, Py_ssize_t size) Py_INCREF(value); return value; } - if (PyBytes_Check(value)) { - *(char **)ptr = PyBytes_AsString(value); + if (PyString_Check(value)) { + *(char **)ptr = PyString_AsString(value); Py_INCREF(value); return value; } else if (PyUnicode_Check(value)) { @@ -1378,8 +1382,7 @@ z_set(void *ptr, PyObject *value, Py_ssize_t size) conversion_mode_errors); if (str == NULL) return NULL; - assert(PyBytes_Check(str)); - *(char **)ptr = PyBytes_AS_STRING(str); + *(char **)ptr = PyString_AS_STRING(str); return str; } else if (PyInt_Check(value)) { #if SIZEOF_VOID_P == SIZEOF_LONG_LONG @@ -1433,7 +1436,7 @@ Z_set(void *ptr, PyObject *value, Py_ssize_t size) Py_INCREF(Py_None); return Py_None; } - if (PyBytes_Check(value)) { + if (PyString_Check(value)) { value = PyUnicode_FromEncodedObject(value, conversion_mode_encoding, conversion_mode_errors); @@ -1516,7 +1519,7 @@ BSTR_set(void *ptr, PyObject *value, Py_ssize_t size) /* convert value into a PyUnicodeObject or NULL */ if (Py_None == value) { value = NULL; - } else if (PyBytes_Check(value)) { + } else if (PyString_Check(value)) { value = PyUnicode_FromEncodedObject(value, conversion_mode_encoding, conversion_mode_errors); diff --git a/Modules/_cursesmodule.c b/Modules/_cursesmodule.c index a1d7e2e..cf412d8 100644 --- a/Modules/_cursesmodule.c +++ b/Modules/_cursesmodule.c @@ -1796,7 +1796,7 @@ PyCurses_GetWin(PyCursesWindowObject *self, PyObject *stream) remove(fn); return NULL; } - if (!PyBytes_Check(data)) { + if (!PyString_Check(data)) { PyErr_Format(PyExc_TypeError, 
"f.read() returned %.100s instead of bytes", data->ob_type->tp_name); @@ -1805,7 +1805,7 @@ PyCurses_GetWin(PyCursesWindowObject *self, PyObject *stream) remove(fn); return NULL; } - fwrite(PyBytes_AS_STRING(data), 1, PyBytes_GET_SIZE(data), fp); + fwrite(PyString_AS_STRING(data), 1, PyString_GET_SIZE(data), fp); Py_DECREF(data); fseek(fp, 0, 0); win = getwin(fp); diff --git a/Modules/_fileio.c b/Modules/_fileio.c index f02c5ef..c357a73 100644 --- a/Modules/_fileio.c +++ b/Modules/_fileio.c @@ -400,14 +400,14 @@ fileio_readall(PyFileIOObject *self) Py_ssize_t total = 0; int n; - result = PyBytes_FromStringAndSize(NULL, DEFAULT_BUFFER_SIZE); + result = PyString_FromStringAndSize(NULL, DEFAULT_BUFFER_SIZE); if (result == NULL) return NULL; while (1) { Py_ssize_t newsize = total + DEFAULT_BUFFER_SIZE; - if (PyBytes_GET_SIZE(result) < newsize) { - if (PyBytes_Resize(result, newsize) < 0) { + if (PyString_GET_SIZE(result) < newsize) { + if (_PyString_Resize(&result, newsize) < 0) { if (total == 0) { Py_DECREF(result); return NULL; @@ -419,7 +419,7 @@ fileio_readall(PyFileIOObject *self) Py_BEGIN_ALLOW_THREADS errno = 0; n = read(self->fd, - PyBytes_AS_STRING(result) + total, + PyString_AS_STRING(result) + total, newsize - total); Py_END_ALLOW_THREADS if (n == 0) @@ -438,8 +438,8 @@ fileio_readall(PyFileIOObject *self) total += n; } - if (PyBytes_GET_SIZE(result) > total) { - if (PyBytes_Resize(result, total) < 0) { + if (PyString_GET_SIZE(result) > total) { + if (_PyString_Resize(&result, total) < 0) { /* This should never happen, but just in case */ Py_DECREF(result); return NULL; @@ -468,10 +468,10 @@ fileio_read(PyFileIOObject *self, PyObject *args) return fileio_readall(self); } - bytes = PyBytes_FromStringAndSize(NULL, size); + bytes = PyString_FromStringAndSize(NULL, size); if (bytes == NULL) return NULL; - ptr = PyBytes_AsString(bytes); + ptr = PyString_AS_STRING(bytes); Py_BEGIN_ALLOW_THREADS errno = 0; @@ -486,7 +486,7 @@ fileio_read(PyFileIOObject *self, 
PyObject *args)
     }
     if (n != size) {
-        if (PyBytes_Resize(bytes, n) < 0) {
+        if (_PyString_Resize(&bytes, n) < 0) {
             Py_DECREF(bytes);
             return NULL;
         }
diff --git a/Modules/_hashopenssl.c b/Modules/_hashopenssl.c
index 252a2ae..0f460bf 100644
--- a/Modules/_hashopenssl.c
+++ b/Modules/_hashopenssl.c
@@ -108,7 +108,7 @@ EVP_digest(EVPobject *self, PyObject *unused)
     digest_size = EVP_MD_CTX_size(&temp_ctx);
     EVP_DigestFinal(&temp_ctx, digest, NULL);
-    retval = PyBytes_FromStringAndSize((const char *)digest, digest_size);
+    retval = PyString_FromStringAndSize((const char *)digest, digest_size);
     EVP_MD_CTX_cleanup(&temp_ctx);
     return retval;
 }
diff --git a/Modules/_sqlite/cache.c b/Modules/_sqlite/cache.c
index 829c175..2f50e6a 100644
--- a/Modules/_sqlite/cache.c
+++ b/Modules/_sqlite/cache.c
@@ -241,12 +241,12 @@ PyObject* pysqlite_cache_display(pysqlite_Cache* self, PyObject* args)
         if (!fmt_args) {
             return NULL;
         }
-        template = PyString_FromString("%s <- %s ->%s\n");
+        template = PyUnicode_FromString("%s <- %s ->%s\n");
         if (!template) {
             Py_DECREF(fmt_args);
             return NULL;
         }
-        display_str = PyString_Format(template, fmt_args);
+        display_str = PyUnicode_Format(template, fmt_args);
         Py_DECREF(template);
         Py_DECREF(fmt_args);
         if (!display_str) {
diff --git a/Modules/_sqlite/connection.c b/Modules/_sqlite/connection.c
index 5f899e8..d4318de 100644
--- a/Modules/_sqlite/connection.c
+++ b/Modules/_sqlite/connection.c
@@ -425,8 +425,6 @@ void _pysqlite_set_result(sqlite3_context* context, PyObject* py_val)
         sqlite3_result_int64(context, (PY_LONG_LONG)longval);
     } else if (PyFloat_Check(py_val)) {
         sqlite3_result_double(context, PyFloat_AsDouble(py_val));
-    } else if (PyString_Check(py_val)) {
-        sqlite3_result_text(context, PyString_AsString(py_val), -1, SQLITE_TRANSIENT);
     } else if (PyUnicode_Check(py_val)) {
         sqlite3_result_text(context, PyUnicode_AsString(py_val), -1, SQLITE_TRANSIENT);
     } else if (PyObject_CheckBuffer(py_val)) {
@@ -467,7 +465,7 @@ PyObject*
_pysqlite_build_py_params(sqlite3_context *context, int argc, sqlite3_
                 break;
             case SQLITE_TEXT:
                 val_str = (const char*)sqlite3_value_text(cur_value);
-                cur_py_value = PyUnicode_DecodeUTF8(val_str, strlen(val_str), NULL);
+                cur_py_value = PyUnicode_FromString(val_str);
                 /* TODO: have a way to show errors here */
                 if (!cur_py_value) {
                     PyErr_Clear();
@@ -477,7 +475,7 @@ PyObject* _pysqlite_build_py_params(sqlite3_context *context, int argc, sqlite3_
                 break;
             case SQLITE_BLOB:
                 buflen = sqlite3_value_bytes(cur_value);
-                cur_py_value = PyBytes_FromStringAndSize(
+                cur_py_value = PyString_FromStringAndSize(
                     sqlite3_value_blob(cur_value), buflen);
                 break;
             case SQLITE_NULL:
@@ -1023,8 +1021,8 @@ pysqlite_collation_callback(
         goto finally;
     }
-    string1 = PyString_FromStringAndSize((const char*)text1_data, text1_length);
-    string2 = PyString_FromStringAndSize((const char*)text2_data, text2_length);
+    string1 = PyUnicode_FromStringAndSize((const char*)text1_data, text1_length);
+    string2 = PyUnicode_FromStringAndSize((const char*)text2_data, text2_length);
     if (!string1 || !string2) {
         goto finally; /* failed to allocate strings */
@@ -1093,7 +1091,7 @@ pysqlite_connection_create_collation(pysqlite_Connection* self, PyObject* args)
         goto finally;
     }
-    chk = PyString_AsString(uppercase_name);
+    chk = PyUnicode_AsString(uppercase_name);
     while (*chk) {
         if ((*chk >= '0' && *chk <= '9')
          || (*chk >= 'A' && *chk <= 'Z')
@@ -1118,7 +1116,7 @@ pysqlite_connection_create_collation(pysqlite_Connection* self, PyObject* args)
     }
     rc = sqlite3_create_collation(self->db,
-                                  PyString_AsString(uppercase_name),
+                                  PyUnicode_AsString(uppercase_name),
                                   SQLITE_UTF8,
                                   (callable != Py_None) ? callable : NULL,
                                   (callable != Py_None) ?
pysqlite_collation_callback : NULL);
diff --git a/Modules/_sqlite/cursor.c b/Modules/_sqlite/cursor.c
index c789faf..c51c92e 100644
--- a/Modules/_sqlite/cursor.c
+++ b/Modules/_sqlite/cursor.c
@@ -272,11 +272,7 @@ PyObject* pysqlite_unicode_from_string(const char* val_str, int optimize)
         }
     }
-    if (is_ascii) {
-        return PyString_FromString(val_str);
-    } else {
-        return PyUnicode_DecodeUTF8(val_str, strlen(val_str), NULL);
-    }
+    return PyUnicode_FromString(val_str);
 }
 /*
@@ -379,7 +375,7 @@ PyObject* _pysqlite_fetch_one_row(pysqlite_Cursor* self)
             } else {
                 /* coltype == SQLITE_BLOB */
                 nbytes = sqlite3_column_bytes(self->statement->st, i);
-                buffer = PyBytes_FromStringAndSize(
+                buffer = PyString_FromStringAndSize(
                     sqlite3_column_blob(self->statement->st, i), nbytes);
                 if (!buffer) {
                     break;
@@ -436,8 +432,8 @@ PyObject* _pysqlite_query_execute(pysqlite_Cursor* self, int multiple, PyObject*
             return NULL;
         }
-        if (!PyString_Check(operation) && !PyUnicode_Check(operation)) {
-            PyErr_SetString(PyExc_ValueError, "operation parameter must be str or unicode");
+        if (!PyUnicode_Check(operation)) {
+            PyErr_SetString(PyExc_ValueError, "operation parameter must be str");
             return NULL;
         }
@@ -458,8 +454,8 @@ PyObject* _pysqlite_query_execute(pysqlite_Cursor* self, int multiple, PyObject*
             return NULL;
         }
-        if (!PyString_Check(operation) && !PyUnicode_Check(operation)) {
-            PyErr_SetString(PyExc_ValueError, "operation parameter must be str or unicode");
+        if (!PyUnicode_Check(operation)) {
+            PyErr_SetString(PyExc_ValueError, "operation parameter must be str");
             return NULL;
         }
diff --git a/Modules/_sqlite/module.c b/Modules/_sqlite/module.c
index 61ac0a1..107d61a 100644
--- a/Modules/_sqlite/module.c
+++ b/Modules/_sqlite/module.c
@@ -146,7 +146,7 @@ static PyObject* module_register_converter(PyObject* self, PyObject* args, PyObj
     PyObject* callable;
     PyObject* retval = NULL;
-    if (!PyArg_ParseTuple(args, "SO", &orig_name, &callable)) {
+    if (!PyArg_ParseTuple(args, "UO", &orig_name,
&callable)) {
         return NULL;
     }
diff --git a/Modules/_sqlite/row.c b/Modules/_sqlite/row.c
index 2f3ba69..dfb6363 100644
--- a/Modules/_sqlite/row.c
+++ b/Modules/_sqlite/row.c
@@ -87,7 +87,7 @@ PyObject* pysqlite_row_subscript(pysqlite_Row* self, PyObject* idx)
         nitems = PyTuple_Size(self->description);
         for (i = 0; i < nitems; i++) {
-            compare_key = PyString_AsString(PyTuple_GET_ITEM(PyTuple_GET_ITEM(self->description, i), 0));
+            compare_key = PyUnicode_AsString(PyTuple_GET_ITEM(PyTuple_GET_ITEM(self->description, i), 0));
             if (!compare_key) {
                 return NULL;
             }
diff --git a/Modules/_sqlite/statement.c b/Modules/_sqlite/statement.c
index 1cc3cdd..98cc68a 100644
--- a/Modules/_sqlite/statement.c
+++ b/Modules/_sqlite/statement.c
@@ -105,15 +105,10 @@ int pysqlite_statement_bind_parameter(pysqlite_Statement* self, int pos, PyObjec
 #endif
     } else if (PyFloat_Check(parameter)) {
         rc = sqlite3_bind_double(self->st, pos, PyFloat_AsDouble(parameter));
-    } else if PyString_Check(parameter) {
-        string = PyString_AsString(parameter);
-        rc = sqlite3_bind_text(self->st, pos, string, -1, SQLITE_TRANSIENT);
     } else if PyUnicode_Check(parameter) {
-        stringval = PyUnicode_AsUTF8String(parameter);
-        string = PyBytes_AsString(stringval);
+        string = PyUnicode_AsString(parameter);
         rc = sqlite3_bind_text(self->st, pos, string, -1, SQLITE_TRANSIENT);
-        Py_DECREF(stringval);
     } else if (PyObject_CheckBuffer(parameter)) {
         if (PyObject_AsCharBuffer(parameter, &buffer, &buflen) == 0) {
             rc = sqlite3_bind_blob(self->st, pos, buffer, buflen, SQLITE_TRANSIENT);
diff --git a/Modules/_struct.c b/Modules/_struct.c
index 84aa828..8f66a96 100644
--- a/Modules/_struct.c
+++ b/Modules/_struct.c
@@ -12,11 +12,6 @@ static PyTypeObject PyStructType;
-/* compatibility macros */
-#if (PY_VERSION_HEX < 0x02050000)
-typedef int Py_ssize_t;
-#endif
-
 /* If PY_STRUCT_OVERFLOW_MASKING is defined, the struct module will wrap all input
    numbers for explicit endians such that they fit in the given type, much
    like explicit
casting in C. A warning will be raised if the number did
@@ -411,7 +406,7 @@ _range_error(const formatdef *f, int is_unsigned)
     if (msg == NULL)
         return -1;
     rval = PyErr_WarnEx(PyExc_DeprecationWarning,
-                        PyString_AS_STRING(msg), 2);
+                        PyUnicode_AsString(msg), 2);
     Py_DECREF(msg);
     if (rval == 0)
         return 0;
@@ -1535,37 +1530,26 @@ Requires len(buffer) == self.size. See struct.__doc__ for more on format\n\
 strings.");
 static PyObject *
-s_unpack(PyObject *self, PyObject *inputstr)
+s_unpack(PyObject *self, PyObject *input)
 {
-    char *start;
-    Py_ssize_t len;
-    PyObject *args=NULL, *result;
+    Py_buffer vbuf;
+    PyObject *result;
     PyStructObject *soself = (PyStructObject *)self;
+
     assert(PyStruct_Check(self));
     assert(soself->s_codes != NULL);
-    if (inputstr == NULL)
-        goto fail;
-    if (PyString_Check(inputstr) &&
-        PyString_GET_SIZE(inputstr) == soself->s_size) {
-        return s_unpack_internal(soself, PyString_AS_STRING(inputstr));
-    }
-    args = PyTuple_Pack(1, inputstr);
-    if (args == NULL)
+    if (PyObject_GetBuffer(input, &vbuf, PyBUF_SIMPLE) < 0)
         return NULL;
-    if (!PyArg_ParseTuple(args, "s#:unpack", &start, &len))
-        goto fail;
-    if (soself->s_size != len)
-        goto fail;
-    result = s_unpack_internal(soself, start);
-    Py_DECREF(args);
+    if (vbuf.len != soself->s_size) {
+        PyErr_Format(StructError,
+                     "unpack requires a bytes argument of length %zd",
+                     soself->s_size);
+        PyObject_ReleaseBuffer(input, &vbuf);
+        return NULL;
+    }
+    result = s_unpack_internal(soself, vbuf.buf);
+    PyObject_ReleaseBuffer(input, &vbuf);
     return result;
-
-fail:
-    Py_XDECREF(args);
-    PyErr_Format(StructError,
-                 "unpack requires a string argument of length %zd",
-                 soself->s_size);
-    return NULL;
 }
 PyDoc_STRVAR(s_unpack_from__doc__,
@@ -1580,37 +1564,34 @@ static PyObject *
 s_unpack_from(PyObject *self, PyObject *args, PyObject *kwds)
 {
     static char *kwlist[] = {"buffer", "offset", 0};
-#if (PY_VERSION_HEX < 0x02050000)
-    static char *fmt = "z#|i:unpack_from";
-#else
-    static char *fmt = "z#|n:unpack_from";
-#endif
-    Py_ssize_t buffer_len = 0, offset = 0;
-    char *buffer = NULL;
+
+    PyObject *input;
+    Py_ssize_t offset = 0;
+    Py_buffer vbuf;
+    PyObject *result;
     PyStructObject *soself = (PyStructObject *)self;
+
     assert(PyStruct_Check(self));
     assert(soself->s_codes != NULL);
-    if (!PyArg_ParseTupleAndKeywords(args, kwds, fmt, kwlist,
-                                     &buffer, &buffer_len, &offset))
+    if (!PyArg_ParseTupleAndKeywords(args, kwds,
+                                     "O|n:unpack_from", kwlist,
+                                     &input, &offset))
         return NULL;
-
-    if (buffer == NULL) {
-        PyErr_Format(StructError,
-                     "unpack_from requires a buffer argument");
+    if (PyObject_GetBuffer(input, &vbuf, PyBUF_SIMPLE) < 0)
         return NULL;
-    }
     if (offset < 0)
-        offset += buffer_len;
-
-    if (offset < 0 || (buffer_len - offset) < soself->s_size) {
+        offset += vbuf.len;
+    if (offset < 0 || vbuf.len - offset < soself->s_size) {
         PyErr_Format(StructError,
                      "unpack_from requires a buffer of at least %zd bytes",
                      soself->s_size);
+        PyObject_ReleaseBuffer(input, &vbuf);
         return NULL;
     }
-    return s_unpack_internal(soself, buffer + offset);
+    result = s_unpack_internal(soself, (char*)vbuf.buf + offset);
+    PyObject_ReleaseBuffer(input, &vbuf);
+    return result;
 }
diff --git a/Modules/arraymodule.c b/Modules/arraymodule.c
index 8a24a7e..c7aeb5b 100644
--- a/Modules/arraymodule.c
+++ b/Modules/arraymodule.c
@@ -1212,14 +1212,14 @@ array_fromfile(arrayobject *self, PyObject *args)
         if (b == NULL)
             return NULL;
-        if (!PyBytes_Check(b)) {
+        if (!PyString_Check(b)) {
             PyErr_SetString(PyExc_TypeError,
                             "read() didn't return bytes");
             Py_DECREF(b);
             return NULL;
         }
-        if (PyBytes_GET_SIZE(b) != nbytes) {
+        if (PyString_GET_SIZE(b) != nbytes) {
             PyErr_SetString(PyExc_EOFError,
                             "read() didn't return enough bytes");
             Py_DECREF(b);
@@ -1263,7 +1263,7 @@ array_tofile(arrayobject *self, PyObject *f)
         PyObject *bytes, *res;
         if (i*BLOCKSIZE + size > nbytes)
             size = nbytes - i*BLOCKSIZE;
-        bytes = PyBytes_FromStringAndSize(ptr, size);
+        bytes = PyString_FromStringAndSize(ptr, size);
         if (bytes == NULL)
             return
NULL; res = PyObject_CallMethod(f, "write", "O", bytes); @@ -1395,7 +1395,7 @@ values, as if it had been read from a file using the fromfile() method)."); static PyObject * array_tostring(arrayobject *self, PyObject *unused) { - return PyBytes_FromStringAndSize(self->ob_item, + return PyString_FromStringAndSize(self->ob_item, Py_Size(self) * self->ob_descr->itemsize); } @@ -1861,6 +1861,7 @@ array_new(PyTypeObject *type, PyObject *args, PyObject *kwds) if (!(initial == NULL || PyList_Check(initial) || PyBytes_Check(initial) + || PyString_Check(initial) || PyTuple_Check(initial) || ((c=='u') && PyUnicode_Check(initial)))) { it = PyObject_GetIter(initial); @@ -1904,7 +1905,9 @@ array_new(PyTypeObject *type, PyObject *args, PyObject *kwds) } Py_DECREF(v); } - } else if (initial != NULL && PyBytes_Check(initial)) { + } + else if (initial != NULL && (PyBytes_Check(initial) || + PyString_Check(initial))) { PyObject *t_initial, *v; t_initial = PyTuple_Pack(1, initial); if (t_initial == NULL) { @@ -1919,7 +1922,8 @@ array_new(PyTypeObject *type, PyObject *args, PyObject *kwds) return NULL; } Py_DECREF(v); - } else if (initial != NULL && PyUnicode_Check(initial)) { + } + else if (initial != NULL && PyUnicode_Check(initial)) { Py_ssize_t n = PyUnicode_GET_DATA_SIZE(initial); if (n > 0) { arrayobject *self = (arrayobject *)a; diff --git a/Modules/binascii.c b/Modules/binascii.c index 3b55a35..62fc8c2 100644 --- a/Modules/binascii.c +++ b/Modules/binascii.c @@ -200,9 +200,9 @@ binascii_a2b_uu(PyObject *self, PyObject *args) ascii_len--; /* Allocate the buffer */ - if ( (rv=PyBytes_FromStringAndSize(NULL, bin_len)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, bin_len)) == NULL ) return NULL; - bin_data = (unsigned char *)PyBytes_AS_STRING(rv); + bin_data = (unsigned char *)PyString_AS_STRING(rv); for( ; bin_len > 0 ; ascii_len--, ascii_data++ ) { /* XXX is it really best to add NULs if there's no more data */ @@ -277,9 +277,9 @@ binascii_b2a_uu(PyObject *self, PyObject 
*args) } /* We're lazy and allocate to much (fixed up later) */ - if ( (rv=PyBytes_FromStringAndSize(NULL, bin_len*2+2)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, bin_len*2+2)) == NULL ) return NULL; - ascii_data = (unsigned char *)PyBytes_AS_STRING(rv); + ascii_data = (unsigned char *)PyString_AS_STRING(rv); /* Store the length */ *ascii_data++ = ' ' + (bin_len & 077); @@ -301,9 +301,9 @@ binascii_b2a_uu(PyObject *self, PyObject *args) } *ascii_data++ = '\n'; /* Append a courtesy newline */ - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (ascii_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -355,9 +355,9 @@ binascii_a2b_base64(PyObject *self, PyObject *args) bin_len = ((ascii_len+3)/4)*3; /* Upper bound, corrected later */ /* Allocate the buffer */ - if ( (rv=PyBytes_FromStringAndSize(NULL, bin_len)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, bin_len)) == NULL ) return NULL; - bin_data = (unsigned char *)PyBytes_AS_STRING(rv); + bin_data = (unsigned char *)PyString_AS_STRING(rv); bin_len = 0; for( ; ascii_len > 0; ascii_len--, ascii_data++) { @@ -416,17 +416,17 @@ binascii_a2b_base64(PyObject *self, PyObject *args) /* And set string size correctly. If the result string is empty ** (because the input was all invalid) return the shared empty - ** string instead; PyBytes_Resize() won't do this for us. + ** string instead; _PyString_Resize() won't do this for us. */ if (bin_len > 0) { - if (PyBytes_Resize(rv, bin_len) < 0) { + if (_PyString_Resize(&rv, bin_len) < 0) { Py_DECREF(rv); rv = NULL; } } else { Py_DECREF(rv); - rv = PyBytes_FromStringAndSize("", 0); + rv = PyString_FromStringAndSize("", 0); } return rv; } @@ -453,9 +453,9 @@ binascii_b2a_base64(PyObject *self, PyObject *args) /* We're lazy and allocate too much (fixed up later). "+3" leaves room for up to two pad characters and a trailing newline. 
Note that 'b' gets encoded as 'Yg==\n' (1 in, 5 out). */ - if ( (rv=PyBytes_FromStringAndSize(NULL, bin_len*2 + 3)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, bin_len*2 + 3)) == NULL ) return NULL; - ascii_data = (unsigned char *)PyBytes_AS_STRING(rv); + ascii_data = (unsigned char *)PyString_AS_STRING(rv); for( ; bin_len > 0 ; bin_len--, bin_data++ ) { /* Shift the data into our buffer */ @@ -479,9 +479,9 @@ binascii_b2a_base64(PyObject *self, PyObject *args) } *ascii_data++ = '\n'; /* Append a courtesy newline */ - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (ascii_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -507,9 +507,9 @@ binascii_a2b_hqx(PyObject *self, PyObject *args) /* Allocate a string that is too big (fixed later) Add two to the initial length to prevent interning which would preclude subsequent resizing. */ - if ( (rv=PyBytes_FromStringAndSize(NULL, len+2)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, len+2)) == NULL ) return NULL; - bin_data = (unsigned char *)PyBytes_AS_STRING(rv); + bin_data = (unsigned char *)PyString_AS_STRING(rv); for( ; len > 0 ; len--, ascii_data++ ) { /* Get the byte and look it up */ @@ -543,9 +543,9 @@ binascii_a2b_hqx(PyObject *self, PyObject *args) Py_DECREF(rv); return NULL; } - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (bin_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -572,9 +572,9 @@ binascii_rlecode_hqx(PyObject *self, PyObject *args) return NULL; /* Worst case: output is twice as big as input (fixed later) */ - if ( (rv=PyBytes_FromStringAndSize(NULL, len*2+2)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, len*2+2)) == NULL ) return NULL; - out_data = (unsigned char *)PyBytes_AS_STRING(rv); + out_data = (unsigned char *)PyString_AS_STRING(rv); for( in=0; in<len; in++) { ch = in_data[in]; @@ 
-600,9 +600,9 @@ binascii_rlecode_hqx(PyObject *self, PyObject *args) } } } - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (out_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -625,9 +625,9 @@ binascii_b2a_hqx(PyObject *self, PyObject *args) return NULL; /* Allocate a buffer that is at least large enough */ - if ( (rv=PyBytes_FromStringAndSize(NULL, len*2+2)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, len*2+2)) == NULL ) return NULL; - ascii_data = (unsigned char *)PyBytes_AS_STRING(rv); + ascii_data = (unsigned char *)PyString_AS_STRING(rv); for( ; len > 0 ; len--, bin_data++ ) { /* Shift into our buffer, and output any 6bits ready */ @@ -644,9 +644,9 @@ binascii_b2a_hqx(PyObject *self, PyObject *args) leftchar <<= (6-leftbits); *ascii_data++ = table_b2a_hqx[leftchar & 0x3f]; } - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (ascii_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -668,14 +668,14 @@ binascii_rledecode_hqx(PyObject *self, PyObject *args) /* Empty string is a special case */ if ( in_len == 0 ) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); /* Allocate a buffer of reasonable size. 
Resized when needed */ out_len = in_len*2; - if ( (rv=PyBytes_FromStringAndSize(NULL, out_len)) == NULL ) + if ( (rv=PyString_FromStringAndSize(NULL, out_len)) == NULL ) return NULL; out_len_left = out_len; - out_data = (unsigned char *)PyBytes_AS_STRING(rv); + out_data = (unsigned char *)PyString_AS_STRING(rv); /* ** We need two macros here to get/put bytes and handle @@ -694,9 +694,9 @@ binascii_rledecode_hqx(PyObject *self, PyObject *args) #define OUTBYTE(b) \ do { \ if ( --out_len_left < 0 ) { \ - if (PyBytes_Resize(rv, 2*out_len) < 0) \ + if (_PyString_Resize(&rv, 2*out_len) < 0) \ { Py_DECREF(rv); return NULL; } \ - out_data = (unsigned char *)PyBytes_AS_STRING(rv) \ + out_data = (unsigned char *)PyString_AS_STRING(rv) \ + out_len; \ out_len_left = out_len-1; \ out_len = out_len * 2; \ @@ -744,9 +744,9 @@ binascii_rledecode_hqx(PyObject *self, PyObject *args) OUTBYTE(in_byte); } } - if (PyBytes_Resize(rv, + if (_PyString_Resize(&rv, (out_data - - (unsigned char *)PyBytes_AS_STRING(rv))) < 0) { + (unsigned char *)PyString_AS_STRING(rv))) < 0) { Py_DECREF(rv); rv = NULL; } @@ -940,10 +940,10 @@ binascii_hexlify(PyObject *self, PyObject *args) if (!PyArg_ParseTuple(args, "s#:b2a_hex", &argbuf, &arglen)) return NULL; - retval = PyBytes_FromStringAndSize(NULL, arglen*2); + retval = PyString_FromStringAndSize(NULL, arglen*2); if (!retval) return NULL; - retbuf = PyBytes_AS_STRING(retval); + retbuf = PyString_AS_STRING(retval); /* make hex version of string, taken from shamodule.c */ for (i=j=0; i < arglen; i++) { @@ -1000,10 +1000,10 @@ binascii_unhexlify(PyObject *self, PyObject *args) return NULL; } - retval = PyBytes_FromStringAndSize(NULL, (arglen/2)); + retval = PyString_FromStringAndSize(NULL, (arglen/2)); if (!retval) return NULL; - retbuf = PyBytes_AS_STRING(retval); + retbuf = PyString_AS_STRING(retval); for (i=j=0; i < arglen; i += 2) { int top = to_int(Py_CHARMASK(argbuf[i])); @@ -1115,7 +1115,7 @@ binascii_a2b_qp(PyObject *self, PyObject *args, PyObject 
*kwargs) out++; } } - if ((rv = PyBytes_FromStringAndSize((char *)odata, out)) == NULL) { + if ((rv = PyString_FromStringAndSize((char *)odata, out)) == NULL) { PyMem_Free(odata); return NULL; } @@ -1315,7 +1315,7 @@ binascii_b2a_qp (PyObject *self, PyObject *args, PyObject *kwargs) } } } - if ((rv = PyBytes_FromStringAndSize((char *)odata, out)) == NULL) { + if ((rv = PyString_FromStringAndSize((char *)odata, out)) == NULL) { PyMem_Free(odata); return NULL; } diff --git a/Modules/bz2module.c b/Modules/bz2module.c index 15b6e44..e0fbb57 100644 --- a/Modules/bz2module.c +++ b/Modules/bz2module.c @@ -34,7 +34,7 @@ typedef fpos_t Py_off_t; #error "Large file support, but neither off_t nor fpos_t is large enough." #endif -#define BUF(v) PyBytes_AS_STRING(v) +#define BUF(v) PyString_AS_STRING(v) #define MODE_CLOSED 0 #define MODE_READ 1 @@ -232,7 +232,7 @@ Util_GetLine(BZ2FileObject *f, int n) int bytes_read; total_v_size = n > 0 ? n : 100; - v = PyBytes_FromStringAndSize((char *)NULL, total_v_size); + v = PyString_FromStringAndSize((char *)NULL, total_v_size); if (v == NULL) return NULL; @@ -272,8 +272,7 @@ Util_GetLine(BZ2FileObject *f, int n) Py_DECREF(v); return NULL; } - if (PyBytes_Resize(v, total_v_size) < 0) { - Py_DECREF(v); + if (_PyString_Resize(&v, total_v_size) < 0) { return NULL; } buf = BUF(v) + used_v_size; @@ -282,8 +281,7 @@ Util_GetLine(BZ2FileObject *f, int n) used_v_size = buf - BUF(v); if (used_v_size != total_v_size) { - if (PyBytes_Resize(v, used_v_size) < 0) { - Py_DECREF(v); + if (_PyString_Resize(&v, used_v_size) < 0) { v = NULL; } } @@ -340,10 +338,10 @@ Util_ReadAhead(BZ2FileObject *f, int bufsize) /* This is a hacked version of Python's * fileobject.c:readahead_get_line_skip(). 
*/ -static PyBytesObject * +static PyStringObject * Util_ReadAheadGetLineSkip(BZ2FileObject *f, int skip, int bufsize) { - PyBytesObject* s; + PyStringObject* s; char *bufptr; char *buf; int len; @@ -354,17 +352,17 @@ Util_ReadAheadGetLineSkip(BZ2FileObject *f, int skip, int bufsize) len = f->f_bufend - f->f_bufptr; if (len == 0) - return (PyBytesObject *) - PyBytes_FromStringAndSize(NULL, skip); + return (PyStringObject *) + PyString_FromStringAndSize(NULL, skip); bufptr = memchr(f->f_bufptr, '\n', len); if (bufptr != NULL) { bufptr++; /* Count the '\n' */ len = bufptr - f->f_bufptr; - s = (PyBytesObject *) - PyBytes_FromStringAndSize(NULL, skip+len); + s = (PyStringObject *) + PyString_FromStringAndSize(NULL, skip+len); if (s == NULL) return NULL; - memcpy(PyBytes_AS_STRING(s)+skip, f->f_bufptr, len); + memcpy(PyString_AS_STRING(s)+skip, f->f_bufptr, len); f->f_bufptr = bufptr; if (bufptr == f->f_bufend) Util_DropReadAhead(f); @@ -378,7 +376,7 @@ Util_ReadAheadGetLineSkip(BZ2FileObject *f, int skip, int bufsize) PyMem_Free(buf); return NULL; } - memcpy(PyBytes_AS_STRING(s)+skip, bufptr, len); + memcpy(PyString_AS_STRING(s)+skip, bufptr, len); PyMem_Free(buf); } return s; @@ -411,7 +409,7 @@ BZ2File_read(BZ2FileObject *self, PyObject *args) case MODE_READ: break; case MODE_READ_EOF: - ret = PyBytes_FromStringAndSize("", 0); + ret = PyString_FromStringAndSize("", 0); goto cleanup; case MODE_CLOSED: PyErr_SetString(PyExc_ValueError, @@ -433,7 +431,7 @@ BZ2File_read(BZ2FileObject *self, PyObject *args) "more than a Python string can hold"); goto cleanup; } - ret = PyBytes_FromStringAndSize((char *)NULL, buffersize); + ret = PyString_FromStringAndSize((char *)NULL, buffersize); if (ret == NULL || buffersize == 0) goto cleanup; bytesread = 0; @@ -458,8 +456,7 @@ BZ2File_read(BZ2FileObject *self, PyObject *args) } if (bytesrequested < 0) { buffersize = Util_NewBufferSize(buffersize); - if (PyBytes_Resize(ret, buffersize) < 0) { - Py_DECREF(ret); + if 
(_PyString_Resize(&ret, buffersize) < 0) { ret = NULL; goto cleanup; } @@ -468,8 +465,7 @@ BZ2File_read(BZ2FileObject *self, PyObject *args) } } if (bytesread != buffersize) { - if (PyBytes_Resize(ret, bytesread) < 0) { - Py_DECREF(ret); + if (_PyString_Resize(&ret, bytesread) < 0) { ret = NULL; } } @@ -502,7 +498,7 @@ BZ2File_readline(BZ2FileObject *self, PyObject *args) case MODE_READ: break; case MODE_READ_EOF: - ret = PyBytes_FromStringAndSize("", 0); + ret = PyString_FromStringAndSize("", 0); goto cleanup; case MODE_CLOSED: PyErr_SetString(PyExc_ValueError, @@ -515,7 +511,7 @@ BZ2File_readline(BZ2FileObject *self, PyObject *args) } if (sizehint == 0) - ret = PyBytes_FromStringAndSize("", 0); + ret = PyString_FromStringAndSize("", 0); else ret = Util_GetLine(self, (sizehint < 0) ? 0 : sizehint); @@ -608,21 +604,20 @@ BZ2File_readlines(BZ2FileObject *self, PyObject *args) } if (big_buffer == NULL) { /* Create the big buffer */ - big_buffer = PyBytes_FromStringAndSize( + big_buffer = PyString_FromStringAndSize( NULL, buffersize); if (big_buffer == NULL) goto error; - buffer = PyBytes_AS_STRING(big_buffer); + buffer = PyString_AS_STRING(big_buffer); memcpy(buffer, small_buffer, nfilled); } else { /* Grow the big buffer */ - if (PyBytes_Resize(big_buffer, buffersize) < 0){ - Py_DECREF(big_buffer); + if (_PyString_Resize(&big_buffer, buffersize) < 0){ big_buffer = NULL; goto error; } - buffer = PyBytes_AS_STRING(big_buffer); + buffer = PyString_AS_STRING(big_buffer); } continue; } @@ -631,7 +626,7 @@ BZ2File_readlines(BZ2FileObject *self, PyObject *args) while (p != NULL) { /* Process complete lines */ p++; - line = PyBytes_FromStringAndSize(q, p-q); + line = PyString_FromStringAndSize(q, p-q); if (line == NULL) goto error; err = PyList_Append(list, line); @@ -654,21 +649,18 @@ BZ2File_readlines(BZ2FileObject *self, PyObject *args) } if (nfilled != 0) { /* Partial last line */ - line = PyBytes_FromStringAndSize(buffer, nfilled); + line = 
PyString_FromStringAndSize(buffer, nfilled); if (line == NULL) goto error; if (sizehint > 0) { /* Need to complete the last line */ PyObject *rest = Util_GetLine(self, 0); - PyObject *new; if (rest == NULL) { Py_DECREF(line); goto error; } - new = PyBytes_Concat(line, rest); - Py_DECREF(line); + PyString_Concat(&line, rest); Py_DECREF(rest); - line = new; if (line == NULL) goto error; } @@ -702,7 +694,7 @@ BZ2File_write(BZ2FileObject *self, PyObject *args) int len; int bzerror; - if (!PyArg_ParseTuple(args, "s#:write", &buf, &len)) + if (!PyArg_ParseTuple(args, "y#:write", &buf, &len)) return NULL; ACQUIRE_LOCK(self); @@ -820,7 +812,7 @@ BZ2File_writelines(BZ2FileObject *self, PyObject *seq) could potentially execute Python code. */ for (i = 0; i < j; i++) { PyObject *v = PyList_GET_ITEM(list, i); - if (!PyBytes_Check(v)) { + if (!PyString_Check(v)) { const char *buffer; Py_ssize_t len; if (PyObject_AsCharBuffer(v, &buffer, &len)) { @@ -831,7 +823,7 @@ BZ2File_writelines(BZ2FileObject *self, PyObject *seq) "bytes objects"); goto error; } - line = PyBytes_FromStringAndSize(buffer, + line = PyString_FromStringAndSize(buffer, len); if (line == NULL) goto error; @@ -845,9 +837,9 @@ BZ2File_writelines(BZ2FileObject *self, PyObject *seq) Py_BEGIN_ALLOW_THREADS for (i = 0; i < j; i++) { line = PyList_GET_ITEM(list, i); - len = PyBytes_GET_SIZE(line); + len = PyString_GET_SIZE(line); BZ2_bzWrite (&bzerror, self->fp, - PyBytes_AS_STRING(line), len); + PyString_AS_STRING(line), len); if (bzerror != BZ_OK) { Py_BLOCK_THREADS Util_CatchBZ2Error(bzerror); @@ -1269,7 +1261,7 @@ BZ2File_getiter(BZ2FileObject *self) static PyObject * BZ2File_iternext(BZ2FileObject *self) { - PyBytesObject* ret; + PyStringObject* ret; ACQUIRE_LOCK(self); if (self->mode == MODE_CLOSED) { PyErr_SetString(PyExc_ValueError, @@ -1278,7 +1270,7 @@ BZ2File_iternext(BZ2FileObject *self) } ret = Util_ReadAheadGetLineSkip(self, 0, READAHEAD_BUFSIZE); RELEASE_LOCK(self); - if (ret == NULL || 
PyBytes_GET_SIZE(ret) == 0) { + if (ret == NULL || PyString_GET_SIZE(ret) == 0) { Py_XDECREF(ret); return NULL; } @@ -1367,11 +1359,11 @@ BZ2Comp_compress(BZ2CompObject *self, PyObject *args) bz_stream *bzs = &self->bzs; int bzerror; - if (!PyArg_ParseTuple(args, "s#:compress", &data, &datasize)) + if (!PyArg_ParseTuple(args, "y#:compress", &data, &datasize)) return NULL; if (datasize == 0) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); ACQUIRE_LOCK(self); if (!self->running) { @@ -1380,7 +1372,7 @@ BZ2Comp_compress(BZ2CompObject *self, PyObject *args) goto error; } - ret = PyBytes_FromStringAndSize(NULL, bufsize); + ret = PyString_FromStringAndSize(NULL, bufsize); if (!ret) goto error; @@ -1403,7 +1395,7 @@ BZ2Comp_compress(BZ2CompObject *self, PyObject *args) break; /* no more input data */ if (bzs->avail_out == 0) { bufsize = Util_NewBufferSize(bufsize); - if (PyBytes_Resize(ret, bufsize) < 0) { + if (_PyString_Resize(&ret, bufsize) < 0) { BZ2_bzCompressEnd(bzs); goto error; } @@ -1413,7 +1405,7 @@ BZ2Comp_compress(BZ2CompObject *self, PyObject *args) } } - if (PyBytes_Resize(ret, + if (_PyString_Resize(&ret, (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) goto error; @@ -1450,7 +1442,7 @@ BZ2Comp_flush(BZ2CompObject *self) } self->running = 0; - ret = PyBytes_FromStringAndSize(NULL, bufsize); + ret = PyString_FromStringAndSize(NULL, bufsize); if (!ret) goto error; @@ -1471,7 +1463,7 @@ BZ2Comp_flush(BZ2CompObject *self) } if (bzs->avail_out == 0) { bufsize = Util_NewBufferSize(bufsize); - if (PyBytes_Resize(ret, bufsize) < 0) + if (_PyString_Resize(&ret, bufsize) < 0) goto error; bzs->next_out = BUF(ret); bzs->next_out = BUF(ret) + (BZS_TOTAL_OUT(bzs) @@ -1481,7 +1473,7 @@ BZ2Comp_flush(BZ2CompObject *self) } if (bzs->avail_out != 0) { - if (PyBytes_Resize(ret, + if (_PyString_Resize(&ret, (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) goto error; } @@ -1656,7 +1648,7 @@ BZ2Decomp_decompress(BZ2DecompObject *self, 
PyObject *args) bz_stream *bzs = &self->bzs; int bzerror; - if (!PyArg_ParseTuple(args, "s#:decompress", &data, &datasize)) + if (!PyArg_ParseTuple(args, "y#:decompress", &data, &datasize)) return NULL; ACQUIRE_LOCK(self); @@ -1666,7 +1658,7 @@ BZ2Decomp_decompress(BZ2DecompObject *self, PyObject *args) goto error; } - ret = PyBytes_FromStringAndSize(NULL, bufsize); + ret = PyString_FromStringAndSize(NULL, bufsize); if (!ret) goto error; @@ -1685,7 +1677,7 @@ BZ2Decomp_decompress(BZ2DecompObject *self, PyObject *args) if (bzs->avail_in != 0) { Py_DECREF(self->unused_data); self->unused_data = - PyBytes_FromStringAndSize(bzs->next_in, + PyString_FromStringAndSize(bzs->next_in, bzs->avail_in); } self->running = 0; @@ -1699,7 +1691,7 @@ BZ2Decomp_decompress(BZ2DecompObject *self, PyObject *args) break; /* no more input data */ if (bzs->avail_out == 0) { bufsize = Util_NewBufferSize(bufsize); - if (PyBytes_Resize(ret, bufsize) < 0) { + if (_PyString_Resize(&ret, bufsize) < 0) { BZ2_bzDecompressEnd(bzs); goto error; } @@ -1711,7 +1703,7 @@ BZ2Decomp_decompress(BZ2DecompObject *self, PyObject *args) } if (bzs->avail_out != 0) { - if (PyBytes_Resize(ret, + if (_PyString_Resize(&ret, (Py_ssize_t)(BZS_TOTAL_OUT(bzs) - totalout)) < 0) goto error; } @@ -1750,7 +1742,7 @@ BZ2Decomp_init(BZ2DecompObject *self, PyObject *args, PyObject *kwargs) } #endif - self->unused_data = PyBytes_FromStringAndSize("", 0); + self->unused_data = PyString_FromStringAndSize("", 0); if (!self->unused_data) goto error; @@ -1868,7 +1860,7 @@ bz2_compress(PyObject *self, PyObject *args, PyObject *kwargs) int bzerror; static char *kwlist[] = {"data", "compresslevel", 0}; - if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|i", + if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|i", kwlist, &data, &datasize, &compresslevel)) return NULL; @@ -1883,7 +1875,7 @@ bz2_compress(PyObject *self, PyObject *args, PyObject *kwargs) * data in one shot. We will check it later anyway. 
*/ bufsize = datasize + (datasize/100+1) + 600; - ret = PyBytes_FromStringAndSize(NULL, bufsize); + ret = PyString_FromStringAndSize(NULL, bufsize); if (!ret) return NULL; @@ -1915,9 +1907,8 @@ bz2_compress(PyObject *self, PyObject *args, PyObject *kwargs) } if (bzs->avail_out == 0) { bufsize = Util_NewBufferSize(bufsize); - if (PyBytes_Resize(ret, bufsize) < 0) { + if (_PyString_Resize(&ret, bufsize) < 0) { BZ2_bzCompressEnd(bzs); - Py_DECREF(ret); return NULL; } bzs->next_out = BUF(ret) + BZS_TOTAL_OUT(bzs); @@ -1926,8 +1917,7 @@ bz2_compress(PyObject *self, PyObject *args, PyObject *kwargs) } if (bzs->avail_out != 0) { - if (PyBytes_Resize(ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { - Py_DECREF(ret); + if (_PyString_Resize(&ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { ret = NULL; } } @@ -1954,13 +1944,13 @@ bz2_decompress(PyObject *self, PyObject *args) bz_stream *bzs = &_bzs; int bzerror; - if (!PyArg_ParseTuple(args, "s#:decompress", &data, &datasize)) + if (!PyArg_ParseTuple(args, "y#:decompress", &data, &datasize)) return NULL; if (datasize == 0) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); - ret = PyBytes_FromStringAndSize(NULL, bufsize); + ret = PyString_FromStringAndSize(NULL, bufsize); if (!ret) return NULL; @@ -1999,9 +1989,8 @@ bz2_decompress(PyObject *self, PyObject *args) } if (bzs->avail_out == 0) { bufsize = Util_NewBufferSize(bufsize); - if (PyBytes_Resize(ret, bufsize) < 0) { + if (_PyString_Resize(&ret, bufsize) < 0) { BZ2_bzDecompressEnd(bzs); - Py_DECREF(ret); return NULL; } bzs->next_out = BUF(ret) + BZS_TOTAL_OUT(bzs); @@ -2010,8 +1999,7 @@ bz2_decompress(PyObject *self, PyObject *args) } if (bzs->avail_out != 0) { - if (PyBytes_Resize(ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { - Py_DECREF(ret); + if (_PyString_Resize(&ret, (Py_ssize_t)BZS_TOTAL_OUT(bzs)) < 0) { ret = NULL; } } diff --git a/Modules/cjkcodecs/multibytecodec.c b/Modules/cjkcodecs/multibytecodec.c index 7ab3145..701b112 100644 --- 
a/Modules/cjkcodecs/multibytecodec.c +++ b/Modules/cjkcodecs/multibytecodec.c @@ -175,15 +175,15 @@ expand_encodebuffer(MultibyteEncodeBuffer *buf, Py_ssize_t esize) Py_ssize_t orgpos, orgsize; orgpos = (Py_ssize_t)((char *)buf->outbuf - - PyBytes_AS_STRING(buf->outobj)); - orgsize = PyBytes_GET_SIZE(buf->outobj); - if (PyBytes_Resize(buf->outobj, orgsize + ( + PyString_AS_STRING(buf->outobj)); + orgsize = PyString_GET_SIZE(buf->outobj); + if (_PyString_Resize(&buf->outobj, orgsize + ( esize < (orgsize >> 1) ? (orgsize >> 1) | 1 : esize)) == -1) return -1; - buf->outbuf = (unsigned char *)PyBytes_AS_STRING(buf->outobj) +orgpos; - buf->outbuf_end = (unsigned char *)PyBytes_AS_STRING(buf->outobj) - + PyBytes_GET_SIZE(buf->outobj); + buf->outbuf = (unsigned char *)PyString_AS_STRING(buf->outobj) +orgpos; + buf->outbuf_end = (unsigned char *)PyString_AS_STRING(buf->outobj) + + PyString_GET_SIZE(buf->outobj); return 0; } @@ -330,11 +330,11 @@ multibytecodec_encerror(MultibyteCodec *codec, goto errorexit; } - assert(PyBytes_Check(retstr)); - retstrsize = PyBytes_GET_SIZE(retstr); + assert(PyString_Check(retstr)); + retstrsize = PyString_GET_SIZE(retstr); REQUIRE_ENCODEBUFFER(buf, retstrsize); - memcpy(buf->outbuf, PyBytes_AS_STRING(retstr), retstrsize); + memcpy(buf->outbuf, PyString_AS_STRING(retstr), retstrsize); buf->outbuf += retstrsize; newpos = PyInt_AsSsize_t(PyTuple_GET_ITEM(retobj, 1)); @@ -476,16 +476,16 @@ multibytecodec_encode(MultibyteCodec *codec, Py_ssize_t finalsize, r = 0; if (datalen == 0) - return PyBytes_FromStringAndSize(NULL, 0); + return PyString_FromStringAndSize(NULL, 0); buf.excobj = NULL; buf.inbuf = buf.inbuf_top = *data; buf.inbuf_end = buf.inbuf_top + datalen; - buf.outobj = PyBytes_FromStringAndSize(NULL, datalen * 2 + 16); + buf.outobj = PyString_FromStringAndSize(NULL, datalen * 2 + 16); if (buf.outobj == NULL) goto errorexit; - buf.outbuf = (unsigned char *)PyBytes_AS_STRING(buf.outobj); - buf.outbuf_end = buf.outbuf + 
PyBytes_GET_SIZE(buf.outobj); + buf.outbuf = (unsigned char *)PyString_AS_STRING(buf.outobj); + buf.outbuf_end = buf.outbuf + PyString_GET_SIZE(buf.outobj); while (buf.inbuf < buf.inbuf_end) { Py_ssize_t inleft, outleft; @@ -520,10 +520,10 @@ multibytecodec_encode(MultibyteCodec *codec, } finalsize = (Py_ssize_t)((char *)buf.outbuf - - PyBytes_AS_STRING(buf.outobj)); + PyString_AS_STRING(buf.outobj)); - if (finalsize != PyBytes_GET_SIZE(buf.outobj)) - if (PyBytes_Resize(buf.outobj, finalsize) == -1) + if (finalsize != PyString_GET_SIZE(buf.outobj)) + if (_PyString_Resize(&buf.outobj, finalsize) == -1) goto errorexit; Py_XDECREF(buf.excobj); @@ -1611,8 +1611,8 @@ mbstreamwriter_reset(MultibyteStreamWriterObject *self) if (pwrt == NULL) return NULL; - assert(PyBytes_Check(pwrt)); - if (PyBytes_Size(pwrt) > 0) { + assert(PyString_Check(pwrt)); + if (PyString_Size(pwrt) > 0) { PyObject *wr; wr = PyObject_CallMethod(self->stream, "write", "O", pwrt); if (wr == NULL) { diff --git a/Modules/datetimemodule.c b/Modules/datetimemodule.c index 6f13a85..6b2cd5a 100644 --- a/Modules/datetimemodule.c +++ b/Modules/datetimemodule.c @@ -1133,7 +1133,7 @@ make_Zreplacement(PyObject *object, PyObject *tzinfoarg) { PyObject *temp; PyObject *tzinfo = get_tzinfo_member(object); - PyObject *Zreplacement = PyBytes_FromStringAndSize("", 0); + PyObject *Zreplacement = PyUnicode_FromStringAndSize(NULL, 0); if (Zreplacement == NULL) return NULL; if (tzinfo == Py_None || tzinfo == NULL) @@ -1158,14 +1158,7 @@ make_Zreplacement(PyObject *object, PyObject *tzinfoarg) Py_DECREF(temp); if (Zreplacement == NULL) return NULL; - if (PyUnicode_Check(Zreplacement)) { - PyObject *tmp = PyUnicode_AsUTF8String(Zreplacement); - Py_DECREF(Zreplacement); - if (tmp == NULL) - return NULL; - Zreplacement = tmp; - } - if (!PyBytes_Check(Zreplacement)) { + if (!PyUnicode_Check(Zreplacement)) { PyErr_SetString(PyExc_TypeError, "tzname.replace() did not return a string"); goto Error; @@ -1297,9 +1290,10 @@ 
wrap_strftime(PyObject *object, PyObject *format, PyObject *timetuple, goto Done; } assert(Zreplacement != NULL); - assert(PyBytes_Check(Zreplacement)); - ptoappend = PyBytes_AS_STRING(Zreplacement); - ntoappend = PyBytes_GET_SIZE(Zreplacement); + assert(PyUnicode_Check(Zreplacement)); + ptoappend = PyUnicode_AsStringAndSize(Zreplacement, + &ntoappend); + ntoappend = Py_Size(Zreplacement); } else { /* percent followed by neither z nor Z */ @@ -3194,7 +3188,7 @@ time_strftime(PyDateTime_Time *self, PyObject *args, PyObject *kw) PyObject *tuple; static char *keywords[] = {"format", NULL}; - if (! PyArg_ParseTupleAndKeywords(args, kw, "S:strftime", keywords, + if (! PyArg_ParseTupleAndKeywords(args, kw, "U:strftime", keywords, &format)) return NULL; diff --git a/Modules/dbmmodule.c b/Modules/dbmmodule.c index 5660882..6b05fad 100644 --- a/Modules/dbmmodule.c +++ b/Modules/dbmmodule.c @@ -219,14 +219,14 @@ dbm_contains(PyObject *self, PyObject *arg) if (arg == NULL) return -1; } - if (!PyBytes_Check(arg)) { + if (!PyString_Check(arg)) { PyErr_Format(PyExc_TypeError, "dbm key must be string, not %.100s", arg->ob_type->tp_name); return -1; } - key.dptr = PyBytes_AS_STRING(arg); - key.dsize = PyBytes_GET_SIZE(arg); + key.dptr = PyString_AS_STRING(arg); + key.dsize = PyString_GET_SIZE(arg); val = dbm_fetch(dp->di_dbm, key); return val.dptr != NULL; } diff --git a/Modules/gdbmmodule.c b/Modules/gdbmmodule.c index 5ee9c9a..86f98c0 100644 --- a/Modules/gdbmmodule.c +++ b/Modules/gdbmmodule.c @@ -251,14 +251,14 @@ dbm_contains(PyObject *self, PyObject *arg) "GDBM object has already been closed"); return -1; } - if (!PyBytes_Check(arg)) { + if (!PyString_Check(arg)) { PyErr_Format(PyExc_TypeError, "gdbm key must be bytes, not %.100s", arg->ob_type->tp_name); return -1; } - key.dptr = PyBytes_AsString(arg); - key.dsize = PyBytes_Size(arg); + key.dptr = PyString_AS_STRING(arg); + key.dsize = PyString_GET_SIZE(arg); return gdbm_exists(dp->di_dbm, key); } diff --git 
a/Modules/main.c b/Modules/main.c index b15f1714..ee4a1b8 100644 --- a/Modules/main.c +++ b/Modules/main.c @@ -44,7 +44,7 @@ static char **orig_argv; static int orig_argc; /* command line options */ -#define BASE_OPTS "c:dEhim:OStuvVW:xX?" +#define BASE_OPTS "bc:dEhim:OStuvVW:xX?" #define PROGRAM_OPTS BASE_OPTS @@ -55,32 +55,34 @@ static char *usage_line = /* Long usage message, split into parts < 512 bytes */ static char *usage_1 = "\ Options and arguments (and corresponding environment variables):\n\ +-b : issue warnings about str(bytes_instance), str(buffer_instance)\n\ + and comparing bytes/buffer with str. (-bb: issue errors)\n\ -c cmd : program passed in as string (terminates option list)\n\ -d : debug output from parser; also PYTHONDEBUG=x\n\ -E : ignore environment variables (such as PYTHONPATH)\n\ -h : print this help message and exit (also --help)\n\ --i : inspect interactively after running script; forces a prompt even\n\ - if stdin does not appear to be a terminal; also PYTHONINSPECT=x\n\ "; static char *usage_2 = "\ +-i : inspect interactively after running script; forces a prompt even\n\ + if stdin does not appear to be a terminal; also PYTHONINSPECT=x\n\ -m mod : run library module as a script (terminates option list)\n\ -O : optimize generated bytecode slightly; also PYTHONOPTIMIZE=x\n\ -OO : remove doc-strings in addition to the -O optimizations\n\ -S : don't imply 'import site' on initialization\n\ -t : issue warnings about inconsistent tab usage (-tt: issue errors)\n\ --u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x\n\ "; static char *usage_3 = "\ +-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x\n\ see man page for details on internal buffering relating to '-u'\n\ -v : verbose (trace import statements); also PYTHONVERBOSE=x\n\ can be supplied multiple times to increase verbosity\n\ -V : print the Python version number and exit (also --version)\n\ -W arg : warning control; arg is 
action:message:category:module:lineno\n\ -x : skip first line of source, allowing use of non-Unix forms of #!cmd\n\ -file : program read from script file\n\ -- : program read from stdin (default; interactive mode if a tty)\n\ "; static char *usage_4 = "\ +file : program read from script file\n\ +- : program read from stdin (default; interactive mode if a tty)\n\ arg ...: arguments passed to program in sys.argv[1:]\n\n\ Other environment variables:\n\ PYTHONSTARTUP: file executed on interactive startup (no default)\n\ @@ -252,6 +254,9 @@ Py_Main(int argc, char **argv) } switch (c) { + case 'b': + Py_BytesWarningFlag++; + break; case 'd': Py_DebugFlag++; diff --git a/Modules/md5module.c b/Modules/md5module.c index 26f0e4c..c5770b3 100644 --- a/Modules/md5module.c +++ b/Modules/md5module.c @@ -363,7 +363,7 @@ MD5_digest(MD5object *self, PyObject *unused) temp = self->hash_state; md5_done(&temp, digest); - return PyBytes_FromStringAndSize((const char *)digest, MD5_DIGESTSIZE); + return PyString_FromStringAndSize((const char *)digest, MD5_DIGESTSIZE); } PyDoc_STRVAR(MD5_hexdigest__doc__, diff --git a/Modules/mmapmodule.c b/Modules/mmapmodule.c index e5d7711..1a58f3d 100644 --- a/Modules/mmapmodule.c +++ b/Modules/mmapmodule.c @@ -305,7 +305,7 @@ is_resizeable(mmap_object *self) return 0; } if ((self->access == ACCESS_WRITE) || (self->access == ACCESS_DEFAULT)) - return 1; + return 1; PyErr_Format(PyExc_TypeError, "mmap can't resize a readonly or copy-on-write memory map."); return 0; @@ -621,10 +621,10 @@ static struct PyMethodDef mmap_object_methods[] = { /* Functions for treating an mmap'ed file as a buffer */ static int -mmap_buffer_getbuf(mmap_object *self, Py_buffer *view, int flags) +mmap_buffer_getbuf(mmap_object *self, Py_buffer *view, int flags) { CHECK_VALID(-1); - if (PyBuffer_FillInfo(view, self->data, self->size, + if (PyBuffer_FillInfo(view, self->data, self->size, (self->access == ACCESS_READ), flags) < 0) return -1; self->exports++; @@ -676,7 +676,7 @@ 
mmap_subscript(mmap_object *self, PyObject *item) "mmap index out of range"); return NULL; } - return PyBytes_FromStringAndSize(self->data + i, 1); + return PyInt_FromLong(Py_CHARMASK(self->data[i])); } else if (PySlice_Check(item)) { Py_ssize_t start, stop, step, slicelen; @@ -685,12 +685,12 @@ mmap_subscript(mmap_object *self, PyObject *item) &start, &stop, &step, &slicelen) < 0) { return NULL; } - + if (slicelen <= 0) - return PyBytes_FromStringAndSize("", 0); + return PyString_FromStringAndSize("", 0); else if (step == 1) - return PyBytes_FromStringAndSize(self->data + start, - slicelen); + return PyString_FromStringAndSize(self->data + start, + slicelen); else { char *result_buf = (char *)PyMem_Malloc(slicelen); Py_ssize_t cur, i; @@ -702,8 +702,8 @@ mmap_subscript(mmap_object *self, PyObject *item) cur += step, i++) { result_buf[i] = self->data[cur]; } - result = PyBytes_FromStringAndSize(result_buf, - slicelen); + result = PyString_FromStringAndSize(result_buf, + slicelen); PyMem_Free(result_buf); return result; } @@ -765,9 +765,12 @@ mmap_ass_subscript(mmap_object *self, PyObject *item, PyObject *value) { CHECK_VALID(-1); + if (!is_writable(self)) + return -1; + if (PyIndex_Check(item)) { Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError); - const char *buf; + Py_ssize_t v; if (i == -1 && PyErr_Occurred()) return -1; @@ -775,28 +778,35 @@ mmap_ass_subscript(mmap_object *self, PyObject *item, PyObject *value) i += self->size; if (i < 0 || (size_t)i > self->size) { PyErr_SetString(PyExc_IndexError, - "mmap index out of range"); + "mmap index out of range"); return -1; } if (value == NULL) { PyErr_SetString(PyExc_TypeError, - "mmap object doesn't support item deletion"); + "mmap doesn't support item deletion"); return -1; } - if (!PyBytes_Check(value) || PyBytes_Size(value) != 1) { - PyErr_SetString(PyExc_IndexError, - "mmap assignment must be length-1 bytes()"); + if (!PyIndex_Check(value)) { + PyErr_SetString(PyExc_TypeError, + "mmap item value must 
be an int"); return -1; } - if (!is_writable(self)) + v = PyNumber_AsSsize_t(value, PyExc_TypeError); + if (v == -1 && PyErr_Occurred()) return -1; - buf = PyBytes_AsString(value); - self->data[i] = buf[0]; + if (v < 0 || v > 255) { + PyErr_SetString(PyExc_ValueError, + "mmap item value must be " + "in range(0, 256)"); + return -1; + } + self->data[i] = v; return 0; } else if (PySlice_Check(item)) { Py_ssize_t start, stop, step, slicelen; - + Py_buffer vbuf; + if (PySlice_GetIndicesEx((PySliceObject *)item, self->size, &start, &stop, &step, &slicelen) < 0) { @@ -807,41 +817,32 @@ mmap_ass_subscript(mmap_object *self, PyObject *item, PyObject *value) "mmap object doesn't support slice deletion"); return -1; } - if (!PyBytes_Check(value)) { - PyErr_SetString(PyExc_IndexError, - "mmap slice assignment must be bytes"); + if (PyObject_GetBuffer(value, &vbuf, PyBUF_SIMPLE) < 0) return -1; - } - if (PyBytes_Size(value) != slicelen) { + if (vbuf.len != slicelen) { PyErr_SetString(PyExc_IndexError, "mmap slice assignment is wrong size"); + PyObject_ReleaseBuffer(value, &vbuf); return -1; } - if (!is_writable(self)) - return -1; - if (slicelen == 0) - return 0; + if (slicelen == 0) { + } else if (step == 1) { - const char *buf = PyBytes_AsString(value); - - if (buf == NULL) - return -1; - memcpy(self->data + start, buf, slicelen); - return 0; + memcpy(self->data + start, vbuf.buf, slicelen); } else { Py_ssize_t cur, i; - const char *buf = PyBytes_AsString(value); - - if (buf == NULL) - return -1; - for (cur = start, i = 0; i < slicelen; - cur += step, i++) { - self->data[cur] = buf[i]; + + for (cur = start, i = 0; + i < slicelen; + cur += step, i++) + { + self->data[cur] = ((char *)vbuf.buf)[i]; } - return 0; } + PyObject_ReleaseBuffer(value, &vbuf); + return 0; } else { PyErr_SetString(PyExc_TypeError, @@ -908,9 +909,9 @@ _GetMapSize(PyObject *o, const char* param) return 0; if (PyIndex_Check(o)) { Py_ssize_t i = PyNumber_AsSsize_t(o, PyExc_OverflowError); - if (i==-1 && 
PyErr_Occurred()) + if (i==-1 && PyErr_Occurred()) return -1; - if (i < 0) { + if (i < 0) { PyErr_Format(PyExc_OverflowError, "memory mapped %s must be positive", param); @@ -1151,8 +1152,8 @@ new_mmap_object(PyObject *self, PyObject *args, PyObject *kwdict) (dwErr = GetLastError()) != NO_ERROR) { Py_DECREF(m_obj); return PyErr_SetFromWindowsErr(dwErr); - } - + } + #if SIZEOF_SIZE_T > 4 m_obj->size = (((size_t)high)<<32) + low; #else diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c index acd01da..cb74f84 100644 --- a/Modules/posixmodule.c +++ b/Modules/posixmodule.c @@ -357,12 +357,12 @@ convertenviron(void) char *p = strchr(*e, '='); if (p == NULL) continue; - k = PyString_FromStringAndSize(*e, (int)(p-*e)); + k = PyUnicode_FromStringAndSize(*e, (int)(p-*e)); if (k == NULL) { PyErr_Clear(); continue; } - v = PyString_FromString(p+1); + v = PyUnicode_FromString(p+1); if (v == NULL) { PyErr_Clear(); Py_DECREF(k); diff --git a/Modules/pyexpat.c b/Modules/pyexpat.c index ae6f143..658569e 100644 --- a/Modules/pyexpat.c +++ b/Modules/pyexpat.c @@ -858,6 +858,7 @@ readinst(char *buf, int buf_size, PyObject *meth) PyObject *bytes = NULL; PyObject *str = NULL; int len = -1; + char *ptr; if ((bytes = PyInt_FromLong(buf_size)) == NULL) goto finally; @@ -877,14 +878,17 @@ readinst(char *buf, int buf_size, PyObject *meth) if (str == NULL) goto finally; - /* XXX what to do if it returns a Unicode string? 
*/ - if (!PyBytes_Check(str)) { + if (PyString_Check(str)) + ptr = PyString_AS_STRING(str); + else if (PyBytes_Check(str)) + ptr = PyBytes_AS_STRING(str); + else { PyErr_Format(PyExc_TypeError, "read() did not return a bytes object (type=%.400s)", Py_Type(str)->tp_name); goto finally; } - len = PyBytes_GET_SIZE(str); + len = Py_Size(str); if (len > buf_size) { PyErr_Format(PyExc_ValueError, "read() returned too much data: " @@ -892,7 +896,7 @@ readinst(char *buf, int buf_size, PyObject *meth) buf_size, len); goto finally; } - memcpy(buf, PyBytes_AsString(str), len); + memcpy(buf, ptr, len); finally: Py_XDECREF(arg); Py_XDECREF(str); @@ -998,7 +1002,7 @@ xmlparse_GetInputContext(xmlparseobject *self, PyObject *unused) = XML_GetInputContext(self->itself, &offset, &size); if (buffer != NULL) - return PyBytes_FromStringAndSize(buffer + offset, + return PyString_FromStringAndSize(buffer + offset, size - offset); else Py_RETURN_NONE; diff --git a/Modules/sha1module.c b/Modules/sha1module.c index d96fe35..6ba03ad 100644 --- a/Modules/sha1module.c +++ b/Modules/sha1module.c @@ -339,7 +339,7 @@ SHA1_digest(SHA1object *self, PyObject *unused) temp = self->hash_state; sha1_done(&temp, digest); - return PyBytes_FromStringAndSize((const char *)digest, SHA1_DIGESTSIZE); + return PyString_FromStringAndSize((const char *)digest, SHA1_DIGESTSIZE); } PyDoc_STRVAR(SHA1_hexdigest__doc__, diff --git a/Modules/sha256module.c b/Modules/sha256module.c index da31d18..a5dc9ad 100644 --- a/Modules/sha256module.c +++ b/Modules/sha256module.c @@ -432,7 +432,7 @@ SHA256_digest(SHAobject *self, PyObject *unused) SHAcopy(self, &temp); sha_final(digest, &temp); - return PyBytes_FromStringAndSize((const char *)digest, self->digestsize); + return PyString_FromStringAndSize((const char *)digest, self->digestsize); } PyDoc_STRVAR(SHA256_hexdigest__doc__, diff --git a/Modules/sha512module.c b/Modules/sha512module.c index 8b33bb0..c330e1b 100644 --- a/Modules/sha512module.c +++ b/Modules/sha512module.c 
@@ -498,7 +498,7 @@ SHA512_digest(SHAobject *self, PyObject *unused) SHAcopy(self, &temp); sha512_final(digest, &temp); - return PyBytes_FromStringAndSize((const char *)digest, self->digestsize); + return PyString_FromStringAndSize((const char *)digest, self->digestsize); } PyDoc_STRVAR(SHA512_hexdigest__doc__, diff --git a/Modules/socketmodule.c b/Modules/socketmodule.c index 7fe562e..78aeb55 100644 --- a/Modules/socketmodule.c +++ b/Modules/socketmodule.c @@ -3659,8 +3659,8 @@ socket_getaddrinfo(PyObject *self, PyObject *args) idna = PyObject_CallMethod(hobj, "encode", "s", "idna"); if (!idna) return NULL; - assert(PyBytes_Check(idna)); - hptr = PyBytes_AsString(idna); + assert(PyString_Check(idna)); + hptr = PyString_AS_STRING(idna); } else if (PyString_Check(hobj)) { hptr = PyString_AsString(hobj); } else { diff --git a/Objects/abstract.c b/Objects/abstract.c index e848f8f..01fbcbf 100644 --- a/Objects/abstract.c +++ b/Objects/abstract.c @@ -216,7 +216,7 @@ PyObject_DelItemString(PyObject *o, char *key) } /* We release the buffer right after use of this function which could - cause issues later on. Don't use these functions in new code. + cause issues later on. Don't use these functions in new code. 
*/ int PyObject_AsCharBuffer(PyObject *obj, @@ -248,7 +248,7 @@ PyObject_AsCharBuffer(PyObject *obj, int PyObject_CheckReadBuffer(PyObject *obj) { - PyBufferProcs *pb = obj->ob_type->tp_as_buffer; + PyBufferProcs *pb = obj->ob_type->tp_as_buffer; if (pb == NULL || pb->bf_getbuffer == NULL) @@ -305,7 +305,7 @@ int PyObject_AsWriteBuffer(PyObject *obj, if (pb == NULL || pb->bf_getbuffer == NULL || ((*pb->bf_getbuffer)(obj, &view, PyBUF_WRITABLE) != 0)) { - PyErr_SetString(PyExc_TypeError, + PyErr_SetString(PyExc_TypeError, "expected an object with a writable buffer interface"); return -1; } @@ -323,8 +323,9 @@ int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags) { if (!PyObject_CheckBuffer(obj)) { - PyErr_SetString(PyExc_TypeError, - "object does not have the buffer interface"); + PyErr_Format(PyExc_TypeError, + "'%100s' does not have the buffer interface", + Py_Type(obj)->tp_name); return -1; } return (*(obj->ob_type->tp_as_buffer->bf_getbuffer))(obj, view, flags); @@ -333,7 +334,7 @@ PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags) void PyObject_ReleaseBuffer(PyObject *obj, Py_buffer *view) { - if (obj->ob_type->tp_as_buffer != NULL && + if (obj->ob_type->tp_as_buffer != NULL && obj->ob_type->tp_as_buffer->bf_releasebuffer != NULL) { (*(obj->ob_type->tp_as_buffer->bf_releasebuffer))(obj, view); } @@ -345,7 +346,7 @@ _IsFortranContiguous(Py_buffer *view) { Py_ssize_t sd, dim; int i; - + if (view->ndim == 0) return 1; if (view->strides == NULL) return (view->ndim == 1); @@ -366,7 +367,7 @@ _IsCContiguous(Py_buffer *view) { Py_ssize_t sd, dim; int i; - + if (view->ndim == 0) return 1; if (view->strides == NULL) return 1; @@ -379,7 +380,7 @@ _IsCContiguous(Py_buffer *view) if (view->strides[i] != sd) return 0; sd *= dim; } - return 1; + return 1; } int @@ -390,7 +391,7 @@ PyBuffer_IsContiguous(Py_buffer *view, char fort) if (fort == 'C') return _IsCContiguous(view); - else if (fort == 'F') + else if (fort == 'F') return 
_IsFortranContiguous(view); else if (fort == 'A') return (_IsCContiguous(view) || _IsFortranContiguous(view)); @@ -398,7 +399,7 @@ PyBuffer_IsContiguous(Py_buffer *view, char fort) } -void* +void* PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices) { char* pointer; @@ -414,11 +415,11 @@ PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices) } -void +void _add_one_to_index_F(int nd, Py_ssize_t *index, Py_ssize_t *shape) { int k; - + for (k=0; k<nd; k++) { if (index[k] < shape[k]-1) { index[k]++; @@ -430,7 +431,7 @@ _add_one_to_index_F(int nd, Py_ssize_t *index, Py_ssize_t *shape) } } -void +void _add_one_to_index_C(int nd, Py_ssize_t *index, Py_ssize_t *shape) { int k; @@ -447,11 +448,11 @@ _add_one_to_index_C(int nd, Py_ssize_t *index, Py_ssize_t *shape) } /* view is not checked for consistency in either of these. It is - assumed that the size of the buffer is view->len in + assumed that the size of the buffer is view->len in view->len / view->itemsize elements. */ -int +int PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) { int k; @@ -462,7 +463,7 @@ PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) if (len > view->len) { len = view->len; } - + if (PyBuffer_IsContiguous(view, fort)) { /* simplest copy is all that is needed */ memcpy(buf, view->buf, len); @@ -470,7 +471,7 @@ PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) } /* Otherwise a more elaborate scheme is needed */ - + /* XXX(nnorwitz): need to check for overflow! 
*/ indices = (Py_ssize_t *)PyMem_Malloc(sizeof(Py_ssize_t)*(view->ndim)); if (indices == NULL) { @@ -480,7 +481,7 @@ PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) for (k=0; k<view->ndim;k++) { indices[k] = 0; } - + if (fort == 'F') { addone = _add_one_to_index_F; } @@ -489,7 +490,7 @@ PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) } dest = buf; /* XXX : This is not going to be the fastest code in the world - several optimizations are possible. + several optimizations are possible. */ elements = len / view->itemsize; while (elements--) { @@ -497,7 +498,7 @@ PyBuffer_ToContiguous(void *buf, Py_buffer *view, Py_ssize_t len, char fort) ptr = PyBuffer_GetPointer(view, indices); memcpy(dest, ptr, view->itemsize); dest += view->itemsize; - } + } PyMem_Free(indices); return 0; } @@ -521,7 +522,7 @@ PyBuffer_FromContiguous(Py_buffer *view, void *buf, Py_ssize_t len, char fort) } /* Otherwise a more elaborate scheme is needed */ - + /* XXX(nnorwitz): need to check for overflow! */ indices = (Py_ssize_t *)PyMem_Malloc(sizeof(Py_ssize_t)*(view->ndim)); if (indices == NULL) { @@ -531,7 +532,7 @@ PyBuffer_FromContiguous(Py_buffer *view, void *buf, Py_ssize_t len, char fort) for (k=0; k<view->ndim;k++) { indices[k] = 0; } - + if (fort == 'F') { addone = _add_one_to_index_F; } @@ -540,7 +541,7 @@ PyBuffer_FromContiguous(Py_buffer *view, void *buf, Py_ssize_t len, char fort) } src = buf; /* XXX : This is not going to be the fastest code in the world - several optimizations are possible. + several optimizations are possible. 
*/ elements = len / view->itemsize; while (elements--) { @@ -549,12 +550,12 @@ PyBuffer_FromContiguous(Py_buffer *view, void *buf, Py_ssize_t len, char fort) memcpy(ptr, src, view->itemsize); src += view->itemsize; } - + PyMem_Free(indices); return 0; } -int PyObject_CopyData(PyObject *dest, PyObject *src) +int PyObject_CopyData(PyObject *dest, PyObject *src) { Py_buffer view_dest, view_src; int k; @@ -576,16 +577,16 @@ int PyObject_CopyData(PyObject *dest, PyObject *src) } if (view_dest.len < view_src.len) { - PyErr_SetString(PyExc_BufferError, + PyErr_SetString(PyExc_BufferError, "destination is too small to receive data from source"); PyObject_ReleaseBuffer(dest, &view_dest); PyObject_ReleaseBuffer(src, &view_src); return -1; } - if ((PyBuffer_IsContiguous(&view_dest, 'C') && + if ((PyBuffer_IsContiguous(&view_dest, 'C') && PyBuffer_IsContiguous(&view_src, 'C')) || - (PyBuffer_IsContiguous(&view_dest, 'F') && + (PyBuffer_IsContiguous(&view_dest, 'F') && PyBuffer_IsContiguous(&view_src, 'F'))) { /* simplest copy is all that is needed */ memcpy(view_dest.buf, view_src.buf, view_src.len); @@ -595,7 +596,7 @@ int PyObject_CopyData(PyObject *dest, PyObject *src) } /* Otherwise a more elaborate copy scheme is needed */ - + /* XXX(nnorwitz): need to check for overflow! */ indices = (Py_ssize_t *)PyMem_Malloc(sizeof(Py_ssize_t)*view_src.ndim); if (indices == NULL) { @@ -606,7 +607,7 @@ int PyObject_CopyData(PyObject *dest, PyObject *src) } for (k=0; k<view_src.ndim;k++) { indices[k] = 0; - } + } elements = 1; for (k=0; k<view_src.ndim; k++) { /* XXX(nnorwitz): can this overflow? 
*/ @@ -617,7 +618,7 @@ int PyObject_CopyData(PyObject *dest, PyObject *src) dptr = PyBuffer_GetPointer(&view_dest, indices); sptr = PyBuffer_GetPointer(&view_src, indices); memcpy(dptr, sptr, view_src.itemsize); - } + } PyMem_Free(indices); PyObject_ReleaseBuffer(dest, &view_dest); PyObject_ReleaseBuffer(src, &view_src); @@ -631,13 +632,13 @@ PyBuffer_FillContiguousStrides(int nd, Py_ssize_t *shape, { int k; Py_ssize_t sd; - + sd = itemsize; if (fort == 'F') { for (k=0; k<nd; k++) { strides[k] = sd; sd *= shape[k]; - } + } } else { for (k=nd-1; k>=0; k--) { @@ -651,11 +652,11 @@ PyBuffer_FillContiguousStrides(int nd, Py_ssize_t *shape, int PyBuffer_FillInfo(Py_buffer *view, void *buf, Py_ssize_t len, int readonly, int flags) -{ +{ if (view == NULL) return 0; - if (((flags & PyBUF_LOCK) == PyBUF_LOCK) && + if (((flags & PyBUF_LOCK) == PyBUF_LOCK) && readonly >= 0) { - PyErr_SetString(PyExc_BufferError, + PyErr_SetString(PyExc_BufferError, "Cannot lock this object."); return -1; } @@ -665,13 +666,13 @@ PyBuffer_FillInfo(Py_buffer *view, void *buf, Py_ssize_t len, "Object is not writable."); return -1; } - + view->buf = buf; view->len = len; view->readonly = readonly; view->itemsize = 1; view->format = NULL; - if ((flags & PyBUF_FORMAT) == PyBUF_FORMAT) + if ((flags & PyBUF_FORMAT) == PyBUF_FORMAT) view->format = "B"; view->ndim = 1; view->shape = NULL; @@ -1143,9 +1144,9 @@ PyNumber_Absolute(PyObject *o) return type_error("bad operand type for abs(): '%.200s'", o); } -/* Return a Python Int or Long from the object item +/* Return a Python Int or Long from the object item Raise TypeError if the result is not an int-or-long - or if the object cannot be interpreted as an index. + or if the object cannot be interpreted as an index. 
*/ PyObject * PyNumber_Index(PyObject *item) @@ -1193,19 +1194,19 @@ PyNumber_AsSsize_t(PyObject *item, PyObject *err) goto finish; /* Error handling code -- only manage OverflowError differently */ - if (!PyErr_GivenExceptionMatches(runerr, PyExc_OverflowError)) + if (!PyErr_GivenExceptionMatches(runerr, PyExc_OverflowError)) goto finish; PyErr_Clear(); - /* If no error-handling desired then the default clipping + /* If no error-handling desired then the default clipping is sufficient. */ if (!err) { assert(PyLong_Check(value)); - /* Whether or not it is less than or equal to + /* Whether or not it is less than or equal to zero is determined by the sign of ob_size */ - if (_PyLong_Sign(value) < 0) + if (_PyLong_Sign(value) < 0) result = PY_SSIZE_T_MIN; else result = PY_SSIZE_T_MAX; @@ -1213,10 +1214,10 @@ PyNumber_AsSsize_t(PyObject *item, PyObject *err) else { /* Otherwise replace the error with caller's error object. */ PyErr_Format(err, - "cannot fit '%.200s' into an index-sized integer", - item->ob_type->tp_name); + "cannot fit '%.200s' into an index-sized integer", + item->ob_type->tp_name); } - + finish: Py_DECREF(value); return result; @@ -1679,7 +1680,7 @@ PySequence_Tuple(PyObject *v) if (j >= n) { Py_ssize_t oldn = n; /* The over-allocation strategy can grow a bit faster - than for lists because unlike lists the + than for lists because unlike lists the over-allocation isn't permanent -- we reclaim the excess before the end of this routine. So, grow by ten and then add 25%. @@ -1690,7 +1691,7 @@ PySequence_Tuple(PyObject *v) /* Check for overflow */ PyErr_NoMemory(); Py_DECREF(item); - goto Fail; + goto Fail; } if (_PyTuple_Resize(&result, n) != 0) { Py_DECREF(item); @@ -2147,7 +2148,7 @@ PyObject_CallMethod(PyObject *o, char *name, char *format, ...) 
} if (!PyCallable_Check(func)) { - type_error("attribute of type '%.200s' is not callable", func); + type_error("attribute of type '%.200s' is not callable", func); goto exit; } @@ -2186,7 +2187,7 @@ _PyObject_CallMethod_SizeT(PyObject *o, char *name, char *format, ...) } if (!PyCallable_Check(func)) { - type_error("attribute of type '%.200s' is not callable", func); + type_error("attribute of type '%.200s' is not callable", func); goto exit; } diff --git a/Objects/bytesobject.c b/Objects/bytesobject.c index 3f2dbc2..b28cacf 100644 --- a/Objects/bytesobject.c +++ b/Objects/bytesobject.c @@ -1,7 +1,5 @@ /* Bytes object implementation */ -/* XXX TO DO: optimizations */ - #define PY_SSIZE_T_CLEAN #include "Python.h" #include "structmember.h" @@ -214,26 +212,21 @@ PyBytes_Concat(PyObject *a, PyObject *b) { Py_ssize_t size; Py_buffer va, vb; - PyBytesObject *result; + PyBytesObject *result = NULL; va.len = -1; vb.len = -1; if (_getbuffer(a, &va) < 0 || _getbuffer(b, &vb) < 0) { - if (va.len != -1) - PyObject_ReleaseBuffer(a, &va); - if (vb.len != -1) - PyObject_ReleaseBuffer(b, &vb); PyErr_Format(PyExc_TypeError, "can't concat %.100s to %.100s", Py_Type(a)->tp_name, Py_Type(b)->tp_name); - return NULL; + goto done; } size = va.len + vb.len; if (size < 0) { - PyObject_ReleaseBuffer(a, &va); - PyObject_ReleaseBuffer(b, &vb); return PyErr_NoMemory(); + goto done; } result = (PyBytesObject *) PyBytes_FromStringAndSize(NULL, size); @@ -242,8 +235,11 @@ PyBytes_Concat(PyObject *a, PyObject *b) memcpy(result->ob_bytes + va.len, vb.buf, vb.len); } - PyObject_ReleaseBuffer(a, &va); - PyObject_ReleaseBuffer(b, &vb); + done: + if (va.len != -1) + PyObject_ReleaseBuffer(a, &va); + if (vb.len != -1) + PyObject_ReleaseBuffer(b, &vb); return (PyObject *)result; } @@ -256,12 +252,6 @@ bytes_length(PyBytesObject *self) } static PyObject * -bytes_concat(PyBytesObject *self, PyObject *other) -{ - return PyBytes_Concat((PyObject *)self, other); -} - -static PyObject * 
 bytes_iconcat(PyBytesObject *self, PyObject *other)
 {
     Py_ssize_t mysize;
@@ -351,51 +341,13 @@ bytes_irepeat(PyBytesObject *self, Py_ssize_t count)
     return (PyObject *)self;
 }
-static int
-bytes_substring(PyBytesObject *self, PyBytesObject *other)
-{
-    Py_ssize_t i;
-
-    if (Py_Size(other) == 1) {
-        return memchr(self->ob_bytes, other->ob_bytes[0],
-                      Py_Size(self)) != NULL;
-    }
-    if (Py_Size(other) == 0)
-        return 1; /* Edge case */
-    for (i = 0; i + Py_Size(other) <= Py_Size(self); i++) {
-        /* XXX Yeah, yeah, lots of optimizations possible... */
-        if (memcmp(self->ob_bytes + i, other->ob_bytes, Py_Size(other)) == 0)
-            return 1;
-    }
-    return 0;
-}
-
-static int
-bytes_contains(PyBytesObject *self, PyObject *value)
-{
-    Py_ssize_t ival;
-
-    if (PyBytes_Check(value))
-        return bytes_substring(self, (PyBytesObject *)value);
-
-    ival = PyNumber_AsSsize_t(value, PyExc_ValueError);
-    if (ival == -1 && PyErr_Occurred())
-        return -1;
-    if (ival < 0 || ival >= 256) {
-        PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
-        return -1;
-    }
-
-    return memchr(self->ob_bytes, ival, Py_Size(self)) != NULL;
-}
-
 static PyObject *
 bytes_getitem(PyBytesObject *self, Py_ssize_t i)
 {
     if (i < 0)
         i += Py_Size(self);
     if (i < 0 || i >= Py_Size(self)) {
-        PyErr_SetString(PyExc_IndexError, "bytes index out of range");
+        PyErr_SetString(PyExc_IndexError, "buffer index out of range");
         return NULL;
     }
     return PyInt_FromLong((unsigned char)(self->ob_bytes[i]));
@@ -414,7 +366,7 @@ bytes_subscript(PyBytesObject *self, PyObject *item)
             i += PyBytes_GET_SIZE(self);
         if (i < 0 || i >= Py_Size(self)) {
-            PyErr_SetString(PyExc_IndexError, "bytes index out of range");
+            PyErr_SetString(PyExc_IndexError, "buffer index out of range");
             return NULL;
         }
         return PyInt_FromLong((unsigned char)(self->ob_bytes[i]));
@@ -451,7 +403,7 @@ bytes_subscript(PyBytesObject *self, PyObject *item)
         }
     }
     else {
-        PyErr_SetString(PyExc_TypeError, "bytes indices must be integers");
+        PyErr_SetString(PyExc_TypeError, "buffer indices must be integers");
         return NULL;
     }
 }
@@ -551,7 +503,7 @@ bytes_setitem(PyBytesObject *self, Py_ssize_t i, PyObject *value)
         i += Py_Size(self);
     if (i < 0 || i >= Py_Size(self)) {
-        PyErr_SetString(PyExc_IndexError, "bytes index out of range");
+        PyErr_SetString(PyExc_IndexError, "buffer index out of range");
         return -1;
     }
@@ -587,7 +539,7 @@ bytes_ass_subscript(PyBytesObject *self, PyObject *item, PyObject *values)
             i += PyBytes_GET_SIZE(self);
         if (i < 0 || i >= Py_Size(self)) {
-            PyErr_SetString(PyExc_IndexError, "bytes index out of range");
+            PyErr_SetString(PyExc_IndexError, "buffer index out of range");
             return -1;
         }
@@ -619,7 +571,7 @@ bytes_ass_subscript(PyBytesObject *self, PyObject *item, PyObject *values)
         }
     }
     else {
-        PyErr_SetString(PyExc_TypeError, "bytes indices must be integer");
+        PyErr_SetString(PyExc_TypeError, "buffer indices must be integer");
         return -1;
     }
@@ -772,13 +724,7 @@ bytes_init(PyBytesObject *self, PyObject *args, PyObject *kwds)
         encoded = PyCodec_Encode(arg, encoding, errors);
         if (encoded == NULL)
             return -1;
-        if (!PyBytes_Check(encoded) && !PyString_Check(encoded)) {
-            PyErr_Format(PyExc_TypeError,
-                "encoder did not return a str8 or bytes object (type=%.400s)",
-                Py_Type(encoded)->tp_name);
-            Py_DECREF(encoded);
-            return -1;
-        }
+        assert(PyString_Check(encoded));
         new = bytes_iconcat(self, encoded);
         Py_DECREF(encoded);
         if (new == NULL)
@@ -889,11 +835,15 @@ static PyObject *
 bytes_repr(PyBytesObject *self)
 {
     static const char *hexdigits = "0123456789abcdef";
-    size_t newsize = 3 + 4 * Py_Size(self);
+    const char *quote_prefix = "buffer(b";
+    const char *quote_postfix = ")";
+    Py_ssize_t length = Py_Size(self);
+    /* 9 prefix + 2 postfix */
+    size_t newsize = 11 + 4 * length;
     PyObject *v;
-    if (newsize > PY_SSIZE_T_MAX || newsize / 4 != Py_Size(self)) {
+    if (newsize > PY_SSIZE_T_MAX || newsize / 4 - 2 != length) {
         PyErr_SetString(PyExc_OverflowError,
-            "bytes object is too large to make repr");
+            "buffer object is too large to make repr");
         return NULL;
     }
     v = PyUnicode_FromUnicode(NULL, newsize);
@@ -904,17 +854,36 @@ bytes_repr(PyBytesObject *self)
         register Py_ssize_t i;
         register Py_UNICODE c;
         register Py_UNICODE *p;
-        int quote = '\'';
+        int quote;
+
+        /* Figure out which quote to use; single is preferred */
+        quote = '\'';
+        {
+            char *test, *start;
+            start = PyBytes_AS_STRING(self);
+            for (test = start; test < start+length; ++test) {
+                if (*test == '"') {
+                    quote = '\''; /* back to single */
+                    goto decided;
+                }
+                else if (*test == '\'')
+                    quote = '"';
+            }
+          decided:
+            ;
+        }
         p = PyUnicode_AS_UNICODE(v);
-        *p++ = 'b';
+        while (*quote_prefix)
+            *p++ = *quote_prefix++;
         *p++ = quote;
-        for (i = 0; i < Py_Size(self); i++) {
+
+        for (i = 0; i < length; i++) {
             /* There's at least enough room for a hex escape
                and a closing quote. */
             assert(newsize - (p - PyUnicode_AS_UNICODE(v)) >= 5);
             c = self->ob_bytes[i];
-            if (c == quote || c == '\\')
+            if (c == '\'' || c == '\\')
                 *p++ = '\\', *p++ = c;
             else if (c == '\t')
                 *p++ = '\\', *p++ = 't';
@@ -935,6 +904,9 @@ bytes_repr(PyBytesObject *self)
         }
         assert(newsize - (p - PyUnicode_AS_UNICODE(v)) >= 1);
         *p++ = quote;
+        while (*quote_postfix) {
+            *p++ = *quote_postfix++;
+        }
         *p = '\0';
         if (PyUnicode_Resize(&v, (p - PyUnicode_AS_UNICODE(v)))) {
             Py_DECREF(v);
@@ -945,9 +917,14 @@ bytes_repr(PyBytesObject *self)
 }
 static PyObject *
-bytes_str(PyBytesObject *self)
+bytes_str(PyObject *op)
 {
-    return PyString_FromStringAndSize(self->ob_bytes, Py_Size(self));
+    if (Py_BytesWarningFlag) {
+        if (PyErr_WarnEx(PyExc_BytesWarning,
+                         "str() on a buffer instance", 1))
+            return NULL;
+    }
+    return bytes_repr((PyBytesObject*)op);
 }
 static PyObject *
@@ -964,6 +941,12 @@ bytes_richcompare(PyObject *self, PyObject *other, int op)
        error, even if the comparison is for equality. */
     if (PyObject_IsInstance(self, (PyObject*)&PyUnicode_Type) ||
         PyObject_IsInstance(other, (PyObject*)&PyUnicode_Type)) {
+        if (Py_BytesWarningFlag && op == Py_EQ) {
+            if (PyErr_WarnEx(PyExc_BytesWarning,
+                             "Comparsion between buffer and string", 1))
+                return NULL;
+        }
+
         Py_INCREF(Py_NotImplemented);
         return Py_NotImplemented;
     }
@@ -1112,7 +1095,7 @@ bytes_find(PyBytesObject *self, PyObject *args)
 }
 PyDoc_STRVAR(count__doc__,
-"B.count(sub[, start[, end]]) -> int\n\
+"B.count(sub [,start [,end]]) -> int\n\
 \n\
 Return the number of non-overlapping occurrences of subsection sub in\n\
 bytes B[start:end]. Optional arguments start and end are interpreted\n\
@@ -1203,6 +1186,30 @@ bytes_rindex(PyBytesObject *self, PyObject *args)
 }
+static int
+bytes_contains(PyObject *self, PyObject *arg)
+{
+    Py_ssize_t ival = PyNumber_AsSsize_t(arg, PyExc_ValueError);
+    if (ival == -1 && PyErr_Occurred()) {
+        Py_buffer varg;
+        int pos;
+        PyErr_Clear();
+        if (_getbuffer(arg, &varg) < 0)
+            return -1;
+        pos = stringlib_find(PyBytes_AS_STRING(self), Py_Size(self),
+                             varg.buf, varg.len, 0);
+        PyObject_ReleaseBuffer(arg, &varg);
+        return pos >= 0;
+    }
+    if (ival < 0 || ival >= 256) {
+        PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
+        return -1;
+    }
+
+    return memchr(PyBytes_AS_STRING(self), ival, Py_Size(self)) != NULL;
+}
+
+
 /* Matches the end (direction >= 0) or start (direction < 0) of self
  * against substr, using the start and end arguments. Returns
  * -1 on error, 0 if not found and 1 if found.
@@ -1247,7 +1254,7 @@ done:
 PyDoc_STRVAR(startswith__doc__,
-"B.startswith(prefix[, start[, end]]) -> bool\n\
+"B.startswith(prefix [,start [,end]]) -> bool\n\
 \n\
 Return True if B starts with the specified prefix, False otherwise.\n\
 With optional start, test B beginning at that position.\n\
@@ -1287,7 +1294,7 @@ bytes_startswith(PyBytesObject *self, PyObject *args)
 }
 PyDoc_STRVAR(endswith__doc__,
-"B.endswith(suffix[, start[, end]]) -> bool\n\
+"B.endswith(suffix [,start [,end]]) -> bool\n\
 \n\
 Return True if B ends with the specified suffix, False otherwise.\n\
 With optional start, test B beginning at that position.\n\
@@ -1328,12 +1335,12 @@ bytes_endswith(PyBytesObject *self, PyObject *args)
 PyDoc_STRVAR(translate__doc__,
-"B.translate(table [,deletechars]) -> bytes\n\
+"B.translate(table[, deletechars]) -> buffer\n\
 \n\
-Return a copy of the bytes B, where all characters occurring\n\
-in the optional argument deletechars are removed, and the\n\
-remaining characters have been mapped through the given\n\
-translation table, which must be a bytes of length 256.");
+Return a copy of B, where all characters occurring in the\n\
+optional argument deletechars are removed, and the remaining\n\
+characters have been mapped through the given translation\n\
+table, which must be a bytes object of length 256.");
 static PyObject *
 bytes_translate(PyBytesObject *self, PyObject *args)
@@ -2026,9 +2033,9 @@ replace(PyBytesObject *self,
 PyDoc_STRVAR(replace__doc__,
-"B.replace (old, new[, count]) -> bytes\n\
+"B.replace(old, new[, count]) -> bytes\n\
 \n\
-Return a copy of bytes B with all occurrences of subsection\n\
+Return a copy of B with all occurrences of subsection\n\
 old replaced by new. If the optional argument count is\n\
 given, only the first count occurrences are replaced.");
@@ -2149,23 +2156,23 @@ split_whitespace(const char *s, Py_ssize_t len, Py_ssize_t maxcount)
         return NULL;
     for (i = j = 0; i < len; ) {
-        /* find a token */
-        while (i < len && ISSPACE(s[i]))
-            i++;
-        j = i;
-        while (i < len && !ISSPACE(s[i]))
-            i++;
-        if (j < i) {
-            if (maxcount-- <= 0)
-                break;
-            SPLIT_ADD(s, j, i);
-            while (i < len && ISSPACE(s[i]))
-                i++;
-            j = i;
-        }
+        /* find a token */
+        while (i < len && ISSPACE(s[i]))
+            i++;
+        j = i;
+        while (i < len && !ISSPACE(s[i]))
+            i++;
+        if (j < i) {
+            if (maxcount-- <= 0)
+                break;
+            SPLIT_ADD(s, j, i);
+            while (i < len && ISSPACE(s[i]))
+                i++;
+            j = i;
+        }
     }
     if (j < len) {
-        SPLIT_ADD(s, j, len);
+        SPLIT_ADD(s, j, len);
     }
     FIX_PREALLOC_SIZE(list);
     return list;
@@ -2176,10 +2183,10 @@ split_whitespace(const char *s, Py_ssize_t len, Py_ssize_t maxcount)
 }
 PyDoc_STRVAR(split__doc__,
-"B.split([sep [, maxsplit]]) -> list of bytes\n\
+"B.split([sep[, maxsplit]]) -> list of buffer\n\
 \n\
-Return a list of the bytes in the string B, using sep as the delimiter.\n\
-If sep is not given, B is split on ASCII whitespace charcters\n\
+Return a list of the sections in B, using sep as the delimiter.\n\
+If sep is not given, B is split on ASCII whitespace characters\n\
 (space, tab, return, newline, formfeed, vertical tab).\n\
 If maxsplit is given, at most maxsplit splits are done.");
@@ -2255,12 +2262,37 @@ bytes_split(PyBytesObject *self, PyObject *args)
     return NULL;
 }
+/* stringlib's partition shares nullbytes in some cases.
+   undo this, we don't want the nullbytes to be shared. */
+static PyObject *
+make_nullbytes_unique(PyObject *result)
+{
+    if (result != NULL) {
+        int i;
+        assert(PyTuple_Check(result));
+        assert(PyTuple_GET_SIZE(result) == 3);
+        for (i = 0; i < 3; i++) {
+            if (PyTuple_GET_ITEM(result, i) == (PyObject *)nullbytes) {
+                PyObject *new = PyBytes_FromStringAndSize(NULL, 0);
+                if (new == NULL) {
+                    Py_DECREF(result);
+                    result = NULL;
+                    break;
+                }
+                Py_DECREF(nullbytes);
+                PyTuple_SET_ITEM(result, i, new);
+            }
+        }
+    }
+    return result;
+}
+
 PyDoc_STRVAR(partition__doc__,
 "B.partition(sep) -> (head, sep, tail)\n\
 \n\
 Searches for the separator sep in B, and returns the part before it,\n\
 the separator itself, and the part after it. If the separator is not\n\
-found, returns B and two empty bytes.");
+found, returns B and two empty buffer.");
 static PyObject *
 bytes_partition(PyBytesObject *self, PyObject *sep_obj)
@@ -2279,15 +2311,16 @@ bytes_partition(PyBytesObject *self, PyObject *sep_obj)
         );
     Py_DECREF(bytesep);
-    return result;
+    return make_nullbytes_unique(result);
 }
 PyDoc_STRVAR(rpartition__doc__,
 "B.rpartition(sep) -> (tail, sep, head)\n\
 \n\
-Searches for the separator sep in B, starting at the end of B, and returns\n\
-the part before it, the separator itself, and the part after it. If the\n\
-separator is not found, returns two empty bytes and B.");
+Searches for the separator sep in B, starting at the end of B,\n\
+and returns the part before it, the separator itself, and the\n\
+part after it. If the separator is not found, returns two empty\n\
+buffer objects and B.");
 static PyObject *
 bytes_rpartition(PyBytesObject *self, PyObject *sep_obj)
@@ -2306,7 +2339,7 @@ bytes_rpartition(PyBytesObject *self, PyObject *sep_obj)
         );
     Py_DECREF(bytesep);
-    return result;
+    return make_nullbytes_unique(result);
 }
 Py_LOCAL_INLINE(PyObject *)
@@ -2354,23 +2387,23 @@ rsplit_whitespace(const char *s, Py_ssize_t len, Py_ssize_t maxcount)
         return NULL;
     for (i = j = len - 1; i >= 0; ) {
-        /* find a token */
-        while (i >= 0 && Py_UNICODE_ISSPACE(s[i]))
-            i--;
-        j = i;
-        while (i >= 0 && !Py_UNICODE_ISSPACE(s[i]))
-            i--;
-        if (j > i) {
-            if (maxcount-- <= 0)
-                break;
-            SPLIT_ADD(s, i + 1, j + 1);
-            while (i >= 0 && Py_UNICODE_ISSPACE(s[i]))
-                i--;
-            j = i;
-        }
+        /* find a token */
+        while (i >= 0 && Py_UNICODE_ISSPACE(s[i]))
+            i--;
+        j = i;
+        while (i >= 0 && !Py_UNICODE_ISSPACE(s[i]))
+            i--;
+        if (j > i) {
+            if (maxcount-- <= 0)
+                break;
+            SPLIT_ADD(s, i + 1, j + 1);
+            while (i >= 0 && Py_UNICODE_ISSPACE(s[i]))
+                i--;
+            j = i;
+        }
     }
     if (j >= 0) {
-        SPLIT_ADD(s, 0, j + 1);
+        SPLIT_ADD(s, 0, j + 1);
     }
     FIX_PREALLOC_SIZE(list);
     if (PyList_Reverse(list) < 0)
@@ -2384,10 +2417,10 @@ rsplit_whitespace(const char *s, Py_ssize_t len, Py_ssize_t maxcount)
 }
 PyDoc_STRVAR(rsplit__doc__,
-"B.rsplit(sep [,maxsplit]) -> list of bytes\n\
+"B.rsplit(sep[, maxsplit]) -> list of buffer\n\
 \n\
-Return a list of the sections in the byte B, using sep as the delimiter,\n\
-starting at the end of the bytes and working to the front.\n\
+Return a list of the sections in B, using sep as the delimiter,\n\
+starting at the end of B and working to the front.\n\
 If sep is not given, B is split on ASCII whitespace characters\n\
 (space, tab, return, newline, formfeed, vertical tab).\n\
 If maxsplit is given, at most maxsplit splits are done.");
@@ -2458,7 +2491,7 @@ PyDoc_STRVAR(extend__doc__,
 "B.extend(iterable int) -> None\n\
 \n\
 Append all the elements from the iterator or sequence to the\n\
-end of the bytes.");
+end of B.");
 static PyObject *
 bytes_extend(PyBytesObject *self, PyObject *arg)
 {
@@ -2475,7 +2508,7 @@ bytes_extend(PyBytesObject *self, PyObject *arg)
 PyDoc_STRVAR(reverse__doc__,
 "B.reverse() -> None\n\
 \n\
-Reverse the order of the values in bytes in place.");
+Reverse the order of the values in B in place.");
 static PyObject *
 bytes_reverse(PyBytesObject *self, PyObject *unused)
 {
@@ -2497,7 +2530,7 @@ bytes_reverse(PyBytesObject *self, PyObject *unused)
 PyDoc_STRVAR(insert__doc__,
 "B.insert(index, int) -> None\n\
 \n\
-Insert a single item into the bytes before the given index.");
+Insert a single item into the buffer before the given index.");
 static PyObject *
 bytes_insert(PyBytesObject *self, PyObject *args)
 {
@@ -2536,7 +2569,7 @@ bytes_insert(PyBytesObject *self, PyObject *args)
 PyDoc_STRVAR(append__doc__,
 "B.append(int) -> None\n\
 \n\
-Append a single item to the end of the bytes.");
+Append a single item to the end of B.");
 static PyObject *
 bytes_append(PyBytesObject *self, PyObject *arg)
 {
@@ -2561,7 +2594,7 @@ bytes_append(PyBytesObject *self, PyObject *arg)
 PyDoc_STRVAR(pop__doc__,
 "B.pop([index]) -> int\n\
 \n\
-Remove and return a single item from the bytes. If no index\n\
+Remove and return a single item from B. If no index\n\
 argument is give, will pop the last value.");
 static PyObject *
 bytes_pop(PyBytesObject *self, PyObject *args)
@@ -2595,7 +2628,7 @@ bytes_pop(PyBytesObject *self, PyObject *args)
 PyDoc_STRVAR(remove__doc__,
 "B.remove(int) -> None\n\
 \n\
-Remove the first occurance of a value in bytes");
+Remove the first occurance of a value in B.");
 static PyObject *
 bytes_remove(PyBytesObject *self, PyObject *arg)
 {
@@ -2644,7 +2677,7 @@ rstrip_helper(unsigned char *myptr, Py_ssize_t mysize,
 }
 PyDoc_STRVAR(strip__doc__,
-"B.strip([bytes]) -> bytes\n\
+"B.strip([bytes]) -> buffer\n\
 \n\
 Strip leading and trailing bytes contained in the argument.\n\
 If the argument is omitted, strip ASCII whitespace.");
@@ -2662,10 +2695,10 @@ bytes_strip(PyBytesObject *self, PyObject *args)
         argsize = 6;
     }
     else {
-        if (_getbuffer(arg, &varg) < 0)
-            return NULL;
-        argptr = varg.buf;
-        argsize = varg.len;
+        if (_getbuffer(arg, &varg) < 0)
+            return NULL;
+        argptr = varg.buf;
+        argsize = varg.len;
     }
     myptr = self->ob_bytes;
     mysize = Py_Size(self);
@@ -2675,12 +2708,12 @@ bytes_strip(PyBytesObject *self, PyObject *args)
     else
         right = rstrip_helper(myptr, mysize, argptr, argsize);
     if (arg != Py_None)
-        PyObject_ReleaseBuffer(arg, &varg);
+        PyObject_ReleaseBuffer(arg, &varg);
     return PyBytes_FromStringAndSize(self->ob_bytes + left, right - left);
 }
 PyDoc_STRVAR(lstrip__doc__,
-"B.lstrip([bytes]) -> bytes\n\
+"B.lstrip([bytes]) -> buffer\n\
 \n\
 Strip leading bytes contained in the argument.\n\
 If the argument is omitted, strip leading ASCII whitespace.");
@@ -2698,22 +2731,22 @@ bytes_lstrip(PyBytesObject *self, PyObject *args)
         argsize = 6;
     }
     else {
-        if (_getbuffer(arg, &varg) < 0)
-            return NULL;
-        argptr = varg.buf;
-        argsize = varg.len;
+        if (_getbuffer(arg, &varg) < 0)
+            return NULL;
+        argptr = varg.buf;
+        argsize = varg.len;
     }
     myptr = self->ob_bytes;
     mysize = Py_Size(self);
     left = lstrip_helper(myptr, mysize, argptr, argsize);
     right = mysize;
     if (arg != Py_None)
-        PyObject_ReleaseBuffer(arg, &varg);
+        PyObject_ReleaseBuffer(arg, &varg);
     return PyBytes_FromStringAndSize(self->ob_bytes + left, right - left);
 }
 PyDoc_STRVAR(rstrip__doc__,
-"B.rstrip([bytes]) -> bytes\n\
+"B.rstrip([bytes]) -> buffer\n\
 \n\
 Strip trailing bytes contained in the argument.\n\
 If the argument is omitted, strip trailing ASCII whitespace.");
@@ -2731,27 +2764,27 @@ bytes_rstrip(PyBytesObject *self, PyObject *args)
         argsize = 6;
     }
     else {
-        if (_getbuffer(arg, &varg) < 0)
-            return NULL;
-        argptr = varg.buf;
-        argsize = varg.len;
+        if (_getbuffer(arg, &varg) < 0)
+            return NULL;
+        argptr = varg.buf;
+        argsize = varg.len;
     }
     myptr = self->ob_bytes;
     mysize = Py_Size(self);
     left = 0;
     right = rstrip_helper(myptr, mysize, argptr, argsize);
     if (arg != Py_None)
-        PyObject_ReleaseBuffer(arg, &varg);
+        PyObject_ReleaseBuffer(arg, &varg);
     return PyBytes_FromStringAndSize(self->ob_bytes + left, right - left);
 }
 PyDoc_STRVAR(decode_doc,
-"B.decode([encoding[,errors]]) -> unicode obect.\n\
+"B.decode([encoding[, errors]]) -> unicode object.\n\
 \n\
 Decodes B using the codec registered for encoding. encoding defaults\n\
 to the default encoding. errors may be given to set a different error\n\
-handling scheme. Default is 'strict' meaning that encoding errors raise\n\
-a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\
+handling scheme. Default is 'strict' meaning that encoding errors raise\n\
+a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\
 as well as any other name registerd with codecs.register_error that is\n\
 able to handle UnicodeDecodeErrors.");
@@ -2782,8 +2815,7 @@ bytes_alloc(PyBytesObject *self)
 PyDoc_STRVAR(join_doc,
 "B.join(iterable_of_bytes) -> bytes\n\
 \n\
-Concatenates any number of bytes objects, with B in between each pair.\n\
-Example: b'.'.join([b'ab', b'pq', b'rs']) -> b'ab.pq.rs'.");
+Concatenates any number of buffer objects, with B in between each pair.");
 static PyObject *
 bytes_join(PyBytesObject *self, PyObject *it)
@@ -2804,9 +2836,10 @@ bytes_join(PyBytesObject *self, PyObject *it)
     items = PySequence_Fast_ITEMS(seq);
     /* Compute the total size, and check that they are all bytes */
+    /* XXX Shouldn't we use _getbuffer() on these items instead? */
     for (i = 0; i < n; i++) {
         PyObject *obj = items[i];
-        if (!PyBytes_Check(obj)) {
+        if (!PyBytes_Check(obj) && !PyString_Check(obj)) {
             PyErr_Format(PyExc_TypeError,
                          "can only join an iterable of bytes "
                          "(item %ld has type '%.100s')",
@@ -2816,7 +2849,7 @@ bytes_join(PyBytesObject *self, PyObject *it)
         }
         if (i > 0)
             totalsize += mysize;
-        totalsize += PyBytes_GET_SIZE(obj);
+        totalsize += Py_Size(obj);
         if (totalsize < 0) {
             PyErr_NoMemory();
             goto error;
@@ -2830,12 +2863,17 @@ bytes_join(PyBytesObject *self, PyObject *it)
     dest = PyBytes_AS_STRING(result);
     for (i = 0; i < n; i++) {
         PyObject *obj = items[i];
-        Py_ssize_t size = PyBytes_GET_SIZE(obj);
-        if (i > 0) {
+        Py_ssize_t size = Py_Size(obj);
+        char *buf;
+        if (PyBytes_Check(obj))
+            buf = PyBytes_AS_STRING(obj);
+        else
+            buf = PyString_AS_STRING(obj);
+        if (i) {
             memcpy(dest, self->ob_bytes, mysize);
             dest += mysize;
         }
-        memcpy(dest, PyBytes_AS_STRING(obj), size);
+        memcpy(dest, buf, size);
         dest += size;
     }
@@ -2850,11 +2888,11 @@ bytes_join(PyBytesObject *self, PyObject *it)
 }
 PyDoc_STRVAR(fromhex_doc,
-"bytes.fromhex(string) -> bytes\n\
+"buffer.fromhex(string) -> buffer\n\
 \n\
-Create a bytes object from a string of hexadecimal numbers.\n\
-Spaces between two numbers are accepted. Example:\n\
-bytes.fromhex('10 1112') -> b'\\x10\\x11\\x12'.");
+Create a buffer object from a string of hexadecimal numbers.\n\
+Spaces between two numbers are accepted.\n\
+Example: buffer.fromhex('B9 01EF') -> buffer(b'\\xb9\\x01\\xef').");
 static int
 hex_digit_to_int(Py_UNICODE c)
@@ -2940,7 +2978,7 @@ bytes_reduce(PyBytesObject *self)
 static PySequenceMethods bytes_as_sequence = {
     (lenfunc)bytes_length, /* sq_length */
-    (binaryfunc)bytes_concat, /* sq_concat */
+    (binaryfunc)PyBytes_Concat, /* sq_concat */
     (ssizeargfunc)bytes_repeat, /* sq_repeat */
     (ssizeargfunc)bytes_getitem, /* sq_item */
     0, /* sq_slice */
@@ -3027,15 +3065,27 @@ bytes_methods[] = {
 };
 PyDoc_STRVAR(bytes_doc,
-"bytes([iterable]) -> new array of bytes.\n\
+"buffer(iterable_of_ints) -> buffer.\n\
+buffer(string, encoding[, errors]) -> buffer.\n\
+buffer(bytes_or_buffer) -> mutable copy of bytes_or_buffer.\n\
+buffer(memory_view) -> buffer.\n\
+\n\
+Construct an mutable buffer object from:\n\
+  - an iterable yielding integers in range(256)\n\
+  - a text string encoded using the specified encoding\n\
+  - a bytes or a buffer object\n\
+  - any object implementing the buffer API.\n\
 \n\
-If an argument is given it must be an iterable yielding ints in range(256).");
+buffer(int) -> buffer.\n\
+\n\
+Construct a zero-initialized buffer of the given length.");
+
 static PyObject *bytes_iter(PyObject *seq);
 PyTypeObject PyBytes_Type = {
     PyVarObject_HEAD_INIT(&PyType_Type, 0)
-    "bytes",
+    "buffer",
     sizeof(PyBytesObject),
     0,
     (destructor)bytes_dealloc, /* tp_dealloc */
@@ -3049,7 +3099,7 @@ PyTypeObject PyBytes_Type = {
     &bytes_as_mapping, /* tp_as_mapping */
     0, /* tp_hash */
     0, /* tp_call */
-    (reprfunc)bytes_str, /* tp_str */
+    bytes_str, /* tp_str */
     PyObject_GenericGetAttr, /* tp_getattro */
     0, /* tp_setattro */
     &bytes_as_buffer, /* tp_as_buffer */
diff --git a/Objects/codeobject.c b/Objects/codeobject.c
index b9a26ba..80c2df9 100644
--- a/Objects/codeobject.c
+++ b/Objects/codeobject.c
@@ -8,7 +8,7 @@
 /* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */
 static int
-all_name_chars(unsigned char *s)
+all_name_chars(Py_UNICODE *s)
 {
     static char ok_name_char[256];
     static unsigned char *name_chars = (unsigned char *)NAME_CHARS;
@@ -19,6 +19,8 @@ all_name_chars(unsigned char *s)
             ok_name_char[*p] = 1;
     }
     while (*s) {
+        if (*s >= 128)
+            return 0;
         if (ok_name_char[*s++] == 0)
             return 0;
     }
@@ -73,11 +75,11 @@ PyCode_New(int argcount, int kwonlyargcount,
     /* Intern selected string constants */
     for (i = PyTuple_Size(consts); --i >= 0; ) {
         PyObject *v = PyTuple_GetItem(consts, i);
-        if (!PyString_Check(v))
+        if (!PyUnicode_Check(v))
             continue;
-        if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
+        if (!all_name_chars(PyUnicode_AS_UNICODE(v)))
             continue;
-        PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
+        PyUnicode_InternInPlace(&PyTuple_GET_ITEM(consts, i));
     }
     co = PyObject_NEW(PyCodeObject, &PyCode_Type);
     if (co != NULL) {
@@ -202,7 +204,7 @@ code_new(PyTypeObject *type, PyObject *args, PyObject *kw)
     int firstlineno;
     PyObject *lnotab;
-    if (!PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code",
+    if (!PyArg_ParseTuple(args, "iiiiiSO!O!O!UUiS|O!O!:code",
                           &argcount, &kwonlyargcount,
                           &nlocals, &stacksize, &flags,
                           &code,
diff --git a/Objects/exceptions.c b/Objects/exceptions.c
index abe4bde..6ef765b 100644
--- a/Objects/exceptions.c
+++ b/Objects/exceptions.c
@@ -1045,14 +1045,14 @@ SimpleExtendsException(PyExc_ValueError, UnicodeError,
                        "Unicode related error.");
 static PyObject *
-get_bytes(PyObject *attr, const char *name)
+get_string(PyObject *attr, const char *name)
 {
     if (!attr) {
         PyErr_Format(PyExc_TypeError, "%.200s attribute not set", name);
         return NULL;
     }
-    if (!PyBytes_Check(attr)) {
+    if (!PyString_Check(attr)) {
         PyErr_Format(PyExc_TypeError, "%.200s attribute must be bytes", name);
         return NULL;
     }
@@ -1109,7 +1109,7 @@ PyUnicodeEncodeError_GetObject(PyObject *exc)
 PyObject *
 PyUnicodeDecodeError_GetObject(PyObject *exc)
 {
-    return get_bytes(((PyUnicodeErrorObject *)exc)->object, "object");
+    return get_string(((PyUnicodeErrorObject *)exc)->object, "object");
 }
 PyObject *
@@ -1141,10 +1141,10 @@ int
 PyUnicodeDecodeError_GetStart(PyObject *exc, Py_ssize_t *start)
 {
     Py_ssize_t size;
-    PyObject *obj = get_bytes(((PyUnicodeErrorObject *)exc)->object, "object");
+    PyObject *obj = get_string(((PyUnicodeErrorObject *)exc)->object, "object");
     if (!obj)
         return -1;
-    size = PyBytes_GET_SIZE(obj);
+    size = PyString_GET_SIZE(obj);
     *start = ((PyUnicodeErrorObject *)exc)->start;
     if (*start<0)
         *start = 0;
@@ -1209,10 +1209,10 @@ int
 PyUnicodeDecodeError_GetEnd(PyObject *exc, Py_ssize_t *end)
 {
     Py_ssize_t size;
-    PyObject *obj = get_bytes(((PyUnicodeErrorObject *)exc)->object, "object");
+    PyObject *obj = get_string(((PyUnicodeErrorObject *)exc)->object, "object");
     if (!obj)
         return -1;
-    size = PyBytes_GET_SIZE(obj);
+    size = PyString_GET_SIZE(obj);
     *end = ((PyUnicodeErrorObject *)exc)->end;
     if (*end<1)
         *end = 1;
@@ -1299,31 +1299,6 @@ PyUnicodeTranslateError_SetReason(PyObject *exc, const char *reason)
 static int
-UnicodeError_init(PyUnicodeErrorObject *self, PyObject *args, PyObject *kwds,
-                  PyTypeObject *objecttype)
-{
-    Py_CLEAR(self->encoding);
-    Py_CLEAR(self->object);
-    Py_CLEAR(self->reason);
-
-    if (!PyArg_ParseTuple(args, "O!O!nnO!",
-        &PyUnicode_Type, &self->encoding,
-        objecttype, &self->object,
-        &self->start,
-        &self->end,
-        &PyUnicode_Type, &self->reason)) {
-        self->encoding = self->object = self->reason = NULL;
-        return -1;
-    }
-
-    Py_INCREF(self->encoding);
-    Py_INCREF(self->object);
-    Py_INCREF(self->reason);
-
-    return 0;
-}
-
-static int
 UnicodeError_clear(PyUnicodeErrorObject *self)
 {
     Py_CLEAR(self->encoding);
@@ -1371,10 +1346,32 @@ static PyMemberDef UnicodeError_members[] = {
 static int
 UnicodeEncodeError_init(PyObject *self, PyObject *args, PyObject *kwds)
 {
+    PyUnicodeErrorObject *err;
+
     if (BaseException_init((PyBaseExceptionObject *)self, args, kwds) == -1)
         return -1;
-    return UnicodeError_init((PyUnicodeErrorObject *)self, args,
-                             kwds, &PyUnicode_Type);
+
+    err = (PyUnicodeErrorObject *)self;
+
+    Py_CLEAR(err->encoding);
+    Py_CLEAR(err->object);
+    Py_CLEAR(err->reason);
+
+    if (!PyArg_ParseTuple(args, "O!O!nnO!",
+        &PyUnicode_Type, &err->encoding,
+        &PyUnicode_Type, &err->object,
+        &err->start,
+        &err->end,
+        &PyUnicode_Type, &err->reason)) {
+        err->encoding = err->object = err->reason = NULL;
+        return -1;
+    }
+
+    Py_INCREF(err->encoding);
+    Py_INCREF(err->object);
+    Py_INCREF(err->reason);
+
+    return 0;
 }
 static PyObject *
@@ -1439,10 +1436,44 @@ PyUnicodeEncodeError_Create(
 static int
 UnicodeDecodeError_init(PyObject *self, PyObject *args, PyObject *kwds)
 {
+    PyUnicodeErrorObject *ude;
+    const char *data;
+    Py_ssize_t size;
+
     if (BaseException_init((PyBaseExceptionObject *)self, args, kwds) == -1)
         return -1;
-    return UnicodeError_init((PyUnicodeErrorObject *)self, args,
-                             kwds, &PyBytes_Type);
+
+    ude = (PyUnicodeErrorObject *)self;
+
+    Py_CLEAR(ude->encoding);
+    Py_CLEAR(ude->object);
+    Py_CLEAR(ude->reason);
+
+    if (!PyArg_ParseTuple(args, "O!OnnO!",
+        &PyUnicode_Type, &ude->encoding,
+        &ude->object,
+        &ude->start,
+        &ude->end,
+        &PyUnicode_Type, &ude->reason)) {
+        ude->encoding = ude->object = ude->reason = NULL;
+        return -1;
+    }
+
+    if (!PyString_Check(ude->object)) {
+        if (PyObject_AsReadBuffer(ude->object, (const void **)&data, &size)) {
+            ude->encoding = ude->object = ude->reason = NULL;
+            return -1;
+        }
+        ude->object = PyString_FromStringAndSize(data, size);
+    }
+    else {
+        Py_INCREF(ude->object);
+    }
+
+    Py_INCREF(ude->encoding);
+    Py_INCREF(ude->reason);
+
+    return 0;
 }
 static PyObject *
@@ -1451,7 +1482,7 @@ UnicodeDecodeError_str(PyObject *self)
     PyUnicodeErrorObject *uself = (PyUnicodeErrorObject *)self;
     if (uself->end==uself->start+1) {
-        int byte = (int)(PyBytes_AS_STRING(((PyUnicodeErrorObject *)self)->object)[uself->start]&0xff);
+        int byte = (int)(PyString_AS_STRING(((PyUnicodeErrorObject *)self)->object)[uself->start]&0xff);
         return PyUnicode_FromFormat(
             "'%U' codec can't decode byte 0x%02x in position %zd: %U",
             ((PyUnicodeErrorObject *)self)->encoding,
@@ -1709,6 +1740,14 @@ SimpleExtendsException(PyExc_Warning, UnicodeWarning,
     "Base class for warnings about Unicode related problems, mostly\n"
     "related to conversion problems.");
+/*
+ * BytesWarning extends Warning
+ */
+SimpleExtendsException(PyExc_Warning, BytesWarning,
+    "Base class for warnings about bytes and buffer related problems, mostly\n"
+    "related to conversion from str or comparing to str.");
+
+
 /* Pre-computed MemoryError instance. Best to create this as early as
  * possible and not wait until a MemoryError is actually raised!
@@ -1808,6 +1847,7 @@ _PyExc_Init(void)
     PRE_INIT(FutureWarning)
     PRE_INIT(ImportWarning)
     PRE_INIT(UnicodeWarning)
+    PRE_INIT(BytesWarning)
     bltinmod = PyImport_ImportModule("__builtin__");
     if (bltinmod == NULL)
@@ -1868,6 +1908,7 @@ _PyExc_Init(void)
     POST_INIT(FutureWarning)
     POST_INIT(ImportWarning)
     POST_INIT(UnicodeWarning)
+    POST_INIT(BytesWarning)
     PyExc_MemoryErrorInst = BaseException_new(&_PyExc_MemoryError, NULL, NULL);
     if (!PyExc_MemoryErrorInst)
diff --git a/Objects/fileobject.c b/Objects/fileobject.c
index 97c2756..c6c7d8e 100644
--- a/Objects/fileobject.c
+++ b/Objects/fileobject.c
@@ -146,7 +146,7 @@ PyFile_WriteObject(PyObject *v, PyObject *f, int flags)
     if (writer == NULL)
         return -1;
     if (flags & Py_PRINT_RAW) {
-        value = _PyObject_Str(v);
+        value = PyObject_Str(v);
     }
     else
         value = PyObject_Repr(v);
diff --git a/Objects/longobject.c b/Objects/longobject.c
index 8ebc31c..d827e7e 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -3462,14 +3462,22 @@ long_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
         return PyLong_FromLong(0L);
     if (base == -909)
         return PyNumber_Long(x);
-    else if (PyBytes_Check(x)) {
+    else if (PyUnicode_Check(x))
+        return PyLong_FromUnicode(PyUnicode_AS_UNICODE(x),
+                                  PyUnicode_GET_SIZE(x),
+                                  base);
+    else if (PyBytes_Check(x) || PyString_Check(x)) {
         /* Since PyLong_FromString doesn't have a length parameter,
          * check here for possible NULs in the string. */
-        char *string = PyBytes_AS_STRING(x);
-        int size = PyBytes_GET_SIZE(x);
+        char *string;
+        int size = Py_Size(x);
+        if (PyBytes_Check(x))
+            string = PyBytes_AS_STRING(x);
+        else
+            string = PyString_AS_STRING(x);
         if (strlen(string) != size) {
             /* We only see this if there's a null byte in x,
-               x is a str8 or a bytes, *and* a base is given. */
+               x is a bytes or buffer, *and* a base is given. */
             PyErr_Format(PyExc_ValueError,
                 "invalid literal for int() with base %d: %R",
                 base, x);
@@ -3477,10 +3485,6 @@ long_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
         }
         return PyLong_FromString(string, NULL, base);
     }
-    else if (PyUnicode_Check(x))
-        return PyLong_FromUnicode(PyUnicode_AS_UNICODE(x),
-                                  PyUnicode_GET_SIZE(x),
-                                  base);
     else {
         PyErr_SetString(PyExc_TypeError,
             "int() can't convert non-string with explicit base");
diff --git a/Objects/moduleobject.c b/Objects/moduleobject.c
index 13c1ab4..b8b2b8e 100644
--- a/Objects/moduleobject.c
+++ b/Objects/moduleobject.c
@@ -151,7 +151,7 @@ module_init(PyModuleObject *m, PyObject *args, PyObject *kwds)
 {
     static char *kwlist[] = {"name", "doc", NULL};
     PyObject *dict, *name = Py_None, *doc = Py_None;
-    if (!PyArg_ParseTupleAndKeywords(args, kwds, "S|O:module.__init__",
+    if (!PyArg_ParseTupleAndKeywords(args, kwds, "U|O:module.__init__",
                                      kwlist, &name, &doc))
         return -1;
     dict = m->md_dict;
diff --git a/Objects/object.c b/Objects/object.c
index 40b8b42..df93a19 100644
--- a/Objects/object.c
+++ b/Objects/object.c
@@ -372,50 +372,34 @@ PyObject_Repr(PyObject *v)
 #endif
     if (v == NULL)
         return PyUnicode_FromString("<NULL>");
-    else if (Py_Type(v)->tp_repr == NULL)
-        return PyUnicode_FromFormat("<%s object at %p>", v->ob_type->tp_name, v);
-    else {
-        res = (*v->ob_type->tp_repr)(v);
-        if (res != NULL && !PyUnicode_Check(res)) {
-            PyErr_Format(PyExc_TypeError,
-                         "__repr__ returned non-string (type %.200s)",
-                         res->ob_type->tp_name);
-            Py_DECREF(res);
-            return NULL;
-        }
-        return res;
-    }
-}
-
-PyObject *
-PyObject_ReprStr8(PyObject *v)
-{
-    PyObject *resu = PyObject_Repr(v);
-    if (resu) {
-        PyObject *resb = PyUnicode_AsEncodedString(resu, NULL, NULL);
-        Py_DECREF(resu);
-        if (resb) {
-            PyObject *ress = PyString_FromStringAndSize(
-                PyBytes_AS_STRING(resb),
-                PyBytes_GET_SIZE(resb)
-            );
-            Py_DECREF(resb);
-            return ress;
-        }
-    }
-    return NULL;
+    if (Py_Type(v)->tp_repr == NULL)
+        return PyUnicode_FromFormat("<%s object at %p>",
+                                    v->ob_type->tp_name, v);
+    res = (*v->ob_type->tp_repr)(v);
+    if (res != NULL && !PyUnicode_Check(res)) {
+        PyErr_Format(PyExc_TypeError,
+                     "__repr__ returned non-string (type %.200s)",
+                     res->ob_type->tp_name);
+        Py_DECREF(res);
+        return NULL;
+    }
+    return res;
 }
 PyObject *
-_PyObject_Str(PyObject *v)
+PyObject_Str(PyObject *v)
 {
     PyObject *res;
+    if (PyErr_CheckSignals())
+        return NULL;
+#ifdef USE_STACKCHECK
+    if (PyOS_CheckStack()) {
+        PyErr_SetString(PyExc_MemoryError, "stack overflow");
+        return NULL;
+    }
+#endif
     if (v == NULL)
         return PyUnicode_FromString("<NULL>");
-    if (PyString_CheckExact(v)) {
-        Py_INCREF(v);
-        return v;
-    }
     if (PyUnicode_CheckExact(v)) {
         Py_INCREF(v);
         return v;
@@ -431,7 +415,7 @@ _PyObject_Str(PyObject *v)
     Py_LeaveRecursiveCall();
     if (res == NULL)
         return NULL;
-    if (!(PyString_Check(res) || PyUnicode_Check(res))) {
+    if (!PyUnicode_Check(res)) {
         PyErr_Format(PyExc_TypeError,
                      "__str__ returned non-string (type %.200s)",
                      Py_Type(res)->tp_name);
@@ -441,90 +425,12 @@ _PyObject_Str(PyObject *v)
     return res;
 }
-PyObject *
-PyObject_Str(PyObject *v)
-{
-    PyObject *res = _PyObject_Str(v);
-    if (res == NULL)
-        return NULL;
-    if (PyUnicode_Check(res)) {
-        PyObject* str;
-        str = _PyUnicode_AsDefaultEncodedString(res, NULL);
-        Py_XINCREF(str);
-        Py_DECREF(res);
-        if (str)
-            res = str;
-        else
-            return NULL;
-    }
-    assert(PyString_Check(res));
-    return res;
-}
-
-PyObject *
-PyObject_Unicode(PyObject *v)
-{
-    PyObject *res;
-    PyObject *func;
-    PyObject *str;
-    static PyObject *unicodestr;
-
-    if (v == NULL)
-        return PyUnicode_FromString("<NULL>");
-    else if (PyUnicode_CheckExact(v)) {
-        Py_INCREF(v);
-        return v;
-    }
-    /* XXX As soon as we have a tp_unicode slot, we should
-       check this before trying the __unicode__
-       method. */
-    if (unicodestr == NULL) {
-        unicodestr= PyUnicode_InternFromString("__unicode__");
-        if (unicodestr == NULL)
-            return NULL;
-    }
-    func = PyObject_GetAttr(v, unicodestr);
-    if (func != NULL) {
-        res = PyEval_CallObject(func, (PyObject *)NULL);
-        Py_DECREF(func);
-    }
-    else {
-        PyErr_Clear();
-        if (PyUnicode_Check(v) &&
-            v->ob_type->tp_str == PyUnicode_Type.tp_str) {
-            /* For a Unicode subtype that's didn't overwrite
-               __unicode__ or __str__,
-               return a true Unicode object with the same data. */
-            return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(v),
-                                         PyUnicode_GET_SIZE(v));
-        }
-        if (PyString_CheckExact(v)) {
-            Py_INCREF(v);
-            res = v;
-        }
-        else {
-            if (Py_Type(v)->tp_str != NULL)
-                res = (*Py_Type(v)->tp_str)(v);
-            else
-                res = PyObject_Repr(v);
-        }
-    }
-    if (res == NULL)
-        return NULL;
-    if (!PyUnicode_Check(res)) {
-        str = PyUnicode_FromEncodedObject(res, NULL, "strict");
-        Py_DECREF(res);
-        res = str;
-    }
-    return res;
-}
-
 /* The new comparison philosophy is: we completely separate three-way
    comparison from rich comparison. That is, PyObject_Compare() and
    PyObject_Cmp() *just* use the tp_compare slot. And PyObject_RichCompare() and
    PyObject_RichCompareBool() *just* use the tp_richcompare slot.
-
+
    See (*) below for practical amendments.
IOW, only cmp() uses tp_compare; the comparison operators (==, !=, <=, <, @@ -580,7 +486,7 @@ do_compare(PyObject *v, PyObject *w) cmpfunc f; int ok; - if (v->ob_type == w->ob_type && + if (v->ob_type == w->ob_type && (f = v->ob_type->tp_compare) != NULL) { return (*f)(v, w); } @@ -738,25 +644,25 @@ Py_CmpToRich(int op, int cmp) return NULL; switch (op) { case Py_LT: - ok = cmp < 0; + ok = cmp < 0; break; case Py_LE: - ok = cmp <= 0; + ok = cmp <= 0; break; case Py_EQ: - ok = cmp == 0; + ok = cmp == 0; break; case Py_NE: - ok = cmp != 0; + ok = cmp != 0; break; - case Py_GT: - ok = cmp > 0; + case Py_GT: + ok = cmp > 0; break; case Py_GE: - ok = cmp >= 0; + ok = cmp >= 0; break; default: - PyErr_BadArgument(); + PyErr_BadArgument(); return NULL; } res = ok ? Py_True : Py_False; @@ -1335,10 +1241,10 @@ _dir_locals(void) } /* Helper for PyObject_Dir of type objects: returns __dict__ and __bases__. - We deliberately don't suck up its __class__, as methods belonging to the - metaclass would probably be more confusing than helpful. + We deliberately don't suck up its __class__, as methods belonging to the + metaclass would probably be more confusing than helpful. */ -static PyObject * +static PyObject * _specialized_dir_type(PyObject *obj) { PyObject *result = NULL; @@ -1381,7 +1287,7 @@ _generic_dir(PyObject *obj) PyObject *result = NULL; PyObject *dict = NULL; PyObject *itsclass = NULL; - + /* Get __dict__ (which may or may not be a real dict...) */ dict = PyObject_GetAttrString(obj, "__dict__"); if (dict == NULL) { @@ -1486,7 +1392,7 @@ PyObject_Dir(PyObject *obj) Py_DECREF(result); result = NULL; } - + return result; } diff --git a/Objects/stringlib/transmogrify.h b/Objects/stringlib/transmogrify.h index 1ee8e75..fe478c3 100644 --- a/Objects/stringlib/transmogrify.h +++ b/Objects/stringlib/transmogrify.h @@ -12,7 +12,7 @@ shared code in bytes_methods.c to cut down on duplicate code bloat. 
*/ PyDoc_STRVAR(expandtabs__doc__, -"B.expandtabs([tabsize]) -> modified copy of B\n\ +"B.expandtabs([tabsize]) -> copy of B\n\ \n\ Return a copy of B where all tab characters are expanded using spaces.\n\ If tabsize is not given, a tab size of 8 characters is assumed."); @@ -133,7 +133,7 @@ pad(PyObject *self, Py_ssize_t left, Py_ssize_t right, char fill) } PyDoc_STRVAR(ljust__doc__, -"B.ljust(width[, fillchar]) -> modified copy of B\n" +"B.ljust(width[, fillchar]) -> copy of B\n" "\n" "Return B left justified in a string of length width. Padding is\n" "done using the specified fill character (default is a space)."); @@ -163,7 +163,7 @@ stringlib_ljust(PyObject *self, PyObject *args) PyDoc_STRVAR(rjust__doc__, -"B.rjust(width[, fillchar]) -> modified copy of B\n" +"B.rjust(width[, fillchar]) -> copy of B\n" "\n" "Return B right justified in a string of length width. Padding is\n" "done using the specified fill character (default is a space)"); @@ -193,10 +193,10 @@ stringlib_rjust(PyObject *self, PyObject *args) PyDoc_STRVAR(center__doc__, -"B.center(width[, fillchar]) -> modified copy of B\n" +"B.center(width[, fillchar]) -> copy of B\n" "\n" -"Return B centered in a string of length width. Padding is\n" -"done using the specified fill character (default is a space)"); +"Return B centered in a string of length width. Padding is\n" +"done using the specified fill character (default is a space)."); static PyObject * stringlib_center(PyObject *self, PyObject *args) @@ -226,7 +226,7 @@ stringlib_center(PyObject *self, PyObject *args) } PyDoc_STRVAR(zfill__doc__, -"B.zfill(width) -> modified copy of B\n" +"B.zfill(width) -> copy of B\n" "\n" "Pad a numeric string B with zeros on the left, to fill a field\n" "of the specified width. 
B is never truncated."); diff --git a/Objects/stringobject.c b/Objects/stringobject.c index 3dd1051..8761477 100644 --- a/Objects/stringobject.c +++ b/Objects/stringobject.c @@ -1,11 +1,32 @@ /* String object implementation */ +/* XXX This is now called 'bytes' as far as the user is concerned. + Many docstrings and error messages need to be cleaned up. */ + #define PY_SSIZE_T_CLEAN #include "Python.h" #include "bytes_methods.h" +static Py_ssize_t +_getbuffer(PyObject *obj, Py_buffer *view) +{ + PyBufferProcs *buffer = Py_Type(obj)->tp_as_buffer; + + if (buffer == NULL || buffer->bf_getbuffer == NULL) + { + PyErr_Format(PyExc_TypeError, + "Type %.100s doesn't support the buffer API", + Py_Type(obj)->tp_name); + return -1; + } + + if (buffer->bf_getbuffer(obj, view, PyBUF_SIMPLE) < 0) + return -1; + return view->len; +} + #ifdef COUNT_ALLOCS int null_strings, one_strings; #endif @@ -13,16 +34,6 @@ int null_strings, one_strings; static PyStringObject *characters[UCHAR_MAX + 1]; static PyStringObject *nullstring; -/* This dictionary holds all interned strings. Note that references to - strings in this dictionary are *not* counted in the string's ob_refcnt. - When the interned string reaches a refcnt of 0 the string deallocation - function will delete the reference from this dictionary. 
- - Another way to look at this is that to say that the actual reference - count of a string is: s->ob_refcnt + (s->ob_sstate?2:0) -*/ -static PyObject *interned; - /* For both PyString_FromString() and PyString_FromStringAndSize(), the parameter `size' denotes number of characters to allocate, not counting any @@ -77,21 +88,14 @@ PyString_FromStringAndSize(const char *str, Py_ssize_t size) return PyErr_NoMemory(); PyObject_INIT_VAR(op, &PyString_Type, size); op->ob_shash = -1; - op->ob_sstate = SSTATE_NOT_INTERNED; if (str != NULL) Py_MEMCPY(op->ob_sval, str, size); op->ob_sval[size] = '\0'; /* share short strings */ if (size == 0) { - PyObject *t = (PyObject *)op; - PyString_InternInPlace(&t); - op = (PyStringObject *)t; nullstring = op; Py_INCREF(op); } else if (size == 1 && str != NULL) { - PyObject *t = (PyObject *)op; - PyString_InternInPlace(&t); - op = (PyStringObject *)t; characters[*str & UCHAR_MAX] = op; Py_INCREF(op); } @@ -132,19 +136,12 @@ PyString_FromString(const char *str) return PyErr_NoMemory(); PyObject_INIT_VAR(op, &PyString_Type, size); op->ob_shash = -1; - op->ob_sstate = SSTATE_NOT_INTERNED; Py_MEMCPY(op->ob_sval, str, size+1); /* share short strings */ if (size == 0) { - PyObject *t = (PyObject *)op; - PyString_InternInPlace(&t); - op = (PyStringObject *)t; nullstring = op; Py_INCREF(op); } else if (size == 1) { - PyObject *t = (PyObject *)op; - PyString_InternInPlace(&t); - op = (PyStringObject *)t; characters[*str & UCHAR_MAX] = op; Py_INCREF(op); } @@ -351,174 +348,9 @@ PyString_FromFormat(const char *format, ...) 
return ret; } - -PyObject *PyString_Decode(const char *s, - Py_ssize_t size, - const char *encoding, - const char *errors) -{ - PyObject *v, *str; - - str = PyString_FromStringAndSize(s, size); - if (str == NULL) - return NULL; - v = PyString_AsDecodedString(str, encoding, errors); - Py_DECREF(str); - return v; -} - -PyObject *PyString_AsDecodedObject(PyObject *str, - const char *encoding, - const char *errors) -{ - PyObject *v; - - if (!PyString_Check(str)) { - PyErr_BadArgument(); - goto onError; - } - - if (encoding == NULL) { - encoding = PyUnicode_GetDefaultEncoding(); - } - - /* Decode via the codec registry */ - v = PyCodec_Decode(str, encoding, errors); - if (v == NULL) - goto onError; - - return v; - - onError: - return NULL; -} - -PyObject *PyString_AsDecodedString(PyObject *str, - const char *encoding, - const char *errors) -{ - PyObject *v; - - v = PyString_AsDecodedObject(str, encoding, errors); - if (v == NULL) - goto onError; - - /* Convert Unicode to a string using the default encoding */ - if (PyUnicode_Check(v)) { - PyObject *temp = v; - v = PyUnicode_AsEncodedString(v, NULL, NULL); - Py_DECREF(temp); - if (v == NULL) - goto onError; - } - if (!PyString_Check(v)) { - PyErr_Format(PyExc_TypeError, - "decoder did not return a string object (type=%.400s)", - Py_Type(v)->tp_name); - Py_DECREF(v); - goto onError; - } - - return v; - - onError: - return NULL; -} - -PyObject *PyString_Encode(const char *s, - Py_ssize_t size, - const char *encoding, - const char *errors) -{ - PyObject *v, *str; - - str = PyString_FromStringAndSize(s, size); - if (str == NULL) - return NULL; - v = PyString_AsEncodedString(str, encoding, errors); - Py_DECREF(str); - return v; -} - -PyObject *PyString_AsEncodedObject(PyObject *str, - const char *encoding, - const char *errors) -{ - PyObject *v; - - if (!PyString_Check(str)) { - PyErr_BadArgument(); - goto onError; - } - - if (encoding == NULL) { - encoding = PyUnicode_GetDefaultEncoding(); - } - - /* Encode via the codec 
registry */ - v = PyCodec_Encode(str, encoding, errors); - if (v == NULL) - goto onError; - - return v; - - onError: - return NULL; -} - -PyObject *PyString_AsEncodedString(PyObject *str, - const char *encoding, - const char *errors) -{ - PyObject *v; - - v = PyString_AsEncodedObject(str, encoding, errors); - if (v == NULL) - goto onError; - - /* Convert Unicode to a string using the default encoding */ - if (PyUnicode_Check(v)) { - PyObject *temp = v; - v = PyUnicode_AsEncodedString(v, NULL, NULL); - Py_DECREF(temp); - if (v == NULL) - goto onError; - } - if (!PyString_Check(v)) { - PyErr_Format(PyExc_TypeError, - "encoder did not return a string object (type=%.400s)", - Py_Type(v)->tp_name); - Py_DECREF(v); - goto onError; - } - - return v; - - onError: - return NULL; -} - static void string_dealloc(PyObject *op) { - switch (PyString_CHECK_INTERNED(op)) { - case SSTATE_NOT_INTERNED: - break; - - case SSTATE_INTERNED_MORTAL: - /* revive dead object temporarily for DelItem */ - Py_Refcnt(op) = 3; - if (PyDict_DelItem(interned, op) != 0) - Py_FatalError( - "deletion of interned string failed"); - break; - - case SSTATE_INTERNED_IMMORTAL: - Py_FatalError("Immortal interned string died."); - - default: - Py_FatalError("Inconsistent interned string state."); - } Py_Type(op)->tp_free(op); } @@ -577,7 +409,7 @@ PyObject *PyString_DecodeEscape(const char *s, continue; } s++; - if (s==end) { + if (s==end) { PyErr_SetString(PyExc_ValueError, "Trailing \\ in string"); goto failed; @@ -639,8 +471,8 @@ PyObject *PyString_DecodeEscape(const char *s, /* do nothing */; else { PyErr_Format(PyExc_ValueError, - "decoding error; " - "unknown error handling code: %.400s", + "decoding error; unknown " + "error handling code: %.400s", errors); goto failed; } @@ -665,8 +497,8 @@ PyObject *PyString_DecodeEscape(const char *s, static Py_ssize_t string_getsize(register PyObject *op) { - char *s; - Py_ssize_t len; + char *s; + Py_ssize_t len; if (PyString_AsStringAndSize(op, &s, &len)) 
return -1; return len; @@ -675,8 +507,8 @@ string_getsize(register PyObject *op) static /*const*/ char * string_getbuffer(register PyObject *op) { - char *s; - Py_ssize_t len; + char *s; + Py_ssize_t len; if (PyString_AsStringAndSize(op, &s, &len)) return NULL; return s; @@ -753,7 +585,7 @@ PyString_AsStringAndSize(register PyObject *obj, #define STRINGLIB_LEN PyString_GET_SIZE #define STRINGLIB_NEW PyString_FromStringAndSize #define STRINGLIB_STR PyString_AS_STRING -#define STRINGLIB_WANT_CONTAINS_OBJ 1 +/* #define STRINGLIB_WANT_CONTAINS_OBJ 1 */ #define STRINGLIB_EMPTY nullstring #define STRINGLIB_CHECK_EXACT PyString_CheckExact @@ -773,12 +605,12 @@ PyString_Repr(PyObject *obj, int smartquotes) { static const char *hexdigits = "0123456789abcdef"; register PyStringObject* op = (PyStringObject*) obj; - Py_ssize_t length = PyString_GET_SIZE(op); - size_t newsize = 3 + 4 * Py_Size(op); + Py_ssize_t length = Py_Size(op); + size_t newsize = 3 + 4 * length; PyObject *v; - if (newsize > PY_SSIZE_T_MAX || newsize / 4 != Py_Size(op)) { + if (newsize > PY_SSIZE_T_MAX || (newsize-3) / 4 != length) { PyErr_SetString(PyExc_OverflowError, - "string is too large to make repr"); + "bytes object is too large to make repr"); } v = PyUnicode_FromUnicode(NULL, newsize); if (v == NULL) { @@ -790,14 +622,14 @@ PyString_Repr(PyObject *obj, int smartquotes) register Py_UNICODE *p = PyUnicode_AS_UNICODE(v); int quote; - /* figure out which quote to use; single is preferred */ + /* Figure out which quote to use; single is preferred */ quote = '\''; if (smartquotes) { char *test, *start; start = PyString_AS_STRING(op); for (test = start; test < start+length; ++test) { if (*test == '"') { - quote = '\''; /* switch back to single quote */ + quote = '\''; /* back to single */ goto decided; } else if (*test == '\'') @@ -807,8 +639,8 @@ PyString_Repr(PyObject *obj, int smartquotes) ; } - *p++ = 's', *p++ = quote; - for (i = 0; i < Py_Size(op); i++) { + *p++ = 'b', *p++ = quote; + for (i = 0; i 
< length; i++) { /* There's at least enough room for a hex escape and a closing quote. */ assert(newsize - (p - PyUnicode_AS_UNICODE(v)) >= 5); @@ -848,18 +680,14 @@ string_repr(PyObject *op) } static PyObject * -string_str(PyObject *s) +string_str(PyObject *op) { - assert(PyString_Check(s)); - if (PyString_CheckExact(s)) { - Py_INCREF(s); - return s; - } - else { - /* Subtype -- return genuine string with the same value. */ - PyStringObject *t = (PyStringObject *) s; - return PyString_FromStringAndSize(t->ob_sval, Py_Size(t)); + if (Py_BytesWarningFlag) { + if (PyErr_WarnEx(PyExc_BytesWarning, + "str() on a bytes instance", 1)) + return NULL; } + return string_repr(op); } static Py_ssize_t @@ -868,51 +696,53 @@ string_length(PyStringObject *a) return Py_Size(a); } +/* This is also used by PyString_Concat() */ static PyObject * -string_concat(register PyStringObject *a, register PyObject *bb) +string_concat(PyObject *a, PyObject *b) { - register Py_ssize_t size; - register PyStringObject *op; - if (!PyString_Check(bb)) { - if (PyUnicode_Check(bb)) - return PyUnicode_Concat((PyObject *)a, bb); - if (PyBytes_Check(bb)) - return PyBytes_Concat((PyObject *)a, bb); - PyErr_Format(PyExc_TypeError, - "cannot concatenate 'str8' and '%.200s' objects", - Py_Type(bb)->tp_name); - return NULL; + Py_ssize_t size; + Py_buffer va, vb; + PyObject *result = NULL; + + va.len = -1; + vb.len = -1; + if (_getbuffer(a, &va) < 0 || + _getbuffer(b, &vb) < 0) { + PyErr_Format(PyExc_TypeError, "can't concat %.100s to %.100s", + Py_Type(a)->tp_name, Py_Type(b)->tp_name); + goto done; } -#define b ((PyStringObject *)bb) - /* Optimize cases with empty left or right operand */ - if ((Py_Size(a) == 0 || Py_Size(b) == 0) && - PyString_CheckExact(a) && PyString_CheckExact(b)) { - if (Py_Size(a) == 0) { - Py_INCREF(bb); - return bb; - } - Py_INCREF(a); - return (PyObject *)a; + + /* Optimize end cases */ + if (va.len == 0 && PyString_CheckExact(b)) { + result = b; + Py_INCREF(result); + goto done; 
+ } + if (vb.len == 0 && PyString_CheckExact(a)) { + result = a; + Py_INCREF(result); + goto done; } - size = Py_Size(a) + Py_Size(b); + + size = va.len + vb.len; if (size < 0) { - PyErr_SetString(PyExc_OverflowError, - "strings are too large to concat"); - return NULL; + PyErr_NoMemory(); + goto done; } - /* Inline PyObject_NewVar */ - op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size); - if (op == NULL) - return PyErr_NoMemory(); - PyObject_INIT_VAR(op, &PyString_Type, size); - op->ob_shash = -1; - op->ob_sstate = SSTATE_NOT_INTERNED; - Py_MEMCPY(op->ob_sval, a->ob_sval, Py_Size(a)); - Py_MEMCPY(op->ob_sval + Py_Size(a), b->ob_sval, Py_Size(b)); - op->ob_sval[size] = '\0'; - return (PyObject *) op; -#undef b + result = PyString_FromStringAndSize(NULL, size); + if (result != NULL) { + memcpy(PyString_AS_STRING(result), va.buf, va.len); + memcpy(PyString_AS_STRING(result) + va.len, vb.buf, vb.len); + } + + done: + if (va.len != -1) + PyObject_ReleaseBuffer(a, &va); + if (vb.len != -1) + PyObject_ReleaseBuffer(b, &vb); + return result; } static PyObject * @@ -950,7 +780,6 @@ string_repeat(register PyStringObject *a, register Py_ssize_t n) return PyErr_NoMemory(); PyObject_INIT_VAR(op, &PyString_Type, size); op->ob_shash = -1; - op->ob_sstate = SSTATE_NOT_INTERNED; op->ob_sval[size] = '\0'; if (Py_Size(a) == 1 && n > 0) { memset(op->ob_sval, a->ob_sval[0] , n); @@ -970,20 +799,36 @@ string_repeat(register PyStringObject *a, register Py_ssize_t n) } static int -string_contains(PyObject *str_obj, PyObject *sub_obj) +string_contains(PyObject *self, PyObject *arg) +{ + Py_ssize_t ival = PyNumber_AsSsize_t(arg, PyExc_ValueError); + if (ival == -1 && PyErr_Occurred()) { + Py_buffer varg; + int pos; + PyErr_Clear(); + if (_getbuffer(arg, &varg) < 0) + return -1; + pos = stringlib_find(PyString_AS_STRING(self), Py_Size(self), + varg.buf, varg.len, 0); + PyObject_ReleaseBuffer(arg, &varg); + return pos >= 0; + } + if (ival < 0 || ival >= 256) { + 
PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)"); + return -1; + } + + return memchr(PyString_AS_STRING(self), ival, Py_Size(self)) != NULL; +} + +static PyObject * +string_item(PyStringObject *a, register Py_ssize_t i) { - if (!PyString_CheckExact(sub_obj)) { - if (PyUnicode_Check(sub_obj)) - return PyUnicode_Contains(str_obj, sub_obj); - if (!PyString_Check(sub_obj)) { - PyErr_Format(PyExc_TypeError, - "'in <string>' requires string as left operand, " - "not %.200s", Py_Type(sub_obj)->tp_name); - return -1; - } + if (i < 0 || i >= Py_Size(a)) { + PyErr_SetString(PyExc_IndexError, "string index out of range"); + return NULL; } - - return stringlib_contains_obj(str_obj, sub_obj); + return PyInt_FromLong((unsigned char)a->ob_sval[i]); } static PyObject* @@ -996,6 +841,15 @@ string_richcompare(PyStringObject *a, PyStringObject *b, int op) /* Make sure both arguments are strings. */ if (!(PyString_Check(a) && PyString_Check(b))) { + if (Py_BytesWarningFlag && (op == Py_EQ) && + (PyObject_IsInstance((PyObject*)a, + (PyObject*)&PyUnicode_Type) || + PyObject_IsInstance((PyObject*)b, + (PyObject*)&PyUnicode_Type))) { + if (PyErr_WarnEx(PyExc_BytesWarning, + "Comparsion between bytes and string", 1)) + return NULL; + } result = Py_NotImplemented; goto out; } @@ -1053,9 +907,9 @@ _PyString_Eq(PyObject *o1, PyObject *o2) { PyStringObject *a = (PyStringObject*) o1; PyStringObject *b = (PyStringObject*) o2; - return Py_Size(a) == Py_Size(b) - && *a->ob_sval == *b->ob_sval - && memcmp(a->ob_sval, b->ob_sval, Py_Size(a)) == 0; + return Py_Size(a) == Py_Size(b) + && *a->ob_sval == *b->ob_sval + && memcmp(a->ob_sval, b->ob_sval, Py_Size(a)) == 0; } static long @@ -1088,12 +942,12 @@ string_subscript(PyStringObject* self, PyObject* item) return NULL; if (i < 0) i += PyString_GET_SIZE(self); - if (i < 0 || i >= PyString_GET_SIZE(self)) { + if (i < 0 || i >= PyString_GET_SIZE(self)) { PyErr_SetString(PyExc_IndexError, "string index out of range"); return NULL; - } - 
return PyInt_FromLong((unsigned char)self->ob_sval[i]); + } + return PyInt_FromLong((unsigned char)self->ob_sval[i]); } else if (PySlice_Check(item)) { Py_ssize_t start, stop, step, slicelength, cur, i; @@ -1149,14 +1003,15 @@ string_subscript(PyStringObject* self, PyObject* item) static int string_buffer_getbuffer(PyStringObject *self, Py_buffer *view, int flags) { - return PyBuffer_FillInfo(view, (void *)self->ob_sval, Py_Size(self), 0, flags); + return PyBuffer_FillInfo(view, (void *)self->ob_sval, Py_Size(self), + 0, flags); } static PySequenceMethods string_as_sequence = { (lenfunc)string_length, /*sq_length*/ (binaryfunc)string_concat, /*sq_concat*/ (ssizeargfunc)string_repeat, /*sq_repeat*/ - 0, /*sq_item*/ + (ssizeargfunc)string_item, /*sq_item*/ 0, /*sq_slice*/ 0, /*sq_ass_item*/ 0, /*sq_ass_slice*/ @@ -1171,7 +1026,7 @@ static PyMappingMethods string_as_mapping = { static PyBufferProcs string_as_buffer = { (getbufferproc)string_buffer_getbuffer, - NULL, + NULL, }; @@ -1297,12 +1152,12 @@ split_char(const char *s, Py_ssize_t len, char ch, Py_ssize_t maxcount) } PyDoc_STRVAR(split__doc__, -"S.split([sep [,maxsplit]]) -> list of strings\n\ +"B.split([sep[, maxsplit]]) -> list of bytes\n\ \n\ -Return a list of the words in the string S, using sep as the\n\ -delimiter string. If maxsplit is given, at most maxsplit\n\ -splits are done. 
If sep is not specified or is None, any\n\ -whitespace string is a separator."); +Return a list of the sections in B, using sep as the delimiter.\n\ +If sep is not given, B is split on ASCII whitespace characters\n\ +(space, tab, return, newline, formfeed, vertical tab).\n\ +If maxsplit is given, at most maxsplit splits are done."); static PyObject * string_split(PyStringObject *self, PyObject *args) @@ -1310,6 +1165,7 @@ string_split(PyStringObject *self, PyObject *args) Py_ssize_t len = PyString_GET_SIZE(self), n, i, j; Py_ssize_t maxsplit = -1, count=0; const char *s = PyString_AS_STRING(self), *sub; + Py_buffer vsub; PyObject *list, *str, *subobj = Py_None; #ifdef USE_FAST Py_ssize_t pos; @@ -1321,25 +1177,27 @@ string_split(PyStringObject *self, PyObject *args) maxsplit = PY_SSIZE_T_MAX; if (subobj == Py_None) return split_whitespace(s, len, maxsplit); - if (PyString_Check(subobj)) { - sub = PyString_AS_STRING(subobj); - n = PyString_GET_SIZE(subobj); - } - else if (PyUnicode_Check(subobj)) - return PyUnicode_Split((PyObject *)self, subobj, maxsplit); - else if (PyObject_AsCharBuffer(subobj, &sub, &n)) + if (_getbuffer(subobj, &vsub) < 0) return NULL; + sub = vsub.buf; + n = vsub.len; if (n == 0) { PyErr_SetString(PyExc_ValueError, "empty separator"); + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; } - else if (n == 1) - return split_char(s, len, sub[0], maxsplit); + else if (n == 1) { + char ch = sub[0]; + PyObject_ReleaseBuffer(subobj, &vsub); + return split_char(s, len, ch, maxsplit); + } list = PyList_New(PREALLOC_SIZE(maxsplit)); - if (list == NULL) + if (list == NULL) { + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; + } #ifdef USE_FAST i = j = 0; @@ -1365,19 +1223,21 @@ string_split(PyStringObject *self, PyObject *args) #endif SPLIT_ADD(s, i, len); FIX_PREALLOC_SIZE(list); + PyObject_ReleaseBuffer(subobj, &vsub); return list; onError: Py_DECREF(list); + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; } PyDoc_STRVAR(partition__doc__, 
-"S.partition(sep) -> (head, sep, tail)\n\ +"B.partition(sep) -> (head, sep, tail)\n\ \n\ -Searches for the separator sep in S, and returns the part before it,\n\ +Searches for the separator sep in B, and returns the part before it,\n\ the separator itself, and the part after it. If the separator is not\n\ -found, returns S and two empty strings."); +found, returns B and two empty bytes objects."); static PyObject * string_partition(PyStringObject *self, PyObject *sep_obj) @@ -1402,11 +1262,12 @@ string_partition(PyStringObject *self, PyObject *sep_obj) } PyDoc_STRVAR(rpartition__doc__, -"S.rpartition(sep) -> (tail, sep, head)\n\ +"B.rpartition(sep) -> (tail, sep, head)\n\ \n\ -Searches for the separator sep in S, starting at the end of S, and returns\n\ -the part before it, the separator itself, and the part after it. If the\n\ -separator is not found, returns two empty strings and S."); +Searches for the separator sep in B, starting at the end of B,\n\ +and returns the part before it, the separator itself, and the\n\ +part after it. If the separator is not found, returns two empty\n\ +bytes objects and B."); static PyObject * string_rpartition(PyStringObject *self, PyObject *sep_obj) @@ -1450,8 +1311,8 @@ rsplit_whitespace(const char *s, Py_ssize_t len, Py_ssize_t maxsplit) SPLIT_ADD(s, i + 1, j + 1); } if (i >= 0) { - /* Only occurs when maxsplit was reached */ - /* Skip any remaining whitespace and copy to beginning of string */ + /* Only occurs when maxsplit was reached. Skip any remaining + whitespace and copy to beginning of string. 
*/ RSKIP_SPACE(s, i); if (i >= 0) SPLIT_ADD(s, 0, i + 1); @@ -1500,13 +1361,14 @@ rsplit_char(const char *s, Py_ssize_t len, char ch, Py_ssize_t maxcount) } PyDoc_STRVAR(rsplit__doc__, -"S.rsplit([sep [,maxsplit]]) -> list of strings\n\ +"B.rsplit([sep[, maxsplit]]) -> list of strings\n\ \n\ -Return a list of the words in the string S, using sep as the\n\ -delimiter string, starting at the end of the string and working\n\ -to the front. If maxsplit is given, at most maxsplit splits are\n\ -done. If sep is not specified or is None, any whitespace string\n\ -is a separator."); +Return a list of the sections in B, using sep as the delimiter,\n\ +starting at the end of B and working to the front.\n\ +If sep is not given, B is split on ASCII whitespace characters\n\ +(space, tab, return, newline, formfeed, vertical tab).\n\ +If maxsplit is given, at most maxsplit splits are done."); + static PyObject * string_rsplit(PyStringObject *self, PyObject *args) @@ -1514,6 +1376,7 @@ string_rsplit(PyStringObject *self, PyObject *args) Py_ssize_t len = PyString_GET_SIZE(self), n, i, j; Py_ssize_t maxsplit = -1, count=0; const char *s = PyString_AS_STRING(self), *sub; + Py_buffer vsub; PyObject *list, *str, *subobj = Py_None; if (!PyArg_ParseTuple(args, "|On:rsplit", &subobj, &maxsplit)) @@ -1522,25 +1385,27 @@ string_rsplit(PyStringObject *self, PyObject *args) maxsplit = PY_SSIZE_T_MAX; if (subobj == Py_None) return rsplit_whitespace(s, len, maxsplit); - if (PyString_Check(subobj)) { - sub = PyString_AS_STRING(subobj); - n = PyString_GET_SIZE(subobj); - } - else if (PyUnicode_Check(subobj)) - return PyUnicode_RSplit((PyObject *)self, subobj, maxsplit); - else if (PyObject_AsCharBuffer(subobj, &sub, &n)) + if (_getbuffer(subobj, &vsub) < 0) return NULL; + sub = vsub.buf; + n = vsub.len; if (n == 0) { PyErr_SetString(PyExc_ValueError, "empty separator"); + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; } - else if (n == 1) - return rsplit_char(s, len, sub[0], maxsplit); + 
else if (n == 1) { + char ch = sub[0]; + PyObject_ReleaseBuffer(subobj, &vsub); + return rsplit_char(s, len, ch, maxsplit); + } list = PyList_New(PREALLOC_SIZE(maxsplit)); - if (list == NULL) + if (list == NULL) { + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; + } j = len; i = j - n; @@ -1559,10 +1424,12 @@ string_rsplit(PyStringObject *self, PyObject *args) FIX_PREALLOC_SIZE(list); if (PyList_Reverse(list) < 0) goto onError; + PyObject_ReleaseBuffer(subobj, &vsub); return list; onError: Py_DECREF(list); + PyObject_ReleaseBuffer(subobj, &vsub); return NULL; } @@ -1572,13 +1439,13 @@ onError: PyDoc_STRVAR(join__doc__, -"S.join(sequence) -> string\n\ +"B.join(iterable_of_bytes) -> bytes\n\ \n\ -Return a string which is the concatenation of the strings in the\n\ -sequence. The separator between elements is S."); +Concatenates any number of bytes objects, with B in between each pair.\n\ +Example: b'.'.join([b'ab', b'pq', b'rs']) -> b'ab.pq.rs'."); static PyObject * -string_join(PyStringObject *self, PyObject *orig) +string_join(PyObject *self, PyObject *orig) { char *sep = PyString_AS_STRING(self); const Py_ssize_t seplen = PyString_GET_SIZE(self); @@ -1601,7 +1468,7 @@ string_join(PyStringObject *self, PyObject *orig) } if (seqlen == 1) { item = PySequence_Fast_GET_ITEM(seq, 0); - if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) { + if (PyString_CheckExact(item)) { Py_INCREF(item); Py_DECREF(seq); return item; @@ -1611,37 +1478,26 @@ string_join(PyStringObject *self, PyObject *orig) /* There are at least two things to join, or else we have a subclass * of the builtin types in the sequence. * Do a pre-pass to figure out the total amount of space we'll - * need (sz), see whether any argument is absurd, and defer to - * the Unicode join if appropriate. + * need (sz), and see whether all argument are bytes. */ + /* XXX Shouldn't we use _getbuffer() on these items instead? 
*/ for (i = 0; i < seqlen; i++) { const size_t old_sz = sz; item = PySequence_Fast_GET_ITEM(seq, i); - if (!PyString_Check(item)){ - if (PyUnicode_Check(item)) { - /* Defer to Unicode join. - * CAUTION: There's no gurantee that the - * original sequence can be iterated over - * again, so we must pass seq here. - */ - PyObject *result; - result = PyUnicode_Join((PyObject *)self, seq); - Py_DECREF(seq); - return result; - } + if (!PyString_Check(item) && !PyBytes_Check(item)) { PyErr_Format(PyExc_TypeError, - "sequence item %zd: expected string," + "sequence item %zd: expected bytes," " %.80s found", i, Py_Type(item)->tp_name); Py_DECREF(seq); return NULL; } - sz += PyString_GET_SIZE(item); + sz += Py_Size(item); if (i != 0) sz += seplen; if (sz < old_sz || sz > PY_SSIZE_T_MAX) { PyErr_SetString(PyExc_OverflowError, - "join() result is too long for a Python string"); + "join() result is too long for a Python string"); Py_DECREF(seq); return NULL; } @@ -1655,17 +1511,24 @@ string_join(PyStringObject *self, PyObject *orig) } /* Catenate everything. */ + /* I'm not worried about a PyBytes item growing because there's + nowhere in this function where we release the GIL. 
*/ p = PyString_AS_STRING(res); for (i = 0; i < seqlen; ++i) { size_t n; - item = PySequence_Fast_GET_ITEM(seq, i); - n = PyString_GET_SIZE(item); - Py_MEMCPY(p, PyString_AS_STRING(item), n); - p += n; - if (i < seqlen - 1) { + char *q; + if (i) { Py_MEMCPY(p, sep, seplen); p += seplen; } + item = PySequence_Fast_GET_ITEM(seq, i); + n = Py_Size(item); + if (PyString_Check(item)) + q = PyString_AS_STRING(item); + else + q = PyBytes_AS_STRING(item); + Py_MEMCPY(p, q, n); + p += n; } Py_DECREF(seq); @@ -1677,7 +1540,7 @@ _PyString_Join(PyObject *sep, PyObject *x) { assert(sep != NULL && PyString_Check(sep)); assert(x != NULL); - return string_join((PyStringObject *)sep, x); + return string_join(sep, x); } Py_LOCAL_INLINE(void) @@ -1730,7 +1593,7 @@ string_find_internal(PyStringObject *self, PyObject *args, int dir) PyDoc_STRVAR(find__doc__, -"S.find(sub [,start [,end]]) -> int\n\ +"B.find(sub [,start [,end]]) -> int\n\ \n\ Return the lowest index in S where substring sub is found,\n\ such that sub is contained within s[start:end]. Optional\n\ @@ -1749,9 +1612,9 @@ string_find(PyStringObject *self, PyObject *args) PyDoc_STRVAR(index__doc__, -"S.index(sub [,start [,end]]) -> int\n\ +"B.index(sub [,start [,end]]) -> int\n\ \n\ -Like S.find() but raise ValueError when the substring is not found."); +Like B.find() but raise ValueError when the substring is not found."); static PyObject * string_index(PyStringObject *self, PyObject *args) @@ -1769,9 +1632,9 @@ string_index(PyStringObject *self, PyObject *args) PyDoc_STRVAR(rfind__doc__, -"S.rfind(sub [,start [,end]]) -> int\n\ +"B.rfind(sub [,start [,end]]) -> int\n\ \n\ -Return the highest index in S where substring sub is found,\n\ +Return the highest index in B where substring sub is found,\n\ such that sub is contained within s[start:end]. 
Optional\n\ arguments start and end are interpreted as in slice notation.\n\ \n\ @@ -1788,9 +1651,9 @@ string_rfind(PyStringObject *self, PyObject *args) PyDoc_STRVAR(rindex__doc__, -"S.rindex(sub [,start [,end]]) -> int\n\ +"B.rindex(sub [,start [,end]]) -> int\n\ \n\ -Like S.rfind() but raise ValueError when the substring is not found."); +Like B.rfind() but raise ValueError when the substring is not found."); static PyObject * string_rindex(PyStringObject *self, PyObject *args) @@ -1810,12 +1673,18 @@ string_rindex(PyStringObject *self, PyObject *args) Py_LOCAL_INLINE(PyObject *) do_xstrip(PyStringObject *self, int striptype, PyObject *sepobj) { + Py_buffer vsep; char *s = PyString_AS_STRING(self); Py_ssize_t len = PyString_GET_SIZE(self); - char *sep = PyString_AS_STRING(sepobj); - Py_ssize_t seplen = PyString_GET_SIZE(sepobj); + char *sep; + Py_ssize_t seplen; Py_ssize_t i, j; + if (_getbuffer(sepobj, &vsep) < 0) + return NULL; + sep = vsep.buf; + seplen = vsep.len; + i = 0; if (striptype != RIGHTSTRIP) { while (i < len && memchr(sep, Py_CHARMASK(s[i]), seplen)) { @@ -1831,6 +1700,8 @@ do_xstrip(PyStringObject *self, int striptype, PyObject *sepobj) j++; } + PyObject_ReleaseBuffer(sepobj, &vsep); + if (i == 0 && j == len && PyString_CheckExact(self)) { Py_INCREF(self); return (PyObject*)self; @@ -1879,36 +1750,17 @@ do_argstrip(PyStringObject *self, int striptype, PyObject *args) return NULL; if (sep != NULL && sep != Py_None) { - if (PyString_Check(sep)) - return do_xstrip(self, striptype, sep); - else if (PyUnicode_Check(sep)) { - PyObject *uniself = PyUnicode_FromObject((PyObject *)self); - PyObject *res; - if (uniself==NULL) - return NULL; - res = _PyUnicode_XStrip((PyUnicodeObject *)uniself, - striptype, sep); - Py_DECREF(uniself); - return res; - } - PyErr_Format(PyExc_TypeError, - "%s arg must be None or string", - STRIPNAME(striptype)); - return NULL; + return do_xstrip(self, striptype, sep); } - return do_strip(self, striptype); } 
PyDoc_STRVAR(strip__doc__, -"S.strip([chars]) -> string\n\ +"B.strip([bytes]) -> bytes\n\ \n\ -Return a copy of the string S with leading and trailing\n\ -whitespace removed.\n\ -If chars is given and not None, remove characters in chars instead.\n\ -If chars is unicode, S will be converted to unicode before stripping"); - +Strip leading and trailing bytes contained in the argument.\n\ +If the argument is omitted, strip trailing ASCII whitespace."); static PyObject * string_strip(PyStringObject *self, PyObject *args) { @@ -1920,12 +1772,10 @@ string_strip(PyStringObject *self, PyObject *args) PyDoc_STRVAR(lstrip__doc__, -"S.lstrip([chars]) -> string\n\ +"B.lstrip([bytes]) -> bytes\n\ \n\ -Return a copy of the string S with leading whitespace removed.\n\ -If chars is given and not None, remove characters in chars instead.\n\ -If chars is unicode, S will be converted to unicode before stripping"); - +Strip leading bytes contained in the argument.\n\ +If the argument is omitted, strip leading ASCII whitespace."); static PyObject * string_lstrip(PyStringObject *self, PyObject *args) { @@ -1937,12 +1787,10 @@ string_lstrip(PyStringObject *self, PyObject *args) PyDoc_STRVAR(rstrip__doc__, -"S.rstrip([chars]) -> string\n\ +"B.rstrip([bytes]) -> bytes\n\ \n\ -Return a copy of the string S with trailing whitespace removed.\n\ -If chars is given and not None, remove characters in chars instead.\n\ -If chars is unicode, S will be converted to unicode before stripping"); - +Strip trailing bytes contained in the argument.\n\ +If the argument is omitted, strip trailing ASCII whitespace."); static PyObject * string_rstrip(PyStringObject *self, PyObject *args) { @@ -1954,7 +1802,7 @@ string_rstrip(PyStringObject *self, PyObject *args) PyDoc_STRVAR(count__doc__, -"S.count(sub[, start[, end]]) -> int\n\ +"B.count(sub [,start [,end]]) -> int\n\ \n\ Return the number of non-overlapping occurrences of substring sub in\n\ string S[start:end]. 
Optional arguments start and end are interpreted\n\ @@ -1996,12 +1844,12 @@ string_count(PyStringObject *self, PyObject *args) PyDoc_STRVAR(translate__doc__, -"S.translate(table [,deletechars]) -> string\n\ +"B.translate(table[, deletechars]) -> bytes\n\ \n\ -Return a copy of the string S, where all characters occurring\n\ -in the optional argument deletechars are removed, and the\n\ -remaining characters have been mapped through the given\n\ -translation table, which must be a string of length 256."); +Return a copy of B, where all characters occurring in the\n\ +optional argument deletechars are removed, and the remaining\n\ +characters have been mapped through the given translation\n\ +table, which must be a bytes object of length 256."); static PyObject * string_translate(PyStringObject *self, PyObject *args) @@ -2187,7 +2035,7 @@ findstring(const char *target, Py_ssize_t target_len, return end; } else { for (; start <= end; start++) - if (Py_STRING_MATCH(target, start, pattern, pattern_len)) + if (Py_STRING_MATCH(target, start,pattern,pattern_len)) return start; } return -1; @@ -2225,14 +2073,15 @@ countstring(const char *target, Py_ssize_t target_len, end -= pattern_len; if (direction < 0) { for (; (end >= start); end--) - if (Py_STRING_MATCH(target, end, pattern, pattern_len)) { + if (Py_STRING_MATCH(target, end,pattern,pattern_len)) { count++; if (--maxcount <= 0) break; end -= pattern_len-1; } } else { for (; (start <= end); start++) - if (Py_STRING_MATCH(target, start, pattern, pattern_len)) { + if (Py_STRING_MATCH(target, start, + pattern, pattern_len)) { count++; if (--maxcount <= 0) break; @@ -2522,12 +2371,14 @@ replace_single_character(PyStringObject *self, /* result_len = self_len + count * (to_len-1) */ product = count * (to_len-1); if (product / (to_len-1) != count) { - PyErr_SetString(PyExc_OverflowError, "replace string is too long"); + PyErr_SetString(PyExc_OverflowError, + "replace string is too long"); return NULL; } result_len = self_len + 
product; if (result_len < 0) { - PyErr_SetString(PyExc_OverflowError, "replace string is too long"); + PyErr_SetString(PyExc_OverflowError, + "replace string is too long"); return NULL; } @@ -2590,12 +2441,14 @@ replace_substring(PyStringObject *self, /* result_len = self_len + count * (to_len-from_len) */ product = count * (to_len-from_len); if (product / (to_len-from_len) != count) { - PyErr_SetString(PyExc_OverflowError, "replace string is too long"); + PyErr_SetString(PyExc_OverflowError, + "replace string is too long"); return NULL; } result_len = self_len + product; if (result_len < 0) { - PyErr_SetString(PyExc_OverflowError, "replace string is too long"); + PyErr_SetString(PyExc_OverflowError, + "replace string is too long"); return NULL; } @@ -2675,7 +2528,8 @@ replace(PyStringObject *self, return replace_delete_single_character( self, from_s[0], maxcount); } else { - return replace_delete_substring(self, from_s, from_len, maxcount); + return replace_delete_substring(self, from_s, + from_len, maxcount); } } @@ -2690,7 +2544,8 @@ replace(PyStringObject *self, maxcount); } else { return replace_substring_in_place( - self, from_s, from_len, to_s, to_len, maxcount); + self, from_s, from_len, to_s, to_len, + maxcount); } } @@ -2700,14 +2555,15 @@ replace(PyStringObject *self, to_s, to_len, maxcount); } else { /* len('from')>=2, len('to')>=1 */ - return replace_substring(self, from_s, from_len, to_s, to_len, maxcount); + return replace_substring(self, from_s, from_len, to_s, to_len, + maxcount); } } PyDoc_STRVAR(replace__doc__, -"S.replace (old, new[, count]) -> string\n\ +"B.replace(old, new[, count]) -> bytes\n\ \n\ -Return a copy of string S with all occurrences of substring\n\ +Return a copy of B with all occurrences of subsection\n\ old replaced by new. 
If the optional argument count is\n\ given, only the first count occurrences are replaced."); @@ -2794,11 +2650,11 @@ _string_tailmatch(PyStringObject *self, PyObject *substr, Py_ssize_t start, PyDoc_STRVAR(startswith__doc__, -"S.startswith(prefix[, start[, end]]) -> bool\n\ +"B.startswith(prefix [,start [,end]]) -> bool\n\ \n\ -Return True if S starts with the specified prefix, False otherwise.\n\ -With optional start, test S beginning at that position.\n\ -With optional end, stop comparing S at that position.\n\ +Return True if B starts with the specified prefix, False otherwise.\n\ +With optional start, test B beginning at that position.\n\ +With optional end, stop comparing B at that position.\n\ prefix can also be a tuple of strings to try."); static PyObject * @@ -2835,11 +2691,11 @@ string_startswith(PyStringObject *self, PyObject *args) PyDoc_STRVAR(endswith__doc__, -"S.endswith(suffix[, start[, end]]) -> bool\n\ +"B.endswith(suffix [,start [,end]]) -> bool\n\ \n\ -Return True if S ends with the specified suffix, False otherwise.\n\ -With optional start, test S beginning at that position.\n\ -With optional end, stop comparing S at that position.\n\ +Return True if B ends with the specified suffix, False otherwise.\n\ +With optional start, test B beginning at that position.\n\ +With optional end, stop comparing B at that position.\n\ suffix can also be a tuple of strings to try."); static PyObject * @@ -2876,63 +2732,50 @@ string_endswith(PyStringObject *self, PyObject *args) PyDoc_STRVAR(decode__doc__, -"S.decode([encoding[,errors]]) -> object\n\ +"B.decode([encoding[, errors]]) -> object\n\ \n\ Decodes S using the codec registered for encoding. encoding defaults\n\ to the default encoding. errors may be given to set a different error\n\ -handling scheme. Default is 'strict' meaning that encoding errors raise\n\ -a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\ +handling scheme. 
Default is 'strict' meaning that encoding errors raise\n\ +a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'\n\ as well as any other name registerd with codecs.register_error that is\n\ able to handle UnicodeDecodeErrors."); static PyObject * -string_decode(PyStringObject *self, PyObject *args) +string_decode(PyObject *self, PyObject *args) { - char *encoding = NULL; - char *errors = NULL; - PyObject *v; - - if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors)) - return NULL; - v = PyString_AsDecodedObject((PyObject *)self, encoding, errors); - if (v == NULL) - goto onError; - if (!PyString_Check(v) && !PyUnicode_Check(v)) { - PyErr_Format(PyExc_TypeError, - "decoder did not return a string/unicode object " - "(type=%.400s)", - Py_Type(v)->tp_name); - Py_DECREF(v); - return NULL; - } - return v; + const char *encoding = NULL; + const char *errors = NULL; - onError: - return NULL; + if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors)) + return NULL; + if (encoding == NULL) + encoding = PyUnicode_GetDefaultEncoding(); + return PyCodec_Decode(self, encoding, errors); } PyDoc_STRVAR(fromhex_doc, -"str8.fromhex(string) -> str8\n\ +"bytes.fromhex(string) -> bytes\n\ \n\ -Create a str8 object from a string of hexadecimal numbers.\n\ -Spaces between two numbers are accepted. 
Example:\n\ -str8.fromhex('10 1112') -> s'\\x10\\x11\\x12'."); +Create a bytes object from a string of hexadecimal numbers.\n\ +Spaces between two numbers are accepted.\n\ +Example: bytes.fromhex('B9 01EF') -> b'\\xb9\\x01\\xef'."); static int hex_digit_to_int(Py_UNICODE c) { - if (c >= 128) - return -1; - if (ISDIGIT(c)) - return c - '0'; - else { - if (ISUPPER(c)) - c = TOLOWER(c); - if (c >= 'a' && c <= 'f') - return c - 'a' + 10; - } - return -1; + if (c >= 128) + return -1; + if (ISDIGIT(c)) + return c - '0'; + else { + if (ISUPPER(c)) + c = TOLOWER(c); + if (c >= 'a' && c <= 'f') + return c - 'a' + 10; + } + return -1; } static PyObject * @@ -2975,7 +2818,7 @@ string_fromhex(PyObject *cls, PyObject *args) return newstring; error: - Py_DECREF(newstring); + Py_XDECREF(newstring); return NULL; } @@ -3058,11 +2901,11 @@ string_new(PyTypeObject *type, PyObject *args, PyObject *kwds) const char *errors = NULL; PyObject *new = NULL; Py_ssize_t i, size; - static char *kwlist[] = {"object", "encoding", "errors", 0}; + static char *kwlist[] = {"source", "encoding", "errors", 0}; if (type != &PyString_Type) return str_subtype_new(type, args, kwds); - if (!PyArg_ParseTupleAndKeywords(args, kwds, "|Oss:str8", kwlist, &x, + if (!PyArg_ParseTupleAndKeywords(args, kwds, "|Oss:bytes", kwlist, &x, &encoding, &errors)) return NULL; if (x == NULL) { @@ -3085,34 +2928,37 @@ string_new(PyTypeObject *type, PyObject *args, PyObject *kwds) new = PyCodec_Encode(x, encoding, errors); if (new == NULL) return NULL; - /* XXX(gb): must accept bytes here since codecs output bytes - at the moment */ - if (PyBytes_Check(new)) { - PyObject *str; - str = PyString_FromString(PyBytes_AsString(new)); - Py_DECREF(new); - if (!str) - return NULL; - return str; - } - if (!PyString_Check(new)) { - PyErr_Format(PyExc_TypeError, - "encoder did not return a str8 " - "object (type=%.400s)", - Py_Type(new)->tp_name); - Py_DECREF(new); - return NULL; - } + assert(PyString_Check(new)); return new; } /* If 
it's not unicode, there can't be encoding or errors */ if (encoding != NULL || errors != NULL) { PyErr_SetString(PyExc_TypeError, - "encoding or errors without a string argument"); + "encoding or errors without a string argument"); return NULL; } + /* Is it an int? */ + size = PyNumber_AsSsize_t(x, PyExc_ValueError); + if (size == -1 && PyErr_Occurred()) { + PyErr_Clear(); + } + else { + if (size < 0) { + PyErr_SetString(PyExc_ValueError, "negative count"); + return NULL; + } + new = PyString_FromStringAndSize(NULL, size); + if (new == NULL) { + return NULL; + } + if (size > 0) { + memset(((PyStringObject*)new)->ob_sval, 0, size); + } + return new; + } + /* Use the modern buffer interface */ if (PyObject_CheckBuffer(x)) { Py_buffer view; @@ -3133,8 +2979,10 @@ string_new(PyTypeObject *type, PyObject *args, PyObject *kwds) return NULL; } - /* For the iterator version, create a string object and resize as needed. */ - /* XXX(gb): is 64 a good value? also, optimize this if length is known */ + /* For iterator version, create a string object and resize as needed */ + /* XXX(gb): is 64 a good value? also, optimize if length is known */ + /* XXX(guido): perhaps use Pysequence_Fast() -- I can't imagine the + input being a truly long iterator. 
*/ size = 64; new = PyString_FromStringAndSize(NULL, size); if (new == NULL) @@ -3158,9 +3006,9 @@ string_new(PyTypeObject *type, PyObject *args, PyObject *kwds) item = iternext(it); if (item == NULL) { if (PyErr_Occurred()) { - if (!PyErr_ExceptionMatches(PyExc_StopIteration)) - goto error; - PyErr_Clear(); + if (!PyErr_ExceptionMatches(PyExc_StopIteration)) + goto error; + PyErr_Clear(); } break; } @@ -3193,7 +3041,7 @@ string_new(PyTypeObject *type, PyObject *args, PyObject *kwds) return new; error: - /* Error handling when it != NULL */ + /* Error handling when new != NULL */ Py_XDECREF(it); Py_DECREF(new); return NULL; @@ -3213,43 +3061,32 @@ str_subtype_new(PyTypeObject *type, PyObject *args, PyObject *kwds) n = PyString_GET_SIZE(tmp); pnew = type->tp_alloc(type, n); if (pnew != NULL) { - Py_MEMCPY(PyString_AS_STRING(pnew), PyString_AS_STRING(tmp), n+1); + Py_MEMCPY(PyString_AS_STRING(pnew), + PyString_AS_STRING(tmp), n+1); ((PyStringObject *)pnew)->ob_shash = ((PyStringObject *)tmp)->ob_shash; - ((PyStringObject *)pnew)->ob_sstate = SSTATE_NOT_INTERNED; } Py_DECREF(tmp); return pnew; } -static PyObject * -string_mod(PyObject *v, PyObject *w) -{ - if (!PyString_Check(v)) { - Py_INCREF(Py_NotImplemented); - return Py_NotImplemented; - } - return PyString_Format(v, w); -} - -static PyNumberMethods string_as_number = { - 0, /*nb_add*/ - 0, /*nb_subtract*/ - 0, /*nb_multiply*/ - string_mod, /*nb_remainder*/ -}; - PyDoc_STRVAR(string_doc, -"str(object) -> string\n\ +"bytes(iterable_of_ints) -> bytes.\n\ +bytes(string, encoding[, errors]) -> bytes\n\ +bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer.\n\ +bytes(memory_view) -> bytes.\n\ \n\ -Return a nice string representation of the object.\n\ -If the argument is a string, the return value is the same object."); +Construct an immutable array of bytes from:\n\ + - an iterable yielding integers in range(256)\n\ + - a text string encoded using the specified encoding\n\ + - a bytes or a buffer object\n\ + - 
any object implementing the buffer API."); static PyObject *str_iter(PyObject *seq); PyTypeObject PyString_Type = { PyVarObject_HEAD_INIT(&PyType_Type, 0) - "str8", + "bytes", sizeof(PyStringObject), sizeof(char), string_dealloc, /* tp_dealloc */ @@ -3257,8 +3094,8 @@ PyTypeObject PyString_Type = { 0, /* tp_getattr */ 0, /* tp_setattr */ 0, /* tp_compare */ - string_repr, /* tp_repr */ - &string_as_number, /* tp_as_number */ + (reprfunc)string_repr, /* tp_repr */ + 0, /* tp_as_number */ &string_as_sequence, /* tp_as_sequence */ &string_as_mapping, /* tp_as_mapping */ (hashfunc)string_hash, /* tp_hash */ @@ -3294,14 +3131,15 @@ void PyString_Concat(register PyObject **pv, register PyObject *w) { register PyObject *v; + assert(pv != NULL); if (*pv == NULL) return; - if (w == NULL || !PyString_Check(*pv)) { + if (w == NULL) { Py_DECREF(*pv); *pv = NULL; return; } - v = string_concat((PyStringObject *) *pv, w); + v = string_concat(*pv, w); Py_DECREF(*pv); *pv = v; } @@ -3334,8 +3172,7 @@ _PyString_Resize(PyObject **pv, Py_ssize_t newsize) register PyObject *v; register PyStringObject *sv; v = *pv; - if (!PyString_Check(v) || Py_Refcnt(v) != 1 || newsize < 0 || - PyString_CHECK_INTERNED(v)) { + if (!PyString_Check(v) || Py_Refcnt(v) != 1 || newsize < 0) { *pv = 0; Py_DECREF(v); PyErr_BadInternalCall(); @@ -3359,85 +3196,6 @@ _PyString_Resize(PyObject **pv, Py_ssize_t newsize) return 0; } -/* Helpers for formatstring */ - -Py_LOCAL_INLINE(PyObject *) -getnextarg(PyObject *args, Py_ssize_t arglen, Py_ssize_t *p_argidx) -{ - Py_ssize_t argidx = *p_argidx; - if (argidx < arglen) { - (*p_argidx)++; - if (arglen < 0) - return args; - else - return PyTuple_GetItem(args, argidx); - } - PyErr_SetString(PyExc_TypeError, - "not enough arguments for format string"); - return NULL; -} - -/* Format codes - * F_LJUST '-' - * F_SIGN '+' - * F_BLANK ' ' - * F_ALT '#' - * F_ZERO '0' - */ -#define F_LJUST (1<<0) -#define F_SIGN (1<<1) -#define F_BLANK (1<<2) -#define F_ALT (1<<3) -#define 
F_ZERO (1<<4) - -Py_LOCAL_INLINE(int) -formatfloat(char *buf, size_t buflen, int flags, - int prec, int type, PyObject *v) -{ - /* fmt = '%#.' + `prec` + `type` - worst case length = 3 + 10 (len of INT_MAX) + 1 = 14 (use 20)*/ - char fmt[20]; - double x; - x = PyFloat_AsDouble(v); - if (x == -1.0 && PyErr_Occurred()) { - PyErr_Format(PyExc_TypeError, "float argument required, " - "not %.200s", Py_Type(v)->tp_name); - return -1; - } - if (prec < 0) - prec = 6; - if (type == 'f' && fabs(x)/1e25 >= 1e25) - type = 'g'; - /* Worst case length calc to ensure no buffer overrun: - - 'g' formats: - fmt = %#.<prec>g - buf = '-' + [0-9]*prec + '.' + 'e+' + (longest exp - for any double rep.) - len = 1 + prec + 1 + 2 + 5 = 9 + prec - - 'f' formats: - buf = '-' + [0-9]*x + '.' + [0-9]*prec (with x < 50) - len = 1 + 50 + 1 + prec = 52 + prec - - If prec=0 the effective precision is 1 (the leading digit is - always given), therefore increase the length by one. - - */ - if (((type == 'g' || type == 'G') && - buflen <= (size_t)10 + (size_t)prec) || - (type == 'f' && buflen <= (size_t)53 + (size_t)prec)) { - PyErr_SetString(PyExc_OverflowError, - "formatted float is too long (precision too large?)"); - return -1; - } - PyOS_snprintf(fmt, sizeof(fmt), "%%%s.%d%c", - (flags&F_ALT) ? "#" : "", - prec, type); - PyOS_ascii_formatd(buf, buflen, fmt, x); - return (int)strlen(buf); -} - /* _PyString_FormatLong emulates the format codes d, u, o, x and X, and * the F_ALT flag, for Python's long (unbounded) ints. It's not used for * Python's regular ints. 
@@ -3516,7 +3274,8 @@ _PyString_FormatLong(PyObject *val, int flags, int prec, int type, } llen = PyString_Size(result); if (llen > INT_MAX) { - PyErr_SetString(PyExc_ValueError, "string too large in _PyString_FormatLong"); + PyErr_SetString(PyExc_ValueError, + "string too large in _PyString_FormatLong"); return NULL; } len = (int)llen; @@ -3534,7 +3293,7 @@ _PyString_FormatLong(PyObject *val, int flags, int prec, int type, (type == 'o' || type == 'x' || type == 'X'))) { assert(buf[sign] == '0'); assert(buf[sign+1] == 'x' || buf[sign+1] == 'X' || - buf[sign+1] == 'o'); + buf[sign+1] == 'o'); numnondigits -= 2; buf += 2; len -= 2; @@ -3580,623 +3339,6 @@ _PyString_FormatLong(PyObject *val, int flags, int prec, int type, return result; } -Py_LOCAL_INLINE(int) -formatint(char *buf, size_t buflen, int flags, - int prec, int type, PyObject *v) -{ - /* fmt = '%#.' + `prec` + 'l' + `type` - worst case length = 3 + 19 (worst len of INT_MAX on 64-bit machine) - + 1 + 1 = 24 */ - char fmt[64]; /* plenty big enough! 
*/ - char *sign; - long x; - - x = PyInt_AsLong(v); - if (x == -1 && PyErr_Occurred()) { - PyErr_Format(PyExc_TypeError, "int argument required, not %.200s", - Py_Type(v)->tp_name); - return -1; - } - if (x < 0 && type == 'u') { - type = 'd'; - } - if (x < 0 && (type == 'x' || type == 'X' || type == 'o')) - sign = "-"; - else - sign = ""; - if (prec < 0) - prec = 1; - - if ((flags & F_ALT) && - (type == 'x' || type == 'X' || type == 'o')) { - /* When converting under %#o, %#x or %#X, there are a number - * of issues that cause pain: - * - for %#o, we want a different base marker than C - * - when 0 is being converted, the C standard leaves off - * the '0x' or '0X', which is inconsistent with other - * %#x/%#X conversions and inconsistent with Python's - * hex() function - * - there are platforms that violate the standard and - * convert 0 with the '0x' or '0X' - * (Metrowerks, Compaq Tru64) - * - there are platforms that give '0x' when converting - * under %#X, but convert 0 in accordance with the - * standard (OS/2 EMX) - * - * We can achieve the desired consistency by inserting our - * own '0x' or '0X' prefix, and substituting %x/%X in place - * of %#x/%#X. - * - * Note that this is the same approach as used in - * formatint() in unicodeobject.c - */ - PyOS_snprintf(fmt, sizeof(fmt), "%s0%c%%.%dl%c", - sign, type, prec, type); - } - else { - PyOS_snprintf(fmt, sizeof(fmt), "%s%%%s.%dl%c", - sign, (flags&F_ALT) ? 
"#" : "", - prec, type); - } - - /* buf = '+'/'-'/'' + '0o'/'0x'/'' + '[0-9]'*max(prec, len(x in octal)) - * worst case buf = '-0x' + [0-9]*prec, where prec >= 11 - */ - if (buflen <= 14 || buflen <= (size_t)3 + (size_t)prec) { - PyErr_SetString(PyExc_OverflowError, - "formatted integer is too long (precision too large?)"); - return -1; - } - if (sign[0]) - PyOS_snprintf(buf, buflen, fmt, -x); - else - PyOS_snprintf(buf, buflen, fmt, x); - return (int)strlen(buf); -} - -Py_LOCAL_INLINE(int) -formatchar(char *buf, size_t buflen, PyObject *v) -{ - /* presume that the buffer is at least 2 characters long */ - if (PyString_Check(v)) { - if (!PyArg_Parse(v, "c;%c requires int or char", &buf[0])) - return -1; - } - else { - if (!PyArg_Parse(v, "b;%c requires int or char", &buf[0])) - return -1; - } - buf[1] = '\0'; - return 1; -} - -/* fmt%(v1,v2,...) is roughly equivalent to sprintf(fmt, v1, v2, ...) - - FORMATBUFLEN is the length of the buffer in which the floats, ints, & - chars are formatted. XXX This is a magic number. Each formatting - routine does bounds checking to ensure no overflow, but a better - solution may be to malloc a buffer of appropriate size for each - format. For now, the current solution is sufficient. 
-*/ -#define FORMATBUFLEN (size_t)120 - -PyObject * -PyString_Format(PyObject *format, PyObject *args) -{ - char *fmt, *res; - Py_ssize_t arglen, argidx; - Py_ssize_t reslen, rescnt, fmtcnt; - int args_owned = 0; - PyObject *result, *orig_args; - PyObject *v, *w; - PyObject *dict = NULL; - if (format == NULL || !PyString_Check(format) || args == NULL) { - PyErr_BadInternalCall(); - return NULL; - } - orig_args = args; - fmt = PyString_AS_STRING(format); - fmtcnt = PyString_GET_SIZE(format); - reslen = rescnt = fmtcnt + 100; - result = PyString_FromStringAndSize((char *)NULL, reslen); - if (result == NULL) - return NULL; - res = PyString_AsString(result); - if (PyTuple_Check(args)) { - arglen = PyTuple_GET_SIZE(args); - argidx = 0; - } - else { - arglen = -1; - argidx = -2; - } - if (Py_Type(args)->tp_as_mapping && !PyTuple_Check(args) && - !PyString_Check(args) && !PyUnicode_Check(args)) - dict = args; - while (--fmtcnt >= 0) { - if (*fmt != '%') { - if (--rescnt < 0) { - rescnt = fmtcnt + 100; - reslen += rescnt; - if (_PyString_Resize(&result, reslen) < 0) - return NULL; - res = PyString_AS_STRING(result) - + reslen - rescnt; - --rescnt; - } - *res++ = *fmt++; - } - else { - /* Got a format specifier */ - int flags = 0; - Py_ssize_t width = -1; - int prec = -1; - int c = '\0'; - int fill; - PyObject *v = NULL; - PyObject *temp = NULL; - char *pbuf; - int sign; - Py_ssize_t len; - char formatbuf[FORMATBUFLEN]; - /* For format{float,int,char}() */ - char *fmt_start = fmt; - Py_ssize_t argidx_start = argidx; - - fmt++; - if (*fmt == '(') { - char *keystart; - Py_ssize_t keylen; - PyObject *key; - int pcount = 1; - - if (dict == NULL) { - PyErr_SetString(PyExc_TypeError, - "format requires a mapping"); - goto error; - } - ++fmt; - --fmtcnt; - keystart = fmt; - /* Skip over balanced parentheses */ - while (pcount > 0 && --fmtcnt >= 0) { - if (*fmt == ')') - --pcount; - else if (*fmt == '(') - ++pcount; - fmt++; - } - keylen = fmt - keystart - 1; - if (fmtcnt < 0 || 
pcount > 0) { - PyErr_SetString(PyExc_ValueError, - "incomplete format key"); - goto error; - } - key = PyString_FromStringAndSize(keystart, - keylen); - if (key == NULL) - goto error; - if (args_owned) { - Py_DECREF(args); - args_owned = 0; - } - args = PyObject_GetItem(dict, key); - Py_DECREF(key); - if (args == NULL) { - goto error; - } - args_owned = 1; - arglen = -1; - argidx = -2; - } - while (--fmtcnt >= 0) { - switch (c = *fmt++) { - case '-': flags |= F_LJUST; continue; - case '+': flags |= F_SIGN; continue; - case ' ': flags |= F_BLANK; continue; - case '#': flags |= F_ALT; continue; - case '0': flags |= F_ZERO; continue; - } - break; - } - if (c == '*') { - v = getnextarg(args, arglen, &argidx); - if (v == NULL) - goto error; - if (!PyInt_Check(v)) { - PyErr_SetString(PyExc_TypeError, - "* wants int"); - goto error; - } - width = PyInt_AsLong(v); - if (width == -1 && PyErr_Occurred()) - goto error; - if (width < 0) { - flags |= F_LJUST; - width = -width; - } - if (--fmtcnt >= 0) - c = *fmt++; - } - else if (c >= 0 && ISDIGIT(c)) { - width = c - '0'; - while (--fmtcnt >= 0) { - c = Py_CHARMASK(*fmt++); - if (!ISDIGIT(c)) - break; - if ((width*10) / 10 != width) { - PyErr_SetString( - PyExc_ValueError, - "width too big"); - goto error; - } - width = width*10 + (c - '0'); - } - } - if (c == '.') { - prec = 0; - if (--fmtcnt >= 0) - c = *fmt++; - if (c == '*') { - v = getnextarg(args, arglen, &argidx); - if (v == NULL) - goto error; - if (!PyInt_Check(v)) { - PyErr_SetString( - PyExc_TypeError, - "* wants int"); - goto error; - } - prec = PyInt_AsLong(v); - if (prec == -1 && PyErr_Occurred()) - goto error; - if (prec < 0) - prec = 0; - if (--fmtcnt >= 0) - c = *fmt++; - } - else if (c >= 0 && ISDIGIT(c)) { - prec = c - '0'; - while (--fmtcnt >= 0) { - c = Py_CHARMASK(*fmt++); - if (!ISDIGIT(c)) - break; - if ((prec*10) / 10 != prec) { - PyErr_SetString( - PyExc_ValueError, - "prec too big"); - goto error; - } - prec = prec*10 + (c - '0'); - } - } - } /* prec 
*/ - if (fmtcnt >= 0) { - if (c == 'h' || c == 'l' || c == 'L') { - if (--fmtcnt >= 0) - c = *fmt++; - } - } - if (fmtcnt < 0) { - PyErr_SetString(PyExc_ValueError, - "incomplete format"); - goto error; - } - if (c != '%') { - v = getnextarg(args, arglen, &argidx); - if (v == NULL) - goto error; - } - sign = 0; - fill = ' '; - switch (c) { - case '%': - pbuf = "%"; - len = 1; - break; - case 's': - if (PyUnicode_Check(v)) { - fmt = fmt_start; - argidx = argidx_start; - goto unicode; - } - temp = _PyObject_Str(v); - if (temp != NULL && PyUnicode_Check(temp)) { - Py_DECREF(temp); - fmt = fmt_start; - argidx = argidx_start; - goto unicode; - } - /* Fall through */ - case 'r': - if (c == 'r') - temp = PyObject_ReprStr8(v); - if (temp == NULL) - goto error; - if (!PyString_Check(temp)) { - PyErr_SetString(PyExc_TypeError, - "%s argument has non-string str()/repr()"); - Py_DECREF(temp); - goto error; - } - pbuf = PyString_AS_STRING(temp); - len = PyString_GET_SIZE(temp); - if (prec >= 0 && len > prec) - len = prec; - break; - case 'i': - case 'd': - case 'u': - case 'o': - case 'x': - case 'X': - if (c == 'i') - c = 'd'; - if (PyLong_Check(v)) { - int ilen; - temp = _PyString_FormatLong(v, flags, - prec, c, &pbuf, &ilen); - len = ilen; - if (!temp) - goto error; - sign = 1; - } - else { - pbuf = formatbuf; - len = formatint(pbuf, - sizeof(formatbuf), - flags, prec, c, v); - if (len < 0) - goto error; - sign = 1; - } - if (flags & F_ZERO) - fill = '0'; - break; - case 'e': - case 'E': - case 'f': - case 'F': - case 'g': - case 'G': - if (c == 'F') - c = 'f'; - pbuf = formatbuf; - len = formatfloat(pbuf, sizeof(formatbuf), - flags, prec, c, v); - if (len < 0) - goto error; - sign = 1; - if (flags & F_ZERO) - fill = '0'; - break; - case 'c': - if (PyUnicode_Check(v)) { - fmt = fmt_start; - argidx = argidx_start; - goto unicode; - } - pbuf = formatbuf; - len = formatchar(pbuf, sizeof(formatbuf), v); - if (len < 0) - goto error; - break; - default: - 
PyErr_Format(PyExc_ValueError, - "unsupported format character '%c' (0x%x) " - "at index %zd", - c, c, - (Py_ssize_t)(fmt - 1 - - PyString_AsString(format))); - goto error; - } - if (sign) { - if (*pbuf == '-' || *pbuf == '+') { - sign = *pbuf++; - len--; - } - else if (flags & F_SIGN) - sign = '+'; - else if (flags & F_BLANK) - sign = ' '; - else - sign = 0; - } - if (width < len) - width = len; - if (rescnt - (sign != 0) < width) { - reslen -= rescnt; - rescnt = width + fmtcnt + 100; - reslen += rescnt; - if (reslen < 0) { - Py_DECREF(result); - Py_XDECREF(temp); - return PyErr_NoMemory(); - } - if (_PyString_Resize(&result, reslen) < 0) { - Py_XDECREF(temp); - return NULL; - } - res = PyString_AS_STRING(result) - + reslen - rescnt; - } - if (sign) { - if (fill != ' ') - *res++ = sign; - rescnt--; - if (width > len) - width--; - } - if ((flags & F_ALT) && - (c == 'x' || c == 'X' || c == 'o')) { - assert(pbuf[0] == '0'); - assert(pbuf[1] == c); - if (fill != ' ') { - *res++ = *pbuf++; - *res++ = *pbuf++; - } - rescnt -= 2; - width -= 2; - if (width < 0) - width = 0; - len -= 2; - } - if (width > len && !(flags & F_LJUST)) { - do { - --rescnt; - *res++ = fill; - } while (--width > len); - } - if (fill == ' ') { - if (sign) - *res++ = sign; - if ((flags & F_ALT) && - (c == 'x' || c == 'X' || c == 'o')) { - assert(pbuf[0] == '0'); - assert(pbuf[1] == c); - *res++ = *pbuf++; - *res++ = *pbuf++; - } - } - Py_MEMCPY(res, pbuf, len); - res += len; - rescnt -= len; - while (--width >= len) { - --rescnt; - *res++ = ' '; - } - if (dict && (argidx < arglen) && c != '%') { - PyErr_SetString(PyExc_TypeError, - "not all arguments converted during string formatting"); - Py_XDECREF(temp); - goto error; - } - Py_XDECREF(temp); - } /* '%' */ - } /* until end */ - if (argidx < arglen && !dict) { - PyErr_SetString(PyExc_TypeError, - "not all arguments converted during string formatting"); - goto error; - } - if (args_owned) { - Py_DECREF(args); - } - _PyString_Resize(&result, reslen 
- rescnt); - return result; - - unicode: - if (args_owned) { - Py_DECREF(args); - args_owned = 0; - } - /* Fiddle args right (remove the first argidx arguments) */ - if (PyTuple_Check(orig_args) && argidx > 0) { - PyObject *v; - Py_ssize_t n = PyTuple_GET_SIZE(orig_args) - argidx; - v = PyTuple_New(n); - if (v == NULL) - goto error; - while (--n >= 0) { - PyObject *w = PyTuple_GET_ITEM(orig_args, n + argidx); - Py_INCREF(w); - PyTuple_SET_ITEM(v, n, w); - } - args = v; - } else { - Py_INCREF(orig_args); - args = orig_args; - } - args_owned = 1; - /* Take what we have of the result and let the Unicode formatting - function format the rest of the input. */ - rescnt = res - PyString_AS_STRING(result); - if (_PyString_Resize(&result, rescnt)) - goto error; - fmtcnt = PyString_GET_SIZE(format) - \ - (fmt - PyString_AS_STRING(format)); - format = PyUnicode_Decode(fmt, fmtcnt, NULL, NULL); - if (format == NULL) - goto error; - v = PyUnicode_Format(format, args); - Py_DECREF(format); - if (v == NULL) - goto error; - /* Paste what we have (result) to what the Unicode formatting - function returned (v) and return the result (or error) */ - w = PyUnicode_Concat(result, v); - Py_DECREF(result); - Py_DECREF(v); - Py_DECREF(args); - return w; - - error: - Py_DECREF(result); - if (args_owned) { - Py_DECREF(args); - } - return NULL; -} - -void -PyString_InternInPlace(PyObject **p) -{ - register PyStringObject *s = (PyStringObject *)(*p); - PyObject *t; - if (s == NULL || !PyString_Check(s)) - Py_FatalError("PyString_InternInPlace: strings only please!"); - /* If it's a string subclass, we don't really know what putting - it in the interned dict might do. 
*/ - if (!PyString_CheckExact(s)) - return; - if (PyString_CHECK_INTERNED(s)) - return; - if (interned == NULL) { - interned = PyDict_New(); - if (interned == NULL) { - PyErr_Clear(); /* Don't leave an exception */ - return; - } - } - t = PyDict_GetItem(interned, (PyObject *)s); - if (t) { - Py_INCREF(t); - Py_DECREF(*p); - *p = t; - return; - } - - if (PyDict_SetItem(interned, (PyObject *)s, (PyObject *)s) < 0) { - PyErr_Clear(); - return; - } - /* The two references in interned are not counted by refcnt. - The string deallocator will take care of this */ - Py_Refcnt(s) -= 2; - PyString_CHECK_INTERNED(s) = SSTATE_INTERNED_MORTAL; -} - -void -PyString_InternImmortal(PyObject **p) -{ - PyString_InternInPlace(p); - if (PyString_CHECK_INTERNED(*p) != SSTATE_INTERNED_IMMORTAL) { - PyString_CHECK_INTERNED(*p) = SSTATE_INTERNED_IMMORTAL; - Py_INCREF(*p); - } -} - - -PyObject * -PyString_InternFromString(const char *cp) -{ - PyObject *s = PyString_FromString(cp); - if (s == NULL) - return NULL; - PyString_InternInPlace(&s); - return s; -} - void PyString_Fini(void) { @@ -4209,58 +3351,6 @@ PyString_Fini(void) nullstring = NULL; } -void _Py_ReleaseInternedStrings(void) -{ - PyObject *keys; - PyStringObject *s; - Py_ssize_t i, n; - Py_ssize_t immortal_size = 0, mortal_size = 0; - - if (interned == NULL || !PyDict_Check(interned)) - return; - keys = PyDict_Keys(interned); - if (keys == NULL || !PyList_Check(keys)) { - PyErr_Clear(); - return; - } - - /* Since _Py_ReleaseInternedStrings() is intended to help a leak - detector, interned strings are not forcibly deallocated; rather, we - give them their stolen references back, and then clear and DECREF - the interned dict. 
*/ - - n = PyList_GET_SIZE(keys); - fprintf(stderr, "releasing %" PY_FORMAT_SIZE_T "d interned strings\n", - n); - for (i = 0; i < n; i++) { - s = (PyStringObject *) PyList_GET_ITEM(keys, i); - switch (s->ob_sstate) { - case SSTATE_NOT_INTERNED: - /* XXX Shouldn't happen */ - break; - case SSTATE_INTERNED_IMMORTAL: - Py_Refcnt(s) += 1; - immortal_size += Py_Size(s); - break; - case SSTATE_INTERNED_MORTAL: - Py_Refcnt(s) += 2; - mortal_size += Py_Size(s); - break; - default: - Py_FatalError("Inconsistent interned string state."); - } - s->ob_sstate = SSTATE_NOT_INTERNED; - } - fprintf(stderr, "total size of all interned strings: " - "%" PY_FORMAT_SIZE_T "d/%" PY_FORMAT_SIZE_T "d " - "mortal/immortal\n", mortal_size, immortal_size); - Py_DECREF(keys); - PyDict_Clear(interned); - Py_DECREF(interned); - interned = NULL; -} - - /*********************** Str Iterator ****************************/ typedef struct { diff --git a/Objects/typeobject.c b/Objects/typeobject.c index 44cf5f1..4266a7c 100644 --- a/Objects/typeobject.c +++ b/Objects/typeobject.c @@ -1015,7 +1015,7 @@ class_name(PyObject *cls) if (name == NULL) { PyErr_Clear(); Py_XDECREF(name); - name = PyObject_ReprStr8(cls); + name = PyObject_Repr(cls); } if (name == NULL) return NULL; @@ -1654,7 +1654,7 @@ type_new(PyTypeObject *metatype, PyObject *args, PyObject *kwds) } /* Check arguments: (name, bases, dict) */ - if (!PyArg_ParseTupleAndKeywords(args, kwds, "SO!O!:type", kwlist, + if (!PyArg_ParseTupleAndKeywords(args, kwds, "UO!O!:type", kwlist, &name, &PyTuple_Type, &bases, &PyDict_Type, &dict)) diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c index c568a8e..ae34c9e 100644 --- a/Objects/unicodeobject.c +++ b/Objects/unicodeobject.c @@ -101,7 +101,7 @@ extern "C" { function will delete the reference from this dictionary. Another way to look at this is that to say that the actual reference - count of a string is: s->ob_refcnt + (s->ob_sstate?2:0) + count of a string is: s->ob_refcnt + (s->state ? 
2 : 0) */ static PyObject *interned; @@ -998,7 +998,10 @@ PyObject *PyUnicode_FromObject(register PyObject *obj) return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(obj), PyUnicode_GET_SIZE(obj)); } - return PyUnicode_FromEncodedObject(obj, NULL, "strict"); + PyErr_Format(PyExc_TypeError, + "Can't convert '%.100s' object to str implicitly", + Py_Type(obj)->tp_name); + return NULL; } PyObject *PyUnicode_FromEncodedObject(register PyObject *obj, @@ -1219,22 +1222,7 @@ PyObject *PyUnicode_AsEncodedString(PyObject *unicode, v = PyCodec_Encode(unicode, encoding, errors); if (v == NULL) goto onError; - if (!PyBytes_Check(v)) { - if (PyString_Check(v)) { - /* Old codec, turn it into bytes */ - PyObject *b = PyBytes_FromObject(v); - Py_DECREF(v); - return b; - } - PyErr_Format(PyExc_TypeError, - "encoder did not return a bytes object " - "(type=%.400s, encoding=%.20s, errors=%.20s)", - v->ob_type->tp_name, - encoding ? encoding : "NULL", - errors ? errors : "NULL"); - Py_DECREF(v); - goto onError; - } + assert(PyString_Check(v)); return v; onError: @@ -1245,19 +1233,15 @@ PyObject *_PyUnicode_AsDefaultEncodedString(PyObject *unicode, const char *errors) { PyObject *v = ((PyUnicodeObject *)unicode)->defenc; - PyObject *b; if (v) return v; if (errors != NULL) Py_FatalError("non-NULL encoding in _PyUnicode_AsDefaultEncodedString"); - b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(unicode), + v = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(unicode), PyUnicode_GET_SIZE(unicode), NULL); - if (!b) + if (!v) return NULL; - v = PyString_FromStringAndSize(PyBytes_AsString(b), - PyBytes_Size(b)); - Py_DECREF(b); ((PyUnicodeObject *)unicode)->defenc = v; return v; } @@ -1420,11 +1404,11 @@ int unicode_decode_call_errorhandler(const char *errors, PyObject **errorHandler inputobj = PyUnicodeDecodeError_GetObject(*exceptionObject); if (!inputobj) goto onError; - if (!PyBytes_Check(inputobj)) { + if (!PyString_Check(inputobj)) { PyErr_Format(PyExc_TypeError, "exception attribute object must be 
bytes"); } - *input = PyBytes_AS_STRING(inputobj); - insize = PyBytes_GET_SIZE(inputobj); + *input = PyString_AS_STRING(inputobj); + insize = PyString_GET_SIZE(inputobj); *inend = *input + insize; /* we can DECREF safely, as the exception has another reference, so the object won't go away. */ @@ -1674,7 +1658,7 @@ PyObject *PyUnicode_EncodeUTF7(const Py_UNICODE *s, int encodeWhiteSpace, const char *errors) { - PyObject *v; + PyObject *v, *result; /* It might be possible to tighten this worst case */ Py_ssize_t cbAllocated = 5 * size; int inShift = 0; @@ -1685,7 +1669,7 @@ PyObject *PyUnicode_EncodeUTF7(const Py_UNICODE *s, char * start; if (size == 0) - return PyBytes_FromStringAndSize(NULL, 0); + return PyString_FromStringAndSize(NULL, 0); v = PyBytes_FromStringAndSize(NULL, cbAllocated); if (v == NULL) @@ -1757,11 +1741,9 @@ PyObject *PyUnicode_EncodeUTF7(const Py_UNICODE *s, *out++ = '-'; } - if (PyBytes_Resize(v, out - start)) { - Py_DECREF(v); - return NULL; - } - return v; + result = PyString_FromStringAndSize(PyBytes_AS_STRING(v), out - start); + Py_DECREF(v); + return result; } #undef SPECIAL @@ -2001,11 +1983,11 @@ PyUnicode_EncodeUTF8(const Py_UNICODE *s, { #define MAX_SHORT_UNICHARS 300 /* largest size we'll do on the stack */ - Py_ssize_t i; /* index into s of next input byte */ - PyObject *v; /* result string object */ - char *p; /* next free byte in output buffer */ - Py_ssize_t nallocated; /* number of result bytes allocated */ - Py_ssize_t nneeded; /* number of result bytes needed */ + Py_ssize_t i; /* index into s of next input byte */ + PyObject *result; /* result string object */ + char *p; /* next free byte in output buffer */ + Py_ssize_t nallocated; /* number of result bytes allocated */ + Py_ssize_t nneeded; /* number of result bytes needed */ char stackbuf[MAX_SHORT_UNICHARS * 4]; assert(s != NULL); @@ -2017,7 +1999,7 @@ PyUnicode_EncodeUTF8(const Py_UNICODE *s, * turns out we need. 
*/ nallocated = Py_SAFE_DOWNCAST(sizeof(stackbuf), size_t, int); - v = NULL; /* will allocate after we're done */ + result = NULL; /* will allocate after we're done */ p = stackbuf; } else { @@ -2025,10 +2007,10 @@ PyUnicode_EncodeUTF8(const Py_UNICODE *s, nallocated = size * 4; if (nallocated / 4 != size) /* overflow! */ return PyErr_NoMemory(); - v = PyBytes_FromStringAndSize(NULL, nallocated); - if (v == NULL) + result = PyString_FromStringAndSize(NULL, nallocated); + if (result == NULL) return NULL; - p = PyBytes_AS_STRING(v); + p = PyString_AS_STRING(result); } for (i = 0; i < size;) { @@ -2072,19 +2054,19 @@ encodeUCS4: } } - if (v == NULL) { + if (result == NULL) { /* This was stack allocated. */ nneeded = p - stackbuf; assert(nneeded <= nallocated); - v = PyBytes_FromStringAndSize(stackbuf, nneeded); + result = PyString_FromStringAndSize(stackbuf, nneeded); } else { /* Cut back to size actually needed. */ - nneeded = p - PyBytes_AS_STRING(v); + nneeded = p - PyString_AS_STRING(result); assert(nneeded <= nallocated); - PyBytes_Resize(v, nneeded); + _PyString_Resize(&result, nneeded); } - return v; + return result; #undef MAX_SHORT_UNICHARS } @@ -2279,7 +2261,7 @@ PyUnicode_EncodeUTF32(const Py_UNICODE *s, const char *errors, int byteorder) { - PyObject *v; + PyObject *v, *result; unsigned char *p; #ifndef Py_UNICODE_WIDE int i, pairs; @@ -2319,7 +2301,7 @@ PyUnicode_EncodeUTF32(const Py_UNICODE *s, if (byteorder == 0) STORECHAR(0xFEFF); if (size == 0) - return v; + goto done; if (byteorder == -1) { /* force LE */ @@ -2350,7 +2332,11 @@ PyUnicode_EncodeUTF32(const Py_UNICODE *s, #endif STORECHAR(ch); } - return v; + + done: + result = PyString_FromStringAndSize(PyBytes_AS_STRING(v), Py_Size(v)); + Py_DECREF(v); + return result; #undef STORECHAR } @@ -2549,7 +2535,7 @@ PyUnicode_EncodeUTF16(const Py_UNICODE *s, const char *errors, int byteorder) { - PyObject *v; + PyObject *v, *result; unsigned char *p; #ifdef Py_UNICODE_WIDE int i, pairs; @@ -2584,7 +2570,7 
@@ PyUnicode_EncodeUTF16(const Py_UNICODE *s, if (byteorder == 0) STORECHAR(0xFEFF); if (size == 0) - return v; + goto done; if (byteorder == -1) { /* force LE */ @@ -2610,7 +2596,11 @@ PyUnicode_EncodeUTF16(const Py_UNICODE *s, if (ch2) STORECHAR(ch2); } - return v; + + done: + result = PyString_FromStringAndSize(PyBytes_AS_STRING(v), Py_Size(v)); + Py_DECREF(v); + return result; #undef STORECHAR } @@ -2900,7 +2890,7 @@ static const char *hexdigits = "0123456789abcdef"; PyObject *PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size) { - PyObject *repr; + PyObject *repr, *result; char *p; /* XXX(nnorwitz): rather than over-allocating, it would be @@ -3023,12 +3013,10 @@ PyObject *PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, *p++ = (char) ch; } - *p = '\0'; - if (PyBytes_Resize(repr, p - PyBytes_AS_STRING(repr))) { - Py_DECREF(repr); - return NULL; - } - return repr; + result = PyString_FromStringAndSize(PyBytes_AS_STRING(repr), + p - PyBytes_AS_STRING(repr)); + Py_DECREF(repr); + return result; } PyObject *PyUnicode_AsUnicodeEscapeString(PyObject *unicode) @@ -3159,7 +3147,7 @@ PyObject *PyUnicode_DecodeRawUnicodeEscape(const char *s, PyObject *PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size) { - PyObject *repr; + PyObject *repr, *result; char *p; char *q; @@ -3171,7 +3159,7 @@ PyObject *PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, if (repr == NULL) return NULL; if (size == 0) - return repr; + goto done; p = q = PyBytes_AS_STRING(repr); while (size-- > 0) { @@ -3205,12 +3193,12 @@ PyObject *PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, else *p++ = (char) ch; } - *p = '\0'; - if (PyBytes_Resize(repr, p - q)) { - Py_DECREF(repr); - return NULL; - } - return repr; + size = p - q; + + done: + result = PyString_FromStringAndSize(PyBytes_AS_STRING(repr), size); + Py_DECREF(repr); + return result; } PyObject *PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode) @@ -3445,23 +3433,23 @@ static PyObject 
*unicode_encode_ucs1(const Py_UNICODE *p, /* pointer into the output */ char *str; /* current output position */ - Py_ssize_t respos = 0; Py_ssize_t ressize; const char *encoding = (limit == 256) ? "latin-1" : "ascii"; const char *reason = (limit == 256) ? "ordinal not in range(256)" : "ordinal not in range(128)"; PyObject *errorHandler = NULL; PyObject *exc = NULL; + PyObject *result = NULL; /* the following variable is used for caching string comparisons * -1=not initialized, 0=unknown, 1=strict, 2=replace, 3=ignore, 4=xmlcharrefreplace */ int known_errorHandler = -1; /* allocate enough for a simple encoding without replacements, if we need more, we'll resize */ + if (size == 0) + return PyString_FromStringAndSize(NULL, 0); res = PyBytes_FromStringAndSize(NULL, size); if (res == NULL) - goto onError; - if (size == 0) - return res; + return NULL; str = PyBytes_AS_STRING(res); ressize = size; @@ -3589,20 +3577,13 @@ static PyObject *unicode_encode_ucs1(const Py_UNICODE *p, } } } - /* Resize if we allocated to much */ - respos = str - PyBytes_AS_STRING(res); - if (respos<ressize) - /* If this falls res will be NULL */ - PyBytes_Resize(res, respos); - Py_XDECREF(errorHandler); - Py_XDECREF(exc); - return res; - - onError: - Py_XDECREF(res); + result = PyString_FromStringAndSize(PyBytes_AS_STRING(res), + str - PyBytes_AS_STRING(res)); + onError: + Py_DECREF(res); Py_XDECREF(errorHandler); Py_XDECREF(exc); - return NULL; + return result; } PyObject *PyUnicode_EncodeLatin1(const Py_UNICODE *p, @@ -3848,20 +3829,20 @@ static int encode_mbcs(PyObject **repr, if (*repr == NULL) { /* Create string object */ - *repr = PyBytes_FromStringAndSize(NULL, mbcssize); + *repr = PyString_FromStringAndSize(NULL, mbcssize); if (*repr == NULL) return -1; } else { /* Extend string object */ - n = PyBytes_Size(*repr); - if (PyBytes_Resize(*repr, n + mbcssize) < 0) + n = PyString_Size(*repr); + if (_PyString_Resize(repr, n + mbcssize) < 0) return -1; } /* Do the conversion */ if (size > 0) 
{ - char *s = PyBytes_AS_STRING(*repr) + n; + char *s = PyString_AS_STRING(*repr) + n; if (0 == WideCharToMultiByte(CP_ACP, 0, p, size, s, mbcssize, NULL, NULL)) { PyErr_SetFromWindowsErrWithFilename(0, NULL); return -1; @@ -4341,16 +4322,14 @@ static PyObject *charmapencode_lookup(Py_UNICODE c, PyObject *mapping) } static int -charmapencode_resize(PyObject *outobj, Py_ssize_t *outpos, Py_ssize_t requiredsize) +charmapencode_resize(PyObject **outobj, Py_ssize_t *outpos, Py_ssize_t requiredsize) { - Py_ssize_t outsize = PyBytes_GET_SIZE( outobj); + Py_ssize_t outsize = PyString_GET_SIZE(*outobj); /* exponentially overallocate to minimize reallocations */ if (requiredsize < 2*outsize) requiredsize = 2*outsize; - if (PyBytes_Resize(outobj, requiredsize)) { - Py_DECREF(outobj); + if (_PyString_Resize(outobj, requiredsize)) return -1; - } return 0; } @@ -4365,21 +4344,21 @@ typedef enum charmapencode_result { reallocation error occurred. The caller must decref the result */ static charmapencode_result charmapencode_output(Py_UNICODE c, PyObject *mapping, - PyObject *outobj, Py_ssize_t *outpos) + PyObject **outobj, Py_ssize_t *outpos) { PyObject *rep; char *outstart; - Py_ssize_t outsize = PyBytes_GET_SIZE(outobj); + Py_ssize_t outsize = PyString_GET_SIZE(*outobj); if (Py_Type(mapping) == &EncodingMapType) { int res = encoding_map_lookup(c, mapping); Py_ssize_t requiredsize = *outpos+1; if (res == -1) return enc_FAILED; - if (outsize<requiredsize) + if (outsize<requiredsize) if (charmapencode_resize(outobj, outpos, requiredsize)) return enc_EXCEPTION; - outstart = PyBytes_AS_STRING(outobj); + outstart = PyString_AS_STRING(*outobj); outstart[(*outpos)++] = (char)res; return enc_SUCCESS; } @@ -4398,7 +4377,7 @@ charmapencode_result charmapencode_output(Py_UNICODE c, PyObject *mapping, Py_DECREF(rep); return enc_EXCEPTION; } - outstart = PyBytes_AS_STRING(outobj); + outstart = PyString_AS_STRING(*outobj); outstart[(*outpos)++] = (char)PyInt_AS_LONG(rep); } else { @@ -4410,7 
+4389,7 @@ charmapencode_result charmapencode_output(Py_UNICODE c, PyObject *mapping, Py_DECREF(rep); return enc_EXCEPTION; } - outstart = PyBytes_AS_STRING(outobj); + outstart = PyString_AS_STRING(*outobj); memcpy(outstart + *outpos, repchars, repsize); *outpos += repsize; } @@ -4426,7 +4405,7 @@ int charmap_encoding_error( const Py_UNICODE *p, Py_ssize_t size, Py_ssize_t *inpos, PyObject *mapping, PyObject **exceptionObject, int *known_errorHandler, PyObject **errorHandler, const char *errors, - PyObject *res, Py_ssize_t *respos) + PyObject **res, Py_ssize_t *respos) { PyObject *repunicode = NULL; /* initialize to prevent gcc warning */ Py_ssize_t repsize; @@ -4561,7 +4540,7 @@ PyObject *PyUnicode_EncodeCharmap(const Py_UNICODE *p, /* allocate enough for a simple encoding without replacements, if we need more, we'll resize */ - res = PyBytes_FromStringAndSize(NULL, size); + res = PyString_FromStringAndSize(NULL, size); if (res == NULL) goto onError; if (size == 0) @@ -4569,14 +4548,14 @@ PyObject *PyUnicode_EncodeCharmap(const Py_UNICODE *p, while (inpos<size) { /* try to encode it */ - charmapencode_result x = charmapencode_output(p[inpos], mapping, res, &respos); + charmapencode_result x = charmapencode_output(p[inpos], mapping, &res, &respos); if (x==enc_EXCEPTION) /* error */ goto onError; if (x==enc_FAILED) { /* unencodable character */ if (charmap_encoding_error(p, size, &inpos, mapping, &exc, &known_errorHandler, &errorHandler, errors, - res, &respos)) { + &res, &respos)) { goto onError; } } @@ -4586,10 +4565,9 @@ PyObject *PyUnicode_EncodeCharmap(const Py_UNICODE *p, } /* Resize if we allocated to much */ - if (respos<PyBytes_GET_SIZE(res)) { - if (PyBytes_Resize(res, respos)) - goto onError; - } + if (respos<PyString_GET_SIZE(res)) + _PyString_Resize(&res, respos); + Py_XDECREF(exc); Py_XDECREF(errorHandler); return res; @@ -5483,20 +5461,14 @@ PyUnicode_Join(PyObject *separator, PyObject *seq) item = PySequence_Fast_GET_ITEM(fseq, i); /* Convert item to 
Unicode. */ - if (!PyString_Check(item) && !PyUnicode_Check(item)) - { - if (PyBytes_Check(item)) - { - PyErr_Format(PyExc_TypeError, - "sequence item %d: join() will not operate on " - "bytes objects", i); - goto onError; - } - item = PyObject_Unicode(item); + if (!PyUnicode_Check(item)) { + PyErr_Format(PyExc_TypeError, + "sequence item %zd: expected str instance," + " %.80s found", + i, Py_Type(item)->tp_name); + goto onError; } - else - item = PyUnicode_FromObject(item); - + item = PyUnicode_FromObject(item); if (item == NULL) goto onError; /* We own a reference to item from here on. */ @@ -6396,9 +6368,6 @@ PyObject *PyUnicode_Concat(PyObject *left, { PyUnicodeObject *u = NULL, *v = NULL, *w; - if (PyBytes_Check(left) || PyBytes_Check(right)) - return PyBytes_Concat(left, right); - /* Coerce the two arguments */ u = (PyUnicodeObject *)PyUnicode_FromObject(left); if (u == NULL) @@ -6515,7 +6484,7 @@ unicode_encode(PyUnicodeObject *self, PyObject *args) v = PyUnicode_AsEncodedObject((PyObject *)self, encoding, errors); if (v == NULL) goto onError; - if (!PyBytes_Check(v)) { + if (!PyString_Check(v)) { PyErr_Format(PyExc_TypeError, "encoder did not return a bytes object " "(type=%.400s)", @@ -8232,12 +8201,6 @@ getnextarg(PyObject *args, Py_ssize_t arglen, Py_ssize_t *p_argidx) return NULL; } -#define F_LJUST (1<<0) -#define F_SIGN (1<<1) -#define F_BLANK (1<<2) -#define F_ALT (1<<3) -#define F_ZERO (1<<4) - static Py_ssize_t strtounicode(Py_UNICODE *buffer, const char *charbuffer) { diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c index 5b3fd9e..099f6df 100644 --- a/Parser/tokenizer.c +++ b/Parser/tokenizer.c @@ -646,7 +646,7 @@ decode_str(const char *str, struct tok_state *tok) "unknown encoding: %s", tok->enc); return error_ret(tok); } - str = PyBytes_AsString(utf8); + str = PyString_AS_STRING(utf8); } assert(tok->decoding_buffer == NULL); tok->decoding_buffer = utf8; /* CAUTION */ @@ -765,8 +765,8 @@ tok_nextc(register struct tok_state *tok) tok->done = 
E_DECODE; return EOF; } - buflen = PyBytes_Size(u); - buf = PyBytes_AsString(u); + buflen = PyString_GET_SIZE(u); + buf = PyString_AS_STRING(u); if (!buf) { Py_DECREF(u); tok->done = E_DECODE; @@ -1550,7 +1550,7 @@ PyTokenizer_RestoreEncoding(struct tok_state* tok, int len, int* offset) #else static PyObject * dec_utf8(const char *enc, const char *text, size_t len) { - PyObject *ret = NULL; + PyObject *ret = NULL; PyObject *unicode_text = PyUnicode_DecodeUTF8(text, len, "replace"); if (unicode_text) { ret = PyUnicode_AsEncodedString(unicode_text, enc, "replace"); @@ -1560,7 +1560,7 @@ dec_utf8(const char *enc, const char *text, size_t len) { PyErr_Clear(); } else { - assert(PyBytes_Check(ret)); + assert(PyString_Check(ret)); } return ret; } @@ -1573,8 +1573,8 @@ PyTokenizer_RestoreEncoding(struct tok_state* tok, int len, int *offset) /* convert source to original encondig */ PyObject *lineobj = dec_utf8(tok->encoding, tok->buf, len); if (lineobj != NULL) { - int linelen = PyBytes_GET_SIZE(lineobj); - const char *line = PyBytes_AS_STRING(lineobj); + int linelen = PyString_GET_SIZE(lineobj); + const char *line = PyString_AS_STRING(lineobj); text = PyObject_MALLOC(linelen + 1); if (text != NULL && line != NULL) { if (linelen) @@ -1582,19 +1582,18 @@ PyTokenizer_RestoreEncoding(struct tok_state* tok, int len, int *offset) text[linelen] = '\0'; } Py_DECREF(lineobj); - + /* adjust error offset */ if (*offset > 1) { - PyObject *offsetobj = dec_utf8(tok->encoding, + PyObject *offsetobj = dec_utf8(tok->encoding, tok->buf, *offset-1); if (offsetobj) { - *offset = 1 + - PyBytes_GET_SIZE(offsetobj); + *offset = 1 + Py_Size(offsetobj); Py_DECREF(offsetobj); } } - + } } return text; diff --git a/Python/ast.c b/Python/ast.c index 485dafb..0afb408 100644 --- a/Python/ast.c +++ b/Python/ast.c @@ -3147,9 +3147,8 @@ decode_unicode(const char *s, size_t len, int rawmode, const char *encoding) Py_DECREF(u); return NULL; } - assert(PyBytes_Check(w)); - r = PyBytes_AsString(w); - rn = 
PyBytes_Size(w); + r = PyString_AS_STRING(w); + rn = Py_Size(w); assert(rn % 2 == 0); for (i = 0; i < rn; i += 2) { sprintf(p, "\\u%02x%02x", @@ -3174,7 +3173,7 @@ decode_unicode(const char *s, size_t len, int rawmode, const char *encoding) } /* s is a Python string literal, including the bracketing quote characters, - * and r &/or u prefixes (if any), and embedded escape sequences (if any). + * and r &/or b prefixes (if any), and embedded escape sequences (if any). * parsestr parses it, and returns the decoded Python string object. */ static PyObject * @@ -3186,7 +3185,7 @@ parsestr(const node *n, const char *encoding, int *bytesmode) int rawmode = 0; int need_encoding; - if (isalpha(quote) || quote == '_') { + if (isalpha(quote)) { if (quote == 'b' || quote == 'B') { quote = *++s; *bytesmode = 1; diff --git a/Python/bltinmodule.c b/Python/bltinmodule.c index ecc84b5..7973fcb 100644 --- a/Python/bltinmodule.c +++ b/Python/bltinmodule.c @@ -1875,7 +1875,8 @@ _PyBuiltin_Init(void) SETBUILTIN("True", Py_True); SETBUILTIN("bool", &PyBool_Type); SETBUILTIN("memoryview", &PyMemoryView_Type); - SETBUILTIN("bytes", &PyBytes_Type); + SETBUILTIN("buffer", &PyBytes_Type); + SETBUILTIN("bytes", &PyString_Type); SETBUILTIN("classmethod", &PyClassMethod_Type); #ifndef WITHOUT_COMPLEX SETBUILTIN("complex", &PyComplex_Type); @@ -1894,7 +1895,6 @@ _PyBuiltin_Init(void) SETBUILTIN("slice", &PySlice_Type); SETBUILTIN("staticmethod", &PyStaticMethod_Type); SETBUILTIN("str", &PyUnicode_Type); - SETBUILTIN("str8", &PyString_Type); SETBUILTIN("super", &PySuper_Type); SETBUILTIN("tuple", &PyTuple_Type); SETBUILTIN("type", &PyType_Type); diff --git a/Python/ceval.c b/Python/ceval.c index ae8434d..c0e0993 100644 --- a/Python/ceval.c +++ b/Python/ceval.c @@ -119,8 +119,8 @@ static int import_all_from(PyObject *, PyObject *); static void set_exc_info(PyThreadState *, PyObject *, PyObject *, PyObject *); static void reset_exc_info(PyThreadState *); static void format_exc_check_arg(PyObject *, 
const char *, PyObject *); -static PyObject * string_concatenate(PyObject *, PyObject *, - PyFrameObject *, unsigned char *); +static PyObject * unicode_concatenate(PyObject *, PyObject *, + PyFrameObject *, unsigned char *); #define NAME_ERROR_MSG \ "name '%.200s' is not defined" @@ -1127,10 +1127,10 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) goto slow_add; x = PyInt_FromLong(i); } - else if (PyString_CheckExact(v) && - PyString_CheckExact(w)) { - x = string_concatenate(v, w, f, next_instr); - /* string_concatenate consumed the ref to v */ + else if (PyUnicode_CheckExact(v) && + PyUnicode_CheckExact(w)) { + x = unicode_concatenate(v, w, f, next_instr); + /* unicode_concatenate consumed the ref to v */ goto skip_decref_vx; } else { @@ -1328,10 +1328,10 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) goto slow_iadd; x = PyInt_FromLong(i); } - else if (PyString_CheckExact(v) && - PyString_CheckExact(w)) { - x = string_concatenate(v, w, f, next_instr); - /* string_concatenate consumed the ref to v */ + else if (PyUnicode_CheckExact(v) && + PyUnicode_CheckExact(w)) { + x = unicode_concatenate(v, w, f, next_instr); + /* unicode_concatenate consumed the ref to v */ goto skip_decref_v; } else { @@ -1564,8 +1564,7 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) break; } PyErr_Format(PyExc_SystemError, - "no locals found when storing %s", - PyObject_REPR(w)); + "no locals found when storing %R", w); break; case DELETE_NAME: @@ -1578,8 +1577,7 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) break; } PyErr_Format(PyExc_SystemError, - "no locals when deleting %s", - PyObject_REPR(w)); + "no locals when deleting %R", w); break; PREDICTED_WITH_ARG(UNPACK_SEQUENCE); @@ -1668,8 +1666,7 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) w = GETITEM(names, oparg); if ((v = f->f_locals) == NULL) { PyErr_Format(PyExc_SystemError, - "no locals when loading %s", - PyObject_REPR(w)); + "no locals when loading %R", w); break; } if 
(PyDict_CheckExact(v)) { @@ -1854,19 +1851,6 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) PUSH(x); if (x != NULL) continue; break; - - case MAKE_BYTES: - w = POP(); - if (PyString_Check(w)) - x = PyBytes_FromStringAndSize( - PyString_AS_STRING(w), - PyString_GET_SIZE(w)); - else - x = NULL; - Py_DECREF(w); - PUSH(x); - if (x != NULL) continue; - break; case LOAD_ATTR: w = GETITEM(names, oparg); @@ -3961,13 +3945,13 @@ format_exc_check_arg(PyObject *exc, const char *format_str, PyObject *obj) } static PyObject * -string_concatenate(PyObject *v, PyObject *w, +unicode_concatenate(PyObject *v, PyObject *w, PyFrameObject *f, unsigned char *next_instr) { /* This function implements 'variable += expr' when both arguments - are strings. */ - Py_ssize_t v_len = PyString_GET_SIZE(v); - Py_ssize_t w_len = PyString_GET_SIZE(w); + are (Unicode) strings. */ + Py_ssize_t v_len = PyUnicode_GET_SIZE(v); + Py_ssize_t w_len = PyUnicode_GET_SIZE(w); Py_ssize_t new_len = v_len + w_len; if (new_len < 0) { PyErr_SetString(PyExc_OverflowError, @@ -4016,12 +4000,12 @@ string_concatenate(PyObject *v, PyObject *w, } } - if (v->ob_refcnt == 1 && !PyString_CHECK_INTERNED(v)) { + if (v->ob_refcnt == 1 && !PyUnicode_CHECK_INTERNED(v)) { /* Now we own the last reference to 'v', so we can resize it * in-place. */ - if (_PyString_Resize(&v, new_len) != 0) { - /* XXX if _PyString_Resize() fails, 'v' has been + if (PyUnicode_Resize(&v, new_len) != 0) { + /* XXX if PyUnicode_Resize() fails, 'v' has been * deallocated so it cannot be put back into * 'variable'. 
The MemoryError is raised when there * is no value in 'variable', which might (very @@ -4030,14 +4014,15 @@ string_concatenate(PyObject *v, PyObject *w, return NULL; } /* copy 'w' into the newly allocated area of 'v' */ - memcpy(PyString_AS_STRING(v) + v_len, - PyString_AS_STRING(w), w_len); + memcpy(PyUnicode_AS_UNICODE(v) + v_len, + PyUnicode_AS_UNICODE(w), w_len*sizeof(Py_UNICODE)); return v; } else { /* When in-place resizing is not an option. */ - PyString_Concat(&v, w); - return v; + w = PyUnicode_Concat(v, w); + Py_DECREF(v); + return w; } } diff --git a/Python/codecs.c b/Python/codecs.c index 4b24676..c8926fc 100644 --- a/Python/codecs.c +++ b/Python/codecs.c @@ -14,7 +14,7 @@ Copyright (c) Corporation for National Research Initiatives. /* --- Codec Registry ----------------------------------------------------- */ /* Import the standard encodings package which will register the first - codec search function. + codec search function. This is done in a lazy way so that the Unicode implementation does not downgrade startup time of scripts not needing it. @@ -87,7 +87,7 @@ PyObject *normalizestring(const char *string) characters. This makes encodings looked up through this mechanism effectively case-insensitive. - If no codec is found, a LookupError is set and NULL returned. + If no codec is found, a LookupError is set and NULL returned. As side effect, this tries to load the encodings package, if not yet done. 
This is part of the lazy load strategy for the encodings @@ -125,7 +125,7 @@ PyObject *_PyCodec_Lookup(const char *encoding) Py_DECREF(v); return result; } - + /* Next, scan the search functions in order of registration */ args = PyTuple_New(1); if (args == NULL) @@ -144,7 +144,7 @@ PyObject *_PyCodec_Lookup(const char *encoding) for (i = 0; i < len; i++) { PyObject *func; - + func = PyList_GetItem(interp->codec_search_path, i); if (func == NULL) goto onError; @@ -188,7 +188,7 @@ PyObject *args_tuple(PyObject *object, const char *errors) { PyObject *args; - + args = PyTuple_New(1 + (errors != NULL)); if (args == NULL) return NULL; @@ -196,7 +196,7 @@ PyObject *args_tuple(PyObject *object, PyTuple_SET_ITEM(args,0,object); if (errors) { PyObject *v; - + v = PyUnicode_FromString(errors); if (v == NULL) { Py_DECREF(args); @@ -271,10 +271,10 @@ PyObject *codec_getstreamcodec(const char *encoding, return streamcodec; } -/* Convenience APIs to query the Codec registry. - +/* Convenience APIs to query the Codec registry. + All APIs return a codec object with incremented refcount. 
- + */ PyObject *PyCodec_Encoder(const char *encoding) @@ -324,7 +324,7 @@ PyObject *PyCodec_Encode(PyObject *object, { PyObject *encoder = NULL; PyObject *args = NULL, *result = NULL; - PyObject *v; + PyObject *v = NULL; encoder = PyCodec_Encoder(encoding); if (encoder == NULL) @@ -333,31 +333,43 @@ PyObject *PyCodec_Encode(PyObject *object, args = args_tuple(object, errors); if (args == NULL) goto onError; - - result = PyEval_CallObject(encoder,args); + + result = PyEval_CallObject(encoder, args); if (result == NULL) goto onError; - if (!PyTuple_Check(result) || + if (!PyTuple_Check(result) || PyTuple_GET_SIZE(result) != 2) { PyErr_SetString(PyExc_TypeError, - "encoder must return a tuple (object,integer)"); + "encoder must return a tuple (object, integer)"); goto onError; } - v = PyTuple_GET_ITEM(result,0); - Py_INCREF(v); + v = PyTuple_GET_ITEM(result, 0); + if (PyBytes_Check(v)) { + char msg[100]; + PyOS_snprintf(msg, sizeof(msg), + "encoder %s returned buffer instead of bytes", + encoding); + if (PyErr_WarnEx(PyExc_RuntimeWarning, msg, 1) < 0) { + v = NULL; + goto onError; + } + v = PyString_FromStringAndSize(PyBytes_AS_STRING(v), Py_Size(v)); + } + else if (PyString_Check(v)) + Py_INCREF(v); + else { + PyErr_SetString(PyExc_TypeError, + "encoding must return a tuple(bytes, integer)"); + v = NULL; + } /* We don't check or use the second (integer) entry. 
*/ - Py_DECREF(args); - Py_DECREF(encoder); - Py_DECREF(result); - return v; - onError: Py_XDECREF(result); Py_XDECREF(args); Py_XDECREF(encoder); - return NULL; + return v; } /* Decode an object (usually a Python string) using the given encoding @@ -380,11 +392,11 @@ PyObject *PyCodec_Decode(PyObject *object, args = args_tuple(object, errors); if (args == NULL) goto onError; - + result = PyEval_CallObject(decoder,args); if (result == NULL) goto onError; - if (!PyTuple_Check(result) || + if (!PyTuple_Check(result) || PyTuple_GET_SIZE(result) != 2) { PyErr_SetString(PyExc_TypeError, "decoder must return a tuple (object,integer)"); @@ -398,7 +410,7 @@ PyObject *PyCodec_Decode(PyObject *object, Py_DECREF(decoder); Py_DECREF(result); return v; - + onError: Py_XDECREF(args); Py_XDECREF(decoder); diff --git a/Python/compile.c b/Python/compile.c index 93087db..80c97eb 100644 --- a/Python/compile.c +++ b/Python/compile.c @@ -787,8 +787,6 @@ opcode_stack_effect(int opcode, int oparg) return 1-oparg; case BUILD_MAP: return 1; - case MAKE_BYTES: - return 0; case LOAD_ATTR: return 0; case COMPARE_OP: @@ -3222,7 +3220,6 @@ compiler_visit_expr(struct compiler *c, expr_ty e) break; case Bytes_kind: ADDOP_O(c, LOAD_CONST, e->v.Bytes.s, consts); - ADDOP(c, MAKE_BYTES); break; case Ellipsis_kind: ADDOP_O(c, LOAD_CONST, Py_Ellipsis, consts); diff --git a/Python/getargs.c b/Python/getargs.c index 0b25d4b..d268104 100644 --- a/Python/getargs.c +++ b/Python/getargs.c @@ -7,7 +7,7 @@ #ifdef __cplusplus -extern "C" { +extern "C" { #endif int PyArg_Parse(PyObject *, const char *, ...); int PyArg_ParseTuple(PyObject *, const char *, ...); @@ -37,7 +37,7 @@ PyAPI_FUNC(int) _PyArg_VaParseTupleAndKeywords_SizeT(PyObject *, PyObject *, /* Forward */ static int vgetargs1(PyObject *, const char *, va_list *, int); static void seterror(int, const char *, int *, const char *, const char *); -static char *convertitem(PyObject *, const char **, va_list *, int, int *, +static char 
*convertitem(PyObject *, const char **, va_list *, int, int *, char *, size_t, PyObject **); static char *converttuple(PyObject *, const char **, va_list *, int, int *, char *, size_t, int, PyObject **); @@ -54,7 +54,7 @@ PyArg_Parse(PyObject *args, const char *format, ...) { int retval; va_list va; - + va_start(va, format); retval = vgetargs1(args, format, &va, FLAG_COMPAT); va_end(va); @@ -66,7 +66,7 @@ _PyArg_Parse_SizeT(PyObject *args, char *format, ...) { int retval; va_list va; - + va_start(va, format); retval = vgetargs1(args, format, &va, FLAG_COMPAT|FLAG_SIZE_T); va_end(va); @@ -79,7 +79,7 @@ PyArg_ParseTuple(PyObject *args, const char *format, ...) { int retval; va_list va; - + va_start(va, format); retval = vgetargs1(args, format, &va, 0); va_end(va); @@ -91,7 +91,7 @@ _PyArg_ParseTuple_SizeT(PyObject *args, char *format, ...) { int retval; va_list va; - + va_start(va, format); retval = vgetargs1(args, format, &va, FLAG_SIZE_T); va_end(va); @@ -240,15 +240,15 @@ vgetargs1(PyObject *args, const char *format, va_list *p_va, int flags) break; } } - + if (level != 0) Py_FatalError(/* '(' */ "missing ')' in getargs format"); - + if (min < 0) min = max; - + format = formatsave; - + if (compat) { if (max == 0) { if (args == NULL) @@ -269,7 +269,7 @@ vgetargs1(PyObject *args, const char *format, va_list *p_va, int flags) PyErr_SetString(PyExc_TypeError, msgbuf); return 0; } - msg = convertitem(args, &format, p_va, flags, levels, + msg = convertitem(args, &format, p_va, flags, levels, msgbuf, sizeof(msgbuf), &freelist); if (msg == NULL) return cleanreturn(1, freelist); @@ -282,15 +282,15 @@ vgetargs1(PyObject *args, const char *format, va_list *p_va, int flags) return 0; } } - + if (!PyTuple_Check(args)) { PyErr_SetString(PyExc_SystemError, "new style getargs format but argument is not a tuple"); return 0; } - + len = PyTuple_GET_SIZE(args); - + if (len < min || max < len) { if (message == NULL) { PyOS_snprintf(msgbuf, sizeof(msgbuf), @@ -308,12 +308,12 @@ 
vgetargs1(PyObject *args, const char *format, va_list *p_va, int flags) PyErr_SetString(PyExc_TypeError, message); return 0; } - + for (i = 0; i < len; i++) { if (*format == '|') format++; msg = convertitem(PyTuple_GET_ITEM(args, i), &format, p_va, - flags, levels, msgbuf, + flags, levels, msgbuf, sizeof(msgbuf), &freelist); if (msg) { seterror(i+1, msg, levels, fname, message); @@ -328,7 +328,7 @@ vgetargs1(PyObject *args, const char *format, va_list *p_va, int flags) "bad format string: %.200s", formatsave); return cleanreturn(0, freelist); } - + return cleanreturn(1, freelist); } @@ -392,14 +392,14 @@ seterror(int iarg, const char *msg, int *levels, const char *fname, static char * converttuple(PyObject *arg, const char **p_format, va_list *p_va, int flags, - int *levels, char *msgbuf, size_t bufsize, int toplevel, + int *levels, char *msgbuf, size_t bufsize, int toplevel, PyObject **freelist) { int level = 0; int n = 0; const char *format = *p_format; int i; - + for (;;) { int c = *format++; if (c == '(') { @@ -417,17 +417,17 @@ converttuple(PyObject *arg, const char **p_format, va_list *p_va, int flags, else if (level == 0 && isalpha(Py_CHARMASK(c))) n++; } - + if (!PySequence_Check(arg) || PyString_Check(arg)) { levels[0] = 0; PyOS_snprintf(msgbuf, bufsize, toplevel ? "expected %d arguments, not %.50s" : "must be %d-item sequence, not %.50s", - n, + n, arg == Py_None ? 
"None" : arg->ob_type->tp_name); return msgbuf; } - + if ((i = PySequence_Size(arg)) != n) { levels[0] = 0; PyOS_snprintf(msgbuf, bufsize, @@ -449,7 +449,7 @@ converttuple(PyObject *arg, const char **p_format, va_list *p_va, int flags, strncpy(msgbuf, "is not retrievable", bufsize); return msgbuf; } - msg = convertitem(item, &format, p_va, flags, levels+1, + msg = convertitem(item, &format, p_va, flags, levels+1, msgbuf, bufsize, freelist); /* PySequence_GetItem calls tp->sq_item, which INCREFs */ Py_XDECREF(item); @@ -472,16 +472,16 @@ convertitem(PyObject *arg, const char **p_format, va_list *p_va, int flags, { char *msg; const char *format = *p_format; - + if (*format == '(' /* ')' */) { format++; - msg = converttuple(arg, &format, p_va, flags, levels, msgbuf, + msg = converttuple(arg, &format, p_va, flags, levels, msgbuf, bufsize, 0, freelist); if (msg == NULL) format++; } else { - msg = convertsimple(arg, &format, p_va, flags, + msg = convertsimple(arg, &format, p_va, flags, msgbuf, bufsize, freelist); if (msg != NULL) levels[0] = 0; @@ -502,7 +502,7 @@ static char * converterr(const char *expected, PyObject *arg, char *msgbuf, size_t bufsize) { assert(expected != NULL); - assert(arg != NULL); + assert(arg != NULL); PyOS_snprintf(msgbuf, bufsize, "must be %.50s, not %.50s", expected, arg == Py_None ? 
"None" : arg->ob_type->tp_name); @@ -548,9 +548,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, const char *format = *p_format; char c = *format++; PyObject *uarg; - + switch (c) { - + case 'b': { /* unsigned byte -- very short int */ char *p = va_arg(*p_va, char *); long ival; @@ -573,9 +573,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = (unsigned char) ival; break; } - + case 'B': {/* byte sized bitfield - both signed and unsigned - values allowed */ + values allowed */ char *p = va_arg(*p_va, char *); long ival; if (float_argument_error(arg)) @@ -587,7 +587,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = (unsigned char) ival; break; } - + case 'h': {/* signed short int */ short *p = va_arg(*p_va, short *); long ival; @@ -610,9 +610,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = (short) ival; break; } - + case 'H': { /* short int sized bitfield, both signed and - unsigned allowed */ + unsigned allowed */ unsigned short *p = va_arg(*p_va, unsigned short *); long ival; if (float_argument_error(arg)) @@ -649,7 +649,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, } case 'I': { /* int sized bitfield, both signed and - unsigned allowed */ + unsigned allowed */ unsigned int *p = va_arg(*p_va, unsigned int *); unsigned int ival; if (float_argument_error(arg)) @@ -661,7 +661,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = ival; break; } - + case 'n': /* Py_ssize_t */ #if SIZEOF_SIZE_T != SIZEOF_LONG { @@ -703,7 +703,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = ival; break; } - + #ifdef HAVE_LONG_LONG case 'L': {/* PY_LONG_LONG */ PY_LONG_LONG *p = va_arg( *p_va, PY_LONG_LONG * ); @@ -727,7 +727,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, break; } #endif - + case 
'f': {/* float */ float *p = va_arg(*p_va, float *); double dval = PyFloat_AsDouble(arg); @@ -737,7 +737,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = (float) dval; break; } - + case 'd': {/* double */ double *p = va_arg(*p_va, double *); double dval = PyFloat_AsDouble(arg); @@ -747,7 +747,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = dval; break; } - + #ifndef WITHOUT_COMPLEX case 'D': {/* complex double */ Py_complex *p = va_arg(*p_va, Py_complex *); @@ -760,7 +760,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, break; } #endif /* WITHOUT_COMPLEX */ - + case 'c': {/* char */ char *p = va_arg(*p_va, char *); if (PyString_Check(arg) && PyString_Size(arg) == 1) @@ -773,7 +773,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, return converterr("char < 256", arg, msgbuf, bufsize); break; } - + case 'C': {/* unicode char */ int *p = va_arg(*p_va, int *); if (PyString_Check(arg) && PyString_Size(arg) == 1) @@ -785,17 +785,16 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, return converterr("char", arg, msgbuf, bufsize); break; } - - case 's': {/* string */ + + /* XXX WAAAAH! 's', 'y', 'z', 'u', 'Z', 'e', 'w', 't' codes all + need to be cleaned up! */ + + case 's': {/* text string */ if (*format == '#') { void **p = (void **)va_arg(*p_va, char **); FETCH_SIZE; - - if (PyString_Check(arg)) { - *p = PyString_AS_STRING(arg); - STORE_SIZE(PyString_GET_SIZE(arg)); - } - else if (PyUnicode_Check(arg)) { + + if (PyUnicode_Check(arg)) { uarg = UNICODE_DEFAULT_ENCODING(arg); if (uarg == NULL) return converterr(CONV_UNICODE, @@ -804,6 +803,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, STORE_SIZE(PyString_GET_SIZE(uarg)); } else { /* any buffer-like object */ + /* XXX Really? 
*/ char *buf; Py_ssize_t count = convertbuffer(arg, p, &buf); if (count < 0) @@ -813,10 +813,8 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, format++; } else { char **p = va_arg(*p_va, char **); - - if (PyString_Check(arg)) - *p = PyString_AS_STRING(arg); - else if (PyUnicode_Check(arg)) { + + if (PyUnicode_Check(arg)) { uarg = UNICODE_DEFAULT_ENCODING(arg); if (uarg == NULL) return converterr(CONV_UNICODE, @@ -832,45 +830,29 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, break; } - case 'y': {/* bytes */ + case 'y': {/* any buffer-like object, but not PyUnicode */ + void **p = (void **)va_arg(*p_va, char **); + char *buf; + Py_ssize_t count = convertbuffer(arg, p, &buf); + if (count < 0) + return converterr(buf, arg, msgbuf, bufsize); if (*format == '#') { - void **p = (void **)va_arg(*p_va, char **); FETCH_SIZE; - - if (PyBytes_Check(arg)) { - *p = PyBytes_AS_STRING(arg); - STORE_SIZE(PyBytes_GET_SIZE(arg)); - } - else - return converterr("bytes", arg, msgbuf, bufsize); + STORE_SIZE(count); format++; - } else { - char **p = va_arg(*p_va, char **); - - if (PyBytes_Check(arg)) - *p = PyBytes_AS_STRING(arg); - else - return converterr("bytes", arg, msgbuf, bufsize); - if ((Py_ssize_t)strlen(*p) != PyBytes_Size(arg)) - return converterr("bytes without null bytes", - arg, msgbuf, bufsize); } break; } - case 'z': {/* string, may be NULL (None) */ + case 'z': {/* like 's' or 's#', but None is okay, stored as NULL */ if (*format == '#') { /* any buffer-like object */ void **p = (void **)va_arg(*p_va, char **); FETCH_SIZE; - + if (arg == Py_None) { *p = 0; STORE_SIZE(0); } - else if (PyString_Check(arg)) { - *p = PyString_AS_STRING(arg); - STORE_SIZE(PyString_GET_SIZE(arg)); - } else if (PyUnicode_Check(arg)) { uarg = UNICODE_DEFAULT_ENCODING(arg); if (uarg == NULL) @@ -880,6 +862,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, STORE_SIZE(PyString_GET_SIZE(uarg)); } 
else { /* any buffer-like object */ + /* XXX Really? */ char *buf; Py_ssize_t count = convertbuffer(arg, p, &buf); if (count < 0) @@ -889,7 +872,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, format++; } else { char **p = va_arg(*p_va, char **); - + if (arg == Py_None) *p = 0; else if (PyString_Check(arg)) @@ -902,31 +885,33 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = PyString_AS_STRING(uarg); } else - return converterr("string or None", + return converterr("string or None", arg, msgbuf, bufsize); if (*format == '#') { FETCH_SIZE; assert(0); /* XXX redundant with if-case */ - if (arg == Py_None) - *q = 0; - else - *q = PyString_Size(arg); + if (arg == Py_None) { + STORE_SIZE(0); + } + else { + STORE_SIZE(PyString_Size(arg)); + } format++; } else if (*p != NULL && (Py_ssize_t)strlen(*p) != PyString_Size(arg)) return converterr( - "string without null bytes or None", + "string without null bytes or None", arg, msgbuf, bufsize); } break; } - + case 'Z': {/* unicode, may be NULL (None) */ if (*format == '#') { /* any buffer-like object */ Py_UNICODE **p = va_arg(*p_va, Py_UNICODE **); FETCH_SIZE; - + if (arg == Py_None) { *p = 0; STORE_SIZE(0); @@ -938,18 +923,18 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, format++; } else { Py_UNICODE **p = va_arg(*p_va, Py_UNICODE **); - + if (arg == Py_None) *p = 0; else if (PyUnicode_Check(arg)) *p = PyUnicode_AS_UNICODE(arg); else - return converterr("string or None", + return converterr("string or None", arg, msgbuf, bufsize); } break; } - + case 'e': {/* encoded string */ char **buffer; const char *encoding; @@ -962,10 +947,10 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, encoding = (const char *)va_arg(*p_va, const char *); if (encoding == NULL) encoding = PyUnicode_GetDefaultEncoding(); - + /* Get output buffer parameter: 's' (recode all objects via Unicode) or - 't' (only recode 
non-string objects) + 't' (only recode non-string objects) */ if (*format == 's') recode_strings = 1; @@ -978,9 +963,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, buffer = (char **)va_arg(*p_va, char **); format++; if (buffer == NULL) - return converterr("(buffer is NULL)", + return converterr("(buffer is NULL)", arg, msgbuf, bufsize); - + /* Encode object */ if (!recode_strings && (PyString_Check(arg) || PyBytes_Check(arg))) { @@ -997,9 +982,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, u = PyUnicode_FromObject(arg); if (u == NULL) return converterr( - "string or unicode or text buffer", + "string or unicode or text buffer", arg, msgbuf, bufsize); - + /* Encode object; use default error handling */ s = PyUnicode_AsEncodedString(u, encoding, @@ -1008,28 +993,28 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, if (s == NULL) return converterr("(encoding failed)", arg, msgbuf, bufsize); - if (!PyBytes_Check(s)) { + if (!PyString_Check(s)) { Py_DECREF(s); return converterr( "(encoder failed to return bytes)", arg, msgbuf, bufsize); } - size = PyBytes_GET_SIZE(s); - ptr = PyBytes_AS_STRING(s); + size = PyString_GET_SIZE(s); + ptr = PyString_AS_STRING(s); if (ptr == NULL) ptr = ""; } /* Write output; output is guaranteed to be 0-terminated */ - if (*format == '#') { + if (*format == '#') { /* Using buffer length parameter '#': - + - if *buffer is NULL, a new buffer of the needed size is allocated and the data copied into it; *buffer is updated to point to the new buffer; the caller is responsible for PyMem_Free()ing it after - usage + usage - if *buffer is not NULL, the data is copied to *buffer; *buffer_len has to be @@ -1037,11 +1022,11 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, buffer overflow is signalled with an error; buffer has to provide enough room for the encoded string plus the trailing 0-byte - + - in both cases, 
*buffer_len is updated to the size of the buffer /excluding/ the - trailing 0-byte - + trailing 0-byte + */ FETCH_SIZE; @@ -1070,7 +1055,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, if (size + 1 > BUFFER_LEN) { Py_DECREF(s); return converterr( - "(buffer overflow)", + "(buffer overflow)", arg, msgbuf, bufsize); } } @@ -1078,10 +1063,10 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, STORE_SIZE(size); } else { /* Using a 0-terminated buffer: - + - the encoded string has to be 0-terminated for this variant to work; if it is not, an - error raised + error raised - a new buffer of the needed size is allocated and the data copied into it; @@ -1114,55 +1099,45 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, } case 'u': {/* raw unicode buffer (Py_UNICODE *) */ - if (*format == '#') { /* any buffer-like object */ - void **p = (void **)va_arg(*p_va, char **); + Py_UNICODE **p = va_arg(*p_va, Py_UNICODE **); + if (!PyUnicode_Check(arg)) + return converterr("str", arg, msgbuf, bufsize); + *p = PyUnicode_AS_UNICODE(arg); + if (*format == '#') { /* store pointer and size */ FETCH_SIZE; - if (PyUnicode_Check(arg)) { - *p = PyUnicode_AS_UNICODE(arg); - STORE_SIZE(PyUnicode_GET_SIZE(arg)); - } - else { - return converterr("cannot convert raw buffers", - arg, msgbuf, bufsize); - } + STORE_SIZE(PyUnicode_GET_SIZE(arg)); format++; - } else { - Py_UNICODE **p = va_arg(*p_va, Py_UNICODE **); - if (PyUnicode_Check(arg)) - *p = PyUnicode_AS_UNICODE(arg); - else - return converterr("unicode", arg, msgbuf, bufsize); } break; } - case 'S': { /* string object */ + case 'S': { /* PyString object */ PyObject **p = va_arg(*p_va, PyObject **); - if (PyString_Check(arg) || PyUnicode_Check(arg)) + if (PyString_Check(arg)) *p = arg; else - return converterr("string", arg, msgbuf, bufsize); + return converterr("bytes", arg, msgbuf, bufsize); break; } - case 'Y': { /* bytes object */ + case 'Y': { 
/* PyBytes object */ PyObject **p = va_arg(*p_va, PyObject **); if (PyBytes_Check(arg)) *p = arg; else - return converterr("bytes", arg, msgbuf, bufsize); + return converterr("buffer", arg, msgbuf, bufsize); break; } - - case 'U': { /* Unicode object */ + + case 'U': { /* PyUnicode object */ PyObject **p = va_arg(*p_va, PyObject **); if (PyUnicode_Check(arg)) *p = arg; else - return converterr("unicode", arg, msgbuf, bufsize); + return converterr("str", arg, msgbuf, bufsize); break; } - + case 'O': { /* object */ PyTypeObject *type; PyObject **p; @@ -1180,12 +1155,12 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, inquiry pred = va_arg(*p_va, inquiry); p = va_arg(*p_va, PyObject **); format++; - if ((*pred)(arg)) + if ((*pred)(arg)) *p = arg; else - return converterr("(unspecified)", + return converterr("(unspecified)", arg, msgbuf, bufsize); - + } else if (*format == '&') { typedef int (*converter)(PyObject *, void *); @@ -1193,7 +1168,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, void *addr = va_arg(*p_va, void *); format++; if (! 
(*convert)(arg, addr)) - return converterr("(unspecified)", + return converterr("(unspecified)", arg, msgbuf, bufsize); } else { @@ -1202,27 +1177,27 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, } break; } - - + + case 'w': { /* memory buffer, read-write access */ void **p = va_arg(*p_va, void **); PyBufferProcs *pb = arg->ob_type->tp_as_buffer; int count; int temp=-1; Py_buffer view; - - if (pb == NULL || + + if (pb == NULL || pb->bf_getbuffer == NULL || - ((temp = (*pb->bf_getbuffer)(arg, &view, + ((temp = (*pb->bf_getbuffer)(arg, &view, PyBUF_SIMPLE)) != 0) || view.readonly == 1) { if (temp==0 && pb->bf_releasebuffer != NULL) { (*pb->bf_releasebuffer)(arg, &view); } - return converterr("single-segment read-write buffer", + return converterr("single-segment read-write buffer", arg, msgbuf, bufsize); } - + if ((count = view.len) < 0) return converterr("(unspecified)", arg, msgbuf, bufsize); *p = view.buf; @@ -1243,17 +1218,17 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, PyBufferProcs *pb = arg->ob_type->tp_as_buffer; int count; Py_buffer view; - + if (*format++ != '#') return converterr( - "invalid use of 't' format character", + "invalid use of 't' format character", arg, msgbuf, bufsize); if (pb == NULL || pb->bf_getbuffer == NULL) return converterr( "bytes or read-only character buffer", arg, msgbuf, bufsize); - if ((*pb->bf_getbuffer)(arg, &view, PyBUF_SIMPLE) != 0) + if ((*pb->bf_getbuffer)(arg, &view, PyBUF_SIMPLE) != 0) return converterr("string or single-segment read-only buffer", arg, msgbuf, bufsize); @@ -1261,7 +1236,7 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, *p = view.buf; /* XXX : shouldn't really release buffer, but it should be O.K. 
*/ - if (pb->bf_releasebuffer != NULL) + if (pb->bf_releasebuffer != NULL) (*pb->bf_releasebuffer)(arg, &view); if (count < 0) return converterr("(unspecified)", arg, msgbuf, bufsize); @@ -1274,9 +1249,9 @@ convertsimple(PyObject *arg, const char **p_format, va_list *p_va, int flags, default: return converterr("impossible<bad format char>", arg, msgbuf, bufsize); - + } - + *p_format = format; return NULL; } @@ -1314,7 +1289,7 @@ convertbuffer(PyObject *arg, void **p, char **errmsg) int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *keywords, - const char *format, + const char *format, char **kwlist, ...) { int retval; @@ -1330,7 +1305,7 @@ PyArg_ParseTupleAndKeywords(PyObject *args, } va_start(va, kwlist); - retval = vgetargskeywords(args, keywords, format, kwlist, &va, 0); + retval = vgetargskeywords(args, keywords, format, kwlist, &va, 0); va_end(va); return retval; } @@ -1338,7 +1313,7 @@ PyArg_ParseTupleAndKeywords(PyObject *args, int _PyArg_ParseTupleAndKeywords_SizeT(PyObject *args, PyObject *keywords, - const char *format, + const char *format, char **kwlist, ...) 
{ int retval; @@ -1354,7 +1329,7 @@ _PyArg_ParseTupleAndKeywords_SizeT(PyObject *args, } va_start(va, kwlist); - retval = vgetargskeywords(args, keywords, format, + retval = vgetargskeywords(args, keywords, format, kwlist, &va, FLAG_SIZE_T); va_end(va); return retval; @@ -1364,7 +1339,7 @@ _PyArg_ParseTupleAndKeywords_SizeT(PyObject *args, int PyArg_VaParseTupleAndKeywords(PyObject *args, PyObject *keywords, - const char *format, + const char *format, char **kwlist, va_list va) { int retval; @@ -1389,14 +1364,14 @@ PyArg_VaParseTupleAndKeywords(PyObject *args, #endif #endif - retval = vgetargskeywords(args, keywords, format, kwlist, &lva, 0); + retval = vgetargskeywords(args, keywords, format, kwlist, &lva, 0); return retval; } int _PyArg_VaParseTupleAndKeywords_SizeT(PyObject *args, PyObject *keywords, - const char *format, + const char *format, char **kwlist, va_list va) { int retval; @@ -1421,7 +1396,7 @@ _PyArg_VaParseTupleAndKeywords_SizeT(PyObject *args, #endif #endif - retval = vgetargskeywords(args, keywords, format, + retval = vgetargskeywords(args, keywords, format, kwlist, &lva, FLAG_SIZE_T); return retval; } @@ -1504,7 +1479,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, nkeywords = keywords == NULL ? 0 : PyDict_Size(keywords); /* make sure there are no duplicate values for an argument; - its not clear when to use the term "keyword argument vs. + its not clear when to use the term "keyword argument vs. 
keyword parameter in messages */ if (nkeywords > 0) { for (i = 0; i < nargs; i++) { @@ -1523,7 +1498,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, } } - /* required arguments missing from args can be supplied by keyword + /* required arguments missing from args can be supplied by keyword arguments; set len to the number of positional arguments, and, if that's less than the minimum required, add in the number of required arguments that are supplied by keywords */ @@ -1540,7 +1515,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, /* make sure we got an acceptable number of arguments; the message is a little confusing with keywords since keyword arguments which are supplied, but don't match the required arguments - are not included in the "%d given" part of the message + are not included in the "%d given" part of the message XXX and this isn't a bug!? */ if (len < min || max < len) { if (message == NULL) { @@ -1565,7 +1540,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, if (*format == '|') format++; msg = convertitem(PyTuple_GET_ITEM(args, i), &format, p_va, - flags, levels, msgbuf, sizeof(msgbuf), + flags, levels, msgbuf, sizeof(msgbuf), &freelist); if (msg) { seterror(i+1, msg, levels, fname, message); @@ -1573,11 +1548,11 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, } } - /* handle no keyword parameters in call */ + /* handle no keyword parameters in call */ if (nkeywords == 0) return cleanreturn(1, freelist); - /* convert the keyword arguments; this uses the format + /* convert the keyword arguments; this uses the format string where it was left after processing args */ for (i = nargs; i < max; i++) { PyObject *item; @@ -1586,7 +1561,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, item = PyDict_GetItemString(keywords, kwlist[i]); if (item != NULL) { Py_INCREF(item); - msg = convertitem(item, &format, p_va, flags, 
levels, + msg = convertitem(item, &format, p_va, flags, levels, msgbuf, sizeof(msgbuf), &freelist); Py_DECREF(item); if (msg) { @@ -1617,7 +1592,7 @@ vgetargskeywords(PyObject *args, PyObject *keywords, const char *format, int match = 0; char *ks; if (!PyString_Check(key) && !PyUnicode_Check(key)) { - PyErr_SetString(PyExc_TypeError, + PyErr_SetString(PyExc_TypeError, "keywords must be strings"); return cleanreturn(0, freelist); } @@ -1647,7 +1622,7 @@ skipitem(const char **p_format, va_list *p_va, int flags) { const char *format = *p_format; char c = *format++; - + switch (c) { /* simple codes @@ -1681,9 +1656,9 @@ skipitem(const char **p_format, va_list *p_va, int flags) (void) va_arg(*p_va, Py_ssize_t *); break; } - + /* string codes */ - + case 'e': /* string with encoding */ { (void) va_arg(*p_va, const char *); @@ -1693,7 +1668,7 @@ skipitem(const char **p_format, va_list *p_va, int flags) format++; /* explicit fallthrough to string cases */ } - + case 's': /* string */ case 'z': /* string or None */ case 'y': /* bytes */ @@ -1721,7 +1696,7 @@ skipitem(const char **p_format, va_list *p_va, int flags) (void) va_arg(*p_va, PyObject **); break; } - + case 'O': /* object */ { if (*format == '!') { @@ -1750,16 +1725,16 @@ skipitem(const char **p_format, va_list *p_va, int flags) } break; } - + default: err: return "impossible<bad format char>"; - + } /* The "(...)" format code for tuples is not handled here because * it is not allowed with keyword args. */ - + *p_format = format; return NULL; } @@ -1784,19 +1759,19 @@ PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t m PyErr_SetString(PyExc_SystemError, "PyArg_UnpackTuple() argument list is not a tuple"); return 0; - } + } l = PyTuple_GET_SIZE(args); if (l < min) { if (name != NULL) PyErr_Format( PyExc_TypeError, - "%s expected %s%zd arguments, got %zd", + "%s expected %s%zd arguments, got %zd", name, (min == max ? 
"" : "at least "), min, l); else PyErr_Format( PyExc_TypeError, "unpacked tuple should have %s%zd elements," - " but has %zd", + " but has %zd", (min == max ? "" : "at least "), min, l); va_end(vargs); return 0; @@ -1805,13 +1780,13 @@ PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t m if (name != NULL) PyErr_Format( PyExc_TypeError, - "%s expected %s%zd arguments, got %zd", + "%s expected %s%zd arguments, got %zd", name, (min == max ? "" : "at most "), max, l); else PyErr_Format( PyExc_TypeError, "unpacked tuple should have %s%zd elements," - " but has %zd", + " but has %zd", (min == max ? "" : "at most "), max, l); va_end(vargs); return 0; @@ -1827,7 +1802,7 @@ PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t m /* For type constructors that don't take keyword args * - * Sets a TypeError and returns 0 if the kwds dict is + * Sets a TypeError and returns 0 if the kwds dict is * not empty, returns 1 otherwise */ int @@ -1841,8 +1816,8 @@ _PyArg_NoKeywords(const char *funcname, PyObject *kw) } if (PyDict_Size(kw) == 0) return 1; - - PyErr_Format(PyExc_TypeError, "%s does not take keyword arguments", + + PyErr_Format(PyExc_TypeError, "%s does not take keyword arguments", funcname); return 0; } diff --git a/Python/import.c b/Python/import.c index a096519..289a1fb 100644 --- a/Python/import.c +++ b/Python/import.c @@ -76,9 +76,10 @@ extern time_t PyOS_GetLastModificationTime(char *, FILE *); 3060 (PEP 3115 metaclass syntax) 3070 (PEP 3109 raise changes) 3080 (PEP 3137 make __file__ and __name__ unicode) + 3090 (kill str8 interning) . 
 */
-#define MAGIC (3080 | ((long)'\r'<<16) | ((long)'\n'<<24))
+#define MAGIC (3090 | ((long)'\r'<<16) | ((long)'\n'<<24))

 /* Magic word as global; note that _PyImport_Init() can change the
    value of this global to accommodate for alterations of how the
@@ -2212,14 +2213,14 @@ ensure_fromlist(PyObject *mod, PyObject *fromlist, char *buf, Py_ssize_t buflen,
                     PyUnicode_GetSize(item), NULL);
         }
         else {
-            item8 = PyUnicode_AsEncodedString(item,
-                Py_FileSystemDefaultEncoding, NULL);
+            item8 = PyUnicode_AsEncodedString(item,
+                Py_FileSystemDefaultEncoding, NULL);
         }
         if (!item8) {
             PyErr_SetString(PyExc_ValueError, "Cannot encode path item");
             return 0;
         }
-        subname = PyBytes_AsString(item8);
+        subname = PyString_AS_STRING(item8);
         if (buflen + strlen(subname) >= MAXPATHLEN) {
             PyErr_SetString(PyExc_ValueError,
                             "Module name too long");
diff --git a/Python/mactoolboxglue.c b/Python/mactoolboxglue.c
index 0714cff..454553e 100644
--- a/Python/mactoolboxglue.c
+++ b/Python/mactoolboxglue.c
@@ -194,7 +194,7 @@ PyObject *
 PyMac_BuildOSType(OSType t)
 {
     uint32_t tmp = htonl((uint32_t)t);
-    return PyBytes_FromStringAndSize((char *)&tmp, 4);
+    return PyString_FromStringAndSize((char *)&tmp, 4);
 }

 /* Convert an NumVersion value to a 4-element tuple */
diff --git a/Python/marshal.c b/Python/marshal.c
index 5cc0fb8..a40aecc 100644
--- a/Python/marshal.c
+++ b/Python/marshal.c
@@ -36,8 +36,6 @@
 #define TYPE_BINARY_COMPLEX 'y'
 #define TYPE_LONG 'l'
 #define TYPE_STRING 's'
-#define TYPE_INTERNED 't'
-#define TYPE_STRINGREF 'R'
 #define TYPE_TUPLE '('
 #define TYPE_LIST '['
 #define TYPE_DICT '{'
@@ -231,31 +229,7 @@ w_object(PyObject *v, WFILE *p)
     }
 #endif
     else if (PyString_Check(v)) {
-        if (p->strings && PyString_CHECK_INTERNED(v)) {
-            PyObject *o = PyDict_GetItem(p->strings, v);
-            if (o) {
-                long w = PyInt_AsLong(o);
-                w_byte(TYPE_STRINGREF, p);
-                w_long(w, p);
-                goto exit;
-            }
-            else {
-                int ok;
-                o = PyInt_FromSsize_t(PyDict_Size(p->strings));
-                ok = o &&
-                     PyDict_SetItem(p->strings, v, o) >= 0;
-                Py_XDECREF(o);
-                if (!ok) {
-                    p->depth--;
-                    p->error = 1;
-                    return;
-                }
-                w_byte(TYPE_INTERNED, p);
-            }
-        }
-        else {
-            w_byte(TYPE_STRING, p);
-        }
+        w_byte(TYPE_STRING, p);
         n = PyString_GET_SIZE(v);
         if (n > INT_MAX) {
             /* huge strings are not supported */
@@ -275,14 +249,14 @@ w_object(PyObject *v, WFILE *p)
             return;
         }
         w_byte(TYPE_UNICODE, p);
-        n = PyBytes_GET_SIZE(utf8);
+        n = PyString_GET_SIZE(utf8);
         if (n > INT_MAX) {
             p->depth--;
             p->error = 1;
             return;
         }
         w_long((long)n, p);
-        w_string(PyBytes_AS_STRING(utf8), (int)n, p);
+        w_string(PyString_AS_STRING(utf8), (int)n, p);
         Py_DECREF(utf8);
     }
     else if (PyTuple_Check(v)) {
@@ -389,7 +363,6 @@ w_object(PyObject *v, WFILE *p)
         w_byte(TYPE_UNKNOWN, p);
         p->error = 1;
     }
-   exit:
     p->depth--;
 }

@@ -703,7 +676,6 @@ r_object(RFILE *p)
         }
 #endif

-    case TYPE_INTERNED:
     case TYPE_STRING:
         n = r_long(p);
         if (n < 0 || n > INT_MAX) {
@@ -723,25 +695,6 @@ r_object(RFILE *p)
             retval = NULL;
             break;
         }
-        if (type == TYPE_INTERNED) {
-            PyString_InternInPlace(&v);
-            if (PyList_Append(p->strings, v) < 0) {
-                retval = NULL;
-                break;
-            }
-        }
-        retval = v;
-        break;
-
-    case TYPE_STRINGREF:
-        n = r_long(p);
-        if (n < 0 || n >= PyList_GET_SIZE(p->strings)) {
-            PyErr_SetString(PyExc_ValueError, "bad marshal data");
-            retval = NULL;
-            break;
-        }
-        v = PyList_GET_ITEM(p->strings, n);
-        Py_INCREF(v);
         retval = v;
         break;

diff --git a/Python/modsupport.c b/Python/modsupport.c
index 8f25ed2..144fb4f 100644
--- a/Python/modsupport.c
+++ b/Python/modsupport.c
@@ -504,7 +504,7 @@ do_mkvalue(const char **p_format, va_list *p_va, int flags)
                 }
                 n = (Py_ssize_t)m;
             }
-            v = PyBytes_FromStringAndSize(str, n);
+            v = PyString_FromStringAndSize(str, n);
         }
         return v;
     }
diff --git a/Python/pythonrun.c b/Python/pythonrun.c
index ec9ed02..0b3935a 100644
--- a/Python/pythonrun.c
+++ b/Python/pythonrun.c
@@ -75,6 +75,7 @@ int Py_VerboseFlag; /* Needed by import.c */
 int Py_InteractiveFlag; /* Needed by Py_FdIsInteractive() below */
 int Py_InspectFlag; /* Needed to determine whether to exit at SystemError */
 int Py_NoSiteFlag; /* Suppress 'import site' */
+int Py_BytesWarningFlag; /* Warn on str(bytes) and str(buffer) */
 int Py_UseClassExceptionsFlag = 1; /* Needed by bltinmodule.c: deprecated */
 int Py_FrozenFlag; /* Needed by getpath.c */
 int Py_IgnoreEnvironmentFlag; /* e.g. PYTHONPATH, PYTHONHOME */
@@ -234,6 +235,7 @@ Py_InitializeEx(int install_sigs)
     if (pstderr == NULL)
         Py_FatalError("Py_Initialize: can't set preliminary stderr");
     PySys_SetObject("stderr", pstderr);
+    PySys_SetObject("__stderr__", pstderr);

     _PyImport_Init();

@@ -261,8 +263,28 @@ Py_InitializeEx(int install_sigs)
 #endif /* WITH_THREAD */

     warnings_module = PyImport_ImportModule("warnings");
-    if (!warnings_module)
+    if (!warnings_module) {
         PyErr_Clear();
+    }
+    else {
+        PyObject *o;
+        char *action[8];
+
+        if (Py_BytesWarningFlag > 1)
+            *action = "error";
+        else if (Py_BytesWarningFlag)
+            *action = "default";
+        else
+            *action = "ignore";
+
+        o = PyObject_CallMethod(warnings_module,
+                                "simplefilter", "sO",
+                                *action, PyExc_BytesWarning);
+        if (o == NULL)
+            Py_FatalError("Py_Initialize: can't initialize"
+                          "warning filter for BytesWarning.");
+        Py_DECREF(o);
+    }

 #if defined(HAVE_LANGINFO_H) && defined(CODESET)
     /* On Unix, set the file system encoding according to the
@@ -743,6 +765,7 @@ initstdio(void)
     PySys_SetObject("stdout", std);
     Py_DECREF(std);

+#if 1 /* Disable this if you have trouble debugging bootstrap stuff */
     /* Set sys.stderr, replaces the preliminary stderr */
     if (!(std = PyFile_FromFd(fileno(stderr), "<stderr>", "w", -1,
                               NULL, "\n", 0))) {
@@ -751,6 +774,7 @@ initstdio(void)
     PySys_SetObject("__stderr__", std);
     PySys_SetObject("stderr", std);
     Py_DECREF(std);
+#endif

     if (0) {
   error:
@@ -1339,7 +1363,7 @@ PyRun_StringFlags(const char *str, int start, PyObject *globals,
     PyArena *arena = PyArena_New();
     if (arena == NULL)
         return NULL;
-
+
     mod = PyParser_ASTFromString(str, "<string>", start, flags, arena);
     if (mod != NULL)
         ret = run_mod(mod, "<string>", globals, locals, flags, arena);
@@ -1356,7 +1380,7 @@ PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals,
     PyArena *arena = PyArena_New();
     if (arena == NULL)
         return NULL;
-
+
     mod = PyParser_ASTFromFile(fp, filename, NULL, start, 0, 0,
                                flags, NULL, arena);
     if (closeit)
@@ -1705,7 +1729,7 @@ void _Py_PyAtExit(void (*func)(void))
 static void
 call_py_exitfuncs(void)
 {
-    if (pyexitfunc == NULL)
+    if (pyexitfunc == NULL)
         return;

     (*pyexitfunc)();
diff --git a/Python/sysmodule.c b/Python/sysmodule.c
index d140fe3..ffaa596 100644
--- a/Python/sysmodule.c
+++ b/Python/sysmodule.c
@@ -225,14 +225,9 @@ static PyObject *
 sys_intern(PyObject *self, PyObject *args)
 {
     PyObject *s;
-    if (!PyArg_ParseTuple(args, "S:intern", &s))
+    if (!PyArg_ParseTuple(args, "U:intern", &s))
         return NULL;
-    if (PyString_CheckExact(s)) {
-        Py_INCREF(s);
-        PyString_InternInPlace(&s);
-        return s;
-    }
-    else if (PyUnicode_CheckExact(s)) {
+    if (PyUnicode_CheckExact(s)) {
         Py_INCREF(s);
         PyUnicode_InternInPlace(&s);
         return s;
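
Reader's note on the Python/pythonrun.c hunk: the new block in Py_InitializeEx() chooses a warnings filter action for BytesWarning from the value of Py_BytesWarningFlag. The sketch below isolates that selection so the three cases are easy to see. The function name bytes_warning_action is hypothetical and does not appear in the patch or in CPython; only the comparison order and the three action strings are taken from the diff.

```c
/* Hypothetical helper (not part of CPython): mirrors the if/else chain
 * that the patch adds to Py_InitializeEx() to pick a filter action. */
const char *
bytes_warning_action(int bytes_warning_flag)
{
    if (bytes_warning_flag > 1)
        return "error";     /* flag of 2 or more: warning becomes an exception */
    else if (bytes_warning_flag)
        return "default";   /* any other nonzero flag: normal warning reporting */
    else
        return "ignore";    /* flag is 0: BytesWarning is suppressed */
}
```

In the patch itself, the selected string is passed together with PyExc_BytesWarning to warnings.simplefilter through PyObject_CallMethod(), and a failure of that call is treated as fatal.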