| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
|
| |
|
|
|
|
|
|
|
|
|
| |
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performances, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
|
|
|
|
| |
uppercasing it (GH-12804)
|
| |
|
|
|
|
| |
Add also tests for PyUnicode_FromFormat() and PyBytes_FromFormat()
with empty result.
|
|
|
| |
This reverts commit 886483e2b9bbabf60ab769683269b873381dd5ee.
|
|
|
|
|
|
|
|
|
| |
* Add %T format to PyUnicode_FromFormatV(), and so to
PyUnicode_FromFormat() and PyErr_Format(), to format an object type
name: equivalent to "%s" with Py_TYPE(obj)->tp_name.
* Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c.
* Add unit test on %T format.
* Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8(), to make the intent more explicit.
|
|
|
|
|
|
|
| |
starting with "+". (GH-8741)
The UTF-7 decoder now raises UnicodeDecodeError for ill-formed
sequences starting with "+" (as specified in RFC 2152).
|
| |
|
|
|
|
|
| |
in int(), float() and complex() parsers.
This also speeds up parsing non-ASCII numbers by around 20%.
|
|
|
| |
Previously any exception was replaced with a KeyError exception.
|
|
|
| |
Make also minor PEP8 coding style fixes on modified imports.
|
|
|
|
|
|
|
|
| |
operations (#51)
When you use `'%s' % SubClassOfStr()`, where `SubClassOfStr.__rmod__` exists, the reverse operation is ignored as normally such string formatting operations use the `PyUnicode_Format()` fast path. This patch tests for subclasses of `str` first and picks the slow path in that case.
Patch by Martijn Pieters.
|
|\ |
|
| |\ |
|
| | | |
|
| | | |
|
|/ / |
|
|\ \
| |/ |
|
| | |
|
| |
| |
| |
| | |
escapes. Backport to 3.6.
|
|\ \
| |/ |
|
| |
| |
| |
| | |
Patch by Xiang Zhang.
|
| |
| |
| |
| | |
Reported by Xiang Zhang.
|
|\ \
| |/
| |
| | |
Original patch by Xiang Zhang.
|
| |
| |
| |
| | |
Original patch by Xiang Zhang.
|
|\ \
| |/ |
|
| | |
|
| |
| |
| |
| | |
Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.
|
| |
| |
| |
| |
| |
| |
| | |
And most of the tools.
Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
|
|/
|
|
|
|
|
|
|
|
| |
- Issue #25958: Support "anti-registration" of special methods from
various ABCs, like __hash__, __iter__ or __len__. All these (and
several more) can be set to None in an implementation class and the
behavior will be as if the method is not defined at all.
(Previously, this mechanism existed only for __hash__, to make
mutable classes unhashable.) Code contributed by Andrew Barnert and
Ivan Levkivskyi.
|
|
|
|
| |
This affects documentation, code comments, and a debugging messages.
|
|
|
|
| |
This eliminates a few redundant test cases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ByteArrayAsStringTest.fixtype() was converting test data to bytes, not byte-
array, therefore many of the test cases inherited in this class were not
actually being run on the bytearray type.
The tests in buffer_tests.py were redundant with methods in string_tests
.MixinStrUnicodeUserStringTest and string_tests.CommonTest. These methods are
now moved into string_tests.BaseTest, where they will also get run for bytes
and bytearray.
This change also moves test_additional_split(), test_additional_rsplit(), and
test_strip() from CommonTest to BaseTest, meaning these tests are now run for
bytes and bytearray. I plan to eliminate redundancies with existing tests in
test_bytes.py soon.
|
|
|
|
|
|
| |
Affected classes are generic sequence iterators, iterators of str, bytes,
bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding
views and os.scandir() iterator.
|
|
|
|
| |
Initialize i variable if the string is non-ASCII.
|
|
|
|
|
|
| |
Issue #26464: Fix str.translate() when string is ASCII and first replacements
removes character, but next replacement uses a non-ASCII character or a string
longer than 1 character. Regression introduced in Python 3.5.0.
|
|\ |
|
| | |
|
|\ \
| |/
| |
| |
| | |
__bytes__, __trunc__, and __float__ returning instances of subclasses of
bytes, int, and float to subclasses of bytes, int, and float correspondingly.
|
| |
| |
| |
| |
| | |
__bytes__, __trunc__, and __float__ returning instances of subclasses of
bytes, int, and float to subclasses of bytes, int, and float correspondingly.
|
|\ \
| |/ |
|
| | |
|
|\ \
| |/
| |
| |
| |
| |
| | |
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
3. In some circumstances the '\xfd' character was produced instead of the
replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).
|
| |
| |
| |
| |
| | |
1. Non-ASCII bytes were accepted after shift sequence.
2. A low surrogate could be emitted in case of error in high surrogate.
|
| | |
|
| | |
|
|\ \
| |/ |
|