| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
|
|
|
|
| |
tests, it was missing there.)
|
|
|
|
|
|
| |
__unicode__ method (and not the __str__ method).
Simplify also the testcase.
|
| |
|
|
|
|
| |
effectively raised. A new utility ``check_py3k_warnings`` deals with py3k warnings.
|
|
|
|
| |
=> raise an UnicodeDecodeError. Patch written by Ezio Melotti.
|
| |
|
|
|
|
| |
`rindex`, `rsplit` and `rpartition` methods. Patch by Florent Xicluna.
|
|
|
|
|
|
|
| |
any errors that might occur during coercion of the left operand and
turning them into a TypeError with a message text that was confusing in
the given context. This patch lets any errors through, as was already
done during coercion of the right hand side.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
sequences.
Patch by Nick Barnes and Victor Stinner.
|
|
|
|
| |
to the original backport from py3k.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For simple uses for str.format(), this makes the typing easier. Hopfully this
will help in the adoption of str.format().
For example:
'The {} is {}'.format('sky', 'blue')
You can mix and matcth auto-numbering and named replacement fields:
'The {} is {color}'.format('sky', color='blue')
But you can't mix and match auto-numbering and specified numbering:
'The {0} is {}'.format('sky', 'blue')
ValueError: cannot switch from manual field specification to automatic field numbering
Will port to 3.1.
|
|
|
|
| |
Reviewed by Benjamin Peterson on IRC.
|
| |
|
|
|
|
|
|
|
|
|
| |
the freelist contained half-initialized objects with freed pointers.
The comment
/* XXX UNREF/NEWREF interface should be more symmetrical */
was copied from tupleobject.c, and appears in some other places.
I sign the petition.
|
| |
|
|
|
|
|
| |
Corrected the raw-unicode-escape codec to use UTF-16 surrogates in
this case, just like the unicode-escape codec.
|
| |
|
|
|
|
| |
case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Highlights:
- Adding PyObject_Format.
- Adding string.Format class.
- Adding __format__ for str, unicode, int, long, float, datetime.
- Adding builtin format.
- Adding ''.format and u''.format.
- str/unicode fixups for formatters.
The files in Objects/stringlib that implement PEP 3101 (stringdefs.h,
unicodedefs.h, formatter.h, string_format.h) are identical in trunk
and py3k. Any changes from here on should be made to trunk, and
changes will propogate to py3k).
|
| |
|
| |
|
|
|
|
|
|
|
| |
a large width is passed on 32-bit platforms. Found by Google.
It would be good for people to review this especially carefully and verify
I don't have an off by one error and there is no other way to cause overflow.
|
|
|
|
| |
run_unittest() and run_suite()). Also, add functionality to run_unittest() that admits usage of unittest.TestLoader.loadTestsFromModule().
|
|
|
|
|
|
| |
a unicode string in a build with wide unicode (UCS-4) support.
This code could be improved, so add an XXX comment.
|
| |
|
|
|
|
|
| |
On the way, add a decorator to test_support to facilitate running single
test functions in different locales with automatic cleanup.
|
| |
|
| |
|
|
|
|
| |
just like string codecs.
|
|
|
|
|
|
| |
so it is only executed once. Otherwise the same search function is
repeated added to the codec search path when regrtest is run with -R
and leaks are reported.
|
|
|
|
|
| |
unicode instance if the argument is not an instance of basestring and
calling __str__ on the argument returns a unicode instance.
|
|
|
|
|
|
|
| |
conversion using the proper magic slot (e.g., __int__()). Also move conversion
code out of PyNumber_*() functions in the C API into the nb_* function.
Applied patch #1109424. Thanks Walter Doewald.
|
|
|
|
|
|
| |
it can be used for str and unicode. Drop the test for
"".join([s]) is s
because this is an implementation detail (and doesn't work for unicode)
|
|
|
|
|
|
|
|
|
|
|
|
| |
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
def width(u):
w = 0
for c in unicodedata.normalize('NFC', u):
cwidth = unicodedata.east_asian_width(c)
if cwidth in ('W', 'F'): w += 2
else: w += 1
return w
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
|
|
|
|
| |
characters instead of character pointers to determine space requirements.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
and test_support.run_classtests() into run_unittest()
and use it wherever possible.
Also don't use "from test.test_support import ...", but
"from test import test_support" in a few spots.
From SF patch #662807.
|