path: root/Objects/unicodeobject.c
Commit message [Author, Age, Files, Lines]
...
| * | Issue #16856: Fix a segmentation fault from calling repr() on a dict with a key whose repr raises an exception. [Serhiy Storchaka, 2013-01-04, 1 file, -1/+1]
| * | (Merge 3.2) Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announce an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice. [Victor Stinner, 2013-01-03, 1 file, -4/+4]
| |\ \
| | |/
| | * Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announce an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice. [Victor Stinner, 2013-01-03, 1 file, -4/+4]
* | | Close #16281: handle tailmatch() failure and remove the useless comment "honor direction and do a forward or backwards search": the runtime speed may be different, but I consider that it doesn't really matter in practice. The direction was never honored before: Python 2.7 uses memcmp() for the str type, for example. [Victor Stinner, 2013-01-03, 1 file, -2/+9]
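The commit argues that the search direction does not matter for a tail match, since the whole substring is compared at once. A minimal sketch of that idea for 1-byte-per-character buffers follows; `tailmatch_ucs1()` and its `direction` convention (negative for a prefix check like startswith, otherwise a suffix check like endswith) are hypothetical and not CPython's internal signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if sub matches s[start:end] at its
   beginning (direction < 0, like startswith) or at its end
   (direction >= 0, like endswith), else 0. Buffers hold 1-byte chars. */
static int
tailmatch_ucs1(const unsigned char *s, size_t len,
               const unsigned char *sub, size_t sublen,
               size_t start, size_t end, int direction)
{
    size_t offset;

    /* Clamp the slice to the string, as slicing does. */
    if (end > len)
        end = len;
    if (start > end)
        return 0;
    if (sublen > end - start)
        return 0;                 /* substring longer than the slice */
    if (sublen == 0)
        return 1;                 /* empty substring always matches */

    offset = (direction < 0) ? start : end - sublen;
    /* One memcmp() covers both directions: no per-character loop, so
       "forward" versus "backward" makes no observable difference. */
    return memcmp(s + offset, sub, sublen) == 0;
}
```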
* | | Issue #16719: Get rid of WindowsError. Use OSError instead. Patch by Serhiy Storchaka. [Andrew Svetlov, 2012-12-19, 1 file, -5/+5]
* | | Fix the internals of our hash functions to use unsigned values during hash computation, as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv, which forces overflow to be defined as two's complement, but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. We could work to get rid of the -fwrapv requirement in 3.4, but that requires more planning. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only: no functionality or hash value changes. [Gregory P. Smith, 2012-12-11, 1 file, -1/+1]
|\ \ \
| |/ /
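To illustrate why the hash internals were switched to unsigned arithmetic: in C, unsigned integers wrap around modulo 2^N by definition, while signed overflow is undefined unless the compiler is told otherwise (e.g. with -fwrapv). The sketch below uses FNV-1a (64-bit) purely as a stand-in multiply-and-xor hash; it is not CPython's string hash.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a (64-bit), used here only to illustrate the point of the commit:
   all arithmetic is done on an unsigned type, so the multiplication is
   allowed to wrap around modulo 2**64. With a signed type, the same
   overflow would be undefined behavior unless compiled with -fwrapv. */
static uint64_t
fnv1a_64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;               /* FNV prime; wraps, well defined */
    }
    return h;
}
```

UBSan flags signed overflow but not unsigned wraparound, which is why this kind of change silences it without altering any hash values.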
| * | Fix the internals of our hash functions to use unsigned values during hash computation, as the overflow behavior of signed integers is undefined. NOTE: This change is smaller compared to 3.2 as much of this cleanup had already been done. I added the comment that my change in 3.2 added so that the code would match up. Otherwise this just adds or synchronizes appropriate UL designations on some constants to be pedantic. In practice we require compiling everything with -fwrapv, which forces overflow to be defined as two's complement, but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only: no functionality or hash value changes. [Gregory P. Smith, 2012-12-11, 1 file, -1/+1]
| |\ \
| | |/
| | * Fix the internals of our hash functions to use unsigned values during hash computation, as the overflow behavior of signed integers is undefined. In practice we require compiling everything with -fwrapv, which forces overflow to be defined as two's complement, but this keeps the code cleaner for checkers or in the case where someone has compiled it without -fwrapv or their compiler's equivalent. Found by Clang trunk's Undefined Behavior Sanitizer (UBSan). Cleanup only: no functionality or hash value changes. [Gregory P. Smith, 2012-12-11, 1 file, -1/+1]
| * | (Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with the os.fsencode() and os.fsdecode() functions, which already use UTF-8/surrogateescape. [Victor Stinner, 2012-12-03, 1 file, -4/+5]
| |\ \
| | |/
| | * Issue #16416: On Mac OS X, operating system data are now always encoded/decoded to/from UTF-8/surrogateescape, instead of the locale encoding (which may be ASCII if no locale environment variable is set), to avoid inconsistencies with the os.fsencode() and os.fsdecode() functions, which already use UTF-8/surrogateescape. [Victor Stinner, 2012-12-03, 1 file, -4/+5]
* | | Cleanup unicodeobject.c [Victor Stinner, 2012-12-04, 1 file, -9/+12]
| | |   * Remove micro-optimization: (errors == "surrogateescape" || strcmp(errors, "surrogateescape") == 0). Only use strcmp()
| | |   * Initialize 'arg' members in unicode_format_arg() to help the compiler diagnose real bugs and also make the code simpler to read
* | | Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announce an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice. [Victor Stinner, 2012-12-04, 1 file, -11/+13]
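The "ASCII/surrogateescape" combination means: decode ASCII bytes normally and smuggle every other byte through as a lone surrogate, which is the scheme of Python's surrogateescape error handler (PEP 383). A minimal sketch of both directions follows; the function names are hypothetical and the code is not CPython's decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes 0x00-0x7F decode to the same code point; every other byte b
   (0x80-0xFF), which is invalid ASCII, becomes the lone surrogate
   U+DC00 + b, i.e. a code point in U+DC80..U+DCFF.
   `out` must have room for `len` code points. */
static void
decode_ascii_surrogateescape(const unsigned char *in, size_t len,
                             uint32_t *out)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        out[i] = (b < 0x80) ? (uint32_t)b : 0xDC00u + b;
    }
}

/* Reverse mapping used when encoding back to bytes: returns 0 and stores
   the byte, or -1 if the code point is neither ASCII nor an escaped byte. */
static int
encode_ascii_surrogateescape(uint32_t cp, unsigned char *byte)
{
    if (cp < 0x80) {
        *byte = (unsigned char)cp;
        return 0;
    }
    if (cp >= 0xDC80 && cp <= 0xDCFF) {
        *byte = (unsigned char)(cp - 0xDC00);
        return 0;
    }
    return -1;
}
```

Because every byte maps to exactly one code point and back, command line arguments decoded this way re-encode to the original bytes, which is what keeps them consistent with os.fsencode().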
* | | Issue #16215: Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka. [Antoine Pitrou, 2012-11-17, 1 file, -0/+2]
|\ \ \
| |/ /
| * | Issue #16215: Fix potential double memory free in str.replace(). Patch by Serhiy Storchaka. [Antoine Pitrou, 2012-11-17, 1 file, -0/+2]
* | | Issue #16416: Fix error handling in the _Py_wchar2char() and _Py_char2wchar() functions [Victor Stinner, 2012-11-12, 1 file, -4/+5]
| | |
* | | Close #16311: Use the _PyUnicodeWriter API in text decoders [Victor Stinner, 2012-11-06, 1 file, -324/+328]
| | |   * Remove unicode_widen(): replaced with _PyUnicodeWriter_Prepare()
| | |   * Remove unicode_putchar(): replaced with PyUnicodeWriter_Prepare() + PyUnicode_WRITER()
| | |   * When handling a decoding error, only overallocate the buffer by +25% instead of +100%
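The last bullet changes how much spare room the writer reserves when it has to grow. A minimal sketch of that growth policy, assuming a hypothetical UCS4 buffer type `writer_t` (CPython's real _PyUnicodeWriter also tracks the string kind and is more involved): when space runs out, the new capacity is the required size plus 25%, not double.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer of code points. */
typedef struct {
    uint32_t *data;
    size_t len;       /* code points written so far */
    size_t cap;       /* allocated capacity, in code points */
} writer_t;

/* Ensure room for `extra` more code points. Returns 0 on success, -1 on
   allocation failure or size overflow. */
static int
writer_prepare(writer_t *w, size_t extra)
{
    const size_t max = SIZE_MAX / sizeof(uint32_t);
    size_t need, newcap;
    uint32_t *p;

    if (extra > max - w->len)
        return -1;                        /* size would overflow */
    need = w->len + extra;
    if (need <= w->cap)
        return 0;                         /* already enough room */
    newcap = need;
    if (need <= max - need / 4)
        newcap = need + need / 4;         /* overallocate by +25% */
    p = realloc(w->data, newcap * sizeof(uint32_t));
    if (p == NULL)
        return -1;
    w->data = p;
    w->cap = newcap;
    return 0;
}

static int
writer_write_char(writer_t *w, uint32_t ch)
{
    if (writer_prepare(w, 1) < 0)
        return -1;
    w->data[w->len++] = ch;
    return 0;
}
```

A smaller growth factor wastes less memory when only a few extra characters are needed (for example, one U+FFFD per error), at the cost of possibly more realloc() calls.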
* | | #8271: merge with 3.3. [Ezio Melotti, 2012-11-04, 1 file, -6/+4]
|\ \ \
| |/ /
| * | #8271: the UTF-8 decoder now outputs the correct number of U+FFFD characters when used with the "replace" error handler on invalid UTF-8 sequences. Patch by Serhiy Storchaka, tests by Ezio Melotti. [Ezio Melotti, 2012-11-04, 1 file, -6/+4]
* | | merge 3.3 (#16369) [Benjamin Peterson, 2012-10-31, 1 file, -0/+6]
|\ \ \
| |/ /
| * | merge 3.2 (#16369) [Benjamin Peterson, 2012-10-31, 1 file, -0/+6]
| |\ \
| | |/
| | * initialize more global type objects (closes #16369) [Benjamin Peterson, 2012-10-31, 1 file, -0/+6]
| | |
| | * Issue #14700: Fix buggy overflow checks for large precision and width in new-style and old-style formatting. [Mark Dickinson, 2012-10-28, 1 file, -2/+2]
* | | Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster. Patch written by Serhiy Storchaka. [Victor Stinner, 2012-10-30, 1 file, -73/+69]
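The commit does not describe how the 3x to 4x speedup was achieved, so the sketch below is only a plain reference decoder for little-endian UTF-32 showing what the decoder must check, not the optimized version. It rejects surrogates, as strict UTF-32 requires; whether CPython's decoder did so at the time of this commit is not stated here. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode little-endian UTF-32 into code points. Returns the number of
   code points written, or -1 if the input length is not a multiple of 4
   or a unit is not a valid Unicode scalar value (above U+10FFFF or a
   surrogate). `out` must hold at least len / 4 entries. */
static ptrdiff_t
decode_utf32_le(const unsigned char *in, size_t len, uint32_t *out)
{
    if (len % 4 != 0)
        return -1;                        /* truncated final code unit */
    for (size_t i = 0; i < len / 4; i++) {
        const unsigned char *q = in + 4 * i;
        /* Assemble the unit byte by byte, independent of host endianness. */
        uint32_t ch = (uint32_t)q[0]
                    | ((uint32_t)q[1] << 8)
                    | ((uint32_t)q[2] << 16)
                    | ((uint32_t)q[3] << 24);
        if (ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
            return -1;                    /* not a Unicode scalar value */
        out[i] = ch;
    }
    return (ptrdiff_t)(len / 4);
}
```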
* | | Issue #16330: Use surrogate-related macros. Patch written by Serhiy Storchaka. [Victor Stinner, 2012-10-30, 1 file, -4/+3]
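Surrogate-related macros replace repeated open-coded range checks and bit arithmetic. CPython's own macros (such as Py_UNICODE_IS_SURROGATE()) live in unicodeobject.h; the macros below are hypothetical stand-ins that show the arithmetic such helpers perform for UTF-16 surrogate pairs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for surrogate helper macros. */
#define IS_SURROGATE(ch)      (0xD800 <= (ch) && (ch) <= 0xDFFF)
#define IS_HIGH_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDBFF)
#define IS_LOW_SURROGATE(ch)  (0xDC00 <= (ch) && (ch) <= 0xDFFF)

/* Combine a high/low surrogate pair into one code point (U+10000..U+10FFFF):
   each surrogate carries 10 bits of the offset from U+10000. */
#define JOIN_SURROGATES(high, low) \
    (((((uint32_t)(high) & 0x03FF) << 10) | ((uint32_t)(low) & 0x03FF)) + 0x10000)

/* Split a supplementary code point back into its surrogate pair. */
#define HIGH_SURROGATE(ch) \
    ((uint16_t)(0xD800 - (0x10000 >> 10) + ((uint32_t)(ch) >> 10)))
#define LOW_SURROGATE(ch) \
    ((uint16_t)(0xDC00 + ((uint32_t)(ch) & 0x03FF)))
```

For example, U+1F600 splits into the pair U+D83D, U+DE00, and joining that pair gives U+1F600 again.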
* | | Replace usage of the deprecated Py_UNICODE_COPY() with Py_MEMCPY() in resize_copy() [Victor Stinner, 2012-10-23, 1 file, -2/+2]
* | | Optimize _PyUnicode_HasNULChars(): use findchar() instead of PyUnicode_Contains() [Victor Stinner, 2012-10-23, 1 file, -7/+9]
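Checking for U+0000 only needs a single-character search, not the general substring search that PyUnicode_Contains() performs. A minimal sketch, assuming PEP 393-style storage where `kind` is 1, 2 or 4 bytes per character; `has_nul_chars()` is hypothetical and is not CPython's findchar().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the string contains U+0000, 0 if not, -1 for a bad kind.
   For 1-byte strings memchr() does the search in one optimized call;
   wider kinds use a simple loop over code units. */
static int
has_nul_chars(const void *data, int kind, size_t len)
{
    size_t i;
    switch (kind) {
    case 1:
        return memchr(data, 0, len) != NULL;
    case 2: {
        const uint16_t *p = data;
        for (i = 0; i < len; i++)
            if (p[i] == 0)
                return 1;
        return 0;
    }
    case 4: {
        const uint32_t *p = data;
        for (i = 0; i < len; i++)
            if (p[i] == 0)
                return 1;
        return 0;
    }
    default:
        return -1;   /* unsupported kind */
    }
}
```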
* | | Inline raise_translate_exception(): it is only used once [Victor Stinner, 2012-10-23, 1 file, -15/+4]
| | |
* | | Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp() [Victor Stinner, 2012-10-23, 1 file, -26/+44]
| | |
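Equality can always use memcmp() under PEP 393 because each string is stored in the narrowest kind (1, 2 or 4 bytes per character) that fits its widest character: two equal strings must have the same kind and length, so their raw buffers are byte-for-byte identical. A minimal sketch with a hypothetical `ustr` view type, including the identity shortcut also mentioned in this log:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a PEP 393-style string. */
typedef struct {
    const void *data;
    size_t length;   /* in characters */
    int kind;        /* bytes per character: 1, 2 or 4 */
} ustr;

static int
ustr_eq(const ustr *a, const ustr *b)
{
    if (a == b)
        return 1;                          /* same object: trivially equal */
    /* With canonical (narrowest) storage, a kind mismatch already
       implies the strings differ. */
    if (a->length != b->length || a->kind != b->kind)
        return 0;
    /* Compare the raw buffers in one call, whatever the kind. */
    return memcmp(a->data, b->data, a->length * (size_t)a->kind) == 0;
}
```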
* | | Issue #16166: Add PY_LITTLE_ENDIAN and PY_BIG_ENDIAN macros and unified endianness detection and handling. [Christian Heimes, 2012-10-17, 1 file, -13/+5]
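PY_LITTLE_ENDIAN and PY_BIG_ENDIAN are compile-time macros, so the sketch below is not how CPython defines them; it only illustrates, with hypothetical helpers, what byte order means for a codec such as UTF-16 or UTF-32: which byte of a multi-byte unit comes first in memory, and how a unit is swapped when the data's byte order differs from the host's.

```c
#include <stdint.h>
#include <string.h>

/* Runtime check: store a known 32-bit value and inspect the byte at the
   lowest address. Returns 1 on a little-endian host, 0 otherwise. */
static int
is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);    /* lowest-addressed byte */
    return first == 1;
}

/* Byte-swap a 32-bit value, e.g. to read UTF-32-BE data on a
   little-endian host. */
static uint32_t
bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```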
* | | Issue #14783: Merge changes from 3.3. [Chris Jerdonek, 2012-10-07, 1 file, -1/+2]
|\ \ \
| |/ /
| * | Issue #14783: Merge changes from 3.2. [Chris Jerdonek, 2012-10-07, 1 file, -1/+2]
| |\ \
| | |/
| | * Issue #14783: Improve the int() docstring and also str(), range(), and slice(). This commit rewrites the docstring for int() to incorporate the documentation changes made in issue #16036. It also switches the docstrings for int(), str(), range(), and slice() to use multi-line signatures. [Chris Jerdonek, 2012-10-07, 1 file, -1/+2]
* | | Cleanup PyUnicode_FromFormatV() for zero padding. Skip the "0" instead of parsing it twice: once to detect zero padding and then again as a digit of the width. [Victor Stinner, 2012-10-06, 1 file, -1/+5]
* | | Issue #16147: PyUnicode_FromFormatV() no longer needs to allocate a buffer on the heap to format numbers. [Victor Stinner, 2012-10-06, 1 file, -46/+14]
* | | Issue #16147: PyUnicode_FromFormatV() now raises an error if the argument of '%c' is not in range(0x110000). [Victor Stinner, 2012-10-06, 1 file, -0/+5]
* | | Issue #16147: PyUnicode_FromFormatV() now detects integer overflow when parsing width and precision. [Victor Stinner, 2012-10-06, 1 file, -1/+11]
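Detecting overflow while parsing a width such as the "20" in "%20s" means checking before each multiply-and-add, since checking afterwards would already have triggered undefined signed overflow. A minimal sketch; `parse_width()` is hypothetical and not CPython's code.

```c
#include <limits.h>

/* Parse the decimal digits at the start of `f` into *width.
   Before appending each digit, verify that width * 10 + digit cannot
   exceed INT_MAX. Returns the number of characters consumed, or -1 if
   the value would overflow an int. */
static int
parse_width(const char *f, int *width)
{
    const char *p = f;
    int w = 0;
    while (*p >= '0' && *p <= '9') {
        int digit = *p - '0';
        /* w * 10 + digit <= INT_MAX  <=>  w <= (INT_MAX - digit) / 10 */
        if (w > (INT_MAX - digit) / 10)
            return -1;
        w = w * 10 + digit;
        p++;
    }
    *width = w;
    return (int)(p - f);
}
```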
* | | Issue #16147: Rewrite PyUnicode_FromFormatV() to use the _PyUnicodeWriter API [Victor Stinner, 2012-10-06, 1 file, -483/+331]
| | |   * Simplify the code: replace 4 steps with one unique step using the _PyUnicodeWriter API. PyUnicode_Format() has the same design. It avoids storing intermediate results, which requires allocating an array of pointers on the heap.
| | |   * Use the _PyUnicodeWriter API for speed (and its convenient API): overallocate the buffer to reduce the number of realloc() calls.
| | |   * Implement "width" and "precision" in Python; don't rely on sprintf(). This avoids the need for a temporary buffer allocated on the heap: only a small buffer allocated on the stack is used.
| | |   * Add the _PyUnicodeWriter_WriteCstr() function
| | |   * Split PyUnicode_FromFormatV() into two functions: add unicode_fromformat_arg().
| | |   * Inline parse_format_flags(): the format of an argument is now only parsed once, so a subfunction is no longer needed.
| | |   * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments: search for the next "%" and copy the substring in one chunk, instead of copying character by character.
* | | Issue #16096: Merge fixes from 3.3. [Mark Dickinson, 2012-10-06, 1 file, -14/+9]
|\ \ \
| |/ /
| * | Issue #16096: Fix several occurrences of potential signed integer overflow. Thanks Serhiy Storchaka. [Mark Dickinson, 2012-10-06, 1 file, -14/+9]
* | | In debug mode, unicode_write_cstr() now checks that non-ASCII characters are not written into an ASCII string [Victor Stinner, 2012-10-05, 1 file, -0/+8]
* | | #16127: merge with 3.3. [Ezio Melotti, 2012-10-05, 1 file, -10/+4]
|\ \ \
| |/ /
| * | #16127: remove outdated references to narrow builds. Patch by Serhiy Storchaka. [Ezio Melotti, 2012-10-05, 1 file, -10/+4]
| | |
| * | Fix PyUnicode_Format(): return NULL if PyUnicode_READY(uformat) failed. This error cannot occur in practice: PyUnicode_FromObject() always returns a "ready" string. [Victor Stinner, 2012-10-04, 1 file, -1/+3]
* | | Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings [Victor Stinner, 2012-10-04, 1 file, -8/+25]
| | |
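For ordering (not just equality), memcmp() gives the right answer for UCS1 strings because it compares bytes as unsigned char, and each byte is one code point. For 2- and 4-byte kinds on a little-endian host the low byte of each unit is stored first, so memcmp() order would not match code point order, which plausibly explains why the optimization is limited to UCS1. A minimal sketch; `compare_ucs1()` is hypothetical, not CPython's unicode_compare().

```c
#include <stddef.h>
#include <string.h>

/* Ordering comparison of two 1-byte-per-character (UCS1) strings.
   If the common prefix is equal, the shorter string sorts first.
   Returns -1, 0 or 1. */
static int
compare_ucs1(const unsigned char *a, size_t len_a,
             const unsigned char *b, size_t len_b)
{
    size_t n = len_a < len_b ? len_a : len_b;
    int cmp = memcmp(a, b, n);      /* unsigned byte order == code point order */
    if (cmp != 0)
        return cmp < 0 ? -1 : 1;
    if (len_a == len_b)
        return 0;
    return len_a < len_b ? -1 : 1;
}
```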
* | | Also enable the ptr==ptr optimization in PyUnicode_Compare(). It was already implemented in PyUnicode_RichCompare(). [Victor Stinner, 2012-10-04, 1 file, -4/+5]
* | | unicode_result_wchar(): move the assert() to the "#ifdef Py_DEBUG" block [Victor Stinner, 2012-10-04, 1 file, -3/+3]
| | |
* | | Split the huge PyUnicode_Format() function (+540 lines) into subfunctions [Victor Stinner, 2012-10-04, 1 file, -472/+605]
| | |
* | | PyUnicode_Format(): disable overallocation when we are writing the last part of the output string [Victor Stinner, 2012-10-03, 1 file, -1/+3]
* | | Unicode: resize_compact() and resize_inplace() also fill the Unicode strings with invalid bytes in debug mode, as done by PyUnicode_New() [Victor Stinner, 2012-10-03, 1 file, -5/+33]
* | | Issue #15609: Fix refleak introduced by my last optimization [Victor Stinner, 2012-10-02, 1 file, -1/+4]
| | |