path: root/Objects/unicodeobject.c
Commit message (author, date, files changed, lines removed/added)
* Issue #17237: Fix crash in the ASCII decoder on m68k. (Antoine Pitrou, 2013-05-11, 1 file, -0/+9)
|\
| * Issue #17237: Fix crash in the ASCII decoder on m68k. (Antoine Pitrou, 2013-05-11, 1 file, -0/+9)
| |
* | Fix uninitialized value in charmap_decode_mapping() (Victor Stinner, 2013-05-06, 1 file, -1/+1)
| |
* | Issue #7330: Implement width and precision (ex: "%5.3s") for the format string of PyUnicode_FromFormat() function, original patch written by Ysj Ray. (Victor Stinner, 2013-05-06, 1 file, -46/+109)
* | Partial revert of changeset 9744b2df134c (Victor Stinner, 2013-04-18, 1 file, -5/+4)
| |     PyUnicode_Append() cannot call resize_compact() directly: I forgot that a string can be ready *and* not compact (a legacy string can also be ready).
* | Split PyUnicode_DecodeCharmap() into subfunctions for readability (Victor Stinner, 2013-04-17, 1 file, -178/+213)
| |
* | Fix bug in Unicode decoders related to _PyUnicodeWriter (Victor Stinner, 2013-04-17, 1 file, -6/+14)
| |     Bug introduced by changesets 7ed9993d53b4 and edf029fc9591.
* | Fix typo in unicode_decode_call_errorhandler_writer() (Victor Stinner, 2013-04-17, 1 file, -1/+1)
| |     Bug introduced by changeset 7ed9993d53b4.
* | Close #17694: Add minimum length to _PyUnicodeWriter (Victor Stinner, 2013-04-17, 1 file, -54/+57)
| |     * Also add a min_char attribute to the _PyUnicodeWriter structure (currently unused)
| |     * _PyUnicodeWriter_Init() no longer takes any argument except the writer itself: min_length and overallocate must be set explicitly
| |     * In error handlers, only enable overallocation if the replacement string is longer than 1 character
| |     * CJK decoders don't use overallocation anymore
| |     * Set min_length, instead of preallocating memory using _PyUnicodeWriter_Prepare(), in many decoders
| |     * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
* | Cleanup PyUnicode_Contains() (Victor Stinner, 2013-04-14, 1 file, -11/+6)
| |     * No need to double-check that strings are ready: test already done by PyUnicode_FromObject()
| |     * Remove useless kind variable (use kind1 instead)
* | Minor change: fix character in do_strip() for the ASCII case (Victor Stinner, 2013-04-14, 1 file, -2/+2)
| |
* | Cleanup PyUnicode_Append() (Victor Stinner, 2013-04-14, 1 file, -18/+14)
| |     * Also check that right is a Unicode object
| |     * Call resize_compact() directly instead of unicode_resize() for more explicit error handling, and to avoid testing some properties twice (ex: unicode_modifiable())
* | PyUnicode_Join(): move use_memcpy test out of the loop to cleanup and optimize the code (Victor Stinner, 2013-04-14, 1 file, -20/+28)
* | Optimize repr(str): use _PyUnicode_FastCopyCharacters() when no character is escaped (Victor Stinner, 2013-04-14, 1 file, -69/+78)
* | Optimize ascii(str): don't encode/decode repr if repr is already ASCII (Victor Stinner, 2013-04-14, 1 file, -1/+1)
| |
* | Add _PyUnicodeWriter_WriteCharInline() (Victor Stinner, 2013-04-14, 1 file, -71/+35)
| |
* | Issue #16061: Speed up str.replace() for replacing 1-character strings. (Serhiy Storchaka, 2013-04-13, 1 file, -26/+38)
| |
* | Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of the legacy Py_UNICODE API. Also add a new _PyUnicodeWriter_WriteChar() function. (Victor Stinner, 2013-04-11, 1 file, -0/+10)
* | Issue #17615: On Windows (VS2010), the performance of wmemcmp() for comparing Unicode strings is not convincing. (Victor Stinner, 2013-04-09, 1 file, -9/+0)
| |     For UCS2 (16-bit wchar_t type), use a dummy loop instead of wmemcmp(). The dummy loop is as fast, or a little bit faster. wchar_t is only 16 bits long on Windows. wmemcmp() is still used for 32-bit wchar_t.
* | replace(): only call PyUnicode_DATA(u) once (Victor Stinner, 2013-04-09, 1 file, -3/+4)
| |
* | Write super-fast version of str.strip(), str.lstrip() and str.rstrip() for pure ASCII (Victor Stinner, 2013-04-09, 1 file, -19/+45)
* | Don't call macros in PyUnicode_WRITE() parameters (Victor Stinner, 2013-04-09, 1 file, -2/+10)
| |     PyUnicode_WRITE() expands some parameters twice or more.
* | Fix do_strip(): don't call PyUnicode_READ() in Py_UNICODE_ISSPACE(), to avoid calling it twice (Victor Stinner, 2013-04-09, 1 file, -3/+10)
* | Fix _PyUnicode_XStrip() (Victor Stinner, 2013-04-09, 1 file, -10/+18)
| |     Inline BLOOM_MEMBER() to call PyUnicode_READ() only once per loop iteration. Also store the length of the separator in a variable to avoid calls to PyUnicode_GET_LENGTH().
* | Optimize PyUnicode_DecodeCharmap() (Victor Stinner, 2013-04-09, 1 file, -7/+9)
| |     Avoid expensive PyUnicode_READ() and PyUnicode_WRITE(), manipulate pointers instead.
* | Optimize make_bloom_mask(), used by str.strip(), str.lstrip() and str.rstrip() (Victor Stinner, 2013-04-09, 1 file, -5/+27)
| |     Write specialized functions per Unicode kind to avoid the expensive PyUnicode_READ() macro.
* | Use PyUnicode_READ() instead of PyUnicode_READ_CHAR() (Victor Stinner, 2013-04-09, 1 file, -6/+22)
| |     "PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it calls PyUnicode_KIND() and might call it twice." according to its documentation.
* | Add fast-path in PyUnicode_DecodeCharmap() for pure 8 bit encodings: cp037, cp500 and iso8859_1 codecs (Victor Stinner, 2013-04-09, 1 file, -1/+26)
* | Issue #17615: Comparing two Unicode strings now uses wmemcmp() when possible (Victor Stinner, 2013-04-08, 1 file, -0/+22)
| |     wmemcmp() is twice as fast as a dummy loop (342 usec vs 744 usec) on Fedora 18/x86_64, GCC 4.7.2.
* | Issue #17615: Expand expensive PyUnicode_READ() macro in unicode_compare(): write specialized functions for each combination of Unicode kinds. (Victor Stinner, 2013-04-08, 1 file, -17/+77)
* | fix unused variable (Victor Stinner, 2013-04-03, 1 file, -1/+0)
| |
* | Close #16757: Avoid calling the expensive _PyUnicode_FindMaxChar() function when possible (Victor Stinner, 2013-04-03, 1 file, -7/+10)
* | Add _PyUnicodeWriter_WriteSubstring() function (Victor Stinner, 2013-04-02, 1 file, -9/+39)
| |     Write a function to enable more optimizations:
| |     * If the substring is the whole string and overallocation is disabled, just keep a reference to the string, don't copy characters
| |     * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible
* | merge (Raymond Hettinger, 2013-03-23, 1 file, -1/+4)
|\ \
| |/
| * Issue 17447: Clarify that str.isidentifier doesn't check for reserved keywords. (Raymond Hettinger, 2013-03-23, 1 file, -1/+4)
| |
* | (Merge 3.3) _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character (Victor Stinner, 2013-03-06, 1 file, -1/+1)
|\ \
| |/
| * _PyUnicode_Writer() now also reuses Unicode singletons: empty string and latin1 single character (Victor Stinner, 2013-03-06, 1 file, -1/+1)
* | Backed out changeset b9f7b1bf36aa (Victor Stinner, 2013-03-06, 1 file, -12/+7)
| |
* | Issue #17223: Fix PyUnicode_FromUnicode() on Windows (16-bit wchar_t type) to reject invalid UTF-16 surrogates. (Victor Stinner, 2013-03-05, 1 file, -7/+12)
* | (Merge 3.3) Issue #17223: Fix PyUnicode_FromUnicode() for a string of 1 character outside the range U+0000-U+10ffff. (Victor Stinner, 2013-02-25, 1 file, -7/+7)
|\ \
| |/
| * Issue #17223: Fix PyUnicode_FromUnicode() for a string of 1 character outside the range U+0000-U+10ffff. (Victor Stinner, 2013-02-25, 1 file, -7/+7)
* | (Merge 3.3) Issue #17137: When a Unicode string is resized, the internal wide character string (wstr) format is now cleared. (Victor Stinner, 2013-02-07, 1 file, -0/+4)
|\ \
| |/
| * Issue #17137: When a Unicode string is resized, the internal wide character string (wstr) format is now cleared. (Victor Stinner, 2013-02-07, 1 file, -0/+4)
* | Issue #17043: The unicode-internal decoder no longer reads past the end of the input buffer. (Serhiy Storchaka, 2013-02-07, 1 file, -26/+22)
|\ \
| |/
| * Issue #17043: The unicode-internal decoder no longer reads past the end of the input buffer. (Serhiy Storchaka, 2013-02-07, 1 file, -26/+22)
| |\
| | * Issue #17043: The unicode-internal decoder no longer reads past the end of the input buffer. (Serhiy Storchaka, 2013-02-07, 1 file, -27/+24)
* | | Issue #16971: Fix a refleak in the charmap decoder. (Serhiy Storchaka, 2013-01-29, 1 file, -4/+12)
|\ \ \
| |/ /
| * | Issue #16971: Fix a refleak in the charmap decoder. (Serhiy Storchaka, 2013-01-29, 1 file, -4/+13)
| | |
* | | Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. (Serhiy Storchaka, 2013-01-29, 1 file, -51/+29)
|\ \ \
| |/ /
| * | Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder. (Serhiy Storchaka, 2013-01-29, 1 file, -52/+30)
| |\ \
| | |/