path: root/Objects/unicodeobject.c
Commit message [Author, Date, Files changed, Lines -/+]
* (Merge 3.2) Issue #13913: normalize utf-8 codec name in UTF-8 decoder [Victor Stinner, 2012-02-14, 1 file, -1/+1]
|\
| * Issue #13913: normalize utf-8 codec name in UTF-8 decoder [Victor Stinner, 2012-02-14, 1 file, -1/+1]
| |
* | Backout d2c1521ad0a1: _Py_IDENTIFIER() uses UTF-8 again [Victor Stinner, 2012-02-07, 1 file, -2/+3]
| |
* | _Py_Identifier are always ASCII strings [Victor Stinner, 2012-02-05, 1 file, -3/+2]
| |
* | Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. [Antoine Pitrou, 2012-01-29, 1 file, -0/+13]
|\ \     Patch by Hynek Schlawack.
| |/
| * Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. [Antoine Pitrou, 2012-01-29, 1 file, -0/+13]
| |     Patch by Hynek Schlawack.
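The check this change adds can be illustrated in C. This is a minimal sketch, not the actual patch: it assumes the file name is available as a byte buffer with an explicit length, and `check_no_embedded_nul()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 0 if the first `len` bytes of `name`
   contain no NUL byte, -1 otherwise. */
static int check_no_embedded_nul(const char *name, size_t len)
{
    /* memchr() scans exactly len bytes, so it also sees NULs that a
       strlen()-based check would stop at. */
    return memchr(name, '\0', len) == NULL ? 0 : -1;
}
```

Without such a check, a NUL inside the name would silently truncate the path once it reaches a C-level call such as open(2), so a different file than the one named could be opened.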
* | don't ready in case_operation, since most callers do it themselves [Benjamin Peterson, 2012-01-16, 1 file, -2/+5]
| |
* | Consolidate the occurrences of the prime used as the multiplier when hashing. [Gregory P. Smith, 2012-01-14, 1 file, -1/+1]
|\ \
| |/
| * Consolidate the occurrences of the prime used as the multiplier when hashing [Gregory P. Smith, 2012-01-14, 1 file, -1/+1]
| |     to a single #define instead of having several copies in several files. This excludes the Modules/ tree (datetime and expat both have a copy for their own purposes with no need for it to be the same).
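The point of the consolidation is that every hash routine multiplying by the prime uses one shared constant. A simplified sketch, assuming a macro named `HASH_MULTIPLIER` and a function `bytes_hash()` (both hypothetical names); 1000003 is the multiplier traditionally used by CPython's string hash, and the loop follows that classic multiply-and-xor scheme in reduced form.

```c
#include <stddef.h>
#include <stdint.h>

/* One shared definition instead of a copy in every file. */
#define HASH_MULTIPLIER 1000003UL

/* Classic multiplicative string hash over bytes, using the shared prime.
   Returns 0 for the empty string. */
static uint64_t bytes_hash(const unsigned char *p, size_t len)
{
    uint64_t x;

    if (len == 0)
        return 0;
    x = (uint64_t)p[0] << 7;
    for (size_t i = 0; i < len; i++)
        x = (HASH_MULTIPLIER * x) ^ p[i];
    x ^= (uint64_t)len;   /* mix in the length */
    return x;
}
```

Changing the multiplier now means editing a single line, and all users stay in agreement automatically.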
* | fix possible refleaks if PyUnicode_READY fails [Benjamin Peterson, 2012-01-14, 1 file, -3/+15]
| |
* | always explicitly check for -1 from PyUnicode_READY [Benjamin Peterson, 2012-01-14, 1 file, -35/+35]
| |
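The error-handling convention behind these commits can be sketched with a toy reference-counted object. `obj_t`, `ready()`, `decref()` and `use_both()` are hypothetical stand-ins: `ready()` mimics PyUnicode_READY(), which returns 0 on success and -1 on failure. The caller compares against -1 explicitly and releases the reference it already holds before bailing out.

```c
/* Hypothetical ref-counted object; fail_ready forces ready() to fail. */
typedef struct { int refcnt; int fail_ready; } obj_t;

static int ready(obj_t *o) { return o->fail_ready ? -1 : 0; }
static void decref(obj_t *o) { o->refcnt--; }

/* Takes a new reference to tmp, then readies s. On failure the
   reference to tmp must still be dropped, or it leaks. */
static int use_both(obj_t *s, obj_t *tmp)
{
    tmp->refcnt++;                 /* new reference acquired */
    if (ready(s) == -1)            /* explicit -1 check, per convention */
        goto error;
    /* ... work with s and tmp ... */
    decref(tmp);
    return 0;
error:
    decref(tmp);                   /* no refleak on the error path */
    return -1;
}
```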
* | add str.casefold() (closes #13752) [Benjamin Peterson, 2012-01-14, 1 file, -0/+35]
| |
* | move do_title to a better place [Benjamin Peterson, 2012-01-13, 1 file, -28/+28]
| |
* | make fix_decimal_and_space_to_ascii check if it modifies the string [Benjamin Peterson, 2012-01-12, 1 file, -1/+3]
| |
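A sketch of a converter that rewrites non-ASCII decimal digits and spaces to ASCII and reports whether it changed anything, so a caller can reuse the original string when nothing was modified. It works in place on a UCS-4 buffer; `fix_digits_and_spaces()` is a hypothetical name, and the digit blocks and two space characters below are a hand-picked subset, whereas CPython consults the Unicode database.

```c
#include <stddef.h>
#include <stdint.h>

/* DIGIT ZERO of a few decimal digit blocks; an illustrative subset. */
static const uint32_t digit_zeros[] = {
    0x0660,  /* ARABIC-INDIC DIGIT ZERO */
    0x06F0,  /* EXTENDED ARABIC-INDIC DIGIT ZERO */
    0x0966,  /* DEVANAGARI DIGIT ZERO */
    0xFF10,  /* FULLWIDTH DIGIT ZERO */
};

/* Rewrites those digits to '0'..'9' and U+00A0 / U+3000 to ' ' in place.
   Returns 1 if any code point changed, 0 otherwise. */
static int fix_digits_and_spaces(uint32_t *s, size_t len)
{
    int modified = 0;
    size_t nzeros = sizeof digit_zeros / sizeof digit_zeros[0];

    for (size_t i = 0; i < len; i++) {
        uint32_t c = s[i];
        if (c < 0x80)
            continue;                          /* already ASCII */
        if (c == 0x00A0 || c == 0x3000) {      /* NO-BREAK / IDEOGRAPHIC SPACE */
            s[i] = ' ';
            modified = 1;
            continue;
        }
        for (size_t k = 0; k < nzeros; k++) {
            if (c >= digit_zeros[k] && c <= digit_zeros[k] + 9) {
                s[i] = '0' + (c - digit_zeros[k]);
                modified = 1;
                break;
            }
        }
    }
    return modified;
}
```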
* | kill capwords implementation which has been disabled since the beginning [Benjamin Peterson, 2012-01-12, 1 file, -42/+0]
| |
* | use full unicode mappings for upper/lower/title case (#12736) [Benjamin Peterson, 2012-01-11, 1 file, -149/+196]
| |     Also broaden the category of characters that count as lowercase/uppercase.
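With full mappings, one code point can map to several, so the output can be longer than the input. A sketch with a tiny hypothetical special-case table: U+00DF LATIN SMALL LETTER SHARP S uppercases to "SS" and U+FB01 LATIN SMALL LIGATURE FI to "FI", with an ASCII-only simple mapping for everything else. The longest full mappings in Unicode's SpecialCasing.txt produce three code points, which is why the sketch reserves three output slots per input character.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the full uppercase mapping of c into out; returns how many
   code points were written (1..3). Tiny illustrative table only. */
static int full_upper(uint32_t c, uint32_t out[3])
{
    switch (c) {
    case 0x00DF: out[0] = 'S'; out[1] = 'S'; return 2;   /* sharp s */
    case 0xFB01: out[0] = 'F'; out[1] = 'I'; return 2;   /* fi ligature */
    default:
        out[0] = (c >= 'a' && c <= 'z') ? c - 32 : c;    /* simple ASCII mapping */
        return 1;
    }
}

/* Uppercases src into dst, which must hold 3 * len code points.
   Returns the number of code points written. */
static size_t upper_string(const uint32_t *src, size_t len, uint32_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (size_t)full_upper(src[i], dst + n);
    return n;
}
```

For example, "straße" (6 code points) becomes "STRASSE" (7 code points).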
* | Add a new PyUnicode_Fill() function [Victor Stinner, 2012-01-03, 1 file, -0/+35]
| |     It is faster than the unicode_fill() function which was implemented in formatter_unicode.c.
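A sketch of a fill operation over the three PEP 393 storage kinds (1, 2 or 4 bytes per code point). `fill_kind()` is a hypothetical name and does not reproduce the signature of PyUnicode_Fill(); it assumes the caller has already checked that `ch` fits in the given kind and that the range is in bounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* kind = bytes per code point (1: UCS1, 2: UCS2, 4: UCS4).
   Writes `length` copies of ch starting at index `start`. */
static void fill_kind(void *data, int kind, size_t start, size_t length, uint32_t ch)
{
    switch (kind) {
    case 1:
        /* one byte per character: a single memset() suffices */
        memset((uint8_t *)data + start, (int)ch, length);
        break;
    case 2: {
        uint16_t *p = (uint16_t *)data + start;
        for (size_t i = 0; i < length; i++)
            p[i] = (uint16_t)ch;
        break;
    }
    case 4: {
        uint32_t *p = (uint32_t *)data + start;
        for (size_t i = 0; i < length; i++)
            p[i] = ch;
        break;
    }
    }
}
```

The type-specialized loops avoid dispatching on the storage kind for every character, which is where a generic per-character write would spend its time.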
* | also decref the right thing [Benjamin Peterson, 2012-01-02, 1 file, -1/+1]
| |
* | ready the correct string [Benjamin Peterson, 2012-01-02, 1 file, -1/+1]
| |
* | fix some possible refleaks from PyUnicode_READY error conditions [Benjamin Peterson, 2012-01-02, 1 file, -21/+53]
| |
* | == -1 is convention [Benjamin Peterson, 2012-01-01, 1 file, -1/+1]
| |
* | make switch more robust [Benjamin Peterson, 2012-01-01, 1 file, -1/+2]
| |
* | 4 space indentation [Benjamin Peterson, 2011-12-20, 1 file, -13/+13]
| |
* | fix spacing around switch statements [Benjamin Peterson, 2011-12-20, 1 file, -23/+22]
| |
* | merge 3.2 [Benjamin Peterson, 2011-12-20, 1 file, -1/+5]
|\ \
| |/
| * fix possible if unlikely leak [Benjamin Peterson, 2011-12-20, 1 file, -1/+5]
| |
* | Issue #13624: Write a specialized UTF-8 encoder to allow more optimization [Victor Stinner, 2011-12-18, 1 file, -149/+12]
| |     The main bottleneck was the PyUnicode_READ() macro.
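A sketch of one such specialized loop, for UCS-4 input only: because the element type is fixed at compile time, there is no per-character dispatch on the storage kind, which is the work PyUnicode_READ() does. `utf8_encode()` is a hypothetical name, and unlike CPython, which routes surrogates through the codec's error handler, this sketch simply rejects them.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes len code points from src into dst (room for 4 * len bytes).
   Returns the number of bytes written, or -1 on a surrogate or a
   code point above U+10FFFF. */
static long utf8_encode(const uint32_t *src, size_t len, unsigned char *dst)
{
    unsigned char *p = dst;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c < 0x80) {                          /* 1 byte: ASCII */
            *p++ = (unsigned char)c;
        } else if (c < 0x800) {                  /* 2 bytes */
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {                /* 3 bytes */
            if (c >= 0xD800 && c <= 0xDFFF)
                return -1;                       /* lone surrogate */
            *p++ = (unsigned char)(0xE0 | (c >> 12));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c <= 0x10FFFF) {              /* 4 bytes */
            *p++ = (unsigned char)(0xF0 | (c >> 18));
            *p++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            return -1;                           /* out of Unicode range */
        }
    }
    return (long)(p - dst);
}
```

For example, U+20AC EURO SIGN encodes to the three bytes E2 82 AC.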
* | Optimize str * n for len(str)==1 and UCS-2 or UCS-4 [Victor Stinner, 2011-12-18, 1 file, -4/+11]
| |
* | Issue #13621: Optimize str.replace(char1, char2) [Victor Stinner, 2011-12-18, 1 file, -9/+21]
| |     Use findchar(), which is more optimized than a naive loop using PyUnicode_READ(). PyUnicode_READ() is a complex and slow macro.
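findchar() is internal to unicodeobject.c. As a simplification, the sketch below assumes that for 1-byte strings the search reduces to memchr() and shows only that case; `replace_char_ucs1()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Replaces every c1 with c2 in a 1-byte-per-character buffer and returns
   the number of replacements. memchr() jumps straight to the next match
   instead of reading each character through a generic accessor. */
static size_t replace_char_ucs1(unsigned char *s, size_t len,
                                unsigned char c1, unsigned char c2)
{
    size_t count = 0;
    unsigned char *p = s;
    unsigned char *end = s + len;

    while (p < end && (p = memchr(p, c1, (size_t)(end - p))) != NULL) {
        *p++ = c2;
        count++;
    }
    return count;
}
```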
* | Issue #10951: Fix compiler warnings in timemodule.c and unicodeobject.c [Victor Stinner, 2011-12-17, 1 file, -1/+1]
|\ \     Thanks Jérémy Anger for the fix.
| |/
| * Issue #13093: Fix error handling on PyUnicode_EncodeDecimal() [Victor Stinner, 2011-11-22, 1 file, -6/+4]
| |     * Add tests for PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII()
| |     * Remove the unused "e" variable in replace()
* | The locale decoder raises a UnicodeDecodeError instead of an OSError [Victor Stinner, 2011-12-17, 1 file, -17/+86]
| |     Find the invalid character using mbrtowc().
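Locating the invalid character with mbrtowc() can be sketched as follows, assuming the input is a byte string in the current LC_CTYPE locale encoding. `find_invalid_mb()` is a hypothetical name; its result is the kind of byte position a UnicodeDecodeError needs to report. It also assumes a NUL byte decodes as a single byte, which holds for common locale encodings.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Returns the byte offset of the first invalid or incomplete multibyte
   sequence in str[0..len), or -1 if the whole buffer decodes. */
static long find_invalid_mb(const char *str, size_t len)
{
    mbstate_t state;
    size_t pos = 0;

    memset(&state, 0, sizeof state);   /* initial conversion state */
    while (pos < len) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, str + pos, len - pos, &state);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (long)pos;          /* -1: invalid, -2: truncated */
        if (r == 0)
            r = 1;                     /* decoded an embedded NUL byte */
        pos += r;
    }
    return -1;
}
```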
* | Issue #13560: Locale codec functions use the classic "errors" parameter instead of surrogateescape [Victor Stinner, 2011-12-17, 1 file, -7/+38]
| |     This makes it possible to support more error handlers later.
* | What's New in Python 3.3: complete the deprecation list [Victor Stinner, 2011-12-17, 1 file, -0/+2]
| |     Also add FIXMEs in unicodeobject.c
* | Issue #13560: os.strerror() now uses the current locale encoding instead of UTF-8 [Victor Stinner, 2011-12-17, 1 file, -8/+20]
* | Issue #13560: Add PyUnicode_EncodeLocale() [Victor Stinner, 2011-12-17, 1 file, -32/+135]
| |     * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not available
| |     * Document my last changes in Misc/NEWS
* | Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() [Victor Stinner, 2011-12-16, 1 file, -17/+78]
| |     * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string from the current locale encoding
| |     * _Py_char2wchar() writes an "error code" in the size argument to indicate whether the function failed because of a memory allocation failure or because of a decoding error. The function doesn't write the error message directly to stderr.
| |     * Fix time.strftime() (if wcsftime() is missing): decode the strftime() result from the current locale encoding, not from the filesystem encoding.
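The "error code in the size argument" convention can be sketched with the standard mbstowcs(). `decode_locale()` is a hypothetical function, and the two codes it uses, (size_t)-1 for a memory error and (size_t)-2 for a decoding error, are assumptions chosen for this sketch; only the reporting convention is illustrated, not the actual decoding strategy of _Py_char2wchar(). Calling mbstowcs() with a NULL destination to obtain the length is a POSIX guarantee.

```c
#include <stdlib.h>
#include <wchar.h>

/* Decodes a NUL-terminated byte string from the current locale encoding.
   Success: returns a malloc()ed wide string, *size = its length.
   Failure: returns NULL, *size = (size_t)-1 (memory) or (size_t)-2
   (decoding). The caller, not this function, reports the error. */
static wchar_t *decode_locale(const char *arg, size_t *size)
{
    size_t n = mbstowcs(NULL, arg, 0);   /* length only, nothing stored */
    wchar_t *res;

    if (n == (size_t)-1) {
        *size = (size_t)-2;              /* decoding error */
        return NULL;
    }
    res = malloc((n + 1) * sizeof(wchar_t));
    if (res == NULL) {
        *size = (size_t)-1;              /* memory allocation failure */
        return NULL;
    }
    mbstowcs(res, arg, n + 1);           /* includes the terminator */
    *size = n;
    return res;
}
```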
* | PyUnicode_Resize(): warn about canonical representation [Victor Stinner, 2011-12-12, 1 file, -12/+13]
| |     Also call unicode_resize() directly in unicodeobject.c
* | Fix PyUnicode_Resize() for compact strings: leave the string unchanged on error [Victor Stinner, 2011-12-12, 1 file, -20/+9]
| |     Also fix the PyUnicode_Resize() doc
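"Leave the string unchanged on error" is the standard realloc() discipline. The usual mistake is `p = realloc(p, n)`, which loses the only pointer to the original block when realloc() returns NULL. A minimal sketch on a plain byte buffer; `resize_buffer()` is a hypothetical name and it assumes `newsize > 0`.

```c
#include <stdlib.h>

/* Resizes *buf to newsize bytes (newsize > 0). On failure, *buf is left
   untouched: still valid, same contents, still owned by the caller. */
static int resize_buffer(char **buf, size_t newsize)
{
    char *tmp = realloc(*buf, newsize);
    if (tmp == NULL)
        return -1;     /* the old block was not freed; keep using it */
    *buf = tmp;        /* commit only after success */
    return 0;
}
```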
* | Make PyUnicode_Copy() private => _PyUnicode_Copy() [Victor Stinner, 2011-12-12, 1 file, -6/+6]
| |     Undocument the function. Also make decode_utf8_errors() private (static).
* | resize_copy() now supports legacy ready strings [Victor Stinner, 2011-12-11, 1 file, -13/+15]
| |
* | Rewrite PyUnicode_Append(); unicode_modifiable() is stricter [Victor Stinner, 2011-12-11, 1 file, -79/+84]
| |     * Rename unicode_resizable() to unicode_modifiable()
| |     * Rename _PyUnicode_Dirty() to unicode_check_modifiable() to make it clear that the function is private
| |     * Inline PyUnicode_Concat() and unicode_append_inplace() in PyUnicode_Append() to simplify the code
| |     * unicode_modifiable() returns 0 if the hash has been computed or if the string is not an exact unicode string
| |     * Remove _PyUnicode_DIRTY(): there is no need to reset the hash anymore, because if the hash has already been computed, the string can no longer be modified in place
| |     * PyUnicode_Concat() checks for integer overflow
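The rule can be sketched with a simplified, hypothetical string header (`strobj` and `modifiable()` are illustrative names, not CPython's). The hash and exact-type conditions come from the commit message; the single-reference condition is an assumption of this sketch, since an in-place change would otherwise be visible through other references.

```c
#include <stdint.h>

/* Hypothetical, much simplified string header. */
typedef struct {
    long refcnt;       /* number of references */
    int64_t hash;      /* -1 until the hash has been computed */
    int is_exact_str;  /* 0 for instances of a subclass */
} strobj;

/* Returns 1 if the string may be modified in place, 0 otherwise. */
static int modifiable(const strobj *s)
{
    if (s->refcnt != 1)
        return 0;      /* assumed: others could observe the change */
    if (s->hash != -1)
        return 0;      /* a cached hash would become stale */
    if (!s->is_exact_str)
        return 0;      /* subclasses are never modified in place */
    return 1;
}
```

Refusing in-place modification once the hash is cached is what makes resetting the hash unnecessary.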
* | Create unicode_result_unchanged() subfunction [Victor Stinner, 2011-12-11, 1 file, -69/+48]
| |
* | Fix fixup() for unchanged unicode subtype [Victor Stinner, 2011-12-11, 1 file, -33/+33]
| |     If maxchar_new == 0 and self is a unicode subtype, return u instead of duplicating u.
* | unicode_fromascii() doesn't check string content twice in debug mode [Victor Stinner, 2011-12-11, 1 file, -6/+3]
| |     _PyUnicode_CheckConsistency() also checks string content.
* | Call PyUnicode_DecodeUTF8Stateful() directly instead of PyUnicode_DecodeUTF8() [Victor Stinner, 2011-12-11, 1 file, -33/+14]
| |     * Remove micro-optimization from PyUnicode_FromStringAndSize(): PyUnicode_DecodeUTF8Stateful() already has these optimizations (for size=0 and one ASCII char).
| |     * Rename utf8_max_char_size_and_char_count() to utf8_scanner(), and remove a useless variable
* | Use unicode_empty directly instead of PyUnicode_New(0, 0) [Victor Stinner, 2011-12-11, 1 file, -6/+12]
| |
* | Move the slowest UTF-8 decoder to its own subfunction [Victor Stinner, 2011-12-11, 1 file, -128/+98]
| |     * Create decode_utf8_errors()
| |     * Reuse unicode_fromascii()
| |     * decode_utf8_errors() doesn't refit at the beginning
| |     * Remove refit_partial_string(), use unicode_adjust_maxchar() instead
* | Fix error handling in resize_compact() [Victor Stinner, 2011-12-11, 1 file, -5/+9]
| |
* | PyUnicode_FromWideChar() and PyUnicode_FromUnicode() raise a ValueError if a character is not in range [U+0000; U+10ffff] [Victor Stinner, 2011-12-08, 1 file, -33/+34]
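The range check can be sketched over 32-bit code units. `find_out_of_range()` is a hypothetical name; it takes `uint32_t` rather than `wchar_t` because the width and signedness of `wchar_t` are platform-dependent, and a negative value from a signed 32-bit `wchar_t` converted to `uint32_t` becomes a large number, so it is rejected too.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first code point outside [U+0000; U+10FFFF],
   or -1 if all are valid. A caller would raise ValueError in the
   former case. */
static long find_out_of_range(const uint32_t *u, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (u[i] > 0x10FFFF)
            return (long)i;
    return -1;
}
```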