path: root/Include/unicodeobject.h
Commit message | Author | Age | Files | Lines
* Issue #19512: add _PyUnicode_CompareWithId() function | Victor Stinner | 2013-11-06 | 1 | -0/+5
  _PyUnicode_CompareWithId() is faster than PyUnicode_CompareWithASCIIString() when both strings are equal and interned. Also add the _PyId_builtins identifier for the common string "builtins".
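Interning keeps a single object per distinct string value, so two equal interned strings are the same object and an identity test decides equality before any character is read. Below is a minimal standalone C sketch of that ptr==ptr fast path; the `str_t` type and `str_equal()` function are hypothetical and do not reproduce CPython's `_PyUnicode_CompareWithId()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable string object (not CPython's PyObject). */
typedef struct {
    const char *data;
    size_t length;
} str_t;

/* Return true if a and b contain the same characters. */
bool str_equal(const str_t *a, const str_t *b)
{
    assert(a != NULL && b != NULL);
    /* Fast path: both operands are the same object (as two equal
       interned strings are), so no character needs to be read. */
    if (a == b)
        return true;
    /* Slow path: compare lengths, then the bytes themselves. */
    return a->length == b->length
        && memcmp(a->data, b->data, a->length) == 0;
}
```

The fast path only helps when the strings are equal; unequal strings still reach the slow path, which matches the commit's "when both strings are equal and interned" condition.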
* Issue #18722: Remove uses of the "register" keyword in C code. | Antoine Pitrou | 2013-08-13 | 1 | -4/+4
* fix typo in a comment | Victor Stinner | 2013-04-18 | 1 | -1/+1
* Close #17694: Add minimum length to _PyUnicodeWriter | Victor Stinner | 2013-04-17 | 1 | -7/+13
  * Also add a min_char attribute to the _PyUnicodeWriter structure (currently unused)
  * _PyUnicodeWriter_Init() no longer takes any argument other than the writer itself: min_length and overallocate must be set explicitly
  * In error handlers, only enable overallocation if the replacement string is longer than 1 character
  * CJK decoders don't use overallocation anymore
  * Set min_length, instead of preallocating memory using _PyUnicodeWriter_Prepare(), in many decoders
  * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
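The interplay of min_length and overallocation can be illustrated with a small growable buffer. The following is a minimal standalone sketch, not CPython's `_PyUnicodeWriter`: the `writer_t` type and `writer_*` functions are hypothetical, the buffer holds bytes instead of code points, and the 25% growth factor is an assumption. As in the commit, `writer_init()` takes only the writer, and the caller sets `min_length` and `overallocate` explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte-oriented writer (not CPython's _PyUnicodeWriter). */
typedef struct {
    char *buffer;
    size_t pos;          /* bytes written so far */
    size_t capacity;     /* allocated size of buffer */
    size_t min_length;   /* smallest allocation to make */
    bool overallocate;   /* reserve extra room to amortize realloc() */
} writer_t;

void writer_init(writer_t *w)
{
    memset(w, 0, sizeof(*w));
}

/* Make room for `extra` more bytes; return false on overflow or OOM. */
static bool writer_prepare(writer_t *w, size_t extra)
{
    if (extra > SIZE_MAX - w->pos)
        return false;                     /* integer overflow */
    size_t needed = w->pos + extra;
    if (needed <= w->capacity)
        return true;
    size_t newcap = needed;
    /* Assumed policy: overallocate by 25% when enabled. */
    if (w->overallocate && needed / 4 <= SIZE_MAX - needed)
        newcap += needed / 4;
    if (newcap < w->min_length)
        newcap = w->min_length;
    char *p = realloc(w->buffer, newcap);
    if (p == NULL)
        return false;
    w->buffer = p;
    w->capacity = newcap;
    return true;
}

bool writer_write(writer_t *w, const char *s, size_t len)
{
    assert(w != NULL && s != NULL);
    if (len == 0)
        return true;
    if (!writer_prepare(w, len))
        return false;
    memcpy(w->buffer + w->pos, s, len);
    w->pos += len;
    return true;
}

void writer_dealloc(writer_t *w)
{
    free(w->buffer);
    writer_init(w);
}
```

With `min_length = 100`, a first 3-byte write allocates 100 bytes at once; later growth adds a quarter of the required size, so repeated small writes rarely trigger `realloc()`.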
* Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of the legacy Py_UNICODE API | Victor Stinner | 2013-04-11 | 1 | -0/+7
  Also add a new _PyUnicodeWriter_WriteChar() function.
* Add _PyUnicodeWriter_WriteSubstring() function | Victor Stinner | 2013-04-02 | 1 | -0/+9
  Write a function to enable more optimizations:
  * If the substring is the whole string and overallocation is disabled, just keep a reference to the string instead of copying characters
  * Avoid a call to the expensive _PyUnicode_FindMaxChar() function when possible
* Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments | Victor Stinner | 2012-12-04 | 1 | -1/+1
  This change fixes inconsistencies with os.fsencode() and os.fsdecode(), because these operating systems announce an ASCII locale encoding whereas the ISO-8859-1 encoding is used in practice.
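The surrogateescape error handler (PEP 383) makes such decoding lossless: an undecodable byte b is mapped to the lone surrogate U+DC00 + b and mapped back on encoding. A minimal standalone sketch of an ASCII/surrogateescape round trip follows, assuming code points are stored as `uint32_t`; both function names are hypothetical, not CPython's codec functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode bytes as ASCII with surrogateescape: bytes 0x00-0x7F map to the
   same code point, any other byte b maps to U+DC00 + b (U+DC80..U+DCFF).
   `out` must have room for `len` code points. Returns `len`. */
size_t decode_ascii_surrogateescape(const unsigned char *in, size_t len,
                                    uint32_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        out[i] = (b < 0x80) ? b : 0xDC00u + b;
    }
    return len;
}

/* Inverse: encode code points back to bytes. Returns 0 on success, or -1
   if a code point is neither ASCII nor an escaped byte. */
int encode_ascii_surrogateescape(const uint32_t *in, size_t len,
                                 unsigned char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = in[i];
        if (cp < 0x80)
            out[i] = (unsigned char)cp;
        else if (cp >= 0xDC80 && cp <= 0xDCFF)
            out[i] = (unsigned char)(cp - 0xDC00);  /* restore raw byte */
        else
            return -1;                              /* not representable */
    }
    return 0;
}
```

Because every byte survives the round trip, an argument decoded this way can be re-encoded to the exact bytes the operating system passed in, whatever encoding the system actually uses.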
* Issue #16330: Use surrogate-related macros | Victor Stinner | 2012-10-30 | 1 | -3/+3
  Patch written by Serhiy Storchaka.
* Issue #16147: Rewrite PyUnicode_FromFormatV() to use _PyUnicodeWriter API | Victor Stinner | 2012-10-06 | 1 | -1/+17
  * Simplify the code: replace 4 steps with one unique step using the _PyUnicodeWriter API. PyUnicode_Format() has the same design. This avoids storing intermediate results, which required allocating an array of pointers on the heap.
  * Use the _PyUnicodeWriter API for speed (and for its convenient API): overallocate the buffer to reduce the number of "realloc()" calls
  * Implement "width" and "precision" in Python, don't rely on sprintf(). This avoids the need for a temporary buffer allocated on the heap: only a small buffer allocated on the stack is used.
  * Add _PyUnicodeWriter_WriteCstr() function
  * Split PyUnicode_FromFormatV() into two functions: add unicode_fromformat_arg().
  * Inline parse_format_flags(): the format of an argument is now parsed only once, so a subfunction is no longer needed.
  * Optimize PyUnicode_FromFormatV() for characters between two "%" arguments: search for the next "%" and copy the substring in one chunk, instead of copying character by character.
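The last optimization, copying the literal text between two "%" directives in one chunk, can be sketched in standalone C as below. The function `copy_literal_runs()` is hypothetical: to stay short it assumes every directive is a single conversion character (such as `%d`), drops the directives instead of formatting them, and turns `%%` into a literal `%`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the literal text of `fmt` into `out`, skipping one-character
   "%x" directives. Each literal run is found with strchr() and copied
   with a single memcpy(). `out` needs strlen(fmt) + 1 bytes.
   Returns the number of bytes written (excluding the terminator). */
size_t copy_literal_runs(const char *fmt, char *out)
{
    assert(fmt != NULL && out != NULL);
    size_t n = 0;
    while (*fmt != '\0') {
        const char *pct = strchr(fmt, '%');
        size_t run = pct ? (size_t)(pct - fmt) : strlen(fmt);
        memcpy(out + n, fmt, run);        /* one chunk per literal run */
        n += run;
        if (pct == NULL)
            break;
        fmt = pct + 1;                    /* character after '%' */
        if (*fmt == '%') {                /* "%%" is a literal '%' */
            out[n++] = '%';
            fmt++;
        } else if (*fmt != '\0') {
            fmt++;                        /* skip the conversion character */
        }
    }
    out[n] = '\0';
    return n;
}
```

A real formatter would emit each directive's converted argument at the point where this sketch skips it; the chunked copy of the text in between is unchanged.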
* #16127: merge with 3.3. | Ezio Melotti | 2012-10-05 | 1 | -2/+1
* #16127: remove outdated references to narrow builds. Patch by Serhiy Storchaka. | Ezio Melotti | 2012-10-05 | 1 | -2/+1
* Also enable the ptr==ptr optimization in PyUnicode_Compare() | Victor Stinner | 2012-10-04 | 1 | -1/+2
  It was already implemented in PyUnicode_RichCompare().
* Issue #15026: utf-16 encoding is now significantly faster (up to 10x). | Antoine Pitrou | 2012-06-15 | 1 | -2/+2
  Patch by Serhiy Storchaka.
* Issue #14993: Use standard "unsigned char" instead of an unsigned char bitfield | Victor Stinner | 2012-06-04 | 1 | -6/+4
* Issue #14744: Use the new _PyUnicodeWriter internal API to speed up str%args and str.format(args) | Victor Stinner | 2012-05-29 | 1 | -6/+89
  * Formatting of strings, ints, floats and complex numbers uses the _PyUnicodeWriter API. It avoids a temporary buffer in most cases.
  * Add _PyUnicodeWriter_WriteStr() to restore the PyAccu optimization: just keep a reference to the string if the output is composed of only one string
  * Disable overallocation when formatting the last argument of str%args and str.format(args)
  * Overallocation allocates at least 100 characters: add a min_length attribute to the _PyUnicodeWriter structure
  * Add new private functions: _PyUnicode_FastCopyCharacters(), _PyUnicode_FastFill() and _PyUnicode_FromASCII()
  The speedup is around 20% on average.
* Close #14648: Compute correctly maxchar in str.format() for substrin | Victor Stinner | 2012-04-23 | 1 | -0/+9
* Close #14085: remove assertions from PyUnicode_WRITE macro | Victor Stinner | 2012-03-04 | 1 | -3/+0
  Add checks in PyUnicode_WriteChar() and convert the PyUnicode_New() assertion into a test raising a Python exception.
* Issue #13706: Fix format(int, "n") for locales with a non-ASCII thousands separator | Victor Stinner | 2012-02-23 | 1 | -15/+3
  * Decode the thousands separator and decimal point using PyUnicode_DecodeLocale() (from the locale encoding), instead of decoding them implicitly from latin1
  * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used
  * Change the _PyUnicode_InsertThousandsGrouping() API to return the maximum character if unicode is NULL
  * Replace MIN/MAX macros by Py_MIN/Py_MAX
  * stringlib/undef.h undefines STRINGLIB_IS_UNICODE
  * stringlib/localeutil.h only supports Unicode
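Decoding the separator from the locale encoding yields a code point that may lie outside ASCII, such as U+00A0 NO-BREAK SPACE, so grouping has to work on code points rather than bytes. A minimal sketch of inserting such a separator every three digits follows; `group_thousands()` is a hypothetical helper, not `_PyUnicode_InsertThousandsGrouping()`, and it assumes the input holds only ASCII digits and a fixed group size of 3 (real locales can use other groupings).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the digits of `digits` to `out` as code points, inserting the
   separator `sep` between groups of three digits counted from the right.
   `out` needs room for strlen(digits) + (strlen(digits) - 1) / 3 entries.
   Returns the number of code points written. */
size_t group_thousands(const char *digits, uint32_t sep, uint32_t *out)
{
    assert(digits != NULL && out != NULL);
    size_t len = strlen(digits);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        /* A separator goes before this digit if a multiple of 3 digits remain. */
        if (i > 0 && (len - i) % 3 == 0)
            out[n++] = sep;
        out[n++] = (unsigned char)digits[i];
    }
    return n;
}
```

Since `sep` is a full code point, the same routine handles an ASCII comma and a non-ASCII separator alike; only the decoding step that produces `sep` depends on the locale encoding.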
* Issue #13706: Add assertions to detect bugs earlier | Victor Stinner | 2012-01-31 | 1 | -0/+3
* Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. | Antoine Pitrou | 2012-01-29 | 1 | -0/+6
  Patch by Hynek Schlawack.
* Issue #13848: open() and the FileIO constructor now check for NUL characters in the file name. | Antoine Pitrou | 2012-01-29 | 1 | -0/+6
  Patch by Hynek Schlawack.
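Such a check matters because a C function that takes a NUL-terminated path sees only the part before the first NUL byte and could therefore open a different file than the one requested. A minimal sketch of the test on a byte buffer of known length follows; `has_embedded_nul()` is a hypothetical helper, not the code used by open() or FileIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the first `len` bytes of `path` contain a NUL byte.
   A path with an embedded NUL would be silently truncated when passed to
   a C API expecting a NUL-terminated string, so it should be rejected. */
bool has_embedded_nul(const char *path, size_t len)
{
    assert(path != NULL);
    return len > 0 && memchr(path, '\0', len) != NULL;
}
```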
* use the static identifier api for looking up special methods | Benjamin Peterson | 2012-01-22 | 1 | -29/+0
  I had to move the static identifier code from unicodeobject.h to object.h in order for this to work.
* add str.casefold() (closes #13752) | Benjamin Peterson | 2012-01-14 | 1 | -0/+5
* Silence compilation warnings on Windows | Amaury Forgeot d'Arc | 2012-01-13 | 1 | -2/+2
* use full unicode mappings for upper/lower/title case (#12736) | Benjamin Peterson | 2012-01-11 | 1 | -0/+23
  Also broaden the category of characters that count as lowercase/uppercase.
* Add a new PyUnicode_Fill() function | Victor Stinner | 2012-01-03 | 1 | -3/+20
  It is faster than the unicode_fill() function which was implemented in formatter_unicode.c.
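Under PEP 393 a string's characters are stored 1, 2 or 4 bytes wide (its "kind"), so a fill routine has to dispatch on the kind, and for 1-byte characters it can hand the work to memset(). The sketch below is hypothetical: `fill_kind()` is not CPython's `PyUnicode_Fill()`, and it assumes a raw character buffer plus an explicit kind rather than a string object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set `length` characters starting at index `start` of `data` to
   `fill_char`, where each character is `kind` bytes wide (1, 2 or 4).
   The caller must ensure fill_char fits in `kind` bytes. */
void fill_kind(void *data, int kind, size_t start, size_t length,
               uint32_t fill_char)
{
    switch (kind) {
    case 1: {
        assert(fill_char <= 0xFF);
        /* One byte per character: memset() does the whole run. */
        memset((uint8_t *)data + start, (int)fill_char, length);
        break;
    }
    case 2: {
        assert(fill_char <= 0xFFFF);
        uint16_t *p = (uint16_t *)data + start;
        for (size_t i = 0; i < length; i++)
            p[i] = (uint16_t)fill_char;
        break;
    }
    case 4: {
        uint32_t *p = (uint32_t *)data + start;
        for (size_t i = 0; i < length; i++)
            p[i] = fill_char;
        break;
    }
    default:
        assert(0 && "invalid kind");
        break;
    }
}
```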
* fix PyCompactUnicodeObject doc (test) | Victor Stinner | 2011-12-22 | 1 | -1/+1
* backout 7876cd49300d: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum | Victor Stinner | 2011-12-19 | 1 | -4/+3
* Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum | Victor Stinner | 2011-12-17 | 1 | -3/+4
* Issue #13560: Locale codec functions use the classic "errors" parameter, instead of surrogateescape | Victor Stinner | 2011-12-17 | 1 | -3/+3
  This makes it possible to support more error handlers later.
* Issue #13560: Add PyUnicode_EncodeLocale() | Victor Stinner | 2011-12-17 | 1 | -1/+11
  * Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not available
  * Document my last changes in Misc/NEWS
* Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() | Victor Stinner | 2011-12-16 | 1 | -0/+22
  * PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string from the current locale encoding
  * _Py_char2wchar() writes an "error code" in the size argument to indicate whether the function failed because of a memory allocation failure or because of a decoding error. The function doesn't write the error message directly to stderr.
  * Fix time.strftime() (if wcsftime() is missing): decode the strftime() result from the current locale encoding, not from the filesystem encoding.
* PyUnicode_Resize(): warn about canonical representation | Victor Stinner | 2011-12-12 | 1 | -1/+4
  Also call unicode_resize() directly in unicodeobject.c.
* Fix PyUnicode_Resize() for compact strings: leave the string unchanged on error | Victor Stinner | 2011-12-12 | 1 | -8/+5
  Also fix the PyUnicode_Resize() doc.
* Make PyUnicode_Copy() private => _PyUnicode_Copy() | Victor Stinner | 2011-12-12 | 1 | -1/+3
  Undocument the function. Also make decode_utf8_errors() private (static).
* resize_copy() now supports legacy ready strings | Victor Stinner | 2011-12-11 | 1 | -0/+4
* PyUnicode_IS_ASCII() macro ensures that the string is ready | Victor Stinner | 2011-12-12 | 1 | -5/+7
  It makes no sense to check whether a string that is not ready is ASCII or not.
* Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros | Victor Stinner | 2011-11-29 | 1 | -0/+4
  And use surrogate macros everywhere in unicodeobject.c
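The arithmetic behind the two surrogate macros is fixed by UTF-16: subtract 0x10000 from a supplementary code point, put the top 10 bits in the high surrogate (base 0xD800) and the low 10 bits in the low surrogate (base 0xDC00). The standalone sketch below uses hypothetical macro names (`HIGH_SURROGATE`, `LOW_SURROGATE`, `JOIN_SURROGATES`) and is written from the UTF-16 rules, not copied from CPython's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Valid for a code point ch in U+10000..U+10FFFF. */
#define HIGH_SURROGATE(ch) ((uint16_t)(0xD800u + (((uint32_t)(ch) - 0x10000u) >> 10)))
#define LOW_SURROGATE(ch)  ((uint16_t)(0xDC00u + ((uint32_t)(ch) & 0x3FFu)))
/* Rebuild the code point from a high/low surrogate pair. */
#define JOIN_SURROGATES(hi, lo) \
    ((((uint32_t)(hi) & 0x3FFu) << 10 | ((uint32_t)(lo) & 0x3FFu)) + 0x10000u)
#define IS_HIGH_SURROGATE(c) ((c) >= 0xD800 && (c) <= 0xDBFF)
#define IS_LOW_SURROGATE(c)  ((c) >= 0xDC00 && (c) <= 0xDFFF)

/* Split a supplementary code point into a surrogate pair. */
static inline void split_surrogates(uint32_t ch, uint16_t *hi, uint16_t *lo)
{
    assert(ch >= 0x10000 && ch <= 0x10FFFF);
    *hi = HIGH_SURROGATE(ch);
    *lo = LOW_SURROGATE(ch);
}
```

For example, U+1F600 splits into the pair 0xD83D 0xDE00, and joining that pair gives back U+1F600.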
* PyUnicode_GET_SIZE() checks that PyUnicode_AsUnicode() succeeds using an assertion | Victor Stinner | 2011-11-21 | 1 | -6/+7
* _PyUnicode_CheckConsistency() also checks the maximum value of maxchar, not only its minimum value | Victor Stinner | 2011-11-20 | 1 | -5/+8
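Under PEP 393 the storage width of a string follows from its largest code point, so a consistency check can bound maxchar from both sides: it must fit in the chosen kind (maximum) and must actually need that kind (minimum). The sketch below assumes a string held as an array of `uint32_t` code points with a declared kind; `kind_for_maxchar()` and `check_kind()` are hypothetical and ignore the separate ASCII flag that CPython also tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Narrowest kind (bytes per character) able to store `maxchar`. */
int kind_for_maxchar(uint32_t maxchar)
{
    if (maxchar <= 0xFF)
        return 1;
    if (maxchar <= 0xFFFF)
        return 2;
    assert(maxchar <= 0x10FFFF);
    return 4;
}

/* True if `kind` is exactly the kind required by the characters:
   not too narrow (maxchar upper bound) and not wider than needed
   (maxchar lower bound). */
bool check_kind(const uint32_t *chars, size_t len, int kind)
{
    uint32_t maxchar = 0;
    for (size_t i = 0; i < len; i++)
        if (chars[i] > maxchar)
            maxchar = chars[i];
    if (maxchar > 0x10FFFF)
        return false;                 /* not a valid code point */
    return kind_for_maxchar(maxchar) == kind;
}
```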
* Fix PyUnicode_CopyCharacters() doc | Victor Stinner | 2011-11-20 | 1 | -2/+1
* Ensure that Py_UCS4 is 32 bits and Py_UCS2 is 16 bits | Victor Stinner | 2011-11-20 | 1 | -2/+7
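One way to express this size guarantee in standard C11 is a compile-time assertion on exact-width types. The typedef names below are hypothetical stand-ins for Py_UCS4, Py_UCS2 and Py_UCS1, and the commit may enforce the sizes differently.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for Py_UCS4 / Py_UCS2 / Py_UCS1. */
typedef uint32_t ucs4_t;
typedef uint16_t ucs2_t;
typedef uint8_t  ucs1_t;

/* C11 compile-time checks: the build fails if a type has the wrong width. */
static_assert(sizeof(ucs4_t) * CHAR_BIT == 32, "ucs4_t must be 32 bits");
static_assert(sizeof(ucs2_t) * CHAR_BIT == 16, "ucs2_t must be 16 bits");
static_assert(sizeof(ucs1_t) * CHAR_BIT == 8,  "ucs1_t must be 8 bits");
```

A 16-bit type cannot hold a supplementary code point: converting U+1F600 to `ucs2_t` keeps only its low 16 bits (0xF600), which is why 4-byte storage must really be 32 bits wide.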
* Fix misuse of "PyUnicodeObject" structure name in unicodeobject.h | Victor Stinner | 2011-11-16 | 1 | -2/+2
* Port encoders from Py_UNICODE API to unicode object API. | Martin v. Löwis | 2011-11-10 | 1 | -0/+16
* Make _PyUnicode_FromId return borrowed references. | Martin v. Löwis | 2011-11-07 | 1 | -1/+1
  http://mail.python.org/pipermail/python-dev/2011-November/114347.html
* Fix gdb/libpython.py for not ready Unicode strings | Victor Stinner | 2011-11-04 | 1 | -3/+5
  _PyUnicode_CheckConsistency() also checks the hash and length values of not ready Unicode strings.
* Replace PyUnicodeObject type by PyObject | Victor Stinner | 2011-11-03 | 1 | -2/+1
  * _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
  * Remove now useless casts to PyObject*
* Port UCS1 and charmap codecs to new API. | Martin v. Löwis | 2011-11-02 | 1 | -0/+6
* Drop Py_UCS4_ functions. Closes #13246. | Martin v. Löwis | 2011-10-31 | 1 | -37/+0
* Replace PyUnicodeObject* by PyObject* where it was irrelevant | Victor Stinner | 2011-10-23 | 1 | -1/+1
  A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to PyUnicodeObject* is wrong.