path: root/Objects/unicodeobject.c
Commit message | Author | Date | Files | Lines
* remove (un)transform methods | Benjamin Peterson | 2010-12-12 | 1 | -44/+1
* Issue #10557: Fixed error messages from float() and other numeric | Alexander Belopolsky | 2010-12-04 | 1 | -2/+33
  types. Added a new API function, PyUnicode_TransformDecimalToASCII(), which transforms non-ASCII decimal digits in a Unicode string to their ASCII equivalents.
* Merge branches/pep-0384. | Martin v. Löwis | 2010-12-03 | 1 | -3/+3
* Remove redundant check for PyBytes in unicode_encode. | Georg Brandl | 2010-12-03 | 1 | -17/+1
* #7475: add (un)transform method to bytes/bytearray and str, add back codecs that can be used with them from Python 2. | Georg Brandl | 2010-12-02 | 1 | -1/+45
* Remove redundant includes of headers that are already included by Python.h. | Georg Brandl | 2010-11-30 | 1 | -1/+0
* PyUnicode_DecodeFSDefaultAndSize() raises MemoryError if _Py_char2wchar() fails | Victor Stinner | 2010-11-08 | 1 | -1/+1
* PyUnicode_EncodeFS() raises an exception if _Py_wchar2char() fails | Victor Stinner | 2010-11-08 | 1 | -3/+20
  * Add error_pos optional argument to _Py_wchar2char()
  * PyUnicode_EncodeFS() raises a UnicodeEncodeError or MemoryError if _Py_wchar2char() fails
* str, bytes, bytearray docstring: remove unnecessary [...] | Victor Stinner | 2010-11-07 | 1 | -1/+1
* Fix encode/decode method doc of str, bytes, bytearray types | Victor Stinner | 2010-11-07 | 1 | -3/+3
  * Specify the default encoding: write 'utf-8' instead of sys.getdefaultencoding(), because the default encoding is now constant
  * Specify the default errors value
* Added more to docstrings for str.format, format_map, and __format__. | Eric Smith | 2010-11-06 | 1 | -3/+5
* Issue #10288: The deprecated family of "char"-handling macros | David Malcolm | 2010-11-05 | 1 | -10/+10
  (ISLOWER()/ISUPPER()/etc) have now been removed: use Py_ISLOWER() etc instead.
* Issue #6081: Add str.format_map. str.format_map(mapping) is similar to str.format(**mapping), except mapping does not get converted to a dict. | Eric Smith | 2010-11-04 | 1 | -0/+6
* Simplify PyUnicode_Encode/DecodeFSDefault on Windows/Mac OS X | Victor Stinner | 2010-10-27 | 1 | -16/+18
  * Windows always uses mbcs
  * Mac OS X always uses utf-8
* Issue #4388: On Mac OS X, decode command line arguments from UTF-8, instead of | Victor Stinner | 2010-10-20 | 1 | -0/+114
  the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable is not set, the locale encoding is ISO-8859-1, whereas most programs (including Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and to encode command line arguments on this OS.
* PyUnicode_FromFormatV(): Fix %A format | Victor Stinner | 2010-10-18 | 1 | -0/+1
  It was not completely implemented. Add a test.
* make hashes always the size of pointers; introduce Py_hash_t #9778 | Benjamin Peterson | 2010-10-17 | 1 | -2/+2
* Add an optional size argument to _Py_char2wchar() | Victor Stinner | 2010-10-16 | 1 | -2/+3
  _Py_char2wchar() callers usually need the result size in characters. Since it's trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add an option to get it.
* Use locale encoding if Py_FileSystemDefaultEncoding is not set | Victor Stinner | 2010-10-15 | 1 | -8/+32
  * PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if Py_FileSystemDefaultEncoding is NULL
  * redecode_filenames() functions and _Py_code_object_list (issue #9630) are no longer needed: remove them
* #9418: first step of moving private string methods to _string module. | Georg Brandl | 2010-10-14 | 1 | -2/+30
* PyUnicode_AsWideCharString() takes a PyObject*, not a PyUnicodeObject* | Victor Stinner | 2010-10-07 | 1 | -3/+3
  All unicode functions use PyObject* except PyUnicode_AsWideChar(). Fix the prototype for the new function PyUnicode_AsWideCharString().
* Issue #8670: PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() replace | Victor Stinner | 2010-10-02 | 1 | -22/+105
  UTF-16 surrogate pairs by single non-BMP characters for 16-bit Py_UNICODE and 32-bit wchar_t (e.g. Linux in narrow build).
* Issue #8870: PyUnicode_AsWideCharString() doesn't count the trailing nul character | Victor Stinner | 2010-10-02 | 1 | -1/+1
  And write unit tests for PyUnicode_AsWideChar() and PyUnicode_AsWideCharString().
* Fix PyUnicode_AsWideCharString(): set *size if size is not NULL | Victor Stinner | 2010-09-29 | 1 | -0/+2
* Issue #9630: Redecode filenames when setting the filesystem encoding | Victor Stinner | 2010-09-29 | 1 | -1/+7
  Redecode the filenames of:
  - all modules: __file__ and __path__ attributes
  - all code objects: co_filename attribute
  - sys.path
  - sys.meta_path
  - sys.executable
  - sys.path_importer_cache (keys)
  Keep weak references to all code objects until initfsencoding() is called, to be able to redecode co_filename attribute of all code objects.
* Issue #9979: Create function PyUnicode_AsWideCharString(). | Victor Stinner | 2010-09-29 | 1 | -14/+48
* use return NULL; it's just as correct | Benjamin Peterson | 2010-09-12 | 1 | -1/+1
* Issue #9738, #9836: Fix refleak introduced by r84704 | Victor Stinner | 2010-09-12 | 1 | -2/+2
* detect non-ascii characters much earlier (plugs ref leak) | Benjamin Peterson | 2010-09-12 | 1 | -7/+7
* Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on | Victor Stinner | 2010-09-11 | 1 | -1/+9
  a non-ASCII byte in the format string. Document also the encoding.
* Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy() | Victor Stinner | 2010-09-03 | 1 | -1/+1
* Create PyUnicode_strdup() function | Victor Stinner | 2010-09-01 | 1 | -0/+22
* Create Py_UNICODE_strcat() function | Victor Stinner | 2010-09-01 | 1 | -0/+9
* Remove unicode_default_encoding constant | Victor Stinner | 2010-09-01 | 1 | -10/+1
  Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated (we will not change its value anymore).
* Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding() | Antoine Pitrou | 2010-09-01 | 1 | -11/+0
  are now removed, since their effect was inexistent in 3.x (the default encoding is hardcoded to utf-8 and cannot be changed).
* Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API | Antoine Pitrou | 2010-09-01 | 1 | -27/+26
  properly. Patch by Stefan Behnel.
* Issue 8781: On systems with a signed 4-byte wchar_t and a 4-byte Py_UNICODE, use memcpy to convert between the two (as already done when wchar_t is unsigned) | Daniel Stutzbach | 2010-08-24 | 1 | -2/+2
* Fix PyUnicode_EncodeFSDefault() indentation | Victor Stinner | 2010-08-18 | 1 | -2/+2
* Issue #9425: Create Py_UNICODE_strncmp() function | Victor Stinner | 2010-08-16 | 1 | -0/+17
  The code is based on strncmp() of the libiberty library, function in the public domain.
* Issue #9542: Create PyUnicode_FSDecoder() function | Victor Stinner | 2010-08-13 | 1 | -3/+41
  It's a ParseTuple converter: decode bytes objects to unicode using PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
  * Don't mention the surrogateescape error handler in the comments or the documentation; refer to PyUnicode_DecodeFSDefaultAndSize() and PyUnicode_EncodeFSDefault() instead, because these functions use the strict error handler for the mbcs encoding (on Windows).
  * Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid inconsistency with unicodeobject.h.
* Issue #9425: Create PyErr_WarnFormat() function | Victor Stinner | 2010-08-13 | 1 | -7/+8
  Similar to PyErr_WarnEx() but use PyUnicode_FromFormatV() to format the warning message.
  Strip also some trailing spaces.
* Issue #2443: Added a new macro, Py_VA_COPY, which is equivalent to C99 | Alexander Belopolsky | 2010-08-11 | 1 | -9/+1
  va_copy, but available on all python platforms. Untabified a few unrelated files.
* Issue #9425: create Py_UNICODE_strrchr() function | Victor Stinner | 2010-08-10 | 1 | -0/+13
* Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway. | Georg Brandl | 2010-08-01 | 1 | -2/+2
* #8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character. | Georg Brandl | 2010-08-01 | 1 | -2/+2
* Use Py_CLEAR(). | Georg Brandl | 2010-07-29 | 1 | -4/+2
* Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. | Stefan Krah | 2010-07-19 | 1 | -1/+1
* Fix the docstrings of the capitalize method. | Senthil Kumaran | 2010-07-05 | 1 | -1/+1
* Update comment about surrogates. | Ezio Melotti | 2010-07-03 | 1 | -5/+5
* Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. | Ezio Melotti | 2010-07-01 | 1 | -56/+56
  1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
  2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior);
  3) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte";
  4) Add an extensive set of tests in test_unicode;
  5) Fix test_codeccallbacks because it was failing after this change.
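
The decoder changes listed in the RFC 3629 entry can be observed from Python code. The following is a minimal sketch, assuming a current CPython 3 interpreter; the helper `decode_reason` and the byte strings are made up for illustration and do not come from the commit or its tests.

```python
# Illustrative sketch only: decode_reason() and the sample inputs are
# hypothetical, chosen to exercise the behavior described in the entry above.

def decode_reason(data: bytes) -> str:
    """Return the error reason reported by the strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return "ok"

# 0xFF can never start a UTF-8 sequence.
start_reason = decode_reason(b"\xff")

# 0xE1 announces a 3-byte sequence, but "A" (0x41) is not a continuation byte.
cont_reason = decode_reason(b"\xe1A")

# Truncated sequence: start byte 0xE1 plus one valid continuation byte 0x80.
# Only those two bytes are replaced (by one U+FFFD); the "A" is kept.
truncated = b"\xe1\x80A".decode("utf-8", errors="replace")

# A 5-byte form permitted by RFC 2279 is rejected at its first byte.
five_byte_reason = decode_reason(b"\xf8\x88\x80\x80\x80")

print(start_reason, cont_reason, repr(truncated), five_byte_reason)
```

The `errors="replace"` case is the one most likely to matter in practice: the valid character after a broken sequence is preserved rather than swallowed.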