summaryrefslogtreecommitdiffstats
path: root/Objects/unicodeobject.c
Commit message (Collapse)AuthorAgeFilesLines
* Bug #1379994: Fix *unicode_escape codecs to encode r'\' as r'\\'Hye-Shik Chang2005-12-171-3/+3
| | | | just like string codecs.
* Add const to several API functions that take char *.Jeremy Hylton2005-12-101-1/+1
| | | | | | | | | | | | | | | | | | | In C++, it's an error to pass a string literal to a char* function without a const_cast(). Rather than require every C++ extension module to put a cast around string literals, fix the API to state the const-ness. I focused on parts of the API where people usually pass literals: PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type slots, etc. Predictably, there were a large set of functions that needed to be fixed as a result of these changes. The most pervasive change was to make the keyword args list passed to PyArg_ParseTupleAndKewords() to be a const char *kwlist[]. One cast was required as a result of the changes: A type object mallocs the memory for its tp_doc slot and later frees it. PyTypeObject says that tp_doc is const char *; but if the type was created by type_new(), we know it is safe to cast to char *.
* Fix leaked reference to None.Walter Dörwald2005-11-281-0/+1
|
* Another comment typo fixAndrew M. Kuchling2005-11-021-1/+1
|
* Fix typo in comment.Walter Dörwald2005-11-021-1/+1
|
* fix typos, mostly in commentsFred Drake2005-10-281-2/+2
|
* Fix bug:Michael W. Hudson2005-10-211-4/+0
| | | | | | | | [ 1327110 ] wrong TypeError traceback in generator expressions by removing the code that can stomp on the users' TypeError raised by the iterable argument to ''.join() -- PySequence_Fast (now?) gives a perfectly reasonable message itself. Also, a couple of tests.
* Whitespace corrections.Marc-André Lemburg2005-10-191-19/+19
|
* Bug fix for [ 1331062 ] utf 7 codec broken.Marc-André Lemburg2005-10-191-8/+16
| | | | Backport candidate.
* Part of SF patch #1313939: Speedup charmap decoding by extendingWalter Dörwald2005-10-061-75/+107
| | | | | | | PyUnicode_DecodeCharmap() the accept a unicode string as the mapping argument which is used as a mapping table. This code isn't used by any of the codecs yet.
* SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complainWalter Dörwald2005-08-301-0/+75
| | | | | about illegal code points. The codec now supports PEP 293 style error handlers. (This is a variant of the Nik Haldimann's patch that detects truncated data)
* Correct the handling of 0-termination of PyUnicode_AsWideChar()Marc-André Lemburg2004-11-221-1/+7
| | | | | | | | and its usage in PyLocale_strcoll(). Clarify the documentation on this. Thanks to Andreas Degert for pointing this out.
* Applied patch for [ 1047269 ] Buffer overwrite in PyUnicode_AsWideChar.Marc-André Lemburg2004-10-151-2/+2
| | | | Python 2.3.x candidate.
* Initialize sep and seplen to suppress warning from gcc.Skip Montanaro2004-09-161-3/+3
|
* Add a missing line continuation character.Thomas Heller2004-09-151-1/+1
|
* Make the hint about the None default less ambiguous.Walter Dörwald2004-09-141-1/+1
|
* Enhance the docstrings for unicode.split() and string.split()Walter Dörwald2004-09-141-2/+2
| | | | | to make it clear that it is possible to pass None as the separator argument to get the default "any whitespace" separator.
* SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now supportWalter Dörwald2004-09-071-23/+57
| | | | | | | | | | | decoding incomplete input (when the input stream is temporarily exhausted). codecs.StreamReader now implements buffering, which enables proper readline support for the UTF-16 decoders. codecs.StreamReader.read() has a new argument chars which specifies the number of characters to return. codecs.StreamReader.readline() and codecs.StreamReader.readlines() have a new argument keepends. Trailing "\n"s will be stripped from the lines if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and PyUnicode_DecodeUTF16Stateful.
* PyUnicode_Join(): Bozo Alert. While this is chugging along, it mayTim Peters2004-08-271-0/+12
| | | | | | | | | need to convert str objects from the iterable to unicode. So, if someone set the system default encoding to something nasty enough, the conversion process could mutate the input iterable as a side effect, and PySequence_Fast doesn't hide that from us if the input was a list. IOW, can't assume the size of PySequence_Fast's result is invariant across PyUnicode_FromObject() calls.
* PyUnicode_Join(): Rewrote to use PySequence_Fast(). This doesn't doTim Peters2004-08-271-126/+96
| | | | | | | | much to reduce the size of the code, but greatly improves its clarity. It's also quicker in what's probably the most common case (the argument iterable is a list). Against it, if the iterable isn't a list or a tuple, a temp tuple is materialized containing the entire input sequence, and that's a bigger temp memory burden. Yawn.
* PyUnicode_Join(): Missed a spot where I intended a cast from size_t toTim Peters2004-08-271-1/+1
| | | | | int. I sure wish MS would gripe about that! Whatever, note that the statement above it guarantees that the cast loses no info.
* PyUnicode_Join(): Two primary aims:Tim Peters2004-08-271-40/+120
| | | | | | | | 1. u1.join([u2]) is u2 2. Be more careful about C-level int overflow. Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but the code could sure be simpler if it were.
* SF #989185: Drop unicode.iswide() and unicode.width() and addHye-Shik Chang2004-08-041-67/+0
| | | | | | | | | | | | unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w
* Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__().Marc-André Lemburg2004-07-231-10/+12
|
* Moved SunPro warning suppression into pyport.h and out of individualNicholas Bastin2004-07-151-4/+0
| | | | modules and objects.
* Fix a copy&paste typo.Marc-André Lemburg2004-07-101-1/+1
|
* .encode()/.decode() patch part 2.Marc-André Lemburg2004-07-081-0/+10
|
* Allow string and unicode return types from .encode()/.decode()Marc-André Lemburg2004-07-081-5/+95
| | | | | methods on string and unicode objects. Added unicode.decode() which was missing for no apparent reason.
* Fixed end-of-loop code not reached warning when using SunPro CNicholas Bastin2004-06-171-0/+4
|
* - SF #962502: Add two more methods for unicode type; width() andHye-Shik Chang2004-06-021-1/+68
| | | | | | | iswide() for east asian width manipulation. (Inspired by David Goodger, Reviewed by Martin v. Loewis) - Move _PyUnicode_TypeRecord.flags to the end of the struct so that no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
* SF Patch #926375: Remove a useless UTF-16 support code that is neverHye-Shik Chang2004-04-061-18/+3
| | | | been used. (Suggested by Martin v. Loewis)
* Fix reallocation bug in unicode.translate(): The code was comparingWalter Dörwald2004-02-051-1/+1
| | | | characters instead of character pointers to determine space requirements.
* Cosmetic fix for wrongly indented tabs with ts=4.Hye-Shik Chang2004-01-031-8/+8
|
* Fix unicode.rsplit()'s bug that ignores separater on the end of string whenHye-Shik Chang2003-12-231-1/+1
| | | | using specialized splitter for 1 char sep.
* Fix broken xmlcharrefreplace by rev 2.204.Hye-Shik Chang2003-12-221-3/+3
| | | | (Pointy hat goes to perky)
* SF #859573: Reduce compiler warnings on gcc 3.2 and above.Hye-Shik Chang2003-12-191-1/+14
|
* Add rsplit method for str and unicode builtin types.Hye-Shik Chang2003-12-151-1/+191
| | | | | SF feature request #801847. Original patch is written by Sean Reifschneider.
* - Removed FutureWarnings related to hex/oct literals and conversionsGuido van Rossum2003-11-291-17/+19
| | | | | | | | | | and left shifts. (Thanks to Kalle Svensson for SF patch 849227.) This addresses most of the remaining semantic changes promised by PEP 237, except for repr() of a long, which still shows the trailing 'L'. The PEP appears to promise warnings for operations that changed semantics compared to Python 2.3, but this is not implemented; we've suffered through enough warnings related to hex/oct literals and I think it's best to be silent now.
* Add optional fillchar argument to ljust(), rjust(), and center() string methods.Raymond Hettinger2003-11-261-13/+45
|
* Fix a bug in the memory reallocation code of PyUnicode_TranslateCharmap().Walter Dörwald2003-10-241-19/+20
| | | | | | | charmaptranslate_makespace() allocated more memory than required for the next replacement but didn't remember that fact, so memory size was growing exponentially every time a replacement string is longer that one character. This fixes SF bug #828737.
* Patch #825679: Clarify semantics of .isfoo on empty strings.Martin v. Löwis2003-10-181-10/+11
| | | | Backported to 2.3.
* Fix for SF bug [ 817156 ] invalid \U escape gives 0=length unistr.Jeremy Hylton2003-10-061-1/+1
|
* On c.l.py, Martin v. Löwis said that Py_UNICODE could be of a signed type,Tim Peters2003-09-161-137/+145
| | | | | | | so fiddle Jeremy's fix to live with that. Also added more comments. Bugfix candidate (this bug is in all versions of Python, at least since 2.1).
* Double-fix of crash in Unicode freelist handling.Jeremy Hylton2003-09-161-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | If a length-1 Unicode string was in the freelist and it was uninitialized or pointed to a very large (magnitude) negative number, the check unicode_latin1[unicode->str[0]] == unicode could cause a segmentation violation, e.g. unicode->str[0] is 0xcbcbcbcb. Fix this in two ways: 1. Change guard befor unicode_latin1[] to test against 256U. If I understand correctly, the unsigned long used to store UCS4 on my box was getting converted to a signed long to compare with the signed constant 256. 2. Change _PyUnicode_New() to make sure the first element of str is always initialized to zero. There are several places in the code where the caller can exit with an error before initializing any of str, which would leave junk in str[0]. Also, silence a compiler warning on pointer vs. int arithmetic. Bug fix candidate.
* Change checks of PyUnicode_Resize() return value for clarity.Jeremy Hylton2003-09-161-18/+17
| | | | | | | The unicode_resize() family only returns -1 or 0 so simply checking for != 0 is sufficient, but somewhat unclear. Many Python API functions return < 0 on error, reserving the right to return 0 or 1 on success. Change the call sites for consistency with these calls.
* SF bug #795506: Wrong handling of string format code for float values.Raymond Hettinger2003-08-271-0/+3
| | | | | | Adding missing support for '%F'. Will backport to 2.3.1.
* Fix refcounting leak in charmaptranslate_lookup()Walter Dörwald2003-08-151-0/+1
|
* Fix another refcounting leak in PyUnicode_EncodeCharmap().Walter Dörwald2003-08-151-1/+3
|
* Fix another refcounting leak (in PyUnicode_DecodeUnicodeEscape()).Walter Dörwald2003-08-151-0/+2
|
* Fix refcount leak in PyUnicode_EncodeCharmap(). The bug surfacesWalter Dörwald2003-08-141-3/+3
| | | | | | | | | | when an encoding error occurs and the callback name is unknown, i.e. when the callback has to be called. The problem was that the fact that the callback has already been looked up was only recorded in a local variable in charmap_encoding_error(), because charmap_encoding_error() got it's own copy of the errorHandler pointer instead of a pointer to the pointer in PyUnicode_EncodeCharmap().