path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* Merge p3yk branch with the trunk up to revision 45595. This breaks a fair | Thomas Wouters | 2006-04-21 | 1 | -59/+64
  number of tests, all because of the codecs/_multibytecodecs issue described
  here (it's not a Py3K issue, just something Py3K discovers):
  http://mail.python.org/pipermail/python-dev/2006-April/064051.html
  Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
  tests that are expected to break are:
    test_codecencodings_cn
    test_codecencodings_hk
    test_codecencodings_jp
    test_codecencodings_kr
    test_codecencodings_tw
    test_codecs
    test_multibytecodec
  This merge fixes an actual test failure (test_weakref) in this branch,
  though, so I believe merging is the right thing to do anyway.
* Get rid of remnants of integer division | Neal Norwitz | 2006-03-24 | 1 | -1/+0
* - Reindent a confusingly indented piece of code (no intended code changes | Thomas Wouters | 2006-03-12 | 1 | -13/+16
    there)
  - Add missing DECREFs of inner-scope 'temp' variable
  - Add various missing DECREFs by changing 'return NULL' into 'goto onError'
  - Avoid double DECREF when last _PyUnicode_Resize() fails
  Coverity found one of the missing DECREFs, but oddly enough not the others.
* Update Unicode database to Unicode 4.1. | Martin v. Löwis | 2006-03-09 | 1 | -1/+1
* Checking in the code for PEP 357. | Guido van Rossum | 2006-03-07 | 1 | -2/+5
  This was mostly written by Travis Oliphant. I've inspected it all; Neal
  Norwitz and MvL have also looked at it (in an earlier incarnation).
* SF #1444030: Fix several potential defects found by Coverity. | Hye-Shik Chang | 2006-03-07 | 1 | -8/+14
  (reviewed by Neal Norwitz)
* Revert backwards-incompatible const changes. | Martin v. Löwis | 2006-02-27 | 1 | -1/+1
* Use correct PyArg_Parse format char for Py_ssize_t in unicode.center(). | Thomas Wouters | 2006-02-16 | 1 | -1/+1
  Fixes:
    >>> u"".center(10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError
  on 64-bit systems.
* Use Py_ssize_t for counts and sizes. | Martin v. Löwis | 2006-02-16 | 1 | -1/+1
  Convert Py_ssize_t using PyInt_FromSsize_t
* Support %zd in PyErr_Format and PyString_FromFormat. | Martin v. Löwis | 2006-02-16 | 1 | -6/+3
* doubletounicode(), longtounicode(): | Tim Peters | 2006-02-16 | 1 | -4/+8
  Py_SAFE_DOWNCAST can evaluate its first argument multiple times in a
  debug build. This caused two distinct assert-failures in test_unicode
  run under a debug build. Rewrote the code in trivial ways so that
  multiple evaluation of the first argument doesn't hurt.
* Remove two unused Py_ssize_t variables (merge glitches, looks like.) | Thomas Wouters | 2006-02-15 | 1 | -2/+0
* Merge ssize_t branch. | Martin v. Löwis | 2006-02-15 | 1 | -287/+299
* - Patch #1400181, fix unicode string formatting to not use the locale. | Neal Norwitz | 2006-01-10 | 1 | -16/+21
  This is how string objects work. u'%f' could use , instead of . for the
  decimal point. Now both strings and unicode always use periods.
  This is the code that would break:
    import locale
    locale.setlocale(locale.LC_NUMERIC, 'de_DE')
    u'%.1f' % 1.0
    assert '1.0' == u'%.1f' % 1.0
  I couldn't create a test case which fails, but this fixes the problem.
  Will backport.
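The behaviour this fix aims for can be sketched in current Python 3 as follows. This is an illustration, not the original test: the helper names `format_plain` and `format_localized` are made up for the example, and the `de_DE` locale is assumed to be possibly absent, so the call to `setlocale` is guarded.

```python
import locale


def format_plain(value):
    # %-formatting does not consult LC_NUMERIC, so the decimal point is '.'
    return '%.1f' % value


def format_localized(value):
    # locale.format_string() is the explicit, locale-aware alternative
    return locale.format_string('%.1f', value)


try:
    # Assumption: 'de_DE' may not be installed on this machine.
    locale.setlocale(locale.LC_NUMERIC, 'de_DE')
except locale.Error:
    pass

print(format_plain(1.0))
print(format_localized(1.0))
```

Only `format_plain` is guaranteed to return the same text on every machine; the second result depends on which locale could actually be set.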
* Fix icc warnings: remove (sometimes) unused variable conditionally | Neal Norwitz | 2006-01-08 | 1 | -2/+4
* Stop maintaining the buildno file. | Martin v. Löwis | 2006-01-05 | 1 | -12/+15
  Also, stop determining Unicode sizes with PyString_GET_SIZE.
* Bug #1379994: Fix *unicode_escape codecs to encode r'\' as r'\\' | Hye-Shik Chang | 2005-12-17 | 1 | -3/+3
  just like string codecs.
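A small round-trip check, written for Python 3 and covering only the `unicode_escape` codec, shows the escaping described here. It is a sketch of the resulting behaviour, not the test that accompanied the fix.

```python
# A single backslash character (written with an escape in source code).
backslash = '\\'

# Encoding doubles the backslash so the output can be decoded unambiguously.
encoded = backslash.encode('unicode_escape')
print(encoded)

# Decoding the escaped form gives back the original single character.
decoded = encoded.decode('unicode_escape')
print(decoded == backslash)
```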
* Add const to several API functions that take char *. | Jeremy Hylton | 2005-12-10 | 1 | -1/+1
  In C++, it's an error to pass a string literal to a char* function
  without a const_cast(). Rather than require every C++ extension
  module to put a cast around string literals, fix the API to state the
  const-ness.
  I focused on parts of the API where people usually pass literals:
  PyArg_ParseTuple() and friends, Py_BuildValue(), PyMethodDef, the type
  slots, etc. Predictably, there was a large set of functions that
  needed to be fixed as a result of these changes. The most pervasive
  change was to make the keyword args list passed to
  PyArg_ParseTupleAndKeywords() a const char *kwlist[].
  One cast was required as a result of the changes: A type object
  mallocs the memory for its tp_doc slot and later frees it.
  PyTypeObject says that tp_doc is const char *; but if the type was
  created by type_new(), we know it is safe to cast to char *.
* Fix leaked reference to None. | Walter Dörwald | 2005-11-28 | 1 | -0/+1
* Another comment typo fix | Andrew M. Kuchling | 2005-11-02 | 1 | -1/+1
* Fix typo in comment. | Walter Dörwald | 2005-11-02 | 1 | -1/+1
* fix typos, mostly in comments | Fred Drake | 2005-10-28 | 1 | -2/+2
* Fix bug: | Michael W. Hudson | 2005-10-21 | 1 | -4/+0
  [ 1327110 ] wrong TypeError traceback in generator expressions
  by removing the code that can stomp on the user's TypeError raised by the
  iterable argument to ''.join() -- PySequence_Fast (now?) gives a
  perfectly reasonable message itself.
  Also, a couple of tests.
* Whitespace corrections. | Marc-André Lemburg | 2005-10-19 | 1 | -19/+19
* Bug fix for [ 1331062 ] utf 7 codec broken. | Marc-André Lemburg | 2005-10-19 | 1 | -8/+16
  Backport candidate.
* Part of SF patch #1313939: Speed up charmap decoding by extending | Walter Dörwald | 2005-10-06 | 1 | -75/+107
  PyUnicode_DecodeCharmap() to accept a unicode string as the mapping
  argument, which is used as a mapping table. This code isn't used by any
  of the codecs yet.
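At the Python level in current releases, the string-as-mapping-table form is reachable through `codecs.charmap_decode()`. The sketch below uses a made-up three-character table; byte value N is looked up as the character at index N of the table.

```python
import codecs

# Hypothetical decoding table: byte 0 -> 'A', byte 1 -> 'B', byte 2 -> 'C'.
table = 'ABC'

# charmap_decode returns the decoded text and the number of bytes consumed.
text, consumed = codecs.charmap_decode(b'\x02\x00\x01', 'strict', table)
print(text, consumed)
```

Indexing into a string avoids a dictionary lookup per byte, which is the speedup the patch is after.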
* SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain | Walter Dörwald | 2005-08-30 | 1 | -0/+75
  about illegal code points. The codec now supports PEP 293 style error
  handlers. (This is a variant of Nik Haldimann's patch that detects
  truncated data.)
* Correct the handling of 0-termination of PyUnicode_AsWideChar() | Marc-André Lemburg | 2004-11-22 | 1 | -1/+7
  and its usage in PyLocale_strcoll().
  Clarify the documentation on this.
  Thanks to Andreas Degert for pointing this out.
* Applied patch for [ 1047269 ] Buffer overwrite in PyUnicode_AsWideChar. | Marc-André Lemburg | 2004-10-15 | 1 | -2/+2
  Python 2.3.x candidate.
* Initialize sep and seplen to suppress warning from gcc. | Skip Montanaro | 2004-09-16 | 1 | -3/+3
* Add a missing line continuation character. | Thomas Heller | 2004-09-15 | 1 | -1/+1
* Make the hint about the None default less ambiguous. | Walter Dörwald | 2004-09-14 | 1 | -1/+1
* Enhance the docstrings for unicode.split() and string.split() | Walter Dörwald | 2004-09-14 | 1 | -2/+2
  to make it clear that it is possible to pass None as the separator
  argument to get the default "any whitespace" separator.
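The difference between the default "any whitespace" separator and an explicit single space can be seen in a short Python 3 example (the sample string is made up):

```python
text = 'a  b'  # two spaces between the letters

# None (the default) splits on runs of whitespace and drops empty strings.
by_whitespace = text.split(None)

# An explicit ' ' treats every single space as a separator.
by_space = text.split(' ')

print(by_whitespace)
print(by_space)
```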
* SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support | Walter Dörwald | 2004-09-07 | 1 | -23/+57
  decoding incomplete input (when the input stream is temporarily exhausted).
  codecs.StreamReader now implements buffering, which enables proper
  readline support for the UTF-16 decoders. codecs.StreamReader.read()
  has a new argument chars which specifies the number of characters to
  return. codecs.StreamReader.readline() and
  codecs.StreamReader.readlines() have a new argument keepends.
  Trailing "\n"s will be stripped from the lines if keepends is false.
  Added C APIs PyUnicode_DecodeUTF8Stateful and
  PyUnicode_DecodeUTF16Stateful.
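The idea of decoding incomplete input can be illustrated in Python 3 with `codecs.getincrementaldecoder()`. Note the assumption: that Python-level API was added in a later release than this commit and is used here only as a convenient stand-in for the stateful C decoders. The chunk boundaries are made up so that a three-byte UTF-8 sequence is split across two reads.

```python
import codecs
import io

# The euro sign is three bytes in UTF-8; split it across two chunks.
chunks = [b'\xe2\x82', b'\xac', b'!']

decoder = codecs.getincrementaldecoder('utf-8')()
# An incomplete sequence yields '' and is buffered until the rest arrives.
pieces = [decoder.decode(chunk) for chunk in chunks]
print(pieces)

# StreamReader.readline() with keepends=False strips the line ending.
reader = codecs.getreader('utf-8')(io.BytesIO(b'first\nsecond\n'))
first_line = reader.readline(keepends=False)
print(first_line)
```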
* PyUnicode_Join(): Bozo Alert. While this is chugging along, it may | Tim Peters | 2004-08-27 | 1 | -0/+12
  need to convert str objects from the iterable to unicode. So, if someone
  set the system default encoding to something nasty enough, the conversion
  process could mutate the input iterable as a side effect, and
  PySequence_Fast doesn't hide that from us if the input was a list. IOW,
  can't assume the size of PySequence_Fast's result is invariant across
  PyUnicode_FromObject() calls.
* PyUnicode_Join(): Rewrote to use PySequence_Fast(). This doesn't do | Tim Peters | 2004-08-27 | 1 | -126/+96
  much to reduce the size of the code, but greatly improves its clarity.
  It's also quicker in what's probably the most common case (the argument
  iterable is a list). Against it, if the iterable isn't a list or a tuple,
  a temp tuple is materialized containing the entire input sequence, and
  that's a bigger temp memory burden. Yawn.
* PyUnicode_Join(): Missed a spot where I intended a cast from size_t to | Tim Peters | 2004-08-27 | 1 | -1/+1
  int. I sure wish MS would gripe about that! Whatever, note that the
  statement above it guarantees that the cast loses no info.
* PyUnicode_Join(): Two primary aims: | Tim Peters | 2004-08-27 | 1 | -40/+120
  1. u1.join([u2]) is u2
  2. Be more careful about C-level int overflow.
  Since PySequence_Fast() isn't needed to achieve #1, it's not used -- but
  the code could sure be simpler if it were.
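Aim 1 can be probed from Python 3 with a single-element join. The sketch below guarantees only the equality; whether the very same object comes back is an implementation detail of the interpreter, so the identity check is printed rather than asserted. The variable names are made up for the example.

```python
separator = ', '
only_item = 'spam'

# With exactly one element the separator is never used.
joined = separator.join([only_item])
print(joined == only_item)

# Identity (no copy made) is an optimisation, not a language guarantee.
print(joined is only_item)
```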
* SF #989185: Drop unicode.iswide() and unicode.width() and add | Hye-Shik Chang | 2004-08-04 | 1 | -67/+0
  unicodedata.east_asian_width(). You can still implement your own
  simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'):
                w += 2
            else:
                w += 1
        return w
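A self-contained Python 3 version of the width() recipe from this commit message, with a few sample inputs, might look like the sketch below. The sample strings are illustrative, and the function remains the same simple approximation: wide (W) and fullwidth (F) characters count as two columns, everything else as one.

```python
import unicodedata


def width(text):
    """Approximate display width: 2 columns for W/F characters, else 1."""
    total = 0
    for char in unicodedata.normalize('NFC', text):
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            total += 2
        else:
            total += 1
    return total


print(width('abc'))
print(width('日本語'))
```

Combining marks and zero-width characters are still counted as one column each, so this is not a full terminal-width calculation.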
* Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__(). | Marc-André Lemburg | 2004-07-23 | 1 | -10/+12
* Moved SunPro warning suppression into pyport.h and out of individual | Nicholas Bastin | 2004-07-15 | 1 | -4/+0
  modules and objects.
* Fix a copy&paste typo. | Marc-André Lemburg | 2004-07-10 | 1 | -1/+1
* .encode()/.decode() patch part 2. | Marc-André Lemburg | 2004-07-08 | 1 | -0/+10
* Allow string and unicode return types from .encode()/.decode() | Marc-André Lemburg | 2004-07-08 | 1 | -5/+95
  methods on string and unicode objects. Added unicode.decode()
  which was missing for no apparent reason.
* Fixed end-of-loop code not reached warning when using SunPro C | Nicholas Bastin | 2004-06-17 | 1 | -0/+4
* - SF #962502: Add two more methods for unicode type; width() and | Hye-Shik Chang | 2004-06-02 | 1 | -1/+68
    iswide() for east asian width manipulation. (Inspired by David
    Goodger, Reviewed by Martin v. Loewis)
  - Move _PyUnicode_TypeRecord.flags to the end of the struct so that
    no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
* SF Patch #926375: Remove useless UTF-16 support code that has never | Hye-Shik Chang | 2004-04-06 | 1 | -18/+3
  been used. (Suggested by Martin v. Loewis)
* Fix reallocation bug in unicode.translate(): The code was comparing | Walter Dörwald | 2004-02-05 | 1 | -1/+1
  characters instead of character pointers to determine space requirements.
* Cosmetic fix for wrongly indented tabs with ts=4. | Hye-Shik Chang | 2004-01-03 | 1 | -8/+8
* Fix unicode.rsplit()'s bug that ignores separator on the end of string when | Hye-Shik Chang | 2003-12-23 | 1 | -1/+1
  using specialized splitter for 1 char sep.
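The expected behaviour for a trailing one-character separator can be checked in Python 3 as below (the sample string is made up). A separator at the very end of the string must produce a final empty field rather than being ignored.

```python
text = 'a,b,'

# Split once from the right: the trailing comma yields an empty last field.
split_once = text.rsplit(',', 1)

# Split on every comma: the empty last field is still present.
split_all = text.rsplit(',')

print(split_once)
print(split_all)
```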