path: root/Objects/unicodeobject.c
Commit message (Author, Age, Files, Lines)
* Update comment about surrogates.
  (Ezio Melotti, 2010-07-03, 1 file, -5/+5)
* Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
  1) #8271: when a byte sequence is invalid, only the start byte and all the
     valid continuation bytes are now replaced by U+FFFD, instead of replacing
     the number of bytes specified by the start byte. See
     http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
  2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no change
     in behavior);
  3) Change the error messages "unexpected code byte" to "invalid start byte"
     and "invalid data" to "invalid continuation byte";
  4) Add an extensive set of tests in test_unicode;
  5) Fix test_codeccallbacks because it was failing after this change.
  (Ezio Melotti, 2010-07-01, 1 file, -56/+56)
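The replacement rule in item 1 and the new error reasons in item 3 can be observed from Python. The sketch below assumes a current CPython 3 interpreter, whose UTF-8 decoder still follows this "replace only the maximal invalid subpart" behavior; the byte strings are illustrative inputs chosen for this example.

```python
# b"\xf1" starts a 4-byte sequence and b"\x80" is a valid continuation
# byte, but b"A" is not, so only b"\xf1\x80" becomes one U+FFFD.
decoded = b"\xf1\x80A".decode("utf-8", "replace")
print(ascii(decoded))

# A 5-byte lead byte (0xF8) is an invalid start byte, and each stray
# continuation byte after it is invalid on its own.
five_byte = b"\xf8\x88\x80\x80\x80".decode("utf-8", "replace")
print(ascii(five_byte))

# Collect the error reasons reported in strict mode.
reasons = []
for bad in (b"\x80", b"\xc3A"):
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        reasons.append(exc.reason)
print(reasons)
```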
* #9078: fix some Unicode C API descriptions, in comments and docs.
  (Georg Brandl, 2010-06-27, 1 file, -1/+1)
* Merged revisions 82248 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line
    Fix extra space.
  ........
  (Ezio Melotti, 2010-06-26, 1 file, -1/+1)
* Issue #850997: mbcs encoding (Windows only) handles the errors argument:
  strict mode raises unicode errors. The encoder only supports the "strict"
  and "replace" error handlers; the decoder only supports the "strict" and
  "ignore" error handlers.
  (Victor Stinner, 2010-06-16, 1 file, -38/+125)
* Silence 'unused variable' gcc warning. Patch by Éric Araujo.
  (Mark Dickinson, 2010-06-12, 1 file, -1/+2)
* Issue #8969: On Windows, use mbcs codec in strict mode to encode and decode
  filenames and enable os.fsencode().
  (Victor Stinner, 2010-06-11, 1 file, -4/+10)
* Merged revisions 81907 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (Fri, 11 Jun 2010) | 5 lines
    Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
    the interpreter with characters outside the Basic Multilingual Plane
    (0x10000 and higher).
  ........
  (Antoine Pitrou, 2010-06-11, 1 file, -19/+21)
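A short round trip through the big-endian UTF-32 codec with a non-BMP character exercises the code path described in #8941. This is a sketch for a current CPython 3 (3.3+, where PEP 393 removed the UCS-2/UCS-4 build split); on an old UCS-2 build the decoded string would instead be stored as a two-unit surrogate pair.

```python
# U+1F600 lies outside the BMP (it is >= U+10000).
smiley = "\U0001F600"

# Big-endian UTF-32 writes the code point as four bytes, most significant first.
encoded = smiley.encode("utf-32-be")
print(encoded)

# Decoding must rebuild the single code point without overrunning anything.
roundtrip = encoded.decode("utf-32-be")
print(len(roundtrip))
```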
* Fix r81869: ISO-8859-15 was seen as an alias to ISO-8859-1
  Don't use normalize_encoding() result if it is truncated.
  (Victor Stinner, 2010-06-10, 1 file, -39/+45)
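Why treating ISO-8859-15 as ISO-8859-1 is a real bug: the two codecs disagree on a few byte values. The sketch below assumes a current CPython 3 and uses the euro sign, which exists only in ISO-8859-15 (at 0xA4), while ISO-8859-1 maps the same byte to the generic currency sign.

```python
import codecs

# The registry resolves the two names to different codecs.
print(codecs.lookup("ISO-8859-15").name, codecs.lookup("ISO-8859-1").name)

# The euro sign encodes to 0xA4 in ISO-8859-15 ...
euro_15 = "\u20ac".encode("ISO-8859-15")
# ... but the same byte decodes to U+00A4 (currency sign) in ISO-8859-1.
same_byte_in_1 = euro_15.decode("ISO-8859-1")
print(euro_15, ascii(same_byte_in_1))
```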
* Issue #8922: Normalize the encoding name in PyUnicode_AsEncodedString() to
  enable shortcuts for upper case encoding names. Also add a shortcut for
  "iso-8859-1" in PyUnicode_AsEncodedString() and PyUnicode_Decode().
  (Victor Stinner, 2010-06-10, 1 file, -18/+31)
* Issue #8715: Create the PyUnicode_EncodeFSDefault() function: encode a
  Unicode object to Py_FileSystemDefaultEncoding with the "surrogateescape"
  error handler and return a bytes object. If Py_FileSystemDefaultEncoding is
  not set, fall back to UTF-8.
  (Victor Stinner, 2010-05-15, 1 file, -3/+13)
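The logic of PyUnicode_EncodeFSDefault() can be modeled in a few lines of Python. This is a hypothetical sketch, not the C implementation: `encode_fs_default` and its `fs_encoding` parameter are names invented here, with `fs_encoding=None` standing in for "Py_FileSystemDefaultEncoding is not set". At the Python level, os.fsencode() plays a similar role.

```python
from typing import Optional


def encode_fs_default(name: str, fs_encoding: Optional[str] = None) -> bytes:
    """Hypothetical model of PyUnicode_EncodeFSDefault().

    fs_encoding stands in for Py_FileSystemDefaultEncoding; None models
    the "not set" case, which falls back to UTF-8.
    """
    encoding = fs_encoding if fs_encoding is not None else "utf-8"
    # surrogateescape maps lone surrogates U+DC80..U+DCFF back to the raw
    # bytes 0x80..0xFF they were originally decoded from.
    return name.encode(encoding, "surrogateescape")


print(encode_fs_default("caf\udce9"))           # escaped byte restored
print(encode_fs_default("caf\xe9", "latin-1"))  # explicit encoding
```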
* Enable shortcuts for common encodings in PyUnicode_AsEncodedString() for any
  error handler, not only the default error handler (strict)
  (Victor Stinner, 2010-05-15, 1 file, -23/+31)
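The two shortcut commits above (#8922 and this one) describe a dispatch pattern: normalize the encoding name, use a built-in codec directly for a few common encodings whatever the error handler, and otherwise go through the codec registry. The sketch below models that pattern in Python under stated assumptions: `_FAST_PATHS`, `normalize_encoding` and `encode_with_shortcut` are hypothetical names, and the normalization rule (lower-case, underscores to hyphens) is an assumption, not CPython's exact normalize_encoding().

```python
import codecs

# Hypothetical table of shortcut names (after normalization) to codecs.
_FAST_PATHS = {
    "utf-8": "utf-8",
    "latin-1": "latin-1",
    "iso-8859-1": "latin-1",
    "ascii": "ascii",
}


def normalize_encoding(name: str) -> str:
    # Assumed rule: lower-case, and treat "_" like "-".
    return name.strip().lower().replace("_", "-")


def encode_with_shortcut(text: str, encoding: str, errors: str = "strict"):
    """Return (path_taken, encoded_bytes) so the dispatch is visible."""
    key = normalize_encoding(encoding)
    if key in _FAST_PATHS:
        # Shortcut taken for any error handler, not only "strict".
        return "shortcut", text.encode(_FAST_PATHS[key], errors)
    # General path: look the codec up in the registry.
    return "registry", codecs.encode(text, encoding, errors)


print(encode_with_shortcut("\xe9", "ISO-8859-1"))
print(encode_with_shortcut("\xe9", "ASCII", "replace"))
print(encode_with_shortcut("\u20ac", "ISO-8859-15"))
```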
* PyUnicode_DecodeFSDefaultAndSize() uses the surrogateescape error handler.
  This function is only used to decode Python module filenames, but Python
  doesn't support surrogates in module filenames yet, so nobody noticed this
  minor bug.
  (Victor Stinner, 2010-04-30, 1 file, -4/+4)
* Simplify PyUnicode_FSConverter(): remove reference to PyByteArray.
  PyByteArray is no longer supported.
  (Victor Stinner, 2010-04-30, 1 file, -9/+3)
* condense condition
  (Benjamin Peterson, 2010-04-25, 1 file, -4/+1)
* Fix my previous commit (r80382) for wide build (unicodeobject.c)
  (Victor Stinner, 2010-04-22, 1 file, -2/+3)
* Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handlers producing
  a unicode string (e.g. backslashreplace).
  (Victor Stinner, 2010-04-22, 1 file, -47/+80)
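An error handler such as backslashreplace returns a *str* replacement, which the UTF-8 encoder must then encode itself. The sketch below, assuming a current CPython 3, triggers that path with a lone surrogate, which UTF-8 cannot encode.

```python
# U+DC80 is a lone surrogate, so the UTF-8 encoder calls the error handler.
text = "a\udc80b"

# backslashreplace returns the str "\\udc80", which is then encoded.
escaped = text.encode("utf-8", "backslashreplace")
# xmlcharrefreplace returns a decimal character reference instead.
xml = text.encode("utf-8", "xmlcharrefreplace")
print(escaped, xml)
```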
* Issue #8485: PyUnicode_FSConverter() doesn't accept bytearray objects
  anymore; you have to convert your bytearray filenames to bytes.
  (Victor Stinner, 2010-04-22, 1 file, -1/+1)
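The practical consequence is that filenames held in a bytearray must be converted with bytes() first. The sketch below uses os.fsencode() as a Python-level illustration; in current CPython it rejects bytearray too (through os.fspath()), which is an assumption about modern behavior rather than something this commit states about os.fsencode().

```python
import os

raw = bytearray(b"data.txt")

try:
    os.fsencode(raw)          # bytearray is not accepted as a path
    accepted = True
except TypeError:
    accepted = False

# Converting to bytes first works; bytes are returned unchanged.
converted = os.fsencode(bytes(raw))
print(accepted, converted)
```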
* Merged revisions 79494,79496 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines
    #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according
    to Unicode Standard Annex #14.
  ........
    r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines
    Highlight the change of behavior related to r79494. Now VT and FF are
    linebreaks.
  ........
  (Florent Xicluna, 2010-03-30, 1 file, -3/+5)
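The effect of #7643 is visible in str.splitlines(), which treats VT and FF as line boundaries. As a contrast, and assuming a current CPython 3, bytes.splitlines() only splits on \n, \r and \r\n, so the same data stays in one piece there.

```python
text = "one\x0btwo\x0cthree"
lines = text.splitlines()            # VT (0x0B) and FF (0x0C) split str

byte_lines = b"one\x0btwo\x0cthree".splitlines()  # bytes do not split on them
print(lines, byte_lines)
```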
* Merged revisions 79278,79280 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r79278 | victor.stinner | 2010-03-22 13:24:37 +0100 (Mon, 22 Mar 2010) | 2 lines
    Issue #1583863: A unicode subclass can now override the __str__ method
  ........
    r79280 | victor.stinner | 2010-03-22 13:36:28 +0100 (Mon, 22 Mar 2010) | 5 lines
    Fix the NEWS about my last commit: a unicode subclass can now override the
    __unicode__ method (and not the __str__ method). Also simplify the test case.
  ........
  (Victor Stinner, 2010-03-22, 1 file, -1/+1)
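The trunk change concerns Python 2's __unicode__; in Python 3, str() on a str subclass instance goes through the subclass's __str__. A minimal sketch of that Python 3 counterpart, assuming a current CPython 3 (`Tagged` is a hypothetical class made up for this example):

```python
class Tagged(str):
    """Hypothetical str subclass that overrides __str__."""

    def __str__(self) -> str:
        # super().__str__() returns the plain string value.
        return "<" + super().__str__() + ">"


t = Tagged("x")
print(str(t))   # the override is honored
print(t == "x")  # the underlying value is unchanged
```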
* Update a comment with more details.
  (Gregory P. Smith, 2010-02-27, 1 file, -1/+2)
* Merged revisions 77743 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r77743 | ezio.melotti | 2010-01-25 13:24:37 +0200 (Mon, 25 Jan 2010) | 1 line
    #7775: fixed docstring for rpartition
  ........
  (Ezio Melotti, 2010-01-25, 1 file, -1/+1)
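The behavior the rpartition docstring describes is easy to get backwards: when the separator is missing, rpartition() returns two empty strings followed by the original string, the mirror image of partition(). A quick check, assuming a current CPython 3:

```python
found = "a.b.c".rpartition(".")       # splits at the last separator
missing = "abc".rpartition(".")       # separator absent
mirror = "abc".partition(".")         # partition() puts the string first
print(found, missing, mirror)
```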
* Merged revisions 77469-77470 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r77469 | antoine.pitrou | 2010-01-13 14:43:37 +0100 (Wed, 13 Jan 2010) | 3 lines
    Test commit to try to diagnose failures of the IA-64 buildbot
  ........
    r77470 | antoine.pitrou | 2010-01-13 15:01:26 +0100 (Wed, 13 Jan 2010) | 3 lines
    Sanitize bloom filter macros
  ........
  (Antoine Pitrou, 2010-01-13, 1 file, -3/+13)
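The bloom filter behind those macros is a one-word bitmask: each character sets the bit selected by the low bits of its code point, so a clear bit proves a character is absent, while a set bit only means "possibly present". The Python sketch below models that idea; the function names are hypothetical and `BLOOM_WIDTH = 64` is an assumption (the width of a C unsigned long on LP64), not a value taken from this commit.

```python
BLOOM_WIDTH = 64  # assumed mask width in bits


def bloom_add(mask: int, ch: str) -> int:
    # Set the bit chosen by the low bits of the code point.
    return mask | (1 << (ord(ch) & (BLOOM_WIDTH - 1)))


def bloom_may_contain(mask: int, ch: str) -> bool:
    # False means "definitely absent"; True only means "possibly present".
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def make_bloom(chars: str) -> int:
    mask = 0
    for ch in chars:
        mask = bloom_add(mask, ch)
    return mask


mask = make_bloom("abc")
print(bloom_may_contain(mask, "a"), bloom_may_contain(mask, "d"))
# chr(ord("a") + 64) selects the same bit as "a": a false positive.
print(bloom_may_contain(mask, chr(ord("a") + BLOOM_WIDTH)))
```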
* Merged revisions 77463 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r77463 | antoine.pitrou | 2010-01-13 09:55:20 +0100 (Wed, 13 Jan 2010) | 3 lines
    Fix Windows build (re r77461)
  ........
  (Antoine Pitrou, 2010-01-13, 1 file, -1/+1)
* Merged revisions 77461 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r77461 | antoine.pitrou | 2010-01-13 08:55:48 +0100 (Wed, 13 Jan 2010) | 5 lines
    Issue #7622: Improve the split(), rsplit(), splitlines() and replace()
    methods of bytes, bytearray and unicode objects by using a common
    implementation based on stringlib's fast search. Patch by Florent Xicluna.
  ........
  (Antoine Pitrou, 2010-01-13, 1 file, -347/+73)
* Python strings ending with '\0' should not be equivalent to their C
  counterparts in PyUnicode_CompareWithASCIIString
  (Benjamin Peterson, 2010-01-09, 1 file, -0/+5)
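The rule this fix enforces can be modeled in Python: a C string ends at its first NUL, so a Python string with characters left over (even '\0') must compare greater, not equal. `compare_with_ascii` is a hypothetical model written for this example, returning -1, 0 or 1; it is not the C source.

```python
def compare_with_ascii(uni: str, c_string: bytes) -> int:
    """Hypothetical model of PyUnicode_CompareWithASCIIString() semantics.

    c_string models a NUL-terminated C string: only the bytes before the
    first NUL count. Returns -1, 0 or 1.
    """
    ascii_part = c_string.split(b"\0", 1)[0]
    for u_ch, c_byte in zip(uni, ascii_part):
        if ord(u_ch) != c_byte:
            return -1 if ord(u_ch) < c_byte else 1
    if len(uni) > len(ascii_part):
        # Leftover Python characters, even '\0', make the str greater.
        return 1
    if len(uni) < len(ascii_part):
        return -1
    return 0


print(compare_with_ascii("abc\0", b"abc"))  # trailing NUL is not ignored
print(compare_with_ascii("abc", b"abc"))
```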
* Merged revisions 76308 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r76308 | mark.dickinson | 2009-11-15 16:18:58 +0000 (Sun, 15 Nov 2009) | 3 lines
    Issue #7228: Add '%lld' and '%llu' support to PyString_FromFormat,
    PyString_FromFormatV and PyErr_Format.
  ........
  (Mark Dickinson, 2009-11-16, 1 file, -23/+90)
* death to compiler warning
  (Benjamin Peterson, 2009-11-10, 1 file, -0/+2)
* Merged revisions
  75365,75394,75402-75403,75418,75459,75484,75592-75596,75600,75602-75607,75610-75613,75616-75617,75623,75627,75640,75647,75696,75795
  via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r75365 | georg.brandl | 2009-10-11 22:16:16 +0200 (Sun, 11 Oct 2009) | 1 line
    Fix broken links found by "make linkcheck". scipy.org seems to be down
    right now, so I could not verify links going there.
  ........
    r75394 | georg.brandl | 2009-10-13 20:10:59 +0200 (Tue, 13 Oct 2009) | 1 line
    Fix markup.
  ........
    r75402 | georg.brandl | 2009-10-14 17:51:48 +0200 (Wed, 14 Oct 2009) | 1 line
    #7125: fix typo.
  ........
    r75403 | georg.brandl | 2009-10-14 17:57:46 +0200 (Wed, 14 Oct 2009) | 1 line
    #7126: os.environ changes *do* take effect in subprocesses started with os.system().
  ........
    r75418 | georg.brandl | 2009-10-14 20:48:32 +0200 (Wed, 14 Oct 2009) | 1 line
    #7116: str.join() takes an iterable.
  ........
    r75459 | georg.brandl | 2009-10-17 10:57:43 +0200 (Sat, 17 Oct 2009) | 1 line
    Fix refleaks in _ctypes PyCSimpleType_New, which fixes the refleak seen in test___all__.
  ........
    r75484 | georg.brandl | 2009-10-18 09:58:12 +0200 (Sun, 18 Oct 2009) | 1 line
    Fix missing word.
  ........
    r75592 | georg.brandl | 2009-10-22 09:05:48 +0200 (Thu, 22 Oct 2009) | 1 line
    Fix punctuation.
  ........
    r75593 | georg.brandl | 2009-10-22 09:06:49 +0200 (Thu, 22 Oct 2009) | 1 line
    Revert unintended change.
  ........
    r75594 | georg.brandl | 2009-10-22 09:56:02 +0200 (Thu, 22 Oct 2009) | 1 line
    Fix markup.
  ........
    r75595 | georg.brandl | 2009-10-22 09:56:56 +0200 (Thu, 22 Oct 2009) | 1 line
    Fix duplicate target.
  ........
    r75596 | georg.brandl | 2009-10-22 10:05:04 +0200 (Thu, 22 Oct 2009) | 1 line
    Add a new directive marking up implementation details and start using it.
  ........
    r75600 | georg.brandl | 2009-10-22 13:01:46 +0200 (Thu, 22 Oct 2009) | 1 line
    Make it more robust.
  ........
    r75602 | georg.brandl | 2009-10-22 13:28:06 +0200 (Thu, 22 Oct 2009) | 1 line
    Document new directive.
  ........
    r75603 | georg.brandl | 2009-10-22 13:28:23 +0200 (Thu, 22 Oct 2009) | 1 line
    Allow short form with text as argument.
  ........
    r75604 | georg.brandl | 2009-10-22 13:36:50 +0200 (Thu, 22 Oct 2009) | 1 line
    Fix stylesheet for multi-paragraph impl-details.
  ........
    r75605 | georg.brandl | 2009-10-22 13:48:10 +0200 (Thu, 22 Oct 2009) | 1 line
    Use "impl-detail" directive where applicable.
  ........
    r75606 | georg.brandl | 2009-10-22 17:00:06 +0200 (Thu, 22 Oct 2009) | 1 line
    #6324: membership test tries iteration via __iter__.
  ........
    r75607 | georg.brandl | 2009-10-22 17:04:09 +0200 (Thu, 22 Oct 2009) | 1 line
    #7088: document new functions in signal as Unix-only.
  ........
    r75610 | georg.brandl | 2009-10-22 17:27:24 +0200 (Thu, 22 Oct 2009) | 1 line
    Reorder __slots__ fine print and add a clarification.
  ........
    r75611 | georg.brandl | 2009-10-22 17:42:32 +0200 (Thu, 22 Oct 2009) | 1 line
    #7035: improve docs of the various <method>_errors() functions, and give them docstrings.
  ........
    r75612 | georg.brandl | 2009-10-22 17:52:15 +0200 (Thu, 22 Oct 2009) | 1 line
    #7156: document curses as Unix-only.
  ........
    r75613 | georg.brandl | 2009-10-22 17:54:35 +0200 (Thu, 22 Oct 2009) | 1 line
    #6977: getopt does not support optional option arguments.
  ........
    r75616 | georg.brandl | 2009-10-22 18:17:05 +0200 (Thu, 22 Oct 2009) | 1 line
    Add proper references.
  ........
    r75617 | georg.brandl | 2009-10-22 18:20:55 +0200 (Thu, 22 Oct 2009) | 1 line
    Make printout margin important.
  ........
    r75623 | georg.brandl | 2009-10-23 10:14:44 +0200 (Fri, 23 Oct 2009) | 1 line
    #7188: fix optionxform() docs.
  ........
    r75627 | fred.drake | 2009-10-23 15:04:51 +0200 (Fri, 23 Oct 2009) | 2 lines
    add further note about what's passed to optionxform
  ........
    r75640 | neil.schemenauer | 2009-10-23 21:58:17 +0200 (Fri, 23 Oct 2009) | 2 lines
    Improve some docstrings in the 'warnings' module.
  ........
    r75647 | georg.brandl | 2009-10-24 12:04:19 +0200 (Sat, 24 Oct 2009) | 1 line
    Fix markup.
  ........
    r75696 | georg.brandl | 2009-10-25 21:25:43 +0100 (Sun, 25 Oct 2009) | 1 line
    Fix a demo.
  ........
    r75795 | georg.brandl | 2009-10-27 16:10:22 +0100 (Tue, 27 Oct 2009) | 1 line
    Fix a strange mis-edit.
  ........
  (Georg Brandl, 2009-10-27, 1 file, -2/+2)
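Item #7116 above (str.join() takes an iterable, not just a list) is the part of this merge that touches str itself. A quick illustration, assuming a current CPython 3 (3.7+ for the guaranteed dict ordering); non-str items are rejected with TypeError.

```python
# Any iterable of str works: a generator expression ...
from_generator = "-".join(str(n) for n in range(3))
# ... or a dict, which iterates over its keys in insertion order.
from_dict = ",".join({"x": 1, "y": 2})

# Items must already be str.
try:
    "".join([1, 2])
    rejects_ints = False
except TypeError:
    rejects_ints = True
print(from_generator, from_dict, rejects_ints)
```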
* kill merged line
  (Benjamin Peterson, 2009-09-18, 1 file, -1/+0)
* Merged revisions 74929 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r74929 | benjamin.peterson | 2009-09-18 16:14:55 -0500 (Fri, 18 Sep 2009) | 1 line
    add keyword arguments support to str/unicode encode and decode #6300
  ........
  (Benjamin Peterson, 2009-09-18, 1 file, -3/+6)
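With #6300, `encoding` and `errors` can be passed by keyword to both directions of the codec API. A minimal check, assuming a current CPython 3:

```python
# Keyword arguments on str.encode ...
encoded = "na\xefve".encode(encoding="ascii", errors="ignore")
# ... and on bytes.decode.
decoded = b"na\xc3\xafve".decode(encoding="utf-8", errors="strict")
print(encoded, decoded)
```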
* Merged revisions 73871 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r73871 | alexandre.vassalotti | 2009-07-06 22:17:30 -0400 (Mon, 06 Jul 2009) | 7 lines
    Grow the allocated buffer in PyUnicode_EncodeUTF7 to avoid buffer overrun.
    Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in
    debug mode. What happens is the unicode string u'\U000abcde' with a length
    of 1 encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes are
    reserved in the buffer, a buffer overrun occurs.
  ........
  (Alexandre Vassalotti, 2009-07-21, 1 file, -2/+2)
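The size arithmetic in r73871 can be checked directly: one non-BMP code point becomes a UTF-16 surrogate pair (4 bytes, here D A6F / DCDE), and UTF-7 writes those 32 bits as 6 base64 characters between "+" and "-", 8 bytes in total. A check against a current CPython 3:

```python
# One code point outside the BMP.
encoded = "\U000abcde".encode("utf-7")
print(encoded, len(encoded))

# Decoding recombines the surrogate pair into the original code point.
decoded = encoded.decode("utf-7")
print(len(decoded))
```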
* #6373: SystemError in str.encode('latin1', 'surrogateescape') if the string
  contains unpaired surrogates (in a debug build, a crash in assert()). This
  can happen with normal processing, if Python starts with utf-8 and then
  calls sys.setfilesystemencoding('latin-1').
  (Amaury Forgeot d'Arc, 2009-06-29, 1 file, -0/+2)
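surrogateescape can only undo the escapes U+DC80..U+DCFF; any other lone surrogate must produce an ordinary UnicodeEncodeError rather than a SystemError. The sketch below checks both cases, assuming a current CPython 3.

```python
# U+DC80 is an escape surrogateescape knows how to turn back into 0x80.
escaped = "a\udc80".encode("latin-1", "surrogateescape")

# U+D800 is not one of those escapes, so encoding must fail cleanly.
try:
    "a\ud800".encode("latin-1", "surrogateescape")
    error_type = None
except UnicodeEncodeError as exc:
    error_type = type(exc)
print(escaped, error_type)
```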
* Merged revisions 73190,73213,73257-73258,73260,73275,73294 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r73190 | georg.brandl | 2009-06-04 01:23:45 +0200 (Thu, 04 Jun 2009) | 2 lines
    Avoid PendingDeprecationWarnings emitted by deprecated unittest methods.
  ........
    r73213 | georg.brandl | 2009-06-04 12:15:57 +0200 (Thu, 04 Jun 2009) | 1 line
    #5967: note that the C slicing APIs do not support negative indices.
  ........
    r73257 | georg.brandl | 2009-06-06 19:50:05 +0200 (Sat, 06 Jun 2009) | 1 line
    #6211: elaborate a bit on ways to call the function.
  ........
    r73258 | georg.brandl | 2009-06-06 19:51:31 +0200 (Sat, 06 Jun 2009) | 1 line
    #6204: use a real reference instead of "see later".
  ........
    r73260 | georg.brandl | 2009-06-06 20:21:58 +0200 (Sat, 06 Jun 2009) | 1 line
    #6224: s/JPython/Jython/, and remove one link to a module nine years old.
  ........
    r73275 | georg.brandl | 2009-06-07 22:37:52 +0200 (Sun, 07 Jun 2009) | 1 line
    Add Ezio.
  ........
    r73294 | georg.brandl | 2009-06-08 15:34:52 +0200 (Mon, 08 Jun 2009) | 1 line
    #6194: O_SHLOCK/O_EXLOCK are not really more platform independent than lockf().
  ........
  (Georg Brandl, 2009-06-08, 1 file, -1/+1)
* Strengthen the guard. The code doesn't work well with subclasses.
  (Raymond Hettinger, 2009-05-29, 1 file, -1/+1)
* Issue #6012: Add cleanup support to O& argument parsing.
  (Martin v. Löwis, 2009-05-29, 1 file, -1/+5)
* Rename utf8b error handler to surrogateescape.
  (Martin v. Löwis, 2009-05-10, 1 file, -1/+1)
* add a replacement API for PyCObject, PyCapsule #5630
  All stdlib modules with C-APIs now use this. Patch by Larry Hastings
  (Benjamin Peterson, 2009-05-05, 1 file, -10/+1)
* Merged revisions 72326 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r72326 | georg.brandl | 2009-05-05 11:19:43 +0200 (Tue, 05 May 2009) | 1 line
    #5929: fix signedness warning.
  ........
  (Georg Brandl, 2009-05-05, 1 file, -1/+1)
* Issue #5915: Implement PEP 383, Non-decodable Bytes in
  System Character Interfaces.
  (Martin v. Löwis, 2009-05-05, 1 file, -9/+80)
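PEP 383's core idea is a lossless round trip for bytes that the filesystem encoding cannot decode: each undecodable byte 0xNN becomes the lone surrogate U+DCNN, and encoding with the same handler restores it. The sketch below spells out the codec calls explicitly (UTF-8 is an assumed filesystem encoding) so it behaves the same on any platform with current CPython 3.

```python
raw_name = b"report-\xff.txt"   # 0xFF is never valid in UTF-8

# Decoding maps the undecodable byte 0xFF to the lone surrogate U+DCFF ...
as_text = raw_name.decode("utf-8", "surrogateescape")
# ... and encoding with the same handler restores the original bytes.
back = as_text.encode("utf-8", "surrogateescape")
print(ascii(as_text), back == raw_name)
```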
* Merged revisions 72283-72284 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r72283 | antoine.pitrou | 2009-05-04 20:32:32 +0200 (Mon, 04 May 2009) | 4 lines
    Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal
    sequences. Patch by Nick Barnes and Victor Stinner.
  ........
    r72284 | antoine.pitrou | 2009-05-04 20:32:50 +0200 (Mon, 04 May 2009) | 3 lines
    Add Nick Barnes to ACKS.
  ........
  (Antoine Pitrou, 2009-05-04, 1 file, -181/+244)
* Merged revisions 72260 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r72260 | walter.doerwald | 2009-05-04 00:36:33 +0200 (Mon, 04 May 2009) | 5 lines
    Issue #5108: Handle %s like %S and %R in PyUnicode_FromFormatV(): Call
    PyUnicode_DecodeUTF8() once, remember the result and output it in a second
    step. This avoids problems with counting UTF-8 bytes that ignores the
    effect of using the replace error handler in PyUnicode_DecodeUTF8().
  ........
  (Walter Dörwald, 2009-05-03, 1 file, -49/+33)
* Issue #3672: Reject surrogates in utf-8 codec; add surrogates error
  handler.
  (Martin v. Löwis, 2009-05-02, 1 file, -11/+72)
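After this change the UTF-8 codec refuses lone surrogates unless an error handler explicitly lets them through. The handler this commit calls "surrogates" is spelled "surrogatepass" in released Python 3 versions; the sketch below assumes a current CPython 3 and uses that name.

```python
lone = "\ud800"

# Plain UTF-8 rejects the surrogate.
try:
    lone.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# surrogatepass encodes it as the 3-byte form and can decode it back.
passed = lone.encode("utf-8", "surrogatepass")
restored = passed.decode("utf-8", "surrogatepass")
print(rejected, passed)
```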
* Issue #5859: Remove '%f' to '%g' formatting switch for large floats.
  (Mark Dickinson, 2009-05-01, 1 file, -3/+0)
* Issue #5859: Remove use of fixed-length buffers for float formatting
  in unicodeobject.c and the fallback version of PyOS_double_to_string.
  As a result, operations like '%.120e' % 12.34 no longer raise an exception.
  (Mark Dickinson, 2009-05-01, 1 file, -66/+18)
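Both #5859 commits can be observed with %-formatting on a current CPython 3: '%f' keeps fixed-point notation for a large value instead of switching to '%g', and a very high '%e' precision no longer hits a fixed-size buffer. For '%.120e' % 12.34 the result is "1." plus 120 digits plus "e+01", i.e. 126 characters.

```python
# No '%f' -> '%g' switch: a huge value stays in fixed-point notation.
fixed = "%f" % 1e50

# No fixed-length buffer: 120 digits after the decimal point are fine.
precise = "%.120e" % 12.34
print("e" in fixed, len(precise))
```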
* The other half of Issue #1580: use short float repr where possible.
  Addresses the float -> string conversion, using David Gay's code which was
  added in Mark Dickinson's checkin r71663.
  Also addresses these, which are intertwined with the short repr changes:
  - Issue #5772: format(1e100, '<') produces '1e+100', not '1.0e+100'
  - Issue #5515: 'n' formatting with commas no longer works poorly with
    leading zeros.
  - PEP 378 Format Specifier for Thousands Separator: implemented for floats.
  (Eric Smith, 2009-04-16, 1 file, -137/+33)
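Three of the behaviors listed above, checked on a current CPython 3: the shortest repr that round-trips, the #5772 alignment case, and PEP 378 grouping applied to a float.

```python
short = repr(0.1)                       # shortest string that round-trips
aligned = format(1e100, "<")            # Issue #5772
grouped = format(1234567.891, ",.2f")   # PEP 378 grouping for floats
print(short, aligned, grouped)
```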
* #5708: a bit of streamlining in unicode_repeat().
  (Georg Brandl, 2009-04-12, 1 file, -9/+8)
* Added ',' thousands grouping to int.__format__. See PEP 378.
  This is incomplete, but I want to get some version into the next alpha. I
  am still working on:
  - Documentation.
  - More tests.
  - Implement for floats.
  In addition, there's an existing bug with 'n' formatting that carries
  forward to thousands grouping (issue 5515).
  (Eric Smith, 2009-04-03, 1 file, -0/+1)
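The ',' option from PEP 378 applied to integers, as it behaves in a current CPython 3:

```python
plain = format(1234567, ",")
billion = format(10**9, ",d")   # explicit 'd' presentation type
print(plain, billion)
```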
* Merged revisions 70682,70684 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r70682 | mark.dickinson | 2009-03-29 17:17:16 +0100 (Sun, 29 Mar 2009) | 3 lines
    Issue #532631: Add paranoid check to avoid potential buffer overflow
    on systems with sizeof(int) > 4.
  ........
    r70684 | mark.dickinson | 2009-03-29 17:24:29 +0100 (Sun, 29 Mar 2009) | 3 lines
    Issue #532631: Apply floatformat changes to unicodeobject.c as well as
    stringobject.c.
  ........
  (Mark Dickinson, 2009-03-29, 1 file, -0/+9)
* Merged revisions 70678 via svnmerge from
  svn+ssh://pythondev@svn.python.org/python/trunk
  ........
    r70678 | mark.dickinson | 2009-03-29 15:37:51 +0100 (Sun, 29 Mar 2009) | 3 lines
    Issue #532631: Replace confusing fabs(x)/1e25 >= 1e25 test with
    fabs(x) >= 1e50, and fix documentation.
  ........
  (Mark Dickinson, 2009-03-29, 1 file, -1/+1)