| Commit message | Author | Age | Files | Lines |
|
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
|
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no longer needed: remove them
|
All unicode functions use PyObject* except PyUnicode_AsWideChar(). Fix the
prototype for the new function PyUnicode_AsWideCharString().
|
UTF-16 surrogate pairs by single non-BMP characters for 16-bit Py_UNICODE
and 32-bit wchar_t (e.g. Linux in a narrow build).
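The arithmetic behind pairing surrogates into one non-BMP code point can be sketched in Python. This is only an illustration of the standard UTF-16 formula, not the C code touched by the commit; the helper names combine_surrogates() and split_code_point() are hypothetical.

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits of the offset above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_code_point(cp):
    """Return the (high, low) surrogate pair for a non-BMP code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % cp)
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


pair = split_code_point(0x1F600)
code_point = combine_surrogates(*pair)
```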
|
character
And write unit tests for PyUnicode_AsWideChar() and PyUnicode_AsWideCharString().
|
Redecode the filenames of:
- all modules: __file__ and __path__ attributes
- all code objects: co_filename attribute
- sys.path
- sys.meta_path
- sys.executable
- sys.path_importer_cache (keys)
Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
|
a non-ASCII byte in the format string.
Also document the encoding.
|
Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated
(we will not change its value anymore).
|
are now removed, since they had no effect in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
|
properly. Patch by Stefan Behnel.
|
memcpy to convert between the two (as already done when wchar_t is unsigned)
|
The code is based on strncmp() from the libiberty library,
a function in the public domain.
|
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't name the surrogateescape error handler in the comments or the
documentation; refer to PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() instead, because these functions use the strict
error handler for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
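The converter behaviour described above (bytes are decoded with the filesystem encoding, str passes through) can be mimicked in Python. This is a rough analogue only, not the C converter itself; the name fs_decode_converter() is hypothetical, and the error handler is taken from sys.getfilesystemencodeerrors() rather than hard-coded.

```python
import sys


def fs_decode_converter(obj):
    """Hypothetical Python analogue of a filesystem-decoding converter."""
    if isinstance(obj, str):
        # str objects are returned as-is.
        return obj
    if isinstance(obj, bytes):
        # bytes objects are decoded with the filesystem encoding and the
        # platform's filesystem error handler.
        return obj.decode(sys.getfilesystemencoding(),
                          sys.getfilesystemencodeerrors())
    raise TypeError("expected str or bytes, not %s" % type(obj).__name__)


from_str = fs_decode_converter("setup.py")
from_bytes = fs_decode_converter(b"setup.py")
```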
|
Similar to PyErr_WarnEx() but uses PyUnicode_FromFormatV() to format the warning
message.
Also strip some trailing spaces.
|
va_copy, but available on all python platforms. Untabified a few
unrelated files.
|
we now have to nul-terminate the string anyway.
|
explicitly check range before looking for a second surrogate character.
|
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
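The decoder behaviour listed above can be observed from Python. A minimal sketch, assuming a current CPython 3 interpreter; the variable names and the helper decode_error_reason() are hypothetical and used only for illustration.

```python
# Truncated 4-byte sequence: start byte F1, one valid continuation byte (80),
# then ASCII "A". Only F1 80 is replaced, by a single U+FFFD.
truncated = b"\xf1\x80A".decode("utf-8", "replace")

# F8 would start a 5-byte sequence, which is invalid. Every byte is rejected
# on its own, so each one becomes U+FFFD.
five_byte = b"\xf8\x88\x80\x80\x80".decode("utf-8", "replace")


def decode_error_reason(data):
    """Return the UnicodeDecodeError reason for data, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


start_reason = decode_error_reason(b"\xff")
continuation_reason = decode_error_reason(b"\xc3(")
```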
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line
Fix extra space.
........
|
mode raises unicode errors. The encoder only supports the "strict" and "replace"
error handlers; the decoder only supports the "strict" and "ignore" error handlers.
|
filenames and enable os.fsencode().
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (Fri, 11 Jun 2010) | 5 lines
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(0x10000 and higher).
........
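A round trip through big-endian UTF-32 with a non-BMP character, as a regression-style check, might look like the sketch below. It assumes a current CPython 3 interpreter, where a non-BMP character is a single str element; the variable names are illustrative only.

```python
# U+10000 is the first code point outside the Basic Multilingual Plane.
ch = "\U00010000"

# Big-endian UTF-32 stores the code point as four bytes, most significant first.
encoded = ch.encode("utf-32-be")

# Decoding must give back the same single character.
decoded = encoded.decode("utf-32-be")
```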
|
Don't use normalize_encoding() result if it is truncated.
|
enable shortcuts for upper-case encoding names. Also add a shortcut for
"iso-8859-1" in PyUnicode_AsEncodedString() and PyUnicode_Decode().
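From Python, the visible effect is that encoding names are matched regardless of case. The sketch below only checks that equivalence with the codecs module; it does not show whether the C-level shortcut path is taken, and the variable names are illustrative.

```python
import codecs

# Different spellings of the same encoding resolve to one codec name.
utf8_names = {codecs.lookup(name).name for name in ("utf-8", "UTF-8", "Utf-8")}
latin1_names = {codecs.lookup(name).name for name in ("iso-8859-1", "ISO-8859-1")}

# Encoding and decoding give identical results for either spelling.
upper = "caf\u00e9".encode("UTF-8")
lower = "caf\u00e9".encode("utf-8")
latin1_text = b"caf\xe9".decode("ISO-8859-1")
```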
|
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
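The "surrogateescape" error handler mentioned above lets undecodable bytes survive a decode/encode round trip. A minimal Python sketch, assuming UTF-8 as the encoding (the fallback named in the message); the file name is made up for illustration.

```python
# A Latin-1 encoded name: byte E9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# The strict handler refuses to encode the lone surrogate.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```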
|
error handler, not only the default error handler (strict)
|
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in module filenames yet. So nobody noticed this
minor bug.
|
PyByteArray is no longer supported
|
unicode string (e.g. backslashreplace)
|
you have to convert your bytearray filenames to bytes
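Converting a bytearray filename to bytes is a one-line change on the caller's side. The sketch below uses os.fsencode() from current CPython as a stand-in for a filename-accepting API; the affected C function is not named in the text above, so this is an assumption made for illustration.

```python
import os

name = bytearray(b"data.txt")

# A bytearray is rejected where a filename is expected.
try:
    os.fsencode(name)
    rejected = False
except TypeError:
    rejected = True

# Converting to bytes first makes the same value acceptable.
accepted = os.fsencode(bytes(name))
```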
|
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........
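The line-break change for VT and FF is visible through str.splitlines(). A short sketch, assuming a current CPython 3 interpreter; bytes.splitlines() is included for contrast because it splits only on \n, \r and \r\n.

```python
text = "first\x0bsecond\x0cthird"

# str.splitlines() treats VT (0x0B) and FF (0x0C) as line boundaries.
str_lines = text.splitlines()

# bytes.splitlines() does not, so the data stays in one piece.
bytes_lines = text.encode("ascii").splitlines()
```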
|