| Commit message | Author | Age | Files | Lines |
|
trying to return a complete line even if a size parameter was given (see
http://www.python.org/sf/1076985). This leads to buffer overflows with long
source lines under Windows if e.g. cp1252 is used as the source encoding.
This patch reverts the behaviour of readline() to something that behaves more
like Python 2.3: If a size parameter is given, read() is called only once.
As a side effect of this, readline() now supports all types of linebreaks
supported by unicode.splitlines().
Note that the tokenizer is still broken and it's possible to provoke segfaults
(see http://www.python.org/sf/1089395).
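The line-break handling described above can be observed with a short sketch. It assumes a current Python 3 interpreter, where `str.splitlines()` plays the role of `unicode.splitlines()`; the helper name `read_lines` is hypothetical and not part of the patch.

```python
import codecs
import io


def read_lines(data, encoding="utf-8"):
    """Collect lines from an in-memory byte stream via StreamReader.readline()."""
    # codecs.getreader() returns the StreamReader class for the encoding.
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    lines = []
    while True:
        line = reader.readline()  # line endings are kept by default
        if not line:  # an empty string signals end of input
            break
        lines.append(line)
    return lines
```

For input that mixes `\n`, `\r\n` and `\r`, each terminator should end its own line, and a final unterminated fragment is returned as the last line.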
|
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
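A minimal sketch of the `chars` argument, assuming a current Python 3 interpreter. The helper name `read_in_chunks` is hypothetical. The point is that chunks are counted in decoded characters rather than bytes, so a multi-byte sequence is never split across two results.

```python
import codecs
import io


def read_in_chunks(data, encoding, chars):
    """Read a byte stream as text, returning at most `chars` characters per call."""
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    chunks = []
    while True:
        # chars limits the number of decoded characters returned;
        # undecoded or surplus data stays in the reader's internal buffers.
        chunk = reader.read(chars=chars)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```

The same call works for UTF-16 input, where the reader has to consume the byte order mark and two-byte code units before it can hand back characters.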
|
Python 2.3 will support source code encodings which rely on the
builtin codecs being available to the parser.
Remove struct dependency from codecs.py
|
the codecs docstrings.
|
and StreamRecoder.
This closes SF bug #634246.
|
BOM_UTF32, BOM_UTF32_LE and BOM_UTF32_BE that represent the Byte
Order Mark in UTF-8, UTF-16 and UTF-32 encodings for little and
big endian systems.
The old names BOM32_* and BOM64_* were off by a factor of 2.
This closes SF bug http://www.python.org/sf/555360
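One way the BOM constants might be used is to sniff the encoding of a byte string. This is only a sketch: the helper `sniff_bom` and the choice of returned codec names are assumptions made for illustration, not part of the commit.

```python
import codecs

# The UTF-32 marks must be checked first: BOM_UTF32_LE begins with the same
# two bytes as BOM_UTF16_LE, so the shorter mark would otherwise match early.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would typically decode `data[bom_length:]` with the returned codec name and fall back to a default encoding when no mark is found.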
|
as well.
|
codec files to codecs.py and added logic so that multi mappings
in the decoding maps now result in mappings to None (undefined mapping)
in the encoding maps.
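The multi-mapping rule can be illustrated with `codecs.make_encoding_map()`, which is assumed here to be the helper this message refers to. The decoding map below is invented for the example; the wrapper name `invert` is hypothetical.

```python
import codecs

# Hypothetical decoding map: bytes 0x80 and 0x81 both decode to U+20AC,
# so the reverse direction is ambiguous for that character.
decoding_map = {
    0x41: 0x0041,
    0x80: 0x20AC,
    0x81: 0x20AC,
}


def invert(mapping):
    """Build an encoding map; targets reached from several keys map to None."""
    return codecs.make_encoding_map(mapping)


encoding_map = invert(decoding_map)
```

Here `encoding_map[0x20AC]` is `None` (undefined), while the unambiguous entry for `0x0041` maps back to `0x41`.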
|
after commas that didn't have any).
|
Added test script and expected output file as well.
This closes patch 103297.
__all__ attributes will be added to other modules without first submitting
a patch, just adding the necessary line to the test script to verify
more-or-less correct implementation.
|
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.
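The "undefined mapping" behaviour can be sketched with `codecs.charmap_decode()` on a current Python 3 interpreter. The wrapper `decode_with_map` and the one-entry mapping are assumptions made for illustration only.

```python
import codecs


def decode_with_map(data, mapping, errors="strict"):
    """Decode bytes through a charmap dictionary of byte value -> code point."""
    # Keys missing from the mapping are treated as undefined rather than
    # being passed through as Latin-1 characters.
    text, _consumed = codecs.charmap_decode(data, errors, mapping)
    return text


ascii_a_only = {0x41: 0x0041}  # hypothetical mapping that defines only "A"
```

With `errors="strict"`, a byte that is absent from the mapping raises `UnicodeDecodeError`; with `errors="replace"`, it becomes U+FFFD.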
The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.
The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.
This patch closes the bugs #116285 and #119960.
|
StreamReader ignores the 'errors' parameter passed to its constructor
|
comments, docstrings or error messages. I fixed two minor things in
test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't").
There is a minor style issue involved: Guido seems to have preferred British
spelling (behaviour, honour) in a couple of places. This patch changes that to
American, which is the more prominent style in the source. I prefer English
myself, so if English is preferred, I'd be happy to supply a patch myself ;)
|
Made codecs.open() default to 'rb' as file mode.
|
The two methods .readline() and .readlines() in StreamReaderWriter
didn't define the self argument. Found by Tom Emerson.
|
Added more documentation. Clarified some existing comments.
|
a missing part of the previous checkin message:
Marc-Andre Lemburg:
Added encoding name attributes to wrapper classes which
allow applications to check the used encoding names.
|
Added .writelines(), .readlines() and .readline() to all
codec classes.
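A small round trip shows how the line-oriented methods fit together, assuming a current Python 3 interpreter. The helper name `roundtrip` is hypothetical.

```python
import codecs
import io


def roundtrip(lines, encoding="utf-8", keepends=True):
    """Write lines through a StreamWriter, then read them back with readlines()."""
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    writer.writelines(lines)  # encodes the concatenated lines into buf
    buf.seek(0)
    reader = codecs.getreader(encoding)(buf)
    return reader.readlines(keepends=keepends)
```

Note that `writelines()` does not add line terminators, so the caller supplies them, mirroring the behaviour of ordinary file objects.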
|
mechanism is enhanced to be more informative.
|
Attached you find the latest update of the Unicode implementation.
The patch is against the current CVS version.
It includes the fix I posted yesterday for the core dump problem
in codecs.c (was introduced by my previous patch set -- sorry),
adds more tests for the codecs and two new parser markers
"es" and "es#".
|
checking in; sorry!
"the the" --> "the" (in docstring); noted by Detlef Lannert
<lannert@lannert.rz.uni-duesseldorf.de>.
|
<lannert@lannert.rz.uni-duesseldorf.de>.
|
Marc-Andre Lemburg.
|