From 196035595f6002868e0a760d78b7bd9fd630fcbd Mon Sep 17 00:00:00 2001
From: Benjamin Peterson
Date: Sun, 2 Dec 2012 11:26:10 -0500
Subject: document that encoding error handlers may return bytes (#16585)

---
 Doc/library/codecs.rst | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst
index 071fc23..28ea89d 100644
--- a/Doc/library/codecs.rst
+++ b/Doc/library/codecs.rst
@@ -155,13 +155,16 @@ functions which use :func:`lookup` for the codec lookup:
    when *name* is specified as the errors parameter.
 
    For encoding *error_handler* will be called with a :exc:`UnicodeEncodeError`
-   instance, which contains information about the location of the error. The error
-   handler must either raise this or a different exception or return a tuple with a
-   replacement for the unencodable part of the input and a position where encoding
-   should continue. The encoder will encode the replacement and continue encoding
-   the original input at the specified position. Negative position values will be
-   treated as being relative to the end of the input string. If the resulting
-   position is out of bound an :exc:`IndexError` will be raised.
+   instance, which contains information about the location of the error. The
+   error handler must either raise this or a different exception or return a
+   tuple with a replacement for the unencodable part of the input and a position
+   where encoding should continue. The replacement may be either :class:`str` or
+   :class:`bytes`. If the replacement is bytes, the encoder will simply copy
+   them into the output buffer. If the replacement is a string, the encoder will
+   encode the replacement. Encoding continues on original input at the
+   specified position. Negative position values will be treated as being
+   relative to the end of the input string. If the resulting position is out of
+   bound an :exc:`IndexError` will be raised.
 
    Decoding and translating works similar, except :exc:`UnicodeDecodeError` or
    :exc:`UnicodeTranslateError` will be passed to the handler and that the
-- 
cgit v0.12

From 78f7e3a8dc999114c6863754b0c72ad5a9ec93eb Mon Sep 17 00:00:00 2001
From: Benjamin Peterson
Date: Sun, 2 Dec 2012 11:33:06 -0500
Subject: document UnicodeError attributes

---
 Doc/library/exceptions.rst | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/Doc/library/exceptions.rst b/Doc/library/exceptions.rst
index 053ba56..89f933c 100644
--- a/Doc/library/exceptions.rst
+++ b/Doc/library/exceptions.rst
@@ -371,6 +371,30 @@ The following exceptions are the exceptions that are usually raised.
    Raised when a Unicode-related encoding or decoding error occurs. It is a
    subclass of :exc:`ValueError`.
 
+   :exc:`UnicodeError` has attributes that describe the encoding or decoding
+   error. For example, ``err.object[err.start:err.end]`` gives the particular
+   invalid input that the codec failed on.
+
+   .. attribute:: encoding
+
+      The name of the encoding that raised the error.
+
+   .. attribute:: reason
+
+      A string describing the specific codec error.
+
+   .. attribute:: object
+
+      The object the codec was attempting to encode or decode.
+
+   .. attribute:: start
+
+      The first index of invalid data in :attr:`object`.
+
+   .. attribute:: end
+
+      The index after the last invalid data in :attr:`object`.
+
 
 .. exception:: UnicodeEncodeError
 
-- 
cgit v0.12
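
Note (not part of either patch): the behaviour documented above can be
sketched as follows. This is a minimal illustration, not code from the
CPython tree. The handler name ``hex_escape_bytes`` and the ``<U+XXXX>``
replacement format are hypothetical, chosen only for this example. The
handler returns :class:`bytes`, which the encoder copies into the output
unchanged, and it reads the failing input through the ``object``, ``start``
and ``end`` attributes.

```python
import codecs


def hex_escape_bytes(err):
    """Hypothetical handler: emit b'<U+XXXX>' for each unencodable character."""
    if not isinstance(err, UnicodeEncodeError):
        # This sketch only handles encoding errors; re-raise anything else.
        raise err
    # The slice of the input that the codec could not encode.
    bad = err.object[err.start:err.end]
    # Return bytes: the encoder copies them into the output as-is.
    replacement = "".join("<U+%04X>" % ord(ch) for ch in bad).encode("ascii")
    # Second tuple item: the position where encoding should resume.
    return replacement, err.end


codecs.register_error("hex_escape_bytes", hex_escape_bytes)

encoded = "caf\u00e9 ok".encode("ascii", errors="hex_escape_bytes")
print(encoded)  # b'caf<U+00E9> ok'

# The same attributes are available when the error is simply caught.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    details = (err.encoding, err.object[err.start:err.end], err.start, err.end)
    print(details, err.reason)
```

Returning ``err.end`` as the resume position skips exactly the characters that
were reported as unencodable; a handler could return a different position, as
the patched paragraph in codecs.rst describes.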