Merge part of the trunk changes into the p3yk branch. This merges from 43030

(branch-creation time) up to 43067. 43068 and 43069 contain a little swapping action between re.py and sre.py, and this mightily confuses svn merge, so later changes are going in separately. This merge should break no additional tests. The last-merged revision is going in a 'last_merge' property on '.' (the branch directory.) Arbitrarily chosen, really; if there's a BCP for this, I couldn't find it, but we can easily change it afterwards ;)
author: Thomas Wouters <thomas@python.org> 2006-04-21 09:43:23 (GMT)
committer: Thomas Wouters <thomas@python.org> 2006-04-21 09:43:23 (GMT)
commit: a977329b6fb0e4c95cabb9043794de69b27a1099 (patch)
tree: b91552a0578639bd10181ab612039c1bed9bec27 /Doc/lib/libcodecs.tex
parent: d858f70617a9df8456e89a898ad8f97bd57c09f9 (diff)
download: cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.zip
cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.tar.gz
cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.tar.bz2
1 files changed, 160 insertions, 8 deletions
diff --git a/Doc/lib/libcodecs.tex b/Doc/lib/libcodecs.tex
index 9e92217..1806ef0 100644
--- a/Doc/lib/libcodecs.tex
+++ b/Doc/lib/libcodecs.tex
@@ -24,8 +24,19 @@ It defines the following functions:
 \begin{funcdesc}{register}{search_function}
 Register a codec search function. Search functions are expected to
 take one argument, the encoding name in all lower case letters, and
-return a tuple of functions \code{(\var{encoder}, \var{decoder}, \var{stream_reader},
-\var{stream_writer})} taking the following arguments:
+return a \class{CodecInfo} object having the following attributes:
+
+\begin{itemize}
+  \item \code{name} The name of the encoding;
+  \item \code{encoder} The stateless encoding function;
+  \item \code{decoder} The stateless decoding function;
+  \item \code{incrementalencoder} An incremental encoder class or factory function;
+  \item \code{incrementaldecoder} An incremental decoder class or factory function;
+  \item \code{streamwriter} A stream writer class or factory function;
+  \item \code{streamreader} A stream reader class or factory function.
+\end{itemize}
+
+The various functions or classes take the following arguments:
 
   \var{encoder} and \var{decoder}: These must be functions or methods
   which have the same interface as the
@@ -33,7 +44,17 @@ return a tuple of functions \code{(\var{encoder}, \var{decoder}, \var{stream_rea
   Codec Interface). The functions/methods are expected to work in a
   stateless mode.
 
-  \var{stream_reader} and \var{stream_writer}: These have to be
+  \var{incrementalencoder} and \var{incrementalencoder}: These have to be
+  factory functions providing the following interface:
+
+        \code{factory(\var{errors}='strict')}
+
+  The factory functions must return objects providing the interfaces
+  defined by the base classes \class{IncrementalEncoder} and
+  \class{IncrementalEncoder}, respectively. Incremental codecs can maintain
+  state.
+
+  \var{streamreader} and \var{streamwriter}: These have to be
   factory functions providing the following interface:
 
         \code{factory(\var{stream}, \var{errors}='strict')}
@@ -58,13 +79,13 @@ return \code{None}.
 \end{funcdesc}
 
 \begin{funcdesc}{lookup}{encoding}
-Looks up a codec tuple in the Python codec registry and returns the
-function tuple as defined above.
+Looks up the codec info in the Python codec registry and returns a
+\class{CodecInfo} object as defined above.
 
 Encodings are first looked up in the registry's cache. If not found,
-the list of registered search functions is scanned. If no codecs tuple
-is found, a \exception{LookupError} is raised. Otherwise, the codecs
-tuple is stored in the cache and returned to the caller.
+the list of registered search functions is scanned. If no \class{CodecInfo}
+object is found, a \exception{LookupError} is raised. Otherwise, the
+\class{CodecInfo} object is stored in the cache and returned to the caller.
 \end{funcdesc}
 
 To simplify access to the various codecs, the module provides these
@@ -85,6 +106,22 @@ function.
 Raises a \exception{LookupError} in case the encoding cannot be found.
 \end{funcdesc}
 
+\begin{funcdesc}{getincrementalencoder}{encoding}
+Lookup up the codec for the given encoding and return its incremental encoder
+class or factory function.
+
+Raises a \exception{LookupError} in case the encoding cannot be found or the
+codec doesn't support an incremental encoder.
+\end{funcdesc}
+
+\begin{funcdesc}{getincrementaldecoder}{encoding}
+Lookup up the codec for the given encoding and return its incremental decoder
+class or factory function.
+
+Raises a \exception{LookupError} in case the encoding cannot be found or the
+codec doesn't support an incremental decoder.
+\end{funcdesc}
+
 \begin{funcdesc}{getreader}{encoding}
 Lookup up the codec for the given encoding and return its StreamReader
 class or factory function.
@@ -188,6 +225,18 @@ If \var{output} is not given, it defaults to \var{input}.
 an encoding error occurs.
 \end{funcdesc}
 
+\begin{funcdesc}{iterencode}{iterable, encoding\optional{, errors}}
+Uses an incremental encoder to iteratively encode the input provided by
+\var{iterable}. This function is a generator. \var{errors} (as well as
+any other keyword argument) is passed through to the incremental encoder.
+\end{funcdesc}
+
+\begin{funcdesc}{iterdecode}{iterable, encoding\optional{, errors}}
+Uses an incremental decoder to iteratively decode the input provided by
+\var{iterable}. This function is a generator. \var{errors} (as well as
+any other keyword argument) is passed through to the incremental encoder.
+\end{funcdesc}
+
 The module also provides the following constants which are useful
 for reading and writing to platform dependent files:
 
@@ -292,6 +341,109 @@ function interfaces of the stateless encoder and decoder:
   empty object of the output object type in this situation.
 \end{methoddesc}
 
+The \class{IncrementalEncoder} and \class{IncrementalDecoder} classes provide
+the basic interface for incremental encoding and decoding. Encoding/decoding the
+input isn't done with one call to the stateless encoder/decoder function,
+but with multiple calls to the \method{encode}/\method{decode} method of the
+incremental encoder/decoder. The incremental encoder/decoder keeps track of
+the encoding/decoding process during method calls.
+
+The joined output of calls to the \method{encode}/\method{decode} method is the
+same as if the all single inputs where joined into one, and this input was
+encoded/decoded with the stateless encoder/decoder.
+
+
+\subsubsection{IncrementalEncoder Objects \label{incremental-encoder-objects}}
+
+The \class{IncrementalEncoder} class is used for encoding an input in multiple
+steps. It defines the following methods which every incremental encoder must
+define in order to be compatible to the Python codec registry.
+
+\begin{classdesc}{IncrementalEncoder}{\optional{errors}}
+  Constructor for a \class{IncrementalEncoder} instance.
+
+  All incremental encoders must provide this constructor interface. They are
+  free to add additional keyword arguments, but only the ones defined
+  here are used by the Python codec registry.
+
+  The \class{IncrementalEncoder} may implement different error handling
+  schemes by providing the \var{errors} keyword argument. These
+  parameters are predefined:
+
+  \begin{itemize}
+    \item \code{'strict'} Raise \exception{ValueError} (or a subclass);
+                          this is the default.
+    \item \code{'ignore'} Ignore the character and continue with the next.
+    \item \code{'replace'} Replace with a suitable replacement character
+    \item \code{'xmlcharrefreplace'} Replace with the appropriate XML
+                     character reference
+    \item \code{'backslashreplace'} Replace with backslashed escape sequences.
+  \end{itemize}
+
+  The \var{errors} argument will be assigned to an attribute of the
+  same name. Assigning to this attribute makes it possible to switch
+  between different error handling strategies during the lifetime
+  of the \class{IncrementalEncoder} object.
+
+  The set of allowed values for the \var{errors} argument can
+  be extended with \function{register_error()}.
+\end{classdesc}
+
+\begin{methoddesc}{encode}{object\optional{, final}}
+  Encodes \var{object} (taking the current state of the encoder into account)
+  and returns the resulting encoded object. If this is the last call to
+  \method{encode} \var{final} must be true (the default is false).
+\end{methoddesc}
+
+\begin{methoddesc}{reset}{}
+  Reset the encoder to the initial state.
+\end{methoddesc}
+
+
+\subsubsection{IncrementalDecoder Objects \label{incremental-decoder-objects}}
+
+The \class{IncrementalDecoder} class is used for decoding an input in multiple
+steps. It defines the following methods which every incremental decoder must
+define in order to be compatible to the Python codec registry.
+
+\begin{classdesc}{IncrementalDecoder}{\optional{errors}}
+  Constructor for a \class{IncrementalDecoder} instance.
+
+  All incremental decoders must provide this constructor interface. They are
+  free to add additional keyword arguments, but only the ones defined
+  here are used by the Python codec registry.
+
+  The \class{IncrementalDecoder} may implement different error handling
+  schemes by providing the \var{errors} keyword argument. These
+  parameters are predefined:
+
+  \begin{itemize}
+    \item \code{'strict'} Raise \exception{ValueError} (or a subclass);
+                          this is the default.
+    \item \code{'ignore'} Ignore the character and continue with the next.
+    \item \code{'replace'} Replace with a suitable replacement character.
+  \end{itemize}
+
+  The \var{errors} argument will be assigned to an attribute of the
+  same name. Assigning to this attribute makes it possible to switch
+  between different error handling strategies during the lifetime
+  of the \class{IncrementalEncoder} object.
+
+  The set of allowed values for the \var{errors} argument can
+  be extended with \function{register_error()}.
+\end{classdesc}
+
+\begin{methoddesc}{decode}{object\optional{, final}}
+  Decodes \var{object} (taking the current state of the decoder into account)
+  and returns the resulting decoded object. If this is the last call to
+  \method{decode} \var{final} must be true (the default is false).
+\end{methoddesc}
+
+\begin{methoddesc}{reset}{}
+  Reset the decoder to the initial state.
+\end{methoddesc}
+
+
 The \class{StreamWriter} and \class{StreamReader} classes provide
 generic working interfaces which can be used to implement new
 encodings submodules very easily. See \module{encodings.utf_8} for an
author	Thomas Wouters <thomas@python.org>	2006-04-21 09:43:23 (GMT)
committer	Thomas Wouters <thomas@python.org>	2006-04-21 09:43:23 (GMT)
commit	a977329b6fb0e4c95cabb9043794de69b27a1099 (patch)
tree	b91552a0578639bd10181ab612039c1bed9bec27 /Doc/lib/libcodecs.tex
parent	d858f70617a9df8456e89a898ad8f97bd57c09f9 (diff)
download	cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.zip cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.tar.gz cpython-a977329b6fb0e4c95cabb9043794de69b27a1099.tar.bz2