author: Skip Montanaro <skip@pobox.com>, 2003-03-20 23:29:12 (GMT)
committer: Skip Montanaro <skip@pobox.com>, 2003-03-20 23:29:12 (GMT)
commit: b4a0417e9112126070316d21cb1f54a7c365a24c
tree: 39cc7fcb6fe0bec3760dc195f55de001d77d1724 /Doc
parent: 4cee220ff3bc1d858aeb4d8035f1427e5f14dbd1
new CSV file processing module - see PEP 305
Diffstat (limited to 'Doc')
-rw-r--r-- Doc/lib/libcsv.tex 281
1 files changed, 281 insertions, 0 deletions
diff --git a/Doc/lib/libcsv.tex b/Doc/lib/libcsv.tex
new file mode 100644
index 0000000..283e401
--- /dev/null
+++ b/Doc/lib/libcsv.tex
@@ -0,0 +1,281 @@
+\section{\module{csv} --- CSV File Reading and Writing}
+
+\declaremodule{standard}{csv}
+\modulesynopsis{Write and read tabular data to and from delimited files.}
+
+\versionadded{2.3}
+\index{csv}
+\indexii{data}{tabular}
+
+The so-called CSV (Comma Separated Values) format is the most common import
+and export format for spreadsheets and databases. There is no ``CSV
+standard'', so the format is operationally defined by the many applications
+which read and write it. The lack of a standard means that subtle
+differences often exist in the data produced and consumed by different
+applications. These differences can make it annoying to process CSV files
+from multiple sources. Still, while the delimiters and quoting characters
+vary, the overall format is similar enough that it is possible to write a
+single module which can efficiently manipulate such data, hiding the details
+of reading and writing the data from the programmer.
+
+The \module{csv} module implements classes to read and write tabular data in
+CSV format. It allows programmers to say, ``write this data in the format
+preferred by Excel,'' or ``read data from this file which was generated by
+Excel,'' without knowing the precise details of the CSV format used by
+Excel. Programmers can also describe the CSV formats understood by other
+applications or define their own special-purpose CSV formats.
+
+The \module{csv} module's \class{reader} and \class{writer} objects read and
+write sequences. Programmers can also read and write data in dictionary
+form using the \class{DictReader} and \class{DictWriter} classes.
+
+\note{The first version of the \module{csv} module doesn't support Unicode
+input. Also, there are currently some issues regarding \ASCII{} NUL
+characters. Accordingly, all input should generally be plain \ASCII{} to be
+safe. These restrictions will be removed in the future.}
+
+\begin{seealso}
+% \seemodule{array}{Arrays of uniformly typed numeric values.}
+ \seepep{305}{CSV File API}
+ {The Python Enhancement Proposal which proposed this addition
+ to Python.}
+\end{seealso}
+
+
+\subsection{Module Contents}
+
+
+The \module{csv} module defines the following functions:
+
+\begin{funcdesc}{reader}{csvfile\optional{,
+ dialect=\code{'excel'}\optional{, fmtparam}}}
+Return a reader object which will iterate over lines in the given
+{}\var{csvfile}. \var{csvfile} can be any object which supports the
+iterator protocol and returns a string each time its \method{next}
+method is called. An optional \var{dialect} parameter can be given
+which is used to define a set of parameters specific to a particular CSV
+dialect. It may be an instance of a subclass of the \class{Dialect}
+class or one of the strings returned by the \function{list_dialects}
+function. The other optional {}\var{fmtparam} keyword arguments can be
+given to override individual formatting parameters in the current
+dialect. For details of the dialect and formatting parameters, see
+section~\ref{fmt-params}, ``Dialects and Formatting Parameters''.
+
+All data read are returned as strings. No automatic data type
+conversion is performed.
+\end{funcdesc}
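+
+As a minimal sketch of the behaviour described above, the following reads
+two rows with the \code{'excel'} dialect while overriding its delimiter
+through a \var{fmtparam} keyword argument. It assumes a current Python 3
+interpreter; the in-memory \class{io.StringIO} object and the sample data
+are illustrative stand-ins for a real file.
+
```python
import csv
import io

# io.StringIO stands in for an open file; any iterable of strings works.
csvfile = io.StringIO("name;qty\nwidget;3\n")

# Start from the 'excel' dialect but override its delimiter.
rows = list(csv.reader(csvfile, dialect="excel", delimiter=";"))

# A plain list of strings is also an acceptable csvfile argument.
more_rows = list(csv.reader(["a,b", "1,2"]))

print(rows)
print(more_rows)
```
+
+The quantity comes back as the string \code{'3'}, since no type conversion
+is performed.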
+
+\begin{funcdesc}{writer}{csvfile\optional{,
+ dialect=\code{'excel'}\optional{, fmtparam}}}
+Return a writer object responsible for converting the user's data into
+delimited strings on the given file-like object. An optional
+{}\var{dialect} parameter can be given which is used to define a set of
+parameters specific to a particular CSV dialect. It may be an instance
+of a subclass of the \class{Dialect} class or one of the strings
+returned by the \function{list_dialects} function. The other optional
+{}\var{fmtparam} keyword arguments can be given to override individual
+formatting parameters in the current dialect. For details of the dialect
+and formatting parameters, see section~\ref{fmt-params}, ``Dialects and
+Formatting Parameters''. To make it as easy as possible to
+interface with modules which implement the DB API, the value
+\constant{None} is written as the empty string. While this isn't a
+reversible transformation, it makes it easier to dump SQL NULL data values
+to CSV files without preprocessing the data returned from a
+\code{cursor.fetch*()} call. All other non-string data are stringified
+with \function{str()} before being written.
+\end{funcdesc}
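+
+The treatment of \constant{None} and of non-string values can be sketched
+as follows. This assumes a current Python 3 interpreter and uses an
+in-memory \class{io.StringIO} buffer in place of a real file; the row
+contents are made up for illustration.
+
```python
import csv
import io

buffer = io.StringIO()          # stand-in for a file opened for writing
writer = csv.writer(buffer)     # defaults to the 'excel' dialect

# None becomes an empty field; the int and float are stringified.
writer.writerow(["spam", None, 42, 1.5])

output = buffer.getvalue()
print(repr(output))
```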
+
+\begin{funcdesc}{register_dialect}{name, dialect}
+Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
+of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
+\end{funcdesc}
+
+\begin{funcdesc}{unregister_dialect}{name}
+Delete the dialect associated with \var{name} from the dialect registry. An
+\exception{Error} is raised if \var{name} is not a registered dialect
+name.
+\end{funcdesc}
+
+\begin{funcdesc}{get_dialect}{name}
+Return the dialect associated with \var{name}. An \exception{Error} is
+raised if \var{name} is not a registered dialect name.
+\end{funcdesc}
+
+\begin{funcdesc}{list_dialects}{}
+Return the names of all registered dialects.
+\end{funcdesc}
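+
+The four registry functions might be used together roughly as below. The
+\class{PipeDialect} class and the name \code{'pipes'} are hypothetical, and
+the sketch assumes a current Python 3 interpreter, where every attribute of
+a direct \class{Dialect} subclass is spelled out explicitly.
+
```python
import csv


class PipeDialect(csv.Dialect):
    """Hypothetical dialect: fields separated by a vertical bar."""
    delimiter = "|"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


csv.register_dialect("pipes", PipeDialect)
registered = "pipes" in csv.list_dialects()
delimiter = csv.get_dialect("pipes").delimiter

# A registered name can be passed wherever a dialect is expected.
row = next(csv.reader(["a|b|c"], dialect="pipes"))

csv.unregister_dialect("pipes")
try:
    csv.get_dialect("pipes")
except csv.Error:
    lookup_failed = True    # the name is no longer registered
else:
    lookup_failed = False
```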
+
+
+The \module{csv} module defines the following classes:
+
+\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
+ restkey=\code{None}\optional{,
+ restval=\code{None}\optional{,
+ dialect=\code{'excel'}\optional{,
+ fmtparam}}}}}
+Create an object which operates like a regular reader but maps the
+information read into a dict whose keys are given by the \var{fieldnames}
+parameter. If the row read has more fields than the \var{fieldnames}
+sequence, the remaining data is added as a sequence keyed by the value of
+\var{restkey}. If the row read has fewer fields than the \var{fieldnames}
+sequence, the remaining keys take the value of the optional \var{restval}
+parameter. All other parameters are interpreted as for regular readers.
+\end{classdesc}
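+
+The handling of long and short rows can be illustrated with a small
+sketch. It assumes a current Python 3 interpreter; the field names, the
+\code{'extra'} key and the \code{'n/a'} filler are arbitrary choices made
+for the example.
+
```python
import csv
import io

# First row has two surplus fields, second row is one field short.
csvfile = io.StringIO("1,2,3,4\n5\n")

reader = csv.DictReader(csvfile, fieldnames=["a", "b"],
                        restkey="extra", restval="n/a")
rows = [dict(row) for row in reader]

for row in rows:
    print(row)
```
+
+The surplus fields of the first row are collected in a list under
+\code{'extra'}, while the missing \code{'b'} of the second row is filled in
+with \code{'n/a'}.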
+
+
+\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
+ restval=""\optional{,
+ extrasaction=\code{'raise'}\optional{,
+ dialect=\code{'excel'}\optional{, fmtparam}}}}}
+Create an object which operates like a regular writer but maps dictionaries
+onto output rows. The \var{fieldnames} parameter identifies the order in
+which values in the dictionary passed to the \method{writerow()} method are
+written to the \var{csvfile}. The optional \var{restval} parameter
+specifies the value to be written if the dictionary is missing a key in
+\var{fieldnames}. If the dictionary passed to the \method{writerow()}
+method contains a key not found in \var{fieldnames}, the optional
+\var{extrasaction} parameter indicates what action to take. If it is set
+to \code{'raise'} a \exception{ValueError} is raised. If it is set to
+\code{'ignore'}, extra values in the dictionary are ignored. All other
+parameters are interpreted as for regular writers.
+\end{classdesc}
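+
+A short sketch of \var{restval} and \var{extrasaction} in use follows. It
+assumes a current Python 3 interpreter; the field names and dictionaries
+are invented for the example, and \class{io.StringIO} stands in for a file.
+
```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "qty"],
                        restval="0", extrasaction="ignore")

writer.writerow({"name": "widget"})                      # qty missing
writer.writerow({"name": "gadget", "qty": 2, "colour": "red"})  # extra key
output = buffer.getvalue()

# With the default extrasaction='raise', an unknown key is an error.
strict = csv.DictWriter(io.StringIO(), fieldnames=["name", "qty"])
try:
    strict.writerow({"name": "gizmo", "colour": "red"})
except ValueError:
    raised = True
else:
    raised = False

print(repr(output), raised)
```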
+
+
+\begin{classdesc*}{Dialect}{}
+The \class{Dialect} class is a container class relied on primarily for its
+attributes, which are used to define the parameters for a specific
+\class{reader} or \class{writer} instance. Dialect objects support the
+following data attributes:
+
+\begin{memberdesc}[string]{delimiter}
+A one-character string used to separate fields. It defaults to \code{","}.
+\end{memberdesc}
+
+\begin{memberdesc}[boolean]{doublequote}
+Controls how instances of \var{quotechar} appearing inside a field should
+themselves be quoted. When \constant{True}, the character is doubled.
+When \constant{False}, the \var{escapechar} must be a one-character string
+which is used as a prefix to the \var{quotechar}. It defaults to
+\constant{True}.
+\end{memberdesc}
+
+\begin{memberdesc}{escapechar}
+A one-character string used to escape the \var{delimiter} if \var{quoting}
+is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
+\end{memberdesc}
+
+\begin{memberdesc}[string]{lineterminator}
+The string used to terminate lines in the CSV file. It defaults to
+\code{"\e r\e n"}.
+\end{memberdesc}
+
+\begin{memberdesc}[string]{quotechar}
+A one-character string used to quote elements containing the \var{delimiter}
+or which start with the \var{quotechar}. It defaults to \code{'"'}.
+\end{memberdesc}
+
+\begin{memberdesc}[integer]{quoting}
+Controls when quotes should be generated by the writer. It can take on any
+of the \code{QUOTE_*} constants defined below and defaults to
+\constant{QUOTE_MINIMAL}.
+\end{memberdesc}
+
+\begin{memberdesc}[boolean]{skipinitialspace}
+When \constant{True}, whitespace immediately following the \var{delimiter}
+is ignored. The default is \constant{False}.
+\end{memberdesc}
+
+\end{classdesc*}
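+
+Two of these attributes are sketched below as keyword overrides:
+\var{doublequote} together with \var{escapechar} on output, and
+\var{skipinitialspace} on input. The sketch assumes a current Python 3
+interpreter, whose behaviour may differ in detail from the version
+described here; the sample data is made up.
+
```python
import csv
import io

# doublequote=False: embedded quote characters are prefixed with the
# escapechar instead of being doubled.
buffer = io.StringIO()
writer = csv.writer(buffer, doublequote=False, escapechar="\\")
writer.writerow(['say "hi"'])
escaped = buffer.getvalue()

# Reading back with the same parameters recovers the original field.
buffer.seek(0)
roundtrip = list(csv.reader(buffer, doublequote=False, escapechar="\\"))

# skipinitialspace=True drops whitespace directly following a delimiter.
kept = next(csv.reader(["a, b, c"]))
stripped = next(csv.reader(["a, b, c"], skipinitialspace=True))

print(repr(escaped), roundtrip, kept, stripped)
```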
+
+The \module{csv} module defines the following constants:
+
+\begin{datadesc}{QUOTE_ALWAYS}
+Instructs \class{writer} objects to quote all fields.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_MINIMAL}
+Instructs \class{writer} objects to only quote those fields which contain
+the current \var{delimiter} or begin with the current \var{quotechar}.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_NONNUMERIC}
+Instructs \class{writer} objects to quote all non-numeric fields.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_NONE}
+Instructs \class{writer} objects to never quote fields. When the current
+\var{delimiter} occurs in output data it is preceded by the current
+\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
+is an error not to have a single-character \var{escapechar} defined, even if
+no data to be written contains the \var{delimiter} character.
+\end{datadesc}
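+
+The effect of the quoting constants on a single row can be compared with a
+sketch like the following. It assumes a current Python 3 interpreter, in
+which the quote-everything constant is spelled \constant{QUOTE_ALL}; the
+\function{render} helper and the sample row are hypothetical.
+
```python
import csv
import io


def render(row, **fmtparams):
    """Hypothetical helper: format one row and return it as a string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n", **fmtparams).writerow(row)
    return buffer.getvalue()


row = ["plain", "a,b", 7]

minimal = render(row, quoting=csv.QUOTE_MINIMAL)
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)
everything = render(row, quoting=csv.QUOTE_ALL)
# With QUOTE_NONE the embedded delimiter is escaped rather than quoted.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar="\\")

for line in (minimal, nonnumeric, everything, unquoted):
    print(line, end="")
```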
+
+
+The \module{csv} module defines the following exception:
+
+\begin{excdesc}{Error}
+Raised by any of the functions when an error is detected.
+\end{excdesc}
+
+
+\subsection{Dialects and Formatting Parameters\label{fmt-params}}
+
+To make it easier to specify the format of input and output records,
+specific formatting parameters are grouped together into dialects. A
+dialect is a subclass of the \class{Dialect} class having a set of specific
+attributes and a single \method{validate()} method. When creating \class{reader}
+or \class{writer} objects, the programmer can specify a string or a subclass
+of the \class{Dialect} class as the dialect parameter. In addition to, or
+instead of, the \var{dialect} parameter, the programmer can also specify
+individual formatting parameters, which have the same names as the
+attributes defined above for the \class{Dialect} class.
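+
+Passing a \class{Dialect} subclass directly, with and without an
+individual override, might look like this. The \class{TabDialect} class is
+hypothetical and the sketch assumes a current Python 3 interpreter.
+
```python
import csv
import io


class TabDialect(csv.Dialect):
    """Hypothetical dialect: tab-separated fields, newline-terminated."""
    delimiter = "\t"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


row = ["a", "b c", "d"]

# The dialect alone supplies every formatting parameter.
buffer = io.StringIO()
csv.writer(buffer, dialect=TabDialect).writerow(row)
tab_line = buffer.getvalue()

# A keyword argument overrides just that one attribute of the dialect.
buffer = io.StringIO()
csv.writer(buffer, dialect=TabDialect, delimiter=":").writerow(row)
colon_line = buffer.getvalue()

print(repr(tab_line), repr(colon_line))
```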
+
+
+\subsection{Reader Objects}
+
+\class{DictReader} and \class{reader} objects have the following public
+methods:
+
+\begin{methoddesc}{next}{}
+Return the next row of the reader's iterable object as a list, parsed
+according to the current dialect.
+\end{methoddesc}
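+
+Fetching rows one at a time, for instance to treat the first row as a
+header, can be sketched as below. The sketch assumes a current Python 3
+interpreter, where the built-in \function{next()} function is called on
+the reader rather than a \method{next()} method; the data is illustrative.
+
```python
import csv

reader = csv.reader(["name,qty", "widget,3"])

header = next(reader)   # first row, parsed into a list of strings
first = next(reader)    # second row

try:
    next(reader)
except StopIteration:
    exhausted = True    # no rows left
else:
    exhausted = False

print(header, first, exhausted)
```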
+
+
+\subsection{Writer Objects}
+
+\class{DictWriter} and \class{writer} objects have the following public
+methods:
+
+\begin{methoddesc}{writerow}{row}
+Write the \var{row} parameter to the writer's file object, formatted
+according to the current dialect.
+\end{methoddesc}
+
+\begin{methoddesc}{writerows}{rows}
+Write each row in the \var{rows} parameter to the writer's file object,
+formatted according to the current dialect.
+\end{methoddesc}
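+
+The two methods can be combined, for example to write a header row followed
+by a batch of data rows. This is a sketch assuming a current Python 3
+interpreter, with an \class{io.StringIO} buffer in place of a file and
+made-up data.
+
```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

writer.writerow(["name", "qty"])                    # a single row
writer.writerows([["widget", 3], ["gadget", 4]])    # several rows at once

output = buffer.getvalue()
print(output, end="")
```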
+
+
+\subsection{Examples}
+
+The ``Hello, world'' of csv reading is
+
+\begin{verbatim}
+ import csv
+
+ # Each row is returned as a list of strings.
+ reader = csv.reader(file("some.csv"))
+ for row in reader:
+     print row
+\end{verbatim}
+
+The corresponding simplest possible writing example is
+
+\begin{verbatim}
+ import csv
+
+ # someiterable stands for any iterable of rows (sequences of fields).
+ writer = csv.writer(file("some.csv", "w"))
+ for row in someiterable:
+     writer.writerow(row)
+\end{verbatim}