author | Skip Montanaro <skip@pobox.com> | 2003-03-20 23:29:12 (GMT) |
---|---|---|
committer | Skip Montanaro <skip@pobox.com> | 2003-03-20 23:29:12 (GMT) |
commit | b4a0417e9112126070316d21cb1f54a7c365a24c (patch) | |
tree | 39cc7fcb6fe0bec3760dc195f55de001d77d1724 /Doc/lib | |
parent | 4cee220ff3bc1d858aeb4d8035f1427e5f14dbd1 (diff) | |
new CSV file processing module - see PEP 305
Diffstat (limited to 'Doc/lib')
-rw-r--r-- | Doc/lib/libcsv.tex | 281 |
1 files changed, 281 insertions, 0 deletions
diff --git a/Doc/lib/libcsv.tex b/Doc/lib/libcsv.tex
new file mode 100644
index 0000000..283e401
--- /dev/null
+++ b/Doc/lib/libcsv.tex
@@ -0,0 +1,281 @@
+\section{\module{csv} --- CSV File Reading and Writing}
+
+\declaremodule{standard}{csv}
+\modulesynopsis{Write and read tabular data to and from delimited files.}
+
+\versionadded{2.3}
+\index{csv}
+\indexii{data}{tabular}
+
+The so-called CSV (Comma Separated Values) format is the most common import
+and export format for spreadsheets and databases. There is no ``CSV
+standard'', so the format is operationally defined by the many applications
+which read and write it. The lack of a standard means that subtle
+differences often exist in the data produced and consumed by different
+applications. These differences can make it annoying to process CSV files
+from multiple sources. Still, while the delimiters and quoting characters
+vary, the overall format is similar enough that it is possible to write a
+single module which can efficiently manipulate such data, hiding the details
+of reading and writing the data from the programmer.
+
+The \module{csv} module implements classes to read and write tabular data in
+CSV format. It allows programmers to say, ``write this data in the format
+preferred by Excel,'' or ``read data from this file which was generated by
+Excel,'' without knowing the precise details of the CSV format used by
+Excel. Programmers can also describe the CSV formats understood by other
+applications or define their own special-purpose CSV formats.
+
+The \module{csv} module's \class{reader} and \class{writer} objects read and
+write sequences. Programmers can also read and write data in dictionary
+form using the \class{DictReader} and \class{DictWriter} classes.
+
+\note{The first version of the \module{csv} module doesn't support Unicode
+input. Also, there are currently some issues regarding \ASCII{} NUL
+characters. Accordingly, all input should generally be plain \ASCII{} to be
+safe.
These restrictions will be removed in the future.}
+
+\begin{seealso}
+%  \seemodule{array}{Arrays of uniformly typed numeric values.}
+  \seepep{305}{CSV File API}
+    {The Python Enhancement Proposal which proposed this addition
+     to Python.}
+\end{seealso}
+
+
+\subsection{Module Contents}
+
+
+The \module{csv} module defines the following functions:
+
+\begin{funcdesc}{reader}{csvfile\optional{,
+                 dialect=\code{'excel'}\optional{, fmtparam}}}
+Return a reader object which will iterate over lines in the given
+{}\var{csvfile}. \var{csvfile} can be any object which supports the
+iterator protocol and returns a string each time its \method{next}
+method is called. An optional \var{dialect} parameter can be given
+which is used to define a set of parameters specific to a particular CSV
+dialect. It may be an instance of a subclass of the \class{Dialect}
+class or one of the strings returned by the \function{list_dialects}
+function. The other optional {}\var{fmtparam} keyword arguments can be
+given to override individual formatting parameters in the current
+dialect. For more information about the dialect and formatting
+parameters, see section~\ref{fmt-params}, ``Dialects and Formatting
+Parameters''.
+
+All data read are returned as strings. No automatic data type
+conversion is performed.
+\end{funcdesc}
+
+\begin{funcdesc}{writer}{csvfile\optional{,
+                 dialect=\code{'excel'}\optional{, fmtparam}}}
+Return a writer object responsible for converting the user's data into
+delimited strings on the given file-like object. An optional
+{}\var{dialect} parameter can be given which is used to define a set of
+parameters specific to a particular CSV dialect. It may be an instance
+of a subclass of the \class{Dialect} class or one of the strings
+returned by the \function{list_dialects} function. The other optional
+{}\var{fmtparam} keyword arguments can be given to override individual
+formatting parameters in the current dialect.
For more information
+about the dialect and formatting parameters, see
+section~\ref{fmt-params}, ``Dialects and Formatting Parameters''.
+To make it as easy as possible to
+interface with modules which implement the DB API, the value
+\constant{None} is written as the empty string. While this isn't a
+reversible transformation, it makes it easier to dump SQL NULL data values
+to CSV files without preprocessing the data returned from a
+\code{cursor.fetch*()} call. All other non-string data are stringified
+with \function{str()} before being written.
+\end{funcdesc}
+
+\begin{funcdesc}{register_dialect}{name, dialect}
+Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
+of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
+\end{funcdesc}
+
+\begin{funcdesc}{unregister_dialect}{name}
+Delete the dialect associated with \var{name} from the dialect registry. An
+\exception{Error} is raised if \var{name} is not a registered dialect
+name.
+\end{funcdesc}
+
+\begin{funcdesc}{get_dialect}{name}
+Return the dialect associated with \var{name}. An \exception{Error} is
+raised if \var{name} is not a registered dialect name.
+\end{funcdesc}
+
+\begin{funcdesc}{list_dialects}{}
+Return the names of all registered dialects.
+\end{funcdesc}
+
+
+The \module{csv} module defines the following classes:
+
+\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
+                              restkey=\code{None}\optional{,
+                              restval=\code{None}\optional{,
+                              dialect=\code{'excel'}\optional{,
+                              fmtparam}}}}}
+Create an object which operates like a regular reader but maps the
+information read into a dict whose keys are given by the \var{fieldnames}
+parameter. If the row read has fewer fields than the fieldnames sequence,
+the value of \var{restval} will be used as the default value. If the row
+read has more fields than the fieldnames sequence, the remaining data is
+added as a sequence keyed by the value of \var{restkey}.
If the row read
+has fewer fields than the fieldnames sequence, the remaining keys take the
+value of the optional \var{restval} parameter. All other parameters are
+interpreted as for regular readers.
+\end{classdesc}
+
+
+\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
+                              restval=""\optional{,
+                              extrasaction=\code{'raise'}\optional{,
+                              dialect=\code{'excel'}\optional{, fmtparam}}}}}
+Create an object which operates like a regular writer but maps dictionaries
+onto output rows. The \var{fieldnames} parameter identifies the order in
+which values in the dictionary passed to the \method{writerow()} method are
+written to the \var{csvfile}. The optional \var{restval} parameter
+specifies the value to be written if the dictionary is missing a key in
+\var{fieldnames}. If the dictionary passed to the \method{writerow()}
+method contains a key not found in \var{fieldnames}, the optional
+\var{extrasaction} parameter indicates what action to take. If it is set
+to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
+\code{'ignore'}, extra values in the dictionary are ignored. All other
+parameters are interpreted as for regular writers.
+\end{classdesc}
+
+
+\begin{classdesc*}{Dialect}{}
+The \class{Dialect} class is a container class relied on primarily for its
+attributes, which are used to define the parameters for a specific
+\class{reader} or \class{writer} instance. Dialect objects support the
+following data attributes:
+
+\begin{memberdesc}[string]{delimiter}
+A one-character string used to separate fields. It defaults to \code{","}.
+\end{memberdesc}
+
+\begin{memberdesc}[boolean]{doublequote}
+Controls how instances of \var{quotechar} appearing inside a field should
+themselves be quoted. When \constant{True}, the character is doubled.
+When \constant{False}, the \var{escapechar} must be a one-character string
+which is used as a prefix to the \var{quotechar}. It defaults to
+\constant{True}.
+\end{memberdesc}
+
+\begin{memberdesc}{escapechar}
+A one-character string used to escape the \var{delimiter} if \var{quoting}
+is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
+\end{memberdesc}
+
+\begin{memberdesc}[string]{lineterminator}
+The string used to terminate lines in the CSV file. It defaults to
+\code{"\e r\e n"}.
+\end{memberdesc}
+
+\begin{memberdesc}[string]{quotechar}
+A one-character string used to quote elements containing the \var{delimiter}
+or which start with the \var{quotechar}. It defaults to \code{'"'}.
+\end{memberdesc}
+
+\begin{memberdesc}[integer]{quoting}
+Controls when quotes should be generated by the writer. It can take on any
+of the \code{QUOTE_*} constants defined below and defaults to
+\constant{QUOTE_MINIMAL}.
+\end{memberdesc}
+
+\begin{memberdesc}[boolean]{skipinitialspace}
+When \constant{True}, whitespace immediately following the \var{delimiter}
+is ignored. The default is \constant{False}.
+\end{memberdesc}
+
+\end{classdesc*}
+
+The \module{csv} module defines the following constants:
+
+\begin{datadesc}{QUOTE_ALWAYS}
+Instructs \class{writer} objects to quote all fields.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_MINIMAL}
+Instructs \class{writer} objects to only quote those fields which contain
+the current \var{delimiter} or begin with the current \var{quotechar}.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_NONNUMERIC}
+Instructs \class{writer} objects to quote all non-numeric fields.
+\end{datadesc}
+
+\begin{datadesc}{QUOTE_NONE}
+Instructs \class{writer} objects to never quote fields. When the current
+\var{delimiter} occurs in output data it is preceded by the current
+\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
+is an error not to have a single-character \var{escapechar} defined, even if
+no data to be written contains the \var{delimiter} character.
+\end{datadesc}
+
+
+The \module{csv} module defines the following exception:
+
+\begin{excdesc}{Error}
+Raised by any of the functions when an error is detected.
+\end{excdesc}
+
+
+\subsection{Dialects and Formatting Parameters\label{fmt-params}}
+
+To make it easier to specify the format of input and output records,
+specific formatting parameters are grouped together into dialects. A
+dialect is a subclass of the \class{Dialect} class having a set of specific
+attributes and a single \method{validate()} method. When creating \class{reader}
+or \class{writer} objects, the programmer can specify a string or a subclass
+of the \class{Dialect} class as the dialect parameter. In addition to, or
+instead of, the \var{dialect} parameter, the programmer can also specify
+individual formatting parameters, which have the same names as the
+attributes defined above for the \class{Dialect} class.
+
+
+\subsection{Reader Objects}
+
+\class{DictReader} and \class{reader} objects have the following public
+methods:
+
+\begin{methoddesc}{next}{}
+Return the next row of the reader's iterable object as a list, parsed
+according to the current dialect.
+\end{methoddesc}
+
+
+\subsection{Writer Objects}
+
+\class{DictWriter} and \class{writer} objects have the following public
+methods:
+
+\begin{methoddesc}{writerow}{row}
+Write the \var{row} parameter to the writer's file object, formatted
+according to the current dialect.
+\end{methoddesc}
+
+\begin{methoddesc}{writerows}{rows}
+Write each row in the \var{rows} parameter to the writer's file object, formatted
+according to the current dialect.
+\end{methoddesc}
+
+
+\subsection{Examples}
+
+The ``Hello, world'' of csv reading is
+
+\begin{verbatim}
+    reader = csv.reader(file("some.csv"))
+    for row in reader:
+        print row
+\end{verbatim}
+
+The corresponding simplest possible writing example is
+
+\begin{verbatim}
+    writer = csv.writer(file("some.csv", "w"))
+    for row in someiterable:
+        writer.writerow(row)
+\end{verbatim}
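The dictionary-based classes described under Module Contents can be exercised in the same spirit. The following is a minimal sketch, not part of the commit: it assumes a current Python 3 interpreter (where \code{file()} and the \code{print} statement used above no longer exist), it uses in-memory \code{io.StringIO} buffers instead of real files, and the field names, sample rows, and helper functions \code{write\_rows} and \code{read\_rows} are hypothetical names chosen for illustration.

```python
import csv
import io

# Hypothetical sample data; the field names are illustrative only.
FIELDNAMES = ["name", "city"]
ROWS = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace"},                                      # "city" is missing
    {"name": "Linus", "city": "Helsinki", "note": "extra"},  # "note" is not a field
]


def write_rows(rows, fieldnames, extrasaction="ignore"):
    """Serialize dictionaries with DictWriter and return the CSV text."""
    buf = io.StringIO(newline="")  # keep the dialect's line terminator untouched
    writer = csv.DictWriter(buf, fieldnames, restval="n/a",
                            extrasaction=extrasaction, dialect="excel")
    for row in rows:
        # Missing keys are written as restval; extra keys follow extrasaction.
        writer.writerow(row)
    return buf.getvalue()


def read_rows(text, fieldnames):
    """Parse CSV text with DictReader and return a list of plain dicts."""
    buf = io.StringIO(text, newline="")
    reader = csv.DictReader(buf, fieldnames, restkey="extra", restval="",
                            dialect="excel")
    # Surplus fields are collected in a list under restkey;
    # missing fields take restval.
    return [dict(row) for row in reader]


if __name__ == "__main__":
    print(repr(write_rows(ROWS, FIELDNAMES)))
    print(read_rows("a,b,c\r\nd\r\n", ["x", "y"]))
```

With \code{extrasaction="raise"} the third sample row would make \method{writerow()} raise \exception{ValueError} instead of silently dropping the \code{note} key, which is the stricter default described for \class{DictWriter}.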