diff options
Diffstat (limited to 'Doc/library/csv.rst')
-rw-r--r-- | Doc/library/csv.rst | 150 |
1 files changed, 30 insertions, 120 deletions
diff --git a/Doc/library/csv.rst b/Doc/library/csv.rst index 46ef246..fd56e46 100644 --- a/Doc/library/csv.rst +++ b/Doc/library/csv.rst @@ -33,14 +33,6 @@ The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and write sequences. Programmers can also read and write data in dictionary form using the :class:`DictReader` and :class:`DictWriter` classes. -.. note:: - - This version of the :mod:`csv` module doesn't support Unicode input. Also, - there are currently some issues regarding ASCII NUL characters. Accordingly, - all input should be UTF-8 or printable ASCII to be safe; see the examples in - section :ref:`csv-examples`. These restrictions will be removed in the future. - - .. seealso:: :pep:`305` - CSV File API @@ -60,8 +52,8 @@ The :mod:`csv` module defines the following functions: Return a reader object which will iterate over lines in the given *csvfile*. *csvfile* can be any object which supports the :term:`iterator` protocol and returns a string each time its :meth:`next` method is called --- file objects and list - objects are both suitable. If *csvfile* is a file object, it must be opened - with the 'b' flag on platforms where that makes a difference. An optional + objects are both suitable. If *csvfile* is a file object, it should be opened + with ``newline=''``. [#]_ An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the :class:`Dialect` class or one of the strings returned by the @@ -71,20 +63,13 @@ The :mod:`csv` module defines the following functions: section :ref:`csv-fmt-params`. Each row read from the csv file is returned as a list of strings. No - automatic data type conversion is performed. - - The parser is quite strict with respect to multi-line quoted fields. Previously, - if a line ended within a quoted field without a terminating newline character, a - newline would be inserted into the returned field. This behavior caused problems - when reading files which contained carriage return characters within fields. - The behavior was changed to return the field without inserting newlines. As a - consequence, if newlines embedded within fields are important, the input should - be split into lines in a manner which preserves the newline characters. + automatic data type conversion is performed unless the ``QUOTE_NONNUMERIC`` format + option is specified (in which case unquoted fields are transformed into floats). A short usage example:: >>> import csv - >>> spamReader = csv.reader(open('eggs.csv'), delimiter=' ', quotechar='|') + >>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|') >>> for row in spamReader: ... print(', '.join(row)) Spam, Spam, Spam, Spam, Spam, Baked Beans @@ -95,8 +80,7 @@ The :mod:`csv` module defines the following functions: Return a writer object responsible for converting the user's data into delimited strings on the given file-like object. *csvfile* can be any object with a - :func:`write` method. If *csvfile* is a file object, it must be opened with the - 'b' flag on platforms where that makes a difference. An optional *dialect* + :func:`write` method. An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the :class:`Dialect` class or one of the strings returned by the @@ -270,7 +254,6 @@ The :mod:`csv` module defines the following exception: Raised by any of the functions when an error is detected. - .. _csv-fmt-params: Dialects and Formatting Parameters @@ -419,41 +402,52 @@ Examples The simplest example of reading a CSV file:: import csv - reader = csv.reader(open("some.csv", "rb")) + reader = csv.reader(open("some.csv", newline='')) for row in reader: print(row) Reading a file with an alternate format:: import csv - reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE) + reader = csv.reader(open("passwd"), delimiter=':', quoting=csv.QUOTE_NONE) for row in reader: print(row) The corresponding simplest possible writing example is:: import csv - writer = csv.writer(open("some.csv", "wb")) + writer = csv.writer(open("some.csv", "w")) writer.writerows(someiterable) +Since :func:`open` is used to open a CSV file for reading, the file +will by default be decoded into unicode using the system default +encoding (see :func:`locale.getpreferredencoding`). To decode a file +using a different encoding, use the ``encoding`` argument of open:: + + import csv + reader = csv.reader(open("some.csv", newline='', encoding='utf-8')) + for row in reader: + print(row) + +The same applies to writing in something other than the system default +encoding: specify the encoding argument when opening the output file. + Registering a new dialect:: import csv - csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE) - - reader = csv.reader(open("passwd", "rb"), 'unixpwd') + reader = csv.reader(open("passwd"), 'unixpwd') A slightly more advanced use of the reader --- catching and reporting errors:: import csv, sys filename = "some.csv" - reader = csv.reader(open(filename, "rb")) + reader = csv.reader(open(filename, newline='')) try: for row in reader: print(row) except csv.Error as e: - sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e)) + sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e)) And while the module doesn't directly support parsing strings, it can easily be done:: @@ -462,94 +456,10 @@ done:: for row in csv.reader(['one,two,three']): print(row) -The :mod:`csv` module doesn't directly support reading and writing Unicode, but -it is 8-bit-clean save for some problems with ASCII NUL characters. So you can -write functions or classes that handle the encoding and decoding for you as long -as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended. - -:func:`unicode_csv_reader` below is a :term:`generator` that wraps :class:`csv.reader` -to handle Unicode CSV data (a list of Unicode strings). :func:`utf_8_encoder` -is a :term:`generator` that encodes the Unicode strings as UTF-8, one string (or row) at -a time. The encoded strings are parsed by the CSV reader, and -:func:`unicode_csv_reader` decodes the UTF-8-encoded cells back into Unicode:: - - import csv - def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs): - # csv.py doesn't do Unicode; encode temporarily as UTF-8: - csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), - dialect=dialect, **kwargs) - for row in csv_reader: - # decode UTF-8 back to Unicode, cell by cell: - yield [unicode(cell, 'utf-8') for cell in row] - - def utf_8_encoder(unicode_csv_data): - for line in unicode_csv_data: - yield line.encode('utf-8') - -For all other encodings the following :class:`UnicodeReader` and -:class:`UnicodeWriter` classes can be used. They take an additional *encoding* -parameter in their constructor and make sure that the data passes the real -reader or writer encoded as UTF-8:: - - import csv, codecs, io - - class UTF8Recoder: - """ - Iterator that reads an encoded stream and reencodes the input to UTF-8 - """ - def __init__(self, f, encoding): - self.reader = codecs.getreader(encoding)(f) - - def __iter__(self): - return self - - def __next__(self): - return next(self.reader).encode("utf-8") - - class UnicodeReader: - """ - A CSV reader which will iterate over lines in the CSV file "f", - which is encoded in the given encoding. - """ - - def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): - f = UTF8Recoder(f, encoding) - self.reader = csv.reader(f, dialect=dialect, **kwds) - - def __next__(self): - row = next(self.reader) - return [unicode(s, "utf-8") for s in row] - - def __iter__(self): - return self - - class UnicodeWriter: - """ - A CSV writer which will write rows to CSV file "f", - which is encoded in the given encoding. - """ - - def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): - # Redirect output to a queue - self.queue = io.StringIO() - self.writer = csv.writer(self.queue, dialect=dialect, **kwds) - self.stream = f - self.encoder = codecs.getincrementalencoder(encoding)() - - def writerow(self, row): - self.writer.writerow([s.encode("utf-8") for s in row]) - # Fetch UTF-8 output from the queue ... - data = self.queue.getvalue() - data = data.decode("utf-8") - # ... and reencode it into the target encoding - data = self.encoder.encode(data) - # write to the target stream - self.stream.write(data) - # empty queue - self.queue.truncate(0) - - def writerows(self, rows): - for row in rows: - self.writerow(row) +.. rubric:: Footnotes +.. [#] If ``newline=''`` is not specified, newlines embedded inside quoted fields + will not be interpreted correctly. It should always be safe to specify + ``newline=''``, since the csv module does its own universal newline handling + on input. |