Diffstat (limited to 'Doc/library/io.rst')
-rw-r--r-- | Doc/library/io.rst | 214 |
1 files changed, 162 insertions, 52 deletions
diff --git a/Doc/library/io.rst b/Doc/library/io.rst
index 96a4970..4a13d54 100644
--- a/Doc/library/io.rst
+++ b/Doc/library/io.rst
@@ -11,44 +11,90 @@
 .. moduleauthor:: Benjamin Peterson <benjamin@python.org>
 .. sectionauthor:: Benjamin Peterson <benjamin@python.org>
 
-The :mod:`io` module provides the Python interfaces to stream handling. The
-built-in :func:`open` function is defined in this module.
+Overview
+--------
 
-At the top of the I/O hierarchy is the abstract base class :class:`IOBase`. It
-defines the basic interface to a stream. Note, however, that there is no
-separation between reading and writing to streams; implementations are allowed
-to raise an :exc:`IOError` if they do not support a given operation.
+The :mod:`io` module provides Python 3's main facilities for dealing for
+various types of I/O. Three main types of I/O are defined: *text I/O*,
+*binary I/O*, *raw I/O*. It should be noted that these are generic categories,
+and various backing stores can be used for each of them. Concrete objects
+belonging to any of these categories will often be called *streams*; another
+common term is *file-like objects*.
 
-Extending :class:`IOBase` is :class:`RawIOBase` which deals simply with the
-reading and writing of raw bytes to a stream. :class:`FileIO` subclasses
-:class:`RawIOBase` to provide an interface to files in the machine's
-file system.
+Independently of its category, each concrete stream object will also have
+various capabilities: it can be read-only, write-only, or read-write; it
+can also allow arbitrary random access (seeking forwards or backwards to
+any location), or only sequential access (for example in the case of a
+socket or pipe).
 
-:class:`BufferedIOBase` deals with buffering on a raw byte stream
-(:class:`RawIOBase`). Its subclasses, :class:`BufferedWriter`,
-:class:`BufferedReader`, and :class:`BufferedRWPair` buffer streams that are
-readable, writable, and both readable and writable.
-:class:`BufferedRandom` provides a buffered interface to random access
-streams. :class:`BytesIO` is a simple stream of in-memory bytes.
+All streams are careful about the type of data you give to them. For example
+giving a :class:`str` object to the ``write()`` method of a binary stream
+will raise a ``TypeError``. So will giving a :class:`bytes` object to the
+``write()`` method of a text stream.
 
-Another :class:`IOBase` subclass, :class:`TextIOBase`, deals with
-streams whose bytes represent text, and handles encoding and decoding
-from and to strings. :class:`TextIOWrapper`, which extends it, is a
-buffered text interface to a buffered raw stream
-(:class:`BufferedIOBase`). Finally, :class:`StringIO` is an in-memory
-stream for text.
+Text I/O
+^^^^^^^^
 
-Argument names are not part of the specification, and only the arguments of
-:func:`.open` are intended to be used as keyword arguments.
+Text I/O expects and produces :class:`str` objects. This means that,
+whenever the backing store is natively made of bytes (such as in the case
+of a file), encoding and decoding of data is made transparently, as well as,
+optionally, translation of platform-specific newline characters.
 
-.. seealso::
-   :mod:`sys`
-      contains the standard IO streams: :data:`sys.stdin`, :data:`sys.stdout`,
-      and :data:`sys.stderr`.
+A way to create a text stream is to :meth:`open()` a file in text mode,
+optionally specifying an encoding::
+
+   f = open("myfile.txt", "r", encoding="utf-8")
+
+In-memory text streams are also available as :class:`StringIO` objects::
+
+   f = io.StringIO("some initial text data")
+
+The detailed API of text streams is described by the :class:`TextIOBase`
+class.
+
+.. note::
+   Text I/O over a binary storage (such as a file) is significantly
+   slower than binary I/O over the same storage. This can become noticeable
+   if you handle huge amounts of text data (for example very large log files).
+
+Binary I/O
+^^^^^^^^^^
+
+Binary I/O (also called *buffered I/O*) expects and produces
+:class:`bytes` objects. No encoding, decoding or character translation
+is performed. This is the category of streams used for all kinds of non-text
+data, and also when manual control over the handling of text data is desired.
+
+A way to create a binary stream is to :meth:`open()` a file in binary mode::
+
+   f = open("myfile.jpg", "rb")
+
+In-memory binary streams are also available as :class:`BytesIO` objects::
+
+   f = io.BytesIO(b"some initial binary data: \x00\x01")
+
+The detailed API of binary streams is described by the :class:`BufferedIOBase`
+class.
+
+Other library modules may provide additional ways to create text or binary
+streams. See for example :meth:`socket.socket.makefile`.
+
+Raw I/O
+^^^^^^^
+
+Raw I/O (also called *unbuffered I/O*) is generally used as a low-level
+building-block for binary and text streams; it is rarely useful to directly
+manipulate a raw stream from user code. Nevertheless, you can for example
+create a raw stream by opening a file in binary mode with buffering disabled::
+
+   f = open("myfile.jpg", "rb", buffering=0)
+
+The detailed API of raw streams is described by the :class:`RawIOBase`
+class.
 
 
-Module Interface
-----------------
+High-level Module Interface
+---------------------------
 
 .. data:: DEFAULT_BUFFER_SIZE
 
@@ -89,17 +135,22 @@ Module Interface
              not be used in new code)
    ========= ===============================================================
 
-   The default mode is ``'rt'`` (open for reading text). For binary random
-   access, the mode ``'w+b'`` opens and truncates the file to 0 bytes, while
-   ``'r+b'`` opens the file without truncation.
+   The default mode is ``'r'`` (open for reading text, synonym of ``'rt'``).
+   For binary read-write access, the mode ``'w+b'`` opens and truncates the
+   file to 0 bytes, while ``'r+b'`` opens the file without truncation.
 
-   Python distinguishes between files opened in binary and text modes, even when
-   the underlying operating system doesn't. Files opened in binary mode
-   (including ``'b'`` in the *mode* argument) return contents as ``bytes``
-   objects without any decoding. In text mode (the default, or when ``'t'`` is
-   included in the *mode* argument), the contents of the file are returned as
-   strings, the bytes having been first decoded using a platform-dependent
-   encoding or using the specified *encoding* if given.
+   As mentioned in the `overview`_, Python distinguishes between binary
+   and text I/O. Files opened in binary mode (including ``'b'`` in the
+   *mode* argument) return contents as :class:`bytes` objects without
+   any decoding. In text mode (the default, or when ``'t'``
+   is included in the *mode* argument), the contents of the file are
+   returned as strings, the bytes having been first decoded using a
+   platform-dependent encoding or using the specified *encoding* if given.
+
+   .. note::
+      Python doesn't depend on the underlying operating system's notion
+      of text files; all the the processing is done by Python itself, and
+      is therefore platform-independent.
 
    *buffering* is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
@@ -168,11 +219,6 @@ Module Interface
    :class:`BufferedRandom`. When buffering is disabled, the raw stream, a
    subclass of :class:`RawIOBase`, :class:`FileIO`, is returned.
 
-   It is also possible to use a string or bytearray as a file for both reading
-   and writing. For strings :class:`StringIO` can be used like a file opened in
-   a text mode, and for bytearrays a :class:`BytesIO` can be used like a
-   file opened in a binary mode.
-
 
 .. exception:: BlockingIOError
 
@@ -194,8 +240,67 @@ Module Interface
    when an unsupported operation is called on a stream.
 
 
+In-memory streams
+^^^^^^^^^^^^^^^^^
+
+It is also possible to use a :class:`str` or :class:`bytes`-like object as a
+file for both reading and writing. For strings :class:`StringIO` can be
+used like a file opened in text mode, and :class:`BytesIO` can be used like
+a file opened in binary mode. Both provide full read-write capabilities
+with random access.
+
+
+.. seealso::
+   :mod:`sys`
+      contains the standard IO streams: :data:`sys.stdin`, :data:`sys.stdout`,
+      and :data:`sys.stderr`.
+
+
+Class hierarchy
+---------------
+
+The implementation of I/O streams is organized as a hierarchy of classes.
+First :term:`abstract base classes <abstract base class>` (ABCs), which are used to specify the
+various categories of streams, then concrete classes providing the standard
+stream implementations.
+
+   .. note::
+      The abstract base classes also provide default implementations of
+      some methods in order to help implementation of concrete stream
+      classes. For example, :class:`BufferedIOBase` provides
+      unoptimized implementations of ``readinto()`` and ``readline()``.
+
+At the top of the I/O hierarchy is the abstract base class :class:`IOBase`. It
+defines the basic interface to a stream. Note, however, that there is no
+separation between reading and writing to streams; implementations are allowed
+to raise an :exc:`UnsupportedOperation` if they do not support a given
+operation.
+
+Extending :class:`IOBase` is the :class:`RawIOBase` ABC which deals simply
+with the reading and writing of raw bytes to a stream. :class:`FileIO`
+subclasses :class:`RawIOBase` to provide an interface to files in the
+machine's file system.
+
+The :class:`BufferedIOBase` ABC deals with buffering on a raw byte stream
+(:class:`RawIOBase`). Its subclasses, :class:`BufferedWriter`,
+:class:`BufferedReader`, and :class:`BufferedRWPair` buffer streams that are
+readable, writable, and both readable and writable.
+:class:`BufferedRandom` provides a buffered interface to random access
+streams. :class:`BytesIO` is a simple stream of in-memory bytes.
+
+Another :class:`IOBase` subclass, the :class:`TextIOBase` ABC, deals with
+streams whose bytes represent text, and handles encoding and decoding
+from and to strings. :class:`TextIOWrapper`, which extends it, is a
+buffered text interface to a buffered raw stream
+(:class:`BufferedIOBase`). Finally, :class:`StringIO` is an in-memory
+stream for text.
+
+Argument names are not part of the specification, and only the arguments of
+:func:`.open` are intended to be used as keyword arguments.
+
+
 I/O Base Classes
-----------------
+^^^^^^^^^^^^^^^^
 
 .. class:: IOBase
 
@@ -467,7 +572,7 @@ I/O Base Classes
 
 
 Raw File I/O
-------------
+^^^^^^^^^^^^
 
 .. class:: FileIO(name, mode='r', closefd=True)
 
@@ -505,7 +610,7 @@ Raw File I/O
 
 
 Buffered Streams
-----------------
+^^^^^^^^^^^^^^^^
 
 In many situations, buffered I/O streams will provide higher performance
 (bandwidth and latency) than raw I/O streams. Their API is also more usable.
@@ -515,7 +620,7 @@ In many situations, buffered I/O streams will provide higher performance
    A stream implementation using an in-memory bytes buffer. It inherits
    :class:`BufferedIOBase`.
 
-   The argument *initial_bytes* is an optional initial bytearray.
+   The argument *initial_bytes* contains optional initial :class:`bytes` data.
 
    :class:`BytesIO` provides or overrides these methods in addition to those
    from :class:`BufferedIOBase` and :class:`IOBase`:
@@ -632,7 +737,7 @@ In many situations, buffered I/O streams will provide higher performance
 
 
 Text I/O
---------
+^^^^^^^^
 
 .. class:: TextIOBase
 
@@ -736,14 +841,14 @@ Text I/O
 
 .. class:: StringIO(initial_value='', newline=None)
 
-   An in-memory stream for text. It inherits :class:`TextIOWrapper`.
+   An in-memory stream for text I/O.
 
    The initial value of the buffer (an empty string by default) can be set by
    providing *initial_value*. The *newline* argument works like that of
    :class:`TextIOWrapper`. The default is to do no newline translation.
 
    :class:`StringIO` provides this method in addition to those from
-   :class:`TextIOWrapper` and its parents:
+   :class:`TextIOBase` and its parents:
 
    .. method:: getvalue()
 
@@ -767,6 +872,11 @@ Text I/O
       # .getvalue() will now raise an exception.
       output.close()
 
+   .. note::
+      :class:`StringIO` uses a native text storage and doesn't suffer from
+      the performance issues of other text streams, such as those based on
+      :class:`TextIOWrapper`.
+
 .. class:: IncrementalNewlineDecoder
 
    A helper codec that decodes newlines for universal newlines mode. It