diff options
author | Benjamin Peterson <benjamin@python.org> | 2009-03-04 00:14:51 (GMT) |
---|---|---|
committer | Benjamin Peterson <benjamin@python.org> | 2009-03-04 00:14:51 (GMT) |
commit | 4fa88fa0ba35e25ad9be66ebbdaba9aca553dc8b (patch) | |
tree | 8486abb218ed5bb1f7713416fb64104f7e2537b0 /Lib/_pyio.py | |
parent | 03ad812435f51261ba78bc5920d254ea0e123b37 (diff) | |
download | cpython-4fa88fa0ba35e25ad9be66ebbdaba9aca553dc8b.zip cpython-4fa88fa0ba35e25ad9be66ebbdaba9aca553dc8b.tar.gz cpython-4fa88fa0ba35e25ad9be66ebbdaba9aca553dc8b.tar.bz2 |
merge the io-c branch: C implementation of the io module
The main io module now uses the C implementation. The Python one still exists
in Lib/_pyio.py for ease of testing new features and usefulness to other
implementers.
The rewrite was done by Antoine Pitrou and Amaury Forgeot d'Arc. I was slightly
helpful at the end. :)
Following are the log messages from the io-c branch:
Merged revisions 68683-68685,68687-68689,68693,68704,68741-68743,68745,68747,68752-68754,68756,68758,68812,68816-68817,68820-68822,68824-68825,68828,68876-68877,69037,69044,69104,69115,69194,69626-69629,69636,69638,69641-69642,69644-69654,69656-69661,69671,69677,69812-69815,69817,69827-69830,69839,69841-69845,69848,69850,69852,69854,69860,69865-69866,69868,69872-69873,69885,69888,69891-69893,69911,69913-69916,69963,70033,70035,70038,70041-70048,70067-70070,70075,70112,70133,70135,70140 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/io-c
........
r68683 | antoine.pitrou | 2009-01-17 17:13:48 -0600 (Sat, 17 Jan 2009) | 3 lines
Merge in changes from the io-c sandbox. Tests will follow in separate commits.
........
r68684 | antoine.pitrou | 2009-01-17 17:17:26 -0600 (Sat, 17 Jan 2009) | 3 lines
Fixes and additions to test_io.py
........
r68685 | antoine.pitrou | 2009-01-17 17:22:04 -0600 (Sat, 17 Jan 2009) | 1 line
Fix test_fileio
........
r68687 | antoine.pitrou | 2009-01-17 17:35:11 -0600 (Sat, 17 Jan 2009) | 3 lines
Add dependency to _iomodule.h for the various C sources
........
r68688 | antoine.pitrou | 2009-01-17 17:38:18 -0600 (Sat, 17 Jan 2009) | 3 lines
These precautions are not needed anymore!
........
r68689 | antoine.pitrou | 2009-01-17 17:41:48 -0600 (Sat, 17 Jan 2009) | 3 lines
Fix another test
........
r68693 | antoine.pitrou | 2009-01-17 17:49:58 -0600 (Sat, 17 Jan 2009) | 3 lines
Fix test_uu (which was using private attributes of TextIOWrapper)
........
r68704 | antoine.pitrou | 2009-01-17 18:45:29 -0600 (Sat, 17 Jan 2009) | 3 lines
Most io sources are Py_ssize_t-clean (I don't know about bytesio and stringio)
........
r68741 | antoine.pitrou | 2009-01-18 15:20:30 -0600 (Sun, 18 Jan 2009) | 3 lines
Check return type in TextIOWrapper.__next__
........
r68742 | antoine.pitrou | 2009-01-18 15:28:48 -0600 (Sun, 18 Jan 2009) | 4 lines
Make binary buffered readline and iteration much faster
(8x as fast as the IOBase generic implementation)
........
r68743 | antoine.pitrou | 2009-01-18 15:47:47 -0600 (Sun, 18 Jan 2009) | 3 lines
Reinsert test_io_after_close (was removed by mistake)
........
r68745 | antoine.pitrou | 2009-01-18 16:16:06 -0600 (Sun, 18 Jan 2009) | 3 lines
Add read, read1 and write methods to BufferedIOBase
........
r68747 | antoine.pitrou | 2009-01-18 16:35:58 -0600 (Sun, 18 Jan 2009) | 3 lines
Kill test failure
........
r68752 | amaury.forgeotdarc | 2009-01-18 17:05:43 -0600 (Sun, 18 Jan 2009) | 3 lines
Fix a segfault when e.g a BufferedReader is created with a FileIO in
read mode.
........
r68753 | antoine.pitrou | 2009-01-18 17:13:09 -0600 (Sun, 18 Jan 2009) | 3 lines
Add truncate() to text IO objects
........
r68754 | antoine.pitrou | 2009-01-18 17:51:08 -0600 (Sun, 18 Jan 2009) | 3 lines
Remove IOBase.__del__ and replace it with custom code with tp_dealloc
........
r68756 | antoine.pitrou | 2009-01-18 18:10:16 -0600 (Sun, 18 Jan 2009) | 3 lines
Remove irrelevant comment.
........
r68758 | antoine.pitrou | 2009-01-18 18:36:16 -0600 (Sun, 18 Jan 2009) | 3 lines
in importlib:_fileio._FileIO -> _io.FileIO
........
r68812 | antoine.pitrou | 2009-01-20 14:15:51 -0600 (Tue, 20 Jan 2009) | 3 lines
Add garbage collection support to FileIO objects
........
r68816 | antoine.pitrou | 2009-01-20 14:56:28 -0600 (Tue, 20 Jan 2009) | 3 lines
Add GC support to Buffered and Text IO objects
........
r68817 | antoine.pitrou | 2009-01-20 15:19:45 -0600 (Tue, 20 Jan 2009) | 3 lines
Add some file headers
........
r68820 | antoine.pitrou | 2009-01-20 15:29:59 -0600 (Tue, 20 Jan 2009) | 3 lines
Add class TextIOBase
........
r68821 | antoine.pitrou | 2009-01-20 15:36:16 -0600 (Tue, 20 Jan 2009) | 3 lines
Add properties to TextIOBase
........
r68822 | antoine.pitrou | 2009-01-20 15:41:19 -0600 (Tue, 20 Jan 2009) | 3 lines
Disable the pure Python TextIOBase class, and inject C the implementation instead
........
r68824 | antoine.pitrou | 2009-01-20 16:36:28 -0600 (Tue, 20 Jan 2009) | 3 lines
Fix two leaks
........
r68825 | antoine.pitrou | 2009-01-20 16:38:29 -0600 (Tue, 20 Jan 2009) | 3 lines
FileIO.name is just a plain attribute, we can set it directly
........
r68828 | antoine.pitrou | 2009-01-20 17:06:33 -0600 (Tue, 20 Jan 2009) | 3 lines
Speed up closed checks on text IO objects. Good for a 25% speedup on small ops.
........
r68876 | antoine.pitrou | 2009-01-23 17:01:25 -0600 (Fri, 23 Jan 2009) | 3 lines
Two typos
........
r68877 | antoine.pitrou | 2009-01-23 18:13:20 -0600 (Fri, 23 Jan 2009) | 3 lines
Remove two unused functions
........
r69037 | amaury.forgeotdarc | 2009-01-27 17:10:25 -0600 (Tue, 27 Jan 2009) | 2 lines
Update the win32 project files
........
r69044 | antoine.pitrou | 2009-01-27 18:51:07 -0600 (Tue, 27 Jan 2009) | 3 lines
Improve heuristic in IncrementalNewlineDecoder + some micro-optimizations
........
r69104 | antoine.pitrou | 2009-01-29 15:23:42 -0600 (Thu, 29 Jan 2009) | 3 lines
Fix some crashers found by Victor
........
r69115 | hirokazu.yamamoto | 2009-01-29 20:36:28 -0600 (Thu, 29 Jan 2009) | 1 line
Updated VC6 project file.
........
r69194 | antoine.pitrou | 2009-02-01 16:57:18 -0600 (Sun, 01 Feb 2009) | 3 lines
Fix downcasting warnings in 32-bit mode with 64-bit offsets (Windows)
........
r69626 | benjamin.peterson | 2009-02-14 17:33:34 -0600 (Sat, 14 Feb 2009) | 1 line
only catch AttributeError and UnsupportedOperation
........
r69627 | benjamin.peterson | 2009-02-14 21:35:28 -0600 (Sat, 14 Feb 2009) | 1 line
give the IO module its own state and store the os and locale modules in it
........
r69628 | benjamin.peterson | 2009-02-14 22:08:32 -0600 (Sat, 14 Feb 2009) | 1 line
put interned strings in the module state structure
........
r69629 | benjamin.peterson | 2009-02-14 22:15:29 -0600 (Sat, 14 Feb 2009) | 1 line
put UnsupportedOperation in the module state
........
r69636 | benjamin.peterson | 2009-02-15 08:31:42 -0600 (Sun, 15 Feb 2009) | 1 line
dealloc unsupported_operation
........
r69638 | benjamin.peterson | 2009-02-15 09:24:45 -0600 (Sun, 15 Feb 2009) | 1 line
actually test the C implementation
........
r69641 | benjamin.peterson | 2009-02-15 10:12:37 -0600 (Sun, 15 Feb 2009) | 5 lines
make interned strings globals again ;(
putting them in the module state was asking for trouble when the module
was dealloced before the classes in it were
........
r69642 | benjamin.peterson | 2009-02-15 10:19:45 -0600 (Sun, 15 Feb 2009) | 1 line
actually test the python implementations
........
r69644 | antoine.pitrou | 2009-02-15 11:59:30 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix memory leak in destructor when a Python class inherits from IOBase (or an IOBase-derived type)
........
r69645 | antoine.pitrou | 2009-02-15 12:23:26 -0600 (Sun, 15 Feb 2009) | 3 lines
Add a warning about the embarassing state of IOBase finalization
........
r69646 | antoine.pitrou | 2009-02-15 13:14:42 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix opening of 8-bit filenames with FileIO
........
r69647 | antoine.pitrou | 2009-02-15 13:20:22 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix leak in FileIO constructor
........
r69648 | antoine.pitrou | 2009-02-15 13:58:16 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix some refleaks
........
r69649 | antoine.pitrou | 2009-02-15 14:05:13 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix a leak in IOBase.writelines
........
r69650 | antoine.pitrou | 2009-02-15 14:11:56 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix memory leak in BufferedWriter.truncate
........
r69651 | antoine.pitrou | 2009-02-15 14:25:34 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix a leak in TextIOWrapper.seek
........
r69652 | antoine.pitrou | 2009-02-15 14:26:28 -0600 (Sun, 15 Feb 2009) | 3 lines
Unify implementations of truncate for buffered objects
........
r69653 | antoine.pitrou | 2009-02-15 15:15:15 -0600 (Sun, 15 Feb 2009) | 3 lines
Fix more leaks in TextIOWrapper
........
r69654 | antoine.pitrou | 2009-02-15 15:21:57 -0600 (Sun, 15 Feb 2009) | 3 lines
Smaller chunk size for a faster test
........
r69656 | benjamin.peterson | 2009-02-15 17:29:48 -0600 (Sun, 15 Feb 2009) | 1 line
braces make this much clearer
........
r69657 | benjamin.peterson | 2009-02-15 17:46:07 -0600 (Sun, 15 Feb 2009) | 1 line
use the correct macro
........
r69658 | antoine.pitrou | 2009-02-15 19:38:59 -0600 (Sun, 15 Feb 2009) | 5 lines
Fix crash in test_urllib2_localnet in debug mode. It was due to an HTTPResponse
object being revived when calling its close() method in IOBase's tp_dealloc.
_PyIOBase_finalize() starts looking scary...
........
r69659 | benjamin.peterson | 2009-02-15 20:55:48 -0600 (Sun, 15 Feb 2009) | 1 line
fix segfault on initialization failing
........
r69660 | benjamin.peterson | 2009-02-15 21:09:31 -0600 (Sun, 15 Feb 2009) | 1 line
apparently locale.getprefferedencoding() can raise a ImportError, too
........
r69661 | benjamin.peterson | 2009-02-15 21:54:15 -0600 (Sun, 15 Feb 2009) | 1 line
it's amazing this worked at all; I was using the wrong structs!
........
r69671 | benjamin.peterson | 2009-02-16 08:38:27 -0600 (Mon, 16 Feb 2009) | 1 line
add garbage collection support to bytesio
........
r69677 | benjamin.peterson | 2009-02-16 10:31:03 -0600 (Mon, 16 Feb 2009) | 5 lines
reduce ImportError catching code duplication
I'm not sure this makes the code clearer with its new gotos, but
at least I added a big fat comment
........
r69812 | antoine.pitrou | 2009-02-20 13:50:16 -0600 (Fri, 20 Feb 2009) | 3 lines
_StringIO now belongs to the _io modules, rather to its own _stringio module
........
r69813 | antoine.pitrou | 2009-02-20 13:58:22 -0600 (Fri, 20 Feb 2009) | 3 lines
Add a test for StringIO properties
........
r69814 | antoine.pitrou | 2009-02-20 14:06:03 -0600 (Fri, 20 Feb 2009) | 3 lines
Reimplement a few trivial StringIO functions and properties in C
........
r69815 | antoine.pitrou | 2009-02-20 14:13:11 -0600 (Fri, 20 Feb 2009) | 3 lines
Add the line_buffering property to TextIOWrapper, and test for it
........
r69817 | antoine.pitrou | 2009-02-20 14:45:50 -0600 (Fri, 20 Feb 2009) | 4 lines
Allow IncrementalNewlineDecoder to take unicode objects as decoding input if the decoder parameter is None
This will help rewriting StringIO to C
........
r69827 | antoine.pitrou | 2009-02-20 19:00:30 -0600 (Fri, 20 Feb 2009) | 3 lines
Rewrite most of StringIO in C. Some almost empty stubs remain to be converted.
........
r69828 | antoine.pitrou | 2009-02-20 19:09:25 -0600 (Fri, 20 Feb 2009) | 3 lines
Plug a leak, and remove an unused string
........
r69829 | benjamin.peterson | 2009-02-20 20:02:28 -0600 (Fri, 20 Feb 2009) | 1 line
this assertions makes more sense here
........
r69830 | benjamin.peterson | 2009-02-20 20:03:04 -0600 (Fri, 20 Feb 2009) | 1 line
PyModule_AddObject can fail; simplify this code with a macro
........
r69839 | antoine.pitrou | 2009-02-21 12:54:01 -0600 (Sat, 21 Feb 2009) | 3 lines
StringIO is now written entirely in C (and blazingly fast)
........
r69841 | benjamin.peterson | 2009-02-21 14:05:40 -0600 (Sat, 21 Feb 2009) | 1 line
split the Python implementation of io into another module and rewrite the tests to test both implementations
........
r69842 | benjamin.peterson | 2009-02-21 14:10:00 -0600 (Sat, 21 Feb 2009) | 1 line
closed is not a function
........
r69843 | benjamin.peterson | 2009-02-21 14:13:04 -0600 (Sat, 21 Feb 2009) | 1 line
fix __all__ test
........
r69844 | benjamin.peterson | 2009-02-21 14:21:24 -0600 (Sat, 21 Feb 2009) | 1 line
fix the rest of the Misc tests
........
r69845 | benjamin.peterson | 2009-02-21 14:26:59 -0600 (Sat, 21 Feb 2009) | 1 line
RawIOBase is better for FileIO
........
r69848 | benjamin.peterson | 2009-02-21 15:33:53 -0600 (Sat, 21 Feb 2009) | 1 line
fix some more tests broken by bag argument validation
........
r69850 | benjamin.peterson | 2009-02-21 16:16:42 -0600 (Sat, 21 Feb 2009) | 1 line
make the python IncrementalNewineDecoder support a None decoder
........
r69852 | benjamin.peterson | 2009-02-21 16:36:09 -0600 (Sat, 21 Feb 2009) | 1 line
fix a BlockingIOError.characters_written bug
........
r69854 | benjamin.peterson | 2009-02-21 16:49:02 -0600 (Sat, 21 Feb 2009) | 1 line
check whence
........
r69860 | benjamin.peterson | 2009-02-21 17:42:50 -0600 (Sat, 21 Feb 2009) | 1 line
fix some of these Misbehaving io tests
........
r69865 | benjamin.peterson | 2009-02-21 18:59:52 -0600 (Sat, 21 Feb 2009) | 1 line
don't use super here()
........
r69866 | benjamin.peterson | 2009-02-21 19:05:28 -0600 (Sat, 21 Feb 2009) | 1 line
use implementation specific classes
........
r69868 | benjamin.peterson | 2009-02-21 22:12:05 -0600 (Sat, 21 Feb 2009) | 1 line
use a more DRY friendly approach to injecting module contents into test classes
........
r69872 | antoine.pitrou | 2009-02-22 13:39:45 -0600 (Sun, 22 Feb 2009) | 3 lines
Sanitize destructor behaviour of IOBase. Now Python-defined attributes can be accessed from close().
........
r69873 | antoine.pitrou | 2009-02-22 13:50:14 -0600 (Sun, 22 Feb 2009) | 4 lines
Only set the internal fd after it has been checked to be valid
(otherwise, the destructor will attempt to close it)
........
r69885 | benjamin.peterson | 2009-02-22 15:30:14 -0600 (Sun, 22 Feb 2009) | 1 line
convert some other tests to use both io implementations
........
r69888 | antoine.pitrou | 2009-02-22 17:03:16 -0600 (Sun, 22 Feb 2009) | 3 lines
Silence all exceptions when finalizing
........
r69891 | benjamin.peterson | 2009-02-22 17:27:24 -0600 (Sun, 22 Feb 2009) | 1 line
convert another test to test both io implementations
........
r69892 | benjamin.peterson | 2009-02-22 17:32:15 -0600 (Sun, 22 Feb 2009) | 1 line
help poor people like me to find their io tests (did I miss any?)
........
r69893 | benjamin.peterson | 2009-02-22 17:37:56 -0600 (Sun, 22 Feb 2009) | 1 line
put a big note in the test telling people to write tests for both implementations now
........
r69911 | antoine.pitrou | 2009-02-23 13:57:18 -0600 (Mon, 23 Feb 2009) | 3 lines
expose DEFAULT_BUFFER_SIZE again (fixes a bunch of test failures)
........
r69913 | antoine.pitrou | 2009-02-23 14:10:30 -0600 (Mon, 23 Feb 2009) | 4 lines
Do the cyclic garbage collection tests only on the C version.
The Python version is helpless as it uses __del__.
........
r69914 | antoine.pitrou | 2009-02-23 14:21:41 -0600 (Mon, 23 Feb 2009) | 3 lines
Adapt test_largefile to test both implementations
........
r69915 | antoine.pitrou | 2009-02-23 14:25:14 -0600 (Mon, 23 Feb 2009) | 3 lines
One small failure
........
r69916 | antoine.pitrou | 2009-02-23 14:28:33 -0600 (Mon, 23 Feb 2009) | 3 lines
Add a comment, at BP's request
........
r69963 | antoine.pitrou | 2009-02-25 09:42:59 -0600 (Wed, 25 Feb 2009) | 3 lines
Add a test of ABC inheritance
........
r70033 | antoine.pitrou | 2009-02-27 15:49:50 -0600 (Fri, 27 Feb 2009) | 3 lines
The base classes now are ABCs.
........
r70035 | benjamin.peterson | 2009-02-27 15:57:41 -0600 (Fri, 27 Feb 2009) | 1 line
good house keeping
........
r70038 | antoine.pitrou | 2009-02-27 17:05:23 -0600 (Fri, 27 Feb 2009) | 4 lines
Make the buffer allocation overflow tests specific to the C implementation, since the Python implementation resizes its buffers when needed rather than allocating them up front.
........
r70041 | benjamin.peterson | 2009-02-27 18:26:12 -0600 (Fri, 27 Feb 2009) | 1 line
kill java naming for sanity
........
r70042 | benjamin.peterson | 2009-02-27 18:28:53 -0600 (Fri, 27 Feb 2009) | 2 lines
timingTest is superseded by iobench
........
r70043 | antoine.pitrou | 2009-02-27 19:13:50 -0600 (Fri, 27 Feb 2009) | 3 lines
Remove the last traces of java naming in test_io
........
r70044 | antoine.pitrou | 2009-02-27 19:18:34 -0600 (Fri, 27 Feb 2009) | 3 lines
Better resource cleanup
........
r70045 | antoine.pitrou | 2009-02-27 19:29:00 -0600 (Fri, 27 Feb 2009) | 3 lines
Remove dubious uses of super(), and fix one test
........
r70046 | antoine.pitrou | 2009-02-27 19:31:00 -0600 (Fri, 27 Feb 2009) | 3 lines
Bump up CHUNK_SIZE (no need to make the Python version look slower than it is)
........
r70047 | benjamin.peterson | 2009-02-27 20:03:26 -0600 (Fri, 27 Feb 2009) | 1 line
fix typo
........
r70048 | benjamin.peterson | 2009-02-27 21:35:11 -0600 (Fri, 27 Feb 2009) | 1 line
move code to a better place
........
r70067 | benjamin.peterson | 2009-02-28 10:43:20 -0600 (Sat, 28 Feb 2009) | 4 lines
1. make sure to undo buffered read aheads in BufferedRandom.seek()
2. refill the buffer if have <= 0
3. fix the last failing test_io test!
........
r70068 | benjamin.peterson | 2009-02-28 10:57:50 -0600 (Sat, 28 Feb 2009) | 1 line
define read1() on the python implementation's BufferedIOBase
........
r70069 | benjamin.peterson | 2009-02-28 11:01:17 -0600 (Sat, 28 Feb 2009) | 1 line
document read1() in BufferedIOBase
........
r70070 | benjamin.peterson | 2009-02-28 11:06:42 -0600 (Sat, 28 Feb 2009) | 1 line
give credit where credit is due
........
r70075 | antoine.pitrou | 2009-02-28 13:34:59 -0600 (Sat, 28 Feb 2009) | 3 lines
Amaury's name
........
r70112 | antoine.pitrou | 2009-03-02 17:11:55 -0600 (Mon, 02 Mar 2009) | 4 lines
Looks like this is necessary in order to build cleanly under Windows
(someone correct this if it's wrong, I'm no Windows user)
........
r70133 | benjamin.peterson | 2009-03-03 15:23:32 -0600 (Tue, 03 Mar 2009) | 1 line
fix test_newline_property on _pyio.StringIO
........
r70135 | benjamin.peterson | 2009-03-03 15:47:30 -0600 (Tue, 03 Mar 2009) | 1 line
fix typos and inconsistencies. thanks to Daniel Diniz
........
r70140 | benjamin.peterson | 2009-03-03 16:21:10 -0600 (Tue, 03 Mar 2009) | 1 line
add the test from #5266
........
Diffstat (limited to 'Lib/_pyio.py')
-rw-r--r-- | Lib/_pyio.py | 1831 |
1 files changed, 1831 insertions, 0 deletions
diff --git a/Lib/_pyio.py b/Lib/_pyio.py new file mode 100644 index 0000000..5f2e400 --- /dev/null +++ b/Lib/_pyio.py @@ -0,0 +1,1831 @@ +""" +Python implementation of the io module. +""" + +import os +import abc +import codecs +# Import _thread instead of threading to reduce startup cost +try: + from _thread import allocate_lock as Lock +except ImportError: + from _dummy_thread import allocate_lock as Lock + +import io +from io import __all__ + +# open() uses st_blksize whenever we can +DEFAULT_BUFFER_SIZE = 8 * 1024 # bytes + +# NOTE: Base classes defined here are registered with the "official" ABCs +# defined in io.py. We don't use real inheritance though, because we don't +# want to inherit the C implementations. + + +class BlockingIOError(IOError): + + """Exception raised when I/O would block on a non-blocking I/O stream.""" + + def __init__(self, errno, strerror, characters_written=0): + super().__init__(errno, strerror) + if not isinstance(characters_written, int): + raise TypeError("characters_written must be a integer") + self.characters_written = characters_written + + +def open(file, mode="r", buffering=None, encoding=None, errors=None, + newline=None, closefd=True): + + r"""Open file and return a stream. Raise IOError upon failure. + + file is either a text or byte string giving the name (and the path + if the file isn't in the current working directory) of the file to + be opened or an integer file descriptor of the file to be + wrapped. (If a file descriptor is given, it is closed when the + returned I/O object is closed, unless closefd is set to False.) + + mode is an optional string that specifies the mode in which the file + is opened. It defaults to 'r' which means open for reading in text + mode. Other common values are 'w' for writing (truncating the file if + it already exists), and 'a' for appending (which on some Unix systems, + means that all writes append to the end of the file regardless of the + current seek position). In text mode, if encoding is not specified the + encoding used is platform dependent. (For reading and writing raw + bytes use binary mode and leave encoding unspecified.) The available + modes are: + + ========= =============================================================== + Character Meaning + --------- --------------------------------------------------------------- + 'r' open for reading (default) + 'w' open for writing, truncating the file first + 'a' open for writing, appending to the end of the file if it exists + 'b' binary mode + 't' text mode (default) + '+' open a disk file for updating (reading and writing) + 'U' universal newline mode (for backwards compatibility; unneeded + for new code) + ========= =============================================================== + + The default mode is 'rt' (open for reading text). For binary random + access, the mode 'w+b' opens and truncates the file to 0 bytes, while + 'r+b' opens the file without truncation. + + Python distinguishes between files opened in binary and text modes, + even when the underlying operating system doesn't. Files opened in + binary mode (appending 'b' to the mode argument) return contents as + bytes objects without any decoding. In text mode (the default, or when + 't' is appended to the mode argument), the contents of the file are + returned as strings, the bytes having been first decoded using a + platform-dependent encoding or using the specified encoding if given. + + buffering is an optional integer used to set the buffering policy. By + default full buffering is on. Pass 0 to switch buffering off (only + allowed in binary mode), 1 to set line buffering, and an integer > 1 + for full buffering. + + encoding is the name of the encoding used to decode or encode the + file. This should only be used in text mode. The default encoding is + platform dependent, but any encoding supported by Python can be + passed. See the codecs module for the list of supported encodings. + + errors is an optional string that specifies how encoding errors are to + be handled---this argument should not be used in binary mode. Pass + 'strict' to raise a ValueError exception if there is an encoding error + (the default of None has the same effect), or pass 'ignore' to ignore + errors. (Note that ignoring encoding errors can lead to data loss.) + See the documentation for codecs.register for a list of the permitted + encoding error strings. + + newline controls how universal newlines works (it only applies to text + mode). It can be None, '', '\n', '\r', and '\r\n'. It works as + follows: + + * On input, if newline is None, universal newlines mode is + enabled. Lines in the input can end in '\n', '\r', or '\r\n', and + these are translated into '\n' before being returned to the + caller. If it is '', universal newline mode is enabled, but line + endings are returned to the caller untranslated. If it has any of + the other legal values, input lines are only terminated by the given + string, and the line ending is returned to the caller untranslated. + + * On output, if newline is None, any '\n' characters written are + translated to the system default line separator, os.linesep. If + newline is '', no translation takes place. If newline is any of the + other legal values, any '\n' characters written are translated to + the given string. + + If closefd is False, the underlying file descriptor will be kept open + when the file is closed. This does not work when a file name is given + and must be True in that case. + + open() returns a file object whose type depends on the mode, and + through which the standard file operations such as reading and writing + are performed. When open() is used to open a file in a text mode ('w', + 'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open + a file in a binary mode, the returned class varies: in read binary + mode, it returns a BufferedReader; in write binary and append binary + modes, it returns a BufferedWriter, and in read/write mode, it returns + a BufferedRandom. + + It is also possible to use a string or bytearray as a file for both + reading and writing. For strings StringIO can be used like a file + opened in a text mode, and for bytes a BytesIO can be used like a file + opened in a binary mode. + """ + if not isinstance(file, (str, bytes, int)): + raise TypeError("invalid file: %r" % file) + if not isinstance(mode, str): + raise TypeError("invalid mode: %r" % mode) + if buffering is not None and not isinstance(buffering, int): + raise TypeError("invalid buffering: %r" % buffering) + if encoding is not None and not isinstance(encoding, str): + raise TypeError("invalid encoding: %r" % encoding) + if errors is not None and not isinstance(errors, str): + raise TypeError("invalid errors: %r" % errors) + modes = set(mode) + if modes - set("arwb+tU") or len(mode) > len(modes): + raise ValueError("invalid mode: %r" % mode) + reading = "r" in modes + writing = "w" in modes + appending = "a" in modes + updating = "+" in modes + text = "t" in modes + binary = "b" in modes + if "U" in modes: + if writing or appending: + raise ValueError("can't use U and writing mode at once") + reading = True + if text and binary: + raise ValueError("can't have text and binary mode at once") + if reading + writing + appending > 1: + raise ValueError("can't have read/write/append mode at once") + if not (reading or writing or appending): + raise ValueError("must have exactly one of read/write/append mode") + if binary and encoding is not None: + raise ValueError("binary mode doesn't take an encoding argument") + if binary and errors is not None: + raise ValueError("binary mode doesn't take an errors argument") + if binary and newline is not None: + raise ValueError("binary mode doesn't take a newline argument") + raw = FileIO(file, + (reading and "r" or "") + + (writing and "w" or "") + + (appending and "a" or "") + + (updating and "+" or ""), + closefd) + if buffering is None: + buffering = -1 + line_buffering = False + if buffering == 1 or buffering < 0 and raw.isatty(): + buffering = -1 + line_buffering = True + if buffering < 0: + buffering = DEFAULT_BUFFER_SIZE + try: + bs = os.fstat(raw.fileno()).st_blksize + except (os.error, AttributeError): + pass + else: + if bs > 1: + buffering = bs + if buffering < 0: + raise ValueError("invalid buffering size") + if buffering == 0: + if binary: + return raw + raise ValueError("can't have unbuffered text I/O") + if updating: + buffer = BufferedRandom(raw, buffering) + elif writing or appending: + buffer = BufferedWriter(raw, buffering) + elif reading: + buffer = BufferedReader(raw, buffering) + else: + raise ValueError("unknown mode: %r" % mode) + if binary: + return buffer + text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering) + text.mode = mode + return text + + +class DocDescriptor: + """Helper for builtins.open.__doc__ + """ + def __get__(self, obj, typ): + return ( + "open(file, mode='r', buffering=None, encoding=None, " + "errors=None, newline=None, closefd=True)\n\n" + + open.__doc__) + +class OpenWrapper: + """Wrapper for builtins.open + + Trick so that open won't become a bound method when stored + as a class variable (as dbm.dumb does). + + See initstdio() in Python/pythonrun.c. + """ + __doc__ = DocDescriptor() + + def __new__(cls, *args, **kwargs): + return open(*args, **kwargs) + + +class UnsupportedOperation(ValueError, IOError): + pass + + +class IOBase(metaclass=abc.ABCMeta): + + """The abstract base class for all I/O classes, acting on streams of + bytes. There is no public constructor. + + This class provides dummy implementations for many methods that + derived classes can override selectively; the default implementations + represent a file that cannot be read, written or seeked. + + Even though IOBase does not declare read, readinto, or write because + their signatures will vary, implementations and clients should + consider those methods part of the interface. Also, implementations + may raise a IOError when operations they do not support are called. + + The basic type used for binary data read from or written to a file is + bytes. bytearrays are accepted too, and in some cases (such as + readinto) needed. Text I/O classes work with str data. + + Note that calling any method (even inquiries) on a closed stream is + undefined. Implementations may raise IOError in this case. + + IOBase (and its subclasses) support the iterator protocol, meaning + that an IOBase object can be iterated over yielding the lines in a + stream. + + IOBase also supports the :keyword:`with` statement. In this example, + fp is closed after the suite of the with statement is complete: + + with open('spam.txt', 'r') as fp: + fp.write('Spam and eggs!') + """ + + ### Internal ### + + def _unsupported(self, name: str) -> IOError: + """Internal: raise an exception for unsupported operations.""" + raise UnsupportedOperation("%s.%s() not supported" % + (self.__class__.__name__, name)) + + ### Positioning ### + + def seek(self, pos: int, whence: int = 0) -> int: + """Change stream position. + + Change the stream position to byte offset offset. offset is + interpreted relative to the position indicated by whence. Values + for whence are: + + * 0 -- start of stream (the default); offset should be zero or positive + * 1 -- current stream position; offset may be negative + * 2 -- end of stream; offset is usually negative + + Return the new absolute position. + """ + self._unsupported("seek") + + def tell(self) -> int: + """Return current stream position.""" + return self.seek(0, 1) + + def truncate(self, pos: int = None) -> int: + """Truncate file to size bytes. + + Size defaults to the current IO position as reported by tell(). Return + the new size. + """ + self._unsupported("truncate") + + ### Flush and close ### + + def flush(self) -> None: + """Flush write buffers, if applicable. + + This is not implemented for read-only and non-blocking streams. + """ + # XXX Should this return the number of bytes written??? + + __closed = False + + def close(self) -> None: + """Flush and close the IO object. + + This method has no effect if the file is already closed. + """ + if not self.__closed: + try: + self.flush() + except IOError: + pass # If flush() fails, just give up + self.__closed = True + + def __del__(self) -> None: + """Destructor. Calls close().""" + # The try/except block is in case this is called at program + # exit time, when it's possible that globals have already been + # deleted, and then the close() call might fail. Since + # there's nothing we can do about such failures and they annoy + # the end users, we suppress the traceback. + try: + self.close() + except: + pass + + ### Inquiries ### + + def seekable(self) -> bool: + """Return whether object supports random access. + + If False, seek(), tell() and truncate() will raise IOError. + This method may need to do a test seek(). + """ + return False + + def _checkSeekable(self, msg=None): + """Internal: raise an IOError if file is not seekable + """ + if not self.seekable(): + raise IOError("File or stream is not seekable." + if msg is None else msg) + + + def readable(self) -> bool: + """Return whether object was opened for reading. + + If False, read() will raise IOError. + """ + return False + + def _checkReadable(self, msg=None): + """Internal: raise an IOError if file is not readable + """ + if not self.readable(): + raise IOError("File or stream is not readable." + if msg is None else msg) + + def writable(self) -> bool: + """Return whether object was opened for writing. + + If False, write() and truncate() will raise IOError. + """ + return False + + def _checkWritable(self, msg=None): + """Internal: raise an IOError if file is not writable + """ + if not self.writable(): + raise IOError("File or stream is not writable." + if msg is None else msg) + + @property + def closed(self): + """closed: bool. True iff the file has been closed. + + For backwards compatibility, this is a property, not a predicate. + """ + return self.__closed + + def _checkClosed(self, msg=None): + """Internal: raise an ValueError if file is closed + """ + if self.closed: + raise ValueError("I/O operation on closed file." + if msg is None else msg) + + ### Context manager ### + + def __enter__(self) -> "IOBase": # That's a forward reference + """Context management protocol. Returns self.""" + self._checkClosed() + return self + + def __exit__(self, *args) -> None: + """Context management protocol. Calls close()""" + self.close() + + ### Lower-level APIs ### + + # XXX Should these be present even if unimplemented? + + def fileno(self) -> int: + """Returns underlying file descriptor if one exists. + + An IOError is raised if the IO object does not use a file descriptor. + """ + self._unsupported("fileno") + + def isatty(self) -> bool: + """Return whether this is an 'interactive' stream. + + Return False if it can't be determined. + """ + self._checkClosed() + return False + + ### Readline[s] and writelines ### + + def readline(self, limit: int = -1) -> bytes: + r"""Read and return a line from the stream. + + If limit is specified, at most limit bytes will be read. + + The line terminator is always b'\n' for binary files; for text + files, the newlines argument to open can be used to select the line + terminator(s) recognized. + """ + # For backwards compatibility, a (slowish) readline(). + if hasattr(self, "peek"): + def nreadahead(): + readahead = self.peek(1) + if not readahead: + return 1 + n = (readahead.find(b"\n") + 1) or len(readahead) + if limit >= 0: + n = min(n, limit) + return n + else: + def nreadahead(): + return 1 + if limit is None: + limit = -1 + res = bytearray() + while limit < 0 or len(res) < limit: + b = self.read(nreadahead()) + if not b: + break + res += b + if res.endswith(b"\n"): + break + return bytes(res) + + def __iter__(self): + self._checkClosed() + return self + + def __next__(self): + line = self.readline() + if not line: + raise StopIteration + return line + + def readlines(self, hint=None): + """Return a list of lines from the stream. + + hint can be specified to control the number of lines read: no more + lines will be read if the total size (in bytes/characters) of all + lines so far exceeds hint. + """ + if hint is None or hint <= 0: + return list(self) + n = 0 + lines = [] + for line in self: + lines.append(line) + n += len(line) + if n >= hint: + break + return lines + + def writelines(self, lines): + self._checkClosed() + for line in lines: + self.write(line) + +io.IOBase.register(IOBase) + + +class RawIOBase(IOBase): + + """Base class for raw binary I/O.""" + + # The read() method is implemented by calling readinto(); derived + # classes that want to support read() only need to implement + # readinto() as a primitive operation. In general, readinto() can be + # more efficient than read(). + + # (It would be tempting to also provide an implementation of + # readinto() in terms of read(), in case the latter is a more suitable + # primitive operation, but that would lead to nasty recursion in case + # a subclass doesn't implement either.) + + def read(self, n: int = -1) -> bytes: + """Read and return up to n bytes. + + Returns an empty bytes object on EOF, or None if the object is + set not to block and has no data to read. + """ + if n is None: + n = -1 + if n < 0: + return self.readall() + b = bytearray(n.__index__()) + n = self.readinto(b) + del b[n:] + return bytes(b) + + def readall(self): + """Read until EOF, using multiple read() call.""" + res = bytearray() + while True: + data = self.read(DEFAULT_BUFFER_SIZE) + if not data: + break + res += data + return bytes(res) + + def readinto(self, b: bytearray) -> int: + """Read up to len(b) bytes into b. + + Returns number of bytes read (0 for EOF), or None if the object + is set not to block as has no data to read. + """ + self._unsupported("readinto") + + def write(self, b: bytes) -> int: + """Write the given buffer to the IO stream. + + Returns the number of bytes written, which may be less than len(b). + """ + self._unsupported("write") + +io.RawIOBase.register(RawIOBase) +from _io import FileIO +RawIOBase.register(FileIO) + + +class BufferedIOBase(IOBase): + + """Base class for buffered IO objects. + + The main difference with RawIOBase is that the read() method + supports omitting the size argument, and does not have a default + implementation that defers to readinto(). + + In addition, read(), readinto() and write() may raise + BlockingIOError if the underlying raw stream is in non-blocking + mode and not ready; unlike their raw counterparts, they will never + return None. + + A typical implementation should not inherit from a RawIOBase + implementation, but wrap one. + """ + + def read(self, n: int = None) -> bytes: + """Read and return up to n bytes. + + If the argument is omitted, None, or negative, reads and + returns all data until EOF. + + If the argument is positive, and the underlying raw stream is + not 'interactive', multiple raw reads may be issued to satisfy + the byte count (unless EOF is reached first). But for + interactive raw streams (XXX and for pipes?), at most one raw + read will be issued, and a short result does not imply that + EOF is imminent. + + Returns an empty bytes array on EOF. + + Raises BlockingIOError if the underlying raw stream has no + data at the moment. + """ + self._unsupported("read") + + def read1(self, n: int=None) -> bytes: + """Read up to n bytes with at most one read() system call.""" + self._unsupported("read1") + + def readinto(self, b: bytearray) -> int: + """Read up to len(b) bytes into b. + + Like read(), this may issue multiple reads to the underlying raw + stream, unless the latter is 'interactive'. + + Returns the number of bytes read (0 for EOF). + + Raises BlockingIOError if the underlying raw stream has no + data at the moment. + """ + # XXX This ought to work with anything that supports the buffer API + data = self.read(len(b)) + n = len(data) + try: + b[:n] = data + except TypeError as err: + import array + if not isinstance(b, array.array): + raise err + b[:n] = array.array('b', data) + return n + + def write(self, b: bytes) -> int: + """Write the given buffer to the IO stream. + + Return the number of bytes written, which is never less than + len(b). + + Raises BlockingIOError if the buffer is full and the + underlying raw stream cannot accept more data at the moment. + """ + self._unsupported("write") + +io.BufferedIOBase.register(BufferedIOBase) + + +class _BufferedIOMixin(BufferedIOBase): + + """A mixin implementation of BufferedIOBase with an underlying raw stream. + + This passes most requests on to the underlying raw stream. It + does *not* provide implementations of read(), readinto() or + write(). + """ + + def __init__(self, raw): + self.raw = raw + + ### Positioning ### + + def seek(self, pos, whence=0): + new_position = self.raw.seek(pos, whence) + if new_position < 0: + raise IOError("seek() returned an invalid position") + return new_position + + def tell(self): + pos = self.raw.tell() + if pos < 0: + raise IOError("tell() returned an invalid position") + return pos + + def truncate(self, pos=None): + # Flush the stream. We're mixing buffered I/O with lower-level I/O, + # and a flush may be necessary to synch both views of the current + # file state. + self.flush() + + if pos is None: + pos = self.tell() + # XXX: Should seek() be used, instead of passing the position + # XXX directly to truncate? + return self.raw.truncate(pos) + + ### Flush and close ### + + def flush(self): + self.raw.flush() + + def close(self): + if not self.closed: + try: + self.flush() + except IOError: + pass # If flush() fails, just give up + self.raw.close() + + ### Inquiries ### + + def seekable(self): + return self.raw.seekable() + + def readable(self): + return self.raw.readable() + + def writable(self): + return self.raw.writable() + + @property + def closed(self): + return self.raw.closed + + @property + def name(self): + return self.raw.name + + @property + def mode(self): + return self.raw.mode + + ### Lower-level APIs ### + + def fileno(self): + return self.raw.fileno() + + def isatty(self): + return self.raw.isatty() + + +class BytesIO(BufferedIOBase): + + """Buffered I/O implementation using an in-memory bytes buffer.""" + + def __init__(self, initial_bytes=None): + buf = bytearray() + if initial_bytes is not None: + buf += initial_bytes + self._buffer = buf + self._pos = 0 + + def getvalue(self): + """Return the bytes value (contents) of the buffer + """ + if self.closed: + raise ValueError("getvalue on closed file") + return bytes(self._buffer) + + def read(self, n=None): + if self.closed: + raise ValueError("read from closed file") + if n is None: + n = -1 + if n < 0: + n = len(self._buffer) + if len(self._buffer) <= self._pos: + return b"" + newpos = min(len(self._buffer), self._pos + n) + b = self._buffer[self._pos : newpos] + self._pos = newpos + return bytes(b) + + def read1(self, n): + """This is the same as read. + """ + return self.read(n) + + def write(self, b): + if self.closed: + raise ValueError("write to closed file") + if isinstance(b, str): + raise TypeError("can't write str to binary stream") + n = len(b) + if n == 0: + return 0 + pos = self._pos + if pos > len(self._buffer): + # Inserts null bytes between the current end of the file + # and the new write position. + padding = b'\x00' * (pos - len(self._buffer)) + self._buffer += padding + self._buffer[pos:pos + n] = b + self._pos += n + return n + + def seek(self, pos, whence=0): + if self.closed: + raise ValueError("seek on closed file") + try: + pos = pos.__index__() + except AttributeError as err: + raise TypeError("an integer is required") from err + if whence == 0: + if pos < 0: + raise ValueError("negative seek position %r" % (pos,)) + self._pos = pos + elif whence == 1: + self._pos = max(0, self._pos + pos) + elif whence == 2: + self._pos = max(0, len(self._buffer) + pos) + else: + raise ValueError("invalid whence value") + return self._pos + + def tell(self): + if self.closed: + raise ValueError("tell on closed file") + return self._pos + + def truncate(self, pos=None): + if self.closed: + raise ValueError("truncate on closed file") + if pos is None: + pos = self._pos + elif pos < 0: + raise ValueError("negative truncate position %r" % (pos,)) + del self._buffer[pos:] + return self.seek(pos) + + def readable(self): + return True + + def writable(self): + return True + + def seekable(self): + return True + + +class BufferedReader(_BufferedIOMixin): + + """BufferedReader(raw[, buffer_size]) + + A buffer for a readable, sequential BaseRawIO object. + + The constructor creates a BufferedReader for the given readable raw + stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE + is used. + """ + + def __init__(self, raw, buffer_size=DEFAULT_BUFFER_SIZE): + """Create a new buffered reader using the given readable raw IO object. + """ + raw._checkReadable() + _BufferedIOMixin.__init__(self, raw) + if buffer_size <= 0: + raise ValueError("invalid buffer size") + self.buffer_size = buffer_size + self._reset_read_buf() + self._read_lock = Lock() + + def _reset_read_buf(self): + self._read_buf = b"" + self._read_pos = 0 + + def read(self, n=None): + """Read n bytes. + + Returns exactly n bytes of data unless the underlying raw IO + stream reaches EOF or if the call would block in non-blocking + mode. If n is negative, read until EOF or until read() would + block. + """ + if n is not None and n < -1: + raise ValueError("invalid number of bytes to read") + with self._read_lock: + return self._read_unlocked(n) + + def _read_unlocked(self, n=None): + nodata_val = b"" + empty_values = (b"", None) + buf = self._read_buf + pos = self._read_pos + + # Special case for when the number of bytes to read is unspecified. + if n is None or n == -1: + self._reset_read_buf() + chunks = [buf[pos:]] # Strip the consumed bytes. + current_size = 0 + while True: + # Read until EOF or until read() would block. + chunk = self.raw.read() + if chunk in empty_values: + nodata_val = chunk + break + current_size += len(chunk) + chunks.append(chunk) + return b"".join(chunks) or nodata_val + + # The number of bytes to read is specified, return at most n bytes. + avail = len(buf) - pos # Length of the available buffered data. + if n <= avail: + # Fast path: the data to read is fully buffered. + self._read_pos += n + return buf[pos:pos+n] + # Slow path: read from the stream until enough bytes are read, + # or until an EOF occurs or until read() would block. + chunks = [buf[pos:]] + wanted = max(self.buffer_size, n) + while avail < n: + chunk = self.raw.read(wanted) + if chunk in empty_values: + nodata_val = chunk + break + avail += len(chunk) + chunks.append(chunk) + # n is more then avail only when an EOF occurred or when + # read() would have blocked. + n = min(n, avail) + out = b"".join(chunks) + self._read_buf = out[n:] # Save the extra data in the buffer. + self._read_pos = 0 + return out[:n] if out else nodata_val + + def peek(self, n=0): + """Returns buffered bytes without advancing the position. + + The argument indicates a desired minimal number of bytes; we + do at most one raw read to satisfy it. We never return more + than self.buffer_size. + """ + with self._read_lock: + return self._peek_unlocked(n) + + def _peek_unlocked(self, n=0): + want = min(n, self.buffer_size) + have = len(self._read_buf) - self._read_pos + if have < want or have <= 0: + to_read = self.buffer_size - have + current = self.raw.read(to_read) + if current: + self._read_buf = self._read_buf[self._read_pos:] + current + self._read_pos = 0 + return self._read_buf[self._read_pos:] + + def read1(self, n): + """Reads up to n bytes, with at most one read() system call.""" + # Returns up to n bytes. If at least one byte is buffered, we + # only return buffered bytes. Otherwise, we do one raw read. + if n < 0: + raise ValueError("number of bytes to read must be positive") + if n == 0: + return b"" + with self._read_lock: + self._peek_unlocked(1) + return self._read_unlocked( + min(n, len(self._read_buf) - self._read_pos)) + + def tell(self): + return _BufferedIOMixin.tell(self) - len(self._read_buf) + self._read_pos + + def seek(self, pos, whence=0): + if not (0 <= whence <= 2): + raise ValueError("invalid whence value") + with self._read_lock: + if whence == 1: + pos -= len(self._read_buf) - self._read_pos + pos = _BufferedIOMixin.seek(self, pos, whence) + self._reset_read_buf() + return pos + +class BufferedWriter(_BufferedIOMixin): + + """A buffer for a writeable sequential RawIO object. + + The constructor creates a BufferedWriter for the given writeable raw + stream. If the buffer_size is not given, it defaults to + DEFAULT_BUFFER_SIZE. If max_buffer_size is omitted, it defaults to + twice the buffer size. + """ + + def __init__(self, raw, + buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None): + raw._checkWritable() + _BufferedIOMixin.__init__(self, raw) + if buffer_size <= 0: + raise ValueError("invalid buffer size") + self.buffer_size = buffer_size + self.max_buffer_size = (2*buffer_size + if max_buffer_size is None + else max_buffer_size) + self._write_buf = bytearray() + self._write_lock = Lock() + + def write(self, b): + if self.closed: + raise ValueError("write to closed file") + if isinstance(b, str): + raise TypeError("can't write str to binary stream") + with self._write_lock: + # XXX we can implement some more tricks to try and avoid + # partial writes + if len(self._write_buf) > self.buffer_size: + # We're full, so let's pre-flush the buffer + try: + self._flush_unlocked() + except BlockingIOError as e: + # We can't accept anything else. + # XXX Why not just let the exception pass through? + raise BlockingIOError(e.errno, e.strerror, 0) + before = len(self._write_buf) + self._write_buf.extend(b) + written = len(self._write_buf) - before + if len(self._write_buf) > self.buffer_size: + try: + self._flush_unlocked() + except BlockingIOError as e: + if len(self._write_buf) > self.max_buffer_size: + # We've hit max_buffer_size. We have to accept a + # partial write and cut back our buffer. + overage = len(self._write_buf) - self.max_buffer_size + written -= overage + self._write_buf = self._write_buf[:self.max_buffer_size] + raise BlockingIOError(e.errno, e.strerror, written) + return written + + def truncate(self, pos=None): + with self._write_lock: + self._flush_unlocked() + if pos is None: + pos = self.raw.tell() + return self.raw.truncate(pos) + + def flush(self): + with self._write_lock: + self._flush_unlocked() + + def _flush_unlocked(self): + if self.closed: + raise ValueError("flush of closed file") + written = 0 + try: + while self._write_buf: + n = self.raw.write(self._write_buf) + if n > len(self._write_buf) or n < 0: + raise IOError("write() returned incorrect number of bytes") + del self._write_buf[:n] + written += n + except BlockingIOError as e: + n = e.characters_written + del self._write_buf[:n] + written += n + raise BlockingIOError(e.errno, e.strerror, written) + + def tell(self): + return _BufferedIOMixin.tell(self) + len(self._write_buf) + + def seek(self, pos, whence=0): + if not (0 <= whence <= 2): + raise ValueError("invalid whence") + with self._write_lock: + self._flush_unlocked() + return _BufferedIOMixin.seek(self, pos, whence) + + +class BufferedRWPair(BufferedIOBase): + + """A buffered reader and writer object together. + + A buffered reader object and buffered writer object put together to + form a sequential IO object that can read and write. This is typically + used with a socket or two-way pipe. + + reader and writer are RawIOBase objects that are readable and + writeable respectively. If the buffer_size is omitted it defaults to + DEFAULT_BUFFER_SIZE. The max_buffer_size (for the buffered writer) + defaults to twice the buffer size. + """ + + # XXX The usefulness of this (compared to having two separate IO + # objects) is questionable. + + def __init__(self, reader, writer, + buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None): + """Constructor. + + The arguments are two RawIO instances. + """ + reader._checkReadable() + writer._checkWritable() + self.reader = BufferedReader(reader, buffer_size) + self.writer = BufferedWriter(writer, buffer_size, max_buffer_size) + + def read(self, n=None): + if n is None: + n = -1 + return self.reader.read(n) + + def readinto(self, b): + return self.reader.readinto(b) + + def write(self, b): + return self.writer.write(b) + + def peek(self, n=0): + return self.reader.peek(n) + + def read1(self, n): + return self.reader.read1(n) + + def readable(self): + return self.reader.readable() + + def writable(self): + return self.writer.writable() + + def flush(self): + return self.writer.flush() + + def close(self): + self.writer.close() + self.reader.close() + + def isatty(self): + return self.reader.isatty() or self.writer.isatty() + + @property + def closed(self): + return self.writer.closed + + +class BufferedRandom(BufferedWriter, BufferedReader): + + """A buffered interface to random access streams. + + The constructor creates a reader and writer for a seekable stream, + raw, given in the first argument. If the buffer_size is omitted it + defaults to DEFAULT_BUFFER_SIZE. The max_buffer_size (for the buffered + writer) defaults to twice the buffer size. + """ + + def __init__(self, raw, + buffer_size=DEFAULT_BUFFER_SIZE, max_buffer_size=None): + raw._checkSeekable() + BufferedReader.__init__(self, raw, buffer_size) + BufferedWriter.__init__(self, raw, buffer_size, max_buffer_size) + + def seek(self, pos, whence=0): + if not (0 <= whence <= 2): + raise ValueError("invalid whence") + self.flush() + if self._read_buf: + # Undo read ahead. + with self._read_lock: + self.raw.seek(self._read_pos - len(self._read_buf), 1) + # First do the raw seek, then empty the read buffer, so that + # if the raw seek fails, we don't lose buffered data forever. + pos = self.raw.seek(pos, whence) + with self._read_lock: + self._reset_read_buf() + if pos < 0: + raise IOError("seek() returned invalid position") + return pos + + def tell(self): + if self._write_buf: + return BufferedWriter.tell(self) + else: + return BufferedReader.tell(self) + + def truncate(self, pos=None): + if pos is None: + pos = self.tell() + # Use seek to flush the read buffer. + self.seek(pos) + return BufferedWriter.truncate(self) + + def read(self, n=None): + if n is None: + n = -1 + self.flush() + return BufferedReader.read(self, n) + + def readinto(self, b): + self.flush() + return BufferedReader.readinto(self, b) + + def peek(self, n=0): + self.flush() + return BufferedReader.peek(self, n) + + def read1(self, n): + self.flush() + return BufferedReader.read1(self, n) + + def write(self, b): + if self._read_buf: + # Undo readahead + with self._read_lock: + self.raw.seek(self._read_pos - len(self._read_buf), 1) + self._reset_read_buf() + return BufferedWriter.write(self, b) + + +class TextIOBase(IOBase): + + """Base class for text I/O. + + This class provides a character and line based interface to stream + I/O. There is no readinto method because Python's character strings + are immutable. There is no public constructor. + """ + + def read(self, n: int = -1) -> str: + """Read at most n characters from stream. + + Read from underlying buffer until we have n characters or we hit EOF. + If n is negative or omitted, read until EOF. + """ + self._unsupported("read") + + def write(self, s: str) -> int: + """Write string s to stream.""" + self._unsupported("write") + + def truncate(self, pos: int = None) -> int: + """Truncate size to pos.""" + self._unsupported("truncate") + + def readline(self) -> str: + """Read until newline or EOF. + + Returns an empty string if EOF is hit immediately. + """ + self._unsupported("readline") + + @property + def encoding(self): + """Subclasses should override.""" + return None + + @property + def newlines(self): + """Line endings translated so far. + + Only line endings translated during reading are considered. + + Subclasses should override. + """ + return None + +io.TextIOBase.register(TextIOBase) + + +class IncrementalNewlineDecoder(codecs.IncrementalDecoder): + r"""Codec used when reading a file in universal newlines mode. It wraps + another incremental decoder, translating \r\n and \r into \n. It also + records the types of newlines encountered. When used with + translate=False, it ensures that the newline sequence is returned in + one piece. + """ + def __init__(self, decoder, translate, errors='strict'): + codecs.IncrementalDecoder.__init__(self, errors=errors) + self.translate = translate + self.decoder = decoder + self.seennl = 0 + self.pendingcr = False + + def decode(self, input, final=False): + # decode input (with the eventual \r from a previous pass) + if self.decoder is None: + output = input + else: + output = self.decoder.decode(input, final=final) + if self.pendingcr and (output or final): + output = "\r" + output + self.pendingcr = False + + # retain last \r even when not translating data: + # then readline() is sure to get \r\n in one pass + if output.endswith("\r") and not final: + output = output[:-1] + self.pendingcr = True + + # Record which newlines are read + crlf = output.count('\r\n') + cr = output.count('\r') - crlf + lf = output.count('\n') - crlf + self.seennl |= (lf and self._LF) | (cr and self._CR) \ + | (crlf and self._CRLF) + + if self.translate: + if crlf: + output = output.replace("\r\n", "\n") + if cr: + output = output.replace("\r", "\n") + + return output + + def getstate(self): + if self.decoder is None: + buf = b"" + flag = 0 + else: + buf, flag = self.decoder.getstate() + flag <<= 1 + if self.pendingcr: + flag |= 1 + return buf, flag + + def setstate(self, state): + buf, flag = state + self.pendingcr = bool(flag & 1) + if self.decoder is not None: + self.decoder.setstate((buf, flag >> 1)) + + def reset(self): + self.seennl = 0 + self.pendingcr = False + if self.decoder is not None: + self.decoder.reset() + + _LF = 1 + _CR = 2 + _CRLF = 4 + + @property + def newlines(self): + return (None, + "\n", + "\r", + ("\r", "\n"), + "\r\n", + ("\n", "\r\n"), + ("\r", "\r\n"), + ("\r", "\n", "\r\n") + )[self.seennl] + + +class TextIOWrapper(TextIOBase): + + r"""Character and line based layer over a BufferedIOBase object, buffer. + + encoding gives the name of the encoding that the stream will be + decoded or encoded with. It defaults to locale.getpreferredencoding. + + errors determines the strictness of encoding and decoding (see the + codecs.register) and defaults to "strict". + + newline can be None, '', '\n', '\r', or '\r\n'. It controls the + handling of line endings. If it is None, universal newlines is + enabled. With this enabled, on input, the lines endings '\n', '\r', + or '\r\n' are translated to '\n' before being returned to the + caller. Conversely, on output, '\n' is translated to the system + default line seperator, os.linesep. If newline is any other of its + legal values, that newline becomes the newline when the file is read + and it is returned untranslated. On output, '\n' is converted to the + newline. + + If line_buffering is True, a call to flush is implied when a call to + write contains a newline character. + """ + + _CHUNK_SIZE = 2048 + + def __init__(self, buffer, encoding=None, errors=None, newline=None, + line_buffering=False): + if newline is not None and not isinstance(newline, str): + raise TypeError("illegal newline type: %r" % (type(newline),)) + if newline not in (None, "", "\n", "\r", "\r\n"): + raise ValueError("illegal newline value: %r" % (newline,)) + if encoding is None: + try: + encoding = os.device_encoding(buffer.fileno()) + except (AttributeError, UnsupportedOperation): + pass + if encoding is None: + try: + import locale + except ImportError: + # Importing locale may fail if Python is being built + encoding = "ascii" + else: + encoding = locale.getpreferredencoding() + + if not isinstance(encoding, str): + raise ValueError("invalid encoding: %r" % encoding) + + if errors is None: + errors = "strict" + else: + if not isinstance(errors, str): + raise ValueError("invalid errors: %r" % errors) + + self.buffer = buffer + self._line_buffering = line_buffering + self._encoding = encoding + self._errors = errors + self._readuniversal = not newline + self._readtranslate = newline is None + self._readnl = newline + self._writetranslate = newline != '' + self._writenl = newline or os.linesep + self._encoder = None + self._decoder = None + self._decoded_chars = '' # buffer for text returned from decoder + self._decoded_chars_used = 0 # offset into _decoded_chars for read() + self._snapshot = None # info for reconstructing decoder state + self._seekable = self._telling = self.buffer.seekable() + + # self._snapshot is either None, or a tuple (dec_flags, next_input) + # where dec_flags is the second (integer) item of the decoder state + # and next_input is the chunk of input bytes that comes next after the + # snapshot point. We use this to reconstruct decoder states in tell(). + + # Naming convention: + # - "bytes_..." for integer variables that count input bytes + # - "chars_..." for integer variables that count decoded characters + + @property + def encoding(self): + return self._encoding + + @property + def errors(self): + return self._errors + + @property + def line_buffering(self): + return self._line_buffering + + def seekable(self): + return self._seekable + + def readable(self): + return self.buffer.readable() + + def writable(self): + return self.buffer.writable() + + def flush(self): + self.buffer.flush() + self._telling = self._seekable + + def close(self): + try: + self.flush() + except: + pass # If flush() fails, just give up + self.buffer.close() + + @property + def closed(self): + return self.buffer.closed + + @property + def name(self): + return self.buffer.name + + def fileno(self): + return self.buffer.fileno() + + def isatty(self): + return self.buffer.isatty() + + def write(self, s: str): + if self.closed: + raise ValueError("write to closed file") + if not isinstance(s, str): + raise TypeError("can't write %s to text stream" % + s.__class__.__name__) + length = len(s) + haslf = (self._writetranslate or self._line_buffering) and "\n" in s + if haslf and self._writetranslate and self._writenl != "\n": + s = s.replace("\n", self._writenl) + encoder = self._encoder or self._get_encoder() + # XXX What if we were just reading? + b = encoder.encode(s) + self.buffer.write(b) + if self._line_buffering and (haslf or "\r" in s): + self.flush() + self._snapshot = None + if self._decoder: + self._decoder.reset() + return length + + def _get_encoder(self): + make_encoder = codecs.getincrementalencoder(self._encoding) + self._encoder = make_encoder(self._errors) + return self._encoder + + def _get_decoder(self): + make_decoder = codecs.getincrementaldecoder(self._encoding) + decoder = make_decoder(self._errors) + if self._readuniversal: + decoder = IncrementalNewlineDecoder(decoder, self._readtranslate) + self._decoder = decoder + return decoder + + # The following three methods implement an ADT for _decoded_chars. + # Text returned from the decoder is buffered here until the client + # requests it by calling our read() or readline() method. + def _set_decoded_chars(self, chars): + """Set the _decoded_chars buffer.""" + self._decoded_chars = chars + self._decoded_chars_used = 0 + + def _get_decoded_chars(self, n=None): + """Advance into the _decoded_chars buffer.""" + offset = self._decoded_chars_used + if n is None: + chars = self._decoded_chars[offset:] + else: + chars = self._decoded_chars[offset:offset + n] + self._decoded_chars_used += len(chars) + return chars + + def _rewind_decoded_chars(self, n): + """Rewind the _decoded_chars buffer.""" + if self._decoded_chars_used < n: + raise AssertionError("rewind decoded_chars out of bounds") + self._decoded_chars_used -= n + + def _read_chunk(self): + """ + Read and decode the next chunk of data from the BufferedReader. + """ + + # The return value is True unless EOF was reached. The decoded + # string is placed in self._decoded_chars (replacing its previous + # value). The entire input chunk is sent to the decoder, though + # some of it may remain buffered in the decoder, yet to be + # converted. + + if self._decoder is None: + raise ValueError("no decoder") + + if self._telling: + # To prepare for tell(), we need to snapshot a point in the + # file where the decoder's input buffer is empty. + + dec_buffer, dec_flags = self._decoder.getstate() + # Given this, we know there was a valid snapshot point + # len(dec_buffer) bytes ago with decoder state (b'', dec_flags). + + # Read a chunk, decode it, and put the result in self._decoded_chars. + input_chunk = self.buffer.read1(self._CHUNK_SIZE) + eof = not input_chunk + self._set_decoded_chars(self._decoder.decode(input_chunk, eof)) + + if self._telling: + # At the snapshot point, len(dec_buffer) bytes before the read, + # the next input to be decoded is dec_buffer + input_chunk. + self._snapshot = (dec_flags, dec_buffer + input_chunk) + + return not eof + + def _pack_cookie(self, position, dec_flags=0, + bytes_to_feed=0, need_eof=0, chars_to_skip=0): + # The meaning of a tell() cookie is: seek to position, set the + # decoder flags to dec_flags, read bytes_to_feed bytes, feed them + # into the decoder with need_eof as the EOF flag, then skip + # chars_to_skip characters of the decoded result. For most simple + # decoders, tell() will often just give a byte offset in the file. + return (position | (dec_flags<<64) | (bytes_to_feed<<128) | + (chars_to_skip<<192) | bool(need_eof)<<256) + + def _unpack_cookie(self, bigint): + rest, position = divmod(bigint, 1<<64) + rest, dec_flags = divmod(rest, 1<<64) + rest, bytes_to_feed = divmod(rest, 1<<64) + need_eof, chars_to_skip = divmod(rest, 1<<64) + return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip + + def tell(self): + if not self._seekable: + raise IOError("underlying stream is not seekable") + if not self._telling: + raise IOError("telling position disabled by next() call") + self.flush() + position = self.buffer.tell() + decoder = self._decoder + if decoder is None or self._snapshot is None: + if self._decoded_chars: + # This should never happen. + raise AssertionError("pending decoded text") + return position + + # Skip backward to the snapshot point (see _read_chunk). + dec_flags, next_input = self._snapshot + position -= len(next_input) + + # How many decoded characters have been used up since the snapshot? + chars_to_skip = self._decoded_chars_used + if chars_to_skip == 0: + # We haven't moved from the snapshot point. + return self._pack_cookie(position, dec_flags) + + # Starting from the snapshot position, we will walk the decoder + # forward until it gives us enough decoded characters. + saved_state = decoder.getstate() + try: + # Note our initial start point. + decoder.setstate((b'', dec_flags)) + start_pos = position + start_flags, bytes_fed, chars_decoded = dec_flags, 0, 0 + need_eof = 0 + + # Feed the decoder one byte at a time. As we go, note the + # nearest "safe start point" before the current location + # (a point where the decoder has nothing buffered, so seek() + # can safely start from there and advance to this location). + next_byte = bytearray(1) + for next_byte[0] in next_input: + bytes_fed += 1 + chars_decoded += len(decoder.decode(next_byte)) + dec_buffer, dec_flags = decoder.getstate() + if not dec_buffer and chars_decoded <= chars_to_skip: + # Decoder buffer is empty, so this is a safe start point. + start_pos += bytes_fed + chars_to_skip -= chars_decoded + start_flags, bytes_fed, chars_decoded = dec_flags, 0, 0 + if chars_decoded >= chars_to_skip: + break + else: + # We didn't get enough decoded data; signal EOF to get more. + chars_decoded += len(decoder.decode(b'', final=True)) + need_eof = 1 + if chars_decoded < chars_to_skip: + raise IOError("can't reconstruct logical file position") + + # The returned cookie corresponds to the last safe start point. + return self._pack_cookie( + start_pos, start_flags, bytes_fed, need_eof, chars_to_skip) + finally: + decoder.setstate(saved_state) + + def truncate(self, pos=None): + self.flush() + if pos is None: + pos = self.tell() + self.seek(pos) + return self.buffer.truncate() + + def seek(self, cookie, whence=0): + if self.closed: + raise ValueError("tell on closed file") + if not self._seekable: + raise IOError("underlying stream is not seekable") + if whence == 1: # seek relative to current position + if cookie != 0: + raise IOError("can't do nonzero cur-relative seeks") + # Seeking to the current position should attempt to + # sync the underlying buffer with the current position. + whence = 0 + cookie = self.tell() + if whence == 2: # seek relative to end of file + if cookie != 0: + raise IOError("can't do nonzero end-relative seeks") + self.flush() + position = self.buffer.seek(0, 2) + self._set_decoded_chars('') + self._snapshot = None + if self._decoder: + self._decoder.reset() + return position + if whence != 0: + raise ValueError("invalid whence (%r, should be 0, 1 or 2)" % + (whence,)) + if cookie < 0: + raise ValueError("negative seek position %r" % (cookie,)) + self.flush() + + # The strategy of seek() is to go back to the safe start point + # and replay the effect of read(chars_to_skip) from there. + start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip = \ + self._unpack_cookie(cookie) + + # Seek back to the safe start point. + self.buffer.seek(start_pos) + self._set_decoded_chars('') + self._snapshot = None + + # Restore the decoder to its state from the safe start point. + if self._decoder or dec_flags or chars_to_skip: + self._decoder = self._decoder or self._get_decoder() + self._decoder.setstate((b'', dec_flags)) + self._snapshot = (dec_flags, b'') + + if chars_to_skip: + # Just like _read_chunk, feed the decoder and save a snapshot. + input_chunk = self.buffer.read(bytes_to_feed) + self._set_decoded_chars( + self._decoder.decode(input_chunk, need_eof)) + self._snapshot = (dec_flags, input_chunk) + + # Skip chars_to_skip of the decoded characters. + if len(self._decoded_chars) < chars_to_skip: + raise IOError("can't restore logical file position") + self._decoded_chars_used = chars_to_skip + + return cookie + + def read(self, n=None): + if n is None: + n = -1 + decoder = self._decoder or self._get_decoder() + if n < 0: + # Read everything. + result = (self._get_decoded_chars() + + decoder.decode(self.buffer.read(), final=True)) + self._set_decoded_chars('') + self._snapshot = None + return result + else: + # Keep reading chunks until we have n characters to return. + eof = False + result = self._get_decoded_chars(n) + while len(result) < n and not eof: + eof = not self._read_chunk() + result += self._get_decoded_chars(n - len(result)) + return result + + def __next__(self): + self._telling = False + line = self.readline() + if not line: + self._snapshot = None + self._telling = self._seekable + raise StopIteration + return line + + def readline(self, limit=None): + if self.closed: + raise ValueError("read from closed file") + if limit is None: + limit = -1 + + # Grab all the decoded text (we will rewind any extra bits later). + line = self._get_decoded_chars() + + start = 0 + # Make the decoder if it doesn't already exist. + if not self._decoder: + self._get_decoder() + + pos = endpos = None + while True: + if self._readtranslate: + # Newlines are already translated, only search for \n + pos = line.find('\n', start) + if pos >= 0: + endpos = pos + 1 + break + else: + start = len(line) + + elif self._readuniversal: + # Universal newline search. Find any of \r, \r\n, \n + # The decoder ensures that \r\n are not split in two pieces + + # In C we'd look for these in parallel of course. + nlpos = line.find("\n", start) + crpos = line.find("\r", start) + if crpos == -1: + if nlpos == -1: + # Nothing found + start = len(line) + else: + # Found \n + endpos = nlpos + 1 + break + elif nlpos == -1: + # Found lone \r + endpos = crpos + 1 + break + elif nlpos < crpos: + # Found \n + endpos = nlpos + 1 + break + elif nlpos == crpos + 1: + # Found \r\n + endpos = crpos + 2 + break + else: + # Found \r + endpos = crpos + 1 + break + else: + # non-universal + pos = line.find(self._readnl) + if pos >= 0: + endpos = pos + len(self._readnl) + break + + if limit >= 0 and len(line) >= limit: + endpos = limit # reached length limit + break + + # No line ending seen yet - get more data' + while self._read_chunk(): + if self._decoded_chars: + break + if self._decoded_chars: + line += self._get_decoded_chars() + else: + # end of file + self._set_decoded_chars('') + self._snapshot = None + return line + + if limit >= 0 and endpos > limit: + endpos = limit # don't exceed limit + + # Rewind _decoded_chars to just after the line ending we found. + self._rewind_decoded_chars(len(line) - endpos) + return line[:endpos] + + @property + def newlines(self): + return self._decoder.newlines if self._decoder else None + + +class StringIO(TextIOWrapper): + """Text I/O implementation using an in-memory buffer. + + The initial_value argument sets the value of object. The newline + argument is like the one of TextIOWrapper's constructor. + """ + + # XXX This is really slow, but fully functional + + def __init__(self, initial_value="", newline="\n"): + super(StringIO, self).__init__(BytesIO(), + encoding="utf-8", + errors="strict", + newline=newline) + if initial_value: + if not isinstance(initial_value, str): + initial_value = str(initial_value) + self.write(initial_value) + self.seek(0) + + def getvalue(self): + self.flush() + return self.buffer.getvalue().decode(self._encoding, self._errors) |