| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
stream cannot immediately return bytes. (GH-122933)
|
|
|
|
|
|
|
|
| |
Performed an audit of `fileio.c` and `_pyio` and made sure that any time
the fd changes, the stat result, if set, is also cleared or changed.
There is one case where it is not cleared: where the code would clear it
in __init__, it keeps the memory allocated and just does another fstat
into the existing memory.
|
|
|
|
|
|
|
| |
Spotted by @ngnpope.
`isatty` returns False to indicate the file is not a TTY. The C
implementation of _io does that (`Py_RETURN_FALSE`) but I got the
bool backwards in the _pyio implementation.
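The expected behavior can be checked with a small sketch like the one below. It uses `io.FileIO` on a temporary file; the helper name `regular_file_isatty` is made up for illustration and is not part of the change.

```python
import io
import os
import tempfile


def regular_file_isatty() -> bool:
    """Return what FileIO.isatty() reports for an ordinary temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.FileIO(path, "r") as raw:
            # A regular file is not a terminal, so this should be False.
            return raw.isatty()
    finally:
        os.remove(path)
```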
|
|
|
| |
Co-authored-by: Victor Stinner <vstinner@python.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
individual members (#123412)
Multiple places in the I/O stack optimize common cases by using the
information from stat. Currently individual members are extracted from
the stat and stored into the fileio struct. Refactor the code to store
the whole stat struct instead.
Parallels the changes to _io. The `stat` Python object doesn't allow
changing members, so rather than modifying estimated_size, just clear
the value.
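A minimal sketch of the idea, not the actual `_pyio` code: the class `StatCachingFile` and its attribute names are assumptions made for illustration. It keeps the whole `os.stat_result` and drops it when the size is no longer trustworthy, since the members of a stat result cannot be reassigned.

```python
import os


class StatCachingFile:
    """Hypothetical stand-in for a raw file object that caches fstat()."""

    def __init__(self, fd: int) -> None:
        self._fd = fd
        # Keep the whole stat result, not individual members.
        self._stat_atopen = os.fstat(fd)

    def estimated_size(self):
        """Return the cached size, or None once the cache was cleared."""
        if self._stat_atopen is None:
            return None
        return self._stat_atopen.st_size

    def truncate(self, size: int) -> None:
        os.ftruncate(self._fd, size)
        # os.stat_result members are read-only, so drop the whole cache
        # instead of trying to patch st_size.
        self._stat_atopen = None
```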
|
| |
This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
Linux system, 5813 -> 4875 on my macOS).
This always reduces the number of `fstat()` calls, and most of the time
the number of seek calls. Stat was always called twice, once at open (to error early on
directories), and a second time to get the size of the file to be able
to read the whole file in one read. Now the size is cached with the
first call.
The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive so only do it on larger files,
otherwise just try and read the extra bytes and resize the PyBytes as
needed.
I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].
After the changes, this is everything in one `filename.read_text()`:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
This does make some tradeoffs:
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().
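The read strategy described above can be approximated in pure Python as follows. This is only a sketch built on `os.open`, `os.fstat` and `os.read`; the function name `read_all` and the 64 KiB fallback chunk size are assumptions, and the real implementation lives in `fileio.c` and `_pyio`.

```python
import os


def read_all(path: str) -> bytes:
    """Read a whole file using a single fstat() to size the first read."""
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        size = os.fstat(fd).st_size  # the only stat call
        chunks = []
        # Ask for one extra byte: in the common case this gives one read
        # with all the data followed by one empty read at EOF.
        want = size + 1
        while True:
            chunk = os.read(fd, want)
            if not chunk:
                break
            chunks.append(chunk)
            # The file grew after fstat(): keep reading in fixed chunks.
            want = 64 * 1024
        return b"".join(chunks)
    finally:
        os.close(fd)
```

If the file changes size between the `fstat()` and the reads, the loop still returns all the data, at the cost of extra `read()` calls.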
[0]
```python3
from pathlib import Path
nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```
[1]
Before small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
After small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
Before large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
After large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
|
|
|
|
|
| |
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is used effectively.
|
| |
(GH-99709)
lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.
When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.
Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
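The clamping itself is a one-liner. The sketch below is illustrative only; `clamped_position` and its parameters are hypothetical names, not functions from `io` or `_pyio`.

```python
def clamped_position(raw_position: int, unread_buffered: int) -> int:
    """Logical position of a buffered reader, never negative.

    raw_position is what the underlying tell() reports (always 0 for
    devices like /dev/urandom); unread_buffered is how many bytes were
    read ahead into the buffer but not yet consumed.
    """
    return max(raw_position - unread_buffered, 0)
```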
|
| |
|
|
|
|
|
|
|
| |
(GH-22535)
io.TextIOWrapper was dropping the internal decoding buffer
during read() and write() calls.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Remove io.OpenWrapper and _pyio.OpenWrapper, deprecated in Python
3.10: just use :func:`open` instead. The open() (io.open()) function
is a built-in function. Since Python 3.10, _pyio.open() is also a
static method.
|
| |
|
| |
|
|
|
|
| |
`TextIOWrapper.__init__()` called `os.device_encoding(file.fileno())` if fileno is 0-2 and encoding=None.
But this very rarely works, and it was never documented behavior.
|
| |
|
|
|
| |
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
|
| |
|
|
|
|
| |
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Andrew Svetlov <andrew.svetlov@gmail.com>
|
|
|
|
|
| |
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
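Code that relied on "U" can usually get the same newline translation from the default text mode. A small sketch, assuming UTF-8 input; the helper name `read_text_universal` is made up for illustration.

```python
def read_text_universal(path: str) -> str:
    """Read text with universal newlines, without the removed "U" flag."""
    # newline=None (the default) translates "\r\n" and "\r" to "\n" on read.
    with open(path, "r", newline=None, encoding="utf-8") as f:
        return f.read()
```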
|
|
|
|
|
|
|
|
| |
Deprecate io.OpenWrapper and _pyio.OpenWrapper: use io.open and
_pyio.open instead. Until Python 3.9, _pyio.open was not a static
method and builtins.open was set to OpenWrapper to not become a bound
method when set to a class variable. _io.open is a built-in function
whereas _pyio.open is a Python function. In Python 3.10, _pyio.open()
is now a static method, and builtins.open() is now io.open().
|
|
|
|
|
|
|
|
|
|
|
| |
The Python _pyio.open() function becomes a static method to behave as
io.open() built-in function: don't become a bound method when stored
as a class variable. It becomes possible since static methods are now
callable in Python 3.10. Moreover, _pyio.OpenWrapper becomes a simple
alias to _pyio.open.
init_set_builtins_open() now sets builtins.open to io.open, rather
than setting it to io.OpenWrapper, since OpenWrapper is now an alias
to open in the io and _pyio modules.
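The binding behavior at issue can be seen with a toy example. `py_open` below is a hypothetical pure-Python function standing in for `_pyio.open`, and `len` stands in for a built-in such as `io.open`.

```python
def py_open(name):
    """Hypothetical pure-Python function standing in for _pyio.open."""
    return f"opened {name}"


class Plain:
    opener = py_open  # becomes a bound method on instances


class Static:
    opener = staticmethod(py_open)  # stays a plain function


class Builtin:
    opener = len  # built-in functions never bind


def plain_call_fails() -> bool:
    """Return True if calling through Plain passes an unwanted `self`."""
    try:
        Plain().opener("x")
    except TypeError:
        return True
    return False
```

`Static().opener("x")` and `Builtin().opener("abc")` behave like the underlying callables, while `Plain().opener("x")` fails because the instance is passed as an extra first argument.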
|
|
|
|
|
| |
(GH-25103)" (#25108)
This reverts commit ff3c9739bd69aa8b58007e63c9e40e6708b4761e.
|
|
|
|
| |
It makes `encoding="locale"` usable everywhere `encoding=None` is
allowed.
|
|
|
|
|
|
|
|
|
|
|
| |
See [PEP 597](https://www.python.org/dev/peps/pep-0597/).
* Add `-X warn_default_encoding` and `PYTHONWARNDEFAULTENCODING`.
* Add EncodingWarning
* Add io.text_encoding()
* open() and TextIOWrapper() emit EncodingWarning when encoding is omitted and warn_default_encoding is enabled.
* _pyio.TextIOWrapper() uses UTF-8 as the fallback default encoding when importing the locale module fails (this happens while building Python).
* bz2, configparser, gzip, lzma, pathlib, tempfile modules use io.text_encoding().
* What's new entry
|
|
|
|
|
|
|
| |
(GH-16959)" (GH-18767)
This reverts commit e471e72977c83664f13d041c78549140c86c92de.
The mode will be removed from Python 3.10.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The truncate() method of io.BufferedReader() should raise
UnsupportedOperation when it is called on a read-only
io.BufferedReader() instance.
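A quick way to observe the intended behavior, as a sketch; the helper name `truncate_read_only` is an assumption made for illustration.

```python
import io
import os
import tempfile


def truncate_read_only() -> bool:
    """Return True if truncate() on a read-only BufferedReader is refused."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "rb") as reader:  # an io.BufferedReader
            try:
                reader.truncate()
            except io.UnsupportedOperation:
                return True
            return False
    finally:
        os.remove(path)
```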
https://bugs.python.org/issue35950
Automerge-Triggered-By: @methane
|
|
|
|
|
| |
(GH-17112)
This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.
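As a rough illustration (assuming a POSIX-like platform and a Python version that includes this change), a pipe's write end can be wrapped in append mode. The function name `write_through_pipe` is hypothetical.

```python
import os


def write_through_pipe(payload: bytes) -> bytes:
    """Write payload to a pipe opened in append mode and read it back."""
    read_fd, write_fd = os.pipe()
    try:
        # Append mode implies an initial seek to the end, which a pipe
        # cannot do; with this change that failure is ignored.
        with open(write_fd, "ab", buffering=0) as writer:
            writer.write(payload)
        return os.read(read_fd, len(payload))
    finally:
        os.close(read_fd)
```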
|
|
|
|
|
| |
open(), io.open(), codecs.open() and fileinput.FileInput no longer
accept "U" ("universal newline") in the file mode. This flag was
deprecated since Python 3.3.
|
|
|
|
|
|
| |
* Use the 'p' format unit instead of manually calling PyObject_IsTrue().
* Pass boolean values instead of 0/1 integers to functions that need a boolean.
* Convert some arguments to boolean only once.
|
| |
|
|
|
|
| |
streams. (GH-15543)
| |
* Fix typos in comments, docs and test names
* Update test_pyparse.py
account for change in string length
* Apply suggestion: splitable -> splittable
Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>
* Apply suggestion: splitable -> splittable
Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>
* Apply suggestion: Dealloccte -> Deallocate
Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>
* Update posixmodule checksum.
* Reverse idlelib changes.
|
|
|
|
|
|
|
|
|
| |
In development mode and in debug build, encoding and errors arguments
are now checked on string encoding and decoding operations. Examples:
open(), str.encode() and bytes.decode().
By default, for best performance, the errors argument is only
checked at the first encoding/decoding error, and the encoding
argument is sometimes ignored for empty strings.
|
|
|
|
| |
_pyio.IOBase destructor now does nothing if getting the closed
attribute fails to better mimic _io.IOBase finalizer.
|
|
|
|
|
| |
Fix destructor _pyio.BytesIO and _pyio.TextIOWrapper: initialize
their _buffer attribute as soon as possible (in the class body),
because it's used by __del__() which calls close().
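The pattern can be sketched with a toy class. `Resource` below is hypothetical and much simpler than `_pyio.BytesIO`; it only shows why a class-level default keeps `__del__()` safe when `__init__()` never completes.

```python
class Resource:
    """Hypothetical class whose __del__ relies on an attribute."""

    # Class-level default: visible even if __init__ never ran or raised early.
    _buffer = None

    def __init__(self, data):
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        self._buffer = bytearray(data)

    def close(self):
        if self._buffer is not None:
            self._buffer.clear()
            self._buffer = None

    def __del__(self):
        # Safe even on a half-initialized object thanks to the default above.
        self.close()
```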
|
|
|
| |
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
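A minimal sketch of the Python-level API (Python 3.8+). The event name `example.sketch` and the hook function are made up for illustration; note that audit hooks cannot be removed once added.

```python
import sys

seen = []


def record_example_events(event, args):
    # Hooks see every audit event, so filter on the (hypothetical)
    # event name raised below.
    if event == "example.sketch":
        seen.append(args)


sys.addaudithook(record_example_events)

# Raise a custom event; the hook receives the arguments as a tuple.
sys.audit("example.sketch", "payload", 42)
```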
|
|
|
|
|
|
|
|
| |
In development (-X dev) mode and in a debug build, IOBase finalizer
of the _pyio module now logs the exception if the close() method
fails. The exception is ignored silently by default in release build.
test_io: test_error_through_destructor() now uses
support.catch_unraisable_exception() rather than capturing stderr.
|
| |
|
|
|
|
|
|
|
| |
IOBase. (GH-11893)
Move all documentation regarding the readinto method into either io.RawIOBase or io.BufferedIOBase.
Corresponding changes to documentation in the _pyio.py module.
|
|
|
|
|
|
|
| |
The previous code hardcoded `SEEK_SET`, etc. While it's very unlikely
that these values will change, it's best to use the definitions to avoid
there being mismatches in behavior with the code in the future.
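In Python code the same practice looks like this; a small sketch using `io.BytesIO` and the constants from `os`.

```python
import io
import os

buf = io.BytesIO(b"0123456789")

end = buf.seek(0, os.SEEK_END)   # rather than buf.seek(0, 2)
buf.seek(-3, os.SEEK_CUR)        # rather than buf.seek(-3, 1)
tail = buf.read()
buf.seek(0, os.SEEK_SET)         # rather than buf.seek(0, 0)
```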
Signed-off-by: Enji Cooper <yaneurabeya@gmail.com>
|
|
|
|
| |
types. (GH-6239)
|
|
|
|
|
|
|
|
|
| |
If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).
Warn that line buffering is not supported if open() is called with
binary mode and buffering=1.
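The warning can be observed with a sketch like the following, assuming a Python version that includes this change. The helper name `binary_line_buffering_warns` is hypothetical.

```python
import os
import tempfile
import warnings


def binary_line_buffering_warns() -> bool:
    """Return True if open(..., "rb", buffering=1) emits RuntimeWarning."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            with open(path, "rb", buffering=1):
                pass
        return any(issubclass(w.category, RuntimeWarning) for w in caught)
    finally:
        os.remove(path)
```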
|
| |
|
| |
|
|
|
|
| |
newline (GH-2343)
|
|
|
|
| |
semantics (#4826)
|
| |
|