| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
stream cannot immediately return bytes. (GH-122933)
|
|
|
|
|
| |
docstrings (#127310)
Co-authored-by: Erlend E. Aasland <erlend@python.org>
|
| |
console (GH-124059)
Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if
the data contains invalid UTF-8 sequences, use binary search to
calculate the number of written bytes from the number of written
characters.
Also fix writing incomplete UTF-8 sequences.
Also fix handling of memory allocation failures.
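A minimal sketch of the binary-search idea in Python, not the C code from the commit. It assumes a lenient UTF-8 decode (invalid or incomplete input becomes U+FFFD) as a stand-in for MultiByteToWideChar(), and it assumes the converted length never shrinks as the byte prefix grows. The helper names `utf16_units` and `bytes_for_units` are hypothetical.

```python
def utf16_units(data: bytes) -> int:
    """Count UTF-16 code units produced by a lenient UTF-8 decode.

    Stand-in for MultiByteToWideChar(): invalid or incomplete sequences
    are replaced with U+FFFD instead of raising.
    """
    text = data.decode("utf-8", errors="replace")
    return len(text.encode("utf-16-le")) // 2


def bytes_for_units(data: bytes, written_units: int) -> int:
    """Return the longest prefix length of data that converts to at most
    written_units UTF-16 code units."""
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if utf16_units(data[:mid]) <= written_units:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For the UTF-8 encoding of "héllo" (6 bytes), `bytes_for_units(data, 2)` returns 3: two code units cover "h" plus the complete two-byte "é".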
|
|
|
|
|
| |
value (#127219)
Co-authored-by: Victor Stinner <vstinner@python.org>
|
|
|
|
| |
Remove _PyArg_UnpackKeywordsWithVararg.
Add comments for integer arguments of _PyArg_UnpackKeywords.
|
| |
Performed an audit of `fileio.c` and `_pyio` and made sure that any time the
fd changes, the stat result, if set, is also cleared or changed.
There's one case where it's not cleared: where code would clear it in
__init__, keep the memory allocated and just do another fstat with the
existing memory.
|
|
|
|
| |
Replace PyUnicode_FromStringAndSize(NULL, 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
|
|
|
|
|
| |
Replace PyUnicode_New(0, 0), PyUnicode_FromString("")
and PyUnicode_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
|
|
|
|
| |
Replace PyBytes_FromString("") and PyBytes_FromStringAndSize("", 0)
with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).
|
|
|
| |
Co-authored-by: Victor Stinner <vstinner@python.org>
|
|
|
|
| |
* Add "fileio_" prefix to getter functions.
* Small refactoring.
|
|
|
|
|
| |
Free 'self->stat_atopen' before assigning it, since
io.FileIO.__init__() can be called multiple times manually
(especially by test_io).
|
| |
individual members (#123412)
Multiple places in the I/O stack optimize common cases by using the
information from stat. Currently individual members are extracted from
the stat and stored into the fileio struct. Refactor the code to store
the whole stat struct instead.
Parallels the changes to _io. The `stat` Python object doesn't allow
changing members, so rather than modifying estimated_size, just clear
the value.
|
|
|
|
| |
* Replace _PyBytes_Join() with PyBytes_Join().
* Keep _PyBytes_Join() as an alias to PyBytes_Join().
|
|
|
|
| |
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
|
| |
|
| |
Sometimes a large file is truncated (test_largefile). While
estimated_size is only used as an estimate (the read will still get the
number of bytes in the file), an estimate much larger than the actual size
of the data can result in significant over-allocation and sometimes lead to
a MemoryError / running out of memory.
This brings the C implementation to match the Python _pyio
implementation.
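A small Python sketch of the invalidation idea, under stated assumptions: `SizedFile` and `size_hint` are hypothetical names, and the real logic lives in `fileio.c` and `_pyio`. The point is only that a truncate drops the cached size instead of leaving a stale, oversized hint behind.

```python
import os


class SizedFile:
    """Hypothetical wrapper that caches the fstat() result taken at open."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR)
        self.stat_atopen = os.fstat(self.fd)

    def size_hint(self):
        """Return the cached size, or None when no trustworthy hint exists."""
        if self.stat_atopen is None:
            return None
        return self.stat_atopen.st_size

    def truncate(self, size):
        os.ftruncate(self.fd, size)
        # The cached size may now be far too large, so drop it entirely
        # rather than trying to patch the read-only stat result.
        self.stat_atopen = None

    def close(self):
        os.close(self.fd)
```

A reader that finds `size_hint()` returning `None` would fall back to growing its buffer incrementally, which avoids allocating for the pre-truncation size.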
|
| |
This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
Linux system, 5813 -> 4875 on my macOS system).
This always reduces the number of `fstat()` calls, and reduces seek calls
most of the time. Stat was always called twice: once at open (to error
early on directories), and a second time to get the size of the file to be
able to read the whole file in one read. Now the size is cached with the
first call.
The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive, so only do it on larger files;
otherwise just try to read the extra bytes and resize the PyBytes as
needed.
I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].
After the changes, this is everything in one `filename.read_text()`:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
This does make some tradeoffs:
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().
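The read strategy described in this message can be sketched in Python roughly as follows. This is an illustration, not the C implementation: `read_all`, `LARGE_FILE_THRESHOLD`, and `FALLBACK_CHUNK` are hypothetical names, and the threshold value is an assumption.

```python
import os

LARGE_FILE_THRESHOLD = 64 * 1024  # assumed cutoff for the position adjustment
FALLBACK_CHUNK = 8192             # assumed read size once the estimate is used up


def read_all(fd, cached_size):
    """Read from fd to EOF, sizing reads from the st_size cached at open."""
    # Ask for one byte more than expected so a short read signals EOF.
    want = cached_size + 1
    if cached_size >= LARGE_FILE_THRESHOLD:
        # Only pay for the extra lseek() on larger files.
        pos = os.lseek(fd, 0, os.SEEK_CUR)
        want = max(cached_size - pos, 0) + 1
    chunks = []
    while True:
        chunk = os.read(fd, want)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
        if len(chunk) < want:
            want -= len(chunk)       # estimate still holds: read the remainder
        else:
            want = FALLBACK_CHUNK    # file grew past the estimate: keep going
```

For a 343-byte file this issues a 344-byte read followed by a 1-byte read that returns nothing, matching the pattern in the trace. If the cached size is stale, all data is still returned at the cost of extra reads, which is the first tradeoff listed.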
[0]
```python3
from pathlib import Path
nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```
[1]
Before small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
After small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
Before large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
After large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
|
| |
|
|
|
| |
It is faster and more obvious.
|
| |
(GH-120520)
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you may end up
immortalizing a copy instead of the interned string.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
|
|
|
| |
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
|
|
|
|
|
| |
When _Py_SINGLETON() is used, Argument Clinic now adds an
explicit "pycore_runtime.h" include to get the macro. Previously, the
macro might or might not have been included indirectly by another include.
|
| |
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
|
| |
buffer size (GH-118037)
BufferedWriter() was buffering writes that are exactly the same size as the buffer. Reading or writing in blocks of exactly the buffer size is a very common case.
Copying a full buffer is pointless: it costs an extra memory copy, and the full buffer has to be written in the next call anyway.
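A toy Python sketch of the bypass, assuming a raw stream whose write() always accepts all the data. `SketchBufferedWriter` is a hypothetical class, not the real `io.BufferedWriter`; the relevant detail is the `>=` comparison, which sends a block of exactly the buffer size straight to the raw stream.

```python
class SketchBufferedWriter:
    """Toy write buffer; assumes raw.write() always accepts all the data."""

    def __init__(self, raw, buffer_size=8192):
        self.raw = raw
        self.buffer_size = buffer_size
        self._pending = bytearray()

    def flush(self):
        if self._pending:
            self.raw.write(bytes(self._pending))
            self._pending.clear()

    def write(self, data):
        if len(data) >= self.buffer_size:
            # A block at least as large as the buffer would fill it
            # completely, so copying it buys nothing: flush what is pending
            # (to keep ordering) and hand the block to the raw stream.
            self.flush()
            self.raw.write(data)
        else:
            if len(self._pending) + len(data) > self.buffer_size:
                self.flush()
            self._pending += data
        return len(data)
```

With `buffer_size=8`, an 8-byte write reaches the raw stream immediately, while a 2-byte write stays pending until the next flush.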
Co-authored-by: rmorotti <romain.morotti@man.com>
|
|
|
|
|
|
| |
(GH-117773)
__reduce__() does not have parameters, __reduce_ex__() has a single
parameter.
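As a reminder of the two call shapes, here is a short Python sketch with a hypothetical `Point` class (not one of the `_io` types touched by the commit). `__reduce__()` takes no arguments besides `self`, while `__reduce_ex__()` takes the pickle protocol number.

```python
class Point:
    """Hypothetical example class with a custom __reduce__()."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # No parameters: return a callable and the arguments to rebuild with.
        return (Point, (self.x, self.y))


p = Point(1, 2)
no_args = p.__reduce__()         # called without arguments
with_proto = p.__reduce_ex__(4)  # called with a single protocol argument
```

Because `Point` overrides `__reduce__()`, the inherited `object.__reduce_ex__()` delegates to it, so both calls return the same tuple.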
|
|
|
| |
Co-authored-by: i.khabibulin <i.khabibulin@ngrsoftlab.ru>
|
| |
|
| |
(GH-99709)
lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.
When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.
Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
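The clamp can be illustrated with a one-function Python sketch. `buffered_tell` and its parameters are hypothetical names; the real fix lives in the buffered I/O implementations.

```python
def buffered_tell(raw_pos: int, unread_in_buffer: int) -> int:
    """Logical position of a buffered reader, clamped to be non-negative.

    raw_pos is what the raw stream reports (always 0 for devices such as
    /dev/urandom); unread_in_buffer is the read-ahead not yet consumed.
    """
    return max(raw_pos - unread_in_buffer, 0)
```

With a raw position stuck at 0 and 4096 bytes of read-ahead, the unclamped value would be -4096, while the clamped result is 0. For regular files the subtraction is already non-negative, so the clamp changes nothing.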
|
|
|
|
| |
functions (GH-114886)
|
|
|
|
| |
(GH-115163)
|
| |
|
|
|
|
| |
without params (#115016)
|
|
|
|
| |
(GH-114287)
|
|
|
|
|
|
|
|
| |
This comment appears to have been mistakenly copied from what is now
called iobase_check_closed() in commit 4d9aec022063.
Also unite the iobase_check_closed() code with the relevant comment.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
|
|
|
|
|
|
|
| |
On Windows, set _O_NOINHERIT flag on file descriptors
created by os.pipe() and io.WindowsConsoleIO.
Add test_pipe_spawnl() to test_os.
Co-authored-by: Zackery Spytz <zspytz@gmail.com>
|
|
|
|
|
|
| |
Use the object's actual class name in the __repr__ of the following _io types:
- FileIO
- TextIOWrapper
- _WindowsConsoleIO
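The pattern can be sketched in pure Python as below. `SketchFileIO` and `LoggingFileIO` are hypothetical classes and the repr format is illustrative only; it is not the exact output of the real `_io` types.

```python
class SketchFileIO:
    """Hypothetical stand-in for an _io type with a name and a mode."""

    def __init__(self, name, mode="rb"):
        self.name = name
        self.mode = mode

    def __repr__(self):
        # Use the runtime class name so subclasses are reported accurately
        # instead of hard-coding the base class name.
        return f"<{type(self).__name__} name={self.name!r} mode={self.mode!r}>"


class LoggingFileIO(SketchFileIO):
    """Subclass that inherits __repr__() unchanged."""
```

Here `repr(LoggingFileIO("x"))` starts with `<LoggingFileIO`, whereas a hard-coded name would misreport the subclass as its base.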
|
|
|
|
|
|
|
| |
(GH-22535)
io.TextIOWrapper was dropping the internal decoding buffer
during read() and write() calls.
|
|
|
|
|
| |
---------
Co-authored-by: Erlend E. Aasland <erlend@python.org>
|
| |
|
|
|
|
| |
(GH-111221)
|
|
|
|
|
|
| |
---------
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
| |
|
| |
|
|
|
|
| |
classes thread safe (gh-112298)
|
|
|
|
| |
(gh-112116)
|
| |
|
|
|
|
| |
(gh-112193)
|
|
|
|
|
|
| |
In non-debug mode the check for the "errors" argument is skipped,
and then PyUnicode_AsUTF8() can fail, but its result was not checked.
Co-authored-by: Victor Stinner <vstinner@python.org>
|