path: root/Modules/_io
Columns: commit message, author, age, files, lines (-removed/+added)
* gh-128083: Fix macro redefinition warning in clinic. (GH-127950) Peter Bierma 2024-12-19 1 -1/+10
|
* gh-109523: Raise a BlockingIOError if reading text from a non-blocking stream cannot immediately return bytes. (GH-122933) Giovanni Siragusa 2024-12-02 1 -0/+6
|
* gh-127341: Argument Clinic: fix compiler warnings for getters with docstrings (#127310) Peter Bierma 2024-11-29 3 -51/+21
| Co-authored-by: Erlend E. Aasland <erlend@python.org>
* gh-124008: Fix calculation of the number of written bytes for the Windows console (GH-124059) Serhiy Storchaka 2024-11-27 1 -28/+90
| Since MultiByteToWideChar()/WideCharToMultiByte() is not reversible if the data contains invalid UTF-8 sequences, use binary search to calculate the number of written bytes from the number of written characters.
| Also fix writing incomplete UTF-8 sequences.
| Also fix handling of memory allocation failures.
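The binary search described in gh-124008 can be illustrated in Python. This is only a sketch of the idea, not the C code in the commit: the helper names `utf16_units` and `bytes_for_chars` are hypothetical, and Python's `errors="replace"` decoding stands in for the Windows conversion functions. It assumes the unit count never decreases as the byte prefix grows.

```python
def utf16_units(data: bytes) -> int:
    """Count UTF-16 code units after decoding, replacing invalid UTF-8."""
    text = data.decode("utf-8", errors="replace")
    return len(text.encode("utf-16-le")) // 2


def bytes_for_chars(data: bytes, chars_written: int) -> int:
    """Return the largest prefix length whose decoding fits in chars_written units.

    Hypothetical helper: binary search over prefix lengths, relying on
    utf16_units(data[:k]) being non-decreasing in k.
    """
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if utf16_units(data[:mid]) <= chars_written:
            lo = mid  # this prefix fits, try a longer one
        else:
            hi = mid - 1  # too many units, try a shorter one
    return lo
```

For example, with `"héllo".encode("utf-8")` (6 bytes) and 2 characters written, the search settles on 3 bytes, since `h` takes one byte and `é` takes two.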
* gh-127182: Fix `io.StringIO.__setstate__` crash when `None` is the first value (#127219) sobolevn 2024-11-25 1 -14/+16
| Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-122943: Add the varpos parameter in _PyArg_UnpackKeywords (GH-126564) Serhiy Storchaka 2024-11-08 8 -37/+66
| Remove _PyArg_UnpackKeywordsWithVararg.
| Add comments for integer arguments of _PyArg_UnpackKeywords.
* gh-120754: _io Ensure stat cache is cleared on fd change (#125166) Cody Maloney 2024-11-01 1 -5/+6
| Performed an audit of `fileio.c` and `_pyio` and made sure that any time the fd changes, the stat result, if set, is also cleared/changed.
| There's one case where it's not cleared: where the code would clear it in __init__, keep the memory allocated and just do another fstat with the existing memory.
* gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125583) Victor Stinner 2024-10-25 1 -1/+1
| Replace PyUnicode_FromStringAndSize(NULL, 0) with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
* gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_STR) (#125194) Victor Stinner 2024-10-09 1 -1/+1
| Replace PyUnicode_New(0, 0), PyUnicode_FromString("") and PyUnicode_FromStringAndSize("", 0) with Py_GetConstant(Py_CONSTANT_EMPTY_STR).
* gh-115754: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#125195) Victor Stinner 2024-10-09 1 -1/+1
| Replace PyBytes_FromString("") and PyBytes_FromStringAndSize("", 0) with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).
* gh-90102: Remove isatty call during regular open (#124922) Cody Maloney 2024-10-08 3 -4/+21
| Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-111178: Fix function signatures in fileio.c (#125043) Victor Stinner 2024-10-07 1 -37/+51
| * Add "fileio_" prefix to getter functions.
| * Small refactoring.
* gh-120754: Fix memory leak in FileIO.__init__() (#124225) Victor Stinner 2024-09-18 1 -0/+1
| Free 'self->stat_atopen' before assigning it, since io.FileIO.__init__() can be called multiple times manually (especially by test_io).
* gh-120754: Refactor I/O modules to stash whole stat result rather than individual members (#123412) Cody Maloney 2024-09-18 1 -26/+57
| Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead.
| Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.
* gh-121645: Add PyBytes_Join() function (#121646) Victor Stinner 2024-08-30 2 -3/+3
| * Replace _PyBytes_Join() with PyBytes_Join().
| * Keep _PyBytes_Join() as an alias to PyBytes_Join().
* Fix typos in docs, error messages and comments (#123336) Wulian 2024-08-28 1 -1/+1
| Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* gh-121489: Export private _PyBytes_Join() again (#122267) Marc Mueller 2024-07-25 1 -1/+0
|
* gh-120754: Update estimated_size in C truncate (#121357) Cody Maloney 2024-07-04 1 -0/+6
| Sometimes a large file is truncated (test_largefile). While estimated_size is used as an estimate (the read will still get the number of bytes in the file), the fact that it is much larger than the actual size of the data can result in a significant over-allocation and sometimes lead to a MemoryError / running out of memory.
| This brings the C implementation to match the Python _pyio implementation.
* gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755) Cody Maloney 2024-07-04 1 -25/+45
| This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS).
| This always reduces the number of `fstat()` calls, and reduces seek calls most of the time. Stat was always called twice: once at open (to error early on directories), and a second time to get the size of the file to be able to read the whole file in one read. Now the size is cached with the first call.
| The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive, so only do it on larger files; otherwise just try to read the extra bytes and resize the PyBytes as needed.
| I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1].
| After the changes, this is everything in one `filename.read_text()`:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
| This does make some tradeoffs:
| 1. If the file size changes between open() and readall(), this will still get all the data but might have more read calls.
| 2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat().
| [0]
```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```
| [1] Before small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
| After small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```
| Before large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
| After large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
| Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
| Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
| Co-authored-by: Victor Stinner <vstinner@python.org>
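The read pattern visible in the "after" traces (one `fstat()`, one read sized to the file plus a byte, one empty read) can be approximated with the `os` module. This is a minimal sketch under the assumption of a regular file; the function name `read_all` is hypothetical and it is not the `fileio.c` implementation.

```python
import os


def read_all(path: str) -> bytes:
    """Read a whole file, sizing the first read from a single fstat() call."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size  # the only stat call; its result is reused
        chunks = []
        # Request one byte more than the reported size, so an unchanged file
        # is consumed by the first read and the next read returns b"".
        want = size + 1
        while True:
            chunk = os.read(fd, want)
            if not chunk:  # an empty read signals end of file
                break
            chunks.append(chunk)
            want = max(1, want - len(chunk))
        return b"".join(chunks)
    finally:
        os.close(fd)
```

If the file grows between the `fstat()` and the reads, the loop still returns all the data at the cost of extra `read()` calls, which mirrors the first tradeoff listed in the commit message.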
* Fixes loop variables to be the same types as their limit (GH-120958) Steve Dower 2024-06-24 1 -1/+1
|
* Use _PyLong_IsNegative instead of _PyLong_Sign if appropriate. (GH-120493) Serhiy Storchaka 2024-06-24 1 -5/+2
| It is faster and more obvious.
* gh-113993: Allow interned strings to be mortal, and fix related issues (GH-120520) Petr Viktorin 2024-06-21 1 -1/+1
| * Add an InternalDocs file describing how interning should work and how to use it.
| * Add internal functions to *explicitly* request what kind of interning is done:
|   - `_PyUnicode_InternMortal`
|   - `_PyUnicode_InternImmortal`
|   - `_PyUnicode_InternStatic`
| * Switch uses of `PyUnicode_InternInPlace` to those.
| * Disallow using `_Py_SetImmortal` on strings directly. You should use `_PyUnicode_InternImmortal` instead:
|   - Strings should be interned before immortalization, otherwise you're possibly interning a immortalizing copy.
|   - `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to `SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in backports, as they are now part of public API and version-specific ABI.
| * Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
| * Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
|   - `_Py_ID`
|   - `_Py_STR` (including the empty string)
|   - one-character latin-1 singletons
|   Now, when you intern a singleton, that exact singleton will be interned.
| * Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
| * Intern `_Py_STR` singletons at startup.
| * For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
| * Beef up the tests. Cover internal details (marked with `@cpython_only`).
| * Add lots of assertions
| Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
* gh-119506: fix `_io.TextIOWrapper.write()` write during flush (#119507) Radislav Chugunov 2024-06-03 1 -9/+22
| Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-119661: Add _Py_SINGLETON() include in Argument Clinic (#119712) Victor Stinner 2024-05-29 3 -5/+8
| When _Py_SINGLETON() is used, Argument Clinic now adds an explicit "pycore_runtime.h" include to get the macro. Previously, the macro may or may not have been included indirectly by another include.
* gh-116322: Add Py_mod_gil module slot (#116882) Brett Simmers 2024-05-03 1 -0/+1
| This PR adds the ability to enable the GIL if it was disabled at interpreter startup, and modifies the multi-phase module initialization path to enable the GIL when loading a module, unless that module's spec includes a slot indicating it can run safely without the GIL.
| PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
| A warning will be issued up to once per interpreter for the first GIL-using module that is loaded. If `-v` is given, a shorter message will be printed to stderr every time a GIL-using module is loaded (including the first one that issues a warning).
* gh-117151: optimize BufferedWriter(), do not buffer writes that are the buffer size (GH-118037) morotti 2024-04-23 1 -2/+2
| BufferedWriter() was buffering calls that are the exact same size as the buffer. It's a very common case to read/write in blocks of the exact buffer size. It's pointless to copy a full buffer: it costs an extra memory copy, and the full buffer will have to be written in the next call anyway.
| Co-authored-by: rmorotti <romain.morotti@man.com>
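One way to observe the behavior gh-117151 describes is to wrap a raw stream that records each write it receives. The sketch below is an illustration only: `CountingRaw` and `observe_full_buffer_write` are hypothetical names, and whether the raw write happens before or after `flush()` depends on the Python version in use, so no specific outcome is claimed here.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that records the size of every write it gets."""

    def __init__(self):
        super().__init__()
        self.write_sizes = []

    def writable(self):
        return True

    def write(self, b):
        self.write_sizes.append(len(b))
        return len(b)  # report that everything was accepted


def observe_full_buffer_write(buffer_size=8):
    """Write exactly buffer_size bytes and report raw writes before/after flush."""
    raw = CountingRaw()
    writer = io.BufferedWriter(raw, buffer_size=buffer_size)
    writer.write(b"x" * buffer_size)
    before_flush = list(raw.write_sizes)
    writer.flush()
    return before_flush, list(raw.write_sizes)
```

If the first list is already non-empty, the write bypassed the buffer; if it is empty, the data was copied into the buffer and only reached the raw stream on `flush()`.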
* gh-117764: Add signatures for __reduce__ and __reduce_ex__ in the _io module (GH-117773) Serhiy Storchaka 2024-04-12 3 -10/+10
| __reduce__() does not have parameters, __reduce_ex__() has a single parameter.
* gh-117068: Remove useless code in bytesio.c:resize_buffer() (GH-117069) NGRsoftlab 2024-03-22 1 -3/+0
| Co-authored-by: i.khabibulin <i.khabibulin@ngrsoftlab.ru>
* gh-115538: Emit warning when use bool as fd in _io.WindowsConsoleIO (GH-116925) AN Long 2024-03-18 1 -0/+7
|
* gh-95782: Fix io.BufferedReader.tell() etc. being able to return offsets < 0 (GH-99709) 6t8k 2024-02-17 1 -1/+10
| lseek() always returns 0 for character pseudo-devices like `/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it always returns -1, to which CPython reacts by raising appropriate exceptions). They are thus technically seekable despite not having seek semantics.
| When calling read() on e.g. an instance of `io.BufferedReader` that wraps such a file, `BufferedReader` reads ahead, filling its buffer, creating a discrepancy between the number of bytes read and the internal `tell()` always returning 0, which previously resulted in e.g. `BufferedReader.tell()` or `BufferedReader.seek()` being able to return positions < 0 even though these are supposed to be always >= 0.
| Invariably keep the return value non-negative by returning max(former_return_value, 0) instead, and add some corresponding tests.
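The arithmetic behind gh-95782 can be shown with a small model. This is an assumption-laden sketch, not the C code: `buffered_tell` is a hypothetical helper that takes the position reported by the raw stream and the number of bytes read ahead but not yet consumed.

```python
def buffered_tell(raw_pos: int, unread_buffered: int) -> int:
    """Model a buffered tell(): raw position minus read-ahead, clamped at 0.

    Hypothetical helper. When the raw stream always reports 0 (as lseek()
    does for character pseudo-devices), the subtraction alone goes negative.
    """
    return max(raw_pos - unread_buffered, 0)
```

With a raw position of 0 and several thousand read-ahead bytes still in the buffer, the unclamped difference would be negative, while the clamped value stays at 0. For a regular file the clamp has no effect.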
* gh-111140: Adds PyLong_AsNativeBytes and PyLong_FromNative[Unsigned]Bytes functions (GH-114886) Steve Dower 2024-02-12 1 -1/+1
|
* gh-115059: Flush the underlying write buffer in io.BufferedRandom.read1() (GH-115163) Serhiy Storchaka 2024-02-09 1 -0/+10
|
* gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275) Serhiy Storchaka 2024-02-05 1 -0/+7
|
* gh-115015: Argument Clinic: fix generated code for METH_METHOD methods without params (#115016) Erlend E. Aasland 2024-02-05 6 -12/+12
|
* gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` (GH-114287) Nikita Sobolev 2024-01-19 1 -1/+1
|
* Fix an incorrect comment in iobase_is_closed (GH-102952) Jonathon Reinhart 2024-01-16 1 -10/+9
| This comment appears to have been mistakenly copied from what is now called iobase_check_closed() in commit 4d9aec022063.
| Also unite the iobase_check_closed() code with the relevant comment.
| Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-77046: os.pipe() sets _O_NOINHERIT flag on fds (#113817) Victor Stinner 2024-01-10 1 -2/+2
| On Windows, set _O_NOINHERIT flag on file descriptors created by os.pipe() and io.WindowsConsoleIO.
| Add test_pipe_spawnl() to test_os.
| Co-authored-by: Zackery Spytz <zspytz@gmail.com>
* gh-66060: Use actual class name in _io type's __repr__ (#30824) AN Long 2024-01-09 3 -20/+29
| Use the object's actual class name in the following _io type's __repr__:
| - FileIO
| - TextIOWrapper
| - _WindowsConsoleIO
* gh-80109: Fix io.TextIOWrapper dropping the internal buffer during write() (GH-22535) Zackery Spytz 2024-01-08 1 -4/+8
| io.TextIOWrapper was dropping the internal decoding buffer during read() and write() calls.
* gh-112205: Support docstring for `@getter` (#113160) Donghee Na 2023-12-20 4 -49/+204
| Co-authored-by: Erlend E. Aasland <erlend@python.org>
* gh-112205: Update textio module to use `@getter` as possible. (gh-113095) Donghee Na 2023-12-14 2 -49/+125
|
* gh-111049: Fix crash during garbage collection of the BytesIO buffer object (GH-111221) Serhiy Storchaka 2023-12-14 1 -10/+4
|
* gh-112205: Support `@setter` annotation from AC (gh-112922) Donghee Na 2023-12-13 6 -52/+113
| Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
| Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* gh-112205: Update stringio module to use AC for the thread-safe (gh-112549) Donghee Na 2023-11-30 2 -35/+79
|
* gh-112205: Support @getter annotation from AC (gh-112396) Donghee Na 2023-11-30 2 -49/+88
|
* gh-111965: Use critical sections to make io.BufferedIOBase and its related classes thread safe (gh-112298) Mayuresh Kedari 2023-11-22 2 -40/+180
|
* gh-111965: Using critical sections to make ``io.StringIO`` thread safe. (gh-112116) AN Long 2023-11-19 2 -30/+194
|
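The kind of concurrent use these thread-safety commits target can be exercised from Python. The following is a small sketch with a hypothetical function name, `concurrent_writes`; it only checks that no write is lost or torn, and says nothing about how the critical sections are implemented in C.

```python
import io
import threading


def concurrent_writes(n_threads=4, writes_per_thread=1000, line="abc\n"):
    """Have several threads append the same line to one shared io.StringIO."""
    buf = io.StringIO()

    def worker():
        for _ in range(writes_per_thread):
            buf.write(line)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf.getvalue()
```

When each `write()` is applied atomically, the result has exactly `n_threads * writes_per_thread` copies of the line, in some interleaved order.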
* gh-111903: Update AC to support "pycore_critical_section.h" header (gh-112251) Donghee Na 2023-11-19 4 -4/+4
|
* gh-111965: Use critical sections to make io.TextIOWrapper thread safe (gh-112193) AN Long 2023-11-18 2 -39/+181
|
* gh-111942: Fix SystemError in the TextIOWrapper constructor (#112061) Serhiy Storchaka 2023-11-14 1 -2/+6
| In non-debug mode the check for the "errors" argument is skipped, and then PyUnicode_AsUTF8() can fail, but its result was not checked.
| Co-authored-by: Victor Stinner <vstinner@python.org>