summaryrefslogtreecommitdiffstats
path: root/Modules/_io/fileio.c
Commit message (Collapse)AuthorAgeFilesLines
* gh-120754: _io Ensure stat cache is cleared on fd change (#125166)Cody Maloney2024-11-011-5/+6
| | | | | | | | Performed an audit of `fileio.c` and `_pyio` and made sure anytime the fd changes the stat result, if set, is also cleared/changed. There's one case where it's not cleared, if code would clear it in __init__, keep the memory allocated and just do another fstat with the existing memory.
* gh-90102: Remove isatty call during regular open (#124922)Cody Maloney2024-10-081-3/+19
| | | Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-111178: Fix function signatures in fileio.c (#125043)Victor Stinner2024-10-071-37/+51
| | | | * Add "fileio_" prefix to getter functions. * Small refactoring.
* gh-120754: Fix memory leak in FileIO.__init__() (#124225)Victor Stinner2024-09-181-0/+1
| | | | | Free 'self->stat_atopen' before assigning it, since io.FileIO.__init__() can be called multiple times manually (especially by test_io).
* gh-120754: Refactor I/O modules to stash whole stat result rather than ↵Cody Maloney2024-09-181-26/+57
| | | | | | | | | | | | individual members (#123412) Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead. Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.
* gh-120754: Update estimated_size in C truncate (#121357)Cody Maloney2024-07-041-0/+6
| | | | | | | | | | Sometimes a large file is truncated (test_largefile). While estimated_size is used as a estimate (the read will stil get the number of bytes in the file), that it is much larger than the actual size of data can result in a significant over allocation and sometimes lead to a MemoryError / running out of memory. This brings the C implementation to match the Python _pyio implementation.
* gh-120754: Reduce system calls in full-file FileIO.readall() case (#120755)Cody Maloney2024-07-041-25/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my linux system, 5813 -> 4875 on my macOS) This reduces the number of `fstat()` calls always and seek calls most the time. Stat was always called twice, once at open (to error early on directories), and a second time to get the size of the file to be able to read the whole file in one read. Now the size is cached with the first call. The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive so only do it on larger files, otherwise just try and read the extra bytes and resize the PyBytes as needeed. I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`: ```python3 openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3` fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0` ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` This does make some tradeoffs 1. If the file size changes between open() and readall(), this will still get all the data but might have more read calls. 2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat(). [0] ```python3 from pathlib import Path nlines = [] for filename in Path("cpython/Doc").glob("**/*.rst"): nlines.append(len(filename.read_text())) ``` [1] Before small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` After small file: ``` openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343 read(3, "", 1) = 0 close(3) = 0 ``` Before large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` After large file: ``` openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0 ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104 read(3, "", 1) = 0 close(3) = 0 ``` Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-117764: Add signatures for __reduce__ and __reduce_ex__ in the _io module ↵Serhiy Storchaka2024-04-121-2/+2
| | | | | | (GH-117773) __reduce__() does not have parameters, __reduce_ex__() has a single parameter.
* gh-82626: Emit a warning when bool is used as a file descriptor (GH-111275)Serhiy Storchaka2024-02-051-0/+7
|
* gh-114286: Fix `maybe-uninitialized` warning in `Modules/_io/fileio.c` ↵Nikita Sobolev2024-01-191-1/+1
| | | | (GH-114287)
* gh-66060: Use actual class name in _io type's __repr__ (#30824)AN Long2024-01-091-8/+9
| | | | | | Use the object's actual class name in the following _io type's __repr__: - FileIO - TextIOWrapper - _WindowsConsoleIO
* gh-110014: Include explicitly <unistd.h> header (#110155)Victor Stinner2023-09-301-15/+19
| | | | | | | | | | | * Remove unused <locale.h> includes. * Remove unused <fcntl.h> include in traceback.h. * Remove redundant <assert.h> and <stddef.h> includes. They are already included by "Python.h". * Remove <object.h> include in faulthandler.c. Python.h already includes it. * Add missing <stdbool.h> in pycore_pythread.h if HAVE_PTHREAD_STUBS is defined. * Fix also warnings in pthread_stubs.h: don't redefine macros if they are already defined, like the __NEED_pthread_t macro.
* gh-106320: Remove private _PyErr_ChainExceptions() (#108713)Victor Stinner2023-08-311-0/+1
| | | | | | | | | | | | | Remove _PyErr_ChainExceptions(), _PyErr_ChainExceptions1() and _PyErr_SetFromPyStatus() functions from the public C API. * Move the private _PyErr_ChainExceptions() and _PyErr_ChainExceptions1() function to the internal C API (pycore_pyerrors.h). * Move the private _PyErr_SetFromPyStatus() to the internal C API (pycore_initconfig.h). * No longer export the _PyErr_ChainExceptions() function. * Move run_in_subinterp_with_config() from _testcapi to _testinternalcapi.
* gh-107913: Fix possible losses of OSError error codes (GH-107930)Serhiy Storchaka2023-08-261-6/+6
| | | | | | Functions like PyErr_SetFromErrno() and SetFromWindowsErr() should be called immediately after using the C API which sets errno or the Windows error code.
* gh-108444: Replace _PyLong_AsInt() with PyLong_AsInt() (#108459)Victor Stinner2023-08-241-2/+2
| | | | | | Change generated by the command: sed -i -e 's!_PyLong_AsInt!PyLong_AsInt!g' \ $(find -name "*.c" -o -name "*.h")
* gh-106869: Use new PyMemberDef constant names (#106871)Victor Stinner2023-07-251-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Remove '#include "structmember.h"'. * If needed, add <stddef.h> to get offsetof() function. * Update Parser/asdl_c.py to regenerate Python/Python-ast.c. * Replace: * T_SHORT => Py_T_SHORT * T_INT => Py_T_INT * T_LONG => Py_T_LONG * T_FLOAT => Py_T_FLOAT * T_DOUBLE => Py_T_DOUBLE * T_STRING => Py_T_STRING * T_OBJECT => _Py_T_OBJECT * T_CHAR => Py_T_CHAR * T_BYTE => Py_T_BYTE * T_UBYTE => Py_T_UBYTE * T_USHORT => Py_T_USHORT * T_UINT => Py_T_UINT * T_ULONG => Py_T_ULONG * T_STRING_INPLACE => Py_T_STRING_INPLACE * T_BOOL => Py_T_BOOL * T_OBJECT_EX => Py_T_OBJECT_EX * T_LONGLONG => Py_T_LONGLONG * T_ULONGLONG => Py_T_ULONGLONG * T_PYSSIZET => Py_T_PYSSIZET * T_NONE => _Py_T_NONE * READONLY => Py_READONLY * PY_AUDIT_READ => Py_AUDIT_READ * READ_RESTRICTED => Py_AUDIT_READ * PY_WRITE_RESTRICTED => _Py_WRITE_RESTRICTED * RESTRICTED => (READ_RESTRICTED | _Py_WRITE_RESTRICTED)
* gh-106521: Remove _PyObject_LookupAttr() function (GH-106642)Serhiy Storchaka2023-07-121-1/+1
|
* gh-104922: remove PY_SSIZE_T_CLEAN (#106315)Inada Naoki2023-07-021-1/+0
|
* gh-105156: Deprecate the old Py_UNICODE type in C API (#105157)Victor Stinner2023-06-011-1/+1
| | | | | | | | Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files. Co-authored-by: Inada Naoki <songofacandy@gmail.com>
* gh-101819: Isolate `_io` (#101948)Erlend E. Aasland2023-05-151-1/+3
| | | | Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com> Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-101819: Refactor `_io` futher in preparation for module isolation (#104369)Erlend E. Aasland2023-05-111-22/+31
|
* gh-101819: Prepare to modernize the _io extension (#104178)Victor Stinner2023-05-051-4/+9
| | | | | | | | | | | | | | | | | | | | | * Add references to static types to _PyIO_State: * PyBufferedIOBase_Type * PyBytesIOBuffer_Type * PyIncrementalNewlineDecoder_Type * PyRawIOBase_Type * PyTextIOBase_Type * Add the defining class to methods: * _io.BytesIO.getbuffer() * _io.FileIO.close() * Add get_io_state_by_cls() function. * Add state parameter to _textiowrapper_decode() * _io_TextIOWrapper___init__() now sets self->state before calling _textiowrapper_set_decoder(). Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
* gh-102255: Improve build support for Windows API partitions (GH-102256)Max Bachmann2023-03-091-0/+2
| | | | | Add `MS_WINDOWS_DESKTOP`, `MS_WINDOWS_APPS`, `MS_WINDOWS_SYSTEM` and `MS_WINDOWS_GAMES` preprocessor definitions to allow switching off functionality missing from particular API partitions ("partitions" are used in Windows to identify overlapping subsets of APIs). CPython only officially supports `MS_WINDOWS_DESKTOP` and `MS_WINDOWS_SYSTEM` (APPS is included by normal desktop builds, but APPS without DESKTOP is not covered). Other configurations are a convenience for people building their own runtimes. `MS_WINDOWS_GAMES` is for the Xbox subset of the Windows API, which is also available on client OS, but is restricted compared to `MS_WINDOWS_DESKTOP`. These restrictions may change over time, as they relate to the build headers rather than the OS support, and so we assume that Xbox builds will use the latest available version of the GDK.
* gh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives ↵Irit Katriel2023-02-241-14/+17
| | | | (in Modules/) (#102196)
* gh-101819: Adapt _io types to heap types, batch 1 (GH-101949)Erlend E. Aasland2023-02-201-57/+33
| | | | | Adapt StringIO, TextIOWrapper, FileIO, Buffered*, and BytesIO types. Automerge-Triggered-By: GH:erlend-aasland
* bpo-15999: Accept arbitrary values for boolean parameters. (#15609)Serhiy Storchaka2022-12-031-2/+2
| | | builtins and extension module functions and methods that expect boolean values for parameters now accept any Python object rather than just a bool or int type. This is more consistent with how native Python code itself behaves.
* bpo-38031: Fix a possible assertion failure in _io.FileIO() (#GH-5688)Zackery Spytz2022-11-251-1/+5
|
* gh-92536: PEP 623: Remove wstr and legacy APIs from Unicode (GH-92537)Inada Naoki2022-05-121-9/+0
|
* bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized ↵Eric Snow2022-02-081-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | global objects. (gh-30928) We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code. It is still used in a number of non-builtin stdlib modules. The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime. A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings). https://bugs.python.org/issue46541#msg411799 explains the rationale for this change. The core of the change is in: * (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros * Include/internal/pycore_runtime_init.h - added the static initializers for the global strings * Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState * Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings. That check is added to the PR CI config. The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()). This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *. The following are not changed (yet): * stop using _Py_IDENTIFIER() in the stdlib modules * (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API * (maybe) intern the strings during runtime init https://bugs.python.org/issue46541
* bpo-45434: Move _Py_BEGIN_SUPPRESS_IPH to pycore_fileutils.h (GH-28922)Victor Stinner2021-10-131-1/+2
|
* bpo-36346: Make using the legacy Unicode C API optional (GH-21437)Serhiy Storchaka2020-07-101-0/+12
| | | | Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0 makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
* bpo-37999: No longer use __int__ in implicit integer conversions. (GH-15636)Serhiy Storchaka2020-05-261-10/+0
| | | | Only __index__ should be used to make integer conversions lossless.
* bpo-40268: Remove unused structmember.h includes (GH-19530)Victor Stinner2020-04-151-1/+1
| | | | | | If only offsetof() is needed: include stddef.h instead. When structmember.h is used, add a comment explaining that PyMemberDef is used.
* closes bpo-27805: Ignore ESPIPE in initializing seek of append-mode files. ↵Benjamin Peterson2019-11-121-9/+15
| | | | | (GH-17112) This change, which follows the behavior of C stdio's fdopen and Python 2's file object, allows pipes to be opened in append mode.
* bpo-37206: Unrepresentable default values no longer represented as None. ↵Serhiy Storchaka2019-09-141-3/+3
| | | | | | | (GH-13933) In ArgumentClinic, value "NULL" should now be used only for unrepresentable default values (like in the optional third parameter of getattr). "None" should be used if None is accepted as argument and passing None has the same effect as not passing the argument at all.
* bpo-37547: add _PyObject_CallMethodOneArg (GH-14685)Jeroen Demeyer2019-07-111-2/+2
|
* bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵Jeroen Demeyer2019-05-311-2/+2
| | | | | | | | | (GH-13464) Automatically replace tp_print -> tp_vectorcall_offset tp_compare -> tp_as_async tp_reserved -> tp_as_async
* bpo-32388: Remove cross-version binary compatibility requirement in tp_flags ↵Antoine Pitrou2019-05-291-3/+3
| | | | | | | | (GH-4944) It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags. This will reduce the risk of running out of bits in the 32-bit tp_flags value.
* bpo-36842: Implement PEP 578 (GH-12613)Steve Dower2019-05-231-0/+4
| | | Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
* bpo-35081: Add Include/internal/pycore_object.h (GH-10640)Victor Stinner2018-11-211-0/+1
| | | | Move _PyObject_GC_TRACK() and _PyObject_GC_UNTRACK() from Include/objimpl.h to Include/internal/pycore_object.h.
* bpo-33138: Change standard error message for non-pickleable and non-copyable ↵Serhiy Storchaka2018-10-311-9/+0
| | | | types. (GH-6239)
* bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)Stéphane Wirtel2018-10-171-5/+3
| | | On macOS, fix reading from and writing into a file with a size larger than 2 GiB.
* bpo-33012: Fix invalid function cast warnings with gcc 8 for METH_NOARGS. ↵Siddhesh Poyarekar2018-04-291-1/+1
| | | | | | | | | (GH-6030) METH_NOARGS functions need only a single argument but they are cast into a PyCFunction, which takes two arguments. This triggers an invalid function cast warning in gcc8 due to the argument mismatch. Fix this by adding a dummy unused argument.
* bpo-32571: Avoid raising unneeded AttributeError and silencing it in C code ↵Serhiy Storchaka2018-01-251-5/+3
| | | | | (GH-5222) Add two new private APIs: _PyObject_LookupAttr() and _PyObject_LookupAttrId()
* bpo-32186: Release the GIL during lseek and fstat (#4652)Nir Soffer2017-12-011-1/+5
| | | | | | In _io_FileIO_readall_impl(), lseek() and _Py_fstat_noraise() were called without releasing the GIL. This can cause all threads to hang for unlimited time when calling FileIO.read() and the NFS server is not accessible.
* Replace KB unit with KiB (#4293)Victor Stinner2017-11-081-1/+1
| | | | | | | | | | | kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte") means 1024 bytes. KB was misused: replace kB or KB with KiB when appropriate. Same change for MB and GB which become MiB and GiB. Change the output of Tools/iobench/iobench.py. Round also the size of the documentation from 5.5 MB to 5 MiB.
* [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)Serhiy Storchaka2017-06-281-2/+1
| | | | | | | Based on patch by Victor Stinner. Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.
* bpo-30228: FileIO seek() and tell() set seekable (#1384)Victor Stinner2017-05-021-15/+21
| | | | | | | FileIO.seek() and FileIO.tell() method now set the internal seekable attribute to avoid one syscall on open() (in buffered or text mode). The seekable property is now also more reliable since its value is set correctly on memory allocation failure.
* bpo-30022: Get rid of using EnvironmentError and IOError (except test… (#1051)Serhiy Storchaka2017-04-161-4/+4
|
* bpo-29852: Argument Clinic Py_ssize_t converter now supports None (#716)Serhiy Storchaka2017-03-301-9/+2
| | | if pass `accept={int, NoneType}`.