summaryrefslogtreecommitdiffstats
path: root/Python/fileutils.c
Commit message (Collapse)AuthorAgeFilesLines
* bpo-42236: os.device_encoding() respects UTF-8 Mode (GH-23119)Victor Stinner2020-11-041-11/+7
| | | | On Unix, the os.device_encoding() function now returns 'UTF-8' rather than the device encoding if the Python UTF-8 Mode is enabled.
* bpo-42236: Use UTF-8 encoding if nl_langinfo(CODESET) fails (GH-23086)Victor Stinner2020-11-011-26/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the nl_langinfo(CODESET) function returns an empty string, Python now uses UTF-8 as the filesystem encoding. In May 2010 (commit b744ba1d14c5487576c95d0311e357b707600b47), I modified Python to log a warning and use UTF-8 as the filesystem encoding (instead of None) if nl_langinfo(CODESET) returns an empty string. In August 2020 (commit 94908bbc1503df830d1d615e7b57744ae1b41079), I modified Python startup to fail with a fatal error and a specific error message if nl_langinfo(CODESET) returns an empty string. The intent was to prevent guessing the encoding and also investigate user configuration where this case happens. In 10 years (2010 to 2020), I saw zero user report about the error message related to nl_langinfo(CODESET) returning an empty string. Today, UTF-8 became the defacto standard and it's safe to make the assumption that the user expects UTF-8. For example, nl_langinfo(CODESET) can return an empty string on macOS if the LC_CTYPE locale is not supported, and UTF-8 is the default encoding on macOS. While this change is likely to not affect anyone in practice, it should make UTF-8 lover happy ;-) Rewrite also the documentation explaining how Python selects the filesystem encoding and error handler.
* bpo-42236: Enhance _locale._get_locale_encoding() (GH-23083)Victor Stinner2020-11-011-15/+59
| | | | | | | | | * Rename _Py_GetLocaleEncoding() to _Py_GetLocaleEncodingObject() * Add _Py_GetLocaleEncoding() which returns a wchar_t* string to share code between _Py_GetLocaleEncodingObject() and config_get_locale_encoding(). * _Py_GetLocaleEncodingObject() now decodes nl_langinfo(CODESET) from the current locale encoding with surrogateescape, rather than using UTF-8.
* bpo-42208: Add _Py_GetLocaleEncoding() (GH-23050)Victor Stinner2020-10-311-1/+42
| | | | | | | | _io.TextIOWrapper no longer calls getpreferredencoding(False) of _bootlocale to get the locale encoding, but calls _Py_GetLocaleEncoding() instead. Add config_get_fs_encoding() sub-function. Reorganize also config_get_locale_encoding() code.
* bpo-38324: Fix test__locale.py Windows failures (GH-20529)TIGirardi2020-10-201-2/+13
| | | | Use wide-char _W_* fields of lconv structure on Windows Remove "ps_AF" from test__locale.known_numerics on Windows
* bpo-40422: Move _Py_closerange to fileutils.c (GH-22680)Kyle Evans2020-10-131-0/+76
| | | | | | | This API is relatively lightweight and organizationally, given that it's used by multiple modules, it makes sense to move it to fileutils. Requires making sure that _posixsubprocess is compiled with the appropriate Py_BUIILD_CORE_BUILTIN macro.
* bpo-36346: Make using the legacy Unicode C API optional (GH-21437)Serhiy Storchaka2020-07-101-4/+19
| | | | Add compile time option USE_UNICODE_WCHAR_CACHE. Setting it to 0 makes the interpreter not using the wchar_t cache and the legacy Unicode C API.
* bpo-41094: Fix decoding errors with audit when open files. (GH-21095)Serhiy Storchaka2020-06-241-4/+19
|
* bpo-40957: Fix refleak in _Py_fopen_obj() (GH-20827)Christian Heimes2020-06-131-0/+1
| | | Signed-off-by: Christian Heimes <christian@python.org>
* bpo-40268: Remove unused osdefs.h includes (GH-19532)Victor Stinner2020-04-151-1/+1
| | | When the include is needed, add required symbol in a comment.
* bpo-39943: Add the const qualifier to pointers on non-mutable PyBytes data. ↵Serhiy Storchaka2020-04-121-1/+1
| | | | (GH-19472)
* bpo-38353: Add subfunctions to getpath.c (GH-16572)Victor Stinner2019-10-041-4/+6
| | | | | | | | | | | | | | Following symbolic links is now limited to 40 attempts, just to prevent loops. Add subfunctions: * Add resolve_symlinks() * Add calculate_argv0_path_framework() * Add calculate_which() * Add calculate_program_macos() Fix also _Py_wreadlink(): readlink() result type is Py_ssize_t, not int.
* bpo-37549: os.dup() fails for standard streams on Windows 7 (GH-15389)Zackery Spytz2019-08-231-1/+8
|
* bpo-37834: Normalise handling of reparse points on Windows (GH-15231)Steve Dower2019-08-211-1/+6
| | | | | | | | | | bpo-37834: Normalise handling of reparse points on Windows * ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed) * nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point) * nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour) * nt.readlink() will read destinations for symlinks and junction points only bpo-1311: os.path.exists('nul') now returns True on Windows * nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
* bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053)Victor Stinner2019-06-251-0/+97
| | | | | | | | | | | | Python now gets the absolute path of the script filename specified on the command line (ex: "python3 script.py"): the __file__ attribute of the __main__ module, sys.argv[0] and sys.path[0] become an absolute path, rather than a relative path. * Add _Py_isabs() and _Py_abspath() functions. * _PyConfig_Read() now tries to get the absolute path of run_filename, but keeps the relative path if _Py_abspath() fails. * Reimplement os._getfullpathname() using _Py_abspath(). * Use _Py_isabs() in getpath.c.
* bpo-37267: Do not check for FILE_TYPE_CHAR in os.dup() on Windows (GH-14051)Zackery Spytz2019-06-171-12/+5
| | | | On Windows, os.dup() no longer creates an inheritable fd when handling a character file.
* bpo-36842: Implement PEP 578 (GH-12613)Steve Dower2019-05-231-0/+18
| | | Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
* bpo-36775: Add _Py_FORCE_UTF8_FS_ENCODING macro (GH-13056)Victor Stinner2019-05-021-8/+8
| | | | | | | Add _Py_FORCE_UTF8_LOCALE and _Py_FORCE_UTF8_FS_ENCODING macros to avoid factorize "#if defined(__ANDROID__) || defined(__VXWORKS__)" and "#if defined(__APPLE__)". Cleanup also config_init_fs_encoding().
* bpo-36352: Avoid hardcoded MAXPATHLEN size in getpath.c (GH-12423)Victor Stinner2019-03-191-1/+1
| | | | * Use Py_ARRAY_LENGTH() rather than hardcoded MAXPATHLEN in getpath.c. * Pass string length to functions modifying strings.
* bpo-36352: Clarify fileutils.h documentation (GH-12406)Victor Stinner2019-03-181-14/+21
| | | | | | | The last parameter of _Py_wreadlink(), _Py_wrealpath() and _Py_wgetcwd() is a length, not a size: number of characters including the trailing NUL character. Enhance also documentation of error conditions.
* bpo-31904: Add encoding support for VxWorks RTOS (GH-12051)pxinwr2019-03-041-3/+3
| | | | | | | | Use UTF-8 as the system encoding on VxWorks. The main reason are: 1. The locale is frequently misconfigured. 2. Missing some functions to deal with locale in VxWorks C library.
* bpo-34523: Fix C locale coercion on FreeBSD CURRENT (GH-10672)Victor Stinner2018-11-231-0/+13
| | | | | | | | bpo-34523, bpo-35290: C locale coercion now resets the Python internal "force ASCII" mode. This change fix the filesystem encoding on FreeBSD CURRENT, which has a new "C.UTF-8" locale, when the UTF-8 mode is disabled. Add _Py_ResetForceASCII(): _Py_SetLocaleFromEnv() now calls it.
* bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606)Victor Stinner2018-11-201-27/+18
| | | | | | | | | | | | | | | | | | | | | locale.localeconv() now sets temporarily the LC_CTYPE locale to the LC_MONETARY locale if the two locales are different and monetary strings are non-ASCII. This temporary change affects other threads. Changes: * locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode monetary fields. * Add LocaleInfo.grouping_buffer: copy localeconv() grouping string since it can be replaced anytime if a different thread calls localeconv(). * _Py_GetLocaleconvNumeric() now requires a "struct lconv *" structure, so locale.localeconv() now longer calls localeconv() twice. Moreover, the function now requires all arguments to be non-NULL. * Rename STATIC_LOCALE_INFO_INIT to LocaleInfo_STATIC_INIT. * Move _Py_GetLocaleconvNumeric() definition from fileutils.h to pycore_fileutils.h. pycore_fileutils.h now includes locale.h. * The _locale module is now built with Py_BUILD_CORE defined.
* bpo-35081: Add pycore_fileutils.h (GH-10371)Victor Stinner2018-11-061-0/+1
| | | | Move Py_BUILD_CORE code from Include/fileutils.h to a new Include/internal/pycore_fileutils.h file.
* bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)Stéphane Wirtel2018-10-171-19/+5
| | | On macOS, fix reading from and writing into a file with a size larger than 2 GiB.
* bpo-34523: Support surrogatepass in locale codecs (GH-8995)Victor Stinner2018-08-291-25/+87
| | | | | | | | | | | | | | | | | | | | Add support for the "surrogatepass" error handler in PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the UTF-8 encoding. Changes: * _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the surrogatepass error handler (_Py_ERROR_SURROGATEPASS). * _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the _Py_error_handler enum instead of "int surrogateescape" to pass the error handler. These functions now return -3 if the error handler is unknown. * Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() in test_codecs. * Rename get_error_handler() to _Py_GetErrorHandler() and expose it as a private function. * _freeze_importlib doesn't need config.filesystem_errors="strict" workaround anymore.
* bpo-34523: Py_DecodeLocale() use UTF-8 on Windows (GH-8998)Victor Stinner2018-08-291-4/+12
| | | | | | | Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on Windows if Py_LegacyWindowsFSEncodingFlag is zero. pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its loop, but restore its value at exit.
* bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)Victor Stinner2018-08-281-33/+71
| | | | | | | | On HP-UX with C or POSIX locale, sys.getfilesystemencoding() now returns "ascii" instead of "roman8" (when the UTF-8 Mode is disabled and the C locale is not coerced). nl_langinfo(CODESET) announces "roman8" whereas it uses the Latin1 encoding in practice.
* bpo-34527: POSIX locale enables the UTF-8 Mode (GH-8972)Victor Stinner2018-08-281-1/+1
| | | | | | | | | | * The UTF-8 Mode is now also enabled by the "POSIX" locale, not only by the "C" locale. * On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also forces the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if the LC_CTYPE locale is "C". * test_utf8_mode.test_cmd_line() checks also that the command line arguments are decoded from UTF-8 when the the UTF-8 Mode is enabled with POSIX locale or C locale.
* Spelling fixes to docs, docstrings, and comments (GH-6374)Ville Skyttä2018-04-201-1/+1
|
* bpo-32869: Fix incorrect dst buffer size for MultiByteToWideChar (#5739)Alexey Izbyshev2018-02-181-1/+2
| | | | This function expects the destination buffer size to be given in wide characters, not bytes.
* bpo-32777: Fix _Py_set_inheritable async-safety in subprocess (GH-5560)Alexey Izbyshev2018-02-061-3/+16
| | | | | | | Fix a rare but potential pre-exec child process deadlock in subprocess on POSIX systems when marking file descriptors inheritable on exec in the child process. This bug appears to have been introduced in 3.4 with the inheritable file descriptors support. This also changes Python/fileutils.c `set_inheritable` to use the "slow" two `fcntl` syscall path instead of the "fast" single `ioctl` syscall path when asked to be async signal safe (by way of being asked not to raise exceptions). `ioctl` is not a POSIX async-signal-safe approved function. ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
* bpo-29240: PyUnicode_DecodeLocale() uses UTF-8 on Android (#5272)Victor Stinner2018-01-221-0/+10
| | | | | | | | PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale() now use always use the UTF-8 encoding on Android, instead of the current locale encoding. On Android API 19, mbstowcs() and wcstombs() are broken and cannot be used.
* bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174)Victor Stinner2018-01-151-0/+77
| | | | | | | | * Add _Py_GetLocaleconvNumeric() function: decode decimal_point and thousands_sep fields of localeconv() from the LC_NUMERIC encoding, rather than decoding from the LC_CTYPE encoding. * Modify locale.localeconv() and "n" formatter of str.format() (for int, float and complex to use _Py_GetLocaleconvNumeric() internally.
* bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)Victor Stinner2018-01-151-139/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | Modify locale.localeconv(), time.tzname, os.strerror() and other functions to ignore the UTF-8 Mode: always use the current locale encoding. Changes: * Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or encoding error, they return the position of the error and an error message which are used to raise Unicode errors in PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale(). * Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx(). * PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all cases, especially for the strict error handler. * Add _Py_DecodeUTF8Ex(): return more information on decoding error and supports the strict error handler. * Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex(). * Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx(). * Ignore the UTF-8 mode to encode/decode localeconv(), strerror() and time zone name. * Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use the "current" locale. * Remove _PyUnicode_DecodeCurrentLocale(), _PyUnicode_DecodeCurrentLocaleAndSize() and _PyUnicode_EncodeCurrentLocale().
* bpo-29240: readline now ignores the UTF-8 Mode (#5145)Victor Stinner2018-01-101-28/+52
| | | | | | | | | | | | Add new fuctions ignoring the UTF-8 mode: * _Py_DecodeCurrentLocale() * _Py_EncodeCurrentLocale() * _PyUnicode_DecodeCurrentLocaleAndSize() * _PyUnicode_EncodeCurrentLocale() Modify the readline module to use these functions. Re-enable test_readline.test_nonascii().
* bpo-32030: Add _Py_FindEnvConfigValue() (#4963)Victor Stinner2017-12-211-10/+6
| | | | | | | | | | | | | | | Add a new _Py_FindEnvConfigValue() function: code shared between Windows and Unix implementations of _PyPathConfig_Calculate() to read the pyenv.cfg file. _Py_FindEnvConfigValue() now uses _Py_DecodeUTF8_surrogateescape() instead of using a Python Unicode string, the Python API must not be used early during Python initialization. Same change in Unix search_for_exec_prefix(): use _Py_DecodeUTF8_surrogateescape(). Cleanup also encode_current_locale(): PyMem_RawFree/PyMem_Free can be called with NULL. Fix also "NUL byte" => "NULL byte" typo.
* bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)Victor Stinner2017-12-211-32/+72
| | | | | | | | | | | | Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in: * _Py_wfopen() * _Py_wreadlink() * _Py_wrealpath() * _Py_wstat() * pymain_open_filename() These functions are called early during Python intialization, only the RAW memory allocator must be used.
* bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)Victor Stinner2017-12-211-38/+4
| | | | | Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead of using temporary unicode and bytes objects. So Py_EncodeLocale() doesn't use the Python C API anymore.
* bpo-29240, bpo-32030: Py_Main() re-reads config if encoding changes (#4899)Victor Stinner2017-12-161-2/+2
| | | | | | | | | | | | | | | | | | bpo-29240, bpo-32030: If the encoding change (C locale coerced or UTF-8 Mode changed), Py_Main() now reads again the configuration with the new encoding. Changes: * Add _Py_UnixMain() called by main(). * Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be called multipled times. * Rename pymain_parse_cmdline_envvars() to pymain_read_conf(). * Py_Main() now clears orig_argc and orig_argv at exit. * Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is no need anymore to get two copies of the wchar_t** argv. * _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn. * Py_UTF8Mode is now initialized to -1. * Locale coercion (PEP 538) now respects -I and -E options.
* bpo-29240: Don't define decode_locale() on macOS (#4895)Victor Stinner2017-12-151-0/+4
| | | Don't define decode_locale() nor encode_locale() on macOS or Android.
* bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)Victor Stinner2017-12-131-74/+100
| | | | | | | | | | | | | | | | | | | | | | * Add -X utf8 command line option, PYTHONUTF8 environment variable and a new sys.flags.utf8_mode flag. * If the LC_CTYPE locale is "C" at startup: enable automatically the UTF-8 mode. * Add _winapi.GetACP(). encodings._alias_mbcs() now calls _winapi.GetACP() to get the ANSI code page * locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 mode. As a side effect, open() now uses the UTF-8 encoding by default in this mode. * Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding in the UTF-8 Mode. * Update subprocess._args_from_interpreter_flags() to handle -X utf8 * Skip some tests relying on the current locale if the UTF-8 mode is enabled. * Add test_utf8mode.py. * _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to return also the length (number of wide characters). * pymain_get_global_config() and pymain_set_global_config() now always copy flag values, rather than only copying if the new value is greater than the old value.
* Replace KB unit with KiB (#4293)Victor Stinner2017-11-081-4/+4
| | | | | | | | | | | kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte") means 1024 bytes. KB was misused: replace kB or KB with KiB when appropriate. Same change for MB and GB which become MiB and GiB. Change the output of Tools/iobench/iobench.py. Round also the size of the documentation from 5.5 MB to 5 MiB.
* bpo-31370: Remove support for threads-less builds (#3385)Antoine Pitrou2017-09-071-14/+0
| | | | | | * Remove Setup.config * Always define WITH_THREAD for compatibility.
* [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)Serhiy Storchaka2017-06-281-6/+17
| | | | | | | Based on patch by Victor Stinner. Add private C API function _PyUnicode_AsUnicode() which is similar to PyUnicode_AsUnicode(), but checks for null characters.
* bpo-29619: Convert st_ino using unsigned integer (#557)Victor Stinner2017-03-091-2/+2
| | | | bpo-29619: os.stat() and os.DirEntry.inodeo() now convert inode (st_ino) using unsigned integers.
* Issue #26919: On Android, operating system data is now always encoded/decodedXavier de Gaye2016-12-151-5/+5
| | | | | to/from UTF-8, instead of the locale encoding to avoid inconsistencies with os.fsencode() and os.fsdecode() which are already using UTF-8.
* Issue #28746: Fix the set_inheritable() file descriptor method on platformsXavier de Gaye2016-11-191-1/+1
| | | | that do not have the ioctl FIOCLEX and FIONCLEX commands
* Fix check_force_ascii()Victor Stinner2016-09-101-8/+9
| | | | | Issue #27938: Normalize aliases of the ASCII encoding, because _Py_normalize_encoding() now correctly normalize encoding names.
* Issue #23524: Finish removing _PyVerify_fd from sourcesSteve Dower2016-09-081-124/+4
|