path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* gh-142217: Remove internal _Py_Identifier functions (#142219) | Victor Stinner | 2025-12-03 | 1 | -41/+0
    Remove internal functions:

    * _PyDict_ContainsId()
    * _PyDict_DelItemId()
    * _PyDict_GetItemIdWithError()
    * _PyDict_SetItemId()
    * _PyEval_GetBuiltinId()
    * _PyObject_CallMethodIdNoArgs()
    * _PyObject_CallMethodIdObjArgs()
    * _PyObject_CallMethodIdOneArg()
    * _PyObject_VectorcallMethodId()
    * _PyUnicode_EqualToASCIIId()

    These functions were not exported and so were not usable outside CPython.
* gh-141070: Add PyUnstable_Object_Dump() function (#141072) | Victor Stinner | 2025-11-18 | 1 | -1/+2
    * Promote _PyObject_Dump() as a public function.
    * Keep _PyObject_Dump() alias to PyUnstable_Object_Dump() for backward compatibility.
    * Replace _PyObject_Dump() with PyUnstable_Object_Dump().

    Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
    Co-authored-by: Kumar Aditya <kumaraditya@python.org>
    Co-authored-by: Petr Viktorin <encukou@gmail.com>
* gh-55531: Implement `normalize_encoding` in C (#136643) | Stan Ulbrych | 2025-10-30 | 1 | -7/+8
    Closes gh-55531
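For orientation, the function reimplemented here is reachable from Python as `encodings.normalize_encoding()`. A minimal sketch of its documented behaviour, in which runs of non-alphanumeric characters other than the dot collapse into a single underscore. The sample names are chosen for illustration only and say nothing about the C code itself:

```python
import encodings

# Hyphens (and other punctuation) collapse into a single underscore.
samples = ["utf-8", "latin-1", "iso8859-15"]
normalized = [encodings.normalize_encoding(name) for name in samples]
print(normalized)
```

The normalized form is what the codec search function uses to locate an encoding module, which is why the spelling of the separator does not matter to callers.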
* gh-139353: Add Objects/unicode_writer.c file (#139911) | Victor Stinner | 2025-10-30 | 1 | -638/+29
    Move the public PyUnicodeWriter API and the private _PyUnicodeWriter API to a new Objects/unicode_writer.c file.

    Rename a few helper functions to share them between unicodeobject.c and unicode_writer.c, such as resize_compact() or unicode_result().
* gh-129117: Add unicodedata.isxidstart() function (#140269) | Stan Ulbrych | 2025-10-30 | 1 | -0/+1
    Expose `_PyUnicode_IsXidContinue/Start` in `unicodedata`: add isxidstart() and isxidcontinue() functions.

    Co-authored-by: Victor Stinner <vstinner@python.org>
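`unicodedata.isxidstart()` and `isxidcontinue()` exist only in interpreters that include this change. On older versions the same properties can be observed indirectly, because `str.isidentifier()` checks the first character against XID_Start (or `_`) and the rest against XID_Continue. A rough sketch using that as a stand-in; the two helper names are hypothetical and are not the new API:

```python
def looks_like_xid_start(ch: str) -> bool:
    # Hypothetical helper: a one-character identifier must be "_" or XID_Start,
    # so excluding "_" leaves the XID_Start check.
    return ch != "_" and ch.isidentifier()


def looks_like_xid_continue(ch: str) -> bool:
    # Hypothetical helper: after a known start character ("a"),
    # the next character only has to be XID_Continue.
    return ("a" + ch).isidentifier()


print(looks_like_xid_start("a"), looks_like_xid_start("1"))
print(looks_like_xid_continue("1"), looks_like_xid_continue("-"))
```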
* Remove dead stores to 'size' in UTF-8 decoder (unicodeobject.c) (#140637) | Shamil | 2025-10-27 | 1 | -2/+0
* gh-111489: Remove _PyTuple_FromArray() alias (#139973) | Victor Stinner | 2025-10-11 | 1 | -2/+1
    Replace _PyTuple_FromArray() with PyTuple_FromArray(). Remove pycore_tuple.h includes.
* gh-139353: Add Objects/unicode_format.c file (#139491) | Victor Stinner | 2025-10-10 | 1 | -972/+8
    * Move PyUnicode_Format() implementation from unicodeobject.c to unicode_format.c.
    * Replace unicode_modifiable() with _PyUnicode_IsModifiable()
    * Add empty lines to have two empty lines between functions.
* gh-139353: Rename formatter_unicode.c to unicode_formatter.c (#139723) | Victor Stinner | 2025-10-08 | 1 | -178/+8
    * Move Python/formatter_unicode.c to Objects/unicode_formatter.c.
    * Move Objects/stringlib/localeutil.h content into unicode_formatter.c. Remove localeutil.h.
    * Move _PyUnicode_InsertThousandsGrouping() to unicode_formatter.c and mark the function as static.
    * Rename unicode_fill() to _PyUnicode_Fill() and export it in pycore_unicodeobject.h.
    * Move MAX_UNICODE to pycore_unicodeobject.h as _Py_MAX_UNICODE.
* gh-139156: Optimize _PyUnicode_EncodeCharmap() (#139306) | Victor Stinner | 2025-09-25 | 1 | -14/+61
    Specialize _PyUnicode_EncodeCharmap() for EncodingMapType which is used by Python codecs such as iso8859_15.
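To see which code path this touches from the Python side: charmap codecs such as `iso8859_15` build their `encoding_table` with `codecs.charmap_build()` and pass it to `codecs.charmap_encode()`. A small sketch of that call; the sample text is an arbitrary choice, and nothing here measures the optimization:

```python
import codecs
import encodings.iso8859_15 as latin9

# encoding_table is the compact mapping object produced by codecs.charmap_build().
table = latin9.encoding_table

text = "caf\u00e9 \u20ac"  # "café €"; the euro sign maps to 0xA4 in ISO 8859-15
encoded, consumed = codecs.charmap_encode(text, "strict", table)
print(encoded, consumed)
```

Calling `text.encode("iso8859_15")` goes through the same table and should give the same bytes.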
* gh-139156: Optimize the UTF-7 encoder (#139253) | Victor Stinner | 2025-09-24 | 1 | -10/+5
    Remove base64SetO and base64WhiteSpace parameters.
* gh-139156: Use PyBytesWriter in PyUnicode_EncodeCodePage() (#139259) | Victor Stinner | 2025-09-24 | 1 | -51/+36
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-139156: Use PyBytesWriter in _PyUnicode_EncodeCharmap() (#139251) | Victor Stinner | 2025-09-24 | 1 | -54/+51
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.

    Add _PyBytesWriter_GetSize() and _PyBytesWriter_GetData() static inline functions.
* gh-129813, PEP 782: Use PyBytesWriter in utf8_encoder() (#138874) | Victor Stinner | 2025-09-23 | 1 | -54/+63
    Replace the private _PyBytesWriter API with the new public PyBytesWriter API in utf8_encoder() and unicode_encode_ucs1().
* gh-139156: Use PyBytesWriter in PyUnicode_AsRawUnicodeEscapeString() (#139250) | Victor Stinner | 2025-09-22 | 1 | -24/+13
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-139156: Use PyBytesWriter in UTF-16 encoder (#139233) | Victor Stinner | 2025-09-22 | 1 | -52/+52
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-139156: Use PyBytesWriter in PyUnicode_AsUnicodeEscapeString() (#139249) | Victor Stinner | 2025-09-22 | 1 | -30/+16
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-139156: Use PyBytesWriter in the UTF-7 encoder (#139248) | Victor Stinner | 2025-09-22 | 1 | -24/+16
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-139156: Use PyBytesWriter in UTF-32 encoder (#139157) | Victor Stinner | 2025-09-22 | 1 | -50/+59
    Replace PyBytes_FromStringAndSize() and _PyBytes_Resize() with the PyBytesWriter API.
* gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) (#138830) | Victor Stinner | 2025-09-13 | 1 | -4/+4
    Replace PyBytes_FromStringAndSize(NULL, 0) with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES). Py_GetConstant() cannot fail.
* gh-137210: Add a struct, slot & function for checking an extension's ABI (GH-137212) | Petr Viktorin | 2025-09-05 | 1 | -3/+11
    Co-authored-by: Steve Dower <steve.dower@microsoft.com>
* GH-137623: Use an AC decorator for docstring line length enforcement (#137690) | Adam Turner | 2025-08-18 | 1 | -31/+62
* gh-137514: Add a free-threading wrapper for mutexes (GH-137515) | Peter Bierma | 2025-08-07 | 1 | -13/+7
    Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.
* gh-58124: Avoid CP_UTF8 in UnicodeDecodeError (#137415) | Victor Stinner | 2025-08-06 | 1 | -4/+0
    Fix name of the Python encoding in Unicode errors of the code page codec: use "cp65000" and "cp65001" instead of "CP_UTF7" and "CP_UTF8" which are not valid Python code names.
* gh-132661: Disallow `Template`/`str` concatenation after PEP 750 spec update (#135996) | Dave Peck | 2025-07-21 | 1 | -11/+4
    Co-authored-by: sobolevn <mail@sobolevn.me>
    Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
* gh-134891: Add PyUnstable_Unicode_GET_CACHED_HASH (GH-134892) | Petr Viktorin | 2025-06-06 | 1 | -5/+1
* gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973) | Victor Stinner | 2025-05-29 | 1 | -0/+14
    Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().

    Unrelated change to please the linter: remove an unused import in test_ctypes.

    Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* gh-133968: Add fast path to PyUnicodeWriter_WriteStr() (#133969) | Victor Stinner | 2025-05-13 | 1 | -1/+6
    Don't call PyObject_Str() if the input type is str.
* gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) | Serhiy Storchaka | 2025-05-12 | 1 | -18/+28
    If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer to the decoded data will become invalid after destroying that temporary bytes object. So we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

    _PyBytes_DecodeEscape() does not have such an issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
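The error-handler path described above can be reached from Python by decoding a malformed escape with a registered handler. A minimal sketch, assuming a handler named `record_and_replace` (a made-up name) that records what the decoder hands it and substitutes `?`. It only exercises the path and does not reproduce the use-after-free:

```python
import codecs

seen = []


def record_and_replace(exc):
    # exc is a UnicodeDecodeError; exc.object is the bytes object
    # the decoder attached to it.
    seen.append((type(exc.object), exc.start, exc.end))
    # Substitute "?" and resume decoding after the bad escape.
    return ("?", exc.end)


codecs.register_error("record_and_replace", record_and_replace)

# "\x4" is a truncated \xXX escape, so the decoder calls the handler.
result = b"ab\\x4".decode("unicode_escape", "record_and_replace")
print(result, seen)
```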
* gh-133610: Remove PyUnicode_AsDecoded/Encoded functions (#133612) | Stan Ulbrych | 2025-05-09 | 1 | -29/+4
* gh-128972: Add `_Py_ALIGN_AS` and revert `PyASCIIObject` memory layout. (GH-133085) | Petr Viktorin | 2025-05-02 | 1 | -3/+3
    Add `_Py_ALIGN_AS` as per C API WG vote: https://github.com/capi-workgroup/decisions/issues/61
    This patch only adds it to free-threaded builds; the `#ifdef Py_GIL_DISABLED` can be removed in the future.

    Use this to revert `PyASCIIObject` memory layout for non-free-threaded builds. The long-term plan is to deprecate the entire struct; until that happens it's better to keep it unchanged, as courtesy to people that rely on it despite it not being stable ABI.
* gh-132661: Implement PEP 750 (#132662) | Lysandros Nikolaou | 2025-04-30 | 1 | -4/+11
    Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
    Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
    Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
    Co-authored-by: Wingy <git@wingysam.xyz>
    Co-authored-by: Koudai Aono <koxudaxi@gmail.com>
    Co-authored-by: Dave Peck <davepeck@gmail.com>
    Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
    Co-authored-by: Paul Everitt <pauleveritt@me.com>
    Co-authored-by: sobolevn <mail@sobolevn.me>
* gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039) | Donghee Na | 2025-04-29 | 1 | -18/+25
    Co-authored-by: Kumar Aditya <kumaraditya@python.org>
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
* gh-132798: Schedule removal of `PyUnicode_AsDecoded/Encoded` functions for 3.15 (#132799) | Stan Ulbrych | 2025-04-25 | 1 | -4/+8
    Co-authored-by: Victor Stinner <vstinner@python.org>
* gh-103997: Automatically dedent the argument to "-c" (#103998) | Jon Crall | 2025-04-18 | 1 | -0/+157
    Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
    Co-authored-by: Kirill Podoprigora <80244920+Eclips4@users.noreply.github.com>
    Co-authored-by: Inada Naoki <songofacandy@gmail.com>
    Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
    Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
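The C dedent helper added by this commit is not reproduced here. As a loose analogy only, `textwrap.dedent()` shows the kind of transformation involved, namely stripping the common leading whitespace so that an indented snippet compiles. The exact rules applied to the `-c` argument may differ from `textwrap.dedent()`:

```python
import textwrap

indented = """
    import sys
    if sys.version_info >= (3,):
        print("ok")
"""

# Remove the indentation shared by all non-blank lines.
dedented = textwrap.dedent(indented)

# The dedented source compiles; the indented original would raise IndentationError.
code = compile(dedented, "<string>", "exec")
print(dedented)
```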
* gh-111178: remove redundant casts for functions with correct signatures (#131673) | Bénédikt Tran | 2025-04-01 | 1 | -14/+14
* gh-131238: Remove many includes from pycore_interp.h (#131472) | Victor Stinner | 2025-03-19 | 1 | -0/+1
* GH-131238: Core header refactor (GH-131250) | Mark Shannon | 2025-03-17 | 1 | -0/+1
    * Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h
    * Removes many cross-header dependencies
* GH-127705: Fix _Py_RefcntAdd to handle objects becoming immortal (GH-131140) | Mark Shannon | 2025-03-12 | 1 | -1/+1
* gh-111178: Fix function signatures of unicodeiter (#130684) | Victor Stinner | 2025-03-04 | 1 | -19/+23
* gh-130790: Remove references about unicode's readiness from comments (#130801) | Sergey Miryanov | 2025-03-03 | 1 | -1/+0
* gh-87790: support thousands separators for formatting fractional part of floats (#125304) | Sergey B Kirpichev | 2025-02-25 | 1 | -6/+7
    ```pycon
    >>> f"{123_456.123_456:_._f}"  # Whole and fractional
    '123_456.123_456'
    >>> f"{123_456.123_456:_f}"    # Integer component only
    '123_456.123456'
    >>> f"{123_456.123_456:._f}"   # Fractional component only
    '123456.123_456'
    >>> f"{123_456.123_456:.4_f}"  # with precision
    '123456.1_235'
    ```
* gh-129701: Fix a data race in `intern_common` in the free threaded build (GH-130089) | Sam Gross | 2025-02-17 | 1 | -6/+14
    * gh-129701: Fix a data race in `intern_common` in the free threaded build
    * Use a mutex to avoid potentially returning a non-immortalized string, because immortalization happens after the insertion into the interned dict.
    * Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)` for thread-safety. This code path isn't performance sensitive, so just use `Py_DECREF()` unconditionally for simplicity.
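The property this fix protects is that every thread interning an equal string gets the same object back. A small sketch using `sys.intern()` from several threads; the thread count and the string are arbitrary, and on a build with the GIL this passes trivially, so it illustrates the invariant and does not reproduce the race:

```python
import sys
import threading

results = []
results_lock = threading.Lock()


def worker():
    # Build the string at run time so each thread starts with its own copy,
    # then ask for the canonical interned object.
    s = sys.intern("".join(["data_", "race_", "demo"]))
    with results_lock:
        results.append(s)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread should have received the very same object.
all_same = all(s is results[0] for s in results)
print(all_same)
```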
* gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130118) | Stan Ulbrych | 2025-02-14 | 1 | -4/+3
    We had the definition of what makes a character "printable" documented in three places, giving two different definitions.

    The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

    With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database. That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

    Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

    Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

    Also add a thorough test that the implementation agrees with this definition.

    Author: Greg Price <gnprice@gmail.com>
    Co-authored-by: Greg Price <gnprice@gmail.com>
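The definition discussed above can be checked directly: a character is non-printable when its general category is in the "Other" (`C*`) or "Separator" (`Z*`) groups, with the ASCII space as the one exception. A short sketch of that comparison over part of the code point range; the helper name and the range limit are choices made here, not taken from the commit's test:

```python
import unicodedata


def is_printable_by_category(ch: str) -> bool:
    # "Other" categories start with "C", "Separator" categories with "Z";
    # the ASCII space is printable despite being a separator (Zs).
    return ch == " " or unicodedata.category(ch)[0] not in ("C", "Z")


# Compare against str.isprintable() for the first 0x3000 code points.
mismatches = [
    cp for cp in range(0x3000)
    if chr(cp).isprintable() != is_printable_by_category(chr(cp))
]
print(mismatches)
```

Widening the range to `sys.maxunicode + 1` covers the full code space at the cost of a slightly longer run.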
* gh-129354: Use PyErr_FormatUnraisable() function (#129511) | Victor Stinner | 2025-01-31 | 1 | -1/+3
    Replace PyErr_WriteUnraisable() with PyErr_FormatUnraisable().
* gh-89188: Implement PyUnicode_KIND() as a function (#129412) | Victor Stinner | 2025-01-30 | 1 | -0/+21
    Implement PyUnicode_KIND() and PyUnicode_DATA() as functions, in addition to the macros with the same names. The macros rely on C bit fields which have compiler-specific layout.
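The "kind" returned by PyUnicode_KIND() is the per-character storage width (1, 2 or 4 bytes). Pure Python has no direct accessor for it, but on CPython the width can be inferred from how `sys.getsizeof()` grows with string length. A rough sketch under that assumption; `bytes_per_char` is a hypothetical helper and the result is specific to CPython's compact string representation:

```python
import sys


def bytes_per_char(ch: str) -> int:
    # Hypothetical helper: the size difference between an 11-character and
    # a 10-character string of the same character is the storage width.
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


print(bytes_per_char("a"))           # ASCII text
print(bytes_per_char("\u0100"))      # BMP character above Latin-1
print(bytes_per_char("\U00010000"))  # character outside the BMP
```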
* gh-128016: Improved invalid escape sequence warning message (#128020) | Umar Butler | 2025-01-15 | 1 | -2/+4
* gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) | Donghee Na | 2025-01-05 | 1 | -3/+3
* gh-127903: Fix a crash on debug builds when calling `Objects/unicodeobject::_copy_characters` (#127876) | Alexander Shadchin | 2025-01-03 | 1 | -3/+6
* gh-128212: Fix race in `_PyUnicode_CheckConsistency` (GH-128367) | Sam Gross | 2025-01-02 | 1 | -1/+1
    There was a data race on the utf8 field between `PyUnicode_SET_UTF8` and `_PyUnicode_CheckConsistency`. Use the `_PyUnicode_UTF8()` accessor, which uses an atomic load internally, to avoid the data race.