cpython.git - https://github.com/python/cpython.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	gh-101372: Fix unicodedata.is_normalized to properly handle the UCD 3… ↵	Miss Islington (bot)	2023-02-06	1	-1/+1
\| \| \| \| \| \| \|	(gh-101388) (cherry picked from commit 9ef7e75434587fc8f167d73eee5dd9bdca62714b) Co-authored-by: Dong-hee Na <donghee.na@python.org>
*	bpo-43908: Make heap types converted during 3.10 alpha immutable (GH-26351) ↵	Miss Islington (bot)	2021-06-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(GH-26766) * Make functools types immutable * Multibyte codec types are now immutable * pyexpat.xmlparser is now immutable * array.arrayiterator is now immutable * _thread types are now immutable * _csv types are now immutable * _queue.SimpleQueue is now immutable * mmap.mmap is now immutable * unicodedata.UCD is now immutable * sqlite3 types are now immutable * _lsprof.Profiler is now immutable * _overlapped.Overlapped is now immutable * _operator types are now immutable * winapi__overlapped.Overlapped is now immutable * _lzma types are now immutable * _bz2 types are now immutable * _dbm.dbm and _gdbm.gdbm are now immutable (cherry picked from commit 00710e6346fd2394aa020b2dfae170093effac98) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
*	bpo-42972: Fully support GC for pyexpat, unicodedata, and dbm/gdbm heap ↵	Miss Islington (bot)	2021-05-27	1	-3/+14
\| \| \| \| \| \| \| \| \| \|	types (GH-26376) * bpo-42972: pyexpat * bpo-42972: unicodedata * bpo-42972: dbm/gdbm (cherry picked from commit 59af59c2dfa52dcd5605185263f266a49ced934c) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
*	bpo-43916: Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to selected types (GH-25748)	Erlend Egeberg Aasland	2021-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Apply Py_TPFLAGS_DISALLOW_INSTANTIATION to the following types: * _dbm.dbm * _gdbm.gdbm * _multibytecodec.MultibyteCodec * _sre.SRE_Scanner * _thread._localdummy * _thread.lock * _winapi.Overlapped * array.arrayiterator * functools.KeyWrapper * functools._lru_list_elem * pyexpat.xmlparser * re.Match * re.Pattern * unicodedata.UCD * zlib.Compress * zlib.Decompress
*	bpo-41798: Allocate unicodedata CAPI on the heap (GH-24128)	Erlend Egeberg Aasland	2021-01-20	1	-8/+29
\|
*	bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587)	Victor Stinner	2020-12-01	1	-1/+1
\| \| \| \| \| \| \| \| \|	No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()
*	bpo-42157: Rename unicodedata.ucnhash_CAPI (GH-22994)	Victor Stinner	2020-10-27	1	-2/+2
\| \| \| \| \| \| \|	Removed the unicodedata.ucnhash_CAPI attribute which was an internal PyCapsule object. The related private _PyUnicode_Name_CAPI structure was moved to the internal C API. Rename unicodedata.ucnhash_CAPI as unicodedata._ucnhash_CAPI.
*	bpo-42157: Convert unicodedata.UCD to heap type (GH-22991)	Victor Stinner	2020-10-26	1	-76/+44
\| \| \| \| \| \| \|	Convert the unicodedata extension module to the multiphase initialization API (PEP 489) and convert the unicodedata.UCD static type to a heap type. Co-Authored-By: Mohamed Koubaa <koubaa.m@gmail.com>
*	bpo-42157: unicodedata avoids references to UCD_Type (GH-22990)	Victor Stinner	2020-10-26	1	-105/+111
\| \| \| \| \| \| \| \| \| \|	* UCD_Check() uses PyModule_Check() * Simplify the internal _PyUnicode_Name_CAPI structure: * Remove size and state members * Remove state and self parameters of getcode() and getname() functions * Remove global_module_state
*	bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713)	Victor Stinner	2020-10-26	1	-13/+15
\| \| \| \| \| \| \| \| \| \|	The private _PyUnicode_Name_CAPI structure of the PyCapsule API unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the structure gets a new state member which must be passed to the getcode() and getname() functions. * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h * unicodedata module is now built with Py_BUILD_CORE_MODULE. * unicodedata: move hashAPI variable into unicodedata_module_state.
*	bpo-1635741: Add a global module state to unicodedata (GH-22712)	Victor Stinner	2020-10-15	1	-54/+107
\| \| \| \| \| \|	Prepare unicodedata to add a state per module: start with a global "module" state, pass it to subfunctions which access &UCD_Type. This change also prepares the conversion of the UCD_Type static type to a heap type.
*	bpo-1635741, unicodedata: add ucd_type parameter to UCD_Check() macro (GH-22328)	Mohamed Koubaa	2020-09-23	1	-13/+16
\| \| \|	Co-authored-by: Victor Stinner <vstinner@python.org>
*	bpo-40268: Remove unused structmember.h includes (GH-19530)	Victor Stinner	2020-04-15	1	-1/+1
\| \| \| \| \| \|	If only offsetof() is needed: include stddef.h instead. When structmember.h is used, add a comment explaining that PyMemberDef is used.
*	bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode ↵	Serhiy Storchaka	2020-04-11	1	-3/+3
\| \| \| \|	data. (GH-19345)
*	bpo-39943: Remove unused self from find_nfc_index() (GH-18973)	Andy Lester	2020-03-17	1	-4/+4
\|
*	closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)	Benjamin Peterson	2020-03-11	1	-4/+5
\|
*	bpo-39573: Clean up modules and headers to use Py_IS_TYPE() function (GH-18521)	Dong-hee Na	2020-02-17	1	-1/+1
\|
*	bpo-39573: Add Py_SET_TYPE() function (GH-18394)	Victor Stinner	2020-02-07	1	-1/+1
\| \| \|	Add Py_SET_TYPE() function to set the type of an object.
*	bpo-37752: Delete redundant Py_CHARMASK in normalizestring() (GH-15095)	Jordon Xu	2019-09-10	1	-2/+2
\|
*	bpo-38043: Use `bool` for boolean flags on is_normalized_quickcheck. (GH-15711)	Greg Price	2019-09-09	1	-11/+11
\|
*	closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)	Greg Price	2019-09-04	1	-24/+51
\|	The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.
\|	However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.
\|	Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.
\|	At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:
\|	    $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
\|	        -- 'unicodedata.is_normalized("NFD", s)'
\|	    50 loops, best of 5: 4.39 msec per loop
\|	With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:
\|	    $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
\|	        -- 'unicodedata.is_normalized("NFD", s)'
\|	    5000000 loops, best of 5: 58.2 nsec per loop
\|	This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master!
\|	    $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
\|	        -- 'unicodedata.normalize("NFD", s)'
\|	    500 loops, best of 5: 561 usec per loop
\|	    $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
\|	        -- 'unicodedata.normalize("NFD", s)'
\|	    500 loops, best of 5: 512 usec per loop
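The contract described in this commit message can be checked from Python. The following is a minimal sketch, not part of the commit: `is_normalized_slow` is a hypothetical helper name, the sample strings are chosen here for illustration, and it assumes Python 3.8 or later, where `unicodedata.is_normalized` is available. It only checks that the fast function agrees with the normalize-and-compare definition; it does not measure speed.

```python
import unicodedata


def is_normalized_slow(form: str, s: str) -> bool:
    """Reference definition: normalize the whole string, then compare."""
    return s == unicodedata.normalize(form, s)


# Illustrative inputs: ASCII, a precomposed letter, a decomposed letter,
# a CJK compatibility ideograph (U+F900) and a combining mark (U+0338).
samples = ["abc", "\u00e9", "e\u0301", "\uf900" * 4, "\u0338" * 4]

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for s in samples:
        # The quick-check based function should always agree with the
        # slow definition; it just reaches the answer with less work.
        assert unicodedata.is_normalized(form, s) == is_normalized_slow(form, s)
```

U+F900 has a canonical decomposition, so `unicodedata.is_normalized("NFD", "\uf900")` is false, which is the case the timings in the commit message exercise.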
*	bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async ↵	Jeroen Demeyer	2019-05-31	1	-2/+2
\| \| \| \| \| \| \| \| \|	(GH-13464) Automatically replace: tp_print -> tp_vectorcall_offset; tp_compare -> tp_as_async; tp_reserved -> tp_as_async
*	bpo-36642: make unicodedata const (GH-12855)	Inada Naoki	2019-04-16	1	-1/+1
\|
*	closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)	Max Bélanger	2018-11-04	1	-17/+98
\|
*	bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958)	Wonsup Yoon	2018-06-15	1	-3/+7
\| \| \| \| \|	Hangul composition check boundaries are wrong for the second character (the range should be [0x1161, 0x1176), not [0x1161, 0x1176]) and the third character (the range should be (0x11A7, 0x11C3), not [0x11A7, 0x11C3]).
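The boundary conditions in this fix follow from the Hangul syllable composition arithmetic in the Unicode standard. The sketch below is an illustration in Python, not the C code from the commit; `compose_hangul` and the constant names are hypothetical, while the numeric values are the standard's jamo bases and counts.

```python
import unicodedata

# Constants from the Unicode standard's Hangul composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(l, v, t=None):
    """Return the composed syllable code point, or None if any jamo is out of range."""
    if not (L_BASE <= l < L_BASE + L_COUNT):
        return None
    # Second character: half-open range [0x1161, 0x1176).
    if not (V_BASE <= v < V_BASE + V_COUNT):
        return None
    syllable = S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT
    if t is not None:
        # Third character: open range (0x11A7, 0x11C3); T_BASE itself means "no trailing jamo".
        if not (T_BASE < t < T_BASE + T_COUNT):
            return None
        syllable += t - T_BASE
    return syllable


# The arithmetic should agree with the module on a valid sequence.
assert chr(compose_hangul(0x1100, 0x1161, 0x11A8)) == unicodedata.normalize("NFC", "\u1100\u1161\u11a8")
```

The three code points named in the commit title (U+1176, U+11A7, U+11C3) sit exactly on these boundaries, so `compose_hangul` returns None for each of them in the corresponding position.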
*	update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)	Benjamin Peterson	2018-06-07	1	-1/+1
\| \| \|	Also, standardize indentation of generated tables.
*	Fix miscellaneous typos (#4275)	luzpaz	2017-11-05	1	-1/+1
\|
*	bpo-30736: upgrade to Unicode 10.0 (#2344)	Benjamin Peterson	2017-06-23	1	-2/+3
\| \| \|	Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
*	Issue #28511: Use the "U" format instead of "O!" in PyArg_Parse*.	Serhiy Storchaka	2016-10-23	1	-5/+2
\|
*	Add an extra byte for null in case we ever get very long unicode names.	Christian Heimes	2016-09-23	1	-4/+4
\|\
\| *	Add an extra byte for null in case we ever get very long unicode names.	Christian Heimes	2016-09-23	1	-4/+4
\| \|
* \|	Unicode 9.0.0	Benjamin Peterson	2016-09-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Not completely mechanical since support for East Asian Width changes—emoji codepoints became Wide—had to be added to unicodedata.
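The East Asian Width property mentioned here is exposed through `unicodedata.east_asian_width`. A minimal sketch of how the Wide classification can be used follows; `display_width` is a hypothetical helper written for illustration, and it assumes an interpreter whose Unicode database is version 9.0.0 or newer. It ignores combining marks and other zero-width characters, so it is only a rough estimate of terminal columns.

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough column count: Wide ("W") and Fullwidth ("F") characters take two columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# U+1F600 GRINNING FACE is an emoji code point classified as Wide.
print(unicodedata.east_asian_width("\U0001F600"))
print(display_width("a\U0001F600"))
```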
* \|	Restrict name_length to NAME_MAXLEN in unicodedata_UCD_lookup()	Christian Heimes	2016-09-14	1	-1/+1
\|\ \ \| \|/
\| *	Restrict name_length to NAME_MAXLEN in unicodedata_UCD_lookup()	Christian Heimes	2016-09-14	1	-1/+1
\| \|
* \|	Issue #25923: Added the const qualifier to static constant arrays.	Serhiy Storchaka	2015-12-25	1	-2/+2
\|/
*	upgrade to Unicode 8.0.0	Benjamin Peterson	2015-06-27	1	-2/+3
\|
*	Issue #24000: Improved Argument Clinic's mapping of converters to legacy	Larry Hastings	2015-05-08	1	-2/+2
\| \| \| \|	"format units". Updated the documentation to match.
*	Issue #24001: Argument Clinic converters now use accept={type}	Larry Hastings	2015-05-04	1	-22/+22
\| \| \| \|	instead of types={'type'} to specify the types the converter accepts.
*	Issue #20181: Converted the unicodedata module to Argument Clinic.	Serhiy Storchaka	2015-04-17	1	-227/+196
\|
*	Issue #23944: Argument Clinic now wraps long impl prototypes at column 78.	Larry Hastings	2015-04-14	1	-2/+4
\|
*	Issue #23501: Argument Clinic now generates code into separate files by default.	Serhiy Storchaka	2015-04-03	1	-34/+3
\|
*	merge 3.3 (#23367)	Benjamin Peterson	2015-03-02	1	-3/+10
\|\
\| *	fix possible overflow bugs in unicodedata (closes #23367)	Benjamin Peterson	2015-03-02	1	-3/+10
\| \|
* \|	Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer	Serhiy Storchaka	2015-02-16	1	-2/+2
\| \| \| \| \| \| \| \|	overflows. Added a few missed PyErr_NoMemory() calls.
* \|	Issue #23181: More "codepoint" -> "code point".	Serhiy Storchaka	2015-01-18	1	-7/+7
\| \|
* \|	Closes #21780: make the unicodedata module "ssize_t clean" for parsing ↵	Victor Stinner	2014-07-01	1	-2/+8
\| \| \| \| \| \| \| \|	parameters
* \|	Issue #20530: Argument Clinic's signature format has been revised again.	Larry Hastings	2014-02-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new syntax is highly human readable while still preventing false positives. The syntax also extends Python syntax to denote "self" and positional-only parameters, allowing inspect.Signature objects to be totally accurate for all supported builtins in Python 3.4.
* \|	Issue #20326: Argument Clinic now uses a simple, unique signature to	Larry Hastings	2014-01-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	annotate text signatures in docstrings, resulting in fewer false positives. "self" parameters are also explicitly marked, allowing inspect.Signature() to authoritatively detect (and skip) said parameters. Issue #20326: Argument Clinic now generates separate checksums for the input and output sections of the block, allowing external tools to verify that the input has not changed (and thus the output is not out-of-date).
* \|	Issue #20390: Small fixes and improvements for Argument Clinic.	Larry Hastings	2014-01-26	1	-7/+6
\| \|
* \|	Issue #20189: Four additional builtin types (PyTypeObject,	Larry Hastings	2014-01-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type) have been modified to provide introspection information for builtins. Also: many additional Lib, test suite, and Argument Clinic fixes.