cpython.git - https://github.com/python/cpython.git

	Commit message	Author	Age	Files	Lines
*	closes bpo-37966: Fully implement the UAX #15 quick-check algorithm.	Miss Islington (bot)	2019-09-04	1	-24/+51
\|	(GH-15558) The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15.
\|
\|	However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all.
\|
\|	Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO.
\|
\|	In a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare:
\|
\|	`$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900" * 500000' -- 'unicodedata.is_normalized("NFD", s)'`
\|	`50 loops, best of 5: 4.39 msec per loop`
\|
\|	With this patch, it gets the answer instantly (58 ns) on the same 1 MB string:
\|
\|	`$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900" * 500000' -- 'unicodedata.is_normalized("NFD", s)'`
\|	`5000000 loops, best of 5: 58.2 nsec per loop`
\|
\|	This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master!
\|
\|	`$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338" * 500000' -- 'unicodedata.normalize("NFD", s)'`
\|	`500 loops, best of 5: 561 usec per loop`
\|
\|	`$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338" * 500000' -- 'unicodedata.normalize("NFD", s)'`
\|	`500 loops, best of 5: 512 usec per loop`
\|
\|	(cherry picked from commit 2f09413947d1ce0043de62ed2346f9a2b4e5880b)
\|
\|	Co-authored-by: Greg Price <gnprice@gmail.com>
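The quick check this commit refers to can be sketched in Python for the NFD form only. This is an illustrative approximation, not CPython's C implementation: `unicodedata` does not expose the NFD_Quick_Check property, so the hypothetical helper `has_canonical_decomposition` derives it from `unicodedata.decomposition`, special-casing Hangul syllables (whose decompositions are algorithmic and not listed in the data). The comparison against `unicodedata.is_normalized` assumes Python 3.8 or later.

```python
import unicodedata

# Precomposed Hangul syllables decompose algorithmically, so
# unicodedata.decomposition() reports no mapping for them.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def has_canonical_decomposition(ch: str) -> bool:
    """Hypothetical helper: True if ch has a canonical (not compatibility) mapping."""
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return True
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<compat>"; NFD ignores them.
    return decomp != "" and not decomp.startswith("<")


def nfd_quick_check(s: str) -> str:
    """UAX #15 quick check specialized to NFD; returns "YES" or "NO".

    NFD_Quick_Check has no MAYBE values, so for NFD the answer is definitive.
    """
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)
        # A nonzero combining class lower than the previous one means the
        # marks are not in canonical order.
        if ccc != 0 and last_ccc > ccc:
            return "NO"
        # Any character with a canonical decomposition cannot appear in NFD.
        if has_canonical_decomposition(ch):
            return "NO"
        last_ccc = ccc
    return "YES"


for text in ["\uf900" * 5, "abc", "\u00e9", "e\u0301", "a\u0301\u0316"]:
    print(repr(text), nfd_quick_check(text), unicodedata.is_normalized("NFD", text))
```

For NFC and NFKC the property can also be MAYBE, which is why a full implementation still needs the slow normalize-and-compare fallback for some inputs.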
*	bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async	Jeroen Demeyer	2019-05-31	1	-2/+2
\| \| \| \| \| \| \| \| \|	(GH-13464) Automatically replace: tp_print -> tp_vectorcall_offset, tp_compare -> tp_as_async, tp_reserved -> tp_as_async.
*	bpo-36642: make unicodedata const (GH-12855)	Inada Naoki	2019-04-16	1	-1/+1
\|
*	closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)	Max Bélanger	2018-11-04	1	-17/+98
\|
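The function added here is meant to give the same answer as normalizing the whole string and comparing. A short sketch (Python 3.8+) that checks this equivalence on a few sample strings for all four forms; `is_normalized_naive` is a hypothetical name for the reference version, not part of the module.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# ASCII, precomposed and decomposed "e with acute", the "fi" ligature
# (compatibility mapping), ANGSTROM SIGN (canonical singleton), and a
# precomposed Hangul syllable.
SAMPLES = ["abc", "\u00e9", "e\u0301", "\ufb01", "\u212b", "\uac00"]


def is_normalized_naive(form: str, s: str) -> bool:
    """Hypothetical reference version: normalize the whole string and compare."""
    return unicodedata.normalize(form, s) == s


for form in FORMS:
    fast = [unicodedata.is_normalized(form, s) for s in SAMPLES]
    slow = [is_normalized_naive(form, s) for s in SAMPLES]
    print(form, fast, fast == slow)
```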
*	bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958)	Wonsup Yoon	2018-06-15	1	-3/+7
\| \| \| \| \|	The Hangul composition check used wrong boundaries: the range for the second character should be [0x1161, 0x1176) instead of [0x1161, 0x1176], and the range for the third character should be (0x11A7, 0x11C3) instead of [0x11A7, 0x11C3].
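The corrected boundaries follow from the Hangul syllable composition arithmetic in Chapter 3 of the Unicode Standard. A minimal sketch in Python; `compose_hangul_pair` is a hypothetical helper rather than CPython's C code, and the constants are the standard's SBase, LBase, VBase, TBase, LCount, VCount and TCount.

```python
import unicodedata
from typing import Optional

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def compose_hangul_pair(first: str, second: str) -> Optional[str]:
    """Return the precomposed syllable for first + second, or None."""
    a, b = ord(first), ord(second)
    # L + V -> LV. The vowel range is half-open: V_BASE + V_COUNT (U+1176) is excluded.
    if L_BASE <= a < L_BASE + L_COUNT and V_BASE <= b < V_BASE + V_COUNT:
        return chr(S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT)
    # LV + T -> LVT. Both ends are open: U+11A7 (T_BASE) stands for "no trailing
    # consonant", and U+11C3 (T_BASE + T_COUNT) is past the end of the range.
    s_index = a - S_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and T_BASE < b < T_BASE + T_COUNT:
        return chr(a + (b - T_BASE))
    return None


print(compose_hangul_pair("\u1100", "\u1161"))  # LV syllable
print(compose_hangul_pair("\u1100", "\u1176"))  # outside the vowel range
```

The three boundary code points named in the commit title (U+1176, U+11A7, U+11C3) are exactly the values the half-open and open ranges reject.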
*	update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)	Benjamin Peterson	2018-06-07	1	-1/+1
\| \| \|	Also, standardize indentation of generated tables.
*	Fix miscellaneous typos (#4275)	luzpaz	2017-11-05	1	-1/+1
\|
*	bpo-30736: upgrade to Unicode 10.0 (#2344)	Benjamin Peterson	2017-06-23	1	-2/+3
\| \| \|	Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
*	Issue #28511: Use the "U" format instead of "O!" in PyArg_Parse*.	Serhiy Storchaka	2016-10-23	1	-5/+2
\|
*	Add an extra byte for null in case we ever get very long unicode names.	Christian Heimes	2016-09-23	1	-4/+4
\|\
\| *	Add an extra byte for null in case we ever get very long unicode names.	Christian Heimes	2016-09-23	1	-4/+4
\| \|
* \|	Unicode 9.0.0	Benjamin Peterson	2016-09-15	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	Not completely mechanical, since support for the East Asian Width changes (emoji code points became Wide) had to be added to unicodedata.
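The East Asian Width change is visible through `unicodedata.east_asian_width`. A rough sketch of why it matters, assuming a terminal that draws Wide ("W") and Fullwidth ("F") characters in two columns; `display_width` is a hypothetical helper and a simplification of what a full wcwidth implementation does.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal column count (hypothetical helper, not full wcwidth)."""
    width = 0
    for ch in s:
        # Nonspacing and enclosing marks are drawn over the previous character.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(unicodedata.unidata_version)
# GRINNING FACE is classified "W" in Unicode 9.0 and later.
print(unicodedata.east_asian_width("\U0001F600"))
print(display_width("abc"), display_width("\u65e5\u672c"), display_width("\U0001F600"))
```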
* \|	Restrict name_length to NAME_MAXLEN in unicodedata_UCD_lookup()	Christian Heimes	2016-09-14	1	-1/+1
\|\ \ \| \|/
\| *	Restrict name_length to NAME_MAXLEN in unicodedata_UCD_lookup()	Christian Heimes	2016-09-14	1	-1/+1
\| \|
* \|	Issue #25923: Added the const qualifier to static constant arrays.	Serhiy Storchaka	2015-12-25	1	-2/+2
\|/
*	upgrade to Unicode 8.0.0	Benjamin Peterson	2015-06-27	1	-2/+3
\|
*	Issue #24000: Improved Argument Clinic's mapping of converters to legacy	Larry Hastings	2015-05-08	1	-2/+2
\| \| \| \|	"format units". Updated the documentation to match.
*	Issue #24001: Argument Clinic converters now use accept={type}	Larry Hastings	2015-05-04	1	-22/+22
\| \| \| \|	instead of types={'type'} to specify the types the converter accepts.
*	Issue #20181: Converted the unicodedata module to Argument Clinic.	Serhiy Storchaka	2015-04-17	1	-227/+196
\|
*	Issue #23944: Argument Clinic now wraps long impl prototypes at column 78.	Larry Hastings	2015-04-14	1	-2/+4
\|
*	Issue #23501: Argument Clinic now generates code into separate files by default.	Serhiy Storchaka	2015-04-03	1	-34/+3
\|
*	merge 3.3 (#23367)	Benjamin Peterson	2015-03-02	1	-3/+10
\|\
\| *	fix possible overflow bugs in unicodedata (closes #23367)	Benjamin Peterson	2015-03-02	1	-3/+10
\| \|
* \|	Issue #23446: Use PyMem_New instead of PyMem_Malloc to avoid possible integer	Serhiy Storchaka	2015-02-16	1	-2/+2
\| \| \| \| \| \| \| \|	overflows. Added a few missing PyErr_NoMemory() calls.
* \|	Issue #23181: More "codepoint" -> "code point".	Serhiy Storchaka	2015-01-18	1	-7/+7
\| \|
* \|	Closes #21780: make the unicodedata module "ssize_t clean" for parsing	Victor Stinner	2014-07-01	1	-2/+8
\| \| \| \| \| \| \| \|	parameters
* \|	Issue #20530: Argument Clinic's signature format has been revised again.	Larry Hastings	2014-02-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new syntax is highly human readable while still preventing false positives. The syntax also extends Python syntax to denote "self" and positional-only parameters, allowing inspect.Signature objects to be totally accurate for all supported builtins in Python 3.4.
* \|	Issue #20326: Argument Clinic now uses a simple, unique signature to	Larry Hastings	2014-01-28	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	annotate text signatures in docstrings, resulting in fewer false positives. "self" parameters are also explicitly marked, allowing inspect.Signature() to authoritatively detect (and skip) said parameters. Issue #20326: Argument Clinic now generates separate checksums for the input and output sections of the block, allowing external tools to verify that the input has not changed (and thus the output is not out-of-date).
* \|	Issue #20390: Small fixes and improvements for Argument Clinic.	Larry Hastings	2014-01-26	1	-7/+6
\| \|
* \|	Issue #20189: Four additional builtin types (PyTypeObject,	Larry Hastings	2014-01-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type) have been modified to provide introspection information for builtins. Also: many additional Lib, test suite, and Argument Clinic fixes.
* \|	Issue #19273: The marker comments Argument Clinic uses have been changed	Larry Hastings	2014-01-07	1	-6/+6
\| \| \| \| \| \| \| \|	to improve readability.
* \|	Issue #20141: Improved Argument Clinic's support for the PyArg_Parse "O!"	Larry Hastings	2014-01-07	1	-5/+5
\| \| \| \| \| \| \| \|	format unit.
* \|	Issue #19674: inspect.signature() now produces a correct signature	Larry Hastings	2013-11-23	1	-5/+9
\| \| \| \| \| \| \| \|	for some builtins.
* \|	Argument Clinic: rename "self" to "module" for module-level functions.	Larry Hastings	2013-11-18	1	-11/+12
\| \|
* \|	Issue #16612: Add "Argument Clinic", a compile-time preprocessor	Larry Hastings	2013-10-19	1	-13/+51
\| \| \| \| \| \| \| \|	for C files to generate argument parsing code. (See PEP 436.)
* \|	merge 3.3	Benjamin Peterson	2013-10-11	1	-1/+1
\|\ \ \| \|/
\| *	replace hardcoded version	Benjamin Peterson	2013-10-11	1	-1/+1
\| \|
* \|	merge 3.3	Benjamin Peterson	2013-10-11	1	-1/+1
\|\ \ \| \|/
\| *	make sure the docstring is never out of date wrt unicode data version	Benjamin Peterson	2013-10-11	1	-1/+1
\| \|
* \|	merge 3.3 (#19220)	Benjamin Peterson	2013-10-10	1	-3/+1
\|\ \ \| \|/
\| *	remove url from docstring (closes #19220)	Benjamin Peterson	2013-10-10	1	-2/+1
\| \|
* \|	upgrade unicode db to 6.3.0 (closes #19221)	Benjamin Peterson	2013-10-10	1	-2/+2
\|/
*	#18803: fix more typos. Patch by Févry Thibault.	Ezio Melotti	2013-08-25	1	-1/+1
\|
*	#18466: fix more typos. Patch by Févry Thibault.	Ezio Melotti	2013-08-17	1	-1/+1
\|
*	#16681: merge with 3.2.	Ezio Melotti	2012-12-14	1	-1/+1
\|\
\| *	#16681: use "bidirectional class" instead of "bidirectional category" in the	Ezio Melotti	2012-12-14	1	-1/+1
\| \| \| \| \| \| \| \|	docstring too.
* \|	Use C-style comments (required for the AIX build slave).	Stefan Krah	2012-09-23	1	-2/+2
\| \|
* \|	Issue #14909: A number of places were using PyMem_Realloc() apis and	Kristjan Valur Jonsson	2012-05-31	1	-2/+5
\| \| \| \| \| \| \| \| \| \|	PyObject_GC_Resize() with incorrect error handling. In case of errors, the original object would be leaked. This checkin fixes those cases.
* \|	update to Unicode 6.1	Benjamin Peterson	2012-02-21	1	-1/+1
\| \|
* \|	#13379: merge with 3.2.	Ezio Melotti	2011-11-10	1	-5/+6
\|\ \ \| \|/