author | Ken Jin <kenjin4096@gmail.com> | 2022-04-06 11:38:25 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-04-06 11:38:25 (GMT) |
commit | 9ffe47df5468a72603f730eae48c2fd4ec615ffa (patch) | |
tree | 50335ba775291324e5c724e3226bcaa2b8fe7b5a /Doc/whatsnew | |
parent | 074da788028c1f1e867dc81698efdcdc263f2288 (diff) | |
bpo-47189: What's New in 3.11: Faster CPython (GH-32235)
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Guido van Rossum <gvanrossum@users.noreply.github.com>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Diffstat (limited to 'Doc/whatsnew')
-rw-r--r-- | Doc/whatsnew/3.11.rst | 226 |
1 files changed, 219 insertions, 7 deletions
diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index a2c57eb..75b455d 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,6 +62,8 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
+- Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a
+  1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -477,13 +479,6 @@ Optimizations
   almost eliminated when no exception is raised.
   (Contributed by Mark Shannon in :issue:`40222`.)
 
-* Method calls with keywords are now faster due to bytecode
-  changes which avoid creating bound method instances. Previously, this
-  optimization was applied only to method calls with purely positional
-  arguments.
-  (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
-  implemented in PyPy.)
-
 * Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
   (Contributed by Dong-hee Na in :issue:`44987`.)
 
@@ -498,6 +493,223 @@ Optimizations
   (Contributed by Inada Naoki in :issue:`46845`.)
 
 
+Faster CPython
+==============
+
+CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
+than CPython 3.10 when measured with the
+`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
+and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
+could be up to 10-60% faster.
+
+This project focuses on two major areas in Python: faster startup and faster
+runtime. Other optimizations not under this project are listed in `Optimizations`_.
+
+Faster Startup
+--------------
+
+Frozen imports / Static code objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
+speed up module loading.
+
+Previously in 3.10, Python module execution looked like this:
+
+.. code-block:: text
+
+   Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
+
+In Python 3.11, the core modules essential for Python startup are "frozen".
+This means that their code objects (and bytecode) are statically allocated
+by the interpreter. This reduces the steps in module execution process to this:
+
+.. code-block:: text
+
+   Statically allocated code object -> Evaluate
+
+Interpreter startup is now 10-15% faster in Python 3.11. This has a big
+impact for short-running programs using Python.
+
+(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
+
+
+Faster Runtime
+--------------
+
+Cheaper, lazy Python frames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Python frames are created whenever Python calls a Python function. This frame
+holds execution information. The following are new frame optimizations:
+
+- Streamlined the frame creation process.
+- Avoided memory allocation by generously re-using frame space on the C stack.
+- Streamlined the internal frame struct to contain only essential information.
+  Frames previously held extra debugging and memory management information.
+
+Old-style frame objects are now created only when required by debuggers. For
+most user code, no frame objects are created at all. As a result, nearly all
+Python functions calls have sped up significantly. We measured a 3-7% speedup
+in pyperformance.
+
+(Contributed by Mark Shannon in :issue:`44590`.)
+
+.. _inline-calls:
+
+Inlined Python function calls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+During a Python function call, Python will call an evaluating C function to
+interpret that function's code. This effectively limits pure Python recursion to
+what's safe for the C stack.
+
+In 3.11, when CPython detects Python code calling another Python function,
+it sets up a new frame, and "jumps" to the new code inside the new frame. This
+avoids calling the C interpreting function altogether.
+
+Most Python function calls now consume no C stack space. This speeds up
+most of such calls. In simple recursive functions like fibonacci or
+factorial, a 1.7x speedup was observed. This also means recursive functions
+can recurse significantly deeper (if the user increases the recursion limit).
+We measured a 1-3% improvement in pyperformance.
+
+(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
+
+PEP 659: Specializing Adaptive Interpreter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+:pep:`659` is one of the key parts of the faster CPython project. The general
+idea is that while Python is a dynamic language, most code has regions where
+objects and types rarely change. This concept is known as *type stability*.
+
+At runtime, Python will try to look for common patterns and type stability
+in the executing code. Python will then replace the current operation with a
+more specialized one. This specialized operation uses fast paths available only
+to those use cases/types, which generally outperform their generic
+counterparts. This also brings in another concept called *inline caching*, where
+Python caches the results of expensive operations directly in the bytecode.
+
+The specializer will also combine certain common instruction pairs into one
+superinstruction. This reduces the overhead during execution.
+
+Python will only specialize
+when it sees code that is "hot" (executed multiple times). This prevents Python
+from wasting time for run-once code. Python can also de-specialize when code is
+too dynamic or when the use changes. Specialization is attempted periodically,
+and specialization attempts are not too expensive. This allows specialization
+to adapt to new circumstances.
+
+(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
+See :pep:`659` for more information.)
+
+..
+   If I missed out anyone, please add them.
+
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
+|               |                    |                                                       | (up to)           |                   |
++===============+====================+=======================================================+===================+===================+
+| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
+| operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
+|               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
+|               |                    |                                                       |                   | Dennis Sweeney    |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-25%            | Irit Katriel,     |
+|               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
+|               |                    | data structures.                                      |                   |                   |
+|               |                    |                                                       |                   |                   |
+|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
+|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
+| subscript     |                    |                                                       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
+|               | ``C(arg)``         | as ``len`` and ``str`` directly call their underlying |                   | Ken Jin           |
+|               |                    | C version. This avoids going through the internal     |                   |                   |
+|               |                    | calling convention.                                   |                   |                   |
+|               |                    |                                                       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``print``          | The object's index in the globals/builtins namespace  | [1]_              | Mark Shannon      |
+| global        | ``len``            | is cached. Loading globals and builtins require       |                   |                   |
+| variable      |                    | zero namespace lookups.                               |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [2]_              | Mark Shannon      |
+| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
+|               |                    | In most cases, attribute loading will require zero    |                   |                   |
+|               |                    | namespace lookups.                                    |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
+| methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
+| call          |                    | classes with long inheritance chains.                 |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
+| attribute     |                    |                                                       | in pyperformance  |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | 8%                | Brandt Bucher     |
+| Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+
+.. [1] A similar optimization already existed since Python 3.8. 3.11
+       specializes for more forms and reduces some overhead.
+
+.. [2] A similar optimization already existed since Python 3.10.
+       3.11 specializes for more forms. Furthermore, all attribute loads should
+       be sped up by :issue:`45947`.
+
+
+Misc
+----
+
+* Objects now require less memory due to lazily created object namespaces. Their
+  namespace dictionaries now also share keys more freely.
+  (Contributed Mark Shannon in :issue:`45340` and :issue:`40116`.)
+
+* A more concise representation of exceptions in the interpreter reduced the
+  time required for catching an exception by about 10%.
+  (Contributed by Irit Katriel in :issue:`45711`.)
+
+FAQ
+---
+
+| Q: How should I write my code to utilize these speedups?
+|
+| A: You don't have to change your code. Write Pythonic code that follows common
+     best practices. The Faster CPython project optimizes for common code
+     patterns we observe.
+|
+|
+| Q: Will CPython 3.11 use more memory?
+|
+| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
+     This is offset by memory optimizations for frame objects and object
+     dictionaries as mentioned above.
+|
+|
+| Q: I don't see any speedups in my workload. Why?
+|
+| A: Certain code won't have noticeable benefits. If your code spends most of
+     its time on I/O operations, or already does most of its
+     computation in a C extension library like numpy, there won't be significant
+     speedup. This project currently benefits pure-Python workloads the most.
+|
+| Furthermore, the pyperformance figures are a geometric mean. Even within the
+     pyperformance benchmarks, certain benchmarks have slowed down slightly, while
+     others have sped up by nearly 2x!
+|
+|
+| Q: Is there a JIT compiler?
+|
+| A: No. We're still exploring other optimizations.
+
+
+About
+-----
+
+Faster CPython explores optimizations for :term:`CPython`. The main team is
+funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
+funded by Bloomberg LP to work on the project part-time. Finally, many
+contributors are volunteers from the community.
+
+
 CPython bytecode changes
 ========================
 
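
The PEP 659 text added by this patch describes how "hot" code is replaced with specialized operations once its types prove stable. The following is a minimal sketch of one way to observe that from Python, not part of the commit itself. It assumes the `adaptive` keyword of `dis.get_instructions`, which is available from Python 3.11 onward, and falls back to the plain bytecode on older versions. The names `add_floats` and `specialized_opnames` are hypothetical and chosen only for illustration.

```python
import dis
import sys


def add_floats(a, b):
    # Hypothetical hot function: its operand types never change,
    # so it is a candidate for specialization.
    return a + b


def specialized_opnames(func, *args, warmup=1000):
    """Call func repeatedly, then list the opcode names it currently holds."""
    for _ in range(warmup):
        func(*args)
    if sys.version_info >= (3, 11):
        # adaptive=True asks dis to report specialized bytecode, if any.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        # Older versions have no adaptive view; report the plain bytecode.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


if __name__ == "__main__":
    for name in specialized_opnames(add_floats, 1.5, 2.5):
        print(name)
```

On a CPython 3.11 build one would typically expect the generic `BINARY_OP` entry to appear under a specialized name such as `BINARY_OP_ADD_FLOAT` after the warm-up loop, although the exact opcode names are an implementation detail and vary between versions.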