authorKen Jin <kenjin4096@gmail.com>2022-04-06 11:38:25 (GMT)
committerGitHub <noreply@github.com>2022-04-06 11:38:25 (GMT)
commit9ffe47df5468a72603f730eae48c2fd4ec615ffa (patch)
tree50335ba775291324e5c724e3226bcaa2b8fe7b5a /Doc/whatsnew
parent074da788028c1f1e867dc81698efdcdc263f2288 (diff)
bpo-47189: What's New in 3.11: Faster CPython (GH-32235)
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Guido van Rossum <gvanrossum@users.noreply.github.com>
Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Diffstat (limited to 'Doc/whatsnew')
-rw-r--r-- Doc/whatsnew/3.11.rst 226
1 file changed, 219 insertions, 7 deletions
diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index a2c57eb..75b455d 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,6 +62,8 @@ Summary -- Release highlights
.. This section singles out the most important changes in Python 3.11.
Brevity is key.
+- Python 3.11 is between 10% and 60% faster than Python 3.10. On average, we measured a
+ 1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.
.. PEP-sized items next.
@@ -477,13 +479,6 @@ Optimizations
almost eliminated when no exception is raised.
(Contributed by Mark Shannon in :issue:`40222`.)
-* Method calls with keywords are now faster due to bytecode
- changes which avoid creating bound method instances. Previously, this
- optimization was applied only to method calls with purely positional
- arguments.
- (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
- implemented in PyPy.)
-
* Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
(Contributed by Dong-hee Na in :issue:`44987`.)
@@ -498,6 +493,223 @@ Optimizations
(Contributed by Inada Naoki in :issue:`46845`.)
+Faster CPython
+==============
+
+CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
+than CPython 3.10 when measured with the
+`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
+and compiled with GCC on Ubuntu Linux. Depending on your workload, the overall
+speedup could be 10-60%.
+
+This project focuses on two major areas in Python: faster startup and faster
+runtime. Other optimizations not under this project are listed in `Optimizations`_.
+
+Faster Startup
+--------------
+
+Frozen imports / Static code objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
+speed up module loading.
+
+Previously in 3.10, Python module execution looked like this:
+
+.. code-block:: text
+
+ Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
+
+In Python 3.11, the core modules essential for Python startup are "frozen".
+This means that their code objects (and bytecode) are statically allocated
+by the interpreter. This reduces the steps in the module execution process to:
+
+.. code-block:: text
+
+ Statically allocated code object -> Evaluate
+
+Interpreter startup is now 10-15% faster in Python 3.11. This has a big
+impact on short-running programs using Python.
+
+(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
+
+
+Faster Runtime
+--------------
+
+Cheaper, lazy Python frames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Python frames are created whenever Python calls a Python function. This frame
+holds execution information. The following are new frame optimizations:
+
+- Streamlined the frame creation process.
+- Avoided memory allocation by generously re-using frame space on the C stack.
+- Streamlined the internal frame struct to contain only essential information.
+ Frames previously held extra debugging and memory management information.
+
+Old-style frame objects are now created only when required by debuggers. For
+most user code, no frame objects are created at all. As a result, nearly all
+Python function calls have sped up significantly. We measured a 3-7% speedup
+in pyperformance.
+
+(Contributed by Mark Shannon in :issue:`44590`.)
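Frame objects are still materialized whenever user code asks for one, so introspection keeps working unchanged. A small sketch:

```python
import sys

def where_am_i():
    # Asking for a frame forces CPython to create a full frame object on
    # demand; calls that never do this skip that work entirely in 3.11.
    frame = sys._getframe()
    return frame.f_code.co_name

print(where_am_i())  # where_am_i
```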
+
+.. _inline-calls:
+
+Inlined Python function calls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+During a Python function call, Python will call an evaluating C function to
+interpret that function's code. This effectively limits pure Python recursion to
+what's safe for the C stack.
+
+In 3.11, when CPython detects Python code calling another Python function,
+it sets up a new frame, and "jumps" to the new code inside the new frame. This
+avoids calling the C interpreting function altogether.
+
+Most Python function calls now consume no C stack space. This speeds up
+most such calls. In simple recursive functions like Fibonacci or
+factorial, a 1.7x speedup was observed. This also means recursive functions
+can recurse significantly deeper (if the user increases the recursion limit).
+We measured a 1-3% improvement in pyperformance.
+
+(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
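The kind of call that benefits can be seen in an ordinary recursive function. A minimal sketch (the limit chosen below is illustrative; raising it is safer in 3.11 precisely because such calls no longer consume C stack):

```python
import sys

def factorial(n: int) -> int:
    # Pure-Python recursion: each level is an inlined call in 3.11.
    return 1 if n <= 1 else n * factorial(n - 1)

print(factorial(10))  # 3628800

# The Python-level recursion limit still applies and must be raised
# explicitly for deep recursion.
sys.setrecursionlimit(5000)
print(factorial(2000) > 0)  # True
```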
+
+PEP 659: Specializing Adaptive Interpreter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+:pep:`659` is one of the key parts of the faster CPython project. The general
+idea is that while Python is a dynamic language, most code has regions where
+objects and types rarely change. This concept is known as *type stability*.
+
+At runtime, Python will try to look for common patterns and type stability
+in the executing code. Python will then replace the current operation with a
+more specialized one. This specialized operation uses fast paths available only
+to those use cases/types, which generally outperform their generic
+counterparts. This also brings in another concept called *inline caching*, where
+Python caches the results of expensive operations directly in the bytecode.
+
+The specializer will also combine certain common instruction pairs into one
+superinstruction. This reduces the overhead during execution.
+
+Python will only specialize
+when it sees code that is "hot" (executed multiple times). This prevents Python
+from wasting time on run-once code. Python can also de-specialize when code is
+too dynamic or when its usage changes. Specialization is attempted periodically,
+and specialization attempts are not too expensive. This allows specialization
+to adapt to new circumstances.
+
+(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
+See :pep:`659` for more information.)
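The idea can be illustrated with a pure-Python sketch. This is a toy model only: the class name, threshold, and dispatch logic below are invented for illustration and do not mirror CPython's internals, but they show the shape of specialization, inline caching, and de-specialization:

```python
class BinaryAddSite:
    """Toy model of one adaptive call site, loosely following PEP 659."""

    HOT_THRESHOLD = 8  # invented value; CPython uses its own counters

    def __init__(self):
        self.count = 0
        self.specialized = None  # inline cache: a type-specific fast path

    def execute(self, a, b):
        if self.specialized is not None:
            try:
                return self.specialized(a, b)
            except TypeError:
                self.specialized = None  # de-specialize on a type miss
        self.count += 1
        if self.count >= self.HOT_THRESHOLD and type(a) is int and type(b) is int:
            # The site is hot and type-stable: cache an int-only fast path.
            self.specialized = int.__add__
        return a + b

site = BinaryAddSite()
for _ in range(10):
    site.execute(1, 2)
print(site.specialized is not None)  # True
```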
+
+..
+ If I missed out anyone, please add them.
+
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Operation | Form | Specialization | Operation speedup | Contributor(s) |
+| | | | (up to) | |
++===============+====================+=======================================================+===================+===================+
+| Binary | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
+| operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, |
+| | | fast paths for their underlying types. | | Brandt Bucher, |
+| | | | | Dennis Sweeney |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Subscript | ``a[i]`` | Subscripting container types such as ``list``, | 10-25% | Irit Katriel, |
+| | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon |
+| | | data structures. | | |
+| | | | | |
+| | | Subscripting custom ``__getitem__`` | | |
+| | | is also inlined similar to :ref:`inline-calls`. | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store | ``a[i] = z`` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney |
+| subscript | | | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Calls | ``f(arg)`` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, |
+| | ``C(arg)`` | as ``len`` and ``str`` directly call their underlying | | Ken Jin |
+| | | C version. This avoids going through the internal | | |
+| | | calling convention. | | |
+| | | | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load | ``print`` | The object's index in the globals/builtins namespace | [1]_ | Mark Shannon |
+| global | ``len`` | is cached. Loading globals and builtins require | | |
+| variable | | zero namespace lookups. | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [2]_ | Mark Shannon |
+| attribute | | index inside the class/object's namespace is cached. | | |
+| | | In most cases, attribute loading will require zero | | |
+| | | namespace lookups. | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% | Ken Jin, |
+| methods for | | loading now has no namespace lookups -- even for | | Mark Shannon |
+| call | | classes with long inheritance chains. | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon |
+| attribute | | | in pyperformance | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher |
+| Sequence | | and ``tuple``. Avoids internal calling convention. | | |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+
+.. [1] A similar optimization has existed since Python 3.8. 3.11
+ specializes for more forms and reduces some overhead.
+
+.. [2] A similar optimization has existed since Python 3.10.
+ 3.11 specializes for more forms. Furthermore, all attribute loads should
+ be sped up by :issue:`45947`.
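The instructions being specialized can be inspected with the ``dis`` module. A sketch that works on 3.10 as well (on 3.11+, ``dis.dis(add, adaptive=True)`` additionally shows the quickened forms after the function has run):

```python
import dis
import io

def add(x, y):
    return x + y

# Disassemble the generic bytecode; the specializer replaces the generic
# binary-add instruction with a type-specific variant at runtime on 3.11+.
buf = io.StringIO()
dis.dis(add, file=buf)
print("RETURN_VALUE" in buf.getvalue())  # True
```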
+
+
+Misc
+----
+
+* Objects now require less memory due to lazily created object namespaces. Their
+ namespace dictionaries now also share keys more freely.
+  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
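The key-sharing behaviour can be observed with ``sys.getsizeof``. A rough sketch (exact sizes vary across versions and builds, so no particular numbers are claimed here):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

a, b = Point(1, 2), Point(3, 4)

# Instances of the same class share one key table, so each per-instance
# namespace dict only needs to store its values.
shared = sys.getsizeof(a.__dict__)
plain = sys.getsizeof({"x": 1, "y": 2})
print(shared, plain)
```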
+
+* A more concise representation of exceptions in the interpreter reduced the
+ time required for catching an exception by about 10%.
+ (Contributed by Irit Katriel in :issue:`45711`.)
+
+FAQ
+---
+
+| Q: How should I write my code to utilize these speedups?
+|
+| A: You don't have to change your code. Write Pythonic code that follows common
+ best practices. The Faster CPython project optimizes for common code
+ patterns we observe.
+|
+|
+| Q: Will CPython 3.11 use more memory?
+|
+| A: Maybe not. We don't expect memory use to exceed that of 3.10 by more than 20%.
+ This is offset by memory optimizations for frame objects and object
+ dictionaries, as mentioned above.
+|
+|
+| Q: I don't see any speedups in my workload. Why?
+|
+| A: Certain code won't have noticeable benefits. If your code spends most of
+ its time on I/O operations, or already does most of its
+ computation in a C extension library like numpy, there won't be significant
+ speedup. This project currently benefits pure-Python workloads the most.
+|
+| Furthermore, the pyperformance figures are a geometric mean. Even within the
+ pyperformance benchmarks, certain benchmarks have slowed down slightly, while
+ others have sped up by nearly 2x!
+|
+|
+| Q: Is there a JIT compiler?
+|
+| A: No. We're still exploring other optimizations.
+
+
+About
+-----
+
+Faster CPython explores optimizations for :term:`CPython`. The main team is
+funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
+funded by Bloomberg LP to work on the project part-time. Finally, many
+contributors are volunteers from the community.
+
+
CPython bytecode changes
========================