summaryrefslogtreecommitdiffstats
path: root/Doc
diff options
context:
space:
mode:
authorPablo Galindo Salgado <Pablogsal@gmail.com>2022-08-30 17:11:18 (GMT)
committerGitHub <noreply@github.com>2022-08-30 17:11:18 (GMT)
commit6d791a97364b68d5f9c3514a0470aac487fc538d (patch)
tree745205d7e8698ea7398eb353311f55dc973507bf /Doc
parent0f733fffe8f4caaac3ce1b5306af86b42fb0c7fa (diff)
downloadcpython-6d791a97364b68d5f9c3514a0470aac487fc538d.zip
cpython-6d791a97364b68d5f9c3514a0470aac487fc538d.tar.gz
cpython-6d791a97364b68d5f9c3514a0470aac487fc538d.tar.bz2
gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
:warning: :warning: Note for reviewers, hackers and fellow systems/low-level/compiler engineers :warning: :warning: If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**. If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
Diffstat (limited to 'Doc')
-rw-r--r--Doc/c-api/init_config.rst14
-rw-r--r--Doc/howto/index.rst1
-rw-r--r--Doc/howto/perf_profiling.rst200
-rw-r--r--Doc/using/cmdline.rst13
4 files changed, 228 insertions, 0 deletions
diff --git a/Doc/c-api/init_config.rst b/Doc/c-api/init_config.rst
index 2074ec4..c4a342e 100644
--- a/Doc/c-api/init_config.rst
+++ b/Doc/c-api/init_config.rst
@@ -1155,6 +1155,20 @@ PyConfig
Default: ``-1`` in Python mode, ``0`` in isolated mode.
+ .. c:member:: int perf_profiling
+
+ Enable compatibility mode with the perf profiler?
+
+ If non-zero, initialize the perf trampoline. See :ref:`perf_profiling`
+ for more information.
+
+ Set by :option:`-X perf <-X>` command line option and by the
+ :envvar:`PYTHONPERFSUPPORT` environment variable.
+
+ Default: ``-1``.
+
+ .. versionadded:: 3.12
+
.. c:member:: int use_environment
Use :ref:`environment variables <using-on-envvars>`?
diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst
index 8a378e6..f521276 100644
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@@ -30,6 +30,7 @@ Currently, the HOWTOs are:
ipaddress.rst
clinic.rst
instrumentation.rst
+ perf_profiling.rst
annotations.rst
isolating-extensions.rst
diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst
new file mode 100644
index 0000000..2e1bb48
--- /dev/null
+++ b/Doc/howto/perf_profiling.rst
@@ -0,0 +1,200 @@
+.. highlight:: shell-session
+
+.. _perf_profiling:
+
+==============================================
+Python support for the Linux ``perf`` profiler
+==============================================
+
+:author: Pablo Galindo
+
+The Linux ``perf`` profiler is a very powerful tool that allows you to profile and
+obtain information about the performance of your application. ``perf`` also has
+a very vibrant ecosystem of tools that aid with the analysis of the data that it
+produces.
+
+The main problem with using the ``perf`` profiler with Python applications is that
+``perf`` only allows to get information about native symbols, this is, the names of
+the functions and procedures written in C. This means that the names and file names
+of the Python functions in your code will not appear in the output of the ``perf``.
+
+Since Python 3.12, the interpreter can run in a special mode that allows Python
+functions to appear in the output of the ``perf`` profiler. When this mode is
+enabled, the interpreter will interpose a small piece of code compiled on the
+fly before the execution of every Python function and it will teach ``perf`` the
+relationship between this piece of code and the associated Python function using
+`perf map files`_.
+
+.. warning::
+
+ Support for the ``perf`` profiler is only currently available for Linux on
+ selected architectures. Check the output of the configure build step or
+ check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
+ to see if your system is supported.
+
+For example, consider the following script:
+
+.. code-block:: python
+
+ def foo(n):
+ result = 0
+ for _ in range(n):
+ result += 1
+ return result
+
+ def bar(n):
+ foo(n)
+
+ def baz(n):
+ bar(n)
+
+ if __name__ == "__main__":
+ baz(1000000)
+
+We can run perf to sample CPU stack traces at 9999 Hertz:
+
+ $ perf record -F 9999 -g -o perf.data python my_script.py
+
+Then we can use perf report to analyze the data:
+
+.. code-block:: shell-session
+
+ $ perf report --stdio -n -g
+
+ # Children Self Samples Command Shared Object Symbol
+ # ........ ........ ............ .......... .................. ..........................................
+ #
+ 91.08% 0.00% 0 python.exe python.exe [.] _start
+ |
+ ---_start
+ |
+ --90.71%--__libc_start_main
+ Py_BytesMain
+ |
+ |--56.88%--pymain_run_python.constprop.0
+ | |
+ | |--56.13%--_PyRun_AnyFileObject
+ | | _PyRun_SimpleFileObject
+ | | |
+ | | |--55.02%--run_mod
+ | | | |
+ | | | --54.65%--PyEval_EvalCode
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | |
+ | | | |--51.67%--_PyEval_EvalFrameDefault
+ | | | | |
+ | | | | |--11.52%--_PyLong_Add
+ | | | | | |
+ | | | | | |--2.97%--_PyObject_Malloc
+ ...
+
+As you can see here, the Python functions are not shown in the output, only ``_Py_Eval_EvalFrameDefault`` appears
+(the function that evaluates the Python bytecode) shows up. Unfortunately that's not very useful because all Python
+functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which
+bytecode-evaluating function.
+
+Instead, if we run the same experiment with perf support activated we get:
+
+.. code-block:: shell-session
+
+ $ perf report --stdio -n -g
+
+ # Children Self Samples Command Shared Object Symbol
+ # ........ ........ ............ .......... .................. .....................................................................
+ #
+ 90.58% 0.36% 1 python.exe python.exe [.] _start
+ |
+ ---_start
+ |
+ --89.86%--__libc_start_main
+ Py_BytesMain
+ |
+ |--55.43%--pymain_run_python.constprop.0
+ | |
+ | |--54.71%--_PyRun_AnyFileObject
+ | | _PyRun_SimpleFileObject
+ | | |
+ | | |--53.62%--run_mod
+ | | | |
+ | | | --53.26%--PyEval_EvalCode
+ | | | py::<module>:/src/script.py
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | py::baz:/src/script.py
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | py::bar:/src/script.py
+ | | | _PyEval_EvalFrameDefault
+ | | | PyObject_Vectorcall
+ | | | _PyEval_Vector
+ | | | py::foo:/src/script.py
+ | | | |
+ | | | |--51.81%--_PyEval_EvalFrameDefault
+ | | | | |
+ | | | | |--13.77%--_PyLong_Add
+ | | | | | |
+ | | | | | |--3.26%--_PyObject_Malloc
+
+
+
+Enabling perf profiling mode
+----------------------------
+
+There are two main ways to activate the perf profiling mode. If you want it to be
+active since the start of the Python interpreter, you can use the `-Xperf` option:
+
+ $ python -Xperf my_script.py
+
+There is also support for dynamically activating and deactivating the perf
+profiling mode by using the APIs in the :mod:`sys` module:
+
+.. code-block:: python
+
+ import sys
+ sys.activate_stack_trampoline("perf")
+
+ # Run some code with Perf profiling active
+
+ sys.deactivate_stack_trampoline()
+
+ # Perf profiling is not active anymore
+
+These APIs can be handy if you want to activate/deactivate profiling mode in
+response to a signal or other communication mechanism with your process.
+
+
+
+Now we can analyze the data with ``perf report``:
+
+ $ perf report -g -i perf.data
+
+
+How to obtain the best results
+-------------------------------
+
+For the best results, Python should be compiled with
+``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows
+profilers to unwind using only the frame pointer and not on DWARF debug
+information. This is because as the code that is interposed to allow perf
+support is dynamically generated it doesn't have any DWARF debugging information
+available.
+
+You can check if you system has been compiled with this flag by running:
+
+ $ python -m sysconfig | grep 'no-omit-frame-pointer'
+
+If you don't see any output it means that your interpreter has not been compiled with
+frame pointers and therefore it may not be able to show Python functions in the output
+of ``perf``.
+
+.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e7dd7/tools/perf/Documentation/jit-interface.txt
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index 6678d47..5ecc882 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -535,6 +535,12 @@ Miscellaneous options
development (running from the source tree) then the default is "off".
Note that the "importlib_bootstrap" and "importlib_bootstrap_external"
frozen modules are always used, even if this flag is set to "off".
+ * ``-X perf`` to activate compatibility mode with the ``perf`` profiler.
+ When this option is activated, the Linux ``perf`` profiler will be able to
+ report Python calls. This option is only available on some platforms and
+ will do nothing if is not supported on the current system. The default value
+ is "off". See also :envvar:`PYTHONPERFSUPPORT` and :ref:`perf_profiling`
+ for more information.
It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
@@ -1025,6 +1031,13 @@ conflict.
.. versionadded:: 3.11
+.. envvar:: PYTHONPERFSUPPORT
+
+ If this variable is set to a nonzero value, it activates compatibility mode
+ with the ``perf`` profiler so Python calls can be detected by it. See the
+ :ref:`perf_profiling` section for more information.
+
+ .. versionadded:: 3.12
Debug-mode variables