author | Pablo Galindo Salgado <Pablogsal@gmail.com> | 2022-08-30 17:11:18 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-08-30 17:11:18 (GMT) |
commit | 6d791a97364b68d5f9c3514a0470aac487fc538d (patch) | |
tree | 745205d7e8698ea7398eb353311f55dc973507bf /Doc | |
parent | 0f733fffe8f4caaac3ce1b5306af86b42fb0c7fa (diff) | |
gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
:warning: :warning: Note for reviewers, hackers and fellow systems/low-level/compiler engineers :warning: :warning:
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations** please, wait until the first version is merged before starting hacking or proposing those so we can keep this PR productive.
Diffstat (limited to 'Doc')
-rw-r--r-- | Doc/c-api/init_config.rst | 14 |
-rw-r--r-- | Doc/howto/index.rst | 1 |
-rw-r--r-- | Doc/howto/perf_profiling.rst | 200 |
-rw-r--r-- | Doc/using/cmdline.rst | 13 |
4 files changed, 228 insertions, 0 deletions
diff --git a/Doc/c-api/init_config.rst b/Doc/c-api/init_config.rst
index 2074ec4..c4a342e 100644
--- a/Doc/c-api/init_config.rst
+++ b/Doc/c-api/init_config.rst
@@ -1155,6 +1155,20 @@ PyConfig
 
       Default: ``-1`` in Python mode, ``0`` in isolated mode.
 
+   .. c:member:: int perf_profiling
+
+      Enable compatibility mode with the perf profiler?
+
+      If non-zero, initialize the perf trampoline. See :ref:`perf_profiling`
+      for more information.
+
+      Set by :option:`-X perf <-X>` command line option and by the
+      :envvar:`PYTHONPERFSUPPORT` environment variable.
+
+      Default: ``-1``.
+
+      .. versionadded:: 3.12
+
    .. c:member:: int use_environment
 
       Use :ref:`environment variables <using-on-envvars>`?
diff --git a/Doc/howto/index.rst b/Doc/howto/index.rst
index 8a378e6..f521276 100644
--- a/Doc/howto/index.rst
+++ b/Doc/howto/index.rst
@@ -30,6 +30,7 @@ Currently, the HOWTOs are:
    ipaddress.rst
    clinic.rst
    instrumentation.rst
+   perf_profiling.rst
    annotations.rst
    isolating-extensions.rst
 
diff --git a/Doc/howto/perf_profiling.rst b/Doc/howto/perf_profiling.rst
new file mode 100644
index 0000000..2e1bb48
--- /dev/null
+++ b/Doc/howto/perf_profiling.rst
@@ -0,0 +1,200 @@
+.. highlight:: shell-session
+
+.. _perf_profiling:
+
+==============================================
+Python support for the Linux ``perf`` profiler
+==============================================
+
+:author: Pablo Galindo
+
+The Linux ``perf`` profiler is a very powerful tool that allows you to profile and
+obtain information about the performance of your application. ``perf`` also has
+a very vibrant ecosystem of tools that aid with the analysis of the data that it
+produces.
+
+The main problem with using the ``perf`` profiler with Python applications is that
+``perf`` only allows to get information about native symbols, this is, the names of
+the functions and procedures written in C. This means that the names and file names
+of the Python functions in your code will not appear in the output of the ``perf``.
+
+Since Python 3.12, the interpreter can run in a special mode that allows Python
+functions to appear in the output of the ``perf`` profiler. When this mode is
+enabled, the interpreter will interpose a small piece of code compiled on the
+fly before the execution of every Python function and it will teach ``perf`` the
+relationship between this piece of code and the associated Python function using
+`perf map files`_.
+
+.. warning::
+
+    Support for the ``perf`` profiler is only currently available for Linux on
+    selected architectures. Check the output of the configure build step or
+    check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
+    to see if your system is supported.
+
+For example, consider the following script:
+
+.. code-block:: python
+
+    def foo(n):
+        result = 0
+        for _ in range(n):
+            result += 1
+        return result
+
+    def bar(n):
+        foo(n)
+
+    def baz(n):
+        bar(n)
+
+    if __name__ == "__main__":
+        baz(1000000)
+
+We can run perf to sample CPU stack traces at 9999 Hertz:
+
+    $ perf record -F 9999 -g -o perf.data python my_script.py
+
+Then we can use perf report to analyze the data:
+
+.. code-block:: shell-session
+
+    $ perf report --stdio -n -g
+
+    # Children      Self       Samples  Command     Shared Object       Symbol
+    # ........  ........  ............  ..........  ..................  ..........................................
+    #
+        91.08%     0.00%             0  python.exe  python.exe          [.] _start
+                |
+                ---_start
+                |
+                    --90.71%--__libc_start_main
+                            Py_BytesMain
+                            |
+                            |--56.88%--pymain_run_python.constprop.0
+                            |          |
+                            |          |--56.13%--_PyRun_AnyFileObject
+                            |          |          _PyRun_SimpleFileObject
+                            |          |          |
+                            |          |          |--55.02%--run_mod
+                            |          |          |          |
+                            |          |          |           --54.65%--PyEval_EvalCode
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     |
+                            |          |          |                     |--51.67%--_PyEval_EvalFrameDefault
+                            |          |          |                     |          |
+                            |          |          |                     |          |--11.52%--_PyLong_Add
+                            |          |          |                     |          |          |
+                            |          |          |                     |          |          |--2.97%--_PyObject_Malloc
+    ...
+
+As you can see here, the Python functions are not shown in the output, only ``_Py_Eval_EvalFrameDefault`` appears
+(the function that evaluates the Python bytecode) shows up. Unfortunately that's not very useful because all Python
+functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which
+bytecode-evaluating function.
+
+Instead, if we run the same experiment with perf support activated we get:
+
+.. code-block:: shell-session
+
+    $ perf report --stdio -n -g
+
+    # Children      Self       Samples  Command     Shared Object       Symbol
+    # ........  ........  ............  ..........  ..................  .....................................................................
+    #
+        90.58%     0.36%             1  python.exe  python.exe          [.] _start
+                |
+                ---_start
+                |
+                    --89.86%--__libc_start_main
+                            Py_BytesMain
+                            |
+                            |--55.43%--pymain_run_python.constprop.0
+                            |          |
+                            |          |--54.71%--_PyRun_AnyFileObject
+                            |          |          _PyRun_SimpleFileObject
+                            |          |          |
+                            |          |          |--53.62%--run_mod
+                            |          |          |          |
+                            |          |          |           --53.26%--PyEval_EvalCode
+                            |          |          |                     py::<module>:/src/script.py
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     py::baz:/src/script.py
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     py::bar:/src/script.py
+                            |          |          |                     _PyEval_EvalFrameDefault
+                            |          |          |                     PyObject_Vectorcall
+                            |          |          |                     _PyEval_Vector
+                            |          |          |                     py::foo:/src/script.py
+                            |          |          |                     |
+                            |          |          |                     |--51.81%--_PyEval_EvalFrameDefault
+                            |          |          |                     |          |
+                            |          |          |                     |          |--13.77%--_PyLong_Add
+                            |          |          |                     |          |          |
+                            |          |          |                     |          |          |--3.26%--_PyObject_Malloc
+
+
+
+Enabling perf profiling mode
+----------------------------
+
+There are two main ways to activate the perf profiling mode. If you want it to be
+active since the start of the Python interpreter, you can use the `-Xperf` option:
+
+    $ python -Xperf my_script.py
+
+There is also support for dynamically activating and deactivating the perf
+profiling mode by using the APIs in the :mod:`sys` module:
+
+.. code-block:: python
+
+   import sys
+   sys.activate_stack_trampoline("perf")
+
+   # Run some code with Perf profiling active
+
+   sys.deactivate_stack_trampoline()
+
+   # Perf profiling is not active anymore
+
+These APIs can be handy if you want to activate/deactivate profiling mode in
+response to a signal or other communication mechanism with your process.
+
+
+
+Now we can analyze the data with ``perf report``:
+
+    $ perf report -g -i perf.data
+
+
+How to obtain the best results
+-------------------------------
+
+For the best results, Python should be compiled with
+``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows
+profilers to unwind using only the frame pointer and not on DWARF debug
+information. This is because as the code that is interposed to allow perf
+support is dynamically generated it doesn't have any DWARF debugging information
+available.
+
+You can check if you system has been compiled with this flag by running:
+
+    $ python -m sysconfig | grep 'no-omit-frame-pointer'
+
+If you don't see any output it means that your interpreter has not been compiled with
+frame pointers and therefore it may not be able to show Python functions in the output
+of ``perf``.
+
+.. _perf map files: https://github.com/torvalds/linux/blob/0513e464f9007b70b96740271a948ca5ab6e7dd7/tools/perf/Documentation/jit-interface.txt
diff --git a/Doc/using/cmdline.rst b/Doc/using/cmdline.rst
index 6678d47..5ecc882 100644
--- a/Doc/using/cmdline.rst
+++ b/Doc/using/cmdline.rst
@@ -535,6 +535,12 @@ Miscellaneous options
      development (running from the source tree) then the default is "off".
      Note that the "importlib_bootstrap" and "importlib_bootstrap_external"
      frozen modules are always used, even if this flag is set to "off".
+   * ``-X perf`` to activate compatibility mode with the ``perf`` profiler.
+     When this option is activated, the Linux ``perf`` profiler will be able to
+     report Python calls. This option is only available on some platforms and
+     will do nothing if is not supported on the current system. The default value
+     is "off". See also :envvar:`PYTHONPERFSUPPORT` and :ref:`perf_profiling`
+     for more information.
 
    It also allows passing arbitrary values and retrieving them through the
    :data:`sys._xoptions` dictionary.
@@ -1025,6 +1031,13 @@ conflict.
 
    .. versionadded:: 3.11
 
+.. envvar:: PYTHONPERFSUPPORT
+
+   If this variable is set to a nonzero value, it activates compatibility mode
+   with the ``perf`` profiler so Python calls can be detected by it. See the
+   :ref:`perf_profiling` section for more information.
+
+   .. versionadded:: 3.12
 
 Debug-mode variables
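
The howto added by this patch says that ``sys.activate_stack_trampoline`` and ``sys.deactivate_stack_trampoline`` are handy for switching profiling mode in response to a signal. Below is a minimal sketch of that idea, not part of the commit. The helper names (``perf_trampoline_supported``, ``toggle_perf_trampoline``, ``install_toggle_handler``) and the choice of ``SIGUSR1`` are hypothetical. The sketch also assumes that the ``HAVE_PERF_TRAMPOLINE`` string the howto greps for is exposed through ``sysconfig`` as the ``PY_HAVE_PERF_TRAMPOLINE`` config variable, and that ``sys.is_stack_trampoline_active`` is available alongside the two functions named in the diff.

```python
import signal
import sys
import sysconfig


def perf_trampoline_supported() -> bool:
    """Best-effort check that this interpreter can use the perf trampoline.

    Assumption: the build flag is visible to sysconfig under the name
    PY_HAVE_PERF_TRAMPOLINE (the howto only says to grep the sysconfig
    output for HAVE_PERF_TRAMPOLINE).
    """
    has_api = hasattr(sys, "activate_stack_trampoline")
    build_flag = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
    return has_api and bool(build_flag)


def toggle_perf_trampoline(signum=None, frame=None) -> bool:
    """Flip perf profiling mode; return True if it is active afterwards.

    The (signum, frame) parameters let this double as a signal handler.
    On unsupported interpreters it does nothing and returns False.
    """
    if not perf_trampoline_supported():
        return False
    if sys.is_stack_trampoline_active():
        sys.deactivate_stack_trampoline()
        return False
    sys.activate_stack_trampoline("perf")
    return True


def install_toggle_handler(signum=None) -> None:
    """Register the toggle on a signal (hypothetical default: SIGUSR1)."""
    if signum is None:
        signum = signal.SIGUSR1  # POSIX only; pick another signal elsewhere
    signal.signal(signum, toggle_perf_trampoline)
```

With the handler installed, something like ``kill -USR1 <pid>`` from another shell would switch the trampoline on or off while ``perf record`` is attached to the process. Because the toggle silently does nothing on unsupported builds, it is worth checking ``perf_trampoline_supported()`` first.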