.. highlight:: shell-session

.. _profiling-sampling:

***************************************************
:mod:`profiling.sampling` --- Statistical profiler
***************************************************

.. module:: profiling.sampling
   :synopsis: Statistical sampling profiler for Python processes.

.. versionadded:: 3.15

**Source code:** :source:`Lib/profiling/sampling/`

.. program:: profiling.sampling

--------------

.. image:: tachyon-logo.png
   :alt: Tachyon logo
   :align: center
   :width: 300px

The :mod:`profiling.sampling` module, named **Tachyon**, provides statistical
profiling of Python programs through periodic stack sampling. Tachyon can
run scripts directly or attach to any running Python process without requiring
code changes or restarts. Because sampling occurs externally to the target
process, overhead is virtually zero, making Tachyon suitable for both
development and production environments.


What is statistical profiling?
==============================

Statistical profiling builds a picture of program behavior by periodically
capturing snapshots of the call stack. Rather than instrumenting every function
call and return as deterministic profilers do, Tachyon reads the call stack at
regular intervals to record what code is currently running.

This approach rests on a simple principle: functions that consume significant
CPU time will appear frequently in the collected samples. By gathering thousands
of samples over a profiling session, Tachyon constructs an accurate statistical
estimate of where time is spent. The more samples collected, the
more precise this estimate becomes.


How time is estimated
---------------------

The time values shown in Tachyon's output are **estimates derived from sample
counts**, not direct measurements. Tachyon counts how many times each function
appears in the collected samples, then multiplies by the sampling interval to
estimate time.

For example, with a 100 microsecond sampling interval over a 10-second profile,
Tachyon collects approximately 100,000 samples. If a function appears in 5,000
samples (5% of total), Tachyon estimates it consumed 5% of the 10-second
duration, or about 500 milliseconds. This is a statistical estimate, not a
precise measurement.
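
The arithmetic behind such an estimate can be sketched in a few lines; the
numbers below simply mirror the example above and are illustrative only:

.. code-block:: python

   # Illustrative numbers matching the example above.
   interval_us = 100         # sampling interval in microseconds
   duration_s = 10           # profiling duration in seconds
   function_samples = 5_000  # samples in which the function appeared

   total_samples = duration_s * 1_000_000 // interval_us   # ~100,000 samples
   fraction = function_samples / total_samples              # 0.05, i.e. 5%
   estimated_seconds = fraction * duration_s                # ~0.5 s (500 ms)
   print(f"{fraction:.1%} of samples -> ~{estimated_seconds * 1000:.0f} ms")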

The accuracy of these estimates depends on sample count. With 100,000 samples,
a function showing 5% has a margin of error of roughly ±0.5%. With only 1,000
samples, the same 5% measurement could actually represent anywhere from 3% to
7% of real time.

This is why longer profiling durations and shorter sampling intervals produce
more reliable results---they collect more samples. For most performance
analysis, the default settings provide sufficient accuracy to identify
bottlenecks and guide optimization efforts.

Because sampling is statistical, results will vary slightly between runs. A
function showing 12% in one run might show 11% or 13% in the next. This is
normal and expected. Focus on the overall pattern rather than exact percentages,
and don't worry about small variations between runs.


When to use a different approach
--------------------------------

Statistical sampling is not ideal for every situation.

For very short scripts that complete in under one second, the profiler may not
collect enough samples for reliable results. Use :mod:`profiling.tracing`
instead, or run the script in a loop to extend profiling time.

When you need exact call counts, sampling cannot provide them. Sampling
estimates frequency from snapshots, so if you need to know precisely how many
times a function was called, use :mod:`profiling.tracing`.

When comparing two implementations where the difference might be only 1-2%,
sampling noise can obscure real differences. Use :mod:`timeit` for
micro-benchmarks or :mod:`profiling.tracing` for precise measurements.


The key difference from :mod:`profiling.tracing` is how measurement happens.
A tracing profiler instruments your code, recording every function call and
return. This provides exact call counts and precise timing but adds overhead
to every function call. A sampling profiler, by contrast, observes the program
from outside at fixed intervals without modifying its execution. Think of the
difference like this: tracing is like having someone follow you and write down
every step you take, while sampling is like taking photographs every second
and inferring your path from those snapshots.

This external observation model is what makes sampling profiling practical for
production use. The profiled program runs at full speed because there is no
instrumentation code running inside it, and the target process is never stopped
or paused during sampling---Tachyon reads the call stack directly from the
process's memory while it continues to run. You can attach to a live server,
collect data, and detach without the application ever knowing it was observed.
The trade-off is that very short-lived functions may be missed if they happen
to complete between samples.

Statistical profiling excels at answering the question, "Where is my program
spending time?" It reveals hotspots and bottlenecks in production code where
deterministic profiling overhead would be unacceptable. For exact call counts
and complete call graphs, use :mod:`profiling.tracing` instead.


Quick examples
==============

Profile a script and see the results immediately::

   python -m profiling.sampling run script.py

Profile a module with arguments::

   python -m profiling.sampling run -m mypackage.module arg1 arg2

Generate an interactive flame graph::

   python -m profiling.sampling run --flamegraph -o profile.html script.py

Attach to a running process by PID::

   python -m profiling.sampling attach 12345

Use live mode for real-time monitoring (press ``q`` to quit)::

   python -m profiling.sampling run --live script.py

Profile for 60 seconds with a faster sampling rate::

   python -m profiling.sampling run -d 60 -i 50 script.py

Generate a line-by-line heatmap::

   python -m profiling.sampling run --heatmap script.py

Enable opcode-level profiling to see which bytecode instructions are executing::

   python -m profiling.sampling run --opcodes --flamegraph script.py


Commands
========

Tachyon operates through two subcommands that determine how to obtain the
target process.


The ``run`` command
-------------------

The ``run`` command launches a Python script or module and profiles it from
startup::

   python -m profiling.sampling run script.py
   python -m profiling.sampling run -m mypackage.module

When profiling a script, the profiler starts the target in a subprocess, waits
for it to initialize, then begins collecting samples. The ``-m`` flag
indicates that the target should be run as a module (equivalent to
``python -m``). Arguments after the target are passed through to the
profiled program::

   python -m profiling.sampling run script.py --config settings.yaml


The ``attach`` command
----------------------

The ``attach`` command connects to an already-running Python process by its
process ID::

   python -m profiling.sampling attach 12345

This command is particularly valuable for investigating performance issues in
production systems. The target process requires no modification and need not
be restarted. The profiler attaches, collects samples for the specified
duration, then detaches and produces output.

::

   python -m profiling.sampling attach --live 12345
   python -m profiling.sampling attach --flamegraph -d 30 -o profile.html 12345

On most systems, attaching to another process requires appropriate permissions.
See :ref:`profiling-permissions` for platform-specific requirements.
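
Attaching can also be scripted. The sketch below simply invokes the documented
command line through :mod:`subprocess`; the PID is a placeholder:

.. code-block:: python

   import subprocess
   import sys

   target_pid = 12345  # placeholder: the PID of the Python process to profile

   # Equivalent to:
   #   python -m profiling.sampling attach --flamegraph -d 30 -o profile.html 12345
   subprocess.run(
       [sys.executable, "-m", "profiling.sampling", "attach",
        "--flamegraph", "-d", "30", "-o", "profile.html", str(target_pid)],
       check=True,
   )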


Profiling in production
-----------------------

The sampling profiler is designed for production use. It imposes no measurable
overhead on the target process because it reads memory externally rather than
instrumenting code. The target application continues running at full speed and
is unaware it is being profiled.

When profiling production systems, keep these guidelines in mind:

Start with shorter durations (10-30 seconds) to get quick results, then extend
if you need more statistical accuracy. The default 10-second duration is usually
sufficient to identify major hotspots.

If possible, profile during representative load rather than peak traffic.
Profiles collected during normal operation are easier to interpret than those
collected during unusual spikes.

The profiler itself consumes some CPU on the machine where it runs, but this
cost falls on the profiler process, not the target, and is typically
negligible.

Results from production may differ from development due to different data
sizes, concurrent load, or caching effects. This is expected and is often
exactly what you want to capture.


.. _profiling-permissions:

Platform requirements
---------------------

The profiler reads the target process's memory to capture stack traces. This
requires elevated permissions on most operating systems.

**Linux**

On Linux, the profiler uses ``ptrace`` or ``process_vm_readv`` to read the
target process's memory. This typically requires one of:

- Running as root
- Having the ``CAP_SYS_PTRACE`` capability
- Adjusting the Yama ptrace scope: ``/proc/sys/kernel/yama/ptrace_scope``

The default ptrace_scope of 1 only allows a process to trace its own
descendants. To allow attaching to any process owned by the same user, set it
to 0::

   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

**macOS**

On macOS, the profiler uses ``task_for_pid()`` to access the target process.
This requires one of:

- Running as root
- The profiler binary having the ``com.apple.security.cs.debugger`` entitlement
- System Integrity Protection (SIP) being disabled (not recommended)

**Windows**

On Windows, the profiler requires administrative privileges or the
``SeDebugPrivilege`` privilege to read another process's memory.


Version compatibility
---------------------

The profiler and target process must run the same Python minor version (for
example, both Python 3.15). Attaching from Python 3.14 to a Python 3.15 process
is not supported.

Additional restrictions apply to pre-release Python versions: if either the
profiler or target is running a pre-release (alpha, beta, or release candidate),
both must run the exact same version.

On free-threaded Python builds, the profiler cannot attach from a free-threaded
build to a standard build, or vice versa.
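
If you automate attachment, it can be worth verifying the version match up
front. A minimal sketch, assuming you already know the target's version (for
example, from its logs):

.. code-block:: python

   import sys

   target_version = (3, 15)  # placeholder: version reported by the target process

   if sys.version_info[:2] != target_version:
       raise RuntimeError(
           f"profiler runs Python {sys.version_info[:2]}, target runs "
           f"{target_version}; matching minor versions are required"
       )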


Sampling configuration
======================

Before exploring the various output formats and visualization options, it is
important to understand how to configure the sampling process itself. The
profiler offers several options that control how frequently samples are
collected, how long profiling runs, which threads are observed, and what
additional context is captured in each sample.

The default configuration works well for most use cases:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Option
     - Default
   * - ``--interval`` / ``-i``
     - 100 µs between samples (~10,000 samples/sec)
   * - ``--duration`` / ``-d``
     - 10 seconds
   * - ``--all-threads`` / ``-a``
     - Main thread only
   * - ``--native``
     - No ``<native>`` frames (C code time attributed to caller)
   * - ``--no-gc``
     - ``<GC>`` frames included when garbage collection is active
   * - ``--mode``
     - Wall-clock mode (all samples recorded)
   * - ``--realtime-stats``
     - Disabled
   * - ``--subprocesses``
     - Disabled


Sampling interval and duration
------------------------------

The two most fundamental parameters are the sampling interval and duration.
Together, these determine how many samples will be collected during a profiling
session.

The :option:`--interval` option (:option:`-i`) sets the time between samples in
microseconds. The default is 100 microseconds, which produces approximately
10,000 samples per second::

   python -m profiling.sampling run -i 50 script.py

Lower intervals capture more samples and provide finer-grained data at the
cost of slightly higher profiler CPU usage. Higher intervals reduce profiler
overhead but may miss short-lived functions. For most applications, the
default interval provides a good balance between accuracy and overhead.

The :option:`--duration` option (:option:`-d`) sets how long to profile in seconds. The
default is 10 seconds::

   python -m profiling.sampling run -d 60 script.py

Longer durations collect more samples and produce more statistically reliable
results, especially for code paths that execute infrequently. When profiling
a program that runs for a fixed time, you may want to set the duration to
match or exceed the expected runtime.
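
Together the two options bound how many samples a run can collect, ignoring
any missed samples; a quick sketch of the trade-off:

.. code-block:: python

   def max_samples(interval_us: int, duration_s: float) -> int:
       """Upper bound on samples for a given interval (µs) and duration (s)."""
       return int(duration_s * 1_000_000 / interval_us)

   for interval_us, duration_s in [(100, 10), (50, 10), (100, 60)]:
       print(f"-i {interval_us:>3} -d {duration_s:>2}: "
             f"up to {max_samples(interval_us, duration_s):,} samples")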


Thread selection
----------------

Python programs often use multiple threads, whether explicitly through the
:mod:`threading` module or implicitly through libraries that manage thread
pools.

By default, the profiler samples only the main thread. The :option:`--all-threads`
option (:option:`-a`) enables sampling of all threads in the process::

   python -m profiling.sampling run -a script.py

Multi-thread profiling reveals how work is distributed across threads and can
identify threads that are blocked or starved. Each thread's samples are
combined in the output, with the ability to filter by thread in some formats.
This option is particularly useful when investigating concurrency issues or
when work is distributed across a thread pool.
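
For example, in a script whose work runs on a background thread, sampling only
the main thread would mostly show it waiting in ``join()``; profiling with
``-a`` attributes the time to the worker. A small, hypothetical example:

.. code-block:: python
   :caption: threads_demo.py

   import threading

   def busy_worker():
       # CPU-bound work that runs entirely on a background thread
       total = 0
       for i in range(20_000_000):
           total += i * i
       return total

   if __name__ == "__main__":
       worker = threading.Thread(target=busy_worker)
       worker.start()
       worker.join()   # the main thread mostly waits here

Profile it with ``python -m profiling.sampling run -a threads_demo.py`` to see
samples from both threads.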


Special frames
--------------

The profiler can inject artificial frames into the captured stacks to provide
additional context about what the interpreter is doing at the moment each
sample is taken. These synthetic frames help distinguish different types of
execution that would otherwise be invisible.

The :option:`--native` option adds ``<native>`` frames to indicate when Python has
called into C code (extension modules, built-in functions, or the interpreter
itself)::

   python -m profiling.sampling run --native script.py

These frames help distinguish time spent in Python code versus time spent in
native libraries. Without this option, native code execution appears as time
in the Python function that made the call. This is useful when optimizing
code that makes heavy use of C extensions like NumPy or database drivers.

By default, the profiler includes ``<GC>`` frames when garbage collection is
active. The :option:`--no-gc` option suppresses these frames::

   python -m profiling.sampling run --no-gc script.py

GC frames help identify programs where garbage collection consumes significant
time, which may indicate memory allocation patterns worth optimizing. If you
see substantial time in ``<GC>`` frames, consider investigating object
allocation rates or using object pooling.
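
As an illustration, a script that builds and discards many cyclic structures
gives the cyclic garbage collector plenty of work and will typically show time
under ``<GC>`` frames (a hypothetical example):

.. code-block:: python
   :caption: gc_demo.py

   def churn():
       for _ in range(200):
           # Many small objects linked into reference cycles, so the cyclic
           # garbage collector (not just reference counting) has work to do.
           nodes = [{"payload": list(range(50))} for _ in range(10_000)]
           for a, b in zip(nodes, nodes[1:]):
               a["next"], b["prev"] = b, a
           # 'nodes' is dropped here and the cycles must be collected

   if __name__ == "__main__":
       churn()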


Opcode-aware profiling
----------------------

The :option:`--opcodes` option enables instruction-level profiling that captures
which Python bytecode instructions are executing at each sample::

   python -m profiling.sampling run --opcodes --flamegraph script.py

This feature provides visibility into Python's bytecode execution, including
adaptive specialization optimizations. When a generic instruction like
``LOAD_ATTR`` is specialized at runtime into a more efficient variant like
``LOAD_ATTR_INSTANCE_VALUE``, the profiler shows both the specialized name
and the base instruction.
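
You can inspect the same specialization outside the profiler with the
:mod:`dis` module once a function has run enough times to warm up; the exact
specialized names depend on the interpreter version:

.. code-block:: python

   import dis

   class Point:
       def __init__(self):
           self.x = 1.0

   def total_x(points):
       total = 0.0
       for p in points:
           total += p.x     # LOAD_ATTR may specialize after warm-up
       return total

   total_x([Point() for _ in range(10_000)])   # warm up the bytecode
   dis.dis(total_x, adaptive=True)             # show specialized instruction names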

Opcode information appears in several output formats:

- **Flame graphs**: Hovering over a frame displays a tooltip with a bytecode
  instruction breakdown, showing which opcodes consumed time in that function
- **Heatmap**: Expandable bytecode panels per source line show instruction
  breakdown with specialization percentages
- **Live mode**: An opcode panel shows instruction-level statistics for the
  selected function, accessible via keyboard navigation
- **Gecko format**: Opcode transitions are emitted as interval markers in the
  Firefox Profiler timeline

This level of detail is particularly useful for:

- Understanding the performance impact of Python's adaptive specialization
- Identifying hot bytecode instructions that might benefit from optimization
- Analyzing the effectiveness of different code patterns at the instruction level
- Debugging performance issues that occur at the bytecode level

The :option:`--opcodes` option is compatible with :option:`--live`, :option:`--flamegraph`,
:option:`--heatmap`, and :option:`--gecko` formats. It requires additional memory to store
opcode information and may slightly reduce sampling performance, but provides
detailed, instruction-level visibility into Python's execution model.


Real-time statistics
--------------------

The :option:`--realtime-stats` option displays sampling rate statistics during
profiling::

   python -m profiling.sampling run --realtime-stats script.py

This shows the actual achieved sampling rate, which may be lower than requested
if the profiler cannot keep up. The statistics help verify that profiling is
working correctly and that sufficient samples are being collected. See
:ref:`sampling-efficiency` for details on interpreting these metrics.


Subprocess profiling
--------------------

The :option:`--subprocesses` option enables automatic profiling of subprocesses
spawned by the target::

   python -m profiling.sampling run --subprocesses script.py
   python -m profiling.sampling attach --subprocesses 12345

When enabled, the profiler monitors the target process for child process
creation. When a new Python child process is detected, a separate profiler
instance is automatically spawned to profile it. This is useful for
applications that use :mod:`multiprocessing`, :mod:`subprocess`,
:mod:`concurrent.futures` with :class:`~concurrent.futures.ProcessPoolExecutor`,
or other process spawning mechanisms.

.. code-block:: python
   :caption: worker_pool.py

   from concurrent.futures import ProcessPoolExecutor
   import math

   def compute_factorial(n):
       total = 0
       for i in range(50):
           total += math.factorial(n)
       return total

   if __name__ == "__main__":
       numbers = [5000 + i * 100 for i in range(50)]
       with ProcessPoolExecutor(max_workers=4) as executor:
           results = list(executor.map(compute_factorial, numbers))
       print(f"Computed {len(results)} factorials")

::

   python -m profiling.sampling run --subprocesses --flamegraph worker_pool.py

This produces separate flame graphs for the main process and each worker
process: ``flamegraph_<main_pid>.html``, ``flamegraph_<worker1_pid>.html``,
and so on.

Each subprocess receives its own output file. The filename is derived from
the specified output path (or the default) with the subprocess's process ID
appended:

- If you specify ``-o profile.html``, subprocesses produce ``profile_12345.html``,
  ``profile_12346.html``, and so on
- With default output, subprocesses produce files like ``flamegraph_12345.html``
  or directories like ``heatmap_12345``
- For pstats format (which defaults to stdout), subprocesses produce files like
  ``profile_12345.pstats``

The subprocess profilers inherit most sampling options from the parent (interval,
duration, thread selection, native frames, GC frames, async-aware mode, and
output format). All Python descendant processes are profiled recursively,
including grandchildren and further descendants.

Subprocess detection works by periodically scanning for new descendants of
the target process and checking whether each new process is a Python process
by probing the process memory for Python runtime structures. Non-Python
subprocesses (such as shell commands or external tools) are ignored.

There is a limit of 100 concurrent subprocess profilers to prevent resource
exhaustion in programs that spawn many processes. If this limit is reached,
additional subprocesses are not profiled and a warning is printed.

The :option:`--subprocesses` option is incompatible with :option:`--live` mode
because live mode uses an interactive terminal interface that cannot
accommodate multiple concurrent profiler displays.


.. _sampling-efficiency:

Sampling efficiency
-------------------

Sampling efficiency metrics help assess the quality of the collected data.
These metrics appear in the profiler's terminal output and in the flame graph
sidebar.

**Sampling efficiency** is the percentage of sample attempts that succeeded.
Each sample attempt reads the target process's call stack from memory. An
attempt can fail if the process is in an inconsistent state at the moment of
reading, such as during a context switch or while the interpreter is updating
its internal structures. A low efficiency may indicate that the profiler could
not keep up with the requested sampling rate, often due to system load or an
overly aggressive interval setting.

**Missed samples** is the percentage of expected samples that were not
collected. Based on the configured interval and duration, the profiler expects
to collect a certain number of samples. Some samples may be missed if the
profiler falls behind schedule, for example when the system is under heavy
load. A small percentage of missed samples is normal and does not significantly
affect the statistical accuracy of the profile.

Both metrics are informational. Even with some failed attempts or missed
samples, the profile remains statistically valid as long as enough samples
were collected. The profiler reports the actual number of samples captured,
which you can use to judge whether the data is sufficient for your analysis.
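
As a rough illustration of how the two percentages relate to the underlying
counts (the numbers are made up; the profiler reports the real ones at the end
of a run):

.. code-block:: python

   interval_us = 100
   duration_s = 10

   expected = duration_s * 1_000_000 // interval_us   # samples the schedule called for
   attempts = 98_000                                   # stack reads actually attempted
   collected = 96_500                                  # attempts that succeeded

   efficiency = collected / attempts            # share of attempts that succeeded
   missed = (expected - collected) / expected   # share of expected samples not collected
   print(f"efficiency {efficiency:.1%}, missed {missed:.1%}, "
         f"{collected:,} samples collected")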


Profiling modes
===============

The sampling profiler supports four modes that control which samples are
recorded. The mode determines what the profile measures: total elapsed time,
CPU execution time, time spent holding the global interpreter lock, or
exception handling.


Wall-clock mode
---------------

Wall-clock mode (:option:`--mode`\ ``=wall``) captures all samples regardless of what the
thread is doing. This is the default mode and provides a complete picture of
where time passes during program execution::

   python -m profiling.sampling run --mode=wall script.py

In wall-clock mode, samples are recorded whether the thread is actively
executing Python code, waiting for I/O, blocked on a lock, or sleeping.
This makes wall-clock profiling ideal for understanding the overall time
distribution in your program, including time spent waiting.

If your program spends significant time in I/O operations, network calls, or
sleep, wall-clock mode will show these waits as time attributed to the calling
function. This is often exactly what you want when optimizing end-to-end
latency.


CPU mode
--------

CPU mode (:option:`--mode`\ ``=cpu``) records samples only when the thread is actually
executing on a CPU core::

   python -m profiling.sampling run --mode=cpu script.py

Samples taken while the thread is sleeping, blocked on I/O, or waiting for
a lock are discarded. The resulting profile shows where CPU cycles are consumed,
filtering out idle time.

CPU mode is useful when you want to focus on computational hotspots without
being distracted by I/O waits. If your program alternates between computation
and network calls, CPU mode reveals which computational sections are most
expensive.


Comparing wall-clock and CPU profiles
-------------------------------------

Running both wall-clock and CPU mode profiles can reveal whether a function's
time is spent computing or waiting.

If a function appears prominently in both profiles, it is a true computational
hotspot---actively using the CPU. Optimization should focus on algorithmic
improvements or more efficient code.

If a function is high in wall-clock mode but low or absent in CPU mode, it is
I/O-bound or waiting. The function spends most of its time waiting for network,
disk, locks, or sleep. CPU optimization won't help here; consider async I/O,
connection pooling, or reducing wait time instead.

.. code-block:: python

   import time

   def do_sleep():
       time.sleep(2)

   def do_compute():
       sum(i**2 for i in range(1000000))

   if __name__ == "__main__":
       do_sleep()
       do_compute()

::

   python -m profiling.sampling run --mode=wall script.py  # do_sleep ~98%, do_compute ~1%
   python -m profiling.sampling run --mode=cpu script.py   # do_sleep absent, do_compute dominates


GIL mode
--------

GIL mode (:option:`--mode`\ ``=gil``) records samples only when the thread holds Python's
global interpreter lock::

   python -m profiling.sampling run --mode=gil script.py

The GIL is held only while executing Python bytecode. When Python calls into
C extensions, performs I/O operations, or executes native code, the GIL is
typically released. This means GIL mode effectively measures time spent
running Python code specifically, filtering out time in native libraries.

In multi-threaded programs, GIL mode reveals which code is preventing other
threads from running Python bytecode. Since only one thread can hold the GIL
at a time, functions that appear frequently in GIL mode profiles are
monopolizing the interpreter.

GIL mode helps answer questions like "which functions are monopolizing the
GIL?" and "why are my other threads starving?" It can also be useful in
single-threaded programs to distinguish Python execution time from time spent
in C extensions or I/O.

.. code-block:: python

   import hashlib

   def hash_work():
       # C extension - releases GIL during computation
       for _ in range(200):
           hashlib.sha256(b"data" * 250000).hexdigest()

   def python_work():
       # Pure Python - holds GIL during computation
       for _ in range(3):
           sum(i**2 for i in range(1000000))

   if __name__ == "__main__":
       hash_work()
       python_work()

::

   python -m profiling.sampling run --mode=cpu script.py  # hash_work ~42%, python_work ~38%
   python -m profiling.sampling run --mode=gil script.py  # hash_work ~5%, python_work ~60%


Exception mode
--------------

Exception mode (``--mode=exception``) records samples only when a thread has
an active exception::

   python -m profiling.sampling run --mode=exception script.py

Samples are recorded in two situations: when an exception is being propagated
up the call stack (after ``raise`` but before being caught), or when code is
executing inside an ``except`` block where exception information is still
present in the thread state.

The following example illustrates which code regions are captured:

.. code-block:: python

   def example():
       try:
           raise ValueError("error")    # Captured: exception being raised
       except ValueError:
           process_error()              # Captured: inside except block
       finally:
           cleanup()                    # NOT captured: exception already handled

   def example_propagating():
       try:
           try:
               raise ValueError("error")
           finally:
               cleanup()                # Captured: exception propagating through
       except ValueError:
           pass

   def example_no_exception():
       try:
           do_work()
       finally:
           cleanup()                    # NOT captured: no exception involved

Note that ``finally`` blocks are only captured when an exception is actively
propagating through them. Once an ``except`` block finishes executing, Python
clears the exception information before running any subsequent ``finally``
block. Similarly, ``finally`` blocks that run during normal execution (when no
exception was raised) are not captured because no exception state is present.

This mode is useful for understanding where your program spends time handling
errors. Exception handling can be a significant source of overhead in code
that uses exceptions for flow control (such as ``StopIteration`` in iterators)
or in applications that process many error conditions (such as network servers
handling connection failures).

Exception mode helps answer questions like "how much time is spent handling
exceptions?" and "which exception handlers are the most expensive?" It can
reveal hidden performance costs in code that catches and processes many
exceptions, even when those exceptions are handled gracefully. For example,
if a parsing library uses exceptions internally to signal format errors, this
mode will capture time spent in those handlers even if the calling code never
sees the exceptions.


Output formats
==============

The profiler produces output in several formats, each suited to different
analysis workflows. The format is selected with a command-line flag, and
output goes to stdout, a file, or a directory depending on the format.


pstats format
-------------

The pstats format (:option:`--pstats`) produces a text table similar to what
deterministic profilers generate. This is the default output format::

   python -m profiling.sampling run script.py
   python -m profiling.sampling run --pstats script.py

.. figure:: tachyon-pstats.png
   :alt: Tachyon pstats terminal output
   :align: center
   :width: 100%

   The pstats format displays profiling results in a color-coded table showing
   function hotspots, sample counts, and timing estimates.

Output appears on stdout by default::

   Profile Stats (Mode: wall):
        nsamples  sample%    tottime (ms)  cumul%   cumtime (ms)  filename:lineno(function)
          234/892    11.7%       234.00     44.6%       892.00    server.py:145(handle_request)
          156/156     7.8%       156.00      7.8%       156.00    <built-in>:0(socket.recv)
           98/421     4.9%        98.00     21.1%       421.00    parser.py:67(parse_message)

The columns show sampling counts and estimated times:

- **nsamples**: Displayed as ``direct/cumulative`` (for example, ``10/50``).
  Direct samples are when the function was at the top of the stack, actively
  executing. Cumulative samples are when the function appeared anywhere on the
  stack, including when it was waiting for functions it called. If a function
  shows ``10/50``, it was directly executing in 10 samples and was on the call
  stack in 50 samples total.

- **sample%** and **cumul%**: Percentages of total samples for direct and
  cumulative counts respectively.

- **tottime** and **cumtime**: Estimated wall-clock time based on sample counts
  and the profiling duration. Time units are selected automatically based on
  the magnitude: seconds for large values, milliseconds for moderate values,
  or microseconds for small values.

The output includes a legend explaining each column and a summary of
interesting functions that highlights:

- **Hot spots**: Functions with high direct/cumulative sample ratio (ratio
  close to 1.0). These functions spend most of their time executing their own
  code rather than waiting for callees. High ratios indicate where CPU time
  is actually consumed.

- **Indirect calls**: Functions with large differences between cumulative and
  direct samples. These are orchestration functions that delegate work to
  other functions. They appear frequently on the stack but rarely at the top.

- **Call magnification**: Functions where cumulative samples far exceed direct
  samples (high cumulative/direct multiplier). These are frequently-nested
  functions that appear deep in many call chains.

Use :option:`--no-summary` to suppress both the legend and summary sections.

To save pstats output to a file instead of stdout::

   python -m profiling.sampling run -o profile.txt script.py

The pstats format supports several options for controlling the display.
The :option:`--sort` option determines the column used for ordering results::

   python -m profiling.sampling run --sort=tottime script.py
   python -m profiling.sampling run --sort=cumtime script.py
   python -m profiling.sampling run --sort=nsamples script.py

The :option:`--limit` option restricts output to the top N entries::

   python -m profiling.sampling run --limit=30 script.py

The :option:`--no-summary` option suppresses the header summary that precedes the
statistics table.


Collapsed stacks format
-----------------------

Collapsed stacks format (:option:`--collapsed`) produces one line per unique call
stack, with a count of how many times that stack was sampled::

   python -m profiling.sampling run --collapsed script.py

The output looks like:

.. code-block:: text

   main;process_data;parse_json;decode_utf8 42
   main;process_data;parse_json 156
   main;handle_request;send_response 89

Each line contains semicolon-separated function names representing the call
stack from bottom to top, followed by a space and the sample count. This
format is designed for compatibility with external flame graph tools,
particularly Brendan Gregg's ``flamegraph.pl`` script.

To generate a flame graph from collapsed stacks::

   python -m profiling.sampling run --collapsed script.py > stacks.txt
   flamegraph.pl stacks.txt > profile.svg

The resulting SVG can be viewed in any web browser and provides an interactive
visualization where you can click to zoom into specific call paths.
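
The format is also straightforward to post-process yourself. A short sketch
that totals self samples per leaf function (the last entry on each line),
assuming a ``stacks.txt`` produced as above:

.. code-block:: python

   from collections import Counter

   self_samples = Counter()
   with open("stacks.txt") as f:
       for line in f:
           stack, count = line.rsplit(" ", 1)
           leaf = stack.split(";")[-1]       # function at the top of the stack
           self_samples[leaf] += int(count)

   for func, count in self_samples.most_common(10):
       print(f"{count:8d}  {func}")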


Flame graph format
------------------

Flame graph format (:option:`--flamegraph`) produces a self-contained HTML file with
an interactive flame graph visualization::

   python -m profiling.sampling run --flamegraph script.py
   python -m profiling.sampling run --flamegraph -o profile.html script.py

.. figure:: tachyon-flamegraph.png
   :alt: Tachyon interactive flame graph
   :align: center
   :width: 100%

   The flame graph visualization shows call stacks as nested rectangles, with
   width proportional to time spent. The sidebar displays runtime statistics,
   GIL metrics, and hotspot functions.

.. only:: html

   `Try the interactive example <../_static/tachyon-example-flamegraph.html>`__!

If no output file is specified, the profiler generates a filename based on
the process ID (for example, ``flamegraph.12345.html``).

The generated HTML file requires no external dependencies and can be opened
directly in a web browser. The visualization displays call stacks as nested
rectangles, with width proportional to time spent. Hovering over a rectangle
shows details about that function including source code context, and clicking
zooms into that portion of the call tree.

The flame graph interface includes:

- A sidebar showing profile summary, thread statistics, sampling efficiency
  metrics (see :ref:`sampling-efficiency`), and top hotspot functions
- Search functionality supporting both function name matching and
  ``file.py:42`` line patterns
- Per-thread filtering via dropdown
- Dark/light theme toggle (preference saved across sessions)
- SVG export for saving the current view

The thread statistics section shows runtime behavior metrics:

- **GIL Held**: percentage of samples where a thread held the global interpreter
  lock (actively running Python code)
- **GIL Released**: percentage of samples where no thread held the GIL
- **Waiting GIL**: percentage of samples where a thread was waiting to acquire
  the GIL
- **GC**: percentage of samples during garbage collection

These statistics help identify GIL contention and understand how time is
distributed between Python execution, native code, and waiting.

Flame graphs are particularly effective for identifying deep call stacks and
understanding the hierarchical structure of time consumption. Wide rectangles
at the top indicate functions that consume significant time either directly
or through their callees.


Gecko format
------------

Gecko format (:option:`--gecko`) produces JSON output compatible with the Firefox
Profiler::

   python -m profiling.sampling run --gecko script.py
   python -m profiling.sampling run --gecko -o profile.json script.py

The `Firefox Profiler <https://profiler.firefox.com>`__ is a sophisticated
web-based tool originally built for profiling Firefox itself. It provides
features beyond basic flame graphs, including a timeline view, call tree
exploration, and marker visualization. See the
`Firefox Profiler documentation <https://profiler.firefox.com/docs/#/>`__ for
detailed usage instructions.

To use the output, open the Firefox Profiler in your browser and load the
JSON file. The profiler runs entirely client-side, so your profiling data
never leaves your machine.
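
The output is plain JSON, so it can also be inspected programmatically. A
rough sketch that lists thread names, assuming the usual Gecko layout with a
top-level ``threads`` array whose entries carry a ``name`` field:

.. code-block:: python

   import json

   with open("profile.json") as f:
       profile = json.load(f)

   # Layout assumption: a top-level "threads" list with per-thread "name" fields.
   for thread in profile.get("threads", []):
       print(thread.get("name", "<unnamed>"))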

Gecko format automatically collects additional metadata about GIL state and
CPU activity, enabling analysis features specific to Python's threading model.
The profiler emits interval markers that appear as colored bands in the
Firefox Profiler timeline:

- **GIL markers**: show when threads hold or release the global interpreter lock
- **CPU markers**: show when threads are executing on CPU versus idle
- **Code type markers**: distinguish Python code from native (C extension) code
- **GC markers**: indicate garbage collection activity

For this reason, the :option:`--mode` option is not available with Gecko format;
all relevant data is captured automatically.

.. figure:: tachyon-gecko-calltree.png
   :alt: Firefox Profiler Call Tree view
   :align: center
   :width: 100%

   The Call Tree view shows the complete call hierarchy with sample counts
   and percentages. The sidebar displays detailed statistics for the
   selected function including running time and sample distribution.

.. figure:: tachyon-gecko-flamegraph.png
   :alt: Firefox Profiler Flame Graph view
   :align: center
   :width: 100%

   The Flame Graph visualization shows call stacks as nested rectangles.
   Function names are visible in the call hierarchy.

.. figure:: tachyon-gecko-opcodes.png
   :alt: Firefox Profiler Marker Chart with opcodes
   :align: center
   :width: 100%

   The Marker Chart displays interval markers including CPU state, GIL
   status, and opcodes. With ``--opcodes`` enabled, bytecode instructions
   like ``BINARY_OP_ADD_FLOAT``, ``CALL_PY_EXACT_ARGS``, and
   ``CALL_LIST_APPEND`` appear as markers showing execution over time.


Heatmap format
--------------

Heatmap format (:option:`--heatmap`) generates an interactive HTML visualization
showing sample counts at the source line level::

   python -m profiling.sampling run --heatmap script.py
   python -m profiling.sampling run --heatmap -o my_heatmap script.py

.. figure:: tachyon-heatmap.png
   :alt: Tachyon heatmap visualization
   :align: center
   :width: 100%

   The heatmap overlays sample counts directly on your source code. Lines are
   color-coded from cool (few samples) to hot (many samples). Navigation
   buttons (▲▼) let you jump between callers and callees.

Unlike other formats that produce a single file, heatmap output creates a
directory containing HTML files for each profiled source file. If no output
path is specified, the directory is named ``heatmap_PID``.

The heatmap visualization displays your source code with a color gradient
indicating how many samples were collected at each line. Hot lines (many
samples) appear in warm colors, while cold lines (few or no samples) appear
in cool colors. This view helps pinpoint exactly which lines of code are
responsible for time consumption.

The heatmap interface provides several interactive features:

- **Coloring modes**: toggle between "Self Time" (direct execution) and
  "Total Time" (cumulative, including time in called functions)
- **Cold code filtering**: show all lines or only lines with samples
- **Call graph navigation**: each line shows navigation buttons (▲ for callers,
  ▼ for callees) that let you trace execution paths through your code. When
  a line has multiple callers or callees, a menu appears showing all of them
  with their sample counts.
- **Scroll minimap**: a vertical overview showing the heat distribution across
  the entire file
- **Hierarchical index**: files organized by type (stdlib, site-packages,
  project) with aggregate sample counts per folder
- **Dark/light theme**: toggle with preference saved across sessions
- **Line linking**: click line numbers to create shareable URLs

When opcode-level profiling is enabled with :option:`--opcodes`, each hot line
can be expanded to show which bytecode instructions consumed time:

.. figure:: tachyon-heatmap-with-opcodes.png
   :alt: Heatmap with expanded bytecode panel
   :align: center
   :width: 100%

   Expanding a hot line reveals the bytecode instructions executed, including
   specialized variants. The panel shows sample counts per instruction and the
   overall specialization percentage for the line.

.. only:: html

   `Try the interactive example <../_static/tachyon-example-heatmap.html>`__!

Heatmaps are especially useful when you know which file contains a performance
issue but need to identify the specific lines. Many developers prefer this
format because it maps directly to their source code, making it easy to read
and navigate. For smaller scripts and focused analysis, heatmaps provide an
intuitive view that shows exactly where time is spent without requiring
interpretation of hierarchical visualizations.


Live mode
=========

Live mode (:option:`--live`) provides a terminal-based real-time view of profiling
data, similar to the ``top`` command for system processes::

   python -m profiling.sampling run --live script.py
   python -m profiling.sampling attach --live 12345

.. figure:: tachyon-live-mode-2.gif
   :alt: Tachyon live mode showing all threads
   :align: center
   :width: 100%

   Live mode displays real-time profiling statistics, showing combined
   data from multiple threads in a multi-threaded application.

The display updates continuously as new samples arrive, showing the current
hottest functions. This mode requires the :mod:`curses` module, which is
available on Unix-like systems but not on Windows. The terminal must be at
least 60 columns wide and 12 lines tall; larger terminals display more columns.

The header displays the top 3 hottest functions, sampling efficiency metrics,
and thread status statistics (GIL held percentage, CPU usage, GC time). The
main table shows function statistics with the currently sorted column indicated
by an arrow (▼).

When :option:`--opcodes` is enabled, an additional opcode panel appears below the
main table, showing instruction-level statistics for the currently selected
function. This panel displays which bytecode instructions are executing most
frequently, including specialized variants and their base opcodes.

.. figure:: tachyon-live-mode-1.gif
   :alt: Tachyon live mode with opcode panel
   :align: center
   :width: 100%

   Live mode with ``--opcodes`` enabled shows an opcode panel with a bytecode
   instruction breakdown for the selected function.


Keyboard commands
-----------------

Within live mode, keyboard commands control the display:

:kbd:`q`
   Quit the profiler and return to the shell.

:kbd:`s` / :kbd:`S`
   Cycle through sort orders forward/backward (sample count, percentage,
   total time, cumulative percentage, cumulative time).

:kbd:`p`
   Pause or resume display updates. Sampling continues in the background
   while the display is paused, so you can freeze the view to examine results
   without stopping data collection.

:kbd:`r`
   Reset all statistics and start fresh. This is disabled after profiling
   finishes to prevent accidental data loss.

:kbd:`/`
   Enter filter mode to search for functions by name. The filter uses
   case-insensitive substring matching against the filename and function name.
   Type a pattern and press Enter to apply, or Escape to cancel. Glob patterns
   and regular expressions are not supported.

:kbd:`c`
   Clear the current filter and show all functions again.

:kbd:`t`
   Toggle between viewing all threads combined or per-thread statistics.
   In per-thread mode, a thread counter (for example, ``1/4``) appears showing
   your position among the available threads.

:kbd:`←` :kbd:`→` or :kbd:`↑` :kbd:`↓`
   In per-thread view, navigate between threads. Navigation wraps around
   from the last thread to the first and vice versa.

:kbd:`+` / :kbd:`-`
   Increase or decrease the display refresh rate. The range is 0.05 seconds
   (20 Hz, very responsive) to 1.0 second (1 Hz, lower overhead). Faster refresh
   rates use more CPU. The default is 0.1 seconds (10 Hz).

:kbd:`x`
   Toggle trend indicators that show whether functions are becoming hotter
   or cooler over time. When enabled, increasing metrics appear in green and
   decreasing metrics appear in red, comparing each update to the previous one.

:kbd:`h` or :kbd:`?`
   Show the help screen with all available commands.

:kbd:`j` / :kbd:`k` (or :kbd:`Up` / :kbd:`Down`)
   Navigate through opcode entries in the opcode panel (when ``--opcodes`` is
   enabled). These keys scroll through the instruction-level statistics for the
   currently selected function.

When profiling finishes (duration expires or target process exits), the display
shows a "PROFILING COMPLETE" banner and freezes the final results. You can
still navigate, sort, and filter the results before pressing :kbd:`q` to exit.

Live mode is incompatible with output format options (:option:`--collapsed`,
:option:`--flamegraph`, and so on) because it uses an interactive terminal
interface rather than producing file output.


Async-aware profiling
=====================

For programs using :mod:`asyncio`, the profiler offers async-aware mode
(:option:`--async-aware`) that reconstructs call stacks based on the task structure
rather than the raw Python frames::

   python -m profiling.sampling run --async-aware async_script.py

Standard profiling of async code can be confusing because the physical call
stack often shows event loop internals rather than the logical flow of your
coroutines. Async-aware mode addresses this by tracking which task is running
and presenting stacks that reflect the ``await`` chain.

.. code-block:: python

   import asyncio

   async def fetch(url):
       await asyncio.sleep(0.1)
       return url

   async def main():
       for _ in range(50):
           await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))

   if __name__ == "__main__":
       asyncio.run(main())

::

   python -m profiling.sampling run --async-aware --flamegraph -o out.html script.py

.. note::

   Async-aware profiling requires the target process to have the :mod:`asyncio`
   module loaded. If you profile a script before it imports asyncio, async-aware
   mode will not be able to capture task information.


Async modes
-----------

The :option:`--async-mode` option controls which tasks appear in the profile::

   python -m profiling.sampling run --async-aware --async-mode=running async_script.py
   python -m profiling.sampling run --async-aware --async-mode=all async_script.py

With :option:`--async-mode`\ ``=running`` (the default), only the task currently executing
on the CPU is profiled. This shows where your program is actively spending time
and is the typical choice for performance analysis.

With :option:`--async-mode`\ ``=all``, tasks that are suspended (awaiting I/O, locks, or
other tasks) are also included. This mode is useful for understanding what your
program is waiting on, but produces larger profiles since every suspended task
appears in each sample.


Task markers and stack reconstruction
-------------------------------------

In async-aware profiles, you will see ``<task>`` frames that mark boundaries
between asyncio tasks. These are synthetic frames inserted by the profiler to
show the task structure. The task name appears as the function name in these
frames.

When a task awaits another task, the profiler reconstructs the logical call
chain by following the ``await`` relationships. Only "leaf" tasks (tasks that
no other task is currently awaiting) generate their own stack entries. Tasks
being awaited by other tasks appear as part of their awaiter's stack instead.

If a task has multiple awaiters (a diamond pattern in the task graph), the
profiler deterministically selects one parent and annotates the task marker
with the number of parents, for example ``MyTask (2 parents)``. This indicates
that alternate execution paths exist but are not shown in this particular stack.


Option restrictions
-------------------

Async-aware mode uses a different stack reconstruction mechanism and is
incompatible with: :option:`--native`, :option:`--no-gc`, :option:`--all-threads`, and
:option:`--mode`\ ``=cpu`` or :option:`--mode`\ ``=gil``.


Command-line interface
======================

.. program:: profiling.sampling

The complete command-line interface for reference.


Global options
--------------

.. option:: run

   Run and profile a Python script or module.

.. option:: attach

   Attach to and profile a running process by PID.


Sampling options
----------------

.. option:: -i <microseconds>, --interval <microseconds>

   Sampling interval in microseconds. Default: 100.

.. option:: -d <seconds>, --duration <seconds>

   Profiling duration in seconds. Default: 10.

.. option:: -a, --all-threads

   Sample all threads, not just the main thread.

.. option:: --realtime-stats

   Display sampling statistics during profiling.

.. option:: --native

   Include ``<native>`` frames for non-Python code.

.. option:: --no-gc

   Exclude ``<GC>`` frames for garbage collection.

.. option:: --async-aware

   Enable async-aware profiling for asyncio programs.

.. option:: --opcodes

   Gather bytecode opcode information for instruction-level profiling. Shows
   which bytecode instructions are executing, including specializations.
   Compatible with ``--live``, ``--flamegraph``, ``--heatmap``, and ``--gecko``
   formats only.

.. option:: --subprocesses

   Also profile subprocesses. Each subprocess gets its own profiler
   instance and output file. Incompatible with ``--live``.


Mode options
------------

.. option:: --mode <mode>

   Sampling mode: ``wall`` (default), ``cpu``, ``gil``, or ``exception``.
   The ``cpu``, ``gil``, and ``exception`` modes are incompatible with
   ``--async-aware``.

.. option:: --async-mode <mode>

   Async profiling mode: ``running`` (default) or ``all``.
   Requires ``--async-aware``.


Output options
--------------

.. option:: --pstats

   Generate text statistics output. This is the default.

.. option:: --collapsed

   Generate collapsed stack format for external flame graph tools.

.. option:: --flamegraph

   Generate self-contained HTML flame graph.

.. option:: --gecko

   Generate Gecko JSON format for Firefox Profiler.

.. option:: --heatmap

   Generate HTML heatmap with line-level sample counts.

.. option:: -o <path>, --output <path>

   Output file or directory path. Default behavior varies by format:
   ``--pstats`` writes to stdout, ``--flamegraph`` and ``--gecko`` generate
   files like ``flamegraph.PID.html``, and ``--heatmap`` creates a directory
   named ``heatmap_PID``.


pstats display options
----------------------

These options apply only to pstats format output.

.. option:: --sort <key>

   Sort order: ``nsamples``, ``tottime``, ``cumtime``, ``sample-pct``,
   ``cumul-pct``, ``nsamples-cumul``, or ``name``. Default: ``nsamples``.

.. option:: -l <count>, --limit <count>

   Maximum number of entries to display. Default: 15.

.. option:: --no-summary

   Omit the Legend and Summary of Interesting Functions sections from output.


Run command options
-------------------

.. option:: -m, --module

   Treat the target as a module name rather than a script path.

.. option:: --live

   Start interactive terminal interface instead of batch profiling.


.. seealso::

   :mod:`profiling`
      Overview of Python profiling tools and guidance on choosing a profiler.

   :mod:`profiling.tracing`
      Deterministic tracing profiler for exact call counts and timing.

   :mod:`pstats`
      Statistics analysis for profile data.

   `Firefox Profiler <https://profiler.firefox.com>`__
      Web-based profiler that accepts Gecko format output. See the
      `documentation <https://profiler.firefox.com/docs/#/>`__ for usage details.

   `FlameGraph <https://github.com/brendangregg/FlameGraph>`__
      Tools for generating flame graphs from collapsed stack format.