gh-90536: Add support for the BOLT post-link binary optimizer (gh-95908)

* Add support for the BOLT post-link binary optimizer

Using [BOLT](https://github.com/llvm/llvm-project/tree/main/bolt) provides a fairly large speedup without any code or functionality changes: roughly 1% on pyperformance and 4% on the Pyston web macrobenchmarks.

It is gated behind an `--enable-bolt` configure option because not all toolchains and environments are supported. It has been tested on a Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6 sources (LLVM's binary distribution of that version did not include BOLT).

Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224), this commit uses BOLT's preferred "instrumentation" approach and also adds some non-PIE flags, which enable much better optimizations from BOLT.

The effects of this change depend somewhat more on CPU microarchitecture than those of other changes, since it optimizes i-cache behavior, which seems to vary more between architectures. The 1%/4% numbers were collected on an Intel Skylake CPU; on an AMD Zen 3 CPU I got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance a slightly smaller one (1%/3%).

The small speedup on pyperformance is not entirely unexpected: BOLT improves i-cache behavior, and the benchmarks in the pyperformance suite are small and tend to fit in the i-cache.

This change uses the existing PGO profiling task (`python -m test --pgo`), though I was able to measure about a 1% macrobenchmark improvement by using the macrobenchmarks as the training task instead. I personally think that both the PGO and BOLT tasks should be updated to use macrobenchmarks, but to keep the work split up, this PR uses the existing PGO task.

* Simplify the build flags

* Add a NEWS entry

* Update Makefile.pre.in

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

* Update configure.ac

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>

* Add myself to ACKS

* Add docs

* Other review comments

* fix tab/space issue

* Make it more clear that --enable-bolt is experimental

* Add link to bolt's github page

Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
author: Kevin Modzelewski <kmod@users.noreply.github.com> 2022-08-18 21:33:54 (GMT)
committer: GitHub <noreply@github.com> 2022-08-18 21:33:54 (GMT)
commit: 214eb2cce5caa99f476ae8abd406077e2c293a3c
tree: b3b6dc69e4b8ccc67bd0fdef9a7a1804a4ddee6f /Misc
parent: 22a95cb5114891e87f6933482dc6eaa00e6a11ad
2 files changed, 3 insertions, 0 deletions
diff --git a/Misc/ACKS b/Misc/ACKS
index c1f570a..16a482e 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1212,6 +1212,7 @@ Gideon Mitchell
 Tim Mitchell
 Zubin Mithra
 Florian Mladitsch
+Kevin Modzelewski
 Doug Moen
 Jakub Molinski
 Juliette Monsel
diff --git a/Misc/NEWS.d/next/Build/2022-08-12-13-06-03.gh-issue-90536.qMpF6p.rst b/Misc/NEWS.d/next/Build/2022-08-12-13-06-03.gh-issue-90536.qMpF6p.rst
new file mode 100644
index 0000000..4605e03
--- /dev/null
+++ b/Misc/NEWS.d/next/Build/2022-08-12-13-06-03.gh-issue-90536.qMpF6p.rst
@@ -0,0 +1,2 @@
+Use the BOLT post-link optimizer to improve performance, particularly on
+medium-to-large applications.