diff options
author | Kevin Modzelewski <kmod@users.noreply.github.com> | 2022-08-18 21:33:54 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-08-18 21:33:54 (GMT) |
commit | 214eb2cce5caa99f476ae8abd406077e2c293a3c (patch) | |
tree | b3b6dc69e4b8ccc67bd0fdef9a7a1804a4ddee6f /configure.ac | |
parent | 22a95cb5114891e87f6933482dc6eaa00e6a11ad (diff) | |
download | cpython-214eb2cce5caa99f476ae8abd406077e2c293a3c.zip cpython-214eb2cce5caa99f476ae8abd406077e2c293a3c.tar.gz cpython-214eb2cce5caa99f476ae8abd406077e2c293a3c.tar.bz2 |
gh-90536: Add support for the BOLT post-link binary optimizer (gh-95908)
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach, as well as adds some non-PIE
flags which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
Diffstat (limited to 'configure.ac')
-rw-r--r-- | configure.ac | 53 |
1 files changed, 53 insertions, 0 deletions
diff --git a/configure.ac b/configure.ac index 85d9e80..bab405e 100644 --- a/configure.ac +++ b/configure.ac @@ -1917,6 +1917,59 @@ if test "$Py_LTO" = 'true' ; then LDFLAGS_NODIST="$LDFLAGS_NODIST $LTOFLAGS" fi +# Enable bolt flags +Py_BOLT='false' +AC_MSG_CHECKING(for --enable-bolt) +AC_ARG_ENABLE(bolt, AS_HELP_STRING( + [--enable-bolt], + [enable usage of the llvm-bolt post-link optimizer (default is no)]), +[ +if test "$enableval" != no +then + Py_BOLT='true' + AC_MSG_RESULT(yes); +else + Py_BOLT='false' + AC_MSG_RESULT(no); +fi], +[AC_MSG_RESULT(no)]) + +AC_SUBST(PREBOLT_RULE) +if test "$Py_BOLT" = 'true' ; then + PREBOLT_RULE="${DEF_MAKE_ALL_RULE}" + DEF_MAKE_ALL_RULE="bolt-opt" + DEF_MAKE_RULE="build_all" + + # These flags are required for bolt to work: + CFLAGS_NODIST="$CFLAGS_NODIST -fno-reorder-blocks-and-partition" + LDFLAGS_NODIST="$LDFLAGS_NODIST -Wl,--emit-relocs" + + # These flags are required to get good performance from bolt: + CFLAGS_NODIST="$CFLAGS_NODIST -fno-pie" + # We want to add these no-pie flags to linking executables but not shared libraries: + LINKCC="$LINKCC -fno-pie -no-pie" + # Designate the DWARF version into 4 since the LLVM-BOLT does not support DWARF5 yet. + CFLAGS="$CFLAGS -gdwarf-4" + LDFLAGS="$LDFLAGS -gdwarf-4" + AC_SUBST(LLVM_BOLT) + AC_PATH_TOOL(LLVM_BOLT, llvm-bolt, '', ${llvm_path}) + if test -n "${LLVM_BOLT}" -a -x "${LLVM_BOLT}" + then + AC_MSG_RESULT("Found llvm-bolt") + else + AC_MSG_ERROR([llvm-bolt is required for a --enable-bolt build but could not be found.]) + fi + + AC_SUBST(MERGE_FDATA) + AC_PATH_TOOL(MERGE_FDATA, merge-fdata, '', ${llvm_path}) + if test -n "${MERGE_FDATA}" -a -x "${MERGE_FDATA}" + then + AC_MSG_RESULT("Found merge-fdata") + else + AC_MSG_ERROR([merge-fdata is required for a --enable-bolt build but could not be found.]) + fi +fi + # Enable PGO flags. AC_SUBST(PGO_PROF_GEN_FLAG) AC_SUBST(PGO_PROF_USE_FLAG) |