author     Florian Schmaus <flo@geekplace.eu>    2021-04-05 13:38:47 (GMT)
committer  Florian Schmaus <flo@geekplace.eu>    2023-11-23 09:01:55 (GMT)
commit     8a2575e432b85baecb0054cc570db69f074c2633 (patch)
tree       a2e629fa7b7cdf50b284324de42e0f9cb405bc71 /src/graphviz.h
parent     82a7cb3f263a6af7cc4c24408cf0a4ca5b648b55 (diff)
Consider the remaining load capacity in main loop
This changes CanRunMore() to return an int instead of a bool. The
return value is the "remaining load capacity": the number of new jobs
that can be spawned without saturating the load limit, if one is
enabled via ninja's -l option. We assume that every started edge
increases the load by one. Hence the available load capacity is the
maximum allowed load minus the current load.
Previously, when multiple ninja builds were running, ninja would
oversaturate the system with jobs even though a load and job limit
was provided. This is because changes in the load average are inert:
newly started jobs do not immediately change the load average, yet
ninja assumed that new jobs are immediately reflected in it. Ninja
would retrieve the current 1 min load average, check whether it is
below the limit, start a new job if so, and then repeat. Since it
takes a while for a new job to be reflected in the load average,
ninja would often spawn jobs until the job limit ("-j") is reached.
If this is done by multiple parallel ninja builds, the system becomes
oversaturated, causing excessive context switches, which eventually
slow down each and every build process.
We can easily prevent this by considering the remaining load capacity
in ninja's main loop.
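Concretely, the main loop can treat the returned capacity as a
per-round budget instead of re-checking a boolean that is still based
on the stale load average. The following is only a sketch under the
assumptions above (placeholder names, not ninja's real scheduler):

#include <deque>
#include <string>

// Hedged sketch of the main-loop change: "capacity" is the value
// returned by the int-returning CanRunMore(); the edge queue and
// StartEdge() are illustrative placeholders, not ninja's real code.
struct Scheduler {
  std::deque<std::string> pending_edges;  // edges that still need to run

  // Start an edge; in real ninja this would spawn a subprocess.
  void StartEdge(const std::string& edge) { (void)edge; }

  void ScheduleRound(int capacity) {
    // Every started edge is assumed to consume one unit of capacity,
    // so at most "capacity" edges are started before we wait for
    // running jobs to finish and recompute the capacity.
    while (capacity > 0 && !pending_edges.empty()) {
      StartEdge(pending_edges.front());
      pending_edges.pop_front();
      --capacity;
    }
  }
};

Because at most "capacity" edges are started per round, a lagging load
average can no longer cause the loop to spawn jobs all the way up to
the -j limit.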
The following benchmark demonstrates how the change of this commit
helps to speed up multiple parallel builds on the same host. We
compare the total build times of 8 parallel builds of LLVM on a
256-core system using "ninja -l 258".
ninja-master: 1351 seconds
ninja-load-capacity: 920 seconds
That is, with this commit, the whole process becomes 1.46× faster.
The benchmark script used creates and prepares 8 build directories,
records the start time, spawns 8 subshells invoking "ninja -l 258",
awaits the termination of those subshells, and records the end
time. Besides the total running time, it also outputs /proc/loadavg
(the 1, 5 and 15 min load averages, the number of runnable versus
total scheduling entities, and the most recently created PID), which
provides an indication of where the performance is gained:
ninja-master: 3.90 93.94 146.38 1/1936 209125
ninja-load-capacity: 92.46 210.50 199.90 1/1936 36917
So with this change, ninja makes better use of the available hardware
cores in the presence of competing ninja processes, without
overloading the system.
Finally, let us look at the two "dstat -cdgyl 60" traces of 8
parallel LLVM builds on a 256-core machine using "ninja -l 258":
ninja-master
--total-cpu-usage-- -dsk/total- ---paging-- ---system-- ---load-avg---
usr sys idl wai stl| read writ| in out | int csw | 1m 5m 15m
1 0 99 0 0| 12k 4759k| 5B 55B|1135 455 |17.9 70.3 38.1
38 6 56 0 0|2458B 7988k| 205B 0 | 34k 23k| 466 170 73.2
26 3 71 0 0| 102k 94M| 0 0 | 22k 6265 | 239 156 74.3
50 5 45 0 0|3149B 97M| 0 0 | 37k 12k| 257 191 92.2
58 6 36 0 0| 90k 71M| 0 0 | 43k 12k| 320 224 110
50 4 46 0 0| 52k 78M| 0 0 | 38k 6690 | 247 223 117
50 5 45 0 0| 202k 90M| 0 0 | 37k 9876 | 239 238 130
60 5 34 0 0| 109k 93M| 0 0 | 44k 8950 | 247 248 140
69 5 26 0 0|5939B 93M| 0 0 | 50k 11k| 309 268 154
49 4 47 0 0| 172k 111M| 0 0 | 36k 7835 | 283 267 161
58 7 35 0 0| 29k 142M| 0 0 | 45k 7666 | 261 267 168
72 4 24 0 0| 46k 281M| 0 0 | 50k 13k| 384 296 183
49 6 46 0 0| 68B 198M| 0 0 | 37k 6847 | 281 281 185
82 6 12 0 0| 0 97M| 0 0 | 59k 15k| 462 323 205
31 5 63 0 0| 0 301M| 0 0 | 26k 5350 | 251 291 202
66 7 28 0 0| 68B 254M| 0 0 | 49k 9091 | 270 292 208
68 8 25 0 0| 0 230M| 0 0 | 51k 8186 | 287 292 213
52 5 42 1 0| 0 407M| 0 0 | 42k 5619 | 207 271 211
29 7 64 0 0| 0 418M| 0 0 | 27k 2801 | 131 241 205
1 1 98 0 0| 137B 267M| 0 0 |1944 813 |55.8 199 193
0 0 100 0 0|2253B 43M| 0 0 | 582 365 |26.8 165 181
0 0 99 0 0| 0 68M| 0 0 | 706 414 |11.5 136 170
4 0 96 0 0| 0 13M| 0 0 |2892 378 |10.0 113 160
ninja-load-capacity
--total-cpu-usage-- -dsk/total- ---paging-- ---system-- ---load-avg---
usr sys idl wai stl| read writ| in out | int csw | 1m 5m 15m
1 0 98 0 0| 12k 5079k| 5B 55B|1201 470 |1.35 40.2 115
43 6 51 0 0|3345B 78M| 0 0 | 34k 20k| 247 127 142
71 6 23 0 0| 0 59M| 0 0 | 53k 8485 | 286 159 152
60 5 35 0 0| 68B 118M| 0 0 | 45k 7125 | 277 178 158
62 4 35 0 0| 0 115M| 0 0 | 45k 6036 | 248 188 163
61 5 34 0 0| 0 96M| 0 0 | 44k 9448 | 284 212 173
66 5 28 0 0| 9B 94M| 0 0 | 49k 5733 | 266 219 178
64 7 29 0 0| 0 159M| 0 0 | 49k 6350 | 241 223 182
66 6 28 0 0| 0 240M| 0 0 | 50k 9325 | 285 241 191
68 4 27 0 0| 0 204M| 0 0 | 49k 5550 | 262 241 194
68 8 24 0 0| 0 161M| 0 0 | 53k 6368 | 255 244 198
79 7 14 0 0| 0 325M| 0 0 | 59k 5910 | 264 249 202
72 6 22 0 0| 0 367M| 0 0 | 54k 6684 | 253 249 205
71 6 22 1 0| 0 377M| 0 0 | 52k 8175 | 284 257 211
48 8 44 0 0| 0 417M| 0 0 | 40k 5878 | 223 247 210
23 4 73 0 0| 0 238M| 0 0 | 22k 1644 | 114 214 201
0 0 100 0 0| 0 264M| 0 0 |1016 813 |43.3 175 189
0 0 100 0 0| 0 95M| 0 0 | 670 480 |17.1 144 177
As one can see in the above dstat traces, ninja-master reaches a high
1 min load average of up to 462. This is because ninja does not
consider the remaining load capacity when spawning new jobs, but
instead spawns new jobs until it runs into the -j limit. This, in
turn, causes an increase in context switches: the rows with a high
1 min load average also show >10k context switches (csw). A
load-capacity-aware ninja, in contrast, avoids oversaturating the
system with excessive additional jobs.
Note that since the load average is an exponentially damped moving
sum, build systems that take the load average into account to limit
it to the number of available processors will always (slightly)
overprovision the system with tasks. Effectively, this change reduces
the aggressiveness with which ninja schedules new jobs if the '-l'
knob is used, and thereby the level of overprovisioning, to a
reasonable level compared to the status quo. It should be mentioned
that an individual build using '-l' may now be slightly slower.
However, this can easily be compensated for by increasing the value
passed to the '-l' argument.
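To illustrate how slowly the 1 min load average reacts, the following
sketch simulates the commonly documented update rule of the Linux
kernel: roughly every 5 seconds the value is recomputed as an
exponentially damped combination of the previous value and the number
of runnable tasks. The 5 s interval and the exp(-5/60) factor are
assumptions based on that description; the snippet is an
illustration, not kernel code:

#include <cmath>
#include <cstdio>

int main() {
  // Damping factor for the 1 min load average, sampled every 5 seconds.
  const double damp = std::exp(-5.0 / 60.0);

  double loadavg = 0.0;      // system is idle initially
  const int runnable = 256;  // a fully busy 256-core machine

  // Simulate one minute: the reported 1 min load average lags well
  // behind the actual number of runnable tasks the whole time.
  for (int t = 5; t <= 60; t += 5) {
    loadavg = loadavg * damp + runnable * (1.0 - damp);
    std::printf("after %2d s: loadavg = %6.1f\n", t, loadavg);
  }
}

Even with 256 runnable tasks from the very start, the reported 1 min
value only reaches about 162 after a full minute, which is why a
purely load-average-based check keeps admitting new jobs for quite a
while.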
The benchmarks were performed using the following script:
#!/usr/bin/env bash
set -euo pipefail
VANILLA_NINJA=~/code/ninja-master/build/ninja
LOAD_CAPACITY_AWARE_NINJA=~/code/ninja-load-capacity/build/ninja
CMAKE_NINJA_PROJECT_SOURCE=~/code/llvm-project/llvm
declare -ir PARALLEL_BUILDS=8
readonly TMP_DIR=$(mktemp --directory --tmpdir=/var/tmp)
cleanup() {
    rm -rf "${TMP_DIR}"
}
trap cleanup EXIT
BUILD_DIRS=()
echo "Preparing build directories"
for i in $(seq 1 ${PARALLEL_BUILDS}); do
    BUILD_DIR="${TMP_DIR}/${i}"
    mkdir "${BUILD_DIR}"
    (
        cd "${BUILD_DIR}"
        cmake -G Ninja "${CMAKE_NINJA_PROJECT_SOURCE}" \
            &> "${BUILD_DIR}/build.log"
    )&
    BUILD_DIRS+=("${BUILD_DIR}")
done
wait
NPROC=$(nproc)
MAX_LOAD=$(echo "${NPROC} + 2" | bc )
SLEEP_SECONDS=300
NINJA_BINS=(
"${VANILLA_NINJA}"
"${LOAD_CAPACITY_AWARE_NINJA}"
)
LAST_NINJA_BIN="${LOAD_CAPACITY_AWARE_NINJA}"
for NINJA_BIN in "${NINJA_BINS[@]}"; do
    echo "Cleaning build dirs"
    for BUILD_DIR in "${BUILD_DIRS[@]}"; do
        (
            "${NINJA_BIN}" -C "${BUILD_DIR}" clean &> "${BUILD_DIR}/build.log"
        )&
    done
    wait
    echo "Starting ${PARALLEL_BUILDS} parallel builds with ${NINJA_BIN} using -l ${MAX_LOAD}"
    START=$(date +%s)
    for BUILD_DIR in "${BUILD_DIRS[@]}"; do
        (
            "${NINJA_BIN}" -C "${BUILD_DIR}" -l "${MAX_LOAD}" &> "${BUILD_DIR}/build.log"
        )&
    done
    wait
    STOP=$(date +%s)
    DELTA_SECONDS=$((STOP - START))
    echo "Using ${NINJA_BIN} to perform ${PARALLEL_BUILDS} parallel builds of ${CMAKE_NINJA_PROJECT_SOURCE}"
    echo "took ${DELTA_SECONDS} seconds on this ${NPROC} core system using -l ${MAX_LOAD}"
    echo "/proc/loadavg:"
    cat /proc/loadavg
    echo "ninja --version:"
    "${NINJA_BIN}" --version
    if [[ "${NINJA_BIN}" != "${LAST_NINJA_BIN}" ]]; then
        echo "Sleeping ${SLEEP_SECONDS} seconds to bring system into quiescent state"
        sleep ${SLEEP_SECONDS}
    fi
done