This resolves #341.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.
This resolves #278 and #295.
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get(). Additionally, reduce arenas_lock's role such that it only
protects against arena initialization races. These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.
This resolves #315.
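For illustration, a minimal sketch of the double-checked read pattern described above, using C11 atomics and a hypothetical arena_init_locked() helper (jemalloc itself uses its own atomic wrappers and types):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

typedef struct arena_s arena_t;

/* Sparse array sized up front for the maximum number of arenas. */
#define ARENAS_MAX 4096 /* illustrative bound */
static _Atomic(arena_t *) arenas[ARENAS_MAX];
/* Only guards initialization races, never lookups. */
static pthread_mutex_t arenas_lock = PTHREAD_MUTEX_INITIALIZER;

arena_t *arena_init_locked(unsigned ind); /* hypothetical slow-path initializer */

static arena_t *
arena_get(unsigned ind, bool init_if_missing)
{
    /* Fast path: a single atomic load, no locking. */
    arena_t *arena = atomic_load_explicit(&arenas[ind], memory_order_acquire);
    if (arena != NULL || !init_if_missing)
        return arena;

    /* Slow path: lock, then re-check before initializing (double-checked). */
    pthread_mutex_lock(&arenas_lock);
    arena = atomic_load_explicit(&arenas[ind], memory_order_acquire);
    if (arena == NULL) {
        arena = arena_init_locked(ind);
        atomic_store_explicit(&arenas[ind], arena, memory_order_release);
    }
    pthread_mutex_unlock(&arenas_lock);
    return arena;
}
```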
Fix the arena_size computation in arena_new() to incorporate
runs_avail_nclasses elements for runs_avail, rather than
(runs_avail_nclasses - 1) elements. Since offsetof(arena_t, runs_avail)
is used rather than sizeof(arena_t) for the first term of the
computation, all of the runs_avail elements must be added into the
second term.
This bug was introduced (by Jason Evans) while merging pull request #330
as 3417a304ccde61ac1f68b436ec22c03f1d6824ec (Separate arena_avail
trees).
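The corrected computation amounts to the following; the types here are simplified stand-ins for the real arena_t and its trailing runs_avail array:

```c
#include <stddef.h>

/* Simplified stand-ins for the real types; runs_avail is a dynamically
 * sized trailing array of per-size-class run trees. */
typedef struct { void *rbh_root; } arena_run_tree_t;
typedef struct arena_s {
    /* ... many other fields ... */
    arena_run_tree_t runs_avail[1]; /* trailing, sized at allocation time */
} arena_t;

size_t
arena_struct_size(unsigned runs_avail_nclasses)
{
    /*
     * Because the first term is offsetof(arena_t, runs_avail) rather than
     * sizeof(arena_t), the second term must account for all
     * runs_avail_nclasses elements, not (runs_avail_nclasses - 1).
     */
    return offsetof(arena_t, runs_avail) +
        runs_avail_nclasses * sizeof(arena_run_tree_t);
}
```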
The merge of 3417a304ccde61ac1f68b436ec22c03f1d6824ec appears to have
introduced a small bug: first_best_fit doesn't scan through all the size
classes, since ind is offset from runs_avail_nclasses by run_avail_bias.
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some CPU. This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache. A
linear scan of the indices appears to be fast enough.
The only cost of this is an extra tree array in each arena.
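A simplified sketch of the scheme: one container of available runs per quantized size index, quantization only on insert/remove, and first fit implemented as a linear scan of indices. Counts stand in for the red-black trees, and all names here are illustrative:

```c
#include <stdbool.h>
#include <stddef.h>

#define NCLASSES 64 /* hypothetical number of quantized run size classes */

/* One container of available runs per quantized size index; in jemalloc
 * these are red-black trees, a count stands in for them here. */
typedef struct {
    size_t navail[NCLASSES];
} runs_avail_t;

/* Quantization to an index happens only on insertion/removal... */
static void
runs_avail_insert(runs_avail_t *ra, unsigned qind)
{
    ra->navail[qind]++;
}

static void
runs_avail_remove(runs_avail_t *ra, unsigned qind)
{
    ra->navail[qind]--;
}

/* ...and first fit is a linear scan of indices at or above the request,
 * with no per-node size comparisons or miscelm dereferences. */
static bool
runs_avail_first_fit(const runs_avail_t *ra, unsigned qind_min,
    unsigned *qind_out)
{
    for (unsigned i = qind_min; i < NCLASSES; i++) {
        if (ra->navail[i] > 0) {
            *qind_out = i;
            return true;
        }
    }
    return false;
}
```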
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
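Roughly, the tables are built once at bootstrap and indexed by page count afterwards; the bounds and the declared *_compute() slow-path helpers below are assumptions for the sake of the sketch:

```c
#include <stddef.h>
#include <stdint.h>

#define LG_PAGE 12
#define MAX_RUN_PAGES 64 /* illustrative bound on table size */

/* Tables indexed by (size >> LG_PAGE); filled once during bootstrap. */
static size_t run_quantize_floor_tab[MAX_RUN_PAGES + 1];
static size_t run_quantize_ceil_tab[MAX_RUN_PAGES + 1];

/* Slow-path quantization, assumed to exist; used only while building. */
size_t run_quantize_floor_compute(size_t size);
size_t run_quantize_ceil_compute(size_t size);

void
run_quantize_init(void)
{
    for (size_t i = 1; i <= MAX_RUN_PAGES; i++) {
        run_quantize_floor_tab[i] = run_quantize_floor_compute(i << LG_PAGE);
        run_quantize_ceil_tab[i] = run_quantize_ceil_compute(i << LG_PAGE);
    }
}

/* Fast-path lookups thereafter; size is assumed to be a page multiple no
 * larger than MAX_RUN_PAGES pages. */
size_t
run_quantize_floor(size_t size)
{
    return run_quantize_floor_tab[size >> LG_PAGE];
}

size_t
run_quantize_ceil(size_t size)
{
    return run_quantize_ceil_tab[size >> LG_PAGE];
}
```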
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.
This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039cd06f9fa6972711195055d865673966 (Implement
cache index randomization for large allocations.).
Also rename run_quantize_*() to improve clarity. These tests
demonstrate that run_quantize_ceil() is flawed.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec. This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
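A sketch consistent with that description; the helper names mirror the nstime API, but the exact set of functions may differ:

```c
#include <stdint.h>
#include <time.h>

#define BILLION UINT64_C(1000000000)

/* A single 64-bit nanosecond counter instead of struct timespec. */
typedef struct {
    uint64_t ns;
} nstime_t;

static void
nstime_init2(nstime_t *time, uint64_t sec, uint64_t nsec)
{
    time->ns = sec * BILLION + nsec;
}

static uint64_t
nstime_sec(const nstime_t *time)
{
    return time->ns / BILLION;
}

static uint64_t
nstime_nsec(const nstime_t *time)
{
    return time->ns % BILLION;
}

/* Conversions to/from struct timespec happen only at this boundary, so
 * long vs. uint64_t mismatches are confined to one place. */
static void
nstime_from_timespec(nstime_t *time, const struct timespec *ts)
{
    nstime_init2(time, (uint64_t)ts->tv_sec, (uint64_t)ts->tv_nsec);
}
```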
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.
Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time
This resolves #325.
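A hedged usage example of the new controls through the standard mallctl() interface (assumes a build that exposes these names; depending on configuration the symbols may carry a je_ prefix):

```c
#include <stdio.h>
#include <sys/types.h>
#include <jemalloc/jemalloc.h>

int
main(void)
{
    /* Read the default decay time (in seconds) applied to new arenas. */
    ssize_t decay_time;
    size_t len = sizeof(decay_time);
    if (mallctl("arenas.decay_time", &decay_time, &len, NULL, 0) == 0)
        printf("default decay_time: %zd\n", decay_time);

    /* Lower it to 10 seconds for subsequently created arenas. */
    ssize_t new_decay = 10;
    if (mallctl("arenas.decay_time", NULL, NULL, &new_decay,
        sizeof(new_decay)) != 0)
        fprintf(stderr, "arenas.decay_time not supported by this build\n");

    /* Trigger decay-based purging on arena 0 immediately. */
    if (mallctl("arena.0.decay", NULL, NULL, NULL, 0) != 0)
        fprintf(stderr, "arena.0.decay not supported by this build\n");

    return 0;
}
```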
Refactor out arena_compute_npurge() by integrating its logic into
arena_stash_dirty() as an incremental computation.
Refactor early return logic in arena_ralloc_no_move() to return early on
failure rather than on success.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
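A sketch of what such an interface can look like: a 64-bit linear congruential step whose high bits feed prng_lg_range(), and a prng_range() that rejection-samples to avoid modulo bias. The constants and the use of __builtin_clzll() are illustrative choices, not necessarily jemalloc's:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit linear congruential step (Knuth MMIX constants). */
#define PRNG_A UINT64_C(6364136223846793005)
#define PRNG_C UINT64_C(1442695040888963407)

static uint64_t
prng_lg_range(uint64_t *state, unsigned lg_range)
{
    assert(lg_range > 0 && lg_range <= 64);
    *state = *state * PRNG_A + PRNG_C;
    /* The high bits are the strongest, so return those. */
    return *state >> (64 - lg_range);
}

static uint64_t
prng_range(uint64_t *state, uint64_t range)
{
    assert(range > 1);

    /* Ceiling of lg(range). */
    unsigned lg_range = 64 - __builtin_clzll(range - 1);

    /* Rejection-sample to avoid modulo bias. */
    uint64_t ret;
    do {
        ret = prng_lg_range(state, lg_range);
    } while (ret >= range);

    return ret;
}
```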
- Combine multiple runtime branches into a single malloc_slow check.
- Avoid calling arena_choose / size2index / index2size on fast path.
- A few micro optimizations.
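The malloc_slow idea, sketched with illustrative option and function names: fold every rarely-true option into one boolean at initialization so the hot path tests a single cached flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Options that force extra work on allocation (illustrative subset). */
static bool opt_junk_alloc;
static bool opt_zero;
static bool opt_quarantine;
static bool opt_xmalloc;
static bool opt_utrace;

/* Computed once during initialization; tested once per allocation. */
static bool malloc_slow;

static void
malloc_slow_flag_init(void)
{
    malloc_slow = opt_junk_alloc || opt_zero || opt_quarantine ||
        opt_xmalloc || opt_utrace;
}

void *fast_alloc(size_t size); /* hypothetical streamlined path */
void *slow_alloc(size_t size); /* hypothetical full-featured path */

void *
my_malloc(size_t size)
{
    /* One predictable branch instead of checking each option separately;
     * __builtin_expect stands in for a likely() annotation. */
    if (__builtin_expect(!malloc_slow, 1))
        return fast_alloc(size);
    return slow_alloc(size);
}
```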
Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>
This resolves #281.
This resolves #284.
Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of
large allocations that have been randomly assigned an offset of 0 when
--enable-cache-oblivious configure option is enabled. This addresses a
special case missed in d260f442ce693de4351229027b37b3293fcbfd7d (Fix
xallocx(..., MALLOCX_ZERO) bugs.).
Zero all trailing bytes of large allocations when
--enable-cache-oblivious configure option is enabled. This regression
was introduced by 8a03cf039cd06f9fa6972711195055d865673966 (Implement
cache index randomization for large allocations.).
Zero trailing bytes of huge allocations when resizing from/to a size
class that is not a multiple of the chunk size.
arena_maxclass is no longer an appropriate name, because arenas also
manage huge allocations.
Fix xallocx() bugs related to the 'extra' parameter when specified as
non-zero.
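For context, a hedged example of how the extra parameter is meant to be used with jemalloc's non-standard xallocx() API: request growth in place to at least size bytes, opportunistically up to size + extra:

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void)
{
    void *p = mallocx(4096, 0);
    if (p == NULL)
        return 1;

    /*
     * Ask to grow in place to at least 8192 bytes, and opportunistically
     * up to 8192 + 4096 bytes if that happens to fit.  xallocx() never
     * moves the allocation; it returns the resulting usable size, which
     * stays below 8192 if even the minimum request could not be
     * satisfied in place.
     */
    size_t usize = xallocx(p, 8192, 4096, 0);
    if (usize >= 8192)
        printf("grown in place to %zu bytes\n", usize);
    else
        printf("in-place growth failed; still %zu bytes\n", sallocx(p, 0));

    dallocx(p, 0);
    return 0;
}
```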
This resolves #256.
Don't bitshift by negative amounts when encoding/decoding run sizes in
chunk header maps. This affected systems with page sizes greater than 8
KiB.
Reported by Ingvar Hagelund <ingvar@redpill-linpro.com>.
Fix arena_run_split_large_helper() to treat newly committed memory as
zeroed.
Only set the unzeroed flag when initializing the entire mapbits entry,
rather than mutating just the unzeroed bit. This simplifies the
possible mapbits state transitions.
Decommit arena chunk header during chunk deallocation if the rest of the
chunk is decommitted.
Cascade from decommit to purge when purging unused dirty pages, so that
it is possible to decommit cleaned memory rather than just purging. For
non-Windows debug builds, decommit runs rather than purging them, since
this causes access of deallocated runs to segfault.
This resolves #251.
Fix arena_ralloc_large_grow() to properly account for large_pad, so that
in-place large reallocation succeeds when possible, rather than always
failing. This regression was introduced by
8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
randomization for large allocations.).
Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on
the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks
allow control over chunk allocation/deallocation, decommit/commit,
purging, and splitting/merging, such that the application can rely on
jemalloc's internal chunk caching and retaining functionality, yet
implement a variety of chunk management mechanisms and policies.
Merge the chunks_[sz]ad_{mmap,dss} red-black trees into
chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries
to honor the dss precedence setting; prior to this change the precedence
setting was also consulted when recycling chunks.
Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead
deallocate them in arena_unstash_purged(), so that the dirty memory
linkage remains valid until after the last time it is used.
This resolves #176 and #201.
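A hedged example of driving the new mallctl; the read-modify-write pattern keeps any hooks the implementation already installed, and the chunk_hooks_t member names reflect the hooks listed above but should be checked against the actual headers:

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void)
{
    /*
     * Read-modify-write of arena 0's chunk hooks.  chunk_hooks_t bundles
     * the alloc, dalloc, commit, decommit, purge, split, and merge hooks
     * exposed through this single mallctl.
     */
    chunk_hooks_t hooks;
    size_t len = sizeof(hooks);
    if (mallctl("arena.0.chunk_hooks", &hooks, &len, NULL, 0) != 0) {
        fprintf(stderr, "arena.0.chunk_hooks unavailable in this build\n");
        return 1;
    }

    /*
     * A real application would substitute its own functions here, e.g.
     * hooks.alloc = my_chunk_alloc, while typically leaving split/merge
     * alone so jemalloc's chunk caching keeps working.
     */
    if (mallctl("arena.0.chunk_hooks", NULL, NULL, &hooks,
        sizeof(hooks)) != 0) {
        fprintf(stderr, "failed to install chunk hooks\n");
        return 1;
    }

    return 0;
}
```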
This change merely documents that arena_palloc_large() always receives
usize as its argument.
Create and use FMT* macros that are equivalent to the PRI* macros that
inttypes.h defines. This allows uniform use of the Unix-specific format
specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions
of e.g. PRIu64.
Add ffs()/ffsl() support for compiling with gcc.
Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM,
ENOMEM, and ERANGE into include/msvc_compat/windows_extra.h and
use the file for tests as well as for core jemalloc code.
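In the spirit of that change, illustrative (not verbatim) FMT* definitions and a use that stays portable across Unix and Windows printf implementations:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative definitions in the spirit of the commit (not verbatim). */
#ifdef _WIN32
#  define FMTu64 "I64u"
#  define FMTzu  "Iu"
#else
#  define FMTu64 PRIu64
#  define FMTzu  "zu"
#endif

int
main(void)
{
    uint64_t nmalloc = 123456789012345ULL;
    size_t allocated = 4096;

    /* The same format string now works on both Unix and Windows. */
    printf("nmalloc: %" FMTu64 ", allocated: %" FMTzu " bytes\n",
        nmalloc, allocated);
    return 0;
}
```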
This effectively reverts 97c04a93838c4001688fe31bf018972b4696efe2 (Use
first-fit rather than first-best-fit run/chunk allocation.). In some
pathological cases, first-fit search dominates allocation time, and it
also tends not to converge as readily on a steady state of memory
layout, since precise allocation order has a bigger effect than for
first-best-fit.
Conditionally define ENOENT, EINVAL, etc. (was unconditional).
Add PRIzu, PRIzd, and PRIzx and use them in malloc_printf() calls. gcc
issued (harmless) warnings because e.g. "%zu" should be "%Iu" on Windows;
the alternative to this workaround would have been to disable the
function attributes that cause gcc to check for type mismatches in
formatted printing calls.
This resolves #235.
Pass large allocation requests to arena_malloc() when possible. This
regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d
(Normalize size classes.).
Extract szad size quantization into {extent,run}_quantize(), and
quantize szad run sizes to the union of valid small region run sizes and
large run sizes.
Refactor iteration in arena_run_first_fit() to use
run_quantize{,_first,_next}(), and add support for padded large runs.
For large allocations that have no specified alignment constraints,
compute a pseudo-random offset from the beginning of the first backing
page that is a multiple of the cache line size. Under typical
configurations with 4-KiB pages and 64-byte cache lines this results in
a uniform distribution among 64 page boundary offsets.
Add the --disable-cache-oblivious option, primarily intended for
performance testing.
This resolves #13.
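A sketch of the offset computation described above, under the typical 4-KiB page / 64-byte cache line assumption; the PRNG step is the same illustrative one used earlier:

```c
#include <stddef.h>
#include <stdint.h>

#define LG_PAGE 12
#define LG_CACHELINE 6
#define CACHELINE ((size_t)1 << LG_CACHELINE)

/* Any 64-bit PRNG step works here; this mirrors the prng_lg_range() sketch. */
static uint64_t
prng_next(uint64_t *state)
{
    *state = *state * UINT64_C(6364136223846793005) +
        UINT64_C(1442695040888963407);
    return *state;
}

/*
 * Choose a random offset into the first backing page that is a multiple
 * of the cache line size.  With 4-KiB pages and 64-byte cache lines there
 * are 2^(LG_PAGE - LG_CACHELINE) == 64 equally likely page-boundary
 * offsets, so large allocations no longer all start at cache index 0.
 */
static size_t
large_random_offset(uint64_t *prng_state)
{
    uint64_t r = prng_next(prng_state) >> (64 - (LG_PAGE - LG_CACHELINE));
    return (size_t)r * CACHELINE;
}
```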
Fix the shrinking case of huge_ralloc_no_move_similar() to purge the
correct number of pages, at the correct offset. This regression was
introduced by 8d6a3e8321a7767cb2ca0930b85d5d488a8cc659 (Implement
dynamic per arena control over dirty page purging.).
Fix huge_ralloc_no_move_shrink() to purge the correct number of pages.
This bug was introduced by 9673983443a0782d975fbcb5d8457cfd411b8b56
(Purge/zero sub-chunk huge allocations as necessary.).
Add mallctls:
- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
modified to change the initial lg_dirty_mult setting for newly created
arenas.
- arena.<i>.lg_dirty_mult controls an individual arena's dirty page
purging threshold, and synchronously triggers any purging that may be
necessary to maintain the constraint.
- arena.<i>.chunk.purge allows the per arena dirty page purging function
to be replaced.
This resolves #93.
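A hedged example of exercising the per-arena control via mallctlnametomib()/mallctlbymib(), which lets the arena index be substituted into the resolved MIB; it assumes a build that provides these names:

```c
#include <stdio.h>
#include <sys/types.h>
#include <jemalloc/jemalloc.h>

int
main(void)
{
    unsigned narenas;
    size_t len = sizeof(narenas);
    if (mallctl("arenas.narenas", &narenas, &len, NULL, 0) != 0)
        return 1;

    /* Resolve the name once; mib[1] is the arena index placeholder. */
    size_t mib[3];
    size_t miblen = sizeof(mib) / sizeof(mib[0]);
    if (mallctlnametomib("arena.0.lg_dirty_mult", mib, &miblen) != 0)
        return 1;

    for (unsigned i = 0; i < narenas; i++) {
        mib[1] = i; /* substitute the arena index */

        /*
         * Tighten arena i's purging threshold: lg_dirty_mult == 5 allows
         * at most active/32 dirty pages (tighter than the usual default
         * of 3, i.e. active/8), triggering any purging needed to satisfy
         * the new constraint.
         */
        ssize_t lg_dirty_mult = 5;
        if (mallctlbymib(mib, miblen, NULL, NULL, &lg_dirty_mult,
            sizeof(lg_dirty_mult)) != 0) {
            fprintf(stderr, "arena %u: lg_dirty_mult not settable\n", i);
        }
    }
    return 0;
}
```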