Commit messages
|
For small bitmaps, a linear scan of the bitmap is slightly faster than
a tree search: bitmap_t is more compact, and there are fewer writes
since we don't have to propagate state transitions up the tree.
On x86_64 with the current settings, I'm seeing ~0.5%-1% CPU improvement
in production canaries with this change.
The old tree code is left in place since 32-bit sizes are much larger
(and ffsl is smaller), and the run sizes may change in the future.
This resolves #339.
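A minimal sketch of the flat-scan idea, assuming a small bitmap of a few 64-bit groups (the type and function names are illustrative, not jemalloc's actual bitmap API):

```c
#include <stddef.h>
#include <stdint.h>

#define FLAT_BITMAP_NGROUPS	4	/* assumption: small run bitmaps span only a few words */

typedef struct {
	uint64_t groups[FLAT_BITMAP_NGROUPS];
} flat_bitmap_t;

/*
 * Return the index of the first unset (free) bit, or SIZE_MAX if the
 * bitmap is full.  The scan touches at most FLAT_BITMAP_NGROUPS words,
 * and setting a bit later never has to propagate "group became full"
 * state up a tree of summary levels.
 */
static size_t
flat_bitmap_ffu(const flat_bitmap_t *bm)
{
	for (size_t i = 0; i < FLAT_BITMAP_NGROUPS; i++) {
		uint64_t unset = ~bm->groups[i];
		if (unset != 0)
			return ((i << 6) + (size_t)__builtin_ctzll(unset));
	}
	return (SIZE_MAX);
}
```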
|
This resolves #341.
|
Add HUGE_MAXCLASS overflow checks that are specific to heap profiling
code paths. This fixes test failures that were introduced by
0c516a00c4cb28cff55ce0995f756b5aae074c9e (Make *allocx() size class
overflow behavior defined.).
|
This fixes compilation warnings regarding integer overflow that were
introduced by 0c516a00c4cb28cff55ce0995f756b5aae074c9e (Make *allocx()
size class overflow behavior defined.).
|
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.
This resolves #278 and #295.
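A hedged sketch of the resulting up-front check (HUGE_MAXCLASS stands for whatever the build derives, here assumed to sit just under PTRDIFF_MAX; the function name is illustrative):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: the largest supported class is just below PTRDIFF_MAX. */
#define HUGE_MAXCLASS	((size_t)PTRDIFF_MAX)

/* Can a request for `size` bytes at `alignment` be satisfied at all? */
static bool
request_representable(size_t size, size_t alignment)
{
	size_t padded;

	if (size == 0 || size > HUGE_MAXCLASS)
		return (false);
	if (alignment == 0)
		return (true);
	/* Worst-case padding needed to honor the alignment. */
	padded = size + (alignment - 1);
	return (padded >= size && padded <= HUGE_MAXCLASS);
}
```

Keeping HUGE_MAXCLASS below PTRDIFF_MAX also keeps the difference between any two pointers within one allocation representable as ptrdiff_t.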
|
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get(). Additionally, reduce arenas_lock's role such that it only
protects against arena initialization races. These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.
This resolves #315.
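A sketch of the resulting fast path, using C11 atomics and pthreads in place of jemalloc's internal wrappers (the names and the maximum-arena constant are assumptions):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

typedef struct arena_s arena_t;

#define MALLOCX_ARENA_MAX	4096	/* assumption: fixed maximum number of arenas */

static _Atomic(arena_t *) arenas[MALLOCX_ARENA_MAX];	/* sparse, sized up front */
static pthread_mutex_t arenas_lock = PTHREAD_MUTEX_INITIALIZER;

extern arena_t *arena_init_locked(unsigned ind);	/* assumed slow-path initializer */

static arena_t *
arena_get(unsigned ind, bool init_if_missing)
{
	/* Fast path: a single atomic load, no lock, for already-initialized arenas. */
	arena_t *ret = atomic_load_explicit(&arenas[ind], memory_order_acquire);
	if (ret != NULL || !init_if_missing)
		return (ret);

	/* Slow path: the lock only guards against concurrent initialization. */
	pthread_mutex_lock(&arenas_lock);
	ret = atomic_load_explicit(&arenas[ind], memory_order_acquire);
	if (ret == NULL) {
		ret = arena_init_locked(ind);
		atomic_store_explicit(&arenas[ind], ret, memory_order_release);
	}
	pthread_mutex_unlock(&arenas_lock);
	return (ret);
}
```

The lock is taken only the first time an arena index is used, so ordinary lookups never block.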
|
Fix the arena_size computation in arena_new() to incorporate
runs_avail_nclasses elements for runs_avail, rather than
(runs_avail_nclasses - 1) elements. Since offsetof(arena_t, runs_avail)
is used rather than sizeof(arena_t) for the first term of the
computation, all of the runs_avail elements must be added into the
second term.
This bug was introduced (by Jason Evans) while merging pull request #330
as 3417a304ccde61ac1f68b436ec22c03f1d6824ec (Separate arena_avail
trees).
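The arithmetic in question, written out as a standalone sketch (arena_run_tree_t and the placeholder fields are stand-ins, not jemalloc's actual definitions):

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { void *rbt_root; } arena_run_tree_t;	/* stand-in for one runs_avail tree */

typedef struct arena_s {
	unsigned		ind;		/* placeholder fixed-size fields */
	unsigned		nthreads;
	arena_run_tree_t	runs_avail[1];	/* trailing array, really runs_avail_nclasses long */
} arena_t;

/*
 * The first term, offsetof(arena_t, runs_avail), contributes nothing for
 * runs_avail itself (unlike sizeof(arena_t), which would already include
 * one element), so the second term must count all runs_avail_nclasses
 * elements rather than (runs_avail_nclasses - 1).
 */
static arena_t *
arena_new_sketch(size_t runs_avail_nclasses)
{
	size_t arena_size = offsetof(arena_t, runs_avail) +
	    runs_avail_nclasses * sizeof(arena_run_tree_t);

	return ((arena_t *)malloc(arena_size));
}
```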
|
The merge of 3417a304ccde61ac1f68b436ec22c03f1d6824ec appears to have
introduced a small bug: first_best_fit doesn't scan through all the
classes, since ind is offset from runs_avail_nclasses by
run_avail_bias.
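A self-contained toy illustrating the coordinate mismatch (all names and sizes here are assumptions; only the ind / run_avail_bias relationship comes from the message): when the scan index is in global, biased size-class coordinates, the loop bound must include the bias too, or the last few classes are never visited.

```c
#include <stddef.h>

#define NCLASSES	8	/* classes that actually have avail containers (assumption) */
#define BIAS		3	/* global index of the first such class (assumption) */

/* containers[k] holds the runs of global size class (k + BIAS). */
static void *containers[NCLASSES];

static void *
first_best_fit(size_t ind)	/* ind is a global (biased) class index */
{
	/* Bound expressed in the same biased coordinates as ind. */
	for (size_t i = ind; i < NCLASSES + BIAS; i++) {
		if (containers[i - BIAS] != NULL)
			return (containers[i - BIAS]);
	}
	return (NULL);
}
```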
|
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap(). This can dramatically speed up incremental huge
reallocation.
This resolves #335.
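In outline (a simplified sketch, not chunk_alloc_mmap() itself): mmap() is given the address just past the existing huge mapping as a hint, and the new pages are kept only if the kernel placed them exactly there.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to map `size` bytes exactly at `new_addr` (e.g. the end of an
 * existing huge allocation).  Returns new_addr on success, NULL if the
 * kernel chose a different address, in which case the mapping is undone
 * and the caller falls back to a copying reallocation.
 */
static void *
chunk_try_extend_mmap(void *new_addr, size_t size)
{
	void *ret = mmap(new_addr, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ret == MAP_FAILED)
		return (NULL);
	if (ret != new_addr) {
		/* Hint not honored; give the pages back and report failure. */
		munmap(ret, size);
		return (NULL);
	}
	return (ret);
}
```

If the hint is not honored the usual map-elsewhere-and-copy path still works, so the optimization is purely opportunistic.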
|
This resolves #258.
|
This resolves #323.
|
This regression was caused by 9f4ee6034c3ac6a8c8b5f9a0d76822fb2fd90c41
(Refactor jemalloc_ffs*() into ffs_*().).
|
This will prevent accidental creation of potential integer truncation
bugs when developing on LP64 systems.
|
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
|
This resolves #333.
|
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
|
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some CPU. This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache. A
linear scan of the indices appears to be fast enough.
The only cost of this is an extra tree array in each arena.
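A rough sketch of the shape of the change (the container layout and the toy quantization function are assumptions; the point is that quantization happens once per insert/remove rather than inside every comparison):

```c
#include <stddef.h>

#define RUN_NCLASSES	64	/* assumption: number of quantized run size classes */

typedef struct run_node_s {
	struct run_node_s	*link;	/* stand-in for the real tree linkage */
	size_t			size;
} run_node_t;

/* One container per quantized class, instead of a single tree whose
 * comparator re-quantized (and dereferenced miscelm data) on every step. */
static run_node_t *runs_avail[RUN_NCLASSES];

/* Toy quantization: index by page count (the real mapping is coarser). */
static size_t
run_quantize_index(size_t size)
{
	return (size / 4096);
}

static void
runs_avail_insert(run_node_t *node)
{
	/* Quantize once, on insertion, not on every comparison. */
	size_t ind = run_quantize_index(node->size);

	node->link = runs_avail[ind];
	runs_avail[ind] = node;
}

static run_node_t *
runs_avail_first_fit(size_t size)
{
	/* A linear scan over the few class indices is cheap. */
	for (size_t i = run_quantize_index(size); i < RUN_NCLASSES; i++) {
		if (runs_avail[i] != NULL)
			return (runs_avail[i]);
	}
	return (NULL);
}
```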
|
Since this is an intrusive tree, rbt_nil is the whole size of the node
and can be quite large. For example, miscelm is ~100 bytes.
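For context, "intrusive" means the red-black linkage is embedded in the element, so a sentinel-per-tree design pays for a whole element. A schematic illustration (field names and sizes are approximations, not jemalloc's actual definitions):

```c
#include <stdint.h>

/* Intrusive red-black linkage, embedded in the element it links. */
typedef struct rb_node_s {
	struct rb_node_s	*left;
	struct rb_node_s	*right_red;	/* right child with the color bit packed in */
} rb_node_t;

/* A miscelm-like element: the linkage is only a small part of it. */
typedef struct miscelm_s {
	rb_node_t	rb_link;
	uint64_t	other_metadata[11];	/* ~100 bytes total (approximate) */
} miscelm_t;

/* With a nil-sentinel design, every tree header embeds a full element as
 * its sentinel, costing sizeof(miscelm_t) rather than sizeof(rb_node_t). */
typedef struct {
	miscelm_t	*root;
	miscelm_t	rbt_nil;
} miscelm_tree_t;
```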
|
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
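A hedged sketch of the scheme (the table names, page size, and maximum run size are assumptions): the expensive computations run once at bootstrap, after which quantization is a table lookup keyed by page count.

```c
#include <stddef.h>

#define PAGE		((size_t)4096)	/* assumption */
#define RUN_MAXPAGES	255		/* assumption: largest quantizable run, in pages */

static size_t run_quantize_floor_tab[RUN_MAXPAGES + 1];
static size_t run_quantize_ceil_tab[RUN_MAXPAGES + 1];

extern size_t run_quantize_floor_compute(size_t size);	/* the original slow paths */
extern size_t run_quantize_ceil_compute(size_t size);

/* Fill both tables once, during bootstrap. */
static void
run_quantize_init(void)
{
	for (size_t i = 1; i <= RUN_MAXPAGES; i++) {
		run_quantize_floor_tab[i] = run_quantize_floor_compute(i * PAGE);
		run_quantize_ceil_tab[i] = run_quantize_ceil_compute(i * PAGE);
	}
}

/* Every subsequent quantization is a single array load. */
static size_t
run_quantize_floor(size_t size)
{
	return (run_quantize_floor_tab[size / PAGE]);	/* size is a page-aligned run size */
}
```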
|
In practice this bug had limited impact (and then only by increasing
chunk fragmentation) because run_quantize_ceil() returned correct
results except for inputs that could only arise from aligned allocation
requests that required more than page alignment.
This bug existed in the original run quantization implementation, which
was introduced by 8a03cf039cd06f9fa6972711195055d865673966 (Implement
cache index randomization for large allocations.).
|
Also rename run_quantize_*() to improve clarity. These tests
demonstrate that run_quantize_ceil() is flawed.
|
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec. This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
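A minimal sketch of the representation (function names mirror the nstime_*() style but are an abbreviation of the real module):

```c
#include <stdint.h>

#define BILLION	UINT64_C(1000000000)

typedef struct {
	uint64_t ns;	/* whole time value, in nanoseconds */
} nstime_t;

static void
nstime_init2(nstime_t *time, uint64_t sec, uint64_t nsec)
{
	time->ns = sec * BILLION + nsec;
}

static uint64_t
nstime_sec(const nstime_t *time)
{
	return (time->ns / BILLION);
}

static uint64_t
nstime_nsec(const nstime_t *time)
{
	return (time->ns % BILLION);
}
```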
|
struct timespec is already defined by the system (at least on MinGW).
|
Add jemalloc_ffs64() and use it instead of jemalloc_ffsl() in
prng_range(), since long is not guaranteed to be a 64-bit type.
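A hedged sketch of the distinction, using compiler builtins rather than the configure-detected wrappers: a find-first-set that takes an explicitly 64-bit operand cannot lose the high half the way an ffsl()-based path can where long is 32 bits.

```c
#include <stdint.h>

/* 1-based find-first-set on a 64-bit value; 0 if no bit is set. */
static unsigned
ffs_u64(uint64_t x)
{
	return (x == 0 ? 0 : (unsigned)__builtin_ctzll(x) + 1);
}

/*
 * The failure mode being avoided: on an ILP32 or LLP64 target, long is
 * 32 bits, so ffsl((long)(UINT64_C(1) << 40)) sees 0, while
 * ffs_u64(UINT64_C(1) << 40) correctly returns 41.  A prng_range()-style
 * helper that derives lg(range) from a 64-bit power-of-two bound needs
 * the 64-bit variant.
 */
static unsigned
lg_pow2_u64(uint64_t pow2)
{
	return (ffs_u64(pow2) - 1);	/* pow2 must be a nonzero power of two */
}
```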
|
in Cygwin x64
|
Reported by Christopher Ferris <cferris@google.com>.
|
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.
Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time
This resolves #325.
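For example, an application could drive the new knobs through the standard mallctl() interface (a sketch; whether these writes take effect depends on opt.purge selecting decay-based purging):

```c
#include <stddef.h>
#include <jemalloc/jemalloc.h>

int
use_decay_purging(void)
{
	ssize_t decay_time = 5;	/* seconds; -1 disables decay-based purging */

	/* Set the default decay time applied to arenas ("arenas.decay_time"). */
	if (mallctl("arenas.decay_time", NULL, NULL, &decay_time,
	    sizeof(decay_time)) != 0)
		return (-1);

	/* Explicitly trigger decay-based purging on arena 0 ("arena.<i>.decay"). */
	if (mallctl("arena.0.decay", NULL, NULL, NULL, 0) != 0)
		return (-1);

	return (0);
}
```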