Initial implementation of a two-pass pairing heap with aux list.
Research papers linked in comments.
Where search/nsearch/last aren't needed, this gives much faster first(),
delete(), and insert(). Insert is O(1), and first/delete don't have to
walk the whole tree.
Also tested rb_old with parent pointers - it was better than the current
rb.h for memory loads, but still much worse than a pairing heap.
An array-based heap would be much faster if everything fits in memory,
but on a cold cache it has many more memory loads for most operations.
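As a reference for the data structure described above, here is a minimal, self-contained sketch of a two-pass pairing heap (names are hypothetical, and the aux-list optimization from the commit is omitted). It shows why insert is O(1) and why delete-min only combines the children of the root instead of rebalancing a whole tree.

```c
#include <stddef.h>

/* Intrusive node: child points at the first child, sibling at the next child. */
typedef struct pnode_s pnode_t;
struct pnode_s {
	int	key;
	pnode_t	*child;
	pnode_t	*sibling;
};

/* Link two roots in O(1); the root with the larger key becomes a child. */
static pnode_t *
ph_merge(pnode_t *a, pnode_t *b)
{
	if (a == NULL)
		return (b);
	if (b == NULL)
		return (a);
	if (b->key < a->key) {
		pnode_t *t = a;
		a = b;
		b = t;
	}
	b->sibling = a->child;
	a->child = b;
	return (a);
}

/* O(1) insert: merge a single-node heap into the root. */
static pnode_t *
ph_insert(pnode_t *root, pnode_t *n)
{
	n->child = NULL;
	n->sibling = NULL;
	return (ph_merge(root, n));
}

/* Remove the minimum (the root) and rebuild the heap from its children. */
static pnode_t *
ph_delete_min(pnode_t *root)
{
	pnode_t *pairs = NULL;
	pnode_t *cur = root->child;
	pnode_t *result = NULL;

	/* First pass: link adjacent children pairwise, left to right. */
	while (cur != NULL) {
		pnode_t *a = cur;
		pnode_t *b = a->sibling;
		pnode_t *next = (b != NULL) ? b->sibling : NULL;
		a->sibling = NULL;
		if (b != NULL)
			b->sibling = NULL;
		a = ph_merge(a, b);
		a->sibling = pairs;	/* Prepend, reversing the order. */
		pairs = a;
		cur = next;
	}

	/* Second pass: merge the pairs back to front, i.e. right to left. */
	while (pairs != NULL) {
		pnode_t *next = pairs->sibling;
		pairs->sibling = NULL;
		result = ph_merge(result, pairs);
		pairs = next;
	}
	return (result);
}
```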
Add a cast to avoid comparing a ssize_t value to a uint64_t value that
is always larger than a 32-bit ssize_t. This silences an innocuous
compiler warning from e.g. gcc 4.2.1 about the comparison always having
the same result.
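A minimal illustration of the kind of cast involved, using hypothetical names and limits: on a 32-bit platform the ssize_t operand is only 32 bits wide, so the comparison against a larger 64-bit bound always has the same result, which is what the compiler warns about.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical bound; the point is that it exceeds any 32-bit ssize_t value. */
#define DECAY_SEC_MAX	(UINT64_MAX / 1000000000)

static bool
decay_time_valid(ssize_t decay_time)
{
	if (decay_time < -1)
		return (false);
	/*
	 * The explicit cast documents that the negative case has already been
	 * handled, and silences the "comparison is always true" warning that
	 * appears when ssize_t is 32 bits wide.
	 */
	if (decay_time == -1 || (uint64_t)decay_time <= DECAY_SEC_MAX)
		return (true);
	return (false);
}
```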
Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time}
initialization.
Fix stats.arenas.<i>.{pactive,pdirty} so that they are read under the
protection of the arena mutex.
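A sketch of the locking pattern the fix implies, with hypothetical type and function names: counters that are updated under the arena mutex are also read under it, so the reported pair is a consistent snapshot.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
	pthread_mutex_t	lock;		/* The arena mutex. */
	size_t		pactive;	/* Active pages, updated under lock. */
	size_t		pdirty;		/* Dirty pages, updated under lock. */
} arena_stats_t;

static void
arena_stats_read(arena_stats_t *arena, size_t *pactive, size_t *pdirty)
{
	pthread_mutex_lock(&arena->lock);
	*pactive = arena->pactive;
	*pdirty = arena->pdirty;
	pthread_mutex_unlock(&arena->lock);
}
```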
Fix stats.cactive accounting to always increase/decrease by multiples of
the chunk size, even for huge size classes that are not multiples of the
chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size. This
regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d
(Normalize size classes.) and first released in 4.0.0.
This resolves #336.
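A worked example of the accounting rule with illustrative constants: a huge allocation always occupies a whole number of chunks, so cactive must change by the chunk-rounded size rather than by the smaller usable size.

```c
#include <stdio.h>

#define CHUNK_SIZE	((size_t)2 << 20)	/* 2 MiB, for illustration. */

/* Round size up to a multiple of the (power-of-two) chunk size. */
static size_t
chunk_ceiling(size_t size)
{
	return ((size + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1));
}

int
main(void)
{
	size_t usize = (size_t)(2.5 * (1 << 20));	/* A 2.5 MiB huge class. */

	/* Prints: account 4194304 bytes for a 2621440 byte allocation. */
	printf("account %zu bytes for a %zu byte allocation\n",
	    chunk_ceiling(usize), usize);
	return (0);
}
```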
For small bitmaps, a linear scan of the bitmap is slightly faster than
a tree search - bitmap_t is more compact, and there are fewer writes
since we don't have to propagate state transitions up the tree.
On x86_64 with the current settings, I'm seeing a ~0.5-1% CPU improvement
in production canaries with this change.
The old tree code is left in place since 32-bit sizes are much larger (and
ffsl is smaller), and the run sizes may change in the future.
This resolves #339.
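A sketch of the linear-scan idea, with hypothetical names and semantics (here a set bit marks a free item): skip all-zero groups, then apply a find-first-set operation to the first non-zero group. For small bitmaps this touches only a few cache lines, and no tree state has to be updated when bits flip.

```c
#include <stddef.h>
#include <stdint.h>

#define BITMAP_GROUPS	4	/* 4 * 64 = 256 items, for illustration. */

typedef struct {
	uint64_t	groups[BITMAP_GROUPS];
} bitmap_t;

/* Return the index of the first set bit, or SIZE_MAX if no bit is set. */
static size_t
bitmap_first_set(const bitmap_t *bm)
{
	size_t i;

	for (i = 0; i < BITMAP_GROUPS; i++) {
		if (bm->groups[i] != 0) {
			/* __builtin_ctzll() counts trailing zero bits. */
			return ((i << 6) +
			    (size_t)__builtin_ctzll(bm->groups[i]));
		}
	}
	return (SIZE_MAX);
}
```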
This resolves #341.
Limit supported size and alignment to HUGE_MAXCLASS, which in turn is
now limited to be less than PTRDIFF_MAX.
This resolves #278 and #295.
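A sketch of the guard this implies, with an illustrative cap: requests whose size, or size plus worst-case alignment padding, would exceed the largest supported class are rejected up front, so later size arithmetic cannot overflow a ptrdiff_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative cap; the key property is that it stays below PTRDIFF_MAX. */
#define MAXCLASS_EXAMPLE	((size_t)PTRDIFF_MAX >> 1)

static bool
request_in_range(size_t size, size_t alignment)
{
	if (size == 0 || size > MAXCLASS_EXAMPLE)
		return (false);
	/* Worst-case alignment padding must not push the request past the cap. */
	if (alignment > MAXCLASS_EXAMPLE - size)
		return (false);
	return (true);
}
```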
Refactor the arenas array, which contains pointers to all extant arenas,
such that it starts out as a sparse array of maximum size, and use
double-checked atomics-based reads as the basis for fast and simple
arena_get(). Additionally, reduce arenas_lock's role such that it only
protects against arena initialization races. These changes remove the
possibility for arena lookups to trigger locking, which resolves at
least one known (fork-related) deadlock.
This resolves #315.
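A sketch of the double-checked pattern using C11 atomics and pthreads instead of jemalloc's internal wrappers (the names and the array bound are hypothetical): the fast path is a single atomic load, and the mutex is taken only to serialize initialization of a still-empty slot.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NARENAS_MAX	256

typedef struct arena_s arena_t;

static _Atomic(arena_t *)	arenas[NARENAS_MAX];	/* Sparse, maximum size. */
static pthread_mutex_t		arenas_lock = PTHREAD_MUTEX_INITIALIZER;

arena_t	*arena_new(unsigned ind);	/* Hypothetical initializer. */

static arena_t *
arena_get(unsigned ind)
{
	/* Fast path: no locking. */
	arena_t *arena = atomic_load_explicit(&arenas[ind],
	    memory_order_acquire);
	if (arena != NULL)
		return (arena);

	/* Slow path: the lock only guards against initialization races. */
	pthread_mutex_lock(&arenas_lock);
	arena = atomic_load_explicit(&arenas[ind], memory_order_acquire);
	if (arena == NULL) {
		arena = arena_new(ind);
		atomic_store_explicit(&arenas[ind], arena,
		    memory_order_release);
	}
	pthread_mutex_unlock(&arenas_lock);
	return (arena);
}
```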
Attempt mmap-based in-place huge reallocation by plumbing new_addr into
chunk_alloc_mmap(). This can dramatically speed up incremental huge
reallocation.
This resolves #335.
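A sketch of the in-place attempt, with a hypothetical wrapper name: the address just past the existing allocation is passed to mmap() as a hint, and if the kernel places the mapping anywhere else, the result is unmapped and the caller falls back to a copying reallocation.

```c
#include <stddef.h>
#include <sys/mman.h>

static void *
chunk_alloc_at(void *new_addr, size_t size)
{
	void *ret = mmap(new_addr, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ret == MAP_FAILED)
		return (NULL);
	if (new_addr != NULL && ret != new_addr) {
		/* The hint was not honored; in-place extension is impossible. */
		munmap(ret, size);
		return (NULL);
	}
	return (ret);
}
```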
This regression was caused by 9f4ee6034c3ac6a8c8b5f9a0d76822fb2fd90c41
(Refactor jemalloc_ffs*() into ffs_*().).
Use appropriate versions to resolve 64-to-32-bit data loss warnings.
This resolves #333.
These tree types converged to become identical, yet they still had
independently generated red-black tree implementations.
Separate run trees by index, replacing the previous quantize logic.
Quantization by index is now performed only on insertion / removal from
the tree, and not on node comparison, saving some CPU. This also means
we don't have to dereference the miscelm* pointers, saving half of the
memory loads from miscelms/mapbits that have fallen out of cache. A
linear scan of the indices appears to be fast enough.
The only cost of this is an extra tree array in each arena.
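A simplified sketch of the layout, with hypothetical names and a plain list standing in for each size-indexed tree: the quantized index is computed once at insertion, and a first-fit lookup is a linear scan over the index array rather than a comparison-based search.

```c
#include <stddef.h>

#define NRUN_INDICES	64	/* Number of quantized run sizes, for illustration. */

typedef struct run_s run_t;
struct run_s {
	run_t	*next;	/* Stand-in for per-index tree linkage. */
};

typedef struct {
	run_t	*avail[NRUN_INDICES];	/* One container per quantized size index. */
} arena_avail_t;

size_t	run_size_to_index(size_t size);	/* Hypothetical quantization function. */

static void
arena_avail_insert(arena_avail_t *a, run_t *run, size_t size)
{
	size_t ind = run_size_to_index(size);	/* Quantize once, on insertion. */

	run->next = a->avail[ind];
	a->avail[ind] = run;
}

/* First fit: linear scan of the indices at or above the request's index. */
static run_t *
arena_avail_first(arena_avail_t *a, size_t size)
{
	size_t ind;

	for (ind = run_size_to_index(size); ind < NRUN_INDICES; ind++) {
		if (a->avail[ind] != NULL)
			return (a->avail[ind]);
	}
	return (NULL);
}
```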
Since this is an intrusive tree, rbt_nil is the whole size of the node
and can be quite large. For example, miscelm is ~100 bytes.
Reduce run quantization overhead by generating lookup tables during
bootstrapping, and using the tables for all subsequent run quantization.
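A sketch of the bootstrap-time table idea, with hypothetical names: the quantized value of every covered page-multiple run size is computed once by the slow path, and every later query becomes an array lookup.

```c
#include <stddef.h>

#define PAGE	((size_t)4096)
#define NQUANT	128	/* Page-multiple run sizes covered, for illustration. */

static size_t	run_quantize_floor_tab[NQUANT];

size_t	run_quantize_floor_compute(size_t size);	/* Hypothetical slow path. */

static void
run_quantize_boot(void)
{
	size_t i;

	for (i = 0; i < NQUANT; i++) {
		run_quantize_floor_tab[i] =
		    run_quantize_floor_compute((i + 1) * PAGE);
	}
}

static size_t
run_quantize_floor(size_t size)
{
	/* size is assumed to be a non-zero multiple of PAGE within the table. */
	return (run_quantize_floor_tab[size / PAGE - 1]);
}
```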
Also rename run_quantize_*() to improve clarity. These tests
demonstrate that run_quantize_ceil() is flawed.
Use a single uint64_t in nstime_t to store nanoseconds rather than using
struct timespec. This reduces fragility around conversions between long
and uint64_t, especially missing casts that only cause problems on
32-bit platforms.
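A sketch of the representation, with hypothetical helper names: a single unsigned 64-bit nanosecond counter needs no seconds/nanoseconds normalization and no long to uint64_t conversions.

```c
#include <stdint.h>

#define BILLION	UINT64_C(1000000000)

typedef struct {
	uint64_t	ns;	/* Nanoseconds since an arbitrary epoch. */
} nstime_t;

static void
nstime_init2(nstime_t *time, uint64_t sec, uint64_t nsec)
{
	time->ns = sec * BILLION + nsec;
}

static uint64_t
nstime_sec(const nstime_t *time)
{
	return (time->ns / BILLION);
}

static void
nstime_add(nstime_t *time, const nstime_t *addend)
{
	time->ns += addend->ns;
}
```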
struct timespec is already defined by the system (at least on MinGW).
Add jemalloc_ffs64() and use it instead of jemalloc_ffsl() in
prng_range(), since long is not guaranteed to be a 64-bit type.
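A minimal sketch of a 64-bit ffs built on the GCC/Clang builtin (the name is illustrative): unlike ffsl(), whose operand width follows the platform's long, this always operates on 64 bits, which is what a 64-bit PRNG state requires.

```c
#include <stdint.h>

/* Return the 1-based index of the least significant set bit, or 0 if none. */
static inline unsigned
ffs_u64(uint64_t bitmap)
{
	return ((unsigned)__builtin_ffsll((long long)bitmap));
}
```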
Reported by Christopher Ferris <cferris@google.com>.
This is an alternative to the existing ratio-based unused dirty page
purging, and is intended to eventually become the sole purging
mechanism.
Add mallctls:
- opt.purge
- opt.decay_time
- arena.<i>.decay
- arena.<i>.decay_time
- arenas.decay_time
- stats.arenas.<i>.decay_time
This resolves #325.
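An example of driving the new behavior through the existing mallctl() interface, using one of the names listed above. Error handling is elided, and the value semantics assumed here are that decay_time is a number of seconds and -1 disables decay-based purging.

```c
#include <stddef.h>
#include <sys/types.h>
#include <jemalloc/jemalloc.h>

static void
set_default_decay_time(ssize_t decay_time)
{
	/* Default for subsequently created arenas. */
	mallctl("arenas.decay_time", NULL, NULL, &decay_time,
	    sizeof(decay_time));
}
```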
Check in a generated smootherstep table as smoothstep.h rather than
generating it at configure time, since not all systems (e.g. Windows)
have dc.
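For reference, a small program of the kind that could generate such a table (the step count and fixed-point scale are illustrative): smootherstep is the polynomial 6x^5 - 15x^4 + 10x^3 on [0, 1], evaluated at evenly spaced points and scaled to a fixed-point integer.

```c
#include <inttypes.h>
#include <stdio.h>

#define NSTEPS	200	/* Illustrative. */
#define BFP	24	/* Binary fixed point: 1.0 == 1 << 24. */

int
main(void)
{
	unsigned i;

	for (i = 1; i <= NSTEPS; i++) {
		double x = (double)i / NSTEPS;
		/* smootherstep(x) = 6x^5 - 15x^4 + 10x^3 */
		double y = 6*x*x*x*x*x - 15*x*x*x*x + 10*x*x*x;

		printf("0x%08" PRIx64 ",\n", (uint64_t)(y * (1 << BFP)));
	}
	return (0);
}
```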
Refactor arenas_cache tsd into arenas_tdata, which is a structure of
type arena_tdata_t.
Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
prng_range().
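A sketch of the two primitives, with illustrative LCG constants (the actual multiplier, increment, and rejection strategy may differ): prng_lg_range() returns the top lg_range bits of a 64-bit linear congruential step, and prng_range() builds an unbiased value in [0, range) on top of it by rejecting draws at or beyond range.

```c
#include <assert.h>
#include <stdint.h>

#define PRNG_A	UINT64_C(6364136223846793005)	/* Illustrative LCG constants. */
#define PRNG_C	UINT64_C(1442695040888963407)

static uint64_t
prng_lg_range(uint64_t *state, unsigned lg_range)
{
	assert(lg_range > 0 && lg_range <= 64);

	*state = *state * PRNG_A + PRNG_C;
	return (*state >> (64 - lg_range));
}

static uint64_t
prng_range(uint64_t *state, uint64_t range)
{
	uint64_t ret;
	unsigned lg_range;

	assert(range > 1);

	/* Compute the ceiling of lg(range). */
	lg_range = 64 - (unsigned)__builtin_clzll(range - 1);

	/* Generate in [0, 2^lg_range) and reject overflowing draws. */
	do {
		ret = prng_lg_range(state, lg_range);
	} while (ret >= range);

	return (ret);
}
```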
Implement ticker, which provides a simple API for ticking off some
number of events before indicating that the ticker has hit its limit.
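A sketch of the API as described, with hypothetical field names: a ticker counts down from nticks, and ticker_tick() reports true once every nticks calls, rearming itself each time.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
	int32_t	tick;	/* Remaining events before the limit is hit. */
	int32_t	nticks;	/* Period to rearm with. */
} ticker_t;

static void
ticker_init(ticker_t *ticker, int32_t nticks)
{
	ticker->tick = nticks;
	ticker->nticks = nticks;
}

/* Returns true when the ticker hits its limit, then restarts the countdown. */
static bool
ticker_tick(ticker_t *ticker)
{
	if (--ticker->tick == 0) {
		ticker->tick = ticker->nticks;
		return (true);
	}
	return (false);
}
```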
Add --with-malloc-conf, which makes it possible to embed a default
options string during configuration.
This resolves #294.