| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Rather than purging uncoalesced extents, perform just enough incremental
coalescing to purge only fully coalesced extents. In the absence of
cached extent reuse, the immediate versus delayed incremental purging
algorithms result in the same purge order.
This resolves #655.
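Roughly, the mechanism looks like the following (a minimal sketch with
hypothetical types and helpers, not jemalloc's extent API): an extent headed
for purging is first merged with any contiguous cached neighbor, so only a
fully coalesced extent reaches the purge step.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical extent record: address-ordered list of cached extents. */
    typedef struct extent_s {
        char            *addr;
        size_t          size;
        struct extent_s *prev, *next;
    } extent_t;

    /* Merge with the immediately following neighbor if it is contiguous;
     * return true if a merge happened (simplified, hypothetical logic). */
    static bool
    coalesce_forward_once(extent_t *e) {
        extent_t *n = e->next;
        if (n == NULL || e->addr + e->size != n->addr) {
            return false;
        }
        e->size += n->size;
        e->next = n->next;
        if (n->next != NULL) {
            n->next->prev = e;
        }
        return true;
    }

    /* Purge only once no contiguous neighbor remains. */
    static void
    purge_fully_coalesced(extent_t *e, void (*purge)(extent_t *)) {
        while (coalesce_forward_once(e)) {
            /* Keep merging until fully coalesced. */
        }
        purge(e);
    }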
|
|
|
|
|
|
|
|
| |
Fix the test_decay_ticker test to carefully control slab
creation/destruction such that the decay backlog reliably reaches zero.
Use an isolated arena so that no extraneous allocation can confuse the
situation. Speed up time during the latter part of the test so that the
entire decay time can expire in a reasonable amount of wall time.
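For reference, creating an isolated arena and directing a test's allocations
to it looks roughly like this (a sketch against jemalloc's public
mallctl()/mallocx() interfaces; exact mallctl names vary by version, and the
error handling and decay-time manipulation are omitted):

    #include <jemalloc/jemalloc.h>

    int
    main(void) {
        unsigned arena_ind;
        size_t sz = sizeof(arena_ind);

        /* Create a fresh arena that no other thread allocates from. */
        if (mallctl("arenas.create", &arena_ind, &sz, NULL, 0) != 0) {
            return 1;
        }

        /* Allocations below go to the isolated arena, bypassing the thread
         * cache so that slab creation/destruction is deterministic. */
        void *p = mallocx(4096,
            MALLOCX_ARENA(arena_ind) | MALLOCX_TCACHE_NONE);
        if (p != NULL) {
            dallocx(p, MALLOCX_TCACHE_NONE);
        }
        return 0;
    }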
|
|
|
|
| |
strategy
|
| |
|
| |
|
|
|
|
|
|
|
| |
In the C11 atomics backport, we couldn't use not_reached() in
atomic_enum_to_builtin (in atomic_gcc_atomic.h), since atomic.h was hermetic and
assert.h wasn't; there was a dependency issue. assert.h is hermetic now, so we
can include it.
|
|
|
|
|
|
|
|
|
| |
This is the first header refactoring diff, #533. It splits the assert and util
components into separate, hermetic, header files. In the process, it splits out
two of the large sub-components of util (the stdio.h replacement, and bit
manipulation routines) into their own components (malloc_io.h and bit_util.h).
This is mostly to break up cyclic dependencies, but it also breaks off a good
chunk of the catch-all-ness of util, which is nice.
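For context, "hermetic" here means a header that can be included first, in
isolation, because it pulls in everything it depends on. A schematic example
(illustrative names, not the real jemalloc headers):

    /* example_bit_util.h -- schematic hermetic header. */
    #ifndef EXAMPLE_BIT_UTIL_H
    #define EXAMPLE_BIT_UTIL_H

    /* All dependencies are included here, so any .c file (or another
     * header) can include this file first without prior setup. */
    #include <stddef.h>     /* size_t */

    /* Floor of log2; assumes x > 0. */
    static inline unsigned
    example_lg_floor(size_t x) {
        unsigned lg = 0;
        while (x >>= 1) {
            lg++;
        }
        return lg;
    }

    #endif /* EXAMPLE_BIT_UTIL_H */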
|
|
|
|
|
|
|
|
|
|
| |
Convert the nrequests field to be partially derived, and the curlextents
to be fully derived, in order to reduce the number of stats updates
needed during common operations.
This change affects ndalloc stats during arena reset, because it is no
longer possible to cancel out ndalloc effects (curlextents would become
negative).
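Schematically, the change means storing only counters that strictly grow and
computing the rest on read (hypothetical field and function names, simplified
from the real stats code):

    #include <stdint.h>

    /* Per-size-class large stats: only the monotonically increasing
     * counters are updated on the allocation/deallocation paths. */
    typedef struct {
        uint64_t nmalloc;     /* updated on allocation */
        uint64_t ndalloc;     /* updated on deallocation */
        uint64_t nrequests;   /* partially derived: only requests beyond
                               * nmalloc are accumulated here. */
    } large_stats_t;

    /* Fully derived on read; never stored, so it cannot be "cancelled
     * out", which is why arena reset now shows up in ndalloc. */
    static inline uint64_t
    curlextents_read(const large_stats_t *s) {
        return s->nmalloc - s->ndalloc;
    }

    static inline uint64_t
    nrequests_read(const large_stats_t *s) {
        return s->nmalloc + s->nrequests;
    }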
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces a backport of C11 atomics. It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>
The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
that it's impossible to mix up atomic and non-atomic updates (which is
undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
atomic operations on strongly ordered architectures (example:
`atomic_write_u32(ptr, val);` involves a CAS loop, whereas
  `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store; see the
  sketch below).
This diff leaves in the current atomics API (implementing them in terms of the
backport). This lets us transition uses over piecemeal.
Testing:
This is by nature hard to test. I've manually tested the first three options
on Linux with gcc by futzing with the #defines, on FreeBSD with gcc and clang,
on MSVC, and on OS X with clang. All of these were x86 machines, though, and
we don't have any test infrastructure set up for non-x86 platforms.
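To make the ordering point concrete, here is the same idea expressed with
plain C11 <stdatomic.h> (not the backport's own macro-generated API): a
release store needs no CAS loop on strongly ordered hardware.

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t flag;

    /* Emulating a store with only read-modify-write primitives forces a
     * CAS loop (roughly what a __sync-based write has to do). */
    static void
    write_via_cas(uint32_t val) {
        uint32_t cur = atomic_load_explicit(&flag, memory_order_relaxed);
        while (!atomic_compare_exchange_weak_explicit(&flag, &cur, val,
            memory_order_release, memory_order_relaxed)) {
            /* cur is reloaded on failure; retry. */
        }
    }

    /* With an explicit ordering, the same release semantics compile to a
     * plain store on x86 and other strongly ordered architectures. */
    static void
    write_via_store(uint32_t val) {
        atomic_store_explicit(&flag, val, memory_order_release);
    }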
|
|
|
|
|
| |
In the long term, we'll transition to C99-style inline semantics. In the short
term, this will allow both styles to coexist without breaking one another.
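For reference, C99-style inline semantics look like this (a generic
illustration, not the project's actual headers): the definition lives in the
header, and exactly one translation unit emits the external symbol.

    /* example.h */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H

    /* C99: "inline" alone provides an inline definition only; it does
     * not, by itself, emit an out-of-line symbol in every translation
     * unit. */
    inline int
    example_twice(int x) {
        return 2 * x;
    }

    #endif

    /* example.c -- exactly one .c file provides the external definition
     * that non-inlined calls and function pointers link against. */
    #include "example.h"
    extern inline int example_twice(int x);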
|
|
|
|
|
|
|
|
| |
This fixes a regression caused by
54269dc0ed3e4d04b2539016431de3cfe8330719 (Remove obsolete
arena_maybe_purge() call.), as well as providing a general fix.
This resolves #665.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.
|
| |
|
|
|
|
|
| |
This disables run_tests.sh configurations that use the combination of
32-bit clang and heap profiling.
|
|
|
|
|
|
| |
This regression was introduced by
194d6f9de8ff92841b67f38a2a6a06818e3240dd (Restructure *CFLAGS/*CXXFLAGS
configuration.).
|
|
|
|
|
|
| |
This fixes a regression introduced by
d433471f581ca50583c7a99f9802f7388f81aa36 (Derive
{allocated,nmalloc,ndalloc,nrequests}_large stats.).
|
|
|
|
|
|
|
| |
Remove obsolete unit test scaffolding for extent quantization. Remove
redundant assertions. Add an assertion to
extents_first_best_fit_locked() that should help prevent aligned
allocation regressions.
|
| |
|
|
|
|
|
|
| |
We don't touch witness at all when config_debug == false. Let's only pay the
memory cost in malloc_mutex_s when needed. Note that when !config_debug, we keep
the field in a union so that we don't have to do #ifdefs in multiple places.
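The shape of that trick looks roughly like the following (a generic sketch,
not the actual malloc_mutex_s layout, and MY_DEBUG is an illustrative macro):
in debug builds the witness is a real field; otherwise it overlaps the payload
inside a union, so it adds no size while the field name still compiles on the
never-executed debug paths.

    #include <pthread.h>

    typedef struct {
        const char *name;
        unsigned   rank;
    } witness_t;

    typedef struct {
        union {
            struct {
                pthread_mutex_t lock;
                /* ... other always-present fields ... */
            };
    #ifndef MY_DEBUG
            /* Shares storage with the struct above, so it costs no
             * memory; code naming mutex->witness still compiles, and is
             * never executed when debug checks are compiled out. */
            witness_t witness;
    #endif
        };
    #ifdef MY_DEBUG
        witness_t witness;      /* real, dedicated storage */
    #endif
    } my_mutex_t;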
|
|
|
|
|
|
|
|
|
| |
malloc_conf does not reliably work with MSVC, which complains of
"inconsistent dll linkage", i.e. its inability to support the
application overriding malloc_conf when dynamically linking/loading.
Work around this limitation by adding test harness support for per test
shell script sourcing, and converting all tests to use MALLOC_CONF
instead of malloc_conf.
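The symbol MSVC objects to is the application-defined configuration string;
the environment-variable route sidesteps it (illustrative option string):

    /* In the application (static-linking case): jemalloc picks this up
     * at startup.  Under MSVC with dynamic linking, overriding this
     * exported symbol is what triggers the "inconsistent dll linkage"
     * complaint. */
    const char *malloc_conf = "narenas:1";

    /* The tests instead rely on the environment, e.g. a per-test shell
     * script doing the equivalent of:
     *     MALLOC_CONF="narenas:1" ./test_binary
     * which behaves the same way on every platform. */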
|
|
|
|
|
| |
This complements 94c5d22a4da7844d0bdc5b370e47b1ba14268af2 (Remove mb.h,
which is unused).
|
|
|
|
|
|
| |
This removes an unneeded library dependency when falling back to
intrinsics-based backtracing (or failing to enable heap profiling at
all).
|
|
|
|
|
| |
Remove a call to arena_maybe_purge() that was necessary for ratio-based
purging, but is obsolete in the context of decay-based purging.
|
| |
|
|
|
|
|
|
|
|
| |
Extent splitting and coalescing is a major component of large allocation
overhead, and disabling coalescing of cached extents provides a simple
and effective hysteresis mechanism. Once two-phase purging is
implemented, it will probably make sense to leave coalescing disabled
for the first phase, but coalesce during the second phase.
|
|
|
|
|
| |
Refactor extent_can_coalesce(), extent_coalesce(), and extent_record()
to avoid needlessly repeating extent [de]activation operations.
|
|
|
|
|
| |
Mapped memory increases when extent_alloc_wrapper() succeeds, and
decreases when extent_dalloc_wrapper() is called (during purging).
|
|
|
|
| |
This removes the last use of arena->lock.
|
|
|
|
| |
This mildly reduces stats update overhead during normal operation.
|
|
|
|
| |
This replaces arena->lock synchronization.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This avoids a gcc diagnostic note:
note: The ABI for passing parameters with 64-byte alignment has
changed in GCC 4.6
This note relates to the cacheline alignment of rtree_ctx_t, which was
introduced by 4a346f55939af4f200121cc4454089592d952f18 (Replace rtree
path cache with LRU cache.).
|
|
|
|
|
|
|
| |
Fix extent_alloc_dss() to account for bytes that are not a multiple of
the page size. This regression was introduced by
577d4572b0821a15e5370f9bf566d884b7cf707c (Make dss operations
lockless.), which was first released in 4.3.0.
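The arithmetic in question is just rounding the current break up to a page
boundary and counting those pad bytes as part of the increment (self-contained
sketch; PAGE here is an assumed 4 KiB constant, not jemalloc's PAGE macro):

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE        ((size_t)4096)
    #define PAGE_MASK   (PAGE - 1)

    /* Given the current (possibly unaligned) sbrk break, compute how much
     * to grow it so the returned allocation starts page-aligned. */
    static size_t
    dss_increment(uintptr_t cur_break, size_t size) {
        size_t pad = (PAGE - (cur_break & PAGE_MASK)) & PAGE_MASK;
        /* The bug class being fixed: omitting "pad" under-accounts
         * whenever the break is not already a multiple of the page
         * size. */
        return pad + size;
    }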
|
|
|
|
|
|
|
| |
Fix rtree_subkey() to use uintptr_t rather than unsigned for key
bitmasking. This regression was introduced by
4a346f55939af4f200121cc4454089592d952f18 (Replace rtree path cache with
LRU cache.).
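A minimal illustration of the bug class (generic code, not the actual
rtree_subkey()): building the mask in an "unsigned" means it can never cover a
level that consumes 32 or more key bits, whereas a uintptr_t mask is
pointer-width.

    #include <stdint.h>

    /* Extract "bits" key bits starting at "shift" from a pointer-sized
     * key: (key >> shift) & mask. */

    /* Buggy shape: a 32-bit mask cannot span >= 32 key bits, and
     * 1U << 32 is undefined behavior to begin with. */
    static uintptr_t
    subkey_buggy(uintptr_t key, unsigned shift, unsigned bits) {
        unsigned mask = (1U << bits) - 1;
        return (key >> shift) & mask;
    }

    /* Fixed shape: do the mask computation entirely in uintptr_t. */
    static uintptr_t
    subkey_fixed(uintptr_t key, unsigned shift, unsigned bits) {
        uintptr_t mask = ((uintptr_t)1 << bits) - 1;
        return (key >> shift) & mask;
    }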
|
|
|
|
|
|
| |
This fixes interactions with witness_assert_depth[_to_rank](), which was
added in d0e93ada51e20f4ae394ff4dbdcf96182767c89c (Add
witness_assert_depth[_to_rank]().).
|
| |
|
|
|
|
|
| |
This avoids worst case behavior if e.g. another thread is preempted
while owning the resource the spinning thread is waiting for.
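The usual shape of that change (a generic sketch, not jemalloc's spin
helper): spin for a bounded, growing number of iterations, then start yielding
to the scheduler so a preempted owner can run.

    #include <sched.h>

    typedef struct {
        unsigned iteration;
    } spin_state_t;

    #define SPIN_LIMIT 5    /* after 2^5 busy iterations, start yielding */

    static void
    spin_adaptive_sketch(spin_state_t *s) {
        if (s->iteration < SPIN_LIMIT) {
            volatile unsigned i;
            /* Busy-wait for an exponentially growing number of steps. */
            for (i = 0; i < (1U << s->iteration); i++) {
                /* spin */
            }
            s->iteration++;
        } else {
            /* Give the CPU back so a preempted owner can make progress,
             * avoiding the unbounded-spin worst case. */
            sched_yield();
        }
    }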
|
| |
|
|
|
|
|
| |
NULL can never actually be inserted in practice, and removing support
allows a branch to be removed from the fast path.
|
|
|
|
|
|
|
| |
Rather than dynamically building a table to aid per level computations,
define a constant table at compile time. Omit both high and low
insignificant bits. Use one to three tree levels, depending on the
number of significant bits.
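A schematic of what such a table looks like (hypothetical names and numbers;
it assumes 48-bit virtual addresses and 4 KiB pages, i.e. 48 - 12 = 36
significant key bits split across three levels):

    #include <stdint.h>

    /* Per-level description, fixed at compile time. */
    typedef struct {
        unsigned bits;      /* significant key bits consumed here */
        unsigned cumbits;   /* significant key bits consumed so far */
    } level_desc_t;

    #define LG_VADDR 48

    /* The 16 insignificant high bits and the 12 low (intra-page) bits
     * never index the tree at all. */
    static const level_desc_t levels[3] = {
        {12, 12},   /* level 0: address bits [36, 48) */
        {12, 24},   /* level 1: address bits [24, 36) */
        {12, 36}    /* level 2: address bits [12, 24) */
    };

    /* The subkey for a given level falls straight out of the table. */
    static inline uintptr_t
    level_subkey(uintptr_t key, unsigned level) {
        unsigned shift = LG_VADDR - levels[level].cumbits;
        return (key >> shift) &
            (((uintptr_t)1 << levels[level].bits) - 1);
    }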
|
|
|
|
| |
A subsequent change instead ignores insignificant high bits.
|
| |
|
|
|
|
|
| |
Anything but a hit in the first element of the lookup cache is
expensive enough to negate the benefits of inlining.
|
|
|
|
|
|
|
| |
Rework rtree_ctx_t to encapsulate an rtree leaf LRU lookup cache rather
than a single-path element lookup cache. The replacement is logically
much simpler, as well as slightly faster in the fast path case and less
prone to degraded performance during non-trivial sequences of lookups.
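In outline, the new cache maps the high-order key bits that identify a leaf to
the leaf pointer, checks only the first slot on the fast path, and bubbles
hits toward the front (a generic sketch with hypothetical sizes and names, not
the real rtree_ctx_t):

    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_NSLOTS 8

    typedef struct {
        uintptr_t leafkey;  /* key bits shared by everything in the leaf */
        void *leaf;         /* cached pointer to that leaf */
    } cache_slot_t;

    typedef struct {
        cache_slot_t slots[CACHE_NSLOTS];   /* slot 0 is most recent */
    } lookup_cache_t;

    static void *
    cache_lookup(lookup_cache_t *c, uintptr_t leafkey) {
        /* Fast path: only the first slot is worth checking inline. */
        if (c->slots[0].leafkey == leafkey) {
            return c->slots[0].leaf;
        }
        /* Slower path: scan the rest; on a hit, swap toward the front so
         * repeated lookups of the same leaf become fast-path hits. */
        for (size_t i = 1; i < CACHE_NSLOTS; i++) {
            if (c->slots[i].leafkey == leafkey) {
                cache_slot_t tmp = c->slots[i];
                c->slots[i] = c->slots[i - 1];
                c->slots[i - 1] = tmp;
                return tmp.leaf;
            }
        }
        return NULL;    /* miss: caller walks the tree and fills slot 0 */
    }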
|