Each entry below: commit message (author, date, files changed, lines removed/added).
* Perform delayed coalescing prior to purging. (Jason Evans, 2017-03-07, 6 files, -50/+152)
  Rather than purging uncoalesced extents, perform just enough incremental coalescing to purge only fully coalesced extents. In the absence of cached extent reuse, the immediate versus delayed incremental purging algorithms result in the same purge order. This resolves #655.
* Fix flakiness in test_decay_ticker. (Jason Evans, 2017-03-07, 1 file, -106/+148)
  Fix the test_decay_ticker test to carefully control slab creation/destruction such that the decay backlog reliably reaches zero. Use an isolated arena so that no extraneous allocation can confuse the situation. Speed up time during the latter part of the test so that the entire decay time can expire in a reasonable amount of wall time.
* Change arena to use the atomic functions for ssize_t instead of the union strategy (David Goldblatt, 2017-03-07, 2 files, -12/+3)
* Add atomic types for ssize_t (David Goldblatt, 2017-03-07, 2 files, -0/+11)
* Make type abbreviations consistent: ssize_t is zd everywhere (David Goldblatt, 2017-03-07, 2 files, -6/+6)
* Insert not_reached after an exhaustive switch (David Goldblatt, 2017-03-06, 1 file, -2/+4)
  In the C11 atomics backport, we couldn't use not_reached() in atomic_enum_to_builtin (in atomic_gcc_atomic.h), since atomic.h was hermetic and assert.h wasn't; there was a dependency issue. assert.h is hermetic now, so we can include it.
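  A minimal sketch of the pattern, with stand-in names (the enum and the not_reached() body here are illustrative, not jemalloc's actual definitions):

      #include <stdlib.h>

      /* Stand-in for jemalloc's not_reached(); the real macro also
       * reports the file and line before aborting. */
      #define not_reached() abort()

      typedef enum { order_relaxed, order_acquire, order_release } order_t;

      static int
      order_to_builtin(order_t order) {
          switch (order) {
          case order_relaxed: return 0;
          case order_acquire: return 1;
          case order_release: return 2;
          }
          /* The switch above is exhaustive, but the compiler cannot
           * always prove it, and a corrupted enum value would otherwise
           * fall off the end of the function; not_reached() documents
           * and enforces the invariant. */
          not_reached();
          return 0;
      }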
* Disentangle assert and util (David Goldblatt, 2017-03-06, 12 files, -242/+266)
  This is the first header refactoring diff, #533. It splits the assert and util components into separate, hermetic, header files. In the process, it splits out two of the large sub-components of util (the stdio.h replacement, and bit manipulation routines) into their own components (malloc_io.h and bit_util.h). This is mostly to break up cyclic dependencies, but it also breaks off a good chunk of the catch-all-ness of util, which is nice.
* Optimize malloc_large_stats_t maintenance. (Jason Evans, 2017-03-04, 2 files, -31/+8)
  Convert the nrequests field to be partially derived, and the curlextents to be fully derived, in order to reduce the number of stats updates needed during common operations. This change affects ndalloc stats during arena reset, because it is no longer possible to cancel out ndalloc effects (curlextents would become negative).
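  A sketch of the fully derived case, with simplified field names (curlextents_read() is a hypothetical accessor, not the real stats code):

      #include <stdint.h>
      #include <stddef.h>

      typedef struct {
          uint64_t nmalloc;   /* total large allocations, monotonic */
          uint64_t ndalloc;   /* total large deallocations, monotonic */
      } large_stats_t;

      /* curlextents is no longer stored; it is computed on demand, so
       * the common alloc/dalloc paths update one counter instead of
       * two. This is also why arena reset must bump ndalloc rather
       * than cancel it out: the difference would otherwise go
       * negative. */
      static size_t
      curlextents_read(const large_stats_t *stats) {
          return (size_t)(stats->nmalloc - stats->ndalloc);
      }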
* Introduce a backport of C11 atomics (David Goldblatt, 2017-03-03, 15 files, -680/+955)
  This introduces a backport of C11 atomics. It has four implementations; ranked in order of preference, they are:
  - GCC/Clang __atomic builtins
  - GCC/Clang __sync builtins
  - MSVC _Interlocked builtins
  - C11 atomics, from <stdatomic.h>
  The primary advantages are:
  - Close adherence to the standard API gives us a defined memory model.
  - Type safety: atomic objects are now separate types from non-atomic ones, so that it's impossible to mix up atomic and non-atomic updates (which is undefined behavior that compilers are starting to take advantage of).
  - Efficiency: we can specify ordering for operations, avoiding fences and atomic operations on strongly ordered architectures (example: `atomic_write_u32(ptr, val);` involves a CAS loop, whereas `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
  This diff leaves the current atomics API in place, implementing it in terms of the backport, which lets us transition uses over piecemeal.
  Testing: This is by nature hard to test. I've manually tested the first three options on Linux with gcc by futzing with the #defines manually, on FreeBSD with gcc and clang, on MSVC, and on OS X with clang. All of these were x86 machines, though, and we don't have any test infrastructure set up for non-x86 platforms.
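  A sketch of the highest-preference flavor, built on the __atomic builtins. jemalloc generates wrappers like these per type via macros; apart from the store/ordering shapes quoted above, the exact names here are assumptions:

      #include <stdint.h>

      typedef enum {
          atomic_memory_order_relaxed,
          atomic_memory_order_acquire,
          atomic_memory_order_release
      } atomic_memory_order_t;

      /* A distinct struct type makes it a compile error to mix atomic
       * and plain accesses to the same object. */
      typedef struct { uint32_t repr; } atomic_u32_t;

      static inline void
      atomic_store_u32(atomic_u32_t *a, uint32_t val,
          atomic_memory_order_t mo) {
          __atomic_store_n(&a->repr, val,
              mo == atomic_memory_order_release ? __ATOMIC_RELEASE
              : __ATOMIC_RELAXED);
      }

      static inline uint32_t
      atomic_load_u32(atomic_u32_t *a, atomic_memory_order_t mo) {
          return __atomic_load_n(&a->repr,
              mo == atomic_memory_order_acquire ? __ATOMIC_ACQUIRE
              : __ATOMIC_RELAXED);
      }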
* Stop #define-ing away 'inline' (David Goldblatt, 2017-03-03, 1 file, -1/+0)
  In the long term, we'll transition to C99-style inline semantics. In the short term, this will allow both styles to coexist without breaking one another.
* Immediately purge cached extents if decay_time is 0. (Jason Evans, 2017-03-03, 5 files, -44/+138)
  This fixes a regression caused by 54269dc0ed3e4d04b2539016431de3cfe8330719 (Remove obsolete arena_maybe_purge() call.), as well as providing a general fix. This resolves #665.
* Convert arena_decay_t's time to be atomically synchronized. (Jason Evans, 2017-03-03, 4 files, -17/+33)
* Fix typos. (Jason Evans, 2017-03-01, 1 file, -2/+2)
* Small style fix in ctl.c (Qi Wang, 2017-03-01, 1 file, -2/+1)
* fix typo sytem -> system (charsyam, 2017-03-01, 1 file, -1/+1)
* Add casts to CONF_HANDLE_T_U(). (Jason Evans, 2017-03-01, 1 file, -4/+4)
  This avoids signed/unsigned comparison warnings when specifying integer constants as inputs.
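  A reduced illustration of the warning class (this macro is hypothetical, not the actual CONF_HANDLE_T_U()):

      /* um holds the parsed option value as an unsigned type; min and
       * max arrive as integer constants. The (t) casts give both sides
       * of each comparison the same signedness and width, which is
       * what silences -Wsign-compare. */
      #define CONF_CLAMP(t, um, min, max)                 \
          (((t)(um) < (t)(min)) ? (t)(min) :              \
           ((t)(um) > (t)(max)) ? (t)(max) : (t)(um))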
* Update ChangeLog for 4.5.0. (Jason Evans, 2017-02-28, 1 file, -0/+35)
* Dodge 32-bit-clang-specific backtracing failure. (Jason Evans, 2017-02-28, 1 file, -0/+4)
  This disables run_tests.sh configurations that use the combination of 32-bit clang and heap profiling.
* Put -D_REENTRANT in CPPFLAGS rather than CFLAGS. (Jason Evans, 2017-02-28, 1 file, -1/+1)
  This regression was introduced by 194d6f9de8ff92841b67f38a2a6a06818e3240dd (Restructure *CFLAGS/*CXXFLAGS configuration.).
* Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression. (Jason Evans, 2017-02-27, 2 files, -15/+3)
  This fixes a regression introduced by d433471f581ca50583c7a99f9802f7388f81aa36 (Derive {allocated,nmalloc,ndalloc,nrequests}_large stats.).
* Tidy up extent quantization. (Jason Evans, 2017-02-27, 2 files, -25/+5)
  Remove obsolete unit test scaffolding for extent quantization. Remove redundant assertions. Add an assertion to extents_first_best_fit_locked() that should help prevent aligned allocation regressions.
* Update a comment. (Jason Evans, 2017-02-26, 1 file, -4/+4)
* Get rid of witness in malloc_mutex_t when !(configured w/ debug). (Qi Wang, 2017-02-24, 3 files, -14/+34)
  We don't touch witness at all when config_debug == false. Let's only pay the memory cost in malloc_mutex_s when needed. Note that when !config_debug, we keep the field in a union so that we don't have to do #ifdefs in multiple places.
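  A sketch of the union trick (witness_t's body and the surrounding field set are simplified):

      #include <pthread.h>

      typedef struct { const char *name; unsigned rank; } witness_t;

      typedef struct malloc_mutex_s {
          union {
              struct {
                  pthread_mutex_t lock;
                  /* ...fields used in all configurations... */
              };
      #ifndef JEMALLOC_DEBUG
              /* !debug: witness overlaps the fields above, so it adds
               * no memory, yet code naming mutex->witness still
               * compiles without #ifdefs. This is safe only because
               * witness is never touched when config_debug is false. */
              witness_t witness;
      #endif
          };
      #ifdef JEMALLOC_DEBUG
          witness_t witness;  /* debug: a real, disjoint field */
      #endif
      } malloc_mutex_t;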
* Use MALLOC_CONF rather than malloc_conf for tests. (Jason Evans, 2017-02-23, 35 files, -80/+119)
  malloc_conf does not reliably work with MSVC, which complains of "inconsistent dll linkage", i.e. its inability to support the application overriding malloc_conf when dynamically linking/loading. Work around this limitation by adding test harness support for per-test shell script sourcing, and converting all tests to use MALLOC_CONF instead of malloc_conf.
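  For reference, the two configuration channels involved (the option string is just an example):

      /* In-binary override: a definition like this in the application
       * is what MSVC cannot reliably resolve across DLL boundaries. */
      const char *malloc_conf = "narenas:1";

      /* Environment override, which the test harness now uses instead:
       *
       *     MALLOC_CONF="narenas:1" ./run-one-test
       */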
* Remove remainder of mb (memory barrier). (Jason Evans, 2017-02-22, 3 files, -4/+0)
  This complements 94c5d22a4da7844d0bdc5b370e47b1ba14268af2 (Remove mb.h, which is unused).
* Avoid -lgcc for heap profiling if unwind.h is missing. (Jason Evans, 2017-02-21, 1 file, -1/+3)
  This removes an unneeded library dependency when falling back to intrinsics-based backtracing (or failing to enable heap profiling at all).
* Remove obsolete arena_maybe_purge() call. (Jason Evans, 2017-02-21, 1 file, -4/+0)
  Remove a call to arena_maybe_purge() that was necessary for ratio-based purging, but is obsolete in the context of decay-based purging.
* Move arena_basic_stats_merge() prototype (hygienic cleanup). (Jason Evans, 2017-02-21, 1 file, -3/+3)
* Disable coalescing of cached extents. (Jason Evans, 2017-02-17, 4 files, -24/+43)
  Extent splitting and coalescing is a major component of large allocation overhead, and disabling coalescing of cached extents provides a simple and effective hysteresis mechanism. Once two-phase purging is implemented, it will probably make sense to leave coalescing disabled for the first phase, but coalesce during the second phase.
* Optimize extent coalescing. (Jason Evans, 2017-02-17, 1 file, -20/+23)
  Refactor extent_can_coalesce(), extent_coalesce(), and extent_record() to avoid needlessly repeating extent [de]activation operations.
* Fix arena->stats.mapped accounting. (Jason Evans, 2017-02-16, 4 files, -26/+61)
  Mapped memory increases when extent_alloc_wrapper() succeeds, and decreases when extent_dalloc_wrapper() is called (during purging).
* Synchronize arena->decay with arena->decay.mtx. (Jason Evans, 2017-02-16, 4 files, -33/+35)
  This removes the last use of arena->lock.
* Derive {allocated,nmalloc,ndalloc,nrequests}_large stats. (Jason Evans, 2017-02-16, 2 files, -26/+27)
  This mildly reduces stats update overhead during normal operation.
* Synchronize arena->tcache_ql with arena->tcache_ql_mtx. (Jason Evans, 2017-02-16, 5 files, -22/+32)
  This replaces arena->lock synchronization.
* Convert arena->stats synchronization to atomics. (Jason Evans, 2017-02-16, 9 files, -228/+326)
* Convert arena->prof_accumbytes synchronization to atomics. (Jason Evans, 2017-02-16, 15 files, -59/+128)
* Convert arena->dss_prec synchronization to atomics. (Jason Evans, 2017-02-16, 4 files, -17/+10)
* Do not generate unused tsd_*_[gs]et() functions. (Jason Evans, 2017-02-13, 4 files, -33/+31)
  This avoids a gcc diagnostic note:
      note: The ABI for passing parameters with 64-byte alignment has changed in GCC 4.6
  This note relates to the cacheline alignment of rtree_ctx_t, which was introduced by 4a346f55939af4f200121cc4454089592d952f18 (Replace rtree path cache with LRU cache.).
* Fix extent_alloc_dss() regression. (Jason Evans, 2017-02-10, 1 file, -19/+29)
  Fix extent_alloc_dss() to account for bytes that are not a multiple of the page size. This regression was introduced by 577d4572b0821a15e5370f9bf566d884b7cf707c (Make dss operations lockless.), which was first released in 4.3.0.
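  The gist of the fix, sketched with an assumed page size (jemalloc derives PAGE at configure time):

      #include <stdint.h>
      #include <stddef.h>

      #define PAGE ((uintptr_t)4096)                    /* assumption */
      #define PAGE_CEILING(a) (((a) + PAGE - 1) & ~(PAGE - 1))

      /* The dss break (sbrk(0)) need not be page-aligned. An extent
       * carved from the dss must begin at the next page boundary, so
       * the gap up to that boundary has to be accounted for rather
       * than ignored. */
      static size_t
      dss_gap_size(uintptr_t dss_max) {
          return (size_t)(PAGE_CEILING(dss_max) - dss_max);
      }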
* Fix rtree_subkey() regression. (Jason Evans, 2017-02-10, 1 file, -1/+1)
  Fix rtree_subkey() to use uintptr_t rather than unsigned for key bitmasking. This regression was introduced by 4a346f55939af4f200121cc4454089592d952f18 (Replace rtree path cache with LRU cache.).
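  An illustrative version of the corrected extraction (the level geometry parameters are assumptions):

      #include <stdint.h>

      /* Extract the bits-wide subkey for one tree level from a
       * pointer-sized key. Per the fix, the mask is computed in
       * uintptr_t; computing it in unsigned means 32-bit arithmetic on
       * LP64 platforms. */
      static inline uintptr_t
      rtree_subkey(uintptr_t key, unsigned shift, unsigned bits) {
          uintptr_t mask = (((uintptr_t)1) << bits) - 1;
          return (key >> shift) & mask;
      }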
* Enable mutex witnesses even when !isthreaded. (Jason Evans, 2017-02-10, 1 file, -9/+5)
  This fixes interactions with witness_assert_depth[_to_rank](), which was added in d0e93ada51e20f4ae394ff4dbdcf96182767c89c (Add witness_assert_depth[_to_rank]().).
* Spin adaptively in rtree_elm_acquire(). (Jason Evans, 2017-02-09, 1 file, -10/+11)
* Enhance spin_adaptive() to yield after several iterations. (Jason Evans, 2017-02-09, 3 files, -6/+28)
  This avoids worst case behavior if e.g. another thread is preempted while owning the resource the spinning thread is waiting for.
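  A sketch of the enhanced loop. The iteration threshold, backoff shape, and CPU_SPINWAIT definition are assumptions (CPU_SPINWAIT stands in for the architecture's pause hint):

      #include <sched.h>
      #include <stdint.h>

      #define CPU_SPINWAIT __asm__ volatile("pause")  /* x86 assumption */

      typedef struct { unsigned iteration; } spin_t;
      #define SPIN_INITIALIZER { 0U }  /* cf. the next commit below */

      static inline void
      spin_adaptive(spin_t *spin) {
          if (spin->iteration < 5) {
              /* Back off exponentially: 1, 2, 4, 8, 16 pauses. */
              volatile uint32_t i;
              for (i = 0; i < (1U << spin->iteration); i++) {
                  CPU_SPINWAIT;
              }
              spin->iteration++;
          } else {
              /* Stop burning cycles; let a preempted owner run. */
              sched_yield();
          }
      }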
* Replace spin_init() with SPIN_INITIALIZER. (Jason Evans, 2017-02-09, 5 files, -12/+4)
* Remove rtree support for 0 (NULL) keys. (Jason Evans, 2017-02-09, 3 files, -45/+43)
  NULL can never actually be inserted in practice, and removing support allows a branch to be removed from the fast path.
* Determine rtree levels at compile time. (Jason Evans, 2017-02-09, 9 files, -272/+248)
  Rather than dynamically building a table to aid per level computations, define a constant table at compile time. Omit both high and low insignificant bits. Use one to three tree levels, depending on the number of significant bits.
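  A sketch of such a constant table, assuming a 64-bit platform with 48 significant VA bits and 12 ignored low (intra-page) bits:

      /* 48 - 12 = 36 key bits, split evenly across three levels. */
      #define RTREE_HEIGHT 3

      static const struct rtree_level_s {
          unsigned bits;     /* key bits consumed at this level */
          unsigned cumbits;  /* cumulative bits consumed so far */
      } rtree_levels[RTREE_HEIGHT] = {
          {12, 12},
          {12, 24},
          {12, 36},
      };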
* Remove rtree leading 0 bit optimization. (Jason Evans, 2017-02-09, 2 files, -84/+16)
  A subsequent change instead ignores insignificant high bits.
* Make non-essential inline rtree functions static functions. (Jason Evans, 2017-02-09, 4 files, -119/+85)
* Split rtree_elm_lookup_hard() out of rtree_elm_lookup(). (Jason Evans, 2017-02-09, 4 files, -101/+111)
  Anything but a hit in the first element of the lookup cache is expensive enough to negate the benefits of inlining.
* Replace rtree path cache with LRU cache. (Jason Evans, 2017-02-09, 4 files, -124/+108)
  Rework rtree_ctx_t to encapsulate an rtree leaf LRU lookup cache rather than a single-path element lookup cache. The replacement is logically much simpler, as well as slightly faster in the fast path case and less prone to degraded performance during non-trivial sequences of lookups.
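  Putting this and the split-out hard path above together, the fast path might look like the following (the cache geometry and key math are assumptions):

      #include <stdint.h>
      #include <stddef.h>

      typedef struct extent_s extent_t;

      typedef struct {
          uintptr_t leafkey;  /* identifies the cached leaf */
          extent_t **leaf;    /* direct pointer into that leaf */
      } rtree_ctx_cache_elm_t;

      typedef struct {
          rtree_ctx_cache_elm_t cache[8];  /* LRU order, MRU first */
      } rtree_ctx_t;

      /* Full tree walk plus LRU maintenance; deliberately not inlined
       * (see the split commit above). */
      extent_t *rtree_elm_lookup_hard(rtree_ctx_t *ctx, uintptr_t key);

      static inline extent_t *
      rtree_elm_lookup(rtree_ctx_t *ctx, uintptr_t key) {
          /* Assume each leaf covers 21 key bits: 9 subkey bits plus
           * 12 intra-page bits. */
          uintptr_t leafkey = key & ~((uintptr_t)((1U << 21) - 1));
          if (ctx->cache[0].leafkey == leafkey) {
              /* MRU hit: one compare and one index. */
              size_t subkey = (size_t)((key >> 12) & ((1U << 9) - 1));
              return ctx->cache[0].leaf[subkey];
          }
          return rtree_elm_lookup_hard(ctx, key);
      }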