path: root/src/arena.c
Commit log, most recent first. Each entry lists the commit message, author, date, number of files changed, and lines removed/added.
* Stop depending on JEMALLOC_N() for function interception during testing. (Jason Evans, 2017-05-12; 1 file, -12/+4)
  Instead, always define function pointers for interceptable functions, but mark them const unless testing, so that the compiler can optimize out the pointer dereferences.
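  A minimal sketch of that pattern, with illustrative names (the macro and function names are not the actual jemalloc identifiers): the function is always reached through a pointer, but the pointer is const outside of test builds, so the compiler can turn the indirect call back into a direct one.

      #include <stdio.h>

      static void real_impl(int x) {
          printf("real_impl(%d)\n", x);
      }

      #ifdef TESTING
      /* Tests may reassign this at runtime to intercept the call. */
      void (*intercept_hook)(int) = real_impl;
      #else
      /* In production builds the pointer is const, so the dereference can be
       * optimized away. */
      void (*const intercept_hook)(int) = real_impl;
      #endif

      int main(void) {
          intercept_hook(42);
          return 0;
      }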
* Add extent_destroy_t and use it during arena destruction. (Jason Evans, 2017-04-29; 1 file, -11/+9)
  Add the extent_destroy_t extent destruction hook to extent_hooks_t, and use it during arena destruction. This hook explicitly communicates to the callee that the extent must be destroyed or tracked for later reuse, lest it be permanently leaked. Prior to this change, retained extents could unintentionally be leaked if extent retention was enabled. This resolves #560.
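  A hedged illustration of what such a hook looks like from the user side of extent_hooks_t; the exact extent_destroy_t signature should be checked against the installed jemalloc headers.

      #include <stdbool.h>
      #include <stddef.h>
      #include <sys/mman.h>

      /* A custom destroy hook: the callee must either release the extent's
       * backing memory or record it for later reuse; doing neither would leak
       * the extent permanently, which is the bug class this hook closes. */
      static void
      my_extent_destroy(void *extent_hooks, void *addr, size_t size,
          bool committed, unsigned arena_ind) {
          (void)extent_hooks; (void)committed; (void)arena_ind;
          munmap(addr, size);
      }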
* Refactor !opt.munmap to opt.retain. (Jason Evans, 2017-04-29; 1 file, -3/+3)
* Replace --disable-munmap with opt.munmap. (Jason Evans, 2017-04-25; 1 file, -2/+2)
  Control use of munmap(2) via a run-time option rather than a compile-time option (with the same per-platform default). The old behavior of --disable-munmap can be achieved with --with-malloc-conf=munmap:false. This partially resolves #580.
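  Besides the build-time default mentioned above, the same setting can be supplied at run time. A small usage sketch (option name as of this commit; a later change renamed it to opt.retain):

      /* Compile this into the application; jemalloc reads the symbol during
       * its initialization.  The MALLOC_CONF environment variable works too. */
      const char *malloc_conf = "munmap:false";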
* Use trylock in arena_decay_impl(). (Qi Wang, 2017-04-24; 1 file, -8/+16)
  If another thread is working on decay, we don't have to wait for the mutex.
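  The shape of that idea, sketched with a plain pthread mutex (jemalloc uses its own malloc_mutex_trylock wrapper rather than pthreads directly):

      #include <pthread.h>
      #include <stdbool.h>

      static pthread_mutex_t decay_mtx = PTHREAD_MUTEX_INITIALIZER;

      static bool
      try_decay(void) {
          if (pthread_mutex_trylock(&decay_mtx) != 0) {
              /* Another thread is already purging; skip rather than block. */
              return false;
          }
          /* ... perform decay-based purging here ... */
          pthread_mutex_unlock(&decay_mtx);
          return true;
      }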
* Header refactoring: size_classes module - remove from the catchall (David Goldblatt, 2017-04-24; 1 file, -0/+1)
* Get rid of most of the various inline macros. (David Goldblatt, 2017-04-24; 1 file, -3/+3)
* Remove --disable-tcache. (Jason Evans, 2017-04-21; 1 file, -32/+24)
  Simplify configuration by removing the --disable-tcache option, but replace the testing for that configuration with --with-malloc-conf=tcache:false. Fix the thread.arena and thread.tcache.flush mallctls to work correctly if tcache is disabled. This partially resolves #580.
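  For reference, a small usage sketch of one of the controls this commit fixes; mallctl returns nonzero if the control is unavailable in the current configuration:

      #include <jemalloc/jemalloc.h>

      /* Flush the calling thread's tcache; reports an error instead of
       * misbehaving when tcache is disabled. */
      static int
      flush_my_tcache(void) {
          return mallctl("thread.tcache.flush", NULL, NULL, NULL, 0);
      }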
* Bypass extent tracking for auto arenas. (Qi Wang, 2017-04-21; 1 file, -11/+22)
  Tracking extents is required by arena_reset. To support this, the extent linkage was used for tracking 1) large allocations, and 2) full slabs. However, modifying the extent linkage could be an expensive operation as it likely incurs cache misses. Since we forbid arena_reset on auto arenas, let's bypass the linkage operations for auto arenas.
* Header refactoring: move assert.h out of the catch-all (David Goldblatt, 2017-04-19; 1 file, -0/+1)
* Header refactoring: move util.h out of the catchall (David Goldblatt, 2017-04-19; 1 file, -0/+2)
* Prefer old/low extent_t structures during reuse. (Jason Evans, 2017-04-17; 1 file, -7/+7)
  Rather than using a LIFO queue to track available extent_t structures, use a red-black tree, and always choose the oldest/lowest available during reuse.
* Switch to fine-grained reentrancy support. (Qi Wang, 2017-04-15; 1 file, -4/+2)
  Previously we had a general detection and support of reentrancy, at the cost of having branches and inc / dec operations on fast paths. To avoid taxing fast paths, we move the reentrancy operations onto tsd slow state, and only modify reentrancy level around external calls (that might trigger reentrancy).
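  The guard itself is simple; a hedged sketch of the idea with illustrative names (the real counter lives in jemalloc's TSD rather than a bare thread-local):

      #include <stdbool.h>

      static _Thread_local int reentrancy_level;

      /* Bumped only around external calls (e.g. extent hooks) that might
       * re-enter the allocator, keeping the common fast paths untouched. */
      static void pre_external_call(void)  { reentrancy_level++; }
      static void post_external_call(void) { reentrancy_level--; }

      static bool is_reentrant(void) {
          return reentrancy_level > 0;
      }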
* Pass alloc_ctx down profiling path. (Qi Wang, 2017-04-12; 1 file, -2/+9)
  With this change, when profiling is enabled, we avoid doing redundant rtree lookups. Also changed dalloc_ctx_t to alloc_ctx_t, as it's now used on the allocation path as well (to speed up profiling).
* Pass dalloc_ctx down the sdalloc path. (Qi Wang, 2017-04-12; 1 file, -1/+1)
  This avoids redundant rtree lookups.
* Header refactoring: Split up jemalloc_internal.h (David Goldblatt, 2017-04-11; 1 file, -1/+2)
  This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now:
  - The source of system-wide definitions.
  - The catch-all include file.
  - The module header file for jemalloc.c
  This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs|inlines|types].h, just as they are for the other modules.
* Move reentrancy_level to the beginning of TSD. (Qi Wang, 2017-04-07; 1 file, -1/+1)
* Add basic reentrancy-checking support, and allow arena_new to reenter. (David Goldblatt, 2017-04-07; 1 file, -0/+13)
  This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.
* Optimizing TSD and thread cache layout. (Qi Wang, 2017-04-07; 1 file, -5/+11)
  1) Re-organize TSD so that frequently accessed fields are closer to the beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now contain everything needed on the tcache fast path, except the tcache struct itself.
  2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce tbin to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and place tbins_small close to the beginning.
* Transition arena struct fields to C11 atomics (David Goldblatt, 2017-04-05; 1 file, -27/+29)
* Convert prng module to use C11-style atomics (David Goldblatt, 2017-04-04; 1 file, -2/+2)
* Move arena_slab_data_t's nfree into extent_t's e_bits. (Jason Evans, 2017-03-28; 1 file, -19/+18)
  Compact extent_t to 128 bytes on 64-bit systems by moving arena_slab_data_t's nfree into extent_t's e_bits. Cacheline-align extent_t structures so that they always cross the minimum number of cacheline boundaries. Re-order extent_t fields such that all fields except the slab bitmap (and overlaid heap profiling context pointer) are in the first cacheline. This resolves #461.
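  A rough sketch of the layout idea, assuming 64-byte cachelines and a GCC/Clang alignment attribute; the field names are abbreviated and the real extent_t definition lives in jemalloc's headers:

      #include <stddef.h>
      #include <stdint.h>

      #define CACHELINE 64

      typedef struct extent_s {
          /* Packed bit field: szind, nfree, state, and other small fields
           * share one word instead of occupying separate members. */
          uint64_t e_bits;
          void *e_addr;
          size_t e_size;
          /* ... remaining hot fields, kept within the first cacheline ... */
          /* The slab bitmap / profiling context overlay follows in the
           * second cacheline. */
      } __attribute__((aligned(CACHELINE))) extent_t;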
* Implement bitmap_ffu(), which finds the first unset bit. (Jason Evans, 2017-03-25; 1 file, -1/+1)
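  The single-word version of that operation, as a sketch (the real bitmap_ffu() walks jemalloc's multi-level bitmaps; __builtin_ctzll assumes GCC or Clang):

      #include <stdint.h>

      /* Return the index of the first unset bit at or above min_bit, or 64 if
       * every bit from min_bit upward is set.  Assumes min_bit < 64. */
      static unsigned
      ffu64(uint64_t bits, unsigned min_bit) {
          uint64_t below = (min_bit == 0) ? 0 : ((UINT64_C(1) << min_bit) - 1);
          uint64_t unset = ~(bits | below);
          return (unset == 0) ? 64 : (unsigned)__builtin_ctzll(unset);
      }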
* Profile per arena base mutex, instead of just a0. (Qi Wang, 2017-03-23; 1 file, -0/+2)
* Refactor mutex profiling code with x-macros. (Qi Wang, 2017-03-23; 1 file, -10/+14)
* Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy. (Qi Wang, 2017-03-23; 1 file, -4/+7)
* Added "stats.mutexes.reset" mallctl to reset all mutex stats. (Qi Wang, 2017-03-23; 1 file, -3/+3)
  Also switched from the term "lock" to "mutex".
* Added lock profiling and output for global locks (ctl, prof and base). (Qi Wang, 2017-03-23; 1 file, -3/+3)
* Add arena lock stats output. (Qi Wang, 2017-03-23; 1 file, -0/+18)
* Output bin lock profiling results to malloc_stats. (Qi Wang, 2017-03-23; 1 file, -0/+1)
  Two counters are included for the small bins: lock contention rate, and max lock waiting time.
* Push down iealloc() calls. (Jason Evans, 2017-03-23; 1 file, -33/+31)
  Call iealloc() as deep into call chains as possible without causing redundant calls.
* Remove extent dereferences from the deallocation fast paths. (Jason Evans, 2017-03-23; 1 file, -1/+1)
* Remove extent arg from isalloc() and arena_salloc(). (Jason Evans, 2017-03-23; 1 file, -4/+4)
* Incorporate szind/slab into rtree leaves. (Jason Evans, 2017-03-23; 1 file, -1/+10)
  Expand and restructure the rtree API such that all common operations can be achieved with minimal work, regardless of whether the rtree leaf fields are independent versus packed into a single atomic pointer.
* Remove binind field from arena_slab_data_t. (Jason Evans, 2017-03-23; 1 file, -5/+5)
  binind is now redundant; the containing extent_t's szind field always provides the same value.
* Convert extent_t's usize to szind. (Jason Evans, 2017-03-23; 1 file, -16/+18)
  Rather than storing usize only for large (and prof-promoted) allocations, store the size class index for allocations that reside within the extent, such that the size class index is valid for all extents that contain extant allocations, and invalid otherwise (mainly to make debugging simpler).
* Implement two-phase decay-based purging. (Jason Evans, 2017-03-15; 1 file, -120/+260)
  Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether. Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary. The mallctl interfaces change as follows:
  - opt.decay_time --> opt.{dirty,muzzy}_decay_time
  - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
  - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
  - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
  - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
  This resolves #521.
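  A usage sketch of reading the renamed options (names as of this commit; a later release renamed *_decay_time to *_decay_ms, so verify against the installed version):

      #include <jemalloc/jemalloc.h>
      #include <stdio.h>
      #include <sys/types.h>

      static void
      print_decay_times(void) {
          ssize_t dirty, muzzy;
          size_t sz = sizeof(ssize_t);
          if (mallctl("opt.dirty_decay_time", &dirty, &sz, NULL, 0) == 0 &&
              mallctl("opt.muzzy_decay_time", &muzzy, &sz, NULL, 0) == 0) {
              printf("dirty decay: %zd s, muzzy decay: %zd s\n", dirty, muzzy);
          }
      }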
* Move arena_t's purging field into arena_decay_t. (Jason Evans, 2017-03-15; 1 file, -5/+4)
* Refactor decay-related function parametrization. (Jason Evans, 2017-03-15; 1 file, -86/+96)
  Refactor most of the decay-related functions to take as parameters the decay_t and associated extents_t structures to operate on. This prepares for supporting both lazy and forced purging on different decay schedules.
* Convert remaining arena_stats_t fields to atomics (David Goldblatt, 2017-03-14; 1 file, -23/+33)
  These were all size_ts, so we have atomics support for them on all platforms, so the conversion is straightforward. Left non-atomic is curlextents, which AFAICT is not used atomically anywhere.
* Switch atomic uint64_ts in arena_stats_t to C11 atomics (David Goldblatt, 2017-03-14; 1 file, -21/+39)
  I expect this to be the trickiest conversion we will see, since we want atomics on 64-bit platforms, but are also always able to piggyback on some sort of external synchronization on non-64 bit platforms.
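  A condensed sketch of that strategy (jemalloc wraps it in its own atomic_u64 API; the 64-bit platform detection here is deliberately simplified):

      #include <stdint.h>

      #if defined(__LP64__) || defined(_WIN64)
      #include <stdatomic.h>
      typedef _Atomic uint64_t stat_u64_t;
      static inline void
      stat_add_u64(stat_u64_t *p, uint64_t x) {
          atomic_fetch_add_explicit(p, x, memory_order_relaxed);
      }
      #else
      /* On 32-bit platforms the counter stays a plain uint64_t, protected by
       * the arena stats mutex that callers already hold. */
      typedef uint64_t stat_u64_t;
      static inline void
      stat_add_u64(stat_u64_t *p, uint64_t x) {
          *p += x;
      }
      #endif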
* Convert arena_t's purging field to non-atomic bool. (Jason Evans, 2017-03-10; 1 file, -4/+5)
  The decay mutex already protects all accesses.
* Implement per-CPU arena. (Qi Wang, 2017-03-09; 1 file, -0/+10)
  The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper threading is enabled is added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread accessed most recently. When a new thread comes in, we will read CPU id and update arena if necessary.
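  The core mapping is roughly the following sketch (sched_getcpu() is glibc-specific, the phycpu halving assumes a simple hyperthread-pair topology, and the real code caches the result rather than reading the CPU id on every allocation):

      #define _GNU_SOURCE
      #include <sched.h>
      #include <stdbool.h>

      static unsigned
      percpu_arena_index(unsigned narenas, bool phycpu) {
          int cpu = sched_getcpu();
          if (cpu < 0) {
              return 0; /* Fall back to arena 0 if the CPU id is unavailable. */
          }
          /* "percpu": one arena per CPU; "phycpu": hyperthread siblings share. */
          unsigned ind = phycpu ? (unsigned)cpu / 2 : (unsigned)cpu;
          return ind % narenas;
      }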
* Fix arena_prefork lock rank order for witness. (Qi Wang, 2017-03-09; 1 file, -6/+16)
  When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.
* Perform delayed coalescing prior to purging. (Jason Evans, 2017-03-07; 1 file, -8/+20)
  Rather than purging uncoalesced extents, perform just enough incremental coalescing to purge only fully coalesced extents. In the absence of cached extent reuse, the immediate versus delayed incremental purging algorithms result in the same purge order. This resolves #655.
* Change arena to use the atomic functions for ssize_t instead of the union strategy (David Goldblatt, 2017-03-07; 1 file, -6/+2)
* Optimize malloc_large_stats_t maintenance. (Jason Evans, 2017-03-04; 1 file, -29/+6)
  Convert the nrequests field to be partially derived, and the curlextents to be fully derived, in order to reduce the number of stats updates needed during common operations. This change affects ndalloc stats during arena reset, because it is no longer possible to cancel out ndalloc effects (curlextents would become negative).
* Immediately purge cached extents if decay_time is 0. (Jason Evans, 2017-03-03; 1 file, -37/+32)
  This fixes a regression caused by 54269dc0ed3e4d04b2539016431de3cfe8330719 (Remove obsolete arena_maybe_purge() call.), as well as providing a general fix. This resolves #665.
* Convert arena_decay_t's time to be atomically synchronized. (Jason Evans, 2017-03-03; 1 file, -13/+22)
* Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression. (Jason Evans, 2017-02-27; 1 file, -14/+2)
  This fixes a regression introduced by d433471f581ca50583c7a99f9802f7388f81aa36 (Derive {allocated,nmalloc,ndalloc,nrequests}_large stats.).