* Make prof's cum_gctx a C11-style atomic (David Goldblatt, 2017-04-05; 1 file, -2/+2)
* Make the mutex n_waiting_thds field a C11-style atomic (David Goldblatt, 2017-04-05; 4 files, -7/+19)
* Convert extent module to use C11-style atomics (David Goldblatt, 2017-04-05; 1 file, -8/+10)
* Convert accumbytes in prof_accum_t to C11 atomics, when possible (David Goldblatt, 2017-04-05; 3 files, -6/+12)
* Make extent_dss use C11-style atomics (David Goldblatt, 2017-04-05; 1 file, -15/+21)
* Make base_t's extent_hooks field C11-atomic (David Goldblatt, 2017-04-05; 2 files, -15/+9)
* Transition arena struct fields to C11 atomics (David Goldblatt, 2017-04-05; 3 files, -39/+48)
* Move arena-tracking atomics in jemalloc.c to C11-style (David Goldblatt, 2017-04-05; 3 files, -11/+12)
* Transition e_prof_tctx in struct extent to C11 atomics (David Goldblatt, 2017-04-04; 2 files, -8/+8)
* Convert prng module to use C11-style atomics (David Goldblatt, 2017-04-04; 4 files, -43/+46)
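  The ten commits above migrate module-level counters and pointers to C11-style
  atomics. As a point of reference, a minimal sketch of that style using
  <stdatomic.h>; the counter type and the function names are illustrative, not
  jemalloc's actual wrappers.

      #include <stdatomic.h>
      #include <stddef.h>

      /* Hypothetical statistics counter, written in the C11 style. */
      typedef struct {
          atomic_size_t total_bytes;
      } counter_t;

      static inline void
      counter_add(counter_t *c, size_t n) {
          /* Relaxed ordering: the counter is a statistic, not a
           * synchronization point. */
          atomic_fetch_add_explicit(&c->total_bytes, n, memory_order_relaxed);
      }

      static inline size_t
      counter_read(counter_t *c) {
          return atomic_load_explicit(&c->total_bytes, memory_order_relaxed);
      }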
* Make the tsd member init functions take tsd_t * type. (Qi Wang, 2017-04-04; 3 files, -3/+9)
* Do proper cleanup for tsd_state_reincarnated. (Qi Wang, 2017-04-04; 4 files, -18/+50)
  Also enable arena_bind under non-nominal state, as the cleanup will be handled correctly now.
* Remove the leafkey NULL check in leaf_elm_lookup. (Qi Wang, 2017-04-04; 2 files, -14/+10)
* Add init function support to tsd members. (Qi Wang, 2017-04-04; 9 files, -33/+74)
  This will facilitate embedding tcache into tsd, which requires initialization that cannot be done via the static initializer. Make tsd->rtree_ctx be initialized via rtree_ctx_data_init().
* issue-586: detect main executable even if PIE is active (Aliaksey Kandratsenka, 2017-04-04; 1 file, -1/+12)
  The previous logic for detecting main program addresses assumed that the main executable sits at the lowest addresses. With PIE (active by default on Ubuntu) that no longer holds. To deal with this, attempt to find the main executable's mapping in /proc/[pid]/maps; the old logic is preserved as a fallback.
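  A hedged sketch of the /proc maps approach described above: scan
  /proc/self/maps for mappings whose backing path is the executable itself and
  take the lowest start address. The function name and buffer sizes are
  illustrative, not the patch's actual code.

      #include <inttypes.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /*
       * Return the lowest start address of any mapping backed by exe_path
       * (e.g. the target of readlink("/proc/self/exe")), or 0 if none is found.
       */
      static uintptr_t
      find_main_mapping_start(const char *exe_path) {
          FILE *f = fopen("/proc/self/maps", "r");
          if (f == NULL) {
              return 0;
          }
          char line[1024];
          uintptr_t lowest = 0;
          while (fgets(line, sizeof(line), f) != NULL) {
              uintptr_t start, end;
              char path[512];
              /* Line format: start-end perms offset dev inode path */
              if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %*s %*s %*s %*s %511s",
                  &start, &end, path) != 3 || end <= start) {
                  continue;
              }
              if (strcmp(path, exe_path) == 0 && (lowest == 0 || start < lowest)) {
                  lowest = start;
              }
          }
          fclose(f);
          return lowest;
      }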
* Look up each extent only once during tcache_flush_small / _large. (Qi Wang, 2017-03-28; 1 file, -14/+28)
  Cache the extents on the stack to avoid redundant lookup overhead.
* Move arena_slab_data_t's nfree into extent_t's e_bits. (Jason Evans, 2017-03-28; 6 files, -43/+86)
  Compact extent_t to 128 bytes on 64-bit systems by moving arena_slab_data_t's nfree into extent_t's e_bits. Cacheline-align extent_t structures so that they always cross the minimum number of cacheline boundaries. Re-order extent_t fields such that all fields except the slab bitmap (and overlaid heap profiling context pointer) are in the first cacheline. This resolves #461.
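  Folding a small counter into an existing bits word is plain shift-and-mask
  encoding. A generic sketch with hypothetical shift/width constants, not the
  actual e_bits layout:

      #include <stdint.h>

      /* Hypothetical layout: the low 12 bits of the word hold nfree. */
      #define E_BITS_NFREE_SHIFT  0
      #define E_BITS_NFREE_WIDTH  12
      #define E_BITS_NFREE_MASK   (((uint64_t)1 << E_BITS_NFREE_WIDTH) - 1)

      static inline unsigned
      e_bits_nfree_get(uint64_t e_bits) {
          return (unsigned)((e_bits >> E_BITS_NFREE_SHIFT) & E_BITS_NFREE_MASK);
      }

      static inline uint64_t
      e_bits_nfree_set(uint64_t e_bits, unsigned nfree) {
          e_bits &= ~(E_BITS_NFREE_MASK << E_BITS_NFREE_SHIFT);
          return e_bits |
              (((uint64_t)nfree & E_BITS_NFREE_MASK) << E_BITS_NFREE_SHIFT);
      }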
* Simplify rtree cache replacement policy. (Qi Wang, 2017-03-27; 1 file, -14/+11)
  To avoid memmove on the free() fast path, simplify the cache replacement policy to only bubble up the cache hit element by 1.
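  A sketch of the "bubble up by 1" policy: on a hit at slot i, swap slots i and
  i-1 instead of memmove-ing everything in front of the hit to move it to slot
  0. The names and the cache size are illustrative.

      #include <stddef.h>
      #include <stdint.h>

      #define CACHE_NSLOTS 16

      typedef struct {
          uintptr_t key;
          void *val;
      } cache_slot_t;

      static inline void *
      cache_lookup(cache_slot_t *cache, uintptr_t key) {
          for (unsigned i = 0; i < CACHE_NSLOTS; i++) {
              if (cache[i].key != key) {
                  continue;
              }
              if (i == 0) {
                  return cache[0].val;
              }
              /* Promote the hit by a single position; frequently used keys
               * still migrate toward the front over repeated hits. */
              cache_slot_t tmp = cache[i - 1];
              cache[i - 1] = cache[i];
              cache[i] = tmp;
              return cache[i - 1].val;
          }
          return NULL; /* Miss; the caller falls back to the slow path. */
      }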
* Simplify rtree_clear() to avoid locking. (Jason Evans, 2017-03-27; 1 file, -4/+4)
* Fix a race in rtree_szind_slab_update() for RTREE_LEAF_COMPACT. (Jason Evans, 2017-03-27; 2 files, -13/+53)
* Remove BITMAP_USE_TREE. (Jason Evans, 2017-03-27; 5 files, -307/+0)
  Remove tree-structured bitmap support, in order to reduce complexity and ease maintenance. No bitmaps larger than 512 bits have been necessary since before 4.0.0, and there is no current plan that would increase maximum bitmap size. Although tree-structured bitmaps were used on 32-bit platforms prior to this change, the overall benefits were questionable (higher metadata overhead, higher bitmap modification cost, marginally lower search cost).
* Fix bitmap_ffu() to work with 3+ levels. (Jason Evans, 2017-03-27; 2 files, -41/+56)
* Pack various extent_t fields into a bitfield. (Jason Evans, 2017-03-26; 2 files, -104/+155)
  This reduces sizeof(extent_t) from 160 to 136 on x64.
* Store arena index rather than (arena_t *) in extent_t. (Jason Evans, 2017-03-26; 3 files, -5/+5)
* Fix BITMAP_USE_TREE version of bitmap_ffu(). (Jason Evans, 2017-03-26; 2 files, -5/+48)
  This fixes an extent searching regression on 32-bit systems, caused by the initial bitmap_ffu() implementation in c8021d01f6efe14dc1bd200021a815638063cb5f (Implement bitmap_ffu(), which finds the first unset bit.), as first used in 5d33233a5e6601902df7cddd8cc8aa0b135c77b2 (Use a bitmap in extents_t to speed up search.).
* Force inline ifree to avoid function call costs on fast path. (Qi Wang, 2017-03-25; 1 file, -2/+2)
  Without ALWAYS_INLINE, sometimes ifree() gets compiled into its own function, which adds overhead on the fast path.
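  The usual way to get that guarantee is an attribute-based macro. A sketch in
  GCC/Clang syntax with a portable fallback; the macro and helper names are
  illustrative.

      #if defined(__GNUC__) || defined(__clang__)
      #  define ALWAYS_INLINE static inline __attribute__((always_inline))
      #else
      #  define ALWAYS_INLINE static inline
      #endif

      /* Hypothetical fast-path helper: the attribute keeps the compiler from
       * outlining it into a real call on the hot path. */
      ALWAYS_INLINE void
      fast_path_free(void *ptr) {
          (void)ptr;
          /* ... deallocation fast path ... */
      }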
* Use a bitmap in extents_t to speed up search. (Jason Evans, 2017-03-25; 3 files, -12/+44)
  Rather than iteratively checking all sufficiently large heaps during search, maintain and use a bitmap in order to skip empty heaps.
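  The idea, sketched for a single machine word (the real extents_t bitmap uses
  the multi-level bitmap module): keep one bit per size-class heap, set while
  the heap is nonempty, and replace per-heap probing with one masked
  find-first-set. Names are illustrative.

      #include <stdint.h>

      #define NHEAPS 64 /* One bit per heap; single-word illustration only. */

      typedef struct {
          uint64_t nonempty_bits; /* Bit i set iff heap i holds >= 1 extent. */
      } heaps_bitmap_t;

      /* Return the first nonempty heap index >= min_index, or NHEAPS if none.
       * Assumes min_index < NHEAPS. */
      static inline unsigned
      heaps_first_nonempty(const heaps_bitmap_t *b, unsigned min_index) {
          uint64_t masked = b->nonempty_bits &
              ~(((uint64_t)1 << min_index) - 1);
          return (masked == 0) ? NHEAPS : (unsigned)__builtin_ctzll(masked);
      }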
* Implement BITMAP_GROUPS(). (Jason Evans, 2017-03-25; 1 file, -0/+6)
* Implement bitmap_ffu(), which finds the first unset bit. (Jason Evans, 2017-03-25; 6 files, -25/+136)
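  Within a single bitmap group, "first unset at or above a starting bit"
  reduces to count-trailing-zeros on the masked complement; the real
  bitmap_ffu() additionally walks groups (and, before BITMAP_USE_TREE was
  removed, tree levels). A hedged single-word sketch:

      #include <stdint.h>

      /* Return the index of the first 0 bit at or above min_bit in a 64-bit
       * group, or 64 if all bits at or above min_bit are set. Assumes
       * min_bit < 64. */
      static inline unsigned
      group_ffu(uint64_t group, unsigned min_bit) {
          uint64_t unset = ~group & ~(((uint64_t)1 << min_bit) - 1);
          return (unset == 0) ? 64 : (unsigned)__builtin_ctzll(unset);
      }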
* Use first fit layout policy instead of best fit. (Jason Evans, 2017-03-25; 1 file, -12/+42)
  For extents which do not delay coalescing, use first fit layout policy rather than first-best fit layout policy. This packs extents toward older virtual memory mappings, but at the cost of higher search overhead in the common case. This resolves #711.
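  A generic illustration of the policy difference over an address-ordered array
  of free extents (not the actual extents_t machinery): first fit returns the
  lowest-address extent that is large enough, best fit the smallest such extent.

      #include <stddef.h>

      typedef struct {
          void *addr;
          size_t size;
      } free_extent_t;

      /* extents[] is sorted by address; taking the first fit packs allocations
       * toward older (lower-address) mappings. */
      static free_extent_t *
      first_fit(free_extent_t *extents, size_t nextents, size_t want) {
          for (size_t i = 0; i < nextents; i++) {
              if (extents[i].size >= want) {
                  return &extents[i];
              }
          }
          return NULL;
      }

      /* Best fit minimizes leftover space instead, at the cost of scattering
       * allocations across the address space. */
      static free_extent_t *
      best_fit(free_extent_t *extents, size_t nextents, size_t want) {
          free_extent_t *best = NULL;
          for (size_t i = 0; i < nextents; i++) {
              if (extents[i].size >= want &&
                  (best == NULL || extents[i].size < best->size)) {
                  best = &extents[i];
              }
          }
          return best;
      }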
* Added documentation for mutex profiling related mallctls. (Qi Wang, 2017-03-23; 1 file, -0/+206)
* Profile per arena base mutex, instead of just a0. (Qi Wang, 2017-03-23; 3 files, -6/+7)
* Refactor mutex profiling code with x-macros. (Qi Wang, 2017-03-23; 7 files, -232/+225)
* Switch to nstime_t for the time related fields in mutex profiling. (Qi Wang, 2017-03-23; 5 files, -20/+24)
* Added custom mutex spin. (Qi Wang, 2017-03-23; 3 files, -17/+27)
  A fixed max spin count is used -- with benchmark results showing it solves almost all problems. As the benchmark used was rather intense, the upper bound could be a little bit high. However it should offer a good tradeoff between spinning and blocking.
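  A sketch of the fixed-budget spin layered over pthreads; the spin count and
  names are illustrative, not the value tuned by the benchmarks above.

      #include <pthread.h>

      #define MUTEX_MAX_SPIN 256 /* Illustrative fixed budget. */

      /* Spin on trylock a bounded number of times before falling back to a
       * blocking lock. */
      static void
      mutex_lock_spin(pthread_mutex_t *m) {
          for (int i = 0; i < MUTEX_MAX_SPIN; i++) {
              if (pthread_mutex_trylock(m) == 0) {
                  return;
              }
              /* A CPU pause/yield hint would typically go here. */
          }
          pthread_mutex_lock(m);
      }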
* Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy. (Qi Wang, 2017-03-23; 4 files, -41/+61)
* Added "stats.mutexes.reset" mallctl to reset all mutex stats. (Qi Wang, 2017-03-23; 12 files, -189/+250)
  Also switched from the term "lock" to "mutex".
* Added JSON output for lock stats. (Qi Wang, 2017-03-23; 4 files, -44/+124)
  Also added option 'x' to malloc_stats() to bypass lock section.
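  A hedged usage sketch for the two interfaces mentioned above, assuming an
  unprefixed jemalloc build; the exact meaning of the "J" and "x" stats options
  is as described in these commit messages.

      #include <jemalloc/jemalloc.h>
      #include <stdio.h>

      int
      main(void) {
          /* Zero the mutex contention counters before the workload of
           * interest, via the mallctl added above. */
          if (mallctl("stats.mutexes.reset", NULL, NULL, NULL, 0) != 0) {
              fprintf(stderr, "stats.mutexes.reset not supported\n");
          }

          /* ... run the workload to be profiled ... */

          /* Emit statistics; "J" selects the JSON output added above. Passing
           * "x" instead would skip the mutex section entirely. */
          malloc_stats_print(NULL, NULL, "J");
          return 0;
      }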
* Added lock profiling and output for global locks (ctl, prof and base). (Qi Wang, 2017-03-23; 9 files, -78/+174)
* Add arena lock stats output. (Qi Wang, 2017-03-23; 9 files, -51/+269)
* Output bin lock profiling results to malloc_stats. (Qi Wang, 2017-03-23; 8 files, -34/+120)
  Two counters are included for the small bins: lock contention rate, and max lock waiting time.
* First stage of mutex profiling. (Qi Wang, 2017-03-23; 5 files, -32/+149)
  Switched to trylock and update counters based on state.
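  A sketch of the trylock-based scheme: only when trylock fails does the thread
  take the slow path, count the contention, and time the blocking acquisition.
  The struct and field names are illustrative.

      #include <pthread.h>
      #include <stdint.h>
      #include <time.h>

      typedef struct {
          pthread_mutex_t lock;
          uint64_t n_lock_ops;
          uint64_t n_contended;
          uint64_t total_wait_ns;
      } prof_mutex_t;

      static void
      prof_mutex_lock(prof_mutex_t *m) {
          if (pthread_mutex_trylock(&m->lock) != 0) {
              /* Contended: record it and measure the wait. */
              struct timespec t0, t1;
              clock_gettime(CLOCK_MONOTONIC, &t0);
              pthread_mutex_lock(&m->lock);
              clock_gettime(CLOCK_MONOTONIC, &t1);
              int64_t wait_ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
                  (t1.tv_nsec - t0.tv_nsec);
              m->n_contended++;
              m->total_wait_ns += (uint64_t)wait_ns;
          }
          /* Counters are updated while the mutex is held, so plain stores
           * suffice. */
          m->n_lock_ops++;
      }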
* Further specialize arena_[s]dalloc() tcache fast path. (Jason Evans, 2017-03-23; 3 files, -45/+129)
  Use tsd_rtree_ctx() rather than tsdn_rtree_ctx() when tcache is non-NULL, in order to avoid an extra branch (and potentially extra stack space) in the fast path.
* Push down iealloc() calls. (Jason Evans, 2017-03-23; 9 files, -227/+176)
  Call iealloc() as deep into call chains as possible without causing redundant calls.
* Remove extent dereferences from the deallocation fast paths. (Jason Evans, 2017-03-23; 8 files, -87/+113)
* Remove extent arg from isalloc() and arena_salloc(). (Jason Evans, 2017-03-23; 6 files, -51/+29)
* Implement compact rtree leaf element representation. (Jason Evans, 2017-03-23; 5 files, -7/+163)
  If a single virtual address pointer has enough unused bits to pack {szind_t, extent_t *, bool, bool}, use a single pointer-sized field in each rtree leaf element, rather than using three separate fields. This has little impact on access speed (fewer loads/stores, but more bit twiddling), except that denser representation increases TLB effectiveness.
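  A hedged sketch of the packing idea: on a typical x86-64 layout, user
  pointers leave the high 16 bits unused and extent alignment leaves the low
  bits unused, so the size-class index and one of the flags can share a single
  uintptr_t with the extent pointer. The shifts and masks below are
  illustrative, not the actual RTREE_LEAF_COMPACT layout.

      #include <stdbool.h>
      #include <stdint.h>

      #define LEAF_SZIND_SHIFT 48
      #define LEAF_PTR_MASK \
          ((((uintptr_t)1 << LEAF_SZIND_SHIFT) - 1) & ~(uintptr_t)1)

      static inline uintptr_t
      leaf_bits_encode(void *extent, unsigned szind, bool slab) {
          return ((uintptr_t)szind << LEAF_SZIND_SHIFT) |
              ((uintptr_t)extent & LEAF_PTR_MASK) | (uintptr_t)slab;
      }

      static inline void *
      leaf_bits_extent_get(uintptr_t bits) {
          return (void *)(bits & LEAF_PTR_MASK);
      }

      static inline unsigned
      leaf_bits_szind_get(uintptr_t bits) {
          return (unsigned)(bits >> LEAF_SZIND_SHIFT);
      }

      static inline bool
      leaf_bits_slab_get(uintptr_t bits) {
          return (bool)(bits & (uintptr_t)1);
      }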
* Embed root node into rtree_t. (Jason Evans, 2017-03-23; 5 files, -140/+86)
  This avoids one atomic operation per tree access.
* Incorporate szind/slab into rtree leaves. (Jason Evans, 2017-03-23; 13 files, -224/+469)
  Expand and restructure the rtree API such that all common operations can be achieved with minimal work, regardless of whether the rtree leaf fields are independent versus packed into a single atomic pointer.
* Split rtree_elm_t into rtree_{node,leaf}_elm_t. (Jason Evans, 2017-03-23; 9 files, -257/+458)
  This allows leaf elements to differ in size from internal node elements. In principle it would be more correct to use a different type for each level of the tree, but due to implementation details related to atomic operations, we use casts anyway, thus counteracting the value of additional type correctness. Furthermore, such a scheme would require function code generation (via cpp macros), as well as either unwieldy type names for leaves or type aliases, e.g.
      typedef struct rtree_elm_d2_s rtree_leaf_elm_t;
  This alternate strategy would be more correct, and with less code duplication, but probably not worth the complexity.