path: root/include
Commit message (Author, Date; files changed, lines -/+)
...
* Switch to fine-grained reentrancy support. (Qi Wang, 2017-04-15; 5 files, -7/+32)
  Previously we had a general detection and support of reentrancy, at the cost
  of having branches and inc / dec operations on fast paths. To avoid taxing
  fast paths, we move the reentrancy operations onto tsd slow state, and only
  modify reentrancy level around external calls (that might trigger
  reentrancy).
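  A minimal sketch of the approach (struct layout and names such as
  pre_reentrancy are illustrative, not jemalloc's exact API): the level is
  only touched around calls that can re-enter, so the common fast path carries
  no reentrancy branch or inc / dec.

      /* Sketch: reentrancy level kept with the slow-path TSD state. */
      typedef struct tsd_s {
          int reentrancy_level; /* 0 on the common path */
          /* ... other slow-state fields ... */
      } tsd_t;

      static inline void pre_reentrancy(tsd_t *tsd) {
          tsd->reentrancy_level++; /* only around external calls */
      }

      static inline void post_reentrancy(tsd_t *tsd) {
          tsd->reentrancy_level--;
      }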
* Bundle 3 branches on fast path into tsd_state. (Qi Wang, 2017-04-14; 7 files, -41/+64)
  Added tsd_state_nominal_slow, which on the malloc() fast path incorporates
  the tcache_enabled check, and on the free() fast path bundles both the
  malloc_slow and tcache_enabled branches.
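  A sketch of the bundling idea (enum values beyond tsd_state_nominal_slow are
  illustrative): threads needing any slow-path handling are parked in a
  non-nominal state, so the fast path keeps a single comparison.

      #include <stdbool.h>

      /* Sketch: one state comparison subsumes several fast-path flags. */
      enum tsd_state {
          tsd_state_nominal,      /* pure fast path */
          tsd_state_nominal_slow, /* tcache disabled, malloc_slow, etc. */
          tsd_state_uninitialized
      };

      static inline bool tsd_fast(enum tsd_state state) {
          return state == tsd_state_nominal;
      }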
* Pass alloc_ctx down profiling path. (Qi Wang, 2017-04-12; 5 files, -47/+67)
  With this change, when profiling is enabled, we avoid doing redundant rtree
  lookups. Also changed dalloc_ctx_t to alloc_ctx_t, as it's now used on the
  allocation path as well (to speed up profiling).
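  A sketch of the pattern (field names assumed from the szind/slab metadata
  the rtree stores): do the rtree lookup once, cache the result in a small
  context struct, and pass it down so profiling and deallocation code need
  not repeat the lookup. The same shape serves the dalloc/dealloc entries
  below.

      #include <stdbool.h>
      #include <stddef.h>

      /* Sketch: metadata from a single rtree lookup, handed down the
       * allocation and deallocation paths. */
      typedef struct alloc_ctx_s {
          size_t szind; /* size class index */
          bool slab;    /* small (slab-backed) vs. large allocation */
      } alloc_ctx_t;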
* Pass dalloc_ctx down the sdalloc path. (Qi Wang, 2017-04-12; 2 files, -20/+28)
  This avoids redundant rtree lookups.
* Header refactoring: move atomic.h out of the catch-all (David Goldblatt, 2017-04-11; 10 files, -1/+18)
* Header refactoring: Split up jemalloc_internal.h (David Goldblatt, 2017-04-11; 8 files, -1303/+1332)
  This is a biggy. jemalloc_internal.h has been doing multiple jobs for a
  while now:
  - The source of system-wide definitions.
  - The catch-all include file.
  - The module header file for jemalloc.c
  This commit splits up this functionality. The system-wide definitions
  responsibility has moved to jemalloc_preamble.h. The catch-all include file
  is now jemalloc_internal_includes.h. The module headers for jemalloc.c are
  now in jemalloc_internal_[externs|inlines|types].h, just as they are for
  the other modules.
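  Under this split, a module's translation unit might start like the sketch
  below (the two file names come from the commit message; the include-path
  prefix is jemalloc's usual one).

      /* Sketch: post-split include order for a module's .c file. */
      #include "jemalloc/internal/jemalloc_preamble.h"          /* system-wide defs */
      #include "jemalloc/internal/jemalloc_internal_includes.h" /* catch-all */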
* Header refactoring: break out ql.h dependencies (David Goldblatt, 2017-04-11; 10 files, -2/+18)
* Header refactoring: break out qr.h dependencies (David Goldblatt, 2017-04-11; 2 files, -1/+2)
* Header refactoring: break out rb.h dependencies (David Goldblatt, 2017-04-11; 3 files, -4/+6)
* Header refactoring: break out ph.h dependencies (David Goldblatt, 2017-04-11; 3 files, -1/+4)
* Header refactoring: Add CPP_PROLOGUE and CPP_EPILOGUE macros (David Goldblatt, 2017-04-11; 1 file, -4/+8)
* Pass dealloc_ctx down free() fast path. (Qi Wang, 2017-04-11; 4 files, -11/+28)
  This gets rid of the redundant rtree lookup on the free() fast path.
* Move reentrancy_level to the beginning of TSD. (Qi Wang, 2017-04-07; 1 file, -7/+7)
* Add basic reentrancy-checking support, and allow arena_new to reenter. (David Goldblatt, 2017-04-07; 4 files, -5/+18)
  This checks whether or not we're reentrant using thread-local data, and, if
  we are, moves certain internal allocations to use arena 0 (which should be
  properly initialized after bootstrapping). The immediate thing this allows
  is spinning up threads in arena_new, which will enable spinning up
  background threads there.
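  A sketch of the fallback (the types and arenas array are illustrative
  stubs): when the thread-local data says we are re-entering, internal
  allocations are redirected to arena 0.

      typedef struct arena_s arena_t;
      typedef struct tsd_s {
          int reentrancy_level;
          arena_t *arena; /* the thread's usual arena */
      } tsd_t;
      extern arena_t *arenas[]; /* arena 0 is initialized at bootstrap */

      static inline arena_t *internal_arena_choose(tsd_t *tsd) {
          if (tsd->reentrancy_level > 0) {
              return arenas[0]; /* safe: fully bootstrapped */
          }
          return tsd->arena;
      }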
* Add hooking functionality (David Goldblatt, 2017-04-07; 5 files, -2/+29)
  This allows us to hook chosen functions and do interesting things there (in
  particular: reentrancy checking).
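  A sketch of one way such hooking can look (the macro and hook names are
  illustrative): if a test hook is installed, call it before forwarding to
  the real function.

      /* Sketch: wrap chosen functions so a test hook runs first. */
      typedef void (*hook_fn_t)(void);
      extern hook_fn_t test_hook; /* NULL unless a test installs one */

      #define HOOKED(fn) ((void)(test_hook != NULL && (test_hook(), 0)), fn)

      /* Usage: a call site written as HOOKED(fn)(args) behaves as fn(args),
       * but gives the hook a chance to, e.g., check reentrancy invariants. */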
* Optimizing TSD and thread cache layout. (Qi Wang, 2017-04-07; 8 files, -63/+129)
  1) Re-organize TSD so that frequently accessed fields are closer to the
     beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now
     contain everything needed on the tcache fast path, except the tcache
     struct itself.
  2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce
     tbin to 24 bytes (down from 32). Split tbins into tbins_small and
     tbins_large, and place tbins_small close to the beginning.
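  A sketch of the layout principle (field names are illustrative): hot fields
  first, so the tcache fast path touches as few cachelines as possible.

      #include <stdint.h>

      typedef struct tsd_s {
          /* Hot: read on every malloc()/free(); packed near offset 0. */
          uint8_t state;
          int8_t  reentrancy_level;
          /* ... tcache fast-path fields (tbins_small) follow here ... */

          /* Cold: slow-path-only fields pushed toward the end. */
          /* ... profiling state, cleanup bookkeeping, etc. ... */
      } tsd_t;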
* Get rid of tcache_enabled_t as we have runtime init support. (Qi Wang, 2017-04-07; 5 files, -21/+11)
* Integrate auto tcache into TSD. (Qi Wang, 2017-04-07; 11 files, -98/+128)
  The embedded tcache is initialized upon tsd initialization. The avail
  arrays for the tbins will be allocated / deallocated accordingly during
  init / cleanup. With this change, the pointer to the auto tcache will
  always be available, as long as we have access to the TSD.
  tcache_available() (called in tcache_get()) is provided to check if we
  should use tcache.
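  A sketch of the resulting accessors (struct fields are illustrative;
  tcache_available() and tcache_get() are named in the message): the pointer
  is always derivable from TSD, and a flag gates actual use.

      #include <stdbool.h>
      #include <stddef.h>

      typedef struct tcache_s { void **avail; /* ... */ } tcache_t;
      typedef struct tsd_s {
          bool     tcache_enabled;
          tcache_t tcache; /* embedded; initialized with the tsd */
      } tsd_t;

      static inline bool tcache_available(tsd_t *tsd) {
          return tsd->tcache_enabled;
      }

      static inline tcache_t *tcache_get(tsd_t *tsd) {
          return tcache_available(tsd) ? &tsd->tcache : NULL;
      }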
* Remove the pre-C11-atomics API, which is now unused (David Goldblatt, 2017-04-05; 1 file, -45/+0)
* Make the mutex n_waiting_thds field a C11-style atomic (David Goldblatt, 2017-04-05; 3 files, -4/+15)
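  The C11-style idiom these conversions adopt, sketched with stdatomic.h
  directly (jemalloc wraps this behind its own atomic types): explicit
  operations with explicit memory ordering.

      #include <stdatomic.h>
      #include <stdint.h>

      static atomic_uint_least32_t n_waiting_thds; /* zero-initialized */

      static inline void waiting_inc(void) {
          /* A stats-style counter: relaxed ordering is sufficient. */
          atomic_fetch_add_explicit(&n_waiting_thds, 1, memory_order_relaxed);
      }

      static inline uint_least32_t waiting_read(void) {
          return atomic_load_explicit(&n_waiting_thds, memory_order_relaxed);
      }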
* Convert accumbytes in prof_accum_t to C11 atomics, when possible (David Goldblatt, 2017-04-05; 2 files, -5/+9)
* Make base_t's extent_hooks field C11-atomic (David Goldblatt, 2017-04-05; 1 file, -5/+5)
* Transition arena struct fields to C11 atomics (David Goldblatt, 2017-04-05; 1 file, -6/+10)
* Move arena-tracking atomics in jemalloc.c to C11-style (David Goldblatt, 2017-04-05; 2 files, -5/+4)
* Transition e_prof_tctx in struct extent to C11 atomics (David Goldblatt, 2017-04-04; 2 files, -8/+8)
* Convert prng module to use C11-style atomics (David Goldblatt, 2017-04-04; 2 files, -23/+24)
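  A sketch of the lock-free update pattern such a conversion uses (the LCG
  constants are Knuth's 64-bit ones, used here purely for illustration):

      #include <stdatomic.h>
      #include <stdint.h>

      static uint64_t prng_next(_Atomic uint64_t *state) {
          uint64_t old = atomic_load_explicit(state, memory_order_relaxed);
          uint64_t next;
          do {
              next = old * 6364136223846793005ULL + 1442695040888963407ULL;
              /* On failure, `old` is reloaded with the current value. */
          } while (!atomic_compare_exchange_weak_explicit(state, &old, next,
              memory_order_relaxed, memory_order_relaxed));
          return next;
      }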
* Make the tsd member init functions take tsd_t *. (Qi Wang, 2017-04-04; 1 file, -1/+2)
* Do proper cleanup for tsd_state_reincarnated. (Qi Wang, 2017-04-04; 1 file, -1/+2)
  Also enable arena_bind under non-nominal state, as the cleanup will be
  handled correctly now.
* Remove the leafkey NULL check in leaf_elm_lookup. (Qi Wang, 2017-04-04; 2 files, -14/+10)
* Add init function support to tsd members. (Qi Wang, 2017-04-04; 6 files, -26/+35)
  This will facilitate embedding tcache into tsd, which requires proper
  initialization that cannot be done via the static initializer. Make
  tsd->rtree_ctx be initialized via rtree_ctx_data_init().
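  A sketch of the mechanism (only rtree_ctx_data_init() is named in the
  message; everything else here is an illustrative stub): members whose setup
  cannot be expressed as a static initializer get an init function invoked
  during tsd initialization.

      typedef struct rtree_ctx_s rtree_ctx_t;
      void rtree_ctx_data_init(rtree_ctx_t *ctx); /* named in the message */

      typedef struct tsd_s { rtree_ctx_t *rtree_ctx; /* ... */ } tsd_t;

      static void tsd_data_init(tsd_t *tsd) {
          /* Statically initializable members are already set; run the
           * init functions for the rest. */
          rtree_ctx_data_init(tsd->rtree_ctx);
      }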
* Move arena_slab_data_t's nfree into extent_t's e_bits. (Jason Evans, 2017-03-28; 4 files, -23/+66)
  Compact extent_t to 128 bytes on 64-bit systems by moving
  arena_slab_data_t's nfree into extent_t's e_bits.

  Cacheline-align extent_t structures so that they always cross the minimum
  number of cacheline boundaries.

  Re-order extent_t fields such that all fields except the slab bitmap (and
  overlaid heap profiling context pointer) are in the first cacheline.

  This resolves #461.
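  A sketch of the packing idiom (the shift and field width are illustrative;
  jemalloc's actual e_bits layout differs): nfree becomes a shifted field
  inside a single word, read and written with masks.

      #include <stdint.h>

      #define NFREE_SHIFT 12
      #define NFREE_MASK  (((uint64_t)0x3ff) << NFREE_SHIFT) /* 10 bits */

      static inline unsigned extent_nfree_get(uint64_t e_bits) {
          return (unsigned)((e_bits & NFREE_MASK) >> NFREE_SHIFT);
      }

      static inline uint64_t extent_nfree_set(uint64_t e_bits, unsigned nfree) {
          return (e_bits & ~NFREE_MASK) | ((uint64_t)nfree << NFREE_SHIFT);
      }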
* Simplify rtree cache replacement policy. (Qi Wang, 2017-03-27; 1 file, -14/+11)
  To avoid memmove on the free() fast path, simplify the cache replacement
  policy to only bubble up the cache-hit element by 1.
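  A sketch of the policy (the element type is an illustrative stand-in): a
  hit at slot i swaps slots i and i-1, so hot entries still drift toward the
  front, but a hit costs O(1) instead of a memmove.

      #include <stdint.h>

      typedef struct { uintptr_t leafkey; void *leaf; } cache_elm_t;

      static inline void cache_bubble_up(cache_elm_t *cache, unsigned i) {
          if (i == 0) {
              return; /* already at the front */
          }
          cache_elm_t tmp = cache[i - 1];
          cache[i - 1] = cache[i];
          cache[i] = tmp;
      }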
* Simplify rtree_clear() to avoid locking. (Jason Evans, 2017-03-27; 1 file, -4/+4)
* Fix a race in rtree_szind_slab_update() for RTREE_LEAF_COMPACT. (Jason Evans, 2017-03-27; 2 files, -13/+53)
* Remove BITMAP_USE_TREE. (Jason Evans, 2017-03-27; 3 files, -213/+0)
  Remove tree-structured bitmap support, in order to reduce complexity and
  ease maintenance. No bitmaps larger than 512 bits have been necessary since
  before 4.0.0, and there is no current plan that would increase maximum
  bitmap size. Although tree-structured bitmaps were used on 32-bit platforms
  prior to this change, the overall benefits were questionable (higher
  metadata overhead, higher bitmap modification cost, marginally lower search
  cost).
* Fix bitmap_ffu() to work with 3+ levels. (Jason Evans, 2017-03-27; 1 file, -41/+29)
* Pack various extent_t fields into a bitfield. (Jason Evans, 2017-03-26; 2 files, -104/+155)
  This reduces sizeof(extent_t) from 160 to 136 on x64.
* Store arena index rather than (arena_t *) in extent_t. (Jason Evans, 2017-03-26; 3 files, -5/+5)
* Fix BITMAP_USE_TREE version of bitmap_ffu(). (Jason Evans, 2017-03-26; 1 file, -0/+13)
  This fixes an extent searching regression on 32-bit systems, caused by the
  initial bitmap_ffu() implementation in
  c8021d01f6efe14dc1bd200021a815638063cb5f (Implement bitmap_ffu(), which
  finds the first unset bit.), as first used in
  5d33233a5e6601902df7cddd8cc8aa0b135c77b2 (Use a bitmap in extents_t to
  speed up search.).
* Use a bitmap in extents_t to speed up search. (Jason Evans, 2017-03-25; 2 files, -1/+14)
  Rather than iteratively checking all sufficiently large heaps during
  search, maintain and use a bitmap in order to skip empty heaps.
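  A flat, single-word sketch of the idea (jemalloc's bitmap module is
  multi-level and more general; __builtin_ctzll is a GCC/Clang builtin): a
  bit tracks whether each heap is empty, and the first candidate at or above
  a minimum size class falls out of a count-trailing-zeros.

      #include <stdint.h>

      /* Bit i set <=> heap i is empty; assumes at most 64 heaps and
       * min_ind < 64 for this flat sketch. */
      static uint64_t empty_bits = ~0ULL;

      static inline void heap_mark_nonempty(unsigned i) {
          empty_bits &= ~(1ULL << i);
      }

      static inline int first_nonempty_heap(unsigned min_ind) {
          uint64_t candidates = ~empty_bits & (~0ULL << min_ind);
          return candidates == 0 ? -1 : __builtin_ctzll(candidates);
      }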
* Implement BITMAP_GROUPS(). (Jason Evans, 2017-03-25; 1 file, -0/+6)
* Implement bitmap_ffu(), which finds the first unset bit. (Jason Evans, 2017-03-25; 3 files, -6/+67)
* Profile per arena base mutex, instead of just a0. (Qi Wang, 2017-03-23; 1 file, -1/+1)
* Refactor mutex profiling code with x-macros. (Qi Wang, 2017-03-23; 4 files, -22/+45)
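  A sketch of the x-macro pattern (the three mutex names are taken from the
  "global locks" entry further down): the list is written once, then expanded
  to generate enums, stats structs, and output code.

      /* Sketch: one list, many expansions. */
      #define GLOBAL_PROF_MUTEXES \
          OP(ctl)                 \
          OP(prof)                \
          OP(base)

      typedef enum {
      #define OP(mtx) global_prof_mutex_##mtx,
          GLOBAL_PROF_MUTEXES
      #undef OP
          n_global_prof_mutexes
      } global_prof_mutex_t;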
* Switch to nstime_t for the time related fields in mutex profiling. (Qi Wang, 2017-03-23; 3 files, -6/+8)
* Added custom mutex spin. (Qi Wang, 2017-03-23; 2 files, -15/+13)
  A fixed max spin count is used -- with benchmark results showing it solves
  almost all problems. As the benchmark used was rather intense, the upper
  bound could be a little bit high. However, it should offer a good tradeoff
  between spinning and blocking.
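  A sketch of the shape (mutex_t and the helpers are illustrative stubs; the
  actual max spin count is whatever the benchmarks settled on): spin a
  bounded number of times, then fall back to a blocking acquire.

      #include <stdbool.h>

      #define MAX_SPIN 1024 /* illustrative bound */

      typedef struct mutex_s mutex_t;
      bool mutex_trylock(mutex_t *m);
      void mutex_lock_blocking(mutex_t *m);

      static void mutex_lock(mutex_t *m) {
          for (int i = 0; i < MAX_SPIN; i++) {
              if (mutex_trylock(m)) {
                  return; /* acquired while spinning */
              }
              /* A pause/yield hint would go here (e.g. x86 PAUSE). */
          }
          mutex_lock_blocking(m); /* spin budget exhausted: block */
      }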
* Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy. (Qi Wang, 2017-03-23; 2 files, -4/+6)
* Added "stats.mutexes.reset" mallctl to reset all mutex stats.Qi Wang2017-03-238-33/+35
| | | | Also switched from the term "lock" to "mutex".
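  Invoking the new control from application code looks like this (mallctl()
  is jemalloc's public API; the name string comes from the commit subject):

      #include <jemalloc/jemalloc.h>
      #include <stddef.h>

      int reset_mutex_stats(void) {
          /* Write-only mallctl: no old value read, no new value written. */
          return mallctl("stats.mutexes.reset", NULL, NULL, NULL, 0);
      }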
* Added lock profiling and output for global locks (ctl, prof and base). (Qi Wang, 2017-03-23; 5 files, -4/+19)
* Add arena lock stats output. (Qi Wang, 2017-03-23; 5 files, -9/+23)