path: root/test
Commit message | Author | Date | Files | Lines
* Header refactoring: break out ph.h dependencies | David Goldblatt | 2017-04-11 | 1 | -0/+2
* Add basic reentrancy-checking support, and allow arena_new to reenter. | David Goldblatt | 2017-04-07 | 4 | -28/+55
  This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.
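  A minimal standalone sketch of the reentrancy-check idea described above (the names and the per-thread depth counter are illustrative, not jemalloc's internal API):

      #include <stdlib.h>

      /* Per-thread allocation depth; nonzero on entry means we are reentrant. */
      static _Thread_local unsigned alloc_depth;

      static void *alloc_from_arena0(size_t size) { return malloc(size); } /* stand-in */
      static void *alloc_default(size_t size) { return malloc(size); }     /* stand-in */

      static void *checked_alloc(size_t size) {
          alloc_depth++;
          /* Reentrant allocations go to arena 0, which is already bootstrapped. */
          void *ret = (alloc_depth > 1) ? alloc_from_arena0(size)
                                        : alloc_default(size);
          alloc_depth--;
          return ret;
      }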
* Add hooking functionality | David Goldblatt | 2017-04-07 | 10 | -9/+124
  This allows us to hook chosen functions and do interesting things there (in particular: reentrancy checking).
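  A hedged sketch of the general shape such a hook could take (hypothetical names; the real plumbing lives in jemalloc's internal headers):

      /* Tests install a callback; the library calls through it at chosen points. */
      typedef void (*test_hook_t)(void);

      static test_hook_t arena_new_hook; /* NULL means no hook installed. */

      static void run_arena_new_hook(void) {
          if (arena_new_hook != NULL) {
              arena_new_hook(); /* e.g. a reentrancy-checking callback */
          }
      }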
* Integrate auto tcache into TSD. | Qi Wang | 2017-04-07 | 1 | -0/+6
  The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins will be allocated / deallocated accordingly during init / cleanup. With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check if we should use tcache.
* Remove the pre-C11-atomics API, which is now unused | David Goldblatt | 2017-04-05 | 1 | -45/+0
* Convert prng module to use C11-style atomics | David Goldblatt | 2017-04-04 | 1 | -18/+20
* Do proper cleanup for tsd_state_reincarnated. | Qi Wang | 2017-04-04 | 1 | -1/+40
  Also enable arena_bind under non-nominal state, as the cleanup will be handled correctly now.
* Add init function support to tsd members. | Qi Wang | 2017-04-04 | 1 | -6/+10
  This will facilitate embedding tcache into tsd, which requires proper initialization that cannot be done via the static initializer. Make tsd->rtree_ctx be initialized via rtree_ctx_data_init().
* Remove BITMAP_USE_TREE. | Jason Evans | 2017-03-27 | 1 | -16/+0
  Remove tree-structured bitmap support, in order to reduce complexity and ease maintenance. No bitmaps larger than 512 bits have been necessary since before 4.0.0, and there is no current plan that would increase maximum bitmap size. Although tree-structured bitmaps were used on 32-bit platforms prior to this change, the overall benefits were questionable (higher metadata overhead, higher bitmap modification cost, marginally lower search cost).
* Fix bitmap_ffu() to work with 3+ levels. | Jason Evans | 2017-03-27 | 1 | -0/+27
* Fix BITMAP_USE_TREE version of bitmap_ffu(). | Jason Evans | 2017-03-26 | 1 | -5/+35
  This fixes an extent searching regression on 32-bit systems, caused by the initial bitmap_ffu() implementation in c8021d01f6efe14dc1bd200021a815638063cb5f (Implement bitmap_ffu(), which finds the first unset bit.), as first used in 5d33233a5e6601902df7cddd8cc8aa0b135c77b2 (Use a bitmap in extents_t to speed up search.).
* Implement bitmap_ffu(), which finds the first unset bit. | Jason Evans | 2017-03-25 | 1 | -12/+47
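  To make the semantics concrete, here is a standalone flat-array sketch of "find the first unset bit" (jemalloc's bitmap_ffu() additionally handles the grouped/tree representations discussed above; the sketch assumes a GCC/Clang builtin for counting trailing zeros):

      #include <stdint.h>
      #include <stddef.h>

      /* Return the index of the first 0 bit in a bitmap of nbits bits, or nbits
       * if every bit is set. */
      static size_t first_unset(const uint64_t *bits, size_t nbits) {
          size_t nwords = (nbits + 63) / 64;
          for (size_t i = 0; i < nwords; i++) {
              uint64_t unset = ~bits[i];
              if (i == nwords - 1 && (nbits % 64) != 0) {
                  unset &= ((uint64_t)1 << (nbits % 64)) - 1; /* ignore padding bits */
              }
              if (unset != 0) {
                  return i * 64 + (size_t)__builtin_ctzll(unset);
              }
          }
          return nbits;
      }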
* Added JSON output for lock stats. | Qi Wang | 2017-03-23 | 1 | -1/+6
  Also added option 'x' to malloc_stats() to bypass the lock section.
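  For reference, a usage sketch of the stats options mentioned above via the public API ('J' selects JSON output; 'x', added by this change, skips the lock section):

      #include <jemalloc/jemalloc.h>

      int main(void) {
          /* NULL write callback sends the output to stderr. */
          malloc_stats_print(NULL, NULL, "Jx");
          return 0;
      }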
* Push down iealloc() calls. | Jason Evans | 2017-03-23 | 1 | -7/+2
  Call iealloc() as deep into call chains as possible without causing redundant calls.
* Embed root node into rtree_t. | Jason Evans | 2017-03-23 | 1 | -64/+55
  This avoids one atomic operation per tree access.
* Incorporate szind/slab into rtree leaves. | Jason Evans | 2017-03-23 | 2 | -36/+73
  Expand and restructure the rtree API such that all common operations can be achieved with minimal work, regardless of whether the rtree leaf fields are independent versus packed into a single atomic pointer.
* Split rtree_elm_t into rtree_{node,leaf}_elm_t. | Jason Evans | 2017-03-23 | 1 | -22/+53
  This allows leaf elements to differ in size from internal node elements. In principle it would be more correct to use a different type for each level of the tree, but due to implementation details related to atomic operations, we use casts anyway, thus counteracting the value of additional type correctness. Furthermore, such a scheme would require function code generation (via cpp macros), as well as either unwieldy type names for leaves or type aliases, e.g.
      typedef struct rtree_elm_d2_s rtree_leaf_elm_t;
  This alternate strategy would be more correct, and with less code duplication, but probably not worth the complexity.
* Convert extent_t's usize to szind. | Jason Evans | 2017-03-23 | 1 | -2/+2
  Rather than storing usize only for large (and prof-promoted) allocations, store the size class index for allocations that reside within the extent, such that the size class index is valid for all extents that contain extant allocations, and invalid otherwise (mainly to make debugging simpler).
* Implement two-phase decay-based purging. | Jason Evans | 2017-03-15 | 5 | -116/+270
  Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether.
  Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary.
  The mallctl interfaces change as follows:
  - opt.decay_time --> opt.{dirty,muzzy}_decay_time
  - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
  - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
  - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
  - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
  This resolves #521.
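  A hedged sketch of reading the renamed options through the mallctl API (assumes the values are ssize_t-valued, as opt.decay_time was, with -1 meaning purging is disabled):

      #include <stdio.h>
      #include <sys/types.h>
      #include <jemalloc/jemalloc.h>

      static void print_decay_times(void) {
          ssize_t dirty, muzzy;
          size_t sz = sizeof(ssize_t);
          if (mallctl("opt.dirty_decay_time", &dirty, &sz, NULL, 0) == 0 &&
              mallctl("opt.muzzy_decay_time", &muzzy, &sz, NULL, 0) == 0) {
              printf("dirty: %zd, muzzy: %zd\n", dirty, muzzy);
          }
      }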
* Implement per-CPU arena. | Qi Wang | 2017-03-09 | 3 | -69/+93
  The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu", and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" assigns threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper-threading is enabled has been added yet. When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread that accessed it most recently. When a new thread comes in, we will read the CPU id and update the arena if necessary.
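  A standalone sketch of the two mappings described above (Linux-specific; the "phycpu" pairing assumes the common numbering where hyperthread siblings are cpu and cpu + ncpus/2, which may not match jemalloc's actual logic):

      #define _GNU_SOURCE
      #include <sched.h>
      #include <unistd.h>

      /* "percpu": one arena per CPU. */
      static unsigned arena_ind_percpu(void) {
          return (unsigned)sched_getcpu();
      }

      /* "phycpu": half as many arenas; hyperthread siblings share one. */
      static unsigned arena_ind_phycpu(void) {
          long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
          if (ncpus < 2) {
              return 0;
          }
          return (unsigned)(sched_getcpu() % (ncpus / 2));
      }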
* Fix arena_prefork lock rank order for witness. | Qi Wang | 2017-03-09 | 1 | -0/+13
  When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.
* Store associated arena in tcache. | Qi Wang | 2017-03-07 | 1 | -8/+32
  This fixes tcache_flush for manual tcaches, which wasn't able to find the correct arena it was associated with. Also changed the decay test to cover this case (by using manually created arenas).
* Add any() and remove_any() to ph. | Jason Evans | 2017-03-07 | 1 | -1/+30
  These functions select the easiest-to-remove element in the heap, which is either the most recently inserted aux list element or the root. If no calls are made to first() or remove_first(), the behavior (and time complexity) is the same as for a LIFO queue.
* Fix flakiness in test_decay_ticker. | Jason Evans | 2017-03-07 | 1 | -106/+148
  Fix the test_decay_ticker test to carefully control slab creation/destruction such that the decay backlog reliably reaches zero. Use an isolated arena so that no extraneous allocation can confuse the situation. Speed up time during the latter part of the test so that the entire decay time can expire in a reasonable amount of wall time.
* Add atomic types for ssize_t | David Goldblatt | 2017-03-07 | 1 | -0/+8
* Disentangle assert and util | David Goldblatt | 2017-03-06 | 3 | -52/+61
  This is the first header refactoring diff, #533. It splits the assert and util components into separate, hermetic, header files. In the process, it splits out two of the large sub-components of util (the stdio.h replacement, and bit manipulation routines) into their own components (malloc_io.h and bit_util.h). This is mostly to break up cyclic dependencies, but it also breaks off a good chunk of the catch-all-ness of util, which is nice.
* Introduce a backport of C11 atomics | David Goldblatt | 2017-03-03 | 1 | -71/+227
  This introduces a backport of C11 atomics. It has four implementations; ranked in order of preference, they are:
  - GCC/Clang __atomic builtins
  - GCC/Clang __sync builtins
  - MSVC _Interlocked builtins
  - C11 atomics, from <stdatomic.h>
  The primary advantages are:
  - Close adherence to the standard API gives us a defined memory model.
  - Type safety: atomic objects are now separate types from non-atomic ones, so that it's impossible to mix up atomic and non-atomic updates (which is undefined behavior that compilers are starting to take advantage of).
  - Efficiency: we can specify ordering for operations, avoiding fences and atomic operations on strongly ordered architectures (example: `atomic_write_u32(ptr, val);` involves a CAS loop, whereas `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
  This diff leaves in the current atomics API (implementing them in terms of the backport). This lets us transition uses over piecemeal.
  Testing: This is by nature hard to test. I've manually tested the first three options on Linux on gcc by futzing with the #defines manually, on freebsd with gcc and clang, on MSVC, and on OS X with clang. All of these were x86 machines though, and we don't have any test infrastructure set up for non-x86 platforms.
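  For comparison, the plain C11 form (the last backend in the list above) whose style of explicitly ordered operations the backport mirrors:

      #include <stdatomic.h>
      #include <stdint.h>

      static _Atomic uint32_t flag;

      static void publish(uint32_t v) {
          /* Release store: no CAS loop, just ordering where it is needed. */
          atomic_store_explicit(&flag, v, memory_order_release);
      }

      static uint32_t observe(void) {
          return atomic_load_explicit(&flag, memory_order_acquire);
      }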
* Immediately purge cached extents if decay_time is 0. | Jason Evans | 2017-03-03 | 1 | -1/+104
  This fixes a regression caused by 54269dc0ed3e4d04b2539016431de3cfe8330719 (Remove obsolete arena_maybe_purge() call.), as well as providing a general fix. This resolves #665.
* Use MALLOC_CONF rather than malloc_conf for tests. | Jason Evans | 2017-02-23 | 35 | -80/+119
  malloc_conf does not reliably work with MSVC, which complains of "inconsistent dll linkage", i.e. its inability to support the application overriding malloc_conf when dynamically linking/loading. Work around this limitation by adding test harness support for per test shell script sourcing, and converting all tests to use MALLOC_CONF instead of malloc_conf.
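  The two mechanisms involved, for reference (the option string and test binary name are arbitrary examples):

      /* Compile-time override: fine when linked statically, but MSVC reports
       * "inconsistent dll linkage" when jemalloc is built as a DLL. */
      const char *malloc_conf = "narenas:1";

      /* The tests now use the environment variable instead, e.g.
       *   MALLOC_CONF=narenas:1 ./a_test_binary
       * which works regardless of how jemalloc is linked. */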
* Enhance spin_adaptive() to yield after several iterations. | Jason Evans | 2017-02-09 | 1 | -0/+16
  This avoids worst case behavior if e.g. another thread is preempted while owning the resource the spinning thread is waiting for.
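  A hypothetical sketch of the adaptive-spin pattern (names and thresholds are illustrative, not jemalloc's actual values):

      #include <sched.h>

      typedef struct { unsigned iteration; } spin_state_t;

      static void spin_adaptive_sketch(spin_state_t *s) {
          if (s->iteration < 5) {
              /* Busy-wait briefly, doubling the wait each call. */
              for (volatile unsigned i = 0; i < (1u << s->iteration); i++) {
              }
              s->iteration++;
          } else {
              /* Give up the CPU so a preempted lock owner can make progress. */
              sched_yield();
          }
      }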
* Remove rtree support for 0 (NULL) keys. | Jason Evans | 2017-02-09 | 1 | -5/+7
  NULL can never actually be inserted in practice, and removing support allows a branch to be removed from the fast path.
* Determine rtree levels at compile time. | Jason Evans | 2017-02-09 | 1 | -138/+103
  Rather than dynamically building a table to aid per level computations, define a constant table at compile time. Omit both high and low insignificant bits. Use one to three tree levels, depending on the number of significant bits.
* Conditionalize lg_tcache_max use on JEMALLOC_TCACHE. | Jason Evans | 2017-02-07 | 1 | -1/+5
* Disentangle arena and extent locking. | Jason Evans | 2017-02-02 | 2 | -3/+3
  Refactor arena and extent locking protocols such that arena and extent locks are never held when calling into the extent_*_wrapper() API. This requires extra care during purging since the arena lock no longer protects the inner purging logic. It also requires extra care to protect extents from being merged with adjacent extents.
  Convert extent_t's 'active' flag to an enumerated 'state', so that retained extents are explicitly marked as such, rather than depending on ring linkage state.
  Refactor the extent collections (and their synchronization) for cached and retained extents into extents_t. Incorporate LRU functionality to support purging. Incorporate page count accounting, which replaces arena->ndirty and arena->stats.retained.
  Assert that no core locks are held when entering any internal [de]allocation functions. This is in addition to existing assertions that no locks are held when entering external [de]allocation functions.
  Audit and document synchronization protocols for all arena_t fields.
  This fixes a potential deadlock due to recursive allocation during gdump, in a similar fashion to b49c649bc18fff4bd10a1c8adbaf1f25f6453cb6 (Fix lock order reversal during gdump.), but with a necessarily much broader code impact.
* Add witness_assert_depth[_to_rank](). | Jason Evans | 2017-02-02 | 1 | -12/+39
  This makes it possible to make lock state assertions about precisely which locks are held.
* Silence harmless warnings discovered via run_tests.sh. | Jason Evans | 2017-02-01 | 1 | -2/+5
* Replace tabs following #define with spaces. | Jason Evans | 2017-01-21 | 47 | -249/+249
  This resolves #564.
* Remove extraneous parens around return arguments. | Jason Evans | 2017-01-21 | 63 | -241/+241
  This resolves #540.
* Update brace style. | Jason Evans | 2017-01-21 | 69 | -1062/+717
  Add braces around single-line blocks, and remove line breaks before function-opening braces. This resolves #537.
* Fix --disable-stats support. | Jason Evans | 2017-01-20 | 1 | -12/+20
  Fix numerous regressions that were exposed by --disable-stats, both in the core library and in the tests.
* Test JSON output of malloc_stats_print() and fix bugs. | Jason Evans | 2017-01-19 | 1 | -0/+1006
  Implement and test a JSON validation parser. Use the parser to validate JSON output from malloc_stats_print(), with a significant subset of supported output options. This resolves #551.
* Fix prof_realloc() regression. | Jason Evans | 2017-01-17 | 1 | -0/+57
  Mostly revert the prof_realloc() changes in 498856f44a30b31fe713a18eb2fc7c6ecf3a9f63 (Move slabs out of chunks.) so that prof_free_sampled_object() is called when appropriate. Leave the prof_tctx_[re]set() optimization in place, but add an assertion to verify that all eight cases are correctly handled. Add a comment to make clear the code ordering, so that the regression originally fixed by ea8d97b8978a0c0423f0ed64332463a25b787c3d (Fix prof_{malloc,free}_sample_object() call order in prof_realloc().) is not repeated. This resolves #499.
* Add nullptr support to sized delete operators. | Jason Evans | 2017-01-17 | 1 | -0/+10
* Remove leading blank lines from function bodies. | Jason Evans | 2017-01-13 | 62 | -154/+0
  This resolves #535.
* Break up headers into constituent parts | David Goldblatt | 2017-01-12 | 1 | -10/+6
  This is part of a broader change to make header files better represent the dependencies between one another (see https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers into smaller parts that can be made to have a simpler dependency graph. For the autogenerated headers (smoothstep.h and size_classes.h), no splitting was necessary, so I didn't add support to emit multiple headers.
* Implement arena.<i>.destroy. | Jason Evans | 2017-01-07 | 3 | -29/+240
  Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL. This resolves #382.
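  A usage sketch tying this to the arenas.create mallctl from the entries below: create a throwaway arena, then destroy it by index (error handling omitted for brevity):

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      static void create_then_destroy_arena(void) {
          unsigned ind;
          size_t sz = sizeof(ind);
          /* Reading arenas.create creates a new arena and returns its index. */
          mallctl("arenas.create", &ind, &sz, NULL, 0);

          char cmd[64];
          snprintf(cmd, sizeof(cmd), "arena.%u.destroy", ind);
          mallctl(cmd, NULL, NULL, NULL, 0);
      }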
* Refactor test extent hook code to be reusable. | Jason Evans | 2017-01-07 | 3 | -348/+366
  Move test extent hook code from the extent integration test into a header, and normalize the out-of-band controls and introspection. Also refactor the base unit test to use the header.
* Replace the arenas.initialized mallctl with arena.<i>.initialized. | Jason Evans | 2017-01-07 | 1 | -18/+31
* Rename the arenas.extend mallctl to arenas.create. | Jason Evans | 2017-01-07 | 6 | -13/+13
* Add MALLCTL_ARENAS_ALL. | Jason Evans | 2017-01-07 | 1 | -0/+8
  Add the MALLCTL_ARENAS_ALL cpp macro as a fixed index for use in accessing the arena.<i>.{purge,decay,dss} and stats.arenas.<i>.* mallctls, and deprecate access via the arenas.narenas index (to be removed in 6.0.0).
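  A sketch of the intended use of the new macro, e.g. purging every arena at once (assumes a build where MALLCTL_ARENAS_ALL is exposed by <jemalloc/jemalloc.h>):

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      static void purge_all_arenas(void) {
          char cmd[64];
          /* The ALL sentinel substitutes for a concrete arena index. */
          snprintf(cmd, sizeof(cmd), "arena.%u.purge", (unsigned)MALLCTL_ARENAS_ALL);
          mallctl(cmd, NULL, NULL, NULL, 0);
      }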