path: root/src/rtree.c
* Header refactoring: unify and de-catchall mutex module (David Goldblatt, 2017-05-24; 1 file, -0/+1)
* Protect the rtree/extent interactions with a mutex pool. (David Goldblatt, 2017-05-19; 1 file, -115/+0)
  Instead of embedding a lock bit in rtree leaf elements, we associate
  extents with a small set of mutexes. This gets us two things:

  - We can use the system mutexes. This (hypothetically) protects us
    from priority inversion, and lets us stop doing a backoff/sleep
    loop, instead opting for precise wakeups from the mutex.
  - It cuts down on the number of mutex acquisitions we have to do
    (from four in the worst case to two).

  We end up simplifying most of the rtree code (which no longer has to
  deal with locking or concurrency at all), at the cost of additional
  complexity in the extent code: since the mutex protecting an rtree
  leaf element is determined by reading the extent out of that element,
  the initial read is racy, so we may acquire an out-of-date mutex. We
  re-check the extent in the leaf after acquiring the mutex to protect
  against this race.
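  A minimal sketch of the lock-then-re-check pattern described above;
  the pool size and the names (mutex_pool_get, extent_lock_from_elm)
  are illustrative assumptions, not jemalloc's actual API:

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdint.h>

      #define MUTEX_POOL_SIZE 256     /* assumed pool size */

      static pthread_mutex_t mutex_pool[MUTEX_POOL_SIZE];

      /* Hash an extent pointer to one mutex in the pool. */
      static pthread_mutex_t *
      mutex_pool_get(void *extent) {
          uintptr_t h = (uintptr_t)extent;
          h ^= h >> 16;
          return &mutex_pool[h % MUTEX_POOL_SIZE];
      }

      /* Lock the mutex guarding the extent currently stored in *elm.
       * The initial read is racy, so re-check after acquiring. */
      static void *
      extent_lock_from_elm(_Atomic(void *) *elm) {
          for (;;) {
              void *extent = atomic_load_explicit(elm,
                  memory_order_acquire);
              pthread_mutex_t *mtx = mutex_pool_get(extent);
              pthread_mutex_lock(mtx);
              /* *elm may have changed between the read and the lock,
               * in which case we hold an out-of-date mutex: retry. */
              if (atomic_load_explicit(elm, memory_order_acquire)
                  == extent) {
                  return extent;      /* caller unlocks mtx */
              }
              pthread_mutex_unlock(mtx);
          }
      }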
* Allow mutexes to take a lock ordering enum at construction. (David Goldblatt, 2017-05-19; 1 file, -1/+2)
  This lets us specify whether and how mutexes of the same rank are
  allowed to be acquired. Currently, we only allow two policies (only a
  single mutex at a given rank at a time, and mutexes acquired in
  ascending order), but we can plausibly allow more (e.g. "release
  uncontended mutexes before blocking").
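  A sketch of what such an ordering enum can look like; the names are
  illustrative, not jemalloc's actual malloc_mutex API:

      #include <pthread.h>

      typedef enum {
          /* Only one mutex of a given rank may be held at a time. */
          LOCK_ORDER_EXCLUSIVE,
          /* Same-rank mutexes may be held together if acquired in
           * ascending address order. */
          LOCK_ORDER_ASCENDING
      } lock_order_t;

      typedef struct {
          pthread_mutex_t mtx;
          unsigned rank;          /* position in the lock hierarchy */
          lock_order_t order;     /* same-rank acquisition policy */
      } ordered_mutex_t;

      static int
      ordered_mutex_init(ordered_mutex_t *m, unsigned rank,
          lock_order_t order) {
          m->rank = rank;
          m->order = order;
          return pthread_mutex_init(&m->mtx, NULL);
      }

  A debug build can then assert at lock time that the policy of any
  already-held same-rank mutex is respected.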
* Stop depending on JEMALLOC_N() for function interception during testing. (Jason Evans, 2017-05-12; 1 file, -42/+12)
  Instead, always define function pointers for interceptable functions,
  but mark them const unless testing, so that the compiler can optimize
  out the pointer dereferences.
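  A sketch of the const-unless-testing pattern; the macro and the
  surrounding names are made up for illustration:

      #include <stddef.h>

      #ifdef TESTING
      #  define JET_MUTABLE           /* tests may reassign */
      #else
      #  define JET_MUTABLE const     /* production: constant-folded */
      #endif

      static void *
      rtree_node_alloc_impl(size_t nbytes) {
          (void)nbytes;
          return NULL;                /* stand-in body */
      }

      /* Always a function pointer, but const outside of tests, so the
       * compiler can optimize the indirection away entirely. */
      void *(* JET_MUTABLE rtree_node_alloc)(size_t) =
          rtree_node_alloc_impl;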
* Header refactoring: tsd - cleanup and dependency breaking. (David Goldblatt, 2017-05-01; 1 file, -7/+0)
  This removes the tsd macros (which are used only for tsd_t in real
  builds). We break up the circular dependencies involving tsd.

  We also move all tsd access through getters and setters. This allows
  us to assert that we only touch data when tsd is in a valid state.

  We simplify the usages of the x-macro trick, removing all the
  customizability (get/set, init, cleanup), moving the lifetime logic
  to tsd_init and tsd_cleanup. This lets us make initialization order
  independent of order within tsd_t.
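  A minimal sketch of the simplified x-macro pattern, with hypothetical
  members; one list drives both the struct layout and the
  getters/setters, so they cannot drift apart:

      /* O(name, type) is expanded once per use site. */
      #define MALLOC_TSD          \
          O(thread_arena, int)    \
          O(rtree_ctx, long)

      typedef struct {
      #define O(name, type) type name;
          MALLOC_TSD
      #undef O
      } tsd_t;

      #define O(name, type)                                       \
          static inline type tsd_##name##_get(tsd_t *tsd) {       \
              return tsd->name;                                   \
          }                                                       \
          static inline void tsd_##name##_set(tsd_t *tsd,         \
              type val) {                                         \
              tsd->name = val;                                    \
          }
      MALLOC_TSD
      #undef O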
* Header refactoring: move assert.h out of the catch-all (David Goldblatt, 2017-04-19; 1 file, -0/+2)
* Improve rtree cache with a two-level cache design. (Qi Wang, 2017-04-17; 1 file, -6/+31)
  Two levels of cache are implemented: a direct-mapped cache as L1,
  combined with an LRU cache as L2. The L1 cache offers low cost on a
  cache hit, but can suffer collisions under some circumstances. This
  is complemented by the L2 LRU cache, which is slower on cache access
  (overhead from linear search + reordering), but handles L1 collisions
  rather well.
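  A sketch of the two-level lookup scheme; the sizes and names are
  illustrative, not jemalloc's actual configuration:

      #include <stddef.h>
      #include <stdint.h>

      #define L1_SIZE 16              /* direct-mapped */
      #define L2_SIZE 8               /* linear search, LRU-ordered */

      typedef struct {
          uintptr_t key;              /* high bits of the address */
          void *leaf;                 /* cached rtree leaf node */
      } cache_elm_t;

      typedef struct {
          cache_elm_t l1[L1_SIZE];
          cache_elm_t l2[L2_SIZE];
      } rtree_cache_t;

      static void *
      cache_lookup(rtree_cache_t *c, uintptr_t key) {
          /* L1: one index, one compare; cheap but collision-prone. */
          cache_elm_t *e1 = &c->l1[key % L1_SIZE];
          if (e1->key == key) {
              return e1->leaf;
          }
          /* L2: linear search; on a hit, promote the entry to L1 and
           * demote the displaced L1 entry into the vacated slot. */
          for (size_t i = 0; i < L2_SIZE; i++) {
              if (c->l2[i].key == key) {
                  cache_elm_t hit = c->l2[i];
                  c->l2[i] = *e1;
                  *e1 = hit;
                  return hit.leaf;
              }
          }
          return NULL;  /* miss: walk the tree and refill the cache */
      }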
* Header refactoring: Split up jemalloc_internal.h (David Goldblatt, 2017-04-11; 1 file, -1/+2)
  This is a biggy. jemalloc_internal.h has been doing multiple jobs for
  a while now:

  - The source of system-wide definitions.
  - The catch-all include file.
  - The module header file for jemalloc.c

  This commit splits up this functionality. The system-wide definitions
  responsibility has moved to jemalloc_preamble.h. The catch-all
  include file is now jemalloc_internal_includes.h. The module headers
  for jemalloc.c are now in jemalloc_internal_[externs|inlines|types].h,
  just as they are for the other modules.
* Make the tsd member init functions take tsd_t * type. (Qi Wang, 2017-04-04; 1 file, -1/+6)
* Add init function support to tsd members. (Qi Wang, 2017-04-04; 1 file, -0/+11)
  This will facilitate embedding tcache into tsd, which requires
  initialization that cannot be done via the static initializer. Make
  tsd->rtree_ctx be initialized via rtree_ctx_data_init().
* Embed root node into rtree_t. (Jason Evans, 2017-03-23; 1 file, -70/+21)
  This avoids one atomic operation per tree access.
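  A sketch of the shape of this change, with assumed names and an
  illustrative root fanout; before, the root was reached through an
  atomically published pointer:

      #include <stdatomic.h>

      #define RTREE_ROOT_BITS 4       /* illustrative root fanout */

      typedef struct {
          _Atomic(void *) child;      /* stand-in element payload */
      } rtree_node_elm_t;

      /* Before: one extra atomic load to reach the root array. */
      typedef struct {
          _Atomic(rtree_node_elm_t *) root;
      } rtree_old_t;

      /* After: the root level is embedded in the tree struct itself,
       * so lookups index into it directly. */
      typedef struct {
          rtree_node_elm_t root[1 << RTREE_ROOT_BITS];
      } rtree_t;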
* Split rtree_elm_t into rtree_{node,leaf}_elm_t. (Jason Evans, 2017-03-23; 1 file, -105/+248)
  This allows leaf elements to differ in size from internal node
  elements.

  In principle it would be more correct to use a different type for
  each level of the tree, but due to implementation details related to
  atomic operations, we use casts anyway, thus counteracting the value
  of additional type correctness. Furthermore, such a scheme would
  require function code generation (via cpp macros), as well as either
  unwieldy type names for leaves or type aliases, e.g.

      typedef struct rtree_elm_d2_s rtree_leaf_elm_t;

  This alternate strategy would be more correct, and with less code
  duplication, but probably not worth the complexity.
* Convert rtree code to use C11 atomics (David Goldblatt, 2017-03-13; 1 file, -16/+34)
  In the process, I changed the implementation of rtree_elm_acquire so
  that it won't even try to CAS if its initial read (getting the extent
  + lock bit) indicates that the CAS is doomed to fail. This can
  significantly improve performance under contention.
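  A sketch of the "don't CAS when doomed" optimization described above,
  using the low bit of the element word as the lock bit; the names are
  illustrative:

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      static bool
      rtree_elm_try_acquire(_Atomic(uintptr_t) *elm) {
          uintptr_t cur = atomic_load_explicit(elm,
              memory_order_relaxed);
          if (cur & 1) {
              /* Already locked: a CAS would fail anyway, so skip the
               * expensive read-for-ownership and let the caller
               * retry. */
              return false;
          }
          /* Set the lock bit; fails if another thread raced us. */
          return atomic_compare_exchange_weak_explicit(elm, &cur,
              cur | 1, memory_order_acquire, memory_order_relaxed);
      }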
* Remove rtree support for 0 (NULL) keys. (Jason Evans, 2017-02-09; 1 file, -10/+8)
  NULL can never actually be inserted in practice, and removing support
  allows a branch to be removed from the fast path.
* Determine rtree levels at compile time. (Jason Evans, 2017-02-09; 1 file, -109/+27)
  Rather than dynamically building a table to aid per-level
  computations, define a constant table at compile time. Omit both high
  and low insignificant bits. Use one to three tree levels, depending
  on the number of significant bits.
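  A sketch of how such a compile-time table can be derived; the
  constants below assume 48 significant virtual-address bits and 4 KiB
  pages, and the names are illustrative:

      /* 48 address bits minus 12 insignificant low (page) bits. */
      #define RTREE_NSB            36
      #define RTREE_BITS_PER_LEVEL 12
      #define RTREE_HEIGHT \
          ((RTREE_NSB + RTREE_BITS_PER_LEVEL - 1) / \
          RTREE_BITS_PER_LEVEL)

      typedef struct {
          unsigned bits;   /* key bits consumed by this level */
          unsigned shift;  /* shift that extracts this level's bits */
      } rtree_level_t;

      /* Constant at compile time: no runtime table construction. */
      static const rtree_level_t rtree_levels[RTREE_HEIGHT] = {
          {12, 36},        /* level 0: address bits 47..36 */
          {12, 24},        /* level 1: address bits 35..24 */
          {12, 12}         /* level 2 (leaf): address bits 23..12 */
      };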
* Remove rtree leading 0 bit optimization. (Jason Evans, 2017-02-09; 1 file, -53/+12)
  A subsequent change instead ignores insignificant high bits.
* Make non-essential inline rtree functions static functions. (Jason Evans, 2017-02-09; 1 file, -8/+69)
* Split rtree_elm_lookup_hard() out of rtree_elm_lookup(). (Jason Evans, 2017-02-09; 1 file, -0/+105)
  Anything but a hit in the first element of the lookup cache is
  expensive enough to negate the benefits of inlining.
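  A sketch of the inline-fast-path / out-of-line-slow-path split; the
  names and the one-element cache are illustrative:

      #include <stdint.h>

      typedef struct {
          uintptr_t key;
          void *elm;
      } rtree_ctx_t;

      /* Out of line: full tree walk plus cache maintenance. */
      void *rtree_elm_lookup_hard(rtree_ctx_t *ctx, uintptr_t key);

      /* Inlined: only the single compare that usually hits, so the
       * code expanded at each call site stays tiny. */
      static inline void *
      rtree_elm_lookup(rtree_ctx_t *ctx, uintptr_t key) {
          if (ctx->key == key) {
              return ctx->elm;      /* cache hit: the common case */
          }
          return rtree_elm_lookup_hard(ctx, key);
      }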
* Replace tabs following #define with spaces. (Jason Evans, 2017-01-21; 1 file, -5/+5)
  This resolves #564.
* Remove extraneous parens around return arguments. (Jason Evans, 2017-01-21; 1 file, -11/+11)
  This resolves #540.
* Update brace style. (Jason Evans, 2017-01-21; 1 file, -39/+31)
  Add braces around single-line blocks, and remove line breaks before
  function-opening braces.

  This resolves #537.
* Remove leading blank lines from function bodies. (Jason Evans, 2017-01-13; 1 file, -6/+0)
  This resolves #535.
* Implement per arena base allocators. (Jason Evans, 2016-12-27; 1 file, -1/+2)
  Add/rename related mallctls:

  - Add stats.arenas.<i>.base .
  - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
  - Add stats.arenas.<i>.resident .

  Modify the arenas.extend mallctl to take an optional
  (extent_hooks_t *) argument so that it is possible for all base
  allocations to be serviced by the specified extent hooks.

  This resolves #463.
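  For example, reading one of these statistics through the public
  mallctl() interface (arena index 0 is chosen arbitrarily; statistics
  require a jemalloc built with --enable-stats):

      #include <stdint.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void) {
          /* Refresh jemalloc's stats snapshot before reading. */
          uint64_t epoch = 1;
          size_t sz = sizeof(epoch);
          mallctl("epoch", &epoch, &sz, &epoch, sz);

          size_t base;
          sz = sizeof(base);
          if (mallctl("stats.arenas.0.base", &base, &sz, NULL, 0)
              == 0) {
              printf("arena 0 base allocated: %zu bytes\n", base);
          }
          return 0;
      }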
* Fix long spinning in rtree_node_init (Dave Watson, 2016-11-03; 1 file, -14/+9)
  rtree_node_init spinlocks the node, allocates, and then sets the
  node. This is under heavy contention at the top of the tree if many
  threads start to allocate at the same time.

  Instead, take a per-rtree sleeping mutex to reduce spinning. Tested
  both pthreads and OS X OSSpinLock, and both reduce spinning
  adequately.

  Previous benchmark time:
  ./ttest1 500 100   ~15s
  New benchmark time:
  ./ttest1 500 100   0.57s
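  A sketch of the fix described above: instead of spinning on a
  per-node lock bit while one thread allocates, contended threads block
  on a single per-tree sleeping mutex. Names are illustrative:

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdlib.h>

      typedef struct {
          pthread_mutex_t init_lock;  /* per-rtree sleeping mutex */
      } rtree_t;

      static void *
      rtree_node_init(rtree_t *rtree, _Atomic(void *) *elmp,
          size_t nbytes) {
          void *node = atomic_load_explicit(elmp,
              memory_order_acquire);
          if (node != NULL) {
              return node;            /* already initialized */
          }
          pthread_mutex_lock(&rtree->init_lock);
          /* Re-check: another thread may have initialized the node
           * while we were asleep on the mutex. */
          node = atomic_load_explicit(elmp, memory_order_acquire);
          if (node == NULL) {
              node = calloc(1, nbytes);
              atomic_store_explicit(elmp, node,
                  memory_order_release);
          }
          pthread_mutex_unlock(&rtree->init_lock);
          return node;
      }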
* Fix over-sized allocation of rtree leaf nodes. (Jason Evans, 2016-10-28; 1 file, -1/+1)
  Use the correct level metadata when allocating child nodes so that
  leaf nodes don't end up over-sized (2^16 elements vs 2^4 elements).
* Add/use adaptive spinning. (Jason Evans, 2016-10-13; 1 file, -1/+4)
  Add spin_t and spin_{init,adaptive}(), which provide a simple
  abstraction for adaptive spinning.

  Adaptively spin during busy waits in bootstrapping and rtree node
  initialization.
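  A sketch of that abstraction; the iteration threshold and the amount
  of busy-work per round are illustrative:

      #include <sched.h>

      typedef struct {
          unsigned iteration;
      } spin_t;

      #define SPIN_INITIALIZER {0}

      static inline void
      spin_adaptive(spin_t *spin) {
          if (spin->iteration < 5) {
              /* Busy-wait, doubling the work each round. */
              volatile unsigned i;
              for (i = 0; i < (1U << spin->iteration); i++) {
                  /* spin */
              }
              spin->iteration++;
          } else {
              /* Past the threshold, yield rather than burn CPU. */
              sched_yield();
          }
      }

  A busy-wait loop then becomes: spin_t s = SPIN_INITIALIZER;
  while (!try_get_resource()) spin_adaptive(&s);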
* Add rtree lookup path caching. (Jason Evans, 2016-06-06; 1 file, -1/+2)
  rtree-based extent lookups remain more expensive than chunk-based run
  lookups, but with this optimization the fast path slowdown is ~3 CPU
  cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11
  cycles prior. The path caching speedup tends to degrade gracefully
  unless allocated memory is spread far apart (as is the case when
  using a mixture of sbrk() and mmap()).
* Make tsd cleanup functions optional, remove noop cleanup functions. (Jason Evans, 2016-06-06; 1 file, -7/+0)
* Add rtree element witnesses. (Jason Evans, 2016-06-03; 1 file, -0/+123)
* Refactor rtree to always use base_alloc() for node allocation. (Jason Evans, 2016-06-03; 1 file, -16/+55)
* Add element acquire/release capabilities to rtree. (Jason Evans, 2016-06-03; 1 file, -10/+13)
  This makes it possible to acquire short-term "ownership" of rtree
  elements so that it is possible to read an extent pointer *and* read
  the extent's contents with a guarantee that the element will not be
  modified until the ownership is released. This is intended as a
  mechanism for resolving rtree read/write races rather than as a way
  to lock extents.
* Optimize rtree_get(). (Jason Evans, 2016-03-23; 1 file, -0/+2)
  Specialize fast path to avoid code that cannot execute for dependent
  loads. Manually unroll.
* Fix unsigned comparison underflow. (Jason Evans, 2015-03-12; 1 file, -1/+1)
  These bugs only affected tests and debug builds.
* Refactor rtree to be lock-free. (Jason Evans, 2015-02-05; 1 file, -59/+79)
  Recent huge allocation refactoring associates huge allocations with
  arenas, but it remains necessary to quickly look up huge allocation
  metadata during reallocation/deallocation. A global radix tree
  remains a good solution to this problem, but locking would have
  become the primary bottleneck after (upcoming) migration of chunk
  management from global to per-arena data structures.

  This lock-free implementation uses double-checked reads to traverse
  the tree, so that in the steady state, each read or write requires
  only a single atomic operation.

  This implementation also assures that no more than two tree levels
  actually exist, through a combination of careful virtual memory
  allocation which makes large sparse nodes cheap, and skipping the
  root node on x64 (possible because the top 16 bits are all 0 in
  practice).
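  A sketch of the read side of such a lock-free traversal; the fanout
  and key slicing are illustrative, not the actual layout:

      #include <stdatomic.h>
      #include <stddef.h>
      #include <stdint.h>

      typedef struct rtree_node_s {
          _Atomic(struct rtree_node_s *) children[1 << 12];
      } rtree_node_t;

      static void *
      rtree_lookup(rtree_node_t *root, uintptr_t key) {
          /* One acquire load per level; a writer publishes a new
           * subtree with a matching release store (or CAS), so any
           * non-NULL child observed here is fully initialized. */
          rtree_node_t *child = atomic_load_explicit(
              &root->children[(key >> 24) & 0xfff],
              memory_order_acquire);
          if (child == NULL) {
              return NULL;        /* subtree not created yet */
          }
          return atomic_load_explicit(
              &child->children[(key >> 12) & 0xfff],
              memory_order_acquire);
      }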
* Convert all tsd variables to reside in a single tsd structure. (Jason Evans, 2014-09-23; 1 file, -2/+4)
* Try to use __builtin_ffsl if ffsl is unavailable. (Richard Diamond, 2014-06-02; 1 file, -2/+2)
  Some platforms (like those using Newlib) don't have ffs/ffsl. This
  commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
  found. __builtin_ffsl performs the same function as ffsl, and has the
  added benefit of being available on any platform with a
  GCC-compatible compiler.

  This change does not address the use of ffs in the MALLOCX_ARENA()
  macro.
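  A sketch of the fallback shim; HAVE_FFSL stands in for whatever macro
  the configure check actually defines:

      #ifdef HAVE_FFSL
      #include <strings.h>    /* ffsl() provided by the platform */
      #else
      /* Same contract as ffsl(): one plus the index of the least
       * significant set bit, or 0 if no bits are set. */
      static inline int
      ffsl(long bitmap) {
          return __builtin_ffsl(bitmap);
      }
      #endif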
* Convert rtree from (void *) to (uint8_t) storage. (Jason Evans, 2014-01-03; 1 file, -16/+25)
  Reduce rtree memory usage by storing booleans (1 byte each) rather
  than pointers. The rtree code is only used to record whether jemalloc
  manages a chunk of memory, so there's no need to store pointers in
  the rtree.

  Increase rtree node size to 64 KiB in order to reduce tree depth from
  13 to 3 on 64-bit systems. The conversion to more compact leaf nodes
  was enough by itself to make the rtree depth 1 on 32-bit systems; due
  to the fact that root nodes are smaller than the specified node size
  if possible, the node size change has no impact on 32-bit systems
  (assuming default chunk size).
* Add rtree unit tests. (Jason Evans, 2014-01-03; 1 file, -3/+32)
* Fix fork(2)-related deadlocks. (Jason Evans, 2012-10-09; 1 file, -0/+21)
  Add a library constructor for jemalloc that initializes the
  allocator. This fixes a race that could occur if threads were created
  by the main thread prior to any memory allocation, followed by
  fork(2), and then memory allocation in the child process.

  Fix the prefork/postfork functions to acquire/release the ctl, prof,
  and rtree mutexes. This fixes various fork() child process deadlocks,
  but one possible deadlock remains (intentionally) unaddressed: prof
  backtracing can acquire runtime library mutexes, so deadlock is still
  possible if heap profiling is enabled during fork(). This deadlock is
  known to be a real issue in at least the case of libgcc-based
  backtracing.

  Reported by tfengjun.
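  A sketch of the constructor-plus-atfork pattern described above; the
  two mutexes stand in for the real ctl, prof, and rtree locks, and a
  full implementation would reinitialize locks in the child rather than
  just unlock them:

      #include <pthread.h>

      static pthread_mutex_t ctl_mtx = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t rtree_mtx = PTHREAD_MUTEX_INITIALIZER;

      /* Hold every lock across fork() so the child never inherits a
       * mutex held by a thread that does not exist in the child. */
      static void
      prefork(void) {
          pthread_mutex_lock(&ctl_mtx);
          pthread_mutex_lock(&rtree_mtx);
      }

      static void
      postfork_parent(void) {
          pthread_mutex_unlock(&rtree_mtx);
          pthread_mutex_unlock(&ctl_mtx);
      }

      static void
      postfork_child(void) {
          pthread_mutex_unlock(&rtree_mtx);
          pthread_mutex_unlock(&ctl_mtx);
      }

      /* Library constructor: registers the handlers before the
       * application can create threads or call fork(). */
      __attribute__((constructor))
      static void
      jemalloc_constructor(void) {
          pthread_atfork(prefork, postfork_parent, postfork_child);
      }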
* Move repo contents in jemalloc/ to top level. (Jason Evans, 2011-04-01; 1 file, -0/+46)