path: root/test
Each entry: commit subject (author, date; files changed, lines -removed/+added)
* Fix a compilation error. (Jason Evans, 2015-07-22; 1 file, -8/+10)
    This regression was introduced by
    1b0e4abbfdbcc1c1a71d1f617adb19951109bfce (Port mq_get() to MinGW.).
* Add JEMALLOC_FORMAT_PRINTF(). (Jason Evans, 2015-07-22; 2 files, -4/+4)
    Replace JEMALLOC_ATTR(format(printf, ...)) with JEMALLOC_FORMAT_PRINTF(),
    so that configuration feature tests can omit the attribute if it would
    cause extraneous compilation warnings.
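    A minimal sketch of how such a macro pair is typically defined (the
    exact definitions are generated by configure; names other than
    JEMALLOC_ATTR and JEMALLOC_FORMAT_PRINTF are assumptions):

        /* Sketch: configure defines the HAVE_* macros after its feature
         * tests; when absent, the attribute expands to nothing. */
        #ifdef JEMALLOC_HAVE_ATTR
        #  define JEMALLOC_ATTR(s) __attribute__((s))
        #else
        #  define JEMALLOC_ATTR(s)
        #endif

        #ifdef JEMALLOC_HAVE_ATTR_FORMAT_PRINTF
        #  define JEMALLOC_FORMAT_PRINTF(s, i) \
               JEMALLOC_ATTR(format(printf, s, i))
        #else
        #  define JEMALLOC_FORMAT_PRINTF(s, i)
        #endif

        /* Usage: the compiler checks format strings against the varargs. */
        void malloc_printf(const char *format, ...)
            JEMALLOC_FORMAT_PRINTF(1, 2);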
* Port mq_get() to MinGW. (Jason Evans, 2015-07-21; 2 files, -10/+36)
* Fix more MinGW build warnings. (Jason Evans, 2015-07-18; 4 files, -43/+46)
* Add the config.cache_oblivious mallctl. (Jason Evans, 2015-07-17; 1 file, -0/+1)
* Add timer support for Windows. (Jason Evans, 2015-07-13; 2 files, -10/+24)
* Avoid function prototype incompatibilities. (Jason Evans, 2015-07-10; 2 files, -5/+5)
    Add various function attributes to the exported functions to give the
    compiler more information to work with during optimization, and also
    specify throw() when compiling with C++ on Linux, in order to adequately
    match what __THROW does in glibc.

    This resolves #237.
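    Roughly the shape this gives exported declarations (a sketch; the
    attribute list and the JEMALLOC_NOTHROW name are illustrative, not
    jemalloc's exact spelling):

        #include <stddef.h>

        /* throw() matches glibc's __THROW when the header is consumed by
         * C++; in C it expands to nothing. */
        #ifdef __cplusplus
        #  define JEMALLOC_NOTHROW throw()
        #else
        #  define JEMALLOC_NOTHROW
        #endif

        void *je_malloc(size_t size) JEMALLOC_NOTHROW
            __attribute__((malloc, alloc_size(1)));
        void je_free(void *ptr) JEMALLOC_NOTHROW;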
* Fix an integer overflow bug in {size2index,s2u}_compute(). (Jason Evans, 2015-07-10; 1 file, -0/+89)
    This {bug,regression} was introduced by
    155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.).

    This resolves #241.
* Fix indentation. (Jason Evans, 2015-07-09; 1 file, -1/+1)
* Fix size class overflow handling when profiling is enabled. (Jason Evans, 2015-06-24; 3 files, -5/+59)
    Fix size class overflow handling for malloc(), posix_memalign(),
    memalign(), calloc(), and realloc() when profiling is enabled.

    Remove an assertion that erroneously caused arena_sdalloc() to fail
    when profiling was enabled.

    This resolves #232.
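    The flavor of case the fixed paths must now handle (a hypothetical
    sketch, not the commit's actual test code):

        #include <assert.h>
        #include <stdint.h>
        #include <stdlib.h>

        int main(void) {
            /* Near-SIZE_MAX requests must fail cleanly rather than letting
             * the internal size computation wrap to a small class. */
            void *p = malloc(SIZE_MAX);
            assert(p == NULL);
            assert(posix_memalign(&p, 64, SIZE_MAX - 63) != 0);
            return 0;
        }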
* Clarify relationship between stats.resident and stats.mapped. (Jason Evans, 2015-05-30; 1 file, -3/+7)
* Avoid atomic operations for dependent rtree reads. (Jason Evans, 2015-05-16; 1 file, -14/+14)
* Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). (Jason Evans, 2015-03-24; 1 file, -3/+3)
* Implement dynamic per arena control over dirty page purging. (Jason Evans, 2015-03-19; 3 files, -19/+123)
    Add mallctls:
    - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
      modified to change the initial lg_dirty_mult setting for newly
      created arenas.
    - arena.<i>.lg_dirty_mult controls an individual arena's dirty page
      purging threshold, and synchronously triggers any purging that may be
      necessary to maintain the constraint.
    - arena.<i>.chunk.purge allows the per arena dirty page purging
      function to be replaced.

    This resolves #93.
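    A minimal usage sketch of the new mallctls (error handling abbreviated):

        #include <sys/types.h>
        #include <stdio.h>
        #include <jemalloc/jemalloc.h>

        int main(void) {
            ssize_t lg_dirty_mult = 5;  /* purge beyond 1/32 dirty:active */

            /* Default for arenas created after this point. */
            if (mallctl("arenas.lg_dirty_mult", NULL, NULL, &lg_dirty_mult,
                sizeof(lg_dirty_mult)) != 0)
                fprintf(stderr, "arenas.lg_dirty_mult failed\n");

            /* Per arena threshold; synchronously purges if needed. */
            if (mallctl("arena.0.lg_dirty_mult", NULL, NULL, &lg_dirty_mult,
                sizeof(lg_dirty_mult)) != 0)
                fprintf(stderr, "arena.0.lg_dirty_mult failed\n");
            return 0;
        }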
* Use CLOCK_MONOTONIC in the timer if it's available. (Daniel Micay, 2015-03-13; 2 files, -0/+27)
    Linux sets _POSIX_MONOTONIC_CLOCK to 0 meaning it *might* be available,
    so a sysconf check is necessary at runtime with a fallback to the
    mandatory CLOCK_REALTIME clock.
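    A sketch of that runtime check (close to, but not verbatim from, the
    commit):

        #include <time.h>
        #include <unistd.h>

        static clockid_t
        timer_clock(void) {
        #if defined(_POSIX_MONOTONIC_CLOCK) && _POSIX_MONOTONIC_CLOCK > 0
            /* Unconditionally available. */
            return CLOCK_MONOTONIC;
        #elif defined(_POSIX_MONOTONIC_CLOCK)
            /* "0" means maybe: probe at runtime, else fall back. */
            return (sysconf(_SC_MONOTONIC_CLOCK) > 0) ?
                CLOCK_MONOTONIC : CLOCK_REALTIME;
        #else
            return CLOCK_REALTIME;
        #endif
        }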
* Remove obsolete (incorrect) assertions. (Jason Evans, 2015-02-16; 1 file, -21/+24)
    This regression was introduced by
    88fef7ceda6269598cef0cee8b984c8765673c27 (Refactor huge_*() calls into
    arena internals.), and went undetected because of the --enable-debug
    regression.
* Move centralized chunk management into arenas. (Jason Evans, 2015-02-12; 1 file, -27/+0)
    Migrate all centralized data structures related to huge allocations and
    recyclable chunks into arena_t, so that each arena can manage huge
    allocations and recyclable virtual memory completely independently of
    other arenas.

    Add chunk node caching to arenas, in order to avoid contention on the
    base allocator.

    Use chunks_rtree to look up huge allocations rather than a red-black
    tree. Maintain a per arena unsorted list of huge allocations (which
    will be needed to enumerate huge allocations during arena reset).

    Remove the --enable-ivsalloc option, make ivsalloc() always available,
    and use it for size queries if --enable-debug is enabled. The only
    practical implications to this removal are that 1) ivsalloc() is now
    always available during live debugging (and the underlying radix tree
    is available during core-based debugging), and 2) size query validation
    can no longer be enabled independent of --enable-debug.

    Remove the stats.chunks.{current,total,high} mallctls, and replace
    their underlying statistics with simpler atomically updated counters
    used exclusively for gdump triggering. These statistics are no longer
    very useful because each arena manages chunks independently, and per
    arena statistics provide similar information.

    Simplify chunk synchronization code, now that base chunk allocation
    cannot cause recursive lock acquisition.
* Test and fix tcache ID recycling. (Jason Evans, 2015-02-10; 1 file, -0/+12)
* Implement explicit tcache support. (Jason Evans, 2015-02-10; 1 file, -0/+110)
    Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be
    used in conjunction with the *allocx() API.

    Add the tcache.create, tcache.flush, and tcache.destroy mallctls.

    This resolves #145.
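    Example of the new API (a minimal sketch):

        #include <jemalloc/jemalloc.h>

        int main(void) {
            unsigned tc;
            size_t sz = sizeof(tc);

            /* Create an explicit cache, allocate through it, destroy it. */
            if (mallctl("tcache.create", &tc, &sz, NULL, 0) != 0)
                return 1;
            void *p = mallocx(64, MALLOCX_TCACHE(tc));
            if (p != NULL)
                dallocx(p, MALLOCX_TCACHE(tc));
            mallctl("tcache.destroy", NULL, NULL, &tc, sizeof(tc));
            return 0;
        }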
* Refactor rtree to be lock-free. (Jason Evans, 2015-02-05; 1 file, -25/+52)
    Recent huge allocation refactoring associates huge allocations with
    arenas, but it remains necessary to quickly look up huge allocation
    metadata during reallocation/deallocation. A global radix tree remains
    a good solution to this problem, but locking would have become the
    primary bottleneck after (upcoming) migration of chunk management from
    global to per arena data structures.

    This lock-free implementation uses double-checked reads to traverse
    the tree, so that in the steady state, each read or write requires
    only a single atomic operation.

    This implementation also assures that no more than two tree levels
    actually exist, through a combination of careful virtual memory
    allocation which makes large sparse nodes cheap, and skipping the root
    node on x64 (possible because the top 16 bits are all 0 in practice).
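    The double-checked read pattern, sketched with C11 atomics (names
    hypothetical; the real rtree code uses jemalloc's own atomics and
    differs in detail):

        #include <stdatomic.h>

        static void *
        rtree_child_read(_Atomic(void *) *slotp) {
            /* Steady state: one relaxed load suffices for dependent
             * reads. */
            void *child = atomic_load_explicit(slotp, memory_order_relaxed);
            if (child == NULL) {
                /* Slot may be mid-initialization; recheck with acquire. */
                child = atomic_load_explicit(slotp, memory_order_acquire);
            }
            return child;
        }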
* Implement more atomic operations. (Jason Evans, 2015-02-05; 1 file, -30/+55)
    - atomic_*_p().
    - atomic_cas_*().
    - atomic_write_*().
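    Rough C11 analogues of the new operations (jemalloc's versions are
    hand-rolled per platform; these signatures are illustrative):

        #include <stdatomic.h>
        #include <stdbool.h>

        /* atomic_cas_p(): compare-and-swap on a pointer slot. */
        static bool
        cas_p(_Atomic(void *) *p, void *expected, void *desired) {
            return atomic_compare_exchange_strong(p, &expected, desired);
        }

        /* atomic_write_p(): unconditional atomic store. */
        static void
        write_p(_Atomic(void *) *p, void *v) {
            atomic_store(p, v);
        }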
* Implement the prof.gdump mallctl. (Jason Evans, 2015-01-26; 1 file, -2/+27)
    This feature makes it possible to toggle the gdump feature on/off
    during program execution, whereas the opt.prof_gdump option can only
    be set during program startup.

    This resolves #72.
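    Toggling it at runtime looks like this (requires a build with
    --enable-prof):

        #include <stdbool.h>
        #include <jemalloc/jemalloc.h>

        int main(void) {
            bool enable = true, was;
            size_t sz = sizeof(was);

            /* Read the previous setting and enable gdump in one call. */
            if (mallctl("prof.gdump", &was, &sz, &enable,
                sizeof(enable)) != 0)
                return 1;  /* e.g. built without profiling */
            return 0;
        }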
* Introduce two new modes of junk filling: "alloc" and "free". (Guilherme Goncalves, 2014-12-15; 4 files, -16/+33)
    In addition to true/false, opt.junk can now be either "alloc" or
    "free", giving applications the possibility of junking memory only on
    allocation or deallocation.

    This resolves #172.
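    Since opt.junk is read at startup, an application selects a mode via
    the malloc_conf symbol (or the MALLOC_CONF environment variable), e.g.:

        /* Junk on allocation only; "junk:free" and "junk:true" also work.
         * MALLOC_CONF="junk:alloc" in the environment is equivalent. */
        const char *malloc_conf = "junk:alloc";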
* Style and spelling fixes. (Jason Evans, 2014-12-09; 3 files, -5/+3)
* Fix test_stats_arenas_bins for 32-bit builds. (Yuriy Kaminskiy, 2014-12-03; 1 file, -0/+1)
* Thwart compiler optimizations. (Jason Evans, 2014-10-15; 1 file, -0/+12)
* Add per size class huge allocation statistics. (Jason Evans, 2014-10-13; 2 files, -10/+114)
    Add per size class huge allocation statistics, and normalize various
    stats:
    - Change the arenas.nlruns type from size_t to unsigned.
    - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctls.
    - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with
      stats.arenas.<i>.bins.<j>.curregs.
    - Add the stats.arenas.<i>.hchunks.<j>.nmalloc,
      stats.arenas.<i>.hchunks.<j>.ndalloc,
      stats.arenas.<i>.hchunks.<j>.nrequests, and
      stats.arenas.<i>.hchunks.<j>.curhchunks mallctls.
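    Reading one of the new statistics (a minimal sketch; note the epoch
    bump that refreshes jemalloc's stats snapshot):

        #include <stdint.h>
        #include <stdio.h>
        #include <jemalloc/jemalloc.h>

        int main(void) {
            uint64_t epoch = 1;
            size_t curregs, sz = sizeof(curregs);

            /* Refresh the stats snapshot before reading. */
            mallctl("epoch", NULL, NULL, &epoch, sizeof(epoch));
            if (mallctl("stats.arenas.0.bins.0.curregs", &curregs, &sz,
                NULL, 0) == 0)
                printf("arena 0, bin 0 current regions: %zu\n", curregs);
            return 0;
        }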
* Don't fetch tsd in a0{d,}alloc(). (Jason Evans, 2014-10-11; 1 file, -0/+1)
    Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite
    recursion on systems that require an allocated tsd wrapper.
* Add configure options. (Jason Evans, 2014-10-10; 2 files, -1/+27)
    Add:
      --with-lg-page
      --with-lg-page-sizes
      --with-lg-size-class-group
      --with-lg-quantum

    Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.

    Fix various edge conditions exposed by the configure options.
* Don't configure HAVE_SSE2. (Jason Evans, 2014-10-09; 1 file, -1/+4)
    Don't configure HAVE_SSE2 (on behalf of SFMT), because its
    dependencies are notoriously unportable in practice.

    This resolves #119.
* Avoid atexit(3) when possible, disable prof_final by default. (Jason Evans, 2014-10-09; 2 files, -3/+2)
    atexit(3) can deadlock internally during its own initialization if
    jemalloc calls atexit() during jemalloc initialization. Mitigate the
    impact by restructuring prof initialization to avoid calling atexit()
    unless the registered function will actually dump a final heap
    profile.

    Additionally, disable prof_final by default so that this land mine is
    opt-in rather than opt-out.

    This resolves #144.
* Use regular arena allocation for huge tree nodes. (Daniel Micay, 2014-10-08; 1 file, -7/+20)
    This avoids grabbing the base mutex, as a step towards fine-grained
    locking for huge allocations. The thread cache also provides a tiny
    (~3%) improvement for serial huge allocations.
* Refactor/fix arenas manipulation. (Jason Evans, 2014-10-08; 1 file, -0/+1)
    Abstract arenas access to use arena_get() (or a0get() where
    appropriate) rather than directly reading e.g. arenas[ind]. Prior to
    the addition of the arenas.extend mallctl, the worst possible outcome
    of directly accessing arenas was a stale read, but arenas.extend may
    allocate and assign a new array to arenas.

    Add a tsd-based arenas_cache, which amortizes arenas reads. This
    introduces some subtle bootstrapping issues, with tsd_boot() now being
    split into tsd_boot[01]() to support tsd wrapper allocation
    bootstrapping, as well as an arenas_cache_bypass tsd variable which
    dynamically terminates allocation of arenas_cache itself.

    Promote a0malloc(), a0calloc(), and a0free() to be generally useful
    for internal allocation, and use them in several places (more may be
    appropriate).

    Abstract arena->nthreads management and fix a missing decrement during
    thread destruction (recent tsd refactoring left arenas_cleanup()
    unused).

    Change arena_choose() to propagate OOM, and handle OOM in all callers.
    This is important for providing consistent allocation behavior when
    the MALLOCX_ARENA() flag is being used. Prior to this fix, it was
    possible for an OOM to result in allocation silently allocating from a
    different arena than the one specified.
* Normalize size classes. (Jason Evans, 2014-10-06; 2 files, -4/+15)
    Normalize size classes to use the same number of size classes per size
    doubling (currently hard coded to 4), across the entire range of size
    classes. Small size classes already used this spacing, but in order to
    support this change, additional small size classes now fill
    [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB).
    Huge size classes now support non-multiples of the chunk size in order
    to fill (4 MiB .. 16 MiB).
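    A hypothetical helper illustrating the spacing (the real
    size2index/s2u machinery is table-driven and handles the smallest
    classes differently):

        #include <stdio.h>

        /* Round sz up to the next class in a scheme with 4 classes per
         * size doubling: each group [2^n, 2^(n+1)) is split into quarter
         * steps, e.g. 4096, 5120, 6144, 7168, then 8192. */
        static size_t
        s2u_sketch(size_t sz) {
            if (sz <= 16)
                return 16;
            size_t lg = 63 - (size_t)__builtin_clzll(sz - 1);
            size_t delta = (size_t)1 << (lg - 2);  /* quarter step */
            return (sz + delta - 1) & ~(delta - 1);
        }

        int main(void) {
            for (size_t sz = 4097; sz <= 8193; sz += 1024)
                printf("%zu -> %zu\n", sz, s2u_sketch(sz));
            return 0;
        }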
* Attempt to expand huge allocations in-place. (Daniel Micay, 2014-10-05; 1 file, -2/+3)
    This adds support for expanding huge allocations in-place by
    requesting memory at a specific address from the chunk allocator. It's
    currently only implemented for the chunk recycling path, although in
    theory it could also be done by optimistically allocating new chunks.
    On Linux, it could attempt an in-place mremap. However, that won't
    work in practice since the heap is grown downwards and memory is not
    unmapped (in a normal build, at least).

    Repeated vector reallocation micro-benchmark:

        #include <string.h>
        #include <stdlib.h>

        int main(void) {
            for (size_t i = 0; i < 100; i++) {
                void *ptr = NULL;
                size_t old_size = 0;
                for (size_t size = 4; size < (1 << 30); size *= 2) {
                    ptr = realloc(ptr, size);
                    if (!ptr) return 1;
                    memset(ptr + old_size, 0xff, size - old_size);
                    old_size = size;
                }
                free(ptr);
            }
        }

    The glibc allocator fails to do any in-place reallocations on this
    benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it
    elides the cost of copies via mremap, which is currently not something
    that jemalloc can use.

    With this improvement, jemalloc still fails to do any in-place huge
    reallocations for the first outer loop, but then succeeds 100% of the
    time for the remaining 99 iterations. The time spent doing allocations
    and copies drops down to under 5%, with nearly all of it spent doing
    purging + faulting (when huge pages are disabled) and the array
    memset.

    An improved mremap API (MREMAP_RETAIN - #138) would be far more
    general but this is a portable optimization and would still be useful
    on Linux for xallocx.

    Numbers with transparent huge pages enabled:
        glibc (copies elided via MREMAP_MAYMOVE): 8.471s
        jemalloc: 17.816s
        jemalloc + no-op madvise: 13.236s
        jemalloc + this commit: 6.787s
        jemalloc + this commit + no-op madvise: 6.144s

    Numbers with transparent huge pages disabled:
        glibc (copies elided via MREMAP_MAYMOVE): 15.403s
        jemalloc: 39.456s
        jemalloc + no-op madvise: 12.768s
        jemalloc + this commit: 15.534s
        jemalloc + this commit + no-op madvise: 6.354s

    Closes #137
* Avoid purging in microbench when lazy-lock is enabled. (Jason Evans, 2014-10-04; 1 file, -0/+9)
* Fix tsd cleanup regressions. (Jason Evans, 2014-10-04; 2 files, -9/+35)
    Fix tsd cleanup regressions that were introduced in
    5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to
    reside in a single tsd structure.). These regressions were twofold:

    1) tsd_tryget() should never (and need never) return NULL. Rename it
       to tsd_fetch() and simplify all callers.
    2) tsd_*_set() must only be called when tsd is in the nominal state,
       because cleanup happens during the nominal-->purgatory transition,
       and re-initialization must not happen while in the purgatory state.
       Add tsd_nominal() and use it as needed.

    Note that tsd_*{p,}_get() can still be used as long as no
    re-initialization that would require cleanup occurs. This means that
    e.g. the thread_allocated counter can be updated unconditionally.
* Skip test_prof_thread_name_validation if !config_prof. (Jason Evans, 2014-10-04; 1 file, -0/+2)
* Implement/test/fix prof-related mallctls. (Jason Evans, 2014-10-04; 3 files, -0/+268)
    Implement/test/fix the opt.prof_thread_active_init,
    prof.thread_active_init, and thread.prof.active mallctls.

    Test/fix the thread.prof.name mallctl.

    Refactor opt_prof_active to be read-only and move mutable state into
    the prof_active variable. Stop leaning on ctl-related locking for
    protection.
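    Usage sketch for the per thread controls (requires --enable-prof):

        #include <stdbool.h>
        #include <jemalloc/jemalloc.h>

        int main(void) {
            /* Enable sampling for the calling thread... */
            bool active = true;
            mallctl("thread.prof.active", NULL, NULL, &active,
                sizeof(active));

            /* ...and label it in heap profile dumps. */
            const char *name = "worker-0";
            mallctl("thread.prof.name", NULL, NULL, &name, sizeof(name));
            return 0;
        }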
* Convert to uniform style: cond == false --> !cond. (Jason Evans, 2014-10-03; 2 files, -4/+3)
* Remove obsolete comment. (Jason Evans, 2014-10-03; 1 file, -6/+0)
* Test prof.reset mallctl and fix numerous discovered bugs. (Jason Evans, 2014-10-03; 1 file, -0/+238)
* Refactor permuted backtrace test allocation. (Jason Evans, 2014-10-02; 9 files, -43/+53)
    Refactor permuted backtrace test allocation that was originally used
    only by the prof_accum test, so that it can be used by other heap
    profiling test binaries.
* Implement compile-time bitmap size computation. (Jason Evans, 2014-09-28; 1 file, -11/+5)
* Convert all tsd variables to reside in a single tsd structure. (Jason Evans, 2014-09-23; 3 files, -25/+37)
* Add support for sized deallocation. (Daniel Micay, 2014-09-09; 2 files, -0/+77)
    This adds a new `sdallocx` function to the external API, allowing the
    size to be passed by the caller. It avoids some extra reads in the
    thread cache fast path. In the case where stats are enabled, this
    avoids the work of calculating the size from the pointer.

    An assertion validates the size that's passed in, so enabling
    debugging will allow users of the API to debug cases where an
    incorrect size is passed in.

    The performance win for a contrived microbenchmark doing an allocation
    and immediately freeing it is ~10%. It may have a different impact on
    a real workload.

    Closes #28
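    Minimal usage (the caller must pass a size matching the allocation):

        #include <jemalloc/jemalloc.h>

        int main(void) {
            void *p = mallocx(128, 0);
            if (p == NULL)
                return 1;
            /* Supplying the size skips the size lookup on the fast path. */
            sdallocx(p, 128, 0);
            return 0;
        }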
* Add relevant function attributes to [msn]allocx(). (Jason Evans, 2014-09-08; 1 file, -17/+9)
* Thwart optimization of free(malloc(1)) in microbench. (Jason Evans, 2014-09-08; 1 file, -19/+25)
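    One portable way to keep free(malloc(1)) from being elided (a sketch;
    the test's actual approach may differ): launder the pointer through a
    volatile sink so the allocation has an observable effect.

        #include <stdlib.h>

        static void *volatile sink;

        int main(void) {
            for (int i = 0; i < 1000000; i++) {
                void *p = malloc(1);
                sink = p;  /* observable side effect defeats elision */
                free(p);
            }
            return 0;
        }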
* Avoid conflict with the POSIX timer_t type. (Daniel Micay, 2014-09-08; 3 files, -11/+11)
    It hits a compilation error with glibc 2.19 without a rename.
* Add microbench tests. (Jason Evans, 2014-09-08; 1 file, -0/+142)