path: root/include/jemalloc/internal/chunk.h

Commit log (each entry: commit message, author, date, files changed, lines -/+)

* Remove all vestiges of chunks. (Jason Evans, 2016-10-12; 1 file changed, -36/+0)
  Remove mallctls:
  - opt.lg_chunk
  - stats.cactive
  This resolves #464.

* Rename most remaining *chunk* APIs to *extent*. (Jason Evans, 2016-06-06; 1 file changed, -36/+0)

* s/chunk_lookup/extent_lookup/g, s/chunks_rtree/extents_rtree/g (Jason Evans, 2016-06-06; 1 file changed, -15/+0)

* s/CHUNK_HOOKS_INITIALIZER/EXTENT_HOOKS_INITIALIZER/g (Jason Evans, 2016-06-06; 1 file changed, -10/+0)

* s/chunk_hook/extent_hook/g (Jason Evans, 2016-06-06; 1 file changed, -14/+17)

* Use huge size class infrastructure for large size classes. (Jason Evans, 2016-06-06; 1 file changed, -1/+2)

* Implement cache-oblivious support for huge size classes. (Jason Evans, 2016-06-03; 1 file changed, -4/+4)

* Remove CHUNK_ADDR2BASE() and CHUNK_ADDR2OFFSET(). (Jason Evans, 2016-06-03; 1 file changed, -8/+0)

* Convert rtree from per chunk to per page. (Jason Evans, 2016-06-03; 1 file changed, -2/+2)
  Refactor [de]registration to maintain interior rtree entries for slabs.

* Refactor chunk_purge_wrapper() to take extent argument. (Jason Evans, 2016-06-03; 1 file changed, -2/+1)

* Refactor chunk_[de]commit_wrapper() to take extent arguments. (Jason Evans, 2016-06-03; 1 file changed, -4/+2)

* Refactor chunk_dalloc_{cache,wrapper}() to take extent arguments. (Jason Evans, 2016-06-03; 1 file changed, -6/+2)
  Rename arena_extent_[d]alloc() to extent_[d]alloc(). Move all chunk
  [de]registration responsibility into chunk.c.

* Add/use chunk_split_wrapper(). (Jason Evans, 2016-06-03; 1 file changed, -3/+5)
  Remove redundant ptr/oldsize args from huge_*(). Refactor huge/chunk/arena
  code boundaries.

* Add/use chunk_merge_wrapper(). (Jason Evans, 2016-06-03; 1 file changed, -0/+2)

* Add/use chunk_commit_wrapper(). (Jason Evans, 2016-06-03; 1 file changed, -0/+3)

* Add/use chunk_decommit_wrapper(). (Jason Evans, 2016-06-03; 1 file changed, -0/+3)

* Merge chunk_alloc_base() into its only caller. (Jason Evans, 2016-06-03; 1 file changed, -1/+0)

* Remove redundant chunk argument from chunk_{,de,re}register(). (Jason Evans, 2016-06-03; 1 file changed, -5/+3)

* Refactor rtree to always use base_alloc() for node allocation. (Jason Evans, 2016-06-03; 1 file changed, -4/+5)

* Use rtree-based chunk lookups rather than pointer bit twiddling. (Jason Evans, 2016-06-03; 1 file changed, -0/+2)
  Look up chunk metadata via the radix tree, rather than using
  CHUNK_ADDR2BASE(). Propagate pointer's containing extent. Minimize extent
  lookups by doing a single lookup (e.g. in free()) and propagating the
  pointer's extent into nearly all the functions that may need it.

* Add element acquire/release capabilities to rtree. (Jason Evans, 2016-06-03; 1 file changed, -1/+1)
  This makes it possible to acquire short-term "ownership" of rtree elements
  so that it is possible to read an extent pointer *and* read the extent's
  contents with a guarantee that the element will not be modified until the
  ownership is released. This is intended as a mechanism for resolving rtree
  read/write races rather than as a way to lock extents.

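  A standalone sketch of the acquire/read/release pattern described here,
  using C11 atomics and hypothetical names (it is not jemalloc's rtree
  code): each element packs a lock bit into the stored pointer, and a reader
  sets that bit before dereferencing the extent so that writers stay out
  until release.

      /* Illustrative only; names and layout are hypothetical. */
      #include <stdatomic.h>
      #include <stdint.h>
      #include <stdio.h>

      typedef struct { size_t size; } extent_t;

      typedef struct {
          /* Low bit set => element "owned"; other bits hold the extent pointer. */
          _Atomic uintptr_t bits;
      } rtree_elm_t;

      static extent_t *elm_acquire(rtree_elm_t *elm) {
          uintptr_t unlocked;
          do {
              unlocked = atomic_load_explicit(&elm->bits,
                  memory_order_relaxed) & ~(uintptr_t)1;
          } while (!atomic_compare_exchange_weak_explicit(&elm->bits,
              &unlocked, unlocked | 1, memory_order_acquire,
              memory_order_relaxed));
          return (extent_t *)unlocked;
      }

      static void elm_release(rtree_elm_t *elm) {
          uintptr_t cur = atomic_load_explicit(&elm->bits, memory_order_relaxed);
          atomic_store_explicit(&elm->bits, cur & ~(uintptr_t)1,
              memory_order_release);
      }

      int main(void) {
          extent_t ext = { 4096 };
          rtree_elm_t elm = { (uintptr_t)&ext };
          extent_t *e = elm_acquire(&elm); /* writers now blocked on this element */
          printf("extent size: %zu\n", e->size);
          elm_release(&elm);               /* element may be modified again */
          return 0;
      }
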
* Rename extent_node_t to extent_t. (Jason Evans, 2016-05-16; 1 file changed, -6/+5)

* Resolve bootstrapping issues when embedded in FreeBSD libc. (Jason Evans, 2016-05-11; 1 file changed, -11/+11)
  b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online
  locking validator.) caused a broad propagation of tsd throughout the
  internal API, but tsd_fetch() was designed to fail prior to tsd
  bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and
  nullable tsdn_t, and modifying all internal APIs that do not critically
  rely on tsd to take nullable pointers. Furthermore, add the
  tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
  bootstrapping is complete and return NULL if not. All dangerous
  conversions of nullable pointers are tsdn_tsd() calls that assert-fail on
  invalid conversion.

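  A minimal sketch of the nullable/non-nullable split described above, with
  hypothetical stand-ins for the real tsd machinery (the names mirror the
  commit message; this is not jemalloc's implementation):

      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      typedef struct { unsigned thread_id; } tsd_t; /* non-nullable handle */
      typedef tsd_t tsdn_t;                         /* nullable handle */

      static bool booted = false;
      static tsd_t global_tsd = { 1 };

      static bool tsd_booted_get(void) { return booted; }

      /* Returns NULL until tsd bootstrapping has completed. */
      static tsdn_t *tsdn_fetch(void) {
          return tsd_booted_get() ? &global_tsd : NULL;
      }

      /* The only conversion from nullable to non-nullable, checked by assert. */
      static tsd_t *tsdn_tsd(tsdn_t *tsdn) {
          assert(tsdn != NULL);
          return tsdn;
      }

      int main(void) {
          printf("before boot: %s\n", tsdn_fetch() == NULL ? "NULL" : "ok");
          booted = true; /* pretend bootstrapping finished */
          printf("after boot, thread id: %u\n", tsdn_tsd(tsdn_fetch())->thread_id);
          return 0;
      }
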
* Add witness, a simple online locking validator. (Jason Evans, 2016-04-14; 1 file changed, -17/+21)
  This resolves #358.

* Fix potential chunk leaks. (Jason Evans, 2016-03-31; 1 file changed, -5/+1)
  Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(), so
  that if the dalloc hook fails, proper decommit/purge/retain cascading
  occurs. This fixes three potential chunk leaks on OOM paths, one during
  dss-based chunk allocation, one during chunk header commit (currently
  relevant only on Windows), and one during rtree write (e.g. if rtree node
  allocation fails). Merge chunk_purge_arena() into chunk_purge_default()
  (refactor, no change to functionality).

* Arena chunk decommit cleanups and fixes. (Jason Evans, 2015-08-11; 1 file changed, -1/+1)
  Decommit arena chunk header during chunk deallocation if the rest of the
  chunk is decommitted.

* Implement chunk hook support for page run commit/decommit. (Jason Evans, 2015-08-07; 1 file changed, -3/+3)
  Cascade from decommit to purge when purging unused dirty pages, so that it
  is possible to decommit cleaned memory rather than just purging. For
  non-Windows debug builds, decommit runs rather than purging them, since
  this causes access of deallocated runs to segfault. This resolves #251.

* Generalize chunk management hooks. (Jason Evans, 2015-08-04; 1 file changed, -17/+27)
  Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the
  "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow
  control over chunk allocation/deallocation, decommit/commit, purging, and
  splitting/merging, such that the application can rely on jemalloc's
  internal chunk caching and retaining functionality, yet implement a
  variety of chunk management mechanisms and policies.
  Merge the chunks_[sz]ad_{mmap,dss} red-black trees into
  chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to
  honor the dss precedence setting; prior to this change the precedence
  setting was also consulted when recycling chunks.
  Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead
  deallocate them in arena_unstash_purged(), so that the dirty memory
  linkage remains valid until after the last time it is used.
  This resolves #176 and #201.

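  For reference, a hedged example of reading an arena's hooks through this
  mallctl; it assumes a jemalloc 4.x install exposing chunk_hooks_t and an
  unprefixed mallctl symbol, both of which depend on build configuration:

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          chunk_hooks_t hooks; /* alloc/dalloc/commit/decommit/purge/split/merge */
          size_t len = sizeof(hooks);
          /* Read arena 0's current chunk management hooks. */
          if (mallctl("arena.0.chunk_hooks", &hooks, &len, NULL, 0) != 0) {
              fprintf(stderr, "mallctl(\"arena.0.chunk_hooks\") failed\n");
              return 1;
          }
          /* To install custom hooks, overwrite selected members and write the
           * struct back through the same mallctl name via newp/newlen. */
          return 0;
      }
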
* Change default chunk size from 256 KiB to 2 MiB. (Jason Evans, 2015-07-16; 1 file changed, -1/+1)
  This change improves interaction with transparent huge pages, e.g. reduced
  page faults (at least in the absence of unused dirty page purging).

* Avoid atomic operations for dependent rtree reads. (Jason Evans, 2015-05-16; 1 file changed, -3/+3)

* Implement dynamic per arena control over dirty page purging. (Jason Evans, 2015-03-19; 1 file changed, -0/+6)
  Add mallctls:
  - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
    modified to change the initial lg_dirty_mult setting for newly created
    arenas.
  - arena.<i>.lg_dirty_mult controls an individual arena's dirty page
    purging threshold, and synchronously triggers any purging that may be
    necessary to maintain the constraint.
  - arena.<i>.chunk.purge allows the per arena dirty page purging function
    to be replaced.
  This resolves #93.

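  A hedged sketch of driving the mallctls listed above from application
  code; it assumes an unprefixed mallctl symbol and the ssize_t type the
  jemalloc manual documents for lg_dirty_mult:

      #include <stdio.h>
      #include <sys/types.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          ssize_t lg_dirty_mult = 5; /* purge when dirty > active / 2^5 */
          size_t sz = sizeof(lg_dirty_mult);

          /* Change the default used by arenas created from now on. */
          if (mallctl("arenas.lg_dirty_mult", NULL, NULL, &lg_dirty_mult, sz) != 0)
              return 1;

          /* Tighten arena 0's threshold; may trigger synchronous purging. */
          if (mallctl("arena.0.lg_dirty_mult", NULL, NULL, &lg_dirty_mult, sz) != 0)
              return 1;

          printf("lg_dirty_mult set to %zd\n", lg_dirty_mult);
          return 0;
      }
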
* Change default chunk size from 4 MiB to 256 KiB. (Jason Evans, 2015-03-07; 1 file changed, -1/+1)
  Recent changes have improved huge allocation scalability, which removes
  upward pressure to set the chunk size so large that huge allocations are
  rare. Smaller chunks are more likely to completely drain, so set the
  default to the smallest size that doesn't leave excessive unusable
  trailing space in chunk headers.

* Fix chunk cache races. (Jason Evans, 2015-02-19; 1 file changed, -4/+9)
  These regressions were introduced by
  ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into
  unused dirty page purging machinery.).

* Rename "dirty chunks" to "cached chunks". (Jason Evans, 2015-02-18; 1 file changed, -2/+3)
  Rename "dirty chunks" to "cached chunks", in order to avoid overloading
  the term "dirty". Fix the regression caused by
  339c2b23b2d61993ac768afcc72af135662c6771 (Fix chunk_unmap() to propagate
  dirty state.), and actually address what that change attempted, which is
  to only purge chunks once, and propagate whether zeroed pages resulted
  into chunk_record().

* Fix chunk_unmap() to propagate dirty state. (Jason Evans, 2015-02-18; 1 file changed, -1/+1)
  Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty
  chunk purging to record this information so it can be passed to
  chunk_unmap(). Since the broken version of chunk_unmap() claimed that all
  chunks were clean, this resulted in potential memory corruption for
  purging implementations that do not zero (e.g. MADV_FREE). This regression
  was introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate
  whole chunks into unused dirty page purging machinery.).

* Integrate whole chunks into unused dirty page purging machinery. (Jason Evans, 2015-02-17; 1 file changed, -1/+3)
  Extend per arena unused dirty page purging to manage unused dirty chunks
  in addition to unused dirty runs. Rather than immediately unmapping
  deallocated chunks (or purging them in the --disable-munmap case), store
  them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially
  allocate dirty chunks. When excessive unused dirty pages accumulate, purge
  runs and chunks in integrated LRU order (and unmap chunks in the
  --enable-munmap case).
  Refactor extent_node_t to provide accessor functions.

* Move centralized chunk management into arenas. (Jason Evans, 2015-02-12; 1 file changed, -6/+16)
  Migrate all centralized data structures related to huge allocations and
  recyclable chunks into arena_t, so that each arena can manage huge
  allocations and recyclable virtual memory completely independently of
  other arenas.
  Add chunk node caching to arenas, in order to avoid contention on the base
  allocator.
  Use chunks_rtree to look up huge allocations rather than a red-black tree.
  Maintain a per arena unsorted list of huge allocations (which will be
  needed to enumerate huge allocations during arena reset).
  Remove the --enable-ivsalloc option, make ivsalloc() always available, and
  use it for size queries if --enable-debug is enabled. The only practical
  implications to this removal are that 1) ivsalloc() is now always
  available during live debugging (and the underlying radix tree is
  available during core-based debugging), and 2) size query validation can
  no longer be enabled independent of --enable-debug.
  Remove the stats.chunks.{current,total,high} mallctls, and replace their
  underlying statistics with simpler atomically updated counters used
  exclusively for gdump triggering. These statistics are no longer very
  useful because each arena manages chunks independently, and per arena
  statistics provide similar information.
  Simplify chunk synchronization code, now that base chunk allocation cannot
  cause recursive lock acquisition.

* Refactor rtree to be lock-free. (Jason Evans, 2015-02-05; 1 file changed, -1/+1)
  Recent huge allocation refactoring associates huge allocations with
  arenas, but it remains necessary to quickly look up huge allocation
  metadata during reallocation/deallocation. A global radix tree remains a
  good solution to this problem, but locking would have become the primary
  bottleneck after (upcoming) migration of chunk management from global to
  per arena data structures.
  This lock-free implementation uses double-checked reads to traverse the
  tree, so that in the steady state, each read or write requires only a
  single atomic operation.
  This implementation also assures that no more than two tree levels
  actually exist, through a combination of careful virtual memory allocation
  which makes large sparse nodes cheap, and skipping the root node on x64
  (possible because the top 16 bits are all 0 in practice).

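  A standalone illustration of the double-checked read idea (hypothetical
  names; not jemalloc's rtree code): dependent lookups get by with a plain
  relaxed load, and only a potentially-missing child falls back to an
  acquire load plus a CAS-published initialization.

      #include <stdatomic.h>
      #include <stdlib.h>

      typedef struct node_s {
          _Atomic(struct node_s *) child[256];
      } node_t;

      /* Fast path: a single relaxed load is enough for "dependent" lookups,
       * i.e. keys that are known to have been inserted already. */
      static node_t *child_read_dependent(node_t *parent, unsigned i) {
          return atomic_load_explicit(&parent->child[i], memory_order_relaxed);
      }

      /* Slow path: double-checked read/publish for possibly-missing children. */
      static node_t *child_read_or_init(node_t *parent, unsigned i) {
          node_t *c = atomic_load_explicit(&parent->child[i],
              memory_order_acquire);
          if (c != NULL)
              return c;                /* first check hit: one atomic op */
          node_t *fresh = calloc(1, sizeof(*fresh));
          if (fresh == NULL)
              abort();
          node_t *expected = NULL;
          if (atomic_compare_exchange_strong_explicit(&parent->child[i],
              &expected, fresh, memory_order_release, memory_order_acquire))
              return fresh;            /* we published the new child */
          free(fresh);                 /* lost the race; use the winner's node */
          return expected;
      }

      int main(void) {
          node_t root = {{NULL}};
          node_t *c = child_read_or_init(&root, 42);
          int ok = (c == child_read_dependent(&root, 42));
          free(c);
          return ok ? 0 : 1;
      }
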
* Normalize size classes. (Jason Evans, 2014-10-06; 1 file changed, -3/+0)
  Normalize size classes to use the same number of size classes per size
  doubling (currently hard coded to 4), across the entire range of size
  classes. Small size classes already used this spacing, but in order to
  support this change, additional small size classes now fill
  [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge
  size classes now support non-multiples of the chunk size in order to fill
  (4 MiB .. 16 MiB).

* Attempt to expand huge allocations in-place. (Daniel Micay, 2014-10-05; 1 file changed, -4/+4)
  This adds support for expanding huge allocations in-place by requesting
  memory at a specific address from the chunk allocator. It's currently only
  implemented for the chunk recycling path, although in theory it could also
  be done by optimistically allocating new chunks. On Linux, it could
  attempt an in-place mremap. However, that won't work in practice since the
  heap is grown downwards and memory is not unmapped (in a normal build, at
  least).

  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this
  benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides
  the cost of copies via mremap, which is currently not something that
  jemalloc can use.
  With this improvement, jemalloc still fails to do any in-place huge
  reallocations for the first outer loop, but then succeeds 100% of the time
  for the remaining 99 iterations. The time spent doing allocations and
  copies drops down to under 5%, with nearly all of it spent doing purging +
  faulting (when huge pages are disabled) and the array memset.
  An improved mremap API (MREMAP_RETAIN - #138) would be far more general
  but this is a portable optimization and would still be useful on Linux for
  xallocx.

  Numbers with transparent huge pages enabled:
    glibc (copies elided via MREMAP_MAYMOVE): 8.471s
    jemalloc: 17.816s
    jemalloc + no-op madvise: 13.236s
    jemalloc + this commit: 6.787s
    jemalloc + this commit + no-op madvise: 6.144s

  Numbers with transparent huge pages disabled:
    glibc (copies elided via MREMAP_MAYMOVE): 15.403s
    jemalloc: 39.456s
    jemalloc + no-op madvise: 12.768s
    jemalloc + this commit: 15.534s
    jemalloc + this commit + no-op madvise: 6.354s

  Closes #137

* Refactor chunk map. (Qinfan Wu, 2014-09-05; 1 file changed, -0/+1)
  Break the chunk map into two separate arrays, in order to improve cache
  locality. This is related to issue #23.

* Refactor huge allocation to be managed by arenas. (Jason Evans, 2014-05-16; 1 file changed, -3/+5)
  Refactor huge allocation to be managed by arenas (though the global
  red-black tree of huge allocations remains for lookup during
  deallocation). This is the logical conclusion of recent changes that 1)
  made per arena dss precedence apply to huge allocation, and 2) made it
  possible to replace the per arena chunk allocation/deallocation functions.
  Remove the top level huge stats, and replace them with per arena huge
  stats.
  Normalize function names and types to *dalloc* (some were *dealloc*).
  Remove the --enable-mremap option. As jemalloc currently operates, this is
  a performance regression for some applications, but planned work to
  logarithmically space huge size classes should provide similar amortized
  performance. The motivation for this change was that mremap-based huge
  reallocation forced leaky abstractions that prevented refactoring.

* Add support for user-specified chunk allocators/deallocators. (aravind, 2014-05-12; 1 file changed, -3/+5)
  Add new mallctl endpoints "arena.<i>.chunk.alloc" and
  "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk
  allocator and deallocator on a per-arena basis.

* Add arena-specific and selective dss allocation. (Jason Evans, 2012-10-13; 1 file changed, -1/+4)
  Add the "arenas.extend" mallctl, so that it is possible to create new
  arenas that are outside the set that jemalloc automatically multiplexes
  threads onto.
  Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to
  explicitly allocate from a particular arena.
  Add the "opt.dss" mallctl, which controls the default precedence of dss
  allocation relative to mmap allocation.
  Add the "arena.<i>.dss" mallctl, which makes it possible to set the
  default dss precedence on a per arena or global basis.
  Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
  Add the "stats.arenas.<i>.dss" mallctl.

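  A hedged sketch of the workflow these mallctls enable, assuming the
  experimental allocm() API of that era (exposed when JEMALLOC_EXPERIMENTAL
  is defined before including the header) and an unprefixed mallctl symbol;
  exact names depend on how jemalloc was built:

      #define JEMALLOC_EXPERIMENTAL
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);

          /* Create an arena outside the automatically multiplexed set. */
          if (mallctl("arenas.extend", &arena_ind, &sz, NULL, 0) != 0)
              return 1;

          /* Allocate explicitly from that arena via the experimental API. */
          void *p;
          if (allocm(&p, NULL, 4096, ALLOCM_ARENA(arena_ind)) != ALLOCM_SUCCESS)
              return 1;
          printf("allocated from arena %u\n", arena_ind);
          dallocm(p, 0);
          return 0;
      }
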
* Fix fork(2)-related deadlocks. (Jason Evans, 2012-10-09; 1 file changed, -0/+3)
  Add a library constructor for jemalloc that initializes the allocator.
  This fixes a race that could occur if threads were created by the main
  thread prior to any memory allocation, followed by fork(2), and then
  memory allocation in the child process.
  Fix the prefork/postfork functions to acquire/release the ctl, prof, and
  rtree mutexes. This fixes various fork() child process deadlocks, but one
  possible deadlock remains (intentionally) unaddressed: prof backtracing
  can acquire runtime library mutexes, so deadlock is still possible if heap
  profiling is enabled during fork(). This deadlock is known to be a real
  issue in at least the case of libgcc-based backtracing.
  Reported by tfengjun.

* Remove mmap_unaligned. (Jason Evans, 2012-04-22; 1 file changed, -2/+1)
  Remove mmap_unaligned, which was used to heuristically decide whether to
  optimistically call mmap() in such a way that could reduce the total
  number of system calls. If I remember correctly, the intention of
  mmap_unaligned was to avoid always executing the slow path in the presence
  of ASLR. However, that reasoning seems to have been based on a flawed
  understanding of how ASLR actually works. Although ASLR apparently causes
  mmap() to ignore address requests, it does not cause total placement
  randomness, so there is a reasonable expectation that iterative mmap()
  calls will start returning chunk-aligned mappings once the first chunk has
  been properly aligned.

* Add alignment support to chunk_alloc(). (Mike Hommey, 2012-04-10; 1 file changed, -1/+1)

* Implement tsd. (Jason Evans, 2012-03-23; 1 file changed, -1/+2)
  Implement tsd, which is a TLS/TSD abstraction that uses one or both
  internally. Modify bootstrapping such that no tsd's are utilized until
  allocation is safe.
  Remove malloc_[v]tprintf(), and use malloc_snprintf() instead.
  Fix %p argument size handling in malloc_vsnprintf().
  Fix a long-standing statistics-related bug in the "thread.arena" mallctl
  that could cause crashes due to linked list corruption.

* Remove the swap feature. (Jason Evans, 2012-02-13; 1 file changed, -2/+0)
  Remove the swap feature, which enabled per application swap files. In
  practice this feature has not proven itself useful to users.

* Reduce cpp conditional logic complexity. (Jason Evans, 2012-02-11; 1 file changed, -6/+0)
  Convert configuration-related cpp conditional logic to use static constant
  variables, e.g.:

      #ifdef JEMALLOC_DEBUG
        [...]
      #endif

  becomes:

      if (config_debug) {
        [...]
      }

  The advantage is clearer, more concise code. The main disadvantage is that
  data structures no longer have conditionally defined fields, so they pay
  the cost of all fields regardless of whether they are used. In practice,
  this is only a minor concern; config_stats will go away in an upcoming
  change, and config_prof is the only other major feature that depends on
  more than a few special-purpose fields.