jemalloc.git - jemalloc is a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support.

	Commit message (Collapse)	Author	Age	Files	Lines
*	Refactor arenas array (fixes deadlock).	Jason Evans	2016-02-25	1	-3/+1
	Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initialization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.
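The double-checked read pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not jemalloc's code: `MAX_ARENAS`, `arena_get_sketch()`, `init_lock`, and the stand-in `arena_t` are hypothetical names. The fast path is a single atomic load; the lock is taken only when a slot still needs initializing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ARENAS 8 /* hypothetical fixed maximum size of the sparse array */

typedef struct { unsigned ind; } arena_t; /* stand-in, not jemalloc's arena_t */

/* Sparse array: every slot starts out NULL. */
static _Atomic(arena_t *) arenas[MAX_ARENAS];
/* Hypothetical lock that only guards initialization races. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

static arena_t *arena_get_sketch(unsigned ind)
{
    if (ind >= MAX_ARENAS)
        return NULL;

    /* First check: one atomic load, no locking. */
    arena_t *a = atomic_load_explicit(&arenas[ind], memory_order_acquire);
    if (a != NULL)
        return a;

    /* Slow path: serialize initializers, then check again under the lock. */
    pthread_mutex_lock(&init_lock);
    a = atomic_load_explicit(&arenas[ind], memory_order_relaxed);
    if (a == NULL) {
        a = malloc(sizeof(*a));
        if (a != NULL) {
            a->ind = ind;
            /* Publish the fully initialized arena to lock-free readers. */
            atomic_store_explicit(&arenas[ind], a, memory_order_release);
        }
    }
    pthread_mutex_unlock(&init_lock);
    return a;
}
```

Once a slot has been published, later lookups of that index never touch the lock, which is the property the commit message relies on to avoid lookup-triggered locking.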
*	Attempt mmap-based in-place huge reallocation.	Jason Evans	2016-02-25	1	-7/+4
	Attempt mmap-based in-place huge reallocation by plumbing new_addr into chunk_alloc_mmap(). This can dramatically speed up incremental huge reallocation. This resolves #335.
*	Silence miscellaneous 64-to-32-bit data loss warnings.	Jason Evans	2016-02-24	1	-2/+2
*	Refactor jemalloc_ffs() into ffs_().	Jason Evans	2016-02-24	1	-1/+1
	Use appropriate versions to resolve 64-to-32-bit data loss warnings.
*	Fix a strict aliasing violation.	Jason Evans	2015-08-12	1	-1/+6
*	Fix chunk_dalloc_arena() re: zeroing due to purge.	Jason Evans	2015-08-12	1	-1/+1
*	Arena chunk decommit cleanups and fixes.	Jason Evans	2015-08-11	1	-2/+2
	Decommit arena chunk header during chunk deallocation if the rest of the chunk is decommitted.
*	Implement chunk hook support for page run commit/decommit.	Jason Evans	2015-08-07	1	-46/+80
	Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault. This resolves #251.
*	Generalize chunk management hooks.	Jason Evans	2015-08-04	1	-100/+246
	Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies.
	Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks.
	Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used.
	This resolves #176 and #201.
*	Implement support for non-coalescing maps on MinGW.	Jason Evans	2015-07-25	1	-0/+6
	- Do not reallocate huge objects in place if the number of backing chunks would change.
	- Do not cache multi-chunk mappings.
	This resolves #213.
*	Revert to first-best-fit run/chunk allocation.	Jason Evans	2015-07-16	1	-35/+9
	This effectively reverts 97c04a93838c4001688fe31bf018972b4696efe2 (Use first-fit rather than first-best-fit run/chunk allocation.). In some pathological cases, first-fit search dominates allocation time, and it also tends not to converge as readily on a steady state of memory layout, since precise allocation order has a bigger effect than for first-best-fit.
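To make the two policies concrete, here is a small sketch over an unsorted array of free extents. It assumes that "first fit" means the lowest-addressed extent that is large enough, and that "first best fit" means the smallest extent that is large enough, with ties broken by lowest address. The `extent_t` type, the linear scans, and `demo_free` are hypothetical; jemalloc uses size/address-ordered trees rather than array scans.

```c
#include <stddef.h>

typedef struct { size_t addr; size_t size; } extent_t; /* hypothetical */

/* First fit: lowest address among all extents that can hold req bytes. */
static const extent_t *first_fit(const extent_t *v, size_t n, size_t req)
{
    const extent_t *pick = NULL;
    for (size_t i = 0; i < n; i++) {
        if (v[i].size < req)
            continue;
        if (pick == NULL || v[i].addr < pick->addr)
            pick = &v[i];
    }
    return pick;
}

/* First best fit: smallest sufficient size; lowest address breaks ties. */
static const extent_t *first_best_fit(const extent_t *v, size_t n, size_t req)
{
    const extent_t *pick = NULL;
    for (size_t i = 0; i < n; i++) {
        if (v[i].size < req)
            continue;
        if (pick == NULL || v[i].size < pick->size ||
            (v[i].size == pick->size && v[i].addr < pick->addr))
            pick = &v[i];
    }
    return pick;
}

/* Example free set: one large low extent and two small higher ones. */
static const extent_t demo_free[] = {
    { 0x1000, 64 }, { 0x3000, 16 }, { 0x2000, 16 },
};
```

For a 16-byte request against `demo_free`, first fit carves up the 64-byte extent at 0x1000, while first best fit takes the exact-size extent at 0x2000 and leaves the large extent intact.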
*	Use jemalloc_ffs() rather than ffs().	Jason Evans	2015-07-08	1	-4/+12
*	Optimizations for Windows	Matthijs	2015-06-25	1	-1/+16
	- Set opt_lg_chunk based on run-time OS setting
	- Verify LG_PAGE is compatible with run-time OS setting
	- When targeting Windows Vista or newer, use SRWLOCK instead of CRITICAL_SECTION
	- When targeting Windows Vista or newer, statically initialize init_lock
*	Fix two valgrind integration regressions.	Jason Evans	2015-06-22	1	-2/+8
	The regressions were never merged into the master branch.
*	Implement dynamic per arena control over dirty page purging.	Jason Evans	2015-03-19	1	-2/+35
	Add mallctls:
	- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas.
	- arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint.
	- arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced.
	This resolves #93.
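As a rough illustration of what a dirty page purging threshold expressed as a base-2 logarithm means, the sketch below treats lg_dirty_mult as the log2 of the minimum allowed ratio of active to dirty pages, with a negative value disabling purging. This is a simplified reading, not jemalloc's actual purge decision, and `should_purge()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when the dirty page count exceeds
 * nactive / 2^lg_dirty_mult, i.e. when the active:dirty ratio has fallen
 * below 2^lg_dirty_mult:1. A negative lg_dirty_mult disables purging.
 */
static bool should_purge(size_t nactive, size_t ndirty, int lg_dirty_mult)
{
    if (lg_dirty_mult < 0)
        return false;
    return ndirty > (nactive >> lg_dirty_mult);
}
```

With lg_dirty_mult set to 3, an arena with 800 active pages tolerates up to 100 dirty pages under this reading; the 101st would trigger purging.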
*	Fix a chunk_recycle() regression.	Jason Evans	2015-03-07	1	-4/+15
	This regression was introduced by 97c04a93838c4001688fe31bf018972b4696efe2 (Use first-fit rather than first-best-fit run/chunk allocation.).
*	Use first-fit rather than first-best-fit run/chunk allocation.	Jason Evans	2015-03-07	1	-4/+39
	This tends to more effectively pack active memory toward low addresses. However, additional tree searches are required in many cases, so whether this change stands the test of time will depend on real-world benchmarks.
*	Quantize szad trees by size class.	Jason Evans	2015-03-07	1	-1/+1
	Treat sizes that round down to the same size class as size-equivalent in trees that are used to search for first best fit, so that there are only as many "firsts" as there are size classes. This comes closer to the ideal of first fit.
*	Fix a compilation error and an incorrect assertion.	Jason Evans	2015-02-19	1	-2/+2
*	Fix chunk cache races.	Jason Evans	2015-02-19	1	-35/+79
	These regressions were introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
*	Rename "dirty chunks" to "cached chunks".	Jason Evans	2015-02-18	1	-22/+23
	Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by 339c2b23b2d61993ac768afcc72af135662c6771 (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().
*	Fix chunk_unmap() to propagate dirty state.	Jason Evans	2015-02-18	1	-3/+3
	Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty chunk purging to record this information so it can be passed to chunk_unmap(). Since the broken version of chunk_unmap() claimed that all chunks were clean, this resulted in potential memory corruption for purging implementations that do not zero (e.g. MADV_FREE). This regression was introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
*	Simplify extent_node_t and add extent_node_init().	Jason Evans	2015-02-17	1	-14/+11
*	Integrate whole chunks into unused dirty page purging machinery.	Jason Evans	2015-02-17	1	-53/+91
	Extend per arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.
*	add missing check for new_addr chunk size	Daniel Micay	2015-02-12	1	-1/+1
	8ddc93293cd8370870f221225ef1e013fbff6d65 switched this over to using the address tree in order to avoid false negatives, so it now needs to check that the size of the free extent is large enough to satisfy the request.
*	Move centralized chunk management into arenas.	Jason Evans	2015-02-12	1	-172/+103
	Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas.
	Add chunk node caching to arenas, in order to avoid contention on the base allocator.
	Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset).
	Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug.
	Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information.
	Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.
*	Refactor rtree to be lock-free.	Jason Evans	2015-02-05	1	-12/+13
	Recent huge allocation refactoring associates huge allocations with arenas, but it remains necessary to quickly look up huge allocation metadata during reallocation/deallocation. A global radix tree remains a good solution to this problem, but locking would have become the primary bottleneck after (upcoming) migration of chunk management from global to per arena data structures. This lock-free implementation uses double-checked reads to traverse the tree, so that in the steady state, each read or write requires only a single atomic operation. This implementation also assures that no more than two tree levels actually exist, through a combination of careful virtual memory allocation which makes large sparse nodes cheap, and skipping the root node on x64 (possible because the top 16 bits are all 0 in practice).
*	Refactor base_alloc() to guarantee demand-zeroed memory.	Jason Evans	2015-02-05	1	-7/+10
	Refactor base_alloc() to guarantee that allocations are carved from demand-zeroed virtual memory. This supports sparse data structures such as multi-page radix tree nodes. Enhance base_alloc() to keep track of fragments which were too small to support previous allocation requests, and try to consume them during subsequent requests. This becomes important when request sizes commonly approach or exceed the chunk size (as could radix tree node allocations).
*	Fix chunk_recycle()'s new_addr functionality.	Jason Evans	2015-02-05	1	-2/+6
	Fix chunk_recycle()'s new_addr functionality to search by address rather than just size if new_addr is specified. The functionality added by a95018ee819abf897562d9d1f3bc31d4dd725a8d (Attempt to expand huge allocations in-place.) only worked if the two search orders happened to return the same results (e.g. in simple test cases).
*	Implement the prof.gdump mallctl.	Jason Evans	2015-01-26	1	-1/+2
	This feature makes it possible to toggle the gdump feature on/off during program execution, whereas the opt.prof_dump mallctl value can only be set during program startup. This resolves #72.
*	Avoid pointless chunk_recycle() call.	Jason Evans	2015-01-26	1	-21/+29
	Avoid calling chunk_recycle() for mmap()ed chunks if config_munmap is disabled, in which case there are never any recyclable chunks. This resolves #164.
*	Fix an infinite recursion bug related to a0/tsd bootstrapping.	Jason Evans	2015-01-15	1	-1/+3
	This resolves #184.
*	Style and spelling fixes.	Jason Evans	2014-12-09	1	-1/+1
*	teach the dss chunk allocator to handle new_addr	Daniel Micay	2014-11-29	1	-7/+5
	This provides in-place expansion of huge allocations when the end of the allocation is at the end of the sbrk heap. There's already the ability to extend in-place via recycled chunks but this handles the initial growth of the heap via repeated vector / string reallocations.
	A possible future extension could allow realloc to go from the following:
	    | huge allocation | recycled chunks |
	                                        ^ dss_end
	To a larger allocation built from recycled and new chunks:
	    |                      huge allocation                      |
	                                                                ^ dss_end
	Doing that would involve teaching the chunk recycling code to request new chunks to satisfy the request. The chunk_dss code wouldn't require any further changes.
	    #include <stdlib.h>
	    int main(void) {
	        size_t chunk = 4 * 1024 * 1024;
	        void *ptr = NULL;
	        for (size_t size = chunk; size < chunk * 128; size *= 2) {
	            ptr = realloc(ptr, size);
	            if (!ptr) return 1;
	        }
	    }
	dss:secondary: 0.083s
	dss:primary: 0.083s
	After:
	dss:secondary: 0.083s
	dss:primary: 0.003s
	The dss heap grows in the upwards direction, so the oldest chunks are at the low addresses and they are used first. Linux prefers to grow the mmap heap downwards, so the trick will not work in the current mmap chunk allocator as a huge allocation will only be at the top of the heap in a contrived case.
*	Initialize chunks_mtx for all configurations.	Jason Evans	2014-10-16	1	-4/+3
	This resolves #150.
*	Refactor/fix arenas manipulation.	Jason Evans	2014-10-08	1	-1/+9
	Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.
	Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.
	Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).
	Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).
	Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.
*	Normalize size classes.	Jason Evans	2014-10-06	1	-3/+0
	Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
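The "four size classes per size doubling" rule can be illustrated with a small rounding function. This is a sketch of the spacing rule only, under the assumption that every doubling (2^k, 2^(k+1)] is split into four equally spaced classes; it ignores the minimum (quantum) spacing that applies to the smallest classes, and `lg_floor()` and `size_class_ceil()` are hypothetical helpers rather than jemalloc functions.

```c
#include <stddef.h>

/* floor(log2(x)) for x > 0, computed portably. */
static unsigned lg_floor(size_t x)
{
    unsigned lg = 0;
    while (x >>= 1)
        lg++;
    return lg;
}

/*
 * Round size up to the next class, assuming four equally spaced classes
 * in every doubling (2^k, 2^(k+1)]. The spacing within that doubling is
 * therefore 2^(k-2).
 */
static size_t size_class_ceil(size_t size)
{
    if (size <= 4)
        return size; /* too small to split a doubling four ways */
    unsigned k = lg_floor(size - 1);
    size_t delta = (size_t)1 << (k - 2);
    return (size + delta - 1) & ~(delta - 1);
}
```

Under this rule, a request one byte over 4 KiB rounds up to 5 KiB, and a request one byte over 4 MiB rounds up to 5 MiB, which is a non-multiple of a 4 MiB chunk.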
*	Attempt to expand huge allocations in-place.	Daniel Micay	2014-10-05	1	-20/+27
	This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator.
	It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least).
	Repeated vector reallocation micro-benchmark:
	    #include <string.h>
	    #include <stdlib.h>
	    int main(void) {
	        for (size_t i = 0; i < 100; i++) {
	            void *ptr = NULL;
	            size_t old_size = 0;
	            for (size_t size = 4; size < (1 << 30); size *= 2) {
	                ptr = realloc(ptr, size);
	                if (!ptr) return 1;
	                memset(ptr + old_size, 0xff, size - old_size);
	                old_size = size;
	            }
	            free(ptr);
	        }
	    }
	The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use.
	With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset.
	An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx.
	Numbers with transparent huge pages enabled:
	    glibc (copies elided via MREMAP_MAYMOVE): 8.471s
	    jemalloc: 17.816s
	    jemalloc + no-op madvise: 13.236s
	    jemalloc + this commit: 6.787s
	    jemalloc + this commit + no-op madvise: 6.144s
	Numbers with transparent huge pages disabled:
	    glibc (copies elided via MREMAP_MAYMOVE): 15.403s
	    jemalloc: 39.456s
	    jemalloc + no-op madvise: 12.768s
	    jemalloc + this commit: 15.534s
	    jemalloc + this commit + no-op madvise: 6.354s
	Closes #137
*	Convert to uniform style: cond == false --> !cond	Jason Evans	2014-10-03	1	-8/+8
*	Refactor chunk map.	Qinfan Wu	2014-09-05	1	-0/+1
	Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.
*	Refactor huge allocation to be managed by arenas.	Jason Evans	2014-05-16	1	-60/+85
	Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions. Remove the top level huge stats, and replace them with per arena huge stats. Normalize function names and types to dalloc (some were dealloc). Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.
*	Add support for user-specified chunk allocators/deallocators.	aravind	2014-05-12	1	-15/+43
	Add new mallctl endpoints "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.
*	Optimize Valgrind integration.	Jason Evans	2014-04-15	1	-3/+3
	Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code. Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions. Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().
*	Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.	Jason Evans	2014-04-15	1	-4/+4
	Make dss non-optional on all platforms which support sbrk(2). Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.
*	Convert rtree from (void *) to (uint8_t) storage.	Jason Evans	2014-01-03	1	-2/+2
	Reduce rtree memory usage by storing booleans (1 byte each) rather than pointers. The rtree code is only used to record whether jemalloc manages a chunk of memory, so there's no need to store pointers in the rtree. Increase rtree node size to 64 KiB in order to reduce tree depth from 13 to 3 on 64-bit systems. The conversion to more compact leaf nodes was enough by itself to make the rtree depth 1 on 32-bit systems; due to the fact that root nodes are smaller than the specified node size if possible, the node size change has no impact on 32-bit systems (assuming default chunk size).
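The depth figures quoted for the new layout can be reproduced with a short calculation. This sketch assumes a 4 MiB default chunk size (22 low address bits ignored by the tree) and that each level consumes log2(node size / entry size) key bits; `rtree_height()` and `lg_floor_sz()` are hypothetical helpers, not jemalloc functions.

```c
#include <stddef.h>

/* floor(log2(x)) for x > 0. */
static unsigned lg_floor_sz(size_t x)
{
    unsigned lg = 0;
    while (x >>= 1)
        lg++;
    return lg;
}

/*
 * Number of radix tree levels needed to cover the chunk-number bits of an
 * address, given the node size and the size of each entry in a node.
 */
static unsigned rtree_height(unsigned ptr_bits, unsigned lg_chunk,
    size_t node_bytes, size_t entry_bytes)
{
    unsigned key_bits = ptr_bits - lg_chunk;
    unsigned bits_per_level = lg_floor_sz(node_bytes / entry_bytes);
    return (key_bits + bits_per_level - 1) / bits_per_level;
}
```

With 64 KiB nodes of 1-byte entries, each level resolves 16 bits: a 64-bit address leaves 42 key bits, giving 3 levels, and a 32-bit address leaves 10 key bits, giving 1 level.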
*	Add rtree unit tests.	Jason Evans	2014-01-03	1	-1/+1
*	Consistently use malloc_mutex_prefork().	Jason Evans	2013-10-21	1	-1/+1
	Consistently use malloc_mutex_prefork() instead of malloc_mutex_lock() in all prefork functions.
*	Fix a compiler warning.	Jason Evans	2013-10-20	1	-1/+1
	Fix a compiler warning in chunk_record() that was due to reading node rather than xnode. In practice this did not cause any correctness issue, but dataflow analysis in some compilers cannot tell that node and xnode are always equal in cases that the read is reached.
*	Fix another deadlock related to chunk_record().	Jason Evans	2013-04-23	1	-8/+11
	Fix chunk_record() to unlock chunks_mtx before deallocating a base node, in order to avoid potential deadlock. This fix addresses the second of two similar bugs.
*	Fix deadlock related to chunk_record().	Jason Evans	2013-04-17	1	-4/+11
	Fix chunk_record() to unlock chunks_mtx before deallocating a base node, in order to avoid potential deadlock. Reported by Tudor Bosman.