path: root/src/chunk.c
Commit message (Author, Date, Files changed, Lines -/+)
* Implement dynamic per arena control over dirty page purging. (Jason Evans, 2015-03-19, 1 file, -2/+35)
  Add mallctls:
  - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
    modified to change the initial lg_dirty_mult setting for newly created
    arenas.
  - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging
    threshold, and synchronously triggers any purging that may be necessary to
    maintain the constraint.
  - arena.<i>.chunk.purge allows the per arena dirty page purging function to
    be replaced.
  This resolves #93.
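  These controls go through the standard mallctl() interface. A minimal
  sketch, assuming an unprefixed mallctl() export and the ssize_t type used
  for lg_dirty_mult (where -1 disables purging):

      #include <sys/types.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          ssize_t lg_dirty_mult;
          size_t sz = sizeof(lg_dirty_mult);

          /* Read the default applied to newly created arenas. */
          if (mallctl("arenas.lg_dirty_mult", &lg_dirty_mult, &sz, NULL, 0) != 0)
              return (1);
          printf("arenas.lg_dirty_mult: %zd\n", lg_dirty_mult);

          /* Lower arena 0's threshold; this may synchronously trigger purging. */
          lg_dirty_mult = 2;
          if (mallctl("arena.0.lg_dirty_mult", NULL, NULL, &lg_dirty_mult,
              sizeof(lg_dirty_mult)) != 0)
              return (1);
          return (0);
      }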
* Fix a chunk_recycle() regression. (Jason Evans, 2015-03-07, 1 file, -4/+15)
  This regression was introduced by 97c04a93838c4001688fe31bf018972b4696efe2
  (Use first-fit rather than first-best-fit run/chunk allocation.).
* Use first-fit rather than first-best-fit run/chunk allocation. (Jason Evans, 2015-03-07, 1 file, -4/+39)
  This tends to more effectively pack active memory toward low addresses.
  However, additional tree searches are required in many cases, so whether
  this change stands the test of time will depend on real-world benchmarks.
* Quantize szad trees by size class. (Jason Evans, 2015-03-07, 1 file, -1/+1)
  Treat sizes that round down to the same size class as size-equivalent in
  trees that are used to search for first best fit, so that there are only as
  many "firsts" as there are size classes. This comes closer to the ideal of
  first fit.
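  The quantization amounts to comparing extents by size class before address.
  A generic, self-contained sketch of such a comparator (illustrative only;
  the extent_t layout and the deliberately simplified power-of-two
  size_class_floor() are made up for the example, not jemalloc's actual szad
  comparator):

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical extent record: base address and size of a free run/chunk. */
      typedef struct {
          uintptr_t addr;
          size_t size;
      } extent_t;

      /* Hypothetical quantization: round a size down to a size class
       * (simplified here to the largest power of two not exceeding it). */
      size_t
      size_class_floor(size_t size)
      {
          size_t class = 1;

          while (class * 2 <= size)
              class *= 2;
          return (class);
      }

      /* Compare by size class first, then by address, so all extents in one
       * size class form a single address-ordered run and a search returns the
       * lowest-addressed ("first") extent of a class. */
      int
      extent_szad_comp(const extent_t *a, const extent_t *b)
      {
          size_t a_class = size_class_floor(a->size);
          size_t b_class = size_class_floor(b->size);

          if (a_class != b_class)
              return ((a_class < b_class) ? -1 : 1);
          if (a->addr != b->addr)
              return ((a->addr < b->addr) ? -1 : 1);
          return (0);
      }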
* Fix a compilation error and an incorrect assertion. (Jason Evans, 2015-02-19, 1 file, -2/+2)
* Fix chunk cache races. (Jason Evans, 2015-02-19, 1 file, -35/+79)
  These regressions were introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb
  (Integrate whole chunks into unused dirty page purging machinery.).
* Rename "dirty chunks" to "cached chunks".Jason Evans2015-02-181-22/+23
| | | | | | | | | | Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by 339c2b23b2d61993ac768afcc72af135662c6771 (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().
* Fix chunk_unmap() to propagate dirty state. (Jason Evans, 2015-02-18, 1 file, -3/+3)
  Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty
  chunk purging to record this information so it can be passed to
  chunk_unmap(). Since the broken version of chunk_unmap() claimed that all
  chunks were clean, this resulted in potential memory corruption for purging
  implementations that do not zero (e.g. MADV_FREE).

  This regression was introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb
  (Integrate whole chunks into unused dirty page purging machinery.).
* Simplify extent_node_t and add extent_node_init(). (Jason Evans, 2015-02-17, 1 file, -14/+11)
* Integrate whole chunks into unused dirty page purging machinery. (Jason Evans, 2015-02-17, 1 file, -53/+91)
  Extend per arena unused dirty page purging to manage unused dirty chunks in
  addition to unused dirty runs. Rather than immediately unmapping deallocated
  chunks (or purging them in the --disable-munmap case), store them in a
  separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty
  chunks. When excessive unused dirty pages accumulate, purge runs and chunks
  in integrated LRU order (and unmap chunks in the --enable-munmap case).

  Refactor extent_node_t to provide accessor functions.
* add missing check for new_addr chunk size (Daniel Micay, 2015-02-12, 1 file, -1/+1)
  8ddc93293cd8370870f221225ef1e013fbff6d65 switched this over to using the
  address tree in order to avoid false negatives, so it now needs to check
  that the size of the free extent is large enough to satisfy the request.
* Move centralized chunk management into arenas. (Jason Evans, 2015-02-12, 1 file, -172/+103)
  Migrate all centralized data structures related to huge allocations and
  recyclable chunks into arena_t, so that each arena can manage huge
  allocations and recyclable virtual memory completely independently of other
  arenas.

  Add chunk node caching to arenas, in order to avoid contention on the base
  allocator.

  Use chunks_rtree to look up huge allocations rather than a red-black tree.
  Maintain a per arena unsorted list of huge allocations (which will be needed
  to enumerate huge allocations during arena reset).

  Remove the --enable-ivsalloc option, make ivsalloc() always available, and
  use it for size queries if --enable-debug is enabled. The only practical
  implications to this removal are that 1) ivsalloc() is now always available
  during live debugging (and the underlying radix tree is available during
  core-based debugging), and 2) size query validation can no longer be enabled
  independent of --enable-debug.

  Remove the stats.chunks.{current,total,high} mallctls, and replace their
  underlying statistics with simpler atomically updated counters used
  exclusively for gdump triggering. These statistics are no longer very useful
  because each arena manages chunks independently, and per arena statistics
  provide similar information.

  Simplify chunk synchronization code, now that base chunk allocation cannot
  cause recursive lock acquisition.
* Refactor rtree to be lock-free. (Jason Evans, 2015-02-05, 1 file, -12/+13)
  Recent huge allocation refactoring associates huge allocations with arenas,
  but it remains necessary to quickly look up huge allocation metadata during
  reallocation/deallocation. A global radix tree remains a good solution to
  this problem, but locking would have become the primary bottleneck after
  (upcoming) migration of chunk management from global to per arena data
  structures.

  This lock-free implementation uses double-checked reads to traverse the
  tree, so that in the steady state, each read or write requires only a single
  atomic operation.

  This implementation also assures that no more than two tree levels actually
  exist, through a combination of careful virtual memory allocation which
  makes large sparse nodes cheap, and skipping the root node on x64 (possible
  because the top 16 bits are all 0 in practice).
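  For reference, a generic C11 sketch of the double-checked read pattern
  described here (illustrative only; the node layout, names, and locking are
  simplified assumptions, not jemalloc's actual rtree code):

      #include <stdatomic.h>
      #include <stdlib.h>
      #include <pthread.h>

      typedef struct node_s node_t;
      struct node_s {
          _Atomic(node_t *) child[256];
      };

      static pthread_mutex_t init_mtx = PTHREAD_MUTEX_INITIALIZER;

      /* Return (creating if necessary) the i'th child of parent.  The
       * steady-state fast path is a single atomic load; the lock is only
       * taken to initialize a missing node, with a re-check under the lock. */
      node_t *
      child_get(node_t *parent, unsigned i)
      {
          node_t *child = atomic_load_explicit(&parent->child[i],
              memory_order_acquire);

          if (child != NULL)
              return (child);

          pthread_mutex_lock(&init_mtx);
          child = atomic_load_explicit(&parent->child[i],
              memory_order_acquire);
          if (child == NULL) {
              child = calloc(1, sizeof(*child));
              if (child != NULL) {
                  atomic_store_explicit(&parent->child[i], child,
                      memory_order_release);
              }
          }
          pthread_mutex_unlock(&init_mtx);
          return (child);
      }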
* Refactor base_alloc() to guarantee demand-zeroed memory. (Jason Evans, 2015-02-05, 1 file, -7/+10)
  Refactor base_alloc() to guarantee that allocations are carved from
  demand-zeroed virtual memory. This supports sparse data structures such as
  multi-page radix tree nodes.

  Enhance base_alloc() to keep track of fragments which were too small to
  support previous allocation requests, and try to consume them during
  subsequent requests. This becomes important when request sizes commonly
  approach or exceed the chunk size (as could radix tree node allocations).
* Fix chunk_recycle()'s new_addr functionality. (Jason Evans, 2015-02-05, 1 file, -2/+6)
  Fix chunk_recycle()'s new_addr functionality to search by address rather
  than just size if new_addr is specified. The functionality added by
  a95018ee819abf897562d9d1f3bc31d4dd725a8d (Attempt to expand huge allocations
  in-place.) only worked if the two search orders happened to return the same
  results (e.g. in simple test cases).
* Implement the prof.gdump mallctl. (Jason Evans, 2015-01-26, 1 file, -1/+2)
  This feature makes it possible to toggle the gdump feature on/off during
  program execution, whereas the opt.prof_gdump mallctl value can only be set
  during program startup.

  This resolves #72.
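  Toggling it at runtime is a single mallctl write. A minimal sketch, assuming
  an unprefixed mallctl() and a build configured with --enable-prof:

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      /* Enable or disable gdump-triggered profile dumps at runtime. */
      int
      set_gdump(bool enable)
      {
          return (mallctl("prof.gdump", NULL, NULL, &enable, sizeof(enable)));
      }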
* Avoid pointless chunk_recycle() call. (Jason Evans, 2015-01-26, 1 file, -21/+29)
  Avoid calling chunk_recycle() for mmap()ed chunks if config_munmap is
  disabled, in which case there are never any recyclable chunks.

  This resolves #164.
* Fix an infinite recursion bug related to a0/tsd bootstrapping. (Jason Evans, 2015-01-15, 1 file, -1/+3)
  This resolves #184.
* Style and spelling fixes. (Jason Evans, 2014-12-09, 1 file, -1/+1)
* teach the dss chunk allocator to handle new_addr (Daniel Micay, 2014-11-29, 1 file, -7/+5)
  This provides in-place expansion of huge allocations when the end of the
  allocation is at the end of the sbrk heap. There's already the ability to
  extend in-place via recycled chunks but this handles the initial growth of
  the heap via repeated vector / string reallocations.

  A possible future extension could allow realloc to go from the following:

      | huge allocation | recycled chunks |
                                          ^ dss_end

  To a larger allocation built from recycled *and* new chunks:

      |           huge allocation            |
                                             ^ dss_end

  Doing that would involve teaching the chunk recycling code to request new
  chunks to satisfy the request. The chunk_dss code wouldn't require any
  further changes.

      #include <stdlib.h>

      int main(void) {
          size_t chunk = 4 * 1024 * 1024;
          void *ptr = NULL;
          for (size_t size = chunk; size < chunk * 128; size *= 2) {
              ptr = realloc(ptr, size);
              if (!ptr) return 1;
          }
      }

  Before:

      dss:secondary: 0.083s
      dss:primary: 0.083s

  After:

      dss:secondary: 0.083s
      dss:primary: 0.003s

  The dss heap grows in the upwards direction, so the oldest chunks are at the
  low addresses and they are used first. Linux prefers to grow the mmap heap
  downwards, so the trick will not work in the *current* mmap chunk allocator
  as a huge allocation will only be at the top of the heap in a contrived
  case.
* Initialize chunks_mtx for all configurations. (Jason Evans, 2014-10-16, 1 file, -4/+3)
  This resolves #150.
* Refactor/fix arenas manipulation. (Jason Evans, 2014-10-08, 1 file, -1/+9)
  Abstract arenas access to use arena_get() (or a0get() where appropriate)
  rather than directly reading e.g. arenas[ind]. Prior to the addition of the
  arenas.extend mallctl, the worst possible outcome of directly accessing
  arenas was a stale read, but arenas.extend may allocate and assign a new
  array to arenas.

  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces
  some subtle bootstrapping issues, with tsd_boot() now being split into
  tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as
  an arenas_cache_bypass tsd variable which dynamically terminates allocation
  of arenas_cache itself.

  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for
  internal allocation, and use them in several places (more may be
  appropriate).

  Abstract arena->nthreads management and fix a missing decrement during
  thread destruction (recent tsd refactoring left arenas_cleanup() unused).

  Change arena_choose() to propagate OOM, and handle OOM in all callers. This
  is important for providing consistent allocation behavior when the
  MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for
  an OOM to result in allocation silently allocating from a different arena
  than the one specified.
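  The arenas.extend / MALLOCX_ARENA() combination referred to here can be
  exercised as in the following sketch (assuming an unprefixed
  mallctl()/mallocx()/dallocx(); error handling kept minimal):

      #include <stddef.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);
          void *p;

          /* Create a new arena outside the automatic multiplexing set. */
          if (mallctl("arenas.extend", &arena_ind, &sz, NULL, 0) != 0)
              return (1);

          /* Allocate explicitly from that arena, then free. */
          p = mallocx(4096, MALLOCX_ARENA(arena_ind));
          if (p == NULL)
              return (1);
          dallocx(p, 0);
          return (0);
      }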
* Normalize size classes. (Jason Evans, 2014-10-06, 1 file, -3/+0)
  Normalize size classes to use the same number of size classes per size
  doubling (currently hard coded to 4), across the entire range of size
  classes. Small size classes already used this spacing, but in order to
  support this change, additional small size classes now fill [4 KiB .. 16
  KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now
  support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
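  As a concrete illustration of the 4-classes-per-doubling spacing (an
  inferred example, not text from the commit): within the doubling
  [16 KiB .. 32 KiB) the size classes would be 16 KiB, 20 KiB, 24 KiB, and
  28 KiB, i.e. a 4 KiB step, with the step doubling in each subsequent
  size doubling.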
* Attempt to expand huge allocations in-place. (Daniel Micay, 2014-10-05, 1 file, -20/+27)
  This adds support for expanding huge allocations in-place by requesting
  memory at a specific address from the chunk allocator. It's currently only
  implemented for the chunk recycling path, although in theory it could also
  be done by optimistically allocating new chunks.

  On Linux, it could attempt an in-place mremap. However, that won't work in
  practice since the heap is grown downwards and memory is not unmapped (in a
  normal build, at least).

  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark
  once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of
  copies via mremap, which is currently not something that jemalloc can use.

  With this improvement, jemalloc still fails to do any in-place huge
  reallocations for the first outer loop, but then succeeds 100% of the time
  for the remaining 99 iterations. The time spent doing allocations and copies
  drops down to under 5%, with nearly all of it spent doing purging + faulting
  (when huge pages are disabled) and the array memset.

  An improved mremap API (MREMAP_RETAIN - #138) would be far more general but
  this is a portable optimization and would still be useful on Linux for
  xallocx.

  Numbers with transparent huge pages enabled:

      glibc (copies elided via MREMAP_MAYMOVE): 8.471s
      jemalloc: 17.816s
      jemalloc + no-op madvise: 13.236s
      jemalloc + this commit: 6.787s
      jemalloc + this commit + no-op madvise: 6.144s

  Numbers with transparent huge pages disabled:

      glibc (copies elided via MREMAP_MAYMOVE): 15.403s
      jemalloc: 39.456s
      jemalloc + no-op madvise: 12.768s
      jemalloc + this commit: 15.534s
      jemalloc + this commit + no-op madvise: 6.354s

  Closes #137
* Convert to uniform style: cond == false --> !cond (Jason Evans, 2014-10-03, 1 file, -8/+8)
* Refactor chunk map. (Qinfan Wu, 2014-09-05, 1 file, -0/+1)
  Break the chunk map into two separate arrays, in order to improve cache
  locality. This is related to issue #23.
* Refactor huge allocation to be managed by arenas. (Jason Evans, 2014-05-16, 1 file, -60/+85)
  Refactor huge allocation to be managed by arenas (though the global
  red-black tree of huge allocations remains for lookup during deallocation).
  This is the logical conclusion of recent changes that 1) made per arena dss
  precedence apply to huge allocation, and 2) made it possible to replace the
  per arena chunk allocation/deallocation functions.

  Remove the top level huge stats, and replace them with per arena huge stats.

  Normalize function names and types to *dalloc* (some were *dealloc*).

  Remove the --enable-mremap option. As jemalloc currently operates, this is a
  performance regression for some applications, but planned work to
  logarithmically space huge size classes should provide similar amortized
  performance. The motivation for this change was that mremap-based huge
  reallocation forced leaky abstractions that prevented refactoring.
* Add support for user-specified chunk allocators/deallocators. (aravind, 2014-05-12, 1 file, -15/+43)
  Add new mallctl endpoints "arena.<i>.chunk.alloc" and
  "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk
  allocator and deallocator on a per-arena basis.
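  A sketch of wrapping an arena's chunk allocator through this endpoint
  follows. The hook signature is an assumption modeled on the 3.6-era
  chunk_alloc_t (size, alignment, zero flag, arena index) and should be
  checked against the installed headers; the typedef and function names here
  are made up for the example:

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      /* Assumed hook shape; see the installed jemalloc.h for the real typedef. */
      typedef void *(my_chunk_alloc_t)(size_t size, size_t alignment,
          bool *zero, unsigned arena_ind);

      static my_chunk_alloc_t *orig_alloc;

      /* Log each chunk request, then defer to the original allocator. */
      static void *
      logging_chunk_alloc(size_t size, size_t alignment, bool *zero,
          unsigned arena_ind)
      {
          fprintf(stderr, "chunk alloc: %zu bytes for arena %u\n", size,
              arena_ind);
          return (orig_alloc(size, alignment, zero, arena_ind));
      }

      int
      main(void)
      {
          my_chunk_alloc_t *hook = logging_chunk_alloc;
          size_t sz = sizeof(orig_alloc);

          /* Read arena 0's current hook and install the wrapper in one call. */
          if (mallctl("arena.0.chunk.alloc", &orig_alloc, &sz, &hook,
              sizeof(hook)) != 0)
              return (1);
          return (0);
      }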
* Optimize Valgrind integration. (Jason Evans, 2014-04-15, 1 file, -3/+3)
  Forcefully disable tcache if running inside Valgrind, and remove Valgrind
  calls in tcache-specific code.

  Restructure Valgrind-related code to move most Valgrind calls out of the
  fast path functions.

  Take advantage of static knowledge to elide some branches in
  JEMALLOC_VALGRIND_REALLOC().
* Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. (Jason Evans, 2014-04-15, 1 file, -4/+4)
  Make dss non-optional on all platforms which support sbrk(2).

  Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
  "secondary" precedence is specified, but sbrk(2) is not supported.
* Convert rtree from (void *) to (uint8_t) storage. (Jason Evans, 2014-01-03, 1 file, -2/+2)
  Reduce rtree memory usage by storing booleans (1 byte each) rather than
  pointers. The rtree code is only used to record whether jemalloc manages a
  chunk of memory, so there's no need to store pointers in the rtree.

  Increase rtree node size to 64 KiB in order to reduce tree depth from 13 to
  3 on 64-bit systems. The conversion to more compact leaf nodes was enough by
  itself to make the rtree depth 1 on 32-bit systems; due to the fact that
  root nodes are smaller than the specified node size if possible, the node
  size change has no impact on 32-bit systems (assuming default chunk size).
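  The depth figure follows from simple arithmetic (an inferred illustration,
  assuming the then-default 4 MiB chunks): rtree keys are the upper
  64 - 22 = 42 address bits, and a 64 KiB node of one-byte entries resolves
  16 key bits per level, so ceil(42 / 16) = 3 levels suffice on 64-bit
  systems.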
* Add rtree unit tests. (Jason Evans, 2014-01-03, 1 file, -1/+1)
* Consistently use malloc_mutex_prefork(). (Jason Evans, 2013-10-21, 1 file, -1/+1)
  Consistently use malloc_mutex_prefork() instead of malloc_mutex_lock() in
  all prefork functions.
* Fix a compiler warning. (Jason Evans, 2013-10-20, 1 file, -1/+1)
  Fix a compiler warning in chunk_record() that was due to reading node rather
  than xnode. In practice this did not cause any correctness issue, but
  dataflow analysis in some compilers cannot tell that node and xnode are
  always equal in cases that the read is reached.
* Fix another deadlock related to chunk_record(). (Jason Evans, 2013-04-23, 1 file, -8/+11)
  Fix chunk_record() to unlock chunks_mtx before deallocating a base node, in
  order to avoid potential deadlock. This fix addresses the second of two
  similar bugs.
* Fix deadlock related to chunk_record(). (Jason Evans, 2013-04-17, 1 file, -4/+11)
  Fix chunk_record() to unlock chunks_mtx before deallocating a base node, in
  order to avoid potential deadlock.

  Reported by Tudor Bosman.
* Fix Valgrind integration. (Jason Evans, 2013-02-01, 1 file, -22/+26)
  Fix Valgrind integration to annotate all internally allocated memory in a
  way that keeps Valgrind happy about internal data structure access.
* Fix a chunk recycling bug. (Jason Evans, 2013-02-01, 1 file, -0/+1)
  Fix a chunk recycling bug that could cause the allocator to lose track of
  whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause
  corruption if allocating via sbrk(2) (unlikely unless running with the
  "dss:primary" option specified). This was completely harmless on Linux
  unless using mlockall(2) (and unlikely even then, unless the
  --disable-munmap configure option or the "dss:primary" option was
  specified).

  This regression was introduced in 3.1.0 by the mlockall(2)/madvise(2)
  interaction fix.
* Avoid validating freshly mapped memory. (Jason Evans, 2013-01-22, 1 file, -17/+17)
  Move validation of supposedly zeroed pages from chunk_alloc() to
  chunk_recycle(). There is little point to validating newly mapped memory
  returned by chunk_alloc_mmap(), and memory that comes from sbrk() is
  explicitly zeroed, so there is little risk to assuming that
  chunk_alloc_dss() actually does the zeroing properly.

  This relaxation of validation can make a big difference to application
  startup time and overall system usage on platforms that use jemalloc as the
  system allocator (namely FreeBSD).

  Submitted by Ian Lepore <ian@FreeBSD.org>.
* Fix chunk_recycle() Valgrind integration. (Jason Evans, 2012-12-12, 1 file, -3/+2)
  Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
  is undefined. This fixes Valgrind warnings that would result from a huge
  allocation being freed, then recycled for use as an arena chunk. The arena
  code would write metadata to the chunk header, and Valgrind would consider
  these invalid writes.
* Fix dss/mmap allocation precedence code. (Jason Evans, 2012-10-17, 1 file, -26/+14)
  Fix dss/mmap allocation precedence code to use recyclable mmap memory only
  after primary dss allocation fails.
* Add arena-specific and selective dss allocation. (Jason Evans, 2012-10-13, 1 file, -42/+85)
  Add the "arenas.extend" mallctl, so that it is possible to create new arenas
  that are outside the set that jemalloc automatically multiplexes threads
  onto.

  Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to
  explicitly allocate from a particular arena.

  Add the "opt.dss" mallctl, which controls the default precedence of dss
  allocation relative to mmap allocation.

  Add the "arena.<i>.dss" mallctl, which makes it possible to set the default
  dss precedence on a per arena or global basis.

  Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

  Add the "stats.arenas.<i>.dss" mallctl.
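  Selecting dss precedence through these knobs might look like the following
  sketch (assuming an unprefixed mallctl() on a platform with sbrk(2)); the
  startup-time equivalent is setting the "dss" option, e.g. via
  MALLOC_CONF="dss:primary":

      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          const char *dss = "primary";

          /* Prefer sbrk(2)-backed chunks over mmap for arena 0. */
          if (mallctl("arena.0.dss", NULL, NULL, &dss, sizeof(dss)) != 0)
              return (1);
          return (0);
      }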
* Fix fork(2)-related deadlocks. (Jason Evans, 2012-10-09, 1 file, -0/+30)
  Add a library constructor for jemalloc that initializes the allocator. This
  fixes a race that could occur if threads were created by the main thread
  prior to any memory allocation, followed by fork(2), and then memory
  allocation in the child process.

  Fix the prefork/postfork functions to acquire/release the ctl, prof, and
  rtree mutexes. This fixes various fork() child process deadlocks, but one
  possible deadlock remains (intentionally) unaddressed: prof backtracing can
  acquire runtime library mutexes, so deadlock is still possible if heap
  profiling is enabled during fork(). This deadlock is known to be a real
  issue in at least the case of libgcc-based backtracing.

  Reported by tfengjun.
* Fix mlockall()/madvise() interaction. (Jason Evans, 2012-10-09, 1 file, -8/+14)
  mlockall(2) can cause purging via madvise(2) to fail. Fix purging code to
  check whether madvise() succeeded, and base zeroed page metadata on the
  result.

  Reported by Olivier Lecomte.
* Fix chunk_recycle() to stop leaking trailing chunks. (Jason Evans, 2012-05-09, 1 file, -40/+38)
  Fix chunk_recycle() to correctly compute trailsize and re-insert trailing
  chunks. This fixes a major virtual memory leak.

  Simplify chunk_record() to avoid dropping/re-acquiring chunks_mtx.
* Fix chunk_alloc_mmap() bugs. (Jason Evans, 2012-05-09, 1 file, -0/+1)
  Simplify chunk_alloc_mmap() to no longer attempt map extension. The extra
  complexity isn't warranted, because although in the success case it saves
  one system call as compared to immediately falling back to
  chunk_alloc_mmap_slow(), it also makes the failure case even more expensive.
  This simplification removes two bugs:
  - For Windows platforms, pages_unmap() wasn't being called for unaligned
    mappings prior to falling back to chunk_alloc_mmap_slow(). This caused
    permanent virtual memory leaks.
  - For non-Windows platforms, alignment greater than chunksize caused
    pages_map() to be called with size 0 when attempting map extension. This
    always resulted in an mmap() error, and subsequent fallback to
    chunk_alloc_mmap_slow().
* Fix a base allocator deadlock. (Jason Evans, 2012-05-03, 1 file, -3/+14)
  Fix a base allocator deadlock due to chunk_recycle() calling back into the
  base allocator.
* Add missing Valgrind annotations. (Jason Evans, 2012-04-24, 1 file, -0/+1)
* Remove mmap_unaligned. (Jason Evans, 2012-04-22, 1 file, -11/+1)
  Remove mmap_unaligned, which was used to heuristically decide whether to
  optimistically call mmap() in such a way that could reduce the total number
  of system calls.

  If I remember correctly, the intention of mmap_unaligned was to avoid always
  executing the slow path in the presence of ASLR. However, that reasoning
  seems to have been based on a flawed understanding of how ASLR actually
  works. Although ASLR apparently causes mmap() to ignore address requests, it
  does not cause total placement randomness, so there is a reasonable
  expectation that iterative mmap() calls will start returning chunk-aligned
  mappings once the first chunk has been properly aligned.
* Fix chunk allocation/deallocation bugs. (Jason Evans, 2012-04-21, 1 file, -4/+13)
  Fix chunk_alloc_dss() to zero memory when requested.

  Fix chunk_dealloc() to avoid chunk_dealloc_mmap() for dss-allocated memory.

  Fix huge_palloc() to always junk fill when requested.

  Improve chunk_recycle() to report that memory is zeroed as a side effect of
  pages_purge().