path: root/src/arena.c
Commit message | Author | Date | Files | Lines
* Fix in-place shrinking huge reallocation purging bugs. | Jason Evans | 2015-03-26 | 1 | -6/+1
  Fix the shrinking case of huge_ralloc_no_move_similar() to purge the correct number of pages, at the correct offset. This regression was introduced by 8d6a3e8321a7767cb2ca0930b85d5d488a8cc659 (Implement dynamic per arena control over dirty page purging.).
  Fix huge_ralloc_no_move_shrink() to purge the correct number of pages. This bug was introduced by 9673983443a0782d975fbcb5d8457cfd411b8b56 (Purge/zero sub-chunk huge allocations as necessary.).
* Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. | Jason Evans | 2015-03-24 | 1 | -3/+5
* Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). | Jason Evans | 2015-03-24 | 1 | -1/+2
* Implement dynamic per arena control over dirty page purging. | Jason Evans | 2015-03-19 | 1 | -12/+75
  Add mallctls:
  - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas.
  - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint.
  - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced.
  This resolves #93.
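  A minimal sketch of driving the new knobs from application code via mallctl(); the ssize_t type for lg_dirty_mult and the use of arena index 0 are assumptions here, not taken from the commit:

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          /* Read the default that newly created arenas inherit. */
          ssize_t lg_dirty_mult;                 /* assumed type */
          size_t sz = sizeof(lg_dirty_mult);
          if (mallctl("arenas.lg_dirty_mult", &lg_dirty_mult, &sz, NULL, 0) == 0)
              printf("default lg_dirty_mult: %zd\n", lg_dirty_mult);

          /* Tighten arena 0's threshold; per the commit, the write synchronously
           * triggers any purging needed to satisfy the new constraint. */
          ssize_t tighter = 5;
          if (mallctl("arena.0.lg_dirty_mult", NULL, NULL, &tighter, sizeof(tighter)) != 0)
              fprintf(stderr, "arena.0.lg_dirty_mult write failed\n");
          return 0;
      }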
* Fix a declaration-after-statement regression. | Jason Evans | 2015-03-11 | 1 | -3/+2
* Normalize rdelm/rd structure field naming. | Jason Evans | 2015-03-11 | 1 | -4/+4
* Refactor dirty run linkage to reduce sizeof(extent_node_t). | Jason Evans | 2015-03-11 | 1 | -41/+48
* Use first-fit rather than first-best-fit run/chunk allocation. | Jason Evans | 2015-03-07 | 1 | -25/+47
  This tends to more effectively pack active memory toward low addresses. However, additional tree searches are required in many cases, so whether this change stands the test of time will depend on real-world benchmarks.
* Quantize szad trees by size class. | Jason Evans | 2015-03-07 | 1 | -9/+27
  Treat sizes that round down to the same size class as size-equivalent in trees that are used to search for first best fit, so that there are only as many "firsts" as there are size classes. This comes closer to the ideal of first fit.
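  A hypothetical sketch of the quantization idea, not the jemalloc internals (the struct, the quantize_floor() placeholder, and the synthetic addresses are all illustrative): order szad entries by quantized size first and address second, so every extent whose size rounds down to the same class ties on the size key and the lowest address wins.

      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      typedef struct {
          size_t size;
          void   *addr;
      } extent_sketch_t;

      /* Placeholder quantization: round a size down to a 4 KiB boundary. Real
       * size classes are finer grained; this just keeps the sketch runnable. */
      static size_t quantize_floor(size_t size) {
          return size & ~((size_t)4096 - 1);
      }

      /* szad ordering: quantized size, then address. */
      static int szad_comp(const extent_sketch_t *a, const extent_sketch_t *b) {
          size_t qa = quantize_floor(a->size), qb = quantize_floor(b->size);
          if (qa != qb)
              return (qa < qb) ? -1 : 1;
          uintptr_t ua = (uintptr_t)a->addr, ub = (uintptr_t)b->addr;
          return (ua < ub) ? -1 : (ua > ub) ? 1 : 0;
      }

      int main(void) {
          extent_sketch_t a = { 4100, (void *)0x2000 };  /* synthetic addresses */
          extent_sketch_t b = { 4600, (void *)0x1000 };
          /* Both sizes quantize to the same class, so the lower address sorts first. */
          printf("%d\n", szad_comp(&a, &b));             /* prints 1: a orders after b */
          return 0;
      }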
* Fix chunk cache races. | Jason Evans | 2015-02-19 | 1 | -93/+163
  These regressions were introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
* Rename "dirty chunks" to "cached chunks". | Jason Evans | 2015-02-18 | 1 | -45/+25
  Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty".
  Fix the regression caused by 339c2b23b2d61993ac768afcc72af135662c6771 (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().
* Fix chunk_unmap() to propagate dirty state. | Jason Evans | 2015-02-18 | 1 | -4/+10
  Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty chunk purging to record this information so it can be passed to chunk_unmap(). Since the broken version of chunk_unmap() claimed that all chunks were clean, this resulted in potential memory corruption for purging implementations that do not zero (e.g. MADV_FREE).
  This regression was introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
* arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init() | Jason Evans | 2015-02-18 | 1 | -11/+3
* Simplify extent_node_t and add extent_node_init(). | Jason Evans | 2015-02-17 | 1 | -7/+2
* Integrate whole chunks into unused dirty page purging machinery. | Jason Evans | 2015-02-17 | 1 | -114/+289
  Extend per arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case).
  Refactor extent_node_t to provide accessor functions.
* Normalize *_link and link_* fields to all be *_link. | Jason Evans | 2015-02-16 | 1 | -4/+4
* Refactor huge_*() calls into arena internals. | Jason Evans | 2015-02-12 | 1 | -56/+104
  Make redirects to the huge_*() API the arena code's responsibility, since arenas now take responsibility for all allocation sizes.
* Move centralized chunk management into arenas. | Jason Evans | 2015-02-12 | 1 | -10/+64
  Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas.
  Add chunk node caching to arenas, in order to avoid contention on the base allocator.
  Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset).
  Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug.
  Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information.
  Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.
* Implement explicit tcache support. | Jason Evans | 2015-02-10 | 1 | -15/+9
  Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API.
  Add the tcache.create, tcache.flush, and tcache.destroy mallctls.
  This resolves #145.
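  A minimal usage sketch of the new interfaces; the 4096-byte request size is arbitrary and most error handling is trimmed:

      #include <jemalloc/jemalloc.h>

      int main(void) {
          unsigned tc;
          size_t sz = sizeof(tc);
          /* Create an explicit tcache and get back its index. */
          if (mallctl("tcache.create", &tc, &sz, NULL, 0) != 0)
              return 1;
          /* Route allocation and deallocation through that cache. */
          void *p = mallocx(4096, MALLOCX_TCACHE(tc));
          if (p != NULL)
              dallocx(p, MALLOCX_TCACHE(tc));
          /* Flush, then destroy, the explicit cache. */
          mallctl("tcache.flush", NULL, NULL, &tc, sizeof(tc));
          mallctl("tcache.destroy", NULL, NULL, &tc, sizeof(tc));
          return 0;
      }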
* Make opt.lg_dirty_mult work as documented | Mike Hommey | 2015-02-03 | 1 | -0/+2
  The documentation for opt.lg_dirty_mult says:
      Per-arena minimum ratio (log base 2) of active to dirty pages. Some dirty unused pages may be allowed to accumulate, within the limit set by the ratio (or one chunk worth of dirty pages, whichever is greater) (...)
  The restriction in parentheses currently doesn't happen. This makes jemalloc aggressively madvise(), which in turn increases the amount of page faults significantly. For instance, this resulted in several(!) hundred(!) milliseconds startup regression on Firefox for Android.
  This may require further tweaking, but starting with actually doing what the documentation says is a good start.
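  A worked sketch of the documented constraint: dirty pages may accumulate up to the greater of active >> lg_dirty_mult and one chunk's worth. The helper is illustrative, not jemalloc code, and the default of 3 (an 8:1 ratio), the 4 KiB page, and the 4 MiB chunk are assumptions about a typical configuration.

      #include <stdio.h>
      #include <stddef.h>

      static size_t dirty_allowance(size_t nactive, unsigned lg_dirty_mult, size_t chunk_npages) {
          size_t ratio_limit = nactive >> lg_dirty_mult;
          return (ratio_limit > chunk_npages) ? ratio_limit : chunk_npages;
      }

      int main(void) {
          /* 64 MiB active (16384 pages), lg_dirty_mult of 3, 1024 pages per
           * 4 MiB chunk: up to 2048 dirty pages may accumulate. */
          printf("%zu\n", dirty_allowance(16384, 3, 1024));
          /* Tiny heap: the one-chunk floor dominates, so 1024 pages are allowed. */
          printf("%zu\n", dirty_allowance(512, 3, 1024));
          return 0;
      }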
* Implement metadata statistics. | Jason Evans | 2015-01-24 | 1 | -2/+8
  There are three categories of metadata:
  - Base allocations are used for bootstrap-sensitive internal allocator data structures.
  - Arena chunk headers comprise pages which track the states of the non-metadata pages.
  - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles.
  The metadata statistics comprise the metadata categories as follows:
  - stats.metadata: All metadata -- base + arena chunk headers + internal allocations.
  - stats.arenas.<i>.metadata.mapped: Arena chunk headers.
  - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not.
  Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata.
  This resolves #163.
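  The new statistics are plain mallctl reads; a minimal sketch, assuming the usual epoch refresh to get a current snapshot and a build with statistics enabled:

      #include <stdio.h>
      #include <stdint.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          /* Advance the epoch so the stats snapshot is current. */
          uint64_t epoch = 1;
          size_t esz = sizeof(epoch);
          mallctl("epoch", &epoch, &esz, &epoch, esz);

          size_t metadata, sz = sizeof(metadata);
          if (mallctl("stats.metadata", &metadata, &sz, NULL, 0) == 0)
              printf("total metadata: %zu bytes\n", metadata);

          size_t mapped = 0, allocated = 0;
          sz = sizeof(size_t);
          mallctl("stats.arenas.0.metadata.mapped", &mapped, &sz, NULL, 0);
          sz = sizeof(size_t);
          mallctl("stats.arenas.0.metadata.allocated", &allocated, &sz, NULL, 0);
          printf("arena 0: mapped %zu, allocated %zu\n", mapped, allocated);
          return 0;
      }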
* Move variable declaration to the top of its block for MSVC compatibility. | Guilherme Goncalves | 2014-12-17 | 1 | -2/+2
* Introduce two new modes of junk filling: "alloc" and "free". | Guilherme Goncalves | 2014-12-15 | 1 | -25/+28
  In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation.
  This resolves #172.
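  A minimal sketch of opting into the new "alloc" mode at compile time via the malloc_conf global; an unprefixed jemalloc build with fill support is assumed (prefixed builds expose je_malloc_conf instead), and the option can equally be set through the MALLOC_CONF environment variable.

      #include <stdlib.h>

      /* Junk-fill on allocation only; "free" would junk only on deallocation. */
      const char *malloc_conf = "junk:alloc";

      int main(void) {
          unsigned char *p = malloc(64);    /* bytes arrive junk-filled */
          free(p);                          /* no fill on free in "alloc" mode */
          return 0;
      }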
* Style and spelling fixes. | Jason Evans | 2014-12-09 | 1 | -1/+1
* Fix more pointer arithmetic undefined behavior. | Jason Evans | 2014-11-17 | 1 | -4/+4
  Reported by Guilherme Gonçalves.
  This resolves #166.
* Fix pointer arithmetic undefined behavior. | Jason Evans | 2014-11-17 | 1 | -4/+7
  Reported by Denis Denisov.
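  The usual remedy for this class of bug is to carry offset math out on uintptr_t rather than on the pointer itself; whether these exact lines mirror the fixes in these two commits is an assumption, and the helper name is hypothetical.

      #include <stdint.h>
      #include <stddef.h>

      /* Computing an address that may stray outside the original object is
       * undefined for pointer arithmetic, but well defined on uintptr_t. */
      static inline void *ptr_offset(const void *base, size_t offset) {
          return (void *)((uintptr_t)base + offset);
      }

      int main(void) {
          char buf[16];
          return (ptr_offset(buf, 8) == (void *)&buf[8]) ? 0 : 1;
      }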
* Disable arena_dirty_count() validation. | Jason Evans | 2014-11-01 | 1 | -2/+6
* mark huge allocations as unlikely | Daniel Micay | 2014-10-31 | 1 | -2/+2
  This cleans up the fast path a bit more by moving away more code.
* Use JEMALLOC_INLINE_C everywhere it's appropriate. | Jason Evans | 2014-10-30 | 1 | -8/+8
* use sized deallocation internally for ralloc | Daniel Micay | 2014-10-16 | 1 | -1/+1
  The size of the source allocation is known at this point, so reading the chunk header can be avoided for the small size class fast path. This is not very useful right now, but it provides a significant performance boost with an alternate ralloc entry point taking the old size.
* Fix huge allocation statistics. | Jason Evans | 2014-10-15 | 1 | -81/+220
* Add per size class huge allocation statistics. | Jason Evans | 2014-10-13 | 1 | -29/+50
  Add per size class huge allocation statistics, and normalize various stats:
  - Change the arenas.nlruns type from size_t to unsigned.
  - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's.
  - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs.
  - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.
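  A minimal sketch of walking the new huge size class metadata; the unsigned and size_t output types are assumptions about the mallctl value types:

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          unsigned nhchunks;
          size_t sz = sizeof(nhchunks);
          if (mallctl("arenas.nhchunks", &nhchunks, &sz, NULL, 0) != 0)
              return 1;
          for (unsigned i = 0; i < nhchunks; i++) {
              char name[64];
              size_t class_size, csz = sizeof(class_size);
              snprintf(name, sizeof(name), "arenas.hchunks.%u.size", i);
              if (mallctl(name, &class_size, &csz, NULL, 0) == 0)
                  printf("huge size class %u: %zu bytes\n", i, class_size);
          }
          return 0;
      }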
* Remove arena_dalloc_bin_run() clean page preservation. | Jason Evans | 2014-10-11 | 1 | -66/+7
  Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small.
  Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory.
* Add configure options. | Jason Evans | 2014-10-10 | 1 | -11/+31
  Add:
      --with-lg-page
      --with-lg-page-sizes
      --with-lg-size-class-group
      --with-lg-quantum
  Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE.
  Fix various edge conditions exposed by the configure options.
* Refactor/fix arenas manipulation. | Jason Evans | 2014-10-08 | 1 | -10/+20
  Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.
  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.
  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).
  Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).
  Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.
* Normalize size classes. | Jason Evans | 2014-10-06 | 1 | -111/+112
  Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
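  A short sketch of the spacing rule this describes: with 4 classes per size doubling, each group [2^k, 2^(k+1)) advances in steps of 2^(k-2). Printing the small classes that now fill [4 KiB .. 16 KiB):

      #include <stdio.h>
      #include <stddef.h>

      int main(void) {
          for (size_t base = 4096; base < 16384; base *= 2) {
              size_t step = base / 4;
              for (int i = 0; i < 4; i++)
                  printf("%zu\n", base + (size_t)i * step);
          }
          /* 4096, 5120, 6144, 7168, 8192, 10240, 12288, 14336 */
          return 0;
      }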
* Attempt to expand huge allocations in-place. | Daniel Micay | 2014-10-05 | 1 | -4/+4
  This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks.
  On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least).
  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use.
  With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset.
  An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx.
  Numbers with transparent huge pages enabled:
      glibc (copies elided via MREMAP_MAYMOVE): 8.471s
      jemalloc: 17.816s
      jemalloc + no-op madvise: 13.236s
      jemalloc + this commit: 6.787s
      jemalloc + this commit + no-op madvise: 6.144s
  Numbers with transparent huge pages disabled:
      glibc (copies elided via MREMAP_MAYMOVE): 15.403s
      jemalloc: 39.456s
      jemalloc + no-op madvise: 12.768s
      jemalloc + this commit: 15.534s
      jemalloc + this commit + no-op madvise: 6.354s
  Closes #137
* Fix OOM-related regression in arena_tcache_fill_small(). | Jason Evans | 2014-10-05 | 1 | -1/+12
  Fix an OOM-related regression in arena_tcache_fill_small() that caused cache corruption that would almost certainly expose the application to undefined behavior, usually in the form of an allocation request returning an already-allocated region, or somewhat less likely, a freed region that had already been returned to the arena, thus making it available to the arena for any purpose.
  This regression was introduced by 9c43c13a35220c10d97a886616899189daceb359 (Reverse tcache fill order.), and was present in all releases from 2.2.0 through 3.6.0.
  This resolves #98.
* Convert to uniform style: cond == false --> !cond | Jason Evans | 2014-10-03 | 1 | -14/+14
* Move small run metadata into the arena chunk header. | Jason Evans | 2014-09-29 | 1 | -194/+153
  Move small run metadata into the arena chunk header, with multiple expected benefits:
  - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions.
  - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware.
  - Buffer overruns/underruns are less likely to corrupt allocator metadata.
  - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.
* Convert all tsd variables to reside in a single tsd structure. | Jason Evans | 2014-09-23 | 1 | -9/+14
* Apply likely()/unlikely() to allocation/deallocation fast paths. | Jason Evans | 2014-09-12 | 1 | -14/+14
* Optimize [nmd]alloc() fast paths. | Jason Evans | 2014-09-07 | 1 | -1/+1
  Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.
* Refactor chunk map. | Qinfan Wu | 2014-09-05 | 1 | -99/+109
  Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.
* Fix and refactor runs_dirty-based purging. | Jason Evans | 2014-08-14 | 1 | -104/+80
  Fix runs_dirty-based purging to also purge dirty pages in the spare chunk.
  Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and move the arena->ndirty accounting into those functions.
  Remove the u.ql_link field from arena_chunk_map_t, and get rid of the enclosing union for u.rb_link, since only rb_link remains.
  Remove the ndirty field from arena_chunk_t.
* arena->npurgatory is no longer needed since we drop arena's lock after stashing all the purgeable runs. | Qinfan Wu | 2014-08-12 | 1 | -12/+3
* Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer need to maintain the tree for dirty page purging. | Qinfan Wu | 2014-08-12 | 1 | -177/+10
* Purge dirty pages from the beginning of the dirty list. | Qinfan Wu | 2014-08-12 | 1 | -165/+70
* Add dirty page counting for debug | Qinfan Wu | 2014-08-12 | 1 | -4/+29
* Maintain all the dirty runs in a linked list for each arena | Qinfan Wu | 2014-08-12 | 1 | -0/+47