path: root/src/arena.c
Each entry below lists: commit message, author, date, files changed, and lines removed/added.
* use sized deallocation internally for ralloc  (Daniel Micay, 2014-10-16; 1 file, -1/+1)
  The size of the source allocation is known at this point, so reading the chunk header can be avoided for the small size class fast path. This is not very useful right now, but it provides a significant performance boost with an alternate ralloc entry point taking the old size.
* Fix huge allocation statistics.  (Jason Evans, 2014-10-15; 1 file, -81/+220)
* Add per size class huge allocation statistics.  (Jason Evans, 2014-10-13; 1 file, -29/+50)
  Add per size class huge allocation statistics, and normalize various stats:
  - Change the arenas.nlruns type from size_t to unsigned.
  - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's.
  - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs.
  - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.
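  For reference, per size class statistics like these are read through jemalloc's standard mallctl() interface. The following is a minimal sketch; the arena/bin indices and the specific stat name are illustrative values taken from the list above, and error handling is reduced to a single check:

      #include <jemalloc/jemalloc.h>
      #include <stdint.h>
      #include <stdio.h>

      int main(void) {
          /* Advance the stats epoch so the subsequent read is current. */
          uint64_t epoch = 1;
          size_t esz = sizeof(epoch);
          mallctl("epoch", &epoch, &esz, &epoch, esz);

          /* Read the number of live regions in arena 0, bin 0. */
          size_t curregs;
          size_t sz = sizeof(curregs);
          if (mallctl("stats.arenas.0.bins.0.curregs", &curregs, &sz, NULL, 0) == 0)
              printf("curregs: %zu\n", curregs);
          return 0;
      }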
* Remove arena_dalloc_bin_run() clean page preservation.  (Jason Evans, 2014-10-11; 1 file, -66/+7)
  Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small.

  Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory.
* Add configure options.  (Jason Evans, 2014-10-10; 1 file, -11/+31)
  Add:
    --with-lg-page
    --with-lg-page-sizes
    --with-lg-size-class-group
    --with-lg-quantum

  Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.
* Refactor/fix arenas manipulation.  (Jason Evans, 2014-10-08; 1 file, -10/+20)
  Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.

  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.

  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).

  Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).

  Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to result in allocation silently allocating from a different arena than the one specified.
* Normalize size classes.  (Jason Evans, 2014-10-06; 1 file, -111/+112)
  Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
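  As a worked example of that spacing: with 4 classes per size doubling, the classes that fill (4 KiB, 8 KiB] are 5 KiB, 6 KiB, 7 KiB, and 8 KiB (1 KiB apart), and the classes that fill (8 KiB, 16 KiB] are 10 KiB, 12 KiB, 14 KiB, and 16 KiB (2 KiB apart); each doubling doubles the absolute spacing while keeping the relative spacing constant.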
* Attempt to expand huge allocations in-place.  (Daniel Micay, 2014-10-05; 1 file, -4/+4)
  This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks. On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least).

  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use. With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset.

  An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx.

  Numbers with transparent huge pages enabled:
      glibc (copies elided via MREMAP_MAYMOVE): 8.471s
      jemalloc: 17.816s
      jemalloc + no-op madvise: 13.236s
      jemalloc + this commit: 6.787s
      jemalloc + this commit + no-op madvise: 6.144s

  Numbers with transparent huge pages disabled:
      glibc (copies elided via MREMAP_MAYMOVE): 15.403s
      jemalloc: 39.456s
      jemalloc + no-op madvise: 12.768s
      jemalloc + this commit: 15.534s
      jemalloc + this commit + no-op madvise: 6.354s

  Closes #137
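  For context, the in-place path exercised here is the one applications reach through xallocx(), which resizes without moving the allocation. A minimal sketch of the usual caller-side pattern follows; the grow() helper is illustrative, not part of jemalloc:

      #include <jemalloc/jemalloc.h>
      #include <stddef.h>

      /* Try to grow *pptr to at least new_size in place; fall back to a
       * moving reallocation only if in-place expansion fails. */
      static int grow(void **pptr, size_t new_size) {
          if (xallocx(*pptr, new_size, 0, 0) >= new_size)
              return 0;                       /* expanded in place */
          void *moved = rallocx(*pptr, new_size, 0);
          if (moved == NULL)
              return -1;                      /* out of memory */
          *pptr = moved;
          return 0;
      }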
* Fix OOM-related regression in arena_tcache_fill_small().  (Jason Evans, 2014-10-05; 1 file, -1/+12)
  Fix an OOM-related regression in arena_tcache_fill_small() that caused cache corruption that would almost certainly expose the application to undefined behavior, usually in the form of an allocation request returning an already-allocated region, or somewhat less likely, a freed region that had already been returned to the arena, thus making it available to the arena for any purpose.

  This regression was introduced by 9c43c13a35220c10d97a886616899189daceb359 (Reverse tcache fill order.), and was present in all releases from 2.2.0 through 3.6.0.

  This resolves #98.
* Convert to uniform style: cond == false --> !cond  (Jason Evans, 2014-10-03; 1 file, -14/+14)
* Move small run metadata into the arena chunk header.  (Jason Evans, 2014-09-29; 1 file, -194/+153)
  Move small run metadata into the arena chunk header, with multiple expected benefits:
  - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions.
  - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware.
  - Buffer overruns/underruns are less likely to corrupt allocator metadata.
  - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.
* Convert all tsd variables to reside in a single tsd structure.  (Jason Evans, 2014-09-23; 1 file, -9/+14)
* Apply likely()/unlikely() to allocation/deallocation fast paths.  (Jason Evans, 2014-09-12; 1 file, -14/+14)
* Optimize [nmd]alloc() fast paths.  (Jason Evans, 2014-09-07; 1 file, -1/+1)
  Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.
* Refactor chunk map.  (Qinfan Wu, 2014-09-05; 1 file, -99/+109)
  Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.
* Fix and refactor runs_dirty-based purging.  (Jason Evans, 2014-08-14; 1 file, -104/+80)
  Fix runs_dirty-based purging to also purge dirty pages in the spare chunk.

  Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and move the arena->ndirty accounting into those functions.

  Remove the u.ql_link field from arena_chunk_map_t, and get rid of the enclosing union for u.rb_link, since only rb_link remains.

  Remove the ndirty field from arena_chunk_t.
* arena->npurgatory is no longer needed since we drop arena's lock after stashing all the purgeable runs.  (Qinfan Wu, 2014-08-12; 1 file, -12/+3)
* Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer need to maintain the tree for dirty page purging.  (Qinfan Wu, 2014-08-12; 1 file, -177/+10)
* Purge dirty pages from the beginning of the dirty list.  (Qinfan Wu, 2014-08-12; 1 file, -165/+70)
* Add dirty page counting for debug  (Qinfan Wu, 2014-08-12; 1 file, -4/+29)
* Maintain all the dirty runs in a linked list for each arena  (Qinfan Wu, 2014-08-12; 1 file, -0/+47)
* Fix the cactive statistic.  (Jason Evans, 2014-08-07; 1 file, -3/+3)
  Fix the cactive statistic to decrease (rather than increase) when active memory decreases. This regression was introduced by aa5113b1fdafd1129c22512837c6c3d66c295fc8 (Refactor overly large/complex functions) and first released in 3.5.0.
* Reintroduce the comment that was removed in f9ff603.  (Qinfan Wu, 2014-08-06; 1 file, -1/+5)
* Fix the bug that causes not allocating free run with lowest address.  (Qinfan Wu, 2014-08-06; 1 file, -3/+7)
* Try to use __builtin_ffsl if ffsl is unavailable.  (Richard Diamond, 2014-06-02; 1 file, -1/+1)
  Some platforms (like those using Newlib) don't have ffs/ffsl. This commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't found. __builtin_ffsl performs the same function as ffsl, and has the added benefit of being available on any platform utilizing a GCC-compatible compiler.

  This change does not address the use of ffs in the MALLOCX_ARENA() macro.
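  Illustratively, the kind of fallback such a configure check enables looks like the snippet below; the guard macro name is a hypothetical placeholder, not necessarily what configure.ac actually defines:

      /* Hypothetical sketch: if the configure check found no libc ffsl(),
       * fall back to the GCC-compatible built-in, which has the same
       * semantics (1-based index of the least significant set bit, or 0). */
      #ifndef JEMALLOC_HAVE_FFSL
      #  define ffsl(x) __builtin_ffsl(x)
      #endif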
* Add size class computation capability.  (Jason Evans, 2014-05-29; 1 file, -29/+33)
  Add size class computation capability, currently used only as validation of the size class lookup tables. Generalize the size class spacing used for bins, for eventual use throughout the full range of allocation sizes.
* Refactor huge allocation to be managed by arenas.  (Jason Evans, 2014-05-16; 1 file, -14/+99)
  Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions.

  Remove the top level huge stats, and replace them with per arena huge stats.

  Normalize function names and types to *dalloc* (some were *dealloc*).

  Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.
* Add support for user-specified chunk allocators/deallocators.  (aravind, 2014-05-12; 1 file, -3/+5)
  Add new mallctl endpoints "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.
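  For orientation, such a per-arena hook is installed by writing a function pointer through mallctl(). The sketch below assumes the hook takes (size, alignment, zero, arena_ind) and returns the new chunk or NULL; that typedef, and the mmap-based body, are assumptions for illustration only:

      #include <jemalloc/jemalloc.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <sys/mman.h>

      /* Assumed hook shape; the exact typedef introduced by this commit may differ. */
      typedef void *(chunk_alloc_hook_t)(size_t size, size_t alignment,
          bool *zero, unsigned arena_ind);

      static void *
      my_chunk_alloc(size_t size, size_t alignment, bool *zero, unsigned arena_ind) {
          (void)alignment; (void)arena_ind;
          /* A real hook must also honor `alignment`; omitted here for brevity. */
          void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
              return NULL;
          *zero = true;   /* fresh anonymous mappings are zero-filled */
          return p;
      }

      static void
      install_chunk_alloc_hook(void) {
          chunk_alloc_hook_t *hook = my_chunk_alloc;
          mallctl("arena.0.chunk.alloc", NULL, NULL, &hook, sizeof(hook));
      }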
* Refactor small_size2bin and small_bin2size.  (Jason Evans, 2014-04-17; 1 file, -9/+9)
  Refactor small_size2bin and small_bin2size to be inline functions rather than directly accessed arrays.
* Merge pull request #73 from bmaurer/smallmalloc  (Jason Evans, 2014-04-16; 1 file, -1/+9)
  Smaller malloc hot path
* Create a const array with only a small bin to size map  (Ben Maurer, 2014-04-16; 1 file, -1/+9)
* Optimize Valgrind integration.  (Jason Evans, 2014-04-15; 1 file, -14/+14)
  Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code.

  Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions.

  Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().
* Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.  (Jason Evans, 2014-04-15; 1 file, -1/+4)
  Make dss non-optional on all platforms which support sbrk(2).

  Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.
* Remove support for non-prof-promote heap profiling metadata.  (Jason Evans, 2014-04-11; 1 file, -21/+0)
  Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run. In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case.

  Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).
* refactoring for bits splitting  (Ben Maurer, 2014-04-10; 1 file, -36/+40)
* Fix a crashing case where arena_chunk_init_hard returns NULL.  (Chris Pride, 2014-03-26; 1 file, -1/+4)
  This happens when it fails to allocate a new chunk, which arena_chunk_alloc then passes into arena_avail_insert without any checks. This then causes a crash when arena_avail_insert tries to check chunk->ndirty.

  This was introduced by the refactoring of arena_chunk_alloc, which previously would have returned NULL immediately after calling chunk_alloc. This is now the return from arena_chunk_init_hard, so we need to check that return and not continue if it was NULL.
* Fix typo  (Erwan Legrand, 2014-02-14; 1 file, -1/+0)
* Refactor overly large/complex functions.  (Jason Evans, 2014-01-15; 1 file, -383/+461)
  Refactor overly large functions by breaking out helper functions.

  Refactor overly complex multi-purpose functions into separate more specific functions.
* Extract profiling code from [re]allocation functions.  (Jason Evans, 2014-01-12; 1 file, -7/+6)
  Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction.

  Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases.

  Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions).

  Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().
* Add junk/zero filling unit tests, and fix discovered bugs.  (Jason Evans, 2014-01-08; 1 file, -17/+67)
  Fix growing large reallocation to junk fill new space.

  Fix huge deallocation to junk fill when munmap is disabled.
* Add quarantine unit tests.  (Jason Evans, 2013-12-17; 1 file, -12/+54)
  Verify that freed regions are quarantined, and that redzone corruption is detected.

  Introduce a testing idiom for intercepting/replacing internal functions. In this case the replaced function is ordinarily a static function, but the idiom should work similarly for library-private functions.
* Don't junk-fill reallocations unless usize changes.  (Jason Evans, 2013-12-16; 1 file, -12/+3)
  Don't junk fill reallocations for which the request size is less than the current usable size, but not enough smaller to cause a size class change. Unlike malloc()/calloc()/realloc(), *allocx() contractually treats the full usize as the allocation, so a caller can ask for zeroed memory via mallocx() and a series of rallocx() calls that all specify MALLOCX_ZERO, and be assured that all newly allocated bytes will be zeroed and made available to the application without danger of allocator mutation until the size class decreases enough to cause usize reduction.
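  The zeroed-growth contract described above looks like this from the caller's side (a minimal sketch; the sizes are arbitrary):

      #include <jemalloc/jemalloc.h>
      #include <assert.h>
      #include <stddef.h>

      int main(void) {
          /* Every byte of the allocation's full usable size is zeroed. */
          unsigned char *p = mallocx(100, MALLOCX_ZERO);
          if (p == NULL) return 1;
          /* Growing with MALLOCX_ZERO keeps all newly exposed bytes zeroed. */
          p = rallocx(p, 1000, MALLOCX_ZERO);
          if (p == NULL) return 1;
          for (size_t i = 100; i < 1000; i++)
              assert(p[i] == 0);
          dallocx(p, 0);
          return 0;
      }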
* Implement the *allocx() API.  (Jason Evans, 2013-12-13; 1 file, -3/+3)
  Implement the *allocx() API, which is a successor to the *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign().

  The following code violates strict aliasing rules:

      foo_t *foo;
      allocm((void **)&foo, NULL, 42, 0);

  whereas the following is safe:

      foo_t *foo;
      void *p;
      allocm(&p, NULL, 42, 0);
      foo = (foo_t *)p;

  mallocx() does not have this problem:

      foo_t *foo = (foo_t *)mallocx(42, 0);
* Remove unnecessary zeroing in arena_palloc().  (Jason Evans, 2013-10-30; 1 file, -43/+73)
* Fix a Valgrind integration flaw.  (Jason Evans, 2013-10-20; 1 file, -5/+18)
  Fix a Valgrind integration flaw that caused Valgrind warnings about reads of uninitialized memory in internal zero-initialized data structures (relevant to tcache and prof code).
* Fix a Valgrind integration flaw.  (Jason Evans, 2013-10-20; 1 file, -7/+14)
  Fix a Valgrind integration flaw that caused Valgrind warnings about reads of uninitialized memory in arena chunk headers.
* Fix a prof-related locking order bug.  (Jason Evans, 2013-02-06; 1 file, -5/+8)
  Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.
* Fix Valgrind integration.  (Jason Evans, 2013-02-01; 1 file, -7/+8)
  Fix Valgrind integration to annotate all internally allocated memory in a way that keeps Valgrind happy about internal data structure access.
* Tighten valgrind integration.  (Jason Evans, 2013-01-22; 1 file, -22/+28)
  Tighten valgrind integration such that immediately after memory is validated or zeroed, valgrind is told to forget the memory's 'defined' state. The only place newly allocated memory should be left marked as 'defined' is in the public functions (e.g. calloc() and realloc()).
* Avoid arena_prof_accum()-related locking when possible.  (Jason Evans, 2012-11-13; 1 file, -24/+3)
  Refactor arena_prof_accum() and its callers to avoid arena locking when prof_interval is 0 (as when profiling is disabled).

  Reported by Ben Maurer.