path: root/src/jemalloc.c
Commit message (author, date, files changed, lines removed/added)
* Refactor bootstrapping to delay tsd initialization. (Jason Evans, 2015-01-22, 1 file changed, -108/+185)
  Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc. Refactor a0*() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0*() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.
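  A rough sketch of how the split might be consumed (only the bootstrap_{malloc,calloc,free}() names come from this commit; their prototypes are assumed here to mirror the standard malloc/calloc/free signatures, and the libc-side helper is hypothetical):

      #include <stddef.h>

      /* Assumed prototypes, mirroring malloc/calloc/free. */
      void *bootstrap_malloc(size_t size);
      void *bootstrap_calloc(size_t num, size_t size);
      void bootstrap_free(void *ptr);

      /* Hypothetical libc-side use: satisfy allocations made before the
       * full allocator (and its tsd) has been bootstrapped. */
      void *
      early_alloc(size_t size)
      {
          return bootstrap_malloc(size);
      }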
* Fix arenas_cache_cleanup(). (Jason Evans, 2015-01-22, 1 file changed, -1/+1)
  Fix arenas_cache_cleanup() to check whether arenas_cache is NULL before deallocation, rather than checking arenas.
* Fix OOM handling in memalign() and valloc(). (Jason Evans, 2015-01-17, 1 file changed, -2/+4)
  Fix memalign() and valloc() to heed imemalign()'s return value. Reported by Kurt Wampler.
* Introduce two new modes of junk filling: "alloc" and "free". (Guilherme Goncalves, 2014-12-15, 1 file changed, -7/+47)
  In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.
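  For example, an application could request junk filling only on deallocation by embedding the option string (a brief sketch; malloc_conf is the standard application-visible configuration symbol, and the same value can also be supplied via the MALLOC_CONF environment variable):

      /* Fill memory with junk bytes on deallocation only. */
      const char *malloc_conf = "junk:free";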
* Ignore MALLOC_CONF in set{uid,gid,cap} binaries. (Daniel Micay, 2014-12-14, 1 file changed, -1/+22)
  This eliminates the malloc tunables as tools for an attacker. Closes #173.
* Style and spelling fixes. (Jason Evans, 2014-12-09, 1 file changed, -1/+1)
* rm unused arena wrangling from xallocx (Daniel Micay, 2014-10-31, 1 file changed, -16/+8)
  xallocx has no use for the arena_t, since unlike rallocx it never makes a new memory allocation; the arena was just an unused parameter in ixalloc_helper.
* Miscellaneous cleanups. (Jason Evans, 2014-10-31, 1 file changed, -3/+3)
* avoid redundant chunk header reads (Daniel Micay, 2014-10-31, 1 file changed, -28/+26)
  - Use sized deallocation in iralloct_realign.
  - iralloc and ixalloc always need the old size, so pass it in from the caller, where it's often already calculated.
* mark huge allocations as unlikely (Daniel Micay, 2014-10-31, 1 file changed, -2/+2)
  This cleans up the fast path a bit more by moving more code off of it.
* Add --with-lg-tiny-min, generalize --with-lg-quantum. (Jason Evans, 2014-10-11, 1 file changed, -0/+54)
* Don't fetch tsd in a0{d,}alloc(). (Jason Evans, 2014-10-11, 1 file changed, -11/+7)
  Don't fetch tsd in a0{d,}alloc(), because doing so can cause infinite recursion on systems that require an allocated tsd wrapper.
* Add configure options. (Jason Evans, 2014-10-10, 1 file changed, -21/+29)
  Add:
    --with-lg-page
    --with-lg-page-sizes
    --with-lg-size-class-group
    --with-lg-quantum
  Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.
* Fix a recursive lock acquisition regression. (Jason Evans, 2014-10-08, 1 file changed, -11/+16)
  Fix a recursive lock acquisition regression, which was introduced by 8bb3198f72fc7587dc93527f9f19fb5be52fa553 (Refactor/fix arenas manipulation.).
* Use regular arena allocation for huge tree nodes. (Daniel Micay, 2014-10-08, 1 file changed, -1/+1)
  This avoids grabbing the base mutex, as a step towards fine-grained locking for huge allocations. The thread cache also provides a tiny (~3%) improvement for serial huge allocations.
* Refactor/fix arenas manipulation. (Jason Evans, 2014-10-08, 1 file changed, -147/+369)
  Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.
  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.
  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).
  Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).
  Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, an OOM could silently result in allocation from a different arena than the one specified.
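  A hedged sketch of the MALLOCX_ARENA() usage whose behavior this makes consistent (creating an arena via the arenas.extend mallctl mentioned above; error handling is minimal and the request size is arbitrary):

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);

          /* Create a new arena and retrieve its index. */
          if (mallctl("arenas.extend", &arena_ind, &sz, NULL, 0) != 0)
              return 1;

          /* Allocate explicitly from that arena.  After this fix, an OOM in
           * arena_choose() is reported instead of silently falling back to a
           * different arena. */
          void *p = mallocx(64, MALLOCX_ARENA(arena_ind));
          if (p == NULL)
              return 1;
          printf("allocated from arena %u\n", arena_ind);
          dallocx(p, 0);
          return 0;
      }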
* Normalize size classes. (Jason Evans, 2014-10-06, 1 file changed, -1/+33)
  Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
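  A small worked example of the resulting spacing (a sketch assuming the hard-coded 4 classes per size doubling described above; the specific doubling range is illustrative):

      #include <stddef.h>
      #include <stdio.h>

      int
      main(void)
      {
          /* Within one doubling [base, 2*base), 4 size classes implies a
           * spacing of base/4: 4096, 5120, 6144, 7168 for the 4 KiB range. */
          size_t base = 4096;
          size_t spacing = base / 4;

          for (int i = 0; i < 4; i++)
              printf("%zu\n", base + (size_t)i * spacing);
          return 0;
      }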
* Silence a compiler warning. (Jason Evans, 2014-10-04, 1 file changed, -1/+1)
* Fix tsd cleanup regressions. (Jason Evans, 2014-10-04, 1 file changed, -43/+35)
  Fix tsd cleanup regressions that were introduced in 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold:
  1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers.
  2) tsd_*_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed.
  Note that tsd_*{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.
* Implement/test/fix prof-related mallctl's. (Jason Evans, 2014-10-04, 1 file changed, -0/+2)
  Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's.
  Test/fix the thread.prof.name mallctl.
  Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.
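  A hedged usage sketch of the per-thread controls named above (assumes a --enable-prof build with profiling active; the thread label is hypothetical and error handling is omitted):

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      static void
      enable_profiling_for_this_thread(void)
      {
          const char *name = "worker-1"; /* hypothetical label for dumps */
          bool active = true;

          /* Name this thread in heap profile dumps. */
          mallctl("thread.prof.name", NULL, NULL, &name, sizeof(name));

          /* Turn sampling on for this thread; opt.prof_thread_active_init
           * controls the default for newly created threads. */
          mallctl("thread.prof.active", NULL, NULL, &active, sizeof(active));
      }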
* Convert to uniform style: cond == false --> !cond (Jason Evans, 2014-10-03, 1 file changed, -11/+11)
* Mark malloc_conf as a weak symbol (Dave Rigby, 2014-09-29, 1 file changed, -1/+1)
  This fixes issue #113: je_malloc_conf is not respected on OS X.
* Convert all tsd variables to reside in a single tsd structure. (Jason Evans, 2014-09-23, 1 file changed, -171/+195)
* Fix irallocx_prof() sample logic. (Jason Evans, 2014-09-12, 1 file changed, -3/+3)
  Fix irallocx_prof() sample logic to only update the threshold counter after it knows what size the allocation ended up being. This regression was caused by 6e73dc194ee9682d3eacaf725a989f04629718f7 (Fix a profile sampling race.), which did not make it into any releases prior to this fix.
* Apply likely()/unlikely() to allocation/deallocation fast paths. (Jason Evans, 2014-09-12, 1 file changed, -64/+66)
* Fix mallocx() to always honor MALLOCX_ARENA() when profiling. (Jason Evans, 2014-09-11, 1 file changed, -2/+1)
* mark some conditions as unlikely (Daniel Micay, 2014-09-11, 1 file changed, -21/+21)
  - assertion failure
  - malloc_init failure
  - malloc not already initialized (in malloc_init)
  - running in valgrind
  - thread cache disabled at runtime
  Clang and GCC already consider a comparison with NULL or -1 to be cold, so many branches (out-of-memory) are already correctly considered as cold and marking them is not important.
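  The annotations referenced here are conventionally thin wrappers over the compiler's branch-prediction hint; a minimal sketch of that idiom (not jemalloc's exact macro definitions, and checked_alloc() is a made-up example function):

      #include <stdlib.h>

      /* Common GCC/Clang idiom; jemalloc's likely()/unlikely() are assumed
       * to be equivalent wrappers. */
      #define likely(x)   __builtin_expect(!!(x), 1)
      #define unlikely(x) __builtin_expect(!!(x), 0)

      void *
      checked_alloc(size_t size)
      {
          void *ret = malloc(size);

          if (unlikely(ret == NULL)) {
              /* Out-of-memory path, kept off the hot path by the hint. */
              return NULL;
          }
          return ret;
      }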
* Fix a profile sampling race. (Jason Evans, 2014-09-10, 1 file changed, -53/+56)
  Fix a profile sampling race that was due to preparing to sample, yet doing nothing to assure that the context remains valid until the stats are updated. These regressions were caused by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.
* Fix sdallocx() assertion. (Jason Evans, 2014-09-09, 1 file changed, -16/+18)
  Refactor sdallocx() and nallocx() to share inallocx(), and fix an sdallocx() assertion to check usize rather than size.
* Add support for sized deallocation. (Daniel Micay, 2014-09-09, 1 file changed, -0/+44)
  This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer.
  An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in.
  The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload. Closes #28.
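  A minimal usage sketch (mallocx() is part of the existing *allocx() API; the size passed to sdallocx() must correspond to the original request, as the assertion above checks in debug builds):

      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          void *p = mallocx(42, 0);

          if (p == NULL)
              return 1;
          /* Sized deallocation: the caller supplies the request size, so the
           * deallocation path need not derive it from the pointer. */
          sdallocx(p, 42, 0);
          return 0;
      }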
* Optimize [nmd]alloc() fast paths. (Jason Evans, 2014-09-07, 1 file changed, -100/+139)
  Optimize [nmd]alloc() fast paths such that the (flags == 0) case is streamlined, flags decoding only happens to the minimum degree necessary, and no conditionals are repeated.
* Test for availability of malloc hooks via autoconf (Sara Golemon, 2014-08-22, 1 file changed, -1/+3)
  __*_hook() is a glibc feature, but on at least one glibc platform (homebrew), the __GLIBC__ define isn't set correctly and we miss being able to use these hooks. Do a feature test for it during configuration so that we enable it anywhere the hooks are actually available.
* Implement per thread heap profiling. (Jason Evans, 2014-08-20, 1 file changed, -70/+70)
  Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects.
  Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames).
  Implement mallctl's:
  - prof.reset implements full sample data reset, and optional change of sample interval.
  - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset).
  - thread.prof.name provides naming capability for threads within heap profile dumps.
  - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads.
  Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.
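  A hedged sketch of the prof.reset and prof.lg_sample mallctl's listed above (assumes a --enable-prof build; the new sample interval value is arbitrary and error handling is omitted):

      #include <stddef.h>
      #include <jemalloc/jemalloc.h>

      static void
      reset_profile_data(void)
      {
          size_t old_lg_sample;
          size_t sz = sizeof(old_lg_sample);
          size_t new_lg_sample = 19; /* arbitrary: average 2^19-byte interval */

          /* Read the current sample interval. */
          mallctl("prof.lg_sample", &old_lg_sample, &sz, NULL, 0);

          /* Discard all sample data and optionally change the interval. */
          mallctl("prof.reset", NULL, NULL, &new_lg_sample, sizeof(new_lg_sample));
      }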
* Don't catch fork()ing events for Native Client. (Richard Diamond, 2014-06-02, 1 file changed, -1/+1)
  Native Client doesn't allow forking, thus there is no need to catch fork()ing events for Native Client. Additionally, without this commit, jemalloc will introduce an unresolved pthread_atfork() in PNaCl Rust bins.
* Refactor huge allocation to be managed by arenas. (Jason Evans, 2014-05-16, 1 file changed, -2/+2)
  Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions.
  Remove the top level huge stats, and replace them with per arena huge stats.
  Normalize function names and types to *dalloc* (some were *dealloc*).
  Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.
* Add support for user-specified chunk allocators/deallocators. (aravind, 2014-05-12, 1 file changed, -1/+1)
  Add new mallctl endpoints "arena<i>.chunk.alloc" and "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.
* Fix coding style nits. (Jason Evans, 2014-05-01, 1 file changed, -4/+4)
* Simplify backtracing. (Jason Evans, 2014-04-23, 1 file changed, -53/+27)
  Simplify backtracing to not ignore any frames, and compensate for this in pprof in order to increase flexibility with respect to function-based refactoring even in the presence of non-deterministic inlining. Modify pprof to blacklist all jemalloc allocation entry points including non-standard ones like mallocx(), and ignore all allocator-internal frames. Prior to this change, pprof excluded the specifically blacklisted functions from backtraces, but it left allocator-internal frames intact.
* Optimize Valgrind integration. (Jason Evans, 2014-04-15, 1 file changed, -28/+47)
  Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code.
  Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions.
  Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().
* Remove the "opt.valgrind" mallctl.Jason Evans2014-04-151-18/+17
| | | | | Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.
* Remove the *allocm() API, which is superseded by the *allocx() API. (Jason Evans, 2014-04-15, 1 file changed, -85/+0)
* Remove support for non-prof-promote heap profiling metadata. (Jason Evans, 2014-04-11, 1 file changed, -8/+8)
  Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run.
  In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).
* Don't dereference chunk->arena in free() hot path (Ben Maurer, 2014-04-05, 1 file changed, -1/+1)
  When you call free() we load chunk->arena even though that data isn't used on the tcache hot path. In profiling some FB applications, I found that ~30% of the dTLB misses in the free() function come from this line. With 4 MB chunks, the arena_chunk_t->map is ~32 KB (1024 pages in the chunk, four 8-byte pointers in arena_chunk_map_t). This means there's only a 1/8 chance of the page containing chunk->arena also containing the map bits.
* Use arena dss prec instead of default for huge allocs. (Max Wang, 2014-03-28, 1 file changed, -1/+1)
  Pass a dss_prec_t parameter to huge_{m,p,r}alloc instead of defaulting to the chunk dss prec.
* Fix unused variable warnings. (Jason Evans, 2014-01-21, 1 file changed, -4/+2)
* Extract profiling code from [re]allocation functions. (Jason Evans, 2014-01-12, 1 file changed, -354/+449)
  Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction.
  Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases.
  Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions).
  Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().
* Fix an uninitialized variable read in xallocx(). (Jason Evans, 2013-12-20, 1 file changed, -0/+3)
* Optimize arena_prof_ctx_set(). (Jason Evans, 2013-12-16, 1 file changed, -43/+58)
  Refactor such that arena_prof_ctx_set() receives usize as an argument, and use it to determine whether to handle ptr as a small region, rather than reading the chunk page map.
* Implement the *allocx() API. (Jason Evans, 2013-12-13, 1 file changed, -162/+294)
  Implement the *allocx() API, which is a successor to the *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign().
  The following code violates strict aliasing rules:

      foo_t *foo;
      allocm((void **)&foo, NULL, 42, 0);

  whereas the following is safe:

      foo_t *foo;
      void *p;
      allocm(&p, NULL, 42, 0);
      foo = (foo_t *)p;

  mallocx() does not have this problem:

      foo_t *foo = (foo_t *)mallocx(42, 0);
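  A supplementary hedged sketch of the other functions in this API family (rallocx(), xallocx(), nallocx(), dallocx(); a flags value of 0 requests default behavior, and the request sizes are arbitrary):

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          /* nallocx() reports the usable size a request would round up to. */
          size_t usable = nallocx(100, 0);

          void *p = mallocx(100, 0);
          if (p == NULL)
              return 1;

          /* rallocx() resizes, possibly moving the allocation. */
          void *q = rallocx(p, 200, 0);
          if (q == NULL) {
              dallocx(p, 0);
              return 1;
          }
          p = q;

          /* xallocx() resizes in place only and returns the resulting usable
           * size, which may be smaller than the requested 400 bytes. */
          size_t resized = xallocx(p, 400, 0, 0);

          printf("usable=%zu resized=%zu\n", usable, resized);
          dallocx(p, 0);
          return 0;
      }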
* Silence some unused variable warnings. (Jason Evans, 2013-12-10, 1 file changed, -4/+4)