path: root/src/jemalloc.c
* Check for existence of CPU_COUNT macro before using it. (Dave Watson, 2016-11-03, 1 file, -1/+7)
  This resolves #485.
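A minimal sketch of guarding on CPU_COUNT availability. This is illustrative only, not the actual jemalloc change; the fallback to sysconf(_SC_NPROCESSORS_ONLN) is an assumption.

    #define _GNU_SOURCE
    #include <sched.h>      /* cpu_set_t, sched_getaffinity, CPU_COUNT (glibc) */
    #include <unistd.h>     /* sysconf */

    static unsigned
    ncpus_get(void) {
    #ifdef CPU_COUNT
        cpu_set_t set;
        if (sched_getaffinity(0, sizeof(set), &set) == 0)
            return (unsigned)CPU_COUNT(&set);
    #endif
        /* Older libcs may lack CPU_COUNT; fall back to sysconf(). */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        return (n > 0) ? (unsigned)n : 1;
    }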
* Do not mark malloc_conf as weak on Windows. (Jason Evans, 2016-10-29, 1 file, -1/+1)
  This works around malloc_conf not being properly initialized by at least the cygwin toolchain. Prior build system changes to use -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to work properly as a non-weak symbol (not tested).
* Do not mark malloc_conf as weak for unit tests. (Jason Evans, 2016-10-29, 1 file, -1/+5)
  This is generally correct (no need for weak symbols since no jemalloc library is involved in the link phase), and avoids linking problems (apparently uninitialized non-NULL malloc_conf) when using cygwin with gcc.
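For context, a hedged sketch of declaring a symbol weak only where that is wanted; the guard macro below is hypothetical, and the real conditions in jemalloc differ.

    /* Illustrative sketch only (JEMALLOC_UNIT_TEST is a made-up guard):
     * declare malloc_conf weak except on Windows and in unit-test builds,
     * where a strong definition avoids the cygwin/gcc problem above. */
    #if !defined(_WIN32) && !defined(JEMALLOC_UNIT_TEST)
    __attribute__((weak))
    #endif
    const char *malloc_conf = NULL;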
* Support static linking of jemalloc with glibc (Dave Watson, 2016-10-28, 1 file, -0/+31)
  glibc defines its malloc implementation with several weak and strong symbols:
    strong_alias (__libc_calloc, __calloc)
    weak_alias (__libc_calloc, calloc)
    strong_alias (__libc_free, __cfree)
    weak_alias (__libc_free, cfree)
    strong_alias (__libc_free, __free)
    strong_alias (__libc_free, free)
    strong_alias (__libc_malloc, __malloc)
    strong_alias (__libc_malloc, malloc)
  The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc APIs allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions.
  Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPUs. glibc allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with glibc, which seems to work.
  This resolves #442.
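A hedged sketch of the forwarding idea: provide the __libc_* entry points that other glibc objects reference so the linker never needs glibc's malloc.o. Forwarding straight to the public malloc/calloc/free (which a statically linked jemalloc provides) is a simplification of what the actual patch does.

    /* Illustrative sketch, not the actual patch: with jemalloc already
     * providing malloc/calloc/free, also define the __libc_* names that
     * other glibc objects call directly, so glibc's malloc.o can be
     * dropped from a static link. */
    #include <stdlib.h>

    void *__libc_malloc(size_t size) { return malloc(size); }
    void *__libc_calloc(size_t n, size_t size) { return calloc(n, size); }
    void __libc_free(void *ptr) { free(ptr); }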
* Do not (recursively) allocate within tsd_fetch(). (Jason Evans, 2016-10-21, 1 file, -1/+8)
  Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion.
  This resolves #458.
* Make dss operations lockless. (Jason Evans, 2016-10-13, 1 file, -5/+1)
  Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for chunk_in_dss() and the newly added chunk_dss_mergeable(), which can be called multiple times during chunk deallocations.
  This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection.
  This resolves #425.
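A hedged sketch of the kind of lockless check this enables: if the dss range bounds are only published with atomic stores, a membership test like chunk_in_dss() can read them with atomic loads instead of taking a mutex. Names are illustrative, and C11 atomics stand in for jemalloc's own atomic wrappers.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative only: bounds of the dss (sbrk) region, published with
     * atomic stores by the allocation path. */
    static _Atomic(void *) dss_base;
    static _Atomic(void *) dss_max;

    static bool
    in_dss(const void *ptr) {
        /* Two atomic loads replace lock/unlock around the comparison. */
        uintptr_t base = (uintptr_t)atomic_load_explicit(&dss_base,
            memory_order_acquire);
        uintptr_t max = (uintptr_t)atomic_load_explicit(&dss_max,
            memory_order_acquire);
        return (uintptr_t)ptr >= base && (uintptr_t)ptr < max;
    }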
* Add/use adaptive spinning. (Jason Evans, 2016-10-13, 1 file, -1/+4)
  Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning.
  Adaptively spin during busy waits in bootstrapping and rtree node initialization.
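A hedged sketch of one common shape for such an abstraction: spin a bounded, growing number of iterations before yielding the CPU. The field names and constants are made up for illustration; the real spin_t/spin_adaptive() may differ.

    #include <sched.h>     /* sched_yield */

    typedef struct {
        unsigned iteration;   /* how long we have been waiting so far */
    } spin_t;

    static void
    spin_init(spin_t *spin) {
        spin->iteration = 0;
    }

    static void
    spin_adaptive(spin_t *spin) {
        /* Spin for an exponentially growing number of no-ops, then start
         * yielding once the wait has gone on for a while. */
        if (spin->iteration < 5) {
            volatile unsigned i;
            for (i = 0; i < (1U << spin->iteration); i++)
                /* spin */;
            spin->iteration++;
        } else {
            sched_yield();
        }
    }

A busy-wait loop would call spin_init() once and then spin_adaptive() each time the awaited condition is still false.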
* Disallow 0x5a junk filling when running in Valgrind. (Jason Evans, 2016-10-13, 1 file, -6/+28)
  Explicitly disallow junk:true and junk:free runtime settings when running in Valgrind, since deallocation-time junk filling and redzone validation cause false positive Valgrind reports.
  This resolves #470.
* Simplify run quantization. (Jason Evans, 2016-10-06, 1 file, -2/+1)
* Refactor runs_avail. (Jason Evans, 2016-10-05, 1 file, -3/+14)
  Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).
* Implement pz2ind(), pind2sz(), and psz2u(). (Jason Evans, 2016-10-04, 1 file, -2/+2)
  These compute size classes and indices similarly to size2index(), index2size(), and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.
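A hedged sketch of the simplest possible page-multiple variant: treat every whole multiple of the page size as its own class. jemalloc's real functions round to its page-multiple size-class boundaries instead, so take this only as an illustration of the interface.

    #include <stddef.h>

    #define PAGE ((size_t)4096)   /* assumed page size, illustration only */

    /* Smallest page-multiple size that can hold psz (psz must be nonzero). */
    static size_t
    psz2u(size_t psz) {
        return (psz + PAGE - 1) & ~(PAGE - 1);
    }

    /* Index of that class among the page-multiple classes (simplified). */
    static size_t
    pz2ind(size_t psz) {
        return psz2u(psz) / PAGE - 1;
    }

    /* Inverse of pz2ind() under the same simplification. */
    static size_t
    pind2sz(size_t pind) {
        return (pind + 1) * PAGE;
    }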
* Use TSDN_NULL rather than NULL as appropriate. (Jason Evans, 2016-10-04, 1 file, -2/+2)
* Fix arena_bind(). (Qi Wang, 2016-09-23, 1 file, -6/+7)
  When tsd is not in nominal state (e.g. during thread termination), we should not increment nthreads.
* Fix rallocx() sampling code to not eagerly commit sampler update. (Jason Evans, 2016-06-08, 1 file, -3/+3)
  rallocx() for an alignment-constrained request may end up with a smaller-than-worst-case size if in-place reallocation succeeds due to serendipitous alignment. In such cases, sampling may not happen.
* Fix a Valgrind regression in calloc(). (Elliot Ronaghan, 2016-06-07, 1 file, -1/+1)
  This regression was caused by 3ef51d7f733ac6432e80fa902a779ab5b98d74f6 (Optimize the fast paths of calloc() and [m,d,sd]allocx().).
* Resolve bootstrapping issues when embedded in FreeBSD libc. (Jason Evans, 2016-05-11, 1 file, -244/+270)
  b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.
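A hedged sketch of the nullable/non-nullable split described above, with the conversion helper asserting on misuse. Types and names are illustrative stand-ins, not the actual jemalloc definitions.

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct tsd_s tsd_t;    /* never NULL once bootstrapped */
    typedef struct tsd_s tsdn_t;   /* may be NULL before bootstrap */

    extern bool tsd_booted;        /* stand-in for tsd_booted_get() */
    extern tsd_t *tsd_fetch(void); /* must only run after bootstrap */

    /* Nullable fetch: safe to call before tsd is bootstrapped. */
    static tsdn_t *
    tsdn_fetch(void) {
        if (!tsd_booted)
            return NULL;
        return (tsdn_t *)tsd_fetch();
    }

    /* The only sanctioned nullable -> non-nullable conversion. */
    static tsd_t *
    tsdn_tsd(tsdn_t *tsdn) {
        assert(tsdn != NULL);
        return (tsd_t *)tsdn;
    }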
* Fix tsd bootstrapping for a0malloc(). (Jason Evans, 2016-05-07, 1 file, -27/+31)
* Optimize the fast paths of calloc() and [m,d,sd]allocx(). (Jason Evans, 2016-05-06, 1 file, -186/+114)
  This is a broader application of optimizations to malloc() and free() in f4a0f32d340985de477bbe329ecdaecd69ed1055 (Fast-path improvement: reduce # of branches and unnecessary operations.).
  This resolves #321.
* Modify pages_map() to support mapping uncommitted virtual memory. (Jason Evans, 2016-05-06, 1 file, -0/+1)
  If the OS overcommits:
    - Commit all mappings in pages_map() regardless of whether the caller requested committed memory.
    - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2).
  This resolves #193.
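A hedged sketch of the Linux-specific flag mentioned above: pass MAP_NORESERVE so no swap is reserved for the mapping under heuristic overcommit. Whether the flag is set in the real code depends on configure-time detection; this only illustrates the mmap call.

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stddef.h>

    static void *
    pages_map_sketch(size_t size) {
        int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    #ifdef MAP_NORESERVE
        /* Avoid reserving swap under heuristic overcommit (Linux). */
        flags |= MAP_NORESERVE;
    #endif
        void *ret = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        return (ret == MAP_FAILED) ? NULL : ret;
    }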
* Fix witness/fork() interactions. (Jason Evans, 2016-04-26, 1 file, -3/+3)
  Fix witness to clear its list of owned mutexes in the child if platform-specific malloc_mutex code re-initializes mutexes rather than unlocking them.
* Fix fork()-related lock rank ordering reversals. (Jason Evans, 2016-04-26, 1 file, -12/+29)
* Fix arena_choose_hard() regression. (Jason Evans, 2016-04-23, 1 file, -1/+1)
  This regression was caused by 66cd953514a18477eb49732e40d5c2ab5f1b12c5 (Do not allocate metadata via non-auto arenas, nor tcaches.).
* Do not allocate metadata via non-auto arenas, nor tcaches. (Jason Evans, 2016-04-22, 1 file, -37/+75)
  This ensures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.
* Add witness, a simple online locking validator. (Jason Evans, 2016-04-14, 1 file, -147/+230)
  This resolves #358.
* Fix a potential tsd cleanup leak. (Jason Evans, 2016-02-28, 1 file, -0/+3)
  Prior to 767d85061a6fb88ec977bbcd9b429a43aff391e6 (Refactor arenas array (fixes deadlock).), it was possible under some circumstances for arena_get() to trigger recreation of the arenas cache during tsd cleanup, and the arenas cache would then be leaked. In principle a similar issue could still occur as a side effect of decay-based purging, which calls arena_tdata_get().
  Fix arenas_tdata_cleanup() by setting tsd->arenas_tdata_bypass to true, so that arena_tdata_get() will gracefully fail (an expected behavior) rather than recreating tsd->arena_tdata.
  Reported by Christopher Ferris <cferris@google.com>.
* Add more HUGE_MAXCLASS overflow checks. (Jason Evans, 2016-02-26, 1 file, -23/+34)
  Add HUGE_MAXCLASS overflow checks that are specific to heap profiling code paths. This fixes test failures that were introduced by 0c516a00c4cb28cff55ce0995f756b5aae074c9e (Make *allocx() size class overflow behavior defined.).
* Make *allocx() size class overflow behavior defined. (Jason Evans, 2016-02-25, 1 file, -24/+44)
  Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX.
  This resolves #278 and #295.
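A hedged sketch of the kind of up-front check this implies for a sized request such as mallocx(size, flags): reject anything whose usable size could exceed the largest supported class before doing any size-class arithmetic. The constant value and helper name are illustrative, not the real definitions.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative bound; the real HUGE_MAXCLASS comes from the size-class
     * tables and is kept below PTRDIFF_MAX. */
    #define HUGE_MAXCLASS_SKETCH ((size_t)PTRDIFF_MAX >> 1)

    static bool
    request_ok(size_t size, size_t alignment) {
        if (size == 0 || size > HUGE_MAXCLASS_SKETCH)
            return false;
        /* An aligned request can be rounded up by nearly "alignment"
         * bytes, so guard the sum against exceeding the range too. */
        if (alignment != 0 && size > HUGE_MAXCLASS_SKETCH - alignment)
            return false;
        return true;
    }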
* Refactor arenas array (fixes deadlock). (Jason Evans, 2016-02-25, 1 file, -151/+90)
  Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initialization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock.
  This resolves #315.
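A hedged sketch of the double-checked lookup pattern described above, using C11 atomics and a pthread mutex in place of jemalloc's own wrappers; the array size, lock handling, and init function are illustrative.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stddef.h>

    #define MAX_ARENAS 4096   /* illustrative maximum */

    typedef struct arena_s arena_t;

    static _Atomic(arena_t *) arenas[MAX_ARENAS];
    static pthread_mutex_t arenas_lock = PTHREAD_MUTEX_INITIALIZER;

    extern arena_t *arena_init(unsigned ind);   /* assumed slow-path init */

    static arena_t *
    arena_get(unsigned ind) {
        /* Fast path: a single atomic load, no lock. */
        arena_t *arena = atomic_load_explicit(&arenas[ind],
            memory_order_acquire);
        if (arena != NULL)
            return arena;
        /* Slow path: take the lock and re-check before initializing. */
        pthread_mutex_lock(&arenas_lock);
        arena = atomic_load_explicit(&arenas[ind], memory_order_acquire);
        if (arena == NULL) {
            arena = arena_init(ind);
            atomic_store_explicit(&arenas[ind], arena, memory_order_release);
        }
        pthread_mutex_unlock(&arenas_lock);
        return arena;
    }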
* Silence miscellaneous 64-to-32-bit data loss warnings. (Jason Evans, 2016-02-24, 1 file, -1/+1)
* Use ssize_t for readlink() rather than int. (Jason Evans, 2016-02-24, 1 file, -1/+1)
* Make opt_narenas unsigned rather than size_t. (Jason Evans, 2016-02-24, 1 file, -8/+12)
* Refactor time_* into nstime_*. (Jason Evans, 2016-02-22, 1 file, -1/+1)
  Use a single uint64_t in nstime_t to store nanoseconds rather than using struct timespec. This reduces fragility around conversions between long and uint64_t, especially missing casts that only cause problems on 32-bit platforms.
* Implement decay-based unused dirty page purging. (Jason Evans, 2016-02-20, 1 file, -11/+42)
  This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism.
  Add mallctls:
    - opt.purge
    - opt.decay_time
    - arena.<i>.decay
    - arena.<i>.decay_time
    - arenas.decay_time
    - stats.arenas.<i>.decay_time
  This resolves #325.
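A hedged usage example of reading one of the new controls through the public mallctl() interface, assuming an unprefixed jemalloc build; the value type shown (ssize_t, in seconds) matches how decay times are documented for this era of jemalloc, but confirm against the manual for the exact release.

    #include <jemalloc/jemalloc.h>
    #include <stdio.h>

    int
    main(void) {
        ssize_t decay_time;
        size_t sz = sizeof(decay_time);
        /* Read the global default decay time (seconds; see the manual for
         * the meaning of special values). */
        if (mallctl("opt.decay_time", &decay_time, &sz, NULL, 0) == 0)
            printf("opt.decay_time: %zd\n", decay_time);
        return 0;
    }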
* Refactor arenas_cache tsd. (Jason Evans, 2016-02-20, 1 file, -62/+87)
  Refactor arenas_cache tsd into arenas_tdata, which is a structure of type arena_tdata_t.
* Add --with-malloc-conf. (Jason Evans, 2016-02-20, 1 file, -3/+6)
  Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.
* Call malloc_tsd_boot0() from malloc_init_hard_recursible(). (Cosmin Paraschiv, 2016-01-11, 1 file, -5/+16)
  When using LinuxThreads, malloc bootstrapping deadlocks, since malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes recursive allocation. Fix it by moving the malloc_tsd_boot0() call to malloc_init_hard_recursible().
  The deadlock was introduced by 8bb3198f72fc7587dc93527f9f19fb5be52fa553 (Refactor/fix arenas manipulation.), when tsd_boot() was split and the top half, tsd_boot0(), got an extra tsd_wrapper_set() call.
* Fast-path improvement: reduce # of branches and unnecessary operations. (Qi Wang, 2015-11-10, 1 file, -53/+133)
  - Combine multiple runtime branches into a single malloc_slow check.
  - Avoid calling arena_choose / size2index / index2size on fast path.
  - A few micro optimizations.
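A hedged sketch of the single-flag idea behind the malloc_slow check: fold every option that forces the slow path into one boolean computed at initialization, so the hot path tests one cached value instead of several. The option names below are placeholders for whatever set of features actually forces the slow path.

    #include <stdbool.h>

    /* Placeholder option flags; the real set of slow-path triggers differs. */
    static bool opt_junk = false;
    static bool opt_zero = false;
    static bool opt_utrace = false;
    static bool opt_xmalloc = false;

    static bool malloc_slow;   /* computed once during initialization */

    static void
    malloc_slow_flag_init(void) {
        /* Any feature needing extra work per allocation flips the flag. */
        malloc_slow = opt_junk | opt_zero | opt_utrace | opt_xmalloc;
    }

    /* The fast path then becomes a single test:
     *     if (!malloc_slow) { ...short path... } else { ...full path... }
     */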
* Add mallocx() OOM tests. (Jason Evans, 2015-09-17, 1 file, -0/+2)
* Simplify imallocx_prof_sample(). (Jason Evans, 2015-09-17, 1 file, -26/+13)
  Simplify imallocx_prof_sample() to always operate on usize rather than sometimes using size. This avoids redundant usize computations and more closely fits the style adopted by i[rx]allocx_prof_sample() to fix sampling bugs.
* Fix irallocx_prof_sample(). (Jason Evans, 2015-09-17, 1 file, -5/+5)
  Fix irallocx_prof_sample() to always allocate large regions, even when alignment is non-zero.
* Fix ixallocx_prof_sample(). (Jason Evans, 2015-09-17, 1 file, -17/+4)
  Fix ixallocx_prof_sample() to never modify nor create sampled small allocations. xallocx() is in general incapable of moving small allocations, so this fix removes buggy code without loss of generality.
* Centralize xallocx() size[+extra] overflow checks. (Jason Evans, 2015-09-15, 1 file, -7/+11)
* Fix ixallocx_prof() to check for size greater than HUGE_MAXCLASS. (Jason Evans, 2015-09-15, 1 file, -1/+5)
* Resolve an unsupported special case in arena_prof_tctx_set(). (Jason Evans, 2015-09-15, 1 file, -3/+3)
  Add arena_prof_tctx_reset() and use it instead of arena_prof_tctx_set() when resetting the tctx pointer during reallocation, which happens whenever an originally sampled reallocated object is not sampled during reallocation.
  This regression was introduced by 594c759f37c301d0245dc2accf4d4aaf9d202819 (Optimize arena_prof_tctx_set().).
* Fix ixallocx_prof_sample() argument order reversal. (Jason Evans, 2015-09-15, 1 file, -1/+1)
  Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample() in the correct order.
* s/max_usize/usize_max/g (Jason Evans, 2015-09-15, 1 file, -6/+6)
* s/oldptr/old_ptr/g (Jason Evans, 2015-09-15, 1 file, -15/+15)
* Make one call to prof_active_get_unlocked() per allocation event. (Jason Evans, 2015-09-15, 1 file, -10/+19)
  Make one call to prof_active_get_unlocked() per allocation event, and use the result throughout the relevant functions that handle an allocation event. Also add a missing check in prof_realloc(). These fixes protect allocation events against concurrent prof_active changes.
* Fix irealloc_prof() to prof_alloc_rollback() on OOM. (Jason Evans, 2015-09-15, 1 file, -1/+3)
* Optimize irallocx_prof() to optimistically update the sampler state. (Jason Evans, 2015-09-15, 1 file, -3/+3)