summaryrefslogtreecommitdiffstats
path: root/include/jemalloc
Commit message (Collapse)AuthorAgeFilesLines
* Revert "Define 64-bits atomics unconditionally"Jason Evans2016-11-071-8/+10
| | | | | | This reverts commit af33e9a59735a2ee72132d3dd6e23fae6d296e34. This resolves #495.
* Refactor prng to not use 64-bit atomics on 32-bit platforms.Jason Evans2016-11-073-14/+149
| | | | This resolves #495.
* Fix chunk_alloc_cache() to support decommitted allocation.Jason Evans2016-11-041-1/+1
| | | | | | | | Fix chunk_alloc_cache() to support decommitted allocation, and use this ability in arena_chunk_alloc_internal() and arena_stash_dirty(), so that chunks don't get permanently stuck in a hybrid state. This resolves #487.
* Update symbol mangling.Jason Evans2016-11-031-0/+3
|
* Do not use syscall(2) on OS X 10.12 (deprecated).Jason Evans2016-11-031-0/+3
|
* Add os_unfair_lock support.Jason Evans2016-11-033-0/+17
| | | | | OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended replacement.
* Fix/refactor zone allocator integration code.Jason Evans2016-11-031-1/+1
| | | | | | | | | Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes, since OS X 10.12 cannot tolerate a child unlocking mutexes that were locked by its parent. Refactor; this was a side effect of experimenting with zone {de,re}registration during fork(2).
* Refactor witness_unlock() to fix undefined test behavior.Jason Evans2016-10-312-11/+29
| | | | This resolves #396.
* Use CLOCK_MONOTONIC_COARSE rather than COARSE_MONOTONIC_RAW.Jason Evans2016-10-301-2/+2
| | | | | | | | The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC), whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for our purposes. This resolves #479.
* Support static linking of jemalloc with glibcDave Watson2016-10-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | glibc defines its malloc implementation with several weak and strong symbols: strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc) strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree) strong_alias (__libc_free, __free) strong_alias (__libc_free, free) strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc) The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc API's allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions. Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPU's. GLIBC allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with GLIBC, which seems to work. This resolves #442.
* Do not (recursively) allocate within tsd_fetch().Jason Evans2016-10-216-33/+73
| | | | | | | Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.
* Make dss operations lockless.Jason Evans2016-10-134-17/+7
| | | | | | | | | | | | | | Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for chunk_in_dss() and the newly added chunk_dss_mergeable(), which can be called multiple times during chunk deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.
* Add/use adaptive spinning.Jason Evans2016-10-132-0/+55
| | | | | | | | Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.
* Fix and simplify decay-based purging.Jason Evans2016-10-111-18/+11
| | | | | | | | | | | | | | | | | | | | | Simplify decay-based purging attempts to only be triggered when the epoch is advanced, rather than every time purgeable memory increases. In a correctly functioning system (not previously the case; see below), this only causes a behavior difference if during subsequent purge attempts the least recently used (LRU) purgeable memory extent is initially too large to be purged, but that memory is reused between attempts and one or more of the next LRU purgeable memory extents are small enough to be purged. In practice this is an arbitrary behavior change that is within the set of acceptable behaviors. As for the purging fix, assure that arena->decay.ndirty is recorded *after* the epoch advance and associated purging occurs. Prior to this fix, it was possible for purging during epoch advance to cause a substantially underrepresentative (arena->ndirty - arena->decay.ndirty), i.e. the number of dirty pages attributed to the current epoch was too low, and a series of unintended purges could result. This fix is also relevant in the context of the simplification described above, but the bug's impact would be limited to over-purging at epoch advances.
* Do not advance decay epoch when time goes backwards.Jason Evans2016-10-112-0/+4
| | | | | | Instead, move the epoch backward in time. Additionally, add nstime_monotonic() and use it in debug builds to assert that time only goes backward if nstime_update() is using a non-monotonic time source.
* Refactor arena->decay_* into arena->decay.* (arena_decay_t).Jason Evans2016-10-111-46/+53
|
* Refine nstime_update().Jason Evans2016-10-103-3/+19
| | | | | | | | | | | | | | | | | | | | | Add missing #include <time.h>. The critical time facilities appear to have been transitively included via unistd.h and sys/time.h, but in principle this omission was capable of having caused clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of gettimeofday(), which in turn could cause spurious non-monotonic time updates. Refactor nstime_get() out of nstime_update() and add configure tests for all variants. Add CLOCK_MONOTONIC_RAW support (Linux-specific) and mach_absolute_time() support (OS X-specific). Do not fall back to clock_gettime(CLOCK_REALTIME, ...). This was a fragile Linux-specific workaround, which we're unlikely to use at all now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we have no choice besides non-monotonic clocks, gettimeofday() is only incrementally worse.
* Simplify run quantization.Jason Evans2016-10-061-1/+1
|
* Refactor runs_avail.Jason Evans2016-10-053-7/+32
| | | | | | | | Use pszind_t size classes rather than szind_t size classes, and always reserve space for NPSIZES elements. This removes unused heaps that are not multiples of the page size, and adds (currently) unused heaps for all huge size classes, with the immediate benefit that the size of arena_t allocations is constant (no longer dependent on chunk size).
* Implement pz2ind(), pind2sz(), and psz2u().Jason Evans2016-10-043-13/+116
| | | | | | | These compute size classes and indices similarly to size2index(), index2size() and s2u(), respectively, but using the subset of size classes that are multiples of the page size. Note that pszind_t and szind_t are not interchangeable.
* Use TSDN_NULL rather than NULL as appropriate.Jason Evans2016-10-041-2/+2
|
* Define 64-bits atomics unconditionallyMike Hommey2016-10-041-10/+8
| | | | They are used on all platforms in prng.h.
* Fix LG_QUANTUM definition for sparc64Eric Le Bihan2016-09-261-1/+1
| | | | | | GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not __sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined properly. Adding this new value to the check solves the issue.
* Don't use compact red-black trees with the pgi compilerElliot Ronaghan2016-09-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some bug (either in the red-black tree code, or in the pgi compiler) seems to cause red-black trees to become unbalanced. This issue seems to go away if we don't use compact red-black trees. Since red-black trees don't seem to be used much anymore, I opted for what seems to be an easy fix here instead of digging in and trying to find the root cause of the bug. Some context in case it's helpful: I experienced a ton of segfaults while using pgi as Chapel's target compiler with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere deep in red-black tree manipulation, but I didn't get a chance to investigate further. It looks like 4.2.0 replaced most uses of red-black trees with pairing-heaps, which seems to avoid whatever bug I was hitting. However, `make check_unit` was still failing on the rb test, so I figured the core issue was just being masked. Here's the `make check_unit` failure: ```sh === test/unit/rb === test_rb_empty: pass tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced <jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0" test/test.sh: line 22: 12926 Aborted Test harness error ``` While starting to debug I saw the RB_COMPACT option and decided to check if turning that off resolved the bug. It seems to have fixed it (`make check_unit` passes and the segfaults under Chapel are gone) so it seems like on okay work-around. I'd imagine this has performance implications for red-black trees under pgi, but if they're not going to be used much anymore it's probably not a big deal.
* Check for __builtin_unreachable at configure timeElliot Ronaghan2016-09-262-16/+12
| | | | | | | | | | | | | | | | | | | Add a configure check for __builtin_unreachable instead of basing its availability on the __GNUC__ version. On OS X using gcc (a real gcc, not the bundled version that's just a gcc front-end) leads to a linker assertion: https://github.com/jemalloc/jemalloc/issues/266 It turns out that this is caused by a gcc bug resulting from the use of __builtin_unreachable(): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438 To work around this bug, check that __builtin_unreachable() actually works at configure time, and if it doesn't use abort() instead. The check is based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21. With this `make check` passes with a homebrew installed gcc-5 and gcc-6.
* Add a missing prof_alloc_rollback() call.Jason Evans2016-06-081-0/+1
| | | | | | In the case where prof_alloc_prep() is called with an over-estimate of allocation size, and sampling doesn't end up being triggered, the tctx must be discarded.
* Fix potential VM map fragmentation regression.Jason Evans2016-06-071-2/+2
| | | | | | | | Revert 245ae6036c09cc11a72fab4335495d95cddd5beb (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward. This resolves #391.
* Optimize witness fast path.Jason Evans2016-05-113-14/+153
| | | | | | | | | | | Short-circuit commonly called witness functions so that they only execute in debug builds, and remove equivalent guards from mutex functions. This avoids pointless code execution in witness_assert_lockless(), which is typically called twice per allocation/deallocation function invocation. Inline commonly called witness functions so that optimized builds can completely remove calls as dead code.
* Resolve bootstrapping issues when embedded in FreeBSD libc.Jason Evans2016-05-1115-270/+362
| | | | | | | | | | | | | b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.
* Add LG_QUANTUM definition for the RISC-V architecture.Jason Evans2016-05-071-0/+3
|
* Update private_symbols.txt.Jason Evans2016-05-061-2/+11
|
* Optimize the fast paths of calloc() and [m,d,sd]allocx().Jason Evans2016-05-063-55/+21
| | | | | | | | This is a broader application of optimizations to malloc() and free() in f4a0f32d340985de477bbe329ecdaecd69ed1055 (Fast-path improvement: reduce # of branches and unnecessary operations.). This resolves #321.
* Modify pages_map() to support mapping uncommitted virtual memory.Jason Evans2016-05-063-2/+13
| | | | | | | | | | | If the OS overcommits: - Commit all mappings in pages_map() regardless of whether the caller requested committed memory. - Linux-specific: Specify MAP_NORESERVE to avoid unfortunate interactions with heuristic overcommit mode during fork(2). This resolves #193.
* Add the stats.retained and stats.arenas.<i>.retained statistics.Jason Evans2016-05-042-0/+9
| | | | This resolves #367.
* Fix huge_palloc() regression.Jason Evans2016-05-043-6/+27
| | | | | | | | | | Split arena_choose() into arena_[i]choose() and use arena_ichoose() for arena lookup during internal allocation. This fixes huge_palloc() so that it always succeeds during extent node allocation. This regression was introduced by 66cd953514a18477eb49732e40d5c2ab5f1b12c5 (Do not allocate metadata via non-auto arenas, nor tcaches.).
* Fix witness/fork() interactions.Jason Evans2016-04-262-2/+4
| | | | | | Fix witness to clear its list of owned mutexes in the child if platform-specific malloc_mutex code re-initializes mutexes rather than unlocking them.
* Fix fork()-related lock rank ordering reversals.Jason Evans2016-04-265-5/+21
|
* Fix degenerate mb_write() compilation error.Jason Evans2016-04-231-3/+3
| | | | This resolves #375.
* Implement the arena.<i>.reset mallctl.Jason Evans2016-04-223-1/+7
| | | | | | | This makes it possible to discard all of an arena's allocations in a single operation. This resolves #146.
* Do not allocate metadata via non-auto arenas, nor tcaches.Jason Evans2016-04-226-23/+47
| | | | | This assures that all internally allocated metadata come from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.
* Fix malloc_mutex_assert_[not_]owner() for --enable-lazy-lock case.Jason Evans2016-04-181-2/+2
|
* Update private_symbols.txt.Jason Evans2016-04-181-8/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | Change test-related mangling to simplify symbol filtering. The following commands can be used to detect missing/obsolete symbol mangling, with the caveat that the full set of symbols is based on the union of symbols generated by all configurations, some of which are platform-specific: ./autogen.sh --enable-debug --enable-prof --enable-lazy-lock make all tests nm -a lib/libjemalloc.a src/*.jet.o \ |grep " [TDBCR] " \ |awk '{print $3}' \ |sed -e 's/^\(je_\|jet_\(n_\)\?\)\([a-zA-Z0-9_]*\)/\3/g' \ |LC_COLLATE=C sort -u \ |grep -v \ -e '^\(malloc\|calloc\|posix_memalign\|aligned_alloc\|realloc\|free\)$' \ -e '^\(m\|r\|x\|s\|d\|sd\|n\)allocx$' \ -e '^mallctl\(\|nametomib\|bymib\)$' \ -e '^malloc_\(stats_print\|usable_size\|message\)$' \ -e '^\(memalign\|valloc\)$' \ -e '^__\(malloc\|memalign\|realloc\|free\)_hook$' \ -e '^pthread_create$' \ > /tmp/private_symbols.txt
* Update private_symbols.txtRajat Goel2016-04-181-0/+4
| | | Add 4 missing symbols
* Fix style nits.Jason Evans2016-04-171-1/+1
|
* Fix malloc_mutex_[un]lock() to conditionally check witness.Jason Evans2016-04-171-2/+2
| | | | Also remove tautological cassert(config_debug) calls.
* s/MALLOC_MUTEX_RANK_OMIT/WITNESS_RANK_OMIT/Jason Evans2016-04-141-1/+1
| | | | | | | | This fixes a compilation error caused by b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.). This resolves #375.
* Fix a compilation error.Jason Evans2016-04-141-2/+2
| | | | | | Fix a compilation error that occurs if Valgrind is not enabled. This regression was caused by b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.).
* Add witness, a simple online locking validator.Jason Evans2016-04-1415-187/+353
| | | | This resolves #358.
* Fix a style nit.Jason Evans2016-04-131-1/+2
|
* Simplify RTREE_HEIGHT_MAX definition.Jason Evans2016-04-111-29/+4
| | | | | Use 1U rather than ZU(1) in macro definitions, so that the preprocessor can evaluate the resulting expressions.