path: root/src
Commit log, most recent first. Each entry: commit message (author, date; files changed, lines -removed/+added).
* Do not assume dss never decreases. [stable-4] (Uwe L. Korn, 2017-09-11; 1 file, -36/+35)
  An sbrk() caller outside jemalloc can decrease the dss, so add a
  separate atomic boolean to explicitly track whether jemalloc is
  concurrently calling sbrk(), rather than depending on state outside
  jemalloc's full control. Fixes #802 for stable-4.
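  A minimal sketch of the pattern, using C11 atomics; jemalloc's actual
  stable-4 code uses its own atomic wrappers, and the names dss_sbrk and
  dss_extending here are illustrative, not the real identifiers:

      #include <stdatomic.h>
      #include <stdint.h>
      #include <unistd.h>

      /* True only while jemalloc itself is inside sbrk(). */
      static atomic_bool dss_extending;

      static void *
      dss_sbrk(intptr_t increment)
      {
          void *ret;

          /* Take ownership of the flag; this both serializes jemalloc's
           * own sbrk() calls and lets other threads see that an extension
           * is in progress, instead of inferring it from a dss maximum
           * that an outside sbrk() caller can legally move backward. */
          while (atomic_exchange(&dss_extending, true))
              ; /* Busy-wait. */
          ret = sbrk(increment);
          atomic_store(&dss_extending, false);
          return ret;
      }
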
* Use openat syscall if available (Jim Chen, 2017-05-12; 2 files, -0/+6)
  Some architectures like AArch64 may not have the open syscall because
  it was superseded by the openat syscall, so check and use SYS_openat
  if SYS_open is not available. Additionally, Android headers for
  AArch64 define SYS_open to __NR_open, even though __NR_open is
  undefined. Undefine SYS_open in that case so SYS_openat is used.
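  The shape of the fix, in sketch form; jemalloc's surrounding configure
  machinery differs, but the syscall-number fallback looks like this:

      #include <fcntl.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      #if defined(__aarch64__) && defined(SYS_open) && !defined(__NR_open)
      /* Android's AArch64 headers define SYS_open to the nonexistent
       * __NR_open; undefine it so the openat path below is used. */
      #undef SYS_open
      #endif

      static int
      open_read_only(const char *path)
      {
      #if defined(SYS_open)
          return (int)syscall(SYS_open, path, O_RDONLY);
      #elif defined(SYS_openat)
          /* Newer ABIs such as AArch64 only provide openat(2). */
          return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
      #else
          return open(path, O_RDONLY);
      #endif
      }
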
* Fix decommit-related run fragmentation. (Jason Evans, 2017-04-18; 2 files, -39/+71)
  When allocating runs with alignment stricter than one page, commit
  after trimming the head/tail from the initial over-sized allocation,
  rather than before trimming. This avoids creating clean-but-committed
  runs; such runs do not get purged (and decommitted as a side effect),
  so they can cause unnecessary long-term run fragmentation.

  Do not commit decommitted memory in chunk_recycle() unless asked to by
  the caller. This allows recycled arena chunks to start in the
  decommitted state, and therefore increases the likelihood that purging
  after run deallocation will allow the arena chunk to become a single
  unused run, thus allowing the chunk as a whole to be discarded.

  This resolves #766.
* Add casts to CONF_HANDLE_T_U(). (Jason Evans, 2017-03-01; 1 file, -6/+6)
  This avoids signed/unsigned comparison warnings when specifying
  integer constants as inputs. Clean up whitespace and add clarifying
  parentheses for CONF_HANDLE_SIZE_T(opt_lg_chunk, ...).
* Fix/enhance THP integration. (Jason Evans, 2017-02-28; 5 files, -16/+103)
  Detect whether chunks start off as THP-capable by default (according
  to the state of /sys/kernel/mm/transparent_hugepage/enabled), and use
  this as the basis for whether to call pages_nohuge() once per chunk
  during first purge of any of the chunk's page runs.

  Add the --disable-thp configure option, as well as the opt.thp
  mallctl.

  This resolves #541.
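  A sketch of the detection idea, assuming the usual sysfs format
  ("always [madvise] never"); jemalloc's actual probe differs in detail
  (for instance, it avoids the open/read/close wrappers during boot, per
  the syscall(2) commit further down):

      #include <fcntl.h>
      #include <stdbool.h>
      #include <string.h>
      #include <unistd.h>

      /* Return true if the kernel reports "[always]", i.e. new mappings
       * start THP-capable by default. */
      static bool
      thp_default_enabled(void)
      {
          char buf[64];
          ssize_t nread;
          int fd = open("/sys/kernel/mm/transparent_hugepage/enabled",
              O_RDONLY);

          if (fd == -1)
              return false;
          nread = read(fd, buf, sizeof(buf) - 1);
          close(fd);
          if (nread <= 0)
              return false;
          buf[nread] = '\0';
          return strstr(buf, "[always]") != NULL;
      }
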
* Handle race in stats_arena_bins_print (Qi Wang, 2017-02-27; 1 file, -3/+13)
  When multiple threads call stats_print, a race can happen as we read
  the counters in separate mallctl calls, and the removed assertion
  could fail when other operations happen between the mallctl calls.
  For simplicity, output "race" in the utilization field in this case.

  This resolves #616.
* Fix lg_chunk clamping for config_cache_oblivious. (Jason Evans, 2017-02-27; 2 files, -16/+12)
  Fix lg_chunk clamping to take into account cache-oblivious large
  allocation. This regression only resulted in incorrect behavior if
  !config_fill (false unless --disable-fill specified) and
  config_cache_oblivious (true unless --disable-cache-oblivious
  specified).

  This regression was introduced by
  8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
  randomization for large allocations.), which was first released in
  4.0.0.

  This resolves #555.
* Fix huge-aligned allocation. (Jason Evans, 2017-02-27; 2 files, -11/+35)
  This regression was caused by b9408d77a63a54fd331f9b81c884f68e6d57f2e5
  (Fix/simplify chunk_recycle() allocation size computations.).

  This resolves #647.
* Test JSON output of malloc_stats_print() and fix bugs. (Jason Evans, 2017-02-26; 1 file, -27/+34)
  Implement and test a JSON validation parser. Use the parser to
  validate JSON output from malloc_stats_print(), with a significant
  subset of supported output options.

  This resolves #583.
* Fix JSON-mode output for !config_stats and/or !config_prof cases. (Jason Evans, 2017-02-26; 1 file, -9/+10)
  These bugs were introduced by b599b32280e1142856b0b96293a71e1684b1ccfb
  (Add "J" (JSON) support to malloc_stats_print().), which was first
  released in 4.3.0.

  This resolves #615.
* Fix chunk_alloc_dss() regression. (Jason Evans, 2017-02-26; 1 file, -18/+27)
  Fix chunk_alloc_dss() to account for bytes that are not a multiple of
  the chunk size. This regression was introduced by
  e2bcf037d445a84a71c7997670819ebd0a893b4a (Make dss operations
  lockless.), which was first released in 4.3.0.
* Relax witness assertions related to prof_gdump(). (Jason Evans, 2017-02-23; 2 files, -6/+11)
  In some cases the prof machinery allocates (in order to modify the
  bt2gctx hash table), and such operations are synchronized via
  bt2gctx_mtx. Rather than asserting that no locks are held on entry
  into functions that may call prof_gdump(), make the weaker assertion
  that no "core" locks are held. The prof machinery enqueues dumps
  triggered by prof_gdump() calls when bt2gctx_mtx is held, so this
  weakened assertion avoids false failures in such cases.
* Add witness_assert_depth[_to_rank](). (Jason Evans, 2017-02-23; 4 files, -54/+53)
  This makes it possible to make lock state assertions about precisely
  which locks are held.
* Fix/refactor tcaches synchronization. (Jason Evans, 2017-02-23; 3 files, -28/+93)
  Synchronize tcaches with tcaches_mtx rather than ctl_mtx. Add missing
  synchronization for tcache flushing. This bug was introduced by
  1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache
  support.), which was first released in 4.0.0.
* Repair file permissions. (Jason Evans, 2017-02-22; 3 files, -0/+0)
  This regression was caused by 8f61fdedb908c29905103b22dda32ceb29cd8ede
  (Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).).

  This resolves #538.
* Avoid redeclaring glibc's secure_getenv (Tamir Duberstein, 2017-01-25; 1 file, -5/+6)
  Use a name other than secure_getenv for the local fallback, so that
  jemalloc does not redeclare secure_getenv when the function is present
  but its use is manually disabled via ac_cv_func_secure_getenv=no.
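  The pattern, sketched; the wrapper name je_secure_getenv, the
  JEMALLOC_HAVE_SECURE_GETENV macro, and the uid/gid check are
  illustrative stand-ins, not jemalloc's exact code:

      #include <stdlib.h>
      #include <unistd.h>

      #ifdef JEMALLOC_HAVE_SECURE_GETENV
      #define je_secure_getenv secure_getenv
      #else
      /* Private name for the fallback, so it never collides with a libc
       * that declares secure_getenv even when configure disabled it. */
      static char *
      je_secure_getenv(const char *name)
      {
          if (getuid() != geteuid() || getgid() != getegid())
              return NULL;
          return getenv(name);
      }
      #endif
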
* Fix lock order reversal during gdump. (Jason Evans, 2017-01-24; 3 files, -28/+59)
* Convert witness_assert_lockless() to witness_assert_lock_depth(). (Jason Evans, 2017-01-24; 2 files, -46/+47)
  This makes it possible to make lock state assertions about precisely
  which locks are held.
* Add dummy implementations for most remaining OSX zone allocator functions (Mike Hommey, 2017-01-18; 1 file, -10/+108)
  Some system libraries use malloc_default_zone() and then some of the
  malloc_zone_* API. Under normal conditions, those functions check the
  malloc_zone_t/malloc_introspection_t struct for the values that are
  allowed to be NULL, so that a NULL deref doesn't happen.

  As of OSX 10.12, malloc_default_zone() doesn't return the actual
  default zone anymore, but returns a fake, wrapper zone. The wrapper
  zone defines (almost) all the possible functions in the
  malloc_zone_t/malloc_introspection_t struct, and calls the function
  from the registered default zone (jemalloc in our case) on its own,
  without checking whether the pointers are NULL.

  This means that a system library that calls e.g.
  malloc_zone_batch_malloc(malloc_default_zone(), ...) ends up trying to
  call jemalloc_zone.batch_malloc, which is NULL, and a crash follows.

  So as of OSX 10.12, the default zone is required to have all the
  functions available (really, the same as the wrapper zone), even if
  they do nothing.

  This is arguably a bug in libsystem_malloc in OSX 10.12, but jemalloc
  still needs to work in that case.
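  One representative function from that family might look like the
  following sketch; the batch_malloc signature comes from the public
  malloc/malloc.h, while the zone_batch_malloc name and the je_malloc
  forwarding are assumptions about jemalloc's zone.c conventions:

      #include <malloc/malloc.h>

      void *je_malloc(size_t size); /* jemalloc's entry point */

      /* Forward batch_malloc to the real allocator, so the 10.12 wrapper
       * zone never calls through a NULL function pointer. */
      static unsigned
      zone_batch_malloc(struct _malloc_zone_t *zone, size_t size,
          void **results, unsigned num_requested)
      {
          unsigned i;

          for (i = 0; i < num_requested; i++) {
              results[i] = je_malloc(size);
              if (results[i] == NULL)
                  break;
          }
          return i;
      }

  The zone registration code then fills the corresponding malloc_zone_t
  slot (e.g. jemalloc_zone.batch_malloc = zone_batch_malloc) instead of
  leaving it NULL.
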
* Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions (Mike Hommey, 2017-01-18; 1 file, -36/+86)
  The SDK jemalloc is built against might not be the latest for various
  reasons, but the resulting binary ought to work on newer versions of
  OSX. In order to ensure this, we need the fullest definitions
  possible, so copy what we need from the latest version of
  malloc/malloc.h available on opensource.apple.com.
* Add --disable-syscall. (Jason Evans, 2016-12-04; 2 files, -4/+4)
  This resolves #517.
* Fix pages_purge() when using MADV_DONTNEED. (Jason Evans, 2016-12-04; 1 file, -1/+1)
  This fixes a regression caused by
  e98a620c59ac20b13e2de796164cc67f050ed2bf (Mark partially purged arena
  chunks as non-hugepage.).
* Mark partially purged arena chunks as non-hugepage. (Jason Evans, 2016-11-24; 2 files, -2/+53)
  Add the pages_[no]huge() functions, which toggle huge page state via
  madvise(..., MADV_[NO]HUGEPAGE) calls.

  The first time a page run is purged from within an arena chunk, call
  pages_nohuge() to tell the kernel to make no further attempts to back
  the chunk with huge pages. Upon arena chunk deletion, restore the
  associated virtual memory to its original state via pages_huge().

  This resolves #243.
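  The two functions are thin madvise(2) wrappers; a sketch, following
  jemalloc's convention that pages_* functions return false on success:

      #include <stdbool.h>
      #include <sys/mman.h>

      static bool
      pages_huge(void *addr, size_t size)
      {
      #ifdef MADV_HUGEPAGE
          return madvise(addr, size, MADV_HUGEPAGE) != 0;
      #else
          return true; /* Unsupported on this platform. */
      #endif
      }

      static bool
      pages_nohuge(void *addr, size_t size)
      {
      #ifdef MADV_NOHUGEPAGE
          return madvise(addr, size, MADV_NOHUGEPAGE) != 0;
      #else
          return true;
      #endif
      }
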
* Add pthread_atfork(3) feature test. (Jason Evans, 2016-11-17; 1 file, -2/+3)
  Some versions of Android provide a pthreads library without providing
  pthread_atfork(), so in practice a separate feature test is necessary
  for the latter.
* Refactor madvise(2) configuration. (Jason Evans, 2016-11-17; 1 file, -5/+5)
  Add feature tests for the MADV_FREE and MADV_DONTNEED flags to
  madvise(2), so that MADV_FREE is detected and used for Linux kernel
  versions 4.5 and newer. Refactor pages_purge() so that on systems
  which support both flags, MADV_FREE is preferred over MADV_DONTNEED.

  This resolves #387.
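  A sketch of the refactored preference order; the two JEMALLOC_PURGE_*
  macros stand for the results of the new configure-time feature tests:

      #include <stdbool.h>
      #include <sys/mman.h>

      /* Returns false on success, per the pages_* convention above. */
      static bool
      pages_purge(void *addr, size_t size)
      {
      #if defined(JEMALLOC_PURGE_MADVISE_FREE)
          /* Preferred when both flags exist (Linux >= 4.5): the kernel
           * reclaims the pages lazily, which is considerably cheaper. */
          return madvise(addr, size, MADV_FREE) != 0;
      #elif defined(JEMALLOC_PURGE_MADVISE_DONTNEED)
          return madvise(addr, size, MADV_DONTNEED) != 0;
      #else
          (void)addr; (void)size;
          return true; /* No purging mechanism available. */
      #endif
      }
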
* Avoid gcc tautological-compare warnings. (Jason Evans, 2016-11-17; 1 file, -6/+6)
* Avoid gcc type-limits warnings. (Jason Evans, 2016-11-17; 1 file, -14/+32)
* Fix an MSVC compiler warning. (Jason Evans, 2016-11-16; 1 file, -1/+1)
* Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *). (Jason Evans, 2016-11-15; 3 files, -9/+10)
  This avoids warnings in some cases, and is otherwise generally good
  hygiene.
* Consistently use size_t rather than uint64_t for extent serial numbers. (Jason Evans, 2016-11-15; 2 files, -3/+3)
* Add extent serial numbers. (Jason Evans, 2016-11-15; 6 files, -128/+225)
  Add extent serial numbers and use them where appropriate as a sort key
  that is higher priority than address, so that the allocation policy
  prefers older extents.

  This resolves #147.
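  A minimal sketch of a serial-number-then-address comparator; the
  extent_key_t type is a stand-in, since the real extent_t layout and
  accessors differ:

      #include <stddef.h>
      #include <stdint.h>

      typedef struct {
          size_t sn;   /* Serial number: lower means older. */
          void   *addr;
      } extent_key_t;

      static int
      extent_snad_comp(const extent_key_t *a, const extent_key_t *b)
      {
          /* Older (lower-sn) extents sort first; ties break on address. */
          int ret = (a->sn > b->sn) - (a->sn < b->sn);

          if (ret == 0) {
              uintptr_t aa = (uintptr_t)a->addr, ba = (uintptr_t)b->addr;
              ret = (aa > ba) - (aa < ba);
          }
          return ret;
      }

  Making the serial number the primary key means first fit naturally
  reuses the oldest extents, keeping long-lived memory packed together.
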
* Simplify extent_quantize(). (Jason Evans, 2016-11-12; 1 file, -6/+3)
  2cdf07aba971d1e21edc203e7d4073b6ce8e72b9 (Fix extent_quantize() to
  handle greater-than-huge-size extents.) solved a non-problem; the
  expression passed in to index2size() was never too large. However, the
  expression could in principle underflow, so fix the actual (latent)
  bug and remove unnecessary complexity.
* Fix/simplify chunk_recycle() allocation size computations. (Jason Evans, 2016-11-12; 1 file, -1/+4)
  Remove the outer CHUNK_CEILING(s2u(...)) from the alloc_size
  computation, since s2u() may overflow (and return 0), and
  CHUNK_CEILING() is only needed around the alignment portion of the
  computation.

  This fixes a regression caused by
  5707d6f952c71baa2f19102479859012982ac821 (Quantize szad trees by size
  class.) and first released in 4.0.0.

  This resolves #497.
* Fix extent_quantize() to handle greater-than-huge-size extents. (Jason Evans, 2016-11-12; 1 file, -5/+19)
  Allocation requests can't directly create extents that exceed
  HUGE_MAXCLASS, but extent merging can create them.

  This fixes a regression caused by
  8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
  randomization for large allocations.) and first released in 4.0.0.

  This resolves #497.
* Refactor prng to not use 64-bit atomics on 32-bit platforms. (Jason Evans, 2016-11-07; 3 files, -6/+8)
  This resolves #495.
* Fix run leak. (Jason Evans, 2016-11-07; 1 file, -5/+7)
  Fix arena_run_first_best_fit() to search all potentially non-empty
  runs_avail heaps, rather than ignoring the heap that contains runs
  larger than large_maxclass, but less than chunksize.

  This fixes a regression caused by
  f193fd80cf1f99bce2bc9f5f4a8b149219965da2 (Refactor runs_avail.).

  This resolves #493.
* Fix arena data structure size calculation. (Jason Evans, 2016-11-04; 1 file, -2/+2)
  Fix paren placement so that QUANTUM_CEILING() applies to the correct
  portion of the expression that computes how much memory to
  base_alloc(). In practice this bug had no impact.

  This was caused by 5d8db15db91c85d47b343cfc07fc6ea736f0de48 (Simplify
  run quantization.), which in turn fixed an over-allocation regression
  caused by 3c4d92e82a31f652a7c77ca937a02d0185085b06 (Add per size class
  huge allocation statistics.).
* Fix large allocation to search optimal size class heap. (Jason Evans, 2016-11-04; 1 file, -1/+1)
  Fix arena_run_alloc_large_helper() to not convert size to usize when
  searching for the first best fit via arena_run_first_best_fit(). This
  allows the search to consider the optimal quantized size class, so
  that e.g. allocating and deallocating 40 KiB in a tight loop can reuse
  the same memory.

  This regression was nominally caused by
  5707d6f952c71baa2f19102479859012982ac821 (Quantize szad trees by size
  class.), but it did not commonly cause problems until
  8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
  randomization for large allocations.). These regressions were first
  released in 4.0.0.

  This resolves #487.
* Fix chunk_alloc_cache() to support decommitted allocation. (Jason Evans, 2016-11-04; 2 files, -11/+13)
  Fix chunk_alloc_cache() to support decommitted allocation, and use
  this ability in arena_chunk_alloc_internal() and arena_stash_dirty(),
  so that chunks don't get permanently stuck in a hybrid state.

  This resolves #487.
* Check for existence of CPU_COUNT macro before using it. (Dave Watson, 2016-11-03; 1 file, -1/+7)
  This resolves #485.
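  The guarded usage pattern, sketched with the glibc affinity API; the
  ncpus_detect name is illustrative, and the actual jemalloc probe also
  varies per platform:

      #define _GNU_SOURCE
      #include <sched.h>
      #include <unistd.h>

      static unsigned
      ncpus_detect(void)
      {
      #ifdef CPU_COUNT
          cpu_set_t set;

          if (sched_getaffinity(0, sizeof(set), &set) == 0)
              return (unsigned)CPU_COUNT(&set);
      #endif
          /* Headers without CPU_COUNT (e.g. older glibc): fall back. */
          return (unsigned)sysconf(_SC_NPROCESSORS_ONLN);
      }
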
* Do not use syscall(2) on OS X 10.12 (deprecated). (Jason Evans, 2016-11-03; 2 files, -4/+4)
* Add os_unfair_lock support. (Jason Evans, 2016-11-03; 1 file, -0/+2)
  OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended
  replacement.
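  Basic usage of the replacement API (available from the 10.12 SDK's
  os/lock.h); the with_lock helper is just an illustration:

      #include <os/lock.h>

      static os_unfair_lock lock = OS_UNFAIR_LOCK_INIT;

      static void
      with_lock(void (*fn)(void))
      {
          os_unfair_lock_lock(&lock);
          fn(); /* Guarded work. */
          os_unfair_lock_unlock(&lock);
      }
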
* Fix/refactor zone allocator integration code. (Jason Evans, 2016-11-03; 1 file, -85/+107)
  Fix zone_force_unlock() to reinitialize, rather than unlock, mutexes,
  since OS X 10.12 cannot tolerate a child unlocking mutexes that were
  locked by its parent.

  Refactor; this was a side effect of experimenting with zone
  {de,re}registration during fork(2).
* Add "J" (JSON) support to malloc_stats_print().Jason Evans2016-11-011-377/+854
| | | | This resolves #474.
* Use CLOCK_MONOTONIC_COARSE rather than CLOCK_MONOTONIC_RAW. (Jason Evans, 2016-10-30; 1 file, -2/+2)
  The raw clock variant is slow (even relative to plain
  CLOCK_MONOTONIC), whereas the coarse clock variant is faster than
  CLOCK_MONOTONIC, but still has resolution (~1ms) that is adequate for
  our purposes.

  This resolves #479.
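  The substitution, in sketch form (Linux; falls back to CLOCK_MONOTONIC
  where the coarse clock is absent):

      #include <time.h>

      /* CLOCK_MONOTONIC_COARSE: ~1 ms resolution, but cheaper to read
       * than CLOCK_MONOTONIC, and far cheaper than CLOCK_MONOTONIC_RAW. */
      static void
      monotonic_now(struct timespec *ts)
      {
      #ifdef CLOCK_MONOTONIC_COARSE
          clock_gettime(CLOCK_MONOTONIC_COARSE, ts);
      #else
          clock_gettime(CLOCK_MONOTONIC, ts);
      #endif
      }
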
* Use syscall(2) rather than {open,read,close}(2) during boot. (Jason Evans, 2016-10-30; 1 file, -0/+19)
  Some applications wrap various system calls, and if they call the
  allocator in their wrappers, unexpected reentry can result. This is
  not a general solution (many other syscalls are spread throughout the
  code), but this resolves a bootstrapping issue that is apparently
  common.

  This resolves #443.
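  A sketch of the boot-time pattern; the helper name is illustrative,
  and note that SYS_open itself is not universal (see the openat commit
  near the top of this log):

      #include <fcntl.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      /* Read a small file without going through the open/read/close(2)
       * wrappers that an embedding application may have interposed. */
      static ssize_t
      raw_read_file(const char *path, char *buf, size_t len)
      {
          int fd = (int)syscall(SYS_open, path, O_RDONLY);
          ssize_t nread;

          if (fd == -1)
              return -1;
          nread = (ssize_t)syscall(SYS_read, fd, buf, len);
          syscall(SYS_close, fd);
          return nread;
      }
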
* Do not mark malloc_conf as weak on Windows. (Jason Evans, 2016-10-29; 1 file, -1/+1)
  This works around malloc_conf not being properly initialized by at
  least the cygwin toolchain. Prior build system changes to use
  -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution
  to work properly as a non-weak symbol (not tested).
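  The declaration pattern this implies, sketched with the GCC attribute
  spelled out (jemalloc wraps it in its own attribute macro):

      /* Weak on non-Windows targets, so an embedding application's
       * definition takes precedence; plain (strong) on Windows/Cygwin,
       * where weak resolution was observed to misbehave. */
      const char *je_malloc_conf
      #ifndef _WIN32
          __attribute__((weak))
      #endif
          ;
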
* Do not mark malloc_conf as weak for unit tests. (Jason Evans, 2016-10-29; 1 file, -1/+5)
  This is generally correct (no need for weak symbols since no jemalloc
  library is involved in the link phase), and avoids linking problems
  (apparently uninitialized non-NULL malloc_conf) when using cygwin with
  gcc.
* Support static linking of jemalloc with glibc (Dave Watson, 2016-10-28; 1 file, -0/+31)
  glibc defines its malloc implementation with several weak and strong
  symbols:

      strong_alias (__libc_calloc, __calloc)
      weak_alias (__libc_calloc, calloc)
      strong_alias (__libc_free, __cfree)
      weak_alias (__libc_free, cfree)
      strong_alias (__libc_free, __free)
      strong_alias (__libc_free, free)
      strong_alias (__libc_malloc, __malloc)
      strong_alias (__libc_malloc, malloc)

  The issue is not with the weak symbols, but that other parts of glibc
  depend on __libc_malloc explicitly. Defining them in terms of jemalloc
  APIs allows the linker to drop glibc's malloc.o completely from the
  link, and static linking no longer results in symbol collisions.

  Another wrinkle: during initialization, jemalloc calls sysconf to get
  the number of CPUs. glibc allocates for the first time before setting
  up its isspace (and other related) tables, which are used by sysconf.
  Instead, use the pthread API to get the number of CPUs with glibc,
  which seems to work.

  This resolves #442.
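  A sketch of the alias approach (GCC/Clang on Linux); the placeholder
  bodies stand in for jemalloc's real entry points, which must live in
  the same translation unit because the alias attribute requires a
  locally defined target:

      #include <stddef.h>

      /* Placeholder bodies; in jemalloc these are the real allocator
       * entry points. */
      void *
      je_malloc(size_t size)
      {
          (void)size;
          return NULL;
      }

      void
      je_free(void *ptr)
      {
          (void)ptr;
      }

      #define ALIAS(je_fn)    __attribute__((alias(#je_fn), used))
      /* Extra level so a je_-prefixed macro expands before
       * stringification. */
      #define PREALIAS(je_fn) ALIAS(je_fn)

      void *__libc_malloc(size_t size) PREALIAS(je_malloc);
      void  __libc_free(void *ptr)     PREALIAS(je_free);

  With __libc_malloc and friends satisfied by jemalloc, the linker no
  longer needs to pull in glibc's malloc.o at static link time.
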
* Fix over-sized allocation of rtree leaf nodes. (Jason Evans, 2016-10-28; 1 file, -1/+1)
  Use the correct level metadata when allocating child nodes so that
  leaf nodes don't end up over-sized (2^16 elements vs 2^4 elements).