path: root/src/jemalloc.c
Commit message (author, date, files changed, lines -removed/+added)
...
* Header refactoring: move assert.h out of the catch-all (David Goldblatt, 2017-04-19, 1 file, -0/+1)
* Header refactoring: move util.h out of the catchall (David Goldblatt, 2017-04-19, 1 file, -0/+1)
* Header refactoring: move malloc_io.h out of the catchall (David Goldblatt, 2017-04-19, 1 file, -0/+1)
* Switch to fine-grained reentrancy support. (Qi Wang, 2017-04-15, 1 file, -71/+50)
  Previously we had a general detection and support of reentrancy, at the cost of having branches and inc / dec operations on fast paths. To avoid taxing fast paths, we move the reentrancy operations onto tsd slow state, and only modify reentrancy level around external calls (that might trigger reentrancy).
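  A minimal sketch, under assumptions, of the scheme this commit describes: the reentrancy level is touched only around external calls that might re-enter the allocator, so fast paths stay branch-free. The tsd_t layout and the wrapper below are illustrative, not jemalloc's actual code.

      /* Hypothetical tsd with a per-thread reentrancy counter. */
      typedef struct {
          int reentrancy_level;
      } tsd_t;

      /* Stand-in for e.g. a user-supplied extent hook that may call malloc(). */
      static void
      external_hook(void) {
      }

      /* Only this wrapper (a slow-path operation) pays the inc/dec cost. */
      static void
      call_external_hook(tsd_t *tsd) {
          tsd->reentrancy_level++;
          external_hook();
          tsd->reentrancy_level--;
      }

      int
      main(void) {
          tsd_t tsd = {0};
          call_external_hook(&tsd);
          return (tsd.reentrancy_level != 0);
      }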
* Bundle 3 branches on fast path into tsd_state. (Qi Wang, 2017-04-14, 1 file, -36/+65)
  Added tsd_state_nominal_slow, which on fast path malloc() incorporates tcache_enabled check, and on fast path free() bundles both malloc_slow and tcache_enabled branches.
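  As a hedged illustration of what "bundling branches" means here: once slow conditions are folded into the tsd state value itself, the fast path needs only one comparison. The enum values and helper are assumptions, not the actual tsd implementation.

      /* Hypothetical tsd states; anything past "nominal" forces the slow path. */
      enum { tsd_state_nominal = 0, tsd_state_nominal_slow = 1 };

      /* One equality test replaces separate malloc_slow / tcache_enabled /
       * tsd-state checks: any of those conditions flips the state to
       * nominal_slow (or beyond). */
      static int
      fast_path_allowed(int tsd_state) {
          return tsd_state == tsd_state_nominal;
      }

      int
      main(void) {
          return !fast_path_allowed(tsd_state_nominal);
      }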
* Pass alloc_ctx down profiling path. (Qi Wang, 2017-04-12, 1 file, -30/+54)
  With this change, when profiling is enabled, we avoid doing redundant rtree lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on allocation path as well (to speed up profiling).
* Pass dalloc_ctx down the sdalloc path. (Qi Wang, 2017-04-12, 1 file, -2/+11)
  This avoids redundant rtree lookups.
* Header refactoring: move atomic.h out of the catch-all (David Goldblatt, 2017-04-11, 1 file, -0/+2)
* Header refactoring: Split up jemalloc_internal.h (David Goldblatt, 2017-04-11, 1 file, -1/+2)
  This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now:
  - The source of system-wide definitions.
  - The catch-all include file.
  - The module header file for jemalloc.c
  This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs|inlines|types].h, just as they are for the other modules.
* Pass dealloc_ctx down free() fast path. (Qi Wang, 2017-04-11, 1 file, -5/+13)
  This gets rid of the redundant rtree lookup down the fast path.
* Move reentrancy_level to the beginning of TSD. (Qi Wang, 2017-04-07, 1 file, -1/+1)
* Add basic reentrancy-checking support, and allow arena_new to reenter. (David Goldblatt, 2017-04-07, 1 file, -12/+82)
  This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.
* Integrate auto tcache into TSD. (Qi Wang, 2017-04-07, 1 file, -8/+8)
  The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins will be allocated / deallocated accordingly during init / cleanup. With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check if we should use tcache.
* Move arena-tracking atomics in jemalloc.c to C11-style (David Goldblatt, 2017-04-05, 1 file, -6/+8)
* Do proper cleanup for tsd_state_reincarnated. (Qi Wang, 2017-04-04, 1 file, -7/+2)
  Also enable arena_bind under non-nominal state, as the cleanup will be handled correctly now.
* Force inline ifree to avoid function call costs on fast path. (Qi Wang, 2017-03-25, 1 file, -2/+2)
  Without ALWAYS_INLINE, sometimes ifree() gets compiled into its own function, which adds overhead on the fast path.
* Push down iealloc() calls. (Jason Evans, 2017-03-23, 1 file, -84/+57)
  Call iealloc() as deep into call chains as possible without causing redundant calls.
* Remove extent dereferences from the deallocation fast paths. (Jason Evans, 2017-03-23, 1 file, -18/+14)
* Remove extent arg from isalloc() and arena_salloc(). (Jason Evans, 2017-03-23, 1 file, -16/+13)
* Not re-binding iarena when migrate between arenas. (Qi Wang, 2017-03-21, 1 file, -1/+0)
* Implement two-phase decay-based purging. (Jason Evans, 2017-03-15, 1 file, -2/+4)
  Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether.
  Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary.
  The mallctl interfaces change as follows:
  - opt.decay_time --> opt.{dirty,muzzy}_decay_time
  - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
  - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
  - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
  - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
  This resolves #521.
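  A hedged usage example for the renamed controls (option names taken from the list above, assuming an unprefixed build where the API is mallctl rather than je_mallctl; exact types and names depend on the jemalloc version, so treat this as a sketch):

      #include <stdio.h>
      #include <sys/types.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void) {
          ssize_t dirty, muzzy;
          size_t len = sizeof(ssize_t);

          /* Read the two decay times introduced by the split. */
          if (mallctl("opt.dirty_decay_time", &dirty, &len, NULL, 0) == 0 &&
              mallctl("opt.muzzy_decay_time", &muzzy, &len, NULL, 0) == 0) {
              printf("dirty decay: %zd, muzzy decay: %zd\n", dirty, muzzy);
          }
          return 0;
      }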
* Implement per-CPU arena. (Qi Wang, 2017-03-09, 1 file, -24/+115)
  The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyper threading is enabled is added yet.
  When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce overhead from reading CPU id, each arena tracks the thread accessed most recently. When a new thread comes in, we will read CPU id and update arena if necessary.
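  A minimal sketch of the CPU-id-to-arena-index mapping the message describes ("percpu" vs. "phycpu"); the helper name and the assumption that sibling hyperthreads map to the same arena via a simple modulo are illustrative, not jemalloc's logic.

      #define _GNU_SOURCE
      #include <sched.h>
      #include <unistd.h>
      #include <stdio.h>

      /* Map the calling thread's current CPU to an arena index. */
      static unsigned
      cpu_to_arena(unsigned ncpus, int phycpu_mode) {
          int cpu = sched_getcpu();
          if (cpu < 0) {
              return 0;                      /* fall back to arena 0 */
          }
          if (phycpu_mode && ncpus >= 2) {
              /* "phycpu": both hyperthreads of a core share one arena. */
              return (unsigned)cpu % (ncpus / 2);
          }
          /* "percpu": one arena per logical CPU. */
          return (unsigned)cpu;
      }

      int
      main(void) {
          unsigned ncpus = (unsigned)sysconf(_SC_NPROCESSORS_ONLN);
          printf("percpu arena: %u, phycpu arena: %u\n",
              cpu_to_arena(ncpus, 0), cpu_to_arena(ncpus, 1));
          return 0;
      }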
* Fix arena_prefork lock rank order for witness. (Qi Wang, 2017-03-09, 1 file, -6/+14)
  When witness is enabled, lock rank order needs to be preserved during prefork, not only for each arena, but also across arenas. This change breaks arena_prefork into further stages to ensure valid rank order across arenas. Also changed test/unit/fork to use a manual arena to catch this case.
* Store associated arena in tcache. (Qi Wang, 2017-03-07, 1 file, -0/+1)
  This fixes tcache_flush for manual tcaches, which wasn't able to find the correct arena it was associated with. Also changed the decay test to cover this case (by using manually created arenas).
* Add casts to CONF_HANDLE_T_U(). (Jason Evans, 2017-03-01, 1 file, -4/+4)
  This avoids signed/unsigned comparison warnings when specifying integer constants as inputs.
* Synchronize arena->tcache_ql with arena->tcache_ql_mtx. (Jason Evans, 2017-02-16, 1 file, -8/+3)
  This replaces arena->lock synchronization.
* Replace spin_init() with SPIN_INITIALIZER. (Jason Evans, 2017-02-09, 1 file, -3/+1)
* Optimize compute_size_with_overflow(). (Jason Evans, 2017-02-04, 1 file, -5/+16)
  Do not check for overflow unless it is actually a possibility.
* Fix compute_size_with_overflow(). (Jason Evans, 2017-02-04, 1 file, -1/+1)
  Fix compute_size_with_overflow() to use a high_bits mask that has the high bits set, rather than the low bits. This regression was introduced by 5154ff32ee8c37bacb6afd8a07b923eb33228357 (Unify the allocation paths).
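  A self-contained sketch of the idea behind the two compute_size_with_overflow() commits above: skip the overflow check when both operands fit in the low half of size_t, and build the mask from the high bits. This is a stand-in function, not the jemalloc implementation verbatim.

      #include <limits.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Returns true on overflow; otherwise stores num * size in *result. */
      static bool
      size_mul_overflow(size_t num, size_t size, size_t *result) {
          /* Mask with the *high* half of the bits set (the mask the fix corrects). */
          const size_t high_bits = SIZE_MAX << (sizeof(size_t) * CHAR_BIT / 2);

          if (((num | size) & high_bits) == 0) {
              /* Both operands are small: overflow is impossible, skip the check. */
              *result = num * size;
              return false;
          }
          if (size != 0 && num > SIZE_MAX / size) {
              return true;
          }
          *result = num * size;
          return false;
      }

      int
      main(void) {
          size_t r;
          return size_mul_overflow(16, 32, &r) || r != 512;
      }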
* Fix/refactor tcaches synchronization. (Jason Evans, 2017-02-02, 1 file, -0/+3)
  Synchronize tcaches with tcaches_mtx rather than ctl_mtx. Add missing synchronization for tcache flushing. This bug was introduced by 1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache support.), which was first released in 4.0.0.
* Fix a bug in which a potentially invalid usize replaced size (David Goldblatt, 2017-01-25, 1 file, -3/+3)
  In the refactoring that unified the allocation paths, usize was substituted for size. This worked fine under the default test configuration, but triggered asserts when we started beefing up our CI testing. This change fixes the issue, and clarifies the comment describing the argument selection that it got wrong.
* Avoid redeclaring glibc's secure_getenv (Tamir Duberstein, 2017-01-25, 1 file, -4/+6)
  Avoid using the name secure_getenv, so that glibc's secure_getenv is not redeclared when it is present but its use is manually disabled via ac_cv_func_secure_getenv=no.
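  A hedged sketch of the pattern this change implies: when the build detects secure_getenv(), call it directly under its own name; otherwise define a differently named local fallback so the libc prototype is never redeclared. The macro and wrapper names here are assumptions, not the repository's exact identifiers.

      #define _GNU_SOURCE        /* needed for secure_getenv() where it exists */
      #include <stdio.h>
      #include <stdlib.h>

      #ifdef JEMALLOC_HAVE_SECURE_GETENV
      #  define local_secure_getenv(name) secure_getenv(name)
      #else
      /* Fallback when secure_getenv() is unavailable or explicitly disabled. */
      static char *
      local_secure_getenv(const char *name) {
          return getenv(name);
      }
      #endif

      int
      main(void) {
          const char *conf = local_secure_getenv("MALLOC_CONF");
          printf("MALLOC_CONF: %s\n", conf != NULL ? conf : "(unset)");
          return 0;
      }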
* Replace tabs following #define with spaces. (Jason Evans, 2017-01-21, 1 file, -47/+47)
  This resolves #564.
* Remove extraneous parens around return arguments. (Jason Evans, 2017-01-21, 1 file, -83/+81)
  This resolves #540.
* Update brace style. (Jason Evans, 2017-01-21, 1 file, -243/+254)
  Add braces around single-line blocks, and remove line breaks before function-opening braces. This resolves #537.
* Unify the allocation paths (David Goldblatt, 2017-01-20, 1 file, -392/+505)
  This unifies the allocation paths for malloc, posix_memalign, aligned_alloc, calloc, memalign, valloc, and mallocx, so that they all share common code where they can. There's more work that could be done here, but I think this is the smallest discrete change in this direction.
* Remove leading blank lines from function bodies. (Jason Evans, 2017-01-13, 1 file, -27/+0)
  This resolves #535.
* Implement arena.<i>.destroy . (Jason Evans, 2017-01-07, 1 file, -1/+1)
  Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL. This resolves #382.
* Rename the arenas.extend mallctl to arenas.create. (Jason Evans, 2017-01-07, 1 file, -1/+1)
* Implement per arena base allocators. (Jason Evans, 2016-12-27, 1 file, -18/+17)
  Add/rename related mallctls:
  - Add stats.arenas.<i>.base .
  - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
  - Add stats.arenas.<i>.resident .
  Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks. This resolves #463.
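  A hedged usage example for the extended mallctl, shown under its later name arenas.create (per the rename noted above) and assuming an unprefixed build; passing NULL for the new value keeps the default extent hooks, while a custom (extent_hooks_t *) could be supplied instead.

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void) {
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);

          /* NULL/0 for newp/newlen: keep the default extent hooks. */
          if (mallctl("arenas.create", &arena_ind, &sz, NULL, 0) != 0) {
              fprintf(stderr, "arenas.create failed\n");
              return 1;
          }
          printf("new arena index: %u\n", arena_ind);
          return 0;
      }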
* Add pthread_atfork(3) feature test. (Jason Evans, 2016-11-17, 1 file, -2/+3)
  Some versions of Android provide a pthreads library without providing pthread_atfork(), so in practice a separate feature test is necessary for the latter.
* Avoid gcc type-limits warnings. (Jason Evans, 2016-11-17, 1 file, -12/+30)
* Fix psz/pind edge cases. (Jason Evans, 2016-11-04, 1 file, -1/+2)
  Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.
* Check for existance of CPU_COUNT macro before using it. (Dave Watson, 2016-11-03, 1 file, -1/+7)
  This resolves #485.
* Do not mark malloc_conf as weak on Windows. (Jason Evans, 2016-10-29, 1 file, -1/+1)
  This works around malloc_conf not being properly initialized by at least the cygwin toolchain. Prior build system changes to use -Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to work properly as a non-weak symbol (not tested).
* Do not mark malloc_conf as weak for unit tests. (Jason Evans, 2016-10-29, 1 file, -1/+5)
  This is generally correct (no need for weak symbols since no jemalloc library is involved in the link phase), and avoids linking problems (apparently uninitialized non-NULL malloc_conf) when using cygwin with gcc.
* Support static linking of jemalloc with glibc (Dave Watson, 2016-10-28, 1 file, -0/+31)
  glibc defines its malloc implementation with several weak and strong symbols:
    strong_alias (__libc_calloc, __calloc)
    weak_alias (__libc_calloc, calloc)
    strong_alias (__libc_free, __cfree)
    weak_alias (__libc_free, cfree)
    strong_alias (__libc_free, __free)
    strong_alias (__libc_free, free)
    strong_alias (__libc_malloc, __malloc)
    strong_alias (__libc_malloc, malloc)
  The issue is not with the weak symbols, but that other parts of glibc depend on __libc_malloc explicitly. Defining them in terms of jemalloc APIs allows the linker to drop glibc's malloc.o completely from the link, and static linking no longer results in symbol collisions.
  Another wrinkle: jemalloc during initialization calls sysconf to get the number of CPUs. glibc allocates for the first time before setting up isspace (and other related) tables, which are used by sysconf. Instead, use the pthread API to get the number of CPUs with glibc, which seems to work.
  This resolves #442.
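  A minimal sketch of the CPU-count workaround described above: prefer the pthread affinity API (guarded on CPU_COUNT, as the separate feature-test commit in this log adds) and fall back to sysconf(). The guards and fallback shown are illustrative, not jemalloc's exact code.

      #define _GNU_SOURCE
      #include <pthread.h>
      #include <sched.h>
      #include <unistd.h>
      #include <stdio.h>

      static long
      ncpus(void) {
      #if defined(__GLIBC__) && defined(CPU_COUNT)
          cpu_set_t set;

          /* Avoids sysconf(), which per the commit above can misbehave when
           * the allocator initializes before glibc's tables are set up. */
          if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0) {
              return CPU_COUNT(&set);
          }
      #endif
          return sysconf(_SC_NPROCESSORS_ONLN);
      }

      int
      main(void) {
          printf("ncpus: %ld\n", ncpus());
          return 0;
      }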
* Do not (recursively) allocate within tsd_fetch(). (Jason Evans, 2016-10-21, 1 file, -1/+1)
  Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion. This resolves #458.
* Make dss operations lockless. (Jason Evans, 2016-10-13, 1 file, -5/+1)
  Rather than protecting dss operations with a mutex, use atomic operations. This has negligible impact on synchronization overhead during typical dss allocation, but is a substantial improvement for extent_in_dss() and the newly added extent_dss_mergeable(), which can be called multiple times during extent deallocations. This change also has the advantage of avoiding tsd in deallocation paths associated with purging, which resolves potential deadlocks during thread exit due to attempted tsd resurrection. This resolves #425.
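  A minimal sketch, under assumptions, of replacing a mutex-protected update with an atomic compare-and-swap, in the spirit of the change above; the variable and function names are hypothetical, and jemalloc uses its own atomics wrappers rather than <stdatomic.h> directly.

      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical upper bound of the dss region. */
      static _Atomic(void *) dss_max;

      /* Advance dss_max only if no other thread changed it concurrently. */
      static bool
      dss_advance(void *expected, void *new_max) {
          return atomic_compare_exchange_strong(&dss_max, &expected, new_max);
      }

      int
      main(void) {
          char region[64];

          atomic_store(&dss_max, (void *)region);
          return dss_advance(region, region + 64) ? 0 : 1;
      }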
* Add/use adaptive spinning. (Jason Evans, 2016-10-13, 1 file, -1/+4)
  Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.
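  A hedged sketch of the adaptive-spin abstraction named above (spin_t plus an adaptive step); the backoff bound and the eventual sched_yield() are assumptions about the policy, not the exact implementation.

      #include <sched.h>

      typedef struct {
          unsigned iteration;
      } spin_t;

      /* Busy-wait with exponentially growing delay, then yield the CPU. */
      static void
      spin_adaptive(spin_t *spin) {
          if (spin->iteration < 5) {
              volatile unsigned i;
              for (i = 0; i < (1U << spin->iteration); i++) {
                  /* spin */
              }
              spin->iteration++;
          } else {
              sched_yield();
          }
      }

      int
      main(void) {
          spin_t spin = {0};

          /* Typical use: one call per failed attempt in a busy-wait loop. */
          for (int attempt = 0; attempt < 10; attempt++) {
              spin_adaptive(&spin);
          }
          return 0;
      }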