path: root/include/jemalloc/internal/jemalloc_internal.h.in
Commit message (Author, Date; Files, Lines changed)
* Header refactoring: Split up jemalloc_internal.h (David Goldblatt, 2017-04-11; 1 file, -1303/+0)
  This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while now:
  - The source of system-wide definitions.
  - The catch-all include file.
  - The module header file for jemalloc.c.
  This commit splits up this functionality. The system-wide definitions responsibility has moved to jemalloc_preamble.h. The catch-all include file is now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in jemalloc_internal_[externs|inlines|types].h, just as they are for the other modules.
* Header refactoring: break out ql.h dependencies (David Goldblatt, 2017-04-11; 1 file, -2/+0)
* Header refactoring: break out qr.h dependencies (David Goldblatt, 2017-04-11; 1 file, -1/+0)
* Header refactoring: break out rb.h dependencies (David Goldblatt, 2017-04-11; 1 file, -4/+0)
* Header refactoring: break out ph.h dependencies (David Goldblatt, 2017-04-11; 1 file, -1/+0)
* Header refactoring: Add CPP_PROLOGUE and CPP_EPILOGUE macros (David Goldblatt, 2017-04-11; 1 file, -4/+8)
* Pass dealloc_ctx down free() fast path. (Qi Wang, 2017-04-11; 1 file, -6/+6)
  This gets rid of the redundant rtree lookup on the fast path.
* Add basic reentrancy-checking support, and allow arena_new to reenter. (David Goldblatt, 2017-04-07; 1 file, -1/+8)
  This checks whether or not we're reentrant using thread-local data, and, if we are, moves certain internal allocations to use arena 0 (which should be properly initialized after bootstrapping). The immediate thing this allows is spinning up threads in arena_new, which will enable spinning up background threads there.
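  A minimal sketch of the kind of thread-local reentrancy check this describes (names are hypothetical, not jemalloc's actual internals):

  ```c
  #include <stdbool.h>

  /* Hypothetical per-thread reentrancy depth; jemalloc tracks this in TSD. */
  static __thread unsigned reentrancy_level;

  /* Returns true if we are already inside an allocation on this thread,
   * in which case internal allocations should fall back to arena 0. */
  static bool
  reentrancy_enter(void) {
      return reentrancy_level++ > 0;
  }

  static void
  reentrancy_exit(void) {
      reentrancy_level--;
  }
  ```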
* Add hooking functionality (David Goldblatt, 2017-04-07; 1 file, -0/+7)
  This allows us to hook chosen functions and do interesting things there (in particular: reentrancy checking).
* Optimizing TSD and thread cache layout. (Qi Wang, 2017-04-07; 1 file, -27/+42)
  1) Re-organize TSD so that frequently accessed fields are closer to the beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now contain everything needed on the tcache fast path, except the tcache struct itself.
  2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce tbin to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and place tbins_small close to the beginning.
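  A purely illustrative sketch of the hot-fields-first idea (this is not jemalloc's actual tsd_t layout; field names are invented):

  ```c
  #include <stdint.h>

  typedef struct tsd_sketch_s {
      /* Hot: touched on every malloc/free fast path, packed first so the
       * initial cache lines cover the whole fast path. */
      uint64_t thread_allocated;
      uint64_t thread_deallocated;
      uint8_t  tbins_small[24 * 8];  /* compact 24-byte bins, small sizes */
      /* Cold: rarely touched fields pushed toward the tail. */
      void *prof_tdata;
      void *iarena;
  } tsd_sketch_t;
  ```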
* Get rid of tcache_enabled_t as we have runtime init support. (Qi Wang, 2017-04-07; 1 file, -1/+1)
* Integrate auto tcache into TSD. (Qi Wang, 2017-04-07; 1 file, -23/+73)
  The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins will be allocated/deallocated accordingly during init/cleanup.
  With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check if we should use the tcache.
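  A self-contained sketch of the pattern this describes, using hypothetical stand-in types (the real tcache_available()/tcache_get() live in jemalloc's TSD machinery):

  ```c
  #include <stdbool.h>
  #include <stddef.h>

  typedef struct tcache_s {
      bool initialized;      /* avail arrays allocated during tsd init */
  } tcache_t;

  typedef struct tsd_s {
      tcache_t tcache;       /* embedded: the pointer is always available */
      bool tcache_enabled;
  } tsd_t;

  /* Should we use the tcache at all? */
  static inline bool
  tcache_available(tsd_t *tsd) {
      return tsd->tcache_enabled && tsd->tcache.initialized;
  }

  static inline tcache_t *
  tcache_get(tsd_t *tsd) {
      return tcache_available(tsd) ? &tsd->tcache : NULL;
  }
  ```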
* Move arena-tracking atomics in jemalloc.c to C11-style (David Goldblatt, 2017-04-05; 1 file, -4/+3)
* Store arena index rather than (arena_t *) in extent_t. (Jason Evans, 2017-03-26; 1 file, -1/+1)
* Push down iealloc() calls. (Jason Evans, 2017-03-23; 1 file, -30/+29)
  Call iealloc() as deep into call chains as possible without causing redundant calls.
* Remove extent dereferences from the deallocation fast paths. (Jason Evans, 2017-03-23; 1 file, -21/+20)
* Remove extent arg from isalloc() and arena_salloc(). (Jason Evans, 2017-03-23; 1 file, -16/+6)
* Incorporate szind/slab into rtree leaves. (Jason Evans, 2017-03-23; 1 file, -21/+7)
  Expand and restructure the rtree API such that all common operations can be achieved with minimal work, regardless of whether the rtree leaf fields are independent versus packed into a single atomic pointer.
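  A sketch of what packing szind and slab into a single word can look like (the bit layout here is invented for illustration, not jemalloc's actual encoding):

  ```c
  #include <stdbool.h>
  #include <stdint.h>

  #define LEAF_SLAB_BIT    ((uintptr_t)1)
  #define LEAF_SZIND_SHIFT 1

  static inline uintptr_t
  leaf_pack(unsigned szind, bool slab) {
      return ((uintptr_t)szind << LEAF_SZIND_SHIFT)
          | (slab ? LEAF_SLAB_BIT : 0);
  }

  static inline unsigned
  leaf_szind(uintptr_t bits) {
      return (unsigned)(bits >> LEAF_SZIND_SHIFT);
  }

  static inline bool
  leaf_slab(uintptr_t bits) {
      return (bits & LEAF_SLAB_BIT) != 0;
  }
  ```

  Packing both fields into one word lets a single atomic load answer the common "what size class is this pointer, and is it a slab?" query.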
* Convert extent_t's usize to szind. (Jason Evans, 2017-03-23; 1 file, -5/+2)
  Rather than storing usize only for large (and prof-promoted) allocations, store the size class index for allocations that reside within the extent, such that the size class index is valid for all extents that contain extant allocations, and invalid otherwise (mainly to make debugging simpler).
* Implement per-CPU arena. (Qi Wang, 2017-03-09; 1 file, -20/+103)
  The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled. "percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check for whether hyperthreading is enabled has been added yet.
  When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce the overhead of reading the CPU id, each arena tracks the thread that accessed it most recently. When a new thread comes in, we will read the CPU id and update the arena if necessary.
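  A simplified sketch of the mapping described above, assuming Linux's sched_getcpu(); the sibling-pairing rule for "phycpu" is the naive one from the message, with no hyperthreading probe:

  ```c
  #define _GNU_SOURCE
  #include <sched.h>

  typedef enum { PERCPU_DISABLED, PERCPU, PHYCPU } percpu_mode_t;

  static unsigned
  percpu_arena_ind(percpu_mode_t mode, unsigned ncpus) {
      int cpu = sched_getcpu();
      if (mode == PERCPU_DISABLED || cpu < 0) {
          return 0;  /* fall back to a default arena */
      }
      if (mode == PHYCPU && ncpus >= 2) {
          /* Assume hyperthread siblings are cpu and cpu + ncpus/2,
           * so two logical CPUs share one arena. */
          return (unsigned)cpu % (ncpus / 2);
      }
      return (unsigned)cpu;  /* percpu: one arena per logical CPU */
  }
  ```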
* Make type abbreviations consistent: ssize_t is zd everywhere (David Goldblatt, 2017-03-07; 1 file, -2/+2)
* Disentangle assert and util (David Goldblatt, 2017-03-06; 1 file, -3/+4)
  This is the first header refactoring diff, #533. It splits the assert and util components into separate, hermetic header files. In the process, it splits out two of the large sub-components of util (the stdio.h replacement, and the bit manipulation routines) into their own components (malloc_io.h and bit_util.h). This is mostly to break up cyclic dependencies, but it also breaks off a good chunk of the catch-all-ness of util, which is nice.
* Introduce a backport of C11 atomics (David Goldblatt, 2017-03-03; 1 file, -11/+11)
  This introduces a backport of C11 atomics. It has four implementations; ranked in order of preference, they are:
  - GCC/Clang __atomic builtins
  - GCC/Clang __sync builtins
  - MSVC _Interlocked builtins
  - C11 atomics, from <stdatomic.h>
  The primary advantages are:
  - Close adherence to the standard API gives us a defined memory model.
  - Type safety: atomic objects are now separate types from non-atomic ones, so that it's impossible to mix up atomic and non-atomic updates (which is undefined behavior that compilers are starting to take advantage of).
  - Efficiency: we can specify ordering for operations, avoiding fences and atomic operations on strongly ordered architectures (example: `atomic_write_u32(ptr, val);` involves a CAS loop, whereas `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
  This diff leaves the current atomics API in place (implementing it in terms of the backport), which lets us transition uses over piecemeal.
  Testing: This is by nature hard to test. I've manually tested the first three options on Linux with gcc by futzing with the #defines manually, on FreeBSD with gcc and clang, on MSVC, and on OS X with clang. All of these were x86 machines, though, and we don't have any test infrastructure set up for non-x86 platforms.
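  The ordering point is easiest to see with standard C11 atomics, which the backport mirrors (this example is written against <stdatomic.h> itself, not jemalloc's wrappers):

  ```c
  #include <stdatomic.h>
  #include <stdint.h>

  static _Atomic uint32_t flag;

  /* Release store: pairs with the acquire load below; on strongly
   * ordered architectures (e.g. x86) it compiles to a plain store,
   * with no fence or locked instruction. */
  void
  publish(void) {
      atomic_store_explicit(&flag, 1, memory_order_release);
  }

  uint32_t
  consume(void) {
      return atomic_load_explicit(&flag, memory_order_acquire);
  }
  ```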
* Convert arena->stats synchronization to atomics. (Jason Evans, 2017-02-16; 1 file, -2/+2)
* Convert arena->prof_accumbytes synchronization to atomics. (Jason Evans, 2017-02-16; 1 file, -3/+4)
* Disentangle arena and extent locking. (Jason Evans, 2017-02-02; 1 file, -1/+8)
  Refactor arena and extent locking protocols such that arena and extent locks are never held when calling into the extent_*_wrapper() API. This requires extra care during purging since the arena lock no longer protects the inner purging logic. It also requires extra care to protect extents from being merged with adjacent extents.
  Convert extent_t's 'active' flag to an enumerated 'state', so that retained extents are explicitly marked as such, rather than depending on ring linkage state.
  Refactor the extent collections (and their synchronization) for cached and retained extents into extents_t. Incorporate LRU functionality to support purging. Incorporate page count accounting, which replaces arena->ndirty and arena->stats.retained.
  Assert that no core locks are held when entering any internal [de]allocation functions. This is in addition to existing assertions that no locks are held when entering external [de]allocation functions.
  Audit and document synchronization protocols for all arena_t fields. This fixes a potential deadlock due to recursive allocation during gdump, in a similar fashion to b49c649bc18fff4bd10a1c8adbaf1f25f6453cb6 (Fix lock order reversal during gdump.), but with a necessarily much broader code impact.
* Replace tabs following #define with spaces. (Jason Evans, 2017-01-21; 1 file, -35/+35)
  This resolves #564.
* Remove extraneous parens around return arguments. (Jason Evans, 2017-01-21; 1 file, -62/+62)
  This resolves #540.
* Update brace style. (Jason Evans, 2017-01-21; 1 file, -95/+77)
  Add braces around single-line blocks, and remove line breaks before function-opening braces.
  This resolves #537.
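  An illustration of the style change on invented code (config_debug is a stand-in, not a reference to real logic):

  ```c
  #include <stdbool.h>

  static bool config_debug;

  /* Before the change, this would have been written as:
   *     void
   *     arena_init_sketch(void)
   *     {
   *         if (config_debug)
   *             return;
   *     }
   * After: braces even on single-line blocks, and no line break before
   * the function-opening brace. */
  void
  arena_init_sketch(void) {
      if (config_debug) {
          return;
      }
  }
  ```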
* Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions (Mike Hommey, 2017-01-18; 1 file, -1/+0)
  The SDK jemalloc is built against might not be the latest for various reasons, but the resulting binary ought to work on newer versions of OSX. In order to ensure this, we need the fullest definitions possible, so copy what we need from the latest version of malloc/malloc.h available on opensource.apple.com.
* Remove leading blank lines from function bodies. (Jason Evans, 2017-01-13; 1 file, -25/+0)
  This resolves #535.
* Break up headers into constituent parts (David Goldblatt, 2017-01-12; 1 file, -125/+112)
  This is part of a broader change to make header files better represent the dependencies between one another (see https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers into smaller parts that can be made to have a simpler dependency graph.
  For the autogenerated headers (smoothstep.h and size_classes.h), no splitting was necessary, so I didn't add support to emit multiple headers.
* Remove mb.h, which is unused (David Goldblatt, 2017-01-11; 1 file, -4/+0)
* Implement arena.<i>.destroy. (Jason Evans, 2017-01-07; 1 file, -0/+2)
  Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL.
  This resolves #382.
* Refactor ctl_stats_t. (Jason Evans, 2017-01-07; 1 file, -7/+14)
  Refactor ctl_stats_t to be a demand-zeroed non-growing data structure. To keep the size from being onerous (~60 MiB) on 32-bit systems, convert the arenas field to contain pointers rather than directly embedded ctl_arena_stats_t elements.
* Implement per arena base allocators. (Jason Evans, 2016-12-27; 1 file, -24/+29)
  Add/rename related mallctls:
  - Add stats.arenas.<i>.base.
  - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal.
  - Add stats.arenas.<i>.resident.
  Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks.
  This resolves #463.
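  A hedged usage sketch of the extended arenas.extend mallctl (error handling is minimal, and the extent_hooks_t contents are elided; consult the manual for the full hook interface):

  ```c
  #include <jemalloc/jemalloc.h>
  #include <limits.h>
  #include <stddef.h>

  /* Create a new arena whose base allocations are serviced by the given
   * extent hooks; returns the new arena index, or UINT_MAX on failure. */
  unsigned
  create_arena_with_hooks(extent_hooks_t *hooks) {
      unsigned arena_ind;
      size_t sz = sizeof(arena_ind);
      if (mallctl("arenas.extend", &arena_ind, &sz, &hooks,
          sizeof(hooks)) != 0) {
          return UINT_MAX;
      }
      return arena_ind;
  }
  ```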
* Add huge page configuration and pages_[no]huge(). (Jason Evans, 2016-12-27; 1 file, -0/+7)
  Add the --with-lg-hugepage configure option, but automatically configure LG_HUGEPAGE even if it isn't specified.
  Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls.
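  A minimal sketch of the toggle, assuming Linux transparent huge page support (MADV_HUGEPAGE / MADV_NOHUGEPAGE); the function name is invented:

  ```c
  #include <sys/mman.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Returns true on success; addr and size must be page-aligned. */
  static bool
  pages_huge_sketch(void *addr, size_t size, bool huge) {
      return madvise(addr, size,
          huge ? MADV_HUGEPAGE : MADV_NOHUGEPAGE) == 0;
  }
  ```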
* jemalloc cpp new/delete bindings (Dave Watson, 2016-12-13; 1 file, -2/+11)
  Adds cpp bindings for jemalloc, along with necessary autoconf settings. This is mostly to add sized deallocation support, which can't be added from C directly. Sized deallocation is a ~10% microbench improvement.
  - Import ax_cxx_compile_stdcxx.m4 from the autoconf repo; it seems like the easiest way to get C++14 detection.
  - Adds various other changes, like CXXFLAGS, to configure.ac.
  - Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic unittest.
  - Both new and delete are overridden, to ensure jemalloc is used for both.
  - TODO future enhancement: avoid extra PLT thunks for new and delete. sdallocx and malloc are publicly exported jemalloc symbols, so using an alias would link them directly. Unfortunately, I was having trouble getting it to play nice with jemalloc's namespace support.
  Testing: Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0. Only gcc >= 5 has sized deallocation support; verified that the rest build correctly. Tested Mac OS X and CentOS. Tested --with-jemalloc-prefix and --without-export.
  This resolves #202.
* Fix psz/pind edge cases. (Jason Evans, 2016-11-04; 1 file, -21/+7)
  Add an "over-size" extent heap in which to store extents which exceed the maximum size class (plus cache-oblivious padding, if enabled). Remove psz2ind_clamp() and use psz2ind() instead so that trying to allocate the maximum size class can in principle succeed. In practice, this allows assertions to hold so that OOM errors can be successfully generated.
* Avoid negation of unsigned numbers. (Jason Evans, 2016-10-28; 1 file, -2/+2)
  Rather than relying on two's complement negation for alignment mask generation, use bitwise not and addition. This dodges warnings from MSVC, and should be strength-reduced by compiler optimization anyway.
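  The trick in miniature (a sketch, not jemalloc's actual macro): for a power-of-two alignment a, the mask -a can be written as ~a + 1, which is bit-for-bit identical but never negates an unsigned operand:

  ```c
  #include <stddef.h>

  /* Round size up to a multiple of alignment (a power of two). */
  static inline size_t
  align_up(size_t size, size_t alignment) {
      return (size + alignment - 1) & (~alignment + 1);
  }
  ```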
* Do not (recursively) allocate within tsd_fetch(). (Jason Evans, 2016-10-21; 1 file, -7/+3)
  Refactor tsd so that tsdn_fetch() does not trigger allocation, since allocation could cause infinite recursion.
  This resolves #458.
* Add/use adaptive spinning. (Jason Evans, 2016-10-13; 1 file, -0/+4)
  Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.
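  A sketch of what an adaptive spin can look like (thresholds and names invented here; jemalloc's actual spin_adaptive() may differ):

  ```c
  #include <sched.h>

  typedef struct {
      unsigned iteration;    /* zero-initialize before a busy-wait loop */
  } spin_sketch_t;

  static void
  spin_adaptive_sketch(spin_sketch_t *spin) {
      if (spin->iteration < 5) {
          /* Busy-wait, doubling the work each round. */
          volatile unsigned i;
          for (i = 0; i < (1U << spin->iteration); i++) {
              /* spin */
          }
          spin->iteration++;
      } else {
          sched_yield();     /* give up the CPU once spinning stalls */
      }
  }
  ```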
* Remove all vestiges of chunks. (Jason Evans, 2016-10-12; 1 file, -4/+0)
  Remove mallctls:
  - opt.lg_chunk
  - stats.cactive
  This resolves #464.
* Fix size class overflow bugs. (Jason Evans, 2016-10-03; 1 file, -3/+19)
  Avoid calling s2u() on raw extent sizes in extent_recycle(). Clamp psz2ind() (implemented as psz2ind_clamp()) when inserting/removing into/from size-segregated extent heaps.
* Fix LG_QUANTUM definition for sparc64 (Eric Le Bihan, 2016-09-26; 1 file, -1/+1)
  GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not __sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined properly. Adding this new value to the check solves the issue.
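  The shape of the fix (illustrative; the exact lines in jemalloc_internal.h.in may differ):

  ```c
  /* 64-bit SPARC: 16-byte quantum. GCC 4.9.3 cross-compiling for sparc64
   * defines only __sparc_v9__, so check all three spellings. */
  #if defined(__sparc64__) || defined(__sparcv9) || defined(__sparc_v9__)
  #  define LG_QUANTUM 4
  #endif
  ```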
* Don't use compact red-black trees with the pgi compiler (Elliot Ronaghan, 2016-06-08; 1 file, -0/+2)
  Some bug (either in the red-black tree code, or in the pgi compiler) seems to cause red-black trees to become unbalanced. This issue seems to go away if we don't use compact red-black trees. Since red-black trees don't seem to be used much anymore, I opted for what seems to be an easy fix here instead of digging in and trying to find the root cause of the bug.
  Some context in case it's helpful: I experienced a ton of segfaults while using pgi as Chapel's target compiler with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere deep in red-black tree manipulation, but I didn't get a chance to investigate further. It looks like 4.2.0 replaced most uses of red-black trees with pairing heaps, which seems to avoid whatever bug I was hitting. However, `make check_unit` was still failing on the rb test, so I figured the core issue was just being masked. Here's the `make check_unit` failure:
  ```sh
  === test/unit/rb ===
  test_rb_empty: pass
  tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
  test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
  tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
  test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
  node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced
  <jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0"
  test/test.sh: line 22: 12926 Aborted                 Test harness error
  ```
  While starting to debug I saw the RB_COMPACT option and decided to check if turning that off resolved the bug. It seems to have fixed it (`make check_unit` passes and the segfaults under Chapel are gone), so it seems like an okay work-around. I'd imagine this has performance implications for red-black trees under pgi, but if they're not going to be used much anymore it's probably not a big deal.
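  The work-around amounts to a conditional guard of roughly this shape (a sketch; the actual guard location in the tree may differ):

  ```c
  /* Compact red-black trees misbehave under the PGI compiler, so only
   * enable them elsewhere. */
  #ifndef __PGI
  #  define RB_COMPACT
  #endif
  #include "rb.h"
  ```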
* Fix potential VM map fragmentation regression. (Jason Evans, 2016-06-07; 1 file, -1/+1)
  Revert 245ae6036c09cc11a72fab4335495d95cddd5beb (Support --with-lg-page values larger than actual page size.), because it could cause VM map fragmentation if the kernel grows mmap()ed memory downward.
  This resolves #391.
* Make tsd cleanup functions optional, remove noop cleanup functions. (Jason Evans, 2016-06-06; 1 file, -4/+0)
* Remove obsolete stats.arenas.<i>.metadata.mapped mallctl. (Jason Evans, 2016-06-06; 1 file, -4/+4)
  Rename the stats.arenas.<i>.metadata.allocated mallctl to stats.arenas.<i>.metadata.
* Better document --enable-ivsalloc. (Jason Evans, 2016-06-06; 1 file, -2/+9)