path: root/include/jemalloc/internal/rtree.h
Commit log (each entry: commit message, author, date, files changed, lines removed/added)
* ARM: Don't extend bit LG_VADDR to compute high address bits.  (David Goldblatt, 2017-10-02, 1 file, -0/+12)

  In userspace ARM on Linux, zeroing the high bits is the correct way to
  do this. This doesn't fix the fact that we currently set LG_VADDR to 48
  on ARM, when in fact larger virtual address sizes are coming soon. We'll
  cross that bridge when we come to it.

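  A minimal sketch of the distinction, assuming 64-bit pointers (the
  helper names and LG_VADDR value below are illustrative, not jemalloc's
  actual key handling): on x86-64 the bits above LG_VADDR are a sign
  extension of bit LG_VADDR - 1, whereas in userspace ARM/Linux they are
  zero, so masking is the correct way to reconstruct a full pointer from
  its low LG_VADDR bits.

    #include <stdint.h>

    #define LG_VADDR 48 /* as currently set on ARM, per the commit above */

    /*
     * x86-64 style: replicate bit (LG_VADDR - 1) into the upper bits
     * (arithmetic right shift of the signed value assumed).
     */
    static inline uintptr_t
    ptr_from_low_bits_sign_extend(uintptr_t low_bits) {
        return (uintptr_t)(((intptr_t)(low_bits << (64 - LG_VADDR)))
            >> (64 - LG_VADDR));
    }

    /* ARM userspace style: the upper bits are simply zero; mask them off. */
    static inline uintptr_t
    ptr_from_low_bits_zero_extend(uintptr_t low_bits) {
        return low_bits & ((((uintptr_t)1) << LG_VADDR) - 1);
    }
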
* Header refactoring: unify and de-catchall rtree module.  (David Goldblatt, 2017-05-31, 1 file, -0/+474)
* Break up headers into constituent parts  (David Goldblatt, 2017-01-12, 1 file, -608/+0)

  This is part of a broader change to make header files better represent
  the dependencies between one another (see
  https://github.com/jemalloc/jemalloc/issues/533). It breaks up component
  headers into smaller parts that can be made to have a simpler dependency
  graph. For the autogenerated headers (smoothstep.h and size_classes.h),
  no splitting was necessary, so I didn't add support to emit multiple
  headers.

* jemalloc cpp new/delete bindings  (Dave Watson, 2016-12-13, 1 file, -3/+5)

  Adds C++ bindings for jemalloc, along with the necessary autoconf
  settings. This is mostly to add sized deallocation support, which can't
  be added from C directly. Sized deallocation is a ~10% microbenchmark
  improvement.

  * Import ax_cxx_compile_stdcxx.m4 from the autoconf repo; it seems like
    the easiest way to get C++14 detection.
  * Adds various other changes, like CXXFLAGS, to configure.ac.
  * Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic
    unit test.
  * Both new and delete are overridden, to ensure jemalloc is used for
    both.
  * TODO future enhancement: avoid extra PLT thunks for new and delete;
    sdallocx and malloc are publicly exported jemalloc symbols, so using
    an alias would link them directly. Unfortunately, I was having trouble
    getting it to play nice with jemalloc's namespace support.

  Testing: Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0. Only gcc >= 5 has
  sized deallocation support; verified that the rest build correctly.
  Tested Mac OS X and CentOS. Tested --with-jemalloc-prefix and
  --without-export.

  This resolves #202.

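  A hedged illustration of the sized-deallocation call the bindings build
  on, written in plain C rather than as the new C++ operators (sdallocx()
  is the public jemalloc symbol named above; the surrounding program is
  illustrative only):

    #include <stdlib.h>
    #include <jemalloc/jemalloc.h>

    int
    main(void) {
        void *p = malloc(128);
        if (p == NULL) {
            return 1;
        }
        /*
         * The caller already knows the allocation size, so the allocator
         * can skip its size lookup; a sized operator delete can forward
         * here (flags == 0 means default behavior).
         */
        sdallocx(p, 128, 0);
        return 0;
    }
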
* Fix long spinning in rtree_node_init  (Dave Watson, 2016-11-03, 1 file, -4/+2)

  rtree_node_init spinlocks the node, allocates, and then sets the node.
  This is under heavy contention at the top of the tree if many threads
  start to allocate at the same time. Instead, take a per-rtree sleeping
  mutex to reduce spinning. Tested both pthreads and OS X OSSpinLock, and
  both reduce spinning adequately.

  Previous benchmark time: ./ttest1 500 100  ~15s
  New benchmark time:      ./ttest1 500 100  0.57s

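  A rough sketch of the pattern, with illustrative types and names rather
  than the actual jemalloc code: a single sleeping mutex per rtree, plus a
  double-check of the slot after acquiring it, so one thread allocates the
  node while the others block instead of spinning.

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct node_s node_t;
    struct node_s {
        node_t *children[256];
    };

    typedef struct {
        pthread_mutex_t init_lock; /* per-rtree, not per-node */
        node_t *root;
    } rtree_t;

    static node_t *
    rtree_child_init(rtree_t *rtree, node_t *parent, unsigned i) {
        node_t *child = __atomic_load_n(&parent->children[i],
            __ATOMIC_ACQUIRE);
        if (child != NULL) {
            return child; /* fast path: already initialized */
        }
        pthread_mutex_lock(&rtree->init_lock);
        child = parent->children[i];
        if (child == NULL) { /* double-check under the mutex */
            child = calloc(1, sizeof(node_t)); /* error handling omitted */
            __atomic_store_n(&parent->children[i], child, __ATOMIC_RELEASE);
        }
        pthread_mutex_unlock(&rtree->init_lock);
        return child;
    }
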
* Add rtree lookup path caching.  (Jason Evans, 2016-06-06, 1 file, -35/+148)

  rtree-based extent lookups remain more expensive than chunk-based run
  lookups, but with this optimization the fast path slowdown is ~3 CPU
  cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11 cycles
  prior. The path caching speedup tends to degrade gracefully unless
  allocated memory is spread far apart (as is the case when using a
  mixture of sbrk() and mmap()).

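  A condensed sketch of the idea (the constants, names, and cache layout
  are assumptions for illustration): remember recently used
  (leaf key -> leaf) pairs in thread-local storage so that lookups whose
  keys fall under an already-visited leaf skip the tree walk entirely.

    #include <stdint.h>

    #define LG_PAGE      12
    #define LG_LEAF_BITS 16 /* assumed: one leaf covers 2^16 pages */
    #define CACHE_SLOTS  4

    typedef struct {
        uintptr_t leafkey; /* address bits above the leaf's range */
        void **leaf;       /* cached leaf node; NULL if slot is empty */
    } rtree_cache_slot_t;

    static _Thread_local rtree_cache_slot_t cache[CACHE_SLOTS];

    /* Full tree walk (not shown). */
    void **rtree_leaf_lookup_slow(uintptr_t key);

    static void **
    rtree_leaf_lookup(uintptr_t key) {
        uintptr_t leafkey = key >> (LG_PAGE + LG_LEAF_BITS);
        unsigned slot = (unsigned)(leafkey % CACHE_SLOTS);
        if (cache[slot].leaf != NULL && cache[slot].leafkey == leafkey) {
            return cache[slot].leaf; /* fast path: cache hit */
        }
        void **leaf = rtree_leaf_lookup_slow(key); /* slow path: walk */
        cache[slot].leafkey = leafkey;
        cache[slot].leaf = leaf;
        return leaf;
    }
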
* Make tsd cleanup functions optional, remove noop cleanup functions.  (Jason Evans, 2016-06-06, 1 file, -1/+0)
* Miscellaneous s/chunk/extent/ updates.  (Jason Evans, 2016-06-06, 1 file, -1/+1)
* Add rtree element witnesses.  (Jason Evans, 2016-06-03, 1 file, -10/+66)
* Refactor rtree to always use base_alloc() for node allocation.  (Jason Evans, 2016-06-03, 1 file, -38/+37)
* Add element acquire/release capabilities to rtree.  (Jason Evans, 2016-06-03, 1 file, -80/+155)

  This makes it possible to acquire short-term "ownership" of rtree
  elements so that it is possible to read an extent pointer *and* read the
  extent's contents with a guarantee that the element will not be modified
  until the ownership is released. This is intended as a mechanism for
  resolving rtree read/write races rather than as a way to lock extents.

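  A hedged sketch of one way such short-term ownership can be implemented
  (this shows the general low-bit tagging technique, not necessarily the
  exact jemalloc code): tag the element's pointer value with its low bit
  while it is held, and have contending threads retry until the bit
  clears.

    #include <stdbool.h>
    #include <stdint.h>

    static void *
    rtree_elm_acquire(void **slot) {
        for (;;) {
            void *v = __atomic_load_n(slot, __ATOMIC_ACQUIRE);
            if (((uintptr_t)v & 1) == 0 &&
                __atomic_compare_exchange_n(slot, &v,
                (void *)((uintptr_t)v | 1), false,
                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
                return v; /* the untagged extent pointer */
            }
            /* Slot is held by another thread; retry. */
        }
    }

    static void
    rtree_elm_release(void **slot, void *v) {
        /* Store the untagged value back, clearing the lock bit. */
        __atomic_store_n(slot, v, __ATOMIC_RELEASE);
    }
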
* Rename extent_node_t to extent_t.  (Jason Evans, 2016-05-16, 1 file, -9/+9)
* Simplify RTREE_HEIGHT_MAX definition.  (Jason Evans, 2016-04-11, 1 file, -29/+4)

  Use 1U rather than ZU(1) in macro definitions, so that the preprocessor
  can evaluate the resulting expressions.

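  The practical consequence, sketched with made-up values (ZU(x) is
  jemalloc's ((size_t)x) wrapper; the macro formula below is illustrative,
  not the real definition): casts cannot appear in preprocessor
  conditionals, so only the 1U form lets expressions derived from it be
  tested with #if.

    #define LG_RTREE_BITS_PER_LEVEL 4
    #define RTREE_BITS_PER_LEVEL    (1U << LG_RTREE_BITS_PER_LEVEL)
    #define RTREE_HEIGHT_MAX \
        ((64 + RTREE_BITS_PER_LEVEL - 1) / RTREE_BITS_PER_LEVEL)

    /*
     * Legal only because the expression contains no casts; with ZU(1) in
     * place of 1U the preprocessor would reject this test.
     */
    #if RTREE_HEIGHT_MAX > 16
    #  error "rtree deeper than expected"
    #endif
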
* Always inline performance-critical rtree operations.  (Jason Evans, 2016-03-23, 1 file, -9/+10)
* Optimize rtree_get().  (Jason Evans, 2016-03-23, 1 file, -35/+131)

  Specialize fast path to avoid code that cannot execute for dependent
  loads. Manually unroll.

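  A hedged sketch of what such specialization can look like (the fixed
  two-level layout, constants, and names are assumed for illustration):
  when the key is known to have been inserted earlier, the per-level NULL
  checks of the generic walk can never fire, so they are dropped and the
  walk is unrolled by hand.

    #include <stdint.h>

    #define LG_LEVEL_BITS 13
    #define LEVEL_MASK    ((1U << LG_LEVEL_BITS) - 1)

    typedef struct rtree_node_s rtree_node_t;
    struct rtree_node_s {
        void *slots[1U << LG_LEVEL_BITS];
    };

    /* Generic path: checks every level for a missing node (not shown). */
    void *rtree_get_slow(rtree_node_t *root, uintptr_t key);

    /* Dependent lookup: fixed two levels, no NULL checks, unrolled. */
    static inline void *
    rtree_get_dependent(rtree_node_t *root, uintptr_t key) {
        rtree_node_t *leaf = (rtree_node_t *)
            root->slots[(key >> LG_LEVEL_BITS) & LEVEL_MASK];
        return leaf->slots[key & LEVEL_MASK];
    }
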
* Avoid atomic operations for dependent rtree reads.  (Jason Evans, 2015-05-16, 1 file, -7/+24)
* Fix type punning in calls to atomic operation functions.  (Jason Evans, 2015-05-08, 1 file, -5/+9)
* Fix unsigned comparison underflow.  (Jason Evans, 2015-03-12, 1 file, -1/+1)

  These bugs only affected tests and debug builds.

* Move centralized chunk management into arenas.  (Jason Evans, 2015-02-12, 1 file, -11/+12)

  Migrate all centralized data structures related to huge allocations and
  recyclable chunks into arena_t, so that each arena can manage huge
  allocations and recyclable virtual memory completely independently of
  other arenas.

  Add chunk node caching to arenas, in order to avoid contention on the
  base allocator.

  Use chunks_rtree to look up huge allocations rather than a red-black
  tree. Maintain a per arena unsorted list of huge allocations (which will
  be needed to enumerate huge allocations during arena reset).

  Remove the --enable-ivsalloc option, make ivsalloc() always available,
  and use it for size queries if --enable-debug is enabled. The only
  practical implications of this removal are that 1) ivsalloc() is now
  always available during live debugging (and the underlying radix tree is
  available during core-based debugging), and 2) size query validation can
  no longer be enabled independent of --enable-debug.

  Remove the stats.chunks.{current,total,high} mallctls, and replace their
  underlying statistics with simpler atomically updated counters used
  exclusively for gdump triggering. These statistics are no longer very
  useful because each arena manages chunks independently, and per arena
  statistics provide similar information.

  Simplify chunk synchronization code, now that base chunk allocation
  cannot cause recursive lock acquisition.

* Refactor rtree to be lock-free.  (Jason Evans, 2015-02-05, 1 file, -122/+222)

  Recent huge allocation refactoring associates huge allocations with
  arenas, but it remains necessary to quickly look up huge allocation
  metadata during reallocation/deallocation. A global radix tree remains a
  good solution to this problem, but locking would have become the primary
  bottleneck after (upcoming) migration of chunk management from global to
  per arena data structures.

  This lock-free implementation uses double-checked reads to traverse the
  tree, so that in the steady state, each read or write requires only a
  single atomic operation.

  This implementation also assures that no more than two tree levels
  actually exist, through a combination of careful virtual memory
  allocation which makes large sparse nodes cheap, and skipping the root
  node on x64 (possible because the top 16 bits are all 0 in practice).

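  A minimal sketch of the double-checked read, simplified to a single
  child-fetch helper (types and names are illustrative): the first read is
  a plain load; only when it sees NULL, and the caller cannot guarantee
  the key was previously written, does the code fall back to an acquire
  load and, if still NULL, to initialization behind a release store.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct node_s node_t;
    struct node_s {
        node_t *children[1U << 13];
    };

    /* Allocates the child and publishes it with a release store (not shown). */
    node_t *node_init_slow(node_t **slot);

    static node_t *
    child_get(node_t **slot, bool dependent) {
        node_t *child = *slot; /* first, non-atomic read */
        if (!dependent && child == NULL) {
            /* Double check with acquire semantics before initializing. */
            child = __atomic_load_n(slot, __ATOMIC_ACQUIRE);
            if (child == NULL) {
                child = node_init_slow(slot);
            }
        }
        return child;
    }
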
* Convert rtree from (void *) to (uint8_t) storage.  (Jason Evans, 2014-01-03, 1 file, -18/+20)

  Reduce rtree memory usage by storing booleans (1 byte each) rather than
  pointers. The rtree code is only used to record whether jemalloc manages
  a chunk of memory, so there's no need to store pointers in the rtree.

  Increase rtree node size to 64 KiB in order to reduce tree depth from 13
  to 3 on 64-bit systems. The conversion to more compact leaf nodes was
  enough by itself to make the rtree depth 1 on 32-bit systems; due to the
  fact that root nodes are smaller than the specified node size if
  possible, the node size change has no impact on 32-bit systems (assuming
  default chunk size).

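  A small sketch of the storage change (the 64 KiB node size comes from
  the text above; everything else is illustrative): since at this point
  the rtree only answers "does jemalloc manage this chunk?", a leaf can
  hold one byte per chunk instead of one pointer.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NODE_SIZE ((size_t)64 * 1024) /* 64 KiB per leaf, as above */

    typedef struct {
        uint8_t managed[NODE_SIZE]; /* 65536 one-byte entries */
    } rtree_leaf_t;

    static bool
    rtree_leaf_get(const rtree_leaf_t *leaf, size_t subkey) {
        return leaf->managed[subkey] != 0; /* a boolean, not a pointer */
    }

    static void
    rtree_leaf_set(rtree_leaf_t *leaf, size_t subkey, bool managed) {
        leaf->managed[subkey] = (uint8_t)managed;
    }
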
* Add rtree unit tests.  (Jason Evans, 2014-01-03, 1 file, -4/+10)
* Fix fork(2)-related deadlocks.  (Jason Evans, 2012-10-09, 1 file, -0/+3)

  Add a library constructor for jemalloc that initializes the allocator.
  This fixes a race that could occur if threads were created by the main
  thread prior to any memory allocation, followed by fork(2), and then
  memory allocation in the child process.

  Fix the prefork/postfork functions to acquire/release the ctl, prof, and
  rtree mutexes. This fixes various fork() child process deadlocks, but
  one possible deadlock remains (intentionally) unaddressed: prof
  backtracing can acquire runtime library mutexes, so deadlock is still
  possible if heap profiling is enabled during fork(). This deadlock is
  known to be a real issue in at least the case of libgcc-based
  backtracing.

  Reported by tfengjun.

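  A hedged sketch of the constructor plus prefork/postfork shape described
  above (the mutex names are illustrative, and a real implementation may
  reinitialize the child's mutexes rather than unlock them):

    #include <pthread.h>

    static pthread_mutex_t ctl_mtx   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t prof_mtx  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t rtree_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Acquire every mutex so none is held mid-operation across fork(). */
    static void
    prefork(void) {
        pthread_mutex_lock(&ctl_mtx);
        pthread_mutex_lock(&prof_mtx);
        pthread_mutex_lock(&rtree_mtx);
    }

    /* Release in reverse order, in both the parent and the child. */
    static void
    postfork(void) {
        pthread_mutex_unlock(&rtree_mtx);
        pthread_mutex_unlock(&prof_mtx);
        pthread_mutex_unlock(&ctl_mtx);
    }

    /* Library constructor: initialize the allocator and register hooks. */
    __attribute__((constructor)) static void
    jemalloc_ctor(void) {
        /* ... allocator initialization would happen here ... */
        pthread_atfork(prefork, postfork, postfork);
    }
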
* Move repo contents in jemalloc/ to top level.  (Jason Evans, 2011-04-01, 1 file, -0/+161)