path: root/src/tcache.c
* Refactor rtree to always use base_alloc() for node allocation.  (Jason Evans, 2016-06-03; 1 file, -6/+7)
* Use rtree-based chunk lookups rather than pointer bit twiddling.  (Jason Evans, 2016-06-03; 1 file, -14/+17)
  Look up chunk metadata via the radix tree, rather than using CHUNK_ADDR2BASE(). Propagate pointer's containing extent. Minimize extent lookups by doing a single lookup (e.g. in free()) and propagating the pointer's extent into nearly all the functions that may need it.
* Rename extent_node_t to extent_t.  (Jason Evans, 2016-05-16; 1 file, -5/+4)
* Resolve bootstrapping issues when embedded in FreeBSD libc.  (Jason Evans, 2016-05-11; 1 file, -61/+62)
  b2c0d6322d2307458ae2b28545f8a5c9903d7ef5 (Add witness, a simple online locking validator.) caused a broad propagation of tsd throughout the internal API, but tsd_fetch() was designed to fail prior to tsd bootstrapping. Fix this by splitting tsd_t into non-nullable tsd_t and nullable tsdn_t, and modifying all internal APIs that do not critically rely on tsd to take nullable pointers. Furthermore, add the tsd_booted_get() function so that tsdn_fetch() can probe whether tsd bootstrapping is complete and return NULL if not. All dangerous conversions of nullable pointers are tsdn_tsd() calls that assert-fail on invalid conversion.
* Fix huge_palloc() regression.  (Jason Evans, 2016-05-04; 1 file, -5/+5)
  Split arena_choose() into arena_[i]choose() and use arena_ichoose() for arena lookup during internal allocation. This fixes huge_palloc() so that it always succeeds during extent node allocation. This regression was introduced by 66cd953514a18477eb49732e40d5c2ab5f1b12c5 (Do not allocate metadata via non-auto arenas, nor tcaches.).
* Do not allocate metadata via non-auto arenas, nor tcaches.  (Jason Evans, 2016-04-22; 1 file, -8/+12)
  This ensures that all internally allocated metadata comes from the first opt_narenas arenas, i.e. the automatically multiplexed arenas.
* Add witness, a simple online locking validator.  (Jason Evans, 2016-04-14; 1 file, -44/+47)
  This resolves #358.
* Constify various internal arena APIs.  (Jason Evans, 2016-03-23; 1 file, -1/+1)
* Code formatting fixes.  (Jason Evans, 2016-03-23; 1 file, -1/+2)
* Refactor arenas array (fixes deadlock).  (Jason Evans, 2016-02-25; 1 file, -2/+3)
  Refactor the arenas array, which contains pointers to all extant arenas, such that it starts out as a sparse array of maximum size, and use double-checked atomics-based reads as the basis for a fast and simple arena_get(). Additionally, reduce arenas_lock's role such that it only protects against arena initialization races. These changes remove the possibility for arena lookups to trigger locking, which resolves at least one known (fork-related) deadlock. This resolves #315.
* Silence miscellaneous 64-to-32-bit data loss warnings.  (Jason Evans, 2016-02-24; 1 file, -1/+1)
* Make nhbins unsigned rather than size_t.  (Jason Evans, 2016-02-24; 1 file, -1/+1)
* Implement decay-based unused dirty page purging.  (Jason Evans, 2016-02-20; 1 file, -1/+3)
  This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism. Add mallctls:
    - opt.purge
    - opt.decay_time
    - arena.<i>.decay
    - arena.<i>.decay_time
    - arenas.decay_time
    - stats.arenas.<i>.decay_time
  This resolves #325.
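  A minimal sketch of driving these mallctls from application code; the arena index 0 and the 10-second value are illustrative, not part of the commit:
      #include <jemalloc/jemalloc.h>

      void
      tune_decay(void)
      {
          /* Read the default decay time in seconds; -1 disables decay-based purging. */
          ssize_t decay_time;
          size_t sz = sizeof(decay_time);
          mallctl("opt.decay_time", &decay_time, &sz, NULL, 0);

          /* Change the decay time inherited by newly created arenas. */
          ssize_t new_time = 10;
          mallctl("arenas.decay_time", NULL, NULL, &new_time, sizeof(new_time));

          /* Explicitly trigger decay-based purging for arena 0. */
          mallctl("arena.0.decay", NULL, NULL, NULL, 0);
      }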
* Use ticker for incremental tcache GC.  (Jason Evans, 2016-02-20; 1 file, -1/+2)
* Fast-path improvement: reduce # of branches and unnecessary operations.  (Qi Wang, 2015-11-10; 1 file, -14/+19)
  - Combine multiple runtime branches into a single malloc_slow check.
  - Avoid calling arena_choose / size2index / index2size on fast path.
  - A few micro optimizations.
* Rename arena_maxclass to large_maxclass.  (Jason Evans, 2015-09-12; 1 file, -3/+3)
  arena_maxclass is no longer an appropriate name, because arenas also manage huge allocations.
* Rename index_t to szind_t to avoid an existing type on Solaris.  (Jason Evans, 2015-08-19; 1 file, -4/+4)
  This resolves #256.
* Impose a minimum tcache count for small size classes.  (Jason Evans, 2015-05-20; 1 file, -1/+5)
  Now that small allocation runs have fewer regions due to run metadata residing in chunk headers, an explicit minimum tcache count is needed to make sure that tcache adequately amortizes synchronization overhead.
* Fix nhbins calculation.  (Jason Evans, 2015-05-20; 1 file, -1/+1)
  This regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.).
* Integrate whole chunks into unused dirty page purging machinery.  (Jason Evans, 2015-02-17; 1 file, -4/+5)
  Extend per-arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case). Refactor extent_node_t to provide accessor functions.
* If MALLOCX_ARENA(a) is specified, use it during tcache fill.  (Jason Evans, 2015-02-13; 1 file, -9/+10)
* Move centralized chunk management into arenas.  (Jason Evans, 2015-02-12; 1 file, -4/+4)
  Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas. Add chunk node caching to arenas, in order to avoid contention on the base allocator. Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per-arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset).
  Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications of this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug.
  Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per-arena statistics provide similar information.
  Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.
* Fix a regression in tcache_bin_flush_small().  (Jason Evans, 2015-02-12; 1 file, -1/+1)
  Fix a serious regression in tcache_bin_flush_small() that was introduced by 1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache support.).
* Test and fix tcache ID recycling.  (Jason Evans, 2015-02-10; 1 file, -1/+1)
* Implement explicit tcache support.  (Jason Evans, 2015-02-10; 1 file, -41/+125)
  Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API. Add the tcache.create, tcache.flush, and tcache.destroy mallctls. This resolves #145.
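  A minimal sketch of the new interface in use; error handling is elided and the 128-byte size is arbitrary:
      #include <jemalloc/jemalloc.h>

      void
      explicit_tcache_example(void)
      {
          /* Create an explicit thread-specific cache; its index is returned via oldp. */
          unsigned tci;
          size_t sz = sizeof(tci);
          mallctl("tcache.create", &tci, &sz, NULL, 0);

          /* Route allocations through it, or bypass caching with MALLOCX_TCACHE_NONE. */
          void *p = mallocx(128, MALLOCX_TCACHE(tci));
          dallocx(p, MALLOCX_TCACHE(tci));

          /* Flush the cache, then destroy it when it is no longer needed. */
          mallctl("tcache.flush", NULL, NULL, &tci, sizeof(tci));
          mallctl("tcache.destroy", NULL, NULL, &tci, sizeof(tci));
      }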
* Implement metadata statistics.  (Jason Evans, 2015-01-24; 1 file, -2/+2)
  There are three categories of metadata:
    - Base allocations are used for bootstrap-sensitive internal allocator data structures.
    - Arena chunk headers comprise pages which track the states of the non-metadata pages.
    - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles.
  The metadata statistics comprise the metadata categories as follows:
    - stats.metadata: All metadata -- base + arena chunk headers + internal allocations.
    - stats.arenas.<i>.metadata.mapped: Arena chunk headers.
    - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not.
  Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.
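  A small sketch of reading these statistics; the arena index 0 is purely illustrative:
      #include <stdint.h>
      #include <jemalloc/jemalloc.h>

      void
      report_metadata(void)
      {
          /* Advance the statistics epoch so the counters below are current. */
          uint64_t epoch = 1;
          size_t sz = sizeof(epoch);
          mallctl("epoch", &epoch, &sz, &epoch, sizeof(epoch));

          size_t metadata, mapped, allocated;
          sz = sizeof(size_t);
          /* All metadata: base + arena chunk headers + internal allocations. */
          mallctl("stats.metadata", &metadata, &sz, NULL, 0);
          /* Per-arena breakdown. */
          mallctl("stats.arenas.0.metadata.mapped", &mapped, &sz, NULL, 0);
          mallctl("stats.arenas.0.metadata.allocated", &allocated, &sz, NULL, 0);
      }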
* Add configure options.  (Jason Evans, 2014-10-10; 1 file, -41/+12)
  Add:
    --with-lg-page
    --with-lg-page-sizes
    --with-lg-size-class-group
    --with-lg-quantum
  Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.
* Refactor/fix arenas manipulation.  (Jason Evans, 2014-10-08; 1 file, -1/+13)
  Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.
  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.
  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).
  Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).
  Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, it was possible for an OOM to silently result in allocation from a different arena than the one specified.
* Normalize size classes.  (Jason Evans, 2014-10-06; 1 file, -4/+4)
  Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4), across the entire range of size classes. Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
* Fix tsd cleanup regressions.  (Jason Evans, 2014-10-04; 1 file, -1/+2)
  Fix tsd cleanup regressions that were introduced in 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold:
    1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers.
    2) tsd_*_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_*{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.
* Convert to uniform style: cond == false --> !cond  (Jason Evans, 2014-10-03; 1 file, -4/+4)
* Convert all tsd variables to reside in a single tsd structure.  (Jason Evans, 2014-09-23; 1 file, -77/+24)
* Refactor chunk map.  (Qinfan Wu, 2014-09-05; 1 file, -5/+6)
  Break the chunk map into two separate arrays, in order to improve cache locality. This is related to issue #23.
* Remove junk filling in tcache_bin_flush_small().  (Qinfan Wu, 2014-08-27; 1 file, -4/+0)
  Junk filling is done in arena_dalloc_bin_locked(), so arena_alloc_junk_small() is redundant. Also, we should use arena_dalloc_junk_small() instead of arena_alloc_junk_small().
* Outline rare tcache_get() codepaths.  (Ben Maurer, 2014-04-16; 1 file, -0/+40)
* Implement the *allocx() API.  (Jason Evans, 2013-12-13; 1 file, -2/+2)
  Implement the *allocx() API, which is a successor to the *allocm() API. The *allocx() functions are slightly simpler to use because they have fewer parameters, they directly return the results of primary interest, and mallocx()/rallocx() avoid the strict aliasing pitfall that allocm()/rallocm() share with posix_memalign().
  The following code violates strict aliasing rules:
    foo_t *foo;
    allocm((void **)&foo, NULL, 42, 0);
  whereas the following is safe:
    foo_t *foo;
    void *p;
    allocm(&p, NULL, 42, 0);
    foo = (foo_t *)p;
  mallocx() does not have this problem:
    foo_t *foo = (foo_t *)mallocx(42, 0);
* Fix a data race for large allocation stats counters.  (Jason Evans, 2013-10-21; 1 file, -1/+4)
  Reported by Pat Lynch.
* Fix a prof-related locking order bug.  (Jason Evans, 2013-02-06; 1 file, -4/+11)
  Fix a locking order bug that could cause deadlock during fork if heap profiling were enabled.
* Avoid arena_prof_accum()-related locking when possible.  (Jason Evans, 2012-11-13; 1 file, -7/+2)
  Refactor arena_prof_accum() and its callers to avoid arena locking when prof_interval is 0 (as when profiling is disabled). Reported by Ben Maurer.
* Add arena-specific and selective dss allocation.  (Jason Evans, 2012-10-13; 1 file, -2/+2)
  Add the "arenas.extend" mallctl, so that it is possible to create new arenas that are outside the set that jemalloc automatically multiplexes threads onto. Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to explicitly allocate from a particular arena.
  Add the "opt.dss" mallctl, which controls the default precedence of dss allocation relative to mmap allocation. Add the "arena.<i>.dss" mallctl, which makes it possible to set the default dss precedence on a per-arena or global basis.
  Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge". Add the "stats.arenas.<i>.dss" mallctl.
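  A sketch of combining the new mallctl and flag, assuming a build that exposes the experimental allocm() API; error handling is minimal and the 4096-byte size is arbitrary:
      #include <jemalloc/jemalloc.h>

      void
      private_arena_example(void)
      {
          /* Create an arena outside the automatically multiplexed set. */
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);
          mallctl("arenas.extend", &arena_ind, &sz, NULL, 0);

          /* Allocate explicitly from that arena. */
          void *p;
          if (allocm(&p, NULL, 4096, ALLOCM_ARENA(arena_ind)) == ALLOCM_SUCCESS)
              dallocm(p, 0);
      }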
* Optimize malloc() and free() fast paths.  (Jason Evans, 2012-05-02; 1 file, -13/+46)
  Embed the bin index for small page runs into the chunk page map, in order to omit [...] in the following dependent load sequence: ptr-->mapelm-->[run-->bin-->]bin_info
  Move various non-critical code out of the inlined function chain into helper functions (tcache_event_hard(), arena_dalloc_small(), and locking).
* Make arena_salloc() an inline function.  (Jason Evans, 2012-04-20; 1 file, -0/+6)
* Implement Valgrind support, redzones, and quarantine.  (Jason Evans, 2012-04-11; 1 file, -1/+5)
  Implement Valgrind support, as well as the redzone and quarantine features, which help Valgrind detect memory errors. Redzones are only implemented for small objects because the changes necessary to support redzones around large and huge objects are complicated by in-place reallocation, to the point that it isn't clear that the maintenance burden is worth the incremental improvement to Valgrind support.
  Merge arena_salloc() and arena_salloc_demote(). Refactor i[v]salloc() to expose the 'demote' option.
* Always initialize tcache data structures.  (Jason Evans, 2012-04-06; 1 file, -46/+38)
  Always initialize tcache data structures if the tcache configuration option is enabled, regardless of opt_tcache. This fixes "thread.tcache.enabled" mallctl manipulation in the case when opt_tcache is false.
* Clean up *PAGE* macros.  (Jason Evans, 2012-04-02; 1 file, -5/+5)
  s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g. Remove remnants of the dynamic-page-shift code. Rename the "arenas.pagesize" mallctl to "arenas.page". Remove the "arenas.chunksize" mallctl, which is redundant with "opt.lg_chunk".
* Add the "thread.tcache.enabled" mallctl.  (Jason Evans, 2012-03-27; 1 file, -12/+15)
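  A sketch of toggling the new mallctl for the calling thread; everything beyond the mallctl name itself is illustrative:
      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      void
      disable_thread_cache(void)
      {
          /* Disable (and thereby flush) the calling thread's cache. */
          bool enabled = false;
          mallctl("thread.tcache.enabled", NULL, NULL, &enabled, sizeof(enabled));

          /* Read the current setting back. */
          size_t sz = sizeof(enabled);
          mallctl("thread.tcache.enabled", &enabled, &sz, NULL, 0);
      }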
* Implement tsd.  (Jason Evans, 2012-03-23; 1 file, -47/+54)
  Implement tsd, which is a TLS/TSD abstraction that uses one or both internally. Modify bootstrapping such that no tsd's are utilized until allocation is safe.
  Remove malloc_[v]tprintf(), and use malloc_snprintf() instead.
  Fix %p argument size handling in malloc_vsnprintf().
  Fix a long-standing statistics-related bug in the "thread.arena" mallctl that could cause crashes due to linked list corruption.
* Invert NO_TLS to JEMALLOC_TLS.  (Jason Evans, 2012-03-19; 1 file, -1/+1)
* Remove the lg_tcache_gc_sweep option.  (Jason Evans, 2012-03-05; 1 file, -10/+0)
  Remove the lg_tcache_gc_sweep option, because it is no longer very useful. Prior to the addition of dynamic adjustment of tcache fill count, it was possible for fill/flush overhead to be a problem, but this problem no longer occurs.
* Simplify small size class infrastructure.  (Jason Evans, 2012-02-29; 1 file, -15/+16)
  Program-generate small size class tables for all valid combinations of LG_TINY_MIN, LG_QUANTUM, and PAGE_SHIFT. Use the appropriate table to generate all relevant data structures, and remove the distinction between tiny/quantum/cacheline/subpage bins.
  Remove --enable-dynamic-page-shift. This option didn't prove useful in practice, and it prevented optimizations.
  Add Tilera architecture support.