path: root/src
* Use linear scan for small bitmaps (Dave Watson, 2016-02-26; 1 file, -1/+40)
  For small bitmaps, a linear scan of the bitmap is slightly faster than a
  tree search: bitmap_t is more compact, and there are fewer writes since we
  don't have to propagate state transitions up the tree. On x86_64 with the
  current settings, I'm seeing ~0.5%-1% CPU improvement in production
  canaries with this change. The old tree code is left in, since 32-bit
  sizes are much larger (and ffsl smaller), and the run sizes may change in
  the future.
  This resolves #339.
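The trade-off described above can be sketched as a standalone illustration; this is not jemalloc's actual bitmap code, and the names (bitmap_ffu_linear) and the group count are invented here:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* POSIX ffsl() */

#define BITMAP_GROUPS 4
typedef unsigned long bitmap_t;

/* Hypothetical sketch: find the index of the first set bit by scanning the
 * groups linearly, instead of descending a summary tree. Setting/clearing a
 * bit then touches only one word; no summary levels need updating. */
static size_t
bitmap_ffu_linear(const bitmap_t *bitmap)
{
	for (size_t i = 0; i < BITMAP_GROUPS; i++) {
		if (bitmap[i] != 0) {
			/* ffsl() is 1-based, hence the -1. */
			return (i * sizeof(bitmap_t) * 8) +
			    (size_t)(ffsl((long)bitmap[i]) - 1);
		}
	}
	return (size_t)-1; /* no bit set */
}
```

The win comes from the stores that never happen: a tree-based bitmap must propagate a fill/empty transition up every summary level, while the flat scan pays only a few extra word loads on lookup.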
* Miscellaneous bitmap refactoring. (Jason Evans, 2016-02-26; 1 file, -18/+15)

* Silence miscellaneous 64-to-32-bit data loss warnings. (Jason Evans, 2016-02-26; 3 files, -5/+5)
  This resolves #341.

* Remove a superfluous comment. (Jason Evans, 2016-02-26; 1 file, -1/+0)

* Add more HUGE_MAXCLASS overflow checks. (Jason Evans, 2016-02-26; 1 file, -23/+34)
  Add HUGE_MAXCLASS overflow checks that are specific to heap profiling code
  paths. This fixes test failures that were introduced by
  0c516a00c4cb28cff55ce0995f756b5aae074c9e (Make *allocx() size class
  overflow behavior defined.).

* Make *allocx() size class overflow behavior defined. (Jason Evans, 2016-02-25; 4 files, -59/+85)
  Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now
  limited to be less than PTRDIFF_MAX.
  This resolves #278 and #295.
* Refactor arenas array (fixes deadlock). (Jason Evans, 2016-02-25; 5 files, -185/+129)
  Refactor the arenas array, which contains pointers to all extant arenas,
  such that it starts out as a sparse array of maximum size, and use
  double-checked atomics-based reads as the basis for fast and simple
  arena_get(). Additionally, reduce arenas_lock's role such that it only
  protects against arena initialization races. These changes remove the
  possibility for arena lookups to trigger locking, which resolves at least
  one known (fork-related) deadlock.
  This resolves #315.
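The double-checked atomics-based read described above follows a well-known pattern. A minimal standalone sketch, assuming a sparse fixed-size slot array; the arena_t layout and helper names here are invented for illustration, not jemalloc's internals:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int ind; } arena_t; /* stand-in for the real arena_t */

#define NARENAS_MAX 8
static _Atomic(arena_t *) arenas[NARENAS_MAX]; /* sparse, starts all-NULL */
static pthread_mutex_t arenas_lock = PTHREAD_MUTEX_INITIALIZER;

/* Slow path: take the lock, re-check the slot, initialize at most once. */
static arena_t *
arena_init_locked(unsigned ind)
{
	pthread_mutex_lock(&arenas_lock);
	arena_t *a = atomic_load_explicit(&arenas[ind], memory_order_acquire);
	if (a == NULL) { /* double check under the lock */
		a = malloc(sizeof(*a));
		a->ind = (int)ind;
		/* release pairs with the acquire load on the fast path */
		atomic_store_explicit(&arenas[ind], a, memory_order_release);
	}
	pthread_mutex_unlock(&arenas_lock);
	return a;
}

/* Fast path: one atomic load, no lock, once the slot is initialized. */
static arena_t *
arena_get(unsigned ind)
{
	arena_t *a = atomic_load_explicit(&arenas[ind], memory_order_acquire);
	if (a != NULL)
		return a;
	return arena_init_locked(ind);
}
```

Because the steady-state lookup never touches arenas_lock, a lookup can no longer block on (or deadlock with) a thread holding the lock across fork().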
* Fix arena_size computation. (Dave Watson, 2016-02-25; 1 file, -1/+1)
  Fix the arena_size computation in arena_new() to incorporate
  runs_avail_nclasses elements for runs_avail, rather than
  (runs_avail_nclasses - 1) elements. Since offsetof(arena_t, runs_avail)
  is used rather than sizeof(arena_t) for the first term of the
  computation, all of the runs_avail elements must be added into the second
  term. This bug was introduced (by Jason Evans) while merging pull request
  #330 as 3417a304ccde61ac1f68b436ec22c03f1d6824ec (Separate arena_avail
  trees).
* Fix arena_run_first_best_fit. (Dave Watson, 2016-02-25; 1 file, -1/+1)
  The merge of 3417a304ccde61ac1f68b436ec22c03f1d6824ec introduced a small
  bug: first_best_fit doesn't scan through all the classes, since ind is
  offset from runs_avail_nclasses by run_avail_bias.
* Attempt mmap-based in-place huge reallocation. (Jason Evans, 2016-02-25; 2 files, -11/+10)
  Attempt mmap-based in-place huge reallocation by plumbing new_addr into
  chunk_alloc_mmap(). This can dramatically speed up incremental huge
  reallocation.
  This resolves #335.

* Silence miscellaneous 64-to-32-bit data loss warnings. (Jason Evans, 2016-02-24; 2 files, -2/+6)

* Silence miscellaneous 64-to-32-bit data loss warnings. (Jason Evans, 2016-02-24; 7 files, -23/+26)

* Use ssize_t for readlink() rather than int. (Jason Evans, 2016-02-24; 1 file, -1/+1)

* Make opt_narenas unsigned rather than size_t. (Jason Evans, 2016-02-24; 3 files, -11/+21)

* Make nhbins unsigned rather than size_t. (Jason Evans, 2016-02-24; 1 file, -1/+1)

* Explicitly cast mib[] elements to unsigned where appropriate. (Jason Evans, 2016-02-24; 1 file, -9/+9)

* Refactor jemalloc_ffs*() into ffs_*(). (Jason Evans, 2016-02-24; 2 files, -3/+2)
  Use appropriate versions to resolve 64-to-32-bit data loss warnings.

* Collapse arena_avail_tree_* into arena_run_tree_*. (Jason Evans, 2016-02-24; 1 file, -11/+7)
  These tree types converged to become identical, yet they still had
  independently generated red-black tree implementations.
* Separate arena_avail trees. (Dave Watson, 2016-02-24; 1 file, -88/+50)
  Separate run trees by index, replacing the previous quantize logic.
  Quantization by index is now performed only on insertion / removal from
  the tree, and not on node comparison, saving some CPU. This also means we
  don't have to dereference the miscelm* pointers, saving half of the
  memory loads from miscelms/mapbits that have fallen out of cache. A
  linear scan of the indices appears to be fast enough. The only cost of
  this is an extra tree array in each arena.
* Use table lookup for run_quantize_{floor,ceil}(). (Jason Evans, 2016-02-23; 1 file, -21/+86)
  Reduce run quantization overhead by generating lookup tables during
  bootstrapping, and using the tables for all subsequent run quantization.
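Boot-time lookup tables of this kind can be sketched as below. The quantization rule used here (round to a multiple of 4 pages) is a stand-in for illustration, not jemalloc's real size-class rule, and the names are simplified:

```c
#include <assert.h>
#include <stddef.h>

#define LOOKUP_MAXPAGES 32
static size_t floor_tab[LOOKUP_MAXPAGES + 1];
static size_t ceil_tab[LOOKUP_MAXPAGES + 1];

/* Stand-in for the real (more expensive) quantization computation. */
static size_t
quantize_floor_slow(size_t pages)
{
	return pages - (pages % 4);
}

static size_t
quantize_ceil_slow(size_t pages)
{
	return (pages % 4 == 0) ? pages : quantize_floor_slow(pages) + 4;
}

/* Run once during bootstrapping: evaluate the slow rule for every input. */
static void
run_quantize_boot(void)
{
	for (size_t i = 0; i <= LOOKUP_MAXPAGES; i++) {
		floor_tab[i] = quantize_floor_slow(i);
		ceil_tab[i] = quantize_ceil_slow(i);
	}
}

/* All subsequent quantization is a single table load. */
static size_t
run_quantize_floor(size_t pages)
{
	assert(pages <= LOOKUP_MAXPAGES);
	return floor_tab[pages];
}

static size_t
run_quantize_ceil(size_t pages)
{
	assert(pages <= LOOKUP_MAXPAGES);
	return ceil_tab[pages];
}
```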
* Fix run_quantize_ceil(). (Jason Evans, 2016-02-23; 1 file, -1/+1)
  In practice this bug had limited impact (and then only by increasing
  chunk fragmentation) because run_quantize_ceil() returned correct results
  except for inputs that could only arise from aligned allocation requests
  that required more than page alignment. This bug existed in the original
  run quantization implementation, which was introduced by
  8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
  randomization for large allocations.).

* Test run quantization. (Jason Evans, 2016-02-22; 1 file, -10/+28)
  Also rename run_quantize_*() to improve clarity. These tests demonstrate
  that run_quantize_ceil() is flawed.
* Refactor time_* into nstime_*. (Jason Evans, 2016-02-22; 4 files, -227/+174)
  Use a single uint64_t in nstime_t to store nanoseconds rather than using
  struct timespec. This reduces fragility around conversions between long
  and uint64_t, especially missing casts that only cause problems on 32-bit
  platforms.
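The single-uint64_t representation can be sketched like this; the function names mirror the nstime style but are simplified assumptions here, not the exact jemalloc API:

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit nanosecond count instead of struct timespec's {sec, nsec}
 * pair: arithmetic is plain integer math with no carry/borrow handling,
 * and no long <-> uint64_t conversions to get wrong on 32-bit targets. */
typedef struct {
	uint64_t ns;
} nstime_t;

#define BILLION UINT64_C(1000000000)

static void
nstime_init2(nstime_t *t, uint64_t sec, uint64_t nsec)
{
	t->ns = sec * BILLION + nsec;
}

static uint64_t
nstime_sec(const nstime_t *t)
{
	return t->ns / BILLION;
}

static uint64_t
nstime_nsec(const nstime_t *t)
{
	return t->ns % BILLION;
}

static void
nstime_add(nstime_t *t, const nstime_t *u)
{
	t->ns += u->ns; /* no nsec-overflow carry step needed */
}

static void
nstime_subtract(nstime_t *t, const nstime_t *u)
{
	t->ns -= u->ns; /* assumes *t >= *u */
}
```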
* Fix Windows-specific prof-related compilation portability issues. (Jason Evans, 2016-02-21; 1 file, -3/+16)

* Fix time_update() to compile and work on MinGW. (Jason Evans, 2016-02-21; 1 file, -6/+9)

* Prevent MSVC from optimizing away tls_callback (resolves #318). (rustyx, 2016-02-20; 1 file, -1/+3)

* getpid() fix for Win32. (rustyx, 2016-02-20; 1 file, -0/+2)
* Implement decay-based unused dirty page purging. (Jason Evans, 2016-02-20; 7 files, -85/+559)
  This is an alternative to the existing ratio-based unused dirty page
  purging, and is intended to eventually become the sole purging mechanism.
  Add mallctls:
  - opt.purge
  - opt.decay_time
  - arena.<i>.decay
  - arena.<i>.decay_time
  - arenas.decay_time
  - stats.arenas.<i>.decay_time
  This resolves #325.
* Refactor out arena_compute_npurge(). (Jason Evans, 2016-02-20; 1 file, -43/+37)
  Refactor out arena_compute_npurge() by integrating its logic into
  arena_stash_dirty() as an incremental computation.

* Refactor arenas_cache tsd. (Jason Evans, 2016-02-20; 2 files, -64/+89)
  Refactor arenas_cache tsd into arenas_tdata, which is a structure of type
  arena_tdata_t.

* Refactor arena_ralloc_no_move(). (Jason Evans, 2016-02-20; 1 file, -11/+10)
  Refactor early return logic in arena_ralloc_no_move() to return early on
  failure rather than on success.

* Refactor arena_malloc_hard() out of arena_malloc(). (Jason Evans, 2016-02-20; 1 file, -1/+17)
* Refactor prng* from cpp macros into inline functions. (Jason Evans, 2016-02-20; 4 files, -7/+6)
  Remove 32-bit variant, convert prng64() to prng_lg_range(), and add
  prng_range().
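A prng_lg_range()/prng_range() pair in this style might look like the following sketch. The LCG constants are Knuth's well-known 64-bit multiplier/increment, used here purely for illustration; treat the exact signatures and constants as assumptions, not jemalloc's:

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit linear congruential step; return the high lg_range bits,
 * which are the strongest bits of an LCG. Requires 1 <= lg_range <= 64. */
static uint64_t
prng_lg_range(uint64_t *state, unsigned lg_range)
{
	*state = *state * UINT64_C(6364136223846793005) +
	    UINT64_C(1442695040888963407);
	return *state >> (64 - lg_range);
}

/* Uniform value in [0, range): draw from the smallest power-of-2 span
 * covering range, and reject draws that land past the end. */
static uint64_t
prng_range(uint64_t *state, uint64_t range)
{
	unsigned lg = 1;
	uint64_t r;

	while ((UINT64_C(1) << lg) < range)
		lg++;
	do {
		r = prng_lg_range(state, lg);
	} while (r >= range);
	return r;
}
```

Rejection sampling avoids the modulo-bias a plain `% range` would introduce when range does not divide the power-of-2 span.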
* Use ticker for incremental tcache GC. (Jason Evans, 2016-02-20; 1 file, -1/+2)

* Implement ticker. (Jason Evans, 2016-02-20; 1 file, -0/+2)
  Implement ticker, which provides a simple API for ticking off some number
  of events before indicating that the ticker has hit its limit.
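The ticker API described above can be sketched as a countdown counter; this mirrors the shape of the interface but is a simplified assumption, not the exact implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count down once per event; when the counter hits zero, report that the
 * limit was hit and rewind for the next epoch. */
typedef struct {
	int32_t tick;   /* events remaining in the current epoch */
	int32_t nticks; /* epoch length, used to rewind */
} ticker_t;

static void
ticker_init(ticker_t *t, int32_t nticks)
{
	t->tick = nticks;
	t->nticks = nticks;
}

static bool
ticker_tick(ticker_t *t)
{
	if (--t->tick == 0) {
		t->tick = t->nticks; /* rewind */
		return true;         /* limit hit: do the deferred work */
	}
	return false;
}
```

A caller like incremental tcache GC ticks once per allocation event and performs its sweep only on the true return, amortizing the cost across many fast-path operations.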
* Flesh out time_*() API. (Jason Evans, 2016-02-20; 1 file, -4/+154)

* Add time_update(). (Cameron Evans, 2016-02-20; 1 file, -0/+36)

* Add --with-malloc-conf. (Jason Evans, 2016-02-20; 3 files, -20/+28)
  Add --with-malloc-conf, which makes it possible to embed a default
  options string during configuration.

* Call malloc_tsd_boot0() from malloc_init_hard_recursible(). (Cosmin Paraschiv, 2016-01-11; 1 file, -5/+16)
  When using LinuxThreads, malloc bootstrapping deadlocks, since
  malloc_tsd_boot0() ends up calling pthread_setspecific(), which causes
  recursive allocation. Fix it by moving the malloc_tsd_boot0() call to
  malloc_init_hard_recursible(). The deadlock was introduced by
  8bb3198f72fc7587dc93527f9f19fb5be52fa553 (Refactor/fix arenas
  manipulation.), when tsd_boot() was split and the top half, tsd_boot0(),
  got an extra tsd_wrapper_set() call.

* Tweak code to allow compilation of concatenated src/*.c sources. (Jason Evans, 2015-11-12; 2 files, -3/+16)
  This resolves #294.

* Reuse previously computed value. (Dmitry-Me, 2015-11-12; 1 file, -2/+4)

* Fast-path improvement: reduce # of branches and unnecessary operations. (Qi Wang, 2015-11-10; 7 files, -114/+204)
  - Combine multiple runtime branches into a single malloc_slow check.
  - Avoid calling arena_choose / size2index / index2size on the fast path.
  - A few micro optimizations.

* Allow const keys for lookup. (Joshua Kahn, 2015-11-09; 2 files, -5/+6)
  Signed-off-by: Steve Dougherty <sdougherty@barracuda.com>
  This resolves #281.

* Remove arena_run_dalloc_decommit(). (Mike Hommey, 2015-11-09; 1 file, -23/+2)
  This resolves #284.
* Fix a xallocx(..., MALLOCX_ZERO) bug. (Jason Evans, 2015-09-25; 1 file, -3/+9)
  Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of
  large allocations that have been randomly assigned an offset of 0 when
  the --enable-cache-oblivious configure option is enabled. This addresses
  a special case missed in d260f442ce693de4351229027b37b3293fcbfd7d (Fix
  xallocx(..., MALLOCX_ZERO) bugs.).
* Work around an NPTL-specific TSD issue. (Jason Evans, 2015-09-24; 1 file, -0/+3)
  Work around a potentially bad thread-specific data initialization
  interaction with NPTL (glibc's pthreads implementation).
  This resolves #283.

* Fix xallocx(..., MALLOCX_ZERO) bugs. (Jason Evans, 2015-09-24; 2 files, -14/+26)
  Zero all trailing bytes of large allocations when the
  --enable-cache-oblivious configure option is enabled. This regression was
  introduced by 8a03cf039cd06f9fa6972711195055d865673966 (Implement cache
  index randomization for large allocations.). Zero trailing bytes of huge
  allocations when resizing from/to a size class that is not a multiple of
  the chunk size.

* Fix prof_tctx_dump_iter() to filter. (Jason Evans, 2015-09-22; 1 file, -5/+17)
  Fix prof_tctx_dump_iter() to filter out nodes that were created after
  heap profile dumping started. Prior to this fix, spurious entries with
  arbitrary object/byte counts could appear in heap profiles, which
  resulted in jeprof inaccuracies or failures.

* Make arena_dalloc_large_locked_impl() static. (Jason Evans, 2015-09-20; 1 file, -1/+1)

* Add mallocx() OOM tests. (Jason Evans, 2015-09-17; 1 file, -0/+2)