path: root/src
Commit message | Author | Date | Files | Lines
* Impose a minimum tcache count for small size classes. | Jason Evans | 2015-05-20 | 1 | -1/+5
  Now that small allocation runs have fewer regions due to run metadata residing in chunk headers, an explicit minimum tcache count is needed to make sure that tcache adequately amortizes synchronization overhead.
* Fix performance regression in arena_palloc(). | Jason Evans | 2015-05-20 | 1 | -2/+13
  Pass large allocation requests to arena_malloc() when possible. This regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.).
* Fix nhbins calculation. | Jason Evans | 2015-05-20 | 1 | -1/+1
  This regression was introduced by 155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.).
* Avoid atomic operations for dependent rtree reads. | Jason Evans | 2015-05-16 | 1 | -1/+1
* Implement cache index randomization for large allocations. | Jason Evans | 2015-05-06 | 3 | -51/+193
  Extract szad size quantization into {extent,run}_quantize(), and quantize szad run sizes to the union of valid small region run sizes and large run sizes. Refactor iteration in arena_run_first_fit() to use run_quantize{,_first,_next}(), and add support for padded large runs.
  For large allocations that have no specified alignment constraints, compute a pseudo-random offset from the beginning of the first backing page that is a multiple of the cache line size. Under typical configurations with 4-KiB pages and 64-byte cache lines this results in a uniform distribution among 64 page boundary offsets.
  Add the --disable-cache-oblivious option, primarily intended for performance testing. This resolves #13.
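  A minimal sketch, in C, of the offset computation described above. The constants and function name are illustrative, not jemalloc's internals; the point is only that the offset is cacheline-aligned and confined to a single page.

      #include <stddef.h>
      #include <stdint.h>

      #define PAGE_SIZE 4096
      #define CACHELINE 64

      /* Map a pseudo-random value to one of the PAGE_SIZE/CACHELINE (here 64)
       * cacheline-aligned offsets within a page, so that successive large
       * allocations do not all start at the same cache index. */
      static size_t
      random_cache_offset(uint64_t prng)
      {
          return (size_t)((prng % (PAGE_SIZE / CACHELINE)) * CACHELINE);
      }

  With 4-KiB pages and 64-byte cache lines this yields 64 equally likely page-boundary offsets, matching the distribution described in the commit message.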
* Rename pprof to jeprof. | Jason Evans | 2015-05-01 | 1 | -1/+1
  This rename avoids installation collisions with the upstream gperftools. Additionally, jemalloc's per thread heap profile functionality introduced an incompatible file format, so it's now worthwhile to clearly distinguish jemalloc's version of this script from the upstream version. This resolves #229.
* Prefer /proc/<pid>/task/<pid>/maps over /proc/<pid>/maps on Linux. | Jason Evans | 2015-05-01 | 1 | -5/+24
  This resolves #227.
* Concise JEMALLOC_HAVE_ISSETUGID case in secure_getenv(). | Igor Podlesny | 2015-04-30 | 1 | -11/+3
* Fix in-place shrinking huge reallocation purging bugs. | Jason Evans | 2015-03-26 | 2 | -21/+17
  Fix the shrinking case of huge_ralloc_no_move_similar() to purge the correct number of pages, at the correct offset. This regression was introduced by 8d6a3e8321a7767cb2ca0930b85d5d488a8cc659 (Implement dynamic per arena control over dirty page purging.).
  Fix huge_ralloc_no_move_shrink() to purge the correct number of pages. This bug was introduced by 9673983443a0782d975fbcb5d8457cfd411b8b56 (Purge/zero sub-chunk huge allocations as necessary.).
* Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. | Jason Evans | 2015-03-24 | 3 | -16/+14
* Fix signed/unsigned comparison in arena_lg_dirty_mult_valid(). | Jason Evans | 2015-03-24 | 1 | -1/+2
* Fix arena_get() usage. | Jason Evans | 2015-03-24 | 1 | -5/+17
  Fix arena_get() calls that specify refresh_if_missing=false. In ctl_refresh() and ctl.c's arena_purge(), these calls attempted to only refresh once, but did so in an unreliable way. arena_i_lg_dirty_mult_ctl() was simply wrong to pass refresh_if_missing=false.
* We have pages_unmap(ret, size) so we use it. | Igor Podlesny | 2015-03-24 | 1 | -9/+1
* Add the "stats.allocated" mallctl. | Jason Evans | 2015-03-24 | 3 | -18/+42
* Fix a compile error caused by mixed declarations and code. | Qinfan Wu | 2015-03-21 | 1 | -2/+3
* Fix lg_dirty_mult-related stats printing. | Jason Evans | 2015-03-21 | 1 | -66/+82
  This regression was introduced by 8d6a3e8321a7767cb2ca0930b85d5d488a8cc659 (Implement dynamic per arena control over dirty page purging.). This resolves #215.
* Restore --enable-ivsalloc. | Jason Evans | 2015-03-19 | 1 | -2/+2
  However, unlike before it was removed, do not force --enable-ivsalloc when Darwin zone allocator integration is enabled, since the zone allocator code uses ivsalloc() regardless of whether malloc_usable_size() and sallocx() do. This resolves #211.
* Implement dynamic per arena control over dirty page purging. | Jason Evans | 2015-03-19 | 5 | -65/+228
  Add mallctls:
  - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas.
  - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint.
  - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced.
  This resolves #93.
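  A hedged usage sketch for the two lg_dirty_mult mallctls named above, assuming an unprefixed jemalloc build (where the API is exposed as mallctl() rather than je_mallctl()); the arena index 0 and the chosen value are illustrative.

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          ssize_t lg_dirty_mult = 5;  /* Purge when dirty pages exceed active >> 5. */
          size_t sz = sizeof(lg_dirty_mult);

          /* Set the default inherited by newly created arenas. */
          if (mallctl("arenas.lg_dirty_mult", NULL, NULL, &lg_dirty_mult, sz) != 0)
              fprintf(stderr, "failed to set arenas.lg_dirty_mult\n");

          /* Adjust arena 0's threshold; this may synchronously trigger purging. */
          if (mallctl("arena.0.lg_dirty_mult", NULL, NULL, &lg_dirty_mult, sz) != 0)
              fprintf(stderr, "failed to set arena.0.lg_dirty_mult\n");

          /* Read the current default back. */
          if (mallctl("arenas.lg_dirty_mult", &lg_dirty_mult, &sz, NULL, 0) == 0)
              printf("arenas.lg_dirty_mult = %zd\n", lg_dirty_mult);
          return 0;
      }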
* Fix heap profiling regressions. | Jason Evans | 2015-03-16 | 1 | -12/+9
  Remove the prof_tctx_state_destroying transitory state and instead add the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely identifies a tctx. This assures that tctx's are well ordered even when more than two with the same thr_uid coexist. A previous attempted fix based on prof_tctx_state_destroying was only sufficient for protecting against two coexisting tctx's, but it also introduced a new dumping race.
  These regressions were introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.) and 764b00023f2bc97f240c3a758ed23ce9c0ad8526 (Fix a heap profiling regression.).
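  A hedged sketch of the total order that such a <thr_uid, tctx_uid> tuple gives; the struct and function names are illustrative, not jemalloc's actual definitions.

      #include <stdint.h>

      typedef struct {
          uint64_t thr_uid;   /* Unique id of the owning thread. */
          uint64_t tctx_uid;  /* Unique per tctx within that thread. */
      } tctx_key_t;

      /* Lexicographic comparison: thr_uid first, then tctx_uid, so any number of
       * coexisting tctx's that share a thr_uid remain well ordered. */
      static int
      tctx_key_comp(const tctx_key_t *a, const tctx_key_t *b)
      {
          if (a->thr_uid != b->thr_uid)
              return (a->thr_uid < b->thr_uid) ? -1 : 1;
          if (a->tctx_uid != b->tctx_uid)
              return (a->tctx_uid < b->tctx_uid) ? -1 : 1;
          return 0;
      }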
* Eliminate innocuous compiler warnings. | Jason Evans | 2015-03-14 | 1 | -0/+2
* Fix a heap profiling regression. | Jason Evans | 2015-03-14 | 1 | -13/+31
  Add the prof_tctx_state_destroying transitory state to fix a race between a thread destroying a tctx and another thread creating a new equivalent tctx. This regression was introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.).
* Use the error code given to buferror on Windows | Mike Hommey | 2015-03-13 | 1 | -1/+1
  a14bce85 made buferror not take an error code, and made the Windows code path for buferror use GetLastError, while the alternative code paths used errno. Then 2a83ed02 made buferror take an error code again, and while it changed the non-Windows code paths to use that error code, the Windows code path was not changed accordingly.
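  A hedged sketch of the corrected behavior: both code paths format the error code that was passed in rather than re-reading the thread's last error. The helper below is illustrative, not jemalloc's actual buferror().

      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>
      #ifdef _WIN32
      #include <windows.h>
      #endif

      static void
      format_error(int err, char *buf, size_t buflen)
      {
      #ifdef _WIN32
          /* Use the error code we were given, not GetLastError(), which may
           * already have been overwritten by an intervening call. */
          FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
              NULL, (DWORD)err, 0, buf, (DWORD)buflen, NULL);
      #else
          /* strerror() is not thread-safe, but it keeps the sketch simple. */
          snprintf(buf, buflen, "%s", strerror(err));
      #endif
      }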
* Fix a heap profiling regression. | Jason Evans | 2015-03-12 | 1 | -2/+7
  Fix prof_tctx_comp() to incorporate tctx state into the comparison. During a dump it is possible for both a purgatory tctx and an otherwise equivalent nominal tctx to reside in the tree at the same time. This regression was introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.).
* Fix unsigned comparison underflow. | Jason Evans | 2015-03-12 | 1 | -1/+1
  These bugs only affected tests and debug builds.
* Fix a declaration-after-statement regression. | Jason Evans | 2015-03-11 | 1 | -3/+2
* Normalize rdelm/rd structure field naming. | Jason Evans | 2015-03-11 | 1 | -4/+4
* Refactor dirty run linkage to reduce sizeof(extent_node_t). | Jason Evans | 2015-03-11 | 1 | -41/+48
* Fix a chunk_recycle() regression. | Jason Evans | 2015-03-07 | 1 | -4/+15
  This regression was introduced by 97c04a93838c4001688fe31bf018972b4696efe2 (Use first-fit rather than first-best-fit run/chunk allocation.).
* Use first-fit rather than first-best-fit run/chunk allocation. | Jason Evans | 2015-03-07 | 2 | -29/+86
  This tends to more effectively pack active memory toward low addresses. However, additional tree searches are required in many cases, so whether this change stands the test of time will depend on real-world benchmarks.
* Quantize szad trees by size class. | Jason Evans | 2015-03-07 | 4 | -14/+39
  Treat sizes that round down to the same size class as size-equivalent in trees that are used to search for first best fit, so that there are only as many "firsts" as there are size classes. This comes closer to the ideal of first fit.
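  A hedged illustration of such a quantized szad (size/address) comparison; the size-class table and names below are invented for the example, since jemalloc computes its classes rather than listing them.

      #include <stddef.h>
      #include <stdint.h>

      /* Toy size-class table for run sizes, smallest first. */
      static const size_t classes[] = {4096, 8192, 12288, 16384, 20480, 32768};

      /* Round a size down to its class so all members of a class compare equal. */
      static size_t
      quantize_floor(size_t size)
      {
          size_t q = classes[0];
          for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
              if (classes[i] <= size)
                  q = classes[i];
          }
          return q;
      }

      /* Compare by quantized size first, then by address, so a first-fit search
       * over the tree sees only one "first" per size class. */
      static int
      szad_comp(size_t a_size, uintptr_t a_addr, size_t b_size, uintptr_t b_addr)
      {
          size_t a_q = quantize_floor(a_size), b_q = quantize_floor(b_size);
          if (a_q != b_q)
              return (a_q < b_q) ? -1 : 1;
          if (a_addr != b_addr)
              return (a_addr < b_addr) ? -1 : 1;
          return 0;
      }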
* Fix a compilation error and an incorrect assertion. | Jason Evans | 2015-02-19 | 1 | -2/+2
* Fix chunk cache races. | Jason Evans | 2015-02-19 | 2 | -128/+242
  These regressions were introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
* Rename "dirty chunks" to "cached chunks". | Jason Evans | 2015-02-18 | 3 | -68/+49
  Rename "dirty chunks" to "cached chunks", in order to avoid overloading the term "dirty". Fix the regression caused by 339c2b23b2d61993ac768afcc72af135662c6771 (Fix chunk_unmap() to propagate dirty state.), and actually address what that change attempted, which is to only purge chunks once, and propagate whether zeroed pages resulted into chunk_record().
* Fix chunk_unmap() to propagate dirty state. | Jason Evans | 2015-02-18 | 2 | -7/+13
  Fix chunk_unmap() to propagate whether a chunk is dirty, and modify dirty chunk purging to record this information so it can be passed to chunk_unmap(). Since the broken version of chunk_unmap() claimed that all chunks were clean, this resulted in potential memory corruption for purging implementations that do not zero (e.g. MADV_FREE). This regression was introduced by ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into unused dirty page purging machinery.).
* arena_chunk_dirty_node_init() --> extent_node_dirty_linkage_init() | Jason Evans | 2015-02-18 | 1 | -11/+3
* Simplify extent_node_t and add extent_node_init(). | Jason Evans | 2015-02-17 | 4 | -30/+16
* Integrate whole chunks into unused dirty page purging machinery. | Jason Evans | 2015-02-17 | 7 | -216/+437
  Extend per arena unused dirty page purging to manage unused dirty chunks in addition to unused dirty runs. Rather than immediately unmapping deallocated chunks (or purging them in the --disable-munmap case), store them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially allocate dirty chunks. When excessive unused dirty pages accumulate, purge runs and chunks in integrated LRU order (and unmap chunks in the --enable-munmap case).
  Refactor extent_node_t to provide accessor functions.
* Normalize *_link and link_* fields to all be *_link. | Jason Evans | 2015-02-16 | 3 | -10/+9
* Remove redundant tcache_boot() call. | Jason Evans | 2015-02-15 | 1 | -2/+0
* If MALLOCX_ARENA(a) is specified, use it during tcache fill. | Jason Evans | 2015-02-13 | 1 | -9/+10
* Refactor huge_*() calls into arena internals. | Jason Evans | 2015-02-12 | 1 | -56/+104
  Make redirects to the huge_*() API the arena code's responsibility, since arenas now take responsibility for all allocation sizes.
* add missing check for new_addr chunk size | Daniel Micay | 2015-02-12 | 1 | -1/+1
  8ddc93293cd8370870f221225ef1e013fbff6d65 switched this over to using the address tree in order to avoid false negatives, so it now needs to check that the size of the free extent is large enough to satisfy the request.
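  A hedged sketch of the check being described; the node layout and helper name are illustrative, not the actual chunk_recycle() code.

      #include <stdbool.h>
      #include <stddef.h>

      typedef struct {
          void *addr;
          size_t size;
      } free_extent_t;

      /* When recycling at a requested address, matching the address alone is not
       * enough: the free extent must also be large enough for the request. */
      static bool
      extent_satisfies(const free_extent_t *node, void *new_addr, size_t size)
      {
          if (new_addr != NULL && node->addr != new_addr)
              return false;
          return node->size >= size;
      }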
* Move centralized chunk management into arenas. | Jason Evans | 2015-02-12 | 9 | -368/+281
  Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas.
  Add chunk node caching to arenas, in order to avoid contention on the base allocator.
  Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset).
  Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug.
  Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information.
  Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.
* Update ckh to support metadata allocation tracking. | Jason Evans | 2015-02-12 | 1 | -9/+11
* Fix a regression in tcache_bin_flush_small(). | Jason Evans | 2015-02-12 | 1 | -1/+1
  Fix a serious regression in tcache_bin_flush_small() that was introduced by 1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache support.).
* Test and fix tcache ID recycling. | Jason Evans | 2015-02-10 | 1 | -1/+1
* Implement explicit tcache support. | Jason Evans | 2015-02-10 | 8 | -187/+362
  Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API.
  Add the tcache.create, tcache.flush, and tcache.destroy mallctls.
  This resolves #145.
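  A hedged usage sketch for the macros and mallctls named above, assuming an unprefixed jemalloc build (mallctl()/mallocx() rather than the je_* names); error handling is abbreviated.

      #include <jemalloc/jemalloc.h>

      int main(void) {
          unsigned tc;
          size_t sz = sizeof(tc);

          /* Create an explicit thread cache and get back its index. */
          if (mallctl("tcache.create", &tc, &sz, NULL, 0) != 0)
              return 1;

          /* Route allocation and deallocation through that cache. */
          void *p = mallocx(4096, MALLOCX_TCACHE(tc));
          if (p != NULL)
              dallocx(p, MALLOCX_TCACHE(tc));

          /* Or bypass thread caching entirely for a one-off allocation. */
          void *q = mallocx(4096, MALLOCX_TCACHE_NONE);
          if (q != NULL)
              dallocx(q, MALLOCX_TCACHE_NONE);

          /* Flush, then destroy, the explicit cache. */
          mallctl("tcache.flush", NULL, NULL, &tc, sizeof(tc));
          mallctl("tcache.destroy", NULL, NULL, &tc, sizeof(tc));
          return 0;
      }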
* Refactor rtree to be lock-free. | Jason Evans | 2015-02-05 | 2 | -71/+92
  Recent huge allocation refactoring associates huge allocations with arenas, but it remains necessary to quickly look up huge allocation metadata during reallocation/deallocation. A global radix tree remains a good solution to this problem, but locking would have become the primary bottleneck after (upcoming) migration of chunk management from global to per arena data structures.
  This lock-free implementation uses double-checked reads to traverse the tree, so that in the steady state, each read or write requires only a single atomic operation.
  This implementation also assures that no more than two tree levels actually exist, through a combination of careful virtual memory allocation which makes large sparse nodes cheap, and skipping the root node on x64 (possible because the top 16 bits are all 0 in practice).
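  A hedged sketch of the double-checked read pattern, using C11 atomics; the two-level layout, fanout, and names are simplified stand-ins for jemalloc's rtree rather than its actual implementation.

      #include <stdatomic.h>
      #include <stddef.h>
      #include <stdint.h>

      #define FANOUT 256  /* Illustrative; jemalloc sizes its levels differently. */

      typedef struct { _Atomic(void *) slots[FANOUT]; } leaf_t;
      typedef struct { _Atomic(leaf_t *) children[FANOUT]; } root_t;

      static void *
      rtree_get(root_t *root, uint8_t hi, uint8_t lo)
      {
          /* First check: in the steady state a relaxed (dependent) load sees the
           * already-published child, so the lookup costs one atomic read. */
          leaf_t *child = atomic_load_explicit(&root->children[hi],
              memory_order_relaxed);
          if (child == NULL) {
              /* Second check with acquire ordering before concluding "absent",
               * in case a writer has just published the node. */
              child = atomic_load_explicit(&root->children[hi],
                  memory_order_acquire);
              if (child == NULL)
                  return NULL;
          }
          return atomic_load_explicit(&child->slots[lo], memory_order_relaxed);
      }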
* Refactor base_alloc() to guarantee demand-zeroed memory. | Jason Evans | 2015-02-05 | 3 | -66/+104
  Refactor base_alloc() to guarantee that allocations are carved from demand-zeroed virtual memory. This supports sparse data structures such as multi-page radix tree nodes.
  Enhance base_alloc() to keep track of fragments which were too small to support previous allocation requests, and try to consume them during subsequent requests. This becomes important when request sizes commonly approach or exceed the chunk size (as could radix tree node allocations).
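  A hedged sketch of the fragment-reuse idea; the fixed-size list is purely illustrative (the real base allocator tracks leftovers differently), and the caller is assumed to fall back to mapping fresh demand-zeroed memory when this returns NULL.

      #include <stddef.h>
      #include <stdint.h>

      #define MAX_FRAGS 64

      typedef struct { void *addr; size_t size; } frag_t;
      static frag_t frags[MAX_FRAGS];
      static size_t nfrags;

      /* Try to satisfy a request from a leftover fragment before carving new
       * demand-zeroed pages; any remainder stays on the list for later. */
      static void *
      frag_alloc(size_t size)
      {
          for (size_t i = 0; i < nfrags; i++) {
              if (frags[i].size >= size) {
                  void *ret = frags[i].addr;
                  frags[i].addr = (uint8_t *)frags[i].addr + size;
                  frags[i].size -= size;
                  return ret;
              }
          }
          return NULL;
      }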
* Fix chunk_recycle()'s new_addr functionality. | Jason Evans | 2015-02-05 | 1 | -2/+6
  Fix chunk_recycle()'s new_addr functionality to search by address rather than just size if new_addr is specified. The functionality added by a95018ee819abf897562d9d1f3bc31d4dd725a8d (Attempt to expand huge allocations in-place.) only worked if the two search orders happened to return the same results (e.g. in simple test cases).