path: root/jemalloc/src/arena.c
* Fix an assertion in arena_purge(). (Jason Evans, 2011-03-24; 1 file, -3/+6)

  arena_purge() may be called even when there are no dirty pages, so loosen
  an assertion accordingly.

* Fix error detection for ipalloc() when profiling. (Jason Evans, 2011-03-23; 1 file, -7/+12)

  sa2u() returns 0 on overflow, but the profiling code was blindly calling
  sa2u() and allowing the error to silently propagate, ultimately ending in
  a later assertion failure. Refactor all ipalloc() callers to call sa2u(),
  check for overflow before calling ipalloc(), and pass usize rather than
  size. This allows ipalloc() to avoid calling sa2u() in the common case.

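  A minimal sketch of the caller-side pattern, with simplified stand-ins for
  sa2u() and ipalloc() (the real jemalloc internals differ):

    #include <stddef.h>
    #include <stdlib.h>

    /* Stand-in: round size up to a multiple of alignment (alignment must be
     * a power of two); return 0 to signal overflow. */
    static size_t sa2u(size_t size, size_t alignment) {
        size_t usize = (size + alignment - 1) & ~(alignment - 1);
        return (usize < size) ? 0 : usize;  /* wrap-around implies overflow */
    }

    /* Stand-in for the internal aligned allocator. */
    static void *ipalloc(size_t usize, size_t alignment, int zero) {
        (void)zero;
        return aligned_alloc(alignment, usize);
    }

    /* The pattern from the commit message: compute usize first, detect
     * overflow before calling ipalloc(), and pass usize rather than size. */
    void *alloc_aligned_checked(size_t size, size_t alignment) {
        size_t usize = sa2u(size, alignment);
        if (usize == 0)
            return NULL;          /* overflow: fail fast, no propagation */
        return ipalloc(usize, alignment, 0);
    }
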
* Avoid overflow in arena_run_regind(). (Jason Evans, 2011-03-22; 1 file, -1/+11)

  Fix a regression due to:

    2a6f2af6e446a98a635caadd281a23ca09a491cb
    Remove an arena_bin_run_size_calc() constraint.

  The removed constraint required that small run headers fit in one page,
  which indirectly limited runs such that they would not cause overflow in
  arena_run_regind(). Add an explicit constraint to arena_bin_run_size_calc()
  based on the largest number of regions that arena_run_regind() can handle
  (2^11 as currently configured).

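  A sketch of what such an explicit cap looks like (the macro name is
  hypothetical; only the 2^11 limit comes from the commit message):

    /* Largest region count arena_run_regind()-style index math can handle. */
    #define RUN_MAXREGS (1U << 11)

    /* Applied while searching for a bin's run size: clamp the candidate
     * region count before accepting a run size. */
    static unsigned cap_run_nregs(unsigned try_nregs) {
        return (try_nregs > RUN_MAXREGS) ? RUN_MAXREGS : try_nregs;
    }
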
* Dynamically adjust tcache fill count. (Jason Evans, 2011-03-21; 1 file, -4/+3)

  Dynamically adjust tcache fill count (number of objects allocated per
  tcache refill) such that if GC has to flush inactive objects, the fill
  count gradually decreases. Conversely, if refills occur while the fill
  count is depressed, the fill count gradually increases back to its maximum
  value.

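  A sketch of the feedback loop, assuming hypothetical field names rather
  than jemalloc's actual tcache_bin_t layout:

    typedef struct {
        unsigned fill_max;   /* configured maximum fill count */
        unsigned fill;       /* current fill count (objects per refill) */
    } tbin_fill_t;

    /* Called from GC: flushing inactive objects means the cache was
     * over-filled, so back off gradually. */
    static void tbin_gc(tbin_fill_t *t, unsigned nflushed) {
        if (nflushed > 0 && t->fill > 1)
            t->fill--;
    }

    /* Called on refill: demand while the count is depressed grows it back
     * toward the maximum. */
    static unsigned tbin_refill_count(tbin_fill_t *t) {
        unsigned n = t->fill;
        if (t->fill < t->fill_max)
            t->fill++;
        return n;   /* number of objects to fetch for this refill */
    }
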
* Add the "stats.cactive" mallctl. (Jason Evans, 2011-03-19; 1 file, -0/+34)

  Add the "stats.cactive" mallctl, which can be used to efficiently and
  repeatedly query approximately how much active memory the application is
  utilizing.

* Improve thread-->arena assignment. (Jason Evans, 2011-03-18; 1 file, -0/+1)

  Rather than blindly assigning threads to arenas in round-robin fashion,
  choose the lowest-numbered arena that currently has the smallest number of
  threads assigned to it.

  Add the "stats.arenas.<i>.nthreads" mallctl.

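  A minimal sketch of the selection rule (arena_t here is a simplified
  stand-in, and locking is omitted):

    typedef struct { unsigned ind; unsigned nthreads; } arena_t;

    static arena_t *choose_arena(arena_t *arenas, unsigned narenas) {
        arena_t *choice = &arenas[0];
        for (unsigned i = 1; i < narenas; i++) {
            /* Strict '<' keeps the lowest-numbered arena on ties. */
            if (arenas[i].nthreads < choice->nthreads)
                choice = &arenas[i];
        }
        choice->nthreads++;   /* this thread is now assigned here */
        return choice;
    }
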
* Reverse tcache fill order. (Jason Evans, 2011-03-18; 1 file, -1/+2)

  Refill the thread cache such that low regions get used first. This fixes a
  regression due to the recent transition to bitmap-based region management.

* Use bitmaps to track small regions. (Jason Evans, 2011-03-17; 1 file, -51/+56)

  The previous free list implementation, which embedded singly linked lists
  in available regions, had the unfortunate side effect of causing many
  cache misses during thread cache fills. Fix this in two places:

  - arena_run_t: Use a new bitmap implementation to track which regions are
    available. Furthermore, revert to preferring the lowest available region
    (as jemalloc did with its old bitmap-based approach).

  - tcache_t: Move read-only tcache_bin_t metadata into tcache_bin_info_t,
    and add a contiguous array of pointers to tcache_t in order to track
    cached objects. This substantially increases the size of tcache_t, but
    results in much higher data locality for common tcache operations. As a
    side benefit, it is again possible to efficiently flush the least
    recently used cached objects, so this change switches flushing from MRU
    to LRU.

  The new bitmap implementation uses a multi-level summary approach to make
  finding the lowest available region very fast. In practice, bitmaps only
  have one or two levels, though the implementation is general enough to
  handle extremely large bitmaps, mainly so that large page sizes can still
  be entertained. A sketch of the two-level case follows this entry.

  Fix tcache_bin_flush_large() to always flush statistics, in the same way
  that tcache_bin_flush_small() was recently fixed.

  Use JEMALLOC_DEBUG rather than NDEBUG.

  Add dassert(), and use it for debug-only asserts.

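  A minimal two-level sketch of the summary idea, where a set bit means
  "region available" and each summary bit records whether the corresponding
  group word is nonempty (illustrative only; jemalloc's bitmap_t uses its
  own conventions):

    #include <stdint.h>

    #define NGROUPS 64
    typedef struct {
        uint64_t summary;           /* bit g set iff groups[g] != 0 */
        uint64_t groups[NGROUPS];   /* up to 64*64 = 4096 regions */
    } bitmap2_t;

    /* Find, claim, and return the lowest available region, or -1 if none.
     * Two find-first-set operations regardless of bitmap size
     * (__builtin_ctzll is a GCC/Clang builtin). */
    static int bitmap2_alloc_lowest(bitmap2_t *b) {
        if (b->summary == 0)
            return -1;
        int g = __builtin_ctzll(b->summary);     /* lowest nonempty group */
        int r = __builtin_ctzll(b->groups[g]);   /* lowest set bit in it */
        b->groups[g] &= ~(1ULL << r);            /* mark region in use */
        if (b->groups[g] == 0)
            b->summary &= ~(1ULL << g);          /* keep summary coherent */
        return g * 64 + r;
    }
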
* Create arena_bin_info_t. (Jason Evans, 2011-03-15; 1 file, -178/+248)

  Move read-only fields from arena_bin_t into arena_bin_info_t, primarily in
  order to avoid false cacheline sharing.

* Reduce size of small_size2bin lookup table. (Jason Evans, 2011-03-15; 1 file, -38/+42)

  Convert all direct small_size2bin[...] accesses to SMALL_SIZE2BIN(...)
  macro calls, and use a couple of cheap math operations to allow compacting
  the table by 4X or 8X, on 32- and 64-bit systems, respectively.

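  A sketch of the compaction trick: small sizes are multiples of the minimum
  alignment (8 bytes on a 64-bit system), so indexing by ((size - 1) >> 3)
  needs one table entry per 8-byte step instead of one per byte. The table
  contents and shift amount here are illustrative, not jemalloc's exact
  definitions:

    #include <stdint.h>

    #define LG_STEP 3   /* 8-byte steps; a 32-bit system would use 2 */

    /* Example table for sizes 1..32, mapping four steps to bins 0..3. */
    static const uint8_t small_size2bin[] = { 0, 1, 2, 3 };

    #define SMALL_SIZE2BIN(s) (small_size2bin[((s) - 1) >> LG_STEP])
    /* SMALL_SIZE2BIN(1..8) == 0, SMALL_SIZE2BIN(9..16) == 1, and so on. */
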
* Remove an arena_bin_run_size_calc() constraint. (Jason Evans, 2010-12-16; 1 file, -3/+1)

  Remove the constraint that small run headers fit in one page. This
  constraint was necessary to avoid dirty page purging issues for unused
  pages within runs for medium size classes (which no longer exist).

* Remove high_water from tcache_bin_t. (Jason Evans, 2010-12-16; 1 file, -2/+0)

  Remove the high_water field from tcache_bin_t, since it is not useful for
  anything.

* Fix compilation error. (Jason Evans, 2010-10-25; 1 file, -1/+3)

  Don't declare loop variable inside for (...) clause.

* Use madvise(..., MADV_FREE) on OS X. (Jason Evans, 2010-10-24; 1 file, -3/+0)

  Use madvise(..., MADV_FREE) rather than msync(..., MS_KILLPAGES) on OS X,
  since it works for at least OS X 10.5 and 10.6.

* Replace JEMALLOC_OPTIONS with MALLOC_CONF. (Jason Evans, 2010-10-24; 1 file, -3/+3)

  Replace the single-character run-time flags with key/value pairs, which
  can be set via the malloc_conf global, /etc/malloc.conf, and the
  MALLOC_CONF environment variable.

  Replace the JEMALLOC_PROF_PREFIX environment variable with the
  "opt.prof_prefix" option.

  Replace umax2s() with u2s().

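  For illustration, one way an application can set options via the
  malloc_conf global named above. The comma-separated "key:value" string
  format follows jemalloc's documented MALLOC_CONF syntax, but treat the
  exact symbol name as an assumption, since it varies with the configured
  symbol prefix:

    #include <stdlib.h>

    /* Read by the allocator at first use; same syntax as MALLOC_CONF. */
    const char *malloc_conf = "prof_prefix:jeprof.out";

    int main(void) {
        void *p = malloc(64);
        free(p);
        return 0;
    }
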
* Fix heap profiling bugs. (Jason Evans, 2010-10-22; 1 file, -40/+5)

  Fix a regression due to the recent heap profiling accuracy improvements:
  prof_{m,re}alloc() must set the object's profiling context regardless of
  whether it is sampled.

  Fix management of the CHUNK_MAP_CLASS chunk map bits, such that all large
  object (re-)allocation paths correctly initialize the bits. Prior to this
  fix, in-place realloc() cleared the bits, resulting in incorrect reported
  object size from arena_salloc_demote(). After this fix the non-demoted bit
  pattern is all zeros (instead of all ones), which makes it easier to
  assure that the bits are properly set.

* Fix a heap profiling regression. (Jason Evans, 2010-10-21; 1 file, -99/+0)

  Call prof_ctx_set() in all paths through prof_{m,re}alloc().

  Inline arena_prof_ctx_get().

* Add per thread allocation counters, and enhance heap sampling. (Jason Evans, 2010-10-21; 1 file, -1/+2)

  Add the "thread.allocated" and "thread.deallocated" mallctls, which can be
  used to query the total number of bytes ever allocated/deallocated by the
  calling thread.

  Add s2u() and sa2u(), which can be used to compute the usable size that
  will result from an allocation request of a particular size/alignment.

  Re-factor ipalloc() to use sa2u().

  Enhance the heap profiler to trigger samples based on usable size, rather
  than request size. This has a subtle, but important, impact on the
  accuracy of heap sampling. For example, previous to this change, 16- and
  17-byte objects were sampled at nearly the same rate, but 17-byte objects
  actually consume 32 bytes each. Therefore it was possible for the sample
  to be somewhat skewed compared to actual memory usage of the allocated
  objects.

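  A sketch of what s2u() computes, using an illustrative 16-byte quantum for
  the smallest classes rather than jemalloc's full size-class table; note
  that with these classes s2u(17) is 32, which is exactly the 16- vs 17-byte
  skew described above:

    #include <stddef.h>

    /* Map a request size to the usable size of the backing size class. */
    static size_t s2u(size_t size) {
        if (size <= 128)
            return (size + 15) & ~(size_t)15;   /* round up to the quantum */
        /* ... larger size classes omitted in this sketch ... */
        return size;
    }
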
* Fix a bug in arena_dalloc_bin_run(). (Jason Evans, 2010-10-19; 1 file, -13/+53)

  Fix the newsize argument that arena_dalloc_bin_run() passes to
  arena_run_trim_tail(). Previously, oldsize-newsize (i.e. the complement)
  was passed, which could erroneously cause dirty pages to be returned to
  the clean available runs tree. Prior to the CHUNK_MAP_ZEROED -->
  CHUNK_MAP_UNZEROED conversion, this bug merely caused dirty pages to be
  unaccounted for (and therefore never get purged), but with
  CHUNK_MAP_UNZEROED, it could cause dirty pages to be treated as zeroed
  (i.e. memory corruption).

* Fix arena bugs. (Jason Evans, 2010-10-18; 1 file, -6/+19)

  Split arena_dissociate_bin_run() out of arena_dalloc_bin_run(), so that
  arena_bin_malloc_hard() can avoid dissociation when recovering from losing
  a race. This fixes a bug introduced by a recent attempted fix.

  Fix a regression in arena_ralloc_large_grow() that was introduced by
  recent fixes.

* Fix arena bugs. (Jason Evans, 2010-10-18; 1 file, -43/+58)

  Move part of arena_bin_lower_run() into the callers, since the conditions
  under which it should be called differ slightly between callers.

  Fix arena_chunk_purge() to omit run size in the last map entry for each
  run it temporarily allocates.

* Add assertions to run coalescing. (Jason Evans, 2010-10-18; 1 file, -7/+17)

  Assert that the chunk map bits at the ends of the runs that participate in
  coalescing are self-consistent.

* Fix numerous arena bugs. (Jason Evans, 2010-10-18; 1 file, -76/+170)

  In arena_ralloc_large_grow(), update the map element for the end of the
  newly grown run, rather than the interior map element that was the
  beginning of the appended run. This is a long-standing bug, and it had the
  potential to cause massive corruption, but triggering it required roughly
  the following sequence of events:

    1) Large in-place growing realloc(), with left-over space in the run
       that followed the large object.
    2) Allocation of the remainder run left over from (1).
    3) Deallocation of the remainder run *before* deallocation of the large
       run, with unfortunate interior map state left over from previous run
       allocation/deallocation activity, such that one or more pages of
       allocated memory would be treated as part of the remainder run during
       run coalescing.

  In summary, this was a bad bug, but it was difficult to trigger.

  In arena_bin_malloc_hard(), if another thread wins the race to allocate a
  bin run, dispose of the spare run via arena_bin_lower_run() rather than
  arena_run_dalloc(), since the run has already been prepared for use as a
  bin run. This bug has existed since March 14, 2010:

    e00572b384c81bd2aba57fac32f7077a34388915
    mmap()/munmap() without arena->lock or bin->lock.

  Fix bugs in arena_dalloc_bin_run(), arena_trim_head(), arena_trim_tail(),
  and arena_ralloc_large_grow() that could cause the CHUNK_MAP_UNZEROED map
  bit to become corrupted. These are all long-standing bugs, but the chances
  of them actually causing problems were much lower before the
  CHUNK_MAP_ZEROED --> CHUNK_MAP_UNZEROED conversion.

  Fix a large run statistics regression in arena_ralloc_large_grow() that
  was introduced on September 17, 2010:

    8e3c3c61b5bb676a705450708e7e79698cdc9e0c
    Add {,r,s,d}allocm().

  Add debug code to validate that supposedly pre-zeroed memory really is.

* Preserve CHUNK_MAP_UNZEROED for small runs. (Jason Evans, 2010-10-16; 1 file, -4/+8)

  Preserve CHUNK_MAP_UNZEROED when allocating small runs, because it is
  possible that untouched pages will be returned to the tree of clean runs,
  where the CHUNK_MAP_UNZEROED flag matters. Prior to the conversion from
  CHUNK_MAP_ZEROED, this was already a bug, but in the worst case extra
  zeroing occurred. After the conversion, this bug made it possible to
  incorrectly treat pages as pre-zeroed.

* Fix a regression in CHUNK_MAP_UNZEROED change. (Jason Evans, 2010-10-14; 1 file, -2/+3)

  Fix a regression added by revision:

    3377ffa1f4f8e67bce1e36624285e5baf5f9ecef
    Change CHUNK_MAP_ZEROED to CHUNK_MAP_UNZEROED.

  A modified chunk->map dereference was missing the subtraction of map_bias,
  which caused incorrect chunk map initialization, as well as potential
  corruption of the first non-header page of memory within each chunk.

* Change CHUNK_MAP_ZEROED to CHUNK_MAP_UNZEROED. (Jason Evans, 2010-10-02; 1 file, -20/+26)

  Invert the chunk map bit that tracks whether a page is zeroed, so that for
  zeroed arena chunks, the interior of the page map does not need to be
  initialized (as it consists entirely of zero bytes).

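  The point of the inversion, sketched with an illustrative flag value:
  since freshly mapped metadata is all zero bytes, encoding "unzeroed" as a
  set bit makes the untouched map already correct for a pre-zeroed chunk.

    #include <stddef.h>

    #define CHUNK_MAP_UNZEROED ((size_t)0x40U)   /* value illustrative */

    /* With the inverted encoding, a zero map entry already means "page is
     * zeroed", so no initialization pass over the map is required. */
    static int page_is_zeroed(size_t mapbits) {
        return (mapbits & CHUNK_MAP_UNZEROED) == 0;
    }
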
* Omit chunk header in arena chunk map. (Jason Evans, 2010-10-02; 1 file, -143/+166)

  Omit the first map_bias elements of the map in arena_chunk_t. This avoids
  barely spilling over into an extra chunk header page for common chunk
  sizes.

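  A sketch of the indexing convention this implies (types simplified; the
  real arena_chunk_t differs): map entries exist only for pages at or beyond
  map_bias, so lookups subtract map_bias from the absolute page index.

    #include <stddef.h>

    typedef struct {
        /* chunk_npages - map_bias entries; flexible array in practice. */
        size_t map[1];
    } arena_chunk_t;

    static size_t map_bias;   /* header pages; computed at bootstrap */

    static size_t *chunk_map_get(arena_chunk_t *chunk, size_t pageind) {
        /* pageind is absolute within the chunk; pages below map_bias hold
         * the header and have no map entries. */
        return &chunk->map[pageind - map_bias];
    }
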
* Add the "arenas.purge" mallctl. (Jason Evans, 2010-09-30; 1 file, -7/+17)

* Add {,r,s,d}allocm(). (Jason Evans, 2010-09-17; 1 file, -47/+95)

  Add allocm(), rallocm(), sallocm(), and dallocm(), which are a functional
  superset of malloc(), calloc(), posix_memalign(), malloc_usable_size(),
  and free().

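  For reference, the prototypes below follow this experimental API as it was
  documented in jemalloc manuals of the era; treat the exact signatures and
  flag macros as an assumption and consult the headers from this revision
  for the authoritative versions.

    int allocm(void **ptr, size_t *rsize, size_t size, int flags);
    int rallocm(void **ptr, size_t *rsize, size_t size, size_t extra,
        int flags);
    int sallocm(const void *ptr, size_t *rsize, int flags);
    int dallocm(void *ptr, int flags);

    /* Example: zeroed allocation that also reports the usable size. */
    void example(void) {
        void *p;
        size_t usize;
        if (allocm(&p, &usize, 4096, ALLOCM_ZERO) == ALLOCM_SUCCESS)
            dallocm(p, 0);
    }
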
* Port to Mac OS X. (Jason Evans, 2010-09-12; 1 file, -29/+23)

  Add Mac OS X support, based in large part on the OS X support in Mozilla's
  version of jemalloc.

* Move assert() calls up in arena_run_reg_alloc(). (Jason Evans, 2010-08-05; 1 file, -1/+1)

  Move assert() calls up in arena_run_reg_alloc(), so that a corrupt pointer
  will likely be caught by an assertion *before* it is dereferenced.

* Fix arena chunk purge/dealloc race conditions. (Jason Evans, 2010-04-14; 1 file, -24/+30)

  Fix arena_chunk_dealloc() to put the new spare in a consistent state
  before dropping the arena mutex to deallocate the previous spare.

  Fix arena_run_dalloc() to insert a newly dirtied chunk into the
  chunks_dirty list before potentially deallocating the chunk, so that dirty
  page accounting is self-consistent.

* Fix threads-related profiling bugs. (Jason Evans, 2010-04-14; 1 file, -24/+23)

  Initialize bt2cnt_tsd so that cleanup at thread exit actually happens.

  Associate (prof_ctx_t *) with allocated objects, rather than
  (prof_thr_cnt_t *). Each thread must always operate on its own
  (prof_thr_cnt_t *), and an object may outlive the thread that allocated
  it.

* Revert re-addition of purge_lock. (Jason Evans, 2010-04-09; 1 file, -37/+43)

  Linux kernels have been capable of concurrent page table access since
  2.6.27, so this hack is not necessary for modern kernels.

* Reduce statistical heap sampling memory overhead. (Jason Evans, 2010-03-31; 1 file, -29/+110)

  If the mean heap sampling interval is larger than one page, simulate
  sampled small objects with large objects. This allows profiling context
  pointers to be omitted for small objects. As a result, the memory overhead
  for sampling decreases as the sampling interval is increased.

  Fix a compilation error in the profiling code.

* Re-add purge_lock to funnel madvise(2) calls. (Jason Evans, 2010-03-27; 1 file, -43/+37)

* Set/clear CHUNK_MAP_ZEROED in arena_chunk_purge(). (Jason Evans, 2010-03-22; 1 file, -11/+32)

  Properly set/clear CHUNK_MAP_ZEROED for all purged pages, according to
  whether the pages are (potentially) file-backed or anonymous. This was
  merely a performance pessimization for the anonymous mapping case, but was
  a calloc()-related bug for the swap_enabled case.

* Track dirty and clean runs separately. (Jason Evans, 2010-03-19; 1 file, -193/+243)

  Split arena->runs_avail into arena->runs_avail_{clean,dirty}, and
  preferentially allocate dirty runs.

* Remove medium size classes. (Jason Evans, 2010-03-17; 1 file, -169/+49)

  Remove medium size classes, because concurrent dirty page purging is no
  longer capable of purging inactive dirty pages inside active runs (due to
  recent arena/bin locking changes).

  Enhance tcache to support caching large objects, so that the same range of
  size classes is still cached, despite the removal of medium size class
  support.

* Fix a run initialization race condition. (Jason Evans, 2010-03-16; 1 file, -9/+17)

  Initialize the small run header before dropping arena->lock, because
  arena_chunk_purge() relies on valid small run headers during run
  iteration.

  Add some assertions.

* Add assertions. (Jason Evans, 2010-03-15; 1 file, -0/+4)

  Check for interior pointers in arena_[ds]alloc().

  Check for corrupt pointers in tcache_alloc().

* arena_chunk_purge() arena->nactive fix. (Jason Evans, 2010-03-15; 1 file, -0/+1)

  Update arena->nactive when pseudo-allocating runs in arena_chunk_purge(),
  since arena_run_dalloc() subtracts from arena->nactive.

* mmap()/munmap() without arena->lock or bin->lock. (Jason Evans, 2010-03-15; 1 file, -41/+118)

* Purge dirty pages without arena->lock. (Jason Evans, 2010-03-15; 1 file, -68/+230)

* Push locks into arena bins. (Jason Evans, 2010-03-15; 1 file, -81/+84)

  For bin-related allocation, protect data structures with bin locks rather
  than arena locks. Arena locks remain for run allocation/deallocation and
  other miscellaneous operations.

  Restructure statistics counters to maintain per bin
  allocated/nmalloc/ndalloc, but continue to provide arena-wide statistics
  via aggregation in the ctl code.

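  A sketch of the lock split, with simplified stand-in structures: threads
  allocating from different size classes in the same arena no longer contend
  on a single arena mutex.

    #include <pthread.h>
    #include <stddef.h>

    typedef struct {
        pthread_mutex_t lock;   /* protects this bin's runs and stats */
        /* ... current run, run tree, allocated/nmalloc/ndalloc ... */
    } arena_bin_t;

    typedef struct {
        pthread_mutex_t lock;   /* still covers run alloc/dalloc, etc. */
        arena_bin_t bins[32];   /* bin count illustrative */
    } arena_t;

    static void *bin_malloc(arena_t *arena, unsigned binind) {
        arena_bin_t *bin = &arena->bins[binind];
        pthread_mutex_lock(&bin->lock);    /* bin lock, not arena->lock */
        void *ret = NULL;   /* ... allocate a region from the bin's run ... */
        pthread_mutex_unlock(&bin->lock);
        return ret;
    }
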
* Simplify small object allocation/deallocation. (Jason Evans, 2010-03-14; 1 file, -314/+123)

  Use chained run free lists instead of bitmaps to track free objects within
  small runs.

  Remove reference counting for small object run pages.

* Simplify tcache object caching. (Jason Evans, 2010-03-14; 1 file, -26/+34)

  Use chains of cached objects, rather than using arrays of pointers.

  Since tcache_bin_t is no longer dynamically sized, convert tcache_t's tbin
  to an array of structures, rather than an array of pointers. This
  implicitly removes tcache_bin_{create,destroy}(), which further simplifies
  the fast path for malloc/free.

  Use cacheline alignment for tcache_t allocations.

  Remove runtime configuration option for number of tcache bin slots, and
  replace it with a boolean option for enabling/disabling tcache.

  Limit the number of tcache objects to the lesser of TCACHE_NSLOTS_MAX and
  2X the number of regions per run for the size class.

  For GC-triggered flush, discard 3/4 of the objects below the low water
  mark, rather than 1/2.

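  The slot cap and flush fraction, sketched with an illustrative
  TCACHE_NSLOTS_MAX value:

    #define TCACHE_NSLOTS_MAX 200   /* value illustrative */

    /* Cache at most min(TCACHE_NSLOTS_MAX, 2 * regions per run). */
    static unsigned tcache_ncached_max(unsigned nregs_per_run) {
        unsigned lim = 2 * nregs_per_run;
        return (lim < TCACHE_NSLOTS_MAX) ? lim : TCACHE_NSLOTS_MAX;
    }

    /* GC flush: discard 3/4 of the objects below the low-water mark. */
    static unsigned tcache_gc_nflush(unsigned low_water) {
        return low_water - (low_water / 4);   /* keep 1/4, flush 3/4 */
    }
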
* Modify dirty page purging algorithm. (Jason Evans, 2010-03-05; 1 file, -68/+61)

  Convert chunks_dirty from a red-black tree to a doubly linked list, and
  use it to purge dirty pages from chunks in FIFO order.

  Add a lock around the code that purges dirty pages via madvise(2), in
  order to avoid kernel contention. If lock acquisition fails, indefinitely
  postpone purging dirty pages.

  Add a lower limit of one chunk worth of dirty pages per arena for purging,
  in addition to the active:dirty ratio.

  When purging, purge all dirty pages from at least one chunk, but rather
  than purging enough pages to drop to half the purging threshold, merely
  drop to the threshold.

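  A sketch of the resulting trigger condition (names and the power-of-two
  ratio encoding are illustrative):

    #include <stddef.h>

    /* Purge only when dirty pages exceed both the active:dirty threshold
     * and one chunk's worth of pages. */
    static int should_purge(size_t ndirty, size_t nactive,
        size_t chunk_npages, unsigned lg_dirty_mult) {
        size_t threshold = nactive >> lg_dirty_mult;
        if (ndirty <= chunk_npages)
            return 0;   /* lower limit: one chunk worth of dirty pages */
        return ndirty > threshold;
    }
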
* Simplify malloc_message(). (Jason Evans, 2010-03-04; 1 file, -2/+3)

  Rather than passing four strings to malloc_message(), malloc_write4(), and
  all the functions that use them, only pass one string.

* Rewrite red-black trees. (Jason Evans, 2010-02-28; 1 file, -11/+20)

  Use left-leaning 2-3 red-black trees instead of left-leaning 2-3-4
  red-black trees. This reduces maximum tree height from (3 lg n) to
  (2 lg n).

  Do lazy balance fixup, rather than transforming the tree during the down
  pass. This improves insert/remove speed by ~30%.

  Use callback-based iteration rather than macros.