path: root/jemalloc/include
* Move repo contents in jemalloc/ to top level.
  Jason Evans, 2011-04-01 (27 files, -5415/+0)
* Implement atomic operations for x86/x64.
  Jason Evans, 2011-03-24 (1 file, -0/+56)
  Add inline assembly implementations of atomic_{add,sub}_uint{32,64}() for
  x86/x64, in order to support compilers that are missing the relevant gcc
  intrinsics.
* Revert "Add support for libunwind backtrace caching."
  Jason Evans, 2011-03-23 (1 file, -3/+0)
  This reverts commit adc675c8ef55b59bb2facf795a3c26411cfbf3ed. The original
  commit added support for a non-standard libunwind API, so it was not of
  general utility.

* Add support for libunwind backtrace caching.
  je@facebook.com, 2011-03-24 (1 file, -0/+3)
  Use libunwind's unw_tdep_trace() if it is available.

* Fix error detection for ipalloc() when profiling.
  Jason Evans, 2011-03-23 (1 file, -21/+38)
  sa2u() returns 0 on overflow, but the profiling code was blindly calling
  sa2u() and allowing the error to silently propagate, ultimately ending in
  a later assertion failure. Refactor all ipalloc() callers to call sa2u(),
  check for overflow before calling ipalloc(), and pass usize rather than
  size. This allows ipalloc() to avoid calling sa2u() in the common case.
* Avoid overflow in arena_run_regind().
  Jason Evans, 2011-03-22 (4 files, -5/+11)
  Fix a regression due to: "Remove an arena_bin_run_size_calc() constraint."
  (2a6f2af6e446a98a635caadd281a23ca09a491cb)
  The removed constraint required that small run headers fit in one page,
  which indirectly limited runs such that they would not cause overflow in
  arena_run_regind(). Add an explicit constraint to
  arena_bin_run_size_calc() based on the largest number of regions that
  arena_run_regind() can handle (2^11 as currently configured).

* Dynamically adjust tcache fill count.
  Jason Evans, 2011-03-21 (1 file, -3/+21)
  Dynamically adjust the tcache fill count (number of objects allocated per
  tcache refill) such that if GC has to flush inactive objects, the fill
  count gradually decreases. Conversely, if refills occur while the fill
  count is depressed, the fill count gradually increases back to its
  maximum value.

* Use OSSpinLock*() for locking on OS X.
  Jason Evans, 2011-03-19 (4 files, -11/+41)
  pthread_mutex_lock() can call malloc() on OS X (!!!), which causes
  deadlock. Work around this by using spinlocks that are built of more
  primitive stuff.
* Add atomic operation support for OS X.
  Jason Evans, 2011-03-19 (3 files, -0/+38)

* Add atomic.[ch].
  Jason Evans, 2011-03-19 (1 file, -0/+77)
  Add atomic.[ch], which should have been part of the previous commit.

* Add the "stats.cactive" mallctl.
  Jason Evans, 2011-03-19 (5 files, -5/+42)
  Add the "stats.cactive" mallctl, which can be used to efficiently and
  repeatedly query approximately how much active memory the application is
  utilizing.

* Improve thread-->arena assignment.
  Jason Evans, 2011-03-18 (3 files, -3/+15)
  Rather than blindly assigning threads to arenas in round-robin fashion,
  choose the lowest-numbered arena that currently has the smallest number
  of threads assigned to it. Add the "stats.arenas.<i>.nthreads" mallctl.

* Use bitmaps to track small regions.
  Jason Evans, 2011-03-17 (6 files, -35/+280)
  The previous free list implementation, which embedded singly linked lists
  in available regions, had the unfortunate side effect of causing many
  cache misses during thread cache fills. Fix this in two places:
  - arena_run_t: Use a new bitmap implementation to track which regions
    are available. Furthermore, revert to preferring the lowest available
    region (as jemalloc did with its old bitmap-based approach).
  - tcache_t: Move read-only tcache_bin_t metadata into tcache_bin_info_t,
    and add a contiguous array of pointers to tcache_t in order to track
    cached objects. This substantially increases the size of tcache_t, but
    results in much higher data locality for common tcache operations. As
    a side benefit, it is again possible to efficiently flush the least
    recently used cached objects, so this switches flushing from MRU to
    LRU.
  The new bitmap implementation uses a multi-level summary approach to make
  finding the lowest available region very fast. In practice, bitmaps only
  have one or two levels, though the implementation is general enough to
  handle extremely large bitmaps, mainly so that large page sizes can still
  be entertained.
  Fix tcache_bin_flush_large() to always flush statistics, in the same way
  that tcache_bin_flush_small() was recently fixed.
  Use JEMALLOC_DEBUG rather than NDEBUG.
  Add dassert(), and use it for debug-only asserts.
* Improve backtracing-related configuration.
  Jason Evans, 2011-03-16 (1 file, -0/+3)
  Clean up configuration for backtracing when profiling is enabled, and
  document the configuration logic in INSTALL. Disable libgcc-based
  backtracing except on x64 (where it is known to work). Add the
  --disable-prof-gcc option.

* Clean up after arena_bin_info_t change.
  Jason Evans, 2011-03-16 (1 file, -7/+7)
  Fix a couple of problems related to the addition of arena_bin_info_t.

* Create arena_bin_info_t.
  Jason Evans, 2011-03-15 (3 files, -39/+70)
  Move read-only fields from arena_bin_t into arena_bin_info_t, primarily
  in order to avoid false cacheline sharing.

* Reduce size of small_size2bin lookup table.
  Jason Evans, 2011-03-15 (3 files, -3/+10)
  Convert all direct small_size2bin[...] accesses to SMALL_SIZE2BIN(...)
  macro calls, and use a couple of cheap math operations to allow
  compacting the table by 4X or 8X, on 32- and 64-bit systems,
  respectively.
* Expand a comment regarding geometric sampling.
  Jason Evans, 2011-03-15 (1 file, -2/+16)

* Fix a cpp logic regression.
  Jason Evans, 2011-03-07 (1 file, -2/+2)
  Fix a cpp logic error that was introduced by the recent commit: Fix
  "thread.{de,}allocatedp" mallctl.

* Build both PIC and no PIC static libraries
  Arun Sharma, 2011-03-02 (1 file, -5/+0)
  When jemalloc is linked into an executable (as opposed to a shared
  library), compiling with -fno-pic can have significant advantages, mainly
  because we don't have to go through the GOT (global offset table). Users
  who want to link jemalloc into a shared library that could be dlopened
  need to link with libjemalloc_pic.a or libjemalloc.so.
* Fix style nits.
  Jason Evans, 2011-02-14 (2 files, -4/+4)

* Fix "thread.{de,}allocatedp" mallctl.
  Jason Evans, 2011-02-14 (1 file, -30/+36)
  For the non-TLS case (as on OS X), if the "thread.{de,}allocatedp"
  mallctl was called before any allocation occurred for that thread, the
  TSD was still NULL, thus putting the application at risk of dereferencing
  NULL. Fix this by refactoring the initialization code, and making it part
  of the conditional logic for all per thread allocation counter accesses.

* Fix ALLOCM_LG_ALIGN definition.
  Jason Evans, 2011-01-26 (1 file, -1/+1)
  Fix ALLOCM_LG_ALIGN to take a parameter and use it. Apparently, an
  editing error left ALLOCM_LG_ALIGN with the same definition as
  ALLOCM_LG_ALIGN_MASK.

* Fix assertion typos.
  Jason Evans, 2011-01-15 (1 file, -1/+1)
  s/=/==/ in several assertions, as well as fixing spelling errors.

* Update various comments.
  Jason Evans, 2010-12-18 (1 file, -47/+38)

* Remove high_water from tcache_bin_t.
  Jason Evans, 2010-12-16 (1 file, -6/+0)
  Remove the high_water field from tcache_bin_t, since it is not useful for
  anything.

* Use mremap(2) for huge realloc().
  Jason Evans, 2010-12-01 (5 files, -2/+7)
  If mremap(2) is available and supports MREMAP_FIXED, use it for huge
  realloc().
  Initialize rtree later during bootstrapping, so that --enable-debug
  --enable-dss works.
  Fix a minor swap_avail stats bug.

* Use madvise(..., MADV_FREE) on OS X.
  Jason Evans, 2010-10-24 (1 file, -6/+3)
  Use madvise(..., MADV_FREE) rather than msync(..., MS_KILLPAGES) on OS X,
  since it works for at least OS X 10.5 and 10.6.

* Replace JEMALLOC_OPTIONS with MALLOC_CONF.
  Jason Evans, 2010-10-24 (7 files, -13/+19)
  Replace the single-character run-time flags with key/value pairs, which
  can be set via the malloc_conf global, /etc/malloc.conf, and the
  MALLOC_CONF environment variable.
  Replace the JEMALLOC_PROF_PREFIX environment variable with the
  "opt.prof_prefix" option.
  Replace umax2s() with u2s().
* Fix heap profiling bugs.
  Jason Evans, 2010-10-22 (3 files, -17/+45)
  Fix a regression due to the recent heap profiling accuracy improvements:
  prof_{m,re}alloc() must set the object's profiling context regardless of
  whether it is sampled.
  Fix management of the CHUNK_MAP_CLASS chunk map bits, such that all large
  object (re-)allocation paths correctly initialize the bits. Prior to this
  fix, in-place realloc() cleared the bits, resulting in incorrect reported
  object size from arena_salloc_demote(). After this fix the non-demoted
  bit pattern is all zeros (instead of all ones), which makes it easier to
  assure that the bits are properly set.

* Fix a heap profiling regression.
  Jason Evans, 2010-10-21 (2 files, -2/+110)
  Call prof_ctx_set() in all paths through prof_{m,re}alloc().
  Inline arena_prof_ctx_get().

* Inline the fast path for heap sampling.
  Jason Evans, 2010-10-21 (2 files, -25/+373)
  Inline the heap sampling code that is executed for every allocation event
  (regardless of whether a sample is taken).
  Combine all prof TLS data into a single data structure, in order to
  reduce the TLS lookup volume.

* Add per thread allocation counters, and enhance heap sampling.
  Jason Evans, 2010-10-21 (2 files, -87/+177)
  Add the "thread.allocated" and "thread.deallocated" mallctls, which can
  be used to query the total number of bytes ever allocated/deallocated by
  the calling thread.
  Add s2u() and sa2u(), which can be used to compute the usable size that
  will result from an allocation request of a particular size/alignment.
  Re-factor ipalloc() to use sa2u().
  Enhance the heap profiler to trigger samples based on usable size, rather
  than request size. This has a subtle, but important, impact on the
  accuracy of heap sampling. For example, previous to this change, 16- and
  17-byte objects were sampled at nearly the same rate, but 17-byte objects
  actually consume 32 bytes each. Therefore it was possible for the sample
  to be somewhat skewed compared to actual memory usage of the allocated
  objects.

* Fix numerous arena bugs.
  Jason Evans, 2010-10-18 (1 file, -4/+2)
  In arena_ralloc_large_grow(), update the map element for the end of the
  newly grown run, rather than the interior map element that was the
  beginning of the appended run. This is a long-standing bug, and it had
  the potential to cause massive corruption, but triggering it required
  roughly the following sequence of events:
  1) Large in-place growing realloc(), with left-over space in the run
     that followed the large object.
  2) Allocation of the remainder run left over from (1).
  3) Deallocation of the remainder run *before* deallocation of the large
     run, with unfortunate interior map state left over from previous run
     allocation/deallocation activity, such that one or more pages of
     allocated memory would be treated as part of the remainder run during
     run coalescing.
  In summary, this was a bad bug, but it was difficult to trigger.
  In arena_bin_malloc_hard(), if another thread wins the race to allocate
  a bin run, dispose of the spare run via arena_bin_lower_run() rather
  than arena_run_dalloc(), since the run has already been prepared for use
  as a bin run. This bug has existed since March 14, 2010:
  e00572b384c81bd2aba57fac32f7077a34388915 (mmap()/munmap() without
  arena->lock or bin->lock).
  Fix bugs in arena_dalloc_bin_run(), arena_trim_head(),
  arena_trim_tail(), and arena_ralloc_large_grow() that could cause the
  CHUNK_MAP_UNZEROED map bit to become corrupted. These are all
  long-standing bugs, but the chances of them actually causing problems
  was much lower before the CHUNK_MAP_ZEROED --> CHUNK_MAP_UNZEROED
  conversion.
  Fix a large run statistics regression in arena_ralloc_large_grow() that
  was introduced on September 17, 2010:
  8e3c3c61b5bb676a705450708e7e79698cdc9e0c (Add {,r,s,d}allocm()).
  Add debug code to validate that supposedly pre-zeroed memory really is.

* Increase PRN 'a' and 'c' constants.
  Jason Evans, 2010-10-03 (1 file, -1/+1)
  Increase PRN 'a' and 'c' constants, so that high bits tend to cascade
  more.
* Increase default backtrace depth from 4 to 128.
  Jason Evans, 2010-10-03 (1 file, -4/+7)
  Increase the default backtrace depth, because shallow backtraces tend to
  result in confusing pprof output graphs.

* Make cumulative heap profile data optional.
  Jason Evans, 2010-10-03 (2 files, -21/+43)
  Add the R option to control whether cumulative heap profile data are
  maintained. Add the T option to control the size of per thread backtrace
  caches, primarily because when the R option is specified, backtraces that
  no longer have allocations associated with them are discarded as soon as
  no thread caches refer to them.

* Change CHUNK_MAP_ZEROED to CHUNK_MAP_UNZEROED.
  Jason Evans, 2010-10-02 (1 file, -6/+6)
  Invert the chunk map bit that tracks whether a page is zeroed, so that
  for zeroed arena chunks, the interior of the page map does not need to be
  initialized (as it consists entirely of zero bytes).

* Omit chunk header in arena chunk map.
  Jason Evans, 2010-10-02 (4 files, -10/+20)
  Omit the first map_bias elements of the map in arena_chunk_t. This avoids
  barely spilling over into an extra chunk header page for common chunk
  sizes.

* Disable interval-based profile dumps by default.
  Jason Evans, 2010-10-01 (1 file, -1/+1)
  It is common to have to specify something like JEMALLOC_OPTIONS=F31i,
  because interval-based dumps are often not useful or too expensive.
  Therefore, disable interval-based dumps by default. To get the previous
  default behavior it is now necessary to specify 31I as part of the
  options.

* Add the "arenas.purge" mallctl.
  Jason Evans, 2010-09-30 (1 file, -3/+4)

* Fix compiler warnings and errors.
  Jason Evans, 2010-09-21 (1 file, -1/+2)
  Use INT_MAX instead of MAX_INT in ALLOCM_ALIGN(), and #include
  <limits.h> in order to get its definition.
  Modify prof code related to hash tables to avoid aliasing warnings from
  gcc 4.1.2 (gcc 4.4.0 and 4.4.3 do not warn).

* Fix compiler warnings.
  Jason Evans, 2010-09-21 (2 files, -15/+18)
  Add --enable-cc-silence, which can be used to silence harmless warnings.
  Fix an aliasing bug in ckh_pointer_hash().

* Add memalign() and valloc() overrides.
  Jason Evans, 2010-09-20 (1 file, -0/+7)
  If memalign() and/or valloc() are present on the system, override them in
  order to avoid mixed allocator usage.

* Wrap strerror_r().
  Jason Evans, 2010-09-20 (1 file, -2/+3)
  Create the buferror() function, which wraps strerror_r(). This is
  necessary because glibc provides a non-standard strerror_r().
* Add gcc attributes for *allocm() prototypes.
  Jason Evans, 2010-09-18 (1 file, -4/+6)

* Add {,r,s,d}allocm().
  Jason Evans, 2010-09-17 (4 files, -18/+93)
  Add allocm(), rallocm(), sallocm(), and dallocm(), which are a functional
  superset of malloc(), calloc(), posix_memalign(),
  malloc_usable_size(), and free().

* Fix porting regressions.
  Jason Evans, 2010-09-12 (1 file, -2/+2)
  Fix new build failures and test failures on Linux that were introduced by
  the port to OS X.

* Port to Mac OS X.
  Jason Evans, 2010-09-12 (8 files, -58/+333)
  Add Mac OS X support, based in large part on the OS X support in
  Mozilla's version of jemalloc.

* Add MAP_NORESERVE support.
  Jordan DeLong, 2010-05-11 (1 file, -0/+1)
  Add MAP_NORESERVE to the chunk_mmap() case being used by
  chunk_swap_enable(), if the system supports it.