path: root/include
Commit message | Author | Date | Files | Lines
* Convert all tsd variables to reside in a single tsd structure.
  Jason Evans, 2014-09-23 (9 files, -414/+444)

* Apply likely()/unlikely() to allocation/deallocation fast paths.
  Jason Evans, 2014-09-12 (4 files, -46/+53)

* mark some conditions as unlikely
  Daniel Micay, 2014-09-11 (3 files, -10/+10)

    * assertion failure
    * malloc_init failure
    * malloc not already initialized (in malloc_init)
    * running in valgrind
    * thread cache disabled at runtime

  Clang and GCC already consider a comparison with NULL or -1 to be cold,
  so many branches (out-of-memory) are already correctly considered as
  cold and marking them is not important.
* add likely / unlikely macros
  Daniel Micay, 2014-09-10 (1 file, -0/+8)
* Fix a profile sampling race.
  Jason Evans, 2014-09-10 (2 files, -20/+18)

  Fix a profile sampling race that was due to preparing to sample, yet
  doing nothing to assure that the context remains valid until the stats
  are updated.

  These regressions were caused by 602c8e0971160e4b85b08b16cf8a2375aa24bc04
  (Implement per thread heap profiling.), which did not make it into any
  releases prior to these fixes.

* Fix prof_tdata_get()-related regressions.
  Jason Evans, 2014-09-09 (1 file, -5/+6)

  Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
  (when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

  Fix prof_tdata_get() callers to check for invalid results besides NULL
  (PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).

  These regressions were caused by 602c8e0971160e4b85b08b16cf8a2375aa24bc04
  (Implement per thread heap profiling.), which did not make it into any
  releases prior to these fixes.

* fix isqalloct (should call isdalloct)
  Daniel Micay, 2014-09-09 (1 file, -1/+1)
* Add support for sized deallocation.
  Daniel Micay, 2014-09-09 (4 files, -3/+60)

  This adds a new `sdallocx` function to the external API, allowing the
  size to be passed by the caller. It avoids some extra reads in the thread
  cache fast path. In the case where stats are enabled, this avoids the
  work of calculating the size from the pointer.

  An assertion validates the size that's passed in, so enabling debugging
  will allow users of the API to debug cases where an incorrect size is
  passed in.

  The performance win for a contrived microbenchmark doing an allocation
  and immediately freeing it is ~10%. It may have a different impact on a
  real workload.

  Closes #28
* Add relevant function attributes to [msn]allocx().
  Jason Evans, 2014-09-08 (1 file, -3/+6)

* Move typedefs from jemalloc_protos.h.in to jemalloc_typedefs.h.in.
  Jason Evans, 2014-09-08 (3 files, -4/+3)

  Move typedefs from jemalloc_protos.h.in to jemalloc_typedefs.h.in, so
  that typedefs aren't redefined when compiling stress tests.

* Optimize [nmd]alloc() fast paths.
  Jason Evans, 2014-09-07 (4 files, -29/+31)

  Optimize [nmd]alloc() fast paths such that the (flags == 0) case is
  streamlined, flags decoding only happens to the minimum degree necessary,
  and no conditionals are repeated.

* Whitespace cleanups.
  Jason Evans, 2014-09-05 (1 file, -1/+1)

* Refactor chunk map.
  Qinfan Wu, 2014-09-05 (4 files, -45/+70)

  Break the chunk map into two separate arrays, in order to improve cache
  locality. This is related to issue #23.
* Test for availability of malloc hooks via autoconf
  Sara Golemon, 2014-08-22 (1 file, -0/+6)

  __*_hook() is glibc, but on at least one glibc platform (homebrew), the
  __GLIBC__ define isn't set correctly and we miss being able to use these
  hooks. Do a feature test for it during configuration so that we enable
  it anywhere the hooks are actually available.

* Implement per thread heap profiling.
  Jason Evans, 2014-08-20 (5 files, -266/+223)

  Rename data structures (prof_thr_cnt_t-->prof_tctx_t,
  prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for
  sampled objects.

  Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
  depth within jemalloc functions is no longer an issue (pprof prunes
  irrelevant frames).

  Implement mallctls:
  - prof.reset implements full sample data reset, and optional change of
    sample interval.
  - prof.lg_sample reads the current sample interval (opt.lg_prof_sample
    was the permanent source of truth prior to prof.reset).
  - thread.prof.name provides naming capability for threads within heap
    profile dumps.
  - thread.prof.active makes it possible to activate/deactivate heap
    profiling for individual threads.

  Modify the heap dump files to contain per thread heap profile data. This
  change is incompatible with the existing pprof, which will require
  enhancements to read and process the enriched data.
* Add rb_empty().
  Jason Evans, 2014-08-20 (1 file, -0/+13)

* Dump heap profile backtraces in a stable order.
  Jason Evans, 2014-08-20 (1 file, -10/+14)

  Also iterate over per thread stats in a stable order, which prepares the
  way for stable ordering of per thread heap profile dumps.

* Directly embed prof_ctx_t's bt.
  Jason Evans, 2014-08-20 (1 file, -5/+8)

* Convert prof_tdata_t's bt2cnt to a comprehensive map.
  Jason Evans, 2014-08-20 (1 file, -16/+8)

  Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's extant
  allocation samples (do not limit the total number of entries). This
  helps prepare the way for per thread heap profiling.

* Fix and refactor runs_dirty-based purging.
  Jason Evans, 2014-08-14 (1 file, -23/+11)

  Fix runs_dirty-based purging to also purge dirty pages in the spare
  chunk.

  Refactor runs_dirty manipulation into arena_dirty_{insert,remove}(), and
  move the arena->ndirty accounting into those functions.

  Remove the u.ql_link field from arena_chunk_map_t, and get rid of the
  enclosing union for u.rb_link, since only rb_link remains.

  Remove the ndirty field from arena_chunk_t.
* arena->npurgatory is no longer needed since we drop arena's lock after
  stashing all the purgeable runs.
  Qinfan Wu, 2014-08-12 (1 file, -8/+0)

* Remove chunks_dirty tree, nruns_avail and nruns_adjac since we no longer
  need to maintain the tree for dirty page purging.
  Qinfan Wu, 2014-08-12 (1 file, -19/+0)

* Maintain all the dirty runs in a linked list for each arena
  Qinfan Wu, 2014-08-12 (1 file, -0/+6)
* Add atomic operations tests and fix latent bugs.
  Jason Evans, 2014-08-07 (1 file, -12/+29)

* Add OpenRISC/or1k LG_QUANTUM size definition
  Manuel A. Fernandez Montecelo, 2014-07-29 (1 file, -0/+3)

* Allow to build with clang-cl
  Mike Hommey, 2014-06-12 (1 file, -0/+4)

* Add check for madvise(2) to configure.ac.
  Richard Diamond, 2014-06-03 (1 file, -0/+5)

  Some platforms, such as Google's Portable Native Client, use Newlib and
  thus lack access to madvise(2). In those instances, pages_purge() is
  transformed into a no-op.

* Try to use __builtin_ffsl if ffsl is unavailable.
  Richard Diamond, 2014-06-02 (6 files, -9/+43)

  Some platforms (like those using Newlib) don't have ffs/ffsl. This
  commit adds a check to configure.ac for __builtin_ffsl if ffsl isn't
  found. __builtin_ffsl performs the same function as ffsl, and has the
  added benefit of being available on any platform utilizing a
  GCC-compatible compiler. This change does not address the use of ffs in
  the MALLOCX_ARENA() macro.
* Fix fallback lg_floor() implementations.
  Jason Evans, 2014-06-02 (1 file, -10/+16)

* Don't use msvc_compat's C99 headers with MSVC versions that have (some)
  C99 support
  Mike Hommey, 2014-06-02 (3 files, -0/+0)

* Use KQU() rather than QU() where applicable.
  Jason Evans, 2014-05-29 (2 files, -6/+6)

  Fix KZI() and KQI() to append LL rather than ULL.
* Add size class computation capability.
  Jason Evans, 2014-05-29 (7 files, -62/+406)

  Add size class computation capability, currently used only as validation
  of the size class lookup tables. Generalize the size class spacing used
  for bins, for eventual use throughout the full range of allocation
  sizes.

* Move platform headers and tricks from jemalloc_internal.h.in to a new
  jemalloc_internal_decls.h header
  Mike Hommey, 2014-05-28 (3 files, -56/+59)

* Move __func__ to jemalloc_internal_macros.h
  Mike Hommey, 2014-05-27 (2 files, -1/+4)

  test/integration/aligned_alloc.c needs it.

* Use ULL suffix instead of LLU for unsigned long longs
  Mike Hommey, 2014-05-27 (1 file, -4/+4)

  MSVC only supports the former.

* Refactor huge allocation to be managed by arenas.
  Jason Evans, 2014-05-16 (11 files, -59/+35)

  Refactor huge allocation to be managed by arenas (though the global
  red-black tree of huge allocations remains for lookup during
  deallocation). This is the logical conclusion of recent changes that
  1) made per arena dss precedence apply to huge allocation, and 2) made
  it possible to replace the per arena chunk allocation/deallocation
  functions.

  Remove the top level huge stats, and replace them with per arena huge
  stats.

  Normalize function names and types to *dalloc* (some were *dealloc*).

  Remove the --enable-mremap option. As jemalloc currently operates, this
  is a performance regression for some applications, but planned work to
  logarithmically space huge size classes should provide similar amortized
  performance. The motivation for this change was that mremap-based huge
  reallocation forced leaky abstractions that prevented refactoring.

* Add support for user-specified chunk allocators/deallocators.
  aravind, 2014-05-12 (7 files, -12/+33)

  Add new mallctl endpoints "arena<i>.chunk.alloc" and
  "arena<i>.chunk.dealloc" to allow userspace to configure jemalloc's
  chunk allocator and deallocator on a per-arena basis.
* Simplify backtracing.
  Jason Evans, 2014-04-23 (1 file, -4/+3)

  Simplify backtracing to not ignore any frames, and compensate for this
  in pprof in order to increase flexibility with respect to function-based
  refactoring even in the presence of non-deterministic inlining. Modify
  pprof to blacklist all jemalloc allocation entry points including
  non-standard ones like mallocx(), and ignore all allocator-internal
  frames. Prior to this change, pprof excluded the specifically
  blacklisted functions from backtraces, but it left allocator-internal
  frames intact.

* prof_backtrace: use unw_backtrace
  Lucian Adrian Grijincu, 2014-04-23 (1 file, -2/+2)

  unw_backtrace:
  - does internal per-thread caching
  - doesn't acquire an internal lock

* Refactor small_size2bin and small_bin2size.
  Jason Evans, 2014-04-17 (4 files, -20/+52)

  Refactor small_size2bin and small_bin2size to be inline functions rather
  than directly accessed arrays.

* Fix debug-only compilation failures.
  Jason Evans, 2014-04-16 (1 file, -3/+2)

  Fix debug-only compilation failures introduced by changes to
  prof_sample_accum_update() in 6c39f9e059d0825f4c29d8cec9f318b798912c3c
  (refactor profiling. only use a bytes till next sample variable.)
* Merge pull request #73 from bmaurer/smallmalloc
  Jason Evans, 2014-04-16 (5 files, -188/+81)

  Smaller malloc hot path

  * Create a const array with only a small bin to size map
    Ben Maurer, 2014-04-16 (4 files, -6/+8)

  * refactor profiling. only use a bytes till next sample variable.
    Ben Maurer, 2014-04-16 (2 files, -149/+70)

  * outline rare tcache_get codepaths
    Ben Maurer, 2014-04-16 (2 files, -33/+3)
* Optimize Valgrind integration.
  Jason Evans, 2014-04-15 (4 files, -86/+121)

  Forcefully disable tcache if running inside Valgrind, and remove
  Valgrind calls in tcache-specific code.

  Restructure Valgrind-related code to move most Valgrind calls out of the
  fast path functions.

  Take advantage of static knowledge to elide some branches in
  JEMALLOC_VALGRIND_REALLOC().

* Remove the "opt.valgrind" mallctl.
  Jason Evans, 2014-04-15 (2 files, -5/+6)

  Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
  automatically detects whether it is running inside valgrind.

* Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.
  Jason Evans, 2014-04-15 (3 files, -5/+2)

  Make dss non-optional on all platforms which support sbrk(2).

  Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
  "secondary" precedence is specified, but sbrk(2) is not supported.

* Remove the *allocm() API, which is superseded by the *allocx() API.
  Jason Evans, 2014-04-15 (5 files, -34/+0)

* Remove support for non-prof-promote heap profiling metadata.
  Jason Evans, 2014-04-11 (5 files, -76/+18)

  Make promotion of sampled small objects to large objects mandatory, so
  that profiling metadata can always be stored in the chunk map, rather
  than requiring one pointer per small region in each small-region page
  run. In practice the non-prof-promote code was only useful when using
  jemalloc to track all objects and report them as leaks at program exit.
  However, Valgrind is at least as good a tool for this particular use
  case.

  Furthermore, the non-prof-promote code is getting in the way of some
  optimizations that will make heap profiling much cheaper for the
  predominant use case (sampling a small representative proportion of all
  allocations).