path: root/doc

Commit message / Author / Age / Files / Lines

* Fix huge allocation statistics.  (Jason Evans, 2014-10-15; 1 file, -3/+2)

* Update size class documentation.  (Jason Evans, 2014-10-15; 1 file, -26/+84)

* Add per size class huge allocation statistics.  (Jason Evans, 2014-10-13; 1 file, -17/+81)

  Add per size class huge allocation statistics, and normalize various stats:
  - Change the arenas.nlruns type from size_t to unsigned.
  - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctl's.
  - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with
    stats.arenas.<i>.bins.<j>.curregs.
  - Add the stats.arenas.<i>.hchunks.<j>.nmalloc,
    stats.arenas.<i>.hchunks.<j>.ndalloc,
    stats.arenas.<i>.hchunks.<j>.nrequests, and
    stats.arenas.<i>.hchunks.<j>.curhchunks mallctl's.

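  A minimal sketch of how an application might read the new size-class
  introspection values through mallctl(); the mallctl names are the ones
  listed above, everything else (variable names, error handling style) is
  illustrative, and assumes a jemalloc build that contains this change:

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          /* Number of huge size classes (unsigned, per the commit above). */
          unsigned nhchunks;
          size_t sz = sizeof(nhchunks);
          if (mallctl("arenas.nhchunks", &nhchunks, &sz, NULL, 0) != 0)
              return 1;

          /* Size of the first huge size class. */
          size_t hsize;
          sz = sizeof(hsize);
          if (mallctl("arenas.hchunks.0.size", &hsize, &sz, NULL, 0) != 0)
              return 1;

          printf("%u huge size classes; smallest is %zu bytes\n", nhchunks, hsize);
          return 0;
      }
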
* Avoid atexit(3) when possible, disable prof_final by default.  (Jason Evans, 2014-10-09; 1 file, -3/+15)

  atexit(3) can deadlock internally during its own initialization if jemalloc
  calls atexit() during jemalloc initialization. Mitigate the impact by
  restructuring prof initialization to avoid calling atexit() unless the
  registered function will actually dump a final heap profile. Additionally,
  disable prof_final by default so that this land mine is opt-in rather than
  opt-out.

  This resolves #144.

* Fix a docbook element nesting nit.  (Jason Evans, 2014-10-05; 1 file, -4/+4)

  According to the docbook documentation for <funcprototype>, its parent must
  be <funcsynopsis>; fix accordingly. Nonetheless, the man page processor
  fails badly when this construct is embedded in a <para> (which is documented
  to be legal), although the html processor does fine.

* Attempt to expand huge allocations in-place.  (Daniel Micay, 2014-10-05; 1 file, -2/+5)

  This adds support for expanding huge allocations in-place by requesting
  memory at a specific address from the chunk allocator. It's currently only
  implemented for the chunk recycling path, although in theory it could also
  be done by optimistically allocating new chunks.

  On Linux, it could attempt an in-place mremap. However, that won't work in
  practice since the heap is grown downwards and memory is not unmapped (in a
  normal build, at least).

  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark
  once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of
  copies via mremap, which is currently not something that jemalloc can use.

  With this improvement, jemalloc still fails to do any in-place huge
  reallocations for the first outer loop, but then succeeds 100% of the time
  for the remaining 99 iterations. The time spent doing allocations and copies
  drops down to under 5%, with nearly all of it spent doing purging + faulting
  (when huge pages are disabled) and the array memset.

  An improved mremap API (MREMAP_RETAIN - #138) would be far more general but
  this is a portable optimization and would still be useful on Linux for
  xallocx.

  Numbers with transparent huge pages enabled:

      glibc (copies elided via MREMAP_MAYMOVE): 8.471s
      jemalloc: 17.816s
      jemalloc + no-op madvise: 13.236s
      jemalloc + this commit: 6.787s
      jemalloc + this commit + no-op madvise: 6.144s

  Numbers with transparent huge pages disabled:

      glibc (copies elided via MREMAP_MAYMOVE): 15.403s
      jemalloc: 39.456s
      jemalloc + no-op madvise: 12.768s
      jemalloc + this commit: 15.534s
      jemalloc + this commit + no-op madvise: 6.354s

  Closes #137

* Add missing header includes in jemalloc/jemalloc.h.  (Jason Evans, 2014-10-05; 1 file, -2/+1)

  Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that
  applications only have to #include <jemalloc/jemalloc.h>.

  This resolves #132.

* Implement/test/fix prof-related mallctl's.  (Jason Evans, 2014-10-04; 1 file, -6/+46)

  Implement/test/fix the opt.prof_thread_active_init,
  prof.thread_active_init, and thread.prof.active mallctl's.

  Test/fix the thread.prof.name mallctl.

  Refactor opt_prof_active to be read-only and move mutable state into the
  prof_active variable. Stop leaning on ctl-related locking for protection.

* Test prof.reset mallctl and fix numerous discovered bugs.  (Jason Evans, 2014-10-03; 1 file, -2/+3)

* Add support for sized deallocation.  (Daniel Micay, 2014-09-09; 1 file, -1/+18)

  This adds a new `sdallocx` function to the external API, allowing the size
  to be passed by the caller. It avoids some extra reads in the thread cache
  fast path. In the case where stats are enabled, this avoids the work of
  calculating the size from the pointer.

  An assertion validates the size that's passed in, so enabling debugging will
  allow users of the API to debug cases where an incorrect size is passed in.

  The performance win for a contrived microbenchmark doing an allocation and
  immediately freeing it is ~10%. It may have a different impact on a real
  workload.

  Closes #28

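  A minimal usage sketch of the sized-deallocation API; sdallocx() and
  mallocx() are jemalloc's non-standard entry points, the rest (buffer size,
  variable names) is illustrative only:

      #include <stddef.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          size_t len = 4096;
          void *buf = mallocx(len, 0);   /* allocate len bytes */
          if (buf == NULL)
              return 1;
          /* ... use buf ... */
          sdallocx(buf, len, 0);         /* free, telling jemalloc the size */
          return 0;
      }
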
* Implement per thread heap profiling.  (Jason Evans, 2014-08-20; 1 file, -1/+55)

  Rename data structures (prof_thr_cnt_t --> prof_tctx_t,
  prof_ctx_t --> prof_gctx_t), and convert to storing a prof_tctx_t for
  sampled objects.

  Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace
  depth within jemalloc functions is no longer an issue (pprof prunes
  irrelevant frames).

  Implement mallctl's:
  - prof.reset implements full sample data reset, and optional change of
    sample interval.
  - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was
    the permanent source of truth prior to prof.reset).
  - thread.prof.name provides naming capability for threads within heap
    profile dumps.
  - thread.prof.active makes it possible to activate/deactivate heap profiling
    for individual threads.

  Modify the heap dump files to contain per thread heap profile data. This
  change is incompatible with the existing pprof, which will require
  enhancements to read and process the enriched data.

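  A minimal sketch of how a thread might use the new per-thread mallctl's
  listed above. It assumes a build with heap profiling compiled in and
  enabled; error handling is elided and the thread name is hypothetical:

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      void label_and_pause_profiling(void) {
          /* Name that will appear for this thread in heap profile dumps. */
          const char *name = "worker";   /* hypothetical thread name */
          mallctl("thread.prof.name", NULL, NULL, &name, sizeof(name));

          /* Temporarily stop sampling allocations made by this thread. */
          bool active = false;
          mallctl("thread.prof.active", NULL, NULL, &active, sizeof(active));
      }
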
* Minor doc edit.  (Jason Evans, 2014-05-16; 1 file, -4/+4)

* Refactor huge allocation to be managed by arenas.  (Jason Evans, 2014-05-16; 1 file, -63/+65)

  Refactor huge allocation to be managed by arenas (though the global
  red-black tree of huge allocations remains for lookup during deallocation).
  This is the logical conclusion of recent changes that 1) made per arena dss
  precedence apply to huge allocation, and 2) made it possible to replace the
  per arena chunk allocation/deallocation functions.

  Remove the top level huge stats, and replace them with per arena huge stats.

  Normalize function names and types to *dalloc* (some were *dealloc*).

  Remove the --enable-mremap option. As jemalloc currently operates, this is
  a performance regression for some applications, but planned work to
  logarithmically space huge size classes should provide similar amortized
  performance. The motivation for this change was that mremap-based huge
  reallocation forced leaky abstractions that prevented refactoring.

* Add support for user-specified chunk allocators/deallocators.  (aravind, 2014-05-12; 1 file, -0/+63)

  Add new mallctl endpoints "arena.<i>.chunk.alloc" and
  "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk
  allocator and deallocator on a per-arena basis.

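  A sketch of how an application might install its own chunk allocator through
  these endpoints. The hook signatures follow the chunk_alloc_t /
  chunk_dalloc_t typedefs documented in the jemalloc manual of this era, and
  the endpoint spelling ("dealloc" vs. "dalloc") changed in a later commit, so
  treat the details below as assumptions rather than a definitive reference:

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <jemalloc/jemalloc.h>

      /*
       * Candidate hooks.  A real chunk allocator must return memory aligned to
       * `alignment` (a power of two), so this sketch over-allocates with mmap
       * and trims the excess.
       */
      static void *
      my_chunk_alloc(size_t size, size_t alignment, bool *zero, unsigned arena_ind)
      {
          (void)arena_ind;
          size_t map_size = size + alignment;
          char *map = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (map == MAP_FAILED)
              return NULL;
          uintptr_t aligned = ((uintptr_t)map + alignment - 1) &
              ~((uintptr_t)alignment - 1);
          size_t lead = aligned - (uintptr_t)map;
          if (lead != 0)
              munmap(map, lead);                         /* trim leading slack */
          if (map_size - lead > size)
              munmap((char *)aligned + size, map_size - lead - size);
          *zero = true;   /* freshly mmap'ed pages are zero-filled */
          return (void *)aligned;
      }

      static bool
      my_chunk_dealloc(void *chunk, size_t size, unsigned arena_ind)
      {
          (void)arena_ind;
          return munmap(chunk, size) != 0;   /* false indicates success */
      }

      /* Write the hook function pointers into one arena's mallctl endpoints. */
      int
      install_chunk_hooks(unsigned arena_ind)
      {
          char name[64];
          void *(*alloc_hook)(size_t, size_t, bool *, unsigned) = my_chunk_alloc;
          bool (*dealloc_hook)(void *, size_t, unsigned) = my_chunk_dealloc;

          snprintf(name, sizeof(name), "arena.%u.chunk.alloc", arena_ind);
          if (mallctl(name, NULL, NULL, &alloc_hook, sizeof(alloc_hook)) != 0)
              return 1;
          snprintf(name, sizeof(name), "arena.%u.chunk.dealloc", arena_ind);
          return mallctl(name, NULL, NULL, &dealloc_hook, sizeof(dealloc_hook));
      }
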
* Optimize Valgrind integration.  (Jason Evans, 2014-04-15; 1 file, -1/+2)

  Forcefully disable tcache if running inside Valgrind, and remove Valgrind
  calls in tcache-specific code.

  Restructure Valgrind-related code to move most Valgrind calls out of the
  fast path functions.

  Take advantage of static knowledge to elide some branches in
  JEMALLOC_VALGRIND_REALLOC().

* Remove the "opt.valgrind" mallctl.  (Jason Evans, 2014-04-15; 1 file, -13/+0)

  Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc
  automatically detects whether it is running inside valgrind.

* Remove the "arenas.purge" mallctl.  (Jason Evans, 2014-04-15; 1 file, -11/+1)

  Remove the "arenas.purge" mallctl, which was obsoleted by the
  "arena.<i>.purge" mallctl in 3.1.0.

* Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug.  (Jason Evans, 2014-04-15; 1 file, -16/+13)

  Make dss non-optional on all platforms which support sbrk(2).

  Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
  "secondary" precedence is specified, but sbrk(2) is not supported.

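  A minimal sketch of setting the dss precedence for one arena via the
  "arena.<i>.dss" mallctl described above; the behavior of interest is the
  non-zero error return when sbrk(2) is unavailable. Only the mallctl name
  and the "primary" value come from the docs; the rest is illustrative:

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      /* Returns 0 on success, or an error value (e.g. when sbrk(2) is unsupported). */
      int set_dss_primary(unsigned arena_ind) {
          char name[48];
          const char *dss = "primary";
          snprintf(name, sizeof(name), "arena.%u.dss", arena_ind);
          return mallctl(name, NULL, NULL, &dss, sizeof(dss));
      }
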
* Update MALLOCX_ARENA() documentation.  (Jason Evans, 2014-04-15; 1 file, -4/+4)

  Update MALLOCX_ARENA() documentation to no longer claim that it has no
  effect for huge region allocations.

* Remove the *allocm() API, which is superseded by the *allocx() API.  (Jason Evans, 2014-04-15; 1 file, -189/+2)

* Document how dss precedence affects huge allocation.  (Jason Evans, 2014-03-31; 1 file, -2/+6)

* Extract profiling code from [re]allocation functions.  (Jason Evans, 2014-01-12; 1 file, -10/+16)

  Extract profiling code from malloc(), imemalign(), calloc(), realloc(),
  mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of
  code compiled into the fast paths, but the primary benefit is the
  combinatorial complexity reduction.

  Simplify iralloc[t]() by creating a separate ixalloc() that handles the
  no-move cases.

  Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make
  request size overflows due to size class and/or alignment constraints
  trigger undefined behavior (detected by debug-only assertions).

  Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
  backtrace creation in imemalign(). This bug impacted posix_memalign() and
  aligned_alloc().

* Fix a few mallctl() documentation errors.  (Jason Evans, 2013-12-20; 1 file, -17/+20)

  Normalize mallctl() order (code and documentation).

* Add mallctl*() unit tests.  (Jason Evans, 2013-12-20; 1 file, -3/+2)

* Remove ENOMEM from the documented set of *mallctl() errors.  (Jason Evans, 2013-12-18; 1 file, -6/+0)

  *mallctl() always returns EINVAL and does partial result copying when
  *oldlenp is too short to hold the requested value, rather than returning
  ENOMEM. Therefore remove ENOMEM from the documented set of possible errors.

* Implement the *allocx() API.  (Jason Evans, 2013-12-13; 1 file, -47/+201)

  Implement the *allocx() API, which is a successor to the *allocm() API.
  The *allocx() functions are slightly simpler to use because they have fewer
  parameters, they directly return the results of primary interest, and
  mallocx()/rallocx() avoid the strict aliasing pitfall that
  allocm()/rallocm() share with posix_memalign().

  The following code violates strict aliasing rules:

      foo_t *foo;
      allocm((void **)&foo, NULL, 42, 0);

  whereas the following is safe:

      foo_t *foo;
      void *p;
      allocm(&p, NULL, 42, 0);
      foo = (foo_t *)p;

  mallocx() does not have this problem:

      foo_t *foo = (foo_t *)mallocx(42, 0);

* Fix ALLOCM_ARENA(a) handling in rallocm().  (Jason Evans, 2013-11-26; 1 file, -4/+6)

  Fix rallocm() to use the specified arena for allocation, not just
  deallocation.

  Clarify ALLOCM_ARENA(a) documentation.

* Add ids for all mallctl entries.  (Jason Evans, 2013-10-30; 1 file, -69/+69)

  Add ids for all mallctl entries, so that external documents can link to
  arbitrary mallctl entries.

* Clarify how to use malloc_conf.  (Jason Evans, 2013-03-19; 1 file, -1/+8)

  Clarify that malloc_conf is intended only for compile-time configuration,
  since jemalloc may be initialized before main() is entered.

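  A minimal sketch of the compile-time usage the commit above describes: the
  application defines the malloc_conf string as a global, so it is already in
  place when jemalloc initializes. This assumes a build without a jemalloc
  symbol prefix, and the option string shown is just an example:

      #include <stdlib.h>

      /* Picked up by jemalloc during initialization, before main() runs. */
      const char *malloc_conf = "narenas:2,lg_chunk:22";

      int main(void) {
          void *p = malloc(64);
          free(p);
          return 0;
      }
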
* Add clipping support to lg_chunk option processing.  (Jason Evans, 2012-12-23; 1 file, -2/+5)

  Modify processing of the lg_chunk option so that it clips an out-of-range
  input to the edge of the valid range. This makes it possible to request the
  minimum possible chunk size without intimate knowledge of allocator
  internals.

  Submitted by Ian Lepore (see FreeBSD PR bin/174641).

* Document what stats.active does not track.  (Jan Beich, 2012-11-07; 1 file, -2/+4)

  Based on
  http://www.canonware.com/pipermail/jemalloc-discuss/2012-March/000164.html

* Purge unused dirty pages in a fragmentation-reducing order.  (Jason Evans, 2012-11-06; 1 file, -1/+1)

  Purge unused dirty pages in an order that first performs clean/dirty run
  defragmentation, in order to mitigate available run fragmentation.

  Remove the limitation that prevented purging unless at least one chunk
  worth of dirty pages had accumulated in an arena. This limitation was
  intended to avoid excessive purging for small applications, but the
  threshold was arbitrary, and its effect was of questionable utility.

  Relax opt_lg_dirty_mult from 5 to 3. This compensates for increased
  likelihood of allocating clean runs, given the same ratio of clean:dirty
  runs, and reduces the potential for repeated purging in pathological large
  malloc/free loops that push the active:dirty page ratio just over the purge
  threshold.

* Add arena-specific and selective dss allocation.  (Jason Evans, 2012-10-13; 1 file, -9/+80)

  Add the "arenas.extend" mallctl, so that it is possible to create new
  arenas that are outside the set that jemalloc automatically multiplexes
  threads onto.

  Add the ALLOCM_ARENA() flag for {,r,d}allocm(), so that it is possible to
  explicitly allocate from a particular arena.

  Add the "opt.dss" mallctl, which controls the default precedence of dss
  allocation relative to mmap allocation.

  Add the "arena.<i>.dss" mallctl, which makes it possible to set the default
  dss precedence on a per arena or global basis.

  Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".

  Add the "stats.arenas.<i>.dss" mallctl.

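  A minimal sketch of the workflow these additions enable: create a dedicated
  arena with "arenas.extend", then allocate from it explicitly with
  ALLOCM_ARENA(). This uses the experimental *allocm() API of that era (later
  superseded by *allocx() and eventually removed), so it assumes a build with
  the experimental API enabled; error handling and names are illustrative:

      #include <stddef.h>
      #include <jemalloc/jemalloc.h>

      int use_private_arena(void) {
          /* Create a new arena outside the automatically multiplexed set. */
          unsigned arena_ind;
          size_t sz = sizeof(arena_ind);
          if (mallctl("arenas.extend", &arena_ind, &sz, NULL, 0) != 0)
              return 1;

          /* Explicitly allocate from that arena. */
          void *p;
          if (allocm(&p, NULL, 4096, ALLOCM_ARENA(arena_ind)) != ALLOCM_SUCCESS)
              return 1;
          dallocm(p, 0);
          return 0;
      }
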
* Disable tcache by default if running inside Valgrind.  (Jason Evans, 2012-05-16; 1 file, -1/+2)

  Disable tcache by default if running inside Valgrind, in order to avoid
  making unallocated objects appear reachable to Valgrind.

* Auto-detect whether running inside Valgrind.  (Jason Evans, 2012-05-15; 1 file, -16/+11)

  Auto-detect whether running inside Valgrind, thus removing the need to
  manually specify MALLOC_CONF=valgrind:true.

* Generalize "stats.mapped" documentation.  (Jason Evans, 2012-05-10; 1 file, -2/+2)

  Generalize "stats.mapped" documentation to state that all inactive chunks
  are omitted, now that it is possible for mmap'ed chunks to be omitted in
  addition to DSS chunks.

* Add the --enable-mremap option.  (Jason Evans, 2012-05-09; 1 file, -0/+10)

  Add the --enable-mremap option, and disable the use of mremap(2) by
  default, for the same reason that freeing chunks via munmap(2) is disabled
  by default on Linux: semi-permanent VM map fragmentation.

* Fix Valgrind URL in documentation.  (Jason Evans, 2012-04-26; 1 file, -20/+20)

  Reported by Daichi GOTO.

* Fix a memory corruption bug in chunk_alloc_dss().  (Jason Evans, 2012-04-21; 1 file, -2/+2)

  Fix a memory corruption bug in chunk_alloc_dss() that was due to claiming
  newly allocated memory is zeroed.

  Reverse order of preference between mmap() and sbrk() to prefer mmap().

  Clean up management of 'zero' parameter in chunk_alloc*().

* Update prof defaults to match common usage.  (Jason Evans, 2012-04-17; 1 file, -17/+28)

  Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).

  Change the "opt.prof_accum" default from true to false.

  Add the "opt.prof_final" mallctl, so that "opt.prof_prefix" need not be
  abused to disable final profile dumping.

* Update pprof (from gperftools 2.0).  (Jason Evans, 2012-04-17; 1 file, -1/+1)

* Add the --disable-munmap option.  (Jason Evans, 2012-04-17; 1 file, -0/+10)

  Add the --disable-munmap option, remove the configure test that attempted
  to detect the VM allocation quirk known to exist on Linux x86[_64], and
  make --disable-munmap implicit on Linux.

* Always disable redzone by default.  (Jason Evans, 2012-04-13; 1 file, -3/+1)

  Always disable redzone by default, even when --enable-debug is specified.
  The memory overhead for redzones can be substantial, which makes this
  feature something that should only be opted into.

* Implement Valgrind support, redzones, and quarantine.  (Jason Evans, 2012-04-11; 1 file, -4/+75)

  Implement Valgrind support, as well as the redzone and quarantine features,
  which help Valgrind detect memory errors. Redzones are only implemented for
  small objects because the changes necessary to support redzones around
  large and huge objects are complicated by in-place reallocation, to the
  point that it isn't clear that the maintenance burden is worth the
  incremental improvement to Valgrind support.

  Merge arena_salloc() and arena_salloc_demote().

  Refactor i[v]salloc() to expose the 'demote' option.

* Add utrace(2)-based tracing (--enable-utrace).  (Jason Evans, 2012-04-05; 1 file, -0/+25)

* Remove obsolete "config.dynamic_page_shift" mallctl documentation.  (Jason Evans, 2012-04-03; 1 file, -10/+0)

* Clean up *PAGE* macros.  (Jason Evans, 2012-04-02; 1 file, -10/+1)

  s/PAGE_SHIFT/LG_PAGE/g and s/PAGE_SIZE/PAGE/g.

  Remove remnants of the dynamic-page-shift code.

  Rename the "arenas.pagesize" mallctl to "arenas.page".

  Remove the "arenas.chunksize" mallctl, which is redundant with
  "opt.lg_chunk".

* Add the "thread.tcache.enabled" mallctl.  (Jason Evans, 2012-03-27; 1 file, -0/+14)

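  A minimal sketch of toggling the per-thread tcache switch that this commit
  documents; only the mallctl name comes from the commit above, the rest is
  illustrative:

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      /* Disable the calling thread's tcache; returns 0 on success. */
      int disable_thread_tcache(void) {
          bool enabled = false;
          return mallctl("thread.tcache.enabled", NULL, NULL,
              &enabled, sizeof(enabled));
      }
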
* Fix various documentation formatting regressions.  (Jason Evans, 2012-03-19; 1 file, -18/+20)

* Rename the "tcache.flush" mallctl to "thread.tcache.flush".  (Jason Evans, 2012-03-17; 1 file, -18/+18)