path: root/doc
Commit message (Author, Date; Files, Lines changed)
* Make *allocx() size class overflow behavior defined. (Jason Evans, 2016-02-25; 1 file, -8/+6)
  Limit supported size and alignment to HUGE_MAXCLASS, which in turn is now limited to be less than PTRDIFF_MAX. This resolves #278 and #295.
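  A minimal sketch of the newly defined behavior (not part of the commit; the request size and alignment below are illustrative): a request whose size plus alignment padding exceeds HUGE_MAXCLASS now fails with NULL instead of overflowing.

      #include <stdint.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          /* Size + alignment padding exceeds any supported size class. */
          void *p = mallocx(SIZE_MAX - 4096, MALLOCX_ALIGN(4096));
          if (p == NULL)
              printf("oversized request rejected\n");
          else
              dallocx(p, 0);
          return 0;
      }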
* Document the heap profile format. (Jason Evans, 2016-02-24; 1 file, -1/+49)
  This resolves #258.
* Update manual to reflect removal of global huge object tree. (Jason Evans, 2016-02-24; 1 file, -16/+11)
  This resolves #323.
* Make opt_narenas unsigned rather than size_t. (Jason Evans, 2016-02-24; 1 file, -1/+1)
* Implement decay-based unused dirty page purging. (Jason Evans, 2016-02-20; 1 file, -1/+94)
  This is an alternative to the existing ratio-based unused dirty page purging, and is intended to eventually become the sole purging mechanism.
  Add mallctls:
  - opt.purge
  - opt.decay_time
  - arena.<i>.decay
  - arena.<i>.decay_time
  - arenas.decay_time
  - stats.arenas.<i>.decay_time
  This resolves #325.
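  A usage sketch (not from the commit) of the decay_time mallctls; it assumes decay purging is in effect, and the one-second value is illustrative only.

      #include <stdio.h>
      #include <sys/types.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          ssize_t decay_time;
          size_t sz = sizeof(decay_time);

          /* Read the default decay time (seconds) applied to new arenas. */
          if (mallctl("arenas.decay_time", &decay_time, &sz, NULL, 0) == 0)
              printf("arenas.decay_time: %zd\n", decay_time);

          /* Shorten it so unused dirty pages are purged more aggressively. */
          ssize_t new_time = 1;
          mallctl("arenas.decay_time", NULL, NULL, &new_time, sizeof(new_time));
          return 0;
      }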
* Add --with-malloc-conf. (Jason Evans, 2016-02-20; 1 file, -8/+20)
  Add --with-malloc-conf, which makes it possible to embed a default options string during configuration.
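  For context, a sketch (not from the commit) of the run-time counterpart: the string embedded via --with-malloc-conf uses the same "option:value,option:value" format an application can supply through the documented malloc_conf symbol. The option values here are illustrative only.

      #include <stdlib.h>
      #include <jemalloc/jemalloc.h>

      /* Same format that --with-malloc-conf bakes in at configure time. */
      const char *malloc_conf = "narenas:2,lg_chunk:21";

      int main(void) {
          void *p = malloc(64);
          free(p);
          return 0;
      }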
* Fix a documentation editing error. (Jason Evans, 2016-02-20; 1 file, -1/+1)
* Fix a manual editing error. (Jason Evans, 2015-10-19; 1 file, -2/+2)
* Improve arena.<i>.chunk_hooks documentation formatting. (Jason Evans, 2015-08-14; 1 file, -37/+46)
* Update in-place reallocation documentation. (Jason Evans, 2015-08-14; 1 file, -3/+9)
* Update large/huge size class cutoff documentation. (Jason Evans, 2015-08-14; 1 file, -9/+9)
* Implement chunk hook support for page run commit/decommit. (Jason Evans, 2015-08-07; 1 file, -24/+37)
  Cascade from decommit to purge when purging unused dirty pages, so that it is possible to decommit cleaned memory rather than just purging. For non-Windows debug builds, decommit runs rather than purging them, since this causes access of deallocated runs to segfault.
  This resolves #251.
* Generalize chunk management hooks. (Jason Evans, 2015-08-04; 1 file, -68/+133)
  Add the "arena.<i>.chunk_hooks" mallctl, which replaces and expands on the "arena.<i>.chunk.{alloc,dalloc,purge}" mallctls. The chunk hooks allow control over chunk allocation/deallocation, decommit/commit, purging, and splitting/merging, such that the application can rely on jemalloc's internal chunk caching and retaining functionality, yet implement a variety of chunk management mechanisms and policies.
  Merge the chunks_[sz]ad_{mmap,dss} red-black trees into chunks_[sz]ad_retained. This slightly reduces how hard jemalloc tries to honor the dss precedence setting; prior to this change the precedence setting was also consulted when recycling chunks.
  Fix chunk purging. Don't purge chunks in arena_purge_stashed(); instead deallocate them in arena_unstash_purged(), so that the dirty memory linkage remains valid until after the last time it is used.
  This resolves #176 and #201.
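  A read-only sketch (not from the commit) of the new mallctl; arena index 0 is illustrative, and a real application would populate its own chunk_hooks_t and write it back through the same name.

      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          chunk_hooks_t hooks;
          size_t sz = sizeof(hooks);

          /* Fetch arena 0's current hooks (alloc, dalloc, commit, decommit,
           * purge, split, merge). */
          if (mallctl("arena.0.chunk_hooks", &hooks, &sz, NULL, 0) != 0) {
              fprintf(stderr, "arena.0.chunk_hooks read failed\n");
              return 1;
          }
          /* Writing customized hooks back would look like:
           * mallctl("arena.0.chunk_hooks", NULL, NULL, &hooks, sizeof(hooks)); */
          printf("read %zu bytes of chunk hooks\n", sz);
          return 0;
      }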
* Add the config.cache_oblivious mallctl. (Jason Evans, 2015-07-17; 1 file, -0/+10)
* Change default chunk size from 256 KiB to 2 MiB. (Jason Evans, 2015-07-16; 1 file, -1/+1)
  This change improves interaction with transparent huge pages, e.g. reduced page faults (at least in the absence of unused dirty page purging).
* Clarify relationship between stats.resident and stats.mapped. (Jason Evans, 2015-05-30; 1 file, -4/+6)
* Rename pprof to jeprof. (Jason Evans, 2015-05-01; 1 file, -2/+3)
  This rename avoids installation collisions with the upstream gperftools. Additionally, jemalloc's per thread heap profile functionality introduced an incompatible file format, so it's now worthwhile to clearly distinguish jemalloc's version of this script from the upstream version.
  This resolves #229.
* Fix mallctl doc: arenas.hchunk.<i>.size. (Qinfan Wu, 2015-04-30; 1 file, -2/+2)
* Add the "stats.arenas.<i>.lg_dirty_mult" mallctl. (Jason Evans, 2015-03-24; 1 file, -0/+12)
* Add the "stats.allocated" mallctl. (Jason Evans, 2015-03-24; 1 file, -3/+20)
* Implement dynamic per arena control over dirty page purging. (Jason Evans, 2015-03-19; 1 file, -8/+80)
  Add mallctls:
  - arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be modified to change the initial lg_dirty_mult setting for newly created arenas.
  - arena.<i>.lg_dirty_mult controls an individual arena's dirty page purging threshold, and synchronously triggers any purging that may be necessary to maintain the constraint.
  - arena.<i>.chunk.purge allows the per arena dirty page purging function to be replaced.
  This resolves #93.
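  A sketch (not from the commit) of driving the per-arena knob through the MIB interface so the arena index can be substituted without re-parsing the name; the index 0 and the value 4 (at most one dirty page per 16 active pages) are illustrative.

      #include <sys/types.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          size_t mib[3];
          size_t miblen = sizeof(mib) / sizeof(mib[0]);

          /* Translate the name once, then reuse the MIB. */
          if (mallctlnametomib("arena.0.lg_dirty_mult", mib, &miblen) != 0)
              return 1;
          mib[1] = 0;                /* arena index */

          ssize_t lg_dirty_mult = 4; /* active:dirty ratio of 16:1 */
          return mallctlbymib(mib, miblen, NULL, NULL,
              &lg_dirty_mult, sizeof(lg_dirty_mult));
      }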
* Change default chunk size from 4 MiB to 256 KiB. (Jason Evans, 2015-03-07; 1 file, -13/+13)
  Recent changes have improved huge allocation scalability, which removes upward pressure to set the chunk size so large that huge allocations are rare. Smaller chunks are more likely to completely drain, so set the default to the smallest size that doesn't leave excessive unusable trailing space in chunk headers.
* Move centralized chunk management into arenas. (Jason Evans, 2015-02-12; 1 file, -34/+1)
  Migrate all centralized data structures related to huge allocations and recyclable chunks into arena_t, so that each arena can manage huge allocations and recyclable virtual memory completely independently of other arenas.
  Add chunk node caching to arenas, in order to avoid contention on the base allocator.
  Use chunks_rtree to look up huge allocations rather than a red-black tree. Maintain a per arena unsorted list of huge allocations (which will be needed to enumerate huge allocations during arena reset).
  Remove the --enable-ivsalloc option, make ivsalloc() always available, and use it for size queries if --enable-debug is enabled. The only practical implications to this removal are that 1) ivsalloc() is now always available during live debugging (and the underlying radix tree is available during core-based debugging), and 2) size query validation can no longer be enabled independent of --enable-debug.
  Remove the stats.chunks.{current,total,high} mallctls, and replace their underlying statistics with simpler atomically updated counters used exclusively for gdump triggering. These statistics are no longer very useful because each arena manages chunks independently, and per arena statistics provide similar information.
  Simplify chunk synchronization code, now that base chunk allocation cannot cause recursive lock acquisition.
* Implement explicit tcache support. (Jason Evans, 2015-02-10; 1 file, -21/+85)
  Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API.
  Add the tcache.create, tcache.flush, and tcache.destroy mallctls.
  This resolves #145.
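  A usage sketch (not from the commit) combining the new mallctls with the MALLOCX_TCACHE() flag; error handling is abbreviated.

      #include <stdlib.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          unsigned tc;
          size_t sz = sizeof(tc);

          /* Create an explicit tcache and obtain its index. */
          if (mallctl("tcache.create", &tc, &sz, NULL, 0) != 0)
              return 1;

          void *p = mallocx(128, MALLOCX_TCACHE(tc));
          if (p != NULL)
              dallocx(p, MALLOCX_TCACHE(tc));

          /* Flush and discard the cache when done with it. */
          mallctl("tcache.destroy", NULL, NULL, &tc, sizeof(tc));
          return 0;
      }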
* Implement the prof.gdump mallctl. (Jason Evans, 2015-01-26; 1 file, -7/+21)
  This feature makes it possible to toggle the gdump feature on/off during program execution, whereas the opt.prof_gdump mallctl value can only be set during program startup.
  This resolves #72.
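  A sketch (not from the commit) of the run-time toggle; it assumes a build configured with --enable-prof and profiling active.

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          bool enable = true;
          /* While enabled, a profile is dumped whenever total chunk memory
           * exceeds its previous maximum. */
          mallctl("prof.gdump", NULL, NULL, &enable, sizeof(enable));
          /* ... allocation-heavy phase ... */
          enable = false;
          mallctl("prof.gdump", NULL, NULL, &enable, sizeof(enable));
          return 0;
      }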
* Implement metadata statistics. (Jason Evans, 2015-01-24; 1 file, -0/+47)
  There are three categories of metadata:
  - Base allocations are used for bootstrap-sensitive internal allocator data structures.
  - Arena chunk headers comprise pages which track the states of the non-metadata pages.
  - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles.
  The metadata statistics comprise the metadata categories as follows:
  - stats.metadata: All metadata -- base + arena chunk headers + internal allocations.
  - stats.arenas.<i>.metadata.mapped: Arena chunk headers.
  - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not.
  Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata.
  This resolves #163.
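  A sketch (not from the commit) of reading the new top-level statistic; it assumes a build with statistics enabled, and uses the epoch mallctl to refresh the cached snapshot first.

      #include <stdint.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          uint64_t epoch = 1;
          size_t sz = sizeof(epoch);
          /* Refresh the cached statistics snapshot. */
          mallctl("epoch", &epoch, &sz, &epoch, sizeof(epoch));

          size_t metadata;
          sz = sizeof(metadata);
          if (mallctl("stats.metadata", &metadata, &sz, NULL, 0) == 0)
              printf("metadata bytes: %zu\n", metadata);
          return 0;
      }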
* Document under what circumstances in-place resizing succeeds. (Jason Evans, 2015-01-22; 1 file, -0/+16)
  This resolves #100.
* Introduce two new modes of junk filling: "alloc" and "free". (Guilherme Goncalves, 2014-12-15; 1 file, -9/+11)
  In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation.
  This resolves #172.
* Fix huge allocation statistics. (Jason Evans, 2014-10-15; 1 file, -3/+2)
* Update size class documentation. (Jason Evans, 2014-10-15; 1 file, -26/+84)
* Add per size class huge allocation statistics. (Jason Evans, 2014-10-13; 1 file, -17/+81)
  Add per size class huge allocation statistics, and normalize various stats:
  - Change the arenas.nlruns type from size_t to unsigned.
  - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctls.
  - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs.
  - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctls.
* Avoid atexit(3) when possible, disable prof_final by default. (Jason Evans, 2014-10-09; 1 file, -3/+15)
  atexit(3) can deadlock internally during its own initialization if jemalloc calls atexit() during jemalloc initialization. Mitigate the impact by restructuring prof initialization to avoid calling atexit() unless the registered function will actually dump a final heap profile.
  Additionally, disable prof_final by default so that this land mine is opt-in rather than opt-out.
  This resolves #144.
* Fix a docbook element nesting nit. (Jason Evans, 2014-10-05; 1 file, -4/+4)
  According to the docbook documentation for <funcprototype>, its parent must be <funcsynopsis>; fix accordingly. Nonetheless, the man page processor fails badly when this construct is embedded in a <para> (which is documented to be legal), although the html processor does fine.
* Attempt to expand huge allocations in-place. (Daniel Micay, 2014-10-05; 1 file, -2/+5)
  This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks.
  On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least).
  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k) but it elides the cost of copies via mremap, which is currently not something that jemalloc can use.
  With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset.
  An improved mremap API (MREMAP_RETAIN - #138) would be far more general but this is a portable optimization and would still be useful on Linux for xallocx.
  Numbers with transparent huge pages enabled:
  - glibc (copies elided via MREMAP_MAYMOVE): 8.471s
  - jemalloc: 17.816s
  - jemalloc + no-op madvise: 13.236s
  - jemalloc + this commit: 6.787s
  - jemalloc + this commit + no-op madvise: 6.144s
  Numbers with transparent huge pages disabled:
  - glibc (copies elided via MREMAP_MAYMOVE): 15.403s
  - jemalloc: 39.456s
  - jemalloc + no-op madvise: 12.768s
  - jemalloc + this commit: 15.534s
  - jemalloc + this commit + no-op madvise: 6.354s
  Closes #137
* Add missing header includes in jemalloc/jemalloc.h. (Jason Evans, 2014-10-05; 1 file, -2/+1)
  Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that applications only have to #include <jemalloc/jemalloc.h>.
  This resolves #132.
* Implement/test/fix prof-related mallctls. (Jason Evans, 2014-10-04; 1 file, -6/+46)
  Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctls.
  Test/fix the thread.prof.name mallctl.
  Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.
* Test prof.reset mallctl and fix numerous discovered bugs. (Jason Evans, 2014-10-03; 1 file, -2/+3)
* Add support for sized deallocation. (Daniel Micay, 2014-09-09; 1 file, -1/+18)
  This adds a new `sdallocx` function to the external API, allowing the size to be passed by the caller. It avoids some extra reads in the thread cache fast path. In the case where stats are enabled, this avoids the work of calculating the size from the pointer.
  An assertion validates the size that's passed in, so enabling debugging will allow users of the API to debug cases where an incorrect size is passed in.
  The performance win for a contrived microbenchmark doing an allocation and immediately freeing it is ~10%. It may have a different impact on a real workload.
  Closes #28
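  A usage sketch (not from the commit) pairing mallocx() with the new sdallocx(); the 4 KiB size is illustrative.

      #include <string.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          size_t n = 4096;
          void *p = mallocx(n, 0);
          if (p == NULL)
              return 1;
          memset(p, 0, n);
          /* Pass the size back so the deallocation path need not look it up. */
          sdallocx(p, n, 0);
          return 0;
      }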
* Implement per thread heap profiling. (Jason Evans, 2014-08-20; 1 file, -1/+55)
  Rename data structures (prof_thr_cnt_t-->prof_tctx_t, prof_ctx_t-->prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects.
  Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames).
  Implement mallctls:
  - prof.reset implements full sample data reset, and optional change of sample interval.
  - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset).
  - thread.prof.name provides naming capability for threads within heap profile dumps.
  - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads.
  Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.
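  A sketch (not from the commit) of the per-thread mallctls listed above; it assumes profiling is compiled in and enabled, and the thread name is illustrative.

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      int main(void) {
          const char *name = "worker-0";
          /* Label the calling thread in subsequent heap profile dumps. */
          mallctl("thread.prof.name", NULL, NULL, &name, sizeof(name));

          bool active = true;
          /* Enable sampling for this thread only. */
          mallctl("thread.prof.active", NULL, NULL, &active, sizeof(active));
          return 0;
      }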
* Minor doc edit. (Jason Evans, 2014-05-16; 1 file, -4/+4)
* Refactor huge allocation to be managed by arenas. (Jason Evans, 2014-05-16; 1 file, -63/+65)
  Refactor huge allocation to be managed by arenas (though the global red-black tree of huge allocations remains for lookup during deallocation). This is the logical conclusion of recent changes that 1) made per arena dss precedence apply to huge allocation, and 2) made it possible to replace the per arena chunk allocation/deallocation functions.
  Remove the top level huge stats, and replace them with per arena huge stats.
  Normalize function names and types to *dalloc* (some were *dealloc*).
  Remove the --enable-mremap option. As jemalloc currently operates, this is a performance regression for some applications, but planned work to logarithmically space huge size classes should provide similar amortized performance. The motivation for this change was that mremap-based huge reallocation forced leaky abstractions that prevented refactoring.
* Add support for user-specified chunk allocators/deallocators. (aravind, 2014-05-12; 1 file, -0/+63)
  Add new mallctl endpoints "arena.<i>.chunk.alloc" and "arena.<i>.chunk.dealloc" to allow userspace to configure jemalloc's chunk allocator and deallocator on a per-arena basis.
* Optimize Valgrind integration. (Jason Evans, 2014-04-15; 1 file, -1/+2)
  Forcefully disable tcache if running inside Valgrind, and remove Valgrind calls in tcache-specific code.
  Restructure Valgrind-related code to move most Valgrind calls out of the fast path functions.
  Take advantage of static knowledge to elide some branches in JEMALLOC_VALGRIND_REALLOC().
* Remove the "opt.valgrind" mallctl. (Jason Evans, 2014-04-15; 1 file, -13/+0)
  Remove the "opt.valgrind" mallctl because it is unnecessary -- jemalloc automatically detects whether it is running inside valgrind.
* Remove the "arenas.purge" mallctl. (Jason Evans, 2014-04-15; 1 file, -11/+1)
  Remove the "arenas.purge" mallctl, which was obsoleted by the "arena.<i>.purge" mallctl in 3.1.0.
* Make dss non-optional, and fix an "arena.<i>.dss" mallctl bug. (Jason Evans, 2014-04-15; 1 file, -16/+13)
  Make dss non-optional on all platforms which support sbrk(2).
  Fix the "arena.<i>.dss" mallctl to return an error if "primary" or "secondary" precedence is specified, but sbrk(2) is not supported.
* Update MALLOCX_ARENA() documentation. (Jason Evans, 2014-04-15; 1 file, -4/+4)
  Update MALLOCX_ARENA() documentation to no longer claim that it has no effect for huge region allocations.
* Remove the *allocm() API, which is superseded by the *allocx() API. (Jason Evans, 2014-04-15; 1 file, -189/+2)
* Document how dss precedence affects huge allocation. (Jason Evans, 2014-03-31; 1 file, -2/+6)
* Extract profiling code from [re]allocation functions. (Jason Evans, 2014-01-12; 1 file, -10/+16)
  Extract profiling code from malloc(), imemalign(), calloc(), realloc(), mallocx(), rallocx(), and xallocx(). This slightly reduces the amount of code compiled into the fast paths, but the primary benefit is the combinatorial complexity reduction.
  Simplify iralloc[t]() by creating a separate ixalloc() that handles the no-move cases.
  Further simplify [mrxn]allocx() (and by implication [mrn]allocm()) to make request size overflows due to size class and/or alignment constraints trigger undefined behavior (detected by debug-only assertions).
  Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling backtrace creation in imemalign(). This bug impacted posix_memalign() and aligned_alloc().