path: root/include/jemalloc
* Add (x != 0) assertion to lg_floor(x). (Jason Evans, 2015-02-05; 1 file, -6/+14)
  lg_floor(0) is undefined, but depending on compiler options it may not cause a crash. This assertion makes it harder to accidentally abuse lg_floor().
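  As a rough illustration of the precondition (a portable sketch, not jemalloc's intrinsic-based implementation):

      #include <assert.h>
      #include <stddef.h>

      /* Portable sketch: return the index of the most significant set bit.
       * The assertion documents that lg_floor(0) is undefined. */
      static unsigned
      lg_floor(size_t x)
      {
          unsigned ret = 0;

          assert(x != 0);
          while ((x >>= 1) != 0)
              ret++;
          return (ret);
      }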
* Refactor base_alloc() to guarantee demand-zeroed memory. (Jason Evans, 2015-02-05; 2 files, -2/+0)
  Refactor base_alloc() to guarantee that allocations are carved from demand-zeroed virtual memory. This supports sparse data structures such as multi-page radix tree nodes.
  Enhance base_alloc() to keep track of fragments which were too small to support previous allocation requests, and try to consume them during subsequent requests. This becomes important when request sizes commonly approach or exceed the chunk size (as could radix tree node allocations).
* Reduce extent_node_t size to fit in one cache line. (Jason Evans, 2015-02-05; 1 file, -5/+11)
* Implement more atomic operations. (Jason Evans, 2015-02-05; 2 files, -82/+391)
  - atomic_*_p().
  - atomic_cas_*().
  - atomic_write_*().
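  The jemalloc-internal signatures are not reproduced here; as a sketch of the three operation shapes using C11 atomics (names and return conventions are illustrative only, not jemalloc's actual declarations):

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Illustrative shapes only. */
      static void *
      atomic_read_p(void *_Atomic *p) { return atomic_load(p); }

      static void
      atomic_write_u32(_Atomic uint32_t *p, uint32_t v) { atomic_store(p, v); }

      static bool
      atomic_cas_u32(_Atomic uint32_t *p, uint32_t expected, uint32_t desired)
      {
          /* Returns true on success. */
          return atomic_compare_exchange_strong(p, &expected, desired);
      }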
* Add missing prototypes for bootstrap_{malloc,calloc,free}(). (Jason Evans, 2015-02-05; 1 file, -1/+3)
* Implement the prof.gdump mallctl. (Jason Evans, 2015-01-26; 2 files, -0/+22)
  This feature makes it possible to toggle the gdump feature on/off during program execution, whereas the opt.prof_gdump mallctl value can only be set during program startup. This resolves #72.
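  Toggling it from application code might look like this (a sketch; assumes a build with profiling enabled):

      #include <stdbool.h>
      #include <jemalloc/jemalloc.h>

      /* Enable or disable gdump at runtime; returns 0 on success. */
      static int
      set_gdump(bool enable)
      {
          return mallctl("prof.gdump", NULL, NULL, &enable, sizeof(enable));
      }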
* Implement metadata statistics. (Jason Evans, 2015-01-24; 7 files, -49/+161)
  There are three categories of metadata:
  - Base allocations are used for bootstrap-sensitive internal allocator data structures.
  - Arena chunk headers comprise pages which track the states of the non-metadata pages.
  - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles.
  The metadata statistics comprise the metadata categories as follows:
  - stats.metadata: All metadata -- base + arena chunk headers + internal allocations.
  - stats.arenas.<i>.metadata.mapped: Arena chunk headers.
  - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not.
  Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata. This resolves #163.
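  Reading the new totals follows the usual stats pattern (sketch; assumes a statistics-enabled build):

      #include <stdint.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          uint64_t epoch = 1;
          size_t metadata, sz = sizeof(metadata);

          /* Refresh cached statistics, then read total metadata bytes. */
          mallctl("epoch", NULL, NULL, &epoch, sizeof(epoch));
          if (mallctl("stats.metadata", &metadata, &sz, NULL, 0) == 0)
              printf("metadata: %zu bytes\n", metadata);
          return 0;
      }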
* Refactor bootstrapping to delay tsd initialization. (Jason Evans, 2015-01-22; 3 files, -6/+7)
  Refactor bootstrapping to delay tsd initialization, primarily to support integration with FreeBSD's libc.
  Refactor a0*() for internal-only use, and add the bootstrap_{malloc,calloc,free}() API for use by FreeBSD's libc. This separation limits use of the a0*() functions to metadata allocation, which doesn't require malloc/calloc/free API compatibility. This resolves #170.
* Add missing symbols to private_symbols.txt. (Abhishek Kulkarni, 2015-01-21; 1 file, -0/+4)
  This resolves #185.
* Add an isblank definition for MSVC < 2013. (Guilherme Goncalves, 2015-01-09; 1 file, -0/+8)
* Introduce two new modes of junk filling: "alloc" and "free". (Guilherme Goncalves, 2014-12-15; 3 files, -6/+10)
  In addition to true/false, opt.junk can now be either "alloc" or "free", giving applications the possibility of junking memory only on allocation or deallocation. This resolves #172.
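  One way to select a mode is the malloc_conf global string (sketch; in builds configured with a symbol prefix the variable carries that prefix, e.g. je_malloc_conf, and the MALLOC_CONF environment variable works as well):

      #include <jemalloc/jemalloc.h>

      /* Junk-fill on allocation only; "junk:free" and "junk:true" are the
       * other non-default choices. */
      const char *malloc_conf = "junk:alloc";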
* Ignore MALLOC_CONF in set{uid,gid,cap} binaries. (Daniel Micay, 2014-12-14; 1 file, -0/+10)
  This eliminates the malloc tunables as tools for an attacker. Closes #173.
* Style and spelling fixes. (Jason Evans, 2014-12-09; 11 files, -27/+25)
* Add a C11 atomics-based implementation of atomic.h API. (Chih-hung Hsieh, 2014-12-07; 3 files, -0/+35)
* Style fixes. (Jason Evans, 2014-12-06; 1 file, -2/+2)
* Teach the dss chunk allocator to handle new_addr. (Daniel Micay, 2014-11-29; 1 file, -1/+2)
  This provides in-place expansion of huge allocations when the end of the allocation is at the end of the sbrk heap. There's already the ability to extend in-place via recycled chunks, but this handles the initial growth of the heap via repeated vector / string reallocations.
  A possible future extension could allow realloc to go from the following:

      | huge allocation | recycled chunks |
                                          ^ dss_end

  To a larger allocation built from recycled *and* new chunks:

      |           huge allocation         |
                                          ^ dss_end

  Doing that would involve teaching the chunk recycling code to request new chunks to satisfy the request. The chunk_dss code wouldn't require any further changes.

      #include <stdlib.h>

      int main(void) {
          size_t chunk = 4 * 1024 * 1024;
          void *ptr = NULL;
          for (size_t size = chunk; size < chunk * 128; size *= 2) {
              ptr = realloc(ptr, size);
              if (!ptr) return 1;
          }
      }

  Before:

      dss:secondary: 0.083s
      dss:primary: 0.083s

  After:

      dss:secondary: 0.083s
      dss:primary: 0.003s

  The dss heap grows in the upwards direction, so the oldest chunks are at the low addresses and they are used first. Linux prefers to grow the mmap heap downwards, so the trick will not work in the *current* mmap chunk allocator, as a huge allocation will only be at the top of the heap in a contrived case.
* Remove extra definition of je_tsd_boot on win32. (Guilherme Goncalves, 2014-11-18; 1 file, -6/+0)
* Make quarantine_init() static. (Jason Evans, 2014-11-07; 2 files, -3/+1)
* Fix two quarantine regressions. (Jason Evans, 2014-11-05; 2 files, -2/+4)
  Fix quarantine to actually update tsd when expanding, and to avoid double initialization (leaking the first quarantine) due to recursive initialization. This resolves #161.
* Fix arena_sdalloc() to use promoted size (second attempt). (Jason Evans, 2014-11-01; 1 file, -8/+11)
  Unlike the preceding attempted fix, this version avoids the potential for converting an invalid bin index to a size class.
* Fix arena_sdalloc() to use promoted size. (Jason Evans, 2014-11-01; 1 file, -7/+15)
* Miscellaneous cleanups. (Jason Evans, 2014-10-31; 1 file, -6/+4)
* Avoid redundant chunk header reads. (Daniel Micay, 2014-10-31; 1 file, -17/+16)
  - Use sized deallocation in iralloct_realign.
  - iralloc and ixalloc always need the old size, so pass it in from the caller, where it's often already calculated.
* Mark huge allocations as unlikely. (Daniel Micay, 2014-10-31; 2 files, -12/+12)
  This cleans up the fast path a bit more by moving more code off of it.
* Fix huge allocation statistics. (Jason Evans, 2014-10-15; 2 files, -2/+11)
* Add per size class huge allocation statistics. (Jason Evans, 2014-10-13; 3 files, -12/+32)
  Add per size class huge allocation statistics, and normalize various stats:
  - Change the arenas.nlruns type from size_t to unsigned.
  - Add the arenas.nhchunks and arenas.hchunks.<i>.size mallctls.
  - Replace the stats.arenas.<i>.bins.<j>.allocated mallctl with stats.arenas.<i>.bins.<j>.curregs.
  - Add the stats.arenas.<i>.hchunks.<j>.nmalloc, stats.arenas.<i>.hchunks.<j>.ndalloc, stats.arenas.<i>.hchunks.<j>.nrequests, and stats.arenas.<i>.hchunks.<j>.curhchunks mallctls.
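  Names containing <i>/<j> are read via the MIB interface so the indices can be filled in (sketch; assumes a statistics-enabled build and that arena 0 and huge size class 0 exist):

      #include <stdint.h>
      #include <stdio.h>
      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          size_t mib[8], miblen = sizeof(mib) / sizeof(mib[0]);
          uint64_t nmalloc;
          size_t sz = sizeof(nmalloc);

          if (mallctlnametomib("stats.arenas.0.hchunks.0.nmalloc", mib, &miblen) != 0)
              return 1;
          mib[2] = 0; /* <i>: arena index. */
          mib[4] = 0; /* <j>: huge size class index. */
          if (mallctlbymib(mib, miblen, &nmalloc, &sz, NULL, 0) != 0)
              return 1;
          printf("arena 0, hchunks[0] nmalloc: %llu\n", (unsigned long long)nmalloc);
          return 0;
      }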
* Fix a prof_tctx_t/prof_tdata_t cleanup race. (Jason Evans, 2014-10-12; 1 file, -0/+6)
  Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid in prof_tctx_t, so that the associated tdata need not be present during tctx teardown.
* Remove arena_dalloc_bin_run() clean page preservation. (Jason Evans, 2014-10-11; 1 file, -8/+6)
  Remove code in arena_dalloc_bin_run() that preserved the "clean" state of trailing clean pages by splitting them into a separate run during deallocation. This was a useful mechanism for reducing dirty page churn when bin runs comprised many pages, but bin runs are now quite small.
  Remove the nextind field from arena_run_t now that it is no longer needed, and change arena_run_t's bin field (arena_bin_t *) to binind (index_t). These two changes remove 8 bytes of chunk header overhead per page, which saves 1/512 of all arena chunk memory (8 B out of each default 4 KiB page).
* Add --with-lg-tiny-min, generalize --with-lg-quantum. (Jason Evans, 2014-10-11; 3 files, -6/+8)
* Add configure options. (Jason Evans, 2014-10-10; 7 files, -29/+46)
  Add:
  - --with-lg-page
  - --with-lg-page-sizes
  - --with-lg-size-class-group
  - --with-lg-quantum
  Get rid of STATIC_PAGE_SHIFT, in favor of directly setting LG_PAGE. Fix various edge conditions exposed by the configure options.
* Use regular arena allocation for huge tree nodes. (Daniel Micay, 2014-10-08; 2 files, -3/+3)
  This avoids grabbing the base mutex, as a step towards fine-grained locking for huge allocations. The thread cache also provides a tiny (~3%) improvement for serial huge allocations.
* Refactor/fix arenas manipulation. (Jason Evans, 2014-10-08; 5 files, -108/+264)
  Abstract arenas access to use arena_get() (or a0get() where appropriate) rather than directly reading e.g. arenas[ind]. Prior to the addition of the arenas.extend mallctl, the worst possible outcome of directly accessing arenas was a stale read, but arenas.extend may allocate and assign a new array to arenas.
  Add a tsd-based arenas_cache, which amortizes arenas reads. This introduces some subtle bootstrapping issues, with tsd_boot() now being split into tsd_boot[01]() to support tsd wrapper allocation bootstrapping, as well as an arenas_cache_bypass tsd variable which dynamically terminates allocation of arenas_cache itself.
  Promote a0malloc(), a0calloc(), and a0free() to be generally useful for internal allocation, and use them in several places (more may be appropriate).
  Abstract arena->nthreads management and fix a missing decrement during thread destruction (recent tsd refactoring left arenas_cleanup() unused).
  Change arena_choose() to propagate OOM, and handle OOM in all callers. This is important for providing consistent allocation behavior when the MALLOCX_ARENA() flag is being used. Prior to this fix, an OOM could silently result in allocation from a different arena than the one specified.
* Normalize size classes. (Jason Evans, 2014-10-06; 8 files, -318/+311)
  Normalize size classes to use the same number of size classes per size doubling (currently hard coded to 4) across the entire range of size classes (e.g. 10 KiB, 12 KiB, 14 KiB, and 16 KiB between the 8 KiB and 16 KiB doublings). Small size classes already used this spacing, but in order to support this change, additional small size classes now fill [4 KiB .. 16 KiB). Large size classes range from [16 KiB .. 4 MiB). Huge size classes now support non-multiples of the chunk size in order to fill (4 MiB .. 16 MiB).
* Attempt to expand huge allocations in-place. (Daniel Micay, 2014-10-05; 5 files, -9/+9)
  This adds support for expanding huge allocations in-place by requesting memory at a specific address from the chunk allocator. It's currently only implemented for the chunk recycling path, although in theory it could also be done by optimistically allocating new chunks.
  On Linux, it could attempt an in-place mremap. However, that won't work in practice since the heap is grown downwards and memory is not unmapped (in a normal build, at least).
  Repeated vector reallocation micro-benchmark:

      #include <string.h>
      #include <stdlib.h>

      int main(void) {
          for (size_t i = 0; i < 100; i++) {
              void *ptr = NULL;
              size_t old_size = 0;
              for (size_t size = 4; size < (1 << 30); size *= 2) {
                  ptr = realloc(ptr, size);
                  if (!ptr) return 1;
                  memset(ptr + old_size, 0xff, size - old_size);
                  old_size = size;
              }
              free(ptr);
          }
      }

  The glibc allocator fails to do any in-place reallocations on this benchmark once it passes the M_MMAP_THRESHOLD (default 128k), but it elides the cost of copies via mremap, which is currently not something that jemalloc can use.
  With this improvement, jemalloc still fails to do any in-place huge reallocations for the first outer loop, but then succeeds 100% of the time for the remaining 99 iterations. The time spent doing allocations and copies drops down to under 5%, with nearly all of it spent doing purging + faulting (when huge pages are disabled) and the array memset.
  An improved mremap API (MREMAP_RETAIN - #138) would be far more general, but this is a portable optimization and would still be useful on Linux for xallocx.
  Numbers with transparent huge pages enabled:

      glibc (copies elided via MREMAP_MAYMOVE): 8.471s
      jemalloc: 17.816s
      jemalloc + no-op madvise: 13.236s
      jemalloc + this commit: 6.787s
      jemalloc + this commit + no-op madvise: 6.144s

  Numbers with transparent huge pages disabled:

      glibc (copies elided via MREMAP_MAYMOVE): 15.403s
      jemalloc: 39.456s
      jemalloc + no-op madvise: 12.768s
      jemalloc + this commit: 15.534s
      jemalloc + this commit + no-op madvise: 6.354s

  Closes #137.
* Add missing header includes in jemalloc/jemalloc.h. (Jason Evans, 2014-10-05; 1 file, -0/+3)
  Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that applications only have to #include <jemalloc/jemalloc.h>. This resolves #132.
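  In other words, a program using the non-standard API no longer needs its own size_t/bool/fixed-width includes just to see the prototypes (sketch):

      #include <jemalloc/jemalloc.h>

      int
      main(void)
      {
          void *p = mallocx(64, 0); /* size_t and flags types come along with the header */
          if (p == NULL)
              return 1;
          dallocx(p, 0);
          return 0;
      }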
* Don't disable tcache for lazy-lock. (Jason Evans, 2014-10-04; 1 file, -2/+0)
  Don't disable tcache when lazy-lock is configured. There already exists a mechanism to disable tcache, but doing so automatically due to lazy-lock causes surprising performance behavior.
* Make prof-related inline functions always-inline. (Jason Evans, 2014-10-04; 1 file, -9/+9)
* Fix tsd cleanup regressions. (Jason Evans, 2014-10-04; 5 files, -50/+49)
  Fix tsd cleanup regressions that were introduced in 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold:
  1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers.
  2) tsd_*_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed. Note that tsd_*{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.
* Implement/test/fix prof-related mallctl's. (Jason Evans, 2014-10-04; 2 files, -9/+30)
  Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's. Test/fix the thread.prof.name mallctl.
  Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.
* Convert to uniform style: cond == false --> !cond. (Jason Evans, 2014-10-03; 6 files, -20/+18)
* Test prof.reset mallctl and fix numerous discovered bugs. (Jason Evans, 2014-10-03; 2 files, -10/+15)
* Correctly detect adaptive mutexes in pthreads. (Eric Wong, 2014-09-29; 2 files, -1/+4)
  PTHREAD_MUTEX_ADAPTIVE_NP is an enum on glibc and not a macro, so we must test for its existence by attempting compilation.
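  A configure-time probe can therefore just try to compile a use of the constant, roughly like this sketch (illustrative; the actual autoconf test may differ):

      /* _GNU_SOURCE exposes the non-portable mutex kinds on glibc. */
      #define _GNU_SOURCE
      #include <pthread.h>

      int
      main(void)
      {
          pthread_mutexattr_t attr;

          /* Compiles only if PTHREAD_MUTEX_ADAPTIVE_NP exists, whether it is
           * defined as a macro or only as an enumerator. */
          pthread_mutexattr_init(&attr);
          pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
          pthread_mutexattr_destroy(&attr);
          return 0;
      }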
* Merge pull request #129 from daverigby/msvc_lg_floor (Jason Evans, 2014-09-29; 1 file, -0/+15)
  Use MSVC intrinsics for lg_floor.
  - Merged commit: Use MSVC intrinsics for lg_floor (Dave Rigby, 2014-09-24; 1 file, -0/+15)
    When using MSVC, make use of its intrinsic functions (supported on x86, amd64 & ARM) for lg_floor.
* Move small run metadata into the arena chunk header. (Jason Evans, 2014-09-29; 2 files, -67/+80)
  Move small run metadata into the arena chunk header, with multiple expected benefits:
  - Lower run fragmentation due to reduced run sizes; runs are more likely to completely drain when there are fewer total regions.
  - Improved cache behavior. Prior to this change, run headers were always page-aligned, which put extra pressure on some CPU cache sets. The degree to which this was a problem was hardware dependent, but it likely hurt some even for the most advanced modern hardware.
  - Buffer overruns/underruns are less likely to corrupt allocator metadata.
  - Size classes between 4 KiB and 16 KiB become reasonable to support without any special handling, and the runs are small enough that dirty unused pages aren't a significant concern.
* Implement compile-time bitmap size computation. (Jason Evans, 2014-09-28; 1 file, -0/+46)
* Fix profile dumping race. (Jason Evans, 2014-09-25; 1 file, -0/+1)
  Fix a race that caused a non-critical assertion failure. To trigger the race, a thread had to be part way through initializing a new sample, such that it was discoverable by the dumping thread, but not yet linked into its gctx by the time a later dump phase would normally have reset its state to 'nominal'.
  Additionally, lock access to the state field during modification to transition to the dumping state. It's not apparent that this oversight could have caused an actual problem due to outer locking that protects the dumping machinery, but the added locking pedantically follows the stated locking protocol for the state field.
* Convert all tsd variables to reside in a single tsd structure. (Jason Evans, 2014-09-23; 9 files, -414/+444)
* Apply likely()/unlikely() to allocation/deallocation fast paths. (Jason Evans, 2014-09-12; 4 files, -46/+53)
* Mark some conditions as unlikely. (Daniel Micay, 2014-09-11; 3 files, -10/+10)
  - assertion failure
  - malloc_init failure
  - malloc not already initialized (in malloc_init)
  - running in valgrind
  - thread cache disabled at runtime
  Clang and GCC already consider a comparison with NULL or -1 to be cold, so many branches (out-of-memory) are already correctly considered as cold and marking them is not important.
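  These hints typically expand to __builtin_expect() on GCC/Clang; a minimal sketch of such macros (illustrative, not the exact jemalloc definitions):

      /* Branch-prediction hints; plain pass-through on other compilers. */
      #if defined(__GNUC__)
      #  define likely(x)   __builtin_expect(!!(x), 1)
      #  define unlikely(x) __builtin_expect(!!(x), 0)
      #else
      #  define likely(x)   (x)
      #  define unlikely(x) (x)
      #endif

      /* Example: annotate a cold error path. */
      /* if (unlikely(ptr == NULL)) return (NULL); */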