g++ 5.5.0+ complained `parameter ‘expected’ set but not used
[-Werror=unused-but-set-parameter]`, even though `expected` is in
fact used.
This reverts commit 7618b0b8e458d9c0db6e4b05ccbe6c6308952890.
This reverts commit 0b462407ae84a62b3c097f0e9f18df487a47d9a7.
Refactored the core profiling codebase into two logical parts:
(a) `prof_data.c`: core internal data structure management & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.
Some internal functions had to be exposed, but there are not that
many of them if the modularization is (hopefully) clean enough.
`prof.c` is growing too long, so trying to modularize it. There are
a few internal functions that had to be exposed, but I think it is a
fair trade-off.
When tcache was disabled, the promoted case in dalloc was missing.
The counter stays 0 unless metadata allocation fails (which indicates
OOM), and is mainly for sanity checking.
The VirtualAlloc and VirtualFree APIs differ from their POSIX
counterparts in that MEM_DECOMMIT cannot be used across multiple
VirtualAlloc regions. To properly support decommit, only allow merge /
split within the same region -- this is done by tracking the "is_head"
state of extents and never merging across regions.

Add a new state is_head (only relevant for retain && !maps_coalesce),
which is true for the first extent in each VirtualAlloc region.
Determine whether two extents can be merged based on the head state,
and use serial numbers for sanity checks.
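A self-contained sketch of the merge check (the struct and helper are
illustrative stand-ins, not the real extent code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the extent metadata involved. */
typedef struct {
	void *base;
	size_t size;
	size_t sn;      /* serial number, usable for sanity checks */
	bool is_head;   /* first extent of its VirtualAlloc region */
} extent_t;

/* `b` starts exactly where `a` ends; merging is refused whenever `b`
 * heads its own VirtualAlloc region, since VirtualFree(MEM_DECOMMIT)
 * cannot span two regions. (Only relevant for retain &&
 * !maps_coalesce.) */
static bool
extent_can_merge(const extent_t *a, const extent_t *b) {
	assert((char *)a->base + a->size == (char *)b->base);
	return !b->is_head;
}
```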
`prof_accumbytes` was supposed to be replaced by `prof_accum` in
https://github.com/jemalloc/jemalloc/pull/623.
`cbopaque` can now be overridden without overriding `write_cb` in
the first place. (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
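A usage sketch with the public API (the writer below is illustrative):

```c
#include <jemalloc/jemalloc.h>
#include <stdio.h>

/* Custom malloc_message that honors cbopaque. */
static void
my_message(void *cbopaque, const char *s) {
	fputs(s, (FILE *)cbopaque);
}

int
main(void) {
	malloc_message = my_message;
	/* No write_cb supplied; cbopaque alone is now respected. */
	malloc_stats_print(NULL, stderr, NULL);
	return 0;
}
```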
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will be
printed as it is set.
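A minimal way to enable it from a program (sketch; the MALLOC_CONF
environment variable works as well):

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Compile-time default config string, read by jemalloc at startup. */
const char *malloc_conf = "confirm_conf:true";

int
main(void) {
	free(malloc(1)); /* trigger initialization; the echo prints */
	return 0;
}
```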
Small is added purely for convenience. Large flushes weren't tracked
before and can be useful in analysis. Large fill simply reports
nmalloc, since there is currently no batch fill for large.
When config_stats is enabled, track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper bound on the savings possible by
meshing.
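A sketch of reading the new counter through mallctl (the arena/bin
indices and the stat's exact type are assumptions):

```c
#include <jemalloc/jemalloc.h>
#include <stdint.h>
#include <stdio.h>

int
main(void) {
	/* Refresh the stats snapshot first. */
	uint64_t epoch = 1;
	size_t esz = sizeof(epoch);
	mallctl("epoch", &epoch, &esz, &epoch, esz);

	/* Assumed to surface as stats.arenas.<i>.bins.<j>.nonfull_slabs. */
	size_t nonfull;
	size_t sz = sizeof(nonfull);
	if (mallctl("stats.arenas.0.bins.0.nonfull_slabs", &nonfull, &sz,
	    NULL, 0) == 0) {
		printf("bin 0 nonfull slabs: %zu\n", nonfull);
	}
	return 0;
}
```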
Mainly fixing typos. The only non-trivial change is in the
computation for SC_NPSIZES, though the result wouldn't be any
different when SC_NGROUP = 4, as is always the case at the moment.
Caught by @zoulasc in #1460. The attribute needs to be added in the headers as
well.
This will let us share code on failure pathways.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
macro for it.
Compiling with warnings enabled produces missing-prototype warnings.
so that the generated formats can be checked by the compiler.
Summary: sdallocx is checking a flag that will never be set (at least
in the provided C++ destructor implementation). This branch will
probably only rarely be mispredicted; however, removing it saves two
instructions in sdallocx and one at the callsite (to zero out flags).
The analytics tool is put under the experimental.utilization namespace
in mallctl. The input is one pointer or an array of pointers, and the
output is a list of memory utilization statistics.
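A usage sketch of the single-pointer query (a raw buffer stands in for
the ctl's actual output struct):

```c
#include <jemalloc/jemalloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
	void *p = malloc(100);
	char out[256]; /* stand-in for the real output struct */
	size_t out_sz = sizeof(out);
	int err = mallctl("experimental.utilization.query", out, &out_sz,
	    &p, sizeof(p));
	printf("query: err=%d, wrote %zu bytes\n", err, out_sz);
	free(p);
	return 0;
}
```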
This change improves memory usage slightly, at virtually no CPU cost.
When it happens, this might cause a slowdown on fast path operations.
However, such cases are very rare.
In some rare cases (older compilers, e.g. gcc 4.2 on MIPS), 8-bit
atomics might be unavailable. Detect such cases so that we can work
around them.
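A sketch of the kind of compile/link probe involved (the actual
configure test may differ):

```c
#include <stdint.h>

/* Toolchains lacking 1-byte atomics fail to compile or link this. */
int
main(void) {
	uint8_t x = 0;
	return (int)__sync_val_compare_and_swap(&x, 0, 1);
}
```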
This regression was introduced by
3d29d11ac2c1583b9959f73c0548545018d31c8a (Clean compilation -Wextra).
These macros have been unused since
d4ac7582f32f506d5203bea2f0115076202add38 (Introduce a backport of C11
atomics).
This fixes a build failure when integrating with FreeBSD's libc. This
regression was introduced by d1e11d48d4c706e17ef3508e2ddb910f109b779f
(Move tsd link and in_hook after tcache.).
This adds some overhead to the tcache flush path (which is one of the
hot paths). Guard it behind a config option.
The keyword "huge" tends to remind people of huge pages, which are
not relevant to this feature.
This feature uses a dedicated arena to handle huge requests, which
significantly reduces VM fragmentation. In the production workloads
we tested, it often reduces VM size by >30%.
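A configuration sketch, assuming the option's current (renamed)
spelling `oversize_threshold` and an illustrative 1 MiB threshold:

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Requests at or above the threshold go to the dedicated arena;
 * MALLOC_CONF="oversize_threshold:1048576" works as well. */
const char *malloc_conf = "oversize_threshold:1048576";

int
main(void) {
	void *big = malloc(4 << 20); /* served by the dedicated arena */
	free(big);
	return 0;
}
```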
For low arena count settings, the huge threshold feature may trigger
unwanted background thread creation. Given that the huge arena purges
eagerly by default, bypass background thread creation when
initializing the huge arena.
When custom extent_hooks or transparent huge pages are in use, the
purging semantics may change, which means we may not get zeroed pages
on repopulating. Fix the issue by manually memsetting in such cases.
Add extent_arena_ind_get() to avoid loading the actual arena pointer
when we only need to check arena matching.
This avoids having to choose a bin shard on the fly, and will also
allow flexible bin binding for each thread.
The option uses the same format as "slab_sizes" to specify the number
of shards for each bin size.
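A sketch assuming the same range:value syntax as "slab_sizes" (ranges
and shard counts are illustrative):

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Illustrative: 8 shards for bins of size 1-128, 4 for 129-512. */
const char *malloc_conf = "bin_shards:1-128:8|129-512:4";

int
main(void) {
	free(malloc(64)); /* small allocations pick one of the shards */
	return 0;
}
```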
This makes it possible to have multiple sets of bins in an arena,
which improves arena scalability because the bins (especially the
small ones) are always the limiting factor in production workloads.

A bin shard is picked on allocation; each extent tracks the bin shard
id for deallocation. The number of shards is determined using runtime
options.
If there are 3 or more threads spin-waiting on the same mutex, there
will be excessive exclusive cacheline contention, because
pthread_mutex_trylock() immediately tries to CAS in a new value
instead of first checking whether the lock is held.

This diff adds a "locked" hint flag, and we will only spin-wait
without trylock()ing while it is set. I don't know of any other
portable way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast. The regression blames back to the
mutex profiling changes; however, we almost never have 3 or more
threads contending in properly configured production workloads, so
the impact is rare -- but it is still worth fixing.
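A minimal sketch of the approach (the type, names, and unbounded spin
are illustrative, not the actual malloc_mutex internals):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative type: a pthread mutex paired with a "locked" hint. */
typedef struct {
	pthread_mutex_t lock;
	atomic_bool locked; /* hint: true while the mutex is held */
} hinted_mutex_t;

#define HINTED_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, false }

static bool
hinted_trylock(hinted_mutex_t *m) {
	if (pthread_mutex_trylock(&m->lock) != 0) {
		return false;
	}
	atomic_store_explicit(&m->locked, true, memory_order_relaxed);
	return true;
}

static void
hinted_lock(hinted_mutex_t *m) {
	while (!hinted_trylock(m)) {
		/* Spin on plain loads while the hint reads locked, so
		 * waiters keep the cacheline shared instead of bouncing
		 * it with CAS traffic. */
		while (atomic_load_explicit(&m->locked,
		    memory_order_relaxed)) {
			/* busy-wait; a real impl bounds this and blocks */
		}
	}
}

static void
hinted_unlock(hinted_mutex_t *m) {
	atomic_store_explicit(&m->locked, false, memory_order_relaxed);
	pthread_mutex_unlock(&m->lock);
}
```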
The setting has been tested in production for a while. No negative
effects were observed, and we were able to reduce the number of
threads per process.
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
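The probe can be as small as (sketch; the real configure.ac test may
differ):

```c
/* Compile/link test for __builtin_popcount availability. */
int
main(void) {
	return __builtin_popcount(0x08) == 1 ? 0 : 1;
}
```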
Also catch invalid tcache id.