Only use __declspec(nothrow) in C++ mode.
This resolves #244.
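For illustration, a minimal sketch of this guard, assuming a JEMALLOC_NOTHROW-style wrapper macro (the macro and prototype below are illustrative, not the header's verbatim contents): __declspec(nothrow) is only meaningful to MSVC's C++ front end, so the attribute expands to nothing when compiling as C.

    #include <stddef.h>

    /* Expose the attribute only where the C++ front end accepts it. */
    #if defined(_MSC_VER) && defined(__cplusplus)
    #  define JEMALLOC_NOTHROW __declspec(nothrow)
    #else
    #  define JEMALLOC_NOTHROW
    #endif

    JEMALLOC_NOTHROW void *je_malloc(size_t size);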
As per gcc documentation:
The alloc_size attribute is used to tell the compiler that the function
return value points to memory (...)
This resolves #245.
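As context, a hedged example of how this attribute is typically attached to a malloc-like prototype (the function names below are made up for illustration):

    #include <stddef.h>

    /* alloc_size(1) tells gcc that the return value points to a block of
     * `size` bytes, which improves __builtin_object_size() and related
     * out-of-bounds diagnostics for callers. */
    __attribute__((malloc, alloc_size(1)))
    void *my_malloc(size_t size);

    /* calloc-like signatures can name both size arguments. */
    __attribute__((malloc, alloc_size(1, 2)))
    void *my_calloc(size_t nmemb, size_t size);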
Fixes warning with newer GCCs:
include/jemalloc/jemalloc.h:229:2: warning: extra ';' [-Wpedantic]
};
^

This change improves interaction with transparent huge pages, e.g.
reduced page faults (at least in the absence of unused dirty page
purging).

This effectively reverts 97c04a93838c4001688fe31bf018972b4696efe2 (Use
first-fit rather than first-best-fit run/chunk allocation.). In some
pathological cases, first-fit search dominates allocation time, and it
also tends not to converge as readily on a steady state of memory
layout, since precise allocation order has a bigger effect than for
first-best-fit.

Add various function attributes to the exported functions to give the
compiler more information to work with during optimization, and also
specify throw() when compiling with C++ on Linux, in order to adequately
match what __THROW does in glibc.
This resolves #237.
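A minimal sketch of the throw() part, assuming a glibc target and an illustrative macro name (the project's actual macro and attribute set may differ):

    #include <stdlib.h>   /* on glibc this defines __GLIBC__ via <features.h> */

    /* Mirror glibc's __THROW so C++ callers see the same exception
     * specification glibc itself uses for malloc() and friends. */
    #if defined(__cplusplus) && defined(__GLIBC__)
    #  define JEMALLOC_CXX_THROW throw()
    #else
    #  define JEMALLOC_CXX_THROW
    #endif

    __attribute__((malloc))
    void *je_malloc(size_t size) JEMALLOC_CXX_THROW;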
This {bug,regression} was introduced by
155bfa7da18cab0d21d87aa2dce4554166836f5d (Normalize size classes.).
This resolves #241.

Conditionally define ENOENT, EINVAL, etc. (previously unconditional).
Add PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls. gcc issued
(harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the
alternative to this workaround would have been to disable the function
attributes which cause gcc to look for type mismatches in formatted
printing function calls.
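A hedged sketch of the PRIz* idea (the actual definitions in the tree may differ): pick the platform's size_t length modifier once, so gcc's printf-format checking can stay enabled without warnings.

    /* MSVC/MinGW printf uses the I length modifier for size_t. */
    #ifdef _WIN32
    #  define PRIzu "Iu"
    #  define PRIzd "Id"
    #  define PRIzx "Ix"
    #else
    #  define PRIzu "zu"
    #  define PRIzd "zd"
    #  define PRIzx "zx"
    #endif

    /* Usage: malloc_printf("mapped: %"PRIzu" bytes\n", mapped); */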
- Set opt_lg_chunk based on run-time OS setting
- Verify LG_PAGE is compatible with run-time OS setting
- When targeting Windows Vista or newer, use SRWLOCK instead of
  CRITICAL_SECTION (see the sketch below)
- When targeting Windows Vista or newer, statically initialize init_lock
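A hedged sketch of the Vista-or-newer path (the lock wrapper names are illustrative): SRWLOCK has a static initializer, so init_lock needs no bootstrap code, unlike CRITICAL_SECTION.

    #include <windows.h>

    #if _WIN32_WINNT >= 0x0600  /* Windows Vista or newer */
    /* SRWLOCK can be statically initialized. */
    static SRWLOCK init_lock = SRWLOCK_INIT;
    #  define init_lock_acquire()  AcquireSRWLockExclusive(&init_lock)
    #  define init_lock_release()  ReleaseSRWLockExclusive(&init_lock)
    #else
    /* CRITICAL_SECTION must be initialized at runtime before first use. */
    static CRITICAL_SECTION init_lock;
    #  define init_lock_acquire()  EnterCriticalSection(&init_lock)
    #  define init_lock_release()  LeaveCriticalSection(&init_lock)
    #endif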
Fix size class overflow handling for malloc(), posix_memalign(),
memalign(), calloc(), and realloc() when profiling is enabled.
Remove an assertion that erroneously caused arena_sdalloc() to fail when
profiling was enabled.
This resolves #232.

This resolves #235.

Now that small allocation runs have fewer regions due to run metadata
residing in chunk headers, an explicit minimum tcache count is needed to
make sure that tcache adequately amortizes synchronization overhead.

Take into account large_pad when computing whether to pass the
deallocation request to tcache_dalloc_large(), so that the largest
cacheable size makes it back to tcache. This regression was introduced
by 8a03cf039cd06f9fa6972711195055d865673966 (Implement cache index
randomization for large allocations.).

Extract szad size quantization into {extent,run}_quantize(), and
quantize szad run sizes to the union of valid small region run sizes and
large run sizes.
Refactor iteration in arena_run_first_fit() to use
run_quantize{,_first,_next}(), and add support for padded large runs.
For large allocations that have no specified alignment constraints,
compute a pseudo-random offset from the beginning of the first backing
page that is a multiple of the cache line size. Under typical
configurations with 4-KiB pages and 64-byte cache lines this results in
a uniform distribution among 64 page boundary offsets.
Add the --disable-cache-oblivious option, primarily intended for
performance testing.
This resolves #13.
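For concreteness, a hedged sketch of the offset computation under those typical constants (prng64() below stands in for whatever pseudo-random source the allocator uses):

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096
    #define CACHELINE 64

    extern uint64_t prng64(void);  /* placeholder PRNG */

    /* Choose a cache-line-aligned offset within the first backing page:
     * PAGE_SIZE / CACHELINE == 64 candidates, picked uniformly. */
    static size_t random_run_offset(void)
    {
        size_t nslots = PAGE_SIZE / CACHELINE;   /* 64 */
        return (prng64() % nslots) * CACHELINE;  /* 0, 64, 128, ..., 4032 */
    }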
However, unlike before it was removed, do not force --enable-ivsalloc
when Darwin zone allocator integration is enabled, since the zone
allocator code uses ivsalloc() regardless of whether
malloc_usable_size() and sallocx() do.
This resolves #211.

Add mallctls:
- arenas.lg_dirty_mult is initialized via opt.lg_dirty_mult, and can be
  modified to change the initial lg_dirty_mult setting for newly created
  arenas.
- arena.<i>.lg_dirty_mult controls an individual arena's dirty page
  purging threshold, and synchronously triggers any purging that may be
  necessary to maintain the constraint.
- arena.<i>.chunk.purge allows the per arena dirty page purging function
  to be replaced.
This resolves #93.
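A hedged usage sketch of the new per-arena knob through the standard mallctl() interface (error handling trimmed; lg_dirty_mult is exposed as an ssize_t, with -1 disabling purging):

    #include <stdio.h>
    #include <sys/types.h>
    #include <jemalloc/jemalloc.h>

    void tune_arena0_purging(void)
    {
        ssize_t lg_dirty_mult;
        size_t sz = sizeof(lg_dirty_mult);

        /* Read arena 0's current purging threshold. */
        if (mallctl("arena.0.lg_dirty_mult", &lg_dirty_mult, &sz, NULL, 0) == 0)
            printf("lg_dirty_mult: %zd\n", lg_dirty_mult);

        /* Tighten it: keep dirty pages below active/32, purging synchronously
         * if the arena is currently over the new threshold. */
        lg_dirty_mult = 5;
        mallctl("arena.0.lg_dirty_mult", NULL, NULL, &lg_dirty_mult,
            sizeof(lg_dirty_mult));
    }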
InterlockedCompareExchange32

Remove the prof_tctx_state_destroying transitory state and instead add
the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely
identifies a tctx. This assures that tctx's are well ordered even when
more than two with the same thr_uid coexist. A previous attempted fix
based on prof_tctx_state_destroying was only sufficient for protecting
against two coexisting tctx's, but it also introduced a new dumping
race.
These regressions were introduced by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.) and 764b00023f2bc97f240c3a758ed23ce9c0ad8526 (Fix a heap
profiling regression.).
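A hedged sketch of the resulting ordering (struct and function names are illustrative, not the actual prof code): comparing by the <thr_uid, tctx_uid> pair keeps any number of coexisting tctx's with the same thr_uid totally ordered.

    #include <stdint.h>

    typedef struct {
        uint64_t thr_uid;
        uint64_t tctx_uid;
    } tctx_key_t;

    /* Lexicographic comparison over <thr_uid, tctx_uid>. */
    static int tctx_key_cmp(const tctx_key_t *a, const tctx_key_t *b)
    {
        if (a->thr_uid != b->thr_uid)
            return (a->thr_uid < b->thr_uid) ? -1 : 1;
        if (a->tctx_uid != b->tctx_uid)
            return (a->tctx_uid < b->tctx_uid) ? -1 : 1;
        return 0;
    }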
Add the prof_tctx_state_destroying transitory state to fix a race
between a thread destroying a tctx and another thread creating a new
equivalent tctx.
This regression was introduced by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.).

These bugs only affected tests and debug builds.

This tends to more effectively pack active memory toward low addresses.
However, additional tree searches are required in many cases, so whether
this change stands the test of time will depend on real-world
benchmarks.

Recent changes have improved huge allocation scalability, which removes
upward pressure to set the chunk size so large that huge allocations are
rare. Smaller chunks are more likely to completely drain, so set the
default to the smallest size that doesn't leave excessive unusable
trailing space in chunk headers.

TlsGetValue differs semantically from pthread_getspecific in that it can
return NULL as a non-error value, so it always sets the thread's last
error. Allocator callers may not expect a call to e.g. free() to change
the value of the last error, so preserve it across the lookup.
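A minimal sketch of that save/restore pattern using the documented Win32 calls (the wrapper name is illustrative):

    #include <windows.h>

    /* TlsGetValue() may legitimately return NULL, so it always calls
     * SetLastError(); save and restore the caller's last error around it
     * so free() and friends do not clobber GetLastError(). */
    static void *tsd_get(DWORD tls_index)
    {
        DWORD saved = GetLastError();
        void *value = TlsGetValue(tls_index);
        SetLastError(saved);
        return value;
    }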
9906660 added a --without-export configure option to avoid exporting
jemalloc symbols, but the option didn't actually work.

These regressions were introduced by
ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into
unused dirty page purging machinery.).

Rename "dirty chunks" to "cached chunks", in order to avoid overloading
the term "dirty".
Fix the regression caused by 339c2b23b2d61993ac768afcc72af135662c6771
(Fix chunk_unmap() to propagate dirty state.), and actually address what
that change attempted, which is to only purge chunks once, and propagate
to chunk_record() whether zeroed pages resulted.

Fix chunk_unmap() to propagate whether a chunk is dirty, and modify
dirty chunk purging to record this information so it can be passed to
chunk_unmap(). Since the broken version of chunk_unmap() claimed that
all chunks were clean, this resulted in potential memory corruption for
purging implementations that do not zero (e.g. MADV_FREE).
This regression was introduced by
ee41ad409a43d12900a5a3108f6c14f84e4eb0eb (Integrate whole chunks into
unused dirty page purging machinery.).
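To illustrate why the dirty flag matters, a hedged sketch of a purge step that reports whether the purged pages will read back as zeroes (the function name and return convention are illustrative): MADV_DONTNEED on Linux anonymous memory zero-fills on the next touch, whereas MADV_FREE may leave stale contents in place until the kernel reclaims them.

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Returns true only if the purged range is guaranteed to read back as
     * zeroes, so callers can record chunk dirtiness instead of assuming
     * every purged chunk is clean. */
    static bool pages_purge_zeroes(void *addr, size_t length)
    {
    #if defined(MADV_FREE)
        madvise(addr, length, MADV_FREE);
        return false;   /* contents may persist until reclaim: still dirty */
    #elif defined(MADV_DONTNEED)
        /* Linux: a successful MADV_DONTNEED makes the next touch read zeroes. */
        return madvise(addr, length, MADV_DONTNEED) == 0;
    #else
        (void)addr; (void)length;
        return false;
    #endif
    }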
Extend per arena unused dirty page purging to manage unused dirty chunks
in addition to unused dirty runs. Rather than immediately unmapping
deallocated chunks (or purging them in the --disable-munmap case), store
them in a separate set of trees, chunks_[sz]ad_dirty. Preferentially
allocate dirty chunks. When excessive unused dirty pages accumulate,
purge runs and chunks in integrated LRU order (and unmap chunks in the
--enable-munmap case).
Refactor extent_node_t to provide accessor functions.

This regression was introduced by
88fef7ceda6269598cef0cee8b984c8765673c27 (Refactor huge_*() calls into
arena internals.), and went undetected because of the --enable-debug
regression.

This regression was introduced by
88fef7ceda6269598cef0cee8b984c8765673c27 (Refactor huge_*() calls into
arena internals.), and went undetected because of the --enable-debug
regression.

Although exceedingly unlikely, it appears that writes to the prof_tctx
field of arena_chunk_map_misc_t could be reordered such that a stale
value could be read during deallocation, with profiler metadata
corruption and invalid pointer dereferences being the most likely
effects.
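As a general illustration of the hazard (not the project's literal fix, and the names below are simplified), publishing the field with release semantics and reading it with acquire semantics is the usual way to rule out this kind of reordering:

    #include <stdatomic.h>

    typedef struct prof_tctx_s prof_tctx_t;

    static _Atomic(prof_tctx_t *) prof_tctx;

    /* Writer: make all initialization of *tctx visible before the pointer. */
    static void prof_tctx_set(prof_tctx_t *tctx)
    {
        atomic_store_explicit(&prof_tctx, tctx, memory_order_release);
    }

    /* Reader (e.g. the deallocation path): never observes a stale pointer
     * whose pointee is still being initialized. */
    static prof_tctx_t *prof_tctx_get(void)
    {
        return atomic_load_explicit(&prof_tctx, memory_order_acquire);
    }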
Make redirects to the huge_*() API the arena code's responsibility,
since arenas now take responsibility for all allocation sizes.

Migrate all centralized data structures related to huge allocations and
recyclable chunks into arena_t, so that each arena can manage huge
allocations and recyclable virtual memory completely independently of
other arenas.
Add chunk node caching to arenas, in order to avoid contention on the
base allocator.
Use chunks_rtree to look up huge allocations rather than a red-black
tree. Maintain a per arena unsorted list of huge allocations (which
will be needed to enumerate huge allocations during arena reset).
Remove the --enable-ivsalloc option, make ivsalloc() always available,
and use it for size queries if --enable-debug is enabled. The only
practical implications of this removal are that 1) ivsalloc() is now
always available during live debugging (and the underlying radix tree is
available during core-based debugging), and 2) size query validation can
no longer be enabled independently of --enable-debug.
Remove the stats.chunks.{current,total,high} mallctls, and replace their
underlying statistics with simpler atomically updated counters used
exclusively for gdump triggering. These statistics are no longer very
useful because each arena manages chunks independently, and per arena
statistics provide similar information.
Simplify chunk synchronization code, now that base chunk allocation
cannot cause recursive lock acquisition.

Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be
used in conjunction with the *allocx() API.
Add the tcache.create, tcache.flush, and tcache.destroy mallctls.
This resolves #145.
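A hedged usage sketch tying the new macros and mallctls together (error handling trimmed):

    #include <jemalloc/jemalloc.h>

    void explicit_tcache_example(void)
    {
        unsigned tcache_id;
        size_t sz = sizeof(tcache_id);

        /* Create an explicit thread-specific cache. */
        if (mallctl("tcache.create", &tcache_id, &sz, NULL, 0) != 0)
            return;

        /* Allocate through it, or bypass caching entirely. */
        void *p = mallocx(64, MALLOCX_TCACHE(tcache_id));
        void *q = mallocx(64, MALLOCX_TCACHE_NONE);
        dallocx(p, MALLOCX_TCACHE(tcache_id));
        dallocx(q, MALLOCX_TCACHE_NONE);

        /* Release cached objects, then destroy the cache. */
        mallctl("tcache.flush", NULL, NULL, &tcache_id, sizeof(tcache_id));
        mallctl("tcache.destroy", NULL, NULL, &tcache_id, sizeof(tcache_id));
    }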