g++ 5.5.0+ complained `parameter ‘expected’ set but not used
[-Werror=unused-but-set-parameter]`, even though `expected` is in
fact used.
This reverts commit 7618b0b8e458d9c0db6e4b05ccbe6c6308952890.
This reverts commit 0b462407ae84a62b3c097f0e9f18df487a47d9a7.
Refactored the core profiling codebase into two logical parts:
(a) `prof_data.c`: core internal data structure management & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.
Some internal functions had to be exposed, but there are not that
many of them if the modularization is (hopefully) clean enough.
`prof.c` is growing too long, so trying to modularize it. There are
a few internal functions that had to be exposed, but I think it is a
fair trade-off.
When tcache was disabled, the promoted case in dalloc was missing.
The counter stays 0 unless metadata allocation fails (which indicates
OOM), and is mainly for sanity checking.
The VirtualAlloc and VirtualFree APIs differ from their POSIX
counterparts in that MEM_DECOMMIT cannot be used across multiple
VirtualAlloc regions. To properly support decommit, only allow merge /
split within the same region -- this is done by tracking the "is_head"
state of extents and never merging across regions.

Add a new state is_head (only relevant for retain && !maps_coalesce),
which is true for the first extent in each VirtualAlloc region.
Determine whether two extents can be merged based on the head state,
and use serial numbers for sanity checks.
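A self-contained sketch of the merge check (the struct and helper are
illustrative stand-ins, not the real extent code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the extent metadata involved. */
typedef struct {
	void *base;
	size_t size;
	size_t sn;      /* serial number, usable for sanity checks */
	bool is_head;   /* first extent of its VirtualAlloc region */
} extent_t;

/* `b` starts exactly where `a` ends; merging is refused whenever `b`
 * heads its own VirtualAlloc region, since VirtualFree(MEM_DECOMMIT)
 * cannot span two regions. (Only relevant for retain &&
 * !maps_coalesce.) */
static bool
extent_can_merge(const extent_t *a, const extent_t *b) {
	assert((char *)a->base + a->size == (char *)b->base);
	return !b->is_head;
}
```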
`prof_accumbytes` was supposed to be replaced by `prof_accum` in
https://github.com/jemalloc/jemalloc/pull/623.
`cbopaque` can now be overridden without overriding `write_cb` in
the first place. (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
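A usage sketch with the public API (the writer below is illustrative):

```c
#include <jemalloc/jemalloc.h>
#include <stdio.h>

/* Custom malloc_message that honors cbopaque. */
static void
my_message(void *cbopaque, const char *s) {
	fputs(s, (FILE *)cbopaque);
}

int
main(void) {
	malloc_message = my_message;
	/* No write_cb supplied; cbopaque alone is now respected. */
	malloc_stats_print(NULL, stderr, NULL);
	return 0;
}
```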
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will be
printed as it is set.
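A minimal way to enable it from a program (sketch; the MALLOC_CONF
environment variable works as well):

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Compile-time default config string, read by jemalloc at startup. */
const char *malloc_conf = "confirm_conf:true";

int
main(void) {
	free(malloc(1)); /* trigger initialization; the echo prints */
	return 0;
}
```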
Small is added purely for convenience. Large flushes weren't tracked
before and can be useful in analysis. Large fill simply reports
nmalloc, since there is currently no batch fill for large.
When config_stats is enabled, track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper bound on the savings possible by
meshing.
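A sketch of reading the new counter through mallctl (the arena/bin
indices and the stat's exact type are assumptions):

```c
#include <jemalloc/jemalloc.h>
#include <stdint.h>
#include <stdio.h>

int
main(void) {
	/* Refresh the stats snapshot first. */
	uint64_t epoch = 1;
	size_t esz = sizeof(epoch);
	mallctl("epoch", &epoch, &esz, &epoch, esz);

	/* Assumed to surface as stats.arenas.<i>.bins.<j>.nonfull_slabs. */
	size_t nonfull;
	size_t sz = sizeof(nonfull);
	if (mallctl("stats.arenas.0.bins.0.nonfull_slabs", &nonfull, &sz,
	    NULL, 0) == 0) {
		printf("bin 0 nonfull slabs: %zu\n", nonfull);
	}
	return 0;
}
```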
Mainly fixing typos. The only non-trivial change is in the
computation for SC_NPSIZES, though the result wouldn't be any
different when SC_NGROUP = 4, as is always the case at the moment.
Caught by @zoulasc in #1460. The attribute needs to be added in the headers as
well.
This will let us share code on failure pathways.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
macro for it.
Compiling with warnings enabled produces missing-prototype warnings.
so that the generated formats can be checked by the compiler.
Summary: sdallocx is checking a flag that will never be set (at least
in the provided C++ destructor implementation). This branch will
probably only rarely be mispredicted; however, removing it saves two
instructions in sdallocx and one at the callsite (to zero out flags).
The analytics tool is put under the experimental.utilization namespace
in mallctl. The input is one pointer or an array of pointers, and the
output is a list of memory utilization statistics.
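A usage sketch of the single-pointer query (a raw buffer stands in for
the ctl's actual output struct):

```c
#include <jemalloc/jemalloc.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
	void *p = malloc(100);
	char out[256]; /* stand-in for the real output struct */
	size_t out_sz = sizeof(out);
	int err = mallctl("experimental.utilization.query", out, &out_sz,
	    &p, sizeof(p));
	printf("query: err=%d, wrote %zu bytes\n", err, out_sz);
	free(p);
	return 0;
}
```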
This change improves memory usage slightly, at virtually no CPU cost.
When it happens, this might cause a slowdown on fast path operations.
However, such cases are very rare.
In some rare cases (older compilers, e.g. gcc 4.2 on MIPS), 8-bit
atomics might be unavailable. Detect such cases so that we can work
around them.
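A sketch of the kind of compile/link probe involved (the actual
configure test may differ):

```c
#include <stdint.h>

/* Toolchains lacking 1-byte atomics fail to compile or link this. */
int
main(void) {
	uint8_t x = 0;
	return (int)__sync_val_compare_and_swap(&x, 0, 1);
}
```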
This regression was introduced by
3d29d11ac2c1583b9959f73c0548545018d31c8a (Clean compilation -Wextra).
These macros have been unused since
d4ac7582f32f506d5203bea2f0115076202add38 (Introduce a backport of C11
atomics).
This fixes a build failure when integrating with FreeBSD's libc. This
regression was introduced by d1e11d48d4c706e17ef3508e2ddb910f109b779f
(Move tsd link and in_hook after tcache.).
This adds some overhead to the tcache flush path (which is one of the
hot paths). Guard it behind a config option.
The keyword "huge" tends to remind people of huge pages, which are
not relevant to this feature.
This feature uses a dedicated arena to handle huge requests, which
significantly reduces VM fragmentation. In the production workloads
we tested, it often reduces VM size by >30%.
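A configuration sketch, assuming the option's current (renamed)
spelling `oversize_threshold` and an illustrative 1 MiB threshold:

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Requests at or above the threshold go to the dedicated arena;
 * MALLOC_CONF="oversize_threshold:1048576" works as well. */
const char *malloc_conf = "oversize_threshold:1048576";

int
main(void) {
	void *big = malloc(4 << 20); /* served by the dedicated arena */
	free(big);
	return 0;
}
```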
For low arena count settings, the huge threshold feature may trigger
unwanted background thread creation. Given that the huge arena purges
eagerly by default, bypass background thread creation when
initializing the huge arena.
When custom extent_hooks or transparent huge pages are in use, the
purging semantics may change, which means we may not get zeroed pages
on repopulating. Fix the issue by manually memsetting in such cases.
Add extent_arena_ind_get() to avoid loading the actual arena pointer
when we only need to check arena matching.
This avoids having to choose a bin shard on the fly, and will also
allow flexible bin binding for each thread.
The option uses the same format as "slab_sizes" to specify the number
of shards for each bin size.
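A sketch assuming the same range:value syntax as "slab_sizes" (ranges
and shard counts are illustrative):

```c
#include <jemalloc/jemalloc.h>
#include <stdlib.h>

/* Illustrative: 8 shards for bins of size 1-128, 4 for 129-512. */
const char *malloc_conf = "bin_shards:1-128:8|129-512:4";

int
main(void) {
	free(malloc(64)); /* small allocations pick one of the shards */
	return 0;
}
```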
This makes it possible to have multiple sets of bins in an arena,
which improves arena scalability because the bins (especially the
small ones) are always the limiting factor in production workloads.

A bin shard is picked on allocation; each extent tracks the bin shard
id for deallocation. The number of shards is determined using runtime
options.
If there are 3 or more threads spin-waiting on the same mutex, there
will be excessive exclusive cacheline contention, because
pthread_mutex_trylock() immediately tries to CAS in a new value
instead of first checking whether the lock is held.

This diff adds a "locked" hint flag, and we will only spin-wait
without trylock()ing while it is set. I don't know of any other
portable way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast. The regression blames back to the
mutex profiling changes; however, we almost never have 3 or more
threads contending in properly configured production workloads, so
the impact is rare -- but it is still worth fixing.
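A minimal sketch of the approach (the type, names, and unbounded spin
are illustrative, not the actual malloc_mutex internals):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative type: a pthread mutex paired with a "locked" hint. */
typedef struct {
	pthread_mutex_t lock;
	atomic_bool locked; /* hint: true while the mutex is held */
} hinted_mutex_t;

#define HINTED_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, false }

static bool
hinted_trylock(hinted_mutex_t *m) {
	if (pthread_mutex_trylock(&m->lock) != 0) {
		return false;
	}
	atomic_store_explicit(&m->locked, true, memory_order_relaxed);
	return true;
}

static void
hinted_lock(hinted_mutex_t *m) {
	while (!hinted_trylock(m)) {
		/* Spin on plain loads while the hint reads locked, so
		 * waiters keep the cacheline shared instead of bouncing
		 * it with CAS traffic. */
		while (atomic_load_explicit(&m->locked,
		    memory_order_relaxed)) {
			/* busy-wait; a real impl bounds this and blocks */
		}
	}
}

static void
hinted_unlock(hinted_mutex_t *m) {
	atomic_store_explicit(&m->locked, false, memory_order_relaxed);
	pthread_mutex_unlock(&m->lock);
}
```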
The setting has been tested in production for a while. No negative
effects were observed, and we were able to reduce the number of
threads per process.
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
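The probe can be as small as (sketch; the real configure.ac test may
differ):

```c
/* Compile/link test for __builtin_popcount availability. */
int
main(void) {
	return __builtin_popcount(0x08) == 1 ? 0 : 1;
}
```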
Also catch invalid tcache id.