path: root/src
* Sanity check on prof dump buffer size. (Qi Wang, 2019-08-02; 1 file, -0/+1)
* Quick fix for prof log printing (Yinan Zhang, 2019-07-31; 1 file, -2/+2)
    The emitter APIs used were incorrect, a side effect of which was extra
    lines being printed.
* Limit to exact fit on Windows with retain off. (Qi Wang, 2019-07-29; 1 file, -0/+10)
    Without retain, split and merge are disallowed on Windows.  Avoid
    first-fit, which almost always requires splitting; instead, try an
    exact fit only and bail out early.
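Below is a minimal sketch of the exact-fit-only policy described in the commit above, using a hypothetical array of per-size free lists (node_t, freelist, and size_to_bucket are made up for illustration); jemalloc's real extent heaps and size-class mapping are more involved.

```c
#include <stddef.h>

#define NBUCKETS    64
#define PAGE_SIZE   4096

/* Hypothetical per-size-class free lists; illustration only. */
typedef struct node { struct node *next; } node_t;
static node_t *freelist[NBUCKETS];

static size_t
size_to_bucket(size_t size) {
    size_t b = size / PAGE_SIZE;
    return (b < NBUCKETS) ? b : NBUCKETS - 1;
}

static node_t *
alloc_exact_fit_only(size_t size) {
    size_t b = size_to_bucket(size);
    node_t *n = freelist[b];
    if (n == NULL) {
        /*
         * No exact fit: bail out early rather than scanning larger
         * buckets (first fit), because without split support the unused
         * tail of a larger extent could not be carved off and reused.
         */
        return NULL;
    }
    freelist[b] = n->next;
    return n;
}
```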
* Revert "Refactor prof log"Qi Wang2019-07-292-703/+677
| | | | This reverts commit 7618b0b8e458d9c0db6e4b05ccbe6c6308952890.
* Revert "Refactor profiling"Qi Wang2019-07-292-1478/+1450
| | | | This reverts commit 0b462407ae84a62b3c097f0e9f18df487a47d9a7.
* Refactor profiling (Yinan Zhang, 2019-07-29; 2 files, -1450/+1478)
    Refactored the core profiling codebase into two logical parts:
    (a) `prof_data.c`: core internal data structure management and dumping;
    (b) `prof.c`: mutexes and outward-facing APIs.
    Some internal functions had to be exposed, but there are not that many
    of them if the modularization is (hopefully) clean enough.
* Refactor prof log (Yinan Zhang, 2019-07-29; 2 files, -677/+703)
    `prof.c` is growing too long, so this change modularizes it.  A few
    internal functions had to be exposed, but I think it is a fair
    trade-off.
* Add indent to individual options for confirm_conf. (Qi Wang, 2019-07-26; 1 file, -3/+4)
* Invoke arena_dalloc_promoted() properly w/o tcache. (Qi Wang, 2019-07-25; 1 file, -1/+1)
    When tcache was disabled, the dalloc promoted case was missing.
* Optimize max_active_fit in first_fit. (Qi Wang, 2019-07-24; 1 file, -4/+2)
    Stop scanning once the first max_active_fit size is reached.
* Track the leaked VM space via the abandoned_vm counter. (Qi Wang, 2019-07-24; 4 files, -5/+23)
    The counter is 0 unless metadata allocation has failed (which
    indicates OOM), and it is mainly for sanity checking.
* extent_dalloc instead of leak when register fails. (Qi Wang, 2019-07-24; 1 file, -6/+3)
    extent_register may only fail if the underlying extent and region got
    stolen / coalesced before we lock.  Avoid doing extent_leak (which
    purges the region) since we don't really own the region.
* Avoid leaking extents / VM when split is not supported. (Qi Wang, 2019-07-24; 1 file, -0/+11)
    This can only happen on Windows and with opt.retain disabled (which
    isn't the default).  The solution is suboptimal; however, this is not
    a common case, as retain is the long-term plan for all platforms
    anyway.
* Implement retain on Windows. (Qi Wang, 2019-07-24; 2 files, -19/+62)
    The VirtualAlloc and VirtualFree APIs are different in that
    MEM_DECOMMIT cannot be used across multiple VirtualAlloc regions.  To
    properly support decommit, only allow merge / split within the same
    region -- this is done by tracking the "is_head" state of extents and
    not merging cross-region.

    Add a new state is_head (only relevant for retain && !maps_coalesce),
    which is true for the first extent in each VirtualAlloc region.
    Determine whether two extents can be merged based on the head state,
    and use serial numbers for sanity checks.
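A minimal sketch of the merge restriction described above. The extent_t layout, the field names (is_head, sn), and the helper below are simplified stand-ins for illustration, not jemalloc's actual definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an extent record. */
typedef struct {
    void   *addr;
    size_t  size;
    bool    is_head;    /* First extent of its VirtualAlloc region. */
    size_t  sn;         /* Serial number, used for sanity checks. */
} extent_t;

/*
 * Only merge within a single VirtualAlloc region: if the second extent is
 * the head of a region, the two extents came from different VirtualAlloc
 * calls, and decommitting or releasing the combined range would be invalid.
 */
static bool
extent_can_merge(const extent_t *a, const extent_t *b) {
    if (b->is_head) {
        return false;
    }
    /* Same region implies the first extent is not newer than the second. */
    return a->sn <= b->sn;
}
```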
* Fix posix_memalign with input size 0. (Qi Wang, 2019-07-18; 1 file, -5/+17)
    Return a valid pointer instead of hitting a failed assertion.
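A small standalone example of the case this fixes: size 0 with a valid alignment should succeed and return a pointer that can be handed to free(), rather than tripping an internal assertion.

```c
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
    void *p = NULL;
    /* Size 0 with a valid alignment: expect success, not an assertion. */
    int err = posix_memalign(&p, 64, 0);
    if (err != 0) {
        fprintf(stderr, "posix_memalign failed: %d\n", err);
        return 1;
    }
    /* The returned pointer (possibly a minimal allocation) is freeable. */
    free(p);
    return 0;
}
```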
* Fix a bug in prof_dump_write (Yinan Zhang, 2019-07-16; 1 file, -1/+1)
    The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
    than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
    always be `false`, so `memcpy` would always try to copy
    `PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be dangerous:
    in the last round of the `while` loop it would not only illegally read
    the memory beyond `s` (which might not always be disastrous), but it
    would also illegally overwrite the memory beyond `prof_dump_buf`
    (which can be pretty disastrous).  `slen` probably has never gone
    beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
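For clarity, here is a self-contained sketch of the bounded, chunked copy a prof-dump-style buffered writer needs (names such as dump_buf, dump_buf_end, and dump_flush are invented for this sketch, not jemalloc's): each memcpy length is capped by both the remaining input and the remaining buffer space, so a string longer than the buffer can neither read past s nor write past the buffer.

```c
#include <stdio.h>
#include <string.h>

#define DUMP_BUFSIZE 65536

static char   dump_buf[DUMP_BUFSIZE];
static size_t dump_buf_end;     /* Bytes currently buffered. */

static void
dump_flush(void) {
    /* Stand-in for the real output path (e.g. a file descriptor write). */
    fwrite(dump_buf, 1, dump_buf_end, stdout);
    dump_buf_end = 0;
}

static void
dump_write(const char *s, size_t slen) {
    size_t i = 0;
    while (i < slen) {
        if (dump_buf_end == DUMP_BUFSIZE) {
            dump_flush();
        }
        /* Copy no more than what remains of s and of the buffer. */
        size_t n = slen - i;
        size_t room = DUMP_BUFSIZE - dump_buf_end;
        if (n > room) {
            n = room;
        }
        memcpy(&dump_buf[dump_buf_end], &s[i], n);
        dump_buf_end += n;
        i += n;
    }
}
```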
* Fix logic in printing (Yinan Zhang, 2019-07-16; 1 file, -1/+0)
    `cbopaque` can now be overridden without overriding `write_cb` in the
    first place.  (Otherwise there would be no need to have the
    `cbopaque` parameter in `malloc_message`.)
* Avoid blocking on background thread lock for stats. (Qi Wang, 2019-05-22; 1 file, -1/+7)
    Background threads may run for a long time, especially when the number
    of dirty pages is high.  Avoid blocking stats calls because of this
    (which may cause latency spikes).
* Add experimental.arenas.i.pactivep. (Qi Wang, 2019-05-22; 1 file, -7/+82)
    The new experimental mallctl exposes the arena pactive counter to
    applications, which allows fast reads without going through the
    mallctl / epoch steps.  This is particularly useful when frequent
    balancing is required, e.g. when there are multiple manual arenas and
    threads are multiplexed to them based on usage.
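A hedged usage sketch of the new mallctl for arena 0: read the pointer once, then poll the counter directly with no further mallctl calls. The exact node string and the pointed-to type below are assumptions based on the commit title, not verified against the source.

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
    /* Assumed mallctl name and type; see the commit above. */
    size_t *pactive = NULL;
    size_t sz = sizeof(pactive);
    if (mallctl("experimental.arenas.0.pactivep", (void *)&pactive, &sz,
        NULL, 0) != 0) {
        fprintf(stderr, "mallctl failed\n");
        return 1;
    }
    /* Later: cheap reads with no mallctl / epoch round trip. */
    printf("arena 0 pactive: %zu pages\n", *pactive);
    return 0;
}
```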
* Add confirm_conf option (Yinan Zhang, 2019-05-22; 3 files, -107/+177)
    If the confirm_conf option is set, when the program starts, each of
    the four malloc_conf strings will be printed, and each option will be
    printed when it is set.
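One of the conf sources is the application-provided global string, so a minimal way to exercise the new option looks like the sketch below (the option name comes from the commit; the precise printout format is not reproduced here).

```c
#include <stdlib.h>

/*
 * jemalloc reads this global (among other config sources) at startup;
 * with confirm_conf enabled it echoes each conf string and each option
 * as it is parsed.
 */
const char *malloc_conf = "confirm_conf:true";

int
main(void) {
    void *p = malloc(64);   /* Triggers jemalloc init and the printout. */
    free(p);
    return 0;
}
```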
* Improve memory utilization tests (Yinan Zhang, 2019-05-21; 1 file, -4/+4)
    Added tests for large size classes and expanded the tests to cover a
    wider range of allocation sizes.
* Fix GCC-9.1 warning with macro GET_ARG_NUMERIC (Vaibhav Jain, 2019-05-21; 1 file, -1/+1)
    GCC-9.1 reports the following error when trying to compile file
    src/malloc_io.c with CFLAGS='-Werror':

        src/malloc_io.c: In function ‘malloc_vsnprintf’:
        src/malloc_io.c:369:2: error: case label value exceeds maximum value for type [-Werror]
          369 |   case '?' | 0x80: \
              |   ^~~~
        src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
          581 |     GET_ARG_NUMERIC(val, 'p');
              |     ^~~~~~~~~~~~~~~
        ...
        <snip>
        cc1: all warnings being treated as errors
        make: *** [Makefile:388: src/malloc_io.sym.o] Error 1

    The warning is reported because by default the type 'char' is 'signed
    char', and or-ing 0x80 makes the case label char negative, which is
    beyond the printable ASCII range (0 - 127).

    The patch fixes this by explicitly casting the 'len' variable to
    'unsigned char' inside the 'switch' statement so that the value of the
    expression "'?' | 0x80" falls within the legal values of the variable
    'len'.
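A self-contained illustration of the underlying issue and the style of fix (names here are illustrative, not jemalloc's): with plain char being signed, '?' | 0x80 is a case label outside the switched type's range, whereas switching on the value cast to unsigned char keeps every label within 0..255.

```c
#include <stdio.h>

static void
handle(char len) {
    /*
     * Switching on plain (signed) char makes GCC 9.1 warn that the case
     * label '?' | 0x80 exceeds the type's maximum value.  Casting the
     * switched value to unsigned char keeps all labels within 0..255.
     */
    switch ((unsigned char)len) {
    case '?' | 0x80:
        puts("pointer conversion");
        break;
    default:
        puts("other");
        break;
    }
}

int
main(void) {
    handle((char)('?' | 0x80));
    return 0;
}
```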
* Track nfills and nflushes for arenas.i.small / large. (Qi Wang, 2019-05-15; 4 files, -11/+79)
    Small is added purely for convenience.  Large flushes weren't tracked
    before and can be useful in analysis.  Large fill simply reports
    nmalloc, since there is currently no batch fill for large.
* Fix assert in free fastpath (Yinan Zhang, 2019-05-15; 1 file, -1/+1)
    rtree_szind_slab_read_fast() may not have initialized alloc_ctx.szind;
    it is only safe to use after confirming that the return value is true.
* Improve macro readability in malloc_conf_init (Yinan Zhang, 2019-05-08; 1 file, -22/+22)
    Define more readable macros than yes and no.
* Remove best fit (Dave Watson, 2019-05-08; 1 file, -32/+8)
    This option saves a few CPU cycles but potentially adds a lot of
    fragmentation -- so much so that there are workarounds like
    max_active.  Instead, let's just drop it entirely.  It only made a
    difference in one service I tested (0.3% CPU regression), while many
    services saw a memory win (also small, less than 1% mem P99).
* Add max_active_fit to first_fit (Dave Watson, 2019-05-08; 1 file, -1/+10)
    The max_active_fit check is currently only on the best_fit path; add
    it to the first_fit path also.
* Add nonfull_slabs to bin_stats_t. (Doron Roberts-Kedes, 2019-04-29; 3 files, -0/+21)
    When config_stats is enabled, track the size of bin->slabs_nonfull in
    the new nonfull_slabs counter in bin_stats_t.  This metric should be
    useful for establishing an upper bound on the savings possible by
    meshing.
* Enforce TLS_MODEL attribute. (Qi Wang, 2019-04-16; 1 file, -3/+3)
    Caught by @zoulasc in #1460.  The attribute needs to be added in the
    headers as well.
* Safety checks: Add a redzoning feature. (David Goldblatt, 2019-04-15; 4 files, -15/+43)
* Safety checks: Indirect through a function. (David Goldblatt, 2019-04-15; 2 files, -1/+13)
    This will let us share code on failure pathways.
* Safety checks: Expose config value via mallctl and stats. (David Goldblatt, 2019-04-15; 2 files, -0/+4)
* Move extra size checks behind a config flag. (David Goldblatt, 2019-04-15; 1 file, -9/+8)
    This will let us turn that flag into a generic "turn on runtime
    checks" flag that guards other functionality we have planned.
* Separate tests for extent utilization API (Yinan Zhang, 2019-04-10; 1 file, -2/+2)
    As title.
* remove compare and branch in fast path for c++ operator delete[] (mgrice, 2019-04-08; 2 files, -3/+15)
    Summary: sdallocx is checking a flag that will never be set (at least
    in the provided C++ destructor implementation).  This branch will
    probably only rarely be mispredicted; however, it removes two
    instructions in sdallocx and one at the callsite (to zero out flags).
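For context, a small C usage of the sized-deallocation entry point involved: when the flags argument is a compile-time constant 0, as in the C++ sized operator delete[] path the commit describes, there is no tcache flag for the fast path to re-examine. This only demonstrates the sdallocx call, not the actual C++ operator glue.

```c
#include <stddef.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
    size_t size = 128;
    void *p = mallocx(size, 0);
    if (p == NULL) {
        return 1;
    }
    /*
     * Sized deallocation with a constant flags value of 0 -- the case the
     * commit optimizes for in the C++ operator delete[] fast path.
     */
    sdallocx(p, size, 0);
    return 0;
}
```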
* Ensure page alignment on extent_alloc. (Qi Wang, 2019-04-04; 3 files, -6/+7)
    This was discovered and suggested by @jasone in #1468.  When custom
    extent hooks are in use, we should ensure page alignment on the extent
    alloc path, instead of relying on the user hooks to do so.
* Add memory utilization analytics to mallctl (Yinan Zhang, 2019-04-04; 2 files, -4/+301)
    The analytics tool is put under the experimental.utilization namespace
    in mallctl.  Input is one pointer or an array of pointers, and the
    output is a list of memory utilization statistics.
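A hedged sketch of how a single-pointer query might be called; the node name "experimental.utilization.query" and the shape and size of the output array are assumptions here, so treat this purely as an illustration of the mallctl calling convention.

```c
#include <stdio.h>
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
    void *p = malloc(100);
    if (p == NULL) {
        return 1;
    }
    /*
     * Assumed shape of the query: pass the pointer in as the "new" value
     * and receive an array of utilization counters back.  The node name
     * and output layout are assumptions, not verified here.
     */
    size_t util[4] = {0};
    size_t util_sz = sizeof(util);
    int err = mallctl("experimental.utilization.query", util, &util_sz,
        &p, sizeof(p));
    if (err != 0) {
        fprintf(stderr, "utilization query not available: %d\n", err);
    } else {
        printf("first utilization field: %zu\n", util[0]);
    }
    free(p);
    return 0;
}
```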
* Use iallocztm instead of ialloc in prof_log functions. (Qi Wang, 2019-04-02; 1 file, -5/+8)
    Explicitly use iallocztm for internal allocations.  ialloc could
    trigger arena creation, which may cause lock order reversal
    (narenas_mtx and log_mtx).
* Avoid check_min for opt_lg_extent_max_active_fit. (Qi Wang, 2019-03-29; 1 file, -1/+1)
    This fixes a compiler warning.
* Add the missing unlock in the error path of extent_register. (Qi Wang, 2019-03-29; 1 file, -0/+1)
* Allow low values of oversize_threshold to disable the feature. (Qi Wang, 2019-03-29; 1 file, -2/+2)
    We should allow a way to easily disable the feature (e.g. not
    reserving the arena id at all).
* Output message before aborting on tcache size-matching check. (Qi Wang, 2019-03-29; 1 file, -0/+3)
* Eagerly purge oversized merged extents. (Qi Wang, 2019-03-15; 1 file, -0/+7)
    This change improves memory usage slightly, at virtually no CPU cost.
* Fallback to 32-bit when 8-bit atomics are missing for TSD. (Qi Wang, 2019-03-09; 1 file, -6/+7)
    When this happens, it might cause a slowdown on the fast path
    operations.  However, such a case is very rare.
* Stringify tls_callback linker directive (Dave Rigby, 2019-02-22; 1 file, -1/+1)
    Proposed fix for #1444 -- ensure that `tls_callback` in the
    `#pragma comment(linker)` directive gets the same prefix added as it
    does in the C declaration.
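A generic illustration of the stringify-with-prefix pattern such a fix relies on: a two-level macro so that the prefix macro expands before '#' applies. The prefix je_ and the names below are examples only, not the exact directive or prefix used in jemalloc.

```c
#include <stdio.h>

/* Two-level stringification so macro arguments expand before '#'. */
#define STRINGIFY_HELPER(x)     #x
#define STRINGIFY(x)            STRINGIFY_HELPER(x)

/* Example prefix, analogous to the symbol prefix added in the C code. */
#define SYM_PREFIX              je_

/* Paste the prefix onto the symbol, then stringify the result. */
#define PREFIXED_HELPER(p, s)   p##s
#define PREFIXED(p, s)          PREFIXED_HELPER(p, s)
#define SYM_STRING(s)           STRINGIFY(PREFIXED(SYM_PREFIX, s))

int
main(void) {
    /* On MSVC this string would feed a #pragma comment(linker, ...) line. */
    puts(SYM_STRING(tls_callback));     /* Prints "je_tls_callback". */
    return 0;
}
```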
* Guard libgcc unwind init with opt_prof. (Qi Wang, 2019-02-22; 1 file, -8/+6)
    Only triggers libgcc unwind init when prof is enabled.  This helps
    work around some bootstrapping issues.
* Make background_thread not dependent on libdl. (Qi Wang, 2019-02-07; 1 file, -1/+8)
    When not using libdl, background_thread can still be enabled.
* Sanity check szind on tcache flush. (Qi Wang, 2019-02-01; 1 file, -2/+40)
    This adds some overhead to the tcache flush path (which is one of the
    popular paths).  Guard it behind a config option.
* Tweak the spacing for the total_wait_time per second. (Qi Wang, 2019-01-28; 1 file, -0/+1)
* Rename huge_threshold to oversize_threshold. (Qi Wang, 2019-01-25; 4 files, -14/+14)
    The keyword huge tends to remind people of huge pages, which is not
    relevant to the feature.