path: root/src/prof.c
...
* Refactor prng* from cpp macros into inline functions. (Jason Evans, 2016-02-20, 1 file, -2/+1)
  Remove the 32-bit variant, convert prng64() to prng_lg_range(), and add prng_range().
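The refactor above turns the PRNG helpers into inline functions. A minimal sketch of what prng_lg_range()/prng_range()-style helpers can look like; the LCG constants and the rejection-sampling loop are illustrative assumptions, not code copied from jemalloc:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit linear congruential step (Knuth-style constants assumed). */
static inline uint64_t
prng_state_next(uint64_t state)
{
	return (state * 6364136223846793005ULL + 1442695040888963407ULL);
}

/* Return a pseudo-random number in [0, 2^lg_range), using the high bits. */
static inline uint64_t
prng_lg_range(uint64_t *state, unsigned lg_range)
{
	assert(lg_range > 0 && lg_range <= 64);
	*state = prng_state_next(*state);
	return (*state >> (64 - lg_range));
}

/* Return a pseudo-random number in [0, range) via rejection sampling. */
static inline uint64_t
prng_range(uint64_t *state, uint64_t range)
{
	unsigned lg_range = 64;
	uint64_t ret;

	assert(range > 1);
	/* Find the smallest lg_range such that 2^lg_range >= range. */
	while (lg_range > 1 && (UINT64_C(1) << (lg_range - 1)) >= range)
		lg_range--;
	/* Regenerate until the result falls inside [0, range). */
	do {
		ret = prng_lg_range(state, lg_range);
	} while (ret >= range);
	return ret;
}
```

Rejection sampling over the next power of two avoids the modulo bias that `prng % range` would introduce.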
* Fast-path improvement: reduce the number of branches and unnecessary operations. (Qi Wang, 2015-11-10, 1 file, -17/+20)
  - Combine multiple runtime branches into a single malloc_slow check.
  - Avoid calling arena_choose / size2index / index2size on the fast path.
  - A few micro-optimizations.
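The single malloc_slow check mentioned above folds several per-allocation option tests into one flag computed once at initialization. A hedged sketch with hypothetical option names:

```c
#include <stdbool.h>

/* Hypothetical option flags standing in for jemalloc's opt_* variables. */
static bool opt_junk = false;
static bool opt_zero = false;
static bool opt_profiling = false;
static bool opt_xmalloc = false;

/* Computed once at init: true if any option forces the slow path. */
static bool malloc_slow;

static void
malloc_slow_init(void)
{
	malloc_slow = opt_junk | opt_zero | opt_profiling | opt_xmalloc;
}

/* The fast path tests one cached flag instead of every option. */
static int
alloc_fastpath_taken(void)
{
	if (malloc_slow)
		return 0;	/* fall through to the slow path */
	return 1;		/* single-branch fast path */
}
```

The design choice: each option is still independently settable, but the hot path pays for exactly one branch regardless of how many options exist.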
* Fix prof_tctx_dump_iter() to filter. (Jason Evans, 2015-09-22, 1 file, -5/+17)
  Fix prof_tctx_dump_iter() to filter out nodes that were created after heap profile dumping started. Prior to this fix, spurious entries with arbitrary object/byte counts could appear in heap profiles, which resulted in jeprof inaccuracies or failures.
* Fix prof_alloc_rollback(). (Jason Evans, 2015-09-17, 1 file, -1/+1)
  Fix prof_alloc_rollback() to read tdata from thread-specific data rather than dereferencing a potentially invalid tctx.
* Reduce variable scope. (Dmitry-Me, 2015-09-15, 1 file, -2/+2)
  This resolves #274.
* Fix "prof.reset" mallctl-related corruption. (Jason Evans, 2015-09-10, 1 file, -3/+11)
  Fix heap profiling to distinguish among otherwise identical sample sites with interposed resets (triggered via the "prof.reset" mallctl). This bug could cause data structure corruption that would most likely result in a segfault.
* Optimize arena_prof_tctx_set(). (Jason Evans, 2015-09-02, 1 file, -1/+1)
  Optimize arena_prof_tctx_set() to avoid reading run metadata when deciding whether it's actually necessary to write.
* Fix MinGW-related portability issues. (Jason Evans, 2015-07-23, 1 file, -10/+10)
  Create and use FMT* macros that are equivalent to the PRI* macros that inttypes.h defines. This allows uniform use of the Unix-specific format specifiers, e.g. "%zu", as well as avoiding Windows-specific definitions of e.g. PRIu64.
  Add ffs()/ffsl() support for compiling with gcc.
  Extract compatibility definitions of ENOENT, EINVAL, EAGAIN, EPERM, ENOMEM, and ENORANGE into include/msvc_compat/windows_extra.h and use the file for tests as well as for core jemalloc code.
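A sketch of how PRI*-style FMT* macros can be defined and spliced into format strings via adjacent string literals; the macro names mirror the commit's description, but the exact definitions here are assumptions:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical FMT* macros mirroring inttypes.h's PRI* macros, so that
 * format strings can be written uniformly across platforms.
 */
#ifdef _WIN32
#  define FMTu64	"I64u"
#  define FMTzu		"Iu"
#else
#  define FMTu64	PRIu64
#  define FMTzu		"zu"
#endif

static int
format_sizes(char *buf, size_t buflen, uint64_t n64, size_t nz)
{
	/* Adjacent string literals splice the macros into the format. */
	return snprintf(buf, buflen, "u64=%" FMTu64 " size=%" FMTzu, n64, nz);
}
```

On a Unix build the format string compiles down to `"u64=%" PRIu64 " size=%zu"`; on Windows the same call sites pick up the `%I64u`/`%Iu` spellings that MSVCRT expects.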
* Add JEMALLOC_FORMAT_PRINTF(). (Jason Evans, 2015-07-22, 1 file, -2/+2)
  Replace JEMALLOC_ATTR(format(printf, ...)) with JEMALLOC_FORMAT_PRINTF(), so that configuration feature tests can omit the attribute if it would cause extraneous compilation warnings.
* Fix MinGW build warnings. (Jason Evans, 2015-07-08, 1 file, -1/+1)
  Conditionally define ENOENT, EINVAL, etc. (was unconditional).
  Add/use PRIzu, PRIzd, and PRIzx for use in malloc_printf() calls. gcc issued (harmless) warnings since e.g. "%zu" should be "%Iu" on Windows, and the alternative to this workaround would have been to disable the function attributes which cause gcc to look for type mismatches in formatted printing function calls.
* Rename pprof to jeprof. (Jason Evans, 2015-05-01, 1 file, -1/+1)
  This rename avoids installation collisions with the upstream gperftools. Additionally, jemalloc's per thread heap profile functionality introduced an incompatible file format, so it's now worthwhile to clearly distinguish jemalloc's version of this script from the upstream version.
  This resolves #229.
* Prefer /proc/<pid>/task/<pid>/maps over /proc/<pid>/maps on Linux. (Jason Evans, 2015-05-01, 1 file, -5/+24)
  This resolves #227.
* Fix heap profiling regressions. (Jason Evans, 2015-03-16, 1 file, -12/+9)
  Remove the prof_tctx_state_destroying transitory state and instead add the tctx_uid field, so that the tuple <thr_uid, tctx_uid> uniquely identifies a tctx. This assures that tctx's are well ordered even when more than two with the same thr_uid coexist. A previous attempted fix based on prof_tctx_state_destroying was only sufficient for protecting against two coexisting tctx's, but it also introduced a new dumping race.
  These regressions were introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.) and 764b00023f2bc97f240c3a758ed23ce9c0ad8526 (Fix a heap profiling regression.).
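The <thr_uid, tctx_uid> tuple described above yields a total order even when several tctx's with the same thr_uid coexist. An illustrative comparator (the struct layout and names here are hypothetical, not jemalloc's actual definitions):

```c
#include <stdint.h>

/* Hypothetical tctx key: the pair <thr_uid, tctx_uid> is unique. */
typedef struct {
	uint64_t thr_uid;	/* owning thread's unique id */
	uint64_t tctx_uid;	/* per-thread, monotonically increasing id */
} tctx_key_t;

/* Total order: by thread id first, then by creation order within a thread. */
static int
tctx_key_comp(const tctx_key_t *a, const tctx_key_t *b)
{
	if (a->thr_uid != b->thr_uid)
		return ((a->thr_uid < b->thr_uid) ? -1 : 1);
	if (a->tctx_uid != b->tctx_uid)
		return ((a->tctx_uid < b->tctx_uid) ? -1 : 1);
	return (0);
}
```

Because tctx_uid only ever increases within a thread, two equivalent-looking tctx's created before and after a reset can never compare equal, which is exactly what the fix needs.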
* Eliminate innocuous compiler warnings. (Jason Evans, 2015-03-14, 1 file, -0/+2)
* Fix a heap profiling regression. (Jason Evans, 2015-03-14, 1 file, -13/+31)
  Add the prof_tctx_state_destroying transitory state to fix a race between a thread destroying a tctx and another thread creating a new equivalent tctx.
  This regression was introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.).
* Fix a heap profiling regression. (Jason Evans, 2015-03-12, 1 file, -2/+7)
  Fix prof_tctx_comp() to incorporate tctx state into the comparison. During a dump it is possible for both a purgatory tctx and an otherwise equivalent nominal tctx to reside in the tree at the same time.
  This regression was introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.).
* Implement explicit tcache support. (Jason Evans, 2015-02-10, 1 file, -13/+22)
  Add the MALLOCX_TCACHE() and MALLOCX_TCACHE_NONE macros, which can be used in conjunction with the *allocx() API.
  Add the tcache.create, tcache.flush, and tcache.destroy mallctls.
  This resolves #145.
* Implement the prof.gdump mallctl. (Jason Evans, 2015-01-26, 1 file, -0/+34)
  This makes it possible to toggle the gdump feature on/off during program execution, whereas the opt.prof_gdump option can only be set during program startup.
  This resolves #72.
* Implement metadata statistics. (Jason Evans, 2015-01-24, 1 file, -14/+16)
  There are three categories of metadata:
  - Base allocations are used for bootstrap-sensitive internal allocator data structures.
  - Arena chunk headers comprise pages which track the states of the non-metadata pages.
  - Internal allocations differ from application-originated allocations in that they are for internal use, and that they are omitted from heap profiles.
  The metadata statistics comprise the metadata categories as follows:
  - stats.metadata: All metadata -- base + arena chunk headers + internal allocations.
  - stats.arenas.<i>.metadata.mapped: Arena chunk headers.
  - stats.arenas.<i>.metadata.allocated: Internal allocations. This is reported separately from the other metadata statistics because it overlaps with the allocated and active statistics, whereas the other metadata statistics do not.
  Base allocations are not reported separately, though their magnitude can be computed by subtracting the arena-specific metadata.
  This resolves #163.
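As the last paragraph above notes, the base-allocation component can be recovered by subtraction. A trivial worked example (the function name and the idea of summing across arenas are hypothetical illustrations of that arithmetic):

```c
#include <stddef.h>

/*
 * stats.metadata = base + arena chunk headers (metadata.mapped summed
 * over arenas) + internal allocations (metadata.allocated summed over
 * arenas), so base falls out by subtraction.
 */
static size_t
metadata_base(size_t stats_metadata, size_t arena_mapped_sum,
    size_t arena_allocated_sum)
{
	return (stats_metadata - arena_mapped_sum - arena_allocated_sum);
}
```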
* Don't dereference NULL tdata in prof_{enter,leave}(). (Jason Evans, 2014-11-01, 1 file, -13/+18)
  It is possible for the thread's tdata to be NULL late during thread destruction, so take care not to dereference a NULL pointer in such cases.
* Miscellaneous cleanups. (Jason Evans, 2014-10-31, 1 file, -1/+3)
* Fix prof_{enter,leave}() calls to pass tdata_self. (Jason Evans, 2014-10-30, 1 file, -19/+24)
* Use JEMALLOC_INLINE_C everywhere it's appropriate. (Jason Evans, 2014-10-30, 1 file, -2/+2)
* Fix a prof_tctx_t/prof_tdata_t cleanup race. (Jason Evans, 2014-10-12, 1 file, -5/+5)
  Fix a prof_tctx_t/prof_tdata_t cleanup race by storing a copy of thr_uid in prof_tctx_t, so that the associated tdata need not be present during tctx teardown.
* Avoid atexit(3) when possible, disable prof_final by default. (Jason Evans, 2014-10-09, 1 file, -8/+9)
  atexit(3) can deadlock internally during its own initialization if jemalloc calls atexit() during jemalloc initialization. Mitigate the impact by restructuring prof initialization to avoid calling atexit() unless the registered function will actually dump a final heap profile. Additionally, disable prof_final by default so that this land mine is opt-in rather than opt-out.
  This resolves #144.
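One way to make the atexit(3) registration opt-in, as described above, is to defer it until the option is known to be enabled. A sketch with hypothetical names (the real restructuring in jemalloc is more involved):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: final-dump support is now off by default. */
static bool opt_prof_final = false;
static bool atexit_registered = false;

static void
prof_fdump(void)
{
	/* Would dump the final heap profile at process exit. */
}

static void
prof_boot_atexit(void)
{
	/*
	 * Calling atexit(3) from inside allocator initialization can
	 * deadlock in some libc implementations, so only register the
	 * handler when prof_final actually demands it.
	 */
	if (opt_prof_final) {
		atexit(prof_fdump);
		atexit_registered = true;
	}
}
```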
* Fix a prof_tctx_t destruction race. (Jason Evans, 2014-10-06, 1 file, -18/+32)
* Fix prof regressions. (Jason Evans, 2014-10-04, 1 file, -16/+23)
  Fix prof regressions related to tdata (the main per-thread profiling data structure) destruction:
  - Deadlock. The fix for this was intended to be part of 20c31deaae38ed9aa4fe169ed65e0c45cd542955 (Test prof.reset mallctl and fix numerous discovered bugs.), but the fix was left incomplete.
  - Destruction race. Detaching tdata just prior to destruction without holding the tdatas lock made it possible for another thread to destroy the tdata out from under the thread that was on its way to doing so.
* Fix tsd cleanup regressions. (Jason Evans, 2014-10-04, 1 file, -18/+11)
  Fix tsd cleanup regressions that were introduced in 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to reside in a single tsd structure.). These regressions were twofold:
  1) tsd_tryget() should never (and need never) return NULL. Rename it to tsd_fetch() and simplify all callers.
  2) tsd_*_set() must only be called when tsd is in the nominal state, because cleanup happens during the nominal-->purgatory transition, and re-initialization must not happen while in the purgatory state. Add tsd_nominal() and use it as needed.
  Note that tsd_*{p,}_get() can still be used as long as no re-initialization that would require cleanup occurs. This means that e.g. the thread_allocated counter can be updated unconditionally.
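The nominal/purgatory distinction above can be modeled as a small state enum plus a tsd_nominal() predicate. A hedged sketch (state names follow the commit's description; the struct fields are assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tsd lifecycle states matching the description above. */
typedef enum {
	tsd_state_uninitialized,
	tsd_state_nominal,	/* fully usable; tsd_*_set() allowed */
	tsd_state_purgatory,	/* cleanup underway; no re-initialization */
	tsd_state_reincarnated	/* touched again after cleanup began */
} tsd_state_t;

typedef struct {
	tsd_state_t state;
	uint64_t thread_allocated;	/* updatable in any state */
} tsd_t;

/* Writes that could require later cleanup are only legal when nominal. */
static bool
tsd_nominal(const tsd_t *tsd)
{
	return (tsd->state == tsd_state_nominal);
}
```

Callers gate mutating accessors on `tsd_nominal()`, while plain counters like `thread_allocated` can be updated unconditionally since they never need cleanup.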
* Implement/test/fix prof-related mallctl's. (Jason Evans, 2014-10-04, 1 file, -17/+123)
  Implement/test/fix the opt.prof_thread_active_init, prof.thread_active_init, and thread.prof.active mallctl's.
  Test/fix the thread.prof.name mallctl.
  Refactor opt_prof_active to be read-only and move mutable state into the prof_active variable. Stop leaning on ctl-related locking for protection.
* Convert to uniform style: cond == false --> !cond. (Jason Evans, 2014-10-03, 1 file, -15/+15)
* Test prof.reset mallctl and fix numerous discovered bugs. (Jason Evans, 2014-10-03, 1 file, -64/+149)
* Fix profile dumping race. (Jason Evans, 2014-09-25, 1 file, -1/+9)
  Fix a race that caused a non-critical assertion failure. To trigger the race, a thread had to be part way through initializing a new sample, such that it was discoverable by the dumping thread, but not yet linked into its gctx by the time a later dump phase would normally have reset its state to 'nominal'.
  Additionally, lock access to the state field during modification to transition to the dumping state. It's not apparent that this oversight could have caused an actual problem due to outer locking that protects the dumping machinery, but the added locking pedantically follows the stated locking protocol for the state field.
* Convert all tsd variables to reside in a single tsd structure. (Jason Evans, 2014-09-23, 1 file, -117/+127)
* Fix prof regressions. (Jason Evans, 2014-09-12, 1 file, -1/+22)
  Don't use atomic_add_uint64(), because it isn't available on 32-bit platforms.
  Fix forking support functions to manage all prof-related mutexes.
  These regressions were introduced by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.
* Fix a profile sampling race. (Jason Evans, 2014-09-10, 1 file, -0/+35)
  Fix a profile sampling race that was due to preparing to sample, yet doing nothing to assure that the context remains valid until the stats are updated.
  These regressions were caused by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.
* Fix prof_tdata_get()-related regressions. (Jason Evans, 2014-09-09, 1 file, -25/+20)
  Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer (when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).
  Fix prof_tdata_get() callers to check for invalid results besides NULL (PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).
  These regressions were caused by 602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap profiling.), which did not make it into any releases prior to these fixes.
* Implement per thread heap profiling. (Jason Evans, 2014-08-20, 1 file, -359/+768)
  Rename data structures (prof_thr_cnt_t --> prof_tctx_t, prof_ctx_t --> prof_gctx_t), and convert to storing a prof_tctx_t for sampled objects.
  Convert PROF_ALLOC_PREP() to prof_alloc_prep(), since precise backtrace depth within jemalloc functions is no longer an issue (pprof prunes irrelevant frames).
  Implement mallctl's:
  - prof.reset implements full sample data reset, and optional change of sample interval.
  - prof.lg_sample reads the current sample interval (opt.lg_prof_sample was the permanent source of truth prior to prof.reset).
  - thread.prof.name provides naming capability for threads within heap profile dumps.
  - thread.prof.active makes it possible to activate/deactivate heap profiling for individual threads.
  Modify the heap dump files to contain per thread heap profile data. This change is incompatible with the existing pprof, which will require enhancements to read and process the enriched data.
* Dump heap profile backtraces in a stable order. (Jason Evans, 2014-08-20, 1 file, -52/+105)
  Also iterate over per thread stats in a stable order, which prepares the way for stable ordering of per thread heap profile dumps.
* Directly embed prof_ctx_t's bt. (Jason Evans, 2014-08-20, 1 file, -51/+18)
* Convert prof_tdata_t's bt2cnt to a comprehensive map. (Jason Evans, 2014-08-20, 1 file, -50/+17)
  Treat prof_tdata_t's bt2cnt as a comprehensive map of the thread's extant allocation samples (do not limit the total number of entries). This helps prepare the way for per thread heap profiling.
* Fix -Wsign-compare warnings. (Chris Peterson, 2014-06-02, 1 file, -2/+2)
* Simplify backtracing. (Jason Evans, 2014-04-23, 1 file, -34/+21)
  Simplify backtracing to not ignore any frames, and compensate for this in pprof in order to increase flexibility with respect to function-based refactoring even in the presence of non-deterministic inlining. Modify pprof to blacklist all jemalloc allocation entry points including non-standard ones like mallocx(), and ignore all allocator-internal frames. Prior to this change, pprof excluded the specifically blacklisted functions from backtraces, but it left allocator-internal frames intact.
* prof_backtrace: use unw_backtrace. (Lucian Adrian Grijincu, 2014-04-23, 1 file, -24/+9)
  unw_backtrace:
  - does internal per-thread caching
  - doesn't acquire an internal lock
* Refactor profiling: only use a bytes-till-next-sample variable. (Ben Maurer, 2014-04-16, 1 file, -3/+62)
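A bytes-till-next-sample counter works by decrementing on every allocation and sampling exactly when the counter runs out. A minimal sketch with hypothetical names; the real refactor rearms the counter with a geometrically distributed interval, while this sketch uses a fixed one for simplicity:

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-thread sampling state: a single countdown in bytes. */
typedef struct {
	size_t bytes_until_sample;
} sample_state_t;

/*
 * Called on every allocation of usize bytes. Returns true exactly when
 * the countdown is exhausted, i.e. when this allocation should be sampled.
 */
static bool
sample_check(sample_state_t *st, size_t usize, size_t sample_interval)
{
	if (st->bytes_until_sample >= usize) {
		st->bytes_until_sample -= usize;
		return (false);	/* not sampled */
	}
	/* Counter exhausted: sample this allocation and rearm. */
	st->bytes_until_sample = sample_interval;
	return (true);
}
```

The appeal of this scheme is that the fast path is a single subtract-and-compare, with no per-allocation random-number generation or division.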
* Remove support for non-prof-promote heap profiling metadata. (Jason Evans, 2014-04-11, 1 file, -5/+2)
  Make promotion of sampled small objects to large objects mandatory, so that profiling metadata can always be stored in the chunk map, rather than requiring one pointer per small region in each small-region page run.
  In practice the non-prof-promote code was only useful when using jemalloc to track all objects and report them as leaks at program exit. However, Valgrind is at least as good a tool for this particular use case. Furthermore, the non-prof-promote code is getting in the way of some optimizations that will make heap profiling much cheaper for the predominant use case (sampling a small representative proportion of all allocations).
* Consistently use debug lib(s) if present. (Harald Weppner, 2014-03-28, 1 file, -8/+5)
  Fixes a situation where nm uses the debug lib but addr2line does not, which completely messes up the symbol lookup.
* Enable profiling / leak detection in FreeBSD. (Harald Weppner, 2014-03-18, 1 file, -1/+7)
  Assumes procfs is mounted at /proc; cf. <http://www.freebsd.org/doc/en/articles/linux-users/procfs.html>.
* Add heap profiling tests. (Jason Evans, 2014-01-17, 1 file, -13/+39)
  Fix a regression in prof_dump_ctx() due to an uninitialized variable. This was caused by revision 4f37ef693e3d5903ce07dc0b61c0da320b35e3d9, so no releases are affected.
* Fix a variable prototype/definition mismatch. (Jason Evans, 2014-01-17, 1 file, -2/+1)
* Refactor prof_dump() to reduce contention. (Jason Evans, 2014-01-16, 1 file, -172/+273)
  Refactor prof_dump() to use a two-pass algorithm, and prof_leave() prior to the second pass. This avoids write(2) system calls while holding critical prof resources.
  Fix prof_dump() to close the dump file descriptor for all relevant error paths.
  Minimize the size of prof-related static buffers when prof is disabled. This saves roughly 65 KiB of application memory for non-prof builds.
  Refactor prof_ctx_init() out of prof_lookup_global().