Add stdlib.h, stdbool.h, and stdint.h to jemalloc/jemalloc.h so that
applications only have to #include <jemalloc/jemalloc.h>.
This resolves #132.
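As a quick illustration of the intent (this consumer program is a sketch, not part of the commit), an application can now rely on the jemalloc header alone for size_t, bool, and the fixed-width integer types:

    #include <jemalloc/jemalloc.h>  /* pulls in stdlib.h, stdbool.h, stdint.h */

    int main(void) {
            bool ok = true;                      /* stdbool.h, via jemalloc.h */
            uint32_t kib = 1024;                 /* stdint.h, via jemalloc.h */
            void *p = malloc((size_t)kib * 8);   /* stdlib.h, via jemalloc.h */
            if (p == NULL)
                    ok = false;
            free(p);
            return ok ? 0 : 1;
    }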
|
Fix prof regressions related to tdata (the main per-thread profiling data
structure) destruction:
- Deadlock. The fix for this was intended to be part of
  20c31deaae38ed9aa4fe169ed65e0c45cd542955 (Test prof.reset mallctl and
  fix numerous discovered bugs.), but the fix was left incomplete.
- Destruction race. Detaching tdata just prior to destruction without
  holding the tdatas lock made it possible for another thread to destroy
  the tdata out from under the thread that was on its way to doing so.
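A minimal sketch of the destruction-race fix's shape, in generic terms (all names below are illustrative, not jemalloc's actual code): the detach and the destruction must be covered by the same lock, so that no other thread can destroy the tdata in between.

    #include <pthread.h>

    struct tdata;
    void tdatas_remove(struct tdata *td);  /* detach from the global set */
    void tdata_destroy(struct tdata *td);  /* free the structure */

    static pthread_mutex_t tdatas_mtx = PTHREAD_MUTEX_INITIALIZER;

    void tdata_detach_and_destroy(struct tdata *td) {
            /* Holding the lock across both steps prevents another thread
             * from racing in and destroying td after it is detached. */
            pthread_mutex_lock(&tdatas_mtx);
            tdatas_remove(td);
            tdata_destroy(td);
            pthread_mutex_unlock(&tdatas_mtx);
    }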
|
Don't disable tcache when lazy-lock is configured. There already exists
a mechanism to disable tcache, but doing so automatically due to
lazy-lock causes surprising performance behavior.
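For applications that really do want the tcache off, the existing knob remains available explicitly; a sketch using jemalloc's documented malloc_conf hook (the option string is the standard tcache toggle):

    /* Compiled into the application; jemalloc reads this at startup.
     * Setting MALLOC_CONF=tcache:false in the environment has the same
     * effect. */
    const char *malloc_conf = "tcache:false";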
|
Revert 6716aa83526b3f866d73a033970cc920bc61c13f (Force use of TLS if
heap profiling is enabled.). No existing tests indicate that this is
necessary, nor does code inspection uncover any potential issues. Most
likely the original commit covered up a bug related to tsd-internal
allocation that has since been fixed.
|
Fix tsd cleanup regressions that were introduced in
5460aa6f6676c7f253bfcb75c028dfd38cae8aaf (Convert all tsd variables to
reside in a single tsd structure.). These regressions were twofold:
1) tsd_tryget() should never (and need never) return NULL. Rename it to
tsd_fetch() and simplify all callers.
2) tsd_*_set() must only be called when tsd is in the nominal state,
because cleanup happens during the nominal-->purgatory transition,
and re-initialization must not happen while in the purgatory state.
Add tsd_nominal() and use it as needed. Note that tsd_*{p,}_get()
can still be used as long as no re-initialization that would require
cleanup occurs. This means that e.g. the thread_allocated counter
can be updated unconditionally.
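A hedged sketch of the resulting caller pattern (the accessor names follow jemalloc's tsd_*_get/tsd_*_set convention but are used illustratively here, and new_tdata is a placeholder):

    tsd_t *tsd = tsd_fetch();  /* never returns NULL anymore */
    /* Reads, and writes that need no cleanup, are unconditional. */
    uint64_t allocated = tsd_thread_allocated_get(tsd);
    tsd_thread_allocated_set(tsd, allocated + 1);
    /* Writes that would require cleanup must check for the nominal state. */
    if (tsd_nominal(tsd))
            tsd_prof_tdata_set(tsd, new_tdata);  /* new_tdata: illustrative */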
|
Implement/test/fix the opt.prof_thread_active_init,
prof.thread_active_init, and thread.prof.active mallctls.
Test/fix the thread.prof.name mallctl.
Refactor opt_prof_active to be read-only and move mutable state into the
prof_active variable. Stop leaning on ctl-related locking for
protection.
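These knobs are all driven through the standard mallctl() interface; for example, toggling profiling for the calling thread might look like this (the mallctl name is the one this commit adds; the wrapper is illustrative):

    #include <stdbool.h>
    #include <jemalloc/jemalloc.h>

    static int set_thread_prof_active(bool active) {
            /* newp/newlen supply the new value; no old value is read back. */
            return mallctl("thread.prof.active", NULL, NULL,
                &active, sizeof(active));
    }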
|
Refactor permuted backtrace test allocation that was originally used
only by the prof_accum test, so that it can be used by other heap
profiling test binaries.
|
Trivial example:
    #include <stdlib.h>

    int main(void) {
            void *ptr = malloc(1024 * 1024 * 8);
            if (ptr == NULL)
                    return 1;
            /* Shrink in place rather than mapping a new region. */
            ptr = realloc(ptr, 1024 * 1024 * 4);
            if (ptr == NULL)
                    return 1;
            return 0;
    }
Before:
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcfff000000
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcffec00000
madvise(0x7fcfff000000, 8388608, MADV_DONTNEED) = 0
After:
mmap(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1934800000
madvise(0x7f1934c00000, 4194304, MADV_DONTNEED) = 0
Closes #134
|
PTHREAD_MUTEX_ADAPTIVE_NP is an enum value on glibc, not a macro, so we
must test for its existence by attempting compilation.
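A standalone probe in the spirit of that check (illustrative; the real test lives in configure): this program compiles only where PTHREAD_MUTEX_ADAPTIVE_NP is declared, whether as a macro or as a glibc enum value.

    #include <pthread.h>

    int main(void) {
            pthread_mutexattr_t attr;
            pthread_mutexattr_init(&attr);
            pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
            pthread_mutexattr_destroy(&attr);
            return 0;
    }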
|
autoconf: Support cygwin in addition to mingw
|
Use MSVC intrinsics for lg_floor
|
When using MSVC, make use of its intrinsic functions (supported on
x86, amd64, and ARM) for lg_floor.
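A hedged sketch of the intrinsic-based approach (jemalloc's actual lg_floor differs in detail):

    #include <stddef.h>
    #include <intrin.h>

    /* Index of the highest set bit, i.e. floor(log2(x)); x must be nonzero. */
    static unsigned lg_floor(size_t x) {
            unsigned long ret;
    #if defined(_M_X64)
            _BitScanReverse64(&ret, x);
    #else
            _BitScanReverse(&ret, (unsigned long)x);
    #endif
            return (unsigned)ret;
    }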
|
This fixes issue #113 - je_malloc_conf is not respected on OS X
|
Move small run metadata into the arena chunk header, with multiple
expected benefits:
- Lower run fragmentation due to reduced run sizes; runs are more likely
to completely drain when there are fewer total regions.
- Improved cache behavior. Prior to this change, run headers were
always page-aligned, which put extra pressure on some CPU cache sets.
The degree to which this was a problem was hardware-dependent, but it
likely hurt somewhat even on the most advanced modern hardware.
- Buffer overruns/underruns are less likely to corrupt allocator
metadata.
- Size classes between 4 KiB and 16 KiB become reasonable to support
without any special handling, and the runs are small enough that dirty
unused pages aren't a significant concern.
|
Fix a race that caused a non-critical assertion failure. To trigger the
race, a thread had to be part way through initializing a new sample,
such that it was discoverable by the dumping thread, but not yet linked
into its gctx by the time a later dump phase would normally have reset
its state to 'nominal'.
Additionally, lock access to the state field during modification to
transition to the dumping state. It's not apparent that this oversight
could have caused an actual problem due to outer locking that protects
the dumping machinery, but the added locking pedantically follows the
stated locking protocol for the state field.
|
It has an unused variable, so it was always failing (at least with gcc
4.9.1). Alternatively, the `-Werror` flag could be removed if it isn't
strictly necessary.
|
Don't use atomic_add_uint64(), because it isn't available on 32-bit
platforms.
Fix forking support functions to manage all prof-related mutexes.
These regressions were introduced by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
|
Fix irallocx_prof() sample logic to only update the threshold counter
after it knows what size the allocation ended up being. This regression
was caused by 6e73dc194ee9682d3eacaf725a989f04629718f7 (Fix a profile
sampling race.), which did not make it into any releases prior to this
fix.
|
Mark the following conditions as cold so that compilers treat the
corresponding branches as unlikely:
* assertion failure
* malloc_init failure
* malloc not already initialized (in malloc_init)
* running in valgrind
* thread cache disabled at runtime
Clang and GCC already consider a comparison with NULL or -1 to be cold,
so many branches (e.g. out-of-memory checks) are already correctly
considered cold, and marking them is not important.
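A minimal sketch of the mechanism involved (the macro spelling and surrounding code are illustrative; jemalloc's own wrapper may differ):

    #include <stdbool.h>
    #include <stddef.h>

    #ifdef __GNUC__
    #  define unlikely(x) __builtin_expect(!!(x), 0)
    #else
    #  define unlikely(x) (x)
    #endif

    static bool malloc_initialized = false;
    static int malloc_init(void) { malloc_initialized = true; return 0; }

    void *alloc_fast_path(size_t size) {
            /* The bootstrap branch is cold: hint the compiler so the hot
             * path stays straight-line. */
            if (unlikely(!malloc_initialized) && unlikely(malloc_init() != 0))
                    return NULL;
            (void)size;
            return NULL;  /* allocation logic elided */
    }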
|
Fix a profile sampling race that was due to preparing to sample, yet
doing nothing to ensure that the context remains valid until the stats
are updated.
This regression was caused by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.), which did not make it into any releases prior to this
fix.
|
Fix prof_tdata_get() to avoid dereferencing an invalid tdata pointer
(when it's PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).
Fix prof_tdata_get() callers to check for invalid results besides NULL
(PROF_TDATA_STATE_{REINCARNATED,PURGATORY}).
These regressions were caused by
602c8e0971160e4b85b08b16cf8a2375aa24bc04 (Implement per thread heap
profiling.), which did not make it into any releases prior to these
fixes.
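A hedged reconstruction of the caller-side check (the sentinel names are from the commit; the surrounding fragment is illustrative, not jemalloc's exact code):

    prof_tdata_t *tdata = prof_tdata_get(tsd, false);
    if (tdata == NULL || tdata == PROF_TDATA_STATE_REINCARNATED ||
        tdata == PROF_TDATA_STATE_PURGATORY)
            return;  /* no usable tdata; skip the profiling work */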
|
Fix ReadThreadedHeapProfile to pass the correct parameters to
AdjustSamples.
|
Refactor sdallocx() and nallocx() to share inallocx(), and fix an
sdallocx() assertion to check usize rather than size.
|
- Add a --thread N option to select the profile for thread N (otherwise,
all threads are printed)
- The $profile map now has a {threads} element that is a map from thread
id to a profile with the same format as the {profile} element
- Refactor ReadHeapProfile into smaller components and use them to
implement ReadThreadedHeapProfile
|
fix isqalloct (should call isdalloct)
|
This adds a new `sdallocx` function to the external API, allowing the
size to be passed by the caller. It avoids some extra reads in the
thread cache fast path. In the case where stats are enabled, this
avoids the work of calculating the size from the pointer.
An assertion validates the size that's passed in, so enabling debugging
allows users of the API to catch cases where an incorrect size is
passed.
The performance win for a contrived microbenchmark doing an allocation
and immediately freeing it is ~10%. It may have a different impact on a
real workload.
Closes #28
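Example usage of the new entry point (a minimal sketch; a flags value of 0 requests default behavior):

    #include <jemalloc/jemalloc.h>

    int main(void) {
            size_t size = 4096;
            void *p = mallocx(size, 0);
            if (p == NULL)
                    return 1;
            /* The caller already knows the size, so hand it back to the
             * allocator and skip the size lookup on the free path. */
            sdallocx(p, size, 0);
            return 0;
    }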
|
avoid conflict with the POSIX timer_t type
|
It hits a compilation error with glibc 2.19 without a rename.
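A minimal illustration of the clash (the struct body is invented; only the conflicting name matters):

    #include <time.h>  /* glibc 2.19 declares the POSIX timer_t here */

    typedef struct {
            int dummy;
    } timer_t;  /* error: conflicting declaration with <time.h> */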
|