sa2u() returns 0 on overflow, but the profiling code was blindly calling
sa2u() and allowing the error to silently propagate, ultimately ending
in a later assertion failure. Refactor all ipalloc() callers to call
sa2u(), check for overflow before calling ipalloc(), and pass usize
rather than size. This allows ipalloc() to avoid calling sa2u() in the
common case.
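The following is a minimal sketch of the caller-side pattern. It assumes a simplified sa2u() that only rounds a request up to a power-of-two alignment. `sketch_sa2u()`, `sketch_ipalloc()`, and `sketch_aligned_malloc()` are hypothetical names, not jemalloc functions, and the real size-class logic is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for sa2u(): round size up to a multiple of
 * alignment (a nonzero power of two). Returns 0 on overflow. */
static size_t
sketch_sa2u(size_t size, size_t alignment)
{
	assert(size != 0);
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	size_t usize = (size + (alignment - 1)) & ~(alignment - 1);
	/* If size + (alignment - 1) wrapped, usize is smaller than size. */
	if (usize < size)
		return (0);
	return (usize);
}

/* Hypothetical stand-in for ipalloc(): it receives the precomputed
 * usable size, so it never has to call sketch_sa2u() itself. */
static void *
sketch_ipalloc(size_t usize, size_t alignment)
{
	return (aligned_alloc(alignment, usize));
}

/* Caller pattern: compute usize, check for overflow, then allocate. */
static void *
sketch_aligned_malloc(size_t size, size_t alignment)
{
	size_t usize = sketch_sa2u(size, alignment);
	if (usize == 0)
		return (NULL);	/* Overflow: fail here instead of propagating 0. */
	return (sketch_ipalloc(usize, alignment));
}
```

Because the overflow check happens once in the caller, the allocation helper can assume that usize is already valid.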
Add code to set *rsize even when profiling is enabled.
Initialize arenas_tsd earlier, so that the non-TLS case works when
profiling is enabled.
pthread_mutex_lock() can call malloc() on OS X (!!!), which causes
deadlock. Work around this by using spinlocks built from lower-level
primitives.
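To illustrate a lock that never calls into the allocator, here is a test-and-set spinlock built on C11 `atomic_flag`. This is a swapped-in technique chosen for a portable example. It is not the OS X primitive that the workaround actually used, and the `sketch_spin_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type wrapping a C11 atomic_flag. */
typedef struct {
	atomic_flag held;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INITIALIZER { ATOMIC_FLAG_INIT }

/* Busy-wait until the flag was previously clear. Acquire ordering makes
 * writes published by the previous holder visible to the new holder. */
static void
sketch_spin_lock(sketch_spinlock_t *lock)
{
	while (atomic_flag_test_and_set_explicit(&lock->held,
	    memory_order_acquire))
		;	/* Spin. */
}

/* Try once; returns true if the lock was acquired. */
static bool
sketch_spin_trylock(sketch_spinlock_t *lock)
{
	return (!atomic_flag_test_and_set_explicit(&lock->held,
	    memory_order_acquire));
}

/* Release ordering publishes writes made while holding the lock. */
static void
sketch_spin_unlock(sketch_spinlock_t *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A production spinlock would typically add a CPU pause or a yield inside the spin loop.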
Add the "stats.cactive" mallctl, which can be used to efficiently and
repeatedly query approximately how much active memory the application is
utilizing.
Rather than blindly assigning threads to arenas in round-robin fashion,
choose the lowest-numbered arena that currently has the smallest number
of threads assigned to it.
Add the "stats.arenas.<i>.nthreads" mallctl.
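The selection rule can be sketched as a linear scan. This sketch assumes a hypothetical `nthreads` array indexed by arena number and omits the locking that a real implementation needs around the counts.

```c
#include <assert.h>
#include <stddef.h>

/* Return the lowest-numbered arena among those with the fewest
 * assigned threads. nthreads[i] is the current thread count of arena i. */
static size_t
sketch_choose_arena(const unsigned *nthreads, size_t narenas)
{
	size_t choose = 0;

	assert(narenas > 0);
	for (size_t i = 1; i < narenas; i++) {
		/* Strict '<' keeps the lowest index on ties. */
		if (nthreads[i] < nthreads[choose])
			choose = i;
	}
	return (choose);
}
```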
The previous free list implementation, which embedded singly linked
lists in available regions, had the unfortunate side effect of causing
many cache misses during thread cache fills. Fix this in two places:
- arena_run_t: Use a new bitmap implementation to track which regions
  are available. Furthermore, revert to preferring the lowest
  available region (as jemalloc did with its old bitmap-based
  approach).
- tcache_t: Move read-only tcache_bin_t metadata into
  tcache_bin_info_t, and add a contiguous array of pointers to
  tcache_t in order to track cached objects. This substantially
  increases the size of tcache_t, but results in much higher data
  locality for common tcache operations. As a side benefit, it is
  again possible to efficiently flush the least recently used cached
  objects, so this change switches flushing from MRU to LRU.
The new bitmap implementation uses a multi-level summary approach to
make finding the lowest available region very fast. In practice,
bitmaps only have one or two levels, though the implementation is
general enough to handle extremely large bitmaps, mainly so that large
page sizes can still be entertained.
Fix tcache_bin_flush_large() to always flush statistics, in the same way
that tcache_bin_flush_small() was recently fixed.
Use JEMALLOC_DEBUG rather than NDEBUG.
Add dassert(), and use it for debug-only asserts.
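The following sketch shows a two-level summary bitmap. It rests on several assumptions. A set bit means the region is available. Each summary bit covers one 64-bit group, and capacity is capped at 4096 regions. `__builtin_ctzll()` (a GCC/Clang builtin) finds the lowest set bit. The type and function names are hypothetical, and the bit convention may differ from jemalloc's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GROUP_BITS 64
#define SKETCH_MAX_GROUPS 64

/* summary bit g is set iff groups[g] has at least one available region. */
typedef struct {
	uint64_t summary;
	uint64_t groups[SKETCH_MAX_GROUPS];
	size_t nbits;
} sketch_bitmap_t;

/* Mark regions [0, nbits) as available. */
static void
sketch_bitmap_init(sketch_bitmap_t *bm, size_t nbits)
{
	assert(nbits > 0 && nbits <= SKETCH_GROUP_BITS * SKETCH_MAX_GROUPS);
	bm->nbits = nbits;
	bm->summary = 0;
	for (size_t g = 0; g < SKETCH_MAX_GROUPS; g++)
		bm->groups[g] = 0;
	for (size_t i = 0; i < nbits; i++) {
		bm->groups[i / SKETCH_GROUP_BITS] |=
		    UINT64_C(1) << (i % SKETCH_GROUP_BITS);
		bm->summary |= UINT64_C(1) << (i / SKETCH_GROUP_BITS);
	}
}

/* Claim the lowest available region with two count-trailing-zeros
 * operations; returns false if every region is in use. */
static bool
sketch_bitmap_claim_lowest(sketch_bitmap_t *bm, size_t *out)
{
	if (bm->summary == 0)
		return (false);
	unsigned g = (unsigned)__builtin_ctzll(bm->summary);
	assert(bm->groups[g] != 0);
	unsigned b = (unsigned)__builtin_ctzll(bm->groups[g]);
	bm->groups[g] &= ~(UINT64_C(1) << b);
	if (bm->groups[g] == 0)
		bm->summary &= ~(UINT64_C(1) << g);	/* Group is now full. */
	*out = (size_t)g * SKETCH_GROUP_BITS + b;
	return (true);
}

/* Return a region and mark its group as non-full in the summary. */
static void
sketch_bitmap_release(sketch_bitmap_t *bm, size_t bit)
{
	assert(bit < bm->nbits);
	size_t g = bit / SKETCH_GROUP_BITS;
	bm->groups[g] |= UINT64_C(1) << (bit % SKETCH_GROUP_BITS);
	bm->summary |= UINT64_C(1) << g;
}
```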
Add missing error checks for pthread_mutex_init() calls. In practice,
mutex initialization never fails, so this is merely good hygiene.
For the non-TLS case (as on OS X), if the "thread.{de,}allocatedp"
mallctl was called before any allocation occurred for that thread, the
TSD was still NULL, thus putting the application at risk of
dereferencing NULL. Fix this by refactoring the initialization code,
and making it part of the conditional logic for all per-thread
allocation counter accesses.
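The lazy initialization can be sketched with POSIX thread-specific data. Every accessor checks for a NULL slot and initializes it on demand. `sketch_thread_allocated_t` and its accessor are hypothetical. The use of calloc() is a simplification, because an allocator cannot obtain its own TSD through the public malloc().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread allocation counters. */
typedef struct {
	uint64_t allocated;
	uint64_t deallocated;
} sketch_thread_allocated_t;

static pthread_key_t sketch_tsd_key;
static pthread_once_t sketch_tsd_once = PTHREAD_ONCE_INIT;

static void
sketch_tsd_create_key(void)
{
	/* free() runs as the destructor when a thread exits. */
	if (pthread_key_create(&sketch_tsd_key, free) != 0)
		abort();
}

/* All counter accesses go through this getter, so a thread that has not
 * allocated anything yet still receives zeroed counters instead of NULL. */
static sketch_thread_allocated_t *
sketch_thread_allocated_get(void)
{
	sketch_thread_allocated_t *p;

	pthread_once(&sketch_tsd_once, sketch_tsd_create_key);
	p = pthread_getspecific(sketch_tsd_key);
	if (p == NULL) {
		p = calloc(1, sizeof(*p));
		if (p == NULL)
			return (NULL);
		if (pthread_setspecific(sketch_tsd_key, p) != 0) {
			free(p);
			return (NULL);
		}
	}
	return (p);
}
```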
Only call prof_boot0() if profiling is enabled.
Replace the single-character run-time flags with key/value pairs, which
can be set via the malloc_conf global, /etc/malloc.conf, and the
MALLOC_CONF environment variable.
Replace the JEMALLOC_PROF_PREFIX environment variable with the
"opt.prof_prefix" option.
Replace umax2s() with u2s().
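The following is a minimal tokenizer for such option strings. It assumes the comma-separated `key:value` syntax (for example `abort:true,narenas:4`) and keys limited to letters, digits, and underscores. `sketch_conf_next()` is a hypothetical helper. The same loop could be run over each configuration source in turn.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse the next "key:value" pair from *opts_p and advance *opts_p past it.
 * Returns 1 if a pair was parsed, 0 at the end of input, or -1 on a syntax
 * error. Key and value are returned as (pointer, length) slices into the
 * original string, so no allocation is needed.
 */
static int
sketch_conf_next(const char **opts_p, const char **k_p, size_t *klen_p,
    const char **v_p, size_t *vlen_p)
{
	const char *p, *k, *v;

	assert(opts_p != NULL && *opts_p != NULL);
	p = *opts_p;
	if (*p == '\0')
		return (0);
	k = p;
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;
	if (p == k || *p != ':')
		return (-1);		/* Empty key or missing ':'. */
	v = ++p;			/* Value starts after ':'. */
	while (*p != ',' && *p != '\0')
		p++;
	*k_p = k;
	*klen_p = (size_t)(v - 1 - k);
	*v_p = v;
	*vlen_p = (size_t)(p - v);
	if (*p == ',') {
		p++;
		if (*p == '\0')
			return (-1);	/* Trailing ','. */
	}
	*opts_p = p;
	return (1);
}
```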
Fix a regression due to the recent heap profiling accuracy improvements:
prof_{m,re}alloc() must set the object's profiling context regardless of
whether it is sampled.
Fix management of the CHUNK_MAP_CLASS chunk map bits, such that all
large object (re-)allocation paths correctly initialize the bits. Prior
to this fix, in-place realloc() cleared the bits, resulting in incorrect
reported object size from arena_salloc_demote(). After this fix the
non-demoted bit pattern is all zeros (instead of all ones), which makes
it easier to assure that the bits are properly set.
Add the "thread.allocated" and "thread.deallocated" mallctls, which can
be used to query the total number of bytes ever allocated/deallocated by
the calling thread.
Add s2u() and sa2u(), which can be used to compute the usable size that
will result from an allocation request of a particular size/alignment.
Refactor ipalloc() to use sa2u().
Enhance the heap profiler to trigger samples based on usable size,
rather than request size. This has a subtle, but important, impact on
the accuracy of heap sampling. For example, prior to this change,
16- and 17-byte objects were sampled at nearly the same rate, but
17-byte objects actually consume 32 bytes each. Therefore it was
possible for the sample to be somewhat skewed compared to actual memory
usage of the allocated objects.
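The effect of sampling on usable size can be illustrated with a simplified s2u(). This version assumes a 16-byte quantum and covers only quantum-spaced size classes. `sketch_s2u()` and `sketch_bytes_charged()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUANTUM ((size_t)16)

/* Hypothetical s2u() for quantum-spaced classes only: round the request
 * up to the next multiple of SKETCH_QUANTUM. */
static size_t
sketch_s2u(size_t size)
{
	assert(size > 0 && size <= SIZE_MAX - (SKETCH_QUANTUM - 1));
	return ((size + SKETCH_QUANTUM - 1) & ~(SKETCH_QUANTUM - 1));
}

/* Bytes charged against the sampling counter for count allocations of
 * size bytes each, when sampling is driven by usable size. */
static size_t
sketch_bytes_charged(size_t size, size_t count)
{
	return (sketch_s2u(size) * count);
}
```

Under this sketch, ten 16-byte requests are charged 160 bytes and ten 17-byte requests 320 bytes. Charging by request size would have given 160 and 170 bytes instead.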
Add the R option to control whether cumulative heap profile data
are maintained. Add the T option to control the size of per-thread
backtrace caches, primarily because when the R option is specified,
backtraces that no longer have allocations associated with them are
discarded as soon as no thread caches refer to them.
Remove malloc_swap_enable(), which was obsoleted by the "swap.fds"
mallctl. The prototype for malloc_swap_enable() was removed from
jemalloc/jemalloc.h, but the function itself was accidentally left in
place.
Base dynamic structure size on offsetof(), rather than subtracting the
size of the dynamic structure member. Results could differ on systems
with strict data structure alignment requirements.
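The difference shows up with a structure whose trailing array is declared with one element. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Structure whose last member is a variable-length array, declared with
 * one element in the pre-C99 style. */
typedef struct {
	uint64_t header;
	uint32_t elms[1];
} sketch_dyn_t;

/* Fragile: sizeof() includes any trailing padding, so subtracting one
 * element does not necessarily yield the offset of elms. */
static size_t
sketch_size_by_subtraction(size_t nelms)
{
	return (sizeof(sketch_dyn_t) - sizeof(uint32_t) +
	    nelms * sizeof(uint32_t));
}

/* Robust: start from the actual offset of the array member. */
static size_t
sketch_size_by_offsetof(size_t nelms)
{
	return (offsetof(sketch_dyn_t, elms) + nelms * sizeof(uint32_t));
}
```

On x86-64, for instance, `uint64_t` is 8-byte aligned, so `sizeof(sketch_dyn_t)` is 16. The subtraction form therefore yields 12 + 4n bytes, while `offsetof()` yields 8 + 4n.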
Add --enable-cc-silence, which can be used to silence harmless warnings.
Fix an aliasing bug in ckh_pointer_hash().
If memalign() and/or valloc() are present on the system, override them
in order to avoid mixed allocator usage.
Create the buferror() function, which wraps strerror_r(). This is
necessary because glibc provides a non-standard strerror_r().
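A wrapper along these lines hides the two incompatible strerror_r() signatures. The sketch assumes that glibc selects the GNU variant exactly when `_GNU_SOURCE` is defined. `sketch_buferror()` is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* Request the POSIX strerror_r(). */
#endif
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Fill buf with the message for errnum; returns 0 on success. */
static int
sketch_buferror(int errnum, char *buf, size_t buflen)
{
	assert(buflen > 0);
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
	/* GNU variant: returns a char * that may point to a static string
	 * rather than into buf, so copy it in explicitly. */
	char *msg = strerror_r(errnum, buf, buflen);
	if (msg != buf) {
		strncpy(buf, msg, buflen);
		buf[buflen - 1] = '\0';
	}
	return (0);
#else
	/* XSI/POSIX variant: returns 0 on success and writes into buf. */
	return (strerror_r(errnum, buf, buflen));
#endif
}
```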
Remove assertions that malloc_{pre,post}fork() are only called if
threading is enabled. This was true of these functions in the context
of FreeBSD's libc, but now the functions are called unconditionally as a
result of registering them with pthread_atfork().
Add allocm(), rallocm(), sallocm(), and dallocm(), which are a
functional superset of malloc(), calloc(), posix_memalign(),
malloc_usable_size(), and free().
Move the table of size classes from jemalloc.c to the manual page. When
manually formatting the manual page, it is now necessary to use:
nroff -man -t jemalloc.3
Add Mac OS X support, based in large part on the OS X support in
Mozilla's version of jemalloc.
If multiple threads race to initialize malloc, the loser(s) busy-wait
until initialization is complete. Add a missing mutex lock so that the
loser(s) properly release the initialization mutex. Under some
race conditions, this flaw could have caused one or more threads to
become permanently blocked.
Reported by Terrell Magee.
If there is more than one arena, initialize next_arena so that the
first and second threads to allocate memory use arenas 0 and 1, rather
than both using arena 0.
Initialize bt2cnt_tsd so that cleanup at thread exit actually happens.
Associate (prof_ctx_t *) with allocated objects, rather than
(prof_thr_cnt_t *). Each thread must always operate on its own
(prof_thr_cnt_t *), and an object may outlive the thread that allocated it.
Add the E/e options to control whether the application starts with
sampling active/inactive (secondary control to F/f). Add the
prof.active mallctl so that the application can activate/deactivate
sampling on the fly.
Make it possible to disable interval-triggered profile dumping, even if
profiling is enabled. This is useful if the user only wants a single
dump at exit, or if the application manually triggers profile dumps.
If the mean heap sampling interval is larger than one page, simulate
sampled small objects with large objects. This allows profiling context
pointers to be omitted for small objects. As a result, the memory
overhead for sampling decreases as the sampling interval is increased.
Fix a compilation error in the profiling code.
Remove medium size classes, because concurrent dirty page purging is
no longer capable of purging inactive dirty pages inside active runs
(due to recent arena/bin locking changes).
Enhance tcache to support caching large objects, so that the same range
of size classes is still cached, despite the removal of medium size
class support.
Use chains of cached objects, rather than using arrays of pointers.
Since tcache_bin_t is no longer dynamically sized, convert tcache_t's
tbin to an array of structures, rather than an array of pointers. This
implicitly removes tcache_bin_{create,destroy}(), which further
simplifies the fast path for malloc/free.
Use cacheline alignment for tcache_t allocations.
Remove runtime configuration option for number of tcache bin slots, and
replace it with a boolean option for enabling/disabling tcache.
Limit the number of tcache objects to the lesser of TCACHE_NSLOTS_MAX
and 2X the number of regions per run for the size class.
For GC-triggered flush, discard 3/4 of the objects below the low water
mark, rather than 1/2.
Rather than passing four strings to malloc_message(), malloc_write4(),
and all the functions that use them, only pass one string.
Remove all functionality related to tracing. This functionality was
useful for understanding memory fragmentation during early algorithmic
design of jemalloc, but it had little utility for non-trivial
applications, due to the sheer volume of data written to disk.
Add mallctl interfaces for profiling parameters.
Fix a file descriptor leak in heap profile dumping.
Bootstrap profiling in three stages, so that it is usable by the time
the first application allocation occurs.
Add the --enable-prof and --enable-prof-libunwind configure options.
Add the B/b, F/f, I/i, L/l, and U/u JEMALLOC_OPTIONS.
Interval-based profile dump triggering is not yet implemented.
Add supporting generic code:
* Add memory barriers.
* Add prn (LCG PRNG).
* Add hash (Murmur hash function).
* Add ckh (cuckoo hash tables).
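A 64-bit LCG of the kind listed above can be sketched as follows. The multiplier and increment are Knuth's MMIX constants, chosen for illustration and not necessarily those used by jemalloc. `sketch_prn()` is hypothetical. The function returns the high bits because the low-order bits of a power-of-two-modulus LCG have short periods.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LCG_A UINT64_C(6364136223846793005)
#define SKETCH_LCG_C UINT64_C(1442695040888963407)

/* Advance the state (mod 2^64 via unsigned wraparound) and return its
 * top lg_range bits, i.e. a value in [0, 2^lg_range). */
static uint64_t
sketch_prn(uint64_t *state, unsigned lg_range)
{
	assert(lg_range >= 1 && lg_range <= 64);
	*state = *state * SKETCH_LCG_A + SKETCH_LCG_C;
	return (*state >> (64 - lg_range));
}
```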
If a custom small_size2bin table was required due to non-default size
class settings, memory allocation prior to initializing chunk parameters
would cause a crash due to division by 0. The fix re-orders the various
*_boot() function calls.
Bootstrapping is simpler now than it was before the base allocator
started just using the chunk allocator directly. This allows
arena_boot[01]() to be combined.
Add error detection for pthread_atfork() and atexit() function calls.
This feature caused significant performance degradation, and the
fragmentation reduction benefits were difficult to quantify.
Initialize malloc before calling into the ctl_*() functions.
Replace chunk stats code that was missing locking; this fixes a race
condition that could corrupt chunk statistics.
Convert malloc_stats_print() to use mallctl*().
Add a missing semicolon in the DSS code.
Convert malloc_tcache_flush() to a mallctl.
Convert malloc_swap_enable() to a set of mallctls.
Revert to simpler lock acquisition/release code in
malloc_{pre,post}fork(), since dynamic arena rebalancing is no longer
implemented.
Add malloc_swap_enable().
Add the O/o JEMALLOC_OPTIONS flags, which control memory overcommit.
Fix mapped memory stats reporting for arenas.
Add the w4opaque argument to malloc_message() and malloc_stats_print(), and
propagate the change through all the internal APIs as necessary.
Add malloc_cprintf() and malloc_vcprintf().
Fix some bugs in the Makefile's install target.
Fix a stats bug in large object curruns accounting.
Replace tcache_bin_fill() with arena_tcache_fill(), and fix a bug in an OOM
error path.
Fix API name mangling to coexist with __attribute__((malloc)).
destructors may run after tcache_tsd's.