On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.
Add a configure check for this rather than assuming glibc is the
only libc that behaves this way.
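The configure check fixes the choice at build time. As a hedged alternative sketch (not jemalloc's code), C11 `_Generic` can pick the right handling from `strerror_r`'s declared return type, so one source file copes with both the GNU (`char *`) and XSI (`int`) variants. `portable_strerror` and its two helpers are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* declare strerror_r (XSI unless _GNU_SOURCE) */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for the GNU variant: the returned pointer may or
 * may not point into buf, so it must be used instead of buf. */
static const char *
strerror_from_gnu(char *msg, char *buf) {
	(void)buf;
	return msg;
}

/* Hypothetical helper for the XSI variant: 0 means buf holds the text. */
static const char *
strerror_from_xsi(int ret, char *buf) {
	return ret == 0 ? buf : "Unknown error";
}

const char *
portable_strerror(int err, char *buf, size_t buflen) {
	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	/* The controlling expression of _Generic is never evaluated, so
	 * strerror_r runs exactly once: in the argument list of the call. */
	return _Generic(strerror_r(err, buf, buflen),
	    char *: strerror_from_gnu,
	    default: strerror_from_xsi)(strerror_r(err, buf, buflen), buf);
}
```

Callers always use the returned pointer, never `buf` directly, since the GNU variant may return a static string.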
All the invocations of AC_COMPILE_IFELSE inside JE_CXXFLAGS_ADD were
running 'the compiler and compilation flags of the current language',
which was always the C compiler, so the CXXFLAGS were never tested
against a C++ compiler. This patch fixes the issue by temporarily
switching the current language to C++: it is pushed onto the language
stack and popped immediately after the compilation check.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available but
MADV_FREE is not detected. This allows the feature to be built in and enabled
with runtime detection.
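A minimal sketch of that approach, assuming x86 Linux, where the kernel's value for MADV_FREE is 8 (supported from Linux 4.5 on). `purge_lazy` is a hypothetical helper: it tries MADV_FREE and permanently falls back to MADV_DONTNEED once the running kernel rejects the advice with EINVAL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__)) && \
    !defined(MADV_FREE)
/* Older libc headers may lack the constant; the kernel value is 8. */
#define MADV_FREE 8
#endif

/* Runtime detection result: true until the kernel rejects MADV_FREE. */
static bool madv_free_usable = true;

/* Hypothetical lazy purge: prefer MADV_FREE, else MADV_DONTNEED. */
int
purge_lazy(void *addr, size_t size) {
	assert(addr != NULL && size > 0);
#ifdef MADV_FREE
	if (madv_free_usable) {
		if (madvise(addr, size, MADV_FREE) == 0) {
			return 0;
		}
		if (errno != EINVAL) {
			return -1;
		}
		/* Pre-4.5 kernels report unknown advice as EINVAL. This sketch
		 * assumes addr/size are valid, so EINVAL means "unsupported". */
		madv_free_usable = false;
	}
#endif
	return madvise(addr, size, MADV_DONTNEED);
}
```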
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :
[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.
We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.
[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
The configure.ac section right now is the same for Linux and kFreeBSD,
which results in an incorrect configuration, e.g. defining
JEMALLOC_PROC_SYS_VM_OVERCOMMIT_MEMORY instead of FreeBSD's
JEMALLOC_SYSCTL_VM_OVERCOMMIT.
GNU/kFreeBSD is really a glibc + FreeBSD kernel system, so it needs its
own entry with a mixture of configuration options from Linux and
FreeBSD.
This option enables transparent huge pages for base allocators (requires
MADV_HUGEPAGE support).
Currently, the log macro requires at least one argument after the format string,
because of the way the preprocessor handles varargs macros. We can hide some of
that irritation by pushing the extra arguments into a varargs function.
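The problem and the fix in miniature, as a hedged sketch with hypothetical names (`LOG`, `log_impl`, `log_last`): with `#define LOG(fmt, ...) log_impl(fmt, __VA_ARGS__)`, a call such as `LOG("done")` expands with a trailing comma. Folding the format string into `__VA_ARGS__` and letting a varargs function do the formatting avoids that.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: keeps the last formatted message for inspection. */
static char log_last[256];

/* A varargs *function* accepts zero or more arguments after fmt. */
static void
log_impl(const char *fmt, ...) {
	va_list ap;
	va_start(ap, fmt);
	vsnprintf(log_last, sizeof(log_last), fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", log_last);
}

/* The macro always receives at least one argument (the format string),
 * so LOG("msg") and LOG("%d", x) are both valid. */
#define LOG(...) log_impl(__VA_ARGS__)
```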
This sets up a hierarchical logging facility, so that we can add logging
statements liberally, and turn them on in a fine-grained manner.
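One way such fine-grained enabling could work, sketched under assumptions: log sites carry dot-separated names, and a filter string lists enabled prefixes separated by '|'. The function `log_enabled` and this filter syntax are hypothetical, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `name` equals the segment or lies beneath it in the hierarchy,
 * e.g. "core.malloc" covers "core.malloc.entry" but not "core.mallocx". */
static bool
log_segment_matches(const char *seg, size_t seglen, const char *name) {
	if (strncmp(name, seg, seglen) != 0) {
		return false;
	}
	return name[seglen] == '\0' || name[seglen] == '.';
}

/* Hypothetical filter check: `filter` is e.g. "core.malloc|extent". */
bool
log_enabled(const char *filter, const char *name) {
	const char *seg = filter;
	while (*seg != '\0') {
		const char *end = strchr(seg, '|');
		size_t len = end != NULL ? (size_t)(end - seg) : strlen(seg);
		if (len > 0 && log_segment_matches(seg, len, name)) {
			return true;
		}
		if (end == NULL) {
			break;
		}
		seg = end + 1;
	}
	return false;
}
```

Enabling a parent name turns on every log site beneath it, which is what makes liberal logging affordable: most sites stay off unless their subtree is named.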
This resolves #912.
This resolves #883.
Also fix a compilation error when JEMALLOC_PTHREAD_CREATE_WRAPPER is not defined.
This resolves #507.
This resolves #669.
Added opt.background_thread to enable background threads, which currently
handle purging. When enabled, decay ticks will not trigger purging (which will
be left to the background threads). We limit the max number of threads to NCPUs.
When percpu arena is enabled, set CPU affinity for the background threads as
well.
The sleep interval of background threads is dynamic and determined by computing
the number of pages to purge in the future (based on backlog).
Rather than using a manually maintained list of internal symbols to
drive name mangling, add a compilation phase to automatically extract
the list of internal symbols.
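A sketch of how an extracted symbol list can drive mangling. The `PRIVATE_NAMESPACE` macro and the `je_private_` prefix are hypothetical; the idea is that a build step (for example, running `nm` over the object files) emits one `#define` per internal symbol, so sources keep their short names while the linker only sees prefixed ones.

```c
#include <assert.h>

/* Hypothetical prefix macro; ## prevents the argument from expanding. */
#define PRIVATE_NAMESPACE(n) je_private_##n

/* Excerpt of what a generated header might contain. */
#define arena_new PRIVATE_NAMESPACE(arena_new)
#define extent_alloc PRIVATE_NAMESPACE(extent_alloc)

/* Sources use the short names; these define je_private_arena_new and
 * je_private_extent_alloc, so they cannot clash with an application's
 * own arena_new. */
int
arena_new(int ind) {
	return ind + 1;
}

int
extent_alloc(int n) {
	return 2 * n;
}
```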
This resolves #677.
This simplifies configuration when embedding a jemalloc release into
another project's git repository.
This resolves #811.
Add the extent_destroy_t extent destruction hook to extent_hooks_t, and
use it during arena destruction. This hook explicitly communicates to
the callee that the extent must be destroyed or tracked for later reuse,
lest it be permanently leaked. Prior to this change, retained extents
could unintentionally be leaked if extent retention was enabled.
This resolves #560.
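A simplified sketch of the contract the destroy hook establishes. The struct and callback types below are hypothetical stand-ins, not jemalloc's `extent_hooks_t`/`extent_destroy_t` definitions: during arena destruction each retained extent is handed to the hook, which must release it or record it for reuse; without a hook, the memory is simply forgotten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced hook table with only a destroy callback. */
typedef struct hooks_s hooks_t;
typedef void (destroy_fn)(hooks_t *hooks, void *addr, size_t size,
    bool committed);

struct hooks_s {
	destroy_fn *destroy; /* may be NULL */
};

/* Example hook: track destroyed extents (e.g. for later reuse). */
static size_t destroyed_bytes;

static void
tracking_destroy(hooks_t *hooks, void *addr, size_t size, bool committed) {
	(void)hooks;
	(void)addr;
	(void)committed;
	destroyed_bytes += size;
}

/* Hand every retained extent to the hook; returns how many were handed
 * off. Extents not handed off would be leaked. */
size_t
destroy_retained(hooks_t *hooks, void *const *addrs, const size_t *sizes,
    size_t n) {
	size_t handed_off = 0;
	for (size_t i = 0; i < n; i++) {
		if (hooks->destroy != NULL) {
			hooks->destroy(hooks, addrs[i], sizes[i], false);
			handed_off++;
		}
	}
	return handed_off;
}
```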
Control use of munmap(2) via a run-time option rather than a
compile-time option (with the same per platform default). The old
behavior of --disable-munmap can be achieved with
--with-malloc-conf=munmap:false.
This partially resolves #580.
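For illustration, a simplified, hypothetical parser for `key:value` lists in the style of `--with-malloc-conf=munmap:false` (the same form later used for `tcache:false`). It is not jemalloc's option parser; it only shows one simple way a later entry can override an earlier one, so a run-time setting can replace a compiled-in default.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookup in a string such as "munmap:false,tcache:false".
 * Copies the value of the last occurrence of `key` into `out`. */
bool
conf_lookup(const char *conf, const char *key, char *out, size_t outlen) {
	assert(outlen > 0);
	bool found = false;
	size_t keylen = strlen(key);
	const char *p = conf;
	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end != NULL ? (size_t)(end - p) : strlen(p);
		const char *colon = memchr(p, ':', len);
		if (colon != NULL && (size_t)(colon - p) == keylen &&
		    strncmp(p, key, keylen) == 0) {
			size_t vlen = len - keylen - 1;
			if (vlen >= outlen) {
				vlen = outlen - 1; /* truncate long values */
			}
			memcpy(out, colon + 1, vlen);
			out[vlen] = '\0';
			found = true; /* keep scanning: later entries win */
		}
		if (end == NULL) {
			break;
		}
		p = end + 1;
	}
	return found;
}
```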
This option hasn't been particularly useful since the original pre-3.0.0
push to broaden test coverage.
This partially resolves #580.
The explicit compiler warning suppression controlled by this option is
universally desirable, so remove the ability to disable suppression.
This partially resolves #580.
This option isn't useful in practice.
This partially resolves #580.
Four size classes per size doubling has proven to be a universally good
choice for the entire 4.x release series, so there's little point to
preserving this configurability.
This partially resolves #580.
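To make "four size classes per size doubling" concrete, a hedged sketch: sizes in (2^k, 2^(k+1)] round up to multiples of 2^(k-2), so (64, 128] yields 80, 96, 112 and 128. The 8-byte spacing for sizes up to 32 is an assumption for brevity; jemalloc's real classes also depend on its quantum and page size, and `size_class_round` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rounding with four size classes per doubling. */
size_t
size_class_round(size_t size) {
	assert(size > 0);
	if (size <= 32) {
		return (size + 7) & ~(size_t)7; /* assumed 8-byte quantum */
	}
	/* lg = floor(log2(size - 1)), so 2^lg < size <= 2^(lg + 1). */
	unsigned lg = 0;
	for (size_t x = size - 1; x > 1; x >>= 1) {
		lg++;
	}
	/* Four classes in (2^lg, 2^(lg+1)] means a spacing of 2^(lg-2). */
	size_t delta = (size_t)1 << (lg - 2);
	return (size + delta - 1) & ~(delta - 1);
}
```

The worst-case internal fragmentation stays below 25% for every size above the tiny range, which is the trade-off that made four classes per doubling a good default.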
This fixes a bug/regression introduced by
a01f99307719dcc8ca27cc70f0f0011beff914fa (Only disable munmap(2) by
default on 64-bit Linux.).
This can catch bugs in which one header defines a numeric constant, and another
uses it without including the defining header. Undefined preprocessor symbols
evaluate to 0 in #if expressions, so such code compiles fine while silently doing
the math wrong.
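A minimal demonstration of the silent failure, with a hypothetical constant `LG_HUGEPAGE` whose defining header was never included. Compiling this with GCC or Clang's `-Wundef` produces a warning at the `#if`; without it, the wrong branch is taken quietly.

```c
#include <assert.h>

/* The defining header (hypothetically "#define LG_HUGEPAGE 21") was not
 * included, so LG_HUGEPAGE is not a macro at this point. */

/* In #if, an identifier that is not a macro is replaced by 0, so this
 * test quietly evaluates 0 >= 21 and takes the #else branch. */
#if LG_HUGEPAGE >= 21
#define HUGEPAGE_SHIFT_SEEN 21
#else
#define HUGEPAGE_SHIFT_SEEN 0
#endif

int
hugepage_shift_seen(void) {
	return HUGEPAGE_SHIFT_SEEN;
}
```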
Continue to use ivsalloc() when --enable-debug is specified (and add
assertions to guard against 0 size), but stop providing a documented
explicit semantics-changing band-aid to dodge undefined behavior in
sallocx() and malloc_usable_size(). ivsalloc() remains compiled in,
unlike when #211 restored --enable-ivsalloc, and if
JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and
malloc_usable_size() will still use ivsalloc().
This partially resolves #580.
This option is no longer useful, because TLS is correctly configured
automatically on all supported platforms.
This partially resolves #580.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.
Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.
This partially resolves #580.
This reduces the likelihood of address space exhaustion on 32-bit
systems.
This resolves #350.
This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while
now:
- The source of system-wide definitions.
- The catch-all include file.
- The module header file for jemalloc.c
This commit splits up this functionality. The system-wide definitions
responsibility has moved to jemalloc_preamble.h. The catch-all include file is
now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in
jemalloc_internal_[externs|inlines|types].h, just as they are for the other
modules.
Hyper-threaded CPUs may need a special instruction inside spin loops in
order to yield to another virtual CPU. The 'pause' instruction that is
available for x86 is not supported on Power.
Apparently the extended mnemonics like yield, mdoio, and mdoom are not
actually implemented on POWER8, although mentioned in the ISA 2.07
document. The recommended magic instruction is 'or 31,31,31'.
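A hedged sketch of a spin-wait hint; the macro and function names are hypothetical. It emits x86 `pause` and only a compiler barrier elsewhere. The Power `or 31,31,31` hint is deliberately left out, since the revert quoted above in this log notes that it lowers the program-priority register without ever restoring it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin hint: "pause" lets a hyper-threaded x86 core favor
 * its sibling; other targets get a compiler barrier (or nothing). */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SPIN_HINT() __asm__ volatile("pause" ::: "memory")
#elif defined(__GNUC__)
#define SPIN_HINT() __asm__ volatile("" ::: "memory")
#else
#define SPIN_HINT() ((void)0)
#endif

/* Spin until *flag becomes nonzero; returns the number of iterations. */
unsigned long
spin_until_set(atomic_int *flag) {
	unsigned long spins = 0;
	while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
		SPIN_HINT();
		spins++;
	}
	return spins;
}
```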
madvise(..., MADV_DONTNEED) only causes demand-zeroing on Linux, so fall
back to overlaying a new mapping.
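A sketch of that fallback, assuming the goal is a range that reads back as zeros. `purge_forced_zero` is hypothetical: on Linux it relies on MADV_DONTNEED demand-zeroing private anonymous memory, and elsewhere it overlays the range with a fresh anonymous mapping via MAP_FIXED.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical purge that guarantees zero-filled pages afterwards. */
int
purge_forced_zero(void *addr, size_t size) {
	assert(addr != NULL && size > 0);
#ifdef __linux__
	/* Private anonymous pages read back as zero after MADV_DONTNEED. */
	return madvise(addr, size, MADV_DONTNEED);
#else
	/* Elsewhere the old contents may survive MADV_DONTNEED, so map new
	 * anonymous memory over the range; MAP_FIXED discards the old pages. */
	void *p = mmap(addr, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
#endif
}
```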
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based on CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.
"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper-threading is enabled
is added yet.
When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread that accessed it most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
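The two modes reduce to a small mapping from CPU id (as returned by sched_getcpu()) to arena index. This sketch is hypothetical and assumes hyper-thread siblings are numbered i and i + ncpus/2, a common but not universal Linux x86 layout; as the message notes, nothing here checks whether hyper-threading is actually enabled.

```c
#include <assert.h>

typedef enum {
	PERCPU_DISABLED,
	PERCPU_PER_CPU,   /* "percpu": one arena per logical CPU */
	PERCPU_PER_PHYCPU /* "phycpu": one arena per physical core */
} percpu_mode_t;

/* Hypothetical CPU-to-arena mapping; -1 means per-CPU arenas are off.
 * Assumes an even ncpus with siblings at i and i + ncpus / 2. */
int
percpu_arena_index(percpu_mode_t mode, unsigned cpu, unsigned ncpus) {
	assert(cpu < ncpus);
	switch (mode) {
	case PERCPU_PER_CPU:
		return (int)cpu;
	case PERCPU_PER_PHYCPU:
		return (int)(cpu < ncpus / 2 ? cpu : cpu - ncpus / 2);
	default:
		return -1;
	}
}
```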
This introduces a backport of C11 atomics. It has four implementations. Ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>
The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
that it's impossible to mix up atomic and non-atomic updates (which is
undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
atomic operations on strongly ordered architectures (example:
`atomic_write_u32(ptr, val);` involves a CAS loop, whereas
`atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
This diff leaves in the current atomics API (implementing them in terms of the
backport). This lets us transition uses over piecemeal.
Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux with gcc by futzing with the #defines manually, on FreeBSD with gcc and
clang, on MSVC, and on OS X with clang. All of these were x86 machines, though,
and we don't have any test infrastructure set up for non-x86 platforms.
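A hedged illustration of the efficiency point, using the standard <stdatomic.h> spellings rather than the backport's own names: a store written as a CAS loop versus a release store, plus the publish/consume pattern that release/acquire ordering makes well-defined. All function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Old style: a store implemented as a CAS loop (a read-modify-write). */
void
store_u32_cas(_Atomic uint32_t *p, uint32_t val) {
	uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(p, &old, val,
	    memory_order_seq_cst, memory_order_relaxed)) {
		/* On failure `old` holds the current value; retry. */
	}
}

/* New style: a release store. On x86 this compiles to a plain store,
 * yet earlier writes are still ordered before it. */
void
store_u32_release(_Atomic uint32_t *p, uint32_t val) {
	atomic_store_explicit(p, val, memory_order_release);
}

static uint32_t payload;       /* ordinary, non-atomic data */
static _Atomic uint32_t ready; /* separate atomic type: cannot be mixed up */

void
publish(uint32_t v) {
	payload = v;
	store_u32_release(&ready, 1);
}

/* A reader that acquires `ready == 1` is guaranteed to see the payload. */
int
try_consume(uint32_t *out) {
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
		return 0;
	}
	*out = payload;
	return 1;
}
```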
This regression was introduced by
194d6f9de8ff92841b67f38a2a6a06818e3240dd (Restructure *CFLAGS/*CXXFLAGS
configuration.).
This removes an unneeded library dependency when falling back to
intrinsics-based backtracing (or failing to enable heap profiling at
all).
Rather than dynamically building a table to aid per level computations,
define a constant table at compile time. Omit both high and low
insignificant bits. Use one to three tree levels, depending on the
number of significant bits.
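A sketch of a compile-time level table, under stated assumptions: 48-bit virtual addresses, 4 KiB pages, and hypothetical cut-offs of 16 and 32 significant bits for choosing one, two or three levels. With these numbers the 36 significant bits split into three 12-bit subkeys; the macro and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LG_VADDR 48 /* assumed significant virtual-address bits */
#define LG_PAGE 12  /* assumed 4 KiB pages */
#define RTREE_NHIB (64 - LG_VADDR)        /* high insignificant bits */
#define RTREE_NLIB LG_PAGE                /* low insignificant bits */
#define RTREE_NSB (LG_VADDR - RTREE_NLIB) /* significant bits: 36 */

typedef struct {
	unsigned bits;    /* key bits consumed at this level */
	unsigned cumbits; /* bits consumed so far, counting down from bit 63 */
} rtree_level_t;

/* Constant table chosen by the preprocessor: 1 to 3 levels. */
#if RTREE_NSB <= 16
#define RTREE_HEIGHT 1
static const rtree_level_t rtree_levels[] = {
	{RTREE_NSB, RTREE_NHIB + RTREE_NSB}};
#elif RTREE_NSB <= 32
#define RTREE_HEIGHT 2
static const rtree_level_t rtree_levels[] = {
	{RTREE_NSB / 2, RTREE_NHIB + RTREE_NSB / 2},
	{RTREE_NSB - RTREE_NSB / 2, RTREE_NHIB + RTREE_NSB}};
#else
#define RTREE_HEIGHT 3
static const rtree_level_t rtree_levels[] = {
	{RTREE_NSB / 3, RTREE_NHIB + RTREE_NSB / 3},
	{RTREE_NSB / 3, RTREE_NHIB + 2 * (RTREE_NSB / 3)},
	{RTREE_NSB - 2 * (RTREE_NSB / 3), RTREE_NHIB + RTREE_NSB}};
#endif

/* Extract the subkey indexing `level`; insignificant bits never appear. */
uint64_t
rtree_subkey(uint64_t key, unsigned level) {
	assert(level < RTREE_HEIGHT);
	unsigned shift = 64 - rtree_levels[level].cumbits;
	uint64_t mask = ((uint64_t)1 << rtree_levels[level].bits) - 1;
	return (key >> shift) & mask;
}
```

For the key 0x7fffabcde123 this yields subkeys 0x7ff, 0xfab and 0xcde; the low 12 bits (0x123) are the page offset and are dropped.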
This resolves #540.
This partially resolves #536.