path: root/Makefile.in
Commit log: commit message, author, date, and Makefile.in diffstat (lines -removed/+added).
* Add the div module, which allows fast division by dynamic values. (David Goldblatt, 2017-12-21; 1 file, -0/+2)
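
The summary above is terse; as a hedged illustration of the usual technique behind such a module, here is a minimal precomputed-reciprocal sketch. Everything here (type and function names, the 32-bit width) is an assumption for illustration, and the fast path is only claimed exact when the dividend is a multiple of the divisor, which is the case for size-class arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only; jemalloc's actual div module may differ in detail. */
typedef struct {
    uint64_t magic;   /* floor(2^32 / d) + 1, precomputed once */
    uint32_t divisor; /* kept so the fast path can be asserted exact */
} div_info_sketch_t;

static void
div_sketch_init(div_info_sketch_t *info, uint32_t d) {
    info->magic = (((uint64_t)1 << 32) / d) + 1;
    info->divisor = d;
}

static uint32_t
div_sketch_divide(const div_info_sketch_t *info, uint32_t n) {
    /* One 64-bit multiply plus a shift replaces a hardware divide;
     * exact whenever n is a multiple of the divisor. */
    uint32_t q = (uint32_t)(((uint64_t)n * info->magic) >> 32);
    assert(q == n / info->divisor);
    return q;
}
```

The payoff is the same strength reduction a compiler performs for compile-time-constant divisors, made available when the divisor is only known at run time.
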
* Pull out arena_bin_info_t and arena_bin_t into their own file. (David T. Goldblatt, 2017-12-19; 1 file, -0/+1)
  In the process, kill arena_bin_index, which is unused. To follow are several diffs continuing this separation.
* Remove external linkage for spin_adaptive (Ryan Libby, 2017-08-08; 1 file, -1/+0)
  The external linkage for spin_adaptive was not used, and the inline declaration of spin_adaptive that was used caused a problem on FreeBSD, where CPU_SPINWAIT is implemented as a call to a static procedure for x86 architectures.
* Add a logging facility. (David T. Goldblatt, 2017-07-21; 1 file, -0/+2)
  This sets up a hierarchical logging facility, so that we can add logging statements liberally, and turn them on in a fine-grained manner.
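
The message does not show the interface, so the following is a purely hypothetical sketch of what "hierarchical" and "fine-grained" can mean in practice: log sites carry dot-separated names and are enabled by prefix match. None of these names come from jemalloc.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: a site named "extent.purge.lazy" is enabled by the
 * prefix "extent" as well as by its full name. */
static const char *enabled_prefixes[] = {"extent", "arena.decay"};

static bool
log_enabled_sketch(const char *name) {
    for (size_t i = 0;
        i < sizeof(enabled_prefixes) / sizeof(enabled_prefixes[0]); i++) {
        size_t len = strlen(enabled_prefixes[i]);
        if (strncmp(name, enabled_prefixes[i], len) == 0 &&
            (name[len] == '\0' || name[len] == '.')) {
            return true;
        }
    }
    return false;
}

#define LOG_SKETCH(name, ...) do {              \
    if (log_enabled_sketch(name)) {             \
        fprintf(stderr, "%s: ", name);          \
        fprintf(stderr, __VA_ARGS__);           \
        fputc('\n', stderr);                    \
    }                                           \
} while (0)

/* Usage: LOG_SKETCH("extent.purge.lazy", "purged %zu pages", npages); */
```
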
* Header refactoring: Pull size helpers out of jemalloc module. (David Goldblatt, 2017-05-31; 1 file, -0/+1)
* Add test for excessive retained memory. (Jason Evans, 2017-05-30; 1 file, -0/+1)
* Add tests for background threads. (Qi Wang, 2017-05-23; 1 file, -0/+1)
* Implementing opt.background_thread. (Qi Wang, 2017-05-23; 1 file, -0/+1)
  Added opt.background_thread to enable background threads, which currently handle purging. When enabled, decay ticks will not trigger purging (which is left to the background threads). We limit the maximum number of threads to the number of CPUs. When percpu arena is enabled, set CPU affinity for the background threads as well. The sleep interval of background threads is dynamic and determined by computing the number of pages to purge in the future (based on the backlog).
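
As a loose sketch of the "dynamic sleep interval" idea only (the constants and formula below are invented, not jemalloc's policy): sleep just long enough that a worthwhile batch of pages will be purgeable at wakeup, clamped to fixed bounds.

```c
#include <stdint.h>

/* All numbers are hypothetical; the real policy is derived from the
 * decay backlog and differs in detail. */
#define BG_SLEEP_MIN_MS 10
#define BG_SLEEP_MAX_MS 10000
#define BG_PURGE_BATCH  1024 /* pages we want ready per wakeup */

/* pages_per_ms: estimated rate at which dirty pages become purgeable. */
static uint64_t
bg_sleep_interval_ms(uint64_t pages_per_ms) {
    if (pages_per_ms == 0) {
        return BG_SLEEP_MAX_MS; /* no pending work; sleep long */
    }
    uint64_t ms = BG_PURGE_BATCH / pages_per_ms;
    if (ms < BG_SLEEP_MIN_MS) {
        return BG_SLEEP_MIN_MS;
    }
    if (ms > BG_SLEEP_MAX_MS) {
        return BG_SLEEP_MAX_MS;
    }
    return ms;
}
```
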
* Protect the rtree/extent interactions with a mutex pool. (David Goldblatt, 2017-05-19; 1 file, -0/+1)
  Instead of embedding a lock bit in rtree leaf elements, we associate extents with a small set of mutexes. This gets us two things:
  - We can use the system mutexes. This (hypothetically) protects us from priority inversion, and lets us stop doing a backoff/sleep loop, instead opting for precise wakeups from the mutex.
  - Cuts down on the number of mutex acquisitions we have to do (from 4 in the worst case to two).
  We end up simplifying most of the rtree code (which no longer has to deal with locking or concurrency at all), at the cost of additional complexity in the extent code: since the mutex protecting the rtree leaf elements is determined by reading the extent out of those elements, the initial read is racy, so that we may acquire an out-of-date mutex. We re-check the extent in the leaf after acquiring the mutex to protect us from this race.
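
A hedged sketch of the locking pattern described above, using plain pthreads rather than jemalloc's internal types (every name here is illustrative): hash the extent pointer to one of a small pool of mutexes, then re-read the leaf under the lock to catch the race the message mentions.

```c
#include <pthread.h>
#include <stdint.h>

#define POOL_NMUTEXES 256

static pthread_mutex_t pool[POOL_NMUTEXES];

static void
pool_init(void) {
    for (int i = 0; i < POOL_NMUTEXES; i++) {
        pthread_mutex_init(&pool[i], NULL);
    }
}

static pthread_mutex_t *
pool_mutex_for(const void *extent) {
    /* Hash the extent address down to one of a small set of mutexes. */
    uintptr_t h = (uintptr_t)extent;
    h ^= h >> 16;
    return &pool[h % POOL_NMUTEXES];
}

/*
 * leaf_read() stands in for reading the extent pointer out of an rtree leaf.
 * On success, returns the locked mutex (the caller unlocks it) and the extent
 * via *extentp; returns NULL if the leaf is empty.
 */
static pthread_mutex_t *
extent_lock_from_leaf(const void *(*leaf_read)(void), const void **extentp) {
    for (;;) {
        const void *extent = leaf_read();   /* racy initial read */
        if (extent == NULL) {
            return NULL;
        }
        pthread_mutex_t *mtx = pool_mutex_for(extent);
        pthread_mutex_lock(mtx);
        if (leaf_read() == extent) {        /* re-check under lock */
            *extentp = extent;
            return mtx;
        }
        pthread_mutex_unlock(mtx);          /* stale; retry */
    }
}
```
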
* Refactor *decay_time into *decay_ms. (Jason Evans, 2017-05-18; 1 file, -2/+2)
  Support millisecond resolution for decay times. Among other use cases this makes it possible to specify a short initial dirty-->muzzy decay phase, followed by a longer muzzy-->clean decay phase. This resolves #812.
* Avoid over-rebuilding due to namespace mangling. (Jason Evans, 2017-05-17; 1 file, -3/+8)
  Take care not to touch generated namespace mangling headers unless their contents would change. This resolves #838.
* Use srcroot path for private_namespace.sh. (Qi Wang, 2017-05-16; 1 file, -2/+2)
* Automatically generate private symbol name mangling macros. (Jason Evans, 2017-05-12; 1 file, -7/+46)
  Rather than using a manually maintained list of internal symbols to drive name mangling, add a compilation phase to automatically extract the list of internal symbols. This resolves #677.
* Remove --enable-code-coverage. (Jason Evans, 2017-04-24; 1 file, -50/+0)
  This option hasn't been particularly useful since the original pre-3.0.0 push to broaden test coverage. This partially resolves #580.
* Add hooking functionality (David Goldblatt, 2017-04-07; 1 file, -0/+2)
  This allows us to hook chosen functions and do interesting things there (in particular: reentrancy checking).
* Implement two-phase decay-based purging. (Jason Evans, 2017-03-15; 1 file, -2/+2)
  Split decay-based purging into two phases, the first of which uses lazy purging to convert dirty pages to "muzzy", and the second of which uses forced purging, decommit, or unmapping to convert pages to clean or destroy them altogether.
  Not all operating systems support lazy purging, yet the application may provide extent hooks that implement lazy purging, so care must be taken to dynamically omit the first phase when necessary.
  The mallctl interfaces change as follows:
  - opt.decay_time --> opt.{dirty,muzzy}_decay_time
  - arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
  - arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
  - stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
  - stats.arenas.<i>.{npurge,nmadvise,purged} --> stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
  This resolves #521.
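
For context, a rough Linux-only illustration of the two purge flavors (the function names are made up; jemalloc's real purge paths sit behind extent hooks and add fallbacks such as decommit or unmap):

```c
#include <stddef.h>
#include <sys/mman.h>

static int
purge_lazy_sketch(void *addr, size_t size) {
    /* dirty -> muzzy: the kernel may reclaim the pages lazily, and the
     * mapping stays usable. */
#ifdef MADV_FREE
    return madvise(addr, size, MADV_FREE);
#else
    (void)addr; (void)size;
    return -1; /* lazy purging unsupported on this platform */
#endif
}

static int
purge_forced_sketch(void *addr, size_t size) {
    /* muzzy -> clean: pages are discarded now and refault zero-filled. */
    return madvise(addr, size, MADV_DONTNEED);
}
```
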
* Disentangle assert and util (David Goldblatt, 2017-03-06; 1 file, -4/+4)
  This is the first header refactoring diff, #533. It splits the assert and util components into separate, hermetic, header files. In the process, it splits out two of the large sub-components of util (the stdio.h replacement, and bit manipulation routines) into their own components (malloc_io.h and bit_util.h). This is mostly to break up cyclic dependencies, but it also breaks off a good chunk of the catch-all-ness of util, which is nice.
* Introduce a backport of C11 atomics (David Goldblatt, 2017-03-03; 1 file, -1/+0)
  This introduces a backport of C11 atomics. It has four implementations; ranked in order of preference, they are:
  - GCC/Clang __atomic builtins
  - GCC/Clang __sync builtins
  - MSVC _Interlocked builtins
  - C11 atomics, from <stdatomic.h>
  The primary advantages are:
  - Close adherence to the standard API gives us a defined memory model.
  - Type safety: atomic objects are now separate types from non-atomic ones, so that it's impossible to mix up atomic and non-atomic updates (which is undefined behavior that compilers are starting to take advantage of).
  - Efficiency: we can specify ordering for operations, avoiding fences and atomic operations on strongly ordered architectures (example: `atomic_write_u32(ptr, val);` involves a CAS loop, whereas `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store).
  This diff leaves in the current atomics API (implementing it in terms of the backport). This lets us transition uses over piecemeal.
  Testing: This is by nature hard to test. I've manually tested the first three options on Linux with gcc by futzing with the #defines manually, on FreeBSD with gcc and clang, on MSVC, and on OS X with clang. All of these were x86 machines, though, and we don't have any test infrastructure set up for non-x86 platforms.
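
As a hedged sketch of the highest-priority backend: roughly how a typed load/store can map onto the GCC/Clang __atomic builtins. jemalloc's real atomic.h generates such operations per type via macros and wraps the memory-order constants; the names below are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint32_t repr;
} atomic_u32_sketch_t;

static inline void
atomic_store_u32_sketch(atomic_u32_sketch_t *a, uint32_t val, int mo) {
    /* A release store is an ordinary store plus ordering; no CAS loop. */
    __atomic_store_n(&a->repr, val, mo);
}

static inline uint32_t
atomic_load_u32_sketch(atomic_u32_sketch_t *a, int mo) {
    return __atomic_load_n(&a->repr, mo);
}

/* Usage: atomic_store_u32_sketch(&x, 1, __ATOMIC_RELEASE); */
```
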
* Remove remainder of mb (memory barrier). (Jason Evans, 2017-02-22; 1 file, -1/+0)
  This complements 94c5d22a4da7844d0bdc5b370e47b1ba14268af2 (Remove mb.h, which is unused).
* Enhance spin_adaptive() to yield after several iterations. (Jason Evans, 2017-02-09; 1 file, -0/+1)
  This avoids worst-case behavior if e.g. another thread is preempted while owning the resource the spinning thread is waiting for.
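
A small, hedged sketch of the shape this gives adaptive spinning (the iteration limit and pause primitive below are assumptions, not taken from the commit): spin with a CPU pause for a bounded, exponentially growing number of iterations, then fall back to yielding the CPU.

```c
#include <sched.h>
#include <stdint.h>

typedef struct {
    uint32_t iteration;
} spin_sketch_t;    /* zero-initialize before a busy-wait loop */

static void
spin_adaptive_sketch(spin_sketch_t *spin) {
    if (spin->iteration < 5) {
        /* Back off exponentially: 1, 2, 4, 8, 16 pauses. */
        for (uint32_t i = 0; i < (1U << spin->iteration); i++) {
#if defined(__x86_64__) || defined(__i386__)
            __asm__ volatile("pause");
#endif
        }
        spin->iteration++;
    } else {
        /* Yield so a preempted resource holder gets to run. */
        sched_yield();
    }
}
```
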
* Test JSON output of malloc_stats_print() and fix bugs. (Jason Evans, 2017-01-19; 1 file, -0/+1)
  Implement and test a JSON validation parser. Use the parser to validate JSON output from malloc_stats_print(), with a significant subset of supported output options. This resolves #551.
* Fix prof_realloc() regression. (Jason Evans, 2017-01-17; 1 file, -0/+1)
  Mostly revert the prof_realloc() changes in 498856f44a30b31fe713a18eb2fc7c6ecf3a9f63 (Move slabs out of chunks.) so that prof_free_sampled_object() is called when appropriate. Leave the prof_tctx_[re]set() optimization in place, but add an assertion to verify that all eight cases are correctly handled. Add a comment to make clear the code ordering, so that the regression originally fixed by ea8d97b8978a0c0423f0ed64332463a25b787c3d (Fix prof_{malloc,free}_sample_object() call order in prof_realloc().) is not repeated. This resolves #499.
* Implement arena.<i>.destroy . (Jason Evans, 2017-01-07; 1 file, -0/+4)
  Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an analogue to MALLCTL_ARENAS_ALL. This resolves #382.
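
A hedged usage sketch against the public mallctl() interface (the helper name is invented; see the manual for the matching release for exactly how stats of destroyed arenas are read back via MALLCTL_ARENAS_DESTROYED):

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

/* Destroy arena `ind`; returns 0 on success, an errno value otherwise. */
static int
arena_destroy_sketch(unsigned ind) {
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "arena.%u.destroy", ind);
    return mallctl(cmd, NULL, NULL, NULL, 0);
}
```
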
* Implement per arena base allocators. (Jason Evans, 2016-12-27; 1 file, -0/+1)
  Add/rename related mallctls:
  - Add stats.arenas.<i>.base .
  - Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
  - Add stats.arenas.<i>.resident .
  Modify the arenas.extend mallctl to take an optional (extent_hooks_t *) argument so that it is possible for all base allocations to be serviced by the specified extent hooks. This resolves #463.
* Add huge page configuration and pages_[no]huge(). (Jason Evans, 2016-12-27; 1 file, -0/+1)
  Add the --with-lg-hugepage configure option, but automatically configure LG_HUGEPAGE even if it isn't specified. Add the pages_[no]huge() functions, which toggle huge page state via madvise(..., MADV_[NO]HUGEPAGE) calls.
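
A minimal sketch of such a toggle on Linux (function names invented; the real pages_[no]huge() are gated on configure-time detection and follow jemalloc's internal conventions):

```c
#include <stddef.h>
#include <sys/mman.h>

/* Returns 0 on success, -1 if the call failed or is unsupported here. */
static int
pages_huge_sketch(void *addr, size_t size) {
#ifdef MADV_HUGEPAGE
    return madvise(addr, size, MADV_HUGEPAGE);
#else
    (void)addr; (void)size;
    return -1;
#endif
}

static int
pages_nohuge_sketch(void *addr, size_t size) {
#ifdef MADV_NOHUGEPAGE
    return madvise(addr, size, MADV_NOHUGEPAGE);
#else
    (void)addr; (void)size;
    return -1;
#endif
}
```
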
* Simplify arena_slab_regind(). (Jason Evans, 2016-12-23; 1 file, -0/+1)
  Rewrite arena_slab_regind() to provide sufficient constant data for the compiler to perform division strength reduction. This replaces more general manual strength reduction that was implemented before arena_bin_info was compile-time-constant. It would be possible to slightly improve on the compiler-generated division code by taking advantage of range limits that the compiler doesn't know about.
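
To make the point about constant divisors concrete (a generic example, not the arena_slab_regind() code): when the divisor is a compile-time constant, the compiler itself performs the strength reduction.

```c
#include <stddef.h>

/* With a constant divisor no hardware divide is emitted. */
static inline size_t
regind_const_48(size_t diff) {
    return diff / 48;   /* non-power-of-two: multiply + shift */
}

static inline size_t
regind_const_64(size_t diff) {
    return diff / 64;   /* power of two: a single shift */
}
```
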
* Restructure *CFLAGS/*CXXFLAGS configuration. (Jason Evans, 2016-12-16; 1 file, -2/+6)
  Convert CFLAGS/CXXFLAGS to be concatenations:
    CFLAGS := CONFIGURE_CFLAGS SPECIFIED_CFLAGS EXTRA_CFLAGS
    CXXFLAGS := CONFIGURE_CXXFLAGS SPECIFIED_CXXFLAGS EXTRA_CXXFLAGS
  This ordering makes it possible to override the flags set by the configure script both during and after configuration, with CFLAGS/CXXFLAGS and EXTRA_CFLAGS/EXTRA_CXXFLAGS, respectively. This resolves #504.
* jemalloc cpp new/delete bindings (Dave Watson, 2016-12-13; 1 file, -16/+69)
  Adds cpp bindings for jemalloc, along with necessary autoconf settings. This is mostly to add sized deallocation support, which can't be added from C directly. Sized deallocation is a ~10% microbench improvement.
  * Import ax_cxx_compile_stdcxx.m4 from the autoconf repo; seems like the easiest way to get C++14 detection.
  * Adds various other changes, like CXXFLAGS, to configure.ac.
  * Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic unittest.
  * Both new and delete are overridden, to ensure jemalloc is used for both.
  * TODO future enhancement: avoid extra PLT thunks for new and delete - sdallocx and malloc are publicly exported jemalloc symbols, so using an alias would link them directly. Unfortunately, was having trouble getting it to play nice with jemalloc's namespace support.
  Testing: Tested gcc 4.8, gcc 5, gcc 5.2, and clang 4.0. Only gcc >= 5 has sized deallocation support; verified that the rest build correctly. Tested Mac OS X and CentOS. Tested --with-jemalloc-prefix and --without-export. This resolves #202.
* Add packing test, which verifies stable layout policy. (Jason Evans, 2016-11-15; 1 file, -0/+1)
* Fix EXTRA_CFLAGS to not affect configuration. (Jason Evans, 2016-10-30; 1 file, -1/+2)
* Only link with libm (-lm) if necessary. (Jason Evans, 2016-10-28; 1 file, -3/+4)
  This fixes warnings when building with MSVC.
* Only use --whole-archive with gcc. (Jason Evans, 2016-10-28; 1 file, -2/+3)
  Conditionalize use of --whole-archive on the platform plus compiler, rather than on the ABI. This fixes a regression caused by 7b24c6e5570062495243f1e55131b395adb31e33 (Use --whole-archive when linking integration tests on MinGW.).
* Use --whole-archive when linking integration tests on MinGW. (Jason Evans, 2016-10-26; 1 file, -1/+10)
  Prior to this change, the malloc_conf weak symbol provided by the jemalloc dynamic library is always used, even if the application provides a malloc_conf symbol. Use the --whole-archive linker option to allow the weak symbol to be overridden.
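
For reference, the override this protects looks like the following in application code; malloc_conf is jemalloc's documented weak symbol, and the option string is just an example.

```c
/* Overrides the weak malloc_conf provided by libjemalloc; with a static
 * archive this only takes effect if the whole archive is pulled in. */
const char *malloc_conf = "narenas:1";
```
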
* Add/use adaptive spinning. (Jason Evans, 2016-10-13; 1 file, -0/+1)
  Add spin_t and spin_{init,adaptive}(), which provide a simple abstraction for adaptive spinning. Adaptively spin during busy waits in bootstrapping and rtree node initialization.
* Remove all vestiges of chunks. (Jason Evans, 2016-10-12; 1 file, -2/+0)
  Remove mallctls:
  - opt.lg_chunk
  - stats.cactive
  This resolves #464.
* Remove ratio-based purging. (Jason Evans, 2016-10-12; 1 file, -5/+3)
  Make decay-based purging the default (and only) mode. Remove associated mallctls:
  - opt.purge
  - opt.lg_dirty_mult
  - arena.<i>.lg_dirty_mult
  - arenas.lg_dirty_mult
  - stats.arenas.<i>.lg_dirty_mult
  This resolves #385.
* use install command determined by configure (Thomas Köckerbauer, 2016-09-26; 1 file, -20/+21)
* Fix librt detection when using a Cray compiler wrapper (Elliot Ronaghan, 2016-07-07; 1 file, -1/+1)
  The Cray compiler wrappers will often add `-lrt` to the base compiler with `-static` linking (the default at most sites). However, `-lrt` isn't automatically added with `-dynamic`. This means that if jemalloc was built with `-static`, but then used in a program with `-dynamic`, jemalloc won't have detected that librt is a dependency.
  The integration and stress tests use -dynamic, which is causing undefined references to clock_gettime(). This just adds an extra check for librt (ignoring the autoconf cache) with `-dynamic` thrown in. It also stops filtering librt from the integration tests.
  With this, `make check` passes for:
  - PrgEnv-gnu
  - PrgEnv-intel
  - PrgEnv-pgi
  PrgEnv-cray still needs more work (will be in a separate patch).
* Add -dynamic for integration and stress tests with Cray compiler wrappers (Elliot Ronaghan, 2016-07-07; 1 file, -2/+3)
  Cray systems come with compiler wrappers to simplify building parallel applications. CC is the C++ wrapper, and cc is the C wrapper. The wrappers call the base {Cray, Intel, PGI, or GNU} compiler with vendor-specific flags. The "Programming Environment" (prgenv) that's currently loaded determines the base compiler. E.g. compiling with gnu looks something like:
    module load PrgEnv-gnu
    cc hello.c
  On most systems the wrappers default to `-static` mode, which causes them to only look for static libraries, and not for any dynamic ones (even if the dynamic version was explicitly listed). The integration and stress tests expect to be using the .so, so we have to run them with -dynamic so that the wrapper will find/use the .so.
* Rename most remaining *chunk* APIs to *extent*. (Jason Evans, 2016-06-06; 1 file, -5/+5)
* Rename huge to large. (Jason Evans, 2016-06-06; 1 file, -1/+1)
* Use huge size class infrastructure for large size classes. (Jason Evans, 2016-06-06; 1 file, -1/+0)
* Replace extent_tree_szad_* with extent_heap_*. (Jason Evans, 2016-06-03; 1 file, -0/+1)
* Remove quarantine support. (Jason Evans, 2016-05-13; 1 file, -2/+0)
* Remove Valgrind support. (Jason Evans, 2016-05-13; 1 file, -4/+0)
* Fix tsd bootstrapping for a0malloc(). (Jason Evans, 2016-05-07; 1 file, -0/+1)
* Link against librt for clock_gettime(2) if glibc < 2.17. (Jason Evans, 2016-05-04; 1 file, -4/+3)
  Link libjemalloc against librt if clock_gettime(2) is in librt rather than libc, as for versions of glibc prior to 2.17. This resolves #349.
* Fix fork()-related lock rank ordering reversals. (Jason Evans, 2016-04-26; 1 file, -0/+1)
* Implement the arena.<i>.reset mallctl. (Jason Evans, 2016-04-22; 1 file, -1/+3)
  This makes it possible to discard all of an arena's allocations in a single operation. This resolves #146.
* Add witness, a simple online locking validator. (Jason Evans, 2016-04-14; 1 file, -1/+3)
  This resolves #358.