This lives in the cache_bin module; just a typo.
In the process, kill arena_bin_index, which is unused. Several diffs continuing this separation will follow.
This eliminates the need for the arena stats code to "know" about tcaches; all
that it needs is a cache_bin_array_descriptor_t to tell it where to find
cache_bins whose stats it should aggregate.
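A minimal sketch of the descriptor idea; the type name cache_bin_array_descriptor_t is from this change, but the field layout here is hypothetical:

    /* Hypothetical layout; the real struct presumably uses jemalloc's
     * own list linkage rather than a bare next pointer. */
    typedef struct cache_bin_s cache_bin_t;

    typedef struct cache_bin_array_descriptor_s {
        struct cache_bin_array_descriptor_s *next; /* arena's descriptor list */
        cache_bin_t *bins_small; /* small size-class cache_bins */
        cache_bin_t *bins_large; /* large size-class cache_bins */
    } cache_bin_array_descriptor_t;

The stats code can then walk the arena's descriptor list and aggregate each cache_bin's stats without ever touching tcache_t.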
This is the first step towards breaking up the tcache and arena (since they
interact primarily at the bin level). It should also make a future arena
caching implementation more straightforward.
This lets us specify whether and how mutexes of the same rank are allowed to be acquired. Currently, we only allow two policies (only a single mutex at a given rank at a time, and mutexes acquired in ascending order), but we can plausibly allow more (e.g. "release uncontended mutexes before blocking").
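A sketch of the two policies as an enum; these names are illustrative, not the actual witness API:

    /* Policy for holding multiple mutexes of the same rank. */
    typedef enum {
        /* At most one mutex of a given rank may be held at a time. */
        rank_policy_exclusive,
        /* Equal-rank mutexes may be held together, but must be acquired
         * in ascending order (per some comparison function). */
        rank_policy_ascending
    } rank_policy_t;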
This reverts commit 8584adc451f31adfc4ab8693d9189cf3a7e5d858. Production results were not favorable; will investigate separately.
This removes the tsd macros (which are used only for tsd_t in real builds), and breaks up the circular dependencies involving tsd.

We also move all tsd access through getters and setters, which allows us to assert that we only touch data when tsd is in a valid state.

We simplify the usage of the x-macro trick, removing all the customizability (get/set, init, cleanup) and moving the lifetime logic into tsd_init and tsd_cleanup. This lets us make initialization order independent of field order within tsd_t.
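A compressed sketch of the x-macro pattern described above (the field list is illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* Single list of tsd fields; each O(name, type) entry expands in place. */
    #define TSD_FIELDS(O)              \
        O(tcache_enabled, bool)        \
        O(thread_allocated, uint64_t)

    typedef struct tsd_s {
    #define O(name, type) type name;
        TSD_FIELDS(O)
    #undef O
    } tsd_t;

    /* Generate one getter and one setter per field; the real versions can
     * additionally assert that tsd is in a valid state. */
    #define O(name, type)                                        \
        static inline type tsd_##name##_get(tsd_t *tsd)          \
            { return tsd->name; }                                 \
        static inline void tsd_##name##_set(tsd_t *tsd, type v)  \
            { tsd->name = v; }
    TSD_FIELDS(O)
    #undef O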
During tcache gc, use tcache_bin_try_flush_small / _large so that we can skip items whose bins are already locked.
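The skip-if-locked idea, sketched with a generic pthread mutex (the tcache_bin_try_flush_* internals are not shown in this message):

    #include <pthread.h>
    #include <stdbool.h>

    struct bin {
        pthread_mutex_t lock;
        /* ... cached items ... */
    };

    static void flush_items(struct bin *bin); /* elided */

    /* Return false instead of blocking when the bin is already locked;
     * the gc pass simply revisits the bin later. */
    static bool
    try_flush_bin(struct bin *bin) {
        if (pthread_mutex_trylock(&bin->lock) != 0) {
            return false; /* contended: skip for now */
        }
        flush_items(bin);
        pthread_mutex_unlock(&bin->lock);
        return true;
    }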
Simplify configuration by removing the --disable-tcache option, and replace testing of that configuration with --with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly if tcache is disabled.

This partially resolves #580.
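For reference, the same setting can also be applied through jemalloc's standard configuration hooks, e.g. the malloc_conf global (or the MALLOC_CONF environment variable):

    /* Compiled into the application; jemalloc reads this at startup. */
    const char *malloc_conf = "tcache:false";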
Add tsd_state_nominal_slow, which on the malloc() fast path incorporates the tcache_enabled check, and on the free() fast path bundles both the malloc_slow and tcache_enabled branches.
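A sketch of the branch-fusion idea (state values and names are illustrative): fold the rare-path conditions into the tsd state so the fast path tests a single value.

    #include <stdbool.h>
    #include <stdint.h>

    enum {
        tsd_state_nominal      = 0, /* fully fast path */
        tsd_state_nominal_slow = 1  /* nominal, but some slow-path condition
                                     * (tcache disabled, malloc_slow) holds */
    };

    static inline bool
    fast_path_allowed(uint8_t state) {
        /* One comparison replaces several independent flag checks. */
        return state == tsd_state_nominal;
    }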
This is a biggy. jemalloc_internal.h has been doing multiple jobs for a while
now:
- The source of system-wide definitions.
- The catch-all include file.
- The module header file for jemalloc.c
This commit splits up this functionality. The system-wide definitions
responsibility has moved to jemalloc_preamble.h. The catch-all include file is
now jemalloc_internal_includes.h. The module headers for jemalloc.c are now in
jemalloc_internal_[externs|inlines|types].h, just as they are for the other
modules.
This gets rid of the redundant rtree lookup on the fast path.
1) Re-organize TSD so that frequently accessed fields are closer to the beginning and more compact. Assuming 64-bit, the first 2.5 cachelines now contain everything needed on the tcache fast path, except the tcache struct itself.

2) Re-organize tcache and tbins. Take lg_fill_div out of tbin, and reduce tbin to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and place tbins_small close to the beginning.
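One possible 24-byte tbin layout on 64-bit; the exact fields are not spelled out above, so these are assumptions:

    #include <stdint.h>

    typedef struct cache_bin_s {
        void **avail;       /* 8 bytes: stack of cached objects */
        uint32_t ncached;   /* 4 bytes: number of cached objects */
        uint32_t low_water; /* 4 bytes: ncached low-water mark for gc */
        uint64_t nrequests; /* 8 bytes: request-count stats */
    } cache_bin_t;          /* 24 bytes, down from 32 */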
The embedded tcache is initialized upon tsd initialization. The avail arrays for the tbins are allocated / deallocated accordingly during init / cleanup.

With this change, the pointer to the auto tcache will always be available, as long as we have access to the TSD. tcache_available() (called in tcache_get()) is provided to check whether we should use the tcache.
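A sketch of how the accessors might fit together; the tsd getter names here are assumptions:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct tsd_s tsd_t;
    typedef struct tcache_s tcache_t;
    bool tsd_tcache_enabled_get(tsd_t *tsd); /* assumed getter */
    tcache_t *tsd_tcachep_get(tsd_t *tsd);   /* embedded tcache in tsd */

    static inline bool
    tcache_available(tsd_t *tsd) {
        /* True iff the thread's auto tcache is enabled and initialized. */
        return tsd_tcache_enabled_get(tsd);
    }

    static inline tcache_t *
    tcache_get(tsd_t *tsd) {
        if (!tcache_available(tsd)) {
            return NULL;
        }
        return tsd_tcachep_get(tsd);
    }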
Cache the extents on the stack to avoid redundant lookup overhead.
Drop tcaches_mtx before calling tcache_destroy().
The new feature, opt.percpu_arena, determines thread-arena association dynamically based on CPU id. Three modes are supported: "percpu", "phycpu" and disabled.

"percpu" uses the current core id (with help from sched_getcpu()) directly as the arena index, while "phycpu" will assign threads on the same physical CPU to the same arena. In other words, "percpu" means # of arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of CPUs). Note that no runtime check on whether hyperthreading is enabled is added yet.

When enabled, threads will be migrated between arenas when a CPU change is detected. In the current design, to reduce the overhead of reading the CPU id, each arena tracks the thread that accessed it most recently. When a new thread comes in, we read the CPU id and update the arena association if necessary.
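A sketch of the index computation, assuming Linux sched_getcpu() and the common enumeration in which hyperthread siblings are cpu and cpu + ncpus/2:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdbool.h>

    /* Map the current CPU to an arena index for the two modes
     * (assumes ncpus is even and >= 2). */
    static unsigned
    percpu_arena_choose(bool phycpu, unsigned ncpus) {
        int cpu = sched_getcpu();
        if (cpu < 0) {
            return 0; /* lookup failed; fall back to arena 0 */
        }
        if (phycpu) {
            /* Hyperthread siblings share an arena: ncpus / 2 arenas. */
            return (unsigned)cpu % (ncpus / 2);
        }
        return (unsigned)cpu; /* one arena per CPU */
    }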
This fixes tcache_flush for manual tcaches, which were not able to find the correct arena they were associated with. Also change the decay test to cover this case (by using manually created arenas).
This replaces arena->lock synchronization.
Refactor arena and extent locking protocols such that arena and extent locks are never held when calling into the extent_*_wrapper() API. This requires extra care during purging since the arena lock no longer protects the inner purging logic. It also requires extra care to protect extents from being merged with adjacent extents.

Convert extent_t's 'active' flag to an enumerated 'state', so that retained extents are explicitly marked as such, rather than depending on ring linkage state.

Refactor the extent collections (and their synchronization) for cached and retained extents into extents_t. Incorporate LRU functionality to support purging. Incorporate page count accounting, which replaces arena->ndirty and arena->stats.retained.

Assert that no core locks are held when entering any internal [de]allocation functions. This is in addition to existing assertions that no locks are held when entering external [de]allocation functions.

Audit and document synchronization protocols for all arena_t fields.

This fixes a potential deadlock due to recursive allocation during gdump, in a similar fashion to b49c649bc18fff4bd10a1c8adbaf1f25f6453cb6 (Fix lock order reversal during gdump.), but with a necessarily much broader code impact.
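The enumerated state might look like the following; the exact value set is not listed in the message, so this is a sketch:

    typedef enum {
        extent_state_active,   /* currently in use */
        extent_state_dirty,    /* cached, pages still backed */
        extent_state_retained  /* explicitly retained, rather than inferred
                                * from ring linkage */
    } extent_state_t;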
Synchronize tcaches with tcaches_mtx rather than ctl_mtx. Add missing
synchronization for tcache flushing. This bug was introduced by
1cb181ed632e7573fb4eab194e4d216867222d27 (Implement explicit tcache
support.), which was first released in 4.0.0.
This resolves #564.
This resolves #540.
Add braces around single-line blocks, and remove line breaks before
function-opening braces.
This resolves #537.
The removed stats merging logic is already taken care of by tcache_flush.
This resolves #535.
Add/rename related mallctls:
- Add stats.arenas.<i>.base .
- Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
- Add stats.arenas.<i>.resident .
Modify the arenas.extend mallctl to take an optional (extent_hooks_t *)
argument so that it is possible for all base allocations to be serviced
by the specified extent hooks.
This resolves #463.
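Example of reading the new counter through the mallctl API, assuming a stats-enabled build ("epoch" refreshes the cached stats first):

    #include <stdint.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void
    print_arena0_base(void) {
        uint64_t epoch = 1;
        size_t sz = sizeof(epoch);
        mallctl("epoch", &epoch, &sz, &epoch, sz); /* refresh stats */

        size_t base;
        sz = sizeof(base);
        if (mallctl("stats.arenas.0.base", &base, &sz, NULL, 0) == 0) {
            printf("arena 0 base: %zu bytes\n", base);
        }
    }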
This avoids warnings in some cases, and is otherwise generally good
hygiene.
Refactor tsd so that tsdn_fetch() does not trigger allocation, since
allocation could cause infinite recursion.
This resolves #458.
Look up chunk metadata via the radix tree, rather than using CHUNK_ADDR2BASE().

Propagate the pointer's containing extent.

Minimize extent lookups by doing a single lookup (e.g. in free()) and propagating the pointer's extent into nearly all the functions that may need it.
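The shape of the change, sketched with hypothetical helper names:

    typedef struct extent_s extent_t;

    extent_t *rtree_extent_lookup(const void *ptr); /* radix-tree query */
    void arena_dalloc_extent(extent_t *extent, void *ptr);

    static void
    ifree_sketch(void *ptr) {
        /* One lookup at the API boundary ... */
        extent_t *extent = rtree_extent_lookup(ptr);
        /* ... then the extent rides along; no repeated lookups below. */
        arena_dalloc_extent(extent, ptr);
    }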