| Commit message (Collapse) | Author | Age | Files | Lines |
|
Reviewed-by: Trust Me
|
Memory is allocated on 16-byte boundaries, so we can do the
aligned load without risking an invalid memory access.
This simplifies the code.
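
A minimal sketch of the idea, assuming a buffer that starts on a 16-byte boundary and spans whole 16-byte blocks. The helper name loadViaAlignedBlock is hypothetical and is not taken from the patch; it rounds the pointer down to the enclosing block, loads that block with an aligned load, and picks the requested lane. A plain copy stands in for the load on targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper, not from the patch. Assumes p is 4-byte aligned and
// lies inside a buffer made of whole, 16-byte aligned blocks.
static uint32_t loadViaAlignedBlock(const uint32_t *p)
{
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    assert(addr % sizeof(uint32_t) == 0);

    // Round down to the start of the enclosing 16-byte block.
    const uintptr_t base = addr & ~uintptr_t(15);
    const std::size_t lane = (addr - base) / sizeof(uint32_t);

    alignas(16) uint32_t lanes[4];
#if defined(__SSE2__)
    // Aligned load: base is a multiple of 16 by construction.
    const __m128i v = _mm_load_si128(reinterpret_cast<const __m128i *>(base));
    _mm_store_si128(reinterpret_cast<__m128i *>(lanes), v);
#else
    // Scalar stand-in for targets without SSE2.
    std::memcpy(lanes, reinterpret_cast<const void *>(base), sizeof(lanes));
#endif
    return lanes[lane];
}
```

Because the block never straddles a 16-byte boundary, the load stays inside memory that the allocation already covers, which is the property the message relies on.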
Reviewed-by: Samuel Rødal
|
Qt does not build on PowerPC when compiling for both x86 and PPC on Mac.
The compiler is invoked only once for both architectures, so the defines
are there in order to get the optimized path for x86. Those defines
need to be removed from the compilation environment when the target
is set to PPC by GCC.
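
The patch itself changes the compilation environment. As an illustration only, the same effect can be sketched at source level: a define that arrives from the command line is dropped again when the compiler is targeting PowerPC. EXAMPLE_HAVE_SSE2 is a hypothetical name standing in for whatever the build passes, and the check assumes the __ppc__ and __ppc64__ macros that Apple's GCC predefines for PPC targets.

```cpp
#include <cassert>

// Stands in for -DEXAMPLE_HAVE_SSE2 on the command line (hypothetical name).
#define EXAMPLE_HAVE_SSE2 1

// Assumption: Apple GCC predefines __ppc__ or __ppc64__ for PowerPC targets.
#if defined(__ppc__) || defined(__ppc64__)
#  define EXAMPLE_TARGET_IS_PPC 1
#else
#  define EXAMPLE_TARGET_IS_PPC 0
#endif

// The single compiler invocation passes the define for every architecture,
// so remove it again for the PPC half of the build.
#if EXAMPLE_TARGET_IS_PPC && defined(EXAMPLE_HAVE_SSE2)
#  undef EXAMPLE_HAVE_SSE2
#endif

enum class Path { Generic, Sse2 };

constexpr bool targetIsPowerPC()
{
    return EXAMPLE_TARGET_IS_PPC != 0;
}

// Reports which code path this translation unit was compiled with.
constexpr Path selectedPath()
{
#if defined(EXAMPLE_HAVE_SSE2)
    return Path::Sse2;
#else
    return Path::Generic;
#endif
}
```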
Reviewed-by: Kent Hansen
|
The data loaded first was incorrect because the offset was wrong.
The correct offset is the one up to the alignment point.
Instead of trying to load a temporary array, we just move one vector
further, since we know reading there is always safe.
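
For illustration, the offset up to the alignment point can be computed as below. This is a sketch under assumptions: pixels are 32 bits wide, the pointer is 4-byte aligned, and the helper name pixelsToAlignment is hypothetical rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 32-bit pixels between p and the next
// 16-byte boundary (0 when p is already aligned).
static std::size_t pixelsToAlignment(const uint32_t *p)
{
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    assert(addr % sizeof(uint32_t) == 0);

    // Index of p within its 16-byte block, counted in pixels (0..3).
    const std::size_t lane = (addr >> 2) & 3;
    return (4 - lane) & 3;
}
```

Starting the vector loop at p + pixelsToAlignment(p) puts the first vector access on a 16-byte boundary.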
Reviewed-by: Andreas Kling
|
SSSE3 provides two tools to improve the blending speed over SSE2:
- palignr
- byte permutation
Alignment is enforced on src and dst with palignr so that every
access is aligned.
The extraction of the alpha mask is done with a byte permutation in
order to save two instructions per cycle.
On Atom, this patch gives between 0% (aligned src) and 10%
improvement (src misaligned by 4 and 12 bytes).
On Core 2, this patch gives a consistent 8% to 10% improvement
for every misalignment.
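
The two operations can be sketched with a portable scalar model. This is not the SSSE3 code from the patch: alignrModel and shuffleModel only mirror what palignr and pshufb compute, byte by byte, and alphaMaskModel assumes that alpha sits in byte 3 of each 4-byte pixel and is copied into both 16-bit halves of that pixel. All three names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Block = std::array<uint8_t, 16>;

// Scalar model of palignr: treat lo followed by hi as 32 bytes and return
// the 16 bytes that start at byte offset shift.
static Block alignrModel(const Block &hi, const Block &lo, std::size_t shift)
{
    assert(shift <= 16);
    Block out{};
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t j = i + shift;
        out[i] = j < 16 ? lo[j] : hi[j - 16];
    }
    return out;
}

// Scalar model of pshufb: each mask byte selects a source byte by its low
// four bits, or yields zero when its top bit is set.
static Block shuffleModel(const Block &src, const Block &mask)
{
    Block out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    return out;
}

// Assumed pixel layout: alpha is byte 3 of each 4-byte pixel. The result
// holds that alpha in bytes 0 and 2 of the pixel and zero in bytes 1 and 3.
static Block alphaMaskModel(const Block &pixels)
{
    Block mask{};
    for (std::size_t k = 0; k < 4; ++k) {
        const uint8_t alphaIndex = static_cast<uint8_t>(4 * k + 3);
        mask[4 * k] = alphaIndex;
        mask[4 * k + 1] = 0x80;
        mask[4 * k + 2] = alphaIndex;
        mask[4 * k + 3] = 0x80;
    }
    return shuffleModel(pixels, mask);
}
```

With two consecutive aligned blocks as lo and hi, alignrModel(hi, lo, 4) yields the 16 bytes that an unaligned load at offset 4 would return, which is how a misaligned source can be read with aligned accesses only.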
Reviewed-by: Samuel Rødal
|