path: root/src/gui/painting/qdrawhelper.cpp
Commit message (Author, Age, Files, Lines)
* qdrawhelper: Optimisations in fetchTransformedBilinear (Olivier Goffart, 2010-09-08, 1 file, -30/+59)
| Scaling down, no rotation, with SSE2. Specialize for ARGB32 not tiled, so the pixelbound can be simplified.
| Reviewed-by: Samuel
* qdrawhelper: use SSE2 for interpolation in fetchTransformedBilinear (Olivier Goffart, 2010-09-08, 1 file, -0/+77)
| When scaling down with no rotation, process the pixels 4 at a time and do the interpolation using SSE2.
| Reviewed-by: Samuel
* qdrawhelper: small optimisations in fetchTransformBilinear (Olivier Goffart, 2010-09-08, 1 file, -16/+15)
| Another way to compute the interpolation that does fewer multiplications. Small impact on the benchmark.
| Made-with: Samuel
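A minimal sketch of how an interpolation can be rewritten to use fewer multiplications. It is not the code from this commit; the function names and the single-channel form are assumptions made for illustration. The identity used is a*(256-d) + b*d == 256*a + (b-a)*d.

```cpp
#include <cassert>
#include <cstdint>

// Straightforward linear interpolation of one 8-bit channel.
// `d` is the weight of `b` in 1/256 units; two multiplications.
inline std::uint32_t lerp_two_mul(std::uint32_t a, std::uint32_t b, std::uint32_t d)
{
    assert(a <= 255 && b <= 255 && d <= 256);
    return (a * (256 - d) + b * d) >> 8;
}

// Same result with one multiplication, using
// a*(256-d) + b*d == 256*a + (b-a)*d.
inline std::uint32_t lerp_one_mul(std::uint32_t a, std::uint32_t b, std::uint32_t d)
{
    assert(a <= 255 && b <= 255 && d <= 256);
    const std::int32_t diff = static_cast<std::int32_t>(b) - static_cast<std::int32_t>(a);
    const std::int32_t sum = static_cast<std::int32_t>(a << 8)
                           + diff * static_cast<std::int32_t>(d);
    // sum equals a*(256-d) + b*d, so it is never negative.
    return static_cast<std::uint32_t>(sum) >> 8;
}
```

Both helpers return the same value for every valid input, so the second can replace the first without changing the rendered output.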
* Allow Windows x64 to use SSE2 to speed up blending (liang jian, 2010-09-06, 1 file, -105/+119)
| Windows 64 does not support MMX with MSVC. This is a problem with the way SSE is currently used because it relies on previous vector instructions being available. This patch fixes that by using the intended functions for SSE2 on Windows.
| Merge-request: 792
| Reviewed-by: Benjamin Poulain <benjamin.poulain@nokia.com>
* qdrawhelper: micro optimisation in fetchTransformBilinear (Olivier Goffart, 2010-09-03, 1 file, -7/+7)
| Move the -1 out of the loop.
| Reviewed-by: Benjamin Poulain
* qdrawhelper: Remove blend_transformed_bilinear_argb (Olivier Goffart, 2010-09-02, 1 file, -179/+4)
| With the recent optimisation in fetchTransformedBilinear, the generic path is faster than the 'optimized' path.
| Reviewed-by: Samuel
* Use NEON and preloading for 16 bit small / medium sized image blits. (Samuel Rødal, 2010-09-01, 1 file, -0/+1)
| This gives a nice speedup for blitting of small and medium sized images by using preloading and avoiding function call overhead to memcpy for each scanline. For larger image widths memcpy becomes more efficient.
| Speedups of up to 40 % for 64 pixel wide images were measured. For image widths between 2 and 16 the speedup ranges between 12 % and 28 %.
| Task-number: QT-3401
| Reviewed-by: Benjamin Poulain <benjamin.poulain@nokia.com>
* Undefined SSE symbols when crosscompiling Qt on PPC. (Benjamin Poulain, 2010-08-31, 1 file, -1/+0)
| Qt does not build on PowerPC when compiling for both x86 and PPC on Mac. The compiler is invoked only once for both architectures, so the defines are there in order to get the optimized path for x86.
| Those defines need to be removed from the compilation environment when the target is set to PPC by GCC.
| Reviewed-by: Kent Hansen
* qdrawhelper: backport the optimisations in fetchTransformBilinear from master to 4.7 (Olivier Goffart, 2010-08-30, 1 file, -138/+257)
| This backports the following commits:
|   e55b6a3 qdrawhelper: remove code duplication
|   0d7e683 qdrawhelper: optimize fetchTransformedBilinear
|   29ef46e Fix compilation with RVCT
|   6601458 qdrawhelper: Use SSE2 in fetchTransformedBilinear (when scalling up)
|   398ef0ca Fix nasty copy-paste bug in fetchTransformedBilinear()
|   d585ece qdrawhelper: fix assert in fetchTransformedBilinear
| Reviewed-by: Benjamin Poulain
* Revert "Refactor blend_transformed_bilinear to simplify the blend type checking" (Olivier Goffart, 2010-08-26, 1 file, -67/+66)
| This reverts commit d2089600ea247b5d6354e8eee4becf40802c4693.
| Reverting in order to avoid conflicts in the master branch. We are going to consider backporting the changes to Qt 4.7.1.
| Reviewed-by: Benjamin Poulain
* Refactor blend_transformed_bilinear to simplify the blend type checking (Benjamin Poulain, 2010-08-25, 1 file, -66/+67)
| The function blend_transformed_bilinear_argb() was checking the blend type at runtime for each pixel in order to clamp the coordinates. This code was duplicated in both branches of the function.
| This patch factorizes the code by doing the clamping in a template function.
| Reviewed-by: Samuel Rødal
* Implement qt_memfill32 with Neon. (Benjamin Poulain, 2010-08-25, 1 file, -0/+1)
| This patch introduces an implementation of qt_memfill32 with the Neon instruction set from ARMv7. The loop is unrolled once to get better performance.
| This implementation of memfill is 330% faster on the N900.
| Reviewed-by: Samuel Rødal
* Implement the composition mode Plus with Neon. (Benjamin Poulain, 2010-08-25, 1 file, -6/+3)
| On the benchmark tst_QPainter::compositionModes(), this patch gives the following improvements:
|   - 300x300:opaque: 390%
|   - 300x300:!opaque: 1085%
| Reviewed-by: Samuel Rødal
* Implement the general blending of ARGB32_pm with SSSE3 (Benjamin Poulain, 2010-08-16, 1 file, -0/+11)
| SSSE3 provides two tools to improve the blending speed over SSE2:
|   - palignr
|   - byte permutation
| The alignment is enforced on src and dst with palignr to always make aligned accesses. The extraction of the alpha mask is done with a byte permutation in order to save two instructions per cycle.
| On Atom, this patch gives between 0% (aligned src) and 10% improvement (unaligned by 4 and 12 bytes). On Core 2, this patch gives a consistent 8% to 10% improvement for every misalignment.
| Reviewed-by: Samuel Rødal
* Implement comp_Source with SSE2 when there is a const alpha (Benjamin Poulain, 2010-08-04, 1 file, -0/+2)
| On Atom, comp_Source is 280% faster with the SSE2 implementation.
| Reviewed-by: Andreas Kling
* Implement the composition mode "Plus" with SSE2 (Benjamin Poulain, 2010-07-27, 1 file, -5/+3)
| Implement the composition function for CompositionMode_Plus with SSE2.
| The macro MIX() can be replaced by a single add-saturate instruction, which increases the speed a lot (13 times faster on the blend benchmark).
| Reviewed-by: Olivier Goffart
| Reviewed-by: Andreas Kling
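A rough sketch of why a saturating add fits the Plus mode: each 8-bit channel of the result is the sum of the source and destination channels, clamped at 255. The function names are hypothetical and this is not the Qt implementation; `_mm_adds_epu8` is the SSE2 intrinsic for an unsigned saturating byte add, and the scalar helper serves both as the reference and as the fallback for the tail or for non-SSE2 builds.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Scalar reference: add each 8-bit channel and clamp at 255.
inline std::uint32_t plus_pixel(std::uint32_t d, std::uint32_t s)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t c = ((d >> shift) & 0xff) + ((s >> shift) & 0xff);
        if (c > 255)
            c = 255;
        out |= c << shift;
    }
    return out;
}

// dest[i] = saturate(dest[i] + src[i]) for each channel of each pixel.
void plus_argb32(std::uint32_t *dest, const std::uint32_t *src, int length)
{
    assert(length >= 0);
    int i = 0;
#ifdef SKETCH_HAVE_SSE2
    // Four pixels (16 channels) per iteration with one saturating add.
    for (; i + 4 <= length; i += 4) {
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + i));
        const __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i *>(dest + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dest + i), _mm_adds_epu8(d, s));
    }
#endif
    // Remaining pixels (or all of them without SSE2).
    for (; i < length; ++i)
        dest[i] = plus_pixel(dest[i], src[i]);
}
```

Unaligned loads and stores are used here only to keep the sketch short; a tuned version would likely align the destination first.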
* Clean the CompositionFunction tables of drawhelper (Benjamin Poulain, 2010-07-27, 1 file, -68/+59)
| Some instruction sets were defining partial tables of composition functions. To work around that, qInitDrawhelperAsm() was resetting the composition function of QPainter::CompositionMode_Destination and anything above QPainter::CompositionMode_Xor.
| This was a problem because it made it impossible to implement fast paths for those composition modes.
| This patch exports prototypes for the generic functions of each composition mode. The specialized implementations now define a complete table.
| Reviewed-by: Andreas Kling
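The idea of a complete table can be sketched as follows. Every name here is hypothetical and the three modes are a made-up subset; the point is only that a specialized table lists a function for every mode, reusing the exported generic function wherever no fast path exists, so nothing has to be patched after the table is installed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef void (*CompositionFunction)(std::uint32_t *dest, const std::uint32_t *src, int length);

// Hypothetical subset of composition modes.
enum Mode { Mode_Clear, Mode_Source, Mode_Destination, NModes };

// Generic implementations, exported so that specialized tables can reuse them.
void clear_generic(std::uint32_t *dest, const std::uint32_t *, int length)
{
    for (int i = 0; i < length; ++i)
        dest[i] = 0;
}
void source_generic(std::uint32_t *dest, const std::uint32_t *src, int length)
{
    for (int i = 0; i < length; ++i)
        dest[i] = src[i];
}
void destination_generic(std::uint32_t *, const std::uint32_t *, int)
{
    // Destination mode leaves the destination untouched.
}

// Stand-in for an instruction-set specific fast path.
void source_fast(std::uint32_t *dest, const std::uint32_t *src, int length)
{
    assert(length >= 0);
    std::memcpy(dest, src, static_cast<std::size_t>(length) * sizeof(std::uint32_t));
}

// Complete table: one entry per mode, generic fallback where no fast path exists.
const CompositionFunction fastTable[NModes] = {
    clear_generic,        // Mode_Clear
    source_fast,          // Mode_Source
    destination_generic   // Mode_Destination
};
```

With a table like this, adding a fast path for another mode is a one-line change in the table rather than a change to the initialization code.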
* Fixed bug in drawTiledPixmap when width of pixmap matches target rect. (Samuel Rødal, 2010-07-01, 1 file, -1/+2)
| qt_memconvert's Duff's device implementation assumes that count is > 0; if count is 0 it will still blit eight pixels.
| Reviewed-by: Trond
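To illustrate the failure mode, here is a generic Duff's device copy loop with a guard for an empty count. It is a sketch, not qt_memconvert itself, and `duff_copy` is a hypothetical name. Without the early return, a count of 0 enters the switch at `case 0`, runs the whole unrolled body once, and writes eight elements.

```cpp
#include <cassert>
#include <cstdint>

// Copy `count` 32-bit values using Duff's device (8x unrolled loop).
void duff_copy(std::uint32_t *dst, const std::uint32_t *src, int count)
{
    // Guard: the switch below assumes count > 0. With count == 0 it would
    // jump to `case 0` and copy eight elements before testing the loop condition.
    if (count <= 0)
        return;

    int n = (count + 7) / 8;   // number of passes through the do-while
    switch (count & 7) {       // jump into the middle of the unrolled body
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

The guard can live either inside the helper, as here, or at the call site.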
* Improved performance of 16 bit memrotates using NEON instructions. (Samuel Rødal, 2010-07-01, 1 file, -0/+3)
| Make the memrotate functions a function pointer table so that we can replace it with optimized versions, and implement an optimized NEON version for the 90 and 270 rotations.
| Measured performance improvement for a 400x400 16-bit pixmap was 17 % for 270 degree rotation and 11 % for 90 degree rotation.
| Reviewed-by: Trond
* Add an implementation of comp_func_solid_SourceOver_neon() with Neon. (Benjamin Poulain, 2010-06-23, 1 file, -0/+1)
| The function comp_func_solid_SourceOver_neon() is used extensively by WebKit via the calls to fillRect() of QPainter().
| Implementing the function with Neon provides some performance improvement (around 175% of the previous speed).
| Reviewed-by: Samuel Rødal
* Add a SSE2 implementation of comp_func_solid_SourceOver() (Benjamin Poulain, 2010-06-23, 1 file, -1/+4)
| This function is used quite a lot by WebKit animations; the SSE2 implementation is twice as fast in those use cases.
| Reviewed-by: Andreas Kling
| Reviewed-by: Samuel Rødal
* Add a SSE2 version of comp_func_SourceOver() (Benjamin Poulain, 2010-06-23, 1 file, -1/+6)
| Implement a version of comp_func_SourceOver() with SSE2. This gives a performance boost of 11% on some WebKit animations.
| Two new macros were added to simplify the implementation of the different blending primitives: BLEND_SOURCE_OVER_ARGB32_SSE2() and BLEND_SOURCE_OVER_ARGB32_WITH_CONST_ALPHA_SSE2().
| Reviewed-by: Samuel Rødal
* Fixed pixel-bleeding when stretching subrected pixmaps. (Gunnar Sletta, 2010-04-20, 1 file, -8/+13)
| When stretching a subrect of a pixmap we need to clamp the sampling to the subrect. This was done for the ARGB32_Premultiplied target format but not for the generic fallback. This patch adapts the code so that the two code paths are equivalent.
| Reviewed-by: Samuel
* qdrawhelper: fix optim in 2245641ba (Olivier Goffart, 2010-04-13, 1 file, -29/+88)
| The commit 2245641baa58125b57faf12496bc472491565498 was not correct, as the x2 (and y2) were not correct on the first line (the value was 1 instead of 0).
| Fixes tst_QImage::smoothScale3
| Reviewed-by: Samuel Rødal
* Safeguard ourselves against corrupt registry values for cleartype gamma (Gunnar Sletta, 2010-04-12, 1 file, -0/+5)
| Reviewed-by: Eskil
| Task: http://bugreports.qt.nokia.com/browse/QTBUG-7596
* qdrawhelper: optimize the fetch transformed bilinear functions (Olivier Goffart, 2010-04-12, 1 file, -46/+46)
| Since we know that x1 and y1 are between the bounds, and that x2 and y2 are bigger, we only need to check the upper bound for x2 and y2.
| This gives a 5% speedup on a trace from the QML samegame demo.
| Reviewed-by: Samuel Rødal
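The reasoning can be sketched in one dimension. The helper names are made up for illustration, and the sketch assumes x2 is simply x1 + 1, which the commit message does not spell out: once x1 is known to lie in [lo, hi], x2 is already above lo, so only the comparison against hi remains.

```cpp
#include <algorithm>
#include <cassert>

// Second sample position with both bounds checked.
inline int second_sample_full(int x1, int lo, int hi)
{
    return std::min(std::max(x1 + 1, lo), hi);
}

// Second sample position with only the upper bound checked.
// Valid because lo <= x1 implies x1 + 1 > lo.
inline int second_sample_upper_only(int x1, int lo, int hi)
{
    assert(lo <= x1 && x1 <= hi);
    const int x2 = x1 + 1;
    return x2 > hi ? hi : x2;
}
```

For any x1 inside the bounds the two helpers agree, and the second one saves a comparison per coordinate.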
* QDrawHelper: Reduce code duplications (Olivier Goffart, 2010-04-09, 1 file, -215/+61)
| And apply the same optimisation as in 6a9e6f4780f5fc3aaf34f93c85de575f81c91e48 for duplicated code.
| Reviewed-by: Samuel Rødal
* Speedup fetchTransformedBilinear in the fast_matrix case (Olivier Goffart, 2010-04-09, 1 file, -7/+9)
| 3% faster on a trace from the qml samegame.
| Reviewed-by: Samuel Rødal
* Optimized SourceOver and 16 bit dest fetches, dest stores using NEON. (Samuel Rødal, 2010-03-26, 1 file, -4/+8)
| This makes for example linear gradient blending on top of RGB16 156 % faster (from 20.4 fps to 52.3 fps in my benchmark).
| Task-number: QTBUG-6684
| Reviewed-by: Gunnar Sletta
* Optimized scaled/transformed image blending for ARGB32PM and RGB16 on RGB16. (Samuel Rødal, 2010-03-26, 1 file, -0/+6)
| Before:
|   :/traces/qmlphoneconcept.trace, iterations: 5, frames: 48, min(ms): 1207, median(ms): 1212, stddev: 0,165153 %, max(fps): 39,768020
| After:
|   traces/qmlphoneconcept.trace, iterations: 3, frames: 48, min(ms): 884, median(ms): 886, stddev: 0,383097 %, max(fps): 54,298643
| Task-number: QTBUG-6684
| Reviewed-by: Gunnar Sletta
* Included ARM NEON optimizations from pixman in Qt. (Samuel Rødal, 2010-03-26, 1 file, -0/+4)
| On the N900 16 bit text blending is 30 - 50 % faster, and ARGB32PM on RGB16 image blending now runs in 1/10th of the time it used to.
| We now make ARGB32PM the default pixmap format for alpha pixmaps instead of ARGB8565PM which is unaligned and bad for performance. The relevant numbers:
|
| Mostly opaque pixels:
|   ARGB24 on ARGB24 using QPainter..................: 336,813033
|   ARGB32 on ARGB32 using QPainter..................: 18,419387
|   RGB16 on ARGB24 using QPainter...................: 167,301014
|   RGB16 on ARGB32 using QPainter...................: 17,279372
|   ARGB24 on RGB16 using QPainter...................: 35,100147
|   ARGB32PM on RGB16 using QPainter.................: 15,924256
|
| No opaque pixels:
|   ARGB24 on ARGB24 using QPainter..................: 412,190765
|   ARGB32 on ARGB32 using QPainter..................: 16,818389
|   RGB16 on ARGB24 using QPainter...................: 170,957878
|   RGB16 on ARGB32 using QPainter...................: 16,742984
|   ARGB24 on RGB16 using QPainter...................: 93,600482
|   ARGB32PM on RGB16 using QPainter.................: 15,999310
|
| So switching to ARGB32PM should give a boost in all areas.
| Task-number: QTBUG-6684
| Reviewed-by: Gunnar Sletta
* compile fix for mingw (also removes some warnings) (Thierry Bastian, 2010-03-23, 1 file, -10/+10)
|
* Merge remote branch 'origin/4.6' into qt-4.7-from-4.6 (Thiago Macieira, 2010-03-17, 1 file, -3/+3)
|\
| | Conflicts:
| |   src/s60installs/bwins/QtCoreu.def
| |   src/s60installs/eabi/QtCoreu.def
| * Fixed cleartype text rendering on translucent surfaces. (Trond Kjernaasen, 2010-03-15, 1 file, -3/+3)
| | We were using gamma corrected 11 bit values instead of the 8 bit non-corrected values, which caused some strange rendering effects.
| | Task-number: QTBUG-9036
| | Reviewed-by: Samuel
* | Enable the fast paths when converting to Rgb565 (Benjamin Poulain, 2010-03-15, 1 file, -4/+4)
| | The generic conversion is too slow for conversions to RGB565, which are common on embedded platforms (e.g.: Maemo).
| | This patch enables the fast path for all conversions to rgb_565; it is a follow-up of 7d7a85fa16b28fdba257bb466be5a6d2b4bf5d2f.
| | Reviewed-by: Tom Cooksey
* | Make the SIMD functions be used even when the QT_NO_DEBUG macro is used (Thierry Bastian, 2010-03-10, 1 file, -3/+0)
| | Reviewed-by: Benjamin Poulain
* | Merge branch '4.7' of scm.dev.nokia.troll.no:qt/oslo-staging-1 into 4.7-integration (Qt Continuous Integration System, 2010-03-04, 1 file, -1/+1)
|\ \
| | | * '4.7' of scm.dev.nokia.troll.no:qt/oslo-staging-1: (63 commits)
| | |   doc: Fixed some qdoc errors.
| | |   Setting ImhHiddenText for NoEcho line edits is not 100% correct, but still way better than fully visible text.
| | |   Allow building documentation without all of Qt
| | |   Added a documentation for the new enum value in gesture api.
| | |   Remove the OBJECTS_DIR variable assignment from some projets in Qt.
| | |   Fix compile
| | |   qmake/MinGw: Link statically for Qt Creator to be able to detect it.
| | |   Enable two fast path for blend_tiled_rgb565
| | |   Avoid QString reallocation for smallcaps fonts in Itemizer::generate()
| | |   Make QLabel::text a reloadable property
| | |   remove non wifi interfaces from being handled.
| | |   Disable auto-uppercasing and predictive text for password line edits.
| | |   Avoid QString reallocation in QTextEngine::itemize()
| | |   Remove the Qt 4.7 #if guards that were needed for 4.6
| | |   Always redraw the complete control when an input event comes in.
| | |   Make sure not to crash if createStandardContextMenu() returns 0 (e.g. on Maemo5)
| | |   Fix compilation: include QString in order to use QString.
| | |   Fix compile
| | |   Block the Maemo5 window attribute values from being assigned to something else on other platforms.
| | |   be more verbose when warning about incompatible libraries
| | |   ...
| * | Enable two fast path for blend_tiled_rgb565 (Benjamin Poulain, 2010-03-04, 1 file, -1/+1)
| | | Blending ARGB8565 and RGB16 on top of RGB16 is common on systems with 16-bit color depth. The faster blending functions can be used instead of blend_tiled_generic.
| | | Reviewed-by: Tom Cooksey
* | | Fix warnings on MSVC (Thierry Bastian, 2010-03-03, 1 file, -1/+1)
|/ /
* | Merge branch 'master' of scm.dev.nokia.troll.no:qt/oslo-staging-1 into master-integration (Qt Continuous Integration System, 2010-02-25, 1 file, -197/+2)
|\ \
| | | * 'master' of scm.dev.nokia.troll.no:qt/oslo-staging-1: (35 commits)
| | |   Doc: Added a config file for creating Simplified Chinese docs directly.
| | |   Doc: add a few lines about bearer managment to "What's New" page.
| | |   Fix the SIMD implementations of QString::toLatin1()
| | |   Update of the QScriptValue autotest suite.
| | |   New data set for QScriptValue autotest generator.
| | |   Autotest: make tst_qchar run out-of-source too
| | |   Autotest: add a test for roundtrips through toLatin1/fromLatin1
| | |   Implement toLatin1_helper with Neon
| | |   QRegExp::pos() should return -1 for empty/non-matching captures
| | |   Revert "qdoc: Finished "Inherited by" list for QML elements."
| | |   Revert "qdoc: List new QML elements in \sincelist for What's New page."
| | |   Add the Unicode normalisation properties.
| | |   Autotest: add a test for QDBusPendingCallWatcher use in threads
| | |   Doc: placeholders for new feature highlights.
| | |   doc: mark as reimplemented.
| | |   Update of the QScriptValue autotest suite.
| | |   New autotests cases for QScriptValue autotests generator.
| | |   QScriptValue autotest generator templates change.
| | |   Fix license template.
| | |   QScriptValue::isQMetaObject crash fix.
| | |   ...
| * \ Merge branch 'master' of scm.dev.nokia.troll.no:qt/oslo-staging-1 (Benjamin Poulain, 2010-02-23, 1 file, -33/+56)
| |\ \
| | |/
| * | Cache the result of qDetectCPUFeatures() (Benjamin Poulain, 2010-02-23, 1 file, -4/+1)
| | | Move the caching of the result from drawhelper to qsimd.cpp. Avoid getting the environment variables when not necessary.
| | | Reviewed-by: Samuel Rødal
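One common way to cache such a result is a function-local static, sketched below. This is not the qsimd.cpp code; `detect_features_slow` is a hypothetical stand-in for the real probe, and the call counter exists only to make the caching observable.

```cpp
#include <cassert>

// Counts how often the expensive probe runs (for demonstration only).
int g_detectCalls = 0;

// Hypothetical stand-in for an expensive CPU feature probe.
unsigned detect_features_slow()
{
    ++g_detectCalls;
    return 0x5u;   // made-up feature bit mask
}

// The probe runs on the first call only; later calls return the cached value.
// Initialization of a function-local static is thread-safe since C++11.
unsigned cached_features()
{
    static const unsigned features = detect_features_slow();
    return features;
}
```

Callers simply use `cached_features()` everywhere, so no module needs its own copy of the cached value.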
| * | Move the SIMD detection from QtGui to QtCore (Benjamin Poulain, 2010-02-23, 1 file, -194/+2)
| | | The SIMD instructions are useful outside painting code, so the common functions are moved to QtCore.
| | | Reviewed-by: Samuel Rødal
* | | Improve win64 support for mingw (Thierry Bastian, 2010-02-22, 1 file, -17/+17)
| |/
|/|
| | Reviewed-by: ogoffart
* | Implement the blend functions with SSE2 (Benjamin Poulain, 2010-02-12, 1 file, -13/+36)
| | When available, use SSE2 to blend images; computation is done on 4 pixels at a time instead of 1 with MMX.
| | Performance improvements:
| |   - Blending ARGB32 on RGB32/ARGB32, mostly opaque: 188%
| |   - Blending ARGB32 on RGB32/ARGB32, no opaque pixels: 180%
| |   - Blending ARGB32 on RGB32/ARGB32, with 0.5 opacity: 187%
| |   - Blending RGB32 on RGB32/ARGB32, with 0.5 opacity: 206%
| | Reviewed-by: Samuel Rødal
* | Refactor comp_func_solid_Clear() and comp_func_solid_Source() (Benjamin Poulain, 2010-02-09, 1 file, -20/+16)
| | Put the common code together with a #define.
| | Remove the check for the length from comp_func_Clear_impl and move it to qt_memfill().
* | Fixing 'softvfp+vfpv2' compiling issue for Tb9.2 (Aleksandar Sasha Babic, 2010-02-05, 1 file, -0/+4)
|/
| When compiling with -fpu=softvfp+vfpv2 on Tb9.2 we were getting a segmentation fault. This seems to be due to an RVCT bug when it comes to using int->float (and reverse) castings. One extra level of indirection (function call) has to be applied.
| Task-number: QTBUG-4893
| Reviewed-by: TrustMe
* Merge branch '4.6' of scm.dev.nokia.troll.no:qt/oslo-staging-1 into 4.6-integration (Qt Continuous Integration System, 2010-01-07, 1 file, -1/+1)
|\
| | * '4.6' of scm.dev.nokia.troll.no:qt/oslo-staging-1:
| |   QIODevice: Fix readAll()
| |   Temporary hackiesh solution to prevent BOM in the xml data.
| |   Fixed qxmlstream autotest when using shadow builds.
| |   Attempt at readding the capital P headers for Phonon
| |   Remove special Phonon processing from syncqt.
| |   Use the lowercase/shortname.h headers for Phonon includes
| |   Fixes a crash when setting focus on a widget with a focus proxy.
| |   Update copyright year to 2010
| |   doc: Clarified activeSubControls and subControls.
| |   Remove warning "statement with no effect"
| |   doc: Clarified that .lnk files are System files on Windows.
| * Update copyright year to 2010 (Jason McDonald, 2010-01-06, 1 file, -1/+1)
| | Reviewed-by: Trust Me
* | Slight performance improvement in comp_func_SourceOver. (Samuel Rødal, 2010-01-04, 1 file, -1/+4)
|/
| We should check for the fully opaque and fully transparent special cases, like we do in the dedicated image blend functions.
| Reviewed-by: Gunnar Sletta
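The two special cases can be sketched for premultiplied ARGB32 pixels as below. This is an illustrative scalar version with hypothetical names, not the Qt function: a fully opaque source pixel replaces the destination outright, a fully transparent one (which is 0 when premultiplied) leaves it untouched, and only the remaining pixels pay for the per-channel multiply.

```cpp
#include <cassert>
#include <cstdint>

// Multiply each 8-bit channel of x by a/255, rounding to nearest
// (illustrative helper, written per channel for clarity rather than speed).
inline std::uint32_t scale_channels(std::uint32_t x, std::uint32_t a)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const std::uint32_t c = (x >> shift) & 0xff;
        out |= ((c * a + 127) / 255) << shift;
    }
    return out;
}

// Source-over for premultiplied ARGB32: dest = src + dest * (1 - src_alpha).
void source_over(std::uint32_t *dest, const std::uint32_t *src, int length)
{
    assert(length >= 0);
    for (int i = 0; i < length; ++i) {
        const std::uint32_t s = src[i];
        if (s >= 0xff000000u) {
            dest[i] = s;    // fully opaque: plain copy, no arithmetic
        } else if (s != 0) {
            dest[i] = s + scale_channels(dest[i], 255 - (s >> 24));
        }
        // s == 0: fully transparent, destination stays as it is
    }
}
```

The early exits pay off when images consist mostly of fully opaque or fully transparent pixels.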