author: Thiago Macieira <thiago.macieira@nokia.com>, 2011-03-22 13:14:58 (GMT)
committer: Thiago Macieira <thiago.macieira@nokia.com>, 2011-03-22 14:52:04 (GMT)
commit: 835108c44c0f5263856cb96c257e11600ffbf9e6
tree: 8d8318c12831ced6b964af42cb4dedbbaee1f5c9 /bin
parent: 220658198238ccdede7fb933c16c7119dcb6863b
Add ARM Neon versions of fromLatin1 and fromUtf8
The fromLatin1 code is very simple, yet the handwritten assembly performs better due to the use of post-increments.

The fromUtf8 code has two alternatives. Neon lacks an instruction similar to SSE2's _mm_movemask_epi8 (PMOVMSKB), which extracts one bit from each byte and stores the result in a register. We used that in the UTF-8 code to detect bytes with the highest bit set. To compensate, we tried two alternatives:

1) AND the comparison result with a vector containing {128, 64, ..., 1}, then do 3 parallel-adds (VPADD.I8), which make the mask propagate to the lowest component in the vector. Trick found in: http://hilbert-space.de/?p=22 (comments 16-17)

2) Extract the two words from the doubleword Neon register and do the work in ARM assembly.

It looks like the latter version is performing better.
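
Alternative 1 can be illustrated with a portable scalar model. This is a minimal sketch, not the code from this commit: `Vec8`, `pairwiseAdd` and `highBitMask` are hypothetical names, and the loops stand in for what would be single Neon instructions on a 64-bit D register. It assumes the comparison yields 0xFF for each byte with the top bit set and 0x00 otherwise.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for one 64-bit Neon D register holding 8 bytes.
using Vec8 = std::array<std::uint8_t, 8>;

// Scalar model of VPADD.I8: sums of adjacent pairs of `a` fill the low
// half of the result, sums of adjacent pairs of `b` fill the high half.
Vec8 pairwiseAdd(const Vec8 &a, const Vec8 &b)
{
    Vec8 r{};
    for (int i = 0; i < 4; ++i) {
        r[i]     = static_cast<std::uint8_t>(a[2 * i] + a[2 * i + 1]);
        r[i + 4] = static_cast<std::uint8_t>(b[2 * i] + b[2 * i + 1]);
    }
    return r;
}

// Emulates a movemask over 8 bytes: one result bit per input byte whose
// highest bit is set. With the weights {128, 64, ..., 1}, byte 0 maps to
// bit 7, so the bit order is reversed relative to PMOVMSKB.
std::uint8_t highBitMask(const Vec8 &bytes)
{
    const Vec8 weights = {128, 64, 32, 16, 8, 4, 2, 1};
    Vec8 v{};
    for (int i = 0; i < 8; ++i) {
        // Assumed comparison result: all-ones for non-ASCII bytes.
        const std::uint8_t cmp = (bytes[i] & 0x80) ? 0xFF : 0x00;
        v[i] = static_cast<std::uint8_t>(cmp & weights[i]);
    }
    // Three pairwise adds fold 8 lanes -> 4 -> 2 -> 1. Each weight is a
    // distinct bit, so the sums never carry and lane 0 ends up as the mask.
    v = pairwiseAdd(v, v);
    v = pairwiseAdd(v, v);
    v = pairwiseAdd(v, v);
    return v[0];
}
```

A result of 0 means all eight bytes are ASCII and can be widened directly; any non-zero bit marks the position of a byte that needs the multi-byte UTF-8 path.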
Diffstat (limited to 'bin')
0 files changed, 0 insertions, 0 deletions