author | Thiago Macieira <thiago.macieira@nokia.com> | 2011-03-22 13:14:58 (GMT) |
---|---|---|
committer | Thiago Macieira <thiago.macieira@nokia.com> | 2011-03-22 14:52:04 (GMT) |
commit | 835108c44c0f5263856cb96c257e11600ffbf9e6 (patch) | |
tree | 8d8318c12831ced6b964af42cb4dedbbaee1f5c9 /bin | |
parent | 220658198238ccdede7fb933c16c7119dcb6863b (diff) | |
Add ARM Neon versions of fromLatin1 and fromUtf8
The fromLatin1 code is very simple, yet the handwritten assembly
performs better due to the use of post-increments.
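As a rough illustration of why the fromLatin1 conversion is simple, the sketch below widens each Latin-1 byte to a 16-bit code unit. It is not the code from this commit: the function name `latin1ToUtf16` is hypothetical, and the Neon path uses compiler intrinsics rather than the handwritten assembly with post-increments described above. The scalar loop handles the tail and any non-Neon build.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not the Qt function): converts len Latin-1 bytes
// to UTF-16 code units by zero-extending each byte.
void latin1ToUtf16(std::uint16_t *dst, const char *src, std::size_t len)
{
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Neon path: load 8 bytes, widen them to 8 halfwords, store 16 bytes.
    for (; i + 8 <= len; i += 8) {
        const uint8x8_t bytes = vld1_u8(reinterpret_cast<const std::uint8_t *>(src + i));
        const uint16x8_t wide = vmovl_u8(bytes);
        vst1q_u16(dst + i, wide);
    }
#endif
    // Scalar tail (and the whole input when Neon is unavailable).
    for (; i < len; ++i)
        dst[i] = static_cast<unsigned char>(src[i]);
}
```

Every Latin-1 byte maps to the Unicode code point with the same value, so no table lookup or branching is needed.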
The fromUtf8 code is harder to port. Neon lacks an instruction
similar to SSE2's _mm_movemask_epi8 (PMOVMSKB), which extracts the
highest bit from each byte and stores the result in a register. We used
that in the UTF-8 code to detect bytes with the highest bit set. To
compensate, we tried two alternatives:
1) AND the comparison result with a vector containing {128, 64, ..., 1}.
   Do 3 pairwise adds (VPADD.I8), which will make the mask propagate
   to the lowest component in the vector.
   Trick found in: http://hilbert-space.de/?p=22 (comment 16-17)
2) Extract the two words from the doubleword Neon register and do the
   work in ARM assembly.
The latter version appears to perform better.
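The two alternatives can be sketched in portable C++ to show the arithmetic. This is an emulation under stated assumptions, not the Neon code from this commit: `Vec8` stands in for a doubleword register, `pairwiseAdd` models one VPADD.I8 step with both operands equal, and all function names are hypothetical. For the second alternative, the "work" done on the two words is assumed here to be locating the first byte with its highest bit set.

```cpp
#include <array>
#include <cstdint>

using Vec8 = std::array<std::uint8_t, 8>; // stand-in for a 64-bit Neon register

// Models VPADD.I8 d, d, d: lane i becomes v[2i] + v[2i+1]; because both
// operands are the same register, the upper half repeats the lower half.
Vec8 pairwiseAdd(const Vec8 &v)
{
    Vec8 r{};
    for (int i = 0; i < 4; ++i) {
        r[i] = static_cast<std::uint8_t>(v[2 * i] + v[2 * i + 1]);
        r[i + 4] = r[i];
    }
    return r;
}

// Alternative 1: AND the comparison result with {128, 64, ..., 1}, then
// fold with three pairwise adds so the mask lands in lane 0.
std::uint8_t highBitMaskByPairwiseAdd(const Vec8 &bytes)
{
    static const Vec8 weights = {{128, 64, 32, 16, 8, 4, 2, 1}};
    Vec8 v{};
    for (int i = 0; i < 8; ++i) {
        // A vector comparison yields all-ones or all-zeros per lane.
        const std::uint8_t cmp = (bytes[i] & 0x80) ? 0xFF : 0x00;
        v[i] = static_cast<std::uint8_t>(cmp & weights[i]);
    }
    v = pairwiseAdd(v); // 8 partial values -> 4 sums
    v = pairwiseAdd(v); // 4 sums -> 2 sums
    v = pairwiseAdd(v); // 2 sums -> 1 sum in lane 0
    return v[0];
}

// Builds a 32-bit word from four bytes, lowest-indexed byte in the low bits.
std::uint32_t wordAt(const Vec8 &bytes, int offset)
{
    return static_cast<std::uint32_t>(bytes[offset])
         | static_cast<std::uint32_t>(bytes[offset + 1]) << 8
         | static_cast<std::uint32_t>(bytes[offset + 2]) << 16
         | static_cast<std::uint32_t>(bytes[offset + 3]) << 24;
}

// Alternative 2 (assumed form): take the two 32-bit words and test the
// high bits with plain integer code. Returns the index of the first byte
// with its highest bit set, or 8 if all bytes are ASCII.
int firstNonAsciiByWords(const Vec8 &bytes)
{
    for (int w = 0; w < 2; ++w) {
        std::uint32_t high = wordAt(bytes, 4 * w) & 0x80808080u;
        if (high == 0)
            continue;
        int i = 0;
        while (!(high & 0x80u)) {
            high >>= 8;
            ++i;
        }
        return 4 * w + i;
    }
    return 8;
}
```

With the weights in the order given above, byte 0 maps to bit 7 of the mask, which is the reverse of PMOVMSKB's bit order. No sum can overflow, because the eight weights are distinct bits that total 255.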
Diffstat (limited to 'bin')
0 files changed, 0 insertions, 0 deletions