Use a different instruction for ARMv8 NEON loads #606
The instruction used for load/store assumes that the data is interleaved and produces incorrect results for the 64-bit byteswap. Use a different instruction that doesn't assume interleaving.

Signed-off-by: John Sallay <jasallay@gmail.com>
I'm happy to add a test, but I'm not exactly sure where to do it.

     for (number = 0; number < n4points; ++number) {
       __VOLK_PREFETCH(inputPtr + 8);
    -  input = vld2q_u8((uint8_t*)inputPtr);
    +  input = vld1q_u8_x2((uint8_t*)inputPtr);

Our GCC 8 test fails with this:

    /usr/bin/ld: libvolk.so.2.5.2: undefined reference to `vst1q_u8_x2'

Since newer GCC versions seem to work, we might need something like an #ifdef to switch between those functions.
I can do that, but I'll have to figure out which macro is defined when the vld1q_u8_x2 intrinsic is available.
That, or require GCC 9 and up. But we need to find some documentation that clearly explains what's going on.
#680 proposes to simply remove the buggy kernel instead, since it is slower than the generic kernel.
Thanks for the PR. Since we just removed these kernels, I'm closing this PR now.