Use a different instruction for armv8 neon loads. by jsallay · Pull Request #606 · gnuradio/volk

jsallay · 2022-10-26T01:41:08Z

The instruction used for load/store assumes that the data is interleaved and produces incorrect results for the 64-bit byteswap.

Use a different instruction that doesn't assume interleaving.

Signed-off-by: John Sallay <jasallay@gmail.com>
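To make the failure mode concrete, here is a minimal scalar sketch of the two load patterns. It does not use NEON, so it runs on any machine. The type `u8x16x2_model` and the functions `model_vld2q_u8`, `model_vld1q_u8_x2`, `first_word_intact`, and `check_layouts` are hypothetical names for this illustration, not VOLK code; only the lane layout they imitate comes from the intrinsics' documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a pair of 16-byte registers (NEON's uint8x16x2_t).
 * Hypothetical type, used only for this illustration. */
typedef struct {
    uint8_t val[2][16];
} u8x16x2_model;

/* Models vld2q_u8: a de-interleaving load of 32 bytes.
 * Even-indexed bytes land in val[0], odd-indexed bytes in val[1]. */
static u8x16x2_model model_vld2q_u8(const uint8_t* src)
{
    u8x16x2_model r;
    for (size_t i = 0; i < 16; ++i) {
        r.val[0][i] = src[2 * i];
        r.val[1][i] = src[2 * i + 1];
    }
    return r;
}

/* Models vld1q_u8_x2: a contiguous load of 32 bytes.
 * Bytes 0..15 land in val[0], bytes 16..31 in val[1]. */
static u8x16x2_model model_vld1q_u8_x2(const uint8_t* src)
{
    u8x16x2_model r;
    memcpy(r.val[0], src, 16);
    memcpy(r.val[1], src + 16, 16);
    return r;
}

/* Returns 1 if the first eight bytes of val[0] are exactly src[0..7],
 * i.e. the first 64-bit word arrived in one piece. */
static int first_word_intact(const u8x16x2_model* r, const uint8_t* src)
{
    return memcmp(r->val[0], src, 8) == 0;
}

/* Usage: compare both layouts on a buffer holding the bytes 0, 1, ..., 31. */
static void check_layouts(void)
{
    uint8_t buf[32];
    for (size_t i = 0; i < 32; ++i) {
        buf[i] = (uint8_t)i;
    }
    u8x16x2_model deinterleaved = model_vld2q_u8(buf);
    u8x16x2_model contiguous = model_vld1q_u8_x2(buf);

    /* The de-interleaving load splits each 64-bit word across two registers. */
    assert(!first_word_intact(&deinterleaved, buf));
    /* The contiguous load keeps each 64-bit word together. */
    assert(first_word_intact(&contiguous, buf));
}
```

With the de-interleaving load, the bytes of one 64-bit word are split across both registers, so any per-register byte shuffle written for contiguous words operates on the wrong bytes.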


jsallay · 2022-10-26T01:44:06Z

I'm happy to add a test, but I'm not exactly sure where to do it.

jdemel · 2022-10-26T07:34:03Z

    for (number = 0; number < n4points; ++number) {
        __VOLK_PREFETCH(inputPtr + 8);
-        input = vld2q_u8((uint8_t*)inputPtr);
+        input = vld1q_u8_x2((uint8_t*)inputPtr);


Our GCC 8 test fails with this:
/usr/bin/ld: libvolk.so.2.5.2: undefined reference to `vst1q_u8_x2'

Since newer GCC versions seem to work, we might need something like an #ifdef to switch between those functions.

jsallay · 2022-10-26

I can do that, but I'll have to figure out which macro is defined when the vld1q_u8_x2 intrinsic exists.

jdemel · 2022-10-26

That, or require GCC 9 and up. But we need to find some documentation that clearly explains what's going on.
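One possible shape for the #ifdef discussed above is sketched here. It is not the change made in this PR. The version threshold is an assumption based only on the GCC 8 failure reported in this thread, and the names `BYTESWAP_HAVE_LD1_X2`, `load_contiguous_u8x2`, `copy_32_bytes`, and `check_copy` are hypothetical. The fallback uses two plain `vld1q_u8` loads, which yield the same register contents as `vld1q_u8_x2`. On targets other than AArch64 the sketch falls back to `memcpy` so that it still compiles and runs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* ASSUMPTION: GCC releases before 9 are treated as lacking a usable
 * vld1q_u8_x2. This threshold comes only from the GCC 8 link failure
 * reported in this thread, not from compiler documentation. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 9)
#define BYTESWAP_HAVE_LD1_X2 0
#else
#define BYTESWAP_HAVE_LD1_X2 1
#endif

/* Hypothetical helper: load 32 contiguous bytes into two registers. */
static inline uint8x16x2_t load_contiguous_u8x2(const uint8_t* src)
{
#if BYTESWAP_HAVE_LD1_X2
    return vld1q_u8_x2(src);
#else
    /* Two single-register loads give the same layout as vld1q_u8_x2. */
    uint8x16x2_t r;
    r.val[0] = vld1q_u8(src);
    r.val[1] = vld1q_u8(src + 16);
    return r;
#endif
}
#endif /* __aarch64__ */

/* Hypothetical wrapper so the sketch can be exercised on any target:
 * copies 32 bytes through the selected load path. */
static void copy_32_bytes(const uint8_t* src, uint8_t* dst)
{
#if defined(__aarch64__)
    uint8x16x2_t v = load_contiguous_u8x2(src);
    vst1q_u8(dst, v.val[0]);
    vst1q_u8(dst + 16, v.val[1]);
#else
    memcpy(dst, src, 32); /* non-ARM stand-in */
#endif
}

/* Usage: whichever path is selected, the bytes must come back in order. */
static void check_copy(void)
{
    uint8_t src[32];
    uint8_t dst[32];
    for (int i = 0; i < 32; ++i) {
        src[i] = (uint8_t)(i * 7);
    }
    memset(dst, 0, sizeof dst);
    copy_32_bytes(src, dst);
    assert(memcmp(src, dst, 32) == 0);
}
```

A store-side helper built on two `vst1q_u8` calls would follow the same pattern for `vst1q_u8_x2`, the symbol named in the linker error.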

argilo · 2023-10-31T00:44:36Z

#680 proposes to simply remove the buggy kernel instead, since it is slower than the generic kernel.

jdemel · 2023-11-04T09:11:09Z

Thanks for the PR. Since we just removed these kernels, I'm closing this PR now.

jdemel reviewed Oct 26, 2022


jdemel mentioned this pull request Oct 24, 2023

granular parallel generic kernel for 64u_byteswap #679

Open

jsallay mentioned this pull request Oct 28, 2023

Fix random numbers #689

Closed

jdemel mentioned this pull request Nov 4, 2023

remove buggy and slow neonv8 kernel #680

Merged

jdemel closed this Nov 4, 2023

jsallay wants to merge 1 commit into gnuradio:main from jsallay:byteswap-64
