Skip to content

Use a different instruction for armv8 neon loads.#606

Closed
jsallay wants to merge 1 commit into
gnuradio:mainfrom
jsallay:byteswap-64
Closed

Use a different instruction for armv8 neon loads.#606
jsallay wants to merge 1 commit into
gnuradio:mainfrom
jsallay:byteswap-64

Conversation

@jsallay

@jsallay jsallay commented Oct 26, 2022

Copy link
Copy Markdown
Contributor

The instruction used for load/store assumes that the data is interleaved and produces incorrect results for the 64-bit byteswap.

Use a different instruction that doesn't assume interleaving.

Signed-off-by: John Sallay jasallay@gmail.com

The instruction used for load/store assumes that the data is interleaved and
produces incorrect results for the 64-bit byteswap.

Use a different instruction that doesn't assume interleaving.

Signed-off-by: John Sallay <jasallay@gmail.com>
@jsallay

jsallay commented Oct 26, 2022

Copy link
Copy Markdown
Contributor Author

I'm happy to add a test, but I'm not exactly sure where to do it.

for (number = 0; number < n4points; ++number) {
__VOLK_PREFETCH(inputPtr + 8);
input = vld2q_u8((uint8_t*)inputPtr);
input = vld1q_u8_x2((uint8_t*)inputPtr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our GCC 8 test fails with this:
/usr/bin/ld: libvolk.so.2.5.2: undefined reference to vst1q_u8_x2'`

Since newer GCC versions seem to work, We might require smth like an #ifdef to switch between those functions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but I'll have to figure out what is defined when the vld1q_u8_x2 instruction exists.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, or require GCC 9 and up. But we need to find some documentation that clearly explains what's going on.

@argilo

argilo commented Oct 31, 2023

Copy link
Copy Markdown
Member

#680 proposes to simply remove the buggy kernel instead, since it is slower than the generic kernel.

@jdemel

jdemel commented Nov 4, 2023

Copy link
Copy Markdown
Contributor

Thanks for the PR. Since we just removed these kernels, I'm closing this PR now.

@jdemel jdemel closed this Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants