fix: handle non-multiple-of-4 inputs in histogram_i32x4_kernel #425

Open
Ajith-Kumar-Nelliparthi wants to merge 1 commit into xlite-dev:main from Ajith-Kumar-Nelliparthi:fix-histogram-tail-handling


Fix histogram_i32x4_kernel boundary handling

Problem

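The current implementation checks only idx < N and then loads 4 integers unconditionally using INT4(a[idx]).

When N is not divisible by 4, the last thread that passes this check can read up to 3 elements past the end of the input, because the load covers a[idx] through a[idx + 3].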

Fix

Added a safe fast path for full int4 loads, taken only when all four elements are in bounds:

    if ((idx + 3) < N)

Added tail handling for the remaining 1-3 elements when that check fails.

Result

This prevents out-of-bounds memory accesses for non-multiple-of-4 input sizes while preserving the vectorized fast path.
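
For readers without the diff at hand, here is a minimal sketch of what the guarded kernel could look like. It is not the code from this PR. The kernel name `histogram_i32x4_sketch`, the host helper `run_histogram`, the `INT4` macro definition, the launch configuration, and the assumption that each thread starts at `idx = 4 * global_thread_id` are all illustrative. It also assumes every input value is a valid bin index, and it omits CUDA error checking for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed definition: reinterpret an int lvalue as the start of an int4.
#define INT4(value) (reinterpret_cast<int4 *>(&(value))[0])

// a: N input values, each assumed to be a valid index into y.
// y: histogram counts, zero-initialized by the caller.
// Each thread owns the 4 consecutive elements starting at idx (assumption).
__global__ void histogram_i32x4_sketch(int *a, int *y, int N) {
  int idx = 4 * (blockIdx.x * blockDim.x + threadIdx.x);
  if ((idx + 3) < N) {
    // Fast path: a[idx..idx+3] are all in bounds, so one 128-bit load is safe.
    int4 reg_a = INT4(a[idx]);
    atomicAdd(&y[reg_a.x], 1);
    atomicAdd(&y[reg_a.y], 1);
    atomicAdd(&y[reg_a.z], 1);
    atomicAdd(&y[reg_a.w], 1);
  } else {
    // Tail: at most 3 elements remain. The loop body never runs if idx >= N.
    for (int i = idx; i < N; ++i) {
      atomicAdd(&y[a[i]], 1);
    }
  }
}

// Hypothetical host helper: copies the input to the device, launches the
// sketch kernel, and returns the histogram. Error checking is omitted.
std::vector<int> run_histogram(const std::vector<int> &input, int num_bins) {
  int N = static_cast<int>(input.size());
  std::vector<int> hist(num_bins, 0);
  if (N == 0 || num_bins == 0) return hist;

  int *d_a = nullptr;
  int *d_y = nullptr;
  cudaMalloc(&d_a, N * sizeof(int));
  cudaMalloc(&d_y, num_bins * sizeof(int));
  cudaMemcpy(d_a, input.data(), N * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_y, 0, num_bins * sizeof(int));

  // One thread per group of 4 elements, rounded up so the tail gets a thread.
  int threads = 64;
  int groups = (N + 3) / 4;
  int blocks = (groups + threads - 1) / threads;
  histogram_i32x4_sketch<<<blocks, threads>>>(d_a, d_y, N);

  cudaMemcpy(hist.data(), d_y, num_bins * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_a);
  cudaFree(d_y);
  return hist;
}
```

As an example of how the two paths divide the work under these assumptions, take N = 10. Threads 0 and 1 start at idx 0 and 4 and take the fast path. Thread 2 starts at idx 8, fails the `(idx + 3) < N` check, and handles elements 8 and 9 in the tail loop. Any later thread starts at idx 12 or beyond and does nothing.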

