JIT: Use faster mod for uint16 values#128509
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
This PR updates CoreCLR JIT morphing/assertion propagation/lowering to enable a cheaper remainder sequence when both operands are proven to fit in uint16 and the divisor is a non-zero constant, and adds a JIT regression test that exercises relevant % const patterns (including char-based modulo).
Changes:
- Teach morphing to avoid rewriting
% constintoa - (a / b) * bwhen lowering can apply a cheaperuint16FastMod-style sequence (and convertMOD→UMODwhen safe). - Add a new
GTF_UMOD_UINT16_OPERANDShint set by assertion propagation and consumed by lowering to trigger the specialized expansion. - Add a new JIT test under
src/tests/JIT/opt/Divide/Regressions/to cover representative modulo patterns.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/morph.cpp | Skips early MOD-to-SUB/MUL/DIV morphing (and may flip MOD→UMOD) so lowering can apply the cheaper uint16 modulo path. |
| src/coreclr/jit/lower.cpp | Implements the uint16-specialized FastMod lowering for GT_UMOD by constant divisors. |
| src/coreclr/jit/assertionprop.cpp | Improves IntegralRange reasoning and sets GTF_UMOD_UINT16_OPERANDS when uint16-range operands are proven. |
| src/coreclr/jit/gentree.h | Introduces the new GTF_UMOD_UINT16_OPERANDS flag. |
| src/coreclr/jit/gentree.cpp | Ensures tree comparison accounts for the new mod/div-related flag. |
| src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.csproj | Adds the new regression test project. |
| src/tests/JIT/opt/Divide/Regressions/Regression4_Divide.cs | Adds the new regression test cases exercising modulo scenarios. |
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| BlockRange().InsertBefore(divMod, multiplier, mul1, castUp); | ||
|
|
||
| // Reuse the existing constant divisor as a TYP_LONG operand for the second multiply. | ||
| divisor->BashToConst(static_cast<int64_t>(divisorValue)); |
The VN-based small-type refinement in IntegralRange::ForNode for GT_LCL_VAR claimed a tight range whenever the local's conservative VN was a CAST to a small type. That is unsound for normalize-on-load locals: their storage only contains the small-type bits in the low byte (upper bytes are stale), but the tight range let fgOptimizeCast drop a required sign/zero extending load downstream, causing it to read those stale bits. Restrict the refinement to locals whose storage is fully normalized: either non-small locals, or small locals with lvNormalizeOnStore set. Repros fuzzlyn seed 12902382323863156506 (and others) where '(short)(arg0 ^ arg2)' with sbyte arg0 began returning the unsigned byte interpretation in release. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| // extending load and read those stale bits. | ||
| if ((compiler->vnStore != nullptr) && (!varTypeIsSmall(varDsc->TypeGet()) || varDsc->lvNormalizeOnStore())) | ||
| { | ||
| ValueNum vn = compiler->vnStore->VNConservativeNormalValue(node->gtVNPair); |
There was a problem hiding this comment.
I do not think IntegralRange::ForNode should depend on VNs. VNs are a flow sensitive concept that is only valid during specific phases. I think this is too general a utility to start using VNs.
I would be ok with introducing specific helpers that can refine integral ranges based on VNs though, and then using it explicitly from phases where VNs are known to be valid. But in the end you would probably want to use range check instead.
Another attempt at #111535
Closes #111492
Change the transformation for
Let's see what CI thinks.
Diffs: https://gist.github.com/MihuBot/38b0b6eeabd80de528a6e1967a7df7cb