[draft](ivm) Add mv ivm test for pipeline#62606

Draft
yujun777 wants to merge 87 commits into apache:master from yujun777:ivm

@yujun777
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

yujun777 added 30 commits April 20, 2026 09:52
…ogging

- Require non-null detailMessage in IVMRefreshResult.fallback() to match
  IVMCapabilityResult.unsupported() contract; remove single-arg overload
- Add toString() to IVMRefreshResult for log readability
- Add WARN logging on all fallback paths in IVMRefreshManager with MV name
- Make doRefresh() the public API; remove redundant ivmRefresh() wrapper
- Remove IVMPlanPattern, IVMPlanAnalysis, IVMPlanAnalyzer, IVMDeltaPlannerDispatcher
- IVMCapabilityChecker now takes List<DeltaPlanBundle> instead of IVMPlanAnalysis
- IVMRefreshManager simplified to 2 deps: capabilityChecker + deltaExecutor
- Delta bundles produced by Nereids rules, retrieved via MTMVAnalyzeQueryInfo
- Add analyzeDeltaBundles() hook for testability
- Add ivmDeltaBundles to MTMVAnalyzeQueryInfo, populated from CascadesContext
- Update tests to JUnit 5 and new interface signatures
- Fix checkstyle import order in CreateTableCommandTest
- Delete IvmRewriteMtmvPlan placeholder and its test
- Remove rewriteRootPlan field from CascadesContext (no longer needed)
- Replace IVM_REWRITE_MTMV_PLAN with IVM_NORMALIZE_MTMV_PLAN in RuleType
- Add IvmNormalizeMtmvPlan skeleton (row-id injection, avg rewrite, TODO)
- Add IvmDeltaScanOnly and IvmDeltaAggRoot skeletons
- Merge delta rules into single topic in Rewriter
- Add IvmAnalyzeMode enum (NONE/NORMALIZE_ONLY/FULL) to replace boolean flags
- Replace enableIvmRewriteInNereids with enableIvmNormalRewrite + enableIvmDeltaRewrite
- MTMVPlanUtil.analyzeQuery/analyzeQueryWithSql take IvmAnalyzeMode parameter
- CreateMTMVInfo: NORMALIZE_ONLY for incremental MV, NONE otherwise
- ensureMTMVQueryUsable: same mode as CREATE MV
- IVMRefreshManager: FULL mode (normalize + delta)
- Update IvmNormalizeMtmvPlan/IvmDeltaScanOnly/IvmDeltaAggRoot to use new session vars
- Fix MTMVPlanUtilTest: JUnit5, IvmAnalyzeMode.NONE, updated CountingSessionVariable
- Add testAnalyzeQueryIvmAnalyzeModeSetSessionVariables covering all 3 modes
- Add IvmContext: holds Map<Slot, isDeterministic> rowIdDeterminism + List<DeltaPlanBundle>
- Replace ivmDeltaBundles in CascadesContext with Optional<IvmContext>
- IvmNormalizeMtmvPlan: whitelist-based visitor (DefaultPlanRewriter<IvmContext>)
  - visitLogicalOlapScan: inject __IVM_ROW_ID__ at index 0 via LogicalProject
    - MOW: Alias(buildRowIdHash(uk...), __IVM_ROW_ID__) -> deterministic
    - DUP_KEYS: Alias(UuidNumeric(), __IVM_ROW_ID__) -> non-deterministic
    - MOR / AGG_KEYS: throw AnalysisException
  - visitLogicalProject: propagate child row-id; throw if child has none
  - visit: throw for any unwhitelisted node
  - buildRowIdHash: uses murmur_hash3_64 (TODO: replace with 128-bit hash)
- MTMVPlanUtil: read delta bundles from IvmContext instead of direct field
- Tests: DUP_KEYS, MOW (deterministic), MOR (throws), AGG_KEYS (throws),
  project propagation, unsupported node, gate disabled
- IvmNormalizeMtmvPlan: whitelist LogicalResultSink, prepend row-id;
  extract hasRowIdInOutputs/prependRowId helpers
- ColumnDefinition: add newIvmRowIdColumnDefinition with mv_ prefix
- MTMVPlanUtil: prepend row-id ColumnDefinition at index 0; reset IVM
  session vars in finally block to prevent test leakage
- BaseViewInfo: extract static rewriteProjectsToUserDefineAlias overload
- CreateMTMVInfo: fix rewriteQuerySql to snapshot/restore rewrite map
  and call alias rewrite when simpleColumnDefinitions present
- CreateTableCommandTest: add 4 IVM UTs covering scan, project-scan,
  no-alias, alias rewrite, and column count mismatch
- CreateMTMVInfo: set UNIQUE_KEYS + enable_unique_key_merge_on_write=true
  for INCREMENTAL refresh MVs; reject user-specified key columns
- MTMVPlanUtil.analyzeKeys: return new List instead of mutating the
  immutable input list; throw if IVM row-id column not found in columns
- MTMVPlanUtil.analyzeQuery: only reset IVM session vars in finally block
  for modes that actually set them (NORMALIZE_ONLY resets NORMAL only,
  FULL resets both, NONE resets neither)
- MTMVPlanUtilTest: add 4 new UTs covering UNIQUE_KEYS+MOW assertion,
  DUP_KEYS for non-IVM, and rejection of user-specified UNIQUE/DUP keys
- CountingSessionVariable: count only enabling ("true") setVarOnce calls
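
The row-id rules listed above for IvmNormalizeMtmvPlan can be pictured with a small sketch. It is illustrative only: `RowIdSketch`, `TableModel`, and `buildRowId` are hypothetical names, `Objects.hash` stands in for the murmur_hash3_64 call the commit mentions, and `IllegalArgumentException` stands in for AnalysisException.

```java
import java.util.List;
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch of the per-table-model row-id choice; not the Doris code.
final class RowIdSketch {
    enum TableModel { UNIQUE_MOW, UNIQUE_MOR, DUP_KEYS, AGG_KEYS }

    // A row id plus a flag saying whether it is reproducible across refreshes.
    static final class RowId {
        final long value;
        final boolean deterministic;

        RowId(long value, boolean deterministic) {
            this.value = value;
            this.deterministic = deterministic;
        }
    }

    static RowId buildRowId(TableModel model, List<?> uniqueKeyValues) {
        switch (model) {
            case UNIQUE_MOW:
                // Stand-in for hashing the unique keys (murmur_hash3_64 in the commit).
                return new RowId(Objects.hash(uniqueKeyValues.toArray()), true);
            case DUP_KEYS:
                // No stable key exists, so a random id is used (non-deterministic).
                return new RowId(UUID.randomUUID().getMostSignificantBits(), false);
            default:
                // MOR and AGG_KEYS are rejected, mirroring the thrown exception.
                throw new IllegalArgumentException("IVM does not support " + model);
        }
    }
}
```

Calling `buildRowId` twice with the same unique-key values on a MOW table yields the same id, which is what makes that case deterministic.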
…aRewriter

Move IVM delta plan generation out of Nereids rewrite rules into an
external IvmDeltaRewriter that will be called by IVMRefreshManager.
IvmNormalizeMtmvPlan now stores the normalized plan in IvmContext so
IVMRefreshManager can retrieve it for delta rewriting.

- Add normalizedPlan field to IvmContext, store after normalization
- Add ivmNormalizedPlan field to MTMVAnalyzeQueryInfo
- Delete IvmDeltaScanOnly, IvmDeltaAggRoot, IvmAnalyzeMode
- Remove IVM_DELTA_SCAN_ONLY/IVM_DELTA_AGG_ROOT from RuleType/Rewriter
- Remove ENABLE_IVM_DELTA_REWRITE session variable
- Remove deltaCommandBundles from IvmContext
- Replace IvmAnalyzeMode enum with boolean enableIvmNormalize
- Create skeleton IvmDeltaRewriter + IvmDeltaRewriteContext
- Rewrite IVMRefreshManager.analyzeDeltaCommandBundles to use
  normalized plan (returns empty bundles for now, triggers fallback)
IvmDeltaRewriter no longer extends DefaultPlanRewriter. It now validates
the normalized plan is a supported scan-only or project-scan pattern,
extracts the base table, and produces an INSERT INTO mv command wrapped
in a DeltaCommandBundle. IvmDeltaRewriteContext gains a ConnectContext
field, and IVMRefreshManager.analyzeDeltaCommandBundles is wired to
call the rewriter.
…es to concrete classes

IVMDeltaExecutor now contains real execution logic following the MTMVTask.exec()
pattern: creates ConnectContext/StatementContext/StmtExecutor, runs the command,
and checks query state. IVMCapabilityChecker returns ok() by default.
IVMRefreshManager uses a no-arg public constructor, instantiating both
collaborators internally, with a @VisibleForTesting constructor for injection.
For INCREMENTAL MVs, attempt IVM refresh first via IVMRefreshManager.
On success, return early and skip partition-based refresh.
On fallback, log the reason and continue with existing refresh path.
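
The "try IVM first, then fall back" flow described above can be sketched as follows. This is a minimal illustration under assumptions: `RefreshFlowSketch`, `IvmResult`, and `refresh` are hypothetical names, and the two refresh paths are passed in as callbacks rather than being the real IVMRefreshManager or partition refresh code.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the refresh control flow; not the MTMVTask implementation.
final class RefreshFlowSketch {
    static final class IvmResult {
        final boolean success;
        final String fallbackReason;

        IvmResult(boolean success, String fallbackReason) {
            this.success = success;
            this.fallbackReason = fallbackReason;
        }
    }

    // Returns which path produced the refresh: "IVM" or "PARTITION".
    static String refresh(boolean incrementalMv,
                          Supplier<IvmResult> ivmRefresh,
                          Runnable partitionRefresh) {
        if (incrementalMv) {
            IvmResult result = ivmRefresh.get();
            if (result.success) {
                // Success: return early and skip the partition-based refresh.
                return "IVM";
            }
            // Fallback: record the reason, then continue with the existing path.
            System.out.println("IVM fallback: " + result.fallbackReason);
        }
        partitionRefresh.run();
        return "PARTITION";
    }
}
```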
…eltaExecutor

Extract common command execution boilerplate shared by MTMVTask.exec()
and IVMDeltaExecutor.executeBundle() into MTMVPlanUtil.executeCommand().
This also adds the missing audit logging to IVM delta execution.
Keep the hidden IVM row id in refresh planning and exclude it from MV nondeterministic checks.
Adjust exchange fragment output expr handling for incremental refresh, rename the MV-specific collector, and add FE UT plus mtmv regression coverage.

Tests: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvPlanTest,org.apache.doris.nereids.trees.plans.PlanVisitorTest,org.apache.doris.nereids.trees.plans.commands.UpdateMvByPartitionCommandTest
Tests: ./run-regression-test.sh --run -d mtmv_p0 -s test_ivm_basic_mtmv
Ensure the root fragment always rewrites output exprs from the final physical plan outputs so aggregate and TopN plans do not keep stale SlotRefs. Add FE/unit and regression coverage for MTMV hidden row-id changes after complete refresh.
Disable table-sink MV rewrite in the MTMV refresh execution context so refresh planning cannot rewrite back to the target MV.

Add a SessionVariable setter and extend UpdateMvByPartitionCommandTest to assert both MV rewrite switches are disabled for the refresh executor.

Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.UpdateMvByPartitionCommandTest
Move IVM normalization after sink binding so incremental MTMV inserts keep hidden columns aligned with bound olap sink outputs and target slots.

Tests:
- bash ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.CreateTableCommandTest
- bash ./run-fe-ut.sh --run org.apache.doris.mtmv.MTMVTest,org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvPlanTest
… normalization

- Remove dead condition in BindSink.getColumnToChildOutput: the second
  clause of the IVM hidden column skip guard was always true by definition
  of missingIvmHiddenColumns (columns guaranteed absent from child output)
- Add integration test testSinkWithPlaceholderChildReplacesRowIdAndPreservesExprId
  covering the BindSink placeholder -> IvmNormalizeMtmvPlan replacement pipeline
- Fix checkstyle import order violations in BindSink, IvmNormalizeMtmvPlan,
  MTMVTest, and IvmNormalizeMtmvPlanTest introduced in the previous commit
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Rename the IVM MTMV normalize rule class, its RuleType constant, and the matching FE unit test to remove the stale Plan suffix and keep the analyzer registration aligned with the new symbol names.

### Release note

None

### Check List (For Author)

- Test: FE unit test via bash ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvTest
- Behavior changed: No
- Does this need documentation: No
yujun777 and others added 28 commits April 20, 2026 09:52
…ad of internal HASH(row_id)

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
IVM materialized views internally rewrite distribution to HASH(__DORIS_IVM_ROW_ID_COL__),
but SHOW CREATE MATERIALIZED VIEW was exposing this internal physical detail. Since row_id
is a hidden column invisible to users, the DDL output should show DISTRIBUTED BY RANDOM
with the same bucket count/auto-bucket setting instead. This makes the output re-executable
and preserves the bucket configuration on re-creation.

Also updates CreateMTMVCommandTest to reflect that MIN/MAX aggregates are now supported
for IVM (no longer throws AnalysisException), and adds a roundtrip test (TC-4-8) that
verifies SHOW CREATE DDL can recreate an identical IVM MV with preserved bucket count.

### Release note

None

### Check List (For Author)

- Test: Unit Test (ShowCreateMTMVTest 8/8, CreateMTMVCommandTest 24/24)
- Behavior changed: Yes (IVM SHOW CREATE now outputs DISTRIBUTED BY RANDOM instead of HASH on hidden column)
- Does this need documentation: No
…otes

Add development guidelines for the IVM module covering:
- Recommended regression and FE unit test suites to run before committing
- Documentation that binlog/stream is not ready (delta is mocked via full scan)
- No backward compatibility requirement before July 2026 public release
…R hints

When an MTMV definition SQL contains SET_VAR hints (e.g.,
/*+ SET_VAR(enable_force_spill = true) */), the COMPLETE refresh task
crashes with NPE at SelectHintSetVar.setVarOnceInSql() because
ConnectContext.get().getStatementContext() returns null.

The StatementContext was only installed on the ConnectContext inside
MTMVPlanUtil.executeCommand(), but UpdateMvByPartitionCommand.from()
calls NereidsParser.parseSingle() earlier, which triggers
LogicalPlanBuilder.withHints() that requires the StatementContext.

Key changes:
- Install StatementContext on ConnectContext immediately after creation
  in MTMVTask.exec(), before parsing the MV definition SQL
- Update test_commit_mtmv.out expected output to include the new
  "refreshMode" field in TaskContext JSON

Unit Test: IvmAggDeltaStrategyTest, IvmNormalizeMtmvTest,
  CreateMTMVCommandTest, ShowCreateMTMVTest (all pass)
Regression Test: test_mv_case, test_commit_mtmv, test_ivm_agg_mtmv
  (all pass)
…_true constant folding bug

Problem: IVM delta rewrite hardcoded dml_factor=1 (insert-only assumption), ignoring
delete operations. Additionally, assert_true guards in MIN/MAX boundary checks were
silently eliminated by FoldConstantRuleOnFE.visitIf when both IF branches were identical.

Key changes:
- IvmSimpleScanDeltaStrategy.visitLogicalOlapScan now checks for a binlog_op column in
  the base table; if present, derives dml_factor = IF(binlog_op = 0, 1, -1) instead of
  literal 1. This enables delete-aware INCREMENTAL refresh for tables with binlog_op.
- Fix assert_true being folded away: change IF false branches from identical expressions
  to NullLiteral, preventing FoldConstantRuleOnFE from collapsing IF(cond, x, x) into x.
  Affects assertNonNegative, MIN guard, and MAX guard in IvmAggDeltaStrategy.
- Rename transientDelHiddenName prefix to use Column.IVM_HIDDEN_COLUMN_PREFIX convention.
- Add Column.BINLOG_OPERATION_COL constant.
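
To make the two changes above concrete, here is a minimal sketch of the dml_factor derivation and of why an IF with identical branches loses its guard. It assumes the guard sits in the IF condition; `IfFoldSketch`, `IfExpr`, and `fold` are hypothetical names that only imitate the folding behavior described, not FoldConstantRuleOnFE itself.

```java
import java.util.Objects;

// Hypothetical sketch; names are illustrative and not taken from the Doris code.
final class IfFoldSketch {
    // dml_factor as described above: +1 when binlog_op = 0, otherwise -1.
    static int dmlFactor(int binlogOp) {
        return binlogOp == 0 ? 1 : -1;
    }

    // Minimal IF node. Expressions are plain strings; a null branch models NullLiteral.
    static final class IfExpr {
        final String cond;
        final String thenBranch;
        final String elseBranch;

        IfExpr(String cond, String thenBranch, String elseBranch) {
            this.cond = cond;
            this.thenBranch = thenBranch;
            this.elseBranch = elseBranch;
        }
    }

    // Mimics the fold: IF(c, x, x) collapses to x, which drops c and any guard in it.
    static String fold(IfExpr node) {
        if (Objects.equals(node.thenBranch, node.elseBranch)) {
            return node.thenBranch;
        }
        String elseText = node.elseBranch == null ? "NULL" : node.elseBranch;
        return "IF(" + node.cond + ", " + node.thenBranch + ", " + elseText + ")";
    }
}
```

With identical branches the condition disappears from the folded result; with a NULL false branch the IF, and therefore the guard in its condition, survives.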

Unit tests:
- IvmSimpleScanDeltaStrategyTest: 14 tests (3 new for op-based dml_factor)
- IvmAggDeltaStrategyTest: 18 tests (all pass with NullLiteral fix)
- IvmNormalizeMtmvTest: 23 tests
- CreateMTMVCommandTest: 24 tests
- ShowCreateMTMVTest: 8 tests

Regression tests:
- test_ivm_basic_mtmv Part 2: scan MV + binlog_op delete semantics
- test_ivm_basic_mtmv Part 3: filter + scan + binlog_op delete propagation
- test_ivm_agg_mtmv Part 4: agg MV MIN boundary delete -> assert_true guard -> FAILED -> COMPLETE recovery
…cument binlog_op/row_id

Reduce code duplication in IvmAggDeltaStrategy by extracting four shared
helper methods and consolidating SUM/AVG into a single case block.
Also document the binlog_op-based dml_factor derivation and row_id
generation approach in the IVM AGENTS.md.

Key changes:
- buildExtremalDeltaOutputs: shared MIN/MAX delta aggregate output builder
- putExtremalSemanticSlots: shared MIN/MAX semantic slot mapping
- buildNewCount: shared assertNonNegative(COALESCE+delta) for all agg types
- buildExtremalTargetExpressions: shared MIN/MAX guard+merge+visible logic
- Consolidate SUM and AVG cases in buildTargetExpressions
- Document dml_factor from binlog_op and row_id generation in AGENTS.md

Unit Test: IvmAggDeltaStrategyTest (18), IvmSimpleScanDeltaStrategyTest (14),
  IvmNormalizeMtmvTest (23), CreateMTMVCommandTest (24), ShowCreateMTMVTest (8)
Add Part 5 and Part 6 to test_ivm_agg_mtmv to cover NULL value
handling in IVM incremental aggregation with binlog_op-based delta.

Key changes:
- Part 5: grouped agg MV with NULL values across SUM/AVG/COUNT/MIN/MAX,
  including NULL insertion, NULL deletion via binlog_op=1
- Part 6: scalar agg MV starting from all-NULL values, transitioning
  to mixed NULL/non-NULL via incremental inserts

Unit Test:
- test_ivm_agg_mtmv regression test (all 6 parts pass)
…calar deletion, and MAX boundary

Add three new regression test parts to test_ivm_agg_mtmv:

Problem: IVM agg test coverage had gaps around group deletion, scalar empty table,
and MAX boundary deletion scenarios.

Key changes:
- Part 7: Group disappearing — all rows deleted via binlog_op=1, group_count reaches 0,
  DELETE_SIGN=1 removes group row from MV, then group resurrection
- Part 8: Scalar agg with binlog_op deletes — verifies scalar row persistence through
  delete/insert cycles (COUNT+SUM only, no MIN/MAX to avoid boundary guards)
- Part 9: MAX boundary deletion — symmetric to Part 4 (MIN), verifies assert_true guard
  fires when deleting MAX value, INCREMENTAL FAILs, COMPLETE recovers

Unit Test: Regression test test_ivm_agg_mtmv passes (all 9 parts)
Fix AVG(DECIMAL) Divide ClassCastException in IvmAggDeltaStrategy where
newSum is DecimalV3 but newCount is BIGINT, causing Divide.getDataType()
to fail. Cast newCount to newSum type before creating the Divide.

Key changes:
- Fix: cast newCount to newSum type in AVG visible value computation
- B1: Scalar MIN/MAX plan structure tests (no net-zero filter, constant delete sign, assert_true guard)
- B2: Combined MIN+MAX produces two independent assert_true guards
- B3: Boundary guard predicate is compound Or expression with correct error message
- B4: Composite GROUP BY k1,k2 plan structure (multi-key grouping, row_id join)
- Part 10: Composite group keys regression test with group deletion
- Part 11: Multiple same-type aggs (SUM/SUM/MIN/MAX) hidden state isolation
- Part 12: Negative values / SUM cancellation crossing zero
- Part 13: Empty delta produces NOT_REFRESH
- Part 14: Type widening (TINYINT/DECIMAL/DOUBLE) regression test
- Part 15: Expressions in agg args negative test

Unit Test: IvmAggDeltaStrategyTest (23 tests), all IVM FE tests (92 tests)
Regression: test_ivm_agg_mtmv (Parts 1-15), all IVM regression suites pass
Problem: murmur_hash3_64 implements PropagateNullable, so any NULL group key
argument makes the entire hash return NULL. This causes different groups with
NULL keys to collide (e.g., (NULL,'a') and (NULL,'b') both produce row_id=NULL),
leading to incorrect incremental refresh results.

Key changes:
- Replace direct hash(keys...) with null-safe pattern:
  hash(ifnull(cast(k AS VARCHAR),''), cast(isnull(k) AS VARCHAR), ...) per key
- Add MurmurHash364(List<Expression>) constructor for cleaner list-based construction
- Remove unused IvmUtil.newIvmCountColumnDefinition()
- Update IVM AGENTS.md with null-safe row_id documentation
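
The collision and the null-safe pattern can be reproduced in a few lines. This is a sketch under assumptions: `NullSafeRowIdSketch` is a hypothetical class, keys are modeled as strings, and `List.hashCode` stands in for murmur_hash3_64.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the null-safe row-id input encoding; not the Doris code.
final class NullSafeRowIdSketch {
    // Null-propagating variant: any NULL key makes the whole result NULL.
    static Integer naiveHash(List<String> keys) {
        for (String k : keys) {
            if (k == null) {
                return null;
            }
        }
        return keys.hashCode();
    }

    // Null-safe variant: each key contributes ifnull(k, '') plus an is-null flag,
    // so NULL and the empty string stay distinguishable.
    static int nullSafeHash(List<String> keys) {
        List<String> parts = new ArrayList<>();
        for (String k : keys) {
            parts.add(k == null ? "" : k);
            parts.add(k == null ? "1" : "0");
        }
        return parts.hashCode();
    }
}
```

Under the naive variant, (NULL,'a') and (NULL,'b') both map to NULL; under the null-safe variant they hash different inputs, and a NULL key differs from an empty-string key because of the extra flag.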

Unit Test:
- IvmUtilTest: 7 new tests verifying expression tree structure and non-nullability
- All 99 existing IVM FE unit tests pass

Regression Test:
- test_ivm_agg_4: Parts 16-18 covering single/multiple NULL group keys
  and empty-string vs NULL distinction
…ions

Previously IVM rejected compound expressions like SUM(v1+v2) or MIN(v1*2)
inside aggregate functions, requiring bare column Slots only. This relaxes
the constraint to accept arbitrary expressions as aggregate arguments.

Key changes:
- IvmAggMeta.AggTarget: exprSlots (List<Slot>) -> exprArgs (List<Expression>)
- IvmNormalizeMtmv: removed instanceof Slot check in buildHiddenStateForAgg
- IvmAggDeltaStrategy: widened helper method params from Slot to Expression
- Renamed all misleading XXXSlot variables/methods to XXXArg where type is Expression

Unit Test: IvmNormalizeMtmvTest (25), IvmAggDeltaStrategyTest (25), all 103 IVM FE tests pass
Regression Test: all 7 IVM regression suites pass
…code

Replace bare "SUM"/"COUNT"/"MIN"/"MAX" strings used as hidden state slot
keys with a type-safe StateKey enum in IvmAggMeta. This prevents typos and
provides compile-time safety. Also extract addHiddenSumAndCount() and
addHiddenAlias() helper methods in IvmNormalizeMtmv to eliminate SUM/AVG
code duplication.

Key changes:
- Add IvmAggMeta.StateKey enum with SUM, COUNT, MIN, MAX values
- Change AggTarget.hiddenStateSlots from Map<String,Slot> to Map<StateKey,Slot>
- Update all callsites in IvmAggDeltaStrategy and IvmNormalizeMtmv
- Add DELMIN/DELMAX as private static final String constants (transient keys)
- Extract addHiddenSumAndCount() and addHiddenAlias() in IvmNormalizeMtmv
- Update IvmNormalizeMtmvTest to use StateKey

Unit Test: 103 IVM FE unit tests pass
Regression Test: all 7 IVM suites pass
When an IVM INCREMENTAL refresh fails because a deleted row equals
the current MIN or MAX aggregate value, the assert_true guard fires
at runtime. Previously this was caught as a generic
INCREMENTAL_EXECUTION_FAILED reason, making it hard to distinguish
from real execution errors in logs and task error messages.

Key changes:
- IvmRefreshManager.doRefreshInternal() inspects the caught exception
  message for the boundary guard marker ("IVM: deleted row may be
  current") and sets IvmFallbackReason.MIN_MAX_BOUNDARY_HIT, which
  was defined but never used before
- Boundary hits are logged at INFO level (expected path) while other
  execution failures remain at WARN level
- Update regression test error-message assertions (Parts 4 and 9) to
  also check for MIN_MAX_BOUNDARY in the reason string
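
A minimal sketch of the message-based classification described above. The marker string is taken from the commit message; `FallbackClassifierSketch`, the local `FallbackReason` enum, and the walk over the cause chain are assumptions made for illustration, not the actual doRefreshInternal() code.

```java
// Hypothetical sketch of mapping a caught exception to a fallback reason.
final class FallbackClassifierSketch {
    enum FallbackReason { MIN_MAX_BOUNDARY_HIT, INCREMENTAL_EXECUTION_FAILED }

    // Marker quoted in the commit message for the boundary guard.
    static final String BOUNDARY_MARKER = "IVM: deleted row may be current";

    static FallbackReason classify(Throwable error) {
        // Assumption: the marker may be on the exception itself or on one of its causes.
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(BOUNDARY_MARKER)) {
                return FallbackReason.MIN_MAX_BOUNDARY_HIT;
            }
        }
        return FallbackReason.INCREMENTAL_EXECUTION_FAILED;
    }
}
```

A caller could then log `MIN_MAX_BOUNDARY_HIT` at INFO and everything else at WARN, matching the split described in the commit.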

Unit Test: 103 FE unit tests pass; 7/7 IVM regression suites pass
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Previously, IVM (Incremental View Maintenance) rejected
materialized views with GROUP BY but no aggregate functions, throwing
"GROUP BY without aggregate functions is not supported for IVM".

This is unnecessarily restrictive because the unconditionally-injected
hidden column `__DORIS_IVM_AGG_COUNT_COL__` alone is sufficient to track
group membership for incremental maintenance. A bare GROUP BY is
semantically equivalent to SELECT DISTINCT, and the delta/apply paths
already handle empty aggTargets lists correctly.
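
The claim that a per-group count alone is enough to maintain a bare GROUP BY can be checked with a toy model. This is a sketch, not the IVM apply path: `DistinctCountSketch` is a hypothetical class, group keys are plain strings, and the `+1`/`-1` argument plays the role of dml_factor.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model: a bare GROUP BY maintained from a hidden per-group count.
final class DistinctCountSketch {
    // group key -> number of base rows currently in the group (the hidden count).
    private final Map<String, Long> groupCount = new TreeMap<>();

    // Applies one delta row: dmlFactor is +1 for an insert, -1 for a delete.
    void apply(String groupKey, int dmlFactor) {
        long updated = groupCount.getOrDefault(groupKey, 0L) + dmlFactor;
        if (updated < 0) {
            throw new IllegalStateException("negative group count for " + groupKey);
        }
        if (updated == 0) {
            // Last row of the group was deleted: the group row leaves the MV.
            groupCount.remove(groupKey);
        } else {
            groupCount.put(groupKey, updated);
        }
    }

    // Visible MV rows: one per group with a positive count, i.e. SELECT DISTINCT.
    Set<String> visibleRows() {
        return groupCount.keySet();
    }
}
```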

### Release note

Support bare GROUP BY (SELECT DISTINCT) queries in IVM materialized views.

### Check List (For Author)

- Test: Unit Test (IvmNormalizeMtmvTest) + Regression test (test_ivm_agg_5)
- Behavior changed: Yes — previously rejected bare GROUP BY with AnalysisException, now accepted
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion tests

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: IVM agg regression suites (test_ivm_agg_1~3, 5) only
verified MV output after COMPLETE refresh, leaving INCREMENTAL refresh
code paths untested. Also, test_ivm_agg_5 had a logic flaw in Part 1
where consecutive INCREMENTAL refreshes inflated group counts, preventing
group deletion from triggering correctly.

### Release note

None

### Check List (For Author)

- Test: Regression test — all 8 suites under mtmv_p0/ivm pass
- Behavior changed: No
- Does this need documentation: No

Key changes:
- test_ivm_agg_1.groovy: Add 6 order_qt_* assertions after INCREMENTAL refreshes
- test_ivm_agg_2.groovy: Add 8 order_qt_* assertions after INCREMENTAL refreshes
- test_ivm_agg_3.groovy: Add 4 order_qt_* assertions after INCREMENTAL refreshes
- test_ivm_agg_5.groovy: Redesign Part 1 into 3 isolated Scenarios (A/B/C), each
  starting from a fresh COMPLETE to keep counts accurate; redesign Part 2 to combine
  delete+insert in one batch before a single INCREMENTAL; fix Scenario B comment errors
- Regenerate test_ivm_agg_1~3.out and generate new test_ivm_agg_5.out
…S.md

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: The binlog_op mocking guide for IVM regression tests was
only in an untracked AGENTS.md under the regression-test directory. Merge
it into the committed FE IVM AGENTS.md so it is preserved and visible to
all contributors, with an additional note about the COMPLETE-before-delete
requirement for correct group deletion testing.

### Release note

None

### Check List (For Author)

- Test: No need to test (documentation only)
- Behavior changed: No
- Does this need documentation: No
Rewrite the IVM FE unit tests that still depended on JMockit so FE test compilation works with the current test dependencies.

Key changes:
- replace JMockit usage in IvmDeltaExecutorTest with Mockito static mocks
- rewrite RefreshMTMVInfoAnalyzeTest to use Mockito for Env and catalog setup
- migrate the remaining IVM FE tests away from JMockit imports and expectations

Unit Test:
- mvn test -pl fe-core -Dtest="RefreshMTMVInfoAnalyzeTest" -Dmaven.build.cache.enabled=false
- mvn test -pl fe-core -Dtest="IvmDeltaExecutorTest,IvmDeltaRewriterTest,IvmSimpleScanDeltaStrategyTest,IvmRefreshManagerTest" -Dmaven.build.cache.enabled=false
…anagerTest

### What problem does this PR solve?

Problem Summary: IvmRefreshManagerTest still used JMockit APIs
(@Mocked, Expectations), which are not available as a dependency,
causing FE compilation failure.

### Release note

None

### Check List (For Author)

- Test: No need to test (build fix only, replacing mock framework usage)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
IVM (Incremental View Maintenance) was creating redundant hidden columns
in the MV schema. For several aggregate types, the visible column already
stores the same value as the hidden column, wasting storage and adding
unnecessary complexity:

- COUNT(*): hidden COUNT duplicated the global group count
- COUNT(expr): hidden COUNT duplicated the visible COUNT(expr)
- SUM: hidden SUM duplicated the visible SUM value
- MIN/MAX: hidden MIN/MAX duplicated the visible extremal value

This commit removes these redundant hidden columns. The delta apply logic
now reads old state from the visible column instead. Only genuinely needed
hidden columns are retained:
- SUM/MIN/MAX: hidden COUNT (for assertNonNegative guard and null logic)
- AVG: hidden SUM + COUNT (visible is AVG, not SUM or COUNT)
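
The retained hidden columns can be summarized as a lookup from aggregate type to hidden state. The sketch below is illustrative: `HiddenStateSketch` and its simplified `AggType` enum (with COUNT(*) and COUNT(expr) treated as one COUNT) are assumptions, not the IvmAggMeta definitions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical summary of which hidden state each aggregate keeps after this change.
final class HiddenStateSketch {
    enum AggType { COUNT, SUM, AVG, MIN, MAX }

    static Set<AggType> hiddenStateFor(AggType agg) {
        switch (agg) {
            case COUNT:
                // The visible column already holds the count; nothing hidden is needed.
                return EnumSet.noneOf(AggType.class);
            case AVG:
                // The visible column is the average, so both SUM and COUNT stay hidden.
                return EnumSet.of(AggType.SUM, AggType.COUNT);
            default:
                // SUM, MIN, MAX: only a hidden COUNT for the guard and NULL logic.
                return EnumSet.of(AggType.COUNT);
        }
    }
}
```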

Additional cleanup:
- Inline addHiddenSumAndCount (only called once for AVG)
- Remove hasIvmHiddenOutputInOutputs/isIvmHiddenOutput private methods
- Simplify group key resolution to direct Slot casting
- Add stateColumnName(StateKey) helper to AggTarget
- Fix toColumn() bug in IvmDeltaTestBase (isVisible/isKey params swapped)
- Update class Javadoc with accurate plan shape

### Release note

None

### Check List (For Author)

- Test: Unit Test (90 IVM FE UTs + 24 CreateMTMVCommandTest all pass) and Regression test (8 IVM regression tests all pass)
- Behavior changed: No (internal schema optimization, no user-visible change)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ey with AggType

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
The IVM (Incremental View Maintenance) code had two redundant abstractions and one misleading method name:
1. AggType enum had separate COUNT_STAR and COUNT_EXPR values, but the only
   difference is whether exprArgs is empty. Merging them into a single COUNT
   type with an isCountStar() helper simplifies all switch statements.
2. StateKey enum {SUM, COUNT, MIN, MAX} was a strict subset of AggType
   {COUNT, SUM, AVG, MIN, MAX} (AVG is never used as a hidden-state key).
   Eliminating StateKey removes an unnecessary indirection layer.
3. caseWhenExprNotNull was renamed to ifExprNotNull since it generates an
   IF expression, not a CASE WHEN.

### Release note

None

### Check List (For Author)

- Test: Unit Test (90 IVM FE UTs + 24 CreateMTMVCommandTest) and Regression test (8 IVM suites)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fix compilation and checkstyle errors caused by upstream's TableNameInfo
class relocation (org.apache.doris.info → org.apache.doris.catalog.info)
and duplicate TStorageType import from conflict resolution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ustness

### What problem does this PR solve?

Problem Summary:
1. Remove empty if-block in buildHiddenStateForAgg COUNT branch — neither
   COUNT(*) nor COUNT(expr) adds hidden columns, so the if was dead code.
2. Use Count.isCountStar() instead of Count.isStar() when determining
   exprArgs. isCountStar() also covers COUNT() and COUNT(literal) forms,
   making the code robust against optimizer rewrites like COUNT(*)->COUNT(1).

### Release note

None

### Check List (For Author)

- Test: Unit Test (IvmNormalizeMtmvTest, IvmAggDeltaStrategyTest, CreateMTMVCommandTest all pass)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…to zero

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When all non-null rows contributing to a MIN/MAX aggregate are deleted
(hidden non-null count drops to 0), the boundary guard assertion
(assert_true) would incorrectly fire because the deleted extremal value
equals the current extreme. This caused unnecessary COMPLETE fallback.

This change:
- Adds newCount==0 as the first disjunct in the guard OR condition,
  bypassing the boundary check when count is zero (no boundary to protect)
- Replaces nested IF merge logic with CASE WHEN for clarity:
  CASE WHEN newCount=0 THEN NULL
       WHEN old IS NULL THEN deltaInsert
       WHEN deltaInsert IS NULL THEN old
       ELSE LEAST/GREATEST END
- Uses flat Or(ImmutableList.of(...)) instead of nested binary Or
- Updates method Javadoc to document the four-way guard condition
- Adds regression tests (test_ivm_agg_6) for two scenarios:
  A) Delete all rows: cnt=0, min/max=NULL
  B) Delete last non-null row with NULL rows remaining: min/max=NULL
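
The CASE WHEN merge above translates almost line for line into ordinary code. The sketch shows only the MIN side, with `Math.min` in place of LEAST; `ExtremalMergeSketch` and `mergeMin` are hypothetical names, and a Java `null` models SQL NULL.

```java
// Hypothetical sketch of the MIN merge described by the CASE WHEN; not the Doris code.
final class ExtremalMergeSketch {
    static Long mergeMin(long newCount, Long old, Long deltaInsert) {
        if (newCount == 0) {
            // No non-null rows remain, so the visible MIN becomes NULL.
            return null;
        }
        if (old == null) {
            return deltaInsert;
        }
        if (deltaInsert == null) {
            return old;
        }
        return Math.min(old, deltaInsert);
    }
}
```

The MAX side would follow the same shape with `Math.max`, corresponding to GREATEST.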

### Release note

None

### Check List (For Author)

- Test: Regression test (test_ivm_agg_6), FE unit test pass
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* [improvement](fe) Add IVM create and alter validation for MTMV (#5)

Issue Number: close #xxx

Problem Summary:
1. Remove redundant pre-analysis in IVM analyzeQuery - single analyzeQueryInternal call
2. Unify IVM base table validation error messages (AGG_KEYS vs UNIQUE without MOW)
3. Add excluded_trigger_tables support in IvmNormalizeMtmv (transient row-id for excluded tables)
4. Block ALTER MTMV refresh method to/from INCREMENTAL
5. Validate base table models when ALTER MTMV excluded_trigger_tables
6. Extract MTMVPropertyUtil.parseTableNameInfos utility
7. Add comprehensive unit tests for all validation paths

* [fix](fe) Fix IVM ExprId collision by reusing parser StatementContext

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
Commit 71a3086 introduced a new StatementContext in analyzeQueryInternal()
and restored the original (parser-created) StatementContext in the finally
block. This caused ExprId collisions during IVM INCREMENTAL refresh because
the parser StatementContext has a much smaller ExprId counter than the
analysis StatementContext, and IvmRefreshManager.doRefreshInternal() reads
exprIdStart from the ConnectContext's StatementContext after analysis.

The fix reverts to the pre-71a3086f pattern: reuse ctx.getStatementContext()
(the parser-created StatementContext) directly for analysis instead of
creating a new one. This way all ExprId allocations accumulate in the same
StatementContext that doRefreshInternal() later reads.
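
The collision mechanism can be illustrated with two independent counters. This is a toy model under assumptions: `ExprIdSketch` and `Context` are hypothetical stand-ins for StatementContext, and `peekNext` plays the role of reading exprIdStart after analysis.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model of the ExprId counter mismatch; not the Doris classes.
final class ExprIdSketch {
    static final class Context {
        private final AtomicInteger nextId = new AtomicInteger();

        int allocate() {
            return nextId.getAndIncrement();
        }

        int peekNext() {
            return nextId.get();
        }
    }

    // Runs "analysis" on analysisCtx, then reads the next id from readCtx.
    // Returns true when that next id could repeat an id analysis already used.
    static boolean collides(Context analysisCtx, Context readCtx, int idsUsedByAnalysis) {
        int maxUsed = -1;
        for (int i = 0; i < idsUsedByAnalysis; i++) {
            maxUsed = analysisCtx.allocate();
        }
        return readCtx.peekNext() <= maxUsed;
    }
}
```

When analysis allocates from a separate context, the context read afterwards still has a small counter and a collision is possible; when the same context is used for both, the counter has already advanced past every id handed out.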

### Release note

None

### Check List (For Author)

- Test: Regression test (test_ivm_agg_2)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* [fix](fe) Fix buildRowId to compute proper row-id for excluded trigger tables

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
buildRowId() short-circuited to UuidNumeric() for excluded trigger tables,
losing the deterministic row-id (hash of unique keys) for MOW and non-MOW
UNIQUE_KEYS tables. The fix removes the early return and only uses the
isExcludedTriggerTable flag to suppress AnalysisException for unsupported
table types (AGG_KEYS etc.), while UNIQUE_KEYS tables always compute
buildRowIdHash(keySlots) regardless of exclusion status.

### Release note

None

### Check List (For Author)

- Test: Unit Test (IvmNormalizeMtmvTest)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* [test](fe) Update IvmNormalizeMtmvTest for excluded MOW table row-id fix

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
Update testExcludedMowTableUsesTransientRowId to expect deterministic
hash-based row-id (Cast expression) instead of UuidNumeric for excluded
MOW tables, matching the fix in buildRowId().

### Release note

None

### Check List (For Author)

- Test: Unit Test (IvmNormalizeMtmvTest - 27 tests pass)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* [fix](fe) Refine ALTER MTMV refresh method compatibility rules

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
The ALTER MTMV refresh method validation was too restrictive (blocking all
changes to/from INCREMENTAL) or too permissive in some cases. The new rules:
- COMPLETE <-> INCREMENTAL: forbidden (must recreate MV)
- COMPLETE/INCREMENTAL -> AUTO: allowed
- AUTO -> COMPLETE/INCREMENTAL: forbidden (must recreate MV)
- Same method (no-op): allowed
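
The four rules above reduce to a small predicate. This is a sketch only; `RefreshMethodCompat` and `isAllowed` are hypothetical names, not the signature of AlterMTMVRefreshInfo.validateRefreshMethodCompat().

```java
final class RefreshMethodCompat {
    enum RefreshMethod { COMPLETE, INCREMENTAL, AUTO }

    /** Returns true if ALTER MTMV may change the refresh method from 'from' to 'to'. */
    static boolean isAllowed(RefreshMethod from, RefreshMethod to) {
        if (from == to) {
            return true; // same method: no-op
        }
        // Any real change is allowed only when the target is AUTO. This rejects
        // COMPLETE <-> INCREMENTAL and AUTO -> COMPLETE/INCREMENTAL.
        return to == RefreshMethod.AUTO;
    }

    public static void main(String[] args) {
        for (RefreshMethod from : RefreshMethod.values()) {
            for (RefreshMethod to : RefreshMethod.values()) {
                System.out.println(from + " -> " + to + ": " + isAllowed(from, to));
            }
        }
    }
}
```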

### Release note

None

### Check List (For Author)

- Test: Regression test
- Behavior changed: Yes (COMPLETE/INCREMENTAL to AUTO now allowed)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* opt code

* [test](fe) Isolate CreateMTMVCommandTest statement context

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: CreateMTMVCommandTest reused the same StatementContext across multiple statements, so table and excluded_trigger_tables state leaked between test cases and caused false incremental MV validation failures. This change resets the statement context for each statement in the test and adds the missing LinkedHashSet import required by StatementContext.

### Release note

None

### Check List (For Author)

- Test: FE unit test

    - ./run-fe-ut.sh --run org.apache.doris.mtmv.ivm.IvmAggDeltaStrategyTest,org.apache.doris.mtmv.ivm.IvmDeltaExecutorTest,org.apache.doris.mtmv.ivm.IvmDeltaRewriterTest,org.apache.doris.mtmv.ivm.IvmRefreshManagerTest,org.apache.doris.mtmv.ivm.IvmSimpleScanDeltaStrategyTest,org.apache.doris.mtmv.ivm.IvmUtilTest,org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvTest,org.apache.doris.nereids.trees.plans.CreateMTMVCommandTest,org.apache.doris.catalog.ShowCreateMTMVTest

- Behavior changed: No

- Does this need documentation: No

* [fix](fe) Fix TableNameInfo imports after rebasing IVM branch

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Rebasing onto the latest yujun777/ivm left several FE IVM files importing
org.apache.doris.info.TableNameInfo, while the current base branch provides the type in
org.apache.doris.catalog.info.TableNameInfo. This fixes the stale imports so that the rebased
branch compiles and the targeted IVM FE tests run successfully again.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - org.apache.doris.mtmv.ivm.IvmAggDeltaStrategyTest
    - org.apache.doris.mtmv.ivm.IvmDeltaExecutorTest
    - org.apache.doris.mtmv.ivm.IvmDeltaRewriterTest
    - org.apache.doris.mtmv.ivm.IvmRefreshManagerTest
    - org.apache.doris.mtmv.ivm.IvmSimpleScanDeltaStrategyTest
    - org.apache.doris.mtmv.ivm.IvmUtilTest
    - org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvTest
    - org.apache.doris.nereids.trees.plans.CreateMTMVCommandTest
    - org.apache.doris.catalog.ShowCreateMTMVTest
- Behavior changed: No
- Does this need documentation: No

* [fix](fe) Use deterministic row id for excluded AGG_KEYS tables

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: IVM normalization treated excluded AGG_KEYS base tables as transient-row-id scans
and generated uuid_numeric() row ids. This changed excluded AGG_KEYS row identity across refreshes
and did not follow the base table aggregate keys. Fix buildRowId to hash base-schema key columns for
excluded AGG_KEYS tables, and add focused tests to verify the row id is deterministic and excludes
non-key value columns.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - org.apache.doris.mtmv.ivm.IvmAggDeltaStrategyTest
    - org.apache.doris.mtmv.ivm.IvmDeltaExecutorTest
    - org.apache.doris.mtmv.ivm.IvmDeltaRewriterTest
    - org.apache.doris.mtmv.ivm.IvmRefreshManagerTest
    - org.apache.doris.mtmv.ivm.IvmSimpleScanDeltaStrategyTest
    - org.apache.doris.mtmv.ivm.IvmUtilTest
    - org.apache.doris.nereids.rules.rewrite.IvmNormalizeMtmvTest
    - org.apache.doris.nereids.trees.plans.CreateMTMVCommandTest
    - org.apache.doris.catalog.ShowCreateMTMVTest
- Behavior changed: Yes (excluded AGG_KEYS row-id generation is now deterministic on agg keys)
- Does this need documentation: No

* fix comment

* fix comment

* opt code

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ages

### What problem does this PR solve?

Problem Summary:
1. Remove dead code in IvmNormalizeMtmv.buildRowId() — the isExcludedTriggerTable
   branch at lines 503-505 was unreachable because all KeysType cases (UNIQUE_KEYS,
   DUP_KEYS, AGG_KEYS) are already handled above it.
2. Improve the error message in AlterMTMVRefreshInfo.validateRefreshMethodCompat()
   when attempting to alter the refresh method of an INCREMENTAL materialized view,
   making it clearer that the operation is not allowed.

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test
    - FE UT: 94/94 IVM tests passed
    - Regression: 10/10 IVM suites passed (mtmv_p0/ivm)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Implements the core multi-bundle delta plan generation for
IVM incremental refresh. This enables producing one delta command bundle per
base table that has pending changes, with correct TSO snapshot binding across
all scans in the normalized plan.

Changes:
- IvmStreamRef: Replaced streamType/consumerId/properties with consumedTso
  (persisted) and latestTso (transient). Added isUpToDate(). Deleted StreamType enum.
- OlapTable: Added getVisibleTso() mock (delegates to getVisibleVersion).
- LogicalOlapScan: Added tso (default -1) and isDelta (default false) fields,
  with withTso()/withIsDelta() methods. Both participate in equals().
- IvmDeltaRewriter: Complete rewrite with generateDeltaPlans() multi-bundle
  logic, rewriteOlapScans() helper, replaceWithDelta() mock. Uses
  rewriteDownShortCircuit + AtomicInteger for deterministic scan traversal.
  TSO binding in the bundle for base table i: scans j<i read at latestTso (v2), scans j>i read at consumedTso (v1).
  Includes latestTso >= consumedTso invariant check.
- IvmSimpleScanDeltaStrategy: isDelta check — non-delta scans skip dml_factor.
- IvmAggDeltaStrategy: ctx made final, set via constructor (single-use).
- IvmRefreshManager: Empty bundles = success (no-op, all tables up to date).
- IvmDeltaRewriteContext: Added baseTableStreams field.
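
The bundle generation and TSO binding listed above can be sketched as follows. All names here (`TsoBindingSketch`, `StreamRef`, `Bundle`) are hypothetical, scans are modeled as plain array positions rather than LogicalOlapScan nodes, and two points are assumptions: that isUpToDate() means latestTso equals consumedTso, and that the delta scan keeps the default tso of -1.

```java
import java.util.ArrayList;
import java.util.List;

final class TsoBindingSketch {
    static final long UNSET_TSO = -1L; // default tso named in the commit message

    static final class StreamRef {
        final long consumedTso;
        final long latestTso;

        StreamRef(long consumedTso, long latestTso) {
            // Invariant check from the commit message: latestTso >= consumedTso.
            if (latestTso < consumedTso) {
                throw new IllegalStateException("latestTso < consumedTso");
            }
            this.consumedTso = consumedTso;
            this.latestTso = latestTso;
        }

        /** Assumption: a table is up to date when nothing newer than consumedTso is visible. */
        boolean isUpToDate() {
            return latestTso == consumedTso;
        }
    }

    static final class Bundle {
        final int deltaIndex;  // position of the scan that reads the delta
        final long[] scanTso;  // snapshot TSO bound to every other scan

        Bundle(int deltaIndex, long[] scanTso) {
            this.deltaIndex = deltaIndex;
            this.scanTso = scanTso;
        }
    }

    /** One bundle per base table with pending changes; an empty result means a no-op. */
    static List<Bundle> generateDeltaPlans(List<StreamRef> streams) {
        List<Bundle> bundles = new ArrayList<>();
        for (int i = 0; i < streams.size(); i++) {
            if (streams.get(i).isUpToDate()) {
                continue;
            }
            long[] scanTso = new long[streams.size()];
            for (int j = 0; j < streams.size(); j++) {
                StreamRef s = streams.get(j);
                if (j < i) {
                    scanTso[j] = s.latestTso;    // already advanced by an earlier bundle (v2)
                } else if (j > i) {
                    scanTso[j] = s.consumedTso;  // not yet advanced (v1)
                } else {
                    scanTso[j] = UNSET_TSO;      // delta scan, left at the default here
                }
            }
            bundles.add(new Bundle(i, scanTso));
        }
        return bundles;
    }
}
```

For three base tables with (consumedTso, latestTso) of (10, 20), (5, 5), and (7, 9), this yields two bundles, for tables 0 and 2; the second table is skipped because it is up to date.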

### Release note

None

### Check List (For Author)

- Test: Unit Test (109 IVM tests pass: 21 rewriter, 25 agg strategy, 14 simple strategy, 28 normalize, 7 util, 10 refresh manager, 4 executor)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… latestTso reading

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Implements Steps 6-7 of the multi-bundle IVM plan:

Step 6 - runningIvmRefresh crash recovery flag:
- Add runningIvmRefresh boolean to IvmInfo with @SerializedName("rr")
- Add ALTER_IVM_INFO editlog op type with full AlterMTMV persistence path
- In IvmRefreshManager: set flag=true before bundle execution, clear after
  success with consumedTso advance in one atomic editlog write
- On failure: leave flag set so next task detects and falls back to COMPLETE
- In MTMVTask: detect flag on COMPLETE refresh entry, capture pre-refresh
  TSOs, reset state after successful full refresh
- MTMV.alterIvmInfo() and getIvmInfo() use writeMvLock for thread safety

Step 7 - latestTso reading and baseTableStreams passing:
- populateLatestTso() reads OlapTable.getVisibleTso() for each base table
- ensureBaseTableStreamsInitialized() lazily populates from MTMV relation
  metadata on first incremental refresh (handles empty map from MTMV creation)
- Pass baseTableStreams to IvmDeltaRewriteContext for TSO binding
- Guard advanceConsumedTso: only advance if latestTso >= consumedTso to
  prevent regression when table resolution fails
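
The flag protocol in Step 6 and the advance guard in Step 7 can be sketched as a small in-memory state machine. `IvmRecoverySketch` and its methods are hypothetical; editlog persistence and locking are omitted, and resetting consumedTso to the captured pre-refresh TSOs after a full refresh is an assumption about what "reset state" means.

```java
import java.util.HashMap;
import java.util.Map;

final class IvmRecoverySketch {
    enum Outcome { INCREMENTAL_DONE, NEEDS_COMPLETE, FAILED }

    boolean runningIvmRefresh;
    final Map<String, Long> consumedTso = new HashMap<>();

    Outcome incrementalRefresh(Map<String, Long> latestTso, Runnable executeBundles) {
        if (runningIvmRefresh) {
            // A previous incremental refresh did not finish: fall back to COMPLETE.
            return Outcome.NEEDS_COMPLETE;
        }
        runningIvmRefresh = true;
        try {
            executeBundles.run();
        } catch (RuntimeException e) {
            return Outcome.FAILED; // flag deliberately stays set
        }
        // Guard: a TSO only moves forward, never backward.
        for (Map.Entry<String, Long> e : latestTso.entrySet()) {
            consumedTso.merge(e.getKey(), e.getValue(), Math::max);
        }
        runningIvmRefresh = false;
        return Outcome.INCREMENTAL_DONE;
    }

    /** Called after a successful full refresh with the TSOs captured before it started. */
    void completeRefreshSucceeded(Map<String, Long> preRefreshTsos) {
        consumedTso.putAll(preRefreshTsos); // assumption: reset to the captured TSOs
        runningIvmRefresh = false;
    }
}
```

In the real change the flag clear and the consumedTso advance are written in one editlog entry; the sketch only models the resulting state transitions.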

### Release note

None

### Check List (For Author)

- Test: Unit Test (71 tests pass: IvmRefreshManager 17, IvmDeltaRewriter 21, IvmAggDeltaStrategy 25, AlterMTMV 8) + Regression test (10/10 IVM tests pass)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
1. Added unit tests for IvmStreamRef (7 tests) and LogicalOlapScan tso/isDelta fields (5 tests)
2. Fixed populateLatestTso() to throw AnalysisException on failure instead of silently
   swallowing exceptions, preventing a false no-op when the TSO read fails on the first refresh
3. Fixed captureBaseTableTsos() to return empty map on any failure, ensuring IVM state
   reset is safely skipped rather than partially applied
4. Added isEmpty() check in MTMVTask for captured TSOs
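
Items 3 and 4 describe an all-or-nothing capture. A minimal sketch, assuming a hypothetical `readTso` callback in place of the real table lookup; `CaptureTsoSketch` is not the MTMVTask code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

final class CaptureTsoSketch {
    /**
     * Reads a TSO for every table. If any read fails, returns an empty map so the
     * caller never sees a partially captured result.
     */
    static Map<String, Long> captureBaseTableTsos(List<String> tables,
                                                  ToLongFunction<String> readTso) {
        Map<String, Long> captured = new LinkedHashMap<>();
        try {
            for (String table : tables) {
                captured.put(table, readTso.applyAsLong(table));
            }
        } catch (RuntimeException e) {
            return Collections.emptyMap();
        }
        return captured;
    }

    /** Caller-side check: skip the IVM state reset when nothing was captured. */
    static boolean shouldResetIvmState(Map<String, Long> captured) {
        return !captured.isEmpty();
    }
}
```

Returning an empty map rather than the partial one is what lets a single isEmpty() check on the caller side cover every failure mode.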

### Release note

None

### Check List (For Author)

- Test: Unit Test (86 tests pass across 6 IVM test files)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Thearas
Contributor

Thearas commented Apr 20, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall
