[core] Track NaN counts in column statistics #7879

Open
ArnavBalyan wants to merge 1 commit into apache:master from ArnavBalyan:arnavb/stats-nan-count

Conversation

Member

@ArnavBalyan ArnavBalyan commented May 17, 2026

Purpose

  • Spark, Flink and Iceberg all support IsNaN as a first-class predicate and use NaN counts in file/partition pruning.
  • Paimon's SimpleColStats tracks only min, max and null count today, so the manifest layer has no signal for skipping files on NaN predicates.
  • Add a nanCount field to SimpleColStats and update the collectors to count NaN values, so that the count can later be used for engine-level predicate pushdown.
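As a rough illustration of the idea above (not the code in this PR), the sketch below counts NaN values alongside min, max and null count, then uses the count to decide whether a file can be skipped for an IsNaN predicate. The class `NanCountSketch`, its nested `ColStats` holder, and the methods `collect` and `canSkipForIsNaN` are hypothetical names chosen for this example; they are not Paimon's SimpleColStats API. Excluding NaN from min/max is also an assumption made here.

```java
import java.util.Arrays;
import java.util.List;

public class NanCountSketch {

    /** Hypothetical stats holder for illustration; NOT Paimon's SimpleColStats. */
    public static final class ColStats {
        public final Double min;      // null when the column has no comparable values
        public final Double max;
        public final long nullCount;
        public final long nanCount;

        public ColStats(Double min, Double max, long nullCount, long nanCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
            this.nanCount = nanCount;
        }
    }

    /** Single pass over a double column, counting nulls and NaNs separately. */
    public static ColStats collect(List<Double> values) {
        Double min = null;
        Double max = null;
        long nulls = 0;
        long nans = 0;
        for (Double v : values) {
            if (v == null) {
                nulls++;
                continue;
            }
            if (v.isNaN()) {
                // Assumption: NaN is counted but kept out of min/max.
                nans++;
                continue;
            }
            if (min == null || v < min) {
                min = v;
            }
            if (max == null || v > max) {
                max = v;
            }
        }
        return new ColStats(min, max, nulls, nans);
    }

    /** A file with zero NaNs cannot match an IsNaN(col) predicate, so it can be skipped. */
    public static boolean canSkipForIsNaN(ColStats stats) {
        return stats.nanCount == 0;
    }

    public static void main(String[] args) {
        ColStats stats = collect(Arrays.asList(1.5, Double.NaN, null, -2.0, Double.NaN));
        System.out.println("nanCount=" + stats.nanCount + ", nullCount=" + stats.nullCount);
        System.out.println("skip for IsNaN: " + canSkipForIsNaN(stats));
    }
}
```

Without a NaN count, min and max alone cannot tell a reader whether a file holds any NaN values, which is why a separate counter is needed for this kind of pruning.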

Tests

  • Unit tests (UT)

@ArnavBalyan
Member Author

cc @JingsongLi could you PTAL thanks! :)
