fix: replace hardcoded fill value with dynamic min_nonzero/10 in get_binned_data by mukund1985 · Pull Request #1862 · evidentlyai/evidently

mukund1985 · 2026-04-21T12:08:43Z

Summary

Fixes #334

The zero-fill logic in get_binned_data used a hardcoded threshold and fallback of 0.0001. When all non-zero percentages in a distribution are smaller than 0.0001 (common with large datasets or rare categories), the fill value becomes larger than legitimate data values. This makes KL-divergence and other stattest calculations incorrect — the fill is supposed to be a negligible epsilon, not a dominant value.
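For context, here is why zeros get filled at all. Discrete KL(P || Q) divides by q_i, so a single empty bin in Q where P is non-zero makes the divergence infinite. This is a minimal NumPy illustration. `kl_divergence` is a hypothetical helper written for this example, not evidently's stattest code.

```python
import numpy as np


def kl_divergence(p, q):
    """Discrete KL(P || Q) = sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute nothing (0 * log 0 is taken as 0).
    A term with p_i > 0 and q_i == 0 is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    # Dividing by an empty Q bin yields inf; silence the warning and let it propagate.
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


reference = np.array([0.5, 0.3, 0.2])
current = np.array([0.6, 0.4, 0.0])  # last bin is empty in the current data

print(kl_divergence(reference, current))  # inf

# Replace the empty bin with a small positive epsilon (here 0.4 / 10).
filled = current.copy()
filled[filled == 0] = current[current != 0].min() / 10
print(kl_divergence(reference, filled))  # finite
```

Like the `np.place` snippets in this PR, the sketch does not renormalize the array after filling.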

Root cause:

```python
# BEFORE: fill can exceed real values when min_nonzero <= 0.0001
np.place(reference_percents, reference_percents == 0,
    min(reference_percents[reference_percents != 0]) / 10**6
    if min(reference_percents[reference_percents != 0]) <= 0.0001
    else 0.0001)
```

Fix:

```python
# AFTER: always proportional to the smallest real value in that array
ref_nonzero = reference_percents[reference_percents != 0]
if len(ref_nonzero) > 0:
    np.place(reference_percents, reference_percents == 0, min(ref_nonzero) / 10)
```

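`min_nonzero / 10` is strictly smaller than every non-zero value in the same array, at any scale, because all non-zero percentages are positive. The empty-array guard skips the fill when one side has no non-zero entries. Without the guard, `min()` would raise a `ValueError`.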
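Here is a self-contained sketch of the new rule applied to two distributions. It makes both guarantees concrete. Every filled value sits strictly below the smallest non-zero entry. An array with no non-zero entries is left untouched and never reaches `min()`. The helper name `fill_zero_bins` is hypothetical; the PR applies this logic inline inside `get_binned_data`.

```python
import numpy as np


def fill_zero_bins(percents: np.ndarray) -> np.ndarray:
    """Return a copy of `percents` with zeros replaced by min(non-zero) / 10.

    If there are no non-zero entries, the copy is returned unchanged,
    mirroring the empty-array guard described in the PR.
    """
    filled = np.asarray(percents, dtype=float).copy()
    nonzero = filled[filled != 0]
    if len(nonzero) > 0:
        # The fill scales with the data, so it stays below every real value.
        np.place(filled, filled == 0, nonzero.min() / 10)
    return filled


reference_percents = np.array([0.0, 0.2, 0.5, 0.3])
current_percents = np.array([0.00003, 0.0, 0.99997, 0.0])

print(fill_zero_bins(reference_percents))  # zero bin filled with 0.2 / 10
print(fill_zero_bins(current_percents))    # zero bins filled with 0.00003 / 10
print(fill_zero_bins(np.zeros(3)))         # no non-zero entries: left as zeros
```

The sketch returns a copy instead of calling `np.place` on the caller's array, which keeps it side-effect free. The PR's inline version fills in place.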

Changes

- `src/evidently/legacy/calculations/stattests/utils.py`: pandas implementation
- `src/evidently/legacy/spark/calculations/stattests/utils.py`: Spark implementation

Both use identical logic with their respective parameter names (feel_zeroes / fill_zeroes).

Test plan

- All 156 existing stattest unit tests pass (`pytest tests/multitest/metrics/test_data_drift.py tests/stattests/ -v`)
- A smoke test confirms that the fill value equals exactly `min_nonzero / 10` and that the old hardcoded 0.0001 is gone
- Verified that the empty-array guard handles the edge case where one array has no non-zero entries

…binned_data

Fixes evidentlyai#334

The zero-fill logic in `get_binned_data` used a hardcoded threshold and fallback value of 0.0001. When all non-zero percentages were smaller than 0.0001 (e.g. for large datasets or rare categories), the fill value could be *larger* than legitimate data values. This caused KL-divergence and other stattest calculations to produce incorrect results because the fill was supposed to be a negligible epsilon, not a dominant value.

Fix: always use `min(nonzero_values) / 10` as the fill value. This is guaranteed to be strictly smaller than any real data value, regardless of scale. Empty-array guards prevent errors when one side has no non-zero entries.

Applied to both pandas (`calculations/stattests/utils.py`) and Spark (`spark/calculations/stattests/utils.py`) implementations.

mukund1985 · 2026-04-25T22:37:14Z

Hey, just flagging — ran the existing test suite locally and everything passes. The fix is pretty minimal, just handling that edge case. Happy to change the approach if there's a better way to do it, just let me know.

mukund1985 · 2026-04-29T18:04:01Z

@Liraim — would appreciate a review when you get a chance. Tests pass locally, happy to make any changes needed.

mukund1985 · 2026-04-29T18:06:44Z

@DimaAmega — looks like CI hasn't triggered yet, could you approve the workflow run when you get a chance?

github-actions · 2026-05-01T17:09:08Z

📚 Artifacts deployed to GitHub Pages: https://evidentlyai.github.io/evidently/ci/#pr-1862-fix-fill-zeroes-dynamic-value

mukund1985 force-pushed the fix/fill-zeroes-dynamic-value branch from 9b9872c to 99a8b3c on April 22, 2026 21:13

mukund1985 wants to merge 1 commit into evidentlyai:main from mukund1985:fix/fill-zeroes-dynamic-value
