Skip to content

fix: normalise string inputs to lists in ColumnMapping feature fields#1871

Open
YousefZahran1 wants to merge 1 commit intoevidentlyai:mainfrom
YousefZahran1:youssef/fix-datetime-features-string-input
Open

fix: normalise string inputs to lists in ColumnMapping feature fields#1871
YousefZahran1 wants to merge 1 commit intoevidentlyai:mainfrom
YousefZahran1:youssef/fix-datetime-features-string-input

Conversation

@YousefZahran1
Copy link
Copy Markdown

What

ColumnMapping.datetime_features (and the three sibling fields categorical_features, numerical_features, text_features) are typed Optional[List[str]], but a user who passes a plain string gets silent, incorrect behaviour: the downstream check column_name in mapping.datetime_features becomes a substring test instead of a list-membership test.

Why

If a datetime feature column is named "prediction_timestamp_utc" and the prediction column is named "prediction", then "prediction" in "prediction_timestamp_utc" is True — so the prediction column is incorrectly classified as ColumnType.Datetime, crashing any metric that runs on it.

Reproduce

from evidently.legacy.pipeline.column_mapping import ColumnMapping

cm = ColumnMapping()
cm.prediction = "prediction"
cm.datetime_features = "prediction_timestamp_utc"  # string, not list

# Downstream check in data_preprocessing.py line 569:
# "prediction" in "prediction_timestamp_utc" → True  (substring match, BUG)
print("prediction" in cm.datetime_features)  # True — wrong!

Fix

Add __post_init__ to ColumnMapping (it's a dataclass) that wraps any bare string in a one-element list for all four feature-list fields. This is called automatically by __init__ so existing code using the constructor is unaffected; the fix also handles the post-construction assignment pattern (cm.datetime_features = "...") when __post_init__ is called explicitly or the object is re-initialised.

cm = ColumnMapping(datetime_features="prediction_timestamp_utc")
print("prediction" in cm.datetime_features)  # False — correct ✓

Testing

New test test_column_mapping_normalizes_string_feature_lists covers:

  • string → list normalisation for all four fields
  • key regression: "prediction" not in ["prediction_timestamp_utc"]
  • list input preserved unchanged
  • None input preserved unchanged

Fixes #846

ColumnMapping.datetime_features (and the three sibling fields) are typed
as Optional[List[str]], but users reasonably pass a plain string.  When a
string was stored the downstream check

    column_name in mapping.datetime_features

became a substring test instead of a list-membership test.  This caused a
column whose name merely *contains* the prediction column name (e.g.
'prediction_timestamp_utc') to be misclassified as the prediction column,
triggering a spurious ColumnType.Datetime error.

Fix: add __post_init__ to ColumnMapping that wraps any bare string in a
one-element list for datetime_features, categorical_features,
numerical_features, and text_features.

Fixes evidentlyai#846
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix issue with datetime_features column mapping

1 participant