Is your feature request related to a problem? Please describe.
Keep's deduplication rules currently have a single purpose: controlling which fields are used to compute an alert's fingerprint (its identity key). This lets operators override how Keep decides "is this the same alert as one I've already seen?"
This works well for preventing stale alerts from being reused. But it doesn't support a common monitoring use case:
- The same alert fires on multiple instances simultaneously (e.g., the same check failing across several hosts, regions, or services). These are genuinely distinct alert events and must be stored individually. However, before they enter the workflows pipeline, they should be correlated — so the operator can act on that relationship: suppress redundant notifications, route them together or apply any other custom logic.
The key idea is that correlation is pre-processing, not a prescribed action. By flagging alerts as correlated before they reach workflows, the operator retains full control over what happens next.
Describe the solution you'd like
Add a `rule_type` field to `AlertDeduplicationRule`. Each provider can have one rule of each type:

| rule_type | Behavior |
| --- | --- |
| `split` (default) | Existing behavior. Computes the alert's primary fingerprint. |
| `correlate` (new) | Computes a secondary `correlation_fingerprint` without changing the primary fingerprint. |

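To make the proposed shape concrete, here is a minimal sketch of a rule carrying `rule_type`, plus a check for the "one rule of each type per provider" constraint. `RuleType`, `DedupRuleSketch`, and `validate_rules` are hypothetical names used for illustration only; they are not Keep's actual model or API.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(str, Enum):
    SPLIT = "split"          # existing behavior: drives the primary fingerprint
    CORRELATE = "correlate"  # proposed: drives correlation_fingerprint only


@dataclass
class DedupRuleSketch:
    """Hypothetical stand-in for AlertDeduplicationRule (not Keep's real model)."""

    provider_id: str
    fingerprint_fields: list[str]
    rule_type: RuleType = RuleType.SPLIT  # default leaves existing rules unchanged


def validate_rules(rules: list[DedupRuleSketch]) -> None:
    """Raise ValueError if a provider has more than one rule of the same type."""
    seen: set[tuple[str, RuleType]] = set()
    for rule in rules:
        key = (rule.provider_id, rule.rule_type)
        if key in seen:
            raise ValueError(
                f"provider {rule.provider_id!r} already has a "
                f"{rule.rule_type.value!r} rule"
            )
        seen.add(key)
```

Defaulting `rule_type` to `split` means rules created before this change would keep their current meaning without a data migration, assuming the field is added with that default.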
Three new fields are set on `AlertDto` before it enters the workflow pipeline:

| Field | Description |
| --- | --- |
| `correlation_fingerprint` | Hash of the correlate rule's fields. |
| `is_correlated` | `true` if another active (non-resolved, non-suppressed) alert with the same `correlation_fingerprint` already exists. |
| `correlated_to` | Fingerprint of the first (representative) alert in the group. |

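The enrichment step described by these three fields could look roughly like the sketch below. It assumes alerts are plain dicts standing in for `AlertDto`, that SHA-256 over the sorted field values is an acceptable hash, and that the caller passes only active (non-resolved, non-suppressed) alerts ordered oldest first. The function names are hypothetical and none of this is Keep's existing code.

```python
import hashlib
from typing import Optional


def correlation_fingerprint(alert: dict, fields: list[str]) -> str:
    """Hash the values of the correlate rule's fields in a stable order."""
    hasher = hashlib.sha256()
    for name in sorted(fields):
        hasher.update(f"{name}={alert.get(name)!s};".encode("utf-8"))
    return hasher.hexdigest()


def enrich(alert: dict, fields: list[str], active_alerts: list[dict]) -> dict:
    """Return a copy of `alert` with the three proposed fields set.

    `active_alerts` is assumed to contain only non-resolved, non-suppressed
    alerts that were already enriched, ordered oldest first.
    """
    cfp = correlation_fingerprint(alert, fields)
    representative: Optional[dict] = next(
        (
            other
            for other in active_alerts
            if other.get("correlation_fingerprint") == cfp
            # "another" alert: skip a re-fire of this same primary fingerprint
            and other.get("fingerprint") != alert.get("fingerprint")
        ),
        None,
    )
    return {
        **alert,
        "correlation_fingerprint": cfp,
        "is_correlated": representative is not None,
        "correlated_to": representative["fingerprint"] if representative else None,
    }
```

The primary `fingerprint` is only read, never rewritten, which mirrors the requirement that the correlate rule must not change alert identity.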
Example
- Split rule: `[fingerprint, startsAt]` (each firing event is distinct)
- Correlate rule: `[alertname, service]` (groups all hosts firing the same alert for the same service)

| Host | Alert name | Service | is_correlated | correlated_to |
| --- | --- | --- | --- | --- |
| host-a | CheckoutFailureRateHigh | payments | false | null |
| host-b | CheckoutFailureRateHigh | payments | true | fingerprint of host-a |
| host-c | CheckoutFailureRateHigh | payments | true | fingerprint of host-a |

Three separate alerts are stored, each enriched with correlation metadata before reaching workflows.
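As an illustration of what an operator might do with this metadata downstream, the sketch below takes the three alerts from the example and applies two possible policies: notify only for the representative, and group hosts under it for routing. The fingerprint values (`fp-host-a` and so on) and the helper names are made up for the example; the actual policy would live in the operator's own workflows.

```python
from collections import defaultdict

# The three stored alerts from the example, with placeholder fingerprints.
alerts = [
    {"fingerprint": "fp-host-a", "host": "host-a",
     "is_correlated": False, "correlated_to": None},
    {"fingerprint": "fp-host-b", "host": "host-b",
     "is_correlated": True, "correlated_to": "fp-host-a"},
    {"fingerprint": "fp-host-c", "host": "host-c",
     "is_correlated": True, "correlated_to": "fp-host-a"},
]


def should_notify(alert: dict) -> bool:
    """One possible policy: notify only for the first alert of a group."""
    return not alert["is_correlated"]


def group_by_representative(items: list[dict]) -> dict[str, list[str]]:
    """Map each representative fingerprint to the hosts correlated with it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for alert in items:
        representative = alert["correlated_to"] or alert["fingerprint"]
        groups[representative].append(alert["host"])
    return dict(groups)


notified_hosts = [a["host"] for a in alerts if should_notify(a)]
print(notified_hosts)                   # ['host-a']
print(group_by_representative(alerts))  # {'fp-host-a': ['host-a', 'host-b', 'host-c']}
```

All three alerts remain individually addressable; only the notification and routing decisions change.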
Describe alternatives you've considered
Option A — custom deduplication rule grouping by shared fields
This makes all instance alerts share the same fingerprint. They get merged into a single alert — the per-instance context is lost, and only one alert appears in Keep. This is semantically wrong: these are distinct alert events.
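A small sketch of why this collapses the alerts: if the fingerprint is computed only from the shared fields, every host produces the same key. The store below is a deliberately simplified assumption (one record per fingerprint, later events overwrite earlier ones), and `fingerprint` is a hypothetical helper, not Keep's hashing.

```python
import hashlib


def fingerprint(alert: dict, fields: list[str]) -> str:
    """Hash the selected field values (hypothetical helper)."""
    joined = "|".join(str(alert.get(name)) for name in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


events = [
    {"host": "host-a", "alertname": "CheckoutFailureRateHigh", "service": "payments"},
    {"host": "host-b", "alertname": "CheckoutFailureRateHigh", "service": "payments"},
    {"host": "host-c", "alertname": "CheckoutFailureRateHigh", "service": "payments"},
]

# Assumed store: one record per fingerprint, later events overwrite earlier ones.
store: dict[str, dict] = {}
for event in events:
    store[fingerprint(event, ["alertname", "service"])] = event

print(len(store))                          # 1
print(next(iter(store.values()))["host"])  # host-c
```

Only the last host's context survives, which is the per-instance information loss described above.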
Option B — provider-side grouping (e.g. Alertmanager group_by)
Provider-side grouping only affects how the provider batches notifications before sending them to Keep. It has no effect on Keep's alert model. Keep still receives and stores each alert individually, and fires a workflow/notification for each one.
Option C — Keep's alert grouping / incident correlation
Keep's grouping feature is designed to create incidents automatically. The operator's goal here is different: they want to pre-process alerts and decide for themselves whether to escalate, without any incident being created automatically.
Option D — Keep's Correlation feature
Keep's Correlation feature can group related alerts, but it does so by creating an incident automatically. This couples the act of recognizing related alerts with the decision to escalate them — which may not be desirable. The operator may want to correlate alerts as a first step and only create an incident manually after reviewing the situation.