[Metrics SDK] Implement configurable cardinality limit #4188

Open
om7057 wants to merge 4 commits into open-telemetry:main from om7057:feature/configurable-cardinality-limit

@om7057 om7057 commented Jun 28, 2026

Contributor

Fixes #3292

Changes

This PR implements configurable aggregation cardinality limits at three priority levels as specified by the OpenTelemetry Metrics SDK specification:

  1. View-level: Cardinality limit defined in a View's AggregationConfig (highest priority)
  2. MetricReader-level: Default limits per instrument type in the reader (fallback)
  3. SDK default: Fallback to 2000 if neither of the above is defined (final fallback)

Implementation Details

MetricReader Changes:

  • Added CardinalityLimitOptions struct with per-instrument-type limits
  • Added GetCardinalityLimit(InstrumentType) method
  • Added SetCardinalityLimitOptions() method
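
As an illustration of the MetricReader changes listed above, a per-instrument-type lookup might look roughly like the sketch below. The diff itself is not shown in this conversation, so the struct layout, the `InstrumentTypeSketch` enum, and the `ReaderSketch` class are assumptions made for the example, not the PR's actual code. It assumes C++11 or later.

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-in for the SDK's InstrumentType enum.
enum class InstrumentTypeSketch
{
  kCounter,
  kHistogram,
  kUpDownCounter,
  kObservableGauge
};

// Assumed layout: one default plus optional per-type overrides.
struct CardinalityLimitOptionsSketch
{
  std::size_t default_limit = 2000;  // SDK default mentioned in the description
  std::map<InstrumentTypeSketch, std::size_t> per_type;
};

// Hypothetical reader exposing the getter/setter pair named above.
class ReaderSketch
{
public:
  void SetCardinalityLimitOptions(const CardinalityLimitOptionsSketch &options)
  {
    options_ = options;
  }

  // Returns the per-type override if one exists, otherwise the default.
  std::size_t GetCardinalityLimit(InstrumentTypeSketch type) const
  {
    auto it = options_.per_type.find(type);
    return it != options_.per_type.end() ? it->second : options_.default_limit;
  }

private:
  CardinalityLimitOptionsSketch options_;
};
```

With this shape, a reader that only sets a histogram override would still answer with the default for every other instrument type.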

MetricCollector Changes:

  • Added GetCardinalityLimit() to CollectorHandle interface
  • Implemented delegation to underlying MetricReader

MeterContext Changes:

  • Added GetReaderCardinalityLimit() method
  • Returns maximum limit across all configured readers (prevents data loss)

Storage Classes Changes:

  • Updated SyncMetricStorage and AsyncMetricStorage constructors
  • Added optional owned_aggregation_config parameter for lifetime management
  • Ensures reader-level configs remain valid for storage lifetime

Meter Changes:

  • Modified RegisterSyncMetricStorage() and RegisterAsyncMetricStorage()
  • Apply reader-level fallback when view doesn't provide config
  • Creates temporary AggregationConfig with reader limit when needed

Configuration Changes:

  • Wire view-level aggregation_cardinality_limit from YAML config
  • Wire reader-level cardinality_limits from YAML config
  • Removed "cardinality limits not supported" warnings

Priority Order

The implementation follows the specification's priority order:

  1. View config is checked first (when view is created)
  2. Reader config is used as fallback (when storage is created)
  3. SDK default (2000) is the final fallback (via AggregationConfig::GetOrDefault())
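
The priority order above can be sketched as a small resolution function. This is only an illustration, assuming C++17 for `std::optional`; `ResolveCardinalityLimit` and `kSdkDefaultCardinalityLimit` are hypothetical names, and the PR itself resolves the limit in two places (view creation and storage creation) rather than in one function.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical constant mirroring the SDK default of 2000 described above.
constexpr std::size_t kSdkDefaultCardinalityLimit = 2000;

// Picks the effective limit: view first, then reader, then the SDK default.
// An empty optional means "not configured at this level".
inline std::size_t ResolveCardinalityLimit(std::optional<std::size_t> view_limit,
                                           std::optional<std::size_t> reader_limit)
{
  if (view_limit.has_value())
  {
    return *view_limit;
  }
  if (reader_limit.has_value())
  {
    return *reader_limit;
  }
  return kSdkDefaultCardinalityLimit;
}
```

For example, `ResolveCardinalityLimit(100, 500)` yields the view value, while passing `std::nullopt` for both falls through to 2000.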

Testing

The existing YAML parsing tests already verify that configuration is parsed correctly. Additional integration tests are recommended to verify:

  • View-level limits override reader-level limits
  • Reader-level limits are used when no view is configured
  • SDK default is used when neither is configured
  • Per-instrument-type reader limits work correctly

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

Related Issues

Fixes #3292


Note: This is a spec-compliance feature that enables users to configure cardinality limits at multiple levels, preventing memory exhaustion in high-cardinality scenarios while maintaining flexibility.

om7057 commented Jun 28, 2026

Contributor Author

Hi @dbarker
I've implemented configurable cardinality limits for the Metrics SDK to resolve issue #3292 and bring the C++ implementation into compliance with the OpenTelemetry specification.

Summary of Changes:

  • Added three-level cardinality limit configuration (View > Reader > SDK default)
  • Wired existing YAML config infrastructure to actually apply the limits
  • Ensured proper lifetime management for reader-level configs

This is my first contribution to OpenTelemetry C++. I'd appreciate guidance on:

  1. Whether the approach aligns with the project's design principles
  2. If additional tests beyond the existing YAML parsing tests are needed
  3. If I should update the CHANGELOG.md

Looking forward to your feedback!

cc: @lalitb

@marcalff marcalff left a comment

Member

Thanks for the contribution.

There is something wrong with the way clang-format was applied, causing spurious changes in many files.

This:

  • is most likely incorrect, and will fail CI on clang-format
  • makes it difficult to review the real changes from this patch.

When formatting code, a precise version of clang-format must be used. Applying a different version will cause this. The best approach is to use the dev container to perform formatting, as it will use the exact same version as the GitHub CI.

In any case, adding a file in a commit that was not changed voluntarily (i.e., with a real fix) is a red flag, and should not happen.

Please rework the patch to remove the noise, so it can be reviewed.

@om7057 om7057 force-pushed the feature/configurable-cardinality-limit branch 4 times, most recently from a385928 to 7758926 Compare June 30, 2026 02:51

om7057 commented Jun 30, 2026

Contributor Author

@marcalff Thank you for the feedback! I've reworked the PR to remove all spurious formatting changes.

Current state
Files modified:

metric_reader.h
  • Added CardinalityLimitOptions

metric_reader.cc
  • Implemented getter/setter

metric_collector.h
  • Added interface method

metric_collector.cc
  • Implemented delegation

meter_context.h
  • Added reader query method

meter_context.cc
  • Implemented max-across-readers logic

sync_metric_storage.h
  • Added lifetime management

async_metric_storage.h
  • Added lifetime management

meter.cc
  • Applied reader-level fallback

sdk_builder.cc
  • Wired YAML config to implementation

Regarding CI: the only failing check is W3C Distributed Tracing Validation V1 (test_tracestate_key_illegal_vendor_format and test_tracestate_key_length_limit). This failure is unrelated to this PR: the changes here are entirely in the Metrics SDK and do not touch any trace context code. The W3C V1 test suite checks out the latest w3c/trace-context HEAD at runtime, and these two tests appear to be failing intermittently across PRs.

The PR is now clean and ready for review. Please let me know if you need any clarifications on the implementation approach!

codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 0% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.69%. Comparing base (a56f0e2) to head (9c0922d).

Files with missing lines                                 Patch %   Lines
sdk/src/metrics/metric_reader.cc                         0.00%     32 Missing ⚠️
sdk/src/metrics/meter_context.cc                         0.00%     7 Missing ⚠️
...opentelemetry/sdk/metrics/state/metric_collector.h    0.00%     2 Missing ⚠️
sdk/src/metrics/state/metric_collector.cc                0.00%     2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4188      +/-   ##
==========================================
- Coverage   82.89%   82.69%   -0.19%     
==========================================
  Files         405      405              
  Lines       17292    17335      +43     
==========================================
+ Hits        14332    14334       +2     
- Misses       2960     3001      +41     

Files with missing lines                                 Coverage Δ
.../include/opentelemetry/sdk/metrics/metric_reader.h    66.67% <ø> (ø)
sdk/src/metrics/meter.cc                                 81.36% <ø> (ø)
...opentelemetry/sdk/metrics/state/metric_collector.h    60.00% <0.00%> (-40.00%) ⬇️
sdk/src/metrics/state/metric_collector.cc                93.55% <0.00%> (-3.11%) ⬇️
sdk/src/metrics/meter_context.cc                         80.00% <0.00%> (-6.74%) ⬇️
sdk/src/metrics/metric_reader.cc                         41.54% <0.00%> (-40.27%) ⬇️

... and 1 file with indirect coverage changes

@om7057 om7057 force-pushed the feature/configurable-cardinality-limit branch 3 times, most recently from 65c13b7 to 1f771a3 Compare June 30, 2026 04:54
@om7057 om7057 force-pushed the feature/configurable-cardinality-limit branch from 1f771a3 to 8d2967e Compare June 30, 2026 06:02
Comment thread sdk/src/metrics/meter.cc Outdated
auto ctx_ptr = meter_context_.lock();
if (ctx_ptr)
{
size_t reader_limit = ctx_ptr->GetReaderCardinalityLimit(instrument_descriptor.type_);

Member

I think this breaks per-reader semantics. Example: reader A has limit 10, reader B has limit 1000. The storage uses 1000, so reader A can export far more series than configured. This is also a heap/memory regression for low-limit readers.

Member

One possible fix is to avoid resolving reader-level limits into a single storage-level config. The view-level limit can stay on the shared stream/storage, but the MetricReader fallback should be applied per collector/reader, since each reader may have a different configured limit. So instead of taking the max across readers, the collection path likely needs to use the current CollectorHandle's GetCardinalityLimit(instrument_type) when no view-level limit exists.
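
To make the difference concrete, here is a minimal sketch contrasting the max-across-readers approach with a per-collector resolution, using the reader A (10) and reader B (1000) example from the comment above. It assumes C++17; `CollectorSketch`, `MaxAcrossReaders`, and `LimitForCollector` are hypothetical names for illustration, not the SDK's `CollectorHandle` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a collector/reader with its own configured limit.
struct CollectorSketch
{
  std::size_t cardinality_limit;
};

// Approach flagged in review: one shared storage-level limit, taken as the
// maximum across all readers. Returns 0 for an empty list in this sketch.
inline std::size_t MaxAcrossReaders(const std::vector<CollectorSketch> &collectors)
{
  std::size_t limit = 0;
  for (const auto &collector : collectors)
  {
    limit = std::max(limit, collector.cardinality_limit);
  }
  return limit;
}

// Suggested direction: resolve at collection time for the current collector,
// with a view-level limit (if any) taking precedence.
inline std::size_t LimitForCollector(std::optional<std::size_t> view_limit,
                                     const CollectorSketch &collector)
{
  return view_limit.value_or(collector.cardinality_limit);
}
```

With readers configured at 10 and 1000, the shared maximum is 1000 for both, whereas the per-collector lookup keeps reader A at 10 unless a view-level limit overrides it.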

Contributor Author

Good catch @lalitb, thanks! You're right, resolving the reader-level fallback into shared storage breaks the per-reader semantics you described.

I've removed that logic here and reverted the storage constructors. For now, this PR only enforces view-level aggregation_cardinality_limit (plus the existing SDK default of 2000). Reader-level limits are still parsed, but not enforced.

I agree the right place to apply the MetricReader fallback is in the collection path on a per-CollectorHandle basis when no view-level limit exists. Since that's a more involved change, I'd rather tackle it in a follow-up PR.

Also fixed the IWYU warning in the latest commit.

om7057 added 3 commits June 30, 2026 21:05
…er semantics

As noted in review, the reader-level fallback breaks per-reader semantics
because all readers share the same storage with a single cardinality limit.

This commit simplifies the implementation to support only:
1. View-level: aggregation_cardinality_limit in YAML view configuration
2. SDK default: 2000 (existing behavior via AggregationConfig::GetOrDefault)

Reader-level limits are kept in the API for future implementation but are
not applied during storage creation. This avoids the heap/memory regression
for low-limit readers and maintains correct per-reader semantics.

The view-level implementation still provides significant value as it's the
primary configuration mechanism for users who need custom cardinality limits.

Include What You Use (IWYU) reported that sdk_builder.cc was missing
an explicit include for cardinality_limits_configuration.h even though
it uses CardinalityLimitOptions types defined in that header.

Added the missing include to fix the IWYU warning.

Development

Successfully merging this pull request may close these issues.

[Metrics SDK] Make cardinality limit configurable
