Enable PHP FFE evaluation metric system tests by leoromanovsky · Pull Request #7033 · DataDog/system-tests

leoromanovsky · 2026-05-28T13:39:46Z

Motivation

PHP FFE evaluation metrics are merged in DataDog/dd-trace-php#3911, so the shared system-test coverage should be enabled for PHP.

Design doc: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

Changes

This PR enables tests/ffe/test_flag_eval_metrics.py for PHP at v1.21.0-dev in manifests/php.yml.

Scope is intentionally only evaluation metrics. PHP evaluation and exposure system-test activation are already separate, so this PR stays as the metric-validation layer.

Decisions

The PR remains a manifest-only activation. The shared metric tests already cover successful evaluations, missing flags, malformed/empty RC payloads, disabled flags, type mismatches, targeting keys, and allocation metadata.

I checked the RC readiness concern locally. RemoteConfigState.apply() waits for the tracer RC ACK and then sleeps briefly; that ACK is not a formal guarantee that the PHP evaluator has installed the config. I did not add a new wait helper in this PR because the current suite passed against the merged PHP implementation. If this flakes in CI, the targeted follow-up is to wait for a successful /ffe evaluation before asserting metrics, not to broaden this PR.
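
The follow-up described above (wait for a successful /ffe evaluation before asserting metrics) could look roughly like the sketch below. It is a generic polling helper, not code from system-tests. The name `wait_for_ready`, its parameters, and the `probe` callback are hypothetical. In a real test the probe would call the weblog's /ffe endpoint and return True once the evaluator serves the expected flag.

```python
import time
from typing import Callable


def wait_for_ready(
    probe: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Hypothetical helper: `probe` stands in for "issue one /ffe evaluation
    and report whether the evaluator answered from the installed config".
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            # Give up; the caller decides whether to fail or skip.
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_probe() -> bool:
        # Stand-in for a real /ffe request: ready on the third attempt.
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_for_ready(fake_probe, timeout_s=1.0, interval_s=0.01))
```

Polling on an observable evaluation result, rather than sleeping after the RC ACK, ties readiness to the behavior the metric assertions depend on.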

v1.21.0-dev is kept as the activation floor. A prior CI run used php@1.21.0+f4ff7b4c82f6f25bf6486b700a1ba27875d48e09, which predates DataDog/dd-trace-php#3911 and therefore failed the metrics assertions. The PHP dev S3 latest artifact now points at the merged commit 1.21.0+87f1683bd2365ee388396502fc9ea9cd3a5d2e2e.

Related PRs

PHP runtime evaluation base: feat(ffe): add runtime-backed PHP feature flag evaluation dd-trace-php#3906
PHP metric implementation: Add FFE evaluation metrics dd-trace-php#3911
PHP evaluation system tests: Enable PHP FFE evaluation system tests #7003
PHP exposure system tests: Enable PHP FFE exposure system tests #7031
Sidecar metric delivery: feat(sidecar): forward FFE evaluation metrics to OTLP intake libdatadog#2052

Validation

Current pushed head: 2957e57b5, including a normal signed merge of latest origin/main.

Confirmed PHP dev S3 latest now targets the merged metrics artifact:

curl -fsSL https://s3.us-east-1.amazonaws.com/dd-trace-php-builds/latest/datadog-setup.php \
  | rg "RELEASE_VERSION"

Result:

define('RELEASE_VERSION', urlencode('1.21.0+87f1683bd2365ee388396502fc9ea9cd3a5d2e2e'));

Local Apple Silicon validation used the same S3 artifact. utils/scripts/load-binary.sh php dev requested arm64-linux-gnu, but PHP publishes the local ARM Linux artifact as aarch64-linux-gnu, so I fetched the matching tarball manually into binaries/ and rebuilt the weblog with no cache.
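
The naming mismatch above comes down to a single alias. As a minimal sketch, assuming only the arm64 to aarch64 rename stated in this PR, a loader could normalize the machine name before building the artifact suffix. `ARCH_ALIASES` and `linux_gnu_triplet` are hypothetical names and are not part of utils/scripts/load-binary.sh.

```python
import platform

# Assumption: the only alias taken from this PR is arm64 -> aarch64.
ARCH_ALIASES = {"arm64": "aarch64"}


def linux_gnu_triplet(machine: str) -> str:
    """Return the '<arch>-linux-gnu' suffix, normalizing known aliases."""
    arch = machine.lower()
    arch = ARCH_ALIASES.get(arch, arch)
    return f"{arch}-linux-gnu"


if __name__ == "__main__":
    # On an Apple Silicon host platform.machine() typically reports "arm64".
    print(linux_gnu_triplet(platform.machine()))
```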

DOCKER_HOST=unix:///Users/leo.romanovsky/.colima/default/docker.sock \
DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
TEST_LIBRARY=php \
WEBLOG_VARIANT=php-fpm-8.2 \
EXTRA_DOCKER_ARGS=--no-cache \
./utils/build/build.sh --library php --weblog-variant php-fpm-8.2 --images weblog

Forced local run:

DOCKER_HOST=unix:///Users/leo.romanovsky/.colima/default/docker.sock \
DOCKER_CONFIG=/tmp/system-tests-docker-config-nocredsstore \
TEST_LIBRARY=php \
WEBLOG_VARIANT=php-fpm-8.2 \
./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION \
  -F tests/ffe/test_flag_eval_metrics.py \
  tests/ffe/test_flag_eval_metrics.py

Context:

Agent: 7.65.0
Backend: datad0g.com
Library: php@1.21.0+87f1683bd2365ee388396502fc9ea9cd3a5d2e2e
Weblog variant: php-fpm-8.2
Weblog system: Linux arm64

Result: 17 passed in 81.79s, exit code 0.

Local run notes:

On this Apple Silicon machine, datadog/agent:latest crashed before tests started, so validation pinned the local ignored binaries/agent-image override to datadog/agent:7.65.0. That override was removed after validation and is not part of this PR.

github-actions · 2026-05-28T13:40:25Z

CODEOWNERS have been resolved as:

manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php

datadog-prod-us1-5 · 2026-05-28T13:48:45Z

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 2957e57

## Motivation

PHP FFE evaluation metrics need a native path for aggregation, OTLP encoding, and delivery without building PHP OTLP writer/transport machinery.

The shared design doc is the cross-PR reference: https://docs.google.com/document/d/1NvMfTpZWLBlFmEFNjdnlMyeVpy5l7KD8qujGFco6w2w/edit?tab=t.0

This PR is metric-only. Exposures remain in #2026 so reviewers can evaluate OTLP metric delivery independently from exposure cache semantics.

## Changes

This adds caller-driven FFE evaluation metric sidecar actions and OTLP export for `feature_flag.evaluations`.

The reusable FFE-domain pieces now live in `datadog-ffe` behind the `evaluation-metrics` feature: evaluation metric input types, metric attribute normalization, aggregation by matching attribute sets, and OTLP/protobuf payload encoding.

`datadog-sidecar` keeps only sidecar-specific work: parsing the configured endpoint URL, building the HTTP request, applying the timeout, logging delivery failures, and integrating with sidecar lifecycle/actions.

The PHP companion PR uses this from native/C code for raw `DDTrace\ffe_evaluate` calls and from a thin PHP OpenFeature adapter for final OpenFeature-aware results. PHP does not aggregate, encode, or transport OTLP payloads.

Current PHP MVP path:

```mermaid
flowchart LR
  Eval["PHP evaluation raw API or OpenFeature adapter"]
  Record["PHP tracer native call record typed evaluation metric"]
  Action["sidecar action record FFE evaluation metrics"]
  Domain["datadog-ffe feature: evaluation-metrics attributes + aggregation + OTLP encoder"]
  Sidecar["shared sidecar metric flush lifecycle"]
  Collector["OTLP endpoint Agent or local collector"]
  Intake["feature_flag.evaluations"]
  Eval --> Record
  Record --> Action
  Action --> Domain
  Domain --> Sidecar
  Sidecar --> Collector
  Collector --> Intake
```

Future Python/Ruby connection:

```mermaid
flowchart LR
  PyToday["dd-trace-py today OpenFeature hook + host metric writer"]
  RbToday["dd-trace-rb today OpenFeature hook + host metric writer"]
  PyFuture["dd-trace-py future explicit native opt-in"]
  RbFuture["dd-trace-rb future explicit native opt-in"]
  Native["libdatadog caller-driven FFE metric action"]
  Shared["shared sidecar aggregation + OTLP delivery"]
  Otlp["OTLP endpoint"]
  PyToday -. "current host metric path" .-> Otlp
  RbToday -. "current host metric path" .-> Otlp
  PyFuture -. "after ownership switch" .-> Native
  RbFuture -. "after ownership switch" .-> Native
  Native --> Shared
  Shared --> Otlp
```

The future Python/Ruby arrows are intentionally not active behavior in this PR. They show the reusable target for a later migration while preserving today's host-language metric writers.

Why Python/Ruby do not double count today:

- Python and Ruby use libdatadog for evaluation only; the evaluator returns assignment metadata and does not record `feature_flag.evaluations` as a side effect.
- This PR adds a separate caller-driven sidecar action. Metric emission happens only when an SDK explicitly records a typed evaluation metric into that action. PHP wires this in its companion PR; Python and Ruby do not.
- Python and Ruby therefore keep exactly their current host-language OpenFeature metric writers. They are not also sending evaluation metrics through this native sidecar path.
- Evaluation metrics intentionally count every evaluation and do not have exposure-cache deduplication semantics. Future Python/Ruby migration must switch ownership to native logging and disable/bypass the host metric writer for the same evaluations.

Reference implementation check: dd-trace-java's canonical metric path is OpenFeature hook based. Java's `Provider` creates `FlagEvalMetrics` and returns a `FlagEvalHook`; the hook runs in `finallyAfter`, reads the final OpenFeature `FlagEvaluationDetails` including flag key, variant, reason, error code, and allocation metadata, and records one `feature_flag.evaluations` counter. Application code only calls OpenFeature; it does not call a metric API.

PHP mirrors that canonical OpenFeature shape. The PHP OpenFeature provider disables raw native metric recording while it asks the native evaluator for an assignment, then records exactly one final OpenFeature-aware metric through the Datadog-owned recorder. The raw Datadog PHP client has no direct Java equivalent, but it keeps the same SDK-owned ergonomics: normal evaluation APIs record one native metric per evaluation internally.

For future Python/Ruby migration, the same rule applies: either keep the existing host-language OpenFeature metric hook, or switch ownership to the native recorder and disable/bypass the host metric writer for those evaluations.

## Decisions

No telemetry is emitted automatically from shared libdatadog evaluator calls. SDKs must explicitly enqueue FFE telemetry actions. This avoids double counting for Python/Ruby, which currently log feature-flag telemetry in host-language code.

Evaluation metrics intentionally count evaluations and do not use exposure-cache deduplication semantics.

Future Python/Ruby migration must be an ownership switch, not an additional writer. If those SDKs opt into this native metric path, their host-language OpenFeature metric writers must stop recording the same evaluations.

## Validation

Current head (`96d9a7bae`) local validation:

```sh
cd /Users/leo.romanovsky/go/src/github.com/DataDog/libdatadog-ffe-sidecar-metrics
cargo fmt --check
cargo test -p datadog-ffe --features evaluation-metrics telemetry::evaluation_metrics
cargo test -p datadog-sidecar ffe_metric
cargo check -p datadog-ffe
cargo check -p datadog-sidecar-ffi
```

Results: datadog-ffe metric tests passed (2 passed), sidecar metric tests passed (6 passed), default datadog-ffe check passed, sidecar FFI check passed, fmt check passed with only the repo stable-rustfmt warnings.

Prior downstream PHP behavior validation before the reusable-crate refactor, from DataDog/dd-trace-php#3911 using this PR at `1f1fca439`:

```text
ffe-dogfooding subject=php-3911-split-1779981881 php7_metrics=3 php8_metrics=3 php7_exposures=0 php8_exposures=0
```

System-tests downstream validation:

```sh
TEST_LIBRARY=php ./run.sh FEATURE_FLAGGING_AND_EXPERIMENTATION tests/ffe/test_flag_eval_metrics.py -vv
```

Result: 17 passed in 81.26 seconds.

Related PRs: DataDog/dd-trace-php#3906, DataDog/dd-trace-php#3911, #2026, DataDog/system-tests#7033.

Co-authored-by: leo.romanovsky <leo.romanovsky@datadoghq.com>
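
To illustrate "aggregation by matching attribute sets" from the Changes section above, here is a minimal Python sketch, not the `datadog-ffe` implementation (which is Rust). The attribute names `flag_key`, `variant`, and `reason` and the function `aggregate_evaluations` are hypothetical. Every evaluation increments a counter keyed by its full attribute set, with no deduplication, matching the stated rule that evaluation metrics count every evaluation.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

AttrKey = Tuple[Tuple[str, str], ...]


def aggregate_evaluations(evaluations: Iterable[Mapping[str, str]]) -> Counter:
    """Count evaluations per distinct attribute set.

    Attribute order does not matter: each set is sorted into a tuple
    so that equal attribute sets map to the same counter key.
    """
    counts: Counter = Counter()
    for attrs in evaluations:
        key: AttrKey = tuple(sorted(attrs.items()))
        counts[key] += 1  # every evaluation counts, no dedup
    return counts


if __name__ == "__main__":
    # Hypothetical attribute names, for illustration only.
    evals = [
        {"flag_key": "checkout", "variant": "on", "reason": "targeting_match"},
        {"reason": "targeting_match", "variant": "on", "flag_key": "checkout"},
        {"flag_key": "checkout", "variant": "off", "reason": "default"},
    ]
    for key, count in aggregate_evaluations(evals).items():
        print(dict(key), count)
```

Each resulting key/count pair would correspond to one data point of the `feature_flag.evaluations` counter in an exported payload.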


dd-octo-sts-019303 · 2026-06-05T16:05:38Z

🐑 PR Shepherd is maintaining this PR

I watch your PR and automatically fix CI failures, rebase your branch, handle flaky tests, and push it to the merge queue when it's ready.

More about what I do → Guide

To pause me on this PR, add the flow-skip label.

leoromanovsky mentioned this pull request May 28, 2026

Add FFE evaluation metrics DataDog/dd-trace-php#3911

Merged

This was referenced May 28, 2026

feat(sidecar): forward FFE evaluation metrics to OTLP intake DataDog/libdatadog#2052

Merged

Enable PHP FFE exposure system tests #7031

Merged

leoromanovsky force-pushed the leo.romanovsky/pr-i-php-ffe-metrics branch from a412f4f to fbc9822 Compare May 29, 2026 00:59

Base automatically changed from leo.romanovsky/pr-g-php-ffe-scaffold to main May 29, 2026 15:48

leoromanovsky force-pushed the leo.romanovsky/pr-i-php-ffe-metrics branch from fbc9822 to 581f541 Compare June 3, 2026 20:32

Enable PHP FFE evaluation metric system tests

a577bfd

leoromanovsky force-pushed the leo.romanovsky/pr-i-php-ffe-metrics branch from 581f541 to a577bfd Compare June 4, 2026 17:47

leoromanovsky mentioned this pull request Jun 4, 2026

test(ffe): wait for ready evaluation metric response #7090

Draft

Merge remote-tracking branch 'origin/main' into leo.romanovsky/pr-i-p…

2957e57

…hp-ffe-metrics

leoromanovsky marked this pull request as ready for review June 5, 2026 16:05

leoromanovsky requested review from a team as code owners June 5, 2026 16:05

leoromanovsky enabled auto-merge (squash) June 5, 2026 16:05

sameerank approved these changes Jun 5, 2026

View reviewed changes

leoromanovsky merged commit 629b3cf into main Jun 5, 2026
307 of 335 checks passed

leoromanovsky deleted the leo.romanovsky/pr-i-php-ffe-metrics branch June 5, 2026 16:07

sameerank mentioned this pull request Jun 5, 2026

[FFL-2449] Add server-side flag evaluation metrics documentation DataDog/documentation#37257

Open

1 task

leoromanovsky merged 2 commits into main from leo.romanovsky/pr-i-php-ffe-metrics
