
feat(openai): Add gen_ai.client.operation.time_to_first_chunk metric for streaming#4415

Open
Nik-Reddy wants to merge 1 commit into open-telemetry:main from Nik-Reddy:feat/genai-ttft-metric-3932

Conversation


@Nik-Reddy Nik-Reddy commented Apr 13, 2026

Description

Implement the gen_ai.client.operation.time_to_first_chunk histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions.

This was requested in #3932 -- the semantic convention defines the metric but no Python instrumentation existed for it.

Note: Issue #3932 references gen_ai.server.time_to_first_token (server-side). This PR implements the client-side equivalent gen_ai.client.operation.time_to_first_chunk per the semconv registry, which measures time from when the client issues the request to when the first response chunk arrives in the stream.

Fixes #3932

Changes

  • util/genai/types.py: Added time_to_first_token_s field to LLMInvocation dataclass
  • util/genai/instruments.py: Added GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK constant and create_ttfc_histogram() factory with semconv-specified bucket boundaries
  • util/genai/metrics.py: InvocationMetricsRecorder now creates and records TTFC histogram (only for successful streaming responses)
  • openai_v2/instruments.py: Added ttfc_histogram to Instruments class via shared helper
  • openai_v2/patch.py: First-token detection in stream wrappers, wired into both new and legacy paths
  • tests/test_ttft_metrics.py: 4 test cases covering sync/async streaming, non-streaming exclusion, and tool-call streaming
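The first-token detection wired into the stream wrappers could look roughly like the following stdlib-only sketch. The `record_ttfc` callback and the `wrap_stream` name are assumptions for illustration, not the PR's actual code; in the real change the measurement would feed the `gen_ai.client.operation.time_to_first_chunk` histogram.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def wrap_stream(
    stream: Iterable[T],
    start_time: float,
    record_ttfc: Callable[[float], None],
) -> Iterator[T]:
    """Yield chunks unchanged, recording time-to-first-chunk exactly once.

    `start_time` is a time.monotonic() timestamp taken when the request
    was issued; `record_ttfc` is a hypothetical callback that would record
    the elapsed seconds into the TTFC histogram.
    """
    first = True
    for chunk in stream:
        if first:
            record_ttfc(time.monotonic() - start_time)
            first = False
        yield chunk
```

Non-streaming responses never enter this path, which matches the PR's rule of recording the metric only for streaming invocations.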

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

All new TTFC tests pass. All existing tests continue to pass.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@Nik-Reddy Nik-Reddy force-pushed the feat/genai-ttft-metric-3932 branch from bf9fbee to 40ac7e3 on April 14, 2026 18:05
@Nik-Reddy (Author)

Hi @lmolkova, I have addressed your feedback: (1) renamed gen_ai.server.time_to_first_token to gen_ai.client.time_to_first_token throughout; (2) moved the TTFT constant, bucket boundaries, and get_metric_data_points() helper into util/opentelemetry-util-genai instruments.py so they are shared across instrumentation libraries. @xrmx's earlier feedback was also incorporated (the helper returns all matches, and tests assert the count). Would appreciate a re-review when convenient. Thanks!

@Nik-Reddy Nik-Reddy requested review from lmolkova and xrmx April 15, 2026 01:00
@Nik-Reddy Nik-Reddy force-pushed the feat/genai-ttft-metric-3932 branch from 9a5cba5 to 69ab567 on April 15, 2026 01:08
@Nik-Reddy (Author)

Rebased on latest main and addressed @lmolkova's feedback: refactored `instruments.py` to use the shared `create_duration_histogram`, `create_token_histogram`, and `create_ttft_histogram` helpers from `opentelemetry.util.genai.instruments` instead of defining bucket boundaries inline. This removes ~50 lines of duplicated configuration and aligns with the pattern of keeping common metric definitions in genai-utils.

Ready for re-review when you get a chance. Happy to address any further feedback.

@Nik-Reddy Nik-Reddy changed the title feat(openai): Add gen_ai.server.time_to_first_token metric for streaming feat(openai): Add gen_ai.client.operation.time_to_first_chunk metric for streaming Apr 15, 2026
@Nik-Reddy (Author)

Updated the metric name to align with the semantic conventions registry:

  • Before: gen_ai.client.time_to_first_token
  • After: gen_ai.client.operation.time_to_first_chunk

This matches the semconv-defined client metric with the correct .operation. infix (consistent with gen_ai.client.operation.duration) and uses time_to_first_chunk as the official metric name.

Bucket boundaries are now the semconv-specified values: [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92]

All helper functions and constants are in genai-utils as requested -- individual instrumentations just call the shared factory.

@xrmx @lmolkova Ready for re-review when you get a chance.
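Those boundaries form a doubling (base-2 exponential) series, so each bucket covers twice the range of the one before it. As a stdlib-only illustration of how a measurement maps onto them (the actual bucketing happens inside the OpenTelemetry SDK; `bucket_index` here is purely a sketch):

```python
from bisect import bisect_left

# Semconv-specified boundaries (seconds) for
# gen_ai.client.operation.time_to_first_chunk -- each is double the last.
TTFC_BUCKET_BOUNDARIES = [
    0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64,
    1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92,
]

def bucket_index(value_s: float) -> int:
    """Index of the bucket a value falls into, using (prev, boundary]
    semantics; values above 81.92 s land in the overflow bucket (14)."""
    return bisect_left(TTFC_BUCKET_BOUNDARIES, value_s)
```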

…for streaming

Implement the gen_ai.client.operation.time_to_first_chunk histogram metric
as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records
the time (in seconds) from request start to the first output chunk received
during streaming chat completions.

Changes:
- Add time_to_first_token_s field to LLMInvocation dataclass
- Add create_ttfc_histogram() factory with semconv-specified bucket boundaries
- InvocationMetricsRecorder now creates and records TTFC histogram
- First-token detection in stream wrappers for both new and legacy paths
- 4 test cases: sync/async streaming, non-streaming exclusion, tool-call streaming

Fixes open-telemetry#3932
@Nik-Reddy Nik-Reddy force-pushed the feat/genai-ttft-metric-3932 branch from 376170b to acbf5c4 on April 16, 2026 21:11
common_attributes[ServerAttributes.SERVER_PORT] = (
    self._request_attributes[ServerAttributes.SERVER_PORT]
)
self._instruments.ttfc_histogram.record(
Member
This should not happen in the OpenAI instrumentation; this logic is not OpenAI-specific and should live in utils.

Comment on lines +116 to +117
self.time_to_first_token_s: float | None = None
"""Time to first token in seconds (streaming responses only)."""
Member
Suggested change
self.time_to_first_token_s: float | None = None
"""Time to first token in seconds (streaming responses only)."""
self.time_to_first_chunk_s: float | None = None
"""Time to first chunk in seconds (streaming responses only)."""

seed: int | None = None
server_address: str | None = None
server_port: int | None = None
time_to_first_token_s: float | None = None
Member
@eternalcuriouslearner if I remember correctly you were exploring having common streaming helpers - I imagine if we had them in utils, we wouldn't need instrumentation libs to provide this and would populate it through that common code.

WDYT?

Contributor

@eternalcuriouslearner commented Apr 17, 2026

Yes @lmolkova. I am planning to move the streaming helpers into utils so that we can reuse them across instrumentations. Those streaming helpers are going to follow the same ABC pattern we have in the GenAiInvocation class. I plan to move them to utils after the following PRs are merged:

#4443
#4274
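A rough sketch of what such a shared, ABC-based streaming helper might look like. The class and method names here are hypothetical, not the actual utils API; the point is that instrumentations would only override a hook instead of re-implementing first-chunk timing:

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Iterator

class BaseStreamWrapper(ABC):
    """Hypothetical shared streaming helper: common code owns the clock
    and first-chunk detection; subclasses only record the measurement."""

    def __init__(self, stream: Iterator[Any]) -> None:
        self._stream = stream
        self._start = time.monotonic()
        self._saw_first_chunk = False

    @abstractmethod
    def on_first_chunk(self, elapsed_s: float) -> None:
        """Hook for recording time-to-first-chunk (e.g. on an invocation)."""

    def __iter__(self) -> Iterator[Any]:
        for chunk in self._stream:
            if not self._saw_first_chunk:
                self._saw_first_chunk = True
                self.on_first_chunk(time.monotonic() - self._start)
            yield chunk
```

With something like this in utils, the OpenAI instrumentation would not need its own TTFC logic, and the common code could populate the invocation field directly.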

Contributor

@lmolkova I have a dumb question. I am assuming this attribute is going to be available once we move to OpenTelemetry Semantic Conventions v1.38.0. Do we really need this PR?

Member

@lmolkova commented Apr 17, 2026

I don't know where 1.38.0 came from; the time-to-first-chunk metric was added in the upcoming 1.41.0 (not released yet), and there is more coming in open-telemetry/semantic-conventions#3607. I agree with you, though, that stream helpers would be a better design choice. Also, given that open-telemetry/semantic-conventions#3607 is not merged yet, I think it would be best to close this PR.

Author

I see #3607 landed and #4443 is merged too, so once #4274 goes in and @eternalcuriouslearner moves the streaming helpers into utils, I can rebase this to plug the TTFT metric into that shared infrastructure instead of having it in the openai instrumentation directly.

I'll keep this open for now and rework it once the streaming helpers are in place. Please let me know, @lmolkova.


Labels

gen-ai Related to generative AI


Development

Successfully merging this pull request may close these issues.

Metric: gen_ai.server.time_to_first_token

6 participants