feat(openai): Add gen_ai.client.operation.time_to_first_chunk metric for streaming #4415
Nik-Reddy wants to merge 1 commit into open-telemetry:main
Conversation
Force-pushed from bf9fbee to 40ac7e3
Hi @lmolkova, I have addressed your feedback: (1) Renamed gen_ai.server.time_to_first_token to gen_ai.client.time_to_first_token throughout. (2) Moved the TTFT constant, bucket boundaries, and get_metric_data_points() helper into util/opentelemetry-util-genai instruments.py so they are shared across instrumentation libraries. @xrmx's earlier feedback was also incorporated (the helper returns all matches, and tests assert the count). Would appreciate a re-review when convenient. Thanks!
Force-pushed from 9a5cba5 to 69ab567
Rebased on latest main and addressed @lmolkova's feedback: refactored `instruments.py` to use the shared `create_duration_histogram`, `create_token_histogram`, and `create_ttft_histogram` helpers from `opentelemetry.util.genai.instruments` instead of defining bucket boundaries inline. This removes ~50 lines of duplicated configuration and aligns with the pattern of keeping common metric definitions in genai-utils. Ready for re-review when you get a chance. Happy to address any further feedback.
Updated the metric name to align with the semantic conventions registry: it is now `gen_ai.client.operation.time_to_first_chunk`, matching the semconv-defined client metric. Bucket boundaries are now the semconv-specified values, and all helper functions and constants live in `opentelemetry.util.genai.instruments`.
…for streaming

Implement the gen_ai.client.operation.time_to_first_chunk histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions.

Changes:
- Add time_to_first_token_s field to LLMInvocation dataclass
- Add create_ttfc_histogram() factory with semconv-specified bucket boundaries
- InvocationMetricsRecorder now creates and records TTFC histogram
- First-token detection in stream wrappers for both new and legacy paths
- 4 test cases: sync/async streaming, non-streaming exclusion, tool-call streaming

Fixes open-telemetry#3932
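The first-chunk detection described in the commit can be sketched as a thin iterator wrapper. The following is a minimal stdlib-only illustration, not the code in this PR: `wrap_stream` and `on_first_chunk` are hypothetical names, and the real instrumentation records into an OpenTelemetry histogram rather than a plain callback.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

def wrap_stream(
    stream: Iterable[T],
    start_time: float,
    on_first_chunk: Callable[[float], None],
) -> Iterator[T]:
    """Yield chunks unchanged, reporting time-to-first-chunk exactly once.

    start_time is expected to come from time.monotonic() taken just
    before the request was issued.
    """
    first_seen = False
    for chunk in stream:
        if not first_seen:
            first_seen = True
            # Elapsed seconds from request start to the first chunk.
            on_first_chunk(time.monotonic() - start_time)
        yield chunk
```

An async variant would follow the same shape with `async for`; the key point is that the measurement fires on the first yielded chunk, so non-streaming calls never record the metric.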
Force-pushed from 376170b to acbf5c4
```python
common_attributes[ServerAttributes.SERVER_PORT] = (
    self._request_attributes[ServerAttributes.SERVER_PORT]
)
self._instruments.ttfc_histogram.record(
```
This should not happen in the openai instrumentation; this logic is not OpenAI-specific and should live in utils.
```python
self.time_to_first_token_s: float | None = None
"""Time to first token in seconds (streaming responses only)."""
```
Suggested change:

```diff
-self.time_to_first_token_s: float | None = None
-"""Time to first token in seconds (streaming responses only)."""
+self.time_to_first_chunk_s: float | None = None
+"""Time to first chunk in seconds (streaming responses only)."""
```
```python
seed: int | None = None
server_address: str | None = None
server_port: int | None = None
time_to_first_token_s: float | None = None
```
@eternalcuriouslearner if I remember correctly you were exploring having common streaming helpers - I imagine if we had them in utils, we wouldn't need instrumentation libs to provide this and would populate it through that common code.
WDYT?
@lmolkova I have a dumb question: I am assuming this attribute is going to be available once we move to OpenTelemetry Semantic Conventions v1.38.0. Do we really need this PR?
I don't know where 1.38.0 came from; the time-to-first-chunk metric was added in the upcoming 1.41.0 (not released yet), and there is more coming in open-telemetry/semantic-conventions#3607. But I agree with you that stream helpers would be a better design choice. Also, given that open-telemetry/semantic-conventions#3607 is not merged yet, I think it would be best to close this PR.
I see #3607 landed and #4443 is merged too, so once #4274 goes in and @eternalcuriouslearner moves the streaming helpers into utils, I can rebase this to plug the TTFT metric into that shared infrastructure instead of keeping it in the openai instrumentation directly.
I'll keep this open for now and rework it once the streaming helpers are in place. Please let me know if that works for you, @lmolkova.
Description
Implement the `gen_ai.client.operation.time_to_first_chunk` histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions. This was requested in #3932: the semantic convention defines the metric but no Python instrumentation existed for it.

Note: Issue #3932 references `gen_ai.server.time_to_first_token` (server-side). This PR implements the client-side equivalent `gen_ai.client.operation.time_to_first_chunk` per the semconv registry, which measures time from when the client issues the request to when the first response chunk arrives in the stream.

Fixes #3932
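The metric described above would presumably be created by a small factory on a meter. The sketch below is illustrative only: the metric name comes from this PR, but the bucket boundaries are an assumption borrowed from the `gen_ai.client.operation.duration` buckets (the TTFC spec may define different values), the `Meter` protocol is a stand-in for `opentelemetry.metrics.Meter`, and `explicit_bucket_boundaries_advisory` assumes a recent opentelemetry-api that supports bucket advisories.

```python
from typing import Any, Protocol, Sequence

# Assumed boundaries, mirroring gen_ai.client.operation.duration;
# the final TTFC spec may specify different values.
_TTFC_BUCKETS: Sequence[float] = (
    0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64,
    1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92,
)

class Meter(Protocol):
    """Minimal stand-in for opentelemetry.metrics.Meter."""
    def create_histogram(
        self, name: str, *, unit: str = "", description: str = "", **kwargs: Any
    ) -> Any: ...

def create_ttfc_histogram(meter: Meter) -> Any:
    """Create the time-to-first-chunk histogram on the given meter."""
    return meter.create_histogram(
        "gen_ai.client.operation.time_to_first_chunk",
        unit="s",
        description="Time from request start to first streamed chunk.",
        explicit_bucket_boundaries_advisory=list(_TTFC_BUCKETS),
    )
```

Keeping the name, unit, and boundaries in one factory is what lets multiple instrumentation libraries share an identical instrument definition, which is the motivation for moving it into genai-utils.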
Changes

- `util/genai/types.py`: Added `time_to_first_token_s` field to the `LLMInvocation` dataclass
- `util/genai/instruments.py`: Added `GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK` constant and `create_ttfc_histogram()` factory with semconv-specified bucket boundaries
- `util/genai/metrics.py`: `InvocationMetricsRecorder` now creates and records the TTFC histogram (only for successful streaming responses)
- `openai_v2/instruments.py`: Added `ttfc_histogram` to the `Instruments` class via the shared helper
- `openai_v2/patch.py`: First-token detection in stream wrappers, wired into both new and legacy paths
- `tests/test_ttft_metrics.py`: 4 test cases covering sync/async streaming, non-streaming exclusion, and tool-call streaming

Type of change
How Has This Been Tested?
All new TTFC tests pass. All existing tests continue to pass.
Does This PR Require a Core Repo Change?
Checklist: