fix(openai): capture trailing streamed usage chunk after finish_reason by valentin-ib · Pull Request #780 · agentjido/req_llm

valentin-ib · 2026-06-18T14:31:49Z

Description

With stream_options: {include_usage: true}, Azure OpenAI and OpenAI-compatible gateways (e.g. LiteLLM) send token usage in a separate SSE chunk that arrives after the finish_reason chunk, just before [DONE]:

data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}
data: {"choices":[{"index":0,"delta":{}}],"usage":{"prompt_tokens":12,"completion_tokens":7,...}}
data: [DONE]

default_decode_stream_event/2 flags the finish_reason chunk terminal?: true, so StreamServer finalizes the stream there. A consumer that reads ReqLLM.Response.usage/1 as soon as the stream "completes" then races the still-in-flight usage chunk and gets input_tokens: 0 / output_tokens: 0 (and therefore no cost). Non-streaming responses are unaffected because usage arrives in the single response body. In practice the failure is effectively deterministic: the usage chunk is a separate network frame, so the read almost always wins the race.
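To make the ordering problem concrete, here is a minimal, library-independent sketch in Python. It is not ReqLLM code: `parse_sse` and `collect_usage` are hypothetical helpers, a content frame is added for illustration, and the elided usage fields from the frames above are simply left out. It contrasts finalizing at the finish_reason chunk with finalizing at [DONE].

```python
import json

# Frames in the order described above; a content frame is added for
# illustration and the elided usage fields are omitted.
FRAMES = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}',
    'data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}',
    'data: {"choices":[{"index":0,"delta":{}}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":7}}',
    "data: [DONE]",
]


def parse_sse(line):
    """Return the decoded JSON payload, or None for the [DONE] sentinel."""
    payload = line[len("data: "):]
    return None if payload == "[DONE]" else json.loads(payload)


def collect_usage(frames, finalize_on_finish_reason):
    """Accumulate usage until the stream is considered finished."""
    usage = {"prompt_tokens": 0, "completion_tokens": 0}
    for line in frames:
        chunk = parse_sse(line)
        if chunk is None:  # [DONE]: the stream has truly ended
            break
        usage.update(chunk.get("usage", {}))
        finished = any(c.get("finish_reason") for c in chunk["choices"])
        if finished and finalize_on_finish_reason:
            break  # premature finalization: the usage frame is never read
    return usage


early = collect_usage(FRAMES, finalize_on_finish_reason=True)
late = collect_usage(FRAMES, finalize_on_finish_reason=False)
print(early)
print(late)
```

Stopping at the finish_reason chunk leaves both counters at zero, while reading through to [DONE] picks up the 12 prompt tokens and 7 completion tokens from the trailing frame.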

Fix

ReqLLM.Providers.OpenAI.ChatAPI.decode_stream_event/2 strips the terminal? flag off normal-completion finish_reason chunks, so the stream finalizes on [DONE] (or connection close) instead — by which point the trailing usage chunk has been accumulated into metadata.

Guards keep existing behavior intact:

Inline error chunks stay terminal. finish_reason: :error (and any meta carrying an :error key — how OpenAI-compatible gateways report mid-stream failures via data: {"error": ...}) keep terminal?, so failures still surface immediately instead of waiting for a [DONE] that won't come.
[DONE] and empty-choices usage chunks have no :finish_reason key, so they're untouched and keep their own terminal flag.
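The guard rules above can be sketched as a small decision function. This is an illustrative Python model under stated assumptions, not the Elixir implementation: `adjust_terminal` and the dict shape (a `"terminal"` flag plus a `"meta"` dict) are hypothetical stand-ins for the decoded chunks.

```python
def adjust_terminal(chunk):
    """Return a decoded meta chunk with its terminal flag adjusted.

    `chunk` is a hypothetical dict with an optional "terminal" flag and a
    "meta" dict; it only loosely mirrors the chunks discussed above.
    """
    meta = chunk.get("meta", {})
    if "finish_reason" not in meta:
        return chunk  # [DONE] and empty-choices usage chunks: untouched
    if meta["finish_reason"] == "error" or "error" in meta:
        return chunk  # inline errors stay terminal
    # Normal completion: drop the flag so the stream ends on [DONE] instead.
    return {key: value for key, value in chunk.items() if key != "terminal"}


stop = adjust_terminal({"terminal": True, "meta": {"finish_reason": "stop"}})
error = adjust_terminal({"terminal": True, "meta": {"finish_reason": "error"}})
gateway_error = adjust_terminal(
    {"terminal": True, "meta": {"finish_reason": "stop", "error": "upstream"}}
)
done = adjust_terminal({"terminal": True, "meta": {}})
```

Only the normal-completion chunk loses its flag; the two error shapes and the chunk without a finish_reason key pass through unchanged.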

Scoped to the OpenAI ChatAPI driver; non-streaming and other providers are unaffected.

Alternative considered

The deeper root cause is that StreamServer snapshots metadata at the terminal chunk, before the trailing usage chunk merges. A fix there (defer finalization/metadata until the stream truly ends) would be more general but a larger change to the finalization lifecycle. This PR is the minimal, decoder-local fix — happy to take the broader approach instead if you'd prefer.
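For comparison, the broader approach could look roughly like the following Python sketch, in which metadata is merged on every chunk and only snapshotted when the stream actually ends. `StreamAccumulator` and its methods are hypothetical and do not describe StreamServer's real structure.

```python
class StreamAccumulator:
    """Hypothetical accumulator that defers its snapshot to end of stream."""

    def __init__(self):
        self.metadata = {}
        self.closed = False

    def push(self, meta, terminal=False):
        # The terminal flag is accepted but deliberately ignored here:
        # metadata keeps merging until close() is called.
        if self.closed:
            raise RuntimeError("stream already closed")
        self.metadata.update(meta)

    def close(self):
        # Called on [DONE] or connection close; snapshot only now.
        self.closed = True
        return dict(self.metadata)


acc = StreamAccumulator()
acc.push({"finish_reason": "stop"}, terminal=True)
acc.push({"usage": {"prompt_tokens": 12, "completion_tokens": 7}})
final = acc.close()
```

Because the snapshot is taken in `close`, the trailing usage is present in `final` regardless of which chunk carried a terminal flag. The trade-off noted above still applies: this touches the finalization lifecycle rather than a single decoder.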

Type of Change

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update

Breaking Changes

None. The public API and the Response struct are unchanged; streamed Response.usage is simply populated where it was previously zero.

Testing

Tests pass (mix test)
Quality checks pass (mix format / compile clean; CI runs full mix quality)

New unit tests in test/providers/openai_chat_streaming_usage_test.exs (pure decode tests — no live API / fixtures needed):

finish_reason chunk is no longer terminal
the trailing usage chunk yields usage
a combined finish_reason + usage chunk still yields usage
an empty-choices usage chunk keeps its terminal flag
[DONE] stays terminal
an inline error chunk stays terminal

Checklist

My code follows the project's style guidelines
I have added tests that prove my fix works
All new and existing tests pass
My commits follow conventional commit format
I have NOT edited CHANGELOG.md (auto-generated by git_ops)

Related Issues

Closes #781

valentin-ib · 2026-06-18T14:38:23Z

For context, I identified this whilst setting up a token and cost usage dashboard, and noticed that the usage wasn't coming through (it was always 0). I wanted to raise it and check whether anyone else thinks this is a reasonable amendment.

Some OpenAI-compatible providers, notably Azure OpenAI and gateways like LiteLLM, stream token usage in a SEPARATE chunk that arrives AFTER the finish_reason chunk and just before [DONE], when stream_options.include_usage is set:

data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}]}
data: {"choices":[{"index":0,"delta":{}}],"usage":{...}}
data: [DONE]

default_decode_stream_event flags the finish_reason chunk terminal?: true, which finalizes the stream there. A consumer that reads Response.usage right after the stream "completes" then races the still-in-flight usage chunk, so input/output tokens (and any cost derived from them) come back as zero. Non-streaming responses carry usage in the single body and are unaffected.

ChatAPI.decode_stream_event/2 strips the terminal flag off normal-completion finish_reason chunks so the stream finalizes on [DONE] (or connection close) instead, by which point the usage chunk has been accumulated. Inline error chunks (finish_reason: :error, or any chunk carrying an :error key) keep their terminal flag so failures still surface immediately. [DONE] and empty-choices usage chunks have no :finish_reason key and are untouched.

Adds regression tests for the chunk ordering, the combined finish_reason+usage chunk, empty-choices usage, and error-chunk termination.

mikehostetler · 2026-06-21T16:02:20Z

Thanks for the detailed repro. I dug into this against the latest PR head: after rebasing, the decoder-level claim in the original description no longer matches current main because default_decode_stream_event/2 already emits the finish_reason meta chunk without terminal?.

I removed the redundant ChatAPI override and added an end-to-end regression that replays the ordering from #781: content, then finish_reason: "stop", then a separate usage chunk with non-empty choices, then [DONE]. That test verifies StreamResponse.to_response/1 returns the trailing usage as non-zero Response.usage.

Could you please test the latest PR branch against your original Azure OpenAI / LiteLLM setup or token/cost dashboard repro? The simulated SSE path is covered and CI is green, but I want confirmation that the real provider path that produced zero usage is fixed before we treat #781 as fully closed.

valentin-ib force-pushed the usage-after-finish-reason branch from 71c59b1 to 87e3551 Compare June 18, 2026 15:09

mikehostetler added the needs_work label (Changes requested before merge) Jun 21, 2026

1Steamwork1 and others added 2 commits June 21, 2026 10:19

test(openai): cover trailing stream usage in responses

0d340dc

mikehostetler force-pushed the usage-after-finish-reason branch from 87e3551 to 0d340dc Compare June 21, 2026 15:29

mikehostetler added the ready_to_merge label and removed the needs_work label (Changes requested before merge) Jun 21, 2026

mikehostetler added the needs_work label (Changes requested before merge) and removed the ready_to_merge label Jun 21, 2026

valentin-ib wants to merge 2 commits into agentjido:main from valentin-ib:usage-after-finish-reason

3 participants
