chore(infra): park Langfuse tracing on the agent Lambda#139

Merged
ikuzuki merged 1 commit into main from chore/park-langfuse-tracing
Apr 21, 2026

Conversation

@ikuzuki ikuzuki commented Apr 21, 2026

Summary

Three rounds of env-var tuning (#133, #137, #138) failed to cap the request-path hang at <60s. A test on 2026-04-21 with LANGFUSE_TIMEOUT=2 applied via #138 still hit Lambda's 60s timeout on /team. The blocking call was never root-caused, and the cost of more env-var guessing outweighs the value of agent traces at this scale.

Changes

  • lambda.tf: drop LANGFUSE_TIMEOUT and LANGFUSE_FLUSH_INTERVAL (both are no-ops when tracing is disabled), and rewrite the comment next to LANGFUSE_TRACING_ENABLED="false" to reflect the parked decision rather than the stale "flip via CLI after apply" plan.

What stays

  • Enrichment services keep Langfuse tracing — they run in normal (non-streaming, no-LWA) Lambda and don't show this hang.
  • LANGFUSE_TRACING_ENABLED kill-switch stays in the Terraform env block for the agent, in case someone attempts re-entry later.
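For readers unfamiliar with the kill-switch pattern, here is a minimal sketch of how application code could consult the flag before doing any tracing work. The helper name `tracing_enabled`, the set of false-like values, and the "enabled when unset" default are all assumptions made for illustration; the Langfuse SDK reads this variable itself and may parse it differently.

```python
import os
from typing import Mapping

# Assumption: values treated as "off". The real SDK's parsing may differ.
_FALSE_VALUES = {"0", "false", "no", "off"}


def tracing_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return False only when LANGFUSE_TRACING_ENABLED is set to a false-like value.

    Hypothetical app-side guard: an unset variable is treated as enabled.
    """
    raw = env.get("LANGFUSE_TRACING_ENABLED", "true")
    return raw.strip().lower() not in _FALSE_VALUES


if __name__ == "__main__":
    # With the Terraform env block from this PR, the agent would see "false".
    print(tracing_enabled({"LANGFUSE_TRACING_ENABLED": "false"}))
```

Passing the environment as a parameter keeps the check easy to exercise without mutating `os.environ`.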

Re-entry path (not env var tuning)

Either:

  1. Local reproduction with a debugger. Run the agent under uvicorn locally with real Langfuse credentials, hit /team, and find the exact stack frame blocked at the 5-second mark.
  2. ADOT Lambda Extension. Bake the collector binary into the Dockerfile, export to localhost:4318, let the extension forward to Langfuse Cloud. Moves every potentially-blocking call out of the request thread by construction.
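One way to approach option 1 without stepping through interactively is Python's standard-library `faulthandler`, which can dump every thread's stack if a call is still running after a deadline. The sketch below is an assumption about how that could be wired up, not the project's code: `run_with_stack_dump` and `slow_team_handler` are hypothetical names, and the sleep stands in for whatever actually blocks behind /team.

```python
import faulthandler
import os
import tempfile
import time
from typing import Callable


def run_with_stack_dump(handler: Callable[[], object], after_seconds: float, dump_path: str) -> object:
    """Run handler(); if it is still running after `after_seconds`,
    write the stack of every thread to dump_path."""
    with open(dump_path, "w") as dump_file:
        # The watchdog writes directly to the file descriptor, so the file
        # must stay open until the handler returns.
        faulthandler.dump_traceback_later(after_seconds, exit=False, file=dump_file)
        try:
            return handler()
        finally:
            # Disarm the watchdog so a fast call produces no dump.
            faulthandler.cancel_dump_traceback_later()


def slow_team_handler() -> str:
    # Hypothetical stand-in for the blocking call behind /team.
    time.sleep(0.5)
    return "ok"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stack-dump.txt")
    # Use after_seconds=5 to match the 5-second mark described above.
    run_with_stack_dump(slow_team_handler, 0.1, path)
    with open(path) as f:
        print(f.read())
```

The dump lists frames most recent call first, so the top frame of the request thread is the call that was blocked when the deadline fired.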

Test plan

  • terraform fmt + terraform validate clean
  • CI green
  • After apply: /team via CloudFront returns in <5s (already verified manually; the live Lambda matches this config today because LANGFUSE_TRACING_ENABLED=false was already flipped via CLI)
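The post-apply check could be scripted rather than done by hand. The sketch below is one possible shape, under stated assumptions: `fetch_status` and `check_latency` are hypothetical helpers, the 5-second budget comes from the test plan, and the CloudFront URL is deliberately left as a placeholder because it is not given here.

```python
import time
import urllib.request
from typing import Callable, Tuple


def fetch_status(url: str, timeout_s: float) -> int:
    """Issue a GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def check_latency(call: Callable[[], int], budget_s: float) -> Tuple[bool, float]:
    """Time one call; pass only if it returned 200 within the budget."""
    start = time.monotonic()
    status = call()
    elapsed = time.monotonic() - start
    return (status == 200 and elapsed < budget_s), elapsed


if __name__ == "__main__":
    # Placeholder URL: substitute the real CloudFront domain.
    url = "https://<cloudfront-domain>/team"
    ok, elapsed = check_latency(lambda: fetch_status(url, timeout_s=10.0), budget_s=5.0)
    print(f"ok={ok} elapsed={elapsed:.2f}s")
```

Separating the timing from the HTTP call lets the budget logic be checked with a stub before pointing it at the live endpoint.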

🤖 Generated with Claude Code

Three rounds of env-var tuning (#133 / #137 / #138) failed to cap the
request-path hang at <60s. A test on 2026-04-21, with LANGFUSE_TIMEOUT=2
confirmed live, still hit Lambda's 60s timeout on /team; the actual
blocking call was not root-caused.

- Drop LANGFUSE_TIMEOUT and LANGFUSE_FLUSH_INTERVAL from lambda.tf;
  both are no-ops when tracing is disabled, so keeping them was
  cargo-culting.
- Rewrite the comment next to LANGFUSE_TRACING_ENABLED="false" to
  reflect the parked decision rather than the stale "flip via CLI
  after apply" plan.

Enrichment services retain tracing; they run in normal Lambda (no LWA,
no streaming) and don't show this class of hang. Re-entry path is
either a local reproduction with a debugger attached or switching to
the ADOT Lambda Extension — not more env-var guessing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ikuzuki ikuzuki merged commit a60a7ba into main Apr 21, 2026
9 checks passed
@ikuzuki ikuzuki deleted the chore/park-langfuse-tracing branch April 21, 2026 10:12