diag(merge): log 404 cause + pin Job queue_name + max_jobs=1 #8
We're seeing 404 "Merge job ... not found" on the very first poll, ~1s after a successful enqueue. The handler has four distinct branches that all emit the same opaque 404 (info=None / function mismatch / args mismatch / post-info status=None), so the log line doesn't tell us which one fired.

Two changes:

1. Log a server-side warning at each branch so the next 404 says *why*. The caller still sees the same generic 404; we don't want to leak the existence (or shape) of other tasks' jobs.
2. Pass `_queue_name=DEFAULT_QUEUE_NAME` through `JobQueue._job()`. `Job.info()` and `Job.result()` don't depend on queue_name, but `Job.status()` reads the queue ZSET (`zscore(self._queue_name, ...)`) to detect the `queued` state. Without this, a job still waiting in our `supoclip_tasks` queue would report `not_found` because the Job handle defaulted to arq's `arq:queue`. This is defensive, even though it's almost certainly not the cause of the current 404 (`info()` is what fires first and that's queue-agnostic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
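The queue-name point in change 2 can be illustrated with a small in-memory model. This is only a sketch: `FakeRedis`, `job_status`, and the `result:` / `in-progress:` key prefixes are stand-ins invented for the example, not arq's real classes or keys, and the deferred state is left out. It only mirrors the idea that the `queued` state is detected by looking the job up in the ZSET named by the queue name.

```python
from enum import Enum

ARQ_DEFAULT_QUEUE = "arq:queue"   # arq's default queue name, as cited above
APP_QUEUE = "supoclip_tasks"      # the queue this service enqueues into


class Status(Enum):
    complete = "complete"
    in_progress = "in_progress"
    queued = "queued"
    not_found = "not_found"


class FakeRedis:
    """In-memory stand-in: a set of plain keys plus sorted sets by name."""

    def __init__(self):
        self.keys = set()
        self.zsets = {}

    def zadd(self, name, member, score):
        self.zsets.setdefault(name, {})[member] = score

    def zscore(self, name, member):
        # Returns None when the member is not in that sorted set.
        return self.zsets.get(name, {}).get(member)


def job_status(redis, job_id, queue_name=ARQ_DEFAULT_QUEUE):
    """Simplified status lookup; key prefixes here are hypothetical."""
    if f"result:{job_id}" in redis.keys:
        return Status.complete
    if f"in-progress:{job_id}" in redis.keys:
        return Status.in_progress
    # The queued state is only visible in the ZSET for *this* queue name.
    if redis.zscore(queue_name, job_id) is not None:
        return Status.queued
    return Status.not_found


redis = FakeRedis()
redis.zadd(APP_QUEUE, "merge-123", score=1.0)   # job waiting in our queue

wrong_queue = job_status(redis, "merge-123")                        # default name
right_queue = job_status(redis, "merge-123", queue_name=APP_QUEUE)  # pinned name
```

With the default queue name the waiting job is invisible and the lookup falls through to `not_found`; with the pinned name the same job reports `queued`.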
No actionable comments were generated in the recent review.
Walkthrough
This PR hardens merge-job resolution by pinning the queue name in Job construction and adding detailed warning logs to the merge-job polling endpoint. The core fix ensures Job.status() can correctly detect queued jobs; the logging adds observability for debugging edge cases like missing info, function/args mismatches, and race-condition status evictions.
Changes: Job Queue Resolution and Logging
Estimated code review effort: 3 (Moderate), ~22 minutes
Pre-merge checks: 5 passed
Summary
- Log a `WARNING` at each of the 4 branches in `GET /clips/merge_jobs/{merge_job_id}` that emit the same opaque 404, so we can tell why a merge job appears "not found"
- Pin `_queue_name=DEFAULT_QUEUE_NAME` on the `Job(...)` constructed by `JobQueue._job()` so `Job.status()` can detect queued-but-not-started jobs in the `supoclip_tasks` queue
- Lower `WorkerSettings.max_jobs` from 4 -> 1 so one ffmpeg encode gets the full task CPU budget (pairs with BN-side CDK bump to 4 vCPU on prod workers)

Why
The BN-side smoke test is hitting an immediate 404 on the first poll, ~1s after a successful enqueue. The handler has four code paths that all emit the same generic message; the new log line tells us which one fired without leaking detail to the caller.
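A minimal sketch of the pattern described above: each branch records its own reason in a server-side warning, while the caller gets one identical error. Everything named here is an assumption for illustration (`JobInfo`, `MergeJobNotFound`, `check_merge_job`, the `merge_clips` function name, the rule that `task_id` must appear in the job args, and the exact log format); the real handler's checks and message layout may differ.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("merge_jobs")


@dataclass
class JobInfo:
    """Hypothetical stand-in for the job metadata the handler inspects."""
    function: str
    args: tuple


class MergeJobNotFound(Exception):
    """Would map to the generic HTTP 404; carries no branch detail."""

    def __init__(self, merge_job_id: str):
        super().__init__(f"Merge job {merge_job_id} not found")


def check_merge_job(merge_job_id: str, task_id: str,
                    info: Optional[JobInfo], status: Optional[str],
                    expected_function: str = "merge_clips") -> str:
    """Return the status, or log the specific cause and raise a generic error."""
    reason = None
    if info is None:
        reason = "info=None"
    elif info.function != expected_function:
        reason = f"function mismatch function={info.function}"
    elif task_id not in info.args:  # assumed ownership check
        reason = f"args mismatch info_args={info.args}"
    elif status is None:
        reason = "status=None after info OK"

    if reason is not None:
        # The detail goes to the server log only, never to the caller.
        logger.warning("Merge job not found: %s merge_job_id=%s task_id=%s",
                       reason, merge_job_id, task_id)
        raise MergeJobNotFound(merge_job_id)
    return status
```

Because the exception message is built only from `merge_job_id`, a caller probing another task's job ID cannot distinguish "does not exist" from "exists but is not yours".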
Separately: BN-side worker tier bump from 2 -> 4 vCPU is pointless if a second concurrent job could land on the same worker and halve the per-job CPU budget. libx264 doesn't scale linearly past ~4-6 threads so packing concurrent encodes onto one worker is a net loss; horizontal scaling is the right lever.
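The CPU-budget argument is simple arithmetic, sketched below. `per_job_vcpu` is a helper invented for this example, and the `WorkerSettings` class is only a fragment showing the two settings named in this PR (a real arq settings class also needs at least its `functions` list).

```python
def per_job_vcpu(worker_vcpu: float, max_jobs: int) -> float:
    """Worst-case CPU share per job when the worker runs max_jobs at once."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    return worker_vcpu / max_jobs


class WorkerSettings:
    """Fragment only: the settings discussed in this PR."""
    queue_name = "supoclip_tasks"
    max_jobs = 1  # previously 4


before = per_job_vcpu(worker_vcpu=4, max_jobs=4)  # bumped tier, old concurrency
after = per_job_vcpu(worker_vcpu=4, max_jobs=WorkerSettings.max_jobs)
```

Under the old setting a fully loaded 4 vCPU worker leaves each encode about one vCPU; with `max_jobs = 1` a single encode can use all four, and extra throughput comes from adding workers.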
Test plan
- `aws logs tail` filter `"Merge job not found"` on the supoclip backend log group; should print one of:
  - `info=None merge_job_id=… task_id=…`
  - `function mismatch … function=…`
  - `args mismatch … info_args=…`
  - `status=None after info OK …`
- `max_jobs=1` on startup

🤖 Generated with Claude Code