Patch @nicnocquee/dataqueue to fix processor stalling under groupConcurrency #786
Merged
The Hermes data-plane processor stalled whenever a data-collection job ran long. With groupConcurrency=3 the queue claimed same-group jobs in batches of 3 and waited for the whole batch to settle before claiming more, so two fast jobs would finish and the third slow one blocked the entire pipeline group until it timed out.

Vendor the upstream continuous-pool fix as a pnpm patch on 1.39.0 until a fixed release is published. startInBackground now keeps up to `concurrency` jobs in flight and refills each slot as it frees, never exceeding groupConcurrency. Patches both dist/index.js (ESM) and dist/index.cjs (CJS).

Upstream PR: nicnocquee/dataqueue#41

Verified against a local Postgres: with one slow and five fast same-group jobs under groupConcurrency=2, the patched build completes the fast jobs while the slow one is still processing, and the unpatched build stalls and times out.
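The continuous pool described in this commit can be sketched roughly as follows. This is a minimal illustration, not the vendored patch: `Job`, `claimJobs`, and `startPool` are hypothetical stand-ins rather than dataqueue APIs, and the poll timer and groupConcurrency accounting are left out.

```typescript
// Hypothetical stand-ins: Job, claimJobs and startPool are not dataqueue APIs.
type Job = { id: number; run: () => Promise<void> };

function startPool(
  claimJobs: (limit: number) => Promise<Job[]>,
  concurrency: number,
  batchSize: number,
) {
  let inFlight = 0;
  let claimInProgress = false;
  let stopped = false;

  async function pump(): Promise<void> {
    // Only one claim at a time, and never claim when every slot is busy.
    if (stopped || claimInProgress || inFlight >= concurrency) return;
    claimInProgress = true;
    let claimed = 0;
    try {
      const limit = Math.min(concurrency - inFlight, batchSize);
      const jobs = await claimJobs(limit);
      claimed = jobs.length;
      for (const job of jobs) {
        inFlight++;
        job
          .run()
          .catch(() => { /* a real processor would record the failure */ })
          .finally(() => {
            inFlight--;
            void pump(); // refill the freed slot right away
          });
      }
    } catch (_err) {
      // A real processor would log this and retry on its next poll.
    } finally {
      claimInProgress = false;
    }
    // A slot may have freed while the claim was in progress; try again.
    if (claimed > 0 && inFlight < concurrency) void pump();
  }

  void pump();
  return {
    stop: () => { stopped = true; },
    inFlight: () => inFlight,
  };
}
```

The point of the design is that no job waits on its batch siblings: a slot is returned to the pool in the job's own `.finally()`, so one slow job occupies exactly one slot.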
Both the hermes-dashboard and user-registration Dockerfiles list their COPY sources explicitly and omitted the patches/ directory, so pnpm install could not find patches/@nicnocquee__dataqueue@1.39.0.patch and failed with ENOENT in CI.
Previously, data-collection awaited the entire round's fetch batch before persisting any URL — sources only reached the Agent Data API in a burst once the slowest fetch in the round finished. Each URL is now persisted the moment its own fetch and quality gates pass, via a new optional onOutcome hook on performWebFetch that fires inside pMap per slot.
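A per-URL hook of this kind might look like the sketch below. The real `performWebFetch` signature is not shown in this PR, so `performWebFetchSketch`, `Outcome`, and `FetchOne` are assumed shapes, and the small worker loop stands in for pMap's per-slot mapper.

```typescript
// Hypothetical shapes: the real performWebFetch signature is not shown here.
type Outcome = { url: string; ok: boolean };
type FetchOne = (url: string) => Promise<Outcome>;

async function performWebFetchSketch(
  urls: string[],
  fetchOne: FetchOne,
  options: {
    concurrency: number;
    onOutcome?: (outcome: Outcome) => Promise<void> | void;
  },
): Promise<Outcome[]> {
  const results: Outcome[] = new Array(urls.length);
  let next = 0;

  // Each worker plays the role of one mapper slot.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const index = next++;
      const outcome = await fetchOne(urls[index]);
      results[index] = outcome;
      // Fires per URL, before the rest of the round has settled.
      if (options.onOutcome) await options.onOutcome(outcome);
    }
  }

  const slots = Math.min(options.concurrency, urls.length);
  await Promise.all(Array.from({ length: slots }, () => worker()));
  return results;
}
```

A caller would persist inside `onOutcome` when the outcome passes its quality gates, so a slow fetch delays only its own URL rather than the whole round.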
pnpm generates patch filenames like @scope__pkg@1.0.0.patch, which cannot follow the kebab-case convention. Exclude the patches/ directory from the new-file naming check the same way markdown files are.
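As a rough illustration of the exclusion, a naming check could skip both cases before testing the file name. The repository's actual check is not shown in this PR, so the function names and the kebab-case pattern below are assumptions.

```typescript
// Hypothetical check: the repository's real naming rule is not shown in this PR.
const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function shouldCheckName(path: string): boolean {
  // Markdown files are exempt from the naming rule.
  if (/\.md$/.test(path)) return false;
  // pnpm-generated patch files live under patches/ and are exempt too.
  if (path.split("/").indexOf("patches") !== -1) return false;
  return true;
}

function isValidNewFileName(path: string): boolean {
  if (!shouldCheckName(path)) return true;
  const fileName = path.split("/").pop() ?? "";
  // Only the part before the first dot is checked, so "a-b.test.ts" passes.
  const stem = fileName.split(".")[0];
  return KEBAB_CASE.test(stem);
}
```

Excluding by directory rather than by `.patch` extension keeps the rule in force for patch-like files created anywhere else in the tree.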
This was referenced Jun 12, 2026
Summary
The Hermes data-plane processor was stalling after a partial pipeline run. With `concurrency=3` and `groupConcurrency=3`, the old processor claimed a batch of 3 jobs and then sat idle until all three finished before polling again. One slow job would hold up the two freed slots for the rest of the batch, so pending pipeline jobs went unclaimed for minutes while workers had capacity.

This PR patches `@nicnocquee/dataqueue@1.39.0` locally with `pnpm patch` while the upstream fix (nicnocquee/dataqueue#41) awaits review. The patch replaces the batch-barrier loop with a continuous pool: each job calls `pump()` from its `.finally()`, refilling its slot the moment it finishes.

Related issues
Closes #785
Important changes
- `patches/@nicnocquee__dataqueue@1.39.0.patch`: rewrites `startInBackground` in both `dist/index.cjs` and `dist/index.js`. The old `intervalId`/`currentBatchPromise` state is replaced with `pollTimer`/`claimInProgress`/`inFlight`/`inFlightJobs`. A new `pump()` function claims up to `min(concurrency - inFlight, batchSize)` jobs, and each job's `.finally()` decrements `inFlight` and calls `void pump()` again, so freed slots are filled immediately rather than waiting for the whole batch to settle.
- `package.json`: adds a `pnpm.patchedDependencies` entry for `@nicnocquee/dataqueue@1.39.0`.
- `pnpm-lock.yaml`: updated to resolve the patched virtual store entry.

Other changes
None.
Key files to review
- `patches/@nicnocquee__dataqueue@1.39.0.patch`: the complete fix applied to both CJS and ESM builds.
- `package.json`: `pnpm.patchedDependencies` wiring.

How to test
1. `pnpm --filter hermes-worker build` should complete without errors.
2. `grep -c claimInProgress apps/hermes/worker/dist/index.js` should return 4.
3. With `concurrency=2`, `groupConcurrency=2`, 1 slow job and 5 fast jobs (same group), all 5 fast jobs should complete while the slow one is still running. This mirrors the regression test in nicnocquee/dataqueue#41 ("Refill freed concurrency slots continuously instead of per batch"), which times out on the old code and passes on the patched build.

Once a fixed release is published, bump `@nicnocquee/dataqueue` past 1.39.0 and delete both the patch file and the `patchedDependencies` entry.
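The one-slow-plus-five-fast scenario can also be worked through on paper. The sketch below is purely arithmetic and does not touch the queue: durations are abstract time units chosen for illustration (100 for the slow job, 1 for each fast one), and `batchBarrier` and `continuousPool` are hypothetical helper names that model only the two scheduling strategies.

```typescript
// Completion time of each job when jobs start in order with at most
// `concurrency` running at once. Nothing is executed; this is a model.
function batchBarrier(durations: number[], concurrency: number): number[] {
  const finish: number[] = [];
  let clock = 0;
  for (let i = 0; i < durations.length; i += concurrency) {
    const batch = durations.slice(i, i + concurrency);
    for (const d of batch) finish.push(clock + d);
    // The next claim happens only after the slowest job in the batch settles.
    clock += Math.max(...batch);
  }
  return finish;
}

function continuousPool(durations: number[], concurrency: number): number[] {
  const finish: number[] = [];
  // freeAt[s] is the time at which slot s next becomes free.
  const freeAt: number[] = new Array(concurrency).fill(0);
  for (const d of durations) {
    const slot = freeAt.indexOf(Math.min(...freeAt));
    const end = freeAt[slot] + d;
    freeAt[slot] = end;
    finish.push(end);
  }
  return finish;
}

const durations = [100, 1, 1, 1, 1, 1]; // one slow job, then five fast ones
const oldTimes = batchBarrier(durations, 2);   // [100, 1, 101, 101, 102, 102]
const newTimes = continuousPool(durations, 2); // [100, 1, 2, 3, 4, 5]
```

Under the batch barrier, four of the five fast jobs cannot start until the slow job settles at time 100. With the continuous pool, all five finish by time 5 while the slow job is still running, which is the behaviour the regression test checks for.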