feat(manifest): discover and scan nested Bazel sub-workspaces #1363
Simon (simonhj) wants to merge 3 commits into
Add findWorkspaceRoots, a directory-tree walk that returns every directory containing a Bazel workspace marker (MODULE.bazel / WORKSPACE / WORKSPACE.bazel). The walk takes an injected directory-prune policy (ignoreDirNames / ignoreDirPrefixes), traverses in deterministic sorted order under a visited-directory budget, and caps the number of roots, warning unconditionally when the cap or budget truncates discovery. This is the discovery primitive for scanning nested sub-workspaces; it has no coupling to the extraction pipeline.
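The walk described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's implementation: it runs over a hypothetical in-memory tree (`Dir`) instead of the file system, and the names `findWorkspaceRootsSketch`, `maxVisitedDirs`, `maxRoots`, and `warn` are assumptions. Only the marker names and the `ignoreDirNames` / `ignoreDirPrefixes` policy come from the description. The depth-first order is also an assumption; the description only says the traversal is sorted and deterministic.

```typescript
// Hypothetical in-memory tree: a name maps to a sub-directory (object) or a file (null).
type Dir = { [name: string]: Dir | null }

interface WalkOptions {
  ignoreDirNames: Set<string> // exact directory names to prune
  ignoreDirPrefixes: string[] // directory-name prefixes to prune, e.g. 'bazel-'
  maxVisitedDirs: number // visited-directory budget (assumed name)
  maxRoots: number // cap on returned roots (assumed name)
  warn: (message: string) => void
}

const WORKSPACE_MARKERS = ['MODULE.bazel', 'WORKSPACE', 'WORKSPACE.bazel']

function findWorkspaceRootsSketch(tree: Dir, opts: WalkOptions): string[] {
  const roots: string[] = []
  let visited = 0
  let truncated = false
  // Explicit stack of [path relative to the scan root, directory]; '' is the scan root.
  const stack: Array<[string, Dir]> = [['', tree]]
  while (stack.length > 0) {
    // Stop while work remains: either limit means discovery is truncated.
    if (visited >= opts.maxVisitedDirs || roots.length >= opts.maxRoots) {
      truncated = true
      break
    }
    const [relPath, dir] = stack.pop()!
    visited += 1
    // A directory is a workspace root if it contains any marker file.
    if (WORKSPACE_MARKERS.some(marker => dir[marker] === null)) {
      roots.push(relPath === '' ? '.' : relPath)
    }
    const childNames = Object.keys(dir)
      .filter(name => dir[name] !== null) // keep sub-directories only
      .filter(name => !opts.ignoreDirNames.has(name))
      .filter(name => !opts.ignoreDirPrefixes.some(p => name.startsWith(p)))
      .sort()
    // Push in reverse so that pop() visits children in ascending sorted order.
    for (const name of [...childNames].reverse()) {
      stack.push([relPath === '' ? name : `${relPath}/${name}`, dir[name] as Dir])
    }
  }
  if (truncated) {
    opts.warn('workspace discovery truncated by the root cap or the visit budget')
  }
  return roots
}

// Sample tree: a root workspace, two nested workspaces, and two pruned directories.
const sampleTree: Dir = {
  'MODULE.bazel': null,
  apps: { web: { 'WORKSPACE.bazel': null } },
  mobile: { 'MODULE.bazel': null, src: {} },
  node_modules: { pkg: { WORKSPACE: null } },
  'bazel-out': { WORKSPACE: null },
}
```

Calling the sketch with `ignoreDirNames` containing `node_modules` and `ignoreDirPrefixes` containing `bazel-` skips both pruned directories, so markers inside them never produce roots.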
Extraction previously scanned only the single workspace rooted at the scan cwd, so a monorepo with nested workspaces (e.g. a mobile/MODULE.bazel under the root) had its sub-workspaces silently ignored. Rename the single-workspace logic to extractOneWorkspace (behavior unchanged) and wrap it in a new extractBazelToMaven that discovers every workspace root via findWorkspaceRoots and runs extraction once per root. Each nested workspace writes its manifest at a path mirroring its location relative to the scan root; the root workspace writes exactly where it did before, so single-workspace output is byte-for-byte unchanged. The result gains an additive manifestPaths array (root first); manifestPath is retained as its first element. Export IGNORED_DIRS from the glob utils so the Bazel default prune policy composes it rather than duplicating the list.
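To illustrate the "path mirroring its location relative to the scan root" rule, here is a minimal sketch. The helper name `mirroredOutDir` is hypothetical, and it assumes POSIX-style paths and a workspace root that is the scan root itself or a directory beneath it. The actual manifest file name is not stated in this PR, so the sketch only computes the output directory.

```typescript
// Hypothetical helper, not taken from the PR. Assumes POSIX-style paths.
function mirroredOutDir(scanRoot: string, workspaceRoot: string, out: string): string {
  const stripTrailingSlash = (p: string): string => p.replace(/\/+$/, '')
  const root = stripTrailingSlash(scanRoot)
  const ws = stripTrailingSlash(workspaceRoot)
  const outDir = stripTrailingSlash(out)
  // The root workspace writes exactly where the single-workspace flow did.
  if (ws === root) {
    return outDir
  }
  if (!ws.startsWith(`${root}/`)) {
    throw new Error(`${workspaceRoot} is not under ${scanRoot}`)
  }
  // A nested workspace mirrors its path relative to the scan root under `out`.
  const relative = ws.slice(root.length + 1)
  return `${outDir}/${relative}`
}
```

Under these assumptions, a workspace at `/repo/mobile` with scan root `/repo` and output directory `/repo/.socket` maps to `/repo/.socket/mobile`, while the root workspace keeps `/repo/.socket`.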
Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 20fab7c. Configure here.
| // Only flag "no ecosystem" when EVERY workspace reported it; a single | ||
| // workspace with Maven repos means the ecosystem is present. | ||
| noEcosystemFound: anyEcosystemFound ? undefined : true, | ||
| ok: anyOk, |
manifestPath not scan root
Medium Severity
The extractBazelToMaven function's manifestPath return value is intended for the scan-root workspace, but it's assigned the first manifest found in discovery order. If the root workspace fails to produce a manifest, manifestPath can incorrectly point to a nested workspace's manifest. This contradicts its documented purpose and can mislead callers, especially when the overall operation is marked as successful.
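One way to address this finding is to take `manifestPath` only from the scan-root workspace's own result, rather than from the first manifest in discovery order. The sketch below is a hypothetical illustration: the `WorkspaceResult` shape, the `'.'` convention for the scan root, and the name `aggregateSketch` are assumptions, not the PR's types.

```typescript
// Hypothetical per-workspace result; '.' denotes the scan root (assumption).
interface WorkspaceResult {
  workspaceRoot: string
  ok: boolean
  manifestPath?: string
}

interface AggregateResult {
  ok: boolean
  manifestPath: string | undefined
  manifestPaths: string[]
}

function aggregateSketch(results: WorkspaceResult[]): AggregateResult {
  // Every manifest that was actually written, in discovery order.
  const manifestPaths = results
    .map(r => r.manifestPath)
    .filter((p): p is string => p !== undefined)
  // Only the scan-root workspace may populate manifestPath.
  const rootResult = results.find(r => r.workspaceRoot === '.')
  return {
    ok: results.some(r => r.ok),
    manifestPath: rootResult?.manifestPath,
    manifestPaths,
  }
}
```

With this shape, a failed root workspace leaves `manifestPath` undefined even when a nested workspace produced a manifest, so callers cannot mistake a nested manifest for the root one.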
| manifestPaths: [], | ||
| noEcosystemFound: true, | ||
| ok: false, | ||
| } |
Missing scan dir misclassified
Medium Severity
When the scan directory does not exist, findWorkspaceRoots returns no roots and extractBazelToMaven exits with noEcosystemFound: true. Previously the extractor still ran and detectWorkspaceMode failed with a hard extraction error (ok: false without noEcosystemFound), which changes CLI outcome messaging and failure classification.
| // one manifest under `opts.out`. This is the original single-workspace | ||
| // algorithm, behavior-unchanged; the exported `extractBazelToMaven` wraps it | ||
| // to run once per discovered (sub-)workspace. | ||
| async function extractOneWorkspace( |
Relative output base per workspace
Medium Severity
validateOutputBase resolves a relative --bazel-output-base against each workspace’s cwd. With multiple discovered roots, the same flag is validated (and possibly created) under different per-workspace paths instead of once relative to the scan root.
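A possible remedy is to resolve the flag once against the scan root and hand the same absolute path to every workspace. The sketch below is a simplified illustration under stated assumptions: `resolveOutputBaseOnce` is a hypothetical name, paths are POSIX-style, and `..` segments are not normalized. Real code would more likely use Node's `path.resolve`.

```typescript
// Hypothetical helper: resolve a possibly relative output base against the scan root.
function resolveOutputBaseOnce(scanRoot: string, outputBase: string): string {
  // Absolute paths are used as given.
  if (outputBase.startsWith('/')) {
    return outputBase
  }
  const root = scanRoot.replace(/\/+$/, '')
  const relative = outputBase.replace(/^\.\//, '')
  return `${root}/${relative}`
}

// Resolve once, then reuse the same value for every discovered workspace.
const discoveredRoots = ['/repo', '/repo/mobile']
const sharedOutputBase = resolveOutputBaseOnce('/repo', './bazel-cache')
const perWorkspaceOptions = discoveredRoots.map(cwd => ({
  cwd,
  outputBase: sharedOutputBase,
}))
```

Because the resolution happens before the per-workspace loop, every workspace validates the same directory instead of a different one under its own cwd.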
Closing as redundant: the sub-workspace discovery (and the broader Bazel-native Maven extraction rework) this PR was the first slice of has already landed on


What
Adds nested sub-workspace discovery to socket manifest bazel. Today the command extracts Maven dependencies from a single workspace rooted at the scan directory; a repo that nests additional Bazel workspaces (e.g. a mobile/ sub-tree with its own MODULE.bazel) only ever gets its top-level workspace scanned, so the nested workspace's dependencies are missed.
This change walks the directory tree, finds every directory containing a workspace marker (MODULE.bazel / WORKSPACE / WORKSPACE.bazel), and runs the existing single-workspace extractor once per discovered root, writing one manifest per workspace at a path mirroring its location.
How
bazel-workspace-walk.mts: findWorkspaceRoots(), a deterministic, budget-bounded tree walk that prunes VCS / node_modules / bazel-* output symlink directories and caps the number of roots.
extract_bazel_to_maven.mts: the previous single-workspace body is now extractOneWorkspace() (behavior unchanged); a thin wrapper walks the roots and runs it per workspace, aggregating results.
src/utils/glob.mts: export the existing IGNORED_DIRS so the walker reuses the repo-wide ignore set instead of duplicating it.
Back-compat
For the common single-workspace case (only the root is found) the output path,
return shape, and status are unchanged — verified by test.
Scope
This is the first of a planned series that decomposes a larger reworking of the
Bazel JVM extraction pipeline into reviewable pieces. It intentionally layers on
the existing extraction mechanism; the metadata-cquery extraction swap, the bazel mod show_extension discovery rewrite, and an honest partial/complete status model are separate follow-up PRs.
Known limitation (tracked for the completeness follow-up)
When multiple workspaces are discovered and one fails extraction while another
succeeds, the aggregate currently reports success (the per-workspace failure is
logged but not surfaced as an overall failure). The follow-up that introduces
the partial/complete status model will make a failed workspace mark the run
partial rather than silently complete.
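As a rough illustration of the gap, the sketch below contrasts today's any-success aggregation with one possible shape for a partial/complete status. It is purely hypothetical: the follow-up's real model is not shown in this PR, and the `RunStatus` type, the `'failed'` value, and both function names are assumptions.

```typescript
// Hypothetical status type; only "partial" and "complete" are named in the PR text.
type RunStatus = 'complete' | 'partial' | 'failed'

// Current behavior as described: success if any workspace succeeded.
function anyOkSketch(okFlags: boolean[]): boolean {
  return okFlags.some(Boolean)
}

// Possible follow-up behavior: a single failed workspace downgrades the run to partial.
function runStatusSketch(okFlags: boolean[]): RunStatus {
  if (okFlags.length > 0 && okFlags.every(Boolean)) {
    return 'complete'
  }
  return okFlags.some(Boolean) ? 'partial' : 'failed'
}
```

For one failed and one successful workspace, `anyOkSketch` returns `true`, whereas `runStatusSketch` returns `'partial'`, which is the distinction the limitation above describes.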
Testing
pnpm check (lint + type-check): clean.
orchestrator integration (per-workspace extraction, mirrored manifest paths, single-workspace back-compat).
findWorkspaceRoots against a real nested-workspace repo: it discovers the nested mobile/ workspace the single-workspace path missed.
Note
Medium Risk
Changes how Bazel repos are scanned and can miss nested workspaces if walk budgets cap discovery; aggregate ok can stay true when one workspace fails if another succeeds (noted follow-up). Core extraction per workspace is unchanged.
Overview
Bazel Maven extraction now discovers every workspace under the scan root (directories with MODULE.bazel, WORKSPACE, or WORKSPACE.bazel) and runs the existing single-workspace flow once per root, instead of only treating the scan directory as one workspace.
A new findWorkspaceRoots walker performs a deterministic tree walk with caller-supplied prune rules (IGNORED_DIRS plus VCS/IDE dirs and bazel-* prefixes), no depth limit, and guards via a visited-directory budget and a 16-root cap; both log via logger.warn when they truncate.
extractBazelToMaven becomes a thin orchestrator around unchanged extractOneWorkspace logic; nested workspaces write manifests under out paths that mirror their relative location (root output unchanged for the single-workspace case). Results add manifestPaths while keeping manifestPath as the first root for compatibility.
IGNORED_DIRS is exported from glob.mts so the Bazel walker reuses the shared ignore list. Unit tests cover the walker and multi-workspace orchestration.