
feat(manifest): discover and scan nested Bazel sub-workspaces #1363

Closed
Simon (simonhj) wants to merge 3 commits into v1.x from simon/bazel-subworkspace-pr1

@simonhj Simon (simonhj) commented Jun 15, 2026

What

Adds nested sub-workspace discovery to socket manifest bazel. Today the
command extracts Maven dependencies from a single workspace rooted at the scan
directory; a repo that nests additional Bazel workspaces (e.g. a mobile/
sub-tree with its own MODULE.bazel) only ever gets its top-level workspace
scanned, so the nested workspace's dependencies are missed.

This change walks the directory tree, finds every directory containing a
workspace marker (MODULE.bazel / WORKSPACE / WORKSPACE.bazel), and runs
the existing single-workspace extractor once per discovered root, writing one
manifest per workspace at a path mirroring its location.
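
As a rough illustration of what "a path mirroring its location" could mean, the sketch below derives a per-workspace output directory from the scan root. The helper name `mirroredOutDir`, its parameters, and the example paths are hypothetical and not taken from this PR; POSIX-style absolute paths are assumed.

```typescript
import { posix as path } from 'node:path'

// Hypothetical helper (not from the PR): place a workspace's output under
// `outDir` at the same relative location the workspace has under `scanRoot`.
function mirroredOutDir(
  scanRoot: string,
  workspaceRoot: string,
  outDir: string,
): string {
  const rel = path.relative(scanRoot, workspaceRoot)
  // Refuse workspace roots that are not inside the scan root.
  if (rel === '..' || rel.startsWith('../') || path.isAbsolute(rel)) {
    throw new Error(`workspace ${workspaceRoot} is not under ${scanRoot}`)
  }
  // `rel` is '' for the scan root itself, so the root output stays at `outDir`.
  return path.join(outDir, rel)
}

// Example paths are made up for illustration.
const rootOut = mirroredOutDir('/repo', '/repo', '/tmp/out')
const nestedOut = mirroredOutDir('/repo', '/repo/mobile', '/tmp/out')
```

Because the relative path of the scan root is empty, the single-workspace case resolves to the same output directory as before, which is the property the back-compat claim relies on.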

How

  • New bazel-workspace-walk.mts: findWorkspaceRoots() — a deterministic,
    budget-bounded tree walk that prunes VCS / node_modules / bazel-* output
    symlink directories and caps the number of roots.
  • extract_bazel_to_maven.mts: the previous single-workspace body is now
    extractOneWorkspace() (behavior unchanged); a thin wrapper walks the roots
    and runs it per workspace, aggregating results.
  • src/utils/glob.mts: export the existing IGNORED_DIRS so the walker reuses
    the repo-wide ignore set instead of duplicating it.
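
The walk described in the first bullet can be sketched roughly as follows. This is not the PR's `findWorkspaceRoots`: the function name `findWorkspaceRootsSketch`, the option names, the injected `listDir` callback, and the limits used in the example are all assumptions. Injecting the directory lister keeps the sketch independent of the filesystem; a real implementation would presumably read directories from disk.

```typescript
// Hypothetical sketch: names, option shapes, and limits are assumptions.
type DirEntry = { name: string; isDirectory: boolean }
// Injected directory lister, so the walk can run against an in-memory tree.
type ListDir = (dir: string) => DirEntry[]

interface WalkOptions {
  ignoreDirNames: ReadonlySet<string>
  ignoreDirPrefixes: readonly string[]
  maxVisitedDirs: number
  maxRoots: number
  warn: (message: string) => void
}

const WORKSPACE_MARKERS: ReadonlySet<string> = new Set([
  'MODULE.bazel',
  'WORKSPACE',
  'WORKSPACE.bazel',
])

function findWorkspaceRootsSketch(
  scanRoot: string,
  listDir: ListDir,
  opts: WalkOptions,
): string[] {
  const roots: string[] = []
  const stack: string[] = [scanRoot]
  let visited = 0
  while (stack.length > 0) {
    if (visited >= opts.maxVisitedDirs) {
      opts.warn(`directory budget of ${opts.maxVisitedDirs} reached; discovery truncated`)
      break
    }
    const dir = stack.pop() as string
    visited += 1
    let entries: DirEntry[]
    try {
      entries = listDir(dir)
    } catch {
      continue // unreadable directory: skip it
    }
    if (entries.some(e => !e.isDirectory && WORKSPACE_MARKERS.has(e.name))) {
      if (roots.length >= opts.maxRoots) {
        opts.warn(`root cap of ${opts.maxRoots} reached; discovery truncated`)
        break
      }
      roots.push(dir)
    }
    const subdirs = entries
      .filter(e => e.isDirectory)
      .map(e => e.name)
      .filter(
        n =>
          !opts.ignoreDirNames.has(n) &&
          !opts.ignoreDirPrefixes.some(p => n.startsWith(p)),
      )
      .sort()
    // Push in reverse so the stack pops children in ascending name order.
    for (const name of subdirs.reverse()) {
      stack.push(`${dir}/${name}`)
    }
  }
  return roots
}

// In-memory example tree (made up for illustration).
const file = (name: string): DirEntry => ({ name, isDirectory: false })
const folder = (name: string): DirEntry => ({ name, isDirectory: true })
const tree: Record<string, DirEntry[]> = {
  '/repo': [file('MODULE.bazel'), folder('mobile'), folder('node_modules'), folder('bazel-out'), folder('apps')],
  '/repo/apps': [folder('web')],
  '/repo/apps/web': [file('WORKSPACE.bazel')],
  '/repo/bazel-out': [file('WORKSPACE')],
  '/repo/mobile': [file('MODULE.bazel')],
  '/repo/node_modules': [file('WORKSPACE')],
}
const warnings: string[] = []
const found = findWorkspaceRootsSketch('/repo', dir => tree[dir] ?? [], {
  ignoreDirNames: new Set(['.git', 'node_modules']),
  ignoreDirPrefixes: ['bazel-'],
  maxVisitedDirs: 1000,
  maxRoots: 16,
  warn: m => warnings.push(m),
})
```

Sorting child names with the default code-unit comparison, rather than a locale-aware one, is one way to keep the traversal order identical across machines, and visiting the scan root first means it is always the first root reported when it carries a marker.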

Back-compat

For the common single-workspace case (only the root is found) the output path,
return shape, and status are unchanged — verified by test.

Scope

This is the first of a planned series that decomposes a larger reworking of the
Bazel JVM extraction pipeline into reviewable pieces. It intentionally layers on
the existing extraction mechanism; the metadata-cquery extraction swap,
the bazel mod show_extension discovery rewrite, and an honest
partial/complete status model are separate follow-up PRs.

Known limitation (tracked for the completeness follow-up)

When multiple workspaces are discovered and one fails extraction while another
succeeds, the aggregate currently reports success (the per-workspace failure is
logged but not surfaced as an overall failure). The follow-up that introduces
the partial/complete status model will make a failed workspace mark the run
partial rather than silently complete.
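
To make the limitation concrete, the sketch below contrasts an any-success aggregate with one possible three-state status. The `WorkspaceResult` shape, the function names, and the `'complete' | 'partial' | 'failed'` values are assumptions for illustration only; the actual status model is left to the follow-up PR.

```typescript
// Hypothetical per-workspace result; the real shape in the PR may differ.
interface WorkspaceResult {
  root: string
  ok: boolean
}

type RunStatus = 'complete' | 'partial' | 'failed'

// Behavior described above: a single successful workspace makes the run "ok".
function aggregateOk(results: readonly WorkspaceResult[]): boolean {
  return results.some(r => r.ok)
}

// One possible stricter model (an assumption, not the follow-up's design).
function aggregateStatus(results: readonly WorkspaceResult[]): RunStatus {
  const okCount = results.filter(r => r.ok).length
  if (okCount === 0) {
    return 'failed'
  }
  return okCount === results.length ? 'complete' : 'partial'
}

const mixed: WorkspaceResult[] = [
  { root: '.', ok: true },
  { root: 'mobile', ok: false },
]
const mixedOk = aggregateOk(mixed)
const mixedStatus = aggregateStatus(mixed)
```

With one failing and one succeeding workspace, the boolean aggregate hides the failure while the three-state variant reports it as partial.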

Testing

  • pnpm check (lint + type-check): clean.
  • New unit tests: walker behavior (prune, nested discovery, caps) and
    orchestrator integration (per-workspace extraction, mirrored manifest paths,
    single-workspace back-compat).
  • Validated findWorkspaceRoots against a real nested-workspace repo: it
    discovers the nested mobile/ workspace the single-workspace path missed.

Note

Medium Risk
Changes how Bazel repos are scanned and can miss nested workspaces if walk budgets cap discovery; aggregate ok can stay true when one workspace fails if another succeeds (noted follow-up). Core extraction per workspace is unchanged.

Overview
Bazel Maven extraction now discovers every workspace under the scan root (directories with MODULE.bazel, WORKSPACE, or WORKSPACE.bazel) and runs the existing single-workspace flow once per root, instead of only treating the scan directory as one workspace.

A new findWorkspaceRoots walker performs a deterministic tree walk with caller-supplied prune rules (IGNORED_DIRS plus VCS/IDE dirs and bazel-* prefixes), no depth limit, and guards via a visited-directory budget and a 16-root cap—both log logger.warn when they truncate. extractBazelToMaven becomes a thin orchestrator around unchanged extractOneWorkspace logic; nested workspaces write manifests under out paths that mirror their relative location (root output unchanged for the single-workspace case). Results add manifestPaths while keeping manifestPath as the first root for compatibility.

IGNORED_DIRS is exported from glob.mts so the Bazel walker reuses the shared ignore list. Unit tests cover the walker and multi-workspace orchestration.

Reviewed by Cursor Bugbot for commit 20fab7c. Configure here.

Add findWorkspaceRoots, a directory-tree walk that returns every directory
containing a Bazel workspace marker (MODULE.bazel / WORKSPACE /
WORKSPACE.bazel). The walk takes an injected directory-prune policy
(ignoreDirNames / ignoreDirPrefixes), traverses in deterministic sorted
order under a visited-directory budget, and caps the number of roots,
warning unconditionally when the cap or budget truncates discovery.

This is the discovery primitive for scanning nested sub-workspaces; it has
no coupling to the extraction pipeline.

Extraction previously scanned only the single workspace rooted at the scan
cwd, so a monorepo with nested workspaces (e.g. a mobile/MODULE.bazel under
the root) had its sub-workspaces silently ignored.

Rename the single-workspace logic to extractOneWorkspace (behavior
unchanged) and wrap it in a new extractBazelToMaven that discovers every
workspace root via findWorkspaceRoots and runs extraction once per root.
Each nested workspace writes its manifest at a path mirroring its location
relative to the scan root; the root workspace writes exactly where it did
before, so single-workspace output is byte-for-byte unchanged. The result
gains an additive manifestPaths array (root first); manifestPath is
retained as its first element.

Export IGNORED_DIRS from the glob utils so the Bazel default prune policy
composes it rather than duplicating the list.

@simonhj Simon (simonhj) marked this pull request as draft June 15, 2026 08:02

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using high effort and found 3 potential issues.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.

// Only flag "no ecosystem" when EVERY workspace reported it; a single
// workspace with Maven repos means the ecosystem is present.
noEcosystemFound: anyEcosystemFound ? undefined : true,
ok: anyOk,

manifestPath not scan root

Medium Severity

The extractBazelToMaven function's manifestPath return value is intended for the scan-root workspace, but it's assigned the first manifest found in discovery order. If the root workspace fails to produce a manifest, manifestPath can incorrectly point to a nested workspace's manifest. This contradicts its documented purpose and can mislead callers, especially when the overall operation is marked as successful.
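
One way the concern could be addressed, sketched under assumptions: take `manifestPath` only from the result whose workspace root equals the scan root, and leave it undefined otherwise. The `WorkspaceManifest` shape, the `summarizeManifests` name, and the example paths are hypothetical, not the PR's types or output locations.

```typescript
// Hypothetical per-workspace outcome; field names are assumptions.
interface WorkspaceManifest {
  workspaceRoot: string
  manifestPath?: string
}

function summarizeManifests(
  scanRoot: string,
  results: readonly WorkspaceManifest[],
): { manifestPath: string | undefined; manifestPaths: string[] } {
  // Collect every manifest that was actually written, in discovery order.
  const manifestPaths: string[] = []
  for (const r of results) {
    if (r.manifestPath !== undefined) {
      manifestPaths.push(r.manifestPath)
    }
  }
  // Only the scan-root workspace may populate `manifestPath`.
  const rootResult = results.find(r => r.workspaceRoot === scanRoot)
  return { manifestPath: rootResult?.manifestPath, manifestPaths }
}

// Root workspace failed (no manifest); the nested one succeeded.
const summary = summarizeManifests('/repo', [
  { workspaceRoot: '/repo' },
  { workspaceRoot: '/repo/mobile', manifestPath: '/tmp/out/mobile/manifest' },
])
```

In this example `summary.manifestPath` stays undefined instead of pointing at the nested workspace's manifest, while `summary.manifestPaths` still lists everything that was written.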

manifestPaths: [],
noEcosystemFound: true,
ok: false,
}

Missing scan dir misclassified

Medium Severity

When the scan directory does not exist, findWorkspaceRoots returns no roots and extractBazelToMaven exits with noEcosystemFound: true. Previously the extractor still ran and detectWorkspaceMode failed with a hard extraction error (ok: false without noEcosystemFound), which changes CLI outcome messaging and failure classification.
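
A minimal sketch of how the two cases might be kept apart, assuming the caller checks for the scan directory before interpreting an empty root list. The `DiscoveryOutcome` type and `classifyDiscovery` function are hypothetical names introduced here, not part of the PR.

```typescript
// Hypothetical outcome type; the PR's result shape is different.
type DiscoveryOutcome =
  | { kind: 'extract'; roots: string[] }
  | { kind: 'no-ecosystem' }
  | { kind: 'error'; message: string }

function classifyDiscovery(
  scanDir: string,
  scanDirExists: boolean,
  roots: string[],
): DiscoveryOutcome {
  // A missing scan directory is a hard error, not "no ecosystem found".
  if (!scanDirExists) {
    return { kind: 'error', message: `scan directory does not exist: ${scanDir}` }
  }
  // The directory exists but contains no workspace markers.
  if (roots.length === 0) {
    return { kind: 'no-ecosystem' }
  }
  return { kind: 'extract', roots }
}

const missingDir = classifyDiscovery('/does/not/exist', false, [])
const emptyRepo = classifyDiscovery('/repo', true, [])
```

Checking existence first preserves the earlier failure classification for a bad path while still allowing an empty repository to report that no ecosystem was found.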

// one manifest under `opts.out`. This is the original single-workspace
// algorithm, behavior-unchanged; the exported `extractBazelToMaven` wraps it
// to run once per discovered (sub-)workspace.
async function extractOneWorkspace(

Relative output base per workspace

Medium Severity

validateOutputBase resolves a relative --bazel-output-base against each workspace’s cwd. With multiple discovered roots, the same flag is validated (and possibly created) under different per-workspace paths instead of once relative to the scan root.
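
One possible remedy, sketched under assumptions, is to resolve the flag value against the scan root once and hand the resulting absolute path to every workspace. `resolveOutputBase` is a hypothetical helper, not the PR's `validateOutputBase`, and the example paths are made up; POSIX-style paths and an absolute scan root are assumed.

```typescript
import { posix as path } from 'node:path'

// Hypothetical helper: resolve a user-supplied output base against the scan
// root. An already-absolute value is returned as-is (normalized).
function resolveOutputBase(scanRoot: string, outputBase: string): string {
  return path.resolve(scanRoot, outputBase)
}

const resolvedBase = resolveOutputBase('/repo', 'bazel-output-base')

// Resolving the absolute result again from any workspace cwd is a no-op,
// so every workspace sees the same directory.
const perWorkspace = ['/repo', '/repo/mobile'].map(cwd =>
  path.resolve(cwd, resolvedBase),
)
```

Resolving before the per-workspace loop means any validation or directory creation happens for a single location rather than once per discovered root.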

@simonhj (Author)

Closing as redundant: the sub-workspace discovery (and the broader Bazel-native Maven extraction rework) this PR was the first slice of has already landed on v1.x via #1342. The remaining, unmerged value from this line of work is a set of correctness hardening fixes (completeness signal, committed-lockfile gating, show_extension failure classification, cquery timeout detection) which I'm opening as a separate PR against current v1.x.
