fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550
Open
fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550
Conversation
Please ensure your pull request adheres to the following guidelines: - [ ] make sure to link the related issues in this description - [ ] when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes ## Related Issues Thanks for contributing!
PR #1413 added --recurse-submodules to clone but missed pull, so incremental updates of repos with submodules (e.g. Okta's okta-digital-aem-sites) did not fetch submodule changes. Now pull uses the same org-scoped extraheader auth as clone — submodules on the same customer org inherit credentials and are updated to the SHA pinned in the parent's tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional boolean `hasSubmodules` on `site.code` so the import worker can record whether the cloned repo contains a .gitmodules file. Downstream consumers (autofix, suggestion generation) use this flag to decide whether to parse .gitmodules and route patches/PRs across parent + submodule repos. Additive, unset by default — no behaviour change for existing sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dmaurya929
approved these changes
Apr 21, 2026
Replaces the flat boolean with a nested metadata object that carries
the full submodule URL list and an external-vs-internal classification.
Consumers that only need presence check `metadata?.submodules?.urls?.length`;
those that need the URL list or external flag read the same object.
Shape:
code.metadata = {
submodules: {
external: boolean, // true if any URL points off the parent's host
urls: string[] // verbatim from .gitmodules, creds stripped
}
}
// `submodules` is omitted when the repo has no .gitmodules
The importer always overwrites `metadata` on each import so a re-import
that finds no submodules clears stale entries from an earlier import.
Adds CodeMetadata + SubmodulesMetadata TypeScript interfaces and
matching unit tests. Adds a JSDoc doc-comment on the `code` attribute
summarising the contract.
No migration needed — hasSubmodules was unreleased (shipped on this
branch but not yet merged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
Background: CM BYOG repos are proxied by the CM repo service at
`{cmRepoUrl}/api/program/{pid}/repository/{numericId}.git`. When a
customer's .gitmodules uses relative URLs (`../foo`) — which is the
common case — git resolves them against the parent's clone URL and
produces `.../repository/foo`. The proxy rejects those because it only
serves numeric repository IDs, breaking submodule clone end-to-end.
Fix: a new `cmProgramRepos` map on `site.code.metadata.submodules`
that maps each CM repo's real name (last path segment of its
`url`/`proxyUrl`, NOT the `repo` display field) to its numeric id.
The cm-client reads this at clone/pull time to rewrite name-based
submodule URLs in .git/config to the numeric-id form before running
`git submodule update`. Populated at onboarding via a single call to
`GET /api/program/{pid}/repositories`; the importer doesn't have CM
Management API access so it can't derive the map itself.
cm-client changes:
- Export `isBYOG(repoType)` helper (anything !== STANDARD).
- Split clone() by type:
* STANDARD: `clone --recurse-submodules` + (on ref switch) sync +
update --init --recursive. Unchanged behaviour.
* BYOG: `clone --no-recurse-submodules` + checkout ref + new
`#resolveByogSubmodules` helper that runs submodule init,
rewrites URLs via cmProgramRepos, then submodule update
--recursive with the CM repo service extraheaders re-applied.
- Same split for pull(): STANDARD keeps `pull --recurse-submodules`,
BYOG pulls the parent only and then runs the same rewrite pass to
pick up new submodules introduced by the pulled commits.
- `#resolveByogSubmodules` is idempotent and explicitly never calls
`git submodule sync` — sync would copy .gitmodules URLs back into
.git/config and undo the rewrite. Heavily commented.
- Graceful fallbacks: no .gitmodules (skip), no map (warn, let the
subsequent update fail naturally — parent clone is still usable),
submodule name not in map (leave URL alone).
Schema / types:
- `SubmodulesMetadata.cmProgramRepos?: Record<string, string>` with
a thorough TS doc comment explaining the lifecycle, who writes it,
and why standard repos don't need it.
- site.schema.js JSDoc distinguishes importer-written fields from
onboarding-written fields.
Testing: validated end-to-end against eni-plenitude (program 123722,
parent repo 233725, 12 relative submodules on Azure DevOps BYOG) —
all submodules cloned at their pinned SHAs via the CM proxy. Unit
tests: 79 passing, 100% coverage (up from 70).
TODO follow-ups tracked elsewhere:
- Ask CM to support name-based routes on the repo proxy (would
remove the need for this map entirely).
- Consider exposing customer-native URLs + stored PAT so we could
bypass the proxy for BYOG altogether.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the runtime name-lookup logic with a flat, onboarding-populated
`submoduleMap` array of `{path, url}` entries. The cm-client iterates
the list mechanically and writes each entry into `.git/config` before
running `submodule update`. All name disambiguation, URL classification,
and collision handling is done at onboarding; the runtime is purely
mechanical and free of fuzzy matching.
This unblocks two previously unsupported cases:
- Mixed BYOG + standard submodules under a single BYOG parent. The
new `#buildSubmoduleAuthArgs` helper attaches multiple
`http.<scope>.extraheader` args on a single `git submodule update`
invocation: a Bearer-based scope on the CM proxy host for BYOG
submodules, and a Basic-auth scope on `git.cloudmanager.adobe.com/
{orgName}/` for standard submodules. Git applies each scope's
headers only when the outgoing URL prefix matches.
- Stale parent gitlinks (where the parent's pinned commit SHAs do
not exist in the submodule's current commit graph, e.g. after a
submodule is migrated between BYOG and standard or its history
is rewritten). `git submodule update --force --recursive` resets
the working tree to whatever HEAD points at after fetch instead
of failing silently with an empty working tree.
Other notable changes:
- New exported constant `GIT_CLOUD_MANAGER_HOST` (= `git.cloudmanager.
adobe.com`) so the standard-host literal is centralised.
- `clone()` and `pull()` accept `submoduleMap` instead of the previous
`cmProgramRepos` map. The legacy field is unused and removed.
- `data-access` schema: `SubmodulesMetadata.submoduleMap` documented
in `index.d.ts` with the runtime contract, scope-derivation rules,
and the URL-form decision (proxy vs standard).
- cm-client `README.md` gains a "Submodule handling" section and a
"Known limitations" section calling out cases that aren't supported
by this PR (out-of-program submodules, broken BYOG connections,
cross-org standard submodules, third-party hosts, recursive
depth > 1, map drift). Each is documented as out-of-scope and a
candidate for a follow-up PR.
Tests: 80 passing, 100% coverage on the cm-client. 1920 passing on
data-access.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workspace's eslint config enforces the `curly` rule unconditionally. The local checkout's eslint had been more permissive on a single-statement `if` body, so the issue slipped through. Fix brings the inner `if (orgName) orgs.add(orgName)` into block form so CI lint passes. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full git-submodule support for both standard and BYOG Cloud Manager repos, plus the data-access schema fields downstream consumers need to drive the runtime.
The branch sits on top of
mainand contains five logical commits, each independently reviewable:fix: clone functionality to include submodules (#1413)— cherry-pick of the original submodule-clone fix that was reverted in Revert "fix: clone functionality to include submodules (#1413)" #1549 because the Lambdagit-layerwas missing the submodule helper binaries. Withspacecat-git-layer:4now carryinggit-submoduleandgit-submodule--helper(see adobe/spacecat-infrastructure#470), this can safely go back in. The commit also carries the per-operation git timeout bump that landed alongside the original fix:GIT_OPERATION_TIMEOUT_MSdefaults to600_000ms (10 min, up from 120s) and is now overridable via env var — recursive clones traverse multiple submodule repos and the old 2-minute budget was too tight.fix: propagate --recurse-submodules to incremental pull— the original PR patchedclonebut notpull. The incremental-update path usespulland was still skipping submodules.feat: add hasSubmodules field to site.code— initial schema work. Superseded in shape by commits 4 and 5 below; included for review continuity.refactor: replace code.hasSubmodules with code.metadata.submodules— moves submodule metadata into a nestedmetadata.submodules: { external, urls }object so additional fields can be added without further schema churn.refactor: precomputed submoduleMap for BYOG submodule clones— the substantive change. Replaces the previous runtime name-lookup logic with a flat, onboarding-populatedsubmoduleMaparray of{ path, url }entries. The cm-client iterates it mechanically and writes each entry into.git/configbefore runningsubmodule update. All name disambiguation, URL classification, and collision handling is done at onboarding; the runtime is purely mechanical.What's new in commit 5 (the meat of the change)
Submodule pipeline split by parent repo type
git clone --recurse-submodulesandpull --recurse-submoduleswork natively..gitmodulesURLs resolve to the same host as the parent (git.cloudmanager.adobe.com/{orgName}/...); the existing org-scoped Basic-auth extraheader covers every submodule transparently. No additional configuration needed.--no-recurse-submodulesat clone time, followed by a second pass driven bysubmoduleMap. The CM repo service proxy can't serve relative or SSH.gitmodulesURLs (they resolve to name-based paths the proxy rejects), so we rewrite each submodule's URL in.git/configto a CM-reachable form before runningsubmodule update..gitmodulesitself is never modified — the working tree stays clean.submoduleMapdata contractEach entry says: "for the submodule at
<path>in.gitmodules, write<url>into.git/config submodule.<path>.url." The runtime decides which auth scope to apply by parsing eachurl's host — proxy URLs get the BYOG triple (Bearer + x-api-key + x-gw-ims-org-id),git.cloudmanager.adobe.comURLs get a Basic header fromCM_STANDARD_REPO_CREDENTIALS[programId]scoped to{origin}/{orgName}/. Both scopes coexist on a singlegit submodule updateinvocation.Why this is one map driven by URLs (not branched by repo type at runtime)
By baking the URL form into the map at onboarding, the runtime has no need to know what
typethe underlying repo is — the URL is the source of truth for both routing AND auth-scope selection. This handles every BYOG configuration we've encountered:submodule updateinvocation; git applies the right headers per-URL by prefix matching.Other additions
GIT_CLOUD_MANAGER_HOST(=git.cloudmanager.adobe.com) so the standard-host literal is centralised.#buildSubmoduleAuthArgsthat walks the map once, collects unique URL hosts, and emits one-c http.<scope>.extraheader=...per scope (deduplicated).git submodule update --force --recursiveis used unconditionally on the BYOG path.--forceresets the submodule's working tree to whatever HEAD points at after fetch — required when the parent's pinned commit SHAs aren't reachable in the submodule's current commit graph (a real configuration we've seen, wheresubmodule updatewould otherwise leave the working tree empty).cmProgramRepos(the previous, runtime-fuzzy-matched name→id map field) is removed throughout. It was unreleased; no production data exists in that shape.Files changed
packages/spacecat-shared-cloud-manager-client/src/index.jsGIT_CLOUD_MANAGER_HOSTconstant;clone()/pull()acceptsubmoduleMap; new#buildSubmoduleAuthArgs;#resolveByogSubmodulesrewritten to iteratesubmoduleMapand runsubmodule update --force --recursivepackages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.jspackages/spacecat-shared-cloud-manager-client/README.mdpackages/spacecat-shared-data-access/src/models/site/index.d.tsSubmodulesMetadata.submoduleMap?: Array<{path, url}>with full JSDoc on URL forms and runtime auth-scope derivationpackages/spacecat-shared-data-access/src/models/site/site.schema.jscodeattribute documents the new fieldpackages/spacecat-shared-data-access/test/unit/models/site/site.model.test.jsmetadata.submodules.submoduleMapstorage + cross-import preservationKnown limitations (tracked in cm-client
README.md)The submodule pipeline doesn't cover every BYOG configuration. The cases below are documented as out-of-scope for this PR and would each land as a separate follow-up:
submoduleMapnot yet populated → parent clone preserved, warning logged, submodule fetches fail (expected until onboarding populates the map)..gitmodulesreferences repos not onboarded to CM at all).status: readyin API but proxy 404s).--force; documented trade-off is dirty-pointer state ingit status).{programId, orgName}cred shape).submoduleMapdrift between imports.Test plan
--recurse-submodulesfor a standard parent — verified--no-recurse-submodules+submoduleMaprewrite for a BYOG parent (single-scope and dual-scope cases) using the updated git Lambda layer:4— verified end-to-endRelated
metadata.submodules.submoduleMapgit-submodule+git-submodule--helperintospacecat-git-layer:4🤖 Generated with Claude Code