diff --git a/packages/spacecat-shared-cloud-manager-client/README.md b/packages/spacecat-shared-cloud-manager-client/README.md index 514201b37..a698f4389 100644 --- a/packages/spacecat-shared-cloud-manager-client/README.md +++ b/packages/spacecat-shared-cloud-manager-client/README.md @@ -197,8 +197,8 @@ await client.cleanup(clonePath); ## API overview -- **`clone(programId, repositoryId, config)`** – Clone repo to a unique temp directory. Config: `{ imsOrgId, repoType, repoUrl, ref }`. Optional `ref` checks out a specific branch/tag after clone (failure to checkout does not fail the clone). -- **`pull(clonePath, programId, repositoryId, config)`** – Pull latest changes into an existing clone. Config: `{ imsOrgId, repoType, repoUrl, ref }`. Optional `ref` checks out the branch before pulling. +- **`clone(programId, repositoryId, config)`** – Clone repo to a unique temp directory. Config: `{ imsOrgId, repoType, repoUrl, ref, submoduleMap }`. Optional `ref` checks out a specific branch/tag after clone (failure to checkout does not fail the clone). Optional `submoduleMap` is BYOG-only and drives the submodule rewrite — see "Submodule handling" below. +- **`pull(clonePath, programId, repositoryId, config)`** – Pull latest changes into an existing clone. Config: `{ imsOrgId, repoType, repoUrl, ref, submoduleMap }`. Optional `ref` checks out the branch before pulling. `submoduleMap` follows the same shape as `clone()`. - **`push(clonePath, programId, repositoryId, config)`** – Push a ref to the remote. Config: `{ imsOrgId, repoType, repoUrl, ref }`. The `ref` is **required** and specifies the branch to push. - **`checkout(clonePath, ref)`** – Checkout a specific git ref (branch, tag, or SHA) in an existing clone. Unlike the optional checkout in `clone()`, this throws on failure. - **`zipRepository(clonePath)`** – Zip the clone (including `.git` history) and return a Buffer. 
@@ -215,15 +215,128 @@ await client.cleanup(clonePath); Repository type constants (for use when passing `repoType` or checking repo type): ```js -import CloudManagerClient, { CM_REPO_TYPE } from '@adobe/spacecat-shared-cloud-manager-client'; +import CloudManagerClient, { + CM_REPO_TYPE, + GIT_CLOUD_MANAGER_HOST, + isBYOG, +} from '@adobe/spacecat-shared-cloud-manager-client'; -// CM_REPO_TYPE.GITHUB → 'github' +// CM_REPO_TYPE.GITHUB → 'github' // CM_REPO_TYPE.BITBUCKET → 'bitbucket' // CM_REPO_TYPE.GITLAB → 'gitlab' // CM_REPO_TYPE.AZURE_DEVOPS → 'azure_devops' // CM_REPO_TYPE.STANDARD → 'standard' + +// GIT_CLOUD_MANAGER_HOST → 'git.cloudmanager.adobe.com' +// Host that serves Cloud Manager standard-type repositories. Used at submodule +// rewrite time to recognise standard URLs in `submoduleMap` and attach the +// corresponding Basic-auth extraheader scope. + +// isBYOG(repoType: string) → boolean +// Returns true for any repo type that tunnels through the CM repo service +// proxy (i.e. anything other than STANDARD). BYOG repos need extra handling +// for submodules — see "Submodule handling" below. +``` + +## Submodule handling + +Both `clone()` and `pull()` populate submodules automatically, but the path differs by parent repo type because the CM repo service proxy only serves repositories by numeric id, while customer `.gitmodules` files reference submodules by name (relative or SSH form). + +### Standard parent + +`git clone --recurse-submodules` (and `pull --recurse-submodules`) just works. Customer `.gitmodules` URLs resolve against the same host as the parent (`git.cloudmanager.adobe.com/{orgName}/...`), and the existing org-scoped Basic-auth extraheader covers every submodule transparently. No additional configuration needed. + +### BYOG parent — `submoduleMap` required + +For BYOG parents the proxy cannot serve relative or SSH `.gitmodules` URLs (they resolve to name-based paths the proxy rejects). 
The client therefore clones with `--no-recurse-submodules` and replays an onboarding-precomputed `submoduleMap` to rewrite each submodule's URL in `.git/config` before running `submodule update --force --recursive`. `.gitmodules` itself is never modified — the working tree stays clean. + +`submoduleMap` is an array of `{ path, url }` entries: + +```js +[ + // BYOG-typed submodule of a BYOG parent (same program) — proxy URL: + { path: 'sub-byog', + url: 'https://cm-repo.example.com/api/program/12345/repository/501.git' }, + + // standard-typed submodule of a BYOG parent — direct CM-served URL: + { path: 'sub-standard', + url: 'https://git.cloudmanager.adobe.com/acme-org/sub-standard/' }, +] ``` +Each entry says: *"for the submodule at `<path>` in `.gitmodules`, write `<url>` into `.git/config submodule.<path>.url` instead of letting git resolve from `.gitmodules`."* The `path` must match the value of `path = …` in `.gitmodules` exactly. The `url` decides which auth scope is applied at fetch time: + +- URLs on the CM proxy host (the `CM_REPO_URL` host) get the BYOG triple — Bearer + `x-api-key` + `x-gw-ims-org-id`. +- URLs on `https://git.cloudmanager.adobe.com/{orgName}/` get a Basic-auth header sourced from `CM_STANDARD_REPO_CREDENTIALS[programId]`, scoped to the org prefix so it covers every standard repo in the same customer org. + +Both scopes coexist on a single `git submodule update` invocation; git applies each scope's headers only when the outgoing URL prefix matches. + +The map is populated at onboarding by walking the parent's `.gitmodules` plus `GET /api/program/{pid}/repositories`, doing all name disambiguation up front so the runtime can iterate mechanically. See `SubmodulesMetadata` in `@adobe/spacecat-shared-data-access` for the full data contract. + +## Known limitations + +The submodule pipeline doesn't cover every BYOG configuration. 
The cases below are documented so callers can detect them and raise warnings to operators — implementing first-class support for any of them is out of scope for the current client and would land as a separate PR. + +### `submoduleMap` not yet populated + +For BYOG parents with submodules, if `submoduleMap` is missing or empty: + +- A warning is logged: *"BYOG program `{programId}` has .gitmodules but no submoduleMap …"* +- `submodule init` runs, then `submodule update --force --recursive` is invoked with the BYOG-only auth scope. +- The submodule fetches will fail (the proxy can't serve the name-based URLs), but the **parent clone is preserved** and downstream code can still operate on the parent. + +Onboarding must populate `site.code.metadata.submodules.submoduleMap` before BYOG submodules can clone successfully. The onboarding script itself is **out of scope for this package** — it lives upstream in the import-worker / onboarding tooling. The client merely consumes the precomputed map. + +### Out-of-program submodules + +`.gitmodules` entries that reference repositories not onboarded to CM at all. + +- Onboarding cannot produce a `submoduleMap` entry for them. +- `submodule update` fails for those entries only; the rest still complete. +- **Remediation is customer-side** (onboard the missing repo, or remove the stale `.gitmodules` entry). No code change can rescue this case. + +### Broken BYOG connections in CM + +Repositories listed as `status: ready` in the program listing whose proxy returns HTTP 404 on `info/refs` — typically a revoked PAT, deleted upstream repo, or stale BYOG connector. + +- `submoduleMap` may include the entry (it looks healthy in the listing). +- `submodule update` fails for that single entry; other submodules complete normally. +- **Customer-side data fix** required. Worth surfacing as a data-quality warning during onboarding (probe each entry's `info/refs` and flag failures). 
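Reviewer sketch (not part of the patch): since the limitations above are meant to be detectable by callers, a pre-flight comparison of the parent's `.gitmodules` against the map can flag entries that are expected to fail before any fetch runs. The helper name `compareGitmodulesToMap` is hypothetical and does not exist in this package. The line-based parsing is a deliberate simplification of git's config syntax: it assumes unquoted `path = value` lines and ignores comments and quoting.

```javascript
// Hypothetical helper, not part of @adobe/spacecat-shared-cloud-manager-client.
// Compares the `path = ...` values declared in a .gitmodules file with the
// `path` values present in an onboarding-populated submoduleMap.
function compareGitmodulesToMap(gitmodulesText, submoduleMap) {
  // Collect every `path = <value>` line (simplified parsing, see lead-in).
  const declared = new Set();
  for (const line of gitmodulesText.split(/\r?\n/)) {
    const match = line.match(/^\s*path\s*=\s*(.+?)\s*$/);
    if (match) {
      declared.add(match[1]);
    }
  }

  // Tolerate a missing map and skip malformed entries.
  const mapped = new Set(
    (Array.isArray(submoduleMap) ? submoduleMap : [])
      .filter((entry) => entry && typeof entry.path === 'string')
      .map((entry) => entry.path),
  );

  return {
    // Declared in .gitmodules but absent from the map: no rewrite is applied,
    // so the fetch for these paths is expected to fail through the proxy.
    missing: [...declared].filter((p) => !mapped.has(p)),
    // Present in the map but not declared in .gitmodules.
    stale: [...mapped].filter((p) => !declared.has(p)),
  };
}

const gitmodules = [
  '[submodule "sub-byog"]',
  '\tpath = sub-byog',
  '\turl = ../sub-byog.git',
  '[submodule "sub-new"]',
  '\tpath = sub-new',
  '\turl = ../sub-new.git',
].join('\n');

const report = compareGitmodulesToMap(gitmodules, [
  { path: 'sub-byog', url: 'https://cm-repo.example.com/api/program/12345/repository/501.git' },
  { path: 'sub-removed', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-removed/' },
]);

console.log(report);
```

A caller could log `report.missing` as an operator warning before invoking `clone()` or `pull()`. Whether a missing path is merely not yet onboarded or is an out-of-program repository cannot be decided from these two inputs alone.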
+ +### Stale parent gitlinks + +When a submodule is migrated between BYOG and standard, or its history is rewritten, the parent's pinned commit SHAs may not exist in the submodule's current commit graph. + +- The client uses `git submodule update --force` to handle this — git resets the submodule's working tree to whatever `HEAD` points at after the fetch (typically branch tip) instead of failing silently with an empty working tree. +- **Trade-off:** `git submodule status` will show `+` (working tree differs from parent's pinned gitlink) and `git status` will show modified submodule pointers. This matches what the CM build runner does and is the only way to materialise the working tree when the pin is unreachable. Downstream consumers should expect submodule pointer dirtiness on these clones. + +### Recursive submodule depth > 1 + +`git submodule update --recursive` will descend into nested submodules, but each level needs its own `submoduleMap` (different parent program, different rewrite rules). The current implementation only consumes a single map for the top-level parent. Inner-level submodules will fail to clone if they require rewriting. + +### Cross-org standard submodules + +A BYOG parent whose `submoduleMap` contains standard URLs under multiple `git.cloudmanager.adobe.com/{orgName}/` paths (i.e. submodules belonging to a different customer org than the parent). + +- The client emits a separate `http.{scope}.extraheader` per org, but both scopes use the same `CM_STANDARD_REPO_CREDENTIALS[programId]` Basic-auth value. If different orgs require different credentials, the second org's fetches will return 401. +- Workaround if needed: extend `CM_STANDARD_REPO_CREDENTIALS` to a per-`{programId, orgName}` shape and have the client look up the right cred per scope. + +### Submodules on third-party hosts not behind the CM proxy + +A submodule whose `.gitmodules` URL points at a host the CM proxy cannot serve (private GitHub Enterprise, self-hosted GitLab, etc.). 
+ +- `submoduleMap` would have nothing useful to put as `url` for these — the proxy can't reach them, and the standard host doesn't host them either. +- Would require per-host auth injection (customer-supplied PAT scoped to the third-party host). Out of scope. + +### `submoduleMap` drift between imports + +If a customer adds, removes, or renames submodules between imports without onboarding refreshing the map: + +- New submodule → missing from `submoduleMap` → fetch will fail for that entry (proxy can't serve the default name-based URL). +- Removed submodule → `submoduleMap` still has a stale `path` entry → harmless (no matching `.gitmodules` entry, so `submodule update` won't try to fetch it). + +Refresh `submoduleMap` whenever the parent's `.gitmodules` is suspected to have changed. + ## Testing ```bash diff --git a/packages/spacecat-shared-cloud-manager-client/src/index.js b/packages/spacecat-shared-cloud-manager-client/src/index.js index 2f3770349..1368d9350 100644 --- a/packages/spacecat-shared-cloud-manager-client/src/index.js +++ b/packages/spacecat-shared-cloud-manager-client/src/index.js @@ -25,7 +25,11 @@ import { archiveFolder, extract } from 'zip-lib'; const GIT_BIN = process.env.GIT_BIN_PATH || '/opt/bin/git'; const CLONE_DIR_PREFIX = 'cm-repo-'; const PATCH_FILE_PREFIX = 'cm-patch-'; -const GIT_OPERATION_TIMEOUT_MS = 120_000; // 120s — fail fast before Lambda timeout + +// Per-operation timeout for git commands (clone, push, pull, commit, etc.). +// Override via GIT_OPERATION_TIMEOUT_MS env var. Defaults to 10 min so large +// repositories can finish cloning within the Lambda's 15-min envelope. +const GIT_OPERATION_TIMEOUT_MS = parseInt(process.env.GIT_OPERATION_TIMEOUT_MS, 10) || 600_000; /** * Repository type constants for Cloud Manager integrations. @@ -39,6 +43,28 @@ export const CM_REPO_TYPE = Object.freeze({ STANDARD: 'standard', }); +/** + * Host that serves Cloud Manager `standard`-type repositories. 
Used at + * submodule rewrite time to recognise standard URLs in `submoduleMap` and + * attach the corresponding Basic-auth extraheader scope. + */ +export const GIT_CLOUD_MANAGER_HOST = 'git.cloudmanager.adobe.com'; + +/** + * Returns true for any repo type that tunnels through the CM repo service + * proxy (i.e. anything other than STANDARD). BYOG repos need extra handling + * for submodules because the proxy URL uses numeric repository IDs, not + * names — relative and SSH URLs in `.gitmodules` must be rewritten via the + * onboarding-populated `submoduleMap` + * (`site.code.metadata.submodules.submoduleMap`). + * + * @param {string} repoType + * @returns {boolean} + */ +export function isBYOG(repoType) { + return repoType !== CM_REPO_TYPE.STANDARD; +} + // Lambda layer environment: git and its helpers (git-remote-https) live under /opt. // Without these, the dynamic linker can't find shared libraries (libcurl, libexpat, …) // and git can't locate its sub-commands (git-remote-https for HTTPS transport). @@ -260,7 +286,9 @@ export default class CloudManagerClient { * Builds authenticated git arguments for a remote command (clone, push, or pull). 
* * Both repo types use http.extraheader for authentication: - * - Standard repos: Basic auth header via extraheader on the repo URL + * - Standard repos: Basic auth header via extraheader scoped to the org prefix + * (scheme + host + '/' + orgName + '/'), so the header covers all repos and submodules + * belonging to that customer org without granting access to other orgs on the same host * - BYOG repos: Bearer token + API key + IMS org ID via extraheader on the CM Repo URL * * @param {string} command - The git command ('clone', 'push', or 'pull') @@ -276,8 +304,11 @@ export default class CloudManagerClient { if (repoType === CM_REPO_TYPE.STANDARD) { const credentials = this.#getStandardRepoCredentials(programId); const basicAuth = Buffer.from(credentials).toString('base64'); + const parsedUrl = new URL(repoUrl); + const orgName = parsedUrl.pathname.split('/')[1]; + const repoOrgPrefix = `${parsedUrl.origin}/${orgName}/`; return [ - '-c', `http.${repoUrl}.extraheader=Authorization: Basic ${basicAuth}`, + '-c', `http.${repoOrgPrefix}.extraheader=Authorization: Basic ${basicAuth}`, command, repoUrl, ]; } @@ -294,6 +325,17 @@ export default class CloudManagerClient { * - BYOG repos: Bearer token + API key + IMS org ID via CM Repo URL * - Standard repos: Basic auth header via the repo URL * + * Submodule handling differs by repo type: + * - STANDARD: `--recurse-submodules` at clone time. `.gitmodules` URLs + * resolve against the customer's real git host (the same host the + * parent was cloned from), so the existing org-scoped Basic-auth + * extraheader covers every submodule transparently. + * - BYOG: `--no-recurse-submodules` at clone time (auto-recursion would + * resolve `.gitmodules` relative or SSH URLs against the CM proxy URL, + * producing name-based URLs the proxy rejects). Submodules are then + * populated in a second pass driven by the onboarding-populated + * `submoduleMap` — see `#resolveByogSubmodules` for the algorithm. 
+ * If a ref is provided, the clone will be checked out to that ref after cloning. * Checkout failures are logged but do not cause the clone to fail, so the caller * always gets a usable working copy (on the default branch if checkout fails). @@ -305,18 +347,30 @@ export default class CloudManagerClient { * @param {string} config.repoType - Repository type ('standard' or VCS type) * @param {string} config.repoUrl - Repository URL * @param {string} [config.ref] - Optional. Git ref to checkout after clone (branch, tag, or SHA) + * @param {Array<{path: string, url: string}>} [config.submoduleMap] - Optional. + * BYOG-only. Onboarding-populated list of submodule rewrites; each entry + * says "for the submodule at `<path>` in `.gitmodules`, write `<url>` + * into `.git/config submodule.<path>.url` instead of letting git + * resolve from `.gitmodules`." `<url>` is either a CM proxy URL (BYOG- + * typed submodule) or a `https://git.cloudmanager.adobe.com/{org}/...` + * URL (standard-typed submodule). Ignored for STANDARD parent repos. * @returns {Promise<string>} The local clone path */ async clone(programId, repositoryId, { - imsOrgId, repoType, repoUrl, ref, + imsOrgId, repoType, repoUrl, ref, submoduleMap, } = {}) { const clonePath = mkdtempSync(path.join(os.tmpdir(), CLONE_DIR_PREFIX)); + const byog = isBYOG(repoType); try { this.log.info(`Cloning CM repository: program=${programId}, repo=${repositoryId}, type=${repoType}`); const args = await this.#buildAuthGitArgs('clone', programId, repositoryId, { imsOrgId, repoType, repoUrl }); - this.#execGit([...args, clonePath]); + // BYOG: skip auto-recursion so the initial clone doesn't try (and fail) + // to fetch submodules via name-based proxy URLs. We populate them in a + // second pass below using the onboarding-populated submoduleMap. + const recurseFlag = byog ? 
'--no-recurse-submodules' : '--recurse-submodules'; + this.#execGit([...args, recurseFlag, clonePath]); this.log.info(`Repository cloned to ${clonePath}`); this.#logTmpDiskUsage('clone'); @@ -330,6 +384,19 @@ export default class CloudManagerClient { } } + // Populate submodules for the current ref. + // - STANDARD: only needed when we switched ref above. The initial + // --recurse-submodules clone handled the default branch already; + // sync+update picks up any new/changed submodules introduced by the + // ref switch. + // - BYOG: always needed. The initial --no-recurse-submodules clone + // didn't populate any submodules on any branch. + if (byog) { + await this.#resolveByogSubmodules(clonePath, programId, submoduleMap, { imsOrgId }); + } else if (hasText(ref)) { + this.#initStandardSubmodules(clonePath); + } + return clonePath; } catch (error) { rmSync(clonePath, { recursive: true, force: true }); @@ -337,6 +404,193 @@ export default class CloudManagerClient { } } + /** + * For STANDARD repos after a ref checkout: pick up any submodules the ref + * declares but weren't initialized by the initial `--recurse-submodules` + * clone. Uses `sync` to refresh any URL changes between branches and + * `update --init --recursive` to actually fetch and check them out. + * + * Safe for standard repos because .gitmodules URLs resolve to the + * customer's real git host — which is exactly what we want git to use. + * + * NOTE: do NOT call this on the BYOG path. BYOG .gitmodules URLs are + * name-based and require rewriting first; `sync` would overwrite the + * rewritten entries in .git/config. 
+ */ + #initStandardSubmodules(clonePath) { + try { + this.#execGit(['submodule', 'sync', '--recursive'], { cwd: clonePath }); + this.#execGit(['submodule', 'update', '--init', '--recursive'], { cwd: clonePath }); + this.log.info(`Initialized submodules for standard repo at ${clonePath}`); + } catch (submoduleError) { + this.log.warn(`Standard submodule init failed: ${submoduleError.message}. Continuing without submodule recursion.`); + } + } + + /** + * Builds the dual-scope extraheader args needed to authenticate every + * outgoing submodule fetch in `submoduleMap`. Walks the map once, collects + * the unique URL hosts, and emits one `-c http.<scope>.extraheader=...` + * per scope: + * + * - URLs on the CM repo service host get the BYOG triple (Bearer + + * x-api-key + x-gw-ims-org-id), scoped to the proxy URL prefix. + * - URLs on `git.cloudmanager.adobe.com/{orgName}/` get a Basic-auth + * header sourced from `CM_STANDARD_REPO_CREDENTIALS[programId]`, + * scoped to the org prefix so it covers every standard repo in the + * same customer org without leaking to other orgs on the same host. + * + * Git applies each scope's headers only when the outgoing request URL + * starts with that scope's prefix — so the BYOG Bearer never leaks into + * standard fetches and vice versa. Both scopes coexist safely on a single + * `git submodule update` invocation. 
+ * + * @param {Array<{path: string, url: string}>} submoduleMap + * @param {string} programId - CM Program ID, used to look up standard creds + * @param {string} imsOrgId - Customer's IMS Organization ID, for BYOG headers + * @returns {Promise} `-c` args ready to prepend to a git invocation + */ + async #buildSubmoduleAuthArgs(submoduleMap, programId, imsOrgId) { + const args = []; + const hosts = new Set(submoduleMap.map((e) => { + try { + return new URL(e.url).host; + } catch { + return null; + } + }).filter(Boolean)); + + const cmRepoHost = new URL(this.config.cmRepoUrl).host; + + // BYOG scope — only added if at least one entry targets the CM proxy + if (hosts.has(cmRepoHost)) { + const byogArgs = await this.#getCMRepoServiceCredentials(imsOrgId); + args.push(...byogArgs); + } + + // Standard scope — one per (orgName) seen in the map under git.cloudmanager.adobe.com + if (hosts.has(GIT_CLOUD_MANAGER_HOST)) { + const orgs = new Set(); + for (const entry of submoduleMap) { + try { + const u = new URL(entry.url); + if (u.host === GIT_CLOUD_MANAGER_HOST) { + const orgName = u.pathname.split('/').filter(Boolean)[0]; + if (orgName) { + orgs.add(orgName); + } + } + } catch { /* skip unparseable */ } + } + + // All standard repos in a CM program live under the same customer org + // (`git.cloudmanager.adobe.com/{orgName}/`). One Basic credential + // covers every repo in that org — validated empirically. + if (orgs.size > 0) { + const credentials = this.#getStandardRepoCredentials(programId); + const basicAuth = Buffer.from(credentials).toString('base64'); + for (const org of orgs) { + const scope = `https://${GIT_CLOUD_MANAGER_HOST}/${org}/`; + args.push('-c', `http.${scope}.extraheader=Authorization: Basic ${basicAuth}`); + } + } + } + + return args; + } + + /** + * For BYOG parents: populate submodules using the onboarding-precomputed + * `submoduleMap`. 
The map carries one entry per submodule with the + * runtime-ready URL — proxy URL for BYOG-typed submodules, + * `git.cloudmanager.adobe.com/{org}/{name}/` for standard-typed submodules + * of the same parent. All name disambiguation, URL classification, and + * collision handling is done at onboarding; runtime is purely mechanical. + * + * Flow: + * 1. `git submodule init` — registers default `.git/config` entries + * from `.gitmodules`. The initial URLs are wrong (proxy doesn't + * route by name) but we overwrite them in the next step. + * 2. For each entry in `submoduleMap`, write its `url` into + * `.git/config submodule.<path>.url`. `.gitmodules` itself stays + * untouched — the working tree remains clean. + * 3. `git submodule update --force --recursive` with all relevant + * auth scopes. `--force` is essential: when the parent's pinned + * gitlink SHA is unreachable in the submodule (common when a + * customer migrates a submodule between BYOG and standard, leaving + * the parent's pin behind), `--force` resets to whatever HEAD + * points at after fetch (typically branch tip) instead of failing + * silently with an empty working tree. + * + * IMPORTANT: never run `git submodule sync` in this flow — `sync` copies + * `.gitmodules` URLs back into `.git/config`, undoing step 2. + * + * Idempotent: re-running this against an existing clone is a no-op for + * already-correct entries and re-applies the rewrite for any new + * submodules pulled in. `pull()` calls it unconditionally after each + * pull for that reason. + * + * Graceful fallbacks: + * - No `.gitmodules`: nothing to do. + * - Empty/missing `submoduleMap`: log a warning and run `submodule + * update --force` without rewrites — git will try the original + * `.gitmodules` URLs (most will fail through the CM proxy), but + * the parent clone itself is preserved. 
+ * - `.gitmodules` entries not in the map: their `.git/config` URL is + * left as-is; `submodule update` will fail for those entries only, + * and the rest still complete. + * + * @param {string} clonePath - Local clone of the BYOG parent + * @param {string} programId - CM Program ID (for standard-cred lookup) + * @param {Array<{path: string, url: string}>} [submoduleMap] + * @param {{imsOrgId: string}} param3 + */ + async #resolveByogSubmodules(clonePath, programId, submoduleMap, { imsOrgId } = {}) { + if (!existsSync(path.join(clonePath, '.gitmodules'))) { + return; + } + + try { + // Step 1: register submodules in .git/config (URLs will be wrong; we overwrite next) + this.#execGit(['submodule', 'init'], { cwd: clonePath }); + + // Step 2: rewrite each submodule.<path>.url from the onboarding map + const hasMap = Array.isArray(submoduleMap) && submoduleMap.length > 0; + if (!hasMap) { + this.log.warn( + `BYOG program ${programId} has .gitmodules but no submoduleMap — ` + + 'submodule URLs cannot be resolved through the CM proxy. ' + + 'Populate site.code.metadata.submodules.submoduleMap at onboarding.', + ); + } else { + for (const entry of submoduleMap) { + if (!entry || !hasText(entry.path) || !hasText(entry.url)) { + this.log.warn(`Skipping invalid submoduleMap entry: ${JSON.stringify(entry)}`); + } else { + this.#execGit( + ['config', '--local', `submodule.${entry.path}.url`, entry.url], + { cwd: clonePath }, + ); + } + } + } + + // Step 3: fetch + check out submodules using the rewritten URLs. + // Build dual-scope auth (BYOG + standard) from the URLs in the map, + // and use --force to handle stale parent gitlinks. + const authArgs = hasMap + ? 
await this.#buildSubmoduleAuthArgs(submoduleMap, programId, imsOrgId) + : await this.#getCMRepoServiceCredentials(imsOrgId); + this.#execGit( + [...authArgs, 'submodule', 'update', '--force', '--recursive'], + { cwd: clonePath }, + ); + this.log.info(`BYOG submodules initialized at ${clonePath}`); + } catch (submoduleError) { + this.log.warn(`BYOG submodule init failed: ${submoduleError.message}. Continuing without submodule recursion.`); + } + } + /** * Recursively validates that all symlinks under rootDir point to targets * within rootDir. Throws if any symlink escapes the root boundary. @@ -564,6 +818,14 @@ export default class CloudManagerClient { * If a ref is provided, the ref is checked out before pulling so that * the pull updates the correct branch. * + * Submodule handling differs by repo type: + * - STANDARD: `pull --recurse-submodules` updates submodules in one step. + * - BYOG: pull the parent only, then re-run the submoduleMap-driven + * rewrite. This also picks up any new submodules the pull may have + * introduced, since the new entries will already be present in the + * map (the map is per-program, not per-clone) provided onboarding has + * refreshed it. + * * @param {string} clonePath - Path to the cloned repository * @param {string} programId - CM Program ID * @param {string} repositoryId - CM Repository ID @@ -572,18 +834,40 @@ export default class CloudManagerClient { * @param {string} config.repoType - Repository type ('standard' or VCS type) * @param {string} config.repoUrl - Repository URL * @param {string} [config.ref] - Optional. Git ref to checkout before pull (branch, tag, or SHA) + * @param {Array<{path: string, url: string}>} [config.submoduleMap] - Optional. + * BYOG-only. Same shape as `clone()`. See `#resolveByogSubmodules` for + * details. 
*/ async pull(clonePath, programId, repositoryId, { - imsOrgId, repoType, repoUrl, ref, + imsOrgId, repoType, repoUrl, ref, submoduleMap, } = {}) { + const byog = isBYOG(repoType); + if (hasText(ref)) { this.#execGit(['checkout', ref], { cwd: clonePath }); this.log.info(`Checked out ref '${ref}' before pull`); } + const pullArgs = await this.#buildAuthGitArgs('pull', programId, repositoryId, { imsOrgId, repoType, repoUrl }); - this.#execGit(pullArgs, { cwd: clonePath }); + // STANDARD: --recurse-submodules keeps submodules in sync during pull. + // BYOG: pull the parent only; submodules are handled separately below + // because relative .gitmodules URLs need rewriting through submoduleMap. + if (byog) { + this.#execGit(pullArgs, { cwd: clonePath }); + } else { + this.#execGit([...pullArgs, '--recurse-submodules'], { cwd: clonePath }); + } this.log.info('Changes pulled successfully'); this.#logTmpDiskUsage('pull'); + + // For BYOG, re-apply the rewrite after the pull in case the pulled commits + // changed .gitmodules (new submodules, renamed ones, etc). The helper is + // idempotent — existing .git/config entries are overwritten with the same + // value when the map hasn't changed, and new submodules pick up entries + // already present in the program-wide map. 
+ if (byog) { + await this.#resolveByogSubmodules(clonePath, programId, submoduleMap, { imsOrgId }); + } } /** diff --git a/packages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js b/packages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js index ca54c75e1..1020500a0 100644 --- a/packages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js +++ b/packages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js @@ -278,7 +278,7 @@ describe('CloudManagerClient', () => { const cloneArgs = getGitArgs(execFileSyncStub.firstCall); const cloneArgsStr = getGitArgsStr(execFileSyncStub.firstCall); expect(cloneArgs).to.include('clone'); - expect(cloneArgsStr).to.include(`http.${TEST_STANDARD_REPO_URL}.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw==`); + expect(cloneArgsStr).to.include('http.https://git.cloudmanager.adobe.com/myorg/.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw=='); expect(cloneArgsStr).to.include(TEST_STANDARD_REPO_URL); expect(cloneArgs).to.include(EXPECTED_CLONE_PATH); // No credentials in the URL itself @@ -286,6 +286,39 @@ describe('CloudManagerClient', () => { expect(cloneArgsStr).to.not.include('Bearer'); }); + it('uses --recurse-submodules for STANDARD repos so submodules clone natively', async () => { + const client = CloudManagerClient.createFrom( + createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }), + ); + + await client.clone( + TEST_PROGRAM_ID, + TEST_REPO_ID, + { repoType: 'standard', repoUrl: TEST_STANDARD_REPO_URL }, + ); + + const gitArgs = getGitArgs(execFileSyncStub.firstCall); + expect(gitArgs).to.include('--recurse-submodules'); + expect(gitArgs).to.not.include('--no-recurse-submodules'); + }); + + it('uses --no-recurse-submodules for BYOG repos so submodules can be rewritten first', async () => { + // BYOG .gitmodules URLs are relative and would resolve against the CM + // proxy URL, producing name-based URLs 
the proxy rejects. We suppress + // auto-recursion so we can rewrite URLs before fetching submodules. + const client = CloudManagerClient.createFrom(createContext()); + + await client.clone( + TEST_PROGRAM_ID, + TEST_REPO_ID, + { imsOrgId: TEST_IMS_ORG_ID }, + ); + + const gitArgs = getGitArgs(execFileSyncStub.firstCall); + expect(gitArgs).to.include('--no-recurse-submodules'); + expect(gitArgs).to.not.include('--recurse-submodules'); + }); + it('throws when standard credentials not found for programId', async () => { const client = CloudManagerClient.createFrom( createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }), @@ -307,22 +340,36 @@ describe('CloudManagerClient', () => { expect(mkdtempSyncStub.firstCall.args[0]).to.match(/cm-repo-$/); }); - it('checks out ref after clone when ref is provided', async () => { - const client = CloudManagerClient.createFrom(createContext()); + it('STANDARD: checks out ref and runs sync + update --init --recursive', async () => { + const client = CloudManagerClient.createFrom( + createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }), + ); const clonePath = await client.clone( TEST_PROGRAM_ID, TEST_REPO_ID, - { imsOrgId: TEST_IMS_ORG_ID, ref: 'release/5.11' }, + { + repoType: 'standard', + repoUrl: TEST_STANDARD_REPO_URL, + ref: 'release/5.11', + }, ); expect(clonePath).to.equal(EXPECTED_CLONE_PATH); - // First call: clone, second call: checkout - expect(execFileSyncStub).to.have.been.calledTwice; + // clone, checkout, submodule sync, submodule update --init --recursive + expect(execFileSyncStub).to.have.callCount(4); const checkoutArgs = getGitArgs(execFileSyncStub.secondCall); expect(checkoutArgs).to.deep.equal(['checkout', 'release/5.11']); expect(execFileSyncStub.secondCall.args[2]).to.have.property('cwd', EXPECTED_CLONE_PATH); + + const syncArgs = getGitArgs(execFileSyncStub.thirdCall); + expect(syncArgs).to.deep.equal(['submodule', 'sync', '--recursive']); + 
expect(execFileSyncStub.thirdCall.args[2]).to.have.property('cwd', EXPECTED_CLONE_PATH); + + const updateArgs = getGitArgs(execFileSyncStub.getCall(3)); + expect(updateArgs).to.deep.equal(['submodule', 'update', '--init', '--recursive']); + expect(execFileSyncStub.getCall(3).args[2]).to.have.property('cwd', EXPECTED_CLONE_PATH); }); it('does not checkout when ref is not provided', async () => { @@ -330,7 +377,7 @@ describe('CloudManagerClient', () => { await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { imsOrgId: TEST_IMS_ORG_ID }); - // Only the clone call, no checkout + // Only the clone call — no checkout, no submodule commands expect(execFileSyncStub).to.have.been.calledOnce; }); @@ -350,11 +397,38 @@ describe('CloudManagerClient', () => { // Clone should still succeed expect(clonePath).to.equal(EXPECTED_CLONE_PATH); + // Only clone + checkout ran — submodule commands skipped when checkout fails + expect(execFileSyncStub).to.have.been.calledTwice; expect(context.log.error).to.have.been.calledWith( sinon.match(/Failed to checkout ref 'nonexistent'.*Continuing with default branch/), ); }); + it('STANDARD: does not fail clone when post-checkout submodule init fails', async () => { + // clone + checkout succeed; submodule sync fails + execFileSyncStub.onFirstCall().returns(''); // clone + execFileSyncStub.onSecondCall().returns(''); // checkout + execFileSyncStub.onThirdCall().throws(new Error('Git command failed: submodule sync failed')); + + const context = createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }); + const client = CloudManagerClient.createFrom(context); + + const clonePath = await client.clone( + TEST_PROGRAM_ID, + TEST_REPO_ID, + { + repoType: 'standard', + repoUrl: TEST_STANDARD_REPO_URL, + ref: 'release', + }, + ); + + expect(clonePath).to.equal(EXPECTED_CLONE_PATH); + expect(context.log.warn).to.have.been.calledWith( + sinon.match(/Standard submodule init failed.*Continuing without submodule recursion/), + ); + }); + 
it('throws on git clone failure and cleans up temp directory', async () => { execFileSyncStub.throws(new Error('git clone failed')); @@ -369,6 +443,244 @@ describe('CloudManagerClient', () => { ); }); + // --- BYOG submodule rewrite flow (driven by submoduleMap) --- + + it('BYOG: skips the rewrite pass when .gitmodules is absent', async () => { + existsSyncStub.returns(false); // no .gitmodules at clone root + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { imsOrgId: TEST_IMS_ORG_ID }); + + // Only the clone call — no submodule init / update when .gitmodules missing + expect(execFileSyncStub).to.have.been.calledOnce; + }); + + it('BYOG: warns and proceeds when submoduleMap is missing', async () => { + existsSyncStub.returns(true); // .gitmodules present + execFileSyncStub.returns(''); + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + await client.clone( + TEST_PROGRAM_ID, + TEST_REPO_ID, + { imsOrgId: TEST_IMS_ORG_ID }, + ); + + expect(context.log.warn).to.have.been.calledWith( + sinon.match(/has \.gitmodules but no submoduleMap/), + ); + // clone + submodule init + submodule update --force --recursive — no config writes + expect(execFileSyncStub).to.have.callCount(3); + const updateArgs = getGitArgs(execFileSyncStub.lastCall); + expect(updateArgs).to.include.members(['submodule', 'update', '--force', '--recursive']); + }); + + it('BYOG: pure-proxy submodules use a single Bearer auth scope', async () => { + // Parent and all submodules are BYOG-typed in the same program. Onboarding + // emits a submoduleMap with proxy URLs for each. Only the BYOG (Bearer) + // scope should be attached to submodule update — no standard-host scope. 
+ existsSyncStub.returns(true); + + const submoduleMap = [ + { path: 'sub-a', url: 'https://cm-repo.example.com/api/program/100/repository/201.git' }, + { path: 'sub-b', url: 'https://cm-repo.example.com/api/program/100/repository/202.git' }, + ]; + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + await client.clone('100', '200', { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap, + }); + + // Calls: clone, submodule init, 2× config-set, submodule update + expect(execFileSyncStub).to.have.callCount(5); + + const setCall0 = getGitArgs(execFileSyncStub.getCall(2)); + expect(setCall0).to.deep.equal([ + 'config', '--local', + 'submodule.sub-a.url', + 'https://cm-repo.example.com/api/program/100/repository/201.git', + ]); + + const setCall1 = getGitArgs(execFileSyncStub.getCall(3)); + expect(setCall1).to.deep.equal([ + 'config', '--local', + 'submodule.sub-b.url', + 'https://cm-repo.example.com/api/program/100/repository/202.git', + ]); + + // Final update has --force --recursive and ONLY the BYOG (Bearer) scope + const updateArgStr = getGitArgsStr(execFileSyncStub.getCall(4)); + expect(updateArgStr).to.include('submodule update --force --recursive'); + expect(updateArgStr).to.include(`Authorization: Bearer ${TEST_TOKEN}`); + expect(updateArgStr).to.include('x-api-key: test-client-id'); + expect(updateArgStr).to.include(`x-gw-ims-org-id: ${TEST_IMS_ORG_ID}`); + // No standard-host extraheader since no entries point there + expect(updateArgStr).to.not.include('git.cloudmanager.adobe.com'); + }); + + it('BYOG: mixed BYOG + standard submodules attaches both auth scopes', async () => { + // BYOG parent with a mix of BYOG and standard-typed submodules. The + // submodule update must carry BOTH the BYOG (Bearer) scope on the proxy + // host AND the standard (Basic) scope on the customer-org host. 
+ existsSyncStub.returns(true); + + const submoduleMap = [ + { path: 'sub-byog', url: 'https://cm-repo.example.com/api/program/12345/repository/501.git' }, + { path: 'sub-std-a', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-std-a/' }, + { path: 'sub-std-b', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-std-b/' }, + { path: 'sub-std-c', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-std-c/' }, + ]; + + // Need standard creds in env so the standard-scope extraheader can be built. + // TEST_PROGRAM_ID = '12345' is keyed in TEST_STANDARD_CREDENTIALS. + const context = createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }); + const client = CloudManagerClient.createFrom(context); + + await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap, + }); + + // Calls: clone, submodule init, 4× config-set, submodule update + expect(execFileSyncStub).to.have.callCount(7); + + // Last call is the submodule update — should carry BOTH auth scopes + const updateArgStr = getGitArgsStr(execFileSyncStub.getCall(6)); + expect(updateArgStr).to.include('submodule update --force --recursive'); + + // BYOG scope (Bearer + x-api-key + x-gw-ims-org-id) on cm-repo.example.com + expect(updateArgStr).to.include(`http.https://cm-repo.example.com.extraheader=Authorization: Bearer ${TEST_TOKEN}`); + expect(updateArgStr).to.include('http.https://cm-repo.example.com.extraheader=x-api-key: test-client-id'); + expect(updateArgStr).to.include(`http.https://cm-repo.example.com.extraheader=x-gw-ims-org-id: ${TEST_IMS_ORG_ID}`); + + // Standard scope (Basic) on git.cloudmanager.adobe.com/acme-org/ + // base64('stduser:stdtoken123') == 'c3RkdXNlcjpzdGR0b2tlbjEyMw==' + expect(updateArgStr).to.include( + 'http.https://git.cloudmanager.adobe.com/acme-org/.extraheader=' + + 'Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw==', + ); + + // Sanity: each scope appears separately, no header bleed + 
expect(updateArgStr).to.not.include('Bearer c3RkdXNlcjpz'); + expect(updateArgStr).to.not.include('Basic test-access-token'); + }); + + it('BYOG: skips invalid submoduleMap entries (missing path or url)', async () => { + existsSyncStub.returns(true); + + const submoduleMap = [ + { path: 'good-one', url: 'https://cm-repo.example.com/api/program/12345/repository/100.git' }, + { path: 'no-url' /* missing url */ }, + { /* missing path */ url: 'https://cm-repo.example.com/.../200.git' }, + null, + ]; + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap, + }); + + // Only the valid entry should produce a config-set call + // clone + init + 1× config-set + submodule update = 4 + expect(execFileSyncStub).to.have.callCount(4); + expect(context.log.warn).to.have.been.calledWith( + sinon.match(/Skipping invalid submoduleMap entry/), + ); + + const setArgs = getGitArgs(execFileSyncStub.getCall(2)); + expect(setArgs).to.deep.equal([ + 'config', '--local', + 'submodule.good-one.url', + 'https://cm-repo.example.com/api/program/12345/repository/100.git', + ]); + }); + + it('BYOG: ignores submoduleMap entries with unparseable URLs when computing auth scopes', async () => { + // Defensive — onboarding shouldn't produce these, but if one slips through + // we want the runtime to keep going. The bad entry still gets written to + // .git/config (since it has both path and url string-wise), but no host + // extraheader is added for it. Proxy-based entries still get auth. 
+ existsSyncStub.returns(true); + + const submoduleMap = [ + { path: 'good', url: 'https://cm-repo.example.com/api/program/12345/repository/1.git' }, + { path: 'weird', url: 'not a valid url' }, + ]; + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap, + }); + + // clone + init + 2× config-set + update + expect(execFileSyncStub).to.have.callCount(5); + + const updateArgStr = getGitArgsStr(execFileSyncStub.getCall(4)); + expect(updateArgStr).to.include(`Authorization: Bearer ${TEST_TOKEN}`); + expect(updateArgStr).to.not.include('git.cloudmanager.adobe.com'); + }); + + it('BYOG: standard-scope pass tolerates an unparseable URL alongside a valid one', async () => { + // A valid standard URL forces us into the standard-scope branch; the + // malformed URL must be silently skipped during org enumeration without + // aborting the whole rewrite. + existsSyncStub.returns(true); + + const submoduleMap = [ + { path: 'std', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-std-a/' }, + { path: 'weird', url: 'not a valid url' }, + ]; + + const context = createContext({ CM_STANDARD_REPO_CREDENTIALS: TEST_STANDARD_CREDENTIALS }); + const client = CloudManagerClient.createFrom(context); + + await client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap, + }); + + // clone + init + 2× config-set + update = 5 calls + expect(execFileSyncStub).to.have.callCount(5); + const updateArgStr = getGitArgsStr(execFileSyncStub.getCall(4)); + // Standard scope still attached for the good entry + expect(updateArgStr).to.include( + 'http.https://git.cloudmanager.adobe.com/acme-org/.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw==', + ); + }); + + it('BYOG: does not fail clone when submodule init throws', async () => { + existsSyncStub.returns(true); + execFileSyncStub.onCall(0).returns(''); // clone + 
execFileSyncStub.onCall(1).throws(new Error('submodule init failed')); // submodule init + + const context = createContext(); + const client = CloudManagerClient.createFrom(context); + + const clonePath = await client.clone( + TEST_PROGRAM_ID, + TEST_REPO_ID, + { imsOrgId: TEST_IMS_ORG_ID }, + ); + + expect(clonePath).to.equal(EXPECTED_CLONE_PATH); + expect(context.log.warn).to.have.been.calledWith( + sinon.match(/BYOG submodule init failed.*Continuing without submodule recursion/), + ); + }); + it('throws a clear message when git command times out', async () => { const err = new Error('SIGTERM'); err.killed = true; @@ -378,9 +690,9 @@ describe('CloudManagerClient', () => { const client = CloudManagerClient.createFrom(context); await expect(client.clone(TEST_PROGRAM_ID, TEST_REPO_ID, { imsOrgId: TEST_IMS_ORG_ID })) - .to.be.rejectedWith('Git command timed out after 120s'); + .to.be.rejectedWith(/^Git command timed out after \d+s$/); - expect(context.log.error.firstCall.args[0]).to.include('timed out after 120s'); + expect(context.log.error.firstCall.args[0]).to.match(/timed out after \d+s/); }); it('sanitizes Bearer token and credentials in git error output', async () => { @@ -784,7 +1096,7 @@ describe('CloudManagerClient', () => { const pushArgs = getGitArgs(execFileSyncStub.firstCall); const pushArgStr = getGitArgsStr(execFileSyncStub.firstCall); expect(pushArgStr).to.include('push'); - expect(pushArgStr).to.include(`http.${TEST_STANDARD_REPO_URL}.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw==`); + expect(pushArgStr).to.include('http.https://git.cloudmanager.adobe.com/myorg/.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw=='); expect(pushArgStr).to.include(TEST_STANDARD_REPO_URL); expect(pushArgStr).to.not.include('stduser:stdtoken123@'); expect(pushArgStr).to.not.include('Bearer'); @@ -819,10 +1131,13 @@ describe('CloudManagerClient', () => { { imsOrgId: TEST_IMS_ORG_ID }, ); + // BYOG pull does not use --recurse-submodules (submodules 
are handled + // separately after pull via the rewrite pass). expect(execFileSyncStub).to.have.been.calledOnce; const pullArgStr = getGitArgsStr(execFileSyncStub.firstCall); expect(pullArgStr).to.include('pull'); + expect(pullArgStr).to.not.include('--recurse-submodules'); expect(pullArgStr).to.include(`Authorization: Bearer ${TEST_TOKEN}`); expect(pullArgStr).to.include('x-api-key: test-client-id'); expect(pullArgStr).to.include(`x-gw-ims-org-id: ${TEST_IMS_ORG_ID}`); @@ -846,7 +1161,8 @@ describe('CloudManagerClient', () => { const pullArgStr = getGitArgsStr(execFileSyncStub.firstCall); expect(pullArgStr).to.include('pull'); - expect(pullArgStr).to.include(`http.${TEST_STANDARD_REPO_URL}.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw==`); + expect(pullArgStr).to.include('--recurse-submodules'); + expect(pullArgStr).to.include('http.https://git.cloudmanager.adobe.com/myorg/.extraheader=Authorization: Basic c3RkdXNlcjpzdGR0b2tlbjEyMw=='); expect(pullArgStr).to.include(TEST_STANDARD_REPO_URL); expect(pullArgStr).to.not.include('stduser:stdtoken123@'); expect(pullArgStr).to.not.include('Bearer'); @@ -891,6 +1207,48 @@ describe('CloudManagerClient', () => { expect(pullArgStr).to.include('pull'); expect(pullArgStr).to.not.include('checkout'); }); + + it('BYOG: runs the submoduleMap rewrite pass after pull', async () => { + existsSyncStub.returns(true); + + // 0: pull (parent only, no --recurse-submodules) + // 1: submodule init + // 2: config --local (rewrite from submoduleMap) + // 3: submodule update --force --recursive (with auth) + execFileSyncStub.returns(''); + + const client = CloudManagerClient.createFrom(createContext()); + + await client.pull( + '/tmp/cm-repo-test', + '123', + TEST_REPO_ID, + { + imsOrgId: TEST_IMS_ORG_ID, + submoduleMap: [ + { path: 'sub-a', url: 'https://cm-repo.example.com/api/program/123/repository/456.git' }, + ], + }, + ); + + // First call is pull WITHOUT --recurse-submodules + const pullArgStr = 
getGitArgsStr(execFileSyncStub.firstCall); + expect(pullArgStr).to.include('pull'); + expect(pullArgStr).to.not.include('--recurse-submodules'); + + // The rewrite wrote the URL straight from the map (no name lookup) + const setArgs = getGitArgs(execFileSyncStub.getCall(2)); + expect(setArgs).to.deep.equal([ + 'config', '--local', + 'submodule.sub-a.url', + 'https://cm-repo.example.com/api/program/123/repository/456.git', + ]); + + // Final call is submodule update --force --recursive with auth + const updateArgStr = getGitArgsStr(execFileSyncStub.getCall(3)); + expect(updateArgStr).to.include('submodule update --force --recursive'); + expect(updateArgStr).to.include(`Authorization: Bearer ${TEST_TOKEN}`); + }); }); describe('checkout', () => { diff --git a/packages/spacecat-shared-data-access/src/models/site/index.d.ts b/packages/spacecat-shared-data-access/src/models/site/index.d.ts index d69fd5a63..c8d285a7c 100644 --- a/packages/spacecat-shared-data-access/src/models/site/index.d.ts +++ b/packages/spacecat-shared-data-access/src/models/site/index.d.ts @@ -34,6 +34,78 @@ export interface HlxConfig { }; } +/** + * Detected submodule information captured during code import. + * Omitted when the cloned repo has no `.gitmodules` file. + */ +export interface SubmodulesMetadata { + /** + * True when at least one submodule URL points to a host other than + * the parent repo's host. Relative URLs (`../foo.git`) and SSH URLs + * targeting the parent's host classify as internal. + */ + external: boolean; + /** + * Submodule URLs exactly as declared in `.gitmodules`, with + * basic-auth credentials stripped from https/http forms. Relative + * (`../foo.git`) and SSH (`git@host:path`) forms are preserved as-is. + */ + urls: string[]; + /** + * BYOG-only. Pre-resolved submodule rewrites for use at clone/pull time. 
+ * Each entry says: "for the submodule whose `path` matches this entry, + * write `url` into `.git/config submodule.<path>.url` instead of letting + * git resolve the URL from `.gitmodules`." + * + * The CM repo service proxies BYOG clones through URLs of the form + * `{CM_REPO_URL}/api/program/{programId}/repository/{numericId}.git`. + * When a customer's `.gitmodules` uses relative URLs like `../foo` or + * SSH URLs like `git@github.com:org/foo.git`, git resolves them to + * paths the proxy can't serve. We bypass that by rewriting each + * submodule's URL in `.git/config` to a CM-reachable form before + * running `git submodule update`. `.gitmodules` itself is never + * modified — the working tree stays clean. + * + * URL form depends on the underlying repo type, decided at onboarding: + * - BYOG (`github`/`gitlab`/`bitbucket`/`azure_devops`): + * `{cmRepoUrl}/api/program/{programId}/repository/{numericId}.git` + * - `standard`: + * `https://git.cloudmanager.adobe.com/{orgName}/{repoName}/` + * + * The runtime picks the auth scope to apply by parsing each url's host. + * URLs on the CM proxy host get Bearer + x-api-key + x-gw-ims-org-id + * (via `http.{cmRepoUrl}.extraheader`). URLs on + * `https://git.cloudmanager.adobe.com/{orgName}/` get Basic auth from + * `CM_STANDARD_REPO_CREDENTIALS[programId]` (via the org-prefixed + * extraheader scope). + * + * Populated at onboarding from `GET /api/program/{pid}/repositories` + * + the parent's `.gitmodules`. The onboarding script does the name- + * matching and disambiguation (including short-name collisions where + * two repos in the same program share a last-path-segment but differ + * by `type`) so the runtime can iterate this list mechanically. + * + * Omit for `standard` parent programs — their relative submodule URLs + * resolve natively on the customer's git host without translation. + */ + submoduleMap?: Array<{ + /** Submodule path as declared in `.gitmodules` (the value of `path = …`). 
*/ + path: string; + /** URL to write into `.git/config submodule.<path>.url` at clone time — + * proxy URL for BYOG-typed repos, `git.cloudmanager.adobe.com` URL for + * standard-typed repos. */ + url: string; + }>; +} + +/** + * Metadata extracted during code import. Consumers should assume an + * empty object when a field is absent. + */ +export interface CodeMetadata { + submodules?: SubmodulesMetadata; +} + export interface CodeConfig { type: string; owner: string; @@ -41,7 +113,17 @@ export interface CodeConfig { ref: string; installationId?: string; url: string; + /** + * S3 key (not full URL) where the imported repository ZIP is stored. + * Written by the code importer after successful ingestion. + */ s3StoragePath?: string; + /** + * Metadata extracted from the cloned repository. Always overwritten + * on each successful import — a re-import that finds no submodules + * clears any submodule entries from an earlier import. + */ + metadata?: CodeMetadata; } export interface DeliveryConfig { diff --git a/packages/spacecat-shared-data-access/src/models/site/site.schema.js b/packages/spacecat-shared-data-access/src/models/site/site.schema.js index e7236456e..75b307d95 100755 --- a/packages/spacecat-shared-data-access/src/models/site/site.schema.js +++ b/packages/spacecat-shared-data-access/src/models/site/site.schema.js @@ -76,6 +76,26 @@ const schema = new SchemaBuilder(Site, SiteCollection) validate: (value) => isNonEmptyObject(validateConfiguration(value)), get: (value) => Config(value), }) + /** + * Repository configuration used by the code importer and downstream + * consumers (autofix, suggestion generation, code analysis). + * + * Fields written by the importer after a successful clone: + * - s3StoragePath: S3 key (not full URL) of the imported repository ZIP + * - metadata.submodules: `{ external, urls }` when the repo has a + * `.gitmodules` file. 
The importer always overwrites `metadata` so + * a re-import that finds no submodules clears stale entries from + * an earlier import. + * + * Fields populated at onboarding (not by the importer): + * - metadata.submodules.submoduleMap: BYOG-only list of pre-resolved + * `{ path, url }` rewrites the cm-client applies to `.git/config` + * at clone/pull time so submodules can fetch through the CM proxy + * (or the standard-repo host, for standard submodules of BYOG + * parents). See SubmodulesMetadata in index.d.ts. + * + * See CodeConfig in index.d.ts for the full TypeScript shape. + */ .addAttribute('code', { type: 'any', required: false, @@ -89,6 +109,7 @@ const schema = new SchemaBuilder(Site, SiteCollection) installationId: { type: 'string', required: false }, url: { type: 'string', required: true, validate: (value) => isValidUrl(value) }, s3StoragePath: { type: 'string', required: false }, + metadata: { type: 'any', required: false }, }, }) .addAttribute('deliveryType', { diff --git a/packages/spacecat-shared-data-access/test/unit/models/site/site.model.test.js b/packages/spacecat-shared-data-access/test/unit/models/site/site.model.test.js index 8c9249889..0125ae6d3 100755 --- a/packages/spacecat-shared-data-access/test/unit/models/site/site.model.test.js +++ b/packages/spacecat-shared-data-access/test/unit/models/site/site.model.test.js @@ -530,6 +530,46 @@ describe('SiteModel', () => { instance.setCode({ ...instance.getCode(), s3StoragePath: 'code/site-123/github/adobe/spacecat/main/repository.zip' }); expect(instance.getCode().s3StoragePath).to.equal('code/site-123/github/adobe/spacecat/main/repository.zip'); }); + + it('stores and retrieves metadata.submodules', () => { + const metadata = { + submodules: { + external: true, + urls: [ + '../internal-sub.git', + 'https://gitlab.example.com/team/external-sub.git', + ], + }, + }; + const codeData = { + type: 'github', + owner: 'adobe', + repo: 'spacecat', + ref: 'main', + url: 
'https://github.com/adobe/spacecat', + metadata, + }; + instance.setCode(codeData); + expect(instance.getCode().metadata).to.deep.equal(metadata); + }); + + it('overwrites metadata on re-import to clear stale submodule data', () => { + const firstImport = { + type: 'github', + owner: 'adobe', + repo: 'spacecat', + ref: 'main', + url: 'https://github.com/adobe/spacecat', + metadata: { + submodules: { external: false, urls: ['../sub.git'] }, + }, + }; + instance.setCode(firstImport); + expect(instance.getCode().metadata.submodules.urls).to.have.lengthOf(1); + + instance.setCode({ ...instance.getCode(), metadata: {} }); + expect(instance.getCode().metadata).to.deep.equal({}); + }); }); describe('localization fields', () => {
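Reviewer note: the `submoduleMap` contract documented in `SubmodulesMetadata` and exercised by the BYOG clone tests can be summarised with a small standalone sketch. This is an illustration only, not the client's implementation: `planSubmoduleRewrite`, its return shape, and the `cmRepoUrl` argument are hypothetical names chosen here. The sketch assumes that invalid entries are skipped, that every entry with a string `path` and `url` yields one `git config --local` write, and that auth scopes are derived only from URLs that parse.

```javascript
// Hypothetical sketch of the submoduleMap handling described above.
// Only GIT_CLOUD_MANAGER_HOST's value comes from the README in this PR;
// the function name and return shape are assumptions for illustration.
const GIT_CLOUD_MANAGER_HOST = 'git.cloudmanager.adobe.com';

function planSubmoduleRewrite(submoduleMap, cmRepoUrl) {
  const proxyOrigin = new URL(cmRepoUrl).origin;
  const configCommands = [];
  const standardScopes = new Set();
  let needsProxyScope = false;

  for (const entry of submoduleMap ?? []) {
    // Skip null entries and entries missing a usable path or url.
    if (!entry || typeof entry.path !== 'string' || !entry.path
      || typeof entry.url !== 'string' || !entry.url) {
      continue;
    }

    // One `git config --local submodule.<path>.url <url>` write per valid entry.
    configCommands.push(['config', '--local', `submodule.${entry.path}.url`, entry.url]);

    let parsed;
    try {
      parsed = new URL(entry.url);
    } catch {
      continue; // unparseable URL: still written to config, but adds no auth scope
    }

    if (parsed.host === GIT_CLOUD_MANAGER_HOST) {
      // Standard repos are scoped by org: https://git.cloudmanager.adobe.com/{orgName}/
      const [orgName] = parsed.pathname.split('/').filter(Boolean);
      if (orgName) {
        standardScopes.add(`https://${GIT_CLOUD_MANAGER_HOST}/${orgName}/`);
      }
    } else if (parsed.origin === proxyOrigin) {
      needsProxyScope = true; // Bearer + x-api-key + x-gw-ims-org-id scope
    }
  }

  return { configCommands, needsProxyScope, standardScopes: [...standardScopes] };
}

// Usage with the shapes from the mixed BYOG + standard test case.
const examplePlan = planSubmoduleRewrite([
  { path: 'sub-byog', url: 'https://cm-repo.example.com/api/program/12345/repository/501.git' },
  { path: 'sub-std-a', url: 'https://git.cloudmanager.adobe.com/acme-org/sub-std-a/' },
], 'https://cm-repo.example.com');
console.log(examplePlan);
```

Keeping the plan separate from the git invocations means the scope selection can be unit-tested without stubbing `execFileSync`. Whether the real client always attaches the Bearer scope for BYOG parents, regardless of the map contents, is not something this sketch settles.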