feat(blobmanager): add managed CAS backend via S3 Access Points by jiparis · Pull Request #3121 · chainloop-dev/chainloop

jiparis · 2026-05-15T18:23:32Z

Summary

Introduces a new AWS-S3-ACCESS-POINT CAS backend that targets a single shared bucket via per-tenant S3 Access Points. Each request mints a scoped session via sts:AssumeRole with a session policy and RoleSessionName derived from the authenticated requesting org carried in ctx.
All AWS coordinates (AccessPointARN, Region, BaseRoleARN) live in the per-tenant secret blob; there is no deployment-level config block. The provider is registered unconditionally, so on-prem deployments without managed CAS are unaffected — they simply never have managed rows.
Per-tenant key prefix and STS session name both come from the authenticated requesting org (not the secret blob), so a secrets-store rewrite alone cannot reroute uploads to another tenant's AP. The AP resource policy enforces this server-side via aws:userid.
Carries the requesting org through the CAS robotaccount JWT (org-id claim). An auth middleware in artifact-cas reads the claim and enriches the request ctx via blobmanager.WithRequestingOrg before any service handler runs, so callers cannot accidentally skip the binding. Non-managed providers ignore the key.
Dev-mode bypass (skip sts:AssumeRole, use the SDK's default credential chain) is opt-in via the CHAINLOOP_S3_ACCESS_POINT_DEV_MODE env var.
For Managed=true rows, redacts AWS implementation details from any wire output: the AP ARN (Location) becomes "managed by Chainloop" and the provider ID (Provider) becomes "Chainloop" in both API responses and audit-event payloads. The DB and biz layer keep the real values.
The user-facing CASBackend.Create RPC rejects the managed provider; managed rows are provisioned by the platform-side reconciler.

AI Assistance

This change was developed with Claude Code; per-commit Assisted-by: trailers record the specific commits.

Introduce a new `AWS-S3-ACCESS-POINT` CAS backend that targets a single shared bucket via per-tenant S3 Access Points. Each upload/download mints scoped temporary credentials via `sts:AssumeRole` with a session policy narrowed to the tenant's AP ARN and key prefix, and a session name derived from the authenticated requesting org carried in `ctx` (`s3accesspoint.WithRequestingOrg`).

Both upstream binaries pick up a new optional `blob_backends.s3_access_point` config block (`base_role_arn`, `region`, `session_duration`); when the block is absent the provider stays unregistered and behaviour is identical to before. The pod's ambient AWS identity (IRSA / instance profile / env vars) is used to call STS; no static credentials live in config.

Per-tenant data (AP ARN, region override, key prefix) is stored as a JSON blob in the secrets manager and read via `FromCredentials`, so the existing `backend.Provider` interface is unchanged.

Add `OrgID` to the CAS robotaccount JWT claims so artifact-cas can enrich its context with the requesting org before invoking the backend; existing providers ignore the key.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2

… dev mode

Two related refinements to the AWS-S3-ACCESS-POINT provider.

1. The per-tenant key prefix is now derived at request time from the authenticated requesting org carried in ctx via WithRequestingOrg, rather than read from a `KeyPrefix` field in the secrets-manager blob. The prefix and the AssumeRole `RoleSessionName` now share their single source of truth, so a tampered Credentials blob can no longer reroute a tenant's writes into another tenant's namespace. The Credentials struct shrinks to {AccessPointARN, Region}. The session policy and the bucket-level key both use `<orgUUID>` as the prefix; the AP resource policy's Resource ARN must be `${apARN}/object/<orgUUID>/*` to match.

2. Add a `dev_mode_use_ambient_credentials` Config flag (proto + wire-plumbed in both binaries) that bypasses `sts:AssumeRole` and routes S3 calls through whatever ambient AWS identity the SDK's default credential chain produced. Local dev no longer requires an IAM role + trust policy setup. The missing-org fail-closed check still fires in dev mode so callers that forget WithRequestingOrg surface the same bug locally that they would in production. A loud warning is logged at startup. DEV ONLY: never enable in multi-tenant deployments.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2

…wire output

For Managed=true CAS backends, replace Location with "managed by Chainloop" and Provider with "Chainloop" everywhere the controlplane emits a CASBackend outside its trust boundary:

* API responses (bizCASBackendToPb), so `chainloop cas-backend ls` no longer prints the AWS account ID, region, or AP name.
* Audit-log events on the NATS bus (CASBackendCreated, CASBackendUpdated, CASBackendDeleted, CASBackendPermanentDeleted, CASBackendStatusChanged), so downstream consumers can't surface the same details to tenants either.

The DB and biz layer continue to carry the real ARN and provider ID unchanged, so PerformValidation, the platform reconciler, and any forensic join by CASBackendID still work. Two helpers (displayLocation, displayProvider) keep the sanitization rule in one place.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2
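
The two helpers might look roughly like the sketch below. The helper names and the replacement strings come from the commit message; the `casBackend` struct and the helper signatures are assumptions made for this example.

```go
package main

import "fmt"

const (
	managedLocation = "managed by Chainloop"
	managedProvider = "Chainloop"
)

// casBackend is a hypothetical stand-in for the biz-layer entity.
type casBackend struct {
	Location string
	Provider string
	Managed  bool
}

// displayLocation hides the AP ARN for managed rows and passes other rows through.
func displayLocation(b casBackend) string {
	if b.Managed {
		return managedLocation
	}
	return b.Location
}

// displayProvider hides the provider ID for managed rows and passes other rows through.
func displayProvider(b casBackend) string {
	if b.Managed {
		return managedProvider
	}
	return b.Provider
}

func main() {
	managed := casBackend{
		Location: "arn:aws:s3:us-east-1:123456789012:accesspoint/example-ap", // hypothetical ARN
		Provider: "AWS-S3-ACCESS-POINT",
		Managed:  true,
	}
	fmt.Println(displayLocation(managed), "/", displayProvider(managed))
}
```

Routing every wire-facing conversion through the same two functions keeps the stored values intact while making it harder for a new output path to leak the ARN.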

chainloop-platform · 2026-05-15T18:23:40Z

AI Session Analysis

Avg score	Sessions	Failing policies	Attribution	Files	Lines	Total Duration
🟢 87%	1	⚠️ 1	93% AI / 7% Human	43	+2263 / -1070	104h30m12s

🟢 87% · 93% AI · ⚠️ 1 policy failing

May 14, 2026 09:34 UTC · 104h30m12s · $366.25 · 8.1k in / 849.5k out · claude-code 2.1.139 (claude-opus-4-7)

Change Summary

Adds a new S3 Access Point backend for managed-CAS blob storage with per-tenant STS credential isolation. Introduces org-scoped JWT enforcement (orgID mandatory) and managed-CAS guards throughout the auth and blob-manager layers. Refactors BlobBackends proto to ManagedCASBackends and relocates auth helpers per user direction. Adds a developer-mode bypass env var with explicit production warning. Includes new unit tests covering fail-closed org checks, credential extraction, and managed-location masking.

AI Session Overall Score

🟢 87% — Well-executed feature with solid tests, fail-closed security, and clean scope discipline.

AI Session Analysis Breakdown

🟢 92% · alignment

No notes.

🟢 88% · scope-discipline

No notes.

🟢 88% · solution-quality

🟢 Fail-closed guard errors when org UUID is absent from context, preventing cross-tenant data access. · High Impact

🟡 Dev-mode env var bypasses per-tenant STS AssumeRole isolation; must be absent from all production configs. · Low Severity

💡 Ensure the dev-mode env var is explicitly blocked or absent from all production deployment configs.

🟢 88% · verification

🟢 32 test runner invocations across all changed packages; final run fully green with behavioral assertions. · High Impact

🟡 Intermediate test failures occurred mid-session before being resolved at the final run. · Low Severity

🟢 85% · context-and-planning

🟢 AI fetched the Linear design doc, ran parallel explorations, and produced a detailed written plan before coding. · High Impact

🟡 No CLAUDE.md or AGENTS.md was read; only Conductor system prompt provided project-level guidance. · Low Severity

🟡 74% · user-trust-signal

🟠 AI declared PR description updated before verifying, requiring user to re-prompt for the correction. · Medium Severity

💡 Verify delegated output before declaring done to avoid user-visible false completions.

🟡 Multiple sequential AWS errors during manual testing required several back-and-forth correction turns. · Low Severity

💡 Pre-validate AWS setup steps in the guide to reduce error-driven correction loops.

File Attribution

██████████████████░░ 93% AI / 7% Human

Status	Attribution	File	Lines
modified	ai	`pkg/blobmanager/s3accesspoint/backend.go`	+438 / -70
modified	ai	`pkg/blobmanager/s3accesspoint/provider.go`	+353 / -151
modified	ai	`pkg/blobmanager/s3accesspoint/provider_test.go`	+262 / -108
modified	ai	`pkg/blobmanager/s3accesspoint/backend_test.go`	+255 / -45
deleted	ai	`app/artifact-cas/internal/server/auth.go`	+69 / -69
modified	ai	`app/controlplane/internal/conf/controlplane/config/v1/conf.proto`	+61 / -61
modified	ai	`pkg/blobmanager/loader/loader.go`	+62 / -54
modified	ai	`pkg/blobmanager/loader/loader_test.go`	+75 / -33
modified	ai	`app/artifact-cas/internal/service/service.go`	+33 / -62
modified	ai	`app/controlplane/internal/service/casbackend_test.go`	+95 / -0
deleted	ai	`pkg/blobmanager/backend_test.go`	+44 / -44
modified	ai	`internal/robotaccount/cas/robotaccount_test.go`	+77 / -2
modified	human	`app/artifact-cas/internal/service/service_test.go`	+0 / -75
modified	ai	`internal/robotaccount/cas/robotaccount.go`	+59 / -14
modified	ai	`app/controlplane/pkg/biz/casbackend.go`	+58 / -11
modified	ai	`app/controlplane/cmd/wire.go`	+33 / -33
modified	ai	`app/artifact-cas/cmd/wire.go`	+32 / -32
modified	ai	`pkg/blobmanager/backend.go`	+32 / -32
modified	ai	`app/artifact-cas/internal/conf/conf.proto`	+29 / -29
modified	ai	`app/controlplane/internal/service/casbackend.go`	+36 / -12
modified	ai	`app/artifact-cas/configs/config.devel.yaml`	+18 / -18
modified	ai	`app/controlplane/pkg/biz/cascredentials.go`	+23 / -13
modified	ai	`app/controlplane/configs/config.devel.yaml`	+17 / -17
modified	ai	`app/controlplane/pkg/biz/mocks/CASClient.go`	+11 / -10
modified	ai	`app/artifact-cas/internal/server/grpc.go`	+9 / -9

…and 18 more file(s).

Policies (4, 1 failing)

Status	Policy	Material	Messages
✅ Passed	`ai-config-ai-agents-allowed`	`ai-coding-session-234a03`	-
✅ Passed	`ai-config-no-dangerous-commands`	`ai-coding-session-234a03`	-
⚠️ Failed	`ai-config-no-secrets`	`ai-coding-session-234a03`	Potential secret (Quoted API key/password) found in session content [turn=1567, source=tool_result, line=39, value=secret: ...bdK"] Potential secret (Quoted API key/password) found in session content [turn=662, source=tool_result, line=78, value=secret: ...mV0"]
✅ Passed	`ai-config-mcp-servers-allowed`	`ai-coding-session-234a03`	-


kusari-inspector · 2026-05-15T18:27:14Z

Kusari Analysis Results:

✅ No Flagged Issues Detected
All values appear to be within acceptable risk parameters.

No pinned version dependency changes, code issues or exposed secrets detected!

Commit: cfa4ff4, performed at: 2026-05-15T19:34:39Z


cubic-dev-ai

2 issues found across 34 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/controlplane/internal/conf/controlplane/config/v1/conf.proto">

<violation number="1" location="app/controlplane/internal/conf/controlplane/config/v1/conf.proto:152">
P2: Enforce `base_role_arn` when dev mode is disabled; the current schema allows invalid production config that will fail only at runtime.</violation>
</file>

<file name="app/controlplane/internal/service/cascredential.go">

<violation number="1" location="app/controlplane/internal/service/cascredential.go:152">
P1: Use the authenticated requesting org when minting CAS credentials; deriving `OrgID` from `backend.OrganizationID` can incorrectly scope managed S3 access-point sessions to backend ownership instead of caller identity.</violation>
</file>


Two follow-ups from the PR review on chainloop-dev#3121:

* The CAS JWT minted by cascredential.go, attestation.go and casredirect.go now embeds OrgID from the authenticated caller (entities.CurrentOrg / robotAccount.OrgID) instead of backend.OrganizationID. For managed S3 Access Point backends this OrgID drives the AssumeRole session name and the AP-policy aws:userid match; deriving it from the resolved row would weaken the cross-tenant guarantee if a future bug ever let a caller resolve a backend they don't own.
* The S3AccessPoint proto message now carries a buf.validate CEL constraint that requires base_role_arn when dev_mode_use_ambient_credentials is false, surfacing the misconfiguration at config-load time rather than at first upload.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2

kusari-inspector · 2026-05-15T19:34:41Z

Kusari PR Analysis rerun based on cfa4ff4, performed at: 2026-05-15T19:34:39Z

A `go mod tidy` while developing the s3accesspoint provider regressed several deps:

* go-git/v6 downgraded alpha.3 -> alpha.2 (CVE-2026-45022, commit signature spoofing)
* go-billy/v5 downgraded 5.9.0 -> 5.8.0 (CVE-2026-44973 path traversal, CVE-2026-44740 symlink-loop DoS)
* go-billy/v6 swapped to an older snapshot
* go-git/v5 downgraded 5.19.0 -> 5.18.0
* unrelated olekukonko/* and golang.org/x/* version churn that broke CI's go-module tidy check

Restoring go.mod and go.sum to match origin/main resolves both the Kusari CVE alerts and the CI failures. aws-sdk-go-v2/service/sts (needed by the s3accesspoint provider) is already an indirect at v1.41.9 on main, so no go.mod change is required for the new code to build.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>

migmartri · 2026-05-17T20:01:12Z

@jiparis how does this work? Does it automatically create a backend DB entry if the managed setup is configured in the instance, similar to what we do with inline?

The proto message and its YAML field describe configuration for *managed* CAS backends (provisioned and operated by Chainloop), not generic blob storage. Rename:

* proto message `BlobBackends` -> `ManagedCASBackends`
* proto field `blob_backends` -> `managed_cas_backends` in both controlplane and artifact-cas Bootstrap messages
* matching Go field on the regenerated `*conf.Bootstrap` (`ManagedCasBackends`) and references in wire.go / wire_gen.go
* commented-out example block in both `config.devel.yaml`

No behavioural change; the only deployments that read this block today are local-dev configs (gitignored config.local.yaml) which have been updated separately.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2

CASBackendService.Create previously accepted any provider ID present in the loader's provider map, including AWS-S3-ACCESS-POINT. A sufficiently determined user could craft a Create request that half-provisioned a managed row pointing at an AP ARN they don't own, bypassing the platform reconciler's trust boundary.

Add an explicit isManagedOnlyProvider() guard at the front of Create so the public RPC fails fast with `managed CAS backends cannot be created via this API`. The platform reconciler still creates managed rows by calling biz.CASBackendUseCase.Create directly, which is unaffected. Update/SoftDelete are already guarded against managed rows in the biz layer.

Assisted-by: Claude Code
Signed-off-by: Jose I. Paris <jiparis@chainloop.dev>
Chainloop-Trace-Sessions: 234a03ed-b238-4506-95f0-235242842db2