Skip to content

perf(controlplane): denormalize organization_id onto workflow_runs#3127

Open
migmartri wants to merge 5 commits into
chainloop-dev:mainfrom
migmartri:miguel/pfm-5839-workflow-runs-organization-id
Open

perf(controlplane): denormalize organization_id onto workflow_runs#3127
migmartri wants to merge 5 commits into
chainloop-dev:mainfrom
migmartri:miguel/pfm-5839-workflow-runs-organization-id

Conversation

@migmartri
Copy link
Copy Markdown
Member

@migmartri migmartri commented May 16, 2026

Fixes #3128.

Summary

  • Adds organization_id to workflow_runs and a (organization_id, created_at DESC) index so org-scoped list/aggregate queries are sargable without joining workflows.
  • The previous query joined via a correlated EXISTS, which let the planner pick a backward index scan on created_at and walk most of the table for orgs with sparse recent activity. See workflow run list times out on some organizations #3128 for the full root-cause analysis.

Migration

The migration is designed to run safely under a rolling deploy:

  1. Add the column nullable + FK NOT VALID.
  2. Install a BEFORE INSERT trigger that fills organization_id from the parent workflow when the caller doesn't set it. This protects against the rolling-update window where old replicas are still inserting without the column set.
  3. Batched backfill in a DO loop with one COMMIT per 5000-row batch (no giant transaction, no long row locks).
  4. VALIDATE CONSTRAINT + SET NOT NULL (verify-only scan in PG 12+).
  5. CREATE INDEX CONCURRENTLY.

A follow-up PR will drop the trigger and function in a separate release once every replica runs code that sets the column explicitly.

Assisted-by: Claude Code

Makes org-scoped workflow run queries sargable via a new
(organization_id, created_at DESC) index, fixing the timeouts on the
workflow runs list reported in PFM-5839. The previous query joined to
workflows via a correlated EXISTS, which let the planner pick a
backward index scan on created_at and scan most of the table on orgs
with sparse recent activity.

A BEFORE INSERT trigger fills organization_id from the parent workflow
during the rolling-deploy window so old replicas that don't set the
column explicitly can keep writing. A follow-up release will drop the
trigger and function once every replica runs code that sets the
column directly.

Assisted-by: Claude Code
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>

Chainloop-Trace-Sessions: f00cda5d-4707-40cf-800e-54da06ada048
@chainloop-platform
Copy link
Copy Markdown
Contributor

chainloop-platform Bot commented May 16, 2026

AI Session Analysis

Avg score Sessions Failing policies Attribution Files Lines Total Duration
🟡 74% 1 ✅ 0 86% AI / 14% Human 7 +246 / -48 55m0s

🟡 74% — 86% AI — ✅ All policies passing

May 16, 2026 20:48 UTC · 55m0s · $40.41 · 1.6k in / 323.6k out · claude-code 2.1.143 (claude-opus-4-7)

Change Summary

  • Adds a denormalized organization_id column to the workflow run entity via two Ent schema migrations.
  • Refactors the List query predicate to use the new column, eliminating an N+1 join.
  • Adds regression tests covering cross-org isolation and soft-delete exclusion after a PR review caught a dropped predicate.

AI Session Overall Score

🟡 74% — Core denormalization delivered correctly; migration negotiation required repeated redirects and tests were added only after review.

AI Session Analysis Breakdown

🟢 85% · scope-discipline

🟢 AI identified stray go.sum changes as unrelated and explicitly excluded them from the commit, keeping the diff clean. · High Impact

🟡 AI wrote a memory/feedback file and edited MEMORY.md outside the PR's code scope without being asked. · Low Severity

🟢 82% · solution-quality

No notes.

🟡 72% · verification

🟢 All 42 WorkflowRun integration tests passed via testcontainers against a real DB, including the new migration trigger. · High Impact

🔴 AI committed the List query refactor without tests; a PR reviewer caught the soft-delete regression, not the AI. User-trust-signal and solution-quality flagged the same omission. · High Severity

💡 When refactoring a query predicate, read the existing test suite first and add a case for each predicate you touch before committing — not after review catches it.

🟡 Two targeted test runs returned no output, making it hard to confirm those specific subtests executed; aggregate pass confirmed separately. · Low Severity

🟡 68% · alignment

🟠 AI misread the user's migration-trigger rejection and asked for clarification rather than rephrasing its understanding, requiring two extra exchanges to converge. · Medium Severity

💡 When a user interrupts mid-plan, restate your understanding of their position before proposing the next option — it shortens the redirection loop.

🟠 AI scaffolded the two-migration-file approach before the user confirmed, triggering another redirect; user had to re-explain the same intent across several turns. · Medium Severity

💡 On multi-step destructive schema changes, seek explicit sign-off on each phase boundary before executing, not mid-execution.

🟡 68% · context-and-planning

🟢 AI produced a detailed multi-section plan with schema, data layer, migration steps, and race-window caveats before any code edits, then waited for explicit user confirmation. · High Impact

🟠 Opening prompt was a single Linear ticket URL with no stated constraints, scope, or success criteria for a complex multi-axis DB change. · Medium Severity

💡 Front-load key constraints (performance target, migration safety requirements, deployment model) so the AI does not have to discover them through back-and-forth.

🟡 62% · user-trust-signal

No notes.


File Attribution

█████████████████░░░ 86% AI / 14% Human

Status Attribution File Lines
modified ai app/controlplane/pkg/data/ent/migrate/migrations/20260516210119.sql +87 / -1
modified ai app/controlplane/pkg/biz/workflowrun_integration_test.go +83 / -0
modified ai app/controlplane/pkg/data/workflowrun.go +32 / -36
modified human app/controlplane/pkg/data/ent/migrate/schema.go +28 / -10
modified ai app/controlplane/pkg/data/ent/schema/workflowrun.go +13 / -0
modified human app/controlplane/pkg/data/ent/migrate/migrations/atlas.sum +2 / -1
modified ai app/controlplane/pkg/data/ent/schema/organization.go +1 / -0

Policies (4)

Status Policy Material Messages
✅ Passed ai-config-ai-agents-allowed ai-coding-session-f00cda -
✅ Passed ai-config-no-dangerous-commands ai-coding-session-f00cda -
✅ Passed ai-config-no-secrets ai-coding-session-f00cda -
✅ Passed ai-config-mcp-servers-allowed ai-coding-session-f00cda -

Powered by Chainloop and Chainloop Trace

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 20 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/controlplane/pkg/data/workflowrun.go">

<violation number="1" location="app/controlplane/pkg/data/workflowrun.go:375">
P2: List no longer consistently excludes runs from soft-deleted workflows. Apply `workflow.DeletedAtIsNil()` for all list queries, not only when `ProjectIDs` is provided.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Fix all with cubic | Re-trigger cubic

Comment thread app/controlplane/pkg/data/workflowrun.go Outdated
@migmartri migmartri requested a review from a team May 16, 2026 21:25
migmartri added 4 commits May 16, 2026 23:31
The previous commit moved org scoping off the workflows edge but only
re-attached workflow.DeletedAtIsNil() when ProjectIDs was set, so runs
from soft-deleted workflows leaked into the org-level list when no
project filter was applied. Apply DeletedAtIsNil() unconditionally via
the workflows edge — it's a per-row PK lookup after the org-scoped index
narrows the candidate set, so it doesn't reintroduce the planner
ambiguity from PFM-5839.

Adds an integration test that creates a workflow + run, soft-deletes
the workflow, and asserts the run is excluded from List.

Assisted-by: Claude Code
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>

Chainloop-Trace-Sessions: f00cda5d-4707-40cf-800e-54da06ada048
…orkflow run List

The refactor in PFM-5839 changed how org scoping and project filtering
compose in WorkflowRunRepo.List. The previous table-driven cases didn't
assert either behavior — they only listed runs in a single org and never
filtered by ProjectIDs — so the regression cubic caught had no coverage
beyond the soft-delete case added in the prior commit.

Adds two integration tests:
- TestListIsolatedByOrg: listing in org1 returns only org1 runs and
  vice versa.
- TestListFilterByProjectIDs: creates a workflow in a second project,
  asserts filtering by the original project's ID excludes it.

Assisted-by: Claude Code
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>

Chainloop-Trace-Sessions: f00cda5d-4707-40cf-800e-54da06ada048
Removes the PFM-5839 references in the migration header and the
WorkflowRunRepo.List comment. The PFM tracker is private; the public
codebase shouldn't link to it. The technical context remains in place.

Assisted-by: Claude Code
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>

Chainloop-Trace-Sessions: f00cda5d-4707-40cf-800e-54da06ada048
The prior commit edited the migration SQL but left atlas.sum stale,
which the ent CI check caught. Rerunning atlas migrate hash.

Assisted-by: Claude Code
Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

workflow run list times out on some organizations

1 participant