Changes from 77 commits
88 commits
a114ab3
initial design
jp-agenta Apr 24, 2026
747502d
initial implementation
jp-agenta Apr 26, 2026
2220e65
initial review
jp-agenta Apr 26, 2026
0c559d8
fixed basic evals
jp-agenta Apr 26, 2026
3862dfe
Add evaluation parallelization
jp-agenta Apr 27, 2026
b958574
Parallelization checks
jp-agenta Apr 27, 2026
5b647ee
quick engine fix
jp-agenta Apr 27, 2026
c4d8368
ongoing debug
jp-agenta May 5, 2026
e049ef5
intermediate design extensions
jp-agenta May 15, 2026
50d1d8e
Merge release/v0.99.9 and evals<>queues work
jp-agenta May 15, 2026
603820f
evals<>queues implementation
jp-agenta May 15, 2026
3ebe721
latest findings
jp-agenta May 20, 2026
f68e42a
Merge release/v0.100.1
jp-agenta May 20, 2026
9b388bd
Add missing sdk files
jp-agenta May 20, 2026
7aedca9
Fix Dependency Injection in EventsDAO
jp-agenta May 20, 2026
ce1258a
bump py deps
jp-agenta May 20, 2026
4cc2f0e
extra findings
jp-agenta May 20, 2026
6bc3301
fixing findings
jp-agenta May 20, 2026
dd211e9
Merge
jp-agenta May 20, 2026
ef1d228
fixing tests and dependency injection
jp-agenta May 21, 2026
08eb3ea
clean up dependencies
jp-agenta May 21, 2026
f1cb077
deep clean up
jp-agenta May 21, 2026
d73fe65
Merge branch 'release/v0.100.1' into feat/unified-eval-loops
junaway May 21, 2026
59ed5e2
docs(eval-loops): consolidate legacy eval-loops docs into unified-eva…
jp-agenta May 21, 2026
a9d28b3
reconciliation added in edit + some fixing
jp-agenta May 21, 2026
0f91ea3
fix simple queue creation and eval processing
jp-agenta May 21, 2026
f64d299
fix services tests
jp-agenta May 21, 2026
3b3ac15
Fix domain exceptions and run (un)archival
jp-agenta May 21, 2026
c5acdbe
feat(api): mock_v0 test workflow + fix batch run-status finalization …
jp-agenta May 21, 2026
7eaf519
test(api): eval flow + flag-matrix tests; close UEL-012/016/028, file…
jp-agenta May 21, 2026
9628538
fix(api): finalize batch query→evaluator runs (UEL-029)
jp-agenta May 21, 2026
71aa2cd
fix(api): make the closed-run lock actually return 409 (UEL-031)
jp-agenta May 21, 2026
c009f0b
test(api): default-queue policy + lifecycle coverage (UEL-011); file …
jp-agenta May 21, 2026
f4f6cc0
fix(API): enforce one active default eval queue per run (UEL-030)
jp-agenta May 21, 2026
c5bdccd
test(API): cover refresh_metrics dispatch branches
jp-agenta May 21, 2026
0bf8d12
docs(eval): move closed UEL-030 into the Closed Findings section
jp-agenta May 21, 2026
8f9059e
refactor(API/SDK): exact source-family rule, resolver/over-count hard…
jp-agenta May 22, 2026
13648d4
deep tests/findings cleanup
jp-agenta May 22, 2026
461e397
fix(API): stamp updated_at on archive_queue; drop inert server_onupdate
jp-agenta May 22, 2026
98d3057
fix(API): allow default-queue is_queue sync on a closed run
jp-agenta May 22, 2026
1d3c3b5
fix(API): stamp updated_at/updated_by_id on secret + org-provider/dom…
jp-agenta May 22, 2026
a27c21a
test(SDK): unit-cover evaluate() spec parsing and normalization
jp-agenta May 22, 2026
5aa9296
test(SDK): integration + acceptance coverage for evaluate(); fix save…
jp-agenta May 22, 2026
76a9833
clean up logs and auto/custom/human
jp-agenta May 22, 2026
026547e
docs(eval): origin execution model — today (human/auto/custom) and fu…
jp-agenta May 22, 2026
e9ff86d
bump py deps
jp-agenta May 22, 2026
423800e
fixing tensor
jp-agenta May 22, 2026
30f53c8
split refresh metrics from process slice
jp-agenta May 22, 2026
92d733e
add live evaluation tests
jp-agenta May 22, 2026
ae5fc5a
Merge release/v0.100.2
jp-agenta May 22, 2026
85f4b68
merge v0.100.2
jp-agenta May 26, 2026
5459c20
Merge release/v0.100.9
jp-agenta Jun 1, 2026
cf7c22f
Merge branch 'release/v0.100.9' into feat/unified-eval-loops
jp-agenta Jun 1, 2026
4e90b46
Merge branch 'release/v0.100.9' into feat/unified-eval-loops
jp-agenta Jun 1, 2026
58d666e
remove fern clients and api references
jp-agenta Jun 1, 2026
05980d5
save breakdown
jp-agenta Jun 1, 2026
a22c1af
Fix create/edit > set for results/metrics
jp-agenta Jun 1, 2026
c4b8755
Fix migrations, and rerun mechanism
jp-agenta Jun 1, 2026
ea1895b
fix env vars
jp-agenta Jun 1, 2026
dceb7bb
Fix tests
jp-agenta Jun 1, 2026
2470509
Merge branch 'fe-feat/mustache-support' into feat/unified-eval-loops
jp-agenta Jun 2, 2026
5c66e22
Merge branch 'fe-feat/mustache-support' into feat/unified-eval-loops
junaway Jun 2, 2026
05bb37d
fix test warnings
jp-agenta Jun 2, 2026
e4d9678
fix more warnings
jp-agenta Jun 2, 2026
02c0379
Update AGENTS.md
junaway Jun 2, 2026
616823c
fix fern clients and remove tsbuildinfo
jp-agenta Jun 2, 2026
4c388c3
fix REPORTS
jp-agenta Jun 2, 2026
a9c5670
fix lazy loads
jp-agenta Jun 2, 2026
aec1e80
fix tests
jp-agenta Jun 2, 2026
d756a0e
fix migrations and default queues
jp-agenta Jun 2, 2026
ccc6e95
fix tests
jp-agenta Jun 2, 2026
eeb1fa6
drop dead code
jp-agenta Jun 2, 2026
4309ce4
remove events ingestion in oss
jp-agenta Jun 2, 2026
86ff78c
fix applied_identifying_filter
jp-agenta Jun 2, 2026
eea6a73
drop api reference changes
jp-agenta Jun 2, 2026
831b1e0
split caching and locking
jp-agenta Jun 2, 2026
6f09b30
rename locks
jp-agenta Jun 2, 2026
76ded57
fix some trace_id typing
jp-agenta Jun 2, 2026
37d4776
update polling
jp-agenta Jun 2, 2026
10522f6
CR clean-up
jp-agenta Jun 2, 2026
0693d26
fix concurrent slices
jp-agenta Jun 2, 2026
6e5d545
oversee large unification (ingest vs re-execute)
jp-agenta Jun 2, 2026
e1c385a
quick CR
jp-agenta Jun 2, 2026
d650588
re-generate fern clients
jp-agenta Jun 2, 2026
187b53a
fix fern responses
jp-agenta Jun 2, 2026
95cf42e
Merge branch 'fe-feat/mustache-support' into feat/unified-eval-loops
junaway Jun 2, 2026
e7bdf54
fix fern clients
jp-agenta Jun 2, 2026
209fcd2
Fix findings
jp-agenta Jun 2, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@
**/*.pem
**/*dont_commit_me*
web/packages/agenta-api-client/dist/
web/tsconfig.tsbuildinfo

__pycache__/
**/__pycache__/
20 changes: 20 additions & 0 deletions AGENTS.md
@@ -176,6 +176,26 @@ Concrete examples:
- Legacy app storage marker (`WORKFLOW_MARKER_KEY`): `api/oss/src/core/applications/service.py`
- Legacy dedup key normalization (`__dedup_id__` <-> `testcase_dedup_id`): `api/oss/src/apis/fastapi/testsets/router.py`

### Alembic migration chains (OSS + EE)

Migrations live in two separate, parallel chains that must each resolve to a
single head:
- OSS: `api/oss/databases/postgres/migrations/core/versions/`
- EE: `api/ee/databases/postgres/migrations/core/versions/`

Rules:
- After adding/editing/renaming any migration, verify each chain has exactly ONE
head with the bundled tool: from each `.../migrations/` directory run
`python3 find_head.py core` and confirm the `Heads:` list has a single entry.
Run it for BOTH OSS and EE.
- New migrations chain linearly after the existing head — never fork off an older
node (a fork produces two heads; alembic then can't resolve a linear upgrade).
- Revision ids must be globally unique within a chain. A duplicate id makes
alembic silently skip one file (the migration never runs).
- The EE chain and the OSS chain are parallel, so an OSS migration chains after
the OSS head while its EE counterpart chains after the EE head — same revision id,
possibly different `down_revision`.
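
The single-head and unique-id rules above can be checked mechanically. The following is a minimal sketch, not the repository's bundled `find_head.py`: the function names (`parse_migration`, `find_heads`, `find_duplicates`, `scan_chain`) are hypothetical, and it assumes each migration file declares `revision` and a single string `down_revision` (merge migrations with tuple parents are not handled).

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Match `revision = "..."` or `revision: str = "..."` at the start of a line.
REVISION_RE = re.compile(
    r'^revision(?::[^=\n]+)?\s*=\s*["\']([^"\']+)["\']', re.MULTILINE
)
# Match `down_revision = "..."` (optionally annotated) or `down_revision = None`.
DOWN_RE = re.compile(
    r'^down_revision(?::[^=\n]+)?\s*=\s*(?:["\']([^"\']+)["\']|None)', re.MULTILINE
)


def parse_migration(text: str) -> Tuple[str, Optional[str]]:
    """Return (revision, down_revision) declared in one migration file."""
    rev = REVISION_RE.search(text)
    if rev is None:
        raise ValueError("no revision id found")
    down = DOWN_RE.search(text)
    return rev.group(1), (down.group(1) if down else None)


def find_heads(graph: Dict[str, Optional[str]]) -> List[str]:
    """A head is a revision that no other revision names as its parent."""
    referenced = {parent for parent in graph.values() if parent is not None}
    return sorted(rev for rev in graph if rev not in referenced)


def find_duplicates(revision_ids: List[str]) -> List[str]:
    """Return revision ids that are declared by more than one file."""
    return sorted(rev for rev, n in Counter(revision_ids).items() if n > 1)


def scan_chain(versions_dir: Path) -> Tuple[List[str], List[str]]:
    """Scan a versions/ directory and return (heads, duplicate_ids)."""
    parsed = [parse_migration(p.read_text()) for p in sorted(versions_dir.glob("*.py"))]
    duplicates = find_duplicates([rev for rev, _ in parsed])
    return find_heads(dict(parsed)), duplicates
```

A linear chain such as `{"a": None, "b": "a", "c": "b"}` yields the single head `["c"]`, while forking `c` off `a` instead yields `["b", "c"]`, which is the two-head situation the rules warn about.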

### Router and function style conventions

Router style:
@@ -254,9 +254,11 @@ def check_url_safety(cls, v: Any) -> Any: # noqa: N805
return v

from oss.src.dbs.postgres.git.mappings import map_dto_to_dbe
-from oss.src.dbs.postgres.shared.engine import engine as db_engine
+from oss.src.dbs.postgres.shared.engine import get_transactions_engine
from datetime import datetime, timezone

+db_engine = get_transactions_engine()
+
workflow_create = WorkflowCreate(
**application_create.model_dump(mode="json"),
)
@@ -267,7 +269,7 @@ def check_url_safety(cls, v: Any) -> Any: # noqa: N805

# Avoid slug collision with existing workflow artifacts (e.g. evaluators)
artifact_slug = git_artifact_create.slug
-async with db_engine.core_session() as session:
+async with db_engine.session() as session:
existing = (
await session.execute(
select(WorkflowArtifactDBE).filter(
@@ -298,7 +300,7 @@ def check_url_safety(cls, v: Any) -> Any: # noqa: N805
dto=artifact_dto,
)

-async with db_engine.core_session() as session:
+async with db_engine.session() as session:
session.add(artifact_dbe)
await session.commit()

@@ -364,7 +366,7 @@ def check_url_safety(cls, v: Any) -> Any: # noqa: N805
dto=variant_dto,
)

-async with db_engine.core_session() as session:
+async with db_engine.session() as session:
session.add(variant_dbe)
await session.commit()

@@ -415,7 +417,7 @@ def check_url_safety(cls, v: Any) -> Any: # noqa: N805
dto=revision_dto,
)

-async with db_engine.core_session() as session:
+async with db_engine.session() as session:
session.add(revision_dbe)
await session.commit()

4 changes: 2 additions & 2 deletions api/ee/databases/postgres/migrations/core/env.py
@@ -6,7 +6,7 @@

from alembic import context

-from oss.src.dbs.postgres.shared.engine import engine
+from oss.src.utils.env import env
from oss.src.dbs.postgres.shared.base import Base

# Side-effect imports: register SQLAlchemy models with Base.metadata
@@ -29,7 +29,7 @@
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
-config.set_main_option("sqlalchemy.url", engine.postgres_uri_core) # type: ignore
+config.set_main_option("sqlalchemy.url", env.postgres.uri_core)


# Interpret the config file for Python logging.
@@ -0,0 +1,41 @@
"""add default evaluation queues

Revision ID: a1d2e3f4a5b6
Revises: e6f7a8b9c0d2
Create Date: 2026-05-15 00:00:00

Previously shared revision id `a1b2c3d4e5f6` with
`drop_corrupted_metrics_for_some_runs`, so alembic skipped it and the index
below never ran. Renamed to `a1d2e3f4a5b6`. The EE chain extends past the shared
OSS head `e6f7a8b9c0d1` with EE-only migrations
(`9d3e8f0a1b2c -> a1b2c3d4e5f7 -> b2c3d4e5f7a8`), so this EE copy chains after
`b2c3d4e5f7a8` while the OSS copy chains after `e6f7a8b9c0d1`.

The partial unique index covers ALL default queues (active or archived), so
there is at most ONE default queue row per (project_id, run_id) for the lifetime
of the run. Archiving a default does NOT free the slot — the single row is
archived/unarchived in place by reconcile, and user-facing archive of a default
is forbidden in the service layer.
"""

from typing import Sequence, Union

from alembic import op

revision: str = "a1d2e3f4a5b6"
down_revision: Union[str, None] = "e6f7a8b9c0d2"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    op.execute("DROP INDEX IF EXISTS ux_evaluation_queues_default_per_run")
    op.execute("""
        CREATE UNIQUE INDEX ux_evaluation_queues_default_per_run
        ON evaluation_queues (project_id, run_id)
        WHERE (flags ->> 'is_default')::boolean = true
    """)


def downgrade() -> None:
    op.execute("DROP INDEX IF EXISTS ux_evaluation_queues_default_per_run")
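
To illustrate the behavior described in the docstring (one default queue row per `(project_id, run_id)`, with archiving not freeing the slot), here is a small sketch of a partial unique index. It is an approximation under stated assumptions: it uses the standard-library `sqlite3` module instead of PostgreSQL, and it replaces the JSON `flags ->> 'is_default'` predicate with hypothetical integer columns `is_default` and `archived`. The table layout and the `demo` function are illustrative only and are not the project's schema.

```python
import sqlite3
from typing import List


def demo() -> List[str]:
    """Insert queue rows and report whether each insert was accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE evaluation_queues (
            id INTEGER PRIMARY KEY,
            project_id TEXT NOT NULL,
            run_id TEXT NOT NULL,
            is_default INTEGER NOT NULL DEFAULT 0,  -- stand-in for flags.is_default
            archived INTEGER NOT NULL DEFAULT 0     -- stand-in for an archived marker
        )
        """
    )
    # Partial unique index: uniqueness is enforced only for rows matching the
    # WHERE clause. The predicate ignores `archived`, so an archived default
    # still occupies the slot for its (project_id, run_id).
    conn.execute(
        """
        CREATE UNIQUE INDEX ux_default_per_run
        ON evaluation_queues (project_id, run_id)
        WHERE is_default = 1
        """
    )
    rows = [
        ("p1", "r1", 0, 0),  # non-default queue
        ("p1", "r1", 0, 0),  # second non-default queue on the same run
        ("p1", "r1", 1, 1),  # default queue, already archived
        ("p1", "r1", 1, 0),  # second default for the same run
        ("p1", "r2", 1, 0),  # default queue for a different run
    ]
    outcomes = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO evaluation_queues (project_id, run_id, is_default, archived) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
            outcomes.append("ok")
        except sqlite3.IntegrityError:
            outcomes.append("rejected")
    conn.close()
    return outcomes
```

Calling `demo()` returns `["ok", "ok", "ok", "rejected", "ok"]`: non-default queues are unconstrained, the second default on the same run is rejected even though the first is archived, and a different run gets its own default.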