feat: implement verifier runner to check received data by erenaslandev · Pull Request #13 · VirtualMetric/PipeBench

erenaslandev · 2026-06-16T11:37:13Z

Summary by CodeRabbit

New Features
- Test cases can now enable an optional DuckDB-based “verifier” for post-run correctness checks on Avro/Parquet data from an S3-compatible endpoint.
- The runner adds support for configuring the verifier container image and runs it as part of the correctness flow, producing a verdict output.
Tests
- Added unit tests for verifier verdict logic, S3 query/prelude generation, parsing behavior, and compose rendering.
Chores
- Updated CI/build/release tooling to build and publish the verifier container image, with additional verifier-focused Go test coverage.

coderabbitai · 2026-06-16T11:37:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 748a1dd1-03e6-46c9-add0-0c82df0317c5

📥 Commits

Reviewing files that changed from the base of the PR and between b7b881c and 9d4f214.

📒 Files selected for processing (2)

internal/config/case.go
internal/config/verifier_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/config/case.go

Walkthrough

Adds a DuckDB-based verifier container (containers/verifier) that reads Avro/Parquet S3 objects, polls for stability, computes correctness metrics (lines received, unique lines, duplicates), and emits verdict.json. The orchestrator conditionally renders a verifier compose service under the verify profile (omitting the receiver), the runner bifurcates into verifier vs. receiver drain paths, and build/release tooling is extended for the new image.

Changes

DuckDB Verifier Integration

Layer / File(s)	Summary
VerifierConfig model and validation `internal/config/case.go`, `internal/config/subject.go`, `internal/config/verifier_test.go`	`TestCase` gains an optional `Verifier *VerifierConfig` field with S3 bucket/prefix, object format (`avro`/`parquet`), duplicate/NULL check settings, timing controls, and helper methods. `UsesVerifier()` reports whether verifier is enabled. `validateVerifier()` enforces structural constraints: singular generator with `total_lines > 0`, required `s3_bucket`, S3 emulator availability, valid duration strings. The `vmetric` registry gains `s3_avro_sink` and `s3_parquet_sink` capabilities to gate columnar correctness cases.
Verifier container binary and module `containers/verifier/go.mod`, `containers/verifier/Dockerfile`, `containers/verifier/main.go`, `containers/verifier/main_test.go`	New standalone Go module with multi-stage Dockerfile (compiles Go binary, uses DuckDB base with pre-loaded `httpfs` and `avro` extensions). `main.go` implements: `loadConfig()` reads environment, `waitStable()` polls `count(*)` until quiet window achieved or expected total reached, `queryStats()` aggregates total/distinct/null row counts via SQL, `buildVerdict()` computes pass/fail and duplicate counts, `run()` orchestrates the two-phase flow and persists `verdict.json`. Helpers include `sourceExpr()` for S3 glob paths, `prelude()` for DuckDB session setup with credential redaction, `runDuckDB()` CLI invocation, `parseRows()` JSON parsing, and environment/duration parsing. Unit tests cover verdict outcomes, source expressions, prelude credential handling, endpoint splitting, JSON parsing, and quiet-poll floor clamping.
Orchestrator interface, compose rendering, and WaitForVerifierExit `internal/orchestrator/orchestrator.go`, `internal/orchestrator/docker.go`, `internal/orchestrator/verifier_render_test.go`	`Orchestrator` interface gains `WaitForVerifierExit(timeout time.Duration) error`. `RunConfig` and internal `composeVars` extend to include `VerifierImage`, `VerifierEnabled`, and all verifier-specific environment variables (S3 bucket/prefix, format, expected count, quiet window, timeout, message field, null fields, credentials). Compose template conditionally renders a `verifier` service under the `verify` profile with mounts and env, omits the `receiver` service when verifier is enabled, and suppresses generator's `depends_on.receiver` dependency. `writeCompose()` populates verifier variables from test case, defaults message field name, joins null-fields list, and selects S3 credential env from MinIO or AWS emulator. `ComposeRunner.WaitForVerifierExit()` polls `docker inspect` until container state is `exited`, treating only timeout as error. `populateServiceNames()` early-returns for verifier cases. Compose-render tests verify service configuration, env var injection, S3 backend selection, receiver omission, and profile behavior.
Runner bifurcation and runVerifier helper `internal/runner/runner.go`, `cmd/harness/main.go`	`Options` gains `VerifierImage` field with default `vmetric/bench-verifier:latest`, passed through `orchestrator.RunConfig`. `Runner.Run()` bifurcates on `tc.UsesVerifier()`: verifier path calls `runVerifier()` to start the service, wait for exit bounded by run deadline, read/unmarshal `verdict.json` into `ReceiverMetrics`; non-verifier path retains existing receiver drain and grace/performance polling logic. Metrics acquisition after drain adjusts: verifier cases use pre-populated `recvMetrics`, others query receiver. Correctness verdict re-derivation from loss/over-delivery is gated to exclude verifier cases. CLI harness adds `--verifier-image` flag bound to verifier image option.
Build, push, and release tooling `Makefile`, `.github/workflows/ci.yml`, `.goreleaser.yml`	`Makefile` introduces `VERIFIER_IMAGE` variable, `build-verifier` target (docker buildx with `--load`), `push-verifier` target (docker buildx with `--push` and `ATTEST_FLAGS` when `ATTEST=1`), includes both in `build-containers` and `push-containers` aggregates, and extends `tidy` to run `go mod tidy` in `containers/verifier`. CI workflow adds Go test and build steps for `containers/verifier`, PR-only `build-verifier` step, and main-push-only `push-verifier ATTEST=1` step. GoReleaser release notes extend GitHub release header to include `vmetric/bench-verifier` image reference.
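
To make the verdict step in the table above concrete, here is a minimal sketch of how the counts from `queryStats()` could be turned into a pass/fail result. It is not the code in `containers/verifier/main.go`: the `stats` and `verdict` types, their field names, the JSON tags, and the exact rules (strict mode rejecting any duplicate, NULL rows always failing) are assumptions made for illustration. The only behavior taken from this PR is that duplicates are derived from total versus distinct rows, and that distinct rows are asserted under `allow_overdelivery`.

```go
package main

import "fmt"

// stats holds the counts the walkthrough says queryStats() aggregates.
// The type and field names are hypothetical.
type stats struct {
	Total    int64 // rows found in the S3 objects
	Distinct int64 // distinct values of the message field
	Nulls    int64 // rows with a NULL in a checked field
}

// verdict is an assumed shape for verdict.json, not the real schema.
type verdict struct {
	Passed     bool     `json:"passed"`
	Duplicates int64    `json:"duplicates"`
	Errors     []string `json:"errors,omitempty"`
}

// buildVerdict compares observed counts with the expected line count.
// With allowOverDelivery, duplicates are tolerated and only the distinct
// count must match. Otherwise the total must match exactly and any
// duplicate is reported as an error (an assumption of this sketch).
func buildVerdict(st stats, expected int64, allowOverDelivery bool) verdict {
	v := verdict{Duplicates: st.Total - st.Distinct}
	if allowOverDelivery {
		if st.Distinct != expected {
			v.Errors = append(v.Errors,
				fmt.Sprintf("distinct rows %d != expected %d", st.Distinct, expected))
		}
	} else {
		if st.Total != expected {
			v.Errors = append(v.Errors,
				fmt.Sprintf("total rows %d != expected %d", st.Total, expected))
		}
		if v.Duplicates > 0 {
			v.Errors = append(v.Errors,
				fmt.Sprintf("%d duplicate rows", v.Duplicates))
		}
	}
	if st.Nulls > 0 {
		v.Errors = append(v.Errors, fmt.Sprintf("%d rows with NULL fields", st.Nulls))
	}
	v.Passed = len(v.Errors) == 0
	return v
}

func main() {
	observed := stats{Total: 105, Distinct: 100}
	fmt.Printf("%+v\n", buildVerdict(observed, 100, true))  // over-delivery tolerated
	fmt.Printf("%+v\n", buildVerdict(observed, 100, false)) // strict count
}
```

Keeping the function pure (counts in, verdict out) is what lets the verdict outcomes be unit tested without DuckDB or S3.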

Sequence Diagram(s)

sequenceDiagram
  rect rgba(173, 216, 230, 0.5)
    Note over Harness: Verifier end-to-end flow
  end
  participant Harness as cmd/harness
  participant Runner as Runner.Run
  participant ComposeRunner as ComposeRunner
  participant Verifier as verifier container
  participant S3 as S3 / LocalStack
  participant Disk as /results/verdict.json

  Harness->>Runner: Run(tc) with VerifierImage
  Runner->>ComposeRunner: NewComposeRunner(RunConfig{VerifierImage})
  ComposeRunner->>ComposeRunner: writeCompose (renders verifier service)
  Runner->>ComposeRunner: StartService("verifier")
  ComposeRunner->>Verifier: docker compose up --profile verify
  Verifier->>S3: DuckDB httpfs glob count(*) polling (waitStable)
  S3-->>Verifier: row counts during quiet window
  Verifier->>S3: DuckDB aggregate query (queryStats)
  S3-->>Verifier: total/distinct/null row counts
  Verifier->>Disk: write verdict.json (passed/duplicates/errors)
  Verifier-->>ComposeRunner: container exits
  Runner->>ComposeRunner: WaitForVerifierExit(deadline)
  ComposeRunner-->>Runner: exited status
  Runner->>Disk: read verdict.json
  Disk-->>Runner: ReceiverMetrics
  Runner-->>Harness: run result with verdicted metrics
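
The polling phase in the diagram (`waitStable`) can be sketched as follows. This is an illustration under assumptions rather than the verifier's implementation: the `countFunc` type, the parameter list, and `errTimeout` are hypothetical, and the count source is a fake closure standing in for the DuckDB `count(*)` query over the S3 glob.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// countFunc returns the current row count. In the verifier this would be a
// DuckDB count(*) over the S3 objects; here it is a hypothetical stand-in.
type countFunc func() (int64, error)

var errTimeout = errors.New("timed out waiting for a stable row count")

// waitStable polls count until it reaches expected, or until the value has
// not changed for the quiet window. It gives up once timeout has elapsed.
func waitStable(count countFunc, expected int64, quiet, poll, timeout time.Duration) (int64, error) {
	deadline := time.Now().Add(timeout)
	last := int64(-1)
	lastChange := time.Now()
	for {
		n, err := count()
		if err != nil {
			return 0, err
		}
		if expected > 0 && n >= expected {
			return n, nil // expected total reached, no need to wait further
		}
		now := time.Now()
		if n != last {
			last, lastChange = n, now // count moved, restart the quiet window
		} else if now.Sub(lastChange) >= quiet {
			return n, nil // count held steady for the whole quiet window
		}
		if now.After(deadline) {
			return n, errTimeout
		}
		time.Sleep(poll)
	}
}

func main() {
	// Fake source: the count grows for three polls, then stays at 30.
	calls := 0
	fake := func() (int64, error) {
		if calls < 3 {
			calls++
		}
		return int64(calls * 10), nil
	}
	n, err := waitStable(fake, 100, 50*time.Millisecond, 10*time.Millisecond, 2*time.Second)
	fmt.Println(n, err)
}
```

In the example the fake source never reaches the expected 100 rows, so the function returns only after the count has held at 30 for the quiet window. The aggregate query would then run against a settled data set.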

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

VirtualMetric/PipeBench#8: This PR's verifier feature depends on the AWS/LocalStack cloud configuration and environment-injection plumbing introduced in PR #8, which underpins the S3 emulator setup the verifier uses.
VirtualMetric/PipeBench#11: Both PRs extend MinIO/S3 handling in docker-compose rendering—this PR adds a DuckDB verifier service that injects MinIO/AWS env for S3 correctness, while PR #11 modifies the same compose/template plumbing to support MinIO endpoints and credential selection.

Suggested reviewers

namles
yusufozturk

Poem

🐇 A bunny hops through S3 globs at dawn,
DuckDB awakes and queries on and on,
httpfs whispers, "Avro rows are here!"
The verdict lands — passed: true — all clear!
No duplicates to find, no NULLs in sight,
The verifier hops away in pure delight. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 29.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: implementing a verifier runner component to validate received data correctness, which is the central theme across all file modifications.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

Makefile (1)
189-189: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include verifier image in clean to keep local state consistent.

Line 189 removes all bench images except $(VERIFIER_IMAGE). After this PR, make clean leaves stale verifier images behind, which can mask Dockerfile/tag changes in local runs.
Suggested patch
-	docker rmi -f $(GENERATOR_IMAGE) $(RECEIVER_IMAGE) $(COLLECTOR_IMAGE) $(KDC_IMAGE) 2>/dev/null || true
+	docker rmi -f $(GENERATOR_IMAGE) $(RECEIVER_IMAGE) $(COLLECTOR_IMAGE) $(KDC_IMAGE) $(VERIFIER_IMAGE) 2>/dev/null || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 189, The docker rmi command in the clean target is missing
the VERIFIER_IMAGE variable from the list of images to remove. Add the
VERIFIER_IMAGE variable to the existing docker rmi command on line 189 alongside
GENERATOR_IMAGE, RECEIVER_IMAGE, COLLECTOR_IMAGE, and KDC_IMAGE to ensure all
benchmark images are cleaned up and no stale verifier images are left behind.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 78-80: The CI workflow step "Build verifier image" only validates
that the verifier container builds, but does not run unit tests for the verifier
module like it does for the receiver module. Add a new workflow step after the
verifier image build that executes the unit tests for the containers/verifier
module to ensure verifier correctness is validated in PR CI, similar to how
containers/receiver tests are run.

In `@containers/verifier/main.go`:
- Around line 143-153: The row count mismatch check on Line 143 enforces strict
equality (st.total != cfg.Expected) regardless of whether AllowOverDel is
enabled, which causes valid duplicate delivery scenarios to fail. Modify the
condition to allow higher total counts when AllowOverDel is true by checking
against unique count or acceptable loss instead of strict total equality, while
preserving the strict equality check when AllowOverDel is false or not set.
Additionally, update the test case in containers/verifier/main_test.go Lines
36-41 to ensure the Expected field is set to the source-produced count rather
than a higher value that would incorrectly trigger the mismatch check.

In `@internal/config/case.go`:
- Around line 612-631: The validateVerifier function needs to enforce that when
a verifier is configured, there is a deterministic expected count source
available. Add validation to check that tc.Generator.TotalLines is set and
non-zero when tc.Verifier is not nil, since the orchestrator uses
tc.Generator.TotalLines to populate VERIFIER_EXPECTED_LINES. Return an error
with a clear message if the expected lines value is not deterministically set,
to prevent the verifier's exact-count verification from being silently disabled
due to Expected being 0.

In `@internal/orchestrator/docker.go`:
- Around line 367-368: The template correctly omits the singular receiver
service when verifier is enabled (at the {{- else if not .VerifierEnabled }}
condition), but ComposeRunner.populateServiceNames() still pre-populates
receiver metadata unconditionally, causing ReceiverMetricsPort(s) and related
receiver metadata to advertise services that do not exist in verifier mode. Fix
this metadata contract drift by ensuring ComposeRunner.populateServiceNames()
checks the VerifierEnabled flag and skips pre-populating receiver service,
container, and port metadata when the receiver is disabled. Apply the same
conditional check at all sites where receiver metadata is pre-populated
(including the areas around lines 1827-1844 referenced in the comment) to ensure
consistency across the codebase.

In `@internal/runner/runner.go`:
- Around line 2381-2384: The wait duration calculation in runner.go (around
TimeoutDuration and the 30-second buffer) has a logic gap: when
time.Until(runDeadline) returns a non-positive value (deadline has passed or is
now), the wait variable is not updated and retains the full verifier timeout
plus 30 seconds, allowing the run to exceed the hard deadline. Fix this by
ensuring that when remaining time (rem) is zero or negative, wait is explicitly
set to 0 or another appropriate minimal value to respect the hard deadline,
rather than leaving it at the default verifier timeout duration.
- Around line 394-399: The verifier allows overdelivery when
verifier.allow_overdelivery is true, but the later correctness logic still
enforces recvMetrics.LinesReceived <= expectedOut for non-kafka cases,
overriding the verifier's verdict. After the runVerifier call in the
tc.UsesVerifier() block succeeds, you need to track whether overdelivery was
allowed by the verifier, and then modify the subsequent correctness check that
enforces the LinesReceived constraint for non-kafka cases to skip this
validation when the verifier has already allowed overdelivery. Ensure that once
a verifier pass allows overdelivery, the downstream generic non-kafka
over-delivery gate respects that decision and does not flip the result to
failed.
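
As an illustration of the deadline finding for `internal/runner/runner.go` above, the following sketch clamps the verifier wait to the time left before the run deadline. The function name `clampWait` and its signature are hypothetical; the actual fix in the runner may be structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// clampWait returns how long the runner may wait for the verifier to exit.
// It starts from the verifier timeout plus a buffer and caps the result at
// the time remaining before the run deadline. When the deadline has already
// passed (remaining <= 0), the wait is zero rather than the full timeout.
func clampWait(timeout, buffer, remaining time.Duration) time.Duration {
	if remaining <= 0 {
		return 0
	}
	wait := timeout + buffer
	if remaining < wait {
		wait = remaining
	}
	return wait
}

func main() {
	timeout, buffer := 5*time.Minute, 30*time.Second

	runDeadline := time.Now().Add(time.Minute)
	fmt.Println(clampWait(timeout, buffer, time.Until(runDeadline)))

	expired := time.Now().Add(-time.Second)
	fmt.Println(clampWait(timeout, buffer, time.Until(expired)))
}
```

The first call is capped at roughly the one minute left before the deadline. The second returns zero because the deadline has passed, which is the case the review comment says was previously left at the full timeout plus 30 seconds.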

---

Outside diff comments:
In `@Makefile`:
- Line 189: The docker rmi command in the clean target is missing the
VERIFIER_IMAGE variable from the list of images to remove. Add the
VERIFIER_IMAGE variable to the existing docker rmi command on line 189 alongside
GENERATOR_IMAGE, RECEIVER_IMAGE, COLLECTOR_IMAGE, and KDC_IMAGE to ensure all
benchmark images are cleaned up and no stale verifier images are left behind.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0382b102-ebfe-4cac-b7f8-614d096ec18c

📥 Commits

Reviewing files that changed from the base of the PR and between 188636b and 7982834.

📒 Files selected for processing (14)

.github/workflows/ci.yml
.goreleaser.yml
Makefile
cmd/harness/main.go
containers/verifier/Dockerfile
containers/verifier/go.mod
containers/verifier/main.go
containers/verifier/main_test.go
internal/config/case.go
internal/config/subject.go
internal/orchestrator/docker.go
internal/orchestrator/orchestrator.go
internal/orchestrator/verifier_render_test.go
internal/runner/runner.go

cloudflare-workers-and-pages · 2026-06-16T12:34:20Z

Deploying pipebench with Cloudflare Pages

Latest commit:	`9d4f214`
Status:	✅ Deploy successful!
Preview URL:	https://0c9c8baa.pipebench.pages.dev
Branch Preview URL:	https://dt-798.pipebench.pages.dev

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/orchestrator/docker.go`:
- Around line 924-929: The validateVerifier() function currently allows test
cases to configure both verifier and receivers, but populateServiceNames()
intentionally clears receiver metadata for verifier cases while vars.Receivers
still gets populated from the config, causing the template to render
non-existent receiver services. Fix this by adding validation in
validateVerifier() to reject configurations that set both verifier mode and
receivers (check tc.MultiReceiver() for multiple receivers and check for
non-empty singular receiver blocks), or alternatively guard the receiver
template population logic (around lines 1824-1843) to skip it when
tc.UsesVerifier() is true to prevent vars.Receivers from being populated in
verifier mode.
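
The first option in the comment above (rejecting the configuration up front) could look roughly like this. The types are simplified, hypothetical stand-ins for those in `internal/config/case.go`; the field names `Receiver` and `Receivers` and the error messages are assumptions, and the real `validateVerifier()` also checks things this sketch omits (S3 emulator availability, duration strings).

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the config types named in the review.
type VerifierConfig struct{ S3Bucket string }
type ReceiverConfig struct{ Mode string }
type GeneratorConfig struct{ TotalLines int64 }

type TestCase struct {
	Generator GeneratorConfig
	Verifier  *VerifierConfig
	Receiver  *ReceiverConfig  // singular receiver block (assumed field name)
	Receivers []ReceiverConfig // multi-receiver block (assumed field name)
}

// UsesVerifier reports whether the case enables the verifier.
func (tc *TestCase) UsesVerifier() bool { return tc.Verifier != nil }

// validateVerifier enforces the structural constraints discussed in the
// review: a bucket, a deterministic expected count, and no receiver blocks.
func validateVerifier(tc *TestCase) error {
	if !tc.UsesVerifier() {
		return nil
	}
	if tc.Verifier.S3Bucket == "" {
		return errors.New("verifier: s3_bucket is required")
	}
	if tc.Generator.TotalLines <= 0 {
		return errors.New("verifier: generator total_lines must be > 0")
	}
	if tc.Receiver != nil || len(tc.Receivers) > 0 {
		return errors.New("verifier: receiver blocks cannot be combined with a verifier")
	}
	return nil
}

func main() {
	tc := &TestCase{
		Generator: GeneratorConfig{TotalLines: 1000},
		Verifier:  &VerifierConfig{S3Bucket: "bench"},
		Receiver:  &ReceiverConfig{Mode: "tcp"},
	}
	fmt.Println(validateVerifier(tc))
}
```

Failing at validation time means the compose template never has to render a case that names both a verifier and a receiver.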

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 57a61a7e-d214-4e1d-9408-161c344fbf3e

📥 Commits

Reviewing files that changed from the base of the PR and between 7982834 and b7b881c.

📒 Files selected for processing (7)

.github/workflows/ci.yml
containers/verifier/main.go
containers/verifier/main_test.go
internal/config/case.go
internal/config/verifier_test.go
internal/orchestrator/docker.go
internal/runner/runner.go

🚧 Files skipped from review as they are similar to previous changes (4)

containers/verifier/main_test.go
internal/config/case.go
internal/runner/runner.go
containers/verifier/main.go

feat: implement verifier runner to check received data

7982834

coderabbitai Bot reviewed Jun 16, 2026

Comment thread .github/workflows/ci.yml

Comment thread containers/verifier/main.go Outdated

Comment thread internal/config/case.go

Comment thread internal/orchestrator/docker.go

Comment thread internal/runner/runner.go

Comment thread internal/runner/runner.go

fix(verifier): assert distinct rows under allow_overdelivery and harden runner correctness gate

b7b881c

coderabbitai Bot reviewed Jun 16, 2026

Comment thread internal/orchestrator/docker.go

fix(config): reject verifier cases that also configure a receiver block

9d4f214

erenaslandev wants to merge 3 commits into main from DT-798
