test(agents,llm): cover RoutingAgent, code validators, OpenAI provider by kovtcharov-amd · Pull Request #1244 · amd/gaia

kovtcharov-amd · 2026-05-29T03:13:54Z

Three production modules had zero dedicated unit tests: RoutingAgent (agent selection logic), code validators (syntax/antipattern/AST/requirements checking), and the OpenAI provider (chat completion, streaming, error handling). Now they have 135 tests across 3 new files, all with mocked LLM and HTTP dependencies.

Review also surfaced source-level gaps worth tracking separately: AntipatternChecker ignores AsyncFunctionDef, validate_imports misses from X import Y duplicates, and _fallback_keyword_detection appears to be dead code in RoutingAgent.

Test plan

python -m pytest tests/unit/agents/test_routing_agent.py tests/unit/agents/test_code_validators.py tests/unit/test_openai_provider.py -xvs — all 135 tests pass
No new dependencies required (uses existing pytest-mock)

Closes #880

github-actions · 2026-05-29T03:19:09Z

Three production modules — RoutingAgent, code validators, and OpenAIProvider — had zero dedicated unit tests; this PR lands 135 tests across three new files with clean mock hygiene and solid structural coverage. One test in the routing suite silently passes via the wrong code path, and two minor style issues are worth cleaning up before merge.

Issues Found

🟡 `test_cli_mode_asks_clarification_then_resolves` doesn't exercise the clarification loop (`tests/unit/agents/test_routing_agent.py:930`)

The first mocked LLM response has "language": "unknown". In process_query, _default_unknown_language_to_typescript runs before _has_unknowns and converts the analysis to {"language": "typescript", "project_type": "fullstack", "confidence": 1.0}. _has_unknowns then returns False, the clarification branch is never entered, input() is never called, and the second side_effect response is never consumed. The test passes only because its single assertion (agent is _patch_code_agent.return_value) is satisfied via _create_agent, not the clarification loop.

The fix is to use a first response that survives _default_unknown_language_to_typescript but still triggers the unknown-parameter branch — e.g., known language with unknown project type and low confidence:

        mock_llm_client.generate.side_effect = [
            json.dumps(
                {
                    "agent": "code",
                    "parameters": {
                        "language": "typescript",
                        "project_type": "unknown",
                    },
                    "confidence": 0.4,
                    "reasoning": "ambiguous",
                }
            ),
            json.dumps(
                {
                    "agent": "code",
                    "parameters": {
                        "language": "typescript",
                        "project_type": "fullstack",
                    },
                    "confidence": 0.95,
                    "reasoning": "user clarified",
                }
            ),
        ]

Also add an assertion that input() was actually called:

with patch("builtins.input", return_value="Next.js blog") as mock_input:
    agent = router.process_query("Build something", execute=False)

mock_input.assert_called_once()
assert agent is _patch_code_agent.return_value

🟢 Redundant legacy `call_args` access pattern (`tests/unit/agents/test_routing_agent.py:855`, `879`)

call_kwargs[1] is the pre-3.8 way to access kwargs; call_kwargs.kwargs is the modern, readable form. The or guards are defensive noise — pytest-mock in this project is new enough to support .kwargs.

        assert call_kwargs.kwargs.get("language") == "typescript"

Same at line 879:

        assert call_kwargs.kwargs.get("project_type") == "fullstack"

Strengths

sys.modules patching for OpenAIProvider — using patch.dict("sys.modules", {"openai": mock_mod}) is exactly right here since import openai is deferred to __init__. This avoids requiring the real openai package in the test environment while still exercising the full class construction.
parametrize for keyword-detection matrix — test_typescript_keywords and test_python_keywords give clear, data-driven coverage of the keyword table without copy-paste per-framework tests.
Honest PR description — surfacing the AsyncFunctionDef gap, the validate_imports miss on from X import Y duplicates, and the dead _fallback_keyword_detection code in the PR body is the right call; tracking those separately keeps this diff focused.

Verdict

Request changes — the 🟡 issue is the one that needs fixing. The test claims to cover the clarification loop (the only interactive path in the routing flow), but passes through a different code path entirely, leaving that branch untested. The 🟢 items can be applied as one-click suggestions.

The CLI-mode clarification test was passing via `_create_agent` (the default-to-typescript shortcut), never entering the `input()` branch. First LLM response now uses known language + unknown project_type + low confidence so `_default_unknown_language_to_typescript` leaves it alone and `_has_unknowns` fires. Asserts `mock_input.assert_called_once()`. Also drops redundant `call_kwargs[1].get()` fallbacks — `.kwargs` is the modern accessor and sufficient here.

github-actions · 2026-05-29T06:49:54Z

135 tests land with zero production changes — a safe, focused PR. The test design is solid; three issues are worth addressing before the gaps get wider.

Issues

🟡 Important — tests cover `_fallback_keyword_detection`, which is confirmed dead code (`tests/unit/agents/test_routing_agent.py:733–774`)

_fallback_keyword_detection is defined at src/gaia/agents/routing/agent.py:245 but is never invoked from any production path — grep -rn "_fallback_keyword_detection" src/ returns only the definition. The PR description flags it as "appears to be dead code" but then proceeds to add 10 tests for it. Tests that exercise code the production flow never reaches give false confidence: the suite passes even if the method is completely wrong relative to the actual routing path (_analyze_with_llm). These tests should either be clearly @pytest.mark.skip(reason="dead code — tracked in #NNN") until the source is wired up, or the underlying source bug should be fixed first.

@pytest.mark.skip(reason="_fallback_keyword_detection is unreachable dead code — fix tracked in #NNN")
class TestFallbackKeywordDetection:

🟢 Minor — `test_extra_kwargs_not_passed_to_openai_client` validates a silent-drop that violates CLAUDE.md (`tests/unit/test_openai_provider.py:66–71`)

OpenAIProvider.__init__ uses **_kwargs and silently discards everything including base_url, which is a valid openai.OpenAI() constructor parameter for custom API endpoints. A caller who passes base_url="http://my-proxy" gets a silently misconfigured client — exactly the "quiet wrong answer" CLAUDE.md's "No Silent Fallbacks" rule prohibits. The test correctly documents the current behavior, but should call out that this is a known gap, not an endorsed design:

    def test_extra_kwargs_not_passed_to_openai_client(self, mock_openai_module):
        from gaia.llm.providers.openai_provider import OpenAIProvider

        mock_mod, _ = mock_openai_module
        # base_url is silently dropped (known gap — tracked in #NNN); if you need
        # a custom endpoint, pass it directly when constructing openai.OpenAI().
        OpenAIProvider(api_key="sk-test", base_url="http://x", unknown_arg=True)
        mock_mod.OpenAI.assert_called_once_with(api_key="sk-test")

🟢 Minor — source-level gaps named in the PR description have no test coverage (`tests/unit/agents/test_code_validators.py`)

The PR description calls out two bugs in production code:

AntipatternChecker ignores ast.AsyncFunctionDef nodes
validate_imports misses from X import Y duplicate style (e.g., from os import path\nfrom os import path\n)

Neither gap has a corresponding test. Adding @pytest.mark.xfail cases would lock in the regression surface so a future fix doesn't regress silently:

# in TestAntipatternChecker
@pytest.mark.xfail(reason="AsyncFunctionDef not handled — source gap")
def test_async_function_name_checked(self, checker):
    code = "async def get_and_process_and_transform():\n    pass\n"
    result = checker.check(Path("a.py"), code)
    assert any("Combinatorial" in e for e in result["errors"])

# in TestSyntaxValidator
@pytest.mark.xfail(reason="from-import duplicates not detected — source gap")
def test_validate_imports_from_import_duplicate(self, validator):
    code = "from os import path\nfrom os import path\n"
    warnings = validator.validate_imports(code)
    assert any("Duplicate" in w for w in warnings)

Strengths

Fixture layering (mock_llm_client → _patch_create_client → router) makes dependencies explicit and avoids state leakage between classes; the pattern will port cleanly to future routing tests.
patch.dict("sys.modules", {"openai": mock_mod}) prevents any real network calls regardless of the test environment — consistent with how Lemonade tests are isolated in this codebase.
Boundary tests for _has_unknowns (confidence=0.9 passes, 0.89 fails) nail the threshold exactly and would catch a > / >= flip in the source.

Verdict

Approve with suggestions. The dead-code issue is the most important — tests for unreachable paths should be clearly marked or gated behind the source fix. The other two are documentation-level nits. No blocking correctness bugs found.

itomek

The clarification-loop test from the earlier review is now correct — it uses a known-language/unknown-project-type/confidence=0.4 first response so _has_unknowns actually fires, and asserts input() was called. Fixture layering and the patch.dict("sys.modules", {"openai": ...}) isolation are clean. Two items worth tracking, noted inline: the TestFallbackKeywordDetection class tests genuinely unreachable code (confirmed by reading routing/agent.py — process_query never calls _fallback_keyword_detection), and test_extra_kwargs_not_passed_to_openai_client documents a real base_url silent-drop in OpenAIProvider.init. Neither blocks this test PR; the dead-code class is the one worth filing an issue for. Approving.

Generated by Claude Code

itomek · 2026-05-29T13:32:37Z

TestFallbackKeywordDetection (10 tests) exercises _fallback_keyword_detection, which is unreachable: process_query routes through _analyze_with_llm, whose JSON-decode failure returns a hardcoded default dict and whose generic except re-raises — the keyword fallback is never called. Testing it gives green CI for a method that could be arbitrarily wrong relative to the live routing path. Either gate the class behind @pytest.mark.skip(reason="dead code — tracked in #NNN") or wire the source up first. Worth filing an issue to track.

Generated by Claude Code

itomek · 2026-05-29T13:32:40Z

test_extra_kwargs_not_passed_to_openai_client pins a real gap: OpenAIProvider.init swallows **_kwargs, so base_url=... is silently dropped — a caller pointing at a custom endpoint gets a misconfigured client with no error (the quiet-wrong-answer pattern the no-silent-fallbacks rule forbids). The test documents it; add a comment marking it as a known source bug to track, or better, forward base_url to openai.OpenAI() while you're touching this surface.

Generated by Claude Code

#880) RoutingAgent, code validators (syntax, antipattern, AST, requirements), and OpenAIProvider had zero dedicated unit tests. Adds 129 tests covering init, routing logic, LLM JSON parsing, disambiguation, keyword fallback, all four validator classes, and the full OpenAI provider surface (chat, stream, generate, embed, unsupported methods).

Adds boundary test at 0.89 confidence, CLI-mode interactive path coverage (mock input()), console.print_info verification, generate(stream=True), SDK exception propagation, and tighter embed/init assertions.

The CLI-mode clarification test was passing via `_create_agent` (the default-to-typescript shortcut), never entering the `input()` branch. First LLM response now uses known language + unknown project_type + low confidence so `_default_unknown_language_to_typescript` leaves it alone and `_has_unknowns` fires. Asserts `mock_input.assert_called_once()`. Also drops redundant `call_kwargs[1].get()` fallbacks — `.kwargs` is the modern accessor and sufficient here.

github-actions Bot added the tests Test changes label May 29, 2026

kovtcharov-amd added devops DevOps/infrastructure changes p1 medium priority labels May 29, 2026

kovtcharov-amd self-assigned this May 29, 2026

itomek approved these changes May 29, 2026

View reviewed changes

Ovtcharov added 3 commits May 29, 2026 09:32

test: strengthen assertions after review

dca78fa

Adds boundary test at 0.89 confidence, CLI-mode interactive path coverage (mock input()), console.print_info verification, generate(stream=True), SDK exception propagation, and tighter embed/init assertions.

kovtcharov-amd force-pushed the test/unreferenced-modules-880 branch from efadc81 to 06250c8 Compare May 29, 2026 16:33

kovtcharov-amd mentioned this pull request May 29, 2026

fix(types): resolve remaining mypy errors missed by #1253 #1296

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(agents,llm): cover RoutingAgent, code validators, OpenAI provider#1244

test(agents,llm): cover RoutingAgent, code validators, OpenAI provider#1244
kovtcharov-amd wants to merge 3 commits into
mainfrom
test/unreferenced-modules-880

kovtcharov-amd commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

itomek left a comment

Uh oh!

itomek May 29, 2026

Uh oh!

itomek May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kovtcharov-amd commented May 29, 2026

Test plan

Uh oh!

github-actions Bot commented May 29, 2026

Issues Found

🟡 test_cli_mode_asks_clarification_then_resolves doesn't exercise the clarification loop (tests/unit/agents/test_routing_agent.py:930)

🟢 Redundant legacy call_args access pattern (tests/unit/agents/test_routing_agent.py:855, 879)

Strengths

Verdict

Uh oh!

github-actions Bot commented May 29, 2026

Issues

🟡 Important — tests cover _fallback_keyword_detection, which is confirmed dead code (tests/unit/agents/test_routing_agent.py:733–774)

🟢 Minor — test_extra_kwargs_not_passed_to_openai_client validates a silent-drop that violates CLAUDE.md (tests/unit/test_openai_provider.py:66–71)

🟢 Minor — source-level gaps named in the PR description have no test coverage (tests/unit/agents/test_code_validators.py)

Strengths

Verdict

Uh oh!

itomek left a comment

Choose a reason for hiding this comment

Uh oh!

itomek May 29, 2026

Choose a reason for hiding this comment

Uh oh!

itomek May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🟡 `test_cli_mode_asks_clarification_then_resolves` doesn't exercise the clarification loop (`tests/unit/agents/test_routing_agent.py:930`)

🟢 Redundant legacy `call_args` access pattern (`tests/unit/agents/test_routing_agent.py:855`, `879`)

🟡 Important — tests cover `_fallback_keyword_detection`, which is confirmed dead code (`tests/unit/agents/test_routing_agent.py:733–774`)

🟢 Minor — `test_extra_kwargs_not_passed_to_openai_client` validates a silent-drop that violates CLAUDE.md (`tests/unit/test_openai_provider.py:66–71`)

🟢 Minor — source-level gaps named in the PR description have no test coverage (`tests/unit/agents/test_code_validators.py`)