Skip to content

test(agents,llm): cover RoutingAgent, code validators, OpenAI provider#1244

Open
kovtcharov-amd wants to merge 3 commits into
mainfrom
test/unreferenced-modules-880
Open

test(agents,llm): cover RoutingAgent, code validators, OpenAI provider#1244
kovtcharov-amd wants to merge 3 commits into
mainfrom
test/unreferenced-modules-880

Conversation

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator

Three production modules had zero dedicated unit tests: RoutingAgent (agent selection logic), code validators (syntax/antipattern/AST/requirements checking), and the OpenAI provider (chat completion, streaming, error handling). Now they have 135 tests across 3 new files, all with mocked LLM and HTTP dependencies.

Review also surfaced source-level gaps worth tracking separately: AntipatternChecker ignores AsyncFunctionDef, validate_imports misses from X import Y duplicates, and _fallback_keyword_detection appears to be dead code in RoutingAgent.

Test plan

  • python -m pytest tests/unit/agents/test_routing_agent.py tests/unit/agents/test_code_validators.py tests/unit/test_openai_provider.py -xvs — all 135 tests pass
  • No new dependencies required (uses existing pytest-mock)

Closes #880

@github-actions github-actions Bot added the tests Test changes label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Three production modules — RoutingAgent, code validators, and OpenAIProvider — had zero dedicated unit tests; this PR lands 135 tests across three new files with clean mock hygiene and solid structural coverage. One test in the routing suite silently passes via the wrong code path, and two minor style issues are worth cleaning up before merge.


Issues Found

🟡 test_cli_mode_asks_clarification_then_resolves doesn't exercise the clarification loop (tests/unit/agents/test_routing_agent.py:930)

The first mocked LLM response has "language": "unknown". In process_query, _default_unknown_language_to_typescript runs before _has_unknowns and converts the analysis to {"language": "typescript", "project_type": "fullstack", "confidence": 1.0}. _has_unknowns then returns False, the clarification branch is never entered, input() is never called, and the second side_effect response is never consumed. The test passes only because its single assertion (agent is _patch_code_agent.return_value) is satisfied via _create_agent, not the clarification loop.

The fix is to use a first response that survives _default_unknown_language_to_typescript but still triggers the unknown-parameter branch — e.g., known language with unknown project type and low confidence:

        mock_llm_client.generate.side_effect = [
            json.dumps(
                {
                    "agent": "code",
                    "parameters": {
                        "language": "typescript",
                        "project_type": "unknown",
                    },
                    "confidence": 0.4,
                    "reasoning": "ambiguous",
                }
            ),
            json.dumps(
                {
                    "agent": "code",
                    "parameters": {
                        "language": "typescript",
                        "project_type": "fullstack",
                    },
                    "confidence": 0.95,
                    "reasoning": "user clarified",
                }
            ),
        ]

Also add an assertion that input() was actually called:

with patch("builtins.input", return_value="Next.js blog") as mock_input:
    agent = router.process_query("Build something", execute=False)

mock_input.assert_called_once()
assert agent is _patch_code_agent.return_value

🟢 Redundant legacy call_args access pattern (tests/unit/agents/test_routing_agent.py:855, 879)

call_kwargs[1] is the pre-3.8 way to access kwargs; call_kwargs.kwargs is the modern, readable form. The or guards are defensive noise — pytest-mock in this project is new enough to support .kwargs.

        assert call_kwargs.kwargs.get("language") == "typescript"

Same at line 879:

        assert call_kwargs.kwargs.get("project_type") == "fullstack"

Strengths

  • sys.modules patching for OpenAIProvider — using patch.dict("sys.modules", {"openai": mock_mod}) is exactly right here since import openai is deferred to __init__. This avoids requiring the real openai package in the test environment while still exercising the full class construction.
  • parametrize for keyword-detection matrixtest_typescript_keywords and test_python_keywords give clear, data-driven coverage of the keyword table without copy-paste per-framework tests.
  • Honest PR description — surfacing the AsyncFunctionDef gap, the validate_imports miss on from X import Y duplicates, and the dead _fallback_keyword_detection code in the PR body is the right call; tracking those separately keeps this diff focused.

Verdict

Request changes — the 🟡 issue is the one that needs fixing. The test claims to cover the clarification loop (the only interactive path in the routing flow), but passes through a different code path entirely, leaving that branch untested. The 🟢 items can be applied as one-click suggestions.

kovtcharov-amd pushed a commit that referenced this pull request May 29, 2026
The CLI-mode clarification test was passing via `_create_agent` (the
default-to-typescript shortcut), never entering the `input()` branch.
First LLM response now uses known language + unknown project_type + low
confidence so `_default_unknown_language_to_typescript` leaves it alone
and `_has_unknowns` fires. Asserts `mock_input.assert_called_once()`.

Also drops redundant `call_kwargs[1].get()` fallbacks — `.kwargs` is
the modern accessor and sufficient here.
@github-actions
Copy link
Copy Markdown
Contributor

135 tests land with zero production changes — a safe, focused PR. The test design is solid; three issues are worth addressing before the gaps get wider.


Issues

🟡 Important — tests cover _fallback_keyword_detection, which is confirmed dead code (tests/unit/agents/test_routing_agent.py:733–774)

_fallback_keyword_detection is defined at src/gaia/agents/routing/agent.py:245 but is never invoked from any production path — grep -rn "_fallback_keyword_detection" src/ returns only the definition. The PR description flags it as "appears to be dead code" but then proceeds to add 10 tests for it. Tests that exercise code the production flow never reaches give false confidence: the suite passes even if the method is completely wrong relative to the actual routing path (_analyze_with_llm). These tests should either be clearly @pytest.mark.skip(reason="dead code — tracked in #NNN") until the source is wired up, or the underlying source bug should be fixed first.

@pytest.mark.skip(reason="_fallback_keyword_detection is unreachable dead code — fix tracked in #NNN")
class TestFallbackKeywordDetection:

🟢 Minor — test_extra_kwargs_not_passed_to_openai_client validates a silent-drop that violates CLAUDE.md (tests/unit/test_openai_provider.py:66–71)

OpenAIProvider.__init__ uses **_kwargs and silently discards everything including base_url, which is a valid openai.OpenAI() constructor parameter for custom API endpoints. A caller who passes base_url="http://my-proxy" gets a silently misconfigured client — exactly the "quiet wrong answer" CLAUDE.md's "No Silent Fallbacks" rule prohibits. The test correctly documents the current behavior, but should call out that this is a known gap, not an endorsed design:

    def test_extra_kwargs_not_passed_to_openai_client(self, mock_openai_module):
        from gaia.llm.providers.openai_provider import OpenAIProvider

        mock_mod, _ = mock_openai_module
        # base_url is silently dropped (known gap — tracked in #NNN); if you need
        # a custom endpoint, pass it directly when constructing openai.OpenAI().
        OpenAIProvider(api_key="sk-test", base_url="http://x", unknown_arg=True)
        mock_mod.OpenAI.assert_called_once_with(api_key="sk-test")

🟢 Minor — source-level gaps named in the PR description have no test coverage (tests/unit/agents/test_code_validators.py)

The PR description calls out two bugs in production code:

  • AntipatternChecker ignores ast.AsyncFunctionDef nodes
  • validate_imports misses from X import Y duplicate style (e.g., from os import path\nfrom os import path\n)

Neither gap has a corresponding test. Adding @pytest.mark.xfail cases would lock in the regression surface so a future fix doesn't regress silently:

# in TestAntipatternChecker
@pytest.mark.xfail(reason="AsyncFunctionDef not handled — source gap")
def test_async_function_name_checked(self, checker):
    code = "async def get_and_process_and_transform():\n    pass\n"
    result = checker.check(Path("a.py"), code)
    assert any("Combinatorial" in e for e in result["errors"])

# in TestSyntaxValidator
@pytest.mark.xfail(reason="from-import duplicates not detected — source gap")
def test_validate_imports_from_import_duplicate(self, validator):
    code = "from os import path\nfrom os import path\n"
    warnings = validator.validate_imports(code)
    assert any("Duplicate" in w for w in warnings)

Strengths

  • Fixture layering (mock_llm_client → _patch_create_client → router) makes dependencies explicit and avoids state leakage between classes; the pattern will port cleanly to future routing tests.
  • patch.dict("sys.modules", {"openai": mock_mod}) prevents any real network calls regardless of the test environment — consistent with how Lemonade tests are isolated in this codebase.
  • Boundary tests for _has_unknowns (confidence=0.9 passes, 0.89 fails) nail the threshold exactly and would catch a > / >= flip in the source.

Verdict

Approve with suggestions. The dead-code issue is the most important — tests for unreachable paths should be clearly marked or gated behind the source fix. The other two are documentation-level nits. No blocking correctness bugs found.

@kovtcharov-amd kovtcharov-amd added devops DevOps/infrastructure changes p1 medium priority labels May 29, 2026
@kovtcharov-amd kovtcharov-amd self-assigned this May 29, 2026
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clarification-loop test from the earlier review is now correct — it uses a known-language/unknown-project-type/confidence=0.4 first response so _has_unknowns actually fires, and asserts input() was called. Fixture layering and the patch.dict("sys.modules", {"openai": ...}) isolation are clean. Two items worth tracking, noted inline: the TestFallbackKeywordDetection class tests genuinely unreachable code (confirmed by reading routing/agent.py — process_query never calls _fallback_keyword_detection), and test_extra_kwargs_not_passed_to_openai_client documents a real base_url silent-drop in OpenAIProvider.init. Neither blocks this test PR; the dead-code class is the one worth filing an issue for. Approving.


Generated by Claude Code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestFallbackKeywordDetection (10 tests) exercises _fallback_keyword_detection, which is unreachable: process_query routes through _analyze_with_llm, whose JSON-decode failure returns a hardcoded default dict and whose generic except re-raises — the keyword fallback is never called. Testing it gives green CI for a method that could be arbitrarily wrong relative to the live routing path. Either gate the class behind @pytest.mark.skip(reason="dead code — tracked in #NNN") or wire the source up first. Worth filing an issue to track.


Generated by Claude Code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_extra_kwargs_not_passed_to_openai_client pins a real gap: OpenAIProvider.init swallows **_kwargs, so base_url=... is silently dropped — a caller pointing at a custom endpoint gets a misconfigured client with no error (the quiet-wrong-answer pattern the no-silent-fallbacks rule forbids). The test documents it; add a comment marking it as a known source bug to track, or better, forward base_url to openai.OpenAI() while you're touching this surface.


Generated by Claude Code

Ovtcharov added 3 commits May 29, 2026 09:32
#880)

RoutingAgent, code validators (syntax, antipattern, AST, requirements),
and OpenAIProvider had zero dedicated unit tests. Adds 129 tests covering
init, routing logic, LLM JSON parsing, disambiguation, keyword fallback,
all four validator classes, and the full OpenAI provider surface (chat,
stream, generate, embed, unsupported methods).
Adds boundary test at 0.89 confidence, CLI-mode interactive path coverage
(mock input()), console.print_info verification, generate(stream=True),
SDK exception propagation, and tighter embed/init assertions.
The CLI-mode clarification test was passing via `_create_agent` (the
default-to-typescript shortcut), never entering the `input()` branch.
First LLM response now uses known language + unknown project_type + low
confidence so `_default_unknown_language_to_typescript` leaves it alone
and `_has_unknowns` fires. Asserts `mock_input.assert_called_once()`.

Also drops redundant `call_kwargs[1].get()` fallbacks — `.kwargs` is
the modern accessor and sufficient here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops DevOps/infrastructure changes p1 medium priority tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tests: cover unreferenced production modules (routing, validators, openai_provider, ui routers, mcp servers, eval)

2 participants