feat(knowledge): Tavily connector + web-research wrapper with caching #1234
TravisHaa wants to merge 4 commits into
…udgeting, and keyless fallback DuckDuckGo
@claude review this PR
The foundation is solid: the layered design (…
Issues
🟡 Important: …
Nice foundation: the … 🟡 …
Solid foundation for the Knowledge Agent phase: the four-layer design (cache → budget gate → Tavily SDK → DDG fallback) is clean and the test suite is thorough. One blocking usability bug in the async client needs addressing before merge; everything else is minor polish.
Issues
🟡 Important: …
@itomek-amd, please review this PR when you get a chance.
itomek left a comment:
Thanks @TravisHaa — this is a genuinely strong first contribution. The four-layer design (cache -> budget gate -> Tavily SDK -> DuckDuckGo fallback) is clean, the SHA-256 cache key over normalized-query + sorted-params is correct, the budget gate is fail-loud by default (matching the no-silent-fallbacks rule), and injecting sdk_client=/web_client= in the tests instead of monkeypatching the module graph is exactly the pattern we like. Docs and the dep declaration are in place, and the description follows the house style well (why-first + test plan + "Phase 1 of #1141" + explicit open questions).
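The cache-key scheme described here (SHA-256 over the normalized query plus sorted params) can be illustrated with a small sketch. It assumes "normalized" means trimmed, whitespace-collapsed, and lowercased, and that params are JSON-serializable. cache_key is a hypothetical helper name, not the wrapper's actual function.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(query: str, params: Dict[str, Any]) -> str:
    """Hypothetical helper: SHA-256 over the normalized query plus sorted params."""
    # Assumed normalization: collapse runs of whitespace, trim, lowercase.
    normalized = " ".join(query.split()).lower()
    # sort_keys=True (applied recursively) makes the digest independent of
    # parameter order; compact separators keep the encoding stable.
    payload = json.dumps(
        {"query": normalized, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_key("  AMD   GAIA ", {"max_results": 5, "depth": "basic"})
b = cache_key("amd gaia", {"depth": "basic", "max_results": 5})
print(a == b)  # True
```

Serializing with sort_keys=True makes parameter order irrelevant, so the same query with the same params in a different order maps to the same cache entry.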
One real bug to fix or document, noted inline: AsyncTavilyClient() raises RuntimeError inside a running event loop when the connector is configured, because key resolution happens synchronously in __init__ via get_credential_sync(). The tests don't catch it because they all inject a client or hit the unconfigured early return. The fix is to defer key resolution into __aenter__ (or document that async callers must pass api_key=), plus a regression test. There's also a low-urgency inline note about the unbounded stale cache.
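The suggested fix can be sketched as follows. This is a hypothetical stand-in, not the PR's AsyncTavilyClient: LazyKeyAsyncClient and the async load_key callable are assumptions standing in for the real credential lookup, whose signature isn't shown here. The point is that __init__ does no credential I/O, so constructing the client inside a running event loop is safe, and the key is resolved later in __aenter__.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical async credential lookup standing in for get_credential_sync().
KeyLoader = Callable[[], Awaitable[Optional[str]]]


class LazyKeyAsyncClient:
    """Illustrative async client that defers API-key resolution to __aenter__."""

    def __init__(
        self, api_key: Optional[str] = None, load_key: Optional[KeyLoader] = None
    ) -> None:
        # No credential I/O here, so construction is safe inside a running loop.
        self._api_key = api_key
        self._load_key = load_key

    async def __aenter__(self) -> "LazyKeyAsyncClient":
        # An explicit api_key= wins; otherwise resolve it now, inside the loop.
        if self._api_key is None and self._load_key is not None:
            self._api_key = await self._load_key()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        return None

    @property
    def api_key(self) -> Optional[str]:
        return self._api_key


async def regression_check() -> str:
    async def fake_keyring() -> str:
        return "tvly-test"

    # Constructed while the event loop is already running, as async callers would.
    async with LazyKeyAsyncClient(load_key=fake_keyring) as client:
        return client.api_key or ""


print(asyncio.run(regression_check()))  # tvly-test
```

regression_check doubles as the shape of the missing regression test: it builds the client from inside a coroutine, which is exactly the path the current tests skip.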
On your catalog-scope open question: a foundation connector entry with no agent consumer yet is fine for a Phase 1 PR — just keep it called out as you did. Approving; happy to re-review once the async-init path is sorted.
Generated by Claude Code
|
@TravisHaa, thanks for the contribution. Approved on condition that the following failing test is fixed:
…KEPT_IDS in MCP catalog
|
@itomek, can I get an approval for this so we can run CI/CD again? Ready for another pass when you have a moment :)
Why this matters
GAIA agents could only reach the web through keyless DuckDuckGo HTML scraping: no Tavily, no result caching, and no way to track or cap API spend. This PR is the Phase 1 foundation for the Knowledge Agent: a _TAVILY connector that exposes Tavily search/extract as MCP tools to all agents from one keyring-stored API key, plus a gaia.web.tavily wrapper giving cached, credit-budgeted search/extract/crawl with an automatic DuckDuckGo fallback when Tavily isn't configured. Agents get higher-quality web research without re-paying for repeat queries or silently blowing past a credit budget.
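The layering this paragraph describes (cache first, then a budget gate, then the paid Tavily call, with DuckDuckGo when no key is configured) can be sketched as below. Everything here is hypothetical: LayeredSearch, CreditLedger, BudgetExceeded, the warn_at threshold, and the one-credit-per-call cost are illustrative assumptions, and the paid/fallback callables stand in for the Tavily SDK and the DuckDuckGo scraper.

```python
import time
import warnings
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Results = List[str]


class BudgetExceeded(RuntimeError):
    """Raised when a paid call would push spend past the hard limit."""


@dataclass
class CreditLedger:
    limit: float
    warn_at: float = 0.8  # fraction of the limit that triggers a warning
    spent: float = 0.0

    def charge(self, credits: float) -> None:
        # Fail loud: block before spending rather than silently overrunning.
        if self.spent + credits > self.limit:
            raise BudgetExceeded(f"{self.spent + credits} > {self.limit} credits")
        self.spent += credits
        if self.spent >= self.warn_at * self.limit:
            warnings.warn(f"credit budget {self.spent}/{self.limit} used")


@dataclass
class LayeredSearch:
    ledger: CreditLedger
    fallback: Callable[[str], Results]  # keyless, DuckDuckGo-style search
    paid: Optional[Callable[[str], Results]] = None  # None: Tavily unconfigured
    ttl_seconds: float = 3600.0
    credits_per_call: float = 1.0
    _cache: Dict[str, Tuple[float, Results]] = field(default_factory=dict)

    def search(self, query: str) -> Results:
        now = time.monotonic()
        hit = self._cache.get(query)
        # Layer 1: cache. A fresh hit costs no credits.
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]
        if self.paid is None:
            # No key configured: use the free fallback.
            results = self.fallback(query)
        else:
            # Budget gate first, then the paid call.
            self.ledger.charge(self.credits_per_call)
            results = self.paid(query)
        self._cache[query] = (now, results)
        return results
```

Charging before the paid call keeps the gate fail-loud. Once the ledger would exceed its limit, BudgetExceeded surfaces instead of a silent overrun or a quiet switch to DuckDuckGo.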
Test plan
- python -m pytest tests/unit/test_tavily_wrapper.py: 16 tests (mocked SDK) covering cache hit/TTL, credit ledger, budget warn/block, and DuckDuckGo fallback
- python -m pytest tests/unit/connectors/test_catalog_docs_url.py: the new connector's docs_url resolves
- python util/lint.py --black --isort: clean
- gaia connectors configure mcp-tavily --set TAVILY_API_KEY=tvly-…, then gaia knowledge search "…" returns Tavily results; without the key, the same command falls back to DuckDuckGo
- gaia knowledge usage prints the credit ledger

Open questions for reviewers
- _TAVILY has no built-in agent consumer yet (the Knowledge Agent lands in a later phase), so there's no REQUIRED_CONNECTORS wiring. Is that acceptable for a foundation PR, or should the entry wait? (@kovtcharov-amd)
- crawl: the sync TavilyClient has crawl; AsyncTavilyClient doesn't yet. Deferred unless you want parity now.

Phase 1 of #1141.
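On the crawl parity question, one low-cost option until the async SDK gains a crawl method might be to run the blocking sync call in a worker thread with asyncio.to_thread (Python 3.9+). The sketch is hypothetical: SyncCrawler stands in for the sync TavilyClient, and its crawl(url) signature and return shape are assumptions, not the SDK's.

```python
import asyncio
import threading
from typing import Dict


class SyncCrawler:
    """Hypothetical stand-in for a sync client exposing a blocking crawl()."""

    def crawl(self, url: str) -> Dict[str, str]:
        # A real client would do blocking network I/O here; we record which
        # thread ran the call so the offloading is observable.
        return {"url": url, "thread": threading.current_thread().name}


class AsyncCrawlShim:
    """Gives async callers a crawl() without blocking the event loop."""

    def __init__(self, sync_client: SyncCrawler) -> None:
        self._sync = sync_client

    async def crawl(self, url: str) -> Dict[str, str]:
        # asyncio.to_thread runs the blocking call in the loop's default executor.
        return await asyncio.to_thread(self._sync.crawl, url)


async def main() -> Dict[str, str]:
    shim = AsyncCrawlShim(SyncCrawler())
    return await shim.crawl("https://example.com")


result = asyncio.run(main())
print(result["url"])  # https://example.com
```

The trade-off is a thread-pool hop per call, which is usually negligible next to network latency. The shim can be dropped once AsyncTavilyClient exposes crawl natively.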