Skip to content

fix(security,agents): harden SSRF DNS-rebind, fail loudly, cover untested modules#1299

Open
kovtcharov wants to merge 1 commit into
mainfrom
kalin/code-hygiene-and-tests
Open

fix(security,agents): harden SSRF DNS-rebind, fail loudly, cover untested modules#1299
kovtcharov wants to merge 1 commit into
mainfrom
kalin/code-hygiene-and-tests

Conversation

@kovtcharov
Copy link
Copy Markdown
Collaborator

A DNS-rebind attacker could slip past the web client's SSRF guard: the pre-flight IP check and the actual TCP connect used separate DNS lookups, so a host could answer the check with a public IP and the connect with a private/internal one. This PR pins the resolved IP and validates the exact address it dials through a single authority, closing the window for both http and https (HTTPS cert-name verification still binds to the real hostname; the SNI-vhosting trade-off is documented, not silently accepted). It also removes three silent-fallback violations that hid real failures, and adds tests to three high-risk modules that had none.

Reviewer-relevant threads:

  • SSRF hardening (web/client.py) — the security fix above; worth a close read of PinnedIPAdapter.
  • Fail-loudly fixes — corrupt memory-settings now logs instead of reverting to defaults silently; Telegram background startup re-raises on PID-write failure (a supervisor can no longer be fooled into thinking a dead process started); a raising system-prompt fragment now logs instead of vanishing from the prompt.
  • New coverage — DockerAgent (subprocess/path-allowlist), the home-dir discovery classifiers, and Jira JQL templating; the API non-streaming completion happy-path is now tested with a mocked backend instead of @pytest.mark.skip.

Test plan

  • pytest tests/unit/agents/test_discovery.py tests/unit/agents/test_docker_agent.py tests/unit/agents/test_jql_templates.py tests/unit/test_web_client_ip_pinning.py -q — 100 pass
  • pytest tests/unit/test_web_client_edge_cases.py tests/test_rag.py -q — no regression (96 pass)
  • pytest tests/test_api.py -q — completion happy-path now runs (no longer skipped)
  • python util/lint.py --black --isort --flake8 — clean on changed files
  • Agent eval (running separately) confirms no regression from the prompt-fragment logging change — the change is logging-only on the exception branch, so the composed prompt is byte-identical on the happy path

…allbacks, cover untested modules

A DNS-rebind attacker could pass the web client's pre-flight IP check with a
public address and then have the actual TCP connect resolve to a private/internal
one, because validation and connection used separate DNS lookups. Several modules
also swallowed errors silently — a corrupt memory-settings file reverted to
defaults with no signal, a failed Telegram background PID-file write let startup
"succeed" so a supervisor could never stop the process, and a raising system-prompt
fragment was dropped from the composed prompt without a trace. And three high-risk
modules (DockerAgent's subprocess calls, the home-dir discovery scanner, Jira JQL
templating) plus the API completion happy-path had no test coverage.

- SSRF: a PinnedIPAdapter resolves once, validates the exact IP it will dial
  through a single authority, and pins it — closing the rebind window for http
  and https. HTTPS cert-name verification still binds to the real hostname; the
  SNI-vhosting limitation is documented, not silently accepted.
- Fail loudly: memory-settings load narrows to the real parse errors and logs;
  Telegram background startup re-raises on PID-write failure; the prompt-fragment
  drop and rag unload-on-reload now log instead of `except: pass`.
- Tests: new suites for DockerAgent (subprocess/path-allowlist), the discovery
  classifiers, and JQL templating; the API non-streaming completion happy-path is
  now covered with a mocked backend instead of being skipped.
@github-actions github-actions Bot added mcp MCP integration changes rag RAG system changes tests Test changes performance Performance-critical changes agents labels May 29, 2026
@kovtcharov
Copy link
Copy Markdown
Collaborator Author

@claude perform a deep code review of this PR.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Solid, well-scoped PR — the SSRF fix is correct, the fail-loudly changes are right, and the new tests genuinely exercise the security boundaries rather than rubber-stamping them. One functional trade-off in the HTTPS path is worth an explicit decision before merge. Details below.

🟢 SSRF hardening — correct

_assert_ip_allowed as the single authority routed through both validate_url (pre-flight) and PinnedIPAdapter._resolve_first_ip (connect-time) is the right design. The key property holds: the adapter validates the exact IP it's about to pin and caches only after it passes, so a rebind that hands validate_url a public IP and the adapter a private one is blocked at connect time. I traced the redirect path too — each hop re-enters send()_resolve_first_ip, so redirects are validated at the adapter layer, not just the initial URL. Good.

I spot-checked _is_blocked_ip coverage on Python 3.12: loopback, private, link-local (incl. the 169.254.169.254 cloud-metadata endpoint), IPv4-mapped IPv6 (::ffff:10.0.0.1), 0.0.0.0, and ::1 are all blocked. Comprehensive.

🟡 HTTPS SNI trade-off — confirm this is acceptable for gaia browse

The docstring is admirably honest that SNI is sent as the pinned IP. The practical impact deserves a conscious call: most of the modern web sits behind CDNs (Cloudflare, Fastly, etc.) that require a correct SNI to route the handshake. With SNI = IP, those servers return a default/wrong cert, assert_hostname then rejects it, and the fetch fails as a TLS error. Since WebClient is the backend for BrowserToolsMixin / gaia browse, this can break a large fraction of real HTTPS fetches — not a security weakness (the SSRF block still fires first), but a meaningful functionality regression for the agent's primary use case.

The PR documents the clean fix (a custom urllib3 PoolManager/HTTPSConnection that decouples server_hostname from the connect address) as out of scope. That's a reasonable boundary — but I'd suggest filing a follow-up issue to track it rather than leaving it only in a docstring, so the SNI-vhosting breakage doesn't get rediscovered as a bug report. Worth @kovtcharov weighing whether the narrower-but-correct guarantee is the right default for browse today.

🟢 Fail-loudly fixes — all correct

  • telegram.py: the PID write uses the process's own os.getpid() and runs before the application is built, so re-raising aborts cleanly with no orphaned process. Right call.
  • memory.py / mcp_bridge.py: both narrow the bare except to specific types and log with context; logger is defined in both modules (verified).
  • agent.py prompt-fragment: logging-only on the exception branch, happy path byte-identical — consistent with the eval requirement in CLAUDE.md. Confirm the separately-running eval comes back clean before merge, as the description states.

🟢 Tests — real coverage, not filler

The Docker tests assert the actual security invariants (list argv, shell=True never set, allowlist denies /etc without invoking subprocess) — verified the agent uses module-level subprocess.run with list argv and a docker --version probe, so the patch target and argv assertions are accurate. The JQL tests prove injection containment via restrictive capture classes. Replacing the @pytest.mark.skip API test with a mocked-backend happy path is a real upgrade.

Minor (non-blocking)

  • _is_blocked_ip allows 100.64.0.0/10 (RFC 6598 CGNAT / shared address space), which routes internally in some cloud environments. If you want belt-and-suspenders, not ip.is_global is a stricter catch-all — but it may over-block, so it's a judgement call, fine to leave.
  • PinnedIPAdapter._pinned_cache is unbounded and never expires per WebClient instance. Harmless for short-lived clients; if any caller keeps a session long-lived, a legitimately-rotated DNS record won't be re-resolved. Worth a bounded/TTL cache only if that pattern exists.

Nice work — the security core is sound and the only real open question is the HTTPS/SNI functionality trade-off above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents mcp MCP integration changes performance Performance-critical changes rag RAG system changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant