amd · itomek · May 29, 2026
@@ -0,0 +1,66 @@
+---
+type: plan
+source-issue: 1293
+repo: amd/gaia
+title: "Agent UI first-boot fails: interactive Delete/re-download prompt dead-ends non-interactive backend"
+created: 2026-05-29
+status: in-progress
+work_type: code-feature
+complexity: standard
+tdd_required: true
+suggested_team_size: 2
+estimated_files_changed: 2
+test_command: "PYTHONPATH=\"$PWD/src\" /Users/tomasz/src/amd/gaia/.venv/bin/python -m pytest tests/unit/test_lemonade_model_loading.py -xvs"
+build_command: ""
+lint_command: "/Users/tomasz/src/amd/gaia/.venv/bin/python util/lint.py --black --isort"
+branch: tmi/issue-1293-noninteractive-boot
+reflection_iterations: 0
+agents_used: [planning, execution, validation]
+---
+
+# Issue #1293 — Non-interactive boot auto-heal
+
+Stacked on `tmi/issue-1294-corrupt-classification` (parent commit `cccc34ff`). #1294
+already narrowed `_is_corrupt_download_error` (removed the bare "llama-server failed to
+start" signal); this branch does NOT touch that function or its tests.
+
+## Files
+- `src/gaia/llm/lemonade_client.py` — `_prompt_user_for_delete` guard + `load_model`
+  corrupt-download branch honoring `prompt`.
+- `tests/unit/test_lemonade_model_loading.py` — new TDD coverage (NOT the
+  error_classification file).
+
+## Bug
+On a fresh Agent UI install, the corrupt-download repair path in `load_model` issues
+interactive `[y/N]` prompts inside the FastAPI lifespan threadpool, where no TTY is attached.
+`input()` then raises `EOFError` or hangs, and the boot-init job fails. Two mechanisms:
+1. `_prompt_user_for_delete` lacks the `sys.stdin.isatty()` / `sys.stdout.isatty()` guard
+   that its siblings (`_prompt_user_for_download`, `_prompt_user_for_repair`) both have.
+2. The corrupt-download branch calls `_prompt_user_for_repair` / `_prompt_user_for_delete`
+   unconditionally, ignoring the `prompt` argument that boot callers pass as `prompt=False`
+   (`lemonade_manager._try_preload_with_ctx`, `ui/server.py:_load_model`).
+
+## Recovery-policy design (UX-first: silent success, loud only when unrecoverable)
+- When `prompt=False` OR stdin/stdout is not a TTY: NEVER call `input()` in any branch.
+- `_prompt_user_for_delete` gets the same non-interactive guard → returns the proceed
+  default under non-TTY so auto-heal can continue.
+- Corrupt-download branch HONORS `prompt`: with `prompt=False`, skip the prompts and
+  auto-proceed (resume → if that fails, ONE delete+redownload). Bounded to a single
+  delete+redownload; no loops.
+- Surface recovery PROGRESS at INFO (percent from `pull_model_stream` events) so the
+  backend log (tailed by the UI) shows movement and boot doesn't look frozen. The
+  corrupt-detected / repairing "why" detail logs at DEBUG.
+- Unrecoverable after the single delete+redownload → one loud, actionable
+  `LemonadeClientError` stating what failed, what to do (UI Force-redownload or manual
+  recovery), and where to look (Lemonade server.log). No EOFError, no hang, no silent swallow.
+- Interactive TTY (`prompt=True` + real TTY) still prompts as today.
+
+## Acceptance criteria
+1. No `load_model` branch calls `input()` when `prompt=False` OR non-TTY.
+2. `_prompt_user_for_delete` has the isatty guard like its two siblings.
+3. Corrupt-download repair/delete branch honors the `prompt` argument.
+4. Non-interactive corrupt model → automatic recovery, bounded to ONE delete+redownload,
+   no prompt.
+5. Recovery surfaces progress (INFO from the pull stream); repair detail at DEBUG.
+6. Unrecoverable → a single loud actionable `LemonadeClientError`. No EOFError, no hang.
+7. Interactive prompting still works when `prompt=True` and a TTY is present.
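The guard and the bounded recovery policy in the plan above can be sketched roughly as follows. This is not the code in `lemonade_client.py`: `should_prompt`, `recover_corrupt_model`, `RecoveryError`, and the injected `resume` / `delete` / `redownload` / `ask` callables are hypothetical stand-ins (the real client would raise `LemonadeClientError` and call `pull_model_stream` / `delete_model`). What happens when an interactive user declines is also an assumption here.

```python
import sys
from typing import Callable


class RecoveryError(RuntimeError):
    """Hypothetical stand-in for LemonadeClientError."""


def _is_tty(stream) -> bool:
    # A stream can be None (detached process) or lack isatty().
    return stream is not None and hasattr(stream, "isatty") and stream.isatty()


def should_prompt(prompt: bool) -> bool:
    """Prompt only when the caller allows it AND stdin/stdout are real TTYs."""
    return bool(prompt) and _is_tty(sys.stdin) and _is_tty(sys.stdout)


def _ask_yes_no(question: str) -> bool:
    return input(f"{question} [y/N] ").strip().lower() == "y"


def recover_corrupt_model(
    model: str,
    prompt: bool,
    resume: Callable[[str], bool],
    delete: Callable[[str], None],
    redownload: Callable[[str], bool],
    ask: Callable[[str], bool] = _ask_yes_no,
) -> str:
    """Resume first; if that fails, do exactly ONE delete + re-download."""
    interactive = should_prompt(prompt)

    # Non-interactive callers auto-proceed, so input() is never reached.
    # Assumption: an interactive "no" aborts with an error.
    if interactive and not ask(f"Repair download of {model}?"):
        raise RecoveryError(f"Repair of {model} declined by user.")
    if resume(model):
        return "resumed"

    if interactive and not ask(f"Delete and re-download {model}?"):
        raise RecoveryError(f"Re-download of {model} declined by user.")
    delete(model)
    if redownload(model):
        return "redownloaded"

    # Bounded: no second attempt, one actionable error instead.
    raise RecoveryError(
        f"Could not recover {model} after one delete and re-download. "
        "Use Force-redownload in the UI or recover manually; "
        "see the Lemonade server.log for details."
    )
```

Passing the prompt function in as `ask` keeps the sketch testable: a test can supply a callable that fails if invoked and confirm that `prompt=False` never reaches it.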
@@ -0,0 +1,75 @@
+---
+type: plan
+source-issue: 1294
+repo: amd/gaia
+title: "Lemonade: _is_corrupt_download_error misclassifies generic llama-server failed to start as corruption"
+created: 2026-05-29
+status: in-progress
+work_type: code-refactor
+complexity: trivial
+tdd_required: true
+suggested_team_size: 1
+estimated_files_changed: 2
+test_command: "PYTHONPATH=\"$PWD/src\" /Users/tomasz/src/amd/gaia/.venv/bin/python -m pytest tests/unit/test_lemonade_error_classification.py -xvs"
+build_command: ""
+lint_command: "/Users/tomasz/src/amd/gaia/.venv/bin/python util/lint.py --black --isort"
+branch: tmi/issue-1294-corrupt-classification
+reflection_iterations: 0
+agents_used: [planning, execution, validation]
+---
+
+# Issue #1294 — `_is_corrupt_download_error` misclassifies generic `llama-server failed to start`
+
+## Problem
+`LemonadeClient._is_corrupt_download_error` (`src/gaia/llm/lemonade_client.py`, lines ~1225-1248)
+treats the generic substring `"llama-server failed to start"` as evidence of a corrupt /
+incomplete model download. Lemonade raises that string for many NON-corruption failures
+(resource limits, ctx_size issues, GPU/backend startup, port conflicts). This misclassification
+routes ordinary load failures into a delete-and-redownload path (the default model is ~25GB)
+and dead-ends first boot.
+
+The real-world payload was `{"code":"model_load_error","type":"model_load_error",
+"message":"...llama-server failed to start"}`: both `code` and `type` are `model_load_error`,
+which is NOT a corruption signal.
+
+## Fix (surgical)
+Remove `"llama-server failed to start"` from the unconditional corruption phrase list in
+`_is_corrupt_download_error`. Treat that string as corruption ONLY when a specific corruption
+phrase is ALSO present (corroboration); otherwise return `False`. Keep the five existing
+specific corruption phrases unconditional:
+- `"download validation failed"`
+- `"files are incomplete"`
+- `"files are missing"`
+- `"incomplete or missing"`
+- `"corrupted download"`
+
+This makes a bare `llama-server failed to start` load failure fall through to `load_model`'s
+existing non-corrupt branch, which raises an actionable `LemonadeClientError` and does NOT
+enter the delete + `pull_model_stream` repair path.
+
+## Files to change
+1. `src/gaia/llm/lemonade_client.py` — `_is_corrupt_download_error` only. Do NOT touch the
+   prompt helpers (`_prompt_user_for_delete` / `_prompt_user_for_repair`) or `load_model`'s
+   corrupt branch (owned by stacked issue #1293).
+2. `tests/unit/test_lemonade_error_classification.py` — APPEND new test classes (file already
+   exists with #1030 regression tests; preserve them). Do NOT touch
+   `tests/unit/test_lemonade_model_loading.py` (owned by #1293).
+
+## Test approach (TDD — red first)
+Append to `tests/unit/test_lemonade_error_classification.py`:
+- Parametrized `_is_corrupt_download_error`: each of the 5 specific phrases -> True; bare
+  `"llama-server failed to start"` -> False; that string PLUS a corruption phrase -> True;
+  the real `model_load_error` structured payload -> False.
+- `load_model` test: mock `_send_request` to raise the bare `llama-server failed to start`
+  error; assert `delete_model` and `pull_model_stream` are NOT called and an actionable
+  `LemonadeClientError` is raised.
+- `load_model` test: a specific corruption error DOES enter the repair path (resume via
+  `pull_model_stream`).
+
+## Acceptance criteria
+1. `_is_corrupt_download_error("...llama-server failed to start...")` -> False unless a
+   specific corruption signal is also present.
+2. The five existing specific phrases continue to return True (no regression).
+3. A bare `llama-server failed to start` load failure makes `load_model` raise an actionable
+   `LemonadeClientError` and does NOT enter delete+redownload.
+4. When corruption IS correctly detected (a specific phrase), the existing repair flow runs.
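The narrowed classification in the plan above reduces to a check against the five specific phrases, since the generic string only counts when one of them is also present. A minimal sketch under that reading, not the actual method body: `is_corrupt_download_error` and `CORRUPTION_PHRASES` are illustrative names, and the case-insensitive substring match on the stringified error is an assumption.

```python
# The five specific phrases listed in the plan; each one alone signals corruption.
CORRUPTION_PHRASES = (
    "download validation failed",
    "files are incomplete",
    "files are missing",
    "incomplete or missing",
    "corrupted download",
)


def is_corrupt_download_error(error: object) -> bool:
    """Return True only when a specific corruption phrase appears in the error."""
    # Assumption: case-insensitive substring match on the stringified error.
    text = str(error).lower()
    # "llama-server failed to start" is deliberately absent from the list:
    # on its own it is a generic load failure, and when it appears next to a
    # specific phrase, that phrase already triggers the match.
    return any(phrase in text for phrase in CORRUPTION_PHRASES)
```

With this shape, the structured `model_load_error` payload quoted in the plan classifies as non-corrupt, because its message carries only the generic string.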
@@ -269,9 +269,7 @@ def write_python_file(
             except Exception as e:
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
-                    path_validator.audit_write(
-                        "write", file_path, 0, "error", str(e)
-                    )
+                    path_validator.audit_write("write", file_path, 0, "error", str(e))
                 return {"status": "error", "error": str(e)}
 
         @tool
@@ -302,9 +300,7 @@ def edit_python_file(
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
                     # Check blocklist
-                    is_blocked, reason = path_validator.is_write_blocked(
-                        str(file_path)
-                    )
+                    is_blocked, reason = path_validator.is_write_blocked(str(file_path))
                     if is_blocked:
                         path_validator.audit_write(
                             "edit", str(file_path), 0, "denied", reason
@@ -313,9 +309,7 @@ def edit_python_file(
 
                     # Check allowlist
                     if not path_validator.is_path_allowed(str(file_path)):
-                        reason = (
-                            f"Access denied: {file_path} is not in allowed paths"
-                        )
+                        reason = f"Access denied: {file_path} is not in allowed paths"
                         path_validator.audit_write(
                             "edit", str(file_path), 0, "denied", reason
                         )
@@ -428,9 +422,7 @@ def edit_python_file(
             except Exception as e:
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
-                    path_validator.audit_write(
-                        "edit", file_path, 0, "error", str(e)
-                    )
+                    path_validator.audit_write("edit", file_path, 0, "error", str(e))
                 return {"status": "error", "error": str(e)}
 
         @tool
@@ -642,9 +634,7 @@ def write_markdown_file(
             except Exception as e:
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
-                    path_validator.audit_write(
-                        "write", file_path, 0, "error", str(e)
-                    )
+                    path_validator.audit_write("write", file_path, 0, "error", str(e))
                 return {"status": "error", "error": str(e)}
 
         @tool
@@ -1000,9 +990,7 @@ def replace_function(
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
                     # Check blocklist
-                    is_blocked, reason = path_validator.is_write_blocked(
-                        str(file_path)
-                    )
+                    is_blocked, reason = path_validator.is_write_blocked(str(file_path))
                     if is_blocked:
                         path_validator.audit_write(
                             "edit", str(file_path), 0, "denied", reason
@@ -1011,9 +999,7 @@ def replace_function(
 
                     # Check allowlist
                     if not path_validator.is_path_allowed(str(file_path)):
-                        reason = (
-                            f"Access denied: {file_path} is not in allowed paths"
-                        )
+                        reason = f"Access denied: {file_path} is not in allowed paths"
                         path_validator.audit_write(
                             "edit", str(file_path), 0, "denied", reason
                         )
@@ -1149,9 +1135,7 @@ def replace_function(
             except Exception as e:
                 path_validator = getattr(self, "path_validator", None)
                 if path_validator is not None:
-                    path_validator.audit_write(
-                        "edit", file_path, 0, "error", str(e)
-                    )
+                    path_validator.audit_write("edit", file_path, 0, "error", str(e))
                 return {"status": "error", "error": str(e)}
 
         # Return the list of registered tools for tracking