
test: polish test case to support strict mode #1012

Draft
wenytang-ms wants to merge 2 commits into main from autotest/update

Conversation

@wenytang-ms
Contributor

No description provided.

@wenytang-ms changed the title from "test: add test case to support strict mode" to "test: polish test case to support strict mode" on May 12, 2026

The CI Windows-UI and Linux-UI jobs on PR #1012 were failing because the
per-step LLM verification (autotest 0.7.1) downgraded several passing
steps to failures based on screenshot mis-interpretation, even though
deterministic verification (verifyTreeItem / verifyEditorTab /
waitForLanguageServer) succeeded.

Root cause: `step.verify` triggers LLM comparison of BEFORE/AFTER
screenshots. When the action is a state-check, a transient input close,
or an async refactor whose UI hasn't settled by capture time, the
screenshots are unfit for a clean transition judgment and the LLM
returns false negatives non-deterministically (different verdicts on
identical UI between runs).
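
To make the failure mode concrete, here is a minimal sketch of how a strict verdict could combine the two kinds of verification. The `StepResult` shape and the function names are hypothetical and chosen for illustration only; they are not the autotest 0.7.1 API.

```typescript
// Hypothetical types for illustration; not taken from autotest.
type Verdict = "pass" | "fail";

interface StepResult {
  id: string;
  // Outcome of a deterministic check such as verifyTreeItem or verifyEditorTab.
  deterministic: Verdict;
  // Present only when the step declares `verify:` and the LLM compared
  // the BEFORE/AFTER screenshots.
  llm?: Verdict;
}

// Assumed strict-mode rule: an LLM "fail" overrides a deterministic "pass".
function finalVerdict(result: StepResult): Verdict {
  if (result.deterministic === "fail") {
    return "fail";
  }
  return result.llm ?? "pass";
}

// IDs of steps that passed deterministically but were downgraded by the LLM.
function downgradedSteps(results: StepResult[]): string[] {
  return results
    .filter((r) => r.deterministic === "pass" && r.llm === "fail")
    .map((r) => r.id);
}
```

Under this assumed rule, a step without `verify:` has no `llm` verdict, so its result depends only on the deterministic check.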

Fix: drop `verify:` from steps where the LLM is structurally
unreliable, and keep it on steps with a clear visible transition:

- `ls-ready`: `waitForLanguageServer` is itself the deterministic
  readiness check; the AFTER screenshot often shows the very next state
  ("Java: Building - 0%") which the LLM mis-reads as "not ready".
- `enter-class-name`, `enter-package-name`, `enter-new-name`:
  `fillQuickInput`/`fillAnyInput` close the input on submit, so the
  AFTER screenshot has no visible evidence of the entered text.
- `wait-package-creation`: package is created under a collapsed tree;
  no visible change.
- `handle-rename-dialog`, `handle-refactor-preview`: best-effort
  optional steps; the UI element is often absent, making BEFORE==AFTER.
- `verify-deleted`: deterministic `verifyTreeItem visible:false`
  is authoritative; tree refresh may lag the AFTER screenshot.
- `verify-new-class-tab`, `verify-renamed-tab`: state-check steps
  with `verifyEditorTab`; BEFORE==AFTER at steady state, which a
  strict LLM mis-reads as "no transition".
- `verify-project-node`, `verify-package`, `verify-app-class`,
  `verify-revealed`: state-check steps with `verifyTreeItem`.
- `unlink-editor`, `relink-editor`: toggle a setting; no
  user-visible UI change in the screenshot.
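
The list above amounts to removing one field from a fixed set of step IDs. A minimal sketch of that transformation follows, assuming a hypothetical `Step` shape with an optional `verify` string; the real test plan schema may differ, and the `action` values used with it are placeholders.

```typescript
// Hypothetical step shape; the actual autotest plan schema is not shown here.
interface Step {
  id: string;
  action: string;
  verify?: string;
}

// Step IDs named in the list above as unreliable for LLM screenshot comparison.
const LLM_UNRELIABLE_STEPS: ReadonlySet<string> = new Set([
  "ls-ready",
  "enter-class-name",
  "enter-package-name",
  "enter-new-name",
  "wait-package-creation",
  "handle-rename-dialog",
  "handle-refactor-preview",
  "verify-deleted",
  "verify-new-class-tab",
  "verify-renamed-tab",
  "verify-project-node",
  "verify-package",
  "verify-app-class",
  "verify-revealed",
  "unlink-editor",
  "relink-editor",
]);

// Returns a copy of the plan with `verify` removed from the listed steps.
// Steps not in the set are returned unchanged.
function dropUnreliableVerify(steps: Step[]): Step[] {
  return steps.map((step) => {
    if (!LLM_UNRELIABLE_STEPS.has(step.id)) {
      return step;
    }
    const copy: Step = { ...step };
    delete copy.verify;
    return copy;
  });
}
```

Keeping the IDs in one set makes the scope of the change easy to review: any step outside the set, such as `wait-after-open`, keeps its LLM verification.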

Also extended `wait-delete` to 6 seconds (was 3) so the AFTER screenshot
has more time to reflect the tree refresh, and added a comment on
`wait-after-open` explaining why it must remain LLM-only (the tree's
expanded children include AppToRename, so `verifyTreeItem
visible:false` is not applicable; the actual assertion is "tree state
unchanged after opening the file with link-with-editor off").

Validated locally with the same Azure OpenAI o4-mini deployment used by
CI: 7 consecutive `autotest run-all` invocations, last two clean
(62/62, zero LLM downgrades, zero parse errors).

Also adds .env / .env.* / test-results/** to .vscodeignore so local
autotest artifacts aren't bundled into the published VSIX.
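
For reference, the three patterns named above would appear in `.vscodeignore` roughly as follows. This is a sketch of the added entries only; their position in the file and any neighboring entries are not shown in this description.

```
.env
.env.*
test-results/**
```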

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
