test: polish test case to support strict mode by wenytang-ms · Pull Request #1012 · microsoft/vscode-java-dependency

wenytang-ms · 2026-05-12T08:18:56Z

No description provided.

The Windows-UI and Linux-UI CI jobs on PR #1012 were failing because the per-step LLM verification (autotest 0.7.1) downgraded several passing steps to failures based on screenshot misinterpretation, even though deterministic verification (verifyTreeItem / verifyEditorTab / waitForLanguageServer) succeeded.

Root cause: `step.verify` triggers an LLM comparison of BEFORE/AFTER screenshots. When the action is a state check, a transient input close, or an async refactor whose UI hasn't settled by capture time, the screenshots are unfit for a clean transition judgment and the LLM returns false negatives non-deterministically (different verdicts on identical UI between runs).

Fix: drop `verify:` from steps where the LLM is structurally unreliable, and keep it on steps with a clear visible transition:

- `ls-ready`: `waitForLanguageServer` is itself the deterministic readiness check; the AFTER screenshot often shows the very next state ("Java: Building - 0%"), which the LLM misreads as "not ready".
- `enter-class-name`, `enter-package-name`, `enter-new-name`: `fillQuickInput`/`fillAnyInput` close the input on submit, so the AFTER screenshot has no visible evidence of the entered text.
- `wait-package-creation`: the package is created under a collapsed tree; no visible change.
- `handle-rename-dialog`, `handle-refactor-preview`: best-effort optional steps; the UI element is often absent, making BEFORE==AFTER.
- `verify-deleted`: deterministic `verifyTreeItem visible:false` is authoritative; the tree refresh may lag the AFTER screenshot.
- `verify-new-class-tab`, `verify-renamed-tab`: state-check steps with `verifyEditorTab`; BEFORE==AFTER at steady state, which a strict LLM misreads as "no transition".
- `verify-project-node`, `verify-package`, `verify-app-class`, `verify-revealed`: state-check steps with `verifyTreeItem`.
- `unlink-editor`, `relink-editor`: toggle a setting; no user-visible UI change in the screenshot.

Also extended `wait-delete` to 6 seconds (was 3) so the AFTER screenshot has more time to reflect the tree refresh, and added a comment on `wait-after-open` explaining why it must remain LLM-only (the tree's expanded children include AppToRename, so `verifyTreeItem visible:false` is not applicable; the actual assertion is "tree state unchanged after opening the file with link-with-editor off").

Validated locally with the same Azure OpenAI o4-mini deployment used by CI: 7 consecutive `autotest run-all` invocations, last two clean (62/62, zero LLM downgrades, zero parse errors).

Also adds .env / .env.* / test-results/** to .vscodeignore so local autotest artifacts aren't bundled into the published VSIX.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
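The failure mode described above can be illustrated with a small TypeScript model. This is only a sketch under stated assumptions: the `Step` shape, `stepOutcome`, `dropVerify`, and the `verify` strings are hypothetical names invented for illustration and are not taken from the autotest schema or its implementation. It shows why a step that passes its deterministic checks can still fail when an LLM verdict is layered on top, and how removing `verify` makes the outcome depend on the deterministic checks alone.

```typescript
// Hypothetical model of a test step; field names are illustrative only
// and are NOT the real autotest plan schema.
interface Step {
  id: string;
  // Result of deterministic checks (e.g. verifyTreeItem / verifyEditorTab).
  // For a step with no deterministic check, treat this as vacuously true.
  deterministicPassed: boolean;
  // Optional natural-language expectation. When present, an LLM judges
  // the BEFORE/AFTER screenshots against it.
  verify?: string;
}

// Stand-in for the LLM screenshot comparison (assumption: boolean verdict).
type LlmJudge = (step: Step) => boolean;

// A step passes only if its deterministic checks pass and, when `verify`
// is set, the judge also agrees. A false negative from the judge therefore
// downgrades a step that the deterministic checks already passed.
function stepOutcome(step: Step, judge: LlmJudge): boolean {
  if (!step.deterministicPassed) return false;
  return step.verify === undefined ? true : judge(step);
}

// Return copies of the steps with `verify` removed for the given ids.
function dropVerify(steps: Step[], ids: ReadonlySet<string>): Step[] {
  return steps.map((step) => {
    if (!ids.has(step.id)) return step;
    const copy: Step = { ...step };
    delete copy.verify;
    return copy;
  });
}

// Example data: the `verify` texts are made up for this sketch.
const steps: Step[] = [
  { id: "verify-deleted", deterministicPassed: true, verify: "item is gone" },
  { id: "wait-after-open", deterministicPassed: true, verify: "tree unchanged" },
];

// Worst case: the judge always returns a false negative.
const pessimisticJudge: LlmJudge = () => false;

// Only `verify-deleted` loses its LLM check; `wait-after-open` keeps it.
const cleaned = dropVerify(steps, new Set(["verify-deleted"]));
```

With the pessimistic judge, `stepOutcome(steps[0], pessimisticJudge)` is `false` before the change and `stepOutcome(cleaned[0], pessimisticJudge)` is `true` after it, while `wait-after-open` still depends on the judge because it keeps its `verify` text.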

test: add test case to support strict mode

1240f5a

wenytang-ms changed the title ~~test: add test case to support strict mode~~ test: polish test case to support strict mode May 12, 2026

wenytang-ms wants to merge 2 commits into main from autotest/update
