feat(e2e): complete E2E v2 suite — 66 specs, orchestrator, bug fixes by YellowSnnowmann · Pull Request #2353 · tinyhumansai/openhuman

YellowSnnowmann · 2026-05-20T13:57:30Z

Summary

Rewrites and expands the E2E test suite from a handful of specs into a 66-spec suite organized into 11 suites. It adds a new orchestrator, hardened test helpers, fixes for production bugs discovered during the E2E work, and CI improvements.

New specs (11 new spec files)

| Suite | Spec | Coverage |
| --- | --- | --- |
| Chat | `chat-tool-call-flow` | Single web_fetch round, timeline entry, IN_FLIGHT drain |
| Chat | `chat-multi-tool-round` | Sequential file_read + grep, 3-turn LLM loop |
| Chat | `chat-tool-error-recovery` | Mid-stream error surfacing, composer re-enable, recovery send |
| Journeys | `user-journey-full-task` | Login → chat → tool call → result → navigate away + back |
| Journeys | `user-journey-settings-round-trip` | Every major settings panel loads without blank screens |
| Journeys | `chat-conversation-history` | Multi-turn memory via message context + disk persistence |
| Navigation | `navigation-smoothness` | 8-route cycle ×2 (normal + rapid), blank-screen guard |
| Navigation | `navigation-settings-panels` | All 8 settings sub-panels visited individually |
| Accounts | `accounts-provider-modal` | 6 provider tiles, hidden provider assertion, registration |
| UI | `screen-intelligence` | Screen Awareness panel validation |

Orchestrator rewrite (`e2e-run-all-flows.sh`)

- Groups all 66 specs into 11 suites: auth, navigation, chat, skills, notifications, webhooks, providers, payments, settings, system, journeys
- `--suite=<name>` runs a single suite; `--bail` stops after the first failing suite
- `--skip-preflight` bypasses environment checks
- Captures per-spec exit codes and prints a summary table at the end
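The orchestrator itself is a shell script, but its control flow is easy to misread from a flag list. The TypeScript sketch below models it under stated assumptions: `runSuites`, `summaryTable`, and the `runSpec` callback are hypothetical names, `--bail` is assumed to take effect once a suite finishes with a failure (as the orchestrator commit message describes), and each spec's exit code is whatever `runSpec` returns.

```typescript
// Hypothetical model of e2e-run-all-flows.sh control flow; not the real script.
interface SpecResult {
  suite: string;
  spec: string;
  exitCode: number;
}

interface RunOptions {
  only?: string; // models --suite=<name>
  bail?: boolean; // models --bail
}

function runSuites(
  suites: Record<string, string[]>,
  runSpec: (spec: string) => number,
  opts: RunOptions = {},
): SpecResult[] {
  const results: SpecResult[] = [];
  for (const [suite, specs] of Object.entries(suites)) {
    if (opts.only !== undefined && suite !== opts.only) continue;
    let suiteFailed = false;
    // Every spec in the suite runs, and its exit code is recorded individually.
    for (const spec of specs) {
      const exitCode = runSpec(spec);
      results.push({ suite, spec, exitCode });
      if (exitCode !== 0) suiteFailed = true;
    }
    // Assumed --bail semantics: stop once a whole suite has finished with a failure.
    if (opts.bail && suiteFailed) break;
  }
  return results;
}

// One line per spec, plus a pass/fail footer.
function summaryTable(results: SpecResult[]): string {
  const lines = results.map(
    (r) =>
      `${r.suite.padEnd(14)}${r.spec.padEnd(36)}${r.exitCode === 0 ? "PASS" : `FAIL (exit ${r.exitCode})`}`,
  );
  const failed = results.filter((r) => r.exitCode !== 0).length;
  lines.push(`${results.length - failed} passed, ${failed} failed`);
  return lines.join("\n");
}
```

Recording results per spec rather than per suite is what lets the summary point at the exact failing spec even when `--bail` cuts the run short.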

Production bug fixes

- `selectSocketUserId` / socket status mismatch: `selectSocketUserId` parsed the JWT while `socketService` used `auth.userId`, so the two produced different keys. `selectSocketStatus` therefore returned "disconnected" even when the socket was connected, and every chat send was blocked. Both now use `auth.userId`.
- `ConversationStore::get_messages` I/O error on empty threads: it returned an I/O error for threads whose JSONL file had not been written yet. It now returns `[]`. Regression test added.
- `test_reset` missing `onboarding_completed` flag: it did not clear `onboarding_completed`, so post-reset sessions skipped the onboarding gate. It now sets `onboarding_completed=false`, and the E2E reset helper restores the flag to `true` after the wipe so specs that expect `/home` still reach it.
- `threadSlice` uncaught rejection: the fire-and-forget async dispatch of `generateThreadTitleIfNeeded` was wrapped in try/catch, which cannot catch the rejection of an un-awaited promise. It now attaches `.catch()` instead.
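The `threadSlice` bug is a general JavaScript pitfall: a `try`/`catch` around a promise that is not awaited only catches synchronous throws, so a later rejection escapes as an unhandled rejection. A minimal sketch of the difference, where `generateThreadTitleIfNeeded` is a hypothetical stand-in that always rejects and `onMessageAdded` is an illustrative caller, not the slice's real code:

```typescript
// Hypothetical stand-in: always rejects, to simulate a failed title request.
async function generateThreadTitleIfNeeded(threadId: string): Promise<string> {
  throw new Error(`title generation failed for ${threadId}`);
}

// Buggy shape, for contrast (the catch block never runs for an async rejection):
//   try { void generateThreadTitleIfNeeded(id); } catch { /* unreachable for rejections */ }

// Fixed shape: still fire-and-forget for the caller, but the rejection is handled by
// .catch() on the promise itself. The promise is returned only so a test can await it.
function onMessageAdded(threadId: string, log: (msg: string) => void): Promise<void> {
  return generateThreadTitleIfNeeded(threadId)
    .then(() => undefined)
    .catch((err: unknown) => {
      log(err instanceof Error ? err.message : String(err));
    });
}
```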

Test infrastructure improvements

- Chat harness:
  - `waitForSocketConnected()` polls the store until the socket is connected before any send
  - `typeIntoComposer` sends native OS keyboard events through the WebDriver Actions API (CDP `Input.dispatchKeyEvent`) instead of synthetic input/change events, which never updated React's controlled state
  - `clickSend()` has a longer clear-wait and a JS focus fallback so the `AppUpdatePrompt` overlay no longer intercepts the click
- Command palette: key dispatch goes through the WebDriver Actions API so `hotkeyManager`'s capture-phase listener receives the events
- Mock API: socket handler namespace aligned to the current `openhuman.*` RPC event shape; seeded composio/webhook state keys
- Shared flows: new helpers `openAddAccountModal`, `waitForAccountsPage`, `clickAddAccountProvider`, `navigateToSkills`, `waitForHomePage`
- RPC preflight (`rpc-preflight.ts`): validates RPC methods against the live core before the suite runs
- Environment preflight (`e2e-preflight.sh`): bundle, Appium, and port sanity checks
- `data-testid` selectors added to `AddAccountModal` and the Accounts page

Existing spec fixes

- Corrected stale settings routes (/settings/account → /settings, /settings/channels → /settings/connections, etc.)
- insights-dashboard: replaced removed IntelligenceMemoryTab selectors with current MemoryWorkspace data-testids
- screen-intelligence: panel title updated to 'Screen Awareness'
- Auth timing hardened (waitUntil instead of browser.pause)
- Explicit auth setup + mock server lifecycle added to the logout, notifications, slack, and whatsapp specs
- Rewards progression thresholds corrected after the points model update

CI

- Artifact upload on failure for WDIO spec results
- Job summary step with pass/fail counts in the GitHub Actions view
- `test:e2e` and `test:e2e:flows` convenience scripts in the root `package.json`

Docs

- `docs/e2e-status.md`: living tracker for the 66-spec suite status
- `docs/e2e-audit-2026-05.md`: root-cause audit findings from May 2026

Test plan

- Run the full suite: `./app/scripts/e2e-run-all-flows.sh`
- Run individual suites: `--suite=chat`, `--suite=navigation`, `--suite=auth`
- Verify the socket fix: chat sends work on first load without a "socket disconnected" toast
- Verify the empty-thread fix: create a new conversation, navigate away, return; no errors

Add three new E2E specs covering the complete tool-call pipeline:

- chat-tool-call-flow: single web_fetch round, timeline entry, IN_FLIGHT drain
- chat-multi-tool-round: sequential file_read + grep, 3-turn LLM loop
- chat-tool-error-recovery: mid-stream error surfacing, composer re-enable, recovery send

…versation history

Add three new E2E specs covering real user workflows:

- user-journey-full-task: login → chat → web_fetch tool call → result → navigate away + back
- user-journey-settings-round-trip: every major settings panel loads without blank screens
- chat-conversation-history: multi-turn memory verified via message context inspection and disk persistence

Add two new E2E specs covering navigation quality:

- navigation-smoothness: 8-route cycle run twice (normal + rapid), blank-screen char-count guard
- navigation-settings-panels: all 8 settings sub-panels visited individually (N2.1-N2.9)

Wire all 8 new specs into the sequential flow runner under three sections:

- Chat & agent harness: chat-tool-call, chat-multi-tool, chat-error-recovery
- User journeys: journey-full-task, journey-settings, chat-history
- Navigation & core UI: navigation-smoothness, navigation-settings

… case `test_reset` now sets `onboarding_completed=false` (in addition to `chat_onboarding_completed=false`) to faithfully mirror a fresh install. Also fixes `ConversationStore::get_messages` returning an I/O error for threads whose JSONL file hasn't been written yet — returns `[]` instead. Adds a regression test for the empty-thread case.

…ck all specs

test_reset (fixed above) now sets onboarding_completed=false. App.tsx's onboarding gate reads this flag: when it is false, every session is redirected to /onboarding, so every spec that depends on /home fails. Call config_set_onboarding_completed({value:true}) immediately after a successful wipe so the gate routes to /home as expected. Also adds retry logic for the auth bypass when the home page isn't reached on the first attempt.
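The reset sequence above, including the retry, can be sketched as follows. Everything here is an assumption for illustration: `resetForSpec`, the `Rpc` callback type, and the `authBypass`/`reachedHome` callbacks are hypothetical, and the bare method strings `test_reset` and `config_set_onboarding_completed` may be namespaced differently in the real core.

```typescript
// Hypothetical RPC transport: method name plus optional params.
type Rpc = (method: string, params?: Record<string, unknown>) => Promise<unknown>;

async function resetForSpec(
  rpc: Rpc,
  authBypass: () => Promise<void>,
  reachedHome: () => Promise<boolean>,
  maxAttempts = 3,
): Promise<number> {
  // Wipe state; after the fix this also sets onboarding_completed=false.
  await rpc("test_reset");
  // Restore the flag immediately so the onboarding gate routes to /home, not /onboarding.
  await rpc("config_set_onboarding_completed", { value: true });
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await authBypass();
    if (await reachedHome()) return attempt; // number of attempts it took
  }
  throw new Error(`home page not reached after ${maxAttempts} auth attempts`);
}
```

Restoring the flag before the first auth attempt, rather than inside the loop, keeps each retry focused on the auth step alone.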

…adSlice promise

- AddAccountModal: add data-testid on the modal root and each provider button so accounts-provider-modal.spec.ts can target them precisely.
- Accounts page: add data-testid on the page root and the add-button rail icon.
- threadSlice: fire-and-forget generateThreadTitleIfNeeded via .catch() rather than try/catch to avoid an uncaught rejection on async dispatch.

…t API shape

- shared-flows: add openAddAccountModal, waitForAccountsPage, clickAddAccountProvider, waitForAddAccountModalClosed, navigateToSkills, and waitForHomePage for the new accounts-provider-modal and journey specs.
- mock-api socket/core + websocket: update the socket handler namespace to match the current RPC event shape (openhuman.* prefix, correct field names).
- mock-api state: seed composio/webhook state keys for provider specs.
- root package.json: add test:e2e and test:e2e:flows convenience aliases.

…il/--skip-preflight flags

Replaces the old per-spec runner with a single master orchestrator that:

- Groups all 66 specs into 11 suites (auth, navigation, chat, skills, notifications, webhooks, providers, payments, settings, system, journeys)
- --suite=<name> to run a single suite; --bail stops on first suite failure
- --skip-preflight to bypass environment checks
- Removes OPENHUMAN_SERVICE_MOCK=1 from the service-connectivity invocation. The old sidecar service model was removed in PR tinyhumansai#1061, so the spec now auto-skips via its own guard rather than running against a dead mock
- Captures per-spec exit codes and prints a summary table at the end

- navigation-settings-panels + user-journey-settings-round-trip: routes corrected to match Settings.tsx:
  - /settings/account → /settings
  - /settings/channels → /settings/connections
  - /settings/data → /settings/memory-data
  - /settings/ai-skills → /settings/intelligence
  - /settings/advanced → /settings/developer-options
  - /settings/dev → /settings/appearance
  - /settings/features → /settings/tools
- insights-dashboard: IntelligenceMemoryTab was removed; replace assertions on #actionable-search / #actionable-source with [data-testid="memory-workspace"] and [data-testid="memory-actions"] from the current MemoryWorkspace component.
- screen-intelligence: panel title renamed from 'Screen Intelligence' to 'Screen Awareness' (i18n key settings.features.screenAwareness).
- onboarding-modes: resetApp now restores onboarding_completed=true, so the spec must explicitly set it back to false to test the onboarding flow.

… patterns

- Replace hardcoded browser.pause() calls with waitUntil() in auth-access-control.
- Add explicit auth setup and mock server lifecycle to logout-relogin-onboarding, notifications, slack-flow, whatsapp-flow.
- composio-triggers-flow: tighten RPC result unwrapping to handle both {result:{result:...}} and {result:...} response shapes.
- tool-filesystem-flow: resolve relative paths inside the tmp workspace; guard path-sensitive assertions against sandbox restrictions.
- rewards specs: correct progress assertion thresholds after the points model update.
- navigation + tauri-commands: add missing mock server lifecycle hooks.
- settings-account-preferences: fix selector after label rename.
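The composio-triggers-flow unwrapping amounts to peeling at most two `result` layers. A minimal sketch, assuming a hypothetical helper name `unwrapRpcResult`; note that it would also peel a payload whose own top-level field happens to be named `result`, so a real helper may need a stricter check.

```typescript
// Accepts both {result:{result:X}} and {result:X}; returns X.
function unwrapRpcResult<T = unknown>(response: unknown): T {
  let value: unknown = response;
  // At most two layers, matching the two response shapes described above.
  for (let layer = 0; layer < 2; layer++) {
    if (typeof value === "object" && value !== null && "result" in value) {
      value = (value as { result: unknown }).result;
    } else {
      break; // no more wrappers
    }
  }
  return value as T;
}
```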

… React state The previous approach (native HTMLTextAreaElement prototype setter + synthetic input/change events via browser.execute) does not update React's controlled inputValue state in the CEF renderer — the events fire but React's synthetic onChange handler never sees a value change, leaving the composer empty and the send button permanently disabled. Fix: focus the textarea via JS (avoids coordinate-based click that gets intercepted by AppUpdatePrompt at z-[9998]), select-all existing content, then send the text as real OS-level keyboard events via browser.keys(). These go through CDP Input.dispatchKeyEvent → Chromium input pipeline → React's onChange → inputValue state update → send button enabled.

…+ add auth The synthetic KeyboardEvent dispatched via browser.execute() does not reliably reach window capture-phase listeners in the Appium Chromium (CDP) driver. Replace dispatchKey with browser.action('key') which maps to CDP Input.dispatchKeyEvent — a real key event in Chromium's input pipeline that hotkeyManager's capture listener sees correctly. Falls back to synthetic dispatch if the Actions API throws. Also adds startMockServer + resetApp to before/after hooks: CommandProvider (which mounts the mod+K listener) lives inside the auth-gated provider chain and does not mount without a valid session token.

…rkflow Upload WDIO spec result artifacts on failure so CI logs are accessible without re-running. Add a job summary step that surfaces pass/fail counts directly in the GitHub Actions job summary view.

- accounts-provider-modal.spec.ts: asserts that all 6 exposed account provider tiles appear in the picker, hidden providers (google-meet, zoom) are absent, and each provider can be registered via picker interaction.
- rpc-preflight.ts: validates RPC methods against the live core before the suite runs, so ghost RPC calls (like removed skills runtime methods) are caught early rather than mid-suite.
- e2e-preflight.sh: environment sanity checks (bundle, Appium, ports).
- docs/e2e-status.md + e2e-audit-2026-05.md: living tracking docs for the 66-spec suite status and root-cause audit findings.
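The core of an RPC preflight like rpc-preflight.ts is a set difference between the methods the specs call and the methods the live core exposes. A sketch under assumptions: how the core advertises its method list is not described here, so `advertised` is just an input, and `findGhostRpcMethods` and `assertNoGhostRpcMethods` are hypothetical names.

```typescript
// Returns methods the specs call that the live core does not expose, de-duplicated and sorted.
function findGhostRpcMethods(required: Iterable<string>, advertised: Iterable<string>): string[] {
  const available = new Set(advertised);
  return Array.from(new Set(required))
    .filter((method) => !available.has(method))
    .sort();
}

// Fail fast before any spec runs, listing every ghost method at once.
function assertNoGhostRpcMethods(required: Iterable<string>, advertised: Iterable<string>): void {
  const ghosts = findGhostRpcMethods(required, advertised);
  if (ghosts.length > 0) {
    throw new Error(`RPC preflight failed; core does not expose: ${ghosts.join(", ")}`);
  }
}
```

Reporting the full list in one failure, instead of stopping at the first missing method, saves a rerun per ghost call.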

…allback

composerSendDecision.ts blocks every send with 'socket_disconnected' when the Socket.IO connection to the in-process Rust core is not yet up. In practice this produces the visible error toast "Realtime socket is not connected — responses cannot be delivered without a client ID." and causes ALL chat-harness specs to fail.

Changes:

- chat-harness.ts: add waitForSocketConnected(timeoutMs=30_000), which polls window.__OPENHUMAN_STORE__ until socket.byUser[*].status === 'connected'.
- chat-harness.ts: fix the clickSend() fallback. Extend the primary clear-wait from 1 s to 5 s (addMessageLocal does a Rust RPC before setInputValue(''), so the composer can take 100–500 ms to clear) and replace the coordinate-based composer.click() fallback with a JS el.focus() call so the AppUpdatePrompt overlay (z-[9998]) cannot intercept the click.
- All 10 chat + user-journey specs: import waitForSocketConnected and call it with a warn-if-false guard before the first clickSend().
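The polling in `waitForSocketConnected` can be sketched independently of WebDriver. Assumptions: the store state is read through an injected `readState` callback (in a spec it would presumably wrap `browser.execute` around `window.__OPENHUMAN_STORE__`), the 250 ms poll interval and the `SocketState` shape beyond `socket.byUser[*].status` are illustrative, and a `false` result is left to the caller to warn on, matching the warn-if-false guard in the commit message.

```typescript
// Assumed minimal slice of the store state; only socket.byUser[*].status is from the source.
interface SocketState {
  socket?: { byUser?: Record<string, { status?: string }> };
}

function isAnySocketConnected(state: SocketState | undefined): boolean {
  const byUser = state?.socket?.byUser ?? {};
  return Object.values(byUser).some((entry) => entry?.status === "connected");
}

// In a WDIO spec, readState might be (assumption):
//   () => browser.execute(() => (window as any).__OPENHUMAN_STORE__?.getState())
async function waitForSocketConnected(
  readState: () => Promise<SocketState | undefined>,
  timeoutMs = 30_000,
  intervalMs = 250, // illustrative poll interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (isAnySocketConnected(await readState())) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // caller decides whether to warn or fail
}
```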

socketService.getSocketUserId() was changed (3aa8477) to use auth.userId from the core state snapshot, but selectSocketUserId still parsed the JWT token. The two derivations produced different keys (e.g. "user-123" vs the JWT sub claim), so selectSocketStatus returned "disconnected" even when the socket was connected, blocking all chat sends with "socket_disconnected". Use the same auth.userId source in both paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
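The fix amounts to deriving the socket key from one source on both the writing and reading sides. A minimal sketch of the aligned selectors, assuming a simplified `RootState` (only `auth.userId` and `socket.byUser[*].status` come from the descriptions in this PR; the rest of the shape is illustrative):

```typescript
type SocketStatus = "connected" | "connecting" | "disconnected";

// Simplified, assumed state shape.
interface RootState {
  auth: { userId: string | null; token: string | null };
  socket: { byUser: Record<string, { status: SocketStatus }> };
}

// Single source of truth for the socket key. socketService.getSocketUserId() is
// described as reading the same auth.userId, so writer and reader agree.
function selectSocketUserId(state: RootState): string | null {
  return state.auth.userId;
}

function selectSocketStatus(state: RootState): SocketStatus {
  const userId = selectSocketUserId(state);
  if (!userId) return "disconnected";
  // A key mismatch (e.g. reading by JWT sub while writing by auth.userId) lands here as undefined.
  return state.socket.byUser[userId]?.status ?? "disconnected";
}
```

The fallback to "disconnected" is also why the original bug was silent: a wrong key is indistinguishable from a genuinely disconnected socket.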

…ltiple specs

- Consolidated import statements in reset-app.ts and rpc-preflight.ts for better readability.
- Enhanced formatting of timeout configurations in auth-access-control.spec.ts for consistency.
- Streamlined object definitions in various specs to improve clarity and maintainability.
- Updated console log statements to ensure consistent formatting across navigation and chat specs.
- Minor adjustments to ensure better alignment with coding standards and improve overall code quality.

…osio-triggers-flow specs

coderabbitai · 2026-05-20T13:57:39Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d306591a-a81c-4460-8e42-2242267bfef6


…fix threadSlice async assertion

Socket selector tests were still keying state by the JWT-parsed tgUserId, but selectSocketUserId now reads auth.userId directly. The thread title assertion raced against a fire-and-forget dispatch; it now uses vi.waitFor().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, and lint/format fixes Align all E2E specs with updated helper APIs (shared-flows, app-helpers), fix unused variable lint errors in settings-data-management and settings-feature-preferences, and apply Prettier formatting across remaining spec files. Update e2e-run-all-flows and e2e-run-session scripts for the revised spec set.

# Conflicts: # app/test/e2e/helpers/shared-flows.ts # src/openhuman/test_support/rpc.rs

…solution

YellowSnnowmann and others added 20 commits May 20, 2026 19:01

ci(e2e): add artifact upload and job summary steps to reusable E2E wo…

baa63fe


fix(e2e): remove unused variables flagged by lint

0566e34

refactor(e2e): remove unnecessary whitespace in chat-harness and comp…

5bdda6f


YellowSnnowmann and others added 4 commits May 20, 2026 20:01

Merge remote-tracking branch 'upstream/main' into fix/e2e-v2

1a49ab6


fix(e2e): remove unreachable sidebar fallback after merge conflict re…

74c90ef


YellowSnnowmann wants to merge 24 commits into tinyhumansai:main from YellowSnnowmann:fix/e2e-v2
