feat(e2e): complete E2E v2 suite — 66 specs, orchestrator, bug fixes #2353

Draft
YellowSnnowmann wants to merge 24 commits into tinyhumansai:main from YellowSnnowmann:fix/e2e-v2

YellowSnnowmann commented May 20, 2026

Summary

Rewrites and massively expands the E2E test suite from a handful of specs to a full 66-spec suite organized into 11 suites, with a new orchestrator, hardened test helpers, production bug fixes discovered during E2E work, and CI improvements.

New specs (11 new spec files)

| Suite | Spec | Coverage |
| --- | --- | --- |
| Chat | chat-tool-call-flow | Single web_fetch round, timeline entry, IN_FLIGHT drain |
| Chat | chat-multi-tool-round | Sequential file_read + grep, 3-turn LLM loop |
| Chat | chat-tool-error-recovery | Mid-stream error surfacing, composer re-enable, recovery send |
| Journeys | user-journey-full-task | Login → chat → tool call → result → navigate away + back |
| Journeys | user-journey-settings-round-trip | Every major settings panel loads without blank screens |
| Journeys | chat-conversation-history | Multi-turn memory via message context + disk persistence |
| Navigation | navigation-smoothness | 8-route cycle ×2 (normal + rapid), blank-screen guard |
| Navigation | navigation-settings-panels | All 8 settings sub-panels visited individually |
| Accounts | accounts-provider-modal | 6 provider tiles, hidden provider assertion, registration |
| UI | screen-intelligence | Screen Awareness panel validation |

Orchestrator rewrite (e2e-run-all-flows.sh)

  • Groups all 66 specs into 11 suites: auth, navigation, chat, skills, notifications, webhooks, providers, payments, settings, system, journeys
  • --suite=<name> runs a single suite; --bail stops on the first suite failure
  • --skip-preflight to bypass environment checks
  • Per-spec exit codes + summary table at the end
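
The control flow described in these bullets can be sketched roughly as follows. This is a TypeScript model for illustration only: the real orchestrator is a shell script, and the type names, the `runSpec` callback, and the summary format are assumptions. It also assumes that --bail lets the failing suite finish before stopping, which the description does not state explicitly.

```typescript
// Hypothetical model of the orchestrator's control flow (the real runner is a shell script).
type SpecRunner = (spec: string) => number; // returns the spec's exit code

interface RunOptions {
  suite?: string; // mirrors --suite=<name>
  bail?: boolean; // mirrors --bail
}

interface SpecResult {
  suite: string;
  spec: string;
  exitCode: number;
}

function parseArgs(argv: string[]): RunOptions {
  const options: RunOptions = {};
  for (const arg of argv) {
    if (arg.startsWith("--suite=")) options.suite = arg.slice("--suite=".length);
    else if (arg === "--bail") options.bail = true;
  }
  return options;
}

function runSuites(
  suites: Record<string, string[]>,
  runSpec: SpecRunner,
  options: RunOptions = {},
): SpecResult[] {
  const results: SpecResult[] = [];
  for (const [suite, specs] of Object.entries(suites)) {
    // Skip every suite except the one selected with --suite.
    if (options.suite !== undefined && options.suite !== suite) continue;
    let suiteFailed = false;
    for (const spec of specs) {
      const exitCode = runSpec(spec);
      results.push({ suite, spec, exitCode }); // per-spec exit code capture
      if (exitCode !== 0) suiteFailed = true;
    }
    // Assumption: --bail stops after the first suite that contains a failure.
    if (suiteFailed && options.bail) break;
  }
  return results;
}

// One line per spec, in the spirit of the end-of-run summary table.
function formatSummary(results: SpecResult[]): string {
  return results
    .map((r) => `${r.exitCode === 0 ? "PASS" : "FAIL"}  ${r.suite}/${r.spec} (exit ${r.exitCode})`)
    .join("\n");
}
```

Recording every exit code before deciding whether to stop keeps the summary complete for the suite that failed, which is usually what a reader of the CI log wants.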

Production bug fixes

  • selectSocketUserId / socket status mismatch: selectSocketUserId parsed the JWT while socketService used auth.userId, so the two produced different keys, selectSocketStatus always returned "disconnected", and all chat sends were blocked. Both now use auth.userId.
  • ConversationStore::get_messages I/O error on empty threads: get_messages returned an I/O error for threads whose JSONL file had not been written yet. It now returns []. Regression test added.
  • test_reset missing onboarding_completed flag: test_reset did not reset onboarding_completed, so post-reset sessions skipped the onboarding gate. It now sets the flag to false, and the E2E reset restores it to true after the wipe so specs still land on /home.
  • threadSlice uncaught rejection: the fire-and-forget generateThreadTitleIfNeeded dispatch was guarded by try/catch, which cannot catch an asynchronous rejection. It now uses .catch().

Test infrastructure improvements

  • Chat harness: waitForSocketConnected() polls the store until the socket is connected before any send; typeIntoComposer sends real keyboard events through the WebDriver Actions API (CDP Input.dispatchKeyEvent) instead of synthetic DOM events; clickSend() waits longer for the composer to clear (5 s instead of 1 s) and falls back to a JS focus call so the AppUpdatePrompt overlay cannot intercept the click
  • Command palette: WebDriver Actions API for key dispatch (capture-phase listeners)
  • Mock API: socket handler namespace aligned to current openhuman.* RPC event shape; seeded composio/webhook state keys
  • Shared flows: new helpers — openAddAccountModal, waitForAccountsPage, clickAddAccountProvider, navigateToSkills, waitForHomePage
  • RPC preflight (rpc-preflight.ts): validates RPC methods against live core before suite runs
  • Environment preflight (e2e-preflight.sh): bundle, Appium, port sanity checks
  • data-testid selectors added to AddAccountModal and Accounts page

Existing spec fixes

  • Corrected stale settings routes (/settings/account → /settings, /settings/channels → /settings/connections, etc.)
  • insights-dashboard: replaced removed IntelligenceMemoryTab selectors with current MemoryWorkspace data-testids
  • screen-intelligence: panel title updated to 'Screen Awareness'
  • Auth timing hardened (waitUntil instead of browser.pause)
  • Explicit auth setup + mock server lifecycle added to logout, notifications, slack, whatsapp specs
  • Rewards progression thresholds corrected after points model update

CI

  • Artifact upload on failure for WDIO spec results
  • Job summary step with pass/fail counts in GitHub Actions view
  • test:e2e and test:e2e:flows convenience scripts in root package.json

Docs

  • docs/e2e-status.md — living tracker for 66-spec suite status
  • docs/e2e-audit-2026-05.md — root-cause audit findings from May 2026

Test plan

  • Run full suite: ./app/scripts/e2e-run-all-flows.sh
  • Run individual suites: --suite=chat, --suite=navigation, --suite=auth
  • Verify socket fix: chat sends work on first load without "socket disconnected" toast
  • Verify empty thread fix: create new conversation, navigate away, return — no errors

YellowSnnowmann and others added 20 commits May 20, 2026 19:01
Add three new E2E specs covering the complete tool-call pipeline:
- chat-tool-call-flow: single web_fetch round, timeline entry, IN_FLIGHT drain
- chat-multi-tool-round: sequential file_read + grep, 3-turn LLM loop
- chat-tool-error-recovery: mid-stream error surfacing, composer re-enable, recovery send
…versation history

Add three new E2E specs covering real user workflows:
- user-journey-full-task: login → chat → web_fetch tool call → result → navigate away + back
- user-journey-settings-round-trip: every major settings panel loads without blank screens
- chat-conversation-history: multi-turn memory verified via message context inspection and disk persistence
Add two new E2E specs covering navigation quality:
- navigation-smoothness: 8-route cycle run twice (normal + rapid), blank-screen char-count guard
- navigation-settings-panels: all 8 settings sub-panels visited individually (N2.1-N2.9)
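
A character-count blank-screen guard of the kind mentioned above might look like the sketch below. The threshold, the function names, and the idea of collecting text per route are all assumptions for illustration; the actual spec may measure the page differently.

```typescript
// Assumed threshold; the real spec's value is not given in this PR description.
const MIN_VISIBLE_CHARS = 20;

// Treats a page as blank when its visible text, ignoring whitespace, is too short.
function isBlankScreen(bodyText: string, minChars: number = MIN_VISIBLE_CHARS): boolean {
  return bodyText.replace(/\s+/g, "").length < minChars;
}

// Given the text captured on each route, returns the routes that rendered blank.
function findBlankRoutes(
  textByRoute: Record<string, string>,
  minChars: number = MIN_VISIBLE_CHARS,
): string[] {
  return Object.entries(textByRoute)
    .filter(([, text]) => isBlankScreen(text, minChars))
    .map(([route]) => route);
}
```

In a spec, the text for each route would come from the rendered page after navigation settles; here it is passed in as plain strings so the check itself stays easy to test.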
Wire all 8 new specs into the sequential flow runner under three sections:
- Chat & agent harness: chat-tool-call, chat-multi-tool, chat-error-recovery
- User journeys: journey-full-task, journey-settings, chat-history
- Navigation & core UI: navigation-smoothness, navigation-settings
… case

`test_reset` now sets `onboarding_completed=false` (in addition to
`chat_onboarding_completed=false`) to faithfully mirror a fresh install.
Also fixes `ConversationStore::get_messages` returning an I/O error for
threads whose JSONL file hasn't been written yet — returns `[]` instead.
Adds a regression test for the empty-thread case.
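
The empty-thread behaviour can be illustrated with a TypeScript analogue. The actual fix lives in Rust (`ConversationStore::get_messages`); the `StoredMessage` shape, the injected `readText` function, and the `ENOENT` convention below are assumptions chosen to keep the sketch self-contained.

```typescript
// TypeScript analogue of the Rust fix; names and shapes here are illustrative only.
interface StoredMessage {
  role: string;
  content: string;
}

// Reads a whole file as text; assumed to throw an error with code "ENOENT" when the file is absent.
type ReadText = (path: string) => string;

function getMessages(readText: ReadText, path: string): StoredMessage[] {
  let raw: string;
  try {
    raw = readText(path);
  } catch (err) {
    // A thread whose JSONL file has not been written yet simply has no messages.
    if ((err as { code?: string } | null)?.code === "ENOENT") return [];
    throw err; // any other I/O failure is still reported as an error
  }
  // One JSON document per non-empty line.
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as StoredMessage);
}
```

Only the "file not found" case is mapped to an empty list, so permission errors or corrupt storage still surface to the caller.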
…ck all specs

test_reset (fixed above) now sets onboarding_completed=false.
App.tsx's onboarding gate reads this flag: when false it redirects
every session to /onboarding, causing every spec that depends on /home
to fail. Call config_set_onboarding_completed({value:true}) immediately
after a successful wipe so the gate routes to /home as expected.
Adds retry logic for auth bypass if home page isn't reached first time.
…adSlice promise

AddAccountModal: add data-testid on the modal root and each provider
button so accounts-provider-modal.spec.ts can target them precisely.
Accounts page: add data-testid on page root and add-button rail icon.
threadSlice: fire-and-forget generateThreadTitleIfNeeded via .catch()
rather than try/catch to avoid an uncaught rejection on async dispatch.
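
The difference between the two guards can be shown with a small sketch. `generateTitle` is a hypothetical stand-in for the real thunk dispatch, and `fireAndForget` is an illustrative helper, not code from threadSlice.

```typescript
// Hypothetical stand-in for the async title dispatch; it rejects asynchronously when asked to fail.
function generateTitle(shouldFail: boolean): Promise<string> {
  return shouldFail
    ? Promise.reject(new Error("title generation failed"))
    : Promise.resolve("Generated title");
}

// Fire-and-forget: the caller does not await the task, so the rejection handler
// must be attached to the promise itself. A surrounding try/catch without await
// only sees synchronous throws and would leave a later rejection unhandled.
function fireAndForget(task: Promise<unknown>, onError: (err: unknown) => void): void {
  task.catch(onError);
}
```

A call such as `fireAndForget(generateTitle(true), (err) => console.warn(err))` returns immediately and logs the failure once the promise settles.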
…t API shape

shared-flows: add openAddAccountModal, waitForAccountsPage,
clickAddAccountProvider, waitForAddAccountModalClosed, navigateToSkills,
and waitForHomePage for the new accounts-provider-modal and journey specs.
mock-api socket/core + websocket: update socket handler namespace to
match current RPC event shape (openhuman.* prefix, correct field names).
mock-api state: seed composio/webhook state keys for provider specs.
root package.json: add test:e2e and test:e2e:flows convenience aliases.
…il/--skip-preflight flags

Replaces the old per-spec runner with a single master orchestrator that:
- Groups all 66 specs into 11 suites (auth, navigation, chat, skills,
  notifications, webhooks, providers, payments, settings, system, journeys)
- --suite=<name> to run a single suite; --bail stops on first suite failure
- --skip-preflight to bypass environment checks
- Removes OPENHUMAN_SERVICE_MOCK=1 from service-connectivity invocation —
  the old sidecar service model was removed in PR tinyhumansai#1061; the spec now
  auto-skips via its own guard rather than running against a dead mock
- Captures per-spec exit codes and prints a summary table at the end
navigation-settings-panels + user-journey-settings-round-trip:
  /settings/account → /settings, /settings/channels → /settings/connections,
  /settings/data → /settings/memory-data, /settings/ai-skills → /settings/intelligence,
  /settings/advanced → /settings/developer-options, /settings/dev → /settings/appearance,
  /settings/features → /settings/tools (all corrected to match Settings.tsx routes).
insights-dashboard: IntelligenceMemoryTab was removed; replace assertions on
  #actionable-search / #actionable-source with [data-testid="memory-workspace"]
  and [data-testid="memory-actions"] from the current MemoryWorkspace component.
screen-intelligence: panel title renamed from 'Screen Intelligence' to
  'Screen Awareness' (i18n key settings.features.screenAwareness).
onboarding-modes: resetApp now restores onboarding_completed=true; spec must
  explicitly set it back to false to test the onboarding flow.
… patterns

Replace hardcoded browser.pause() calls with waitUntil() in
auth-access-control. Add explicit auth setup and mock server lifecycle
to logout-relogin-onboarding, notifications, slack-flow, whatsapp-flow.
composio-triggers-flow: tighten RPC result unwrapping to handle both
{result:{result:...}} and {result:...} response shapes.
tool-filesystem-flow: resolve relative paths inside tmp workspace;
guard path-sensitive assertions against sandbox restrictions.
rewards specs: correct progress assertion thresholds after points model
update. navigation + tauri-commands: add missing mock server lifecycle
hooks. settings-account-preferences: fix selector after label rename.
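
The RPC result unwrapping mentioned for composio-triggers-flow could be sketched as below. The function names are hypothetical, and the spec's real helper may differ; the sketch only shows one way to accept both response shapes.

```typescript
// Narrowing helper: true for any non-null object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts either { result: value } or { result: { result: value } } and returns value.
// Returns undefined when the response carries no result field at all.
function unwrapRpcResult<T>(response: unknown): T | undefined {
  if (!isRecord(response) || !("result" in response)) return undefined;
  const outer = response["result"];
  // Doubly wrapped shape: peel off the inner result.
  if (isRecord(outer) && "result" in outer) return outer["result"] as T;
  return outer as T;
}
```

One caveat of this approach: a singly wrapped payload that legitimately contains its own `result` field would be unwrapped one level too far, so it suits responses whose payloads are known not to use that key.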
… React state

The previous approach (native HTMLTextAreaElement prototype setter +
synthetic input/change events via browser.execute) does not update
React's controlled inputValue state in the CEF renderer — the events
fire but React's synthetic onChange handler never sees a value change,
leaving the composer empty and the send button permanently disabled.

Fix: focus the textarea via JS (avoids coordinate-based click that gets
intercepted by AppUpdatePrompt at z-[9998]), select-all existing content,
then send the text as real OS-level keyboard events via browser.keys().
These go through CDP Input.dispatchKeyEvent → Chromium input pipeline →
React's onChange → inputValue state update → send button enabled.
…+ add auth

The synthetic KeyboardEvent dispatched via browser.execute() does not
reliably reach window capture-phase listeners in the Appium Chromium
(CDP) driver. Replace dispatchKey with browser.action('key') which maps
to CDP Input.dispatchKeyEvent — a real key event in Chromium's input
pipeline that hotkeyManager's capture listener sees correctly.
Falls back to synthetic dispatch if the Actions API throws.

Also adds startMockServer + resetApp to before/after hooks: CommandProvider
(which mounts the mod+K listener) lives inside the auth-gated provider
chain and does not mount without a valid session token.
…rkflow

Upload WDIO spec result artifacts on failure so CI logs are accessible
without re-running. Add a job summary step that surfaces pass/fail
counts directly in the GitHub Actions job summary view.
accounts-provider-modal.spec.ts: asserts all 6 exposed account provider
tiles appear in the picker, hidden providers (google-meet, zoom) are
absent, and each provider can be registered via picker interaction.
rpc-preflight.ts: validates RPC methods against the live core before
the suite runs to catch ghost RPC calls (like removed skills runtime
methods) early rather than mid-suite.
e2e-preflight.sh: environment sanity checks (bundle, Appium, ports).
docs/e2e-status.md + e2e-audit-2026-05.md: living tracking docs for
the 66-spec suite status and root-cause audit findings.
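
The core check behind an RPC preflight can be sketched as a set difference. This is an assumption about how rpc-preflight.ts might work, not its actual code: how the list of available methods is fetched from the live core is not described here, so both lists are passed in, and the function names are hypothetical.

```typescript
// Returns the required RPC methods that the live core does not expose.
function findMissingRpcMethods(
  required: readonly string[],
  available: readonly string[],
): string[] {
  const known = new Set(available);
  return required.filter((method) => !known.has(method));
}

// Fails fast before the suite starts instead of letting a ghost RPC call fail mid-suite.
function assertRpcMethodsAvailable(
  required: readonly string[],
  available: readonly string[],
): void {
  const missing = findMissingRpcMethods(required, available);
  if (missing.length > 0) {
    throw new Error(`RPC preflight failed, missing methods: ${missing.join(", ")}`);
  }
}
```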
…allback

composerSendDecision.ts blocks every send with 'socket_disconnected' when
the Socket.IO connection to the in-process Rust core is not yet up.  In
practice this produces the visible error toast
  "Realtime socket is not connected — responses cannot be delivered
   without a client ID."
and causes ALL chat-harness specs to fail.

Changes:
- chat-harness.ts: add waitForSocketConnected(timeoutMs=30_000) that polls
  window.__OPENHUMAN_STORE__ until socket.byUser[*].status === 'connected'.
- chat-harness.ts: fix clickSend() fallback — extend primary clear-wait
  from 1 s to 5 s (addMessageLocal does a Rust RPC before setInputValue('')
  so the composer can take 100–500 ms to clear) and replace the coordinate-
  based composer.click() fallback with a JS el.focus() call to avoid the
  AppUpdatePrompt overlay (z-[9998]) intercepting the click.
- All 10 chat + user-journey specs: import waitForSocketConnected and call
  it with a warn-if-false guard before the first clickSend().
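
The polling described for waitForSocketConnected can be sketched as follows. In the real helper the state is read from window.__OPENHUMAN_STORE__ inside the app; here a `readByUser` callback is injected instead so the sketch runs on its own. The polling interval and the choice to treat any connected entry as success are assumptions.

```typescript
type SocketStatusByUser = Record<string, { status: string }>;

// Polls until at least one socket entry reports "connected" or the timeout expires.
// Resolves to false on timeout so callers can warn rather than fail outright.
async function waitForSocketConnected(
  readByUser: () => SocketStatusByUser | undefined,
  timeoutMs: number = 30_000,
  intervalMs: number = 250, // assumed polling interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const byUser = readByUser() ?? {};
    if (Object.values(byUser).some((entry) => entry.status === "connected")) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning a boolean instead of throwing matches the warn-if-false usage described above: a spec can log a warning and still attempt the send.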
socketService.getSocketUserId() was changed (3aa8477) to use
auth.userId from the core state snapshot, but selectSocketUserId
still parsed the JWT token. The two derivations produced different
keys (e.g. "user-123" vs the JWT sub claim), so selectSocketStatus
returned "disconnected" even when the socket was connected —
blocking all chat sends with "socket_disconnected".

Use the same auth.userId source in both paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
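
The key mismatch fixed in this commit can be illustrated with a reduced state shape. The `AppState` interface below is an assumption made for the sketch; only the selector names and the auth.userId source come from the commit message.

```typescript
// Reduced, assumed state shape for illustration.
interface AppState {
  auth: { userId: string | null; token: string | null };
  socket: { byUser: Record<string, { status: "connected" | "disconnected" }> };
}

// Reader side: derive the key from auth.userId, the same source the socket service writes under.
const selectSocketUserId = (state: AppState): string | null => state.auth.userId;

const selectSocketStatus = (state: AppState): "connected" | "disconnected" => {
  const userId = selectSocketUserId(state);
  if (userId === null) return "disconnected";
  // If the writer used a different key, this lookup misses and reports "disconnected".
  return state.socket.byUser[userId]?.status ?? "disconnected";
};
```

When the status is stored under a key derived some other way, for example from the JWT, the lookup misses even though a connected entry exists, which is the failure mode the commit describes.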
…ltiple specs

- Consolidated import statements in reset-app.ts and rpc-preflight.ts for better readability.
- Enhanced formatting of timeout configurations in auth-access-control.spec.ts for consistency.
- Streamlined object definitions in various specs to improve clarity and maintainability.
- Updated console log statements to ensure consistent formatting across navigation and chat specs.
- Minor adjustments to ensure better alignment with coding standards and improve overall code quality.

YellowSnnowmann and others added 4 commits May 20, 2026 20:01
…fix threadSlice async assertion

Socket selector tests were still keying state by JWT-parsed tgUserId,
but selectSocketUserId now reads auth.userId directly. Thread title
assertion raced against a fire-and-forget dispatch — use vi.waitFor().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, and lint/format fixes

Align all E2E specs with updated helper APIs (shared-flows, app-helpers),
fix unused variable lint errors in settings-data-management and
settings-feature-preferences, and apply Prettier formatting across
remaining spec files. Update e2e-run-all-flows and e2e-run-session
scripts for the revised spec set.
# Conflicts:
#	app/test/e2e/helpers/shared-flows.ts
#	src/openhuman/test_support/rpc.rs