feat(voice): global push-to-talk hotkey (#3090)#3349
Conversation
…3090) Design doc for the PTT half of issue tinyhumansai#3090 — a hold-to-talk global hotkey that lets the user speak to OpenHuman while it's in the background, with agent replies routed through TTS. Background screen capture (the other half of tinyhumansai#3090) is scoped to a follow-up PR.
…umansai#3090) 15-task plan covering the Rust schema delta (`channel.web_chat`), voice/bus event, Tauri-shell hotkey + overlay window, frontend redux slice + service state machine + UI + settings panel, 13-locale i18n, and a WDIO E2E spec with mocked STT.
…ghten schema tests (tinyhumansai#3090)
… check; reject empty tokens (tinyhumansai#3090)
…ted on speak_reply=true (tinyhumansai#3090)
…oop; tag overlay error (tinyhumansai#3090)
…inyhumansai#3090) - Drop MaybePttRoot optional-key type; selectors now use { ptt: PttState } mirroring the mascotSlice selector convention - Remove duplicate `export default pttSlice.reducer`; keep only the named pttReducer export that index.ts already imports - Add resetUserScopedState test asserting dirty state returns to initialPttState
…nscribe-fail paths (tinyhumansai#3090)
…es (tinyhumansai#3090) Store the most recent Tauri hotkey registration error in pttSlice (registrationError, transient/non-persisted) and dispatch it from usePttHotkey on failure, clearing it on success. PttSettingsPanel maps well-known error strings (dictation conflict, Wayland, accessibility, shortcut-in-use) to their existing i18n keys and renders them inline below the capture input so the user sees the real failure reason instead of a silent "saved" state.
Surfaces the global push-to-talk feature in the user-facing capability catalog so the /about and settings search surfaces describe it. The id, domain, and Conversation category sit next to conversation.send_voice and the iOS mobile.push_to_talk entry — pinned by a new `capability_list_includes_voice_ptt` test that also asserts the how_to mentions Push-to-Talk and the description mentions hold + hotkey, so a future copy refactor can't silently drop the hook. Privacy is `DERIVED_TO_BACKEND` because PTT routes audio through the configured STT provider (matching `conversation.send_voice`'s shape).
…mansai#3090) Final quality-sweep pass for the PTT branch: - prettier --write across the new PTT TypeScript / overlay / settings files - cargo fmt --all across the new Tauri shell PTT modules, the core voice/web channel touch points, and the tests files that grew assertions for the optional speak_reply/source/session_id metadata - fix the only ESLint error on the branch: empty `catch (_) {}` in pttService onStart's preempt-race cancel — annotated with a comment explaining the orphan-session cleanup is best-effort Behaviour unchanged; this commit is purely whitespace + the catch comment.
|
Need the big picture first? Review this PR in Change Stack to see what changed before going file by file. No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (19)
✅ Files skipped from review due to trivial changes (7)
🚧 Files skipped from review as they are similar to previous changes (10)
📝 WalkthroughWalkthroughAdds complete global Push-to-Talk: Tauri hotkey expansion/registration and overlay, renderer PTT service and audio adapters, persisted PTT Redux slice and settings UI, chat metadata forwarding and core voice events, tests (unit, Vitest, E2E), i18n across locales, and a design spec and audio asset notes. ChangesGlobal Push-to-Talk
Sequence Diagram(s): sequenceDiagram
participant OS as OS (Global Hotkey)
participant Tauri as Tauri (ptt_hotkeys)
participant Renderer as Renderer (pttService)
participant Core as Core (web chat / reply speech)
OS->>Tauri: key press / release (expanded variant)
Tauri->>Renderer: emit ptt://start / ptt://stop (session_id)
Renderer->>Renderer: capture audio / finalize / transcribe
Renderer->>Core: openhuman.channel_web_chat (speak_reply, source, session_id)
Core->>Core: optionally synthesize reply (if speak_reply)
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
|
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (6)
src/openhuman/voice/bus.rs (1)
64-66: ⚡ Quick winReplace fixed sleep with a bounded receive wait.
The fixed 50ms delay is timing-sensitive and can flake on slower CI. Prefer a timeout-backed receive/poll for deterministic delivery checks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/bus.rs` around lines 64 - 66, Replace the fixed tokio::time::sleep with a timeout-backed receive: wrap the receiver.recv() call in tokio::time::timeout(Duration::from_millis(50), receiver.recv()).await, handle the timeout (Err) path and the recv result (Ok(Some/Ok/Err) depending on receiver type), and proceed only when the receive completed or timeouts deterministically; update the code around the broadcaster tick to use tokio::time::timeout and the appropriate Receiver::recv() (e.g., tokio::sync::broadcast::Receiver::recv or mpsc Receiver::recv) instead of sleeping.app/src/pages/settings/voice/PttSettingsPanel.tsx (1)
146-146: ⚡ Quick winUse namespaced debug logging instead of
console.debug.Please switch this to the app’s namespaced debug logger pattern for dev diagnostics.
As per coding guidelines, "In
app/src, use namespaceddebuglogs (e.g. from thedebugnpm package) with dev-only detail for development diagnostics".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/pages/settings/voice/PttSettingsPanel.tsx` at line 146, Replace the console.debug call in PttSettingsPanel that logs the captured shortcut with the app’s namespaced debug logger: add or reuse a debug instance (e.g. debug('app:pttSettings') or the project's existing namespace) and call that logger with the same message/args instead of console.debug; ensure the import/initializer for the debug logger is present in the PttSettingsPanel module and that the log invocation uses the shortcutString variable.app/src/components/settings/panels/VoicePanel.tsx (1)
1233-1240: 🏗️ Heavy liftSplit this panel instead of extending a 1.2k-line component.
Line 1233 adds more feature surface to an already oversized module; please extract sections (e.g., provider/routing/PTT composition) into smaller components and keep
VoicePanelas orchestration only.As per coding guidelines: “Keep TypeScript/React files at or below ~500 lines; split growing modules into smaller files.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 1233 - 1240, VoicePanel is getting too large; extract the big sub-sections (e.g., provider/routing, PTT composition, and any internal stateful blocks such as the PttSettingsPanel logic) into their own components/files so VoicePanel becomes an orchestration-only component that just composes and passes props. Concretely: create new components (e.g., PttSettingsPanel, PttProviderControls, PttRoutingControls) that encapsulate local state, handlers, and UI currently embedded in VoicePanel; move related helper hooks/utilities alongside them and export them; update VoicePanel to import and render these new components and forward any required props or callbacks (keep shared slice interactions with usePttHotkey and ptt slice usage centralized). Ensure tests/imports are updated and default exports/named exports match the new files so the behavior is unchanged while VoicePanel stays under ~500 lines.app/src/features/voice/pttAudio.ts (1)
54-55: ⚡ Quick winUse namespaced debug logger instead of
console.*in app runtime modules.Please switch these diagnostics to a namespaced
debug(...)logger to match the app logging standard.As per coding guidelines, "In
app/src, use namespaceddebuglogs (e.g. from thedebugnpm package) with dev-only detail for development diagnostics".Also applies to: 70-71, 90-90, 98-99, 112-112
app/src/features/voice/pttChimes.ts (1)
40-40: ⚡ Quick winReplace
console.debugwith namespaced debug logging.Use the shared namespaced debug pattern here as well for consistency with app diagnostics policy.
As per coding guidelines, "Use namespaced
debugfunction with dev-only detail for diagnostic logging in TypeScript/React".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/features/voice/pttChimes.ts` at line 40, Replace the plain console.debug call in pttChimes.ts with the shared namespaced debug logger used across the app: import or get the module's debug factory, create a logger for this module (e.g., const dbg = debug('voice:ptt-chime') or similar existing naming convention), and replace console.debug('[ptt-chime] play failed', { kind, err: String(err) }) with dbg('play failed', { kind, err: String(err) }) so diagnostics follow the app's namespaced/dev-only logging pattern.app/src/components/PttHotkeyManager.tsx (1)
87-90: ⚡ Quick winSwap
console.*diagnostics to namespaceddebuglogging.This component currently mixes runtime diagnostics through
console.*; align with the app logging convention.As per coding guidelines, "In
app/src, use namespaceddebuglogs (e.g. from thedebugnpm package) with dev-only detail for development diagnostics".Also applies to: 117-119, 131-131
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/PttHotkeyManager.tsx` around lines 87 - 90, Replace the raw console calls in the logger object with namespaced debug loggers: import debug from 'debug' and create namespace(s) like const log = debug('app:PttHotkeyManager') and optionally const info = debug('app:PttHotkeyManager:info'), const warn = debug('app:PttHotkeyManager:warn'); then change the object entries (the debug/info/warn arrow functions shown in the snippet) to call those debug instances (e.g. debug(msg, meta) -> log(format message and metadata) or info(JSON.stringify(meta) when present)) so logs follow the app's dev-only namespaced convention; keep the same parameter shape (msg, meta) and ensure meta is omitted or stringified when undefined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/PttHotkeyManager.tsx`:
- Around line 103-110: Listener callbacks in PttHotkeyManager.tsx call
service.onStart and service.onStop without handling rejections, which can create
unhandled promise rejections; update the 'ptt://start' and 'ptt://stop' listen
callbacks to handle promise failures from service.onStart(session_id) and
service.onStop(session_id) (e.g., append .catch(...) or use an async wrapper
with try/catch) and in the catch log the error and/or dispatch an error action
so failures are observed and do not cause unhandled rejections.
In `@app/src/lib/i18n/es.ts`:
- Around line 1558-1559: Update the i18n entry for the 'pttSettings.description'
key so the copy no longer asserts that OpenHuman will always speak replies;
instead indicate that replies are spoken only when text-to-speech is enabled
(e.g., add "si la síntesis de voz está activada" or similar). Locate the
'pttSettings.description' string in the Spanish translations and modify the
sentence to reference the optional TTS behavior so it aligns with the setting
controlled elsewhere.
In `@app/src/lib/i18n/ru.ts`:
- Around line 1536-1537: Update the translation for the key
pttSettings.description so it no longer states that OpenHuman will always speak
replies; make it neutral/conditional and reference the optional behavior
controlled by pttSettings.speakRepliesLabel (e.g., indicate replies may be
spoken if the "speak replies" setting is enabled) so the text accurately
reflects that speaking is optional.
In `@app/src/pages/settings/voice/PttSettingsPanel.tsx`:
- Around line 71-76: The serialization of key labels in PttSettingsPanel uses
e.key which is a single space for the Space key, producing visually blank
bindings; update the logic around label (the variable set from e.key) so if
label === ' ' (or other browser variants if desired) you normalize it to "Space"
before parts.push and returning parts.join('+'), ensuring the Space key is
serialized as "Space" rather than an empty-looking string.
- Around line 118-125: The keydown handler handleShortcutKeyDown currently calls
e.preventDefault()/e.stopPropagation() for every key which blocks Tab/Shift+Tab
focus navigation; update handleShortcutKeyDown in PttSettingsPanel to
early-return (do not call preventDefault/stopPropagation) when the pressed key
is Tab (e.key === 'Tab' or keyCode 9) or when Shift+Tab is detected so native
focus movement still occurs, and keep the existing
preventDefault/stopPropagation behavior for all other keys so they remain
captured as shortcut bindings.
In `@app/src/services/pttService.ts`:
- Around line 135-145: finaliseSession currently invokes
deps.resolveActiveThreadId, deps.createNewVoiceThread and deps.sendMessage
directly, allowing any thrown/rejected promise to escape; wrap the
thread-resolution/creation and sendMessage sequence in a try/catch inside
finaliseSession (around calls to resolveActiveThreadId, createNewVoiceThread and
sendMessage), log or record the caught error and ensure finaliseSession
completes (e.g., still performs session cleanup) rather than rethrowing so no
rejected promise leaks to caller event paths; reference the functions
resolveActiveThreadId, createNewVoiceThread and sendMessage when updating the
implementation.
In `@docs/superpowers/specs/2026-06-02-global-ptt-design.md`:
- Line 296: The locale count in the doc is incorrect: it says "12 non-English
locale files" but the list under the pttSettings / pttOverlay i18n change
includes 13 locales (ar, bn, de, es, fr, hi, id, it, ko, pl, pt, ru, zh-CN);
update the sentence to the correct number (13) or remove the hardcoded number
and say "the following non-English locales" so the text and the pttSettings /
pttOverlay list in app/src/lib/i18n/en.ts remain consistent.
- Around line 102-144: Add documentation for the three new Tauri IPC commands
(register_ptt_hotkey, unregister_ptt_hotkey, show_ptt_overlay) to the Tauri
shell docs: update gitbooks/developing/architecture/tauri-shell.md to list each
command signature, brief behavior summary (including error/return cases like
ConflictsWithDictation for register_ptt_hotkey), and note any lifecycle/cleanup
expectations (PttHotkeyState and ptt overlay creation/destruction); also add a
checklist/task in the spec testing section reminding implementers to update this
documentation when adding IPC commands.
In `@src/openhuman/channels/providers/web.rs`:
- Around line 929-954: The synthesize_reply(...) ReplySpeechResult is being
awaited and then discarded in the Ok branch inside the if let Ok(ref
task_result) = result block (conditioned on metadata.speak_reply); instead
capture the successful result (from
crate::openhuman::voice::reply_speech::synthesize_reply) and propagate it to the
client—either attach it to the outgoing response payload for this request or
emit the appropriate web channel event so the renderer can play TTS. Update the
branch handling the Ok(_) case to store the ReplySpeechResult and include it in
the same response/event that carries task_result.full_response (use the existing
client_id/thread_id/request_id context), and keep the log but no longer drop the
synthesized result; ensure error handling remains in the Err path.
---
Nitpick comments:
In `@app/src/components/PttHotkeyManager.tsx`:
- Around line 87-90: Replace the raw console calls in the logger object with
namespaced debug loggers: import debug from 'debug' and create namespace(s) like
const log = debug('app:PttHotkeyManager') and optionally const info =
debug('app:PttHotkeyManager:info'), const warn =
debug('app:PttHotkeyManager:warn'); then change the object entries (the
debug/info/warn arrow functions shown in the snippet) to call those debug
instances (e.g. debug(msg, meta) -> log(format message and metadata) or
info(JSON.stringify(meta) when present)) so logs follow the app's dev-only
namespaced convention; keep the same parameter shape (msg, meta) and ensure meta
is omitted or stringified when undefined.
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 1233-1240: VoicePanel is getting too large; extract the big
sub-sections (e.g., provider/routing, PTT composition, and any internal stateful
blocks such as the PttSettingsPanel logic) into their own components/files so
VoicePanel becomes an orchestration-only component that just composes and passes
props. Concretely: create new components (e.g., PttSettingsPanel,
PttProviderControls, PttRoutingControls) that encapsulate local state, handlers,
and UI currently embedded in VoicePanel; move related helper hooks/utilities
alongside them and export them; update VoicePanel to import and render these new
components and forward any required props or callbacks (keep shared slice
interactions with usePttHotkey and ptt slice usage centralized). Ensure
tests/imports are updated and default exports/named exports match the new files
so the behavior is unchanged while VoicePanel stays under ~500 lines.
In `@app/src/features/voice/pttChimes.ts`:
- Line 40: Replace the plain console.debug call in pttChimes.ts with the shared
namespaced debug logger used across the app: import or get the module's debug
factory, create a logger for this module (e.g., const dbg =
debug('voice:ptt-chime') or similar existing naming convention), and replace
console.debug('[ptt-chime] play failed', { kind, err: String(err) }) with
dbg('play failed', { kind, err: String(err) }) so diagnostics follow the app's
namespaced/dev-only logging pattern.
In `@app/src/pages/settings/voice/PttSettingsPanel.tsx`:
- Line 146: Replace the console.debug call in PttSettingsPanel that logs the
captured shortcut with the app’s namespaced debug logger: add or reuse a debug
instance (e.g. debug('app:pttSettings') or the project's existing namespace) and
call that logger with the same message/args instead of console.debug; ensure the
import/initializer for the debug logger is present in the PttSettingsPanel
module and that the log invocation uses the shortcutString variable.
In `@src/openhuman/voice/bus.rs`:
- Around line 64-66: Replace the fixed tokio::time::sleep with a timeout-backed
receive: wrap the receiver.recv() call in
tokio::time::timeout(Duration::from_millis(50), receiver.recv()).await, handle
the timeout (Err) path and the recv result (Ok(Some/Ok/Err) depending on
receiver type), and proceed only when the receive completed or timeouts
deterministically; update the code around the broadcaster tick to use
tokio::time::timeout and the appropriate Receiver::recv() (e.g.,
tokio::sync::broadcast::Receiver::recv or mpsc Receiver::recv) instead of
sleeping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 084ac58f-7897-4bd8-afe0-95263f2d401d
⛔ Files ignored due to path filters (3)
app/src/assets/audio/ptt-close.wavis excluded by!**/*.wavapp/src/assets/audio/ptt-error.wavis excluded by!**/*.wavapp/src/assets/audio/ptt-open.wavis excluded by!**/*.wav
📒 Files selected for processing (65)
app/src-tauri/src/lib.rsapp/src-tauri/src/ptt_hotkeys.rsapp/src-tauri/src/ptt_overlay.rsapp/src/App.tsxapp/src/AppRoutes.tsxapp/src/__tests__/App.boot.test.tsxapp/src/assets/audio/README.mdapp/src/components/PttHotkeyManager.tsxapp/src/components/settings/panels/VoicePanel.tsxapp/src/features/voice/pttAudio.tsapp/src/features/voice/pttChimes.tsapp/src/features/voice/pttThread.tsapp/src/features/voice/pttTranscribe.tsapp/src/hooks/usePttHotkey.tsapp/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/en.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tsapp/src/pages/PttOverlayPage.test.tsxapp/src/pages/PttOverlayPage.tsxapp/src/pages/settings/voice/PttSettingsPanel.tsxapp/src/pages/settings/voice/__tests__/PttSettingsPanel.test.tsxapp/src/services/__tests__/chatService.test.tsapp/src/services/__tests__/pttService.test.tsapp/src/services/chatService.tsapp/src/services/pttService.tsapp/src/store/__tests__/pttSlice.test.tsapp/src/store/index.tsapp/src/store/pttSlice.tsapp/src/test/test-utils.tsxapp/src/utils/tauriCommands/ptt.tsapp/test/e2e/specs/ptt-flow.spec.tsdocs/superpowers/plans/2026-06-02-global-ptt.mddocs/superpowers/specs/2026-06-02-global-ptt-design.mdsrc/core/event_bus/events.rssrc/core/event_bus/mod.rssrc/core/socketio.rssrc/openhuman/about_app/catalog_data.rssrc/openhuman/about_app/catalog_tests.rssrc/openhuman/channels/bus.rssrc/openhuman/channels/providers/web.rssrc/openhuman/channels/providers/web_tests.rssrc/openhuman/voice/bus.rssrc/openhuman/voice/mod.rssrc/openhuman/voice/reply_speech.rstests/channels_large_round25_raw_coverage_e2e.rstests/channels_provider_deep_raw_coverage_e2e.rstests/channels_provider_leftovers_raw_coverage_e2e.rstests/channels_runtime_raw_coverage_e2e.rstests/channels_web_startup_raw_coverage_e2e.rstests/channels_web_telegram_raw_coverage_e2e.rstests/channels_web_yuanbao_round22_raw_coverage_e2e.rstests/json_rpc_e2e.rstests/tools_approval_channels_raw_coverage_e2e.rstests/tools_network_channels_raw_coverage_e2e.rs
👮 Files not reviewed due to content moderation or server errors (7)
- app/src-tauri/src/ptt_hotkeys.rs
- app/src-tauri/src/ptt_overlay.rs
- app/src-tauri/src/lib.rs
- app/src/hooks/usePttHotkey.ts
- app/src/features/voice/pttThread.ts
- app/src/utils/tauriCommands/ptt.ts
- app/test/e2e/specs/ptt-flow.spec.ts
| if let Ok(ref task_result) = result { | ||
| let speak_reply = matches!(metadata.speak_reply, Some(true)); | ||
| let trimmed_response = task_result.full_response.trim(); | ||
| if speak_reply && !trimmed_response.is_empty() { | ||
| let opts = crate::openhuman::voice::reply_speech::ReplySpeechOptions::default(); | ||
| match crate::openhuman::voice::reply_speech::synthesize_reply( | ||
| &config, | ||
| &task_result.full_response, | ||
| &opts, | ||
| ) | ||
| .await | ||
| { | ||
| Ok(_) => log::debug!( | ||
| "[web_channel] reply_speech dispatched chars={} client_id={} thread_id={} request_id={}", | ||
| task_result.full_response.len(), | ||
| client_id, | ||
| thread_id, | ||
| request_id, | ||
| ), | ||
| Err(err) => log::warn!( | ||
| "[web_channel] reply_speech failed: {err} client_id={} thread_id={} request_id={}", | ||
| client_id, | ||
| thread_id, | ||
| request_id, | ||
| ), | ||
| } |
There was a problem hiding this comment.
speak_reply synthesis result is dropped before it can be consumed.
On Line 915 the comment says this path lets the renderer play TTS, but the synthesize_reply(...) result is only logged and discarded. No event/response path carries ReplySpeechResult to the client, so playback cannot occur from this branch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/channels/providers/web.rs` around lines 929 - 954, The
synthesize_reply(...) ReplySpeechResult is being awaited and then discarded in
the Ok branch inside the if let Ok(ref task_result) = result block (conditioned
on metadata.speak_reply); instead capture the successful result (from
crate::openhuman::voice::reply_speech::synthesize_reply) and propagate it to the
client—either attach it to the outgoing response payload for this request or
emit the appropriate web channel event so the renderer can play TTS. Update the
branch handling the Ok(_) case to store the ReplySpeechResult and include it in
the same response/event that carries task_result.full_response (use the existing
client_id/thread_id/request_id context), and keep the log but no longer drop the
synthesized result; ensure error handling remains in the Err path.
There was a problem hiding this comment.
You are correct that the synthesize_reply return value is dropped at the Ok(_) match arm. However, this is intentional for this PR's scope and not a regression.
The PTT TTS delivery path in this PR works via the renderer-side useHumanMascot hook: when chat_done fires and speakRef.current is true, the hook calls synthesizeSpeech (→ openhuman.voice_reply_synthesize RPC) directly from the frontend to fetch the audio and play it with viseme sync. This path is already wired and functional when the mascot page is open.
The synthesize_reply call in web.rs is a parallel synthesis intended for the background case (user in another app, mascot not rendered), where the core would need to emit a WebChannelEvent carrying the audio_base64 so the renderer can play it via a dedicated socket listener. That WebChannelEvent variant, socket event name, and frontend listener do not exist yet — adding them is a follow-up tracked by the existing TODO comment.
For the scope of this PR (landing the PTT state machine, hotkey registration, overlay, and settings panel), the core-side synthesis call is a non-functional placeholder that ensures the speak_reply flag round-trips through the schema and the synthesize_reply code path is exercised by the test seam in the E2E suite. Dropping the result rather than propagating it is safe because the renderer already handles TTS independently via the mascot hook for the active-window case.
A follow-up issue will add the WebChannelEvent::PttReplySpeech variant + socket event + frontend listener to complete the background-TTS delivery path.
There was a problem hiding this comment.
@CodeGhost21, thanks for the detailed walkthrough — that makes the two-path design clear:
- Active-window path (mascot rendered):
chat_done→ renderer'suseHumanMascot→openhuman.voice_reply_synthesizeRPC → viseme-synced playback. - Background path (mascot not rendered):
synthesize_replyinweb.rspre-synthesizes the audio server-side, but the delivery mechanism (WebChannelEvent::PttReplySpeech+ socket event + frontend listener) is a follow-up.
The placeholder call exercising the test seam and validating the schema round-trip makes sense for this PR's scope. I'll withdraw my concern.
Would you like me to open a follow-up GitHub issue to track the background-TTS delivery path (WebChannelEvent::PttReplySpeech + socket event + frontend listener)?
✏️ Learnings added
Learnt from: CodeGhost21
Repo: tinyhumansai/openhuman PR: 3349
File: src/openhuman/channels/providers/web.rs:1233-1258
Timestamp: 2026-06-05T20:57:51.915Z
Learning: In `src/openhuman/channels/providers/web.rs`, the `synthesize_reply(...)` call inside `run_chat_task` (conditioned on `metadata.speak_reply == Some(true)`) intentionally drops its `ReplySpeechResult`. The active-window TTS path is handled entirely on the renderer side via the `useHumanMascot` hook: on `chat_done`, if `speakRef.current` is `true`, the hook calls `openhuman.voice_reply_synthesize` RPC directly for viseme-synced playback. The core-side `synthesize_reply` call is a placeholder for the future background TTS path (user in another app, mascot not rendered), which requires a not-yet-existing `WebChannelEvent::PttReplySpeech` variant, socket event name, and frontend listener. Do NOT flag the dropped `Ok(_)` result as a bug — it is a deliberate PR-scoped placeholder.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 3173
File: app/src/pages/ios/MascotScreen.tsx:208-208
Timestamp: 2026-06-02T02:51:02.003Z
Learning: In `app/src/pages/ios/MascotScreen.tsx`, iOS TTS is handled by the native PTT plugin `speak()` (from `tauri-plugin-ptt-api`), which bypasses `useHumanMascot`'s built-in ElevenLabs TTS pipeline entirely. As a result, `visemeCode` from `useHumanMascot({ listening: pttActive })` only animates during streamed text deltas, not during actual audio playback. Plumbing native speech timing/visemes into the hook requires changes to the Swift PTT plugin to emit viseme events — a separate, larger effort deferred to a future PR.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 3317
File: src/openhuman/agent/harness/engine/core.rs:120-160
Timestamp: 2026-06-03T20:45:28.519Z
Learning: In `src/openhuman/channels/providers/web.rs`, `start_chat`'s `enforce_prompt_input(...)` gate (around line 502) runs before the `queue_mode` branching (around line 585). Steer/collect/followup messages pushed into the `RunQueue` are already prompt-enforcement-validated before they enter the queue. The injection/drain path in `src/openhuman/agent/harness/engine/core.rs` appends pre-validated payloads to history — it is NOT a prompt-enforcement bypass. Do not flag the engine run-queue drain block as bypassing `enforce_prompt_input`.
Learnt from: zahica1234
Repo: tinyhumansai/openhuman PR: 0
File: :0-0
Timestamp: 2026-05-29T13:56:33.003Z
Learning: In `app/src/pages/Conversations.tsx` (tinyhumansai/openhuman), the silence-timer rearm effect must scope its reactive dependencies to `activeThreadId` (not `pendingSendingThreadId`). `pendingSendingThreadId` is released the moment the backend accepts the send, which causes deps to go `undefined` mid-turn and breaks the rearm logic for long-running sub-agent turns. `activeThreadId` is set on send and cleared only on done/error/timeout, so it correctly tracks the in-flight turn for the timer's entire lifetime.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 1355
File: app/src-tauri/src/meet_audio/listen_capture.rs:117-129
Timestamp: 2026-05-08T03:01:20.719Z
Learning: In `app/src-tauri/src/meet_audio/listen_capture.rs`, the CEF-audio capture path is legacy/inactive code. It is stored as the `_legacy_listen` field on `MeetAudioSession` in `app/src-tauri/src/meet_audio/mod.rs`. The active listen path was pivoted to Meet's built-in captions (captions bridge) in commit 996bed70. Review issues specific to listen_capture.rs chunking, backpressure, or resampler behavior should be deferred unless the CEF-audio path is explicitly re-enabled.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 0
File: :0-0
Timestamp: 2026-06-04T15:15:38.156Z
Learning: In `tinyhumansai/openhuman` PR `#3324`, the `run_workflow` spawn-backstop counter correctness (ensuring the atomic backstop counter is decremented on every code path, including early-return errors before the background task is spawned) was intentionally deferred to a follow-up PR per senamakel. Future reviews should check that `SPAWN_BACKSTOP_COUNT` is decremented in all error/early-exit branches in `src/openhuman/agent/tools/run_workflow.rs`.
Learnt from: zahica1234
Repo: tinyhumansai/openhuman PR: 0
File: :0-0
Timestamp: 2026-05-29T13:56:33.003Z
Learning: In `app/src/pages/Conversations.tsx` (tinyhumansai/openhuman), `prevInferenceStatusRef` must be explicitly reset to `undefined` in three places to avoid stale-state mis-detection on a new send: (1) the 120 s timeout-fire block, (2) the `chatSend` catch handler, and (3) at send initiation immediately before `armSilenceTimer`. Without this, a new send can read the prior turn's last status and mis-classify the first `defined → undefined` transition as a chat_done, clearing the freshly-armed timer.
Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 3016
File: src/openhuman/tools/impl/presentation/engine.rs:78-130
Timestamp: 2026-05-30T19:57:11.205Z
Learning: In `src/openhuman/tools/impl/presentation/engine.rs`, the `generate()` function intentionally uses `tokio::time::timeout` around `tokio::task::spawn_blocking` for PPTX generation. The blocking thread is not cancelled on timeout (tokio cannot cancel spawn_blocking mid-run), but this is an accepted trade-off because the `ppt-rs` work is CPU-bound + pure (no I/O, no external state, no shared mutable state), and is bounded in practice by the 64-slide cap (typically <100ms). The "agent-visible deadline" — the call returning to the orchestrator within bound — is the user-facing contract. Do not flag this as a cancellation bug in future reviews of this file.
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 283
File: app/test/e2e/specs/skill-execution-flow.spec.ts:67-114
Timestamp: 2026-04-03T06:28:58.632Z
Learning: In `app/test/e2e/specs/skill-execution-flow.spec.ts`, the model→agent→tool→conversation E2E path (model-triggered tool_calls) is intentionally deferred to a future PR. It is tracked via an `it.skip` placeholder at the end of the spec. The blocker is that the mock backend does not yet support returning structured `tool_calls` in chat completion responses. The current spec only covers the deterministic skill runtime RPC surface: `skills_start` → `skills_list_tools` → `skills_call_tool` → `skills_stop`.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 1171
File: src/openhuman/memory/tree/tree_source/summariser/llm.rs:118-125
Timestamp: 2026-05-04T06:47:39.980Z
Learning: In `src/openhuman/memory/tree/tree_source/summariser/llm.rs` (and the sibling files `score/extract/llm.rs` and `score/embed/ollama.rs`), the reqwest Client is intentionally built with `connect_timeout` only — no body-read timeout. This is because: (1) Ollama runs locally on CPU-only mode (AMD-Vulkan partial-offload is forced off via `num_gpu=0`), where a single `/api/chat` call can take 60-180 seconds; (2) reqwest/hyper mid-stream future cancellation triggered a multi-GB heap retention storm in internal state machines (empirically verified with OOM crashes); (3) the 3× retry-with-backoff loop would re-run the same slow prompt, multiplying load until the sidecar OOM'd. A stuck local Ollama blocks at most one worker behind `Semaphore::new(1)`; the queue-backed async seal cascade keeps other workers running. The `InertSummariser` fallback is scoped to transport errors (connect refused), not "Ollama is slow but alive". Do NOT flag the absence of a body-read timeout or suggest adding `tokio::time::timeout` here — it would reproduce the exact cancellation-path heap leak that was the root cause.
Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 2149
File: src/openhuman/tools/impl/system/pushover.rs:0-0
Timestamp: 2026-05-19T06:36:19.217Z
Learning: In `src/openhuman/tools/impl/system/pushover.rs`, `PushoverTool` intentionally does NOT override `external_effect()` (defaults to `false`). The `ApprovalGate` runs before `execute()`, so marking `external_effect=true` would park a pending approval for calls that `security.can_act()` / `security.record_action()` inside `execute()` would immediately reject — a dead-end approval flow. The follow-up (`#1339-v3`) is to thread `SecurityPolicy` into the gate pre-check so `external_effect` can be flipped to `true` without generating ghost approvals. Do not flag the missing `external_effect=true` on `PushoverTool` as a bug.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 2112
File: app/src-tauri/permissions/allow-core-process.toml:103-122
Timestamp: 2026-05-20T02:56:09.546Z
Learning: In the `tinyhumansai/openhuman` repository, splitting `logs_folder_path`/`reveal_logs_folder` and `meet_call_open_window`/`meet_call_close_window` out of the `allow-core-process` permission bundle (`app/src-tauri/permissions/allow-core-process.toml`) into dedicated narrower permission identifiers (e.g., `allow-logs-access` and `allow-meet-window-control`) is a known follow-up task (deferred from PR `#2112`). It requires: (1) creating new permission TOML files, (2) updating capability grants in `app/src-tauri/capabilities/` to bind them only to the UI surfaces that need them, and (3) verifying no `allow-core-process` consumer regresses.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 1350
File: app/src-tauri/src/lib.rs:1261-1291
Timestamp: 2026-05-08T00:42:31.195Z
Learning: In tinyhumansai/openhuman, the Chromium flags `--use-fake-device-for-media-stream`, `--use-fake-ui-for-media-stream`, and `--use-file-for-fake-video-capture` are intentionally applied at the CEF process level (in `app/src-tauri/src/lib.rs` inside the `run()` function's `command_line_args` builder). This is by design: the mascot Y4M is meant to be the agent's camera identity in *every* CEF webview (Meet, Slack Huddles, Discord voice, etc.), not scoped to the Meet window alone. The CDP `audioCapture`/`videoCapture` permission grants in `cdp/session.rs` control *whether* media is requested per-webview; the fake-device flag only changes the underlying device factory. The long-term upgrade path is a macOS CoreMediaIO Extension (real OS-level virtual camera) tracked as a follow-up PR. Do not flag these process-level fake-camera flags as a scope issue.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.
Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 571
File: src/openhuman/local_ai/service/whisper_engine.rs:69-80
Timestamp: 2026-04-14T19:59:04.826Z
Learning: When reviewing Rust code in this repo that uses the upstream `whisper-rs` crate (v0.16.0), do not report `WhisperContextParameters::use_gpu(...)` or `WhisperContextParameters::flash_attn(...)` as missing/invalid APIs. These builder-style methods exist upstream and return `&mut Self`; they are not limited to `WhisperVadContextParams`.
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 1078
File: src/openhuman/agent/agents/welcome/prompt.rs:24-24
Timestamp: 2026-05-01T13:41:00.958Z
Learning: For Rust code under `src/openhuman/**/*.rs`, use `snake_case` for local variables (not `camelCase`). If a local variable name is written in `camelCase`, treat it as a style/lint issue because it will trigger Rust’s `non_snake_case` warning (and related clippy linting, if enabled). Avoid suggesting `camelCase` for any Rust local variable names in this repository.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 1173
File: tests/agent_memory_loader_public.rs:88-88
Timestamp: 2026-05-04T06:50:47.877Z
Learning: In this repository, the general camelCase naming guideline should not be applied to Rust source files. For all .rs files, Rust function (and related) names should use snake_case, and snake_case Rust function names should not be flagged—even for async test functions annotated with attributes like #[tokio::test]. This is consistent with Rust’s non_snake_case lint behavior.
Resolves 18 conflicts from upstream commits c3b9b2d (active-run steering), db0307d (ArtifactPending), 3a7180d (explicit turn origin + fail-closed approval gate), ed3c6d7 (backend_meet), 71e04ea + 307c8e2 (artifact panel + ArtifactCard). PTT-side changes preserved: ChatRequestMetadata threading through channel_web_chat -> start_chat -> spawn_progress_bridge, VoiceEvent + DomainEvent::Voice, reply_speech invocation on speak_reply, test seam short-circuit + E2E test, frontend chatService forwarding, pttSlice + persistence. start_chat signature: queue_mode comes BEFORE metadata in the merged trailing args. dispatch_followups call site also updated to pass ChatRequestMetadata::default(). channel_web_chat schema gains all five new fields (speak_reply, source, session_id, queue_mode). WebChatParams gains queue_mode alongside the PTT trio. Verified: cargo check (lib + tests + tauri shell) clean, pnpm compile clean, json_rpc_channel_web_chat_with_speak_reply_invokes_reply_speech passes, voice::bus::tests passes, full json_rpc_e2e: 78 pass + 1 pre-existing port-conflict flake unrelated to merge.
# Conflicts: # app/src/App.tsx # app/src/store/index.ts # src/core/event_bus/events.rs # src/openhuman/channels/providers/web.rs # tests/json_rpc_e2e.rs
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 515-526: The current catch reverts local settings even if
openhumanUpdateVoiceServerSettings succeeded but syncNotchVisibility failed,
causing UI/server mismatch; change the flow in the toggle handler (the code
calling openhumanUpdateVoiceServerSettings, syncNotchVisibility, and
setSettings) to handle each RPC independently: first call
openhumanUpdateVoiceServerSettings and only revert setSettings if that call
fails; if syncNotchVisibility fails after a successful server update, do not
revert settings — instead log the error, attempt a retry or call a dedicated
recovery function to update the notch state (or schedule a background sync), so
setSettings reflects the persisted server value while still surfacing sync
failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7ba2c00-6a71-4487-b0d0-6d053a1f2f54
📒 Files selected for processing (25)
app/src-tauri/src/lib.rsapp/src/App.tsxapp/src/AppRoutes.tsxapp/src/components/settings/panels/VoicePanel.tsxapp/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/en.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tsapp/src/store/index.tsapp/src/test/test-utils.tsxsrc/core/event_bus/events.rssrc/core/socketio.rssrc/openhuman/about_app/catalog_data.rssrc/openhuman/about_app/catalog_tests.rssrc/openhuman/agent/task_dispatcher.rs
💤 Files with no reviewable changes (11)
- src/openhuman/agent/task_dispatcher.rs
- app/src/test/test-utils.tsx
- src/core/socketio.rs
- src/openhuman/about_app/catalog_data.rs
- app/src/lib/i18n/zh-CN.ts
- app/src/lib/i18n/pt.ts
- app/src/lib/i18n/ru.ts
- src/core/event_bus/events.rs
- app/src/store/index.ts
- src/openhuman/about_app/catalog_tests.rs
- app/src/lib/i18n/pl.ts
✅ Files skipped from review due to trivial changes (7)
- app/src/lib/i18n/en.ts
- app/src/lib/i18n/de.ts
- app/src/lib/i18n/ko.ts
- app/src/lib/i18n/id.ts
- app/src/lib/i18n/it.ts
- app/src/lib/i18n/ar.ts
- app/src/lib/i18n/bn.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src/AppRoutes.tsx
- app/src/lib/i18n/hi.ts
- app/src/lib/i18n/fr.ts
- app/src-tauri/src/lib.rs
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 515-526: The current catch reverts local settings even if
openhumanUpdateVoiceServerSettings succeeded but syncNotchVisibility failed,
causing UI/server mismatch; change the flow in the toggle handler (the code
calling openhumanUpdateVoiceServerSettings, syncNotchVisibility, and
setSettings) to handle each RPC independently: first call
openhumanUpdateVoiceServerSettings and only revert setSettings if that call
fails; if syncNotchVisibility fails after a successful server update, do not
revert settings — instead log the error, attempt a retry or call a dedicated
recovery function to update the notch state (or schedule a background sync), so
setSettings reflects the persisted server value while still surfacing sync
failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7ba2c00-6a71-4487-b0d0-6d053a1f2f54
📒 Files selected for processing (25)
app/src-tauri/src/lib.rsapp/src/App.tsxapp/src/AppRoutes.tsxapp/src/components/settings/panels/VoicePanel.tsxapp/src/lib/i18n/ar.tsapp/src/lib/i18n/bn.tsapp/src/lib/i18n/de.tsapp/src/lib/i18n/en.tsapp/src/lib/i18n/es.tsapp/src/lib/i18n/fr.tsapp/src/lib/i18n/hi.tsapp/src/lib/i18n/id.tsapp/src/lib/i18n/it.tsapp/src/lib/i18n/ko.tsapp/src/lib/i18n/pl.tsapp/src/lib/i18n/pt.tsapp/src/lib/i18n/ru.tsapp/src/lib/i18n/zh-CN.tsapp/src/store/index.tsapp/src/test/test-utils.tsxsrc/core/event_bus/events.rssrc/core/socketio.rssrc/openhuman/about_app/catalog_data.rssrc/openhuman/about_app/catalog_tests.rssrc/openhuman/agent/task_dispatcher.rs
💤 Files with no reviewable changes (11)
- src/openhuman/agent/task_dispatcher.rs
- app/src/test/test-utils.tsx
- src/core/socketio.rs
- src/openhuman/about_app/catalog_data.rs
- app/src/lib/i18n/zh-CN.ts
- app/src/lib/i18n/pt.ts
- app/src/lib/i18n/ru.ts
- src/core/event_bus/events.rs
- app/src/store/index.ts
- src/openhuman/about_app/catalog_tests.rs
- app/src/lib/i18n/pl.ts
✅ Files skipped from review due to trivial changes (7)
- app/src/lib/i18n/en.ts
- app/src/lib/i18n/de.ts
- app/src/lib/i18n/ko.ts
- app/src/lib/i18n/id.ts
- app/src/lib/i18n/it.ts
- app/src/lib/i18n/ar.ts
- app/src/lib/i18n/bn.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src/AppRoutes.tsx
- app/src/lib/i18n/hi.ts
- app/src/lib/i18n/fr.ts
- app/src-tauri/src/lib.rs
🛑 Comments failed to post (1)
app/src/components/settings/panels/VoicePanel.tsx (1)
515-526:
⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPotential state inconsistency between UI and server.
If
openhumanUpdateVoiceServerSettingssucceeds butsyncNotchVisibilityfails, the catch block reverts the localsettingsstate, but the server retains the updated value. Users would see the toggle as disabled when the backend actually has it enabled.Consider handling each RPC failure independently:
🔀 Suggested approach
try { await openhumanUpdateVoiceServerSettings({ always_on_enabled: next }); + setSavedSettings(current => + current ? { ...current, always_on_enabled: next } : current + ); // The notch pill is the always-on listening HUD: show it // when listening is enabled, drop it when disabled. - await syncNotchVisibility(next); + try { + await syncNotchVisibility(next); + } catch (notchErr) { + console.error('[VoicePanel] notch sync failed (non-critical)', notchErr); + // Keep toggle in sync with server; notch visibility is cosmetic + } } catch (err) { // Revert on failure so the UI reflects the persisted value. setSettings(current => current ? { ...current, always_on_enabled: !next } : current ); console.error('[VoicePanel] failed to toggle always-on', err); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 515 - 526, The current catch reverts local settings even if openhumanUpdateVoiceServerSettings succeeded but syncNotchVisibility failed, causing UI/server mismatch; change the flow in the toggle handler (the code calling openhumanUpdateVoiceServerSettings, syncNotchVisibility, and setSettings) to handle each RPC independently: first call openhumanUpdateVoiceServerSettings and only revert setSettings if that call fails; if syncNotchVisibility fails after a successful server update, do not revert settings — instead log the error, attempt a retry or call a dedicated recovery function to update the notch state (or schedule a background sync), so setSettings reflects the persisted server value while still surfacing sync failures.
…addresses @coderabbitai on PttHotkeyManager.tsx:110) Adds .catch() handlers to the ptt://start and ptt://stop listener callbacks so failures from service.onStart/onStop are logged rather than becoming unhandled rejections. Routes logging through the debug package (app:ptt:manager namespace) instead of console.warn, consistent with the rest of the file.
…s all 14 locales (addresses @coderabbitai on es.ts:1586, ru.ts:1564) The PTT description incorrectly said OpenHuman always speaks replies after release. The speakReplies toggle makes TTS optional, so the copy now reads "if 'Speak agent replies' is on, OpenHuman speaks the reply" in all 14 locales. en.ts updated as source of truth; all 13 non-English locales carry real translations.
…rough (addresses @coderabbitai on PttSettingsPanel.tsx:76,125) - Normalize e.key === ' ' to 'Space' before building the shortcut string so the saved binding reads "Space" or "Ctrl+Space" rather than a blank character that renders invisibly in the input. - Guard Tab / Shift+Tab at the top of handleShortcutKeyDown so they are not captured as bindings and keyboard focus navigation within the settings panel continues to work normally.
… session finalization (addresses @coderabbitai on pttService.ts:145) resolveActiveThreadId, createNewVoiceThread, and sendMessage can all throw. Previously a failure would leave the state machine in a clean state but with no user feedback. Now each failure path logs a warn, plays the error chime, and returns early so the session completes visibly to the user rather than silently.
…tauri-shell.md (addresses @coderabbitai on spec:296, spec:144) - spec: fix "12 non-English locale files" → "13" (ar, bn, de, es, fr, hi, id, it, ko, pl, pt, ru, zh-CN = 13 locales). - tauri-shell.md: add "Push-to-talk (PTT) hotkey + overlay" section documenting register_ptt_hotkey, unregister_ptt_hotkey, and show_ptt_overlay IPC commands including their signatures, event flow, and conflict detection behavior.
sanil-23
left a comment
There was a problem hiding this comment.
Code Review: Global push-to-talk hotkey (#3090)
Recommendation: 🔴 Request changes (posted as a comment).
Impressive, well-structured PR — the Rust hotkey state machine, the dependency-injected pttService, and the unit tests on the pure units (slice, state machine, schema) are high quality. But it is not mergeable as-is for three independent reasons, plus one substantive functional gap.
Blockers
- All 9 CodeRabbit review threads are still unresolved, and I verified every one is still present in current HEAD (
b4796d0). TheCHANGES_REQUESTEDreview stands; none of the findings were applied. - CI Coverage Gate fails — 46% on changed lines vs. the required ≥80% (hard merge gate per CLAUDE.md).
- Merge conflicts — GitHub reports
mergeable: CONFLICTING/DIRTY. Needs a rebase onmain.
Outstanding CodeRabbit concerns (all unresolved, all still valid)
| # | Location | Severity | Still present? |
|---|---|---|---|
| 1 | PttHotkeyManager.tsx:110 |
Major | ✅ void service.onStart(...)/onStop(...) still have no .catch — unhandled-rejection risk, compounded by #6 |
| 2 | es.ts pttSettings.description |
Minor | ✅ Still "dice la respuesta en voz alta" (unconditional; speaking is optional) |
| 3 | ru.ts pttSettings.description |
Minor | ✅ Still "озвучит ответ" (unconditional). Note: en.ts has the same unconditional "speaks the reply back" wording and should also be fixed |
| 4 | PttSettingsPanel.tsx:76 |
Major | ✅ eventToShortcut never normalizes Space → blank binding like Ctrl+ |
| 5 | PttSettingsPanel.tsx:125 |
Major | ✅ handleShortcutKeyDown calls preventDefault() unconditionally — Tab/Shift+Tab can't escape the field (a11y / keyboard-nav blocker) |
| 6 | pttService.ts:145 |
Major | ✅ resolveActiveThreadId/createNewVoiceThread/sendMessage not wrapped in try/catch; a throw escapes finaliseSession → escapes onStop → unhandled rejection. 'ppt') — don't apply verbatim |
| 7 | spec doc :144 |
Major | ✅ gitbooks/developing/architecture/tauri-shell.md untouched; CLAUDE.md requires the 3 new IPC commands (register_ptt_hotkey, unregister_ptt_hotkey, show_ptt_overlay) to be documented there |
| 8 | spec doc :296 |
Minor | ✅ Says "12 non-English locale files" but lists 13 |
| 9 | web.rs:1254 |
Major | ✅ See below |
#9 — speak_reply is a no-op for actual playback
In run_chat_task, when speak_reply is set the code calls voice::reply_speech::synthesize_reply(...) and only log::debug!/warn! on the result — the returned ReplySpeechResult { audio_base64, visemes, ... } is dropped. synthesize_reply emits no socket/bus event (other callers consume the returned value directly), so there is no path carrying the audio to the renderer. Net effect: enabling "Speak agent replies" makes a paid backend TTS call on every PTT turn and discards the audio. The chatService.ts doc comment ("the core will synthesize the agent reply via TTS and stream audio back") describes behavior that doesn't exist. Real feature gap — please wire the audio to the renderer or scope speak_reply out of this PR.
Additional findings (beyond CodeRabbit)
Should-fix
- Coverage: renderer glue is entirely untested —
PttHotkeyManager.tsx,pttAudio.ts,pttChimes.ts,pttThread.ts,pttTranscribe.ts,usePttHotkey.ts,utils/tauriCommands/ptt.tsall at 0%. That's exactly where bugs #1/#4/#5/#6 live; tests here would catch several of them.
Non-blocking
- Dead error variants:
PttError::UnsupportedOnWaylandandPttError::RegistrationFailedare defined but never constructed. The Wayland message is localized/advertised inen.tsbut the core never returns it — on Wayland the user gets the raw plugin error. Either wire up detection or drop the variant + copy. held_mshardcoded0(andfinalized_by_watchdog: false) in thePttTranscriptCommittedpublish — documentedTODO, but the event is currently low-value for subscribers.- Overlay window loads the full
index.html#/ptt-overlay(a second complete React/CEF webview) to render a 160×56 dot. Functional and lazily created/destroyed, but heavy — worth a follow-up. - Stale TODO in
PttOverlayPage.tsxsays i18n keys are "added in T13" butpttOverlay.listening/idlealready exist across all locales.
What's good
expand_ptt_shortcutsrejecting empty/modifier-only bindings with thorough tests and per-OSCmdOrCtrlexpansion.- Solid hotkey lifecycle: CAS-guarded
is_heldto drop OS key-repeat, register/unregister with rollback on failure, and bidirectional conflict detection with the dictation hotkey. isHeldexcluded from redux-persist + reset on mount — thoughtful crash-safety.pttServicepreemption/watchdog/stale-session logic is carefully reasoned and well-tested.- Env-gated
test_seaminreply_speech.rsis a clean way to make the path testable fromtests/. ChatRequestMetadatathreaded through allstart_chatcall sites with::default()— no behavior change for existing callers;#[non_exhaustive]VoiceEventand capability-catalog entry follow conventions.
Bottom line
Strong foundation. To get to merge: resolve the 9 CodeRabbit threads (especially #9), bring diff coverage to ≥80% by testing the renderer glue, update tauri-shell.md, fix the unconditional TTS copy (en/es/ru), and rebase to clear conflicts.
Review assisted by Claude Code.
Prettier normalized quote styles in fr.ts, it.ts, ko.ts after i18n content edits introduced double-quote strings alongside single-quote strings.
Summary
tauri-plugin-global-shortcutuniform code path (deliberately different from dictation's OS-forked rdev/Tauri-plugin path)./ptt-overlay.speak_reply/source/session_idadditive optional fields onchannel.web_chat; backwards-compatible — older clients are unaffected.ptt.*andpttOverlay.*i18n keys with real translations across all 13 locales; newvoice.pttentry in the about-app capability catalog.Problem
OpenHuman had voice input on iOS (
mobile.push_to_talk) and dictation on macOS but nothing global on the desktop — users had to focus the chat window before talking. Issue #3090 asks for a Clicky-style hold-to-talk shortcut that works from any window without stealing focus.The dictation path is OS-forked (rdev on macOS, tauri plugin on Windows/Linux) and inserts text into the focused field. Reusing it for PTT would have forced a focus switch and inherited two code paths to maintain. PTT instead routes through a single
tauri-plugin-global-shortcutregistration and dispatches the transcript directly into the active chat thread viachannel.web_chat, so behaviour is uniform across macOS, Windows, and Linux/X11.Solution
app/src-tauri/src/ptt_hotkeys.rs) — parses the configured shortcut string (Ctrl+Shift+Space,Cmd+Alt+M, …), validates modifier-only and empty tokens, fails registration when the binding collides with the dictation hotkey, and CAS-guards press/release so a stuck OS modifier can't drive the state machine into both "held" and "released" at once.app/src-tauri/src/ptt_overlay.rs) — lazy borderless always-on-top window; created on first register, destroyed on unregister so we don't keep a hidden window alive when PTT is off. macOS-gatedaccept_first_mouseandtag-overlay-errordebug logging tighten reviewer findings.src/openhuman/voice/bus.rs) — publishesDomainEvent::Voice::PttTranscriptCommittedon successful commit; re-exported asvoice::publishso the channel layer doesn't reach into the event bus directly.src/openhuman/channels/providers/web.rs) —ChatRequestMetadatacarries the new optional fields;speak_reply=truetriggersreply_speech::invokeand publishes the PTT-committed event for downstream subscribers (analytics, mascot).app/src/services/pttService.ts) — state machine (idle→arming→held→committing), 10s watchdog timer for swallowed release events, preempt-race CAS guard so a fresh start can't orphan an in-flight stop, and a Tauri-only test seam.pttSlicestores the binding + enabled toggle;PttSettingsPanellets the user pick a shortcut, surfaces conflict-with-dictation errors with localized messages, and explains overlay/exclusive-fullscreen behaviour inline.voice.pttentry insrc/openhuman/about_app/catalog_data.rs(Conversation category,DERIVED_TO_BACKENDprivacy because audio routes through the configured STT provider, same shape asconversation.send_voice).Submission Checklist
pttSlice,pttService,chatService.speak_replyforwarding,PttOverlayPage,PttSettingsPanel; Tauri shell tests forptt_hotkeysparser/conflict/state; newcapability_list_includes_voice_ptttest pinning the catalog entry.diff-cover) meet the gate enforced by.github/workflows/pr-ci.yml. New code lives in dedicated PTT modules with co-located tests;pnpm test:coveragewas not re-run locally on this branch tip (cold full-suite cost) — the CI gate will compute and enforce the actual percentage, and visual review of the new modules shows the touched lines are reached by the listed Vitest + cargo tests.N/A: matrix tracks higher-level feature rows and the new PTT feature sits inside the existing Voice / Conversation rows that already exist.## Related— see theClosesline below.docs/RELEASE-MANUAL-SMOKE.md) —N/A: PTT is opt-in via Settings → Voice and does not affect the existing release-cut smoke flows; reviewers are asked to bind a hotkey and confirm overlay + transcript dispatch.Closes #NNNin the## Relatedsection — seeCloses #3090below.Impact
tauri-plugin-global-shortcut. The iOS client and headless deployments are unaffected (the hotkey manager is not mounted on those targets).channel.web_chataccepts three new optional metadata fields; clients that don't send them get the existing behaviour. Tests pin the schema so a future tightening can't silently break older callers.conversation.send_voice. Capability catalog entry usesDERIVED_TO_BACKENDprivacy so the Privacy surface reflects this. Microphone permission is requested on first use.Related
app/test/e2e/specs/ptt-flow.spec.tslands here but is not executed in CI yet — cold-build cost — and will be wired into the e2e matrix as a follow-up).AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
feat/global-ptt-3090feat(about_app)+style(ptt)final-sweep commits)Validation Run
pnpm --filter openhuman-app format:check— clean (cargo fmt + prettier both pass after the final-sweep commit)pnpm typecheck— clean (no TS errors)pnpm debug rust capability_list_includes_voice_ptt✓ ; Vitest unit suite — all 4688 tests green on the final re-run (one earlier run flaked on a pre-existingConversations.rendertest unrelated to PTT, did not reproduce);pnpm i18n:checkandpnpm i18n:english:checkboth no-worse-than-main (main: 1312 unexpected English; this branch: 1285 — branch improves the i18n posture).cargo fmt --checkclean,cargo check --manifest-path Cargo.tomlclean.cargo fmt --manifest-path app/src-tauri/Cargo.toml --checkclean,pnpm rust:checkclean.Validation Blocked
command:pnpm test:coverage(full coverage matrix)error:not run on the final commit — cold-build cost exceeded the local time budget for this PR; the CIpr-ci.ymljob will compute and enforce the actual diff-coverage gate.impact:low — the new files are dedicated PTT modules with co-located unit tests (pttSlice,pttService,chatService.speak_reply,PttOverlayPage,PttSettingsPanel,ptt_hotkeys, voice/bus, channel-web schema, capability catalog), and the CI gate will block merge if diff-coverage falls below 80%.Behavior Changes
speak_replyplays the agent's TTS reply.Parity Contract
mobile.push_to_talkand the macOS/Windows dictation paths are untouched. The new desktop PTT path is uniform across platforms and does not collide with dictation (registration fails fast with a localized error when the bindings overlap).ChatRequestMetadatadefaults preserve the pre-PTT request shape; older callers that do not sendspeak_reply/source/session_idsee the exact same channel behaviour as before.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes / UX
Tests
Localization
Documentation