Skip to content

feat(file_state): guard cross-agent file edits from stale sibling writes#3316

Merged
senamakel merged 1 commit into
tinyhumansai:mainfrom
senamakel:issue/3253-guard-cross-agent-file-edits-from-stale
Jun 3, 2026
Merged

feat(file_state): guard cross-agent file edits from stale sibling writes#3316
senamakel merged 1 commit into
tinyhumansai:mainfrom
senamakel:issue/3253-guard-cross-agent-file-edits-from-stale

Conversation

@senamakel
Copy link
Copy Markdown
Member

@senamakel senamakel commented Jun 3, 2026

Summary

  • Introduce a process-wide FileStateCoordinator that tracks per-agent read stamps and per-path write stamps across parallel subagents.
  • Write tools (file_write, edit, apply_patch) detect and reject writes based on stale sibling edits or partial reads.
  • edit and apply_patch acquire per-path async locks to serialize read-modify-write sections.
  • spawn_parallel_agents annotates results with stale_parent_reads when child writes invalidate the parent's view.
  • Opt-out via OPENHUMAN_FILE_STATE_GUARD=0 (or false/off/no).

Problem

Parallel subagents and worker threads share a workspace. Without coordination, one worker can read a file, a sibling can edit it, and the first worker later writes based on stale content. Path ownership hints reduce this risk but do not enforce it.

Solution

A new file_state domain module (src/openhuman/file_state/) implements a process-global singleton coordinator:

  • Read stamps: file_read records (agent_id, resolved_path, mtime, partial) after every successful read.
  • Write stamps: file_write, edit, apply_patch record (agent_id, resolved_path) after every successful write.
  • Staleness check: Before writing, tools check if another agent wrote the file after this agent's last read → model-facing error requiring re-read.
  • Partial-read protection: Writing after a partial (paginated) read is rejected → model-facing error requiring full re-read.
  • Path locking: edit and apply_patch acquire a per-path tokio::Mutex for their read-modify-write sections.
  • Parent reminder: spawn_parallel_agents checks if child writes touched files the parent had previously read and annotates the result JSON.
  • Agent identity: A new task-local FILE_STATE_AGENT_ID (following the sandbox_context pattern) carries the currently-executing agent's identity so file tools can attribute reads/writes without widening the Tool trait.
  • Opt-out: OPENHUMAN_FILE_STATE_GUARD=0 disables the entire coordinator (zero overhead).

Design tradeoffs:

  • Uses parking_lot::RwLock<HashMap> rather than adding a dashmap dependency.
  • Agent identity via task-local (not tool-trait argument) to avoid touching every tool implementation and caller.
  • The coordinator is in-memory only — no persistence across process restarts (appropriate since agent sessions are per-process).

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case)
  • Diff coverage ≥ 80% — CI will validate; 11 unit tests cover all coordinator branches + 2 integration test paths through existing tool tests.
  • Coverage matrix updated — N/A: new internal domain, no user-facing feature row
  • All affected feature IDs from the matrix are listed — N/A
  • No new external network dependencies introduced
  • Manual smoke checklist updated — N/A: internal guard, no UI surface
  • Linked issue closed via Closes #NNN below

Impact

  • Runtime: Desktop (all platforms). No UI changes. The guard is process-internal with negligible overhead (in-memory HashMap lookups).
  • Security: Prevents a class of file-corruption bugs in parallel subagent execution.
  • Compatibility: Fully backwards-compatible. The guard is transparent when only one agent operates (no staleness possible). Opt-out via env var for debugging.

Related

Closes #3253


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: issue/3253-guard-cross-agent-file-edits-from-stale
  • Commit SHA: see branch HEAD

Validation Run

  • pnpm --filter openhuman-app format:check — passed (pre-push hook)
  • pnpm typecheck — N/A (no TS changes)
  • Focused tests: cargo test -- file_state (11 passed), cargo test -- file_read (all passed), cargo test -- spawn_parallel (9 passed)
  • Rust fmt/check: cargo fmt + cargo check passed
  • Tauri fmt/check: cargo check --manifest-path app/src-tauri/Cargo.toml passed

Validation Blocked

  • N/A

Behavior Changes

  • Intended behavior change: parallel subagent file writes are now rejected when based on stale reads from a sibling agent.
  • User-visible effect: agents see a model-facing error message requiring a re-read; no UI change.

Parity Contract

  • Legacy behavior preserved: single-agent execution is unaffected (no sibling writes → no staleness detected). Opt-out with OPENHUMAN_FILE_STATE_GUARD=0.
  • Guard/fallback/dispatch parity checks: when the coordinator is disabled or uninitialized, all file tools behave exactly as before.

Duplicate / Superseded PR Handling

  • N/A

Summary by CodeRabbit

Release Notes

  • New Features
    • Added file state coordination system to detect stale file reads/writes across agents and prevent concurrent access conflicts.
    • Parallel agent execution now tracks which parent files become stale due to child agent writes.
    • New OPENHUMAN_FILE_STATE_GUARD configuration option to control file state guard behavior.

…tes (tinyhumansai#3253)

Introduce a process-wide FileStateCoordinator that tracks per-agent read
stamps and per-path write stamps across parallel subagents. Write tools
(file_write, edit, apply_patch) now detect when another agent modified a
file after this agent's last read and return a model-facing error
requiring a re-read. Partial-read protection prevents overwrites after
paginated reads.

- file_state/types.rs — ReadStamp, WriteStamp, FileStateCoordinator
- file_state/ops.rs — record_read/write, check_stale/partial, path locks
- file_state/agent_context.rs — task-local agent ID for tool attribution
- file_read records read stamps after successful reads
- file_write/edit/apply_patch check staleness + partial before writing
- edit/apply_patch acquire per-path async locks for read-modify-write
- spawn_parallel_agents annotates results with stale_parent_reads
- OPENHUMAN_FILE_STATE_GUARD=0 disables the guard entirely
@senamakel senamakel requested a review from a team June 3, 2026 16:07
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces a process-wide file-state coordinator that tracks per-agent reads and writes to detect when sibling agents modify files. It guards filesystem operations against stale overwrites, integrates into the agent execution flow, and reports to parent agents which files their children touched.

Changes

File-state guard system

Layer / File(s) Summary
Coordinator types and global operations
src/openhuman/file_state/types.rs, src/openhuman/file_state/ops.rs
Core data structures (ReadStamp, WriteStamp, FileStateCoordinator) and singleton coordinator with monotonic timestamp-based staleness detection, path locking, and partial-read tracking. Enable/disable control via OPENHUMAN_FILE_STATE_GUARD env var.
Agent-scoped context and module wiring
src/openhuman/file_state/agent_context.rs, src/openhuman/file_state/mod.rs, src/openhuman/mod.rs
Tokio task-local to carry agent identity through agent turns. Module re-exports coordinator operations, context helpers, and types for downstream use.
Bootstrap and execution flow integration
src/core/jsonrpc.rs, src/openhuman/agent/bus.rs, src/openhuman/agent/harness/subagent_runner/ops.rs
Initialize coordinator at startup. Wrap agent-bus and subagent-runner execution with agent-ID task-local scope so all downstream tools inherit the context.
File tool read/write guards
src/openhuman/tools/impl/filesystem/file_read.rs, src/openhuman/tools/impl/filesystem/file_write.rs, src/openhuman/tools/impl/filesystem/edit_file.rs, src/openhuman/tools/impl/filesystem/apply_patch.rs
File tools acquire per-path locks, record successful reads with mtime, guard writes against stale/partial reads from the current agent, and record writes for future staleness checks.
Parallel agent staleness reporting
src/openhuman/agent_orchestration/tools/spawn_parallel_agents.rs
Enhance ParallelAgentResult with staleParentReads field. After child tasks complete, compute which paths the parent read but children wrote, and annotate all results.
Configuration documentation
.env.example
Document OPENHUMAN_FILE_STATE_GUARD env var for disabling the guard in tests and debugging.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#3012: Refactors subagent/tool-loop execution in harness/subagent_runner/ops.rs as part of the unified TurnEngine, overlapping with the file-state agent-ID scoping added in this PR.

Suggested labels

feature, agent, memory

Suggested reviewers

  • graycyrus

Poem

🐰 Staleness guard, a rabbit's delight,
Siblings' writes now tracked in morning light,
Path locks serialize the dance,
No more stale edits by chance—
Files stay fresh through the night!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a file-state guard to prevent cross-agent file edits from stale sibling writes, which is the core feature of this PR.
Linked Issues check ✅ Passed All acceptance criteria from issue #3253 are met: read/write stamps recorded [file_read, file_write, edit, apply_patch], sibling-write detection implemented [check_stale_read], partial-read protection enforced [check_partial_read], path locking via per-path tokio::Mutex [acquire_path_lock], parent reminder via staleParentReads [spawn_parallel_agents], opt-out via OPENHUMAN_FILE_STATE_GUARD env var [ops.rs], and comprehensive unit tests demonstrating coverage.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #3253 objectives: file-state coordinator module, agent context tracking, file operation guards, and parallel agent result annotation. No unrelated refactoring or feature creep detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. labels Jun 3, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
src/openhuman/file_state/ops.rs (1)

99-134: ⚡ Quick win

Log the rejection branches.

These are the core decision points for the new guard, but they currently return silently. Add stable trace/debug logs with correlation fields like agent, path, writer, and the chosen decision so stale-write reports can be reconstructed from logs.

As per coding guidelines "Default to verbose diagnostics on new/changed flows; log entry/exit, branches, external calls, retries/timeouts, state transitions, errors using stable grep-friendly prefixes and correlation fields; use log / tracing at debug / trace level; never log secrets or full PII".

Also applies to: 157-176

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/file_state/ops.rs` around lines 99 - 134, Add structured
debug/trace logging to the rejection branches of the read-guard functions so
decisions can be reconstructed: in check_stale_read and check_partial_read (and
the similar branch around lines 157-176) emit a stable, grep-friendly log when
returning Some(...) that includes correlation fields agent=agent_id,
path=resolved_path.display(), decision="stale_read" or "partial_read", and
writer=ws.writer (only for stale_read). Use the tracing/log crate at debug/trace
level with a consistent prefix (e.g. "guard:decision") and avoid logging
sensitive data; place the log immediately before the Some(...) return so it
records the exact branch taken.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 260-264: The current file_state_id built in the file_state_id
variable ("bus:{channel_name}:{target_agent_id}") is not unique per turn; change
it to include a per-turn/session discriminator (for example a request/turn id
passed into this context, or generate a short UUID/timestamp once per request)
so each run gets its own identity. Update the construction of file_state_id to
append that discriminator (e.g.
"bus:{channel_name}:{target_agent_id}:{turn_id}") and ensure the discriminator
is created once per request/run (not per write) so other logic that checks
same-agent writes will no longer collapse concurrent turns; make this change
where file_state_id is formed and propagate the new turn/request id into this
scope if necessary.

In `@src/openhuman/file_state/ops.rs`:
- Around line 15-21: The is_disabled function uses OPENHUMAN_FILE_STATE_GUARD
but compares the env value case-sensitively, so values like "FALSE" or "Off"
don't disable the guard; update the closure inside the static
DISABLED.get_or_init to perform a case-insensitive comparison (e.g., call
v.eq_ignore_ascii_case(...) or convert v to_ascii_lowercase()) when matching
against "0", "false", "off", "no" so all capitalizations correctly disable the
guard; the change should be applied within the is_disabled function and touches
the DISABLED initialization logic only.

In `@src/openhuman/file_state/types.rs`:
- Around line 12-18: The current staleness logic relies on ReadStamp.timestamp:
Instant which can be equal across nearby events; change the ordering to use a
process-local strictly increasing sequence number instead: add a u64 sequence
field to ReadStamp (and any write stamp structures) and initialize/increment it
using a global AtomicU64.fetch_add(1, Ordering::SeqCst) on every read and write;
update all places that set timestamp (reads/writes) to also set sequence, and
change comparisons in check_stale_read, stale_reads_for_parent, and
parent_stale_files to compare sequence (or sequence as a tie-breaker with
Instant) so ordering is strict and unambiguous.

In `@src/openhuman/tools/impl/filesystem/apply_patch.rs`:
- Around line 147-169: The current lock acquisition uses caller-supplied edit
order (built into unique_paths) which can deadlock; make the order deterministic
by deduplicating and sorting paths before locking: replace the HashSet-based
unique_paths construction with a deterministic collection (e.g., a BTreeSet or
collect into a Vec then sort and dedup) derived from parsed entries’ e.path
values, then iterate that sorted list when resolving with
self.security.action_dir, tokio::fs::canonicalize, and calling
file_state::acquire_path_lock so _path_guards are always acquired in the same
global order.

In `@src/openhuman/tools/impl/filesystem/file_read.rs`:
- Around line 94-103: The read recording currently performs a separate await via
tokio::fs::metadata(...).await after read_to_string, creating a race; fix by
reusing the metadata you already obtained for the size check (or capture
metadata before the read) and call file_state::record_read(&agent_id,
resolved_path, <reuse_mtime>, false) immediately after the read completes
without performing another await. Locate the Ok(contents) branch in file_read.rs
where file_state::current_file_state_agent_id() and file_state::record_read(...)
are used and replace the extra tokio::fs::metadata(...).await call with the
previously fetched metadata's modified() time (or the metadata captured before
the read) so record_read uses that mtime synchronously in that branch.

In `@src/openhuman/tools/impl/filesystem/file_write.rs`:
- Around line 133-162: The checks and write in file_write are not serialized:
wrap the stale/partial checks, the tokio::fs::write call, and the
file_state::record_write call in a per-path lock by calling
acquire_path_lock(&resolved_target) (or its guard-returning form) so the
sequence check_stale_read(&agent_id, &resolved_target) /
check_partial_read(&agent_id, &resolved_target) /
tokio::fs::write(&resolved_target, content).await /
file_state::record_write(&agent_id, resolved_target) executes atomically under
that lock; ensure you still obtain current_file_state_agent_id() inside the
locked section and release the lock after record_write completes.

---

Nitpick comments:
In `@src/openhuman/file_state/ops.rs`:
- Around line 99-134: Add structured debug/trace logging to the rejection
branches of the read-guard functions so decisions can be reconstructed: in
check_stale_read and check_partial_read (and the similar branch around lines
157-176) emit a stable, grep-friendly log when returning Some(...) that includes
correlation fields agent=agent_id, path=resolved_path.display(),
decision="stale_read" or "partial_read", and writer=ws.writer (only for
stale_read). Use the tracing/log crate at debug/trace level with a consistent
prefix (e.g. "guard:decision") and avoid logging sensitive data; place the log
immediately before the Some(...) return so it records the exact branch taken.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9e8322c7-be12-4334-a0b4-4e3f020e110f

📥 Commits

Reviewing files that changed from the base of the PR and between db0307d and 3018cfa.

📒 Files selected for processing (14)
  • .env.example
  • src/core/jsonrpc.rs
  • src/openhuman/agent/bus.rs
  • src/openhuman/agent/harness/subagent_runner/ops.rs
  • src/openhuman/agent_orchestration/tools/spawn_parallel_agents.rs
  • src/openhuman/file_state/agent_context.rs
  • src/openhuman/file_state/mod.rs
  • src/openhuman/file_state/ops.rs
  • src/openhuman/file_state/types.rs
  • src/openhuman/mod.rs
  • src/openhuman/tools/impl/filesystem/apply_patch.rs
  • src/openhuman/tools/impl/filesystem/edit_file.rs
  • src/openhuman/tools/impl/filesystem/file_read.rs
  • src/openhuman/tools/impl/filesystem/file_write.rs

Comment on lines +260 to +264
let file_state_id = format!(
"bus:{}:{}",
channel_name,
target_agent_id.as_deref().unwrap_or("root")
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the bus file-state ID unique per turn.

"bus:{channel}:{target_agent}" collapses every concurrent turn on the same channel/agent into one logical writer. Since stale-read detection ignores same-agent writes, overlapping root turns can overwrite each other without tripping the new guard. Include a per-turn/session discriminator here (or generate one once per request) so each run gets its own file-state identity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/bus.rs` around lines 260 - 264, The current file_state_id
built in the file_state_id variable ("bus:{channel_name}:{target_agent_id}") is
not unique per turn; change it to include a per-turn/session discriminator (for
example a request/turn id passed into this context, or generate a short
UUID/timestamp once per request) so each run gets its own identity. Update the
construction of file_state_id to append that discriminator (e.g.
"bus:{channel_name}:{target_agent_id}:{turn_id}") and ensure the discriminator
is created once per request/run (not per write) so other logic that checks
same-agent writes will no longer collapse concurrent turns; make this change
where file_state_id is formed and propagate the new turn/request id into this
scope if necessary.

Comment on lines +15 to +21
fn is_disabled() -> bool {
static DISABLED: OnceLock<bool> = OnceLock::new();
*DISABLED.get_or_init(|| {
std::env::var("OPENHUMAN_FILE_STATE_GUARD")
.map(|v| matches!(v.as_str(), "0" | "false" | "off" | "no"))
.unwrap_or(false)
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Parse the disable flag case-insensitively.

OPENHUMAN_FILE_STATE_GUARD=FALSE or Off currently leaves the guard enabled because Line 19 only matches exact lowercase strings. For a debug/emergency kill switch, that silent miss is risky.

Suggested fix
     *DISABLED.get_or_init(|| {
         std::env::var("OPENHUMAN_FILE_STATE_GUARD")
-            .map(|v| matches!(v.as_str(), "0" | "false" | "off" | "no"))
+            .map(|v| {
+                let v = v.trim();
+                v == "0"
+                    || v.eq_ignore_ascii_case("false")
+                    || v.eq_ignore_ascii_case("off")
+                    || v.eq_ignore_ascii_case("no")
+            })
             .unwrap_or(false)
     })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn is_disabled() -> bool {
static DISABLED: OnceLock<bool> = OnceLock::new();
*DISABLED.get_or_init(|| {
std::env::var("OPENHUMAN_FILE_STATE_GUARD")
.map(|v| matches!(v.as_str(), "0" | "false" | "off" | "no"))
.unwrap_or(false)
})
fn is_disabled() -> bool {
static DISABLED: OnceLock<bool> = OnceLock::new();
*DISABLED.get_or_init(|| {
std::env::var("OPENHUMAN_FILE_STATE_GUARD")
.map(|v| {
let v = v.trim();
v == "0"
|| v.eq_ignore_ascii_case("false")
|| v.eq_ignore_ascii_case("off")
|| v.eq_ignore_ascii_case("no")
})
.unwrap_or(false)
})
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/file_state/ops.rs` around lines 15 - 21, The is_disabled
function uses OPENHUMAN_FILE_STATE_GUARD but compares the env value
case-sensitively, so values like "FALSE" or "Off" don't disable the guard;
update the closure inside the static DISABLED.get_or_init to perform a
case-insensitive comparison (e.g., call v.eq_ignore_ascii_case(...) or convert v
to_ascii_lowercase()) when matching against "0", "false", "off", "no" so all
capitalizations correctly disable the guard; the change should be applied within
the is_disabled function and touches the DISABLED initialization logic only.

Comment on lines +12 to +18
pub struct ReadStamp {
/// Filesystem mtime at the moment of the read.
pub mtime: SystemTime,
/// Monotonic clock timestamp of the read.
pub timestamp: Instant,
/// Whether the read was partial (paginated / offset+limit).
pub partial: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

Does Rust's std::time::Instantguarantee that consecutiveInstant::now() calls are strictly increasing / unique, or can they be equal because of clock granularity?

💡 Result:

Rust's std::time::Instant does not guarantee that consecutive calls to Instant::now will be unique [1][2]. It guarantees that instants are monotonically nondecreasing, meaning that a newly created instant will always be greater than or equal to any previously measured instant [1][2]. While the clock is intended to move forward, it is possible for consecutive calls to Instant::now to return the exact same value because of the underlying clock's granularity [1][2]. If the time elapsed between two calls is smaller than the resolution of the platform's monotonic clock, the calls may return equal, non-decreasing values [1][2]. In summary: - Monotonicity: Instants are guaranteed to be nondecreasing (never go backward) [1][2]. - Uniqueness: Consecutive calls are not guaranteed to be unique; they can be equal if the time between calls is less than the clock's resolution [1][2]. - Reliability: These guarantees depend on the platform's OS API and hardware, and while they are generally robust on tier 1 platforms, they can theoretically be broken by rare system, hardware, or virtualization bugs [1][2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "timestamp\s*>\s*|>\s*.*timestamp|Instant::now|SystemTime|ReadStamp|WriteStamp" src/openhuman/file_state -S || true

echo "---- types.rs relevant excerpts ----"
FILE="src/openhuman/file_state/types.rs"
if [ -f "$FILE" ]; then
  nl -ba "$FILE" | sed -n '1,120p'
fi

echo "---- ops.rs relevant excerpts ----"
FILE2="src/openhuman/file_state/ops.rs"
if [ -f "$FILE2" ]; then
  nl -ba "$FILE2" | sed -n '1,140p'
fi

Repository: tinyhumansai/openhuman

Length of output: 3536


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/openhuman/file_state/types.rs"
echo "---- types.rs (lines 1-120) ----"
if [ -f "$FILE" ]; then
  sed -n '1,120p' "$FILE"
fi

echo "---- types.rs (around line ~65) ----"
sed -n '50,90p' "$FILE"

FILE2="src/openhuman/file_state/ops.rs"
echo "---- ops.rs (lines 1-140) ----"
if [ -f "$FILE2" ]; then
  sed -n '1,160p' "$FILE2"
fi

echo "---- ops.rs (around stale checks) ----"
sed -n '90,140p' "$FILE2"
sed -n '150,230p' "$FILE2"
sed -n '230,330p' "$FILE2"

Repository: tinyhumansai/openhuman

Length of output: 17895


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "check_stale_read|stale_reads_for_parent|parent_stale_files" src/openhuman/file_state -S
rg -n "acquire_path_lock|record_read|record_write" src/openhuman -S

Repository: tinyhumansai/openhuman

Length of output: 5972


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "check_stale_read|parent_stale_files|check_partial_read" src/openhuman/tools/impl/filesystem -S
rg -n "check_stale_read|parent_stale_files|check_partial_read" src/openhuman -S

Repository: tinyhumansai/openhuman

Length of output: 2539


Fix stale-read ordering: don’t rely on Instant-only equality semantics

check_stale_read (src/openhuman/file_state/ops.rs) and stale_reads_for_parent / parent_stale_files (src/openhuman/file_state/types.rs + ops.rs) treat staleness as ws.timestamp > read_stamp.timestamp, but both reads and writes record timestamp: Instant::now().

Instant is only monotonic/non-decreasing, not unique—two consecutive Instant::now() calls can return the same value—so a sibling write that happens after a read can still compare equal and slip past the stale-read guard.

Switch to a process-local strictly increasing sequence (e.g., AtomicU64 fetch_add) and compare (sequence) (or sequence as a tie-breaker alongside Instant) so ordering is strict and unambiguous.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/file_state/types.rs` around lines 12 - 18, The current
staleness logic relies on ReadStamp.timestamp: Instant which can be equal across
nearby events; change the ordering to use a process-local strictly increasing
sequence number instead: add a u64 sequence field to ReadStamp (and any write
stamp structures) and initialize/increment it using a global
AtomicU64.fetch_add(1, Ordering::SeqCst) on every read and write; update all
places that set timestamp (reads/writes) to also set sequence, and change
comparisons in check_stale_read, stale_reads_for_parent, and parent_stale_files
to compare sequence (or sequence as a tie-breaker with Instant) so ordering is
strict and unambiguous.

Comment on lines +147 to +169
// Acquire per-path locks for all unique paths before any reads.
let unique_paths: Vec<String> = {
let mut seen = std::collections::HashSet::new();
parsed
.iter()
.filter_map(|e| {
if seen.insert(e.path.clone()) {
Some(e.path.clone())
} else {
None
}
})
.collect()
};
let mut _path_guards = Vec::new();
for p in &unique_paths {
let full = self.security.action_dir.join(p);
if let Ok(resolved) = tokio::fs::canonicalize(&full).await {
if let Some(guard) = file_state::acquire_path_lock(&resolved).await {
_path_guards.push(guard);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Acquire batch path locks in a deterministic order.

These locks are taken in caller-supplied edit order. Two concurrent patches over the same files in opposite orders will deadlock after each grabs its first lock. Resolve/sort/dedup the path list before locking so every task acquires the same global order.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/apply_patch.rs` around lines 147 - 169,
The current lock acquisition uses caller-supplied edit order (built into
unique_paths) which can deadlock; make the order deterministic by deduplicating
and sorting paths before locking: replace the HashSet-based unique_paths
construction with a deterministic collection (e.g., a BTreeSet or collect into a
Vec then sort and dedup) derived from parsed entries’ e.path values, then
iterate that sorted list when resolving with self.security.action_dir,
tokio::fs::canonicalize, and calling file_state::acquire_path_lock so
_path_guards are always acquired in the same global order.

Comment on lines +94 to +103
Ok(contents) => {
if let Some(agent_id) = file_state::current_file_state_agent_id() {
let mtime = tokio::fs::metadata(&resolved_path)
.await
.ok()
.and_then(|m| m.modified().ok())
.unwrap_or(std::time::SystemTime::UNIX_EPOCH);
file_state::record_read(&agent_id, resolved_path, mtime, false);
}
Ok(ToolResult::success(contents))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Record the read without another await after read_to_string.

The extra tokio::fs::metadata(...).await creates a yield point after the contents are read but before the read stamp is recorded. If a sibling writes in that window, this read gets timestamped after the sibling write even though the returned contents are older, so the next write can slip past check_stale_read. Reuse the metadata you already fetched for the size check, or otherwise call record_read immediately after the read completes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/file_read.rs` around lines 94 - 103, The
read recording currently performs a separate await via
tokio::fs::metadata(...).await after read_to_string, creating a race; fix by
reusing the metadata you already obtained for the size check (or capture
metadata before the read) and call file_state::record_read(&agent_id,
resolved_path, <reuse_mtime>, false) immediately after the read completes
without performing another await. Locate the Ok(contents) branch in file_read.rs
where file_state::current_file_state_agent_id() and file_state::record_read(...)
are used and replace the extra tokio::fs::metadata(...).await call with the
previously fetched metadata's modified() time (or the metadata captured before
the read) so record_read uses that mtime synchronously in that branch.

Comment on lines +133 to +162
// File-state guard: reject writes based on stale or partial reads.
if let Some(agent_id) = file_state::current_file_state_agent_id() {
if let Some(msg) = file_state::check_stale_read(&agent_id, &resolved_target) {
tracing::debug!(
agent = %agent_id,
path = %resolved_target.display(),
"[file_state] file_write blocked: stale read"
);
return Ok(ToolResult::error(msg));
}
if let Some(msg) = file_state::check_partial_read(&agent_id, &resolved_target) {
tracing::debug!(
agent = %agent_id,
path = %resolved_target.display(),
"[file_state] file_write blocked: partial read"
);
return Ok(ToolResult::error(msg));
}
}

match tokio::fs::write(&resolved_target, content).await {
Ok(()) => Ok(ToolResult::success(format!(
"Written {} bytes to {path}",
content.len()
))),
Ok(()) => {
if let Some(agent_id) = file_state::current_file_state_agent_id() {
file_state::record_write(&agent_id, resolved_target);
}
Ok(ToolResult::success(format!(
"Written {} bytes to {path}",
content.len()
)))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Serialize file_write's check→write section on the path lock.

Right now check_stale_read and tokio::fs::write are separated by awaits with no per-path lock. Two agents that both read the same file can both clear the guard and the later write will silently overwrite the earlier one. Wrap the stale/partial checks, the write, and record_write in acquire_path_lock(&resolved_target) so this path gets the same atomicity that edit and apply_patch now have.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/tools/impl/filesystem/file_write.rs` around lines 133 - 162,
The checks and write in file_write are not serialized: wrap the stale/partial
checks, the tokio::fs::write call, and the file_state::record_write call in a
per-path lock by calling acquire_path_lock(&resolved_target) (or its
guard-returning form) so the sequence check_stale_read(&agent_id,
&resolved_target) / check_partial_read(&agent_id, &resolved_target) /
tokio::fs::write(&resolved_target, content).await /
file_state::record_write(&agent_id, resolved_target) executes atomically under
that lock; ensure you still obtain current_file_state_agent_id() inside the
locked section and release the lock after record_write completes.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@senamakel — the PR Submission Checklist check is failing because the "Diff coverage >= 80%" checkbox in the PR description is unchecked (the Coverage Gate CI check itself passed — just tick the box).

I skimmed the diff while CI is red and have a couple of things beyond what CodeRabbit flagged:

Memory growth: reads and writes in FileStateCoordinator are unbounded HashMaps with no eviction strategy. Every read and write across all agent turns in a process lifetime accumulates here indefinitely. For long-running server deployments handling many sequential sessions this causes steady memory growth. You need a session-boundary cleanup hook or TTL-based eviction pass.

Double canonicalize in apply_patch: The lock acquisition loop and the stale-read guard loop both call tokio::fs::canonicalize on the same paths. Save the resolved paths from the first loop and reuse them in the second — two filesystem round-trips per path is redundant.

CodeRabbit's open findings — deadlock in apply_patch lock ordering, missing path lock in file_write, extra-await race window in file_read, non-unique bus agent ID — also need addressing before this is ready. Fix the CI check and those findings and I'll come back for a full pass.

pub struct FileStateCoordinator {
/// Per-agent, per-resolved-path read stamps.
/// Key: `(agent_id, canonical_path)`.
pub(crate) reads: RwLock<HashMap<(String, PathBuf), ReadStamp>>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] These maps accumulate entries for the entire process lifetime with no eviction or session-scoped cleanup. In a long-running server that handles many sequential agent sessions, every read and write ever recorded stays in memory until the process exits. Add a session-boundary cleanup hook (drain entries for a given session ID when it tears down) or a TTL-based eviction pass.

@senamakel senamakel merged commit 6c4e300 into tinyhumansai:main Jun 3, 2026
23 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Guard cross-agent file edits from stale sibling writes

2 participants