fix: prevent orphan graph edges when VDB upsert fails during merge #2941

Open
korbonits wants to merge 2 commits into HKUDS:main from korbonits:fix/edge-count-drift-merge-vdb-rollback

@korbonits

Description

Fixes the edge-count drift bug reported in #2917, where the graph-edge count drifts above the VDB-relation count after merge or delete operations.

Root Cause

In _merge_entities_impl, edges are written to the graph (step 6) and then embedded into the relationships VDB one-by-one (step 7). If the VDB upsert fails partway through — e.g. embedder crash, context-length exceeded with a high-degree hub entity, network timeout — the graph already holds the new edges but the VDB does not. The subsequent DETACH DELETE of source entities (step 10) then removes the original edges, leaving the new edges as orphans with no VDB counterpart. Over time this causes visible drift (reported: 372 orphan edges on a live instance).
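The failure sequence can be reproduced with a toy model. This is a minimal sketch, not the LightRAG code: `relation_id`, `merge_without_rollback`, and the plain set/dict stores are hypothetical stand-ins, and the embedder failure is simulated by stopping the loop at a chosen edge.

```python
import hashlib


def relation_id(src: str, tgt: str) -> str:
    # Hypothetical stand-in for the project's relation-ID hashing.
    a, b = sorted([src, tgt])
    return "rel-" + hashlib.md5((a + b).encode("utf-8")).hexdigest()


def merge_without_rollback(graph_edges, vdb_rows, new_edges, failing):
    """Toy model of steps 6-7: graph write first, then per-edge VDB upsert."""
    graph_edges.update(new_edges)  # step 6: every new edge reaches the graph
    for src, tgt in new_edges:  # step 7: embed edges one by one
        if (src, tgt) in failing:  # simulated embedder failure
            break  # the remaining upserts never run
        vdb_rows[relation_id(src, tgt)] = {"src_id": src, "tgt_id": tgt}


graph_edges, vdb_rows = set(), {}
new_edges = [("Hub", "A"), ("Hub", "B"), ("Hub", "C")]
merge_without_rollback(graph_edges, vdb_rows, new_edges, failing={("Hub", "B")})

# Edges present in the graph but absent from the VDB are the orphans.
orphans = [edge for edge in new_edges if relation_id(*edge) not in vdb_rows]
print(len(graph_edges), len(vdb_rows), orphans)
```

In this model the graph ends up with three edges and the VDB with one, which is the same shape of drift as described in #2917.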

Changes Made

lightrag/utils_graph.py

  • Wrap each per-edge relationships_vdb.upsert() call in try/except inside _merge_entities_impl
  • Collect failed edges in vdb_failed_edges; after the loop call remove_edges(vdb_failed_edges) to roll back the matching graph writes — ensuring graph and VDB are always in sync (either both have the edge, or neither does)
  • Fix long-standing typo "updatign" → "updating" in log message
  • Add check_graph_consistency(graph, vdb) — compares all graph edges against VDB entries via get_all_edges() + get_by_ids(), returns {orphan_graph_edges, total_graph_edges, total_vdb_relations}; handles PostgreSQL/AGE double-quoted entity_id values transparently
  • Add repair_graph_consistency(graph, vdb, *, dry_run=False) — calls the check then removes orphans via remove_edges; dry_run=True for safe inspection before committing changes
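The rollback described in the first two bullets can be sketched roughly as follows. `InMemoryGraph`, `FlakyVDB`, and `merge_edges` are hypothetical stand-ins written for illustration; only the names `vdb_failed_edges` and `remove_edges` come from the PR description, and the real storage signatures may differ.

```python
import asyncio


class InMemoryGraph:
    """Hypothetical stand-in for the graph storage."""

    def __init__(self):
        self.edges = set()

    async def upsert_edge(self, src, tgt):
        self.edges.add((src, tgt))

    async def remove_edges(self, edges):
        self.edges.difference_update(edges)


class FlakyVDB:
    """Hypothetical VDB whose upsert fails for selected edges."""

    def __init__(self, bad_edges):
        self.rows = {}
        self.bad_edges = set(bad_edges)

    async def upsert(self, edge):
        if edge in self.bad_edges:
            raise RuntimeError("embedding failed")
        self.rows[edge] = True


async def merge_edges(graph, vdb, new_edges):
    # Graph writes happen first, as in step 6.
    for src, tgt in new_edges:
        await graph.upsert_edge(src, tgt)

    # Per-edge VDB upsert; a failure is recorded instead of aborting the loop.
    vdb_failed_edges = []
    for edge in new_edges:
        try:
            await vdb.upsert(edge)
        except Exception:
            vdb_failed_edges.append(edge)

    # Roll back the graph writes that have no VDB counterpart.
    if vdb_failed_edges:
        await graph.remove_edges(vdb_failed_edges)
    return vdb_failed_edges


graph = InMemoryGraph()
vdb = FlakyVDB(bad_edges=[("Hub", "B")])
failed = asyncio.run(
    merge_edges(graph, vdb, [("Hub", "A"), ("Hub", "B"), ("Hub", "C")])
)
print(failed, sorted(graph.edges), sorted(vdb.rows))
```

After the call, the graph and the toy VDB hold the same two edges, and the failed edge is reported to the caller rather than left behind as an orphan.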

lightrag/lightrag.py

  • Expose both utilities as rag.acheck_graph_consistency(), rag.arepair_graph_consistency(dry_run=True), and their synchronous wrappers on the LightRAG class

tests/test_graph_consistency.py (new file)

  • 8 unit tests covering: no-edges, all-present, orphan detection, AGE quote stripping, dry-run, repair, no-op repair, and VDB-failure rollback in _merge_entities_impl

Related Issues

Closes #2917

Usage (for existing drift)

# Inspect without changing anything
report = await rag.acheck_graph_consistency()
print(f"Orphan edges: {len(report['orphan_graph_edges'])}")

# Fix
report = await rag.arepair_graph_consistency(dry_run=True)  # preview
report = await rag.arepair_graph_consistency()               # apply
print(f"Repaired: {report['repaired']}")

Checklist

  • Changes tested locally
  • Pre-commit checks pass (pre-commit run --all-files)
  • Unit tests added
  • Documentation updated (docstrings + LightRAG method docstrings with usage examples)

korbonits and others added 2 commits April 15, 2026 00:17
When `_merge_entities_impl` writes new edges to the graph (step 6) and
then the corresponding relationships-VDB upsert fails partway through
(e.g. embedder crash, context-length exceeded), the subsequent
source-entity deletion with DETACH DELETE left those edges orphaned in
the graph with no VDB counterpart.  Over time this causes the
graph-edge count to drift above the VDB-relation count (issue HKUDS#2917).

Changes:
- Wrap each per-edge VDB upsert in try/except inside `_merge_entities_impl`
- On failure, call `remove_edges()` to roll back the matching graph write
  so graph and VDB are always in sync (either both have the edge, or neither)
- Fix long-standing typo "updatign" → "updating" in log message
- Add `check_graph_consistency()` utility that compares all graph edges
  against VDB entries and returns a list of orphan (src, tgt) pairs;
  handles PostgreSQL/AGE double-quoted entity_id values transparently
- Add `repair_graph_consistency()` utility that calls the check and
  removes orphans; supports `dry_run=True` for safe inspection first
- Expose both utilities as `rag.acheck_graph_consistency()`,
  `rag.arepair_graph_consistency()` (and sync wrappers) on `LightRAG`
- Add 8 unit tests covering all new code paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ruff-format reformatted two collapsed set-comprehension / function-call
  expressions in utils_graph.py
- removed the leftover graph_written_edges variable (collected but never
  read after the VDB-rollback approach made it unnecessary)
- removed unused asyncio and patch imports from the test file
- removed unused result assignment in test_merge_rolls_back_graph_edge_on_vdb_failure

All pre-commit hooks pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@danielaskdd
Collaborator

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68e7f74698

Comment thread lightrag/utils_graph.py
Comment on lines +1855 to +1857
found_ids = {
    relation_ids[i] for i, result in enumerate(vdb_results) if result is not None
}
P1: Abort repair when VDB ID lookups are incomplete

check_graph_consistency assumes relationships_vdb.get_by_ids() returns a positional list for every requested ID and interprets anything else as “missing in VDB”. Several backends return [] on retrieval errors instead of raising (for example lightrag/kg/qdrant_impl.py:965-969 and lightrag/kg/mongo_impl.py:2564-2568), so a transient backend failure can make found_ids empty and cause repair_graph_consistency() to delete essentially all graph edges as false orphans. Add an explicit completeness/error check before deriving found_ids.
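One possible shape for such a completeness check is sketched below. `find_orphan_ids` is a hypothetical helper, not part of the PR, and it assumes that a healthy backend returns exactly one positional entry (a record or `None`) per requested ID.

```python
def find_orphan_ids(relation_ids, vdb_results):
    """Return IDs missing from the VDB, refusing to guess on incomplete results."""
    # A short or empty result list means the lookup itself failed, so
    # nothing can be concluded about which relations are missing.
    if len(vdb_results) != len(relation_ids):
        raise RuntimeError(
            f"VDB returned {len(vdb_results)} results for "
            f"{len(relation_ids)} requested IDs; aborting consistency check"
        )
    return [rid for rid, result in zip(relation_ids, vdb_results) if result is None]


# A complete lookup: only the second ID is genuinely missing.
print(find_orphan_ids(["rel-a", "rel-b"], [{"id": "rel-a"}, None]))
```

With this guard, a backend that returns `[]` after a transient error raises instead of classifying every edge as an orphan.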


Comment thread lightrag/utils_graph.py
if not src or not tgt:
    continue
normalized_src, normalized_tgt = sorted([src, tgt])
relation_id = compute_mdhash_id(normalized_src + normalized_tgt, prefix="rel-")
P2: Check both legacy relation-id permutations

The consistency check hashes only normalized_src + normalized_tgt, but the codebase still handles legacy reverse-order relation IDs elsewhere (e.g., deleting both permutations in utils_graph.py:783-786). If an existing relation is stored only under the reverse hash, this logic marks it orphaned and repair can remove a valid graph edge. Use both permutations (or make_relation_vdb_ids) before classifying an edge as orphan.
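A rough sketch of the two-permutation check follows. `compute_id` is a hypothetical stand-in for `compute_mdhash_id` (assumed here to be an MD5 hex digest with a prefix), and `candidate_relation_ids` and `is_orphan` are illustrative names, not functions from the codebase.

```python
import hashlib


def compute_id(content: str, prefix: str = "") -> str:
    # Hypothetical stand-in for compute_mdhash_id; assumed to be MD5-based.
    return prefix + hashlib.md5(content.encode("utf-8")).hexdigest()


def candidate_relation_ids(src: str, tgt: str) -> list:
    """Both orderings of the endpoint pair, covering legacy reverse-order IDs."""
    return [compute_id(src + tgt, "rel-"), compute_id(tgt + src, "rel-")]


def is_orphan(src: str, tgt: str, vdb_ids: set) -> bool:
    # An edge counts as an orphan only if neither permutation exists in the VDB.
    return not any(rid in vdb_ids for rid in candidate_relation_ids(src, tgt))


# A relation stored only under the reverse-order hash is still recognised.
legacy_only = {compute_id("B" + "A", "rel-")}
print(is_orphan("A", "B", legacy_only), is_orphan("A", "C", legacy_only))
```

Under these assumptions, the edge (A, B) is not flagged even though only its reverse-order ID is stored, while (A, C) is still reported as an orphan.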


Development

Successfully merging this pull request may close these issues.

[Bug/Investigation] AGE_edges count drifts from vdb_relation count after delete/merge operations

2 participants