fix: prevent orphan graph edges when VDB upsert fails during merge #2941
korbonits wants to merge 2 commits into HKUDS:main
When `_merge_entities_impl` writes new edges to the graph (step 6) and then the corresponding relationships-VDB upsert fails partway through (e.g. embedder crash, context-length exceeded), the subsequent source-entity deletion with DETACH DELETE left those edges orphaned in the graph with no VDB counterpart. Over time this causes the graph-edge count to drift above the VDB-relation count (issue HKUDS#2917).

Changes:
- Wrap each per-edge VDB upsert in try/except inside `_merge_entities_impl`
- On failure, call `remove_edges()` to roll back the matching graph write so graph and VDB are always in sync (either both have the edge, or neither)
- Fix long-standing typo "updatign" → "updating" in log message
- Add `check_graph_consistency()` utility that compares all graph edges against VDB entries and returns a list of orphan (src, tgt) pairs; handles PostgreSQL/AGE double-quoted entity_id values transparently
- Add `repair_graph_consistency()` utility that calls the check and removes orphans; supports `dry_run=True` for safe inspection first
- Expose both utilities as `rag.acheck_graph_consistency()`, `rag.arepair_graph_consistency()` (and sync wrappers) on `LightRAG`
- Add 8 unit tests covering all new code paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
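The rollback described in this commit message can be sketched with in-memory stand-ins. `InMemoryGraph`, `FlakyVDB`, and `write_edges_with_rollback` are hypothetical names used only for illustration; they are not LightRAG classes and this is not the PR's actual code. Only the idea (catch the per-edge VDB failure, then remove the matching graph edges) is taken from the description.

```python
import asyncio


class InMemoryGraph:
    """Hypothetical stand-in for the graph storage (not a LightRAG class)."""

    def __init__(self):
        self.edges = set()

    async def upsert_edge(self, src, tgt):
        self.edges.add((src, tgt))

    async def remove_edges(self, edges):
        for edge in edges:
            self.edges.discard(edge)


class FlakyVDB:
    """Hypothetical relationships VDB whose upsert fails for chosen edges."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.relations = set()

    async def upsert(self, src, tgt):
        if (src, tgt) in self.fail_on:
            raise RuntimeError("simulated embedder failure")
        self.relations.add((src, tgt))


async def write_edges_with_rollback(graph, vdb, edges):
    """Write edges to the graph, then to the VDB; undo graph writes whose VDB upsert failed."""
    for src, tgt in edges:
        await graph.upsert_edge(src, tgt)

    vdb_failed_edges = []
    for src, tgt in edges:
        try:
            await vdb.upsert(src, tgt)
        except Exception:
            # Remember the edge so its graph write can be rolled back below.
            vdb_failed_edges.append((src, tgt))

    if vdb_failed_edges:
        await graph.remove_edges(vdb_failed_edges)
    return vdb_failed_edges


graph = InMemoryGraph()
vdb = FlakyVDB(fail_on=[("A", "C")])
failed = asyncio.run(
    write_edges_with_rollback(graph, vdb, [("A", "B"), ("A", "C")])
)
```

After the call, the graph and the VDB stand-in hold the same set of edges, which is the invariant the change aims for.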
- ruff-format reformatted two collapsed set-comprehension / function-call expressions in utils_graph.py
- removed the leftover graph_written_edges variable (collected but never read after the VDB-rollback approach made it unnecessary)
- removed unused asyncio and patch imports from the test file
- removed unused result assignment in test_merge_rolls_back_graph_edge_on_vdb_failure

All pre-commit hooks pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68e7f74698
found_ids = {
    relation_ids[i] for i, result in enumerate(vdb_results) if result is not None
}
Abort repair when VDB ID lookups are incomplete
check_graph_consistency assumes relationships_vdb.get_by_ids() returns a positional list for every requested ID and interprets anything else as “missing in VDB”. Several backends return [] on retrieval errors instead of raising (for example lightrag/kg/qdrant_impl.py:965-969 and lightrag/kg/mongo_impl.py:2564-2568), so a transient backend failure can make found_ids empty and cause repair_graph_consistency() to delete essentially all graph edges as false orphans. Add an explicit completeness/error check before deriving found_ids.
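One possible shape for the suggested completeness check is sketched below. It assumes, as the comment above does, that a healthy `get_by_ids()` call returns one positional entry per requested ID with `None` for misses; `derive_found_ids` is a hypothetical helper name, not a function in the PR.

```python
def derive_found_ids(relation_ids, vdb_results):
    """Return the IDs that have a VDB entry, refusing to guess on an incomplete lookup."""
    # Assumption: a complete lookup yields exactly one result slot per requested ID.
    if len(vdb_results) != len(relation_ids):
        raise RuntimeError(
            f"VDB lookup returned {len(vdb_results)} results for "
            f"{len(relation_ids)} requested IDs; aborting consistency check"
        )
    return {
        relation_ids[i] for i, result in enumerate(vdb_results) if result is not None
    }


found = derive_found_ids(["rel-1", "rel-2"], [{"id": "rel-1"}, None])
```

Raising instead of returning an empty set means a backend that answers `[]` on error stops the repair rather than marking every edge as an orphan.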
if not src or not tgt:
    continue
normalized_src, normalized_tgt = sorted([src, tgt])
relation_id = compute_mdhash_id(normalized_src + normalized_tgt, prefix="rel-")
Check both legacy relation-id permutations
The consistency check hashes only normalized_src + normalized_tgt, but the codebase still handles legacy reverse-order relation IDs elsewhere (e.g., deleting both permutations in utils_graph.py:783-786). If an existing relation is stored only under the reverse hash, this logic marks it orphaned and repair can remove a valid graph edge. Use both permutations (or make_relation_vdb_ids) before classifying an edge as orphan.
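A rough sketch of checking both orderings before classifying an edge as an orphan follows. `mdhash_id` is a local stand-in for `compute_mdhash_id` and is assumed here to be the prefix plus an MD5 hex digest; `candidate_relation_ids` and `is_orphan` are hypothetical helper names, and the real code may prefer `make_relation_vdb_ids` as the comment suggests.

```python
from hashlib import md5


def mdhash_id(content, prefix=""):
    """Stand-in for compute_mdhash_id (assumed: prefix + MD5 hex digest)."""
    return prefix + md5(content.encode()).hexdigest()


def candidate_relation_ids(src, tgt):
    """Return the relation IDs for both orderings of an edge, sorted order first."""
    first, second = sorted([src, tgt])
    return [
        mdhash_id(first + second, prefix="rel-"),
        mdhash_id(second + first, prefix="rel-"),  # legacy reverse-order ID
    ]


def is_orphan(src, tgt, known_vdb_ids):
    """Treat an edge as an orphan only if neither permutation exists in the VDB."""
    return not any(rid in known_vdb_ids for rid in candidate_relation_ids(src, tgt))


# A relation stored only under the reverse-order hash is still recognised.
legacy_only = {mdhash_id("B" + "A", prefix="rel-")}
```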
Description
Fixes the edge-count drift bug reported in #2917, where the graph-edge count drifts above the VDB-relation count after merge or delete operations.
Root Cause
In `_merge_entities_impl`, edges are written to the graph (step 6) and then embedded into the relationships VDB one by one (step 7). If the VDB upsert fails partway through (e.g. embedder crash, context-length exceeded with a high-degree hub entity, network timeout), the graph already holds the new edges but the VDB does not. The subsequent `DETACH DELETE` of source entities (step 10) then removes the original edges, leaving the new edges as orphans with no VDB counterpart. Over time this causes visible drift (reported: 372 orphan edges on a live instance).
Changes Made
lightrag/utils_graph.py
- Wrap each `relationships_vdb.upsert()` call in `try/except` inside `_merge_entities_impl`
- Collect failed edges in `vdb_failed_edges`; after the loop call `remove_edges(vdb_failed_edges)` to roll back the matching graph writes, ensuring graph and VDB are always in sync (either both have the edge, or neither does)
- Fix typo `"updatign"` → `"updating"` in log message
- Add `check_graph_consistency(graph, vdb)`: compares all graph edges against VDB entries via `get_all_edges()` + `get_by_ids()`, returns `{orphan_graph_edges, total_graph_edges, total_vdb_relations}`; handles PostgreSQL/AGE double-quoted `entity_id` values transparently
- Add `repair_graph_consistency(graph, vdb, *, dry_run=False)`: calls the check, then removes orphans via `remove_edges`; `dry_run=True` for safe inspection before committing changes
lightrag/lightrag.py
- Expose `rag.acheck_graph_consistency()`, `rag.arepair_graph_consistency(dry_run=True)`, and their synchronous wrappers on the `LightRAG` class
tests/test_graph_consistency.py (new file)
- 8 unit tests covering the new code paths, including `_merge_entities_impl`
Related Issues
Closes #2917
Usage (for existing drift)
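A minimal sketch of an inspect-first workflow, assuming only the method names, the `dry_run` flag, and the check's result keys listed in this description. `FakeRag` is a hypothetical stand-in so the snippet runs without a LightRAG instance, `repair_drift` is an illustrative helper name, and nothing is assumed about what the repair call returns.

```python
import asyncio


class FakeRag:
    """Hypothetical stand-in exposing the two async methods named in this PR."""

    def __init__(self, orphans):
        self.orphans = list(orphans)
        self.repair_calls = []  # records the dry_run value of each repair call

    async def acheck_graph_consistency(self):
        return {
            "orphan_graph_edges": list(self.orphans),
            "total_graph_edges": 10,
            "total_vdb_relations": 10 - len(self.orphans),
        }

    async def arepair_graph_consistency(self, dry_run=False):
        self.repair_calls.append(dry_run)
        if not dry_run:
            self.orphans.clear()


async def repair_drift(rag, apply=False):
    """Check first, run a dry-run repair, and only then optionally apply it."""
    report = await rag.acheck_graph_consistency()
    if not report["orphan_graph_edges"]:
        return report  # nothing to repair
    await rag.arepair_graph_consistency(dry_run=True)
    if apply:
        await rag.arepair_graph_consistency(dry_run=False)
    return report


rag = FakeRag(orphans=[("A", "B")])
report = asyncio.run(repair_drift(rag, apply=True))
```

With a real instance, `rag` would be an initialized `LightRAG` object, and the dry-run output would be reviewed before passing `apply=True`.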
Checklist
- … (`pre-commit run --all-files`)
- … (`LightRAG` method docstrings with usage examples)