fix(memgraph): prevent Cypher injection via workspace label #3022
Open
sebastiondev wants to merge 1 commit into HKUDS:main from
Summary
Fix a Cypher injection vector in MemgraphStorage._get_workspace_label() (CWE-89 / CWE-943). The workspace value is interpolated directly into Cypher query strings, both as a label identifier and, in one location, as a single-quoted string literal. The previous sanitizer only doubled backticks, which is insufficient once the value is also embedded inside a Cypher string literal.

Affected file: lightrag/kg/memgraph_impl.py
Affected function: MemgraphStorage._get_workspace_label() and the BFS query in get_knowledge_graph() (around line 1096)

The bug
_get_workspace_label() previously returned workspace.replace("`", "``"). That is reasonable when the value is only used as a backtick-quoted label, e.g. MATCH (n:`{label}`). However, the same value is also interpolated into a Cypher single-quoted string literal in get_knowledge_graph(), where it appears as '{workspace_label}'. A workspace value such as x' OR true OR ' contains no backticks, so the old escape passes it through unchanged. Once interpolated, its first quote closes the string literal early and the rest of the value is parsed as attacker-controlled Cypher.

Fix
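The breakout described above can be reproduced without a database. This is a minimal sketch under stated assumptions: old_workspace_label() mimics the previous backtick-doubling escape, and build_bfs_filter() is a hypothetical, heavily simplified stand-in for the f-string in get_knowledge_graph(), not the real query text.

```python
# Hypothetical reproduction of the bug; function names and query text are illustrative only.


def old_workspace_label(workspace: str) -> str:
    # Previous behaviour: backticks are doubled, quote characters are left untouched.
    return workspace.replace("`", "``")


def build_bfs_filter(workspace_label: str) -> str:
    # Simplified stand-in for the f-string that embeds the label
    # inside a single-quoted Cypher string literal.
    return f"WHERE '{workspace_label}' IN labels(n)"


payload = "x' OR true OR '"
label = old_workspace_label(payload)

# The payload contains no backticks, so the old escape returns it unchanged.
print(label == payload)
# The first quote in the payload closes the literal early; "OR true OR" is now Cypher syntax.
print(build_bfs_filter(label))
```

The second print emits WHERE 'x' OR true OR '' IN labels(n): the quoted literal is just 'x', and everything after it is query structure chosen by whoever controls the workspace value.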
Two changes, both narrowly scoped to lightrag/kg/memgraph_impl.py:

1. Restrict _get_workspace_label() to only allow [A-Za-z0-9_]. Cypher labels cannot be parameterized, so a strict allowlist is the correct primitive here. Any other character is replaced with _, and an empty result falls back to "base".
2. Bind the string-literal occurrence in get_knowledge_graph() as a query parameter ($workspace_label). Even with the strict sanitizer in place, parameterizing the literal removes a whole class of bug rather than relying on a single chokepoint.

The label-position interpolations (e.g. (n:`{workspace_label}`)) remain identifiers, since Cypher does not allow parameterizing labels; the new allowlist makes that safe.

Tests
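A minimal sketch of the stricter sanitizer contract and the kind of assertions that cover it. sanitize_workspace_label() is a hypothetical free function standing in for MemgraphStorage._get_workspace_label(); the regex-based implementation and the handling of None are assumptions, not the PR's actual code.

```python
import re
from typing import Optional

# Matches any character outside the Cypher-label allowlist [A-Za-z0-9_].
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]")


def sanitize_workspace_label(workspace: Optional[str]) -> str:
    # Assumption: None is treated the same as an empty string.
    label = _DISALLOWED.sub("_", workspace or "")
    # An empty result falls back to "base".
    return label or "base"


def test_disallowed_characters_become_underscores() -> None:
    # Quotes and spaces are replaced, not escaped.
    assert sanitize_workspace_label("x' OR true OR '") == "x__OR_true_OR__"
    # Backticks are replaced too, instead of being doubled.
    assert sanitize_workspace_label("a`b") == "a_b"


def test_empty_input_falls_back_to_base() -> None:
    assert sanitize_workspace_label("") == "base"
    assert sanitize_workspace_label(None) == "base"


def test_valid_names_are_unchanged() -> None:
    assert sanitize_workspace_label("tenant_42") == "tenant_42"
```

These functions run under pytest as-is. Replacing characters rather than rejecting the input keeps existing callers working, at the cost that two distinct raw workspace names (say a-b and a.b) map to the same label.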
Updated tests/test_workspace_sanitization.py to reflect the stricter contract: disallowed characters are now replaced with _ (not just escaped), and the fallback cases now expect "base".

All tests/test_memgraph_storage.py and tests/test_workspace_sanitization.py cases pass locally.

Security analysis
self.workspace (set from the MEMGRAPH_WORKSPACE environment variable, the LIGHTRAG-WORKSPACE HTTP header, or a direct constructor argument) flows into _get_workspace_label() and from there into Cypher query strings built with f-strings.

A payload such as x' OR true OR ' injects into the '{workspace_label}' literal in get_knowledge_graph() and yields attacker-controlled Cypher. The previous backtick-doubling sanitizer does not touch quote characters.

Adversarial review
Before submitting, we tried to disprove this. The strongest counter-argument is that lightrag_server.get_workspace_from_request() already applies a similar sanitizer to the LIGHTRAG-WORKSPACE header, so over the HTTP API the dangerous characters never reach MemgraphStorage. That lowers practical severity for typical API deployments. However, the storage class is also reachable via (a) the MEMGRAPH_WORKSPACE environment variable, (b) direct library usage where embedders pass an untrusted workspace argument, and (c) any future code path that constructs a MemgraphStorage without going through the server's header sanitizer. A storage layer should not rely on a single upstream caller for query-construction safety, especially when the correct primitive (allowlist plus parameter binding) is small and local.

Scope
Two files changed:

- lightrag/kg/memgraph_impl.py: sanitizer plus one parameterization (~10 net lines)
- tests/test_workspace_sanitization.py: updated expectations for the stricter contract

No behavior change for valid workspace names (alphanumerics and underscore, which matches what the API layer already accepts).
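Taken together, the two changes in the Fix section amount to: allowlist the value once, interpolate it only in label position, and bind it everywhere it used to be a string literal. The sketch below illustrates that split. build_workspace_bfs_query() is hypothetical, and its MATCH pattern is a much-simplified stand-in for the real BFS query in get_knowledge_graph(); only the label-versus-parameter handling reflects this PR.

```python
import re
from typing import Any, Dict, Tuple


def build_workspace_bfs_query(workspace: str) -> Tuple[str, Dict[str, Any]]:
    # Allowlist first: a label can only be interpolated, never bound as a parameter.
    label = re.sub(r"[^A-Za-z0-9_]", "_", workspace) or "base"
    query = (
        # Label position: an identifier, safe once restricted to [A-Za-z0-9_].
        f"MATCH path = (start:`{label}`)-[*1..3]-(end) "
        # Former string literal: now the bound parameter $workspace_label.
        "WHERE ALL(n IN nodes(path) WHERE $workspace_label IN labels(n)) "
        "RETURN path"
    )
    return query, {"workspace_label": label}


query, params = build_workspace_bfs_query("x' OR true OR '")
print(query)
print(params)
```

The params dict travels to the database separately from the query text (with the neo4j Python driver, for example, as session.run(query, params)), so even a value that somehow slipped past the allowlist could not change the query's structure through that position.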
cc @lewiswigmore