fix(memgraph): prevent Cypher injection via workspace label by sebastiondev · Pull Request #3022 · HKUDS/LightRAG

sebastiondev · 2026-05-06T02:03:55Z

Summary

Fix a Cypher injection vector in MemgraphStorage._get_workspace_label() (CWE-89 / CWE-943). The workspace value is interpolated directly into Cypher query strings as both a label identifier and — in one location — a single-quoted string literal. The previous sanitizer only doubled backticks, which is insufficient when the value is also embedded inside a Cypher string literal.

Affected file: lightrag/kg/memgraph_impl.py
Affected function: MemgraphStorage._get_workspace_label() and the BFS query in get_knowledge_graph() (around line 1096)

The bug

_get_workspace_label() previously returned workspace.replace("`", ""). That is reasonable when the value is only used as a backtick-quoted label, e.g. MATCH (n:{label})``. However, the same value is also interpolated into a Cypher single-quoted string literal in get_knowledge_graph():

WHERE path IS NULL OR ALL(n IN nodes(path) WHERE '{workspace_label}' IN labels(n))

A workspace value such as x' OR true OR ' contains no backticks, so the old escape passes it through unchanged. Once interpolated, it breaks out of the string literal and becomes attacker-controlled Cypher.

Fix

Two changes, both narrowly scoped to lightrag/kg/memgraph_impl.py:

Tighten _get_workspace_label() to only allow [A-Za-z0-9_]. Cypher labels cannot be parameterized, so a strict allowlist is the correct primitive here. Anything else is replaced with _, and an empty result falls back to "base".
Convert the string-literal interpolation in get_knowledge_graph() to a query parameter ($workspace_label). Even with the strict sanitizer in place, parameterizing the literal removes a class of bug rather than relying on a single chokepoint.

The label-position interpolations (e.g. (n:`{workspace_label}`)) remain as identifiers since Cypher does not allow parameterizing labels — the new allowlist makes that safe.

Tests

Updated tests/test_workspace_sanitization.py to reflect the stricter contract:

Backticks, quotes, semicolons, and other Cypher metacharacters are replaced with _ (not just escaped).
Empty/whitespace workspaces still fall back to "base".
A workspace consisting entirely of disallowed characters also falls back to "base".
Round-trip checks against the BFS query confirm the value can no longer terminate a string literal.

All tests/test_memgraph_storage.py and tests/test_workspace_sanitization.py cases pass locally.

Security analysis

Source → sink: self.workspace (set from MEMGRAPH_WORKSPACE env var, the LIGHTRAG-WORKSPACE HTTP header, or a direct constructor argument) flows into _get_workspace_label() and from there into Cypher query strings built via f-strings.
Pre-fix exploit: workspace value x' OR true OR ' (or similar) injects into the '{workspace_label}' literal in get_knowledge_graph() and yields attacker-controlled Cypher. The previous backtick-doubling sanitizer does not affect quote characters.
Post-fix: the allowlist makes injection structurally impossible for label interpolations, and the parameterized literal removes the string-literal sink entirely. The change does not depend on upstream callers for query-construction safety.

Adversarial review

Before submitting, we tried to disprove this. The strongest counter-argument is that lightrag_server.get_workspace_from_request() already applies a similar sanitizer to the LIGHTRAG-WORKSPACE header, so over the HTTP API the dangerous characters never reach MemgraphStorage — that lowers practical severity for typical API deployments. However, the storage class is also reachable via (a) the MEMGRAPH_WORKSPACE environment variable, (b) direct library usage where embedders pass an untrusted workspace argument, and (c) any future code path that constructs a MemgraphStorage without going through the server's header sanitizer. A storage layer should not rely on a single upstream caller for query-construction safety, especially when the correct primitive (allowlist + parameter binding) is small and local.

Scope

Two files changed:

lightrag/kg/memgraph_impl.py — sanitizer + one parameterization (~10 net lines)
tests/test_workspace_sanitization.py — updated expectations for the stricter contract

No behavior change for valid workspace names (alphanumerics + underscore, which matches what the API layer already accepts).

cc @lewiswigmore

fix: prevent Memgraph workspace Cypher injection

3570fa4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memgraph): prevent Cypher injection via workspace label#3022

fix(memgraph): prevent Cypher injection via workspace label#3022
sebastiondev wants to merge 1 commit intoHKUDS:mainfrom
sebastiondev:fix/cwe89-memgraph-impl-cypher-a63f

sebastiondev commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sebastiondev commented May 6, 2026

Summary

The bug

Fix

Tests

Security analysis

Adversarial review

Scope

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant