Skip to content

fix(memgraph): prevent Cypher injection via workspace label#3022

Open
sebastiondev wants to merge 1 commit intoHKUDS:mainfrom
sebastiondev:fix/cwe89-memgraph-impl-cypher-a63f
Open

fix(memgraph): prevent Cypher injection via workspace label#3022
sebastiondev wants to merge 1 commit intoHKUDS:mainfrom
sebastiondev:fix/cwe89-memgraph-impl-cypher-a63f

Conversation

@sebastiondev
Copy link
Copy Markdown
Contributor

Summary

Fix a Cypher injection vector in MemgraphStorage._get_workspace_label() (CWE-89 / CWE-943). The workspace value is interpolated directly into Cypher query strings as both a label identifier and — in one location — a single-quoted string literal. The previous sanitizer only doubled backticks, which is insufficient when the value is also embedded inside a Cypher string literal.

Affected file: lightrag/kg/memgraph_impl.py
Affected function: MemgraphStorage._get_workspace_label() and the BFS query in get_knowledge_graph() (around line 1096)

The bug

_get_workspace_label() previously returned workspace.replace("`", ""). That is reasonable when the value is only used as a backtick-quoted label, e.g. MATCH (n:{label})``. However, the same value is also interpolated into a Cypher single-quoted string literal in get_knowledge_graph():

WHERE path IS NULL OR ALL(n IN nodes(path) WHERE '{workspace_label}' IN labels(n))

A workspace value such as x' OR true OR ' contains no backticks, so the old escape passes it through unchanged. Once interpolated, it breaks out of the string literal and becomes attacker-controlled Cypher.

Fix

Two changes, both narrowly scoped to lightrag/kg/memgraph_impl.py:

  1. Tighten _get_workspace_label() to only allow [A-Za-z0-9_]. Cypher labels cannot be parameterized, so a strict allowlist is the correct primitive here. Anything else is replaced with _, and an empty result falls back to "base".
  2. Convert the string-literal interpolation in get_knowledge_graph() to a query parameter ($workspace_label). Even with the strict sanitizer in place, parameterizing the literal removes a class of bug rather than relying on a single chokepoint.

The label-position interpolations (e.g. (n:`{workspace_label}`)) remain as identifiers since Cypher does not allow parameterizing labels — the new allowlist makes that safe.

Tests

Updated tests/test_workspace_sanitization.py to reflect the stricter contract:

  • Backticks, quotes, semicolons, and other Cypher metacharacters are replaced with _ (not just escaped).
  • Empty/whitespace workspaces still fall back to "base".
  • A workspace consisting entirely of disallowed characters also falls back to "base".
  • Round-trip checks against the BFS query confirm the value can no longer terminate a string literal.

All tests/test_memgraph_storage.py and tests/test_workspace_sanitization.py cases pass locally.

Security analysis

  • Source → sink: self.workspace (set from MEMGRAPH_WORKSPACE env var, the LIGHTRAG-WORKSPACE HTTP header, or a direct constructor argument) flows into _get_workspace_label() and from there into Cypher query strings built via f-strings.
  • Pre-fix exploit: workspace value x' OR true OR ' (or similar) injects into the '{workspace_label}' literal in get_knowledge_graph() and yields attacker-controlled Cypher. The previous backtick-doubling sanitizer does not affect quote characters.
  • Post-fix: the allowlist makes injection structurally impossible for label interpolations, and the parameterized literal removes the string-literal sink entirely. The change does not depend on upstream callers for query-construction safety.

Adversarial review

Before submitting, we tried to disprove this. The strongest counter-argument is that lightrag_server.get_workspace_from_request() already applies a similar sanitizer to the LIGHTRAG-WORKSPACE header, so over the HTTP API the dangerous characters never reach MemgraphStorage — that lowers practical severity for typical API deployments. However, the storage class is also reachable via (a) the MEMGRAPH_WORKSPACE environment variable, (b) direct library usage where embedders pass an untrusted workspace argument, and (c) any future code path that constructs a MemgraphStorage without going through the server's header sanitizer. A storage layer should not rely on a single upstream caller for query-construction safety, especially when the correct primitive (allowlist + parameter binding) is small and local.

Scope

Two files changed:

  • lightrag/kg/memgraph_impl.py — sanitizer + one parameterization (~10 net lines)
  • tests/test_workspace_sanitization.py — updated expectations for the stricter contract

No behavior change for valid workspace names (alphanumerics + underscore, which matches what the API layer already accepts).

cc @lewiswigmore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant