
Feature/workspace isolation #3011

Open
disillusioners wants to merge 32 commits into HKUDS:main from disillusioners:feature/workspace-isolation


Description

Adds workspace-based data isolation across all 13 LightRAG storage backends, enabling safe multi-tenant deployments where each LightRAG instance operates in its own isolated data space.

Every LightRAG instance can now be assigned an immutable workspace identifier. All data — entities, relations, documents, indexes — is namespaced under that workspace, preventing cross-tenant data access or collision.

Supports all 13 storage backends:

| Category | Backends |
| --- | --- |
| Graph | NetworkX, Neo4j, Memgraph |
| Vector | FAISS, NanoVectorDB, Milvus, Qdrant |
| KV / Doc | PostgreSQL, MongoDB, Redis, OpenSearch, JSON KV, JSON DocStatus |

Isolation strategy varies by storage type:

  • Shared storage backends (Neo4j, Redis, PostgreSQL, etc.): use a {workspace}:{namespace} prefix on entity/relation identifiers, so each workspace's data is isolated at the key/ID level within the same physical store.
  • File-based backends (NetworkX, NanoVectorDB, FAISS, JSON KV, JSON DocStatus): use directory-based isolation via the self.workspace path, so each workspace gets its own data directory.
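
The two strategies above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names `namespaced_key` and `workspace_dir`, the fixed `:` separator, and the handling of an empty workspace are all assumptions made for the example.

```python
from pathlib import Path

# Assumed separator; the PR notes that some backends use "_" instead of ":".
SEPARATOR = ":"


def namespaced_key(workspace: str, namespace: str, key: str) -> str:
    """Build a '{workspace}:{namespace}:{key}' identifier for a shared store."""
    if not workspace:
        # Assumption: an empty workspace keeps an un-prefixed layout.
        return f"{namespace}{SEPARATOR}{key}"
    return f"{workspace}{SEPARATOR}{namespace}{SEPARATOR}{key}"


def workspace_dir(working_dir: str, workspace: str) -> Path:
    """Return the data directory a file-based backend would use."""
    base = Path(working_dir)
    return base / workspace if workspace else base


print(namespaced_key("tenant-a", "entities", "e1"))
print(workspace_dir("./data", "tenant-a"))
```

Both helpers are pure functions of the workspace name, so two tenants sharing a `working_dir` or a database never produce the same key or path as long as their workspace names differ.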

Workspace lifecycle:

  • Set at LightRAG(working_dir=..., workspace="tenant-a") construction time
  • Immutable after creation — cannot be changed on an existing instance
  • Strong input sanitization: path traversal prevention, character whitelist (a-z, 0-9, -, _), length limits
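
A whitelist check along the lines described above might look like the sketch below. The function name `validate_workspace`, the 64-character limit, and the choice to reject rather than rewrite bad input are assumptions for illustration; the PR does not state the actual limit or behaviour.

```python
import re

# Assumed limit; the PR mentions length limits but not the value.
MAX_WORKSPACE_LENGTH = 64

# Whitelist from the description: a-z, 0-9, "-" and "_".
_ALLOWED = re.compile(r"[a-z0-9_-]+")


def validate_workspace(name: str) -> str:
    """Return `name` unchanged if it is acceptable, otherwise raise ValueError."""
    if name == "":
        # Assumption: the empty string selects the default workspace.
        return name
    if len(name) > MAX_WORKSPACE_LENGTH:
        raise ValueError("workspace name is too long")
    if not _ALLOWED.fullmatch(name):
        # "/" and "." are outside the whitelist, so "../x" style traversal fails here.
        raise ValueError("workspace name contains characters outside a-z, 0-9, -, _")
    return name
```

Because the whitelist excludes path separators and dots, path traversal is rejected by the same check that rejects other special characters; no separate traversal rule is needed in this sketch.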

Administrators of server-based backends can leverage existing environment variable controls (e.g., WORKSPACE_ISOLATION, {BACKEND}_WORKSPACE) alongside this feature.

Related Issues

Changes Made

  • Added workspace parameter to LightRAG constructor for multi-tenant data isolation
  • Implemented workspace-based key namespacing ({workspace}:{namespace}) for shared storage backends (Neo4j, Memgraph, PostgreSQL, MongoDB, Redis, OpenSearch)
  • Implemented directory-based isolation for file-based backends (NetworkX, NanoVectorDB, FAISS, JSON KV, JSON DocStatus)
  • Added workspace input sanitization (path traversal prevention, character whitelist, length limits)
  • Ensured workspace feature works with existing server-backend environment variable controls (WORKSPACE_ISOLATION, {BACKEND}_WORKSPACE)
  • Ensured full backward compatibility — existing code without workspace continues to work identically

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

Test coverage: 3 test files, 1,653 lines total:

| File | Focus | Scenarios |
| --- | --- | --- |
| test_workspace_isolation.py | End-to-end isolation across backends | 11 scenarios |
| test_workspace_migration_isolation.py | PostgreSQL migration under isolation | Migration-specific |
| test_workspace_sanitization.py | Cypher injection & input sanitization | Security edge cases |

Backward compatibility: Fully backward compatible. Internal separator differences between backends (: vs _) and empty-workspace normalization are preserved for compatibility.

Usage example:

```python
# Each tenant gets an isolated LightRAG instance
rag_tenant_a = LightRAG(working_dir="./data", workspace="tenant-a")
rag_tenant_b = LightRAG(working_dir="./data", workspace="tenant-b")

# Data is fully isolated: no cross-contamination
rag_tenant_a.insert("Alice works at Acme Corp")
rag_tenant_b.insert("Bob works at Tech Inc")

# Queries return only the tenant's own data
rag_tenant_a.query("Who works where?")  # → Alice at Acme
rag_tenant_b.query("Who works where?")  # → Bob at Tech
```

…isolation

Phase 1:
- Add WorkspaceManager class with LRU cache (max 10 instances), reference counting,
  per-workspace async locking, and safe eviction
- Add WorkspaceCapacityError for capacity overflow
- Add sanitize_workspace_name() utility in api/utils.py
- Add comprehensive unit tests (26 tests)
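
The Phase 1 behaviour (LRU ordering, reference counting, and refusing to evict instances that are in use) can be sketched roughly as below. This is a simplified illustration and not the PR's WorkspaceManager: the class name `WorkspaceCacheSketch`, the `acquire`/`release` method names, and the synchronous `factory` are hypothetical, and a single lock stands in for the per-workspace locking the commit describes.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Callable


class WorkspaceCapacityError(RuntimeError):
    """Stand-in for the PR's error: cache full and every instance still in use."""


class WorkspaceCacheSketch:
    def __init__(self, factory: Callable[[str], Any], max_instances: int = 10):
        self._factory = factory
        self._max = max_instances
        # name -> [instance, ref_count]; least recently used entry comes first.
        self._entries = OrderedDict()
        self._lock = asyncio.Lock()

    async def acquire(self, workspace: str) -> Any:
        """Return the instance for `workspace`, creating it if needed."""
        async with self._lock:
            entry = self._entries.get(workspace)
            if entry is None:
                if len(self._entries) >= self._max:
                    self._evict_one()
                entry = [self._factory(workspace), 0]
                self._entries[workspace] = entry
            entry[1] += 1
            self._entries.move_to_end(workspace)  # mark as most recently used
            return entry[0]

    async def release(self, workspace: str) -> None:
        """Drop one reference; the instance stays cached until evicted."""
        async with self._lock:
            self._entries[workspace][1] -= 1

    def _evict_one(self) -> None:
        # Scan from least to most recently used and drop the first idle entry.
        for name, (_, refs) in self._entries.items():
            if refs == 0:
                del self._entries[name]
                return
        raise WorkspaceCapacityError("all cached workspaces are in use")
```

A caller would pair every `acquire` with a `release` in a `try/finally` block so that a failed request does not pin its workspace in the cache.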

Phase 2:
- Create factory callable in lightrag_server.py capturing all 25 LightRAG constructor args
- Replace single rag instance with WorkspaceManager
- Add FastAPI lifespan handler for startup pre-warm and shutdown cleanup
- Update route factory signatures to accept workspace_mgr
- Update /health endpoint to use WorkspaceManager with try/finally release
- Reduce Neo4j default connection pool from 100 to 10
- Audit _default_workspace usage (verified safe, documented)
…e reporting

- Wire sanitize_workspace_name() into get_workspace_from_request() (C2)
- Add defensive sanitization call in WorkspaceManager.get_or_create()
- Move _finalize_instance() outside global lock in _evict_one() and shutdown() (C3)
- Document pre-warm ref_count=1 design choice (W3)
- Fix /health endpoint to report actual queried workspace (W4)

Add comprehensive integration tests for workspace isolation at the HTTP API
layer using httpx.AsyncClient with ASGITransport. Tests verify:

- Header-based workspace extraction from LIGHTRAG-WORKSPACE header
- Default workspace fallback (empty string) when no header present
- Workspace name validation (special chars, path traversal, length)
- Concurrent request isolation across different workspaces
- Background task pattern with proper ref count management
- Streaming response pattern with ref held during stream
- Capacity limit enforcement returning HTTP 503
- LRU eviction under concurrent load
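
The "ref held during stream" pattern from the list above can be illustrated with an async generator that releases its reference in a `finally` block. This is a sketch under assumptions: `stream_with_release` and the `release` callback are hypothetical names, and the PR's actual streaming code is not shown in this description.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_with_release(
    chunks: AsyncIterator[str],
    release: Callable[[], Awaitable[None]],
) -> AsyncIterator[str]:
    """Yield chunks while holding a workspace reference, then release it."""
    try:
        async for chunk in chunks:
            yield chunk
    finally:
        # Runs when the stream finishes, raises, or is closed early by the
        # consumer, so the reference count cannot leak.
        await release()
```

Releasing inside the generator, rather than in the route handler, matters because the handler returns before the response body has been fully sent.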

Also update conftest.py to allow @pytest.mark.offline tests in
tests/integration/ to run without --run-integration flag.
…eError

- memgraph_impl.py: initialize memgraph_workspace and original_workspace to None before conditional block
- neo4j_impl.py: initialize original_workspace to None before conditional block (neo4j_workspace was already fixed)

These variables were only assigned inside conditional blocks but referenced
unconditionally in logging statements, causing NameError when WORKSPACE_ISOLATION=true.
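
The shape of this fix, reduced to a standalone sketch, is shown below. The function `resolve_workspace` and its parameters are hypothetical and do not reproduce the logic in neo4j_impl.py or memgraph_impl.py; only the initialize-before-the-conditional pattern is taken from the commit message.

```python
import logging
from typing import Optional

logger = logging.getLogger("workspace_demo")


def resolve_workspace(configured: str, override: Optional[str]) -> str:
    # Initialise both names up front so the logging call below can always
    # read them, whichever branch runs.
    original_workspace = None
    backend_workspace = None

    if override:
        original_workspace = configured
        backend_workspace = override

    effective = backend_workspace or configured
    # Without the two assignments above, this line raises NameError
    # whenever the `if` branch is skipped.
    logger.info("workspace=%r (original=%r)", effective, original_workspace)
    return effective
```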
- Add WorkspaceSelector dropdown component with auto-refresh
- Add currentWorkspace state to settings store with v20 migration
- Add Workspace type and getWorkspaces() API function
- Inject LIGHTRAG-WORKSPACE header conditionally in axios interceptor and streaming fetch
- Integrate selector into SiteHeader with proper separator handling
- Add workspace i18n keys to English locale

- C1: Sanitize LIGHTRAG-WORKSPACE header to prevent CRLF injection
- C2: Add malformed response guard in getWorkspaces()
- W3: Reset stale workspace selection when workspace removed server-side

- Add workspace API tests (sanitizeHeader, getWorkspaces, header injection)
- Add WorkspaceSelector component logic tests (fetch, stale detection, change handling)
- Add settings store migration tests
- Export sanitizeHeader and axiosInstance for testability
- Add testing dependencies: @testing-library/react, @testing-library/jest-dom,
  @playwright/test, happy-dom, playwright
- Remove folder icon from workspace selector dropdown
- Add tooltip on hover showing "Workspace" label
- Add spacing between LightRAG title and workspace selector
- Add useWorkspaceChange hook that monitors workspace changes and clears state
- Documents: clear and re-fetch document list on workspace change
- Knowledge Graph: reset graph state including isFetching flag
- Retrieval: clear query messages and history on workspace change
- Add workspaceRefreshTrigger signal in settings store
- API tab confirmed workspace-agnostic (no changes needed)

Files modified:
- src/stores/settings.ts (workspaceRefreshTrigger + triggerWorkspaceRefresh)
- src/stores/graph.ts (isFetching: false in reset)
- src/hooks/useWorkspaceChange.ts (new)
- src/App.tsx (useWorkspaceChange hook)
- src/features/DocumentManager.tsx (workspace refresh handling)
- src/features/RetrievalTesting.tsx (clear messages on workspace change)
- Add partialize to settings persist config to exclude trigger counters
  from localStorage, preventing stale refresh on page reload
- Move graphDataFetchAttempted/labelsFetchAttempted resets and
  incrementGraphDataVersion into graph.reset() for completeness
- Remove now-redundant manual calls from useWorkspaceChange hook
… functionality

Add comprehensive tests for workspace isolation features including:
- workspaceRefreshTrigger state and triggerWorkspaceRefresh() in settings store
- searchLabelDropdownRefreshTrigger state and triggerSearchLabelDropdownRefresh() in settings store
- useWorkspaceChange hook behavior
- graph store workspace isolation
…ration

Include /workspaces in the VITE_API_ENDPOINTS environment variable to ensure the development server correctly proxies workspace-related API requests.

The root cause was state.reset() being called inside the fetch completion
handler (useLightragGraph.tsx line 377). reset() sets graphDataFetchAttempted
to false, which re-triggers the fetch useEffect that checks that flag.

The fix replaces state.reset() with targeted clears that preserve the
fetch attempt flags (graphDataFetchAttempted, labelsFetchAttempted),
preventing the fetch useEffect from re-triggering after a successful fetch.

Fetch flags are only reset by the workspace change handler (useWorkspaceChange),
which is the correct place for full state reset.

The previous fix for the infinite loop (commit 3cc3613) prevented
state.reset() from being called in the fetch completion handler. But
this broke workspace switching: after calling reset(), the fetch
useEffect never re-fired because none of its React dependencies
actually changed.

Root cause: Two issues after workspace change:
1. graphDataVersion was not incremented, so the fetch useEffect's
   dependency array didn't change (isFetching was already false)
2. queryLabel stayed empty ('') because the previous fetch handler
   cleared it when graph data was empty. The emptyDataHandledRef guard
   then blocked re-fetching.

Fix: In useWorkspaceChange, after calling reset():
- Call incrementGraphDataVersion() to trigger the fetch useEffect
- Call setQueryLabel(defaultQueryLabel) to restore '*' so the fetch
  path is entered (avoids emptyDataHandledRef guard)

Verified with Playwright E2E:
- Initial load: 1 /graphs call
- Switch workspace: 1 /graphs call (was 0 before fix)
- Switch back: 1 /graphs call (was 0 before fix)
- No infinite loop: 0 calls during 15s watch periods
- All 86 unit tests pass

The workspace change useEffect was calling fetchPopularLabels() without await,
causing bumpDropdownData() to trigger AsyncSelect remount BEFORE the popular
labels were fetched and stored in SearchHistoryManager. This resulted in the
combobox reading stale/empty data.

Fixed by awaiting the fetchPopularLabels() call before triggering the dropdown
refresh, ensuring SearchHistoryManager is populated before the component remounts
and re-reads the data.
…CACHE_LIMIT env var

The LRU cache limit for workspace RAG instances was hardcoded to 10.
Now configurable via LIGHTRAG_WORKSPACE_CACHE_LIMIT environment variable.
Defaults to 10. Invalid/non-numeric/negative values fall back to 10.
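
The fallback rule described above can be sketched as a small parser. The function name `read_cache_limit` is hypothetical, and treating `0` as invalid is an assumption, since the commit message only mentions non-numeric and negative values.

```python
import os
from typing import Mapping, Optional

DEFAULT_CACHE_LIMIT = 10


def read_cache_limit(env: Optional[Mapping[str, str]] = None) -> int:
    """Read LIGHTRAG_WORKSPACE_CACHE_LIMIT, falling back to 10 on bad input."""
    env = os.environ if env is None else env
    raw = env.get("LIGHTRAG_WORKSPACE_CACHE_LIMIT", "")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Unset, empty, or non-numeric value.
        return DEFAULT_CACHE_LIMIT
    # Assumption: zero is rejected along with negative values.
    return value if value > 0 else DEFAULT_CACHE_LIMIT
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.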