36 changes: 36 additions & 0 deletions CONTEXT.md
@@ -0,0 +1,36 @@
# LightRAG Domain Glossary

## Knowledge Graph
The shared corpus of entities and relationships extracted from ingested documents.
A single Knowledge Graph is shared across all users in a team deployment.
Distinct from per-user workspaces, which are not used in this project.

## Query
An operation that retrieves context from the Knowledge Graph and synthesises a
natural-language answer using an LLM. The primary MCP tool surface for callers.

## Retrieve
A diagnostic operation that returns raw entities, relations, and chunks from the
Knowledge Graph without LLM synthesis. Intended for developers and power users
inspecting retrieval quality, not for routine AI-agent use.

## MCP Server
The adapter between MCP-compatible clients (Claude Desktop, Cursor, etc.) and the
LightRAG REST API. Exposes `query` and `retrieve` as MCP tools. Read-only: no
document insertion or deletion is exposed through this layer.

## Retrieval Mode
The algorithm used to fetch context from the Knowledge Graph (local, global,
hybrid, naive, mix). Configured server-side via `LIGHTRAG_QUERY_MODE`; not
exposed as a caller-controlled parameter in the MCP layer.

## QueryParam
A pure value object that carries all parameters for a single Knowledge Graph
Query. Has no environment reads at definition time — all numeric defaults are
compile-time constants from `lightrag/constants.py`.

## ConfigResolver
The single place where environment variables are read and converted into a
`QueryParam`. Tests inject a plain `dict`; production code uses `os.environ`
via the `default_query_param()` convenience function. Lives in
`lightrag/query_config.py`.
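
A minimal usage sketch of the split, assuming a `resolve()` method on
`ConfigResolver` (the real API lives in `lightrag/query_config.py` and may
name things differently):

```python
from lightrag.query_config import ConfigResolver, default_query_param

# Production: reads os.environ at call time, never at import time.
param = default_query_param()

# Tests: inject a plain dict; no monkeypatching of os.environ required.
param = ConfigResolver({"TOP_K": "5", "RERANK_BY_DEFAULT": "false"}).resolve()
assert param.top_k == 5
assert param.enable_rerank is False
```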
54 changes: 54 additions & 0 deletions docs/adr/0001-mcp-server-thin-adapter.md
@@ -0,0 +1,54 @@
# ADR-0001: MCP Server is a thin adapter; query caching deferred

**Status:** Accepted
**Date:** 2026-05-07

## Context

The MCP Server (`lightrag/mcp_server.py`) wraps the LightRAG REST API as two
MCP tools (`query`, `retrieve`). After a grilling session on its design, three
options for deepening it were considered:

1. **Startup warm-up** — verify the LightRAG server is reachable before
accepting MCP connections.
2. **In-process query cache** — skip the HTTP round-trip for repeated identical
questions within a session.
3. **Accept it as a thin adapter** — record the decision so future architecture
reviews don't re-open it without new evidence.

## Decision

The MCP Server is intentionally a **thin adapter** between the MCP protocol and
the LightRAG REST API. It owns:

- HTTP client lifecycle and authentication
- Error translation (transport/5xx → `McpError`; 4xx → string result; sketched below)
- Reference formatting (`[title](url)` with fallback)
- Startup warm-up (added; see below)

It does **not** own retrieval logic, Knowledge Graph access, or LLM calls —
those stay in the LightRAG server.
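
A hedged sketch of that error-translation rule, assuming `httpx` as the HTTP
client and the `mcp` package's `McpError`/`ErrorData` types (the actual
handler in `lightrag/mcp_server.py` may differ):

```python
import httpx
from mcp.shared.exceptions import McpError
from mcp.types import INTERNAL_ERROR, ErrorData

def translate_error(exc: Exception) -> str:
    """4xx becomes a readable string result; transport errors and 5xx become McpError."""
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        if 400 <= status < 500:
            # Caller error: surface as a normal tool result the agent can read.
            return f"LightRAG request failed ({status}): {exc.response.text}"
    # Transport failures and 5xx are protocol-level errors.
    raise McpError(ErrorData(code=INTERNAL_ERROR, message=str(exc)))
```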

**Query caching was considered and deferred.** Reasons:

- Cache invalidation requires knowing when the Knowledge Graph changes (new
documents ingested). The MCP Server has no visibility into this event.
- In practice, repeated identical queries within a single MCP session are rare
for the team-server use case this project targets.
- If repeated-query load becomes measurable, caching belongs in the LightRAG
REST API (`/query` endpoint) where cache invalidation can be wired to document
ingestion events — not in the MCP adapter.

**Startup warm-up was added.** A single `GET /health` call on startup surfaces
misconfiguration immediately (wrong `LIGHTRAG_API_URL`, server not started)
rather than silently failing the first user query. A warning — not a crash — is
emitted so transient connectivity issues during rolling restarts don't prevent
the MCP server from coming up.
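
A sketch of that warm-up probe, assuming `httpx` and the `LIGHTRAG_API_URL`
environment variable (the fallback port here is an assumption):

```python
import logging
import os

import httpx

logger = logging.getLogger(__name__)

async def warm_up() -> None:
    """Probe GET /health once at startup; warn, never crash, on failure."""
    base_url = os.environ.get("LIGHTRAG_API_URL", "http://localhost:9621").rstrip("/")
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(f"{base_url}/health")
            response.raise_for_status()
    except httpx.HTTPError as exc:
        # Transient failures during rolling restarts must not block startup.
        logger.warning("LightRAG server not reachable at %s: %s", base_url, exc)
```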

## Consequences

- Adding behaviour to the MCP layer (e.g. per-user context, audit logging)
requires revisiting this ADR and justifying why it belongs in the adapter
rather than the REST API.
- If query caching is later added, it should be implemented in the LightRAG
REST API, not here.
2 changes: 2 additions & 0 deletions lightrag/api/config.py
@@ -73,6 +73,7 @@ def get_default_host(binding_type: str) -> str:
        "gemini": os.getenv(
            "LLM_BINDING_HOST", "https://generativelanguage.googleapis.com"
        ),
        "anthropic": os.getenv("LLM_BINDING_HOST", ""),  # SDK uses its own default
    }
    return default_hosts.get(
        binding_type, os.getenv("LLM_BINDING_HOST", "http://localhost:11434")
@@ -318,6 +319,7 @@ def parse_args() -> argparse.Namespace:
            "azure_openai",
            "aws_bedrock",
            "gemini",
            "anthropic",
        ],
        help="LLM binding type (default: from env or ollama)",
    )
35 changes: 35 additions & 0 deletions lightrag/api/lightrag_server.py
@@ -306,6 +306,7 @@ def create_app(args):
        "azure_openai",
        "aws_bedrock",
        "gemini",
        "anthropic",
    ]:
        raise Exception("llm binding not supported")

@@ -618,6 +619,40 @@ def create_llm_model_func(binding: str):
            from lightrag.llm.ollama import ollama_model_complete

            return ollama_model_complete
        elif binding == "anthropic":
            from lightrag.llm.anthropic import anthropic_complete_if_cache

            async def anthropic_model_complete(
                prompt,
                system_prompt=None,
                history_messages=None,
                keyword_extraction=False,
                **kwargs,
            ) -> str:
                kwargs.pop("keyword_extraction", None)  # never forward this flag to the API
                if history_messages is None:
                    history_messages = []
                kwargs["timeout"] = llm_timeout
                kwargs.setdefault("max_tokens", 8192)  # required by Anthropic API
                result = await anthropic_complete_if_cache(
                    args.llm_model,
                    prompt,
                    system_prompt=system_prompt,
                    history_messages=history_messages,
                    api_key=args.llm_binding_api_key,
                    base_url=args.llm_binding_host if args.llm_binding_host else None,
                    **kwargs,
                )
                # anthropic_complete_if_cache normally returns an async generator;
                # collect it into a string for non-streaming callers
                if hasattr(result, "__aiter__"):
                    chunks = []
                    async for chunk in result:
                        chunks.append(chunk)
                    return "".join(chunks)
                return result

            return anthropic_model_complete
        elif binding == "aws_bedrock":
            return bedrock_model_complete  # Already defined locally
        elif binding == "azure_openai":
2 changes: 1 addition & 1 deletion lightrag/api/routers/query_routes.py
@@ -20,7 +20,7 @@ class QueryRequest(BaseModel):
    )

    mode: Literal["local", "global", "hybrid", "naive", "mix", "bypass"] = Field(
        default="local",
        description="Query mode",
    )

96 changes: 1 addition & 95 deletions lightrag/base.py
@@ -7,10 +7,8 @@
from dataclasses import dataclass, field
from typing import (
    Any,
    Literal,
    TypedDict,
    TypeVar,
    Callable,
    Optional,
    Dict,
    List,
@@ -19,18 +17,13 @@
from .utils import EmbeddingFunc
from .types import KnowledgeGraph
from .constants import (
    DEFAULT_TOP_K,
    DEFAULT_CHUNK_TOP_K,
    DEFAULT_MAX_ENTITY_TOKENS,
    DEFAULT_MAX_RELATION_TOKENS,
    DEFAULT_MAX_TOTAL_TOKENS,
    DEFAULT_HISTORY_TURNS,
    DEFAULT_OLLAMA_MODEL_NAME,
    DEFAULT_OLLAMA_MODEL_TAG,
    DEFAULT_OLLAMA_MODEL_SIZE,
    DEFAULT_OLLAMA_CREATED_AT,
    DEFAULT_OLLAMA_DIGEST,
)
from .query_config import QueryParam, ConfigResolver, default_query_param # noqa: F401

# use the .env that is inside the current folder
# allows to use different .env file for each lightrag instance
@@ -81,93 +74,6 @@ class TextChunkSchema(TypedDict):
T = TypeVar("T")


@dataclass
class QueryParam:
    """Configuration parameters for query execution in LightRAG."""

    mode: Literal["local", "global", "hybrid", "naive", "mix", "bypass"] = "mix"
    """Specifies the retrieval mode:
    - "local": Focuses on context-dependent information.
    - "global": Utilizes global knowledge.
    - "hybrid": Combines local and global retrieval methods.
    - "naive": Performs a basic search without advanced techniques.
    - "mix": Integrates knowledge graph and vector retrieval.
    """

    only_need_context: bool = False
    """If True, only returns the retrieved context without generating a response."""

    only_need_prompt: bool = False
    """If True, only returns the generated prompt without producing a response."""

    response_type: str = "Multiple Paragraphs"
    """Defines the response format. Examples: 'Multiple Paragraphs', 'Single Paragraph', 'Bullet Points'."""

    stream: bool = False
    """If True, enables streaming output for real-time responses."""

    top_k: int = int(os.getenv("TOP_K", str(DEFAULT_TOP_K)))
    """Number of top items to retrieve. Represents entities in 'local' mode and relationships in 'global' mode."""

    chunk_top_k: int = int(os.getenv("CHUNK_TOP_K", str(DEFAULT_CHUNK_TOP_K)))
    """Number of text chunks to retrieve initially from vector search and keep after reranking.
    If None, defaults to top_k value.
    """

    max_entity_tokens: int = int(
        os.getenv("MAX_ENTITY_TOKENS", str(DEFAULT_MAX_ENTITY_TOKENS))
    )
    """Maximum number of tokens allocated for entity context in unified token control system."""

    max_relation_tokens: int = int(
        os.getenv("MAX_RELATION_TOKENS", str(DEFAULT_MAX_RELATION_TOKENS))
    )
    """Maximum number of tokens allocated for relationship context in unified token control system."""

    max_total_tokens: int = int(
        os.getenv("MAX_TOTAL_TOKENS", str(DEFAULT_MAX_TOTAL_TOKENS))
    )
    """Maximum total tokens budget for the entire query context (entities + relations + chunks + system prompt)."""

    hl_keywords: list[str] = field(default_factory=list)
    """List of high-level keywords to prioritize in retrieval."""

    ll_keywords: list[str] = field(default_factory=list)
    """List of low-level keywords to refine retrieval focus."""

    # History messages are only sent to the LLM for context, not used for retrieval
    conversation_history: list[dict[str, str]] = field(default_factory=list)
    """Stores past conversation history to maintain context.
    Format: [{"role": "user/assistant", "content": "message"}].
    """

    # TODO: deprecated. No longer used in the codebase; all conversation_history messages are sent to the LLM
    history_turns: int = int(os.getenv("HISTORY_TURNS", str(DEFAULT_HISTORY_TURNS)))
    """Number of complete conversation turns (user-assistant pairs) to consider in the response context."""

    model_func: Callable[..., object] | None = None
    """Optional override for the LLM model function to use for this specific query.
    If provided, this will be used instead of the global model function.
    This allows using different models for different query modes.
    """

    user_prompt: str | None = None
    """User-provided prompt for the query.
    Additional instructions for the LLM. If provided, they are injected into the prompt template.
    Its purpose is to let the user customize the way the LLM generates the response.
    """

    enable_rerank: bool = os.getenv("RERANK_BY_DEFAULT", "true").lower() == "true"
    """Enable reranking for retrieved text chunks. If True but no rerank model is configured, a warning will be issued.
    Default is True to enable reranking when rerank model is available.
    """

    include_references: bool = False
    """If True, includes reference list in the response for supported endpoints.
    This parameter controls whether the API response includes a references field
    containing citation information for the retrieved content.
    """


@dataclass
class StorageNameSpace(ABC):
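The deleted block above is the motivation for `ConfigResolver`: `os.getenv`
calls in dataclass field defaults are evaluated once, when the class body
executes at import time, so environment changes made afterwards (typically by
tests) never take effect. A minimal repro of the pitfall, not LightRAG code:

```python
import os
from dataclasses import dataclass

os.environ["TOP_K"] = "40"

@dataclass
class OldStyleParam:
    top_k: int = int(os.getenv("TOP_K", "60"))  # evaluated at class definition

os.environ["TOP_K"] = "5"  # too late: the default was already baked in
print(OldStyleParam().top_k)  # prints 40, not 5
```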
16 changes: 4 additions & 12 deletions lightrag/kg/faiss_impl.py
@@ -13,6 +13,7 @@
    get_namespace_lock,
    get_update_flag,
    set_all_update_flags,
    resolve_workspace,
)

# You must manually install faiss-cpu or faiss-gpu before using FAISS vector db
@@ -38,18 +39,9 @@ def __post_init__(self):
            )
        self.cosine_better_than_threshold = cosine_threshold

        # Where to save index file if you want persistent storage
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)

        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._faiss_index_file = os.path.join(
            workspace_dir, f"faiss_index_{self.namespace}.index"
        )
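The `resolve_workspace` helper itself is not shown in this diff. A
reconstruction inferred from the three call sites in this PR (its module and
exact signature are assumptions):

```python
import os

def resolve_workspace(global_config: dict, workspace: str | None) -> tuple[str, str]:
    """Return (workspace_dir, workspace), creating the directory if needed."""
    working_dir = global_config["working_dir"]
    if workspace:
        # Include workspace in the file path for data isolation
        workspace_dir = os.path.join(working_dir, workspace)
    else:
        # Default behavior when workspace is empty
        workspace_dir = working_dir
        workspace = ""
    os.makedirs(workspace_dir, exist_ok=True)
    return workspace_dir, workspace
```

The same call replaces the duplicated branch in `JsonDocStatusStorage` and
`JsonKVStorage` below.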
14 changes: 4 additions & 10 deletions lightrag/kg/json_doc_status_impl.py
@@ -23,6 +23,7 @@
    set_all_update_flags,
    clear_all_update_flags,
    try_initialize_namespace,
    resolve_workspace,
)


@@ -32,16 +33,9 @@ class JsonDocStatusStorage(DocStatusStorage):
    """JSON implementation of document status storage"""

    def __post_init__(self):
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)
        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._file_name = os.path.join(workspace_dir, f"kv_store_{self.namespace}.json")
        self._data = None
        self._storage_lock = None
Expand Down
14 changes: 4 additions & 10 deletions lightrag/kg/json_kv_impl.py
@@ -20,23 +20,17 @@
    set_all_update_flags,
    clear_all_update_flags,
    try_initialize_namespace,
    resolve_workspace,
)


@final
@dataclass
class JsonKVStorage(BaseKVStorage):
    def __post_init__(self):
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)
        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._file_name = os.path.join(workspace_dir, f"kv_store_{self.namespace}.json")
        self._data = None
        self._storage_lock = None