36 changes: 36 additions & 0 deletions CONTEXT.md
@@ -0,0 +1,36 @@
# LightRAG Domain Glossary

## Knowledge Graph
The shared corpus of entities and relationships extracted from ingested documents.
A single Knowledge Graph is shared across all users in a team deployment.
Distinct from per-user workspaces, which are not used in this project.

## Query
An operation that retrieves context from the Knowledge Graph and synthesises a
natural-language answer using an LLM. The primary MCP tool surface for callers.

## Retrieve
A diagnostic operation that returns raw entities, relations, and chunks from the
Knowledge Graph without LLM synthesis. Intended for developers and power users
inspecting retrieval quality, not for routine AI-agent use.

## MCP Server
The adapter between MCP-compatible clients (Claude Desktop, Cursor, etc.) and the
LightRAG REST API. Exposes `query` and `retrieve` as MCP tools. Read-only: no
document insertion or deletion is exposed through this layer.

## Retrieval Mode
The algorithm used to fetch context from the Knowledge Graph (local, global,
hybrid, naive, mix). Configured server-side via `LIGHTRAG_QUERY_MODE`; not
exposed as a caller-controlled parameter in the MCP layer.

## QueryParam
A pure value object that carries all parameters for a single Knowledge Graph
Query. Has no environment reads at definition time — all numeric defaults are
compile-time constants from `lightrag/constants.py`.

## ConfigResolver
The single place where environment variables are read and converted into a
`QueryParam`. Tests inject a plain `dict`; production code uses `os.environ`
via the `default_query_param()` convenience function. Lives in
`lightrag/query_config.py`.
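
A minimal usage sketch of the split, assuming a `resolve()` method on
`ConfigResolver` (the real API lives in `lightrag/query_config.py` and may
name things differently):

```python
from lightrag.query_config import ConfigResolver, default_query_param

# Production: reads os.environ at call time, never at import time.
param = default_query_param()

# Tests: inject a plain dict; no monkeypatching of os.environ required.
param = ConfigResolver({"TOP_K": "5", "RERANK_BY_DEFAULT": "false"}).resolve()
assert param.top_k == 5
assert param.enable_rerank is False
```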
54 changes: 54 additions & 0 deletions docs/adr/0001-mcp-server-thin-adapter.md
@@ -0,0 +1,54 @@
# ADR-0001: MCP Server is a thin adapter; query caching deferred

**Status:** Accepted
**Date:** 2026-05-07

## Context

The MCP Server (`lightrag/mcp_server.py`) wraps the LightRAG REST API as two
MCP tools (`query`, `retrieve`). After a grilling session on its design, three
options for deepening it were considered:

1. **Startup warm-up** — verify the LightRAG server is reachable before
accepting MCP connections.
2. **In-process query cache** — skip the HTTP round-trip for repeated identical
questions within a session.
3. **Accept it as a thin adapter** — record the decision so future architecture
reviews don't re-open it without new evidence.

## Decision

The MCP Server is intentionally a **thin adapter** between the MCP protocol and
the LightRAG REST API. It owns:

- HTTP client lifecycle and authentication
- Error translation (transport/5xx → `McpError`; 4xx → string result; sketched below)
- Reference formatting (`[title](url)` with fallback)
- Startup warm-up (added; see below)

It does **not** own retrieval logic, Knowledge Graph access, or LLM calls —
those stay in the LightRAG server.
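
A hedged sketch of that error-translation rule, assuming `httpx` as the HTTP
client and the `mcp` package's `McpError`/`ErrorData` types (the actual
handler in `lightrag/mcp_server.py` may differ):

```python
import httpx
from mcp.shared.exceptions import McpError
from mcp.types import INTERNAL_ERROR, ErrorData

def translate_error(exc: Exception) -> str:
    """4xx becomes a readable string result; transport errors and 5xx become McpError."""
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        if 400 <= status < 500:
            # Caller error: surface as a normal tool result the agent can read.
            return f"LightRAG request failed ({status}): {exc.response.text}"
    # Transport failures and 5xx are protocol-level errors.
    raise McpError(ErrorData(code=INTERNAL_ERROR, message=str(exc)))
```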

**Query caching was considered and deferred.** Reasons:

- Cache invalidation requires knowing when the Knowledge Graph changes (new
documents ingested). The MCP Server has no visibility into this event.
- In practice, repeated identical queries within a single MCP session are rare
for the team-server use case this project targets.
- If repeated-query load becomes measurable, caching belongs in the LightRAG
REST API (`/query` endpoint) where cache invalidation can be wired to document
ingestion events — not in the MCP adapter.

**Startup warm-up was added.** A single `GET /health` call on startup surfaces
misconfiguration immediately (wrong `LIGHTRAG_API_URL`, server not started)
rather than silently failing the first user query. A warning — not a crash — is
emitted so transient connectivity issues during rolling restarts don't prevent
the MCP server from coming up.
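
A sketch of that warm-up probe, assuming `httpx` and the `LIGHTRAG_API_URL`
environment variable (the fallback port here is an assumption):

```python
import logging
import os

import httpx

logger = logging.getLogger(__name__)

async def warm_up() -> None:
    """Probe GET /health once at startup; warn, never crash, on failure."""
    base_url = os.environ.get("LIGHTRAG_API_URL", "http://localhost:9621").rstrip("/")
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(f"{base_url}/health")
            response.raise_for_status()
    except httpx.HTTPError as exc:
        # Transient failures during rolling restarts must not block startup.
        logger.warning("LightRAG server not reachable at %s: %s", base_url, exc)
```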

## Consequences

- Adding behaviour to the MCP layer (e.g. per-user context, audit logging)
requires revisiting this ADR and justifying why it belongs in the adapter
rather than the REST API.
- If query caching is later added, it should be implemented in the LightRAG
REST API, not here.
2 changes: 2 additions & 0 deletions lightrag/api/config.py
@@ -73,6 +73,7 @@ def get_default_host(binding_type: str) -> str:
        "gemini": os.getenv(
            "LLM_BINDING_HOST", "https://generativelanguage.googleapis.com"
        ),
        "anthropic": os.getenv("LLM_BINDING_HOST", ""),  # SDK uses its own default
    }
    return default_hosts.get(
        binding_type, os.getenv("LLM_BINDING_HOST", "http://localhost:11434")
@@ -318,6 +319,7 @@ def parse_args() -> argparse.Namespace:
            "azure_openai",
            "aws_bedrock",
            "gemini",
            "anthropic",
        ],
        help="LLM binding type (default: from env or ollama)",
    )
35 changes: 35 additions & 0 deletions lightrag/api/lightrag_server.py
@@ -306,6 +306,7 @@ def create_app(args):
        "azure_openai",
        "aws_bedrock",
        "gemini",
        "anthropic",
    ]:
        raise Exception("llm binding not supported")

@@ -618,6 +619,40 @@ def create_llm_model_func(binding: str):
            from lightrag.llm.ollama import ollama_model_complete

            return ollama_model_complete
        elif binding == "anthropic":
            from lightrag.llm.anthropic import anthropic_complete_if_cache

            async def anthropic_model_complete(
                prompt,
                system_prompt=None,
                history_messages=None,
                keyword_extraction=False,
                **kwargs,
            ) -> str:
                kwargs.pop("keyword_extraction", None)  # never forward this flag to the API
                if history_messages is None:
                    history_messages = []
                kwargs["timeout"] = llm_timeout
                kwargs.setdefault("max_tokens", 8192)  # required by Anthropic API
                result = await anthropic_complete_if_cache(
                    args.llm_model,
                    prompt,
                    system_prompt=system_prompt,
                    history_messages=history_messages,
                    api_key=args.llm_binding_api_key,
                    base_url=args.llm_binding_host if args.llm_binding_host else None,
                    **kwargs,
                )
                # anthropic_complete_if_cache normally returns an async generator;
                # collect it into a string for non-streaming callers
                if hasattr(result, "__aiter__"):
                    chunks = []
                    async for chunk in result:
                        chunks.append(chunk)
                    return "".join(chunks)
                return result

            return anthropic_model_complete
        elif binding == "aws_bedrock":
            return bedrock_model_complete  # Already defined locally
        elif binding == "azure_openai":
2 changes: 1 addition & 1 deletion lightrag/api/routers/query_routes.py
@@ -20,7 +20,7 @@ class QueryRequest(BaseModel):
    )

    mode: Literal["local", "global", "hybrid", "naive", "mix", "bypass"] = Field(
        default="local",
        description="Query mode",
    )

96 changes: 1 addition & 95 deletions lightrag/base.py
@@ -7,10 +7,8 @@
from dataclasses import dataclass, field
from typing import (
    Any,
    Literal,
    TypedDict,
    TypeVar,
    Callable,
    Optional,
    Dict,
    List,
@@ -19,18 +17,13 @@
from .utils import EmbeddingFunc
from .types import KnowledgeGraph
from .constants import (
    DEFAULT_TOP_K,
    DEFAULT_CHUNK_TOP_K,
    DEFAULT_MAX_ENTITY_TOKENS,
    DEFAULT_MAX_RELATION_TOKENS,
    DEFAULT_MAX_TOTAL_TOKENS,
    DEFAULT_HISTORY_TURNS,
    DEFAULT_OLLAMA_MODEL_NAME,
    DEFAULT_OLLAMA_MODEL_TAG,
    DEFAULT_OLLAMA_MODEL_SIZE,
    DEFAULT_OLLAMA_CREATED_AT,
    DEFAULT_OLLAMA_DIGEST,
)
from .query_config import QueryParam, ConfigResolver, default_query_param # noqa: F401

# use the .env that is inside the current folder
# allows to use different .env file for each lightrag instance
@@ -81,93 +74,6 @@ class TextChunkSchema(TypedDict):
T = TypeVar("T")


@dataclass
class QueryParam:
    """Configuration parameters for query execution in LightRAG."""

    mode: Literal["local", "global", "hybrid", "naive", "mix", "bypass"] = "mix"
    """Specifies the retrieval mode:
    - "local": Focuses on context-dependent information.
    - "global": Utilizes global knowledge.
    - "hybrid": Combines local and global retrieval methods.
    - "naive": Performs a basic search without advanced techniques.
    - "mix": Integrates knowledge graph and vector retrieval.
    """

    only_need_context: bool = False
    """If True, only returns the retrieved context without generating a response."""

    only_need_prompt: bool = False
    """If True, only returns the generated prompt without producing a response."""

    response_type: str = "Multiple Paragraphs"
    """Defines the response format. Examples: 'Multiple Paragraphs', 'Single Paragraph', 'Bullet Points'."""

    stream: bool = False
    """If True, enables streaming output for real-time responses."""

    top_k: int = int(os.getenv("TOP_K", str(DEFAULT_TOP_K)))
    """Number of top items to retrieve. Represents entities in 'local' mode and relationships in 'global' mode."""

    chunk_top_k: int = int(os.getenv("CHUNK_TOP_K", str(DEFAULT_CHUNK_TOP_K)))
    """Number of text chunks to retrieve initially from vector search and keep after reranking.
    If None, defaults to top_k value.
    """

    max_entity_tokens: int = int(
        os.getenv("MAX_ENTITY_TOKENS", str(DEFAULT_MAX_ENTITY_TOKENS))
    )
    """Maximum number of tokens allocated for entity context in unified token control system."""

    max_relation_tokens: int = int(
        os.getenv("MAX_RELATION_TOKENS", str(DEFAULT_MAX_RELATION_TOKENS))
    )
    """Maximum number of tokens allocated for relationship context in unified token control system."""

    max_total_tokens: int = int(
        os.getenv("MAX_TOTAL_TOKENS", str(DEFAULT_MAX_TOTAL_TOKENS))
    )
    """Maximum total tokens budget for the entire query context (entities + relations + chunks + system prompt)."""

    hl_keywords: list[str] = field(default_factory=list)
    """List of high-level keywords to prioritize in retrieval."""

    ll_keywords: list[str] = field(default_factory=list)
    """List of low-level keywords to refine retrieval focus."""

    # History messages are only sent to the LLM for context, not used for retrieval
    conversation_history: list[dict[str, str]] = field(default_factory=list)
    """Stores past conversation history to maintain context.
    Format: [{"role": "user/assistant", "content": "message"}].
    """

    # TODO: deprecated. No longer used in the codebase; all conversation_history messages are sent to the LLM
    history_turns: int = int(os.getenv("HISTORY_TURNS", str(DEFAULT_HISTORY_TURNS)))
    """Number of complete conversation turns (user-assistant pairs) to consider in the response context."""

    model_func: Callable[..., object] | None = None
    """Optional override for the LLM model function to use for this specific query.
    If provided, this will be used instead of the global model function.
    This allows using different models for different query modes.
    """

    user_prompt: str | None = None
    """User-provided prompt for the query.
    Additional instructions for the LLM. If provided, they are injected into the prompt template.
    Its purpose is to let the user customize the way the LLM generates the response.
    """

    enable_rerank: bool = os.getenv("RERANK_BY_DEFAULT", "true").lower() == "true"
    """Enable reranking for retrieved text chunks. If True but no rerank model is configured, a warning will be issued.
    Default is True to enable reranking when rerank model is available.
    """

    include_references: bool = False
    """If True, includes reference list in the response for supported endpoints.
    This parameter controls whether the API response includes a references field
    containing citation information for the retrieved content.
    """


@dataclass
class StorageNameSpace(ABC):
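The deleted block above is the motivation for `ConfigResolver`: `os.getenv`
calls in dataclass field defaults are evaluated once, when the class body
executes at import time, so environment changes made afterwards (typically by
tests) never take effect. A minimal repro of the pitfall, not LightRAG code:

```python
import os
from dataclasses import dataclass

os.environ["TOP_K"] = "40"

@dataclass
class OldStyleParam:
    top_k: int = int(os.getenv("TOP_K", "60"))  # evaluated at class definition

os.environ["TOP_K"] = "5"  # too late: the default was already baked in
print(OldStyleParam().top_k)  # prints 40, not 5
```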
16 changes: 4 additions & 12 deletions lightrag/kg/faiss_impl.py
@@ -13,6 +13,7 @@
    get_namespace_lock,
    get_update_flag,
    set_all_update_flags,
    resolve_workspace,
)

# You must manually install faiss-cpu or faiss-gpu before using FAISS vector db
@@ -38,18 +39,9 @@ def __post_init__(self):
            )
        self.cosine_better_than_threshold = cosine_threshold

        # Where to save index file if you want persistent storage
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)

        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._faiss_index_file = os.path.join(
            workspace_dir, f"faiss_index_{self.namespace}.index"
        )
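The `resolve_workspace` helper itself is not shown in this diff. A
reconstruction inferred from the three call sites in this PR (its module and
exact signature are assumptions):

```python
import os

def resolve_workspace(global_config: dict, workspace: str | None) -> tuple[str, str]:
    """Return (workspace_dir, workspace), creating the directory if needed."""
    working_dir = global_config["working_dir"]
    if workspace:
        # Include workspace in the file path for data isolation
        workspace_dir = os.path.join(working_dir, workspace)
    else:
        # Default behavior when workspace is empty
        workspace_dir = working_dir
        workspace = ""
    os.makedirs(workspace_dir, exist_ok=True)
    return workspace_dir, workspace
```

The same call replaces the duplicated branch in `JsonDocStatusStorage` and
`JsonKVStorage` below.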
14 changes: 4 additions & 10 deletions lightrag/kg/json_doc_status_impl.py
@@ -23,6 +23,7 @@
    set_all_update_flags,
    clear_all_update_flags,
    try_initialize_namespace,
    resolve_workspace,
)


@@ -32,16 +33,9 @@ class JsonDocStatusStorage(DocStatusStorage):
    """JSON implementation of document status storage"""

    def __post_init__(self):
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)
        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._file_name = os.path.join(workspace_dir, f"kv_store_{self.namespace}.json")
        self._data = None
        self._storage_lock = None
Expand Down
14 changes: 4 additions & 10 deletions lightrag/kg/json_kv_impl.py
@@ -20,23 +20,17 @@
    set_all_update_flags,
    clear_all_update_flags,
    try_initialize_namespace,
    resolve_workspace,
)


@final
@dataclass
class JsonKVStorage(BaseKVStorage):
    def __post_init__(self):
        working_dir = self.global_config["working_dir"]
        if self.workspace:
            # Include workspace in the file path for data isolation
            workspace_dir = os.path.join(working_dir, self.workspace)
        else:
            # Default behavior when workspace is empty
            workspace_dir = working_dir
            self.workspace = ""

        os.makedirs(workspace_dir, exist_ok=True)
        workspace_dir, self.workspace = resolve_workspace(
            self.global_config, self.workspace
        )
        self._file_name = os.path.join(workspace_dir, f"kv_store_{self.namespace}.json")
        self._data = None
        self._storage_lock = None