Skip to content

Isolate Milvus collections by embedding model#2979

Open
LiemFrans wants to merge 1 commit intoHKUDS:mainfrom
LiemFrans:fix/milvus-model-isolation
Open

Isolate Milvus collections by embedding model#2979
LiemFrans wants to merge 1 commit intoHKUDS:mainfrom
LiemFrans:fix/milvus-model-isolation

Conversation

@LiemFrans
Copy link
Copy Markdown

Description

Prevent Milvus startup failures when an existing unsuffixed collection was created with a different embedding model or vector dimension. This aligns Milvus with the model-aware isolation already used by other vector backends while still preserving compatible legacy collections.

Related Issues

  • No tracked issue number.
  • Addresses the Milvus startup failure caused by Vector dimension mismatch for collection 'entities': existing=4096, current=2560.

Changes Made

  • Added model-aware Milvus collection naming based on embedding model and dimension.
  • Reused compatible legacy Milvus collections when an existing unsuffixed collection still matches the current schema and embedding dimension.
  • Created a new suffixed collection when a legacy collection exists but is incompatible with the current embedding configuration.
  • Hardened collection suffix generation to ignore non-string model_name values.
  • Added regression tests covering suffixed naming, compatible legacy reuse, and incompatible legacy fallback behavior.

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

Additional Notes

  • Validated with ./scripts/test.sh tests/test_milvus_index_creation.py.
  • Focused test run passed: 17 tests.
  • PR target should be HKUDS/LightRAG:main.
  • PR head should be LiemFrans:fix/milvus-model-isolation.

@danielaskdd
Copy link
Copy Markdown
Collaborator

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7aaf56c436

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1222 to +1226
self._activate_existing_collection(self.legacy_namespace)
logger.info(
f"[{self.workspace}] Reusing compatible legacy collection '{self.legacy_namespace}' for namespace '{self.namespace}'"
)
return True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip legacy reuse when embedding model identity is unknown

This branch reuses the unsuffixed legacy collection whenever _activate_existing_collection succeeds, but that compatibility path validates only schema and vector dimension, not the embedding model identity. In a migration from one model to another with the same dimension (for example, two different 1536d models), initialization will silently bind to the old legacy collection and mix incompatible vectors, which undermines retrieval correctness and defeats the model-isolation goal.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants