k8s-deploy: add DocumentDB backend (MongoDB-compatible) #3009

Open
hossain-rayhan wants to merge 1 commit into HKUDS:main from hossain-rayhan:rayhan/add-documentdb-guidelines

@hossain-rayhan

Description

Adds an optional DocumentDB backend to the k8s-deploy/ flow as a single MongoDB-compatible alternative to running PostgreSQL and Neo4j side-by-side. DocumentDB (documentdb/documentdb-kubernetes-operator) ships its own operator, so this PR wires it in alongside the existing KubeBlocks-based databases rather than as a KubeBlocks addon.

🙏 Looking for an early sanity check on the approach before I polish further. The DocumentDB project provides its own Kubernetes operator (Helm chart at https://documentdb.github.io/documentdb-kubernetes-operator), so I deliberately did not package it as a KubeBlocks addon. Instead, 01-prepare.sh installs the upstream operator directly when ENABLE_DOCUMENTDB=true. Happy to refactor if maintainers would prefer a different integration shape.
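
For reviewers who want the shape without opening the diff, here is a minimal sketch of the gated install, not the exact code in 01-prepare.sh. The Helm repo alias (`documentdb`), release name, and chart name (`documentdb/documentdb-operator`) are assumptions, and `DRY_RUN` is a hypothetical switch added only so the sketch can be read and run without a cluster. The cert-manager step is omitted.

```shell
#!/usr/bin/env bash
# Sketch only: repo alias, release name, and chart name below are assumptions.
ENABLE_DOCUMENTDB="${ENABLE_DOCUMENTDB:-false}"
DRY_RUN="${DRY_RUN:-true}"   # hypothetical: print commands instead of running them

# Run a command, or just print it when DRY_RUN=true.
run() {
  if [ "$DRY_RUN" = "true" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_documentdb_operator() {
  # Do nothing unless the flag is explicitly "true".
  [ "$ENABLE_DOCUMENTDB" = "true" ] || return 0
  run helm repo add documentdb https://documentdb.github.io/documentdb-kubernetes-operator
  run helm repo update
  run helm upgrade --install documentdb-operator documentdb/documentdb-operator \
    --namespace documentdb-operator --create-namespace --wait
}

install_documentdb_operator
```

With the default `ENABLE_DOCUMENTDB=false` the function returns immediately, which is how the existing PostgreSQL/Neo4j flow stays untouched.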

DocumentDB itself is not GA yet (upstream notes: "under active development but not yet recommended for production use"), so the README places it under a clearly labeled Preview / Experimental section, separate from the Production Deployment flow.

Related Issues

N/A — inspired by documentdb/documentdb-kubernetes-operator#362.

Changes Made

New folder (mirrors siblings like databases/postgresql/, databases/neo4j/):

  • k8s-deploy/databases/documentdb/values.yaml — single-node DocumentDB CR + docdb-credentials Secret (placeholder password), images pinned to ghcr.io/documentdb/documentdb-kubernetes-operator/{documentdb,gateway}:0.110.0.

Wiring into the existing scripts (all gated on ENABLE_DOCUMENTDB, which defaults to false):

  • databases/00-config.sh — added ENABLE_DOCUMENTDB flag with a note that it uses its own operator (not KubeBlocks).
  • databases/01-prepare.sh — installs cert-manager (only if missing) and the DocumentDB operator Helm chart into the documentdb-operator namespace.
  • databases/02-install-database.sh — applies the CR + Secret and waits for status.status="Cluster in healthy state".
  • databases/03-uninstall-database.sh — deletes the CR.
  • databases/04-cleanup.sh — uninstalls the DocumentDB operator (cert-manager intentionally left in place).
  • lightrag/values.yaml — adds MONGO_DATABASE: lightrag plus comments documenting the override env vars.
  • install_lightrag.sh — sources 00-config.sh, gates PG/Neo4j password lookups on their ENABLE_* flags, and adds a DocumentDB block that:
    1. Reads status.connectionString from the DocumentDB resource,
    2. eval-resolves the embedded $(kubectl get secret …) substitution,
    3. Swaps the ClusterIP for the in-cluster DNS name and strips replicaSet=rs0 (which conflicts with directConnection=true in pymongo),
    4. Sets LIGHTRAG_KV_STORAGE=MongoKVStorage, LIGHTRAG_GRAPH_STORAGE=MongoGraphStorage, LIGHTRAG_DOC_STATUS_STORAGE=MongoDocStatusStorage, and LIGHTRAG_VECTOR_STORAGE=NanoVectorDBStorage.
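
Step 3 of the DocumentDB block can be illustrated with a small sed-based sketch. This is not the code in install_lightrag.sh: the function name `rewrite_mongo_uri`, the sample credentials, IP, port, and service DNS name are all made up for illustration. It also assumes the secret substitution from step 2 has already been resolved and that the password contains no `@`.

```shell
#!/usr/bin/env bash
# rewrite_mongo_uri URI DNS_NAME   (hypothetical helper, for illustration only)
# 1. Replaces the host that follows '@' with DNS_NAME, keeping the port.
# 2. Drops the replicaSet=rs0 query parameter wherever it appears.
# 3. Removes a dangling '?' or '&' left at the end of the URI.
rewrite_mongo_uri() {
  printf '%s\n' "$1" | sed -E \
    -e "s#@[^:/?]+#@$2#" \
    -e 's/([?&])replicaSet=rs0(&|$)/\1/' \
    -e 's/[?&]$//'
}

# Sample input; every value here is a placeholder, not real operator output.
rewrite_mongo_uri \
  'mongodb://user:secret@10.96.12.34:10260/?directConnection=true&replicaSet=rs0' \
  'documentdb-service.default.svc.cluster.local'
```

Stripping the parameter rather than dropping `directConnection=true` keeps the client pinned to the single gateway endpoint, which matches the single-node CR in this PR.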

Docs:

  • k8s-deploy/README.md — new top-level Preview / Experimental: DocumentDB Backend section after Production Deployment, with a ⚠️ "not for production use" warning, enable instructions, and caveats.
  • k8s-deploy/databases/README.md — short note in step 3 about the auto-install.

Checklist

  • Changes tested locally (full install/uninstall cycle on a Kubernetes 1.35 cluster — DocumentDB cluster reaches healthy state, LightRAG connects, KV/graph/doc-status collections appear in DocumentDB)
  • Code reviewed
  • Documentation updated (k8s-deploy/README.md, k8s-deploy/databases/README.md)
  • Unit tests added (n/a — pure deployment/scripting changes; bash -n passes on all modified scripts)
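
The "reaches healthy state" check from the first item (the same condition 02-install-database.sh waits on) could be sketched as a generic polling loop. The helper name `wait_for_output` is hypothetical, and the `kubectl` invocation in the usage comment assumes the resource can be addressed as `documentdb` with placeholder name and namespace variables. None of this is copied from the script.

```shell
#!/usr/bin/env bash
# wait_for_output EXPECTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until its stdout equals EXPECTED or the timeout elapses.
# Returns 0 on a match and 1 on timeout. (Hypothetical helper.)
wait_for_output() {
  expected="$1"; timeout="$2"; interval="$3"
  shift 3
  elapsed=0
  while :; do
    # A failing command is treated as "not ready yet".
    actual="$("$@" 2>/dev/null)" || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage (assumed resource name and variables, shown as a comment only):
# wait_for_output "Cluster in healthy state" 600 5 \
#   kubectl get documentdb "$DOCDB_NAME" -n "$NAMESPACE" -o jsonpath='{.status.status}'
```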

Additional Notes

  • Vectors are not stored in DocumentDB. DocumentDB does not implement the MongoDB Atlas $vectorSearch operator that MongoVectorDBStorage requires, so embeddings stay on the LightRAG PVC via NanoVectorDBStorage. KV / graph / doc-status data do live in DocumentDB collections.
  • Kubernetes 1.35+ is required because the DocumentDB operator uses the ImageVolume feature.
  • Two non-fatal startup warnings from DocumentDB are expected and safe to ignore: createIndex.collation is not implemented yet and Pipeline stage name not recognized: $listSearchIndexes.
  • All existing PG/Neo4j flows are unchanged when ENABLE_DOCUMENTDB=false (the default).
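
Since the 1.35+ requirement is easy to miss, a preflight check along these lines may be worth adding. This is a sketch, not part of the PR: `version_at_least` is a hypothetical helper, and the usage comment assumes `jq` is available to read `serverVersion.gitVersion` from `kubectl version -o json`.

```shell
#!/usr/bin/env bash
# version_at_least VERSION MIN_MAJOR MIN_MINOR   (hypothetical helper)
# Succeeds when VERSION (e.g. "v1.35.0") is at least MIN_MAJOR.MIN_MINOR.
version_at_least() {
  v="${1#v}"                  # drop a leading "v"
  major="${v%%.*}"            # text before the first dot
  rest="${v#*.}"              # text after the first dot
  minor="${rest%%[!0-9]*}"    # leading digits only, so "35.0-rc.1" gives "35"
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Usage (assumes jq is installed; shown as a comment only):
# server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
# version_at_least "$server" 1 35 || echo "Kubernetes 1.35+ required, found $server" >&2
```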

Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>