feat(server): support TLS certificate hot-reload#1870
Conversation
|
All contributors have signed the DCO ✍️ ✅ |
|
I have read the DCO document and I hereby sign the DCO. |
|
recheck |
|
Thanks for taking this on and working to resolve the cert-manager renewal issue. One design concern: this currently implements TLS hot-reload as a timed polling loop rather than a file watcher. The worker uses Can we make this event-driven instead? A file watcher over the parent directories of That would make renewal detection immediate, reduce configuration surface, and avoid polling races where a tick lands during a projected-volume update and then waits a full interval before recovering. |
Signed-off-by: Yuedong Wu <dwcn22@outlook.com>
Signed-off-by: Yuedong Wu <dwcn22@outlook.com>
Summary
Add file-watcher-based TLS certificate hot-reload to the gateway, allowing cert/key/CA rotation without restarting. Uses
notifyto watch parent directories andArcSwapfor atomic config swapping so in-flight TLS handshakes are never blocked.Related Issue
Fixes #1836
Changes
TlsAcceptorinternals withArcSwap<ServerConfig>for lock-free atomic config swapsnotify::recommended_watcher— watches cert/key/CA parent directories and reloads on filesystem changes with a 1-second debounce (handles Kubernetes Secret atomic-swap patterns)AtomicBoolguard to prevent duplicatespawn_reload_worker()callsConfigStateChangeevents on reload success (Informational) and failure (Medium)rustls::crypto::ring::sign— surfaces bad key types at startup instead of handshake timegenerate_test_certs_with_ca,install_rustls_provider,write_test_file) intotls_test_utils.rsTesting
mise run pre-commitpassesE2E test record
Step 1: Create cluster
Step 2: Enable TLS
Create
deploy/helm/openshell/ci/values-reload-test.yaml:Add to
deploy/helm/openshell/skaffold.yamlafter- ci/values-skaffold.yaml:- ci/values-reload-test.yamlStep 3: Deploy
Output:
Step 4: Verify file watcher started
KUBECONFIG=kubeconfig kubectl -n openshell logs openshell-0 | grep -i watcherOutput:
Step 5: Capture initial cert fingerprint
Output:
Step 6: Overwrite TLS secret (simulate cert-manager renewal)
Output:
Step 7: Verify new cert is served — no pod restart
Wait for kubelet sync + watcher detection:
7a. Pod restarts
Output:
7b. Current cert from gateway
Output:
7c. Cert stored in Secret (should match gateway)
Output:
7d. Reload logs
Output:
Results
AC:47:<...>:2C4F:01:<...>:FF/etc/openshell-tls/server,/etc/openshell-tls/client-caCONFIG:RELOADED [INFO]The file watcher detected the updated certificate in the watched directory, triggered a reload after the 1-second debounce window, and atomically swapped the active TLS config. Zero pod restarts. The OCSF log entry provides observability that was completely missing before.
Checklist