
feat: add TLS certificate support for Docker contexts #3728

Open
Itx-Psycho0 wants to merge 4 commits into knative:main from Itx-Psycho0:feat/docker-context-tls

Conversation

@Itx-Psycho0
Contributor

Description

Extends Docker context detection (from #3684) to support TLS certificates stored in Docker contexts. This enables secure connections to remote Docker daemons configured via Docker contexts.

Fixes #3719

Changes

Extended getDockerContextHost() to getDockerContextConfig() which returns both host and TLS configuration
Added DockerContextConfig struct to hold host and TLS settings
Load TLS certificates from context's tls/ directory
Write certificates to temp directory and configure via environment variables
Reuse existing TLS functionality via newHttpClient()
Added comprehensive test with mock TLS daemon

Benefits

Remote Docker setups work automatically with TLS
Consistent with Docker CLI behavior - if docker commands work, func commands work
No manual environment variables needed
Proper TLS support for secure connections

Testing

Added TestNewClient_DockerContextTLS with mock TLS-enabled daemon
All existing Docker tests pass
make check passes

Related

@knative-prow knative-prow Bot requested review from dsimansk and jrangelramos May 14, 2026 11:11
@knative-prow

knative-prow Bot commented May 14, 2026

Hi @Itx-Psycho0. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knative-prow knative-prow Bot added size/L 🤖 PR changes 100-499 lines, ignoring generated files. needs-ok-to-test 🤖 Needs an org member to approve testing labels May 14, 2026
@codecov

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 78.51240% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.34%. Comparing base (e435a89) to head (af939b8).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
pkg/docker/docker_client.go 78.51% 19 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3728      +/-   ##
==========================================
+ Coverage   56.95%   57.34%   +0.39%     
==========================================
  Files         181      181              
  Lines       21116    21225     +109     
==========================================
+ Hits        12026    12172     +146     
+ Misses       7866     7823      -43     
- Partials     1224     1230       +6     
Flag Coverage Δ
e2e 35.78% <0.82%> (-0.04%) ⬇️
e2e go 31.35% <1.05%> (-1.08%) ⬇️
e2e node 27.15% <1.05%> (-1.05%) ⬇️
e2e python 31.72% <1.05%> (-1.08%) ⬇️
e2e quarkus 27.27% <1.05%> (-1.07%) ⬇️
e2e rust 26.66% <1.05%> (?)
e2e springboot 25.20% <1.05%> (-1.07%) ⬇️
e2e typescript 27.26% <1.05%> (-1.07%) ⬇️
e2e-config-ci 28.20% <1.05%> (+10.50%) ⬆️
integration 17.29% <1.05%> (-0.01%) ⬇️
unit macos-14 45.11% <40.00%> (+0.06%) ⬆️
unit macos-latest 45.11% <40.00%> (+0.06%) ⬆️
unit ubuntu-24.04-arm 45.64% <77.68%> (+0.31%) ⬆️
unit ubuntu-latest 46.32% <76.84%> (+0.31%) ⬆️
unit windows-latest 44.98% <13.68%> (-0.12%) ⬇️


Extends Docker context detection (from knative#3684) to support TLS certificates
stored in Docker contexts. This enables secure connections to remote Docker
daemons configured via Docker contexts.

Changes:
- Extended getDockerContextHost() to getDockerContextConfig() which returns
  both host and TLS configuration
- Added DockerContextConfig struct to hold host and TLS settings
- Modified newHttpClient() to check Docker context first, then fall back
  to environment variables
- Added newHttpClientFromContext() to create HTTP client from context config
- Load TLS certificates directly from context (no temp files or env vars)
- Added comprehensive test with mock TLS daemon

Benefits:
- Remote Docker setups work automatically with TLS
- Consistent with Docker CLI behavior
- No manual environment variables needed
- Proper TLS support for secure connections
- Clean implementation without temp files

Fixes knative#3719
@Itx-Psycho0 Itx-Psycho0 force-pushed the feat/docker-context-tls branch from 5ecc50a to 1af6e75 Compare May 14, 2026 13:27
Comment thread pkg/docker/docker_client.go Fixed
@matejvasek
Contributor

/ok-to-test

@knative-prow knative-prow Bot added ok-to-test 🤖 Non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test 🤖 Needs an org member to approve testing labels May 14, 2026
@matejvasek
Contributor

Review: feat: add TLS certificate support for Docker contexts

Branch: Itx-Psycho0/feat/docker-context-tls
Commit: 1af6e757b
Files: pkg/docker/docker_client.go, pkg/docker/docker_client_test.go


Overall

The approach of building the TLS config in-memory via newHttpClientFromContext() is sound. The test with a real mTLS mock daemon is solid. There are a few issues that should be addressed before merging.


Significant

1. Wrong fallback TLS path

docker_client.go, getDockerContextConfig(), around the fallback block:

// Docker stores context TLS files in contexts/meta/<sha256-hash>/
hash := sha256.Sum256([]byte(contexts[0].Name))
tlsPath = filepath.Join(dockerConfigDir, "contexts", "meta", fmt.Sprintf("%x", hash))

Docker stores TLS files under contexts/tls/<hash>/docker/, not contexts/meta/<hash>/. The meta directory holds meta.json metadata, not certificates. The comment is also incorrect.

The test masks this because it always sets Storage.TLSPath explicitly, so the fallback is never exercised. A test that omits Storage.TLSPath (or sets it to "<IN MEMORY>") and places certs at the correct fallback location would catch this.

2. Precedence inversion: context TLS overrides explicit env vars

docker_client.go, newHttpClient():

func newHttpClient() *http.Client {
	// First, try to get TLS config from Docker context
	if contextConfig := getDockerContextConfig(); contextConfig != nil && len(contextConfig.TLSCert) > 0 && len(contextConfig.TLSKey) > 0 {
		return newHttpClientFromContext(contextConfig)
	}

	// Fall back to environment variables
	tlsVerifyStr, tlsVerifyChanged := os.LookupEnv("DOCKER_TLS_VERIFY")
	...

Context is checked before DOCKER_TLS_VERIFY. If someone explicitly sets DOCKER_TLS_VERIFY=0 to disable TLS, but a Docker context with certs exists, context TLS will be used and the env var is never consulted. Convention is that env vars override config files, not the other way around. The env var check should come first, and context TLS should only be tried when DOCKER_TLS_VERIFY is not set.

3. Context TLS applied even when DOCKER_HOST is set manually

newHttpClient() is called unconditionally for all TCP connections, including when DOCKER_HOST was explicitly set by the user (not from context detection). If someone sets DOCKER_HOST=tcp://my-host:2376 without DOCKER_TLS_VERIFY, the code will now silently pick up TLS certs from whatever Docker context happens to be active. This mixes two independent configuration sources in a surprising way. Context TLS should only activate when the host itself came from context detection.
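Points 2 and 3 together amount to a small decision rule, which might be sketched as follows. The names `tlsSource` and `pickTLSSource` are hypothetical and do not appear in the PR. The environment lookup is injected so the rule can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// tlsSource says which configuration source decides the TLS settings.
type tlsSource int

const (
	tlsNone        tlsSource = iota // plain connection, no TLS configuration found
	tlsFromEnv                      // DOCKER_TLS_VERIFY is set, so env vars decide (including "disable")
	tlsFromContext                  // certificates taken from the Docker context
)

// pickTLSSource is a hypothetical helper encoding the precedence suggested
// in the review: env vars win, and context TLS is considered only when the
// host itself came from context detection.
func pickTLSSource(lookupEnv func(string) (string, bool), hostFromContext, contextHasCerts bool) tlsSource {
	if _, set := lookupEnv("DOCKER_TLS_VERIFY"); set {
		return tlsFromEnv
	}
	if hostFromContext && contextHasCerts {
		return tlsFromContext
	}
	return tlsNone
}

func main() {
	// The two boolean arguments are hard-coded for illustration.
	fmt.Println(pickTLSSource(os.LookupEnv, true, true) == tlsFromContext)
}
```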

4. docker context inspect is executed twice

NewClient() calls GetDockerContextHostFunc() → getDockerContextHost() → getDockerContextConfig() (first subprocess). Then inside the isTCP branch, newHttpClient() calls getDockerContextConfig() again (second subprocess). Each call forks the docker CLI. The config should be fetched once and reused.


Minor

5. DockerContextConfig is exported but internal-only

The struct is only used within the docker package. It should be unexported (dockerContextConfig).

6. Redundant DOCKER_CONFIG passthrough

if dockerConfig := os.Getenv("DOCKER_CONFIG"); dockerConfig != "" {
    cmd.Env = append(os.Environ(), "DOCKER_CONFIG="+dockerConfig)
}

When cmd.Env is nil, the child process inherits the parent's full environment, which already includes DOCKER_CONFIG. This block just adds a duplicate entry and can be deleted.

7. getDockerContextHost() wrapper does unnecessary disk I/O

The backward-compat wrapper calls getDockerContextConfig() which reads cert files from disk, only to discard everything except .Host. When only the host is needed (the common case during host detection in NewClient), this is wasted I/O.

8. Silent error on malformed cert/key

In newHttpClientFromContext():

cert, err := tls.X509KeyPair(contextConfig.TLSCert, contextConfig.TLSKey)
if err == nil {
    ...
}

If X509KeyPair fails (malformed cert/key), the error is silently swallowed and the client is returned without client certificate auth. This will produce a confusing TLS handshake failure at connection time instead of a clear error at setup time. Consider returning an error or at least logging a warning.
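A minimal sketch of the "return an error" option is shown here. The helper name `clientCertificate` and the error wording are hypothetical. `tls.X509KeyPair` is the standard library call already used in the excerpt.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// clientCertificate is a hypothetical helper that surfaces a malformed
// cert/key pair at setup time instead of silently continuing without
// client authentication.
func clientCertificate(certPEM, keyPEM []byte) (*tls.Certificate, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading client certificate from Docker context: %w", err)
	}
	return &cert, nil
}

func main() {
	// Deliberately invalid input to show the error path.
	_, err := clientCertificate([]byte("not a cert"), []byte("not a key"))
	fmt.Println(err)
}
```

With this shape the caller decides whether to abort or to log and continue, rather than the decision being buried in the helper.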

@Itx-Psycho0
Contributor Author

Thanks for the detailed review @matejvasek! I see the issues, let me fix them:

  1. Wrong TLS path - will change to contexts/tls/<hash>/
  2. Precedence - will check env vars first, only use context if not set
  3. Will only apply context TLS when host came from context detection
  4. Will cache the config to avoid calling docker context inspect twice

Working on the fixes now!

Fixes based on @matejvasek's review:

Significant fixes:
1. Fixed TLS path: use contexts/tls/<hash>/ not contexts/meta/<hash>/
2. Fixed precedence: env vars (DOCKER_TLS_VERIFY) now override context
3. Context TLS only applies when host came from context detection
4. Cache context config to avoid calling 'docker context inspect' twice

Minor improvements:
5. Unexported dockerContextConfig (internal-only struct)
6. Removed redundant DOCKER_CONFIG passthrough (auto-inherited)
7. Added error logging for malformed certificates

The implementation now correctly:
- Checks DOCKER_TLS_VERIFY env var first
- Only uses context TLS when env vars are not set
- Only applies context TLS when host came from context
- Calls docker CLI once instead of twice
- Logs warnings for cert loading failures
Comment thread pkg/docker/docker_client.go Dismissed
@matejvasek
Contributor

Review: feat: add TLS certificate support for Docker contexts

Branch: Itx-Psycho0/feat/docker-context-tls
Commits: 1af6e757b, 1277ea5df
Files: pkg/docker/docker_client.go, pkg/docker/docker_client_test.go


Overall

Good iteration. The core approach — building TLS config in-memory via newHttpClientFromContext(), caching context config to avoid double subprocess spawns, and respecting env var precedence — is sound. Most issues from the previous round have been addressed. Two items remain before this is mergeable.


What was fixed

  • Double subprocess spawn — fixed. contextConfig is fetched once in NewClient() and passed to newHttpClient(contextConfig).
  • Wrong fallback TLS path — fixed. Now uses contexts/tls/<hash>/ with a corrected comment.
  • Precedence inversion — fixed. newHttpClient() checks DOCKER_TLS_VERIFY first; context TLS is only tried when env vars are not set.
  • Context TLS applied when DOCKER_HOST set manually — fixed. contextConfig is only populated when the host came from context detection; when DOCKER_HOST is set via env var, it stays nil.
  • Exported type — fixed. dockerContextConfig is now unexported.
  • Redundant DOCKER_CONFIG passthrough — fixed. Replaced with a comment.
  • Silent error on bad cert/key — fixed. Now logs a warning to stderr.

Remaining issues

1. GetDockerContextHostFunc and getDockerContextHost() are now dead code

On upstream/main, NewClient() calls GetDockerContextHostFunc(). On this branch, NewClient() calls getDockerContextConfig() directly — GetDockerContextHostFunc is defined and exported but never called anywhere:

$ git grep 'GetDockerContextHostFunc' upstream/main
pkg/docker/docker_client.go: if contextHost := GetDockerContextHostFunc(); contextHost != "" {   # <-- used
pkg/docker/docker_client.go: var GetDockerContextHostFunc = getDockerContextHost

$ git grep 'GetDockerContextHostFunc' Itx-Psycho0/feat/docker-context-tls
pkg/docker/docker_client.go: var GetDockerContextHostFunc = getDockerContextHost                  # <-- defined but never called

This is a minor breaking change: any downstream code that mocked GetDockerContextHostFunc to inject test behavior will find their mock silently ignored. Two options:

  • Remove GetDockerContextHostFunc and getDockerContextHost() entirely if no external consumers depend on them.
  • Replace them with a GetDockerContextConfigFunc (or similar) mockable variable and use that in NewClient(), preserving the testability contract.

2. Test doesn't exercise the fallback TLS path

The fix to use contexts/tls/<hash>/ is correct, but the test always sets Storage.TLSPath explicitly via createDockerContextConfigWithTLS, so the fallback branch is never executed. Adding a test case where Storage.TLSPath is empty or "<IN MEMORY>" with certs placed at contexts/tls/<hash>/ would validate the fix actually works.


Nits

  • The warning on stderr (fmt.Fprintf(os.Stderr, "Warning: ...")) works but log.Printf would be more conventional Go. Not blocking.

@matejvasek
Contributor

Also, please add a test for the previous TLS functionality that uses the env vars.

Address remaining code review feedback from matejvasek:

1. Remove dead code:
   - Removed GetDockerContextHostFunc variable
   - Removed getDockerContextHost() wrapper function
   These are no longer called anywhere since getDockerContextConfig()
   is now called directly.

2. Add test for fallback TLS path:
   - Added TestNewClient_DockerContextTLS_FallbackPath
   - Tests the scenario where storage.TLSPath is "<IN MEMORY>" or empty
   - Verifies that TLS certificates are still found via the calculated
     path based on context name hash (contexts/tls/<sha256-hash>/)
   - This exercises the fallback logic that was previously untested

All tests pass including the new fallback path test.
@knative-prow knative-prow Bot added size/XL 🤖 PR changes 500-999 lines, ignoring generated files. and removed size/L 🤖 PR changes 100-499 lines, ignoring generated files. labels May 15, 2026
@matejvasek
Contributor

Please also add a test for the "old" functionality: TLS without a context, configured via the env vars.

Add TestNewClient_TLS_EnvVars to test the original TLS functionality
using environment variables (DOCKER_TLS_VERIFY, DOCKER_CERT_PATH)
without Docker context.

This ensures backward compatibility with the pre-context TLS
configuration method and addresses Matej's feedback to test the
'old' functionality.

The test:
- Creates a TLS-enabled mock Docker daemon
- Sets up TLS certificates in a directory
- Configures TLS via environment variables (not context)
- Verifies successful TLS connection and client cert authentication

All tests pass including the new env vars test.
Contributor

@matejvasek matejvasek left a comment


Review

Issues

1. Bug: TLS cert path missing the endpoint subdirectory

Docker stores context TLS files at contexts/tls/<hash>/docker/{ca,cert,key}.pem (where docker is the endpoint name), not directly at contexts/tls/<hash>/. The Docker CLI source (store.go ResetEndpointTLSMaterial) writes to:

filepath.Join(s.tls, contextDirOf(name), endpointName, fileName)

And docker context inspect's Storage.TLSPath returns the context-level directory (contexts/tls/<hash>/), not the endpoint-level one.

The PR reads from:

os.ReadFile(filepath.Join(tlsPath, "ca.pem"))   // contexts/tls/<hash>/ca.pem

But should read from:

os.ReadFile(filepath.Join(tlsPath, "docker", "ca.pem"))   // contexts/tls/<hash>/docker/ca.pem

This applies to both the Storage.TLSPath path and the fallback calculated path. The tests pass because they create certs at the wrong level to match the code, not where Docker actually stores them. On a real Docker installation with TLS-enabled contexts, the certs would not be found.

2. Mock daemon doesn't serve the Info endpoint

startMockDaemon returns "OK" (text/plain) for all requests. The TLS tests call dockerClient.Info() and assert nfo.Info.ID == "mock-daemon":

nfo, err := dockerClient.Info(ctx, client.InfoOptions{})
if nfo.Info.ID != "mock-daemon" {
    t.Errorf(...)
}

The Docker client should fail to JSON-decode "OK" into a system.Info struct, causing Fatalf on the error check. If it somehow succeeds, the ID would be "", not "mock-daemon", so the Errorf would fire. Either the mock needs to serve proper JSON for /info, or the Info() assertions should be removed.

3. Context TLS tests don't isolate from DOCKER_TLS_VERIFY

TestNewClient_DockerContextTLS and the fallback variant set DOCKER_HOST="" and DOCKER_CONFIG but don't unset DOCKER_TLS_VERIFY. If the test environment has DOCKER_TLS_VERIFY set, newHttpClient() takes the env-var code path instead of the context path, and the test doesn't exercise what it claims. These tests should explicitly unset DOCKER_TLS_VERIFY and DOCKER_CERT_PATH.

4. Test meta.json includes Storage field that Docker doesn't persist

createDockerContextConfigWithTLS writes a Storage block into meta.json. Docker CLI doesn't store Storage in meta.json -- it computes it at runtime from the store directory structure. The real docker context inspect will ignore the Storage in meta.json and compute its own value. This means the test's TLSPath value may or may not match what docker context inspect actually returns, making the test fragile.


Minor / Style

  • Code duplication -- newHttpClient (env var path) and newHttpClientFromContext duplicate the dialer + transport + http.Client construction. A small shared helper for the bottom half would reduce this.
  • Silent cert read failures -- getDockerContextConfig silently returns empty cert slices when os.ReadFile fails for individual cert files. A debug/trace log would help users troubleshoot TLS issues (the PR does log X509KeyPair failures, which is good, but not file-read failures).

Positive

  • Caching contextConfig to avoid calling docker context inspect twice is a good improvement.
  • Proper precedence: env vars override context config.
  • Clean separation of newHttpClientFromContext from the env-var path.
  • Good test structure with TLS mock daemon (CA, server cert, client cert).
  • Removing the exported GetDockerContextHostFunc mock variable in favor of DOCKER_CONFIG-based test isolation is the right approach.
  • The fallback path calculation for <IN MEMORY> TLS paths is a nice touch.

Contributor

@matejvasek matejvasek left a comment


Review: PR #3728 — feat: add TLS certificate support for Docker contexts

Summary

This PR extends the Docker context detection (from #3684) to also load TLS certificates stored in Docker contexts. It renames getDockerContextHost() to getDockerContextConfig(), returning a dockerContextConfig struct that carries both the host and TLS cert/key/CA bytes. The newHttpClient() function is updated to accept this config and fall back to context-based TLS when env vars aren't set.

Issues

1. TLS cert path lookup assumes flat directory — Docker uses a subdirectory per endpoint

At docker_client.go:450-453, the fallback path is computed as:

hash := sha256.Sum256([]byte(contexts[0].Name))
tlsPath = filepath.Join(dockerConfigDir, "contexts", "tls", fmt.Sprintf("%x", hash))

But Docker actually stores TLS files under contexts/tls/<hash>/docker/ (one subdirectory per endpoint name). The certs won't be found at the computed path on a real Docker installation. The tests pass only because they write certs directly into contexts/tls/<hash>/ — mimicking the bug rather than real Docker layout.

2. contextConfig is passed to newHttpClient even when DOCKER_HOST is set via env

At line 184, newHttpClient(contextConfig) is called whenever the scheme is TCP. But contextConfig is only populated when the default socket doesn't exist (the os.IsNotExist branch). If DOCKER_HOST is set as an env var (line 94), contextConfig will be nil, so this works correctly — but only by coincidence. The code would be clearer if the call was newHttpClient(nil) when DOCKER_HOST came from the environment, or if the logic was restructured.

3. Missing DOCKER_TLS_VERIFY env var cleanup in context TLS tests

TestNewClient_DockerContextTLS and TestNewClient_DockerContextTLS_FallbackPath set DOCKER_HOST and DOCKER_CONFIG via t.Setenv, but they don't explicitly clear DOCKER_TLS_VERIFY. If DOCKER_TLS_VERIFY happens to be set in the test runner's environment, the env-var path in newHttpClient would take precedence over context TLS, and the test would pass for the wrong reason. Adding t.Setenv("DOCKER_TLS_VERIFY", "") would make the tests more robust.

4. os.Getenv("HOME") won't work on all platforms

At docker_client.go:446:

dockerConfigDir = filepath.Join(os.Getenv("HOME"), ".docker")

On Windows, HOME is typically not set (USERPROFILE is used instead). The test already skips Windows, but the production code doesn't. Consider using os.UserHomeDir() instead, which is cross-platform.

5. CodeQL warning about InsecureSkipVerify

The two CodeQL alerts on InsecureSkipVerify are false positives in the context of this PR — the InsecureSkipVerify in the env-var path was already present before this PR, and in the context path it's only set when SkipTLSVerify is explicitly configured. However, a //nolint or #nosec annotation with a justification comment would silence the scanner and document the intent.

Minor Nits

  • The "crypto/sha256" and "fmt" imports were added to the production code solely for the TLS path fallback logic. If issue #1 above is fixed to read the TLSPath from docker context inspect output correctly, the sha256 import may become unnecessary.
  • The comment at line 399 (// Note: DOCKER_CONFIG is automatically inherited from parent environment) is obvious and could be removed.
  • newHttpClientFromContext logs to stderr on cert parse failure (fmt.Fprintf(os.Stderr, ...)). The rest of the codebase doesn't use this pattern — consider using a proper logger or just returning nil.

What's Good

  • Clean precedence model: env vars override context config.
  • The dockerContextConfig struct is a reasonable abstraction for carrying context state.
  • The test coverage is thorough: context TLS, fallback path, and env-var backward compatibility are all covered with real mTLS connections to mock daemons.
  • The caching of contextConfig to avoid calling docker context inspect twice is a good optimization.

Verdict

The core idea is sound, but the TLS path lookup (issue #1) appears to be incorrect for real Docker installations, which would make this feature silently not work outside of the tests. That should be verified and fixed before merging.

@knative-prow

knative-prow Bot commented May 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Itx-Psycho0
Once this PR has been reviewed and has the lgtm label, please ask for approval from matejvasek. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Labels

ok-to-test 🤖 Non-member PR verified by an org member that is safe to test. size/XL 🤖 PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add TLS certificate support for Docker contexts

3 participants