fix(plugin-oracle): bound the login handshake and explain native encryption stalls #1786

Merged: datlechin merged 1 commit into main from fix/oracle-native-encryption-login-timeout on Jun 29, 2026
@datlechin

Problem

Connecting to Oracle 11.2.0.4 with the Native network encryption option turned on hangs for ~60 seconds, then fails with an unhelpful "connection reset by peer" and no diagnostic panel (#1746). The original crash in this issue is already fixed; this is the remaining "can't connect" report.

Diagnosis (from the user's packet captures)

  • With native encryption off (default), the login completes and the connection works (plaintext pcap: full auth, clean close).
  • With it on, OracleNIO advertises AES via the ANO/SNS handshake, the client never finishes handling the server's encrypted response, and the socket stalls until the server sends an RST at exactly 60 seconds (encrypted pcap: all post-handshake data is ciphertext, then a 60s gap, then RST).
  • oracleNativeEncryption is only ever set by the user's toggle (default false; no import/SSL/migration path sets it), so this is the opt-in encryption path failing against a server that does not complete it.

Two defects make this awful:

  1. No login-handshake timeout. OracleNIO's connectTimeout bounds only the TCP connect; the TNS/ANO/auth phase has no deadline, and TablePro wraps connect() with none. So a stalled login hangs until the server's 60s RST.
  2. Useless error. The RST maps to .connectionFailed -> a raw IOError string with no diagnostic panel and no mention of native encryption (the one toggle that fixes it).

Fix (app/plugin side only, no fork change)

  • Bounded login timeout. New generic withTimeout helper in TableProPluginKit (the existing MetadataConnectionPool reimplements this pattern inline). The Oracle connect() now races the whole handshake against a 30s deadline, so a stall fails fast. 30s is comfortably above the ~16s a real (slow) login took in the captures, and half the 60s the server waits before sending its RST.
  • Actionable error + diagnostic panel. When the connect times out or is reset/dropped during the handshake and native encryption is on, the error names native network encryption and tells the user to turn it off, with a PluginDiagnostic panel (previously this category showed none). Pure decision logic (OracleConnectErrorClassifier.isLikelyNativeEncryptionFailure) lives in TableProPluginKit and is unit-tested. No silent fallback to plaintext: the user asked for encryption, so we fail with guidance rather than overriding their choice.
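
A minimal sketch of the bounded-login helper described above, built around the five lines quoted in the review further down. The `Double` seconds parameter, the `@Sendable` closure signature, and everything about `TimeoutError` beyond its `seconds:` initializer are assumptions for illustration, not the code merged into TableProPluginKit.

```swift
// Hypothetical stand-in: the PR only shows `TimeoutError(seconds:)` being thrown.
struct TimeoutError: Error {
    let seconds: Double
}

// Races `operation` against a sleeping child task; whichever finishes first wins.
func withTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        // Child 1: the real work (for Oracle, the whole login handshake).
        group.addTask { try await operation() }
        // Child 2: the deadline. It never returns a value, it only throws.
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError(seconds: seconds)
        }
        // Cancel the losing child once a winner is known.
        defer { group.cancelAll() }
        guard let result = try await group.next() else {
            throw TimeoutError(seconds: seconds)
        }
        return result
    }
}
```

A caller would wrap the handshake as `try await withTimeout(seconds: 30) { try await connect() }`, where `connect()` stands in for the plugin's real connect call. Because a task group waits for all children before returning, this shape only fails fast if the operation reacts to cancellation.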

This does not make native encryption work on 11g (a deep fork-level ANO/AES-login fix the maintainer already deprioritized by making it opt-in). It turns a 60s silent hang into a fast, self-explanatory failure.

ABI

TableProPluginKit gains withTimeout, TimeoutError, and isLikelyNativeEncryptionFailure (all new public symbols). scripts/check-pluginkit-abi.sh shows only additions, so this is additive with no currentPluginKitVersion bump. Labeled abi-additive.

Tests

New TableProPluginKitTests target: withTimeout (stall throws within bound; fast op returns; operation errors propagate) and the encryption-failure classifier truth table. 8 tests pass.

Shipping

Oracle is a registry-only plugin, so this reaches users after re-releasing the Oracle plugin.

Fixes #1746

https://claude.ai/code/session_0198faM6VCrViRU4XwRoS1DC

@datlechin datlechin added the abi-additive PluginKit ABI diff reviewed as additive; no version bump needed label Jun 29, 2026
@mintlify

mintlify Bot commented Jun 29, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project: TablePro | Status: 🟡 Building | Updated (UTC): Jun 29, 2026, 10:53 PM

@datlechin datlechin merged commit f7e21b9 into main Jun 29, 2026
3 of 4 checks passed
@datlechin datlechin deleted the fix/oracle-native-encryption-login-timeout branch June 29, 2026 22:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b372d58748

Comment on lines +21 to +25
defer { group.cancelAll() }
guard let result = try await group.next() else {
    throw TimeoutError(seconds: seconds)
}
return result

P1 Badge Make the timeout independent of child cancellation

For a stalled operation that does not observe task cancellation, this helper still waits past the deadline: the timeout child throws, cancelAll() runs, but exiting a withThrowingTaskGroup waits for the still-running operation child to finish. The new test only covers Task.sleep, which is cancellable, so it misses the exact Oracle login-stall case if OracleNIO.connect is blocked on I/O and does not complete on cancellation; users can still wait until the server reset instead of getting the 30s error.
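
One possible way to address this, sketched under assumptions rather than taken from the PR: run the operation and the deadline as unstructured tasks and resume the caller through a continuation that fires at most once. The names `OnceResumer` and `withDetachedTimeout` are hypothetical, and `TimeoutError` is redeclared here only to keep the example self-contained.

```swift
import Foundation

// Hypothetical stand-in for the PR's TimeoutError.
struct TimeoutError: Error {
    let seconds: Double
}

// Resumes a continuation at most once, whichever racer gets there first.
final class OnceResumer<T: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var continuation: CheckedContinuation<T, Error>?

    init(_ continuation: CheckedContinuation<T, Error>) {
        self.continuation = continuation
    }

    func resume(with result: Result<T, Error>) {
        lock.lock()
        let pending = continuation
        continuation = nil
        lock.unlock()
        // Later calls find nil and do nothing.
        pending?.resume(with: result)
    }
}

// Returns at the deadline even if `operation` never observes cancellation.
func withDetachedTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<T, Error>) in
        let once = OnceResumer(continuation)
        let work = Task {
            do {
                let value = try await operation()
                once.resume(with: .success(value))
            } catch {
                once.resume(with: .failure(error))
            }
        }
        Task {
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            work.cancel() // best effort; the caller does not wait for it
            once.resume(with: .failure(TimeoutError(seconds: seconds)))
        }
    }
}
```

The trade-off is that a stalled operation keeps running in the background until it finishes on its own (for the Oracle case, until the server resets the socket), and this sketch does not forward cancellation from the caller or stop the timer task early. Whether that is acceptable for a connection object that may still complete later is a design decision this example does not settle.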

Comment on lines +33 to +34
case .connectionDropped, .connectionFailed:
    return true

P2 Badge Don't map server login errors to encryption failures

When native encryption is enabled, this treats every generic .connectionFailed as a native-encryption failure. classify("server") is explicitly a .connectionFailed in the existing tests, so server-side login failures during connect (for example invalid credentials or unknown service names reported via serverDetail) can lose their real message and instead tell the user to turn encryption off. Limit this override to timeouts and known reset/dropped-handshake codes, or inspect the server detail before replacing it.
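
A sketch of the narrower rule suggested here, assuming a simplified signature. Only `.connectionDropped`, `.connectionFailed`, `serverDetail`, and the name `isLikelyNativeEncryptionFailure` come from the PR; the `ConnectErrorCategory` enum, its `timeout` and `other` cases, and the `NarrowedEncryptionClassifier` wrapper are hypothetical.

```swift
// Hypothetical categories; the PR excerpt shows only the two connection cases.
enum ConnectErrorCategory {
    case timeout
    case connectionDropped
    case connectionFailed
    case other
}

enum NarrowedEncryptionClassifier {
    static func isLikelyNativeEncryptionFailure(
        category: ConnectErrorCategory,
        nativeEncryptionEnabled: Bool,
        serverDetail: String?
    ) -> Bool {
        // Without the opt-in toggle, the encryption path is not in play.
        guard nativeEncryptionEnabled else { return false }
        switch category {
        case .timeout, .connectionDropped:
            return true
        case .connectionFailed:
            // A server-reported message means the login got far enough to
            // produce a real error, so keep that message instead.
            return serverDetail?.isEmpty ?? true
        case .other:
            return false
        }
    }
}
```

Under this rule a bare reset with no server text still gets the native-encryption guidance, while a failure carrying server detail such as an ORA-01017 invalid-credentials message keeps its original wording.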

Development

Successfully merging this pull request may close these issues.

Crash when connecting to Oracle
