fix: cap VarCharType/VarBinaryType MAX_LENGTH at i32::MAX #339

Open
shyjsarah wants to merge 2 commits into apache:main from shyjsarah:fix/varchar-max-length-overflow
@shyjsarah
Contributor

Purpose

When paimon-rust is used as the REST client, CreateTableRequest payloads carrying a STRING column are rejected by the REST server with:

Could not parse type at position N: ...
Input type string: VARCHAR(4294967295)

Root cause: VarCharType::MAX_LENGTH (and VarBinaryType::MAX_LENGTH) were defined as isize::MAX as u32. On 64-bit targets isize::MAX = 2^63 - 1, which truncates to u32::MAX = 4294967295 when cast to u32. VarCharType::string_type() therefore produced a VarCharType whose Display emits VARCHAR(4294967295). The server-side DataTypeJsonParser calls Integer.parseInt on the length token and throws NumberFormatException, since the value exceeds Java's Integer.MAX_VALUE (2147483647).
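
The truncation and the resulting parse failure can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `i64::MAX` stands in for a 64-bit `isize::MAX` so the result does not depend on the host, and both function names are hypothetical. Rust's `i32` parser is used as an approximation of the range check that `Integer.parseInt` performs.

```rust
/// Simulates `isize::MAX as u32` on a 64-bit target.
/// Assumption: `i64::MAX` stands in for a 64-bit `isize::MAX`.
pub fn old_max_length_on_64_bit() -> u32 {
    // An `as` cast between integer types keeps only the low 32 bits here.
    i64::MAX as u32
}

/// Approximates the server-side check: the length token must fit in a
/// signed 32-bit integer, as with Java's `Integer.parseInt`.
pub fn fits_java_int(token: &str) -> bool {
    token.parse::<i32>().is_ok()
}
```

Under these assumptions, `old_max_length_on_64_bit()` yields `u32::MAX`, and `fits_java_int("4294967295")` is false while `fits_java_int("2147483647")` is true.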

Java↔Java traffic does not surface this because Java's VarCharType.asSQLString() short-circuits length == MAX_LENGTH to the bare STRING alias. Only the Rust client wrote the numeric form, and the value it wrote (4294967295) needs all 32 unsigned bits, so it falls outside Java's signed int range.
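
To make the asymmetry concrete, here is a rough Rust transliteration of the two formatting behaviors described above. It is illustrative only: `as_sql_string` and `display_numeric` are hypothetical names, and neither is the actual Java or paimon-rust implementation.

```rust
/// Assumption: mirrors Java's Integer.MAX_VALUE (2147483647).
pub const MAX_LENGTH: u32 = i32::MAX as u32;

/// Java-style formatting as described in this PR: a length equal to
/// MAX_LENGTH collapses to the bare STRING alias.
pub fn as_sql_string(length: u32) -> String {
    if length == MAX_LENGTH {
        "STRING".to_string()
    } else {
        format!("VARCHAR({})", length)
    }
}

/// Rust-client-style formatting as described in this PR: the numeric
/// length is always written, so an oversized value reaches the wire.
pub fn display_numeric(length: u32) -> String {
    format!("VARCHAR({})", length)
}
```

With the alias path, the maximum length never appears as digits on the wire; with the numeric path, whatever value the constant holds is sent verbatim.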

The wire-format length cap is a protocol constant, not a function of host pointer width. The fix pins both MAX_LENGTH constants to i32::MAX as u32 = 2147483647, exactly matching Java Integer.MAX_VALUE on all targets (32-bit was already correct by coincidence; 64-bit was broken).
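
A sketch of what pinning the constant looks like, assuming a free-standing `MAX_LENGTH` rather than the associated constants in `types.rs`. The compile-time assertion is an addition for illustration and is not claimed to be part of the PR.

```rust
/// Hypothetical stand-in for VarCharType::MAX_LENGTH / VarBinaryType::MAX_LENGTH.
/// Must equal Java's Integer.MAX_VALUE, regardless of host pointer width.
pub const MAX_LENGTH: u32 = i32::MAX as u32;

// Fails the build if the constant ever drifts from the protocol value.
const _: () = assert!(MAX_LENGTH == 2_147_483_647);
```

Because `i32::MAX` is the same on every target, the cast is lossless and the constant no longer varies between 32-bit and 64-bit builds.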

Brief change log

  • crates/paimon/src/spec/types.rs
    • VarCharType::MAX_LENGTH: isize::MAX as u32 → i32::MAX as u32, with a comment recording why this must equal Java Integer.MAX_VALUE.
    • VarBinaryType::MAX_LENGTH: same change, same reason.
  • Added regression test test_max_length_fits_java_integer asserting:
    • Both constants equal i32::MAX as u32.
    • VarCharType::string_type().to_string() and VarBinaryType at MAX_LENGTH produce length tokens that fit in a Java int (that is, parse as a Rust i32).
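
The shape of such a wire-format check can be sketched without the crate. In the example below, `VarChar` and `length_token` are hypothetical stand-ins, not the types or helpers used by test_max_length_fits_java_integer; the sketch only shows how a length token can be extracted from the rendered type string and validated against the signed 32-bit range.

```rust
use std::fmt;

/// Hypothetical stand-in for a VARCHAR type that always renders its length.
pub struct VarChar {
    pub length: u32,
}

impl fmt::Display for VarChar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "VARCHAR({})", self.length)
    }
}

/// Returns the text between the first '(' and the last ')', if any.
pub fn length_token(sql: &str) -> Option<&str> {
    let open = sql.find('(')?;
    let close = sql.rfind(')')?;
    // `get` returns None instead of panicking on an inverted range.
    sql.get(open + 1..close)
}

/// True when the rendered type carries a length that fits in a signed
/// 32-bit integer, which is the range Java's int can hold.
pub fn length_fits_java_int(sql: &str) -> bool {
    length_token(sql)
        .map(|t| t.parse::<i32>().is_ok())
        .unwrap_or(false)
}
```

A value rendered from `i32::MAX as u32` passes `length_fits_java_int`, while one rendered from `u32::MAX` does not.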

Tests

  • New unit test: paimon::spec::types::tests::test_max_length_fits_java_integer covers the wire-format invariant.
  • Existing fixture-driven tests (test_data_type_serialize, test_data_type_deserialize) continue to pass — none of the fixtures used the old overflow value, so no fixture updates were required.
  • Manual end-to-end repro (Rust client → REST CreateTable → server) on the reporter's side is the verification path; reviewers can synthesize the same payload by serializing a schema containing VarCharType::string_type() and round-tripping through DataTypeJsonParser.

API and Format

  • No public-API signature change. VarCharType::MAX_LENGTH and VarBinaryType::MAX_LENGTH are pub const values that are now smaller (2147483647 instead of the truncated 4294967295). Callers that constructed types via these constants will now produce shorter, interoperable VARCHAR(...) / VARBINARY(...) lengths.
  • Storage / wire format: a Rust-only writer that previously persisted a schema with the buggy 4294967295 length would have produced data unreadable by any Java reader, so the prior behavior was effectively unusable cross-language. New writes match the Java canonical value. No migration is required for tables actually created by paimon-java.

Documentation

None. Behavior change is invisible to users at the SQL/API surface; the fix only restores the documented "STRING == VarCharType.STRING_TYPE" semantics.

shyjsarah and others added 2 commits May 20, 2026 04:56
`isize::MAX as u32` overflows on 64-bit targets and produces
`u32::MAX = 4294967295`. The Display impl then serializes
`VarCharType::string_type()` as `VARCHAR(4294967295)`, which Java's
`DataTypeJsonParser` (used by Bennett to deserialize REST
`CreateTableRequest` payloads) rejects via `Integer.parseInt`:

    Could not parse type at position N: ...
     Input type string: VARCHAR(4294967295)
     (through reference chain:
      org.apache.paimon.rest.requests.CreateTableRequest["schema"]
      ->org.apache.paimon.schema.Schema["fields"]
      ->java.util.ArrayList[1])

The wire-format length cap is a protocol constant, not a function of
host pointer width. Pin both `MAX_LENGTH` constants to `i32::MAX`
(2147483647), matching Java `Integer.MAX_VALUE` exactly.

Java-to-Java traffic did not surface this bug because Java's
`asSQLString` short-circuits length==MAX to the bare `STRING` alias,
while paimon-rust's Display always writes the numeric form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

VarBinaryType exposes `try_new(nullable, length)` (not `with_nullable`,
which only exists on VarCharType). Use the right constructor so the
new regression test compiles.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>