Add Short and BigDecimal type mappings to SqlTypeUtils by gknz · Pull Request #763 · apache/wayang

gknz · 2026-05-30T15:03:31Z

Summary

Adds proper SQL type support for Short and java.math.BigDecimal across the
table-sink write path. Previously neither had a JDBC type mapping, so the sink
created such columns as VARCHAR(255) (numbers stored as text, with no numeric
ordering/aggregation in the DB). This PR maps them to real numeric SQL types and
binds their values correctly on both the Java and Spark execution paths.

Changes

1. Type mapping (SqlTypeUtils)

Short / short → SMALLINT
BigDecimal → NUMERIC(38,18)
The PostgreSQL dialect inherits both from the default map.

2. Value binding (the DDL type alone is not enough)
Mapping the column type without binding the value would bind a string into a
numeric column. Each in-JVM sink is fixed:

JavaTableSink: bind via PreparedStatement.setShort(...) / setBigDecimal(...)
instead of the setString fallback.
SparkTableSink: map Short → DataTypes.ShortType and
BigDecimal → DataTypes.createDecimalType(38, 18), so the DataFrame schema
(and the JDBC write derived from it) produces SMALLINT / NUMERIC(38,18).

Why `NUMERIC(38,18)` rather than a bare `NUMERIC`

A bare NUMERIC is not portable for decimals. Per the SQL standard, omittting
the scale means scale 0, so most engines create an integer column and silently
truncate the fractional part: H2, MySQL (DECIMAL(10,0)), SQL Server
(DECIMAL(18,0)), Derby, DB2 all do this (e.g. 12.345 → 12). PostgreSQL is the
lenient exception (bare NUMERIC is unbounded), which can hide the bug. An
explicit scale avoids the truncation on every database.

Why specifically `38` and `18`

These are not arbitrary: (38, 18) is Spark's DecimalType.SYSTEM_DEFAULT —
the precision/scale Spark assigns to a decimal when none is otherwise known
(38 is also the maximum precision Spark's Decimal supports, and a value widely
supported by Oracle/SQL Server). We deliberately reuse the same values on the
Java/DDL path (SqlTypeUtils → NUMERIC(38,18)):

38 = max total significant digits (generous headroom),
18 = digits after the decimal point (ample for monetary/rate data),
and, crucially, this makes the Java and Spark sinks emit identical column
types for a BigDecimal. In Wayang the optimizer decides which platform runs
the sink, so the output schema must not depend on that choice — matching Spark's
default keeps both paths consistent.

Trade-off: a fixed scale pads short values with trailing zeros
(12.345 → 12.345000000000000000). This is numerically identical and is exactly
what the Spark path already produced;=, in return decimals aree portable and lossless
across all databases.

Tests

SqlTypeUtilsTest: the full default type map (incl. Short → SMALLINT,
BigDecimal → NUMERIC(38,18)) and PostgreSQL inheritance.
SparkTableSinkTest:
- direct mapping checks for getSparkDataType(...) — the whole Java-class →
  Spark DataType table, the String fallback, and explicitly
  BigDecimal → DecimalType(precision=38, scale=18);
- an end-to-end write of all supported types to in-memory H2, asserting the
  round-tripped values and the created column types (SMALLINT, NUMERIC(38,18)).
JavaTableSinkTest: the analogous end-to-end test for the Java sink.

All tests run against in-memory H2 — no external database required.

Notes

The in-database JDBC sinks (PostgresTableSinkOperator, etc.) are unaffected :
they compose CREATE TABLE ... AS SELECT and let the DB infer types, so they
never bind a Java value or use SqlTypeUtils.
getSparkDataType was widened from private to package-private only to make
the mapping unit-testable.

Short and java.math.BigDecimal had no entry in the JDBC type map, so columns of those Java types were created as VARCHAR(255) by the JDBC table sink in overwrite mode. Map Short (and short) to SMALLINT and BigDecimal to NUMERIC; the PostgreSQL dialect map inherits both from the default map. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

SqlTypeUtils now maps Short->SMALLINT and BigDecimal->NUMERIC for the generated DDL, but the sinks still bound those values via the String fallback. Complete the write path: - JavaTableSink: bind via setShort / setBigDecimal - SparkTableSink: map Short->ShortType and BigDecimal->DecimalType(38,18) so the Spark DataFrame writes SMALLINT / NUMERIC columns instead of text

mspruc · 2026-06-01T06:24:03Z

Hi thanks for the contribution, if possible it would also be nice to have a unit test for this

zkaoudi · 2026-06-01T07:03:12Z

@mspruc what unit test do you have in mind? This PR seems to simply add two more data types

mspruc · 2026-06-01T08:38:35Z

@zkaoudi well something that tests the sinks ideally, e.g. for https://github.com/gknz/wayang/blob/efb35979111a6bb84192acd2ade8424bcd1ee463/wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/operators/SparkTableSink.java#L156

we have two magic numbers, I'd like to make sure this conversion is stable.

Add unit and end-to-end tests pinning the Short/BigDecimal handling across the table sinks, and fix a decimal-truncation issue the tests surfaced. - SqlTypeUtils: map BigDecimal to NUMERIC(38,18) instead of a bare NUMERIC. A bare NUMERIC defaults to scale 0 on most engines (H2, MySQL, SQL Server, Derby, ...) and silently truncates the fractional part; an explicit scale fixes this on every database and matches Spark's DecimalType default (38, 18). - SqlTypeUtilsTest: cover the full default type map (including Short and BigDecimal) and confirm the PostgreSQL dialect inherits them. - SparkTableSink: make getSparkDataType package-private so it can be unit-tested. - SparkTableSinkTest: assert the Java-class to Spark DataType mapping (including BigDecimal to DecimalType(38,18) and the String fallback), plus an end-to-end write of all supported types to H2 verifying values and SMALLINT/NUMERIC(38,18). - JavaTableSinkTest: end-to-end write of all supported types to H2 verifying values and column types.

gknz · 2026-06-02T21:09:58Z

@mspruc hey! I've added unit tests for sqlTypeUtils, SparkTableSink, and JavaTableSink. Please feel free to read the updated descriptions for all the changes/additions I went with and why. If you have any comments, please feel free to let me know!

gepa and others added 2 commits May 30, 2026 16:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Short and BigDecimal type mappings to SqlTypeUtils#763

Add Short and BigDecimal type mappings to SqlTypeUtils#763
gknz wants to merge 3 commits into
apache:mainfrom
gknz:feature/sqltypeutils-short-bigdecimal-mapping

gknz commented May 30, 2026 •

edited

Loading

Uh oh!

mspruc commented Jun 1, 2026

Uh oh!

zkaoudi commented Jun 1, 2026

Uh oh!

mspruc commented Jun 1, 2026

Uh oh!

gknz commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

gknz commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Why NUMERIC(38,18) rather than a bare NUMERIC

Why specifically 38 and 18

Tests

Notes

Uh oh!

mspruc commented Jun 1, 2026

Uh oh!

zkaoudi commented Jun 1, 2026

Uh oh!

mspruc commented Jun 1, 2026

Uh oh!

gknz commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gknz commented May 30, 2026 •

edited

Loading

Why `NUMERIC(38,18)` rather than a bare `NUMERIC`

Why specifically `38` and `18`