Skip to content

Latest commit

 

History

History
319 lines (277 loc) · 30 KB

File metadata and controls

319 lines (277 loc) · 30 KB

Supported Spark Expressions

Comet supports the following Spark expressions. Expressions that are marked as Spark-compatible will either run natively in Comet and provide the same results as Spark, or will fall back to Spark for cases that would not be compatible.

All expressions are enabled by default, but most can be disabled by setting spark.comet.expression.EXPRNAME.enabled=false, where EXPRNAME is the expression name as specified in the following tables, such as Length, or StartsWith. See the Comet Configuration Guide for a full list of expressions that be disabled.

Expressions that are not Spark-compatible will fall back to Spark by default and can be enabled by setting spark.comet.expression.EXPRNAME.allowIncompatible=true.

Conditional Expressions

Expression SQL Spark-Compatible?
CaseWhen CASE WHEN expr THEN expr ELSE expr END Yes
If IF(predicate_expr, true_expr, false_expr) Yes

Predicate Expressions

Expression SQL Spark-Compatible?
And AND Yes
EqualTo = Yes
EqualNullSafe <=> Yes
GreaterThan > Yes
GreaterThanOrEqual >= Yes
LessThan < Yes
LessThanOrEqual <= Yes
In IN Yes
IsNotNull IS NOT NULL Yes
IsNull IS NULL Yes
InSet IN (...) Yes
Not NOT Yes
Or OR Yes

String Functions

Expression Spark-Compatible? Compatibility Notes
Ascii Yes
BitLength Yes
Chr Yes
Concat Yes Only string inputs are supported
ConcatWs Yes
Contains Yes
EndsWith Yes
InitCap No Behavior is different in some cases, such as hyphenated names.
Left Yes Length argument must be a literal value
Length Yes
Like Yes
Lower No Results can vary depending on locale and character set. Requires spark.comet.caseConversion.enabled=true
OctetLength Yes
Reverse Yes
RLike No Uses Rust regexp engine, which has different behavior to Java regexp engine
StartsWith Yes
StringInstr Yes
StringRepeat Yes Negative argument for number of times to repeat causes exception
StringReplace Yes
StringLPad Yes
StringRPad Yes
StringSpace Yes
StringTranslate Yes
StringTrim Yes
StringTrimBoth Yes
StringTrimLeft Yes
StringTrimRight Yes
Substring Yes
Upper No Results can vary depending on locale and character set. Requires spark.comet.caseConversion.enabled=true

JSON Functions

Expression Spark-Compatible? Compatibility Notes
GetJsonObject No Spark allows single-quoted JSON and unescaped control characters which Comet does not support

Date/Time Functions

Expression SQL Spark-Compatible? Compatibility Notes
DateAdd date_add Yes
DateDiff datediff Yes
DateFormat date_format Yes Partial support. Only specific format patterns are supported.
DateSub date_sub Yes
DatePart date_part(field, source) Yes Supported values of field: year/month/week/day/dayofweek/dayofweek_iso/doy/quarter/hour/minute
Days days Yes V2 partition transform. Supports DateType and TimestampType inputs.
Extract extract(field FROM source) Yes Supported values of field: year/month/week/day/dayofweek/dayofweek_iso/doy/quarter/hour/minute
FromUnixTime from_unixtime No Does not support format, supports only -8334601211038 <= sec <= 8210266876799
Hour hour No Incorrectly applies timezone conversion to TimestampNTZ inputs (#3180)
LastDay last_day Yes
Minute minute No Incorrectly applies timezone conversion to TimestampNTZ inputs (#3180)
Second second No Incorrectly applies timezone conversion to TimestampNTZ inputs (#3180)
TruncDate trunc Yes
TruncTimestamp date_trunc No Incorrect results in non-UTC timezones (#2649)
UnixDate unix_date Yes
UnixTimestamp unix_timestamp Yes
Year year Yes
Month month Yes
DayOfMonth day/dayofmonth Yes
DayOfWeek dayofweek Yes
WeekDay weekday Yes
DayOfYear dayofyear Yes
WeekOfYear weekofyear Yes
Quarter quarter Yes

Math Expressions

Expression SQL Spark-Compatible? Compatibility Notes
Abs abs Yes
Acos acos Yes
Add + Yes
Asin asin Yes
Atan atan Yes
Atan2 atan2 Yes
BRound bround Yes
Ceil ceil Yes
Cos cos Yes
Cosh cosh Yes
Cot cot Yes
Divide / Yes
Exp exp Yes
Expm1 expm1 Yes
Floor floor Yes
Hex hex Yes
IntegralDivide div Yes
IsNaN isnan Yes
Log log Yes
Log2 log2 Yes
Log10 log10 Yes
Multiply * Yes
Pow power Yes
Rand rand Yes
Randn randn Yes
Remainder % Yes
Round round Yes
Signum signum Yes
Sin sin Yes
Sinh sinh Yes
Sqrt sqrt Yes
Subtract - Yes
Tan tan Yes
Tanh tanh Yes
TryAdd try_add Yes Only integer inputs are supported
TryDivide try_div Yes Only integer inputs are supported
TryMultiply try_mul Yes Only integer inputs are supported
TrySubtract try_sub Yes Only integer inputs are supported
UnaryMinus - Yes
Unhex unhex Yes

Hashing Functions

Expression Spark-Compatible?
Md5 Yes
Murmur3Hash Yes
Sha1 Yes
Sha2 Yes
XxHash64 Yes

Bitwise Expressions

Expression SQL Spark-Compatible?
BitwiseAnd & Yes
BitwiseCount Yes
BitwiseGet Yes
BitwiseOr | Yes
BitwiseNot ~ Yes
BitwiseXor ^ Yes
ShiftLeft << Yes
ShiftRight >> Yes

Aggregate Expressions

Expression SQL Spark-Compatible? Compatibility Notes
Average Yes, except for ANSI mode
BitAndAgg Yes
BitOrAgg Yes
BitXorAgg Yes
BoolAnd bool_and Yes
BoolOr bool_or Yes
CollectSet No NaN dedup differs from Spark. See compatibility guide.
Corr Yes
Count Yes
CovPopulation Yes
CovSample Yes
First No This function is not deterministic. Results may not match Spark.
Last No This function is not deterministic. Results may not match Spark.
Max Yes
Min Yes
StddevPop Yes
StddevSamp Yes
Sum Yes, except for ANSI mode
VariancePop Yes
VarianceSamp Yes

Window Functions

Window support is disabled by default due to known correctness issues. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721).

Comet supports using the following aggregate functions within window contexts with PARTITION BY and ORDER BY clauses.

Expression Spark-Compatible? Compatibility Notes
Count Yes
Max Yes
Min Yes
Sum Yes

Note: Dedicated window functions such as rank, dense_rank, row_number, lag, lead, ntile, cume_dist, percent_rank, and nth_value are not currently supported and will fall back to Spark.

Array Expressions

Expression Spark-Compatible? Compatibility Notes
ArrayAppend Yes
ArrayCompact No
ArrayContains Yes
ArrayDistinct Yes
ArrayExcept No
ArrayFilter Yes Only supports case where function is IsNotNull
ArrayInsert No
ArrayIntersect No
ArrayJoin No
ArrayMax Yes
ArrayMin Yes
ArrayRemove Yes
ArrayRepeat No
ArrayUnion No Behaves differently than spark. Comet sorts the input arrays before performing the union, while Spark preserves the order of the first array and appends unique elements from the second.
ArraysOverlap Yes
CreateArray Yes
ElementAt Yes Input must be an array. Map inputs are not supported.
Flatten Yes
GetArrayItem Yes

Map Expressions

Expression Spark-Compatible?
GetMapValue Yes
MapKeys Yes
MapEntries Yes
MapValues Yes
MapFromArrays Yes

Struct Expressions

Expression Spark-Compatible? Compatibility Notes
CreateNamedStruct Yes
GetArrayStructFields Yes
GetStructField Yes
JsonToStructs No Partial support. Requires explicit schema.
StructsToJson No Does not support Infinity/-Infinity for numeric types (#3016)

Conversion Expressions

Expression Spark-Compatible Compatibility Notes
Cast Depends on specific cast See the Comet Compatibility Guide for list of supported cast expressions and known issues

SortOrder

Expression Spark-Compatible? Compatibility Notes
NullsFirst Yes
NullsLast Yes
Ascending Yes
Descending Yes

Other

Expression Spark-Compatible? Compatibility Notes
Alias Yes
AttributeReference Yes
BloomFilterMightContain Yes
Coalesce Yes
CheckOverflow Yes
KnownFloatingPointNormalized Yes
Literal Yes
MakeDecimal Yes
MonotonicallyIncreasingID Yes
NormalizeNaNAndZero Yes
PromotePrecision Yes
RegExpReplace No Uses Rust regexp engine, which has different behavior to Java regexp engine
ScalarSubquery Yes
SparkPartitionID Yes
ToPrettyString Yes
UnscaledValue Yes