Skip to content

Commit 6b7e322

Browse files
committed
docs: clarify withInfo is for fallback reasons; restore sort-order rollup
Expand the doc comments on withInfo/withInfos/hasExplainInfo to make clear that these record fallback reasons surfaced in extended explain output, and that any call to withInfo is a signal that the node falls back to Spark. Also restore the child-expression rollup for native range-partitioning sort orders that was lost in the earlier refactor: when exprToProto fails on a sort-order expression, its own fallback reasons (e.g. strict floating-point sort) are now copied onto the shuffle's reasons so they surface alongside 'unsupported range partitioning sort order'.
1 parent 994189c commit 6b7e322

2 files changed

Lines changed: 43 additions & 29 deletions

File tree

spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala

Lines changed: 39 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -200,22 +200,29 @@ object CometSparkSessionExtensions extends Logging {
200200
}
201201

202202
/**
203-
* Attaches explain information to a TreeNode, rolling up the corresponding information tags
204-
* from any child nodes. For now, we are using this to attach the reasons why certain Spark
205-
* operators or expressions are disabled.
203+
* Record a fallback reason on a `TreeNode` (a Spark operator or expression) explaining why
204+
* Comet cannot accelerate it. Reasons recorded here are surfaced in extended explain output
205+
* (see `ExtendedExplainInfo`) and, when `COMET_LOG_FALLBACK_REASONS` is enabled, logged as
206+
* warnings. The reasons are also rolled up from child nodes so that the operator that remains
207+
* in the Spark plan carries the reasons from its converted-away subtree.
208+
*
209+
* Call this in any code path where Comet decides not to convert a given node - serde `convert`
210+
* methods returning `None`, unsupported data types, disabled configs, etc. Do not use this for
211+
* informational messages that are not fallback reasons: anything tagged here is treated by the
212+
* rules as a signal that the node falls back to Spark.
206213
*
207214
* @param node
208-
* The node to attach the explain information to. Typically a SparkPlan
215+
* The Spark operator or expression that is falling back to Spark.
209216
* @param info
210-
* Information text. Optional, may be null or empty. If not provided, then only information
211-
* from child nodes will be included.
217+
* The fallback reason. Optional, may be null or empty - pass empty only when the call is used
218+
* purely to roll up reasons from `exprs`.
212219
* @param exprs
213-
* Child nodes. Information attached in these nodes will be be included in the information
214-
* attached to @node
220+
* Child nodes whose own fallback reasons should be rolled up into `node`. Pass the
221+
* sub-expressions or child operators whose failure caused `node` to fall back.
215222
* @tparam T
216-
* The type of the TreeNode. Typically SparkPlan, AggregateExpression, or Expression
223+
* The type of the TreeNode. Typically `SparkPlan`, `AggregateExpression`, or `Expression`.
217224
* @return
218-
* The node with information (if any) attached
225+
* `node` with fallback reasons attached (as a side effect on its tag map).
219226
*/
220227
def withInfo[T <: TreeNode[_]](node: T, info: String, exprs: T*): T = {
221228
// support existing approach of passing in multiple infos in a newline-delimited string
@@ -228,22 +235,24 @@ object CometSparkSessionExtensions extends Logging {
228235
}
229236

230237
/**
231-
* Attaches explain information to a TreeNode, rolling up the corresponding information tags
232-
* from any child nodes. For now, we are using this to attach the reasons why certain Spark
233-
* operators or expressions are disabled.
238+
* Record one or more fallback reasons on a `TreeNode` and roll up reasons from any child nodes.
239+
* This is the set-valued form of [[withInfo]]; see that overload for the full contract.
240+
*
241+
* Reasons are accumulated (never overwritten) on the node's `EXTENSION_INFO` tag and are
242+
* surfaced in extended explain output. When `COMET_LOG_FALLBACK_REASONS` is enabled, each new
243+
* reason is also emitted as a warning.
234244
*
235245
* @param node
236-
* The node to attach the explain information to. Typically a SparkPlan
246+
* The Spark operator or expression that is falling back to Spark.
237247
* @param info
238-
* Information text. May contain zero or more strings. If not provided, then only information
239-
* from child nodes will be included.
248+
* The fallback reasons for this node. May be empty when the call is used purely to roll up
249+
* child reasons.
240250
* @param exprs
241-
* Child nodes. Information attached in these nodes will be be included in the information
242-
* attached to @node
251+
* Child nodes whose own fallback reasons should be rolled up into `node`.
243252
* @tparam T
244-
* The type of the TreeNode. Typically SparkPlan, AggregateExpression, or Expression
253+
* The type of the TreeNode. Typically `SparkPlan`, `AggregateExpression`, or `Expression`.
245254
* @return
246-
* The node with information (if any) attached
255+
* `node` with fallback reasons attached (as a side effect on its tag map).
247256
*/
248257
def withInfos[T <: TreeNode[_]](node: T, info: Set[String], exprs: T*): T = {
249258
if (CometConf.COMET_LOG_FALLBACK_REASONS.get()) {
@@ -259,25 +268,27 @@ object CometSparkSessionExtensions extends Logging {
259268
}
260269

261270
/**
262-
* Attaches explain information to a TreeNode, rolling up the corresponding information tags
263-
* from any child nodes
271+
* Roll up fallback reasons from `exprs` onto `node` without adding a new reason of its own. Use
272+
* this when a parent operator is itself falling back and wants to preserve the reasons recorded
273+
* on its child expressions/operators so they appear together in explain output.
264274
*
265275
* @param node
266-
* The node to attach the explain information to. Typically a SparkPlan
276+
* The parent operator or expression falling back to Spark.
267277
* @param exprs
268-
* Child nodes. Information attached in these nodes will be be included in the information
269-
* attached to @node
278+
* Child nodes whose fallback reasons should be aggregated onto `node`.
270279
* @tparam T
271-
* The type of the TreeNode. Typically SparkPlan, AggregateExpression, or Expression
280+
* The type of the TreeNode. Typically `SparkPlan`, `AggregateExpression`, or `Expression`.
272281
* @return
273-
* The node with information (if any) attached
282+
* `node` with the rolled-up reasons attached (as a side effect on its tag map).
274283
*/
275284
def withInfo[T <: TreeNode[_]](node: T, exprs: T*): T = {
276285
withInfos(node, Set.empty, exprs: _*)
277286
}
278287

279288
/**
280-
* Checks whether a TreeNode has any explain information attached
289+
* True if any fallback reason has been recorded on `node` (via [[withInfo]] / [[withInfos]]).
290+
* Callers that need to short-circuit when a prior rule pass has already decided a node falls
291+
* back can use this as the sticky signal.
281292
*/
282293
def hasExplainInfo(node: TreeNode[_]): Boolean = {
283294
node.getTagValue(CometExplainInfo.EXTENSION_INFO).exists(_.nonEmpty)

spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ import org.apache.spark.util.random.XORShiftRandom
4848

4949
import com.google.common.base.Objects
5050

51-
import org.apache.comet.CometConf
51+
import org.apache.comet.{CometConf, CometExplainInfo}
5252
import org.apache.comet.CometConf.{COMET_EXEC_SHUFFLE_ENABLED, COMET_SHUFFLE_MODE}
5353
import org.apache.comet.CometSparkSessionExtensions.{hasExplainInfo, isCometShuffleManagerEnabled, withInfos}
5454
import org.apache.comet.serde.{Compatible, OperatorOuterClass, QueryPlanSerde, SupportLevel, Unsupported}
@@ -411,6 +411,9 @@ object CometShuffleExchangeExec
411411
for (o <- orderings) {
412412
if (QueryPlanSerde.exprToProto(o, inputs).isEmpty) {
413413
reasons += s"unsupported range partitioning sort order: $o"
414+
// Roll up fallback reasons recorded on the sort-order expression (e.g. strict
415+
// floating-point sort) so they surface in the shuffle's explain output.
416+
o.getTagValue(CometExplainInfo.EXTENSION_INFO).foreach(reasons ++= _)
414417
}
415418
}
416419
for (dt <- orderings.map(_.dataType).distinct) {

0 commit comments

Comments
 (0)