
Commit 7c733de: Fix PlanStabilitySuite
Parent: 8174a87

2,432 files changed: 428,641 additions and 19 deletions

Large commits have some content hidden by default, so file names are missing for some of the changed files below.

dev/regenerate-golden-files.sh

Lines changed: 5 additions & 5 deletions
@@ -22,7 +22,7 @@
 # Usage: ./dev/regenerate-golden-files.sh [--spark-version <version>]
 #
 # Options:
-#   --spark-version <version>   Only regenerate for specified Spark version (3.4, 3.5, or 4.0)
+#   --spark-version <version>   Only regenerate for specified Spark version (3.4, 3.5, 4.0, or 4.1)
 #                               If not specified, regenerates for all versions.
 #
 # Examples:
@@ -119,7 +119,7 @@ main() {
             echo "Usage: $0 [--spark-version <version>]"
             echo ""
             echo "Options:"
-            echo "  --spark-version <version>   Only regenerate for specified Spark version (3.4, 3.5, or 4.0)"
+            echo "  --spark-version <version>   Only regenerate for specified Spark version (3.4, 3.5, 4.0, or 4.1)"
             echo "                              If not specified, regenerates for all versions."
             exit 0
             ;;
@@ -133,9 +133,9 @@ main() {

     # Validate target version if specified
     if [ -n "$target_version" ]; then
-        if [[ ! "$target_version" =~ ^(3\.4|3\.5|4\.0)$ ]]; then
+        if [[ ! "$target_version" =~ ^(3\.4|3\.5|4\.0|4\.1)$ ]]; then
             echo "[ERROR] Invalid Spark version: $target_version"
-            echo "[ERROR] Supported versions: 3.4, 3.5, 4.0"
+            echo "[ERROR] Supported versions: 3.4, 3.5, 4.0, 4.1"
             exit 1
         fi
     fi
@@ -155,7 +155,7 @@ main() {
     if [ -n "$target_version" ]; then
         versions=("$target_version")
     else
-        versions=("3.4" "3.5" "4.0")
+        versions=("3.4" "3.5" "4.0" "4.1")
     fi

     # Install and regenerate for each version
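One Bash detail worth flagging in the version-list hunk: array elements are separated by whitespace, not commas, so writing `("3.4" "3.5" "4.0", "4.1")` would store the literal element `4.0,` and the per-version loop would never match that version. A minimal sketch of the intended list and loop (the echo text is illustrative, not the script's actual output):

```shell
#!/usr/bin/env bash
# Supported Spark versions. Elements are whitespace-separated;
# a comma here would become part of the element itself ("4.0,").
versions=("3.4" "3.5" "4.0" "4.1")

# Iterate once per version (loop body is illustrative).
for v in "${versions[@]}"; do
  echo "regenerating golden files for Spark $v"
done
```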
Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
== Physical Plan ==
TakeOrderedAndProject (44)
+- * Project (43)
   +- * BroadcastHashJoin Inner BuildRight (42)
      :- * Project (36)
      :  +- * BroadcastHashJoin Inner BuildRight (35)
      :     :- * Project (29)
      :     :  +- * BroadcastHashJoin Inner BuildRight (28)
      :     :     :- * Filter (11)
      :     :     :  +- * HashAggregate (10)
      :     :     :     +- * CometColumnarToRow (9)
      :     :     :        +- CometColumnarExchange (8)
      :     :     :           +- * HashAggregate (7)
      :     :     :              +- * Project (6)
      :     :     :                 +- * BroadcastHashJoin Inner BuildRight (5)
      :     :     :                    :- * Filter (3)
      :     :     :                    :  +- * ColumnarToRow (2)
      :     :     :                    :     +- Scan parquet spark_catalog.default.store_returns (1)
      :     :     :                    +- ReusedExchange (4)
      :     :     +- BroadcastExchange (27)
      :     :        +- * Filter (26)
      :     :           +- * HashAggregate (25)
      :     :              +- * CometColumnarToRow (24)
      :     :                 +- CometColumnarExchange (23)
      :     :                    +- * HashAggregate (22)
      :     :                       +- * HashAggregate (21)
      :     :                          +- * CometColumnarToRow (20)
      :     :                             +- CometColumnarExchange (19)
      :     :                                +- * HashAggregate (18)
      :     :                                   +- * Project (17)
      :     :                                      +- * BroadcastHashJoin Inner BuildRight (16)
      :     :                                         :- * Filter (14)
      :     :                                         :  +- * ColumnarToRow (13)
      :     :                                         :     +- Scan parquet spark_catalog.default.store_returns (12)
      :     :                                         +- ReusedExchange (15)
      :     +- BroadcastExchange (34)
      :        +- * CometColumnarToRow (33)
      :           +- CometProject (32)
      :              +- CometFilter (31)
      :                 +- CometNativeScan parquet spark_catalog.default.store (30)
      +- BroadcastExchange (41)
         +- * CometColumnarToRow (40)
            +- CometProject (39)
               +- CometFilter (38)
                  +- CometNativeScan parquet spark_catalog.default.customer (37)


(1) Scan parquet spark_catalog.default.store_returns
Output [4]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4]
Batched: true
Location: InMemoryFileIndex []
PartitionFilters: [isnotnull(sr_returned_date_sk#4), dynamicpruningexpression(sr_returned_date_sk#4 IN dynamicpruning#5)]
PushedFilters: [IsNotNull(sr_store_sk), IsNotNull(sr_customer_sk)]
ReadSchema: struct<sr_customer_sk:int,sr_store_sk:int,sr_return_amt:decimal(7,2)>

(2) ColumnarToRow [codegen id : 2]
Input [4]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4]

(3) Filter [codegen id : 2]
Input [4]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4]
Condition : (isnotnull(sr_store_sk#2) AND isnotnull(sr_customer_sk#1))

(4) ReusedExchange [Reuses operator id: 49]
Output [1]: [d_date_sk#6]

(5) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [sr_returned_date_sk#4]
Right keys [1]: [d_date_sk#6]
Join type: Inner
Join condition: None

(6) Project [codegen id : 2]
Output [3]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3]
Input [5]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3, sr_returned_date_sk#4, d_date_sk#6]

(7) HashAggregate [codegen id : 2]
Input [3]: [sr_customer_sk#1, sr_store_sk#2, sr_return_amt#3]
Keys [2]: [sr_customer_sk#1, sr_store_sk#2]
Functions [1]: [partial_sum(UnscaledValue(sr_return_amt#3))]
Aggregate Attributes [1]: [sum#7]
Results [3]: [sr_customer_sk#1, sr_store_sk#2, sum#8]

(8) CometColumnarExchange
Input [3]: [sr_customer_sk#1, sr_store_sk#2, sum#8]
Arguments: hashpartitioning(sr_customer_sk#1, sr_store_sk#2, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=1]

(9) CometColumnarToRow [codegen id : 9]
Input [3]: [sr_customer_sk#1, sr_store_sk#2, sum#8]

(10) HashAggregate [codegen id : 9]
Input [3]: [sr_customer_sk#1, sr_store_sk#2, sum#8]
Keys [2]: [sr_customer_sk#1, sr_store_sk#2]
Functions [1]: [sum(UnscaledValue(sr_return_amt#3))]
Aggregate Attributes [1]: [sum(UnscaledValue(sr_return_amt#3))#9]
Results [3]: [sr_customer_sk#1 AS ctr_customer_sk#10, sr_store_sk#2 AS ctr_store_sk#11, MakeDecimal(sum(UnscaledValue(sr_return_amt#3))#9,17,2) AS ctr_total_return#12]

(11) Filter [codegen id : 9]
Input [3]: [ctr_customer_sk#10, ctr_store_sk#11, ctr_total_return#12]
Condition : isnotnull(ctr_total_return#12)

(12) Scan parquet spark_catalog.default.store_returns
Output [4]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15, sr_returned_date_sk#16]
Batched: true
Location: InMemoryFileIndex []
PartitionFilters: [isnotnull(sr_returned_date_sk#16), dynamicpruningexpression(sr_returned_date_sk#16 IN dynamicpruning#5)]
PushedFilters: [IsNotNull(sr_store_sk)]
ReadSchema: struct<sr_customer_sk:int,sr_store_sk:int,sr_return_amt:decimal(7,2)>

(13) ColumnarToRow [codegen id : 4]
Input [4]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15, sr_returned_date_sk#16]

(14) Filter [codegen id : 4]
Input [4]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15, sr_returned_date_sk#16]
Condition : isnotnull(sr_store_sk#14)

(15) ReusedExchange [Reuses operator id: 49]
Output [1]: [d_date_sk#17]

(16) BroadcastHashJoin [codegen id : 4]
Left keys [1]: [sr_returned_date_sk#16]
Right keys [1]: [d_date_sk#17]
Join type: Inner
Join condition: None

(17) Project [codegen id : 4]
Output [3]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15]
Input [5]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15, sr_returned_date_sk#16, d_date_sk#17]

(18) HashAggregate [codegen id : 4]
Input [3]: [sr_customer_sk#13, sr_store_sk#14, sr_return_amt#15]
Keys [2]: [sr_customer_sk#13, sr_store_sk#14]
Functions [1]: [partial_sum(UnscaledValue(sr_return_amt#15))]
Aggregate Attributes [1]: [sum#18]
Results [3]: [sr_customer_sk#13, sr_store_sk#14, sum#19]

(19) CometColumnarExchange
Input [3]: [sr_customer_sk#13, sr_store_sk#14, sum#19]
Arguments: hashpartitioning(sr_customer_sk#13, sr_store_sk#14, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=2]

(20) CometColumnarToRow [codegen id : 5]
Input [3]: [sr_customer_sk#13, sr_store_sk#14, sum#19]

(21) HashAggregate [codegen id : 5]
Input [3]: [sr_customer_sk#13, sr_store_sk#14, sum#19]
Keys [2]: [sr_customer_sk#13, sr_store_sk#14]
Functions [1]: [sum(UnscaledValue(sr_return_amt#15))]
Aggregate Attributes [1]: [sum(UnscaledValue(sr_return_amt#15))#9]
Results [2]: [sr_store_sk#14 AS ctr_store_sk#20, MakeDecimal(sum(UnscaledValue(sr_return_amt#15))#9,17,2) AS ctr_total_return#21]

(22) HashAggregate [codegen id : 5]
Input [2]: [ctr_store_sk#20, ctr_total_return#21]
Keys [1]: [ctr_store_sk#20]
Functions [1]: [partial_avg(ctr_total_return#21)]
Aggregate Attributes [2]: [sum#22, count#23]
Results [3]: [ctr_store_sk#20, sum#24, count#25]

(23) CometColumnarExchange
Input [3]: [ctr_store_sk#20, sum#24, count#25]
Arguments: hashpartitioning(ctr_store_sk#20, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=3]

(24) CometColumnarToRow [codegen id : 6]
Input [3]: [ctr_store_sk#20, sum#24, count#25]

(25) HashAggregate [codegen id : 6]
Input [3]: [ctr_store_sk#20, sum#24, count#25]
Keys [1]: [ctr_store_sk#20]
Functions [1]: [avg(ctr_total_return#21)]
Aggregate Attributes [1]: [avg(ctr_total_return#21)#26]
Results [2]: [(avg(ctr_total_return#21)#26 * 1.2) AS (avg(ctr_total_return) * 1.2)#27, ctr_store_sk#20]

(26) Filter [codegen id : 6]
Input [2]: [(avg(ctr_total_return) * 1.2)#27, ctr_store_sk#20]
Condition : isnotnull((avg(ctr_total_return) * 1.2)#27)

(27) BroadcastExchange
Input [2]: [(avg(ctr_total_return) * 1.2)#27, ctr_store_sk#20]
Arguments: HashedRelationBroadcastMode(List(cast(input[1, int, true] as bigint)),false), [plan_id=4]

(28) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_store_sk#11]
Right keys [1]: [ctr_store_sk#20]
Join type: Inner
Join condition: (cast(ctr_total_return#12 as decimal(24,7)) > (avg(ctr_total_return) * 1.2)#27)

(29) Project [codegen id : 9]
Output [2]: [ctr_customer_sk#10, ctr_store_sk#11]
Input [5]: [ctr_customer_sk#10, ctr_store_sk#11, ctr_total_return#12, (avg(ctr_total_return) * 1.2)#27, ctr_store_sk#20]

(30) CometNativeScan parquet spark_catalog.default.store
Output [2]: [s_store_sk#28, s_state#29]
Batched: true
Location [not included in comparison]/{warehouse_dir}/store]
PushedFilters: [IsNotNull(s_state), IsNotNull(s_store_sk)]
ReadSchema: struct<s_store_sk:int,s_state:string>

(31) CometFilter
Input [2]: [s_store_sk#28, s_state#29]
Condition : ((isnotnull(s_state#29) AND (static_invoke(CharVarcharCodegenUtils.readSidePadding(s_state#29, 2)) = TN)) AND isnotnull(s_store_sk#28))

(32) CometProject
Input [2]: [s_store_sk#28, s_state#29]
Arguments: [s_store_sk#28], [s_store_sk#28]

(33) CometColumnarToRow [codegen id : 7]
Input [1]: [s_store_sk#28]

(34) BroadcastExchange
Input [1]: [s_store_sk#28]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [plan_id=5]

(35) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_store_sk#11]
Right keys [1]: [s_store_sk#28]
Join type: Inner
Join condition: None

(36) Project [codegen id : 9]
Output [1]: [ctr_customer_sk#10]
Input [3]: [ctr_customer_sk#10, ctr_store_sk#11, s_store_sk#28]

(37) CometNativeScan parquet spark_catalog.default.customer
Output [2]: [c_customer_sk#30, c_customer_id#31]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer]
PushedFilters: [IsNotNull(c_customer_sk)]
ReadSchema: struct<c_customer_sk:int,c_customer_id:string>

(38) CometFilter
Input [2]: [c_customer_sk#30, c_customer_id#31]
Condition : isnotnull(c_customer_sk#30)

(39) CometProject
Input [2]: [c_customer_sk#30, c_customer_id#31]
Arguments: [c_customer_sk#30, c_customer_id#32], [c_customer_sk#30, static_invoke(CharVarcharCodegenUtils.readSidePadding(c_customer_id#31, 16)) AS c_customer_id#32]

(40) CometColumnarToRow [codegen id : 8]
Input [2]: [c_customer_sk#30, c_customer_id#32]

(41) BroadcastExchange
Input [2]: [c_customer_sk#30, c_customer_id#32]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [plan_id=6]

(42) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_customer_sk#10]
Right keys [1]: [c_customer_sk#30]
Join type: Inner
Join condition: None

(43) Project [codegen id : 9]
Output [1]: [c_customer_id#32]
Input [3]: [ctr_customer_sk#10, c_customer_sk#30, c_customer_id#32]

(44) TakeOrderedAndProject
Input [1]: [c_customer_id#32]
Arguments: 100, [c_customer_id#32 ASC NULLS FIRST], [c_customer_id#32]

===== Subqueries =====

Subquery:1 Hosting operator id = 1 Hosting Expression = sr_returned_date_sk#4 IN dynamicpruning#5
BroadcastExchange (49)
+- * CometColumnarToRow (48)
   +- CometProject (47)
      +- CometFilter (46)
         +- CometNativeScan parquet spark_catalog.default.date_dim (45)


(45) CometNativeScan parquet spark_catalog.default.date_dim
Output [2]: [d_date_sk#6, d_year#33]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
PushedFilters: [IsNotNull(d_year), EqualTo(d_year,2000), IsNotNull(d_date_sk)]
ReadSchema: struct<d_date_sk:int,d_year:int>

(46) CometFilter
Input [2]: [d_date_sk#6, d_year#33]
Condition : ((isnotnull(d_year#33) AND (d_year#33 = 2000)) AND isnotnull(d_date_sk#6))

(47) CometProject
Input [2]: [d_date_sk#6, d_year#33]
Arguments: [d_date_sk#6], [d_date_sk#6]

(48) CometColumnarToRow [codegen id : 1]
Input [1]: [d_date_sk#6]

(49) BroadcastExchange
Input [1]: [d_date_sk#6]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [plan_id=7]

Subquery:2 Hosting operator id = 12 Hosting Expression = sr_returned_date_sk#16 IN dynamicpruning#5
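The aggregation in operators (7) through (10) sums `UnscaledValue(sr_return_amt)`: a `decimal(7,2)` is accumulated as its unscaled long, and the scale is reapplied afterwards with `MakeDecimal(..., 17, 2)`. A small sketch of that rescaling, using made-up amounts (12.34 and 5.67), not values from this plan:

```shell
# Unscaled representation of decimal(7,2): 12.34 -> 1234, 5.67 -> 567.
a=1234
b=567
sum=$((a + b))   # sum of unscaled longs: 1801
# MakeDecimal(sum, 17, 2) reapplies scale 2, i.e. divides by 10^2.
awk -v s="$sum" 'BEGIN { printf "%.2f\n", s / 100 }'
```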
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
TakeOrderedAndProject
+- Project
   +- BroadcastHashJoin
      :- Project
      :  +- BroadcastHashJoin
      :     :- Project
      :     :  +- BroadcastHashJoin
      :     :     :- Filter
      :     :     :  +- HashAggregate
      :     :     :     +- CometColumnarToRow
      :     :     :        +- CometColumnarExchange
      :     :     :           +- HashAggregate
      :     :     :              +- Project
      :     :     :                 +- BroadcastHashJoin
      :     :     :                    :- Filter
      :     :     :                    :  +- ColumnarToRow
      :     :     :                    :     +- Scan parquet spark_catalog.default.store_returns [COMET: Native DataFusion scan does not support subqueries/dynamic pruning]
      :     :     :                    :        +- SubqueryBroadcast
      :     :     :                    :           +- BroadcastExchange
      :     :     :                    :              +- CometColumnarToRow
      :     :     :                    :                 +- CometProject
      :     :     :                    :                    +- CometFilter
      :     :     :                    :                       +- CometNativeScan parquet spark_catalog.default.date_dim
      :     :     :                    +- BroadcastExchange
      :     :     :                       +- CometColumnarToRow
      :     :     :                          +- CometProject
      :     :     :                             +- CometFilter
      :     :     :                                +- CometNativeScan parquet spark_catalog.default.date_dim
      :     :     +- BroadcastExchange
      :     :        +- Filter
      :     :           +- HashAggregate
      :     :              +- CometColumnarToRow
      :     :                 +- CometColumnarExchange
      :     :                    +- HashAggregate
      :     :                       +- HashAggregate
      :     :                          +- CometColumnarToRow
      :     :                             +- CometColumnarExchange
      :     :                                +- HashAggregate
      :     :                                   +- Project
      :     :                                      +- BroadcastHashJoin
      :     :                                         :- Filter
      :     :                                         :  +- ColumnarToRow
      :     :                                         :     +- Scan parquet spark_catalog.default.store_returns [COMET: Native DataFusion scan does not support subqueries/dynamic pruning]
      :     :                                         :        +- ReusedSubquery
      :     :                                         +- BroadcastExchange
      :     :                                            +- CometColumnarToRow
      :     :                                               +- CometProject
      :     :                                                  +- CometFilter
      :     :                                                     +- CometNativeScan parquet spark_catalog.default.date_dim
      :     +- BroadcastExchange
      :        +- CometColumnarToRow
      :           +- CometProject
      :              +- CometFilter
      :                 +- CometNativeScan parquet spark_catalog.default.store
      +- BroadcastExchange
         +- CometColumnarToRow
            +- CometProject
               +- CometFilter
                  +- CometNativeScan parquet spark_catalog.default.customer

Comet accelerated 18 out of 49 eligible operators (36%). Final plan contains 10 transitions between Spark and Comet.
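The 36% figure in the summary line is consistent with integer division rather than rounding: 18/49 is about 36.7%, which would round to 37%. A quick check, assuming integer arithmetic in the reporting:

```shell
# 18 of 49 eligible operators were accelerated; integer division
# truncates 36.73... down to 36, matching the reported percentage.
accelerated=18
eligible=49
pct=$(( accelerated * 100 / eligible ))
echo "${pct}%"
```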
