Skip to content

ClickHouse: Support scalar expressions in WITH clause#2333

Open
alrevuelta wants to merge 2 commits into
apache:mainfrom
alrevuelta:with-scalar-clickhouse
Open

ClickHouse: Support scalar expressions in WITH clause#2333
alrevuelta wants to merge 2 commits into
apache:mainfrom
alrevuelta:with-scalar-clickhouse

Conversation

@alrevuelta
Copy link
Copy Markdown

Fixes #2221.

ClickHouse's WITH clause supports a reversed-order form <expression> AS <identifier> (alongside, and freely interleaved with, traditional CTEs). Currently the parser hard-errors on the leading non-identifier token:

WITH 42 AS answer SELECT answer FROM t
-- Expected: identifier, found: 42

Changes

  • New WithItem enum (Cte(Cte) | Named { expr, alias }); With::cte_tables: Vec<Cte> becomes With::items: Vec<WithItem>.
  • New dialect flag Dialect::supports_with_clause_scalar_expression (default false, true for ClickHouseDialect only).
  • New parse_with_item routes each WITH-list element: when the leading token can't begin a CTE name, it parses <expr> AS <ident> directly; otherwise it tries the CTE form via maybe_parse and falls back. parse_cte itself is unchanged.

Tests

Added in tests/sqlparser_clickhouse.rs, covering: literal scalar (the issue's example), string literal, aggregate, scalar subquery, bare-identifier disambiguation, mixing scalar + CTE in one list, lambda, and a negative test confirming non-ClickHouse dialects still reject the form.

Comment thread tests/sqlparser_clickhouse.rs Outdated
Comment on lines +1901 to +1910
#[test]
fn parse_with_clause_named_expression_unsupported_in_other_dialects() {
// The named-expression form is only enabled for ClickHouse; other
// dialects should still reject `WITH 42 AS answer …`.
let res = sqlparser::parser::Parser::parse_sql(
&GenericDialect {},
"WITH 42 AS answer SELECT answer FROM t",
);
assert!(res.is_err(), "expected parse error, got {res:?}");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip this, no need to test what the parser does for an invalid query for unsupported dialects

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread tests/sqlparser_clickhouse.rs Outdated
}

#[test]
fn parse_with_clause_named_expression_ast() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can merge this with the test above since they cover the same feature

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread tests/sqlparser_clickhouse.rs Outdated
#[test]
fn parse_with_clause_named_expression() {
// Plain literal scalar.
clickhouse().verified_stmt("WITH 42 AS answer SELECT answer FROM t");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the tests, lets use all_dialects_where instead of hardcoding it to clickhouse

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/parser/mod.rs Outdated
Comment on lines +14654 to +14661
// CTE form must start with an identifier. If the leading token
// can't begin one (e.g. `42`, `(SELECT …)`, `(x, y) -> …`), this
// is unambiguously the named-expression form.
if matches!(self.peek_token().token, Token::Word(_)) {
if let Some(cte) = self.maybe_parse(|p| p.parse_cte())? {
return Ok(WithItem::Cte(cte));
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified to use peek_subquery_or_cte_start? e.g.

that we do if self.dialect.supports() and !self.peek_subquery_or_cte_start { parse expr } else { parse cte }

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, not sure i fully understand this. with that, will WITH 42 AS answer SELECT answer FROM t parse? peek_subquery_or_cte_start returns false on 42 (not (SELECT / (WITH), so we'd go to parse_cte, which then errors on 42 not being an identifier?

Comment thread src/parser/mod.rs Outdated
Comment on lines +14642 to +14648
/// Parse a single item in a `WITH` clause.
///
/// In standard SQL this is always a CTE (`name [(cols)] AS (query)`).
/// Dialects that enable [`Dialect::supports_with_clause_scalar_expression`]
/// — currently only ClickHouse — also accept the reversed form
/// `<expression> AS <identifier>`, which can be freely interleaved with
/// CTEs in the same comma-separated list.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Parse a single item in a `WITH` clause.
///
/// In standard SQL this is always a CTE (`name [(cols)] AS (query)`).
/// Dialects that enable [`Dialect::supports_with_clause_scalar_expression`]
/// — currently only ClickHouse — also accept the reversed form
/// `<expression> AS <identifier>`, which can be freely interleaved with
/// CTEs in the same comma-separated list.
/// Parse a single item in a `WITH` clause.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/ast/query.rs Outdated
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum WithItem {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repr wise can we do something like?

pub enum WithExpression {
    Cte(...),
    Cse(ExprWithAlias), // can we reuse exprwithalias?
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/ast/query.rs Outdated
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub enum WithItem {
/// A traditional common table expression: `name [(cols)] AS [MATERIALIZED] (query)`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// A traditional common table expression: `name [(cols)] AS [MATERIALIZED] (query)`.
/// A common table expression.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/ast/query.rs Outdated
Comment on lines +780 to +785
/// `<expr> AS <alias>` — binds an expression (literal, scalar subquery,
/// lambda, …) to a name visible in the surrounding query.
///
/// See ClickHouse's [common scalar expressions][1].
///
/// [1]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// `<expr> AS <alias>` — binds an expression (literal, scalar subquery,
/// lambda, …) to a name visible in the surrounding query.
///
/// See ClickHouse's [common scalar expressions][1].
///
/// [1]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions
/// A common scalar expression.
///
/// [Clickhouse]: https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions

Comment thread src/ast/query.rs Outdated
pub cte_tables: Vec<Cte>,
/// The items declared by this `WITH` clause: traditional CTEs and,
/// for dialects that support it, named expressions.
pub items: Vec<WithItem>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub items: Vec<WithItem>,
pub exprs: Vec<WithExpression>,

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/ast/query.rs Outdated
Comment on lines +757 to +758
/// The items declared by this `WITH` clause: traditional CTEs and,
/// for dialects that support it, named expressions.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The items declared by this `WITH` clause: traditional CTEs and,
/// for dialects that support it, named expressions.
/// The expressions declared by this `WITH` clause.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alrevuelta
Copy link
Copy Markdown
Author

@iffyio thanks for the comments, fixed.

there is only this that i dont fully understand if possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ClickHouse: Parsing error for WITH statement using scalar expressions

2 participants