[FLINK-39623][table-planner] Make CAST(BYTES AS STRING) strict on invalid UTF-8#28125
gustavodemorais wants to merge 3 commits into apache:master
| @@ -0,0 +1,41 @@ | |||
| --- | |||
| title: "Release Notes - Flink 2.4" | |||
This should be done by the release managers.
The release notes for the PR/Jira issue should go in the "Release notes" field of the Jira issue, and the RM can then incorporate them into the final release notes doc.
But does it hurt to start early? I actually like this idea; it makes the RM's work easier later.
I guess the main issue is that when the RM prepares a release, they go through the steps defined in https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
There is a step to look into the release notes in Jira, and nothing about this.
I'm not against this approach if it is documented.
Thanks for the note, @snuyanzin. I've added the release notes to the Jira ticket and removed them from the PR. I also added a short note with this info to Agents.md (f860662).
I think having it in the PR is nice to give the chance for the release notes to be reviewed directly. However, it's of course a pain to change an existing working process. Do you personally have a preference as someone who has had more experience with releases?
| - sql: CAST(value AS type) | ||
| table: ANY.cast(TYPE) | ||
| description: Returns a new value being cast to type type. A CAST error throws an exception and fails the job. When performing a cast operation that may fail, like STRING to INT, one should rather use TRY_CAST, in order to handle errors. If "table.exec.legacy-cast-behaviour" is enabled, CAST behaves like TRY_CAST. E.g., CAST('42' AS INT) returns 42; CAST(NULL AS STRING) returns NULL of type STRING; CAST('non-number' AS INT) throws an exception and fails the job. | ||
| description: Returns a new value being cast to type type. A CAST error throws an exception and fails the job. When performing a cast operation that may fail, like STRING to INT, one should rather use TRY_CAST, in order to handle errors. If "table.exec.legacy-cast-behaviour" is enabled, CAST behaves like TRY_CAST. E.g., CAST('42' AS INT) returns 42; CAST(NULL AS STRING) returns NULL of type STRING; CAST('non-number' AS INT) throws an exception and fails the job. Casting BINARY/VARBINARY/BYTES to a CHAR/VARCHAR/STRING type validates that the input is well-formed UTF-8 and throws on invalid sequences. Use MAKE_VALID_UTF8 to substitute the Unicode replacement character `U+FFFD` for invalid bytes, TRY_CAST to return NULL, or set "table.exec.legacy-bytes-to-string-cast" to "true" to restore the prior silent-substitution behavior. |
Can we make the SQL formatting more consistent? In some places it is in backticks, in others it is not.
twalthr left a comment
LGTM % Sergey's comments
Thanks for the reviews, @snuyanzin and @twalthr. Addressed the comments 🙂
1d4d39e to f860662
What is the purpose of the change
CAST from BINARY/VARBINARY/BYTES to a CHAR/VARCHAR/STRING type now validates UTF-8 and throws on invalid input instead of silently substituting U+FFFD. A new ExecutionConfigOption restores the prior behaviour for users who need it. Part of FLIP-568.
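To illustrate the difference between the two behaviours, here is a minimal sketch using only the JDK. It is not the planner code from this PR; the class and method names (`Utf8CastSketch`, `decodeStrict`, `decodeLenient`) are hypothetical. The strict path uses a `CharsetDecoder` configured to report malformed input, while the lenient path relies on `new String(bytes, UTF_8)`, which substitutes U+FFFD for invalid bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the implementation in this PR.
class Utf8CastSketch {

    // Strict decoding: malformed byte sequences raise CharacterCodingException.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lenient decoding: the String constructor replaces malformed input with U+FFFD.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = {(byte) 'a', (byte) 0xFF, (byte) 'b'};

        System.out.println("lenient contains U+FFFD: "
                + decodeLenient(invalid).contains("\uFFFD"));
        try {
            decodeStrict(invalid);
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: rejected invalid UTF-8");
        }
    }
}
```

The strict variant corresponds to the new default described above, and the lenient variant to the silent substitution that the new option is meant to restore.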
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving): yes - new ConfigOption on ExecutionConfigOptions
Documentation
Was generative AI tooling used to co-author this PR?
2.1.117 (Claude Code)