Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix(regexp_replace): Move regex preprocessing to functions/lib for Spark reuse and fix backslash handling (#10981)

Summary:

1. Move Presto's pattern and replacement preprocessing for the `regexp_replace` function to `functions/lib` so that Spark can reuse this code.
2. Update the function `prepareRegexpReplaceReplacement`. `RE2` only accepts '\\' in a replacement when it is followed by a digit or another '\\'. In Presto Java and Spark, however, a '\\' in the replacement is ignored (the character after it is taken literally), so we now unescape these sequences when preparing the replacement, rather than only '\$' as before.

Diff in `prepareRegexpReplaceReplacement`:

Before

```c++
// Un-escape dollar-sign '$'.
static const RE2 kUnescapeRegex(R"(\\\$)");
VELOX_DCHECK(
    kUnescapeRegex.ok(),
    "Invalid regular expression {}: {}.",
    R"(\\\$)",
    kUnescapeRegex.error());
RE2::GlobalReplace(&newReplacement, kUnescapeRegex, "$");
```

After

```c++
// Un-escape character except digit or '\\'
static const RE2 kUnescapeRegex(R"(\\([^0-9\\]))");
VELOX_DCHECK(
    kUnescapeRegex.ok(),
    "Invalid regular expression {}: {}.",
    R"(\\([^0-9\\]))",
    kUnescapeRegex.error());
RE2::GlobalReplace(&newReplacement, kUnescapeRegex, R"(\1)");
```

Pull Request resolved: #10981
Reviewed By: kgpai
Differential Revision: D66376796
Pulled By: kagamiori
fbshipit-source-id: a12e3eb9e91fa295c5986e1e373379b5c1f6a5e6
1 parent f33b40d, commit 3000981
Showing 7 changed files with 183 additions and 118 deletions.