Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6805] Support hex function for Hive and Spark Library #4172

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

xuzifu666
Copy link
Member

Currently Spark Hive hex and unhex function not introduce for calcite, need to support it.

jira: https://issues.apache.org/jira/browse/CALCITE-6805

@xuzifu666
Copy link
Member Author

@rubenada Thanks for your review, all comments had addressed, PTAL

@rubenada
Copy link
Contributor

Thanks @xuzifu666 . I left another minor comment, but overall lgtm.
One small detail: there's a typo on the Jira title, PR title and (most importantly) on the commit message (Libaray instead of Library), could you please fix that?

@xuzifu666 xuzifu666 changed the title [CALCITE-6805] Support hex and unhex for Hive and Spark Libaray [CALCITE-6805] Support hex and unhex for Hive and Spark Library Jan 29, 2025
@xuzifu666
Copy link
Member Author

Thanks @xuzifu666 . I left another minor comment, but overall lgtm. One small detail: there's a typo on the Jira title, PR title and (most importantly) on the commit message (Libaray instead of Library), could you please fix that?

Very Thanks for @rubenada review , had addressed comments and fix the title of pr and commit, PTAL~

@rubenada rubenada added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jan 29, 2025
"abc",
"VARCHAR NOT NULL");
f.checkString("unhex('')", "", "VARCHAR NOT NULL");
f.checkNull("unhex(cast(null as varbinary))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add more tests

e.g.

scala> val df = spark.sql("select unhex(null)")
df: org.apache.spark.sql.DataFrame = [unhex(NULL): binary]

scala> df.show()
+-----------+
|unhex(NULL)|
+-----------+
|       null|
+-----------+


scala> val df = spark.sql("select unhex(0)")
df: org.apache.spark.sql.DataFrame = [unhex(0): binary]

scala> df.show()
+--------+
|unhex(0)|
+--------+
|    [00]|
+--------+


scala> val df = spark.sql("select unhex(-1)")
df: org.apache.spark.sql.DataFrame = [unhex(-1): binary]

scala> df.show()
+---------+
|unhex(-1)|
+---------+
|     null|
+---------+


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@caicancai caicancai Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that unhex(0) is equal to null? I tested it in Spark and the result is [00]

Copy link
Member Author

@xuzifu666 xuzifu666 Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unhex maybe had some problem in hive and spark side, after discussion I remove the implement of unhex from this pr, this pr change to support hex only and issue also changed too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xuzifu666 feel free to open a separate Jira ticket (related to this one) to cover the unhex implementation, even if you don't plan to work on it right away.

@xuzifu666 xuzifu666 force-pushed the calcite6805 branch 2 times, most recently from 6b93d9b to 3e88ead Compare January 29, 2025 13:00
@xuzifu666 xuzifu666 closed this Jan 29, 2025
@xuzifu666 xuzifu666 reopened this Jan 29, 2025
@xuzifu666
Copy link
Member Author

@caicancai had addressed comments,PTAL

@@ -364,6 +364,21 @@ public static String toHex(ByteString byteString) {
return Hex.encodeHexString(byteString.getBytes());
}

/** SQL HEX(varchar) function. */
public static String hex(String value) {
return Hex.encodeHexString(value.getBytes(UTF_8));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's UTF-8, but I'd accept it if it's compatible with all tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, UTF-8 seems ok for all, I had consider other funciton such md5/base32.. function also use UTF8 and CI is OK.

@xuzifu666 xuzifu666 changed the title [CALCITE-6805] Support hex and unhex for Hive and Spark Library [CALCITE-6805] Support hex function for Hive and Spark Library Jan 30, 2025
@rubenada
Copy link
Contributor

LGTM, @caicancai waiting for you approval too before finalize the merge.

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add some tests

@xuzifu666
Copy link
Member Author

Need to add some tests

Thanks for reminder, I had add more tests for it~ @caicancai

Copy link
Member

@caicancai caicancai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@caicancai
Copy link
Member

Just a small suggestion, you don't need to compress and submit every time, just compress and submit after the final approval.

@caicancai caicancai merged commit 8539f51 into apache:main Jan 30, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants