MONGOCRYPT-759 Implement CFold #941
src/unicode/fold.c
Outdated
CLIENT_ERR("unicode_fold: Either case or diacritic folding must be enabled");
return false;
}
*out_str = bson_malloc(len);
need to null-terminate output
- *out_str = bson_malloc(len);
+ *out_str = bson_malloc(len + 1);
Now null-terminating.
src/unicode/fold.c
Outdated
*out_len = (size_t)(output_it - *out_str);
*out_str = realloc(*out_str, *out_len);
I personally wouldn't bother with the realloc here just to shrink it. The folded string is not gonna be around long enough to make it worth the realloc cost.
Need to null-terminate the output string:
- *out_len = (size_t)(output_it - *out_str);
- *out_str = realloc(*out_str, *out_len);
+ *output_it = '\0';
+ *out_len = (size_t)(output_it - *out_str);
Removed.
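To make the two suggestions above concrete, here is a minimal sketch of the allocation pattern: allocate `len + 1` bytes, write through an output iterator, null-terminate, and report the length without shrinking the buffer. It is not the PR's implementation. The function name `sketch_fold` and the flag names are hypothetical, `malloc` stands in for `bson_malloc`, and the folding itself is a stand-in that only lowercases ASCII and drops combining marks in U+0300..U+036F (the real code uses generated maps). It assumes valid UTF-8 input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flags, modeled on the names that appear in the PR's tests. */
enum { kFoldToLower = 1 << 0, kFoldRemoveDiacritics = 1 << 1 };

/* Sketch only: lowercases ASCII letters and drops combining marks in
 * U+0300..U+036F. Assumes str is valid UTF-8. */
bool sketch_fold(const char *str, size_t len, int flags, char **out_str, size_t *out_len) {
    assert(str && out_str && out_len);
    if (!(flags & (kFoldToLower | kFoldRemoveDiacritics))) {
        return false; /* at least one kind of folding must be enabled */
    }
    /* This stand-in never grows the string, so len + 1 bytes always
     * leaves room for the terminator. */
    char *out = malloc(len + 1);
    if (!out) {
        return false;
    }
    char *output_it = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        if ((flags & kFoldRemoveDiacritics) && i + 1 < len) {
            unsigned char n = (unsigned char)str[i + 1];
            /* U+0300..U+033F encode as CC 80..CC BF,
             * U+0340..U+036F encode as CD 80..CD AF. */
            if ((c == 0xCC && n >= 0x80 && n <= 0xBF) || (c == 0xCD && n >= 0x80 && n <= 0xAF)) {
                i++; /* skip both bytes of the combining mark */
                continue;
            }
        }
        if ((flags & kFoldToLower) && c >= 'A' && c <= 'Z') {
            c = (unsigned char)(c - 'A' + 'a');
        }
        *output_it++ = (char)c;
    }
    *output_it = '\0';                    /* terminate; no realloc to shrink */
    *out_len = (size_t)(output_it - out); /* length excludes the terminator */
    *out_str = out;
    return true;
}
```

The caller owns `*out_str` and frees it. Skipping the shrink leaves at most a few unused bytes in a buffer that is expected to be short-lived.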
test/test-unicode-fold.c
Outdated
const char nfd2[] = {'C', 'a', 'f', 'E', 0xcc, 0x81, 0};
const char nfd2_lower[] = {'c', 'a', 'f', 'e', 0xcc, 0x81, 0};
TEST_UNICODE_FOLD_ALL_CASES(nfd2, nfd2_lower, "CafE", "cafe");
Add:
TEST_UNICODE_FOLD("fo\0bar", 6, "fo\0bar", 6, kUnicodeFoldToLower | kUnicodeFoldRemoveDiacritics);
Added
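The embedded-NUL case matters because any check built on `strcmp` or `strlen` stops at the first `'\0'` and never looks at the bytes after it. How `TEST_UNICODE_FOLD` compares its results is not shown here, so the following is only a sketch of a length-aware comparison, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two buffers by explicit length, so bytes
 * after an embedded '\0' are still checked. */
bool buffers_equal(const char *a, size_t a_len, const char *b, size_t b_len) {
    assert(a && b);
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

With this helper, `"fo\0bar"` and `"fo\0baz"` (both length 6) compare as different, whereas `strcmp` treats them as equal because it only sees `"fo"`.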
one more nit, and lgtm!
LGTM with minor test suggestion.
Co-authored-by: Kevin Albertson <[email protected]>
The new unicode/ directory implements folding the same way as mongo::unicode::String::caseFoldAndStripDiacritics in the server. The two _map.c files are generated by gen_[diacritic|casefold]_map.py in the server, with modifications so that they work in C. Note that these maps are not the same as the ones on the server: those use Unicode 8.0, while we use Unicode 13.0.0 (for the simple reason that this is the latest Unicode version supported by the version of Python we use for the server).

We now do random Unicode string generation for unit testing, rather than the static strings we had before. My hope is that this will test a wider variety of cases than I can think of myself.
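As an illustration of what random Unicode string generation for tests can look like, here is a minimal sketch. It is not the generator used in this PR. All function names are hypothetical, and the xorshift PRNG is an arbitrary choice that makes a failing seed easy to replay. It draws random Unicode scalar values (skipping the surrogate range) and encodes them as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8. out needs room for up to
 * 4 bytes. Returns the number of bytes written. */
size_t encode_utf8(uint32_t cp, char *out) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* xorshift32 step; *state must be nonzero. */
uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill buf with n_codepoints random scalar values encoded as UTF-8 and
 * null-terminate it. buf must hold at least 4 * n_codepoints + 1 bytes.
 * Returns the byte length, excluding the terminator. U+0000 can be
 * generated, so callers should rely on the returned length, not strlen. */
size_t random_utf8(uint32_t *state, size_t n_codepoints, char *buf) {
    size_t len = 0;
    for (size_t i = 0; i < n_codepoints; i++) {
        uint32_t cp;
        do {
            cp = next_random(state) % 0x110000;
        } while (cp >= 0xD800 && cp <= 0xDFFF); /* surrogates are not scalar values */
        len += encode_utf8(cp, buf + len);
    }
    buf[len] = '\0';
    return len;
}
```

A test would typically log the seed, fold the generated string, and compare the result against a reference implementation or check invariants such as the output never being longer than the input.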