my normalization implementation #4647
gpawru started this conversation in Show and tell
Replies: 1 comment 2 replies
-
Thanks for the note! Do you have a sense of where the performance difference might come from? Is it just that your data size is larger (100 kB is quite large)? I would be curious to see whether you can improve on ICU4X performance with smaller data files. Alternatively, it would be compelling to see numbers if you can share data between the four normalization routines instead of shipping big data for each one.
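One common source of performance differences that is independent of table size is how aggressive the fast path is. A minimal sketch of the idea (the function name is mine; real normalizers derive much wider quick-check ranges from the UCD quick-check properties, not just ASCII):

```rust
/// Fast path: ASCII text is invariant under all four normalization
/// forms, so it can be passed through without touching any data
/// tables. (Illustrative only; a real implementation would use the
/// UCD NFC_Quick_Check / NFD_Quick_Check properties instead.)
fn is_trivially_normalized(s: &str) -> bool {
    s.bytes().all(|b| b < 0x80)
}

fn main() {
    assert!(is_trivially_normalized("plain ASCII text"));
    assert!(!is_trivially_normalized("café")); // non-ASCII: run the full check
    println!("fast-path checks passed");
}
```

On mostly-Latin benchmark texts, how often this kind of check fires can matter as much as the layout of the data files themselves.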
-
Hello everyone!
First of all, thank you very much for what you do; it's inspiring! 🤟
Some time ago I decided to write a few articles about Unicode, and naturally I wrote several code examples for them. More specifically, I implemented my own normalization (collation is in progress; the articles are not yet published and are still with the editor, but that's all off-topic).
By chance I noticed this section and decided to share my results (maybe they will be useful to someone?). I would also like to ask for advice: is there anything worth correcting? Should I turn these drafts into a crate?
The comments in the code are in Russian (since the main purpose was to supplement the articles), but Google Translate will handle them :)
I decided to sacrifice a bit of data size for performance. Here are the sizes of the compressed data:
Tests used: the UCD tests, plus comparison of results against ICU4X; in the early stages, exhaustive tests over all combinations (not included in the repositories; SLOW, but they helped to discover #4527).
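For reference, a data line in the UCD's NormalizationTest.txt has five semicolon-separated columns (source; NFC; NFD; NFKC; NFKD), each a space-separated sequence of hex code points. A small parser sketch for such lines (the helper name is mine):

```rust
/// Parse one line of UCD NormalizationTest.txt into its five columns
/// (source, NFC, NFD, NFKC, NFKD), each decoded into a String.
/// Returns None for comments, @Part headers, and blank lines.
fn parse_test_line(line: &str) -> Option<Vec<String>> {
    let data = line.split('#').next().unwrap().trim();
    if data.is_empty() || data.starts_with('@') {
        return None;
    }
    data.split(';')
        .take(5)
        .map(|col| {
            col.split_whitespace()
                .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
                .collect::<Option<String>>()
        })
        .collect()
}

fn main() {
    // "a + COMBINING ACUTE ACCENT" composes to U+00E1 (á).
    let cols = parse_test_line("0061 0301;00E1;0061 0301;00E1;0061 0301; # example").unwrap();
    assert_eq!(cols[0], "a\u{0301}");
    assert_eq!(cols[1], "\u{00E1}");
    assert!(parse_test_line("@Part0 # Specific cases").is_none());
    println!("parsed ok");
}
```

Running every line of this file through all four routines and comparing against the expected columns is the standard conformance check for a normalizer.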
For benchmarks, I took public-domain texts in the most common languages (for some languages, simply Google-translated), trimmed them to 100 kB, and made two versions of each: regular and predecomposed.
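The predecomposed variants could be produced with an NFD-style pass. A toy sketch with a hardcoded table for a handful of Latin letters (real code would use the full UnicodeData.txt decomposition mappings, apply them recursively, and canonically reorder combining marks):

```rust
/// Toy canonical decomposition for a few precomposed Latin letters.
/// A real NFD pass uses the complete UnicodeData.txt mappings,
/// applies them recursively, and sorts combining marks by their
/// canonical combining class.
fn decompose_char(c: char, out: &mut String) {
    match c {
        'é' => out.push_str("e\u{0301}"), // e + COMBINING ACUTE ACCENT
        'è' => out.push_str("e\u{0300}"), // e + COMBINING GRAVE ACCENT
        'ä' => out.push_str("a\u{0308}"), // a + COMBINING DIAERESIS
        _ => out.push(c),
    }
}

fn predecompose(s: &str) -> String {
    let mut out = String::with_capacity(s.len() * 2);
    for c in s.chars() {
        decompose_char(c, &mut out);
    }
    out
}

fn main() {
    assert_eq!(predecompose("café"), "cafe\u{0301}");
    println!("{}", predecompose("café").chars().count()); // prints 5
}
```

Benchmarking on both variants is useful because a decomposer sees almost no work on predecomposed input, while a composer sees the maximum amount.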
And finally the benchmarks:
Decomposing (µs):
Composing (µs):
Repositories:
Footnotes
dec: predecomposed texts used