-
Notifications
You must be signed in to change notification settings - Fork 35
Make Fuzz.string generate unicode strings #204
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not very helpful to break someone's test suite and the only way to fix it is to use a forked core. This looks good for the most part, but I think we should hold until the core fix is released. Adding blocked label.
src/Util.elm
Outdated
|
||
-- all | ||
, frequency pairs | ||
|> constant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks the same as the "single generator".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single generator was incorrect before. It's fixed now :)
This should be safe to merge into 0.19 now. Elm doesn't break emojis in 0.19, but the language still doesn't handle reversal of unicode strings properly, it simply reverses the array of unicode code points.
prints
|
I'll hit the approve button, but we should be careful about the release and have a plan to rollback if we break a lot of tests. |
I'm getting very confused regarding elm-community/elm-test vs elm-explorations/test. Is this repo legacy 0.18, and going to be left as-is from now on, with new development being done in elm-explorations/test? If so, this PR should be merged into elm-explorations/test, not this repo. |
Oh shoot, I got turned around and thought this was the active repo. It is not. |
Yeah, I think we should archive this repo. Any objections? |
Sure, as soon as we've migrated the PR's |
This implements #201. It also replaces the implementation for #198 #200 (generate whitespace-only strings).
The main difference is that I changed the way we generate strings to make use of equivalence classes:
The character classes are:
[32, 126]
(inclusive)space, \t, \n
🌈, ❤, 🔥
~^¨
, or rather̃, ̂, ̈,
(without the commas)I chose these characters since I think they give people who write websites in English for English users a simple reason to support all of unicode, which helps a lot of people. Bilinguals, people whose names use non-ascii characters, and anyone who wants to use Emoji benefits from this. The characters are all familiar to English speakers, and they cover the most important classes in unicode. Emoji require two utf-16 code units, as does most (all?) Asian characters. Combining diacritical mark support covers all of extended Latin, i.e. all of Europe.
The only large character class remaining is inline right-to-left text, but I've left that one out because I have no idea what properties you'd test in such a script that aren't relevant to the rest of unicode. Arabic letters change shape depending on where in a sentance they're used, the byte order doesn't match the order the characters are presented on screen, and it becomes possible to generate invalid strings both in the generation and shrinking phases.
This change builds on the assumption that "abc" and "xyz" are probably going to trigger the same errors, because they share a lot of properties; a-z, lowercase, sorted, unique characters, same length, substring of the English alphabet.
I haven't written a custom shrinker for this subset of unicode, maybe I should do that?
I've previously been unsure if we want to release this right away, or wait for core to release a patch. It's not looking like we're getting a patch until the next major version is released, and 0.19 isn't in alpha yet.
String.toList
, and those users will have to use a non-core version of that function until core releases a patch, if they have this bug. TheString.toList >> String.fromList
identity seems to hold both with and without this bug, so it's not going to affect a lot of users.