Distinguishing between identical-looking characters in test output #225
Comments
Things get a little more complicated when you consider we might see more than just a string, but I think this could still work. Consider: I imagine we would want to handle things like this:
I believe the tricky bit there would be inserting the parentheses correctly. Alternatively, if we could get the upstream toString function to behave this way then we'd get this behavior for "free". I think that's where this really belongs. I've opened https://github.com/elm-lang/core/issues/930 and I referenced this issue there. However, I'm not sure how a change to toString would affect the diffing code.
I'm 100% for splitting strings into `"foo" ++ "\xA0" ++ "bar baz"` given the current compiler constraints. The main question is, which characters do we consider printable? Do we have unicode support in the terminal? Does the terminal font support this new batch of emojis? Ascii-only seems way too restrictive.

I'd like both. Something like this, where we assume 100% unicode support, but also show a safe output with a lot of escape sequences if we're not using a small subset of unicode that we deem safe, probably just

I'd also like any escape sequence in one of the values to cause an escape sequence to be printed in the other value in the same format, so you can compare easily. See below, where space isn't a special character, but non-breaking space is, so we escape both to show the difference.
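A rough sketch of the "escape both values in the same format" idea, written in Python purely as an illustration (it is not elm-test code). The names `is_safe`, `escape`, and `escape_pair` are hypothetical, the "safe" subset is assumed to be printable ASCII, and the two strings are compared position by position, which a real diff would need to replace with proper alignment:

```python
def is_safe(ch):
    # Assumption: only printable ASCII counts as the "safe" subset.
    return " " <= ch <= "~"


def escape(ch):
    # Render a character as a \u{XXXX} escape sequence.
    return "\\u{{{:04X}}}".format(ord(ch))


def escape_pair(expected, actual):
    """Escape a position in BOTH strings when the characters differ
    and at least one of them is outside the safe subset."""
    out_expected, out_actual = [], []
    for e, a in zip(expected, actual):
        if e != a and not (is_safe(e) and is_safe(a)):
            out_expected.append(escape(e))
            out_actual.append(escape(a))
        else:
            out_expected.append(e)
            out_actual.append(a)
    # zip stops at the shorter string; keep any remaining tail unchanged.
    n = min(len(expected), len(actual))
    out_expected.append(expected[n:])
    out_actual.append(actual[n:])
    return "".join(out_expected), "".join(out_actual)


safe_expected, safe_actual = escape_pair("foo bar", "foo\u00a0bar")
print(safe_expected)
print(safe_actual)
```

Here the ordinary space is only escaped because the character opposite it is a non-breaking space, so the two lines differ in exactly one visible place.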
Combine this with some Elm-style super helpful error messages to explain why we're printing both values twice. This should also help with problems with unicode normalization, like
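To illustrate the normalization problem in general (an example chosen here, not necessarily the one the comment above had in mind): the same visible character can be stored as one code point or as a base letter plus a combining mark. A minimal Python sketch using the standard `unicodedata` module; `code_points` is a hypothetical helper:

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent


def code_points(text):
    # List the code points so the difference becomes visible.
    return " ".join("U+{:04X}".format(ord(ch)) for ch in text)


print(composed == decomposed)   # the strings are not equal
print(code_points(composed))
print(code_points(decomposed))
# After normalizing to NFC the two strings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Printing the escaped form next to the rendered string would make this kind of mismatch obvious in test output.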
That's a good point. Given that Evan rewrote the parser for 0.19, this may not be an issue after then. 🤔
For that to be true, I think there would need to be other compelling use cases for that, and I'm not sure what those would be. It seems like we have identified one case where we want this behavior:
Agreed! Here's an idea:
Not the most efficient, but seems like it would work. 😄
If the terminal font is lacking support, my understanding is that it'll still print characters that are discernible as different (so you can clearly see where the problem is); they'll just be the wrong characters. I think we can decouple that discussion from this one; if people report that as a problem, we can deal with it based on learning what their particular scenario is. For our purposes, I think the answer is "do this for visually indistinguishable characters", as in, any character for which there exists another Unicode character that cannot be visually told apart from it.
👍 Sounds good to me!
Hm...is this better? To me, I found the example in OP easier to compare. I can quickly see "oh, there's a space there versus a...I don't know what the other thing is, but it's not a space." In this case (error message or no), I have to do more work to reach that conclusion. (" |
I like the idea of breaking things like this, but I agree it could be more confusing as-is. How about doing the same splitting, but still printing "normal" characters like this:
That way it's still easy to compare the space vs. the unicode space, but it's also easy to see what the regular one is.
I think the same issues apply anywhere a person is actually reading the output of toString. For example, consider a string with non-printable characters:

For most types, toString returns a string you could feed to the repl to generate that type. If it's changed to output strings like "\x01\x02\x03", that would continue to be true, but it isn't the case with the current behavior.
As far as I'm aware, this will be solved by removing octal and hexadecimal escape sequences, in favour of unicode escape sequences |
I would like to print both, like in my snippet above. It's really unhelpful to get So I would like something like this:
(without all the We would highlight the diff in both the string and the unicode-safe output. I'm not entirely sure the
I would still like to know that it's something that looks like a space, because that's probably relevant to me. If it's a non-breaking space or an emoji tells me quite a lot about what could be the problem. |
Oh, excellent!
That's interesting. We could show hints whenever we do this, e.g.
We could always show the |
Looks like Evan has switched to Guessing that means that we'll get |
I like @drathier's idea of displaying the unicode in addition to the normal output. I wonder if we can intelligently display this only when there are special characters present? It also sounds like we should wait for 0.19, and gather more examples of characters that are problematic in practice. |
Looks like we're waiting for 0.19 before we look into this. Adding a blocked label. |
0.19 is here, we are no longer blocked. Related to elm-explorations/test#38 |
As #217 notes:
This is a problem that exists regardless of which test runner you're using, and regardless of what kind of test you're writing (fuzz or not).
Here's an example of test output that could be confusing right now:
Those look identical! Why did `Expect.equal` fail?

The answer is that there are two different flavors of space being used here, but you can't tell that just from looking at the output.
This would become clear if the above output looked like this instead:
(The `++` is there because `"foo\x00A0bar baz"` would represent the string `"fooꂺr baz"` - which is not what we want.)

This seems like it would resolve #217 in a way that's both clear and which would not require any extra effort on the part of test authors.
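To make the proposal concrete, here is a minimal sketch in Python (an illustration only, not elm-test code). The set `LOOKALIKE_SPACES` and the functions `split_segments` and `render` are hypothetical, and quotes or backslashes inside the input are not escaped:

```python
# Assumption: a small, hand-picked set of characters that render like a space.
LOOKALIKE_SPACES = {"\u00a0", "\u2007", "\u202f"}


def split_segments(text):
    """Split text into segments, isolating each look-alike character
    as its own escaped segment."""
    segments = []
    plain = ""
    for ch in text:
        if ch in LOOKALIKE_SPACES:
            if plain:
                segments.append(plain)
                plain = ""
            segments.append("\\x{:X}".format(ord(ch)))
        else:
            plain += ch
    if plain:
        segments.append(plain)
    return segments


def render(text):
    # Join the quoted segments with ++ so an escape never runs into
    # the characters that follow it.
    return " ++ ".join('"{}"'.format(s) for s in split_segments(text))


print(render("foo\u00a0bar baz"))
print(render("foo bar baz"))
```

Keeping the segments as a list until the final `render` step leaves room for a test runner to style the `++` operators differently from the string contents.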
Implementation notes:
`String` to a `List String` so that test runners could, for example, do different syntax highlighting on the `++` operators.

Thoughts? @rhofour @mgold @drathier