Should it be okay for invalid fuzzers to fail mid-test? #243
Comments
Tons of questions; we could split a few off into different issues. I thought we wanted to remove `Fuzz.andThen`? A quick fix for the out-of-memory issues could be to increase the node heap size with `--max-old-space-size`.
I use fuzzers because they're great at finding new bugs. I don't want them to forget about old bugs they found previously. If that means auto-generating tests, manually adding edge cases, or something else, I don't know. I'm using elm-test-tables right now for manually storing edge cases in test suites, but Python's Hypothesis stores inputs (per-test seeds) in a file and always runs the seeds in that file before it starts fuzzing. Thus, you can never get a test suite that once failed CI to go through by retrying. There's still the problem of tests going through on the first attempt and failing someone else's PR by blowing up later, but that's not fixable with fuzzers. Ah well.

Regarding the slow tests, can we look into choosing the number of runs for each fuzz test based on how large the fuzzer is? Maybe add common edge cases with each fuzzer? Maybe store a weight in each fuzzer of roughly how many interestingly unique values it has, so we don't generate as many values for small fuzzers.
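To make this concrete, here's roughly the same idea using plain elm-test (not elm-test-tables); the property and the edge-case values are made up:

```elm
import Expect exposing (Expectation)
import Fuzz
import Test exposing (Test, describe, fuzz, test)


-- The property under test (illustrative).
roundTrips : Int -> Expectation
roundTrips n =
    String.toInt (String.fromInt n)
        |> Expect.equal (Just n)


-- Inputs the fuzzer reported as failures in the past (hypothetical values).
knownEdgeCases : List Int
knownEdgeCases =
    [ 0, -1, 2147483647 ]


suite : Test
suite =
    describe "Int round-trips through String"
        [ describe "previously found edge cases" <|
            List.map
                (\n -> test (String.fromInt n) (\_ -> roundTrips n))
                knownEdgeCases
        , fuzz Fuzz.int "keep fuzzing for new edge cases" roundTrips
        ]
```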
@drathier I couldn't find any "hanging" fuzzers. Issue #132 is probably relevant here: it says that problem is resolved. So, the reasons for removing `Fuzz.andThen` may no longer apply.
I went back to every case in #62: what used to hang indefinitely now terminates quickly. If we keep `Fuzz.andThen`, the remaining question is what to do about invalid fuzzers.

The concept of an "invalid fuzzer" was created because it seemed to be coming up in a few different cases, but it's actually just the two that Richard mentioned at the top of the thread. Meanwhile, it seems that we have a working implementation of `Fuzz.andThen`. It's also possible that @drathier's idea of storing a weight in each fuzzer would help here.
Overall, at this point in my life I don't feel strongly that `Fuzz.andThen` needs to be removed.
I think this assumes that a general solution for fuzzer-related performance issues exists. Barring such a solution, our only recourse is to attack each performance problem separately.
I don't completely agree with this. If a test occasionally fails because of a bug in the code under test, that's property testing striking gold. If a test occasionally fails because of a bug in the test, that's test flakiness. Few things are hated as universally as test flakiness 😅.
I think this is a good idea, but I would use it to run more tests if the fuzzer is complicated. As it stands, it's weird that we run the same 100 test cases when fuzzing a single `Bool` as when fuzzing something far more complex. This would make test performance worse, though.
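For reference, the manual version of this today is `fuzzWith`; a trivial sketch with a made-up property:

```elm
import Expect
import Fuzz
import Test exposing (Test, fuzzWith)


-- Hand-tuned run count for a fuzzer with a big value space.
reverseTwice : Test
reverseTwice =
    fuzzWith { runs = 500 }
        (Fuzz.list Fuzz.int)
        "reversing twice gives back the original list"
        (\xs -> List.reverse (List.reverse xs) |> Expect.equal xs)
```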
😻 These are great!
I think the primary problem with the runtime error is its poor error message.
Yeah, I think this is at the moment the biggest disadvantage of using `Fuzz.andThen`. I feel that if we keep it, that error message needs to improve.
It seems like the two problems with `Fuzz.andThen` mentioned in #161 boil down to:

1. `Fuzz.andThen` is only one of several ways to end up with slow tests when using fuzzers. If we need a general solution to that problem anyway, then removing `Fuzz.andThen` from the API doesn't solve it.
2. You can return invalid fuzzers from `Fuzz.andThen`. They may only sometimes show up as invalid. See "Can create a crash by returning an invalid fuzzer from `andThen`" (#160); a made-up sketch of how this can happen is below.
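For a made-up illustration of that "sometimes" case:

```elm
import Fuzz exposing (Fuzzer)


-- Valid most of the time, but invalid whenever the generated list is empty,
-- because `Fuzz.oneOf []` has nothing to choose from.
sometimesInvalid : Fuzzer Int
sometimesInvalid =
    Fuzz.list Fuzz.int
        |> Fuzz.andThen (\xs -> Fuzz.oneOf (List.map Fuzz.constant xs))
```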
It seems that we have only two functions that can potentially return invalid fuzzers:

1. `Fuzz.oneOf`. This doesn't need to use the concept of invalid fuzzers. It can be `oneOf : Fuzzer a -> List (Fuzzer a) -> Fuzzer a` so that it always has at least one value to work with.
2. `Fuzz.frequency`. Even if it did the same "one or more" trick as `Fuzz.oneOf`, it would still have two invalid cases: if any of its weights aren't positive numbers, or if all the weights do not add up to a positive number.

One potential solution would be if we didn't have `Fuzz.frequency` and instead only had `Fuzz.oneOf` with the "requires at least one element at compile time" API.

The example in the frequency docs is:
This could also be written:
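That is, approximating the weights with `List.repeat` over the same illustrative fuzzers:

```elm
import Fuzz exposing (Fuzzer)


-- Approximate the 3 : 1 weighting by repeating the more likely fuzzer.
smallBiasedViaOneOf : Fuzzer Int
smallBiasedViaOneOf =
    Fuzz.oneOf
        (List.repeat 3 (Fuzz.intRange 0 9)
            ++ List.repeat 1 (Fuzz.intRange 10 1000)
        )
```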
In my opinion, this is not actually better. It's worse for performance, and `List.repeat` has the same problem as `Fuzz.frequency`: what happens when you pass it negative numbers? We're just shifting the burden elsewhere.

Realizing this made me rethink whether it's such a problem after all that you can get invalid fuzzers midway through a test run. The whole point of fuzzers is to expose edge cases; by using them at all, we accept that some of our tests will fail on some runs and not others - in fact, we're often hoping for that to happen!
Assuming we fixed the error message in #160 to be nice, how bad would it actually be if sometimes fuzzers turned out to be invalid midway through a run, leading to a failed test? It doesn't seem likely to lead to a whole lot of frustration in practice.
Thoughts?