fuzzing: Move to an iterator pattern for input. #20998

gcoakes · 2024-08-08T08:49:56Z

Previously, the std.testing.fuzzInput interface was called once for a given test function. This had the downside that the test functions were unable to initialize state independent of the fuzz loop.

Now, std.testing.fuzzInput has been replaced with std.testing.fuzzer which returns a std.testing.Fuzzer. To receive fuzz input, the function must loop on the std.testing.Fuzzer.next function, retrieving one std.testing.Run at a time. That run object contains a reference to a per-loop allocator and the fuzz input for that loop.

Additionally, with the ability to accept multiple fuzz inputs within the test function, the corpus provided to std.testing.fuzzer is now used to dry-run the test will all the provided inputs. In a future commit, this may also be used to seed the fuzzer.

Some notes on testing:

The newly added web interface is unable to load for me with many error messages emitted in the console. This issue exists on the master branch independent of my changes.
Many aarch64 related tests are failing on my machine. I suspect this is a local issue not related to my changes. This issue also exists on the master branch.

tauoverpi · 2024-08-08T10:10:51Z

Would this block running multiple fuzz tests at the same time given control is now inverted? As it is now (and as I understand it) the fuzzer is able to fully control how it decides to perform it's work thus can divide tasks among multiple processes and run in any order (fuzzing the same test but exploring different input paths or fuzzing multiple tests that are independent of one another, etc).

Since std.testing.fuzzInput is used to discover fuzz tests, would it make more sense to have the user pass setup and teardown to the fuzzer such that it keeps control over how and when? Example of how this could look:

test {
    const Context = struct {
        shared: void = {}
    
        /// Setup all shared resources for the fuzz test
        pub fn setup(_: *@This()) !void {}
        /// Clean up after a run
        pub fn teardown(_: *@This()) !void {}
    };

    const context, const input = std.testing.fuzzInput(Context, .{
      .corpus = &.{ ... },
    });
    
    std.testing.expect(!std.mem.eql(u8, "canyoufindme", input));
}

This way both are run by the Fuzzer when needed without taking control over the fuzzing loop from the test.

(In the above example fuzzInput has the type fn(comptime Context: ?type, FuzzInputOptions) if (Context) |Ctx| struct { Ctx, []const u8 } else []const u8)

gcoakes · 2024-08-08T13:07:38Z

Would this block running multiple fuzz tests at the same time given control is now inverted?

Does move control over the fuzz loop into the test function.
Does NOT regress on concurrent fuzz tests since the existing behavior runs a single test function in a loop.
Does turn that behavior into an explicit aspect of the public API, barring any stack switching shenanigans inside of std.testing.Fuzz.next.

Some pros for your idea:

Invert the relationship of the setup/teardown with respect to std.testing.allocator. The test function could then use std.testing.allocator like normal while the setup/teardown could get a different longer lived allocator.
The entry_addr as identified by std.testing.fuzzInput would not have an additional conditional as would be the case with this PR.

Some cons:

Requires wrapping all cross-input state in a context struct.

I'm of two minds on this question and don't have a strong opinion either direction.

I originally got the idea for this PR from @andrewrk on IRC where he suggested passing a fuzz function. When I sat down to implement that idea, I began to realize an iterator pattern would have nearly the same characteristics but felt more ergonomic.

@tauoverpi, I can put up a second PR implementing your setup/teardown idea tonight, if you don't get to it first. That way we can have something more tangible to compare.

Previously, the `std.testing.fuzzInput` interface was called once for a given test function. This had the downside that the test functions were unable to initialize state independent of the fuzz loop. Now, `std.testing.fuzzInput` has been replaced with `std.testing.fuzzer` which returns a `std.testing.Fuzzer`. To receive fuzz input, the function must loop on the `std.testing.Fuzzer.next` function, retrieving one `std.testing.Fuzzer.Run` at a time. That run object contains a reference to a per-loop allocator and the fuzz input for that loop. Additionally, with the ability to accept multiple fuzz inputs within the test function, the corpus provided to `std.testing.fuzzer` is now used to dry-run the test will all the provided inputs. In a future commit, this may also be used to seed the fuzzer.

gcoakes · 2024-08-10T01:59:34Z

Rebased to the latest master branch. Minimal local testing was done.

andrewrk · 2024-08-10T02:21:14Z

Before:

test "example" {
    const input = std.testing.fuzzInput(.{});
    // do something with input
}

After:

test "example" {
    var fuzzer = std.testing.fuzzer(.{});
    while (fuzzer.next()) |input| {
        // do something with input
    }
}

This is not an improved API.

Furthermore the implementation is more complicated and introduces more branching on the hot path.

I suggest to get master branch working before making a PR...

gcoakes · 2024-08-10T03:16:14Z

@andrewrk, would you mind elaborating on what is the optimal API for a fuzz test which has the ability to retain state across inputs?

mlugg · 2024-08-10T03:18:39Z

What state are you trying to retain across inputs? That sounds like a very bad idea in general. The execution of a single fuzz test iteration should be based solely on the input provided by the fuzzer.

gcoakes · 2024-08-10T03:44:42Z

The immediately apparent ones I can think of would be:

Pre-allocating buffers to eliminate allocation during the fuzz loop
Fuzzing some kind of server with an in-process stream test harness so program counters could be observed. You would have to be careful to make use of a single input in a self-contained way (i.e.: a single loop would POST / and if successfully posted, then also DELETE /:id).

Frankly, I don't personally have a concrete use case that I was trying to solve. Andrew had just mentioned the idea on IRC, and I thought it (a) was reasonably constrained, (b) would be a good way to become familiarized with the code in the fuzzer, (c) and sounded entertaining.

gcoakes force-pushed the fuzzer branch from b30f45f to d91c4da Compare August 10, 2024 01:53

andrewrk closed this Aug 10, 2024

gcoakes deleted the fuzzer branch August 10, 2024 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fuzzing: Move to an iterator pattern for input. #20998

fuzzing: Move to an iterator pattern for input. #20998

gcoakes commented Aug 8, 2024

tauoverpi commented Aug 8, 2024 •

edited

Loading

gcoakes commented Aug 8, 2024

gcoakes commented Aug 10, 2024

andrewrk commented Aug 10, 2024

gcoakes commented Aug 10, 2024

mlugg commented Aug 10, 2024

gcoakes commented Aug 10, 2024

fuzzing: Move to an iterator pattern for input. #20998

fuzzing: Move to an iterator pattern for input. #20998

Conversation

gcoakes commented Aug 8, 2024

tauoverpi commented Aug 8, 2024 • edited Loading

gcoakes commented Aug 8, 2024

gcoakes commented Aug 10, 2024

andrewrk commented Aug 10, 2024

gcoakes commented Aug 10, 2024

mlugg commented Aug 10, 2024

gcoakes commented Aug 10, 2024

tauoverpi commented Aug 8, 2024 •

edited

Loading