Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzzing: Move to an iterator pattern for input. #20998

Closed
wants to merge 1 commit into from

Conversation

gcoakes
Copy link
Contributor

@gcoakes gcoakes commented Aug 8, 2024

Previously, the std.testing.fuzzInput interface was called once for a given test function. This had the downside that the test functions were unable to initialize state independent of the fuzz loop.

Now, std.testing.fuzzInput has been replaced with std.testing.fuzzer which returns a std.testing.Fuzzer. To receive fuzz input, the function must loop on the std.testing.Fuzzer.next function, retrieving one std.testing.Run at a time. That run object contains a reference to a per-loop allocator and the fuzz input for that loop.

Additionally, with the ability to accept multiple fuzz inputs within the test function, the corpus provided to std.testing.fuzzer is now used to dry-run the test will all the provided inputs. In a future commit, this may also be used to seed the fuzzer.

Some notes on testing:

  1. The newly added web interface is unable to load for me with many error messages emitted in the console. This issue exists on the master branch independent of my changes.
  2. Many aarch64 related tests are failing on my machine. I suspect this is a local issue not related to my changes. This issue also exists on the master branch.

@tauoverpi
Copy link
Contributor

tauoverpi commented Aug 8, 2024

Would this block running multiple fuzz tests at the same time given control is now inverted? As it is now (and as I understand it) the fuzzer is able to fully control how it decides to perform it's work thus can divide tasks among multiple processes and run in any order (fuzzing the same test but exploring different input paths or fuzzing multiple tests that are independent of one another, etc).

Since std.testing.fuzzInput is used to discover fuzz tests, would it make more sense to have the user pass setup and teardown to the fuzzer such that it keeps control over how and when? Example of how this could look:

test {
    const Context = struct {
        shared: void = {}
    
        /// Setup all shared resources for the fuzz test
        pub fn setup(_: *@This()) !void {}
        /// Clean up after a run
        pub fn teardown(_: *@This()) !void {}
    };

    const context, const input = std.testing.fuzzInput(Context, .{
      .corpus = &.{ ... },
    });
    
    std.testing.expect(!std.mem.eql(u8, "canyoufindme", input));
}

This way both are run by the Fuzzer when needed without taking control over the fuzzing loop from the test.

(In the above example fuzzInput has the type fn(comptime Context: ?type, FuzzInputOptions) if (Context) |Ctx| struct { Ctx, []const u8 } else []const u8)

@gcoakes
Copy link
Contributor Author

gcoakes commented Aug 8, 2024

Would this block running multiple fuzz tests at the same time given control is now inverted?

  • Does move control over the fuzz loop into the test function.
  • Does NOT regress on concurrent fuzz tests since the existing behavior runs a single test function in a loop.
  • Does turn that behavior into an explicit aspect of the public API, barring any stack switching shenanigans inside of std.testing.Fuzz.next.

Some pros for your idea:

  • Invert the relationship of the setup/teardown with respect to std.testing.allocator. The test function could then use std.testing.allocator like normal while the setup/teardown could get a different longer lived allocator.
  • The entry_addr as identified by std.testing.fuzzInput would not have an additional conditional as would be the case with this PR.

Some cons:

  • Requires wrapping all cross-input state in a context struct.

I'm of two minds on this question and don't have a strong opinion either direction.

I originally got the idea for this PR from @andrewrk on IRC where he suggested passing a fuzz function. When I sat down to implement that idea, I began to realize an iterator pattern would have nearly the same characteristics but felt more ergonomic.

@tauoverpi, I can put up a second PR implementing your setup/teardown idea tonight, if you don't get to it first. That way we can have something more tangible to compare.

Previously, the `std.testing.fuzzInput` interface was called once for a
given test function. This had the downside that the test functions were
unable to initialize state independent of the fuzz loop.

Now, `std.testing.fuzzInput` has been replaced with `std.testing.fuzzer`
which returns a `std.testing.Fuzzer`. To receive fuzz input, the
function must loop on the `std.testing.Fuzzer.next` function, retrieving
one `std.testing.Fuzzer.Run` at a time. That run object contains a
reference to a per-loop allocator and the fuzz input for that loop.

Additionally, with the ability to accept multiple fuzz inputs within the
test function, the corpus provided to `std.testing.fuzzer` is now used
to dry-run the test will all the provided inputs. In a future commit,
this may also be used to seed the fuzzer.
@gcoakes
Copy link
Contributor Author

gcoakes commented Aug 10, 2024

Rebased to the latest master branch. Minimal local testing was done.

@andrewrk
Copy link
Member

Before:

test "example" {
    const input = std.testing.fuzzInput(.{});
    // do something with input
}

After:

test "example" {
    var fuzzer = std.testing.fuzzer(.{});
    while (fuzzer.next()) |input| {
        // do something with input
    }
}

This is not an improved API.

Furthermore the implementation is more complicated and introduces more branching on the hot path.

I suggest to get master branch working before making a PR...

@andrewrk andrewrk closed this Aug 10, 2024
@gcoakes
Copy link
Contributor Author

gcoakes commented Aug 10, 2024

@andrewrk, would you mind elaborating on what is the optimal API for a fuzz test which has the ability to retain state across inputs?

@mlugg
Copy link
Member

mlugg commented Aug 10, 2024

What state are you trying to retain across inputs? That sounds like a very bad idea in general. The execution of a single fuzz test iteration should be based solely on the input provided by the fuzzer.

@gcoakes
Copy link
Contributor Author

gcoakes commented Aug 10, 2024

The immediately apparent ones I can think of would be:

  • Pre-allocating buffers to eliminate allocation during the fuzz loop
  • Fuzzing some kind of server with an in-process stream test harness so program counters could be observed. You would have to be careful to make use of a single input in a self-contained way (i.e.: a single loop would POST / and if successfully posted, then also DELETE /:id).

Frankly, I don't personally have a concrete use case that I was trying to solve. Andrew had just mentioned the idea on IRC, and I thought it (a) was reasonably constrained, (b) would be a good way to become familiarized with the code in the fuzzer, (c) and sounded entertaining.

@gcoakes gcoakes deleted the fuzzer branch August 10, 2024 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants