Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrays of streams aren't very random #5054

Open
skybrian opened this issue Jun 6, 2024 · 6 comments
Open

arrays of streams aren't very random #5054

skybrian opened this issue Jun 6, 2024 · 6 comments

Comments

@skybrian
Copy link

skybrian commented Jun 6, 2024

🐛 Bug Report

There are a lot of similar numbers between streams generated within the same arbitrary.

(Note: I'm not actually using streams; I was just curious about how they worked.)

To Reproduce

import fc from 'fast-check';

const stream = fc.array(fc.infiniteStream(fc.integer({min: 0, max: 9})), {minLength: 10, maxLength: 10});

fc.sample(stream, {numRuns: 1, seed: 42})[0].forEach(s => {
  const items = [];
  for (let i = 0; i < 10; i++) {
    items.push(s.next().value);
  }
  console.log(items.join(', '));
});

Output:

% deno run main.ts
2, 2, 6, 3, 1, 6, 3, 6, 3, 1
8, 2, 6, 3, 1, 6, 3, 6, 3, 1
8, 4, 3, 4, 0, 7, 2, 5, 3, 9
2, 6, 3, 1, 6, 3, 6, 3, 1, 6
4, 6, 3, 1, 6, 3, 6, 3, 1, 6
4, 0, 7, 2, 5, 3, 9, 1, 8, 2
6, 3, 1, 6, 3, 6, 3, 1, 6, 9
2, 3, 1, 6, 3, 6, 3, 1, 6, 9
2, 5, 3, 9, 1, 8, 2, 2, 1, 3
3, 1, 6, 3, 6, 3, 1, 6, 9, 2

Notice how the the first two streams are identical except for the first number. Also, the sequence '6 3 1' happens a lot more than expected.

Expected behavior

I expected it to be more random.

Your environment

I'm using deno, but I do get the same result using a main.mjs file with node:

Packages / Softwares Version(s)
fast-check 3.19.0
node 18.16.1
@skybrian
Copy link
Author

skybrian commented Jun 6, 2024

Looking at the code, it generates one random number for appliedBiasFactor and then clones the generator. So it seems like each stream will get a random number generator that's only one call to nextInt() apart.

I tried it with noBias and the streams are identical:

% deno run main.ts 
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2
0, 4, 9, 8, 4, 3, 4, 0, 7, 2

@dubzzz
Copy link
Owner

dubzzz commented Jun 8, 2024

Yes, indeed it reminds me of an old bug I had years ago but cross runs. I have an idea to fix it (the generator of stream should force the arbitrary to jump when created so that sequences get clearly different). I'll work on it in the coming days.

Thanks a lot for the report. By the way I'll need to check some other arbitraries as it might not be the only one having the problem.

@skybrian
Copy link
Author

skybrian commented Jun 8, 2024

I’m thinking it might be best to jump the parent random number generator before returning, to allocate some random numbers for the child. But for an infinite stream, how many should it allocate? No matter what we give it, it could run out of random numbers. Then what?

Maybe it could throw an exception, treating it like an infinite loop. But on the other hand, correlated random numbers aren’t really a hard failure, since they could in theory happen by coincidence, though it’s very unlikely. And who knows, it might catch a bug. So, better to let the test keep running and let correlations happen in large streams? Maybe there’s a point where the test run should be killed, though.

This suggests that it might be okay to jump the parent RNG ahead a few thousand numbers and call it a day.

Regarding where this bug can occur: it seems like this is the sort of thing that Rust’s borrow checker would prevent? We can think of an Arbitrary’s generate() method as borrowing its parent’s RNG, which is mutable. Any random numbers it generates before returning are safe, but if it allows the RNG to escape via an object or closure, any random numbers generated later could overlap.

@skybrian
Copy link
Author

skybrian commented Jun 9, 2024

A more principled solution would handle the case of a stream of streams. It looks like a random number generator that has a split operation is the way to go.

I haven’t tried it, but I see that there is an npm for the SplitMix algorithm, which is the one used in Java’s SplittableRandom class. (The algorithm seems generally well-regarded, but the npm is pretty obscure.)

https://www.npmjs.com/package/splitmix

@skybrian
Copy link
Author

skybrian commented Jun 9, 2024

LXM is a newer RNG that also does splitting. It was included in Java 17. The authors of the paper are Guy Steele (who contributed to SplitMix) and Sabastiano Vigna (who published xoroshiro128+).

It combines a xoroshiro generator with an LCG generator, with an extra parameter. They initialize the child generator using random numbers from the parent. The extra bits for the parameter are a way to make collisions less likely.

@skybrian
Copy link
Author

There's a different way to think about this problem that seems pretty elegant and might be a lot easier, since it doesn't require a splittable random number generator:

The iterator doesn't need to access the random number generator at all. Instead, the stream is built out of an arbitrary array of minimum length 1. The iterator repeats the array as many times as necessary. So the "smallest" infinite stream would repeat a single element. For fc.infiniteStream(fc.nat()), that would be a stream of all zeroes.

A test that doesn't diverge will only read a finite amount of data from the stream, so in theory, this approach can still generate any possible test case, provided that a long enough array gets generated. It can't generate all the digits of PI, but it could generate enough digits to cause a failure. The repetition is there just to satisfy the API.

My guess is that shrinking might work pretty well with this scheme? A nice thing about it is that small infinite streams can be printed out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants