Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normally-distributed random numbers #101

Open
NevinBR opened this issue Jan 30, 2020 · 7 comments
Open

Normally-distributed random numbers #101

NevinBR opened this issue Jan 30, 2020 · 7 comments
Labels
new module Proposals for new modules

Comments

@NevinBR
Copy link
Contributor

NevinBR commented Jan 30, 2020

It is common to need a random sample from either a real or complex normal distribution.

The standard library provides static random(in:) methods which sample uniformly, in a constrained extension of BinaryFloatingPoint. (The constraint is RawSignificand: FixedWidthInteger.)

I propose that we add similar functionality for sampling from a normal distribution on Real and Complex with the corresponding constraints.

@stephentyrone
Copy link
Member

stephentyrone commented Jan 31, 2020

Reasonable. I'm on vacation until next week, and probably won't get a chance to write a more detailed response until I return, but a couple quick things to consider:

  • I plan to provide a random module in SN, including both sampling from the usual distributions and some additional generators beyond what the stdlib has (especially one or more fast seeded generators; I have a branch somewhere with a quick draft of a PCG implementation, for example). Obviously this would be an easy first thing for such a module.

  • There are some interesting questions around tradeoffs in quality vs speed for these transforms, and also how to design an API that allows improving the transform in ways that may produce better results (i.e. users should be able to explicitly opt-into stable-forever behavior, if they need to get the exact same results for all time, even if we have a better algorithm or fix a bug in the future). There's no really great precedent for what this should look like yet, AFAIK (though some fairly obvious options).

  • There are a few workloads that fundamentally depend on correctly-rounded samples of distributions--e.g. if the noise added in differential privacy is not a correctly-rounded sample of the appropriate distribution, the entire scheme collapses. So we may want to think about what an API that supports that sort of guarantee would look like (you wouldn't want it to be the default behavior, because it's prohibitively slow for many other uses).

@karwa
Copy link

karwa commented Feb 1, 2020

Yes please :)

Here's TensorFlow's version, for reference.

@NevinBR
Copy link
Contributor Author

NevinBR commented Feb 1, 2020

I wonder—is sincos optimized enough to make it worthwhile to generate pairs of values from a normal distribution? And how about, um, sincos(piTimes:)?

extension Real {
  static func boxMullerTransform(_ x: Self, _ y: Self) -> (Self, Self) {
    let r = sqrt(-2 * log(x))
    return (r * sin(piTimes: 2*y), r * cos(piTimes: 2*y))
  }
}

If it is well-optimized, then for use-cases that require multiple values a sequence can be built two at a time. And when only one number is needed, a convenience method can provide it.

extension Real where Self: BinaryFloatingPoint, RawSignificand: FixedWidthInteger {
  static func randomNormalPair() -> (Self, Self) {
    return boxMullerTransform(random(in: 0...1), random(in: 0...1))
  }
  
  static func randomNormal(mean: Self = 0, standardDeviation sigma: Self = 1) -> Self {
    return randomNormalPair().0 * sigma + mean
  }
}

extension Complex where RealType: BinaryFloatingPoint, RealType.RawSignificand: FixedWidthInteger {
  static func randomNormal(mean: Self = 0, standardDeviation sigma: RealType = 1) -> Self {
    let (a, b) = RealType.randomNormalPair()
    return Self(a * sigma, b * sigma) + mean
  }
}

@stephentyrone
Copy link
Member

I wonder—is sincos optimized enough to make it worthwhile to generate pairs of values from a normal distribution? And how about, um, sincos(piTimes:).

No need; when the host system libm provides __sincospi (Darwin), LLVM will automatically fuse separate calls to __sinpi and __cospi (Evan Cheng added this optimization way back when I put those functions in the Darwin libm). When we're not using a host system libm function, the optimizer can see both functions and do outlining+CSE to eliminate the redundant operations.

@NevinBR
Copy link
Contributor Author

NevinBR commented Feb 1, 2020

My question was not whether we need to write sincos.

My question was whether we should design the API to include a function which generates pairs of normally-distributed values.

@stephentyrone
Copy link
Member

stephentyrone commented Feb 1, 2020

Oh, well, that doesn't require anything of sincos; you still get to benefit from doing half as many sqrt and log operations (sqrt is fast on some recent hardware, but log is about as costly as sin or cos is), so that seems like a no-brainer.

@peteroupc
Copy link

peteroupc commented Feb 2, 2020

There are a few workloads that fundamentally depend on correctly-rounded samples of distributions--e.g. if the noise added in differential privacy is not a correctly-rounded sample of the appropriate distribution, the entire scheme collapses. So we may want to think about what an API that supports that sort of guarantee would look like (you wouldn't want it to be the default behavior, because it's prohibitively slow for many other uses).

Note that the paper you cite introduces an attack that applies only to floating-point algorithms (more-specifically, floating-point implementations of the Laplacian distribution); as the paper says: "Fixed-point or integer-valued algorithms are immune to our attack". In fact, as I write in "Randomization with Real Numbers", "random non-integer numbers are rarely if ever seen in serious information security applications", and I wasn't much aware of the use of floating-point numbers in differential privacy until today.

I can't think of a security application where random integers (or at most random fixed-point numbers) could not have been used instead of random floating-point numbers.

Also, as for generating normally-distributed random numbers, the Box-Muller transform is only one possibility, and is not the only one that generates pairs of numbers at a time (another is the polar method). Other possibilities include ratio of uniforms, CDF inversion, Karney's algorithm, and ziggurat, which don't necessarily generate pairs of random numbers.

@stephentyrone stephentyrone added the new module Proposals for new modules label Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new module Proposals for new modules
Projects
None yet
Development

No branches or pull requests

4 participants