
Review of random variate generation articles for a programmer audience #18

peteroupc opened this issue Sep 16, 2022 · 10 comments
peteroupc commented Sep 16, 2022

The following pages of mine deal with random variate generation, Bernoulli factories, and exact sampling. They contain numerous algorithms for such sampling.

My audience for these articles is computer programmers with mathematics knowledge, but little or no familiarity with calculus.

I am posting this issue to seek comments on the following aspects:

  • Are the algorithms in the articles easy to implement? Is each algorithm written so that someone could write code for that algorithm after reading the article?
  • Do the articles have errors that should be corrected?
  • Are there ways to make the articles more useful to the target audience?

Comments on other aspects of these articles are also welcome.

The articles follow:

Letting them know: @lmendo, @PaulSanchez, @maciej-bendkowski


peteroupc commented Jan 18, 2023

Still seeking reviews on my articles on random variate generation.

Again letting them know about the random variate generation articles in this issue: @lmendo, @PaulSanchez, @maciej-bendkowski, @lcrocker, @dj-on-github.


Shoeboxam commented Mar 5, 2023

Transitioning from this conversation.
I was hoping to get some guidance on sampling Gumbel RVs, and now have some questions/observations about specific algorithms on your site (from the linked issue). Thanks!


peteroupc commented Mar 5, 2023

Sampling the probability $\exp(-\exp(-x))$ involves rewriting it as $\exp(-\exp(-(m+\lambda)\cdot\mu))$ where $\mu=1$, $m=\lfloor x\rfloor$, and $\lambda=x-\lfloor x\rfloor$. Then build a coin that simulates $\exp(-(m+\lambda)\cdot\mu)$ with those parameters, and use that coin (call it $\nu$) to simulate $\exp(-\nu)$. In the algorithm I give for $\exp(-(m+\lambda)\cdot\mu)$, $\lambda$ and $\mu$ both lie in $[0, 1]$.

The following is a proof of the algorithm for $\exp(-(m+\lambda)\cdot\mu)$.

First, suppose $\mu = 1$. Each iteration of the loop in the algorithm draws a Poisson random variate with mean $t$, where $t$ is $\lambda$ in the last iteration and 1 otherwise. The iteration returns 0 (terminating the algorithm) if the variate is other than 0, which happens with probability $1-\exp(-t)$, and "succeeds" with probability $\exp(-t)$. If all the iterations "succeed", the algorithm returns 1, which happens with probability $\exp(-\lambda) \cdot (\exp(-1))^m = \exp(-(m+\lambda))$.

Now suppose $\mu$ lies in $[0, 1)$. Then the Poisson variate just mentioned has mean $t\mu$ rather than $t$, so each iteration succeeds with probability $\exp(-t\mu)$ instead, and the algorithm returns 1 with probability $\exp(-\lambda\mu) \cdot (\exp(-\mu))^m = \exp(-(m+\lambda)\mu)$.
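To make the proof concrete, here is a minimal floating-point sketch of the $\mu = 1$ case (my own illustration; Knuth's product method stands in for an exact Poisson sampler, so this is a sanity check rather than an exact algorithm):

```python
import math
import random

def poisson(mean):
    """Poisson variate via Knuth's product method (floating point)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def exp_minus_coin(x):
    """Return 1 with probability exp(-x) for x >= 0 (the mu = 1 case).
    One loop iteration per term of x = m + lambda; an iteration
    "succeeds" exactly when its Poisson(t) variate is 0."""
    m = math.floor(x)
    lam = x - m
    for t in [1.0] * m + [lam]:
        if poisson(t) != 0:
            return 0  # this iteration failed, so the whole coin returns 0
    return 1

x = 2.5
trials = 200_000
est = sum(exp_minus_coin(x) for _ in range(trials)) / trials
print(est, math.exp(-x))  # the two values should be close
```

With $x = 2.5$ the empirical frequency should land near $\exp(-2.5) \approx 0.0821$.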

Shoeboxam commented

Ah, it's basically the same as Canonne et al., but with a Poisson sampler instead. I was hesitant to choose those parameters because I was concerned about finding the PSRN of frac(exp(-x)). I did have success recreating the algorithm, but I'm finding the pseudocode for the poisson1 sampler in Duchon/Duvignau to be wrong; it's too concentrated to be Poisson.
Interestingly, your summary of the algorithm here does work.

Duchon poisson1 test
def poisson1():
    """Samples from Poisson(λ = 1)

    https://www.combinatorics.org/ojs/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=https%3A%2F%2Fwww.combinatorics.org%2Fojs%2Findex.php%2Feljc%2Farticle%2Fdownload%2Fv23i4p22%2Fpdf%2F#subsection.5.3
    """
    import random

    n = 1
    g = 0
    k = 1
    while True:
        i = random.randint(0, n + 1)
        if i == n + 1:
            k += 1
        elif i > g:
            k -= 1
            g = n + 1
        else:
            return k
        n += 1

def test_poisson1():
    from collections import Counter
    trials = 5000
    counts = Counter(poisson1() for _ in range(trials))
    from math import exp, factorial

    print("i: actual ideal")
    for i in range(5):
        print(f"{i}: {(counts.get(i) or 0) / trials:.4f} {exp(-1) / factorial(i):.4f}")

test_poisson1()
i: actual ideal
0: 0.2716 0.3679
1: 0.5174 0.3679
2: 0.1632 0.1839
3: 0.0378 0.0613
4: 0.0088 0.0153
exact exp(-exp(-x)) sampler
import random
from math import floor


def bernoulli(p):
    """Sample from Bernoulli(`p`), where `p` is a rational number
    (e.g. `fractions.Fraction`) in [0, 1]"""
    assert 0 <= p <= 1
    return random.randrange(p.denominator) < p.numerator


def poisson1():
    """Samples from Poisson(λ = 1)
    Algorithm from: https://www.combinatorics.org/ojs/plugins/generic/pdfJsViewer/pdf.js/web/viewer.html?file=https%3A%2F%2Fwww.combinatorics.org%2Fojs%2Findex.php%2Feljc%2Farticle%2Fdownload%2Fv23i4p22%2Fpdf%2F#subsection.5.3
    Implementation from: https://stats.stackexchange.com/a/551576
    """
    n = 1
    g = 0
    k = 1
    while True:
        j = random.randint(0, n)
        if j < n and j < g:
            return k
        if j == n:
            k += 1
        else:
            k -= 1
            g = 1 + n
        n += 1


def bernoulli_exp_neg_01(f):
    """Sample from Bernoulli(p = exp(-E[f()])) for f() in [0, 1]"""
    return not any(f() for _ in range(poisson1()))


def bernoulli_exp_neg(x):
    """Sample from Bernoulli(p = exp(-(m + λ)))
    https://peteroupc.github.io/bernoulli.html#exp_minus__m____lambda_____mu
    """
    m = floor(x)
    if any(poisson1() for _ in range(m)):
        return False
    return bernoulli_exp_neg_01(lambda: bernoulli(x - m))


def bernoulli_exp_neg_exp_neg(x):
    """Sample from Bernoulli(p = exp(-exp(-x)))"""
    return bernoulli_exp_neg_01(lambda: bernoulli_exp_neg(x))

At this point I just need to figure out a Bernoulli factory for the fractional part of the exponential PSRN. Thanks again for all your help!


peteroupc commented Mar 5, 2023

Your implementation of Duchon and Duvignau's algorithm (as given in their paper) is incorrect. random.randint(a,b) generates a uniform integer in $[a,b]$, while Random(m) in their paper "returns a uniform number in [m]", which the paper defines as the set of integers in $[1, m]$. Thus, Random(n+1) should be random.randint(1, n+1).
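Under that reading, the only change needed to the buggy snippet above is the lower bound passed to random.randint; a corrected sketch (my paraphrase, so double-check it against the paper):

```python
import random

def poisson1():
    """Poisson(lambda = 1) sampler of Duchon and Duvignau, with the
    paper's Random(n + 1) read as a uniform integer in [1, n + 1]."""
    n, g, k = 1, 0, 1
    while True:
        i = random.randint(1, n + 1)  # uniform on {1, ..., n + 1}
        if i == n + 1:
            k += 1
        elif i > g:
            k -= 1
            g = n + 1
        else:  # i <= g: accept the current value of k
            return k
        n += 1
```

With this change the empirical frequencies line up with the ideal values $\exp(-1)/i!$ from the earlier test.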

peteroupc commented

@Shoeboxam : Do you have further comments?

By the way, you are encouraged to implement any of the algorithms in "Bernoulli Factory Algorithms" yourself and report on your implementation experience; that's pretty much one of the requests in the opening post. The same goes for the other articles in the opening post. Part of my goal in this issue is to correct any errors or unclear points in those articles.


Shoeboxam commented Mar 9, 2023

@peteroupc Thanks for checking in. I used inverse transform sampling to address the original reason why I was trying to sample Gumbel variates:
opendp/opendp#653
I find this approach pretty appealing, because it's simple and also gets me samples from distributions with closed-form inverse CDFs to within an arbitrarily small interval.
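As an aside for other readers, the floating-point version of that inverse-transform approach for the Gumbel distribution is nearly a one-liner (my own sketch; mu and beta are the usual location and scale parameters, and the exact version in the linked thread refines the uniform variate as an interval instead of using a float):

```python
import math
import random

def gumbel_variate(mu=0.0, beta=1.0):
    """Gumbel(mu, beta) by inversion: F^-1(u) = mu - beta*log(-log(u))."""
    u = random.random()
    while u == 0.0:  # guard the measure-zero endpoint of [0, 1)
        u = random.random()
    return mu - beta * math.log(-math.log(u))
```

The sample mean of the standard case ($\mu=0$, $\beta=1$) converges to the Euler–Mascheroni constant, about 0.5772.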

You've amassed an incredible collection of very cool algorithms and it's taking some time to take it in. The PSRN Tulap sampler, for example, is neat.

Feedback
It is easy to get lost in an algorithm description if it isn't accompanied by a high-level intuition. For my uses, I can write my own proofs, so long as I understand the gist of how the algorithm works. Something as simple as pointing out when an intermediate variable is distributed a certain way is really helpful, as is outlining the overall strategy. Granted, many of your existing algorithms already do this, or have links to papers. I expect I'll be able to see through them more quickly with a bit more experience, though.

It is also easy to get lost in the site. I found it surprising that there are more Bernoulli factories and PSRNs in "More Algorithms" when you already have dedicated pages for them. In my experience so far, the best way to not get lost on the site is to just read the entire site! ;)

Nits:

  • in "Misc Observations" > "Weighted Choice with Log Probabilities" > "Weighted Choice with Gumbel", did you mean to overwrite q_i with $exp(q_i/\lambda)$ instead?
  • an extra period in this url: [^109]: In the privacy context, see, for example, Awan, J. and Rao, V., 2021. "Privacy-Aware Rejection Sampling", arXiv:2108.00965.
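Regarding the first nit, for context: that section is an instance of the Gumbel-max trick, which in plain floating point (my own sketch, not the article's exact algorithm) looks like this:

```python
import math
import random

def gumbel_max_choice(log_weights):
    """Pick index i with probability w_i / sum(w) given log-weights,
    by taking argmax_i (log w_i + G_i) with G_i standard Gumbel."""
    best_index, best_key = None, -math.inf
    for i, lw in enumerate(log_weights):
        u = random.random()
        while u == 0.0:
            u = random.random()
        key = lw - math.log(-math.log(u))  # log w_i plus a Gumbel variate
        if key > best_key:
            best_index, best_key = i, key
    return best_index

# Example: weights 1, 2, 3 -- index 2 should win about half the time.
log_weights = [math.log(w) for w in (1, 2, 3)]
n = 30_000
freq = sum(gumbel_max_choice(log_weights) == 2 for _ in range(n)) / n
print(freq)
```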

peteroupc commented

I appreciate your comments. In response I have addressed your "nits" and rearranged the content. In particular, the "More Arbitrary-Precision Samplers" page is now obsolete and its content was moved to other pages in the site.

peteroupc commented

@Shoeboxam : For your information, issue #17 lists open questions I have on the articles in this repository, including the pages listed in this issue. One question I've just added that may interest you is: Find additional algorithms to sample continuous distributions with PSRNs that meet the properties desired for such algorithms. Other open questions relate to Bernoulli factories.


peteroupc commented Jun 2, 2023

@Shoeboxam : For your information, the following paper, on a security consideration when sampling for information security (and differential privacy) purposes, came to my attention. I have updated the security considerations in "Randomization and Sampling Methods" to take it into account.

Ben Dov, Y., David, L., et al., "Resistance to Timing Attacks for Sampling and Privacy Preserving Schemes", FORC 2023.
