Baillie-PSW vs. Deterministic Miller-Rabin #2

Open
chiphogg opened this issue Nov 16, 2024 · 3 comments

Comments

@chiphogg

Hello,

I'm a neophyte to fast integer factoring and primality testing, but I've recently been reading up on the various techniques.

I can see that this library uses deterministic Miller-Rabin. The README links to this Wikipedia page section, which currently states:

The Miller test is not used in practice. For most purposes, proper use of the probabilistic Miller–Rabin test or the Baillie–PSW primality test gives sufficient confidence while running much faster. It is also slower in practice than commonly used proof methods such as the Adleman–Pomerance–Rumely primality test and elliptic curve primality proving, which give results that do not rely on unproven assumptions. For theoretical purposes requiring a deterministic polynomial time algorithm, it was superseded by the AKS primality test, which also does not rely on unproven assumptions.

I've recently implemented the Baillie-PSW test myself, including the strong version of the Lucas test. Nicely's Baillie-PSW page states that the cost is roughly equivalent to 3 to 7 iterations of the Miller-Rabin test.

However, from cursory browsing of your project's website, it's clear that you've put great effort into optimizing your particular deterministic Miller-Rabin implementation. For example, you're doing something with hashing to choose the bases, but I don't really understand the idea behind this step. Nevertheless, I'd put more stock in your implementation than in this paragraph from Wikipedia, even though you link directly to it in your README.

Could you please comment on how you see the performance of the best deterministic Miller-Rabin implementations relative to Baillie-PSW?

And, could you give a brief summary (or provide a link to a tutorial) on the idea behind the base hashing?

@JASory

JASory commented Jan 3, 2025

The idea behind hashing is to take the relatively few composites that pass a strong Fermat test to some base, partition them into sets, and then search for another base that eliminates all of them. The hash function is selected to distribute them roughly evenly with minimal computation (in my software I use a single multiplication and a shift).
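
As a rough illustration of that search, here is a sketch (mine only, not code from this library or from f-analysis; the multiplier, the bucket count, and the `base2_strong_pseudoprimes` input are all placeholder assumptions):

```rust
const BUCKETS: usize = 256; // placeholder table size

// Modular multiplication via 128-bit widening.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

// Right-to-left square-and-multiply modular exponentiation.
fn pow_mod(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r = 1u64;
    b %= m;
    while e > 0 {
        if e & 1 == 1 {
            r = mul_mod(r, b, m);
        }
        b = mul_mod(b, b, m);
        e >>= 1;
    }
    r
}

// Strong Fermat (Miller-Rabin) test of an odd n > 2 to the given base.
fn strong_fermat(n: u64, base: u64) -> bool {
    let s = (n - 1).trailing_zeros();
    let d = (n - 1) >> s;
    let mut x = pow_mod(base, d, n);
    if x == 1 || x == n - 1 {
        return true;
    }
    for _ in 1..s {
        x = mul_mod(x, x, n);
        if x == n - 1 {
            return true;
        }
    }
    false
}

// Multiply-shift hash: one multiplication and one shift map n to a bucket.
// The odd multiplier here is arbitrary, not a tuned constant.
fn bucket(n: u64) -> usize {
    (n.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 56) as usize
}

// Partition the known base-2 strong pseudoprimes into buckets, then search
// each bucket for a single extra base that none of its pseudoprimes survive.
fn build_table(base2_strong_pseudoprimes: &[u64]) -> Option<[u64; BUCKETS]> {
    let mut members: Vec<Vec<u64>> = vec![Vec::new(); BUCKETS];
    for &n in base2_strong_pseudoprimes {
        members[bucket(n)].push(n);
    }
    let mut table = [3u64; BUCKETS];
    for (i, bucket_members) in members.iter().enumerate() {
        table[i] = (3..1_000_000u64)
            .find(|&b| bucket_members.iter().all(|&n| !strong_fermat(n, b)))?;
    }
    Some(table)
}
```

The point is that all of the search cost is paid offline; at test time only the hash and one extra strong Fermat test remain.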

The best Fermat hashtables run in about twice the time of a single Fermat test, or even slightly less, since converting 2 to Montgomery form can be done faster than for any other base.
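
To make that last point concrete, here is a small sketch of my own (not this library's Montgomery code): with R = 2^64 and odd n, getting the base 2 into Montgomery form needs no general conversion, just a wrapping negation and one modular doubling.

```rust
// R mod n for R = 2^64 and odd n: since 2^64 ≡ 2^64 - n (mod n), the wrapping
// negation of n is already a representative, and one reduction brings it below n.
fn r_mod_n(n: u64) -> u64 {
    n.wrapping_neg() % n
}

// Montgomery form of 2, i.e. 2*R mod n, via a single modular doubling. An
// arbitrary base would instead need a full Montgomery multiplication by the
// precomputed constant R^2 mod n.
fn two_in_montgomery_form(n: u64) -> u64 {
    let r = r_mod_n(n);
    let doubled = r.wrapping_add(r);
    // Subtract n once if the doubling wrapped past 2^64 or reached n.
    if doubled < r || doubled >= n {
        doubled.wrapping_sub(n)
    } else {
        doubled
    }
}
```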

From a cursory reading of this library, it doesn't seem to be using an optimal algorithm (many intervals use 3 bases, even though a Lucas-V test is already faster), although the way the arithmetic is performed may be able to compensate for that.

If you are familiar with Rust, you can read the source code of f-analysis's to_hashtable function for a "straightforward" implementation. See also the more complex corrector_set function, which can "correct" a hashtable to work with only a single base over a small interval. It does this by running over the residue classes of the multiplicative hash instead of just the known pseudoprimes to a selected base. (This was used to enable machine-prime to use only one Fermat base for n < 2^47.)

Note: I have no affiliation with this project and have just discovered it, so I can't give any information about it.

@chiphogg
Author

chiphogg commented Jan 9, 2025

Thanks for chiming in!

I think the thing I had a hard time understanding was what exactly was being hashed. I found this website which linked to this paper, and now I finally understand:

  • You hash $n$, the number being tested.
  • The value in the hash table is a single base, which gives correct results for 100% of the values in that bucket.

There's clearly a lot more to it than that (doing a little trial division, designing the hash function, finding all the bases, etc.), but at least I finally understand the basic idea.
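
For my own notes, the test-time side then looks roughly like the sketch below (reusing BUCKETS, strong_fermat, and bucket from the sketch in the previous comment; the small-prime handling is only illustrative):

```rust
// Hypothetical hashed-base primality test: one strong Fermat test to base 2,
// then one to the base stored for n's bucket. `table` is assumed to be the
// output of the build_table search above; a correctly built table makes this
// exact over the range it was constructed for.
fn is_prime_hashed(n: u64, table: &[u64; BUCKETS]) -> bool {
    if n < 2 {
        return false;
    }
    for p in [2u64, 3, 5, 7] {
        if n % p == 0 {
            return n == p;
        }
    }
    strong_fermat(n, 2) && strong_fermat(n, table[bucket(n)])
}
```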

@JASory

JASory commented Jan 19, 2025

Trial division produces very little benefit, since the majority of pseudoprimes are semiprimes with relatively large factors, and the ones with small factors will be eliminated by the majority of bases. The primary issue to look out for is ensuring that primes aren't mapped to a bucket whose base has that prime as a factor (because the Fermat test requires that gcd(a, p) = 1, and clearly gcd(b*p, p) != 1). The FJ64_16K table that the authors provide contains such an error (691, if I remember correctly).
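
That failure mode is easy to check for mechanically; for instance (reusing the bucket hash and BUCKETS from the earlier sketches, with a naive trial-division test used only for the check itself):

```rust
// Returns the first prime p below `limit` whose bucket stores a base that p
// divides (so gcd(base, p) != 1 and the strong Fermat test would reject p),
// or None if no such prime exists in the range.
fn first_unsafe_prime(table: &[u64; BUCKETS], limit: u64) -> Option<u64> {
    (3..limit)
        .step_by(2)
        .filter(|&p| is_prime_by_trial_division(p))
        .find(|&p| table[bucket(p)] % p == 0)
}

fn is_prime_by_trial_division(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2u64;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}
```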

Another approach is to use the hashed value directly as the base rather than as an index, but of course this is only practical in a triple-base test, and it doesn't seem to be any better than 2,15 or 2,9375 followed by a hashtable (either of which can be implemented with half the memory of the triple-base test that F&J provide). And triple-base tests are worse than a base-2 Fermat test followed by a Lucas-V test in both memory and time complexity, so that avenue is basically a dead end anyway.
