# refactor canonicalization scheme out of `SddBuilder` into trait; create generic `SddBuilder` (#56)
The main goal of this *very hefty* PR is to allow two **canonicalization schemes**:

- one based on the canonicity property, which requires compression so that equality checks reduce to pointer equality
- one based on "semantic hashing", where we instead uniquify an SDD based on a weighted model count with some randomly-generated weights

After talking to Steven, we chose to use a mixin approach: the `SddBuilder` becomes generic, taking in a "canonicalization scheme" trait (see the sketch below). Each implementer of the trait needs to be able to do any operation related to checking whether two SDDs are equal. This includes SDD equality itself, but also a host of related backend work:

- managing a `BackedRobinhoodTable` for BinarySDDs and SDDs
- implementing a hasher for the above table, as well as utility get and insert functions
- caching the apply operation

To properly implement the above, I also had to apply the same genericizing approach to:

- the apply cache, which now conforms to an `ApplyCacheMethod`
- the hasher for the `BackedRobinhoodTable`, which now allows an arbitrary hash function rather than `FxHasher` - this becomes the semantic hash when necessary. To avoid polluting the struct, I instead require a hash function to be passed to get/insert.

I pulled all of these functions out of the `SddBuilder`; the defaults move to the `CompressionCanonicalizer`, which should behave **exactly the same** as the previous `SddBuilder`. I've soft-verified this with tests.

Since I'm making a complex struct generic, this PR also touches many files - mostly replacing instances of a naked `SddBuilder` with `SddBuilder<CompressionCanonicalizer>`. I've localized "new implementation code" to `canonicalize.rs` and the files for the apply cache and bump table. I've also manually added some WMC code for `SddAnd`, `SddOr`, and `BinarySdd` that I'd like to remove - see below. And I've temporarily added a `sub` operator to the finite field (also sketched below).

I'm merging this PR in because:

1. I'm confident that this doesn't change the default behaviour, and
2. I'm worried about branch divergence.

## next steps

There are a handful of things that still should be done:

- undoing my temporary hacks:
  - ideally, I should deduplicate the semantic hashing work from the hard-coded implementations for `SddAnd`, `SddOr`, and `BinarySdd`. I'd like to somehow make this a WMC, but I'm not sure how to do that without recreating a new `SddPtr`
  - if we plan on keeping the finite field as a semiring, we should remove my `-` before it gets into the rest of the codebase
- running larger benchmarks on number of nodes and correctness metrics. (I've ... bricked my instance, I think? Or at least, I can't log in.) The plan was to:
  - try a handful of different primes, likely within a few orders of magnitude of each other
  - try different CNFs of varying complexity
  - measure:
    - overall correctness, and how it scales with the choice of prime
    - the number of nodes for the uncompressed, compression-canonicalized, and semantically-canonicalized versions; I may have to exclude the uncompressed versions for large CNFs
- optimizing references over cloning: right now, there's a *ton* of cloning going on, since I was more concerned about correctness than efficiency. It's mostly localized to creating the `Canonicalizer`, so it can be amortized across SDD operations; but I think it warrants a serious look. In particular, most of the `Canonicalizer`'s operations aren't mutable anyway, so with some shuffling around I should be able to make most of them immutable read references!
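To make the mixin approach concrete, here's a minimal sketch of the genericized builder. The trait and method names are simplified and hypothetical; the real trait also owns the backing tables and the apply cache:

```rust
/// Hypothetical stand-in for the real SDD pointer type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SddPtr(usize);

/// Hypothetical canonicalization-scheme trait: each implementer decides
/// how to test whether two SDDs represent the same function.
trait CanonicalizationScheme {
    fn sdd_eq(&self, a: SddPtr, b: SddPtr) -> bool;
}

/// Compression-based scheme: compressed SDDs are canonical, so
/// equality reduces to pointer equality.
struct CompressionCanonicalizer;

impl CanonicalizationScheme for CompressionCanonicalizer {
    fn sdd_eq(&self, a: SddPtr, b: SddPtr) -> bool {
        a == b
    }
}

/// The builder takes the scheme as a type parameter: this is the mixin.
struct SddBuilder<C: CanonicalizationScheme> {
    canonicalizer: C,
}

impl<C: CanonicalizationScheme> SddBuilder<C> {
    fn sdd_eq(&self, a: SddPtr, b: SddPtr) -> bool {
        self.canonicalizer.sdd_eq(a, b)
    }
}

fn main() {
    let builder = SddBuilder { canonicalizer: CompressionCanonicalizer };
    assert!(builder.sdd_eq(SddPtr(3), SddPtr(3)));
}
```

The point of the type parameter is that `CompressionCanonicalizer` keeps its cheap pointer equality while a semantic-hash scheme plugs in its own equality test, without duplicating the rest of the builder.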
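And for context on the temporary `sub` operator: a minimal sketch of a finite-field weight type, with hypothetical names and a `u64` representation that the real type need not share:

```rust
/// Hypothetical element of GF(p), reduced modulo a prime.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct FiniteField {
    value: u64,
    prime: u64,
}

impl FiniteField {
    fn new(value: u64, prime: u64) -> Self {
        FiniteField { value: value % prime, prime }
    }
    /// Semiring addition mod p.
    fn add(self, rhs: Self) -> Self {
        Self::new(self.value + rhs.value, self.prime)
    }
    /// Semiring multiplication mod p (assumes p^2 fits in u64).
    fn mul(self, rhs: Self) -> Self {
        Self::new(self.value * rhs.value, self.prime)
    }
    /// The temporary `sub`: subtraction mod p, handy for computing a
    /// negated literal's weight as 1 - w. It isn't a semiring operation,
    /// which is why it should eventually be removed.
    fn sub(self, rhs: Self) -> Self {
        Self::new(self.prime + self.value - rhs.value, self.prime)
    }
}

fn main() {
    let p = 7;
    let one = FiniteField::new(1, p);
    let w = FiniteField::new(5, p);
    assert_eq!(one.sub(w), FiniteField::new(3, p)); // 1 - 5 ≡ 3 (mod 7)
    assert_eq!(w.add(w).mul(w), FiniteField::new(1, p)); // (5 + 5) * 5 ≡ 1 (mod 7)
}
```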
## testing

### unit tests, quickcheck

I've written a few more unit tests and quickcheck tests to verify that this works for small cases. Even with a relatively large quickcheck sample size, the semantic canonicalizer still maintains its uniqueness benefits most of the time, which is great! (I planned on doing some more testing on the server, but I'm now curiously locked out of my account after running a large test - concerning.)

Some useful quickcheck tests to run (a sketch of the probabilistic-equality idea they exercise follows at the end of this section):

```
$ QUICKCHECK_TESTS=20000 cargo test prob_equiv_sdd_eq_vs_prob_eq
$ QUICKCHECK_TESTS=20000 cargo test prob_equiv_sdd_inequality
$ QUICKCHECK_TESTS=20000 cargo test qc_sdd_canonicity
```

### script

I've rewritten the compare-canonicalize script to better test this difference. Here's an example of usage:

```sh
$ cargo run --bin semantic_hash_experiment -- --file cnf/rand-3-25-75-1.cnf
```

Interestingly, I haven't been able to get substantial differences in the number of nodes for small CNFs. I think this is a bug in my implementation or my benchmark. However, given that the default canonicalizer still works as intended, I'm thinking that we merge this in now, before it diverges further from `main`.
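For intuition about what the `prob_equiv_*` properties are exercising, here is a self-contained sketch of the probabilistic-equality idea. Truth tables stand in for SDDs, and the prime and weights are made-up constants; this is not the actual test code:

```rust
// Hypothetical prime; the real experiments would try several.
const PRIME: u64 = 100_003;

/// WMC over GF(p): sum, over satisfying assignments, of the product of
/// per-literal weights, where a negated variable gets weight 1 - w.
/// Enumerates all 2^n assignments, so this only works for tiny n.
fn semantic_hash(f: impl Fn(&[bool]) -> bool, weights: &[u64]) -> u64 {
    let n = weights.len();
    let mut total = 0u64;
    for bits in 0..(1u32 << n) {
        let assignment: Vec<bool> = (0..n).map(|i| (bits >> i) & 1 == 1).collect();
        if f(&assignment) {
            let mut prod = 1u64;
            for (i, &b) in assignment.iter().enumerate() {
                // positive literal: weight w; negative literal: 1 - w (mod p)
                let w = if b { weights[i] } else { (PRIME + 1 - weights[i]) % PRIME };
                prod = prod * w % PRIME;
            }
            total = (total + prod) % PRIME;
        }
    }
    total
}

fn main() {
    // x ∨ y and ¬(¬x ∧ ¬y) denote the same function, so their hashes agree
    // for *any* weights; distinct functions collide only with low,
    // Schwartz-Zippel-style probability per random weight vector.
    let weights = [12_345u64, 67_890]; // stand-ins for randomly drawn weights
    let f = |a: &[bool]| a[0] || a[1];
    let g = |a: &[bool]| !(!a[0] && !a[1]);
    assert_eq!(semantic_hash(f, &weights), semantic_hash(g, &weights));
}
```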