Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Practical Use Case Example #7

Open
GregoryMorse opened this issue Aug 1, 2020 · 3 comments
Open

Practical Use Case Example #7

GregoryMorse opened this issue Aug 1, 2020 · 3 comments

Comments

@GregoryMorse
Copy link

Hello and thanks again for the great project. I read the code, read your thesis and am intrigued. I have also seen some other modern research in this area which basically says its not practical yet.

I have a perfect practical example. We often know libraries like hashcat have options for brute forcing passwords that use SHA1 hash verification.

You have SHA1(salt.password) == hash. This is of course a full 80 round SHA1.

So the question is given that for example a known length password (maybe length of 12) that is say UTF-16-LE encoded, is there any chance that SAT solving will be superior to a brute force library which can process of combinations billions per second given GPU vectorization?

This means knowing 512-12*7=428 known bits (7 unknown bits per byte since the high bit is in the unprintable range), and 84-bits to determine as well as knowing the hash, is this going to be possible? Its a great practical situation. Also knowing the characters are alphanumeric gives 26+26+10=62 combinations and adding CNF clauses for the 128-62 invalid character combinations is also easy to do.

Trying the SAT solver it easily can find 7 bits in under a second. It can find 14 bits in a few seconds. 21 bits takes a few minutes. 28 bits takes hours, etc. Is this really the best we can do with preimage attacks on SHA1 with SAT solvers? Because it makes hashcat look like the faster approach still.

Even if it could detect bad password prefixes, it would make for a fast process perhaps by looking at MCS or MUS. Or if there were some way to get bit equivalence correlations among the remaining 84 bits based on that Boolean equations, that would be a power of 2 reduction per equivalence found. I've even thought about writing 160 clauses/equations with 512 variables with the exponential blow-up for CNF to see if that might yield anything. But so far I have yet to make a breakthrough.

Obviously 62^12~=2^71 so brute force will not cut it either. This seems like the perfect scenario for a SAT solver to shine through and have a practical advantage over other techniques. Unfortunately even estimating the time that will be needed is difficult with Python SAT solvers at least.

Anyway, I leave it at this - is this a practical example where a SAT solver will have an advantage or is 80-round SHA1 simply too hard even for a mere 84 bits which even have some constraints on them.

@vegard
Copy link
Owner

vegard commented Aug 2, 2020

Hi, thanks for the interest :-)

Unfortunately I don't think there are any examples yet where SAT solvers have an advantage.

For me, this program (and my thesis) was a personal quest to understand exactly what makes algorithms like SHA-1 so difficult to break. In my mind, finding a preimage should intuitively be possible using symbolic manipulation, almost like solving an algebra equation. I've found that there is very little information out there offering true and useful insights about this and I believe the answer lies somewhere deep in complexity theory and information theory.

My conclusions are:

  • SAT itself is probably NP, meaning we will never have a polynomial-time algorithm to solve SAT problems in general
  • many hard SAT problems may nevertheless be solved quickly by SAT solvers in practice (whether due to hidden structures that we can take advantage of or because of clever implementations like 2-watched literals)
  • we have not yet found optimal algorithms, heuristics, representations, and encodings for SAT
  • combinations of SAT and brute force ("forward" calculation) may be faster than either approach on their own; the SHAttered attack did this and in fact used my program as part of their attack
  • a rough explanation of what makes SHA-1 difficult to work out backwards (i.e. starting from the hash and finding an input for it) is that you have two lengthy calculations which by themselves are easy to solve, the problem is that you have to commit to certain choices which mean that those two calculations won't match up at the end

@GregoryMorse
Copy link
Author

GregoryMorse commented Aug 4, 2020

Hi vegard, thanks for the response.

I was thinking if I could write 160 equations with only 84 variables in that case though I realize the exponential blow up in CNF would be an issue. Does anyone know for SHA1 if you try to write 160 equations is this going to lead to 2^512 clauses or how many precisely? Of course this does away with all the variable introductions for XOR and ADD operations. But perhaps you could find certain bit prefixes which can never work if you had the equations in this form. Even DNF form (which I think also might have the same 2^512 issue) would be useful here. The SAT solver does too much work which might be avoided, if you could for example start concluding quickly that certain bit combinations are impossible. Could try all combinations of 2-bits, then 3-bits, etc. If you find any bits that have to be different or the same or what have you, immediately the search space starts reducing dramatically. But the problem of space blow-up seems to make this type of analysis impractical. It would be huge and even extremely profitable if there were a way to do this.

Another idea is that if you have SALT1+password+padding=hash1 and SALT2+password+padding=hash2, is that maybe you could combine the two SAT systems into one which could enable much faster solving? I have exactly this example. And the salts are only 32-bits. But the password and padding is identical.

This is a perfect practical example. Brute force does not benefit at all from having two separate salt/hash pairs with the same password. In fact it would be slower to have to try them both. But the SAT solver should be able to have a speed increase here. This type of scenario makes for interesting research I feel.

Furthermore I did some calculations, and technically if I only need to find 84-bits of preimage with some restrictions on those bits, then only using 22-bits of hash output are more than likely enough to find it without a collision.

@vegard
Copy link
Owner

vegard commented Aug 5, 2020

I encourage you to take a small CNF (or boolean circuit) and try to convert it to DNF or ANF by hand, it's a very enlightening experience :-)

About adding constraints to simplify the instance, that's one of the really counterintuitive things I found in my project/thesis. The problem is that when you add constraints to the problem you're also removing actual solutions that the solver might have otherwise found. So you're often making it harder for the solver unless you know with 100% certainty that the constraint you're adding is really just making an implied constraint explicit. (And even if it's a true constraint that doesn't exclude any solutions, it might still just slow the solver down.)

Another thing about combinations of bits in hashing algorithms is that very often there are no such combinations of 2-bits or 3-bits which are truly impossible. Most random selections of a small number of bits/variables will be able to have any combination of values in practice.

One possible way to make forwards that I didn't have the time to explore fully is to look at the clauses generated by conflict analysis, especially those that are shorter than the inputs to the conflict analysis algorithm. These clauses have a special status, since they are exactly those clauses which express a "true" constraint on the instance, but which were not explicit in the original instance. I think by connecting the dots to the higher-level description of SHA-1 you can maybe discover some interesting things that could be used to simplify the instance before solving even starts. Basically use conflict analysis to discover hidden/non-obvious relationships between the bits/variables in the instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants