vegard / sha1-sat Goto Github PK

View Code? Open in Web Editor NEW

46.0 5.0 14.0 36 KB

SAT instance generator for SHA-1

License: GNU General Public License v3.0

C++ 94.67% Shell 0.49% Perl 4.84%

sha1-sat's Introduction

sha1-sat -- SAT instance generator for SHA-1

Compiling

To compile, you first need to make sure you have the Boost libraries installed. When you do, simply run make.sh:

bash make.sh

Running

To generate a CNF instance encoding a preimage attack on the full SHA-1 algorithm, run:

./main --cnf --rounds=80 --hash-bits=160 > instance.cnf

To look at the possible options, run:

./main --help

The program can also generate OPB instances (pseudo-boolean constraints) if you specify --opb instead of --cnf.

Verifying solutions

To verify that the solution output by the solver is actually correct, run:

perl verify-preimage instance.cnf solution | ./verify-preimage

Here, 'solution' is the file output e.g. by minisat or the 'v'-line for other popular solvers like CryptoMiniSAT or PrecoSAT. The program returns an error code of 0 if and only if the solution is correct.

Using espresso

Part of the encoding used by this program is generated using the logic minimiser espresso. In particular, it is used to minimise the truth tables for half-adders.

You used to be able to obtain espresso from ftp://ftp.cs.man.ac.uk/pub/amulet/balsa/other-software/espresso-ab-1.0.tar.gz, but it is no longer available there. As a consequence, I have decided to move the dependency on espresso "out of line" and provide the precomputed output tables (see the data/ subdirectory).

If you wish to regenerate these tables (or if you would like to encode them for adders with different number of operands), this is how you would use espresso:

wget ftp://ftp.cs.man.ac.uk/pub/amulet/balsa/other-software/espresso-ab-1.0.tar.gz
tar xzvf espresso-ab-1.0.tar.gz
(cd espresso-ab-1.0 && ./configure && make)

g++ -std=c++ -o mkhalfadder mkhalfadder.cc
./mkhalfadder 2 2 | espresso-ab-1.0/src/espresso > data/halfadder-2-2.out.txt

About

I developed this program as part of my master thesis. You can find it here: https://www.duo.uio.no/handle/10852/34912.

The program creates the smallest and easiest-to-solve instances that I know about; in particular, it uses a novel encoding of adders which is described in more detail in my short benchmark description "Instance generator for encoding preimage, second-preimage, and collision attacks on SHA-1", which you can find on page 119 of the SAT Competition 2013 proceedings: https://tuhat.helsinki.fi/portal/files/27982690/sc2013_proceedings.pdf.

Please e-mail comments, suggestions, etc. to me at [email protected].

If you use the program in your research, please make a note of this in your acknowledgements and let me know about your paper/thesis/etc.!

sha1-sat's People

Stargazers

Watchers

Forkers

msoos rhuizer babooppa6 issxyq rzs840707 hao-zi peterg75 darioconca stephangocht finsenty54 olegzaikin

sha1-sat's Issues

Wrong number of clauses in CNF header

The command
./main --cnf --rounds=80 --hash-bits=160 > instance.cnf
builds a CNF with the header
p cnf 13408 478924
However, in fact the CNF contains 478636 clauses.
That is why some solvers, e.g. kissat, throw errors when parsing
the CNF. Is it the case that some clauses are not written or just
the header is wrong?

GIves runtime errors but runs without espresso

$ ./sha1-gen --cnf > /dev/null
terminate called after throwing an instance of 'std::runtime_error'
what(): execve() failed
terminate called after throwing an instance of 'std::runtime_error'
what(): execve() failed
terminate called after throwing an instance of 'std::runtime_error'
what(): execve() failed
terminate called after throwing an instance of 'std::runtime_error'
what(): execve() failed
terminate called after throwing an instance of 'std::runtime_error'
what(): execve() failed

Is this OK? It gives a CNF, but I am not sure it's the best CNF? It's a bit confusing :(

Windows translation

Thanks for this nice project. I got it working on Windows and will give relevant code snippets though conditional compilation and integration based on that would be needed to add it to the repo.

Headers and RNG stuff (taken from boost):

#include <algorithm>
#include <io.h>
#include <process.h>
#include <fcntl.h>
#include <Windows.h>
#include <intrin.h>
#define __builtin_popcount __popcnt

#include <boost/random/linear_congruential.hpp>
...
//#include <sys/wait.h>
//#include <unistd.h>
…
static boost::rand48 rng48;

espresso child process communication (fork -> Windows child process equivalent using as much C library and as little WinAPI as possible):

			if (_pipe(wfd, 4096, _O_BINARY | _O_NOINHERIT) == -1)
				throw std::runtime_error("pipe() failed");

			if (_pipe(rfd, 4096, _O_BINARY | _O_NOINHERIT) == -1)
				throw std::runtime_error("pipe() failed");
			int fdStdIn = _dup(wfd[0]);
			int fdStdOut = _dup(rfd[1]);
			_close(rfd[1]);
			_close(wfd[0]);
			TCHAR szCmdline[] = TEXT("espresso");
			PROCESS_INFORMATION pi{};
			STARTUPINFO si{};
			si.cb = sizeof(STARTUPINFO);
			si.hStdInput = (HANDLE)_get_osfhandle(fdStdIn);
			si.hStdOutput = (HANDLE)_get_osfhandle(fdStdOut);
			si.hStdError = si.hStdOutput;
			si.dwFlags |= STARTF_USESTDHANDLES;
			if (!CreateProcess(NULL, szCmdline, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi))
				std::runtime_error("CreateProcess failed");
			CloseHandle(pi.hThread);
			_close(fdStdIn);
			_close(fdStdOut);

And the termination wait:

			while (true) {
				DWORD status;
				status = WaitForSingleObject(pi.hProcess, INFINITE);
				if (status == WAIT_OBJECT_0) break;
				throw std::runtime_error("wait() failed");
			}
			CloseHandle(pi.hProcess);

lrand48() change to rng48() and srand48(rand()) changes to rng48.seed(rand()).

Not so much - certainly could be integrated.

As for building espresso on Windows, a valid getopt.c and getopt.h are necessary as well as changing cvrin.c to not FREE(PLA->filename); since strdup in Windows apparently is using a C-library managed buffer not a malloc'ed one. A crash on exit will cause the child pipe to become unreadable unfortunately.

I tested many of the command line options and found that compact-adders seems to not work at all - leads to UNSAT. However, I am parsing the output myself in a Python script and using various Python SAT solver libraries like pysat, pycosat or pycryptosat and it works with both --cnf and --opb. (I'm not sure how to test the half adders or xor options currently).

PB Encoding Buggy

Hi,
Nice work, but there is a bug with the PB encoding of the sum modulo 32:

The used encoding for add2 (similar issue exists for add5) is

\sum_{i=0}^{31} 2^i x_i 2^i y_i - 2^i z_i = 0

The problem is that this rules out all cases where x + y >= 2^32. The addition should be modulo 2^32, not ruling out values larger than 2^32. Hence, we need to use one more bit for z (in case of add5 it is 3 more bits I think), i.e.,

\sum_{i=0}^{31} 2^i x_i 2^i y_i \sum_{i=0}^{32} - 2^i z_i = 0

to encode the equality.

I tried to fix it and also add a nicer PB encoding for XORs in my fork at

https://github.com/StephanGocht/sha1-sat

However, the instance is still not solved by propagation in our solver roundingsat, when I set all input bits to fixed and no output bits to fixed, which I think it should, so there might be another problem with the encoding, which is why I didn't make a pull request.

Doesn't give help with -h

$ ./main -h
terminate called after throwing an instance of 'boost::exception_detail::clone_implboost::exception_detail::error_info_injector<boost::program_options::unknown_option >'
what(): unknown option -h
zsh: abort (core dumped) ./main -h

Practical Use Case Example

Hello and thanks again for the great project. I read the code, read your thesis and am intrigued. I have also seen some other modern research in this area which basically says its not practical yet.

I have a perfect practical example. We often know libraries like hashcat have options for brute forcing passwords that use SHA1 hash verification.

You have SHA1(salt.password) == hash. This is of course a full 80 round SHA1.

So the question is given that for example a known length password (maybe length of 12) that is say UTF-16-LE encoded, is there any chance that SAT solving will be superior to a brute force library which can process of combinations billions per second given GPU vectorization?

This means knowing 512-12*7=428 known bits (7 unknown bits per byte since the high bit is in the unprintable range), and 84-bits to determine as well as knowing the hash, is this going to be possible? Its a great practical situation. Also knowing the characters are alphanumeric gives 26+26+10=62 combinations and adding CNF clauses for the 128-62 invalid character combinations is also easy to do.

Trying the SAT solver it easily can find 7 bits in under a second. It can find 14 bits in a few seconds. 21 bits takes a few minutes. 28 bits takes hours, etc. Is this really the best we can do with preimage attacks on SHA1 with SAT solvers? Because it makes hashcat look like the faster approach still.

Even if it could detect bad password prefixes, it would make for a fast process perhaps by looking at MCS or MUS. Or if there were some way to get bit equivalence correlations among the remaining 84 bits based on that Boolean equations, that would be a power of 2 reduction per equivalence found. I've even thought about writing 160 clauses/equations with 512 variables with the exponential blow-up for CNF to see if that might yield anything. But so far I have yet to make a breakthrough.

Obviously 62^12~=2^71 so brute force will not cut it either. This seems like the perfect scenario for a SAT solver to shine through and have a practical advantage over other techniques. Unfortunately even estimating the time that will be needed is difficult with Python SAT solvers at least.

Anyway, I leave it at this - is this a practical example where a SAT solver will have an advantage or is 80-round SHA1 simply too hard even for a mere 84 bits which even have some constraints on them.