limo1996 / sat-solver

Parallel SAT Solver

C++ 45.30% Shell 2.11% Python 28.47% CMake 0.72% Objective-C 0.46% TeX 22.94%

sat sat-solver parallel-computing high-performance-computing cdcl-algorithm dpll-algorithm mpi

sat-solver's Introduction

Parallel SAT Solver

Semester project for the Design of Parallel and High Performance Computing class at ETHZ. It includes two communication models for the DPLL algorithm, each of which can be combined with a local CDCL solver. To find out more about the communication models and the other techniques we used, please refer to our final report.

Build

Please follow these steps in order to successfully compile the source code:

Open a terminal and navigate to the directory where you want to store the repository
git clone https://github.com/limo1996/SAT-Solver.git
cd SAT-Solver
cmake .
make
Three executables (./sequential_main, ./parallel_main, ./stealing_main) should be generated

Invoking

Three executables are available:

The sequential version of the solver is named ./sequential_main
The parallel version that uses the master-worker communication pattern is named ./parallel_main
The parallel version that uses the work-stealing communication pattern is named ./stealing_main; from here on we call it the stealing version

Usage

Usage of sequential version:

./sequential_main [-s CDCL/DPLL] <CNF_input_file>

Example usage of sequential version:

./sequential_main -s CDCL cnfs/benchmark_formulas/flat75-4.cnf

Usage of parallel version:

./parallel_main [-local-cdcl branching_factor : int] <CNF_input_file>

Example usage of parallel version:

./parallel_main -local-cdcl 3 cnfs/benchmark_formulas/ais8.cnf

Usage of stealing version:

./stealing_main [-local-cdcl branching_factor : int] <CNF_input_file>

Example usage of stealing version:

./stealing_main -local-cdcl 2 cnfs/benchmark_formulas/anomaly.cnf

Testing

We have developed a Python testing wrapper; its documentation can be found here. The wrapper can also be used to run the solver on ETH's Euler supercomputing cluster.

Results

Final report in pdf format can be found here.
Performance graphs can be found here and here.

sat-solver's Issues

Folder reorganisation

We have too many folders in the root directory. After Monday's presentation we should move all tests into a common test folder, rename python_wrapper to scripts, and move randomCNF to scripts.

Runtime performance measurements on euler

We should store some runtime measurements in files on Euler, so that one can download them manually, run a script and get a nice plot (x-axis: size of the formula, i.e. number of variables; y-axis: runtime).
There are 3 changes necessary as far as I can see:

add information about how large the randomly generated formulas are to the file names (overlaps with #8)
measure runtime in parallel_main.cpp and store it in a file
write a script that given such files can create the plots

Additionally we should have far more and far larger formulas! -> #8
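A minimal sketch of the second point, assuming a plain CSV results file; the helper names `time_seconds` and `append_runtime` and the line format `<file>,<num_variables>,<seconds>` are hypothetical and not part of the repository:

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <string>

// Hypothetical timer: runs an arbitrary callable (e.g. the solve call)
// and returns the elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& solve) {
    auto start = std::chrono::steady_clock::now();  // monotonic clock, unaffected by system time changes
    solve();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Hypothetical helper: appends one measurement as the CSV line
// "<formula_file>,<num_variables>,<runtime_seconds>" to results_path.
// Returns false if the file could not be opened or written.
bool append_runtime(const std::string& results_path,
                    const std::string& formula_file,
                    int num_variables,
                    double runtime_seconds) {
    std::ofstream out(results_path, std::ios::app);  // append so repeated runs accumulate
    if (!out) return false;
    out << formula_file << ',' << num_variables << ',' << runtime_seconds << '\n';
    return static_cast<bool>(out);
}
```

In parallel_main.cpp the timer would wrap the solve call on the master rank only; taking the two timestamps with `MPI_Wtime()` on that rank would be an equivalent choice.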

More test cases!

We need more test-cases!
Small ones and also large ones.
We should store them in the repository such that everyone can run the same tests.

2 things:

just generate a lot of them (I would say at least 100 formulas for each size, i.e. number of variables: 10, 30, 50, 100, 200)
change the names of the test files such that we know how "large" these formulas are (useful for evaluation)
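A sketch of both points, assuming uniformly random 3-SAT formulas written in DIMACS CNF format; `random_3sat_dimacs`, `formula_file_name` and the naming scheme `random3sat_v<vars>_c<clauses>_<index>.cnf` are hypothetical:

```cpp
#include <cassert>
#include <random>
#include <set>
#include <sstream>
#include <string>

// Hypothetical naming scheme that encodes the formula size in the file name.
std::string formula_file_name(int num_vars, int num_clauses, int index) {
    return "random3sat_v" + std::to_string(num_vars) + "_c" +
           std::to_string(num_clauses) + "_" + std::to_string(index) + ".cnf";
}

// Generates a random 3-SAT formula in DIMACS CNF format: each clause has
// three distinct variables, each negated with probability 1/2.
// The same seed always yields the same formula, so test sets are reproducible.
std::string random_3sat_dimacs(int num_vars, int num_clauses, unsigned seed) {
    assert(num_vars >= 3);  // need three distinct variables per clause
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick_var(1, num_vars);
    std::bernoulli_distribution negate(0.5);

    std::ostringstream out;
    out << "p cnf " << num_vars << ' ' << num_clauses << '\n';
    for (int c = 0; c < num_clauses; ++c) {
        std::set<int> vars;
        while (vars.size() < 3) vars.insert(pick_var(rng));
        for (int v : vars) out << (negate(rng) ? -v : v) << ' ';
        out << "0\n";  // DIMACS terminates every clause with 0
    }
    return out.str();
}
```

For random 3-SAT, a clause-to-variable ratio of roughly 4.26 is where instances tend to be hardest, so for example 213 clauses for 50 variables should give more interesting test cases than a much smaller or larger ratio.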

More than one CNF in file

Is it valid to have more than one CNF in a file? The parser seems to be able to parse multiple CNFs from one file...
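For reference, the standard DIMACS CNF format describes a single formula per file, introduced by one problem line `p cnf <variables> <clauses>`, so a file containing several formulas falls outside that format. A hedged sketch of a check the parser could run before parsing; `count_problem_lines` is a hypothetical helper:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Counts DIMACS problem lines ("p cnf <vars> <clauses>") in a stream.
// A standard DIMACS CNF file has exactly one; a result greater than 1
// means the file concatenates several formulas, which a strict parser
// could reject with an error instead of silently parsing them all.
int count_problem_lines(std::istream& in) {
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first, second;
        tokens >> first >> second;  // comment lines start with "c" and are ignored
        if (first == "p" && second == "cnf") ++count;
    }
    return count;
}
```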

Use a smarter encoding in DPLL

encode the CNF formulas as integer and boolean arrays instead of object structures -> should require a lot less space -> better spatial locality -> faster!
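A sketch of what such an encoding could look like, assuming DIMACS-style signed integer literals; `FlatCnf`, `Assignment` and `clause_satisfied` are hypothetical names, not the repository's classes. All literals live in one contiguous array and clause i spans `literals[offsets[i] .. offsets[i+1])`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical flat ("CSR-style") clause storage: one contiguous literal
// array plus an offset array marking where each clause starts and ends.
struct FlatCnf {
    std::vector<int> literals;             // +v means variable v, -v its negation
    std::vector<std::size_t> offsets{0};   // offsets[i]..offsets[i+1] is clause i

    void add_clause(const std::vector<int>& clause) {
        literals.insert(literals.end(), clause.begin(), clause.end());
        offsets.push_back(literals.size());
    }
    std::size_t num_clauses() const { return offsets.size() - 1; }
};

// One value per variable (index 1..n): 0 = unassigned, 1 = true, -1 = false.
using Assignment = std::vector<std::int8_t>;

// Returns true if at least one literal of clause i is satisfied under a.
bool clause_satisfied(const FlatCnf& f, std::size_t i, const Assignment& a) {
    for (std::size_t k = f.offsets[i]; k < f.offsets[i + 1]; ++k) {
        int lit = f.literals[k];
        std::int8_t value = a[std::abs(lit)];
        if ((lit > 0 && value == 1) || (lit < 0 && value == -1)) return true;
    }
    return false;
}
```

A plain boolean array cannot represent "unassigned", so the sketch uses one signed byte per variable; a pair of boolean arrays (assigned, value) would work as well.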

Implement CDCL

If we want to get more overall performance we should implement CDCL!

Cleanups and Code organization stuff

Move encoding and decoding of model to model class
change get_model to get_partial_model in CNF.cpp
implement get_model in CNF.cpp, and move the functionality from send_sat solver there
Regarding the last two points: yes, it's super confusing right now

Fix glitch in master worker communication

As discussed we should iprobe on a different tag.
The exception that I planted seems to be triggered in a couple of cases (13 times out of 840 executions).
Here's the output file from euler:
lsf.o52407426

Should we do work stealing scheduling?

use integers instead of strings for var names

in Variable.cpp/Variable.h
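One common way to do this, sketched under the assumption that variables are numbered 1..n as in DIMACS: encode each literal as an unsigned integer, `2*v` for the positive and `2*v + 1` for the negative literal, similar to the encoding MiniSat uses. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical integer literal encoding: variable v maps to 2*v (positive)
// and 2*v + 1 (negative), so the sign is the lowest bit.
using Lit = unsigned;

Lit from_dimacs(int lit) {
    return 2u * static_cast<unsigned>(std::abs(lit)) + (lit < 0 ? 1u : 0u);
}

int to_dimacs(Lit l) {
    int v = static_cast<int>(l >> 1);
    return (l & 1u) ? -v : v;
}

unsigned var_of(Lit l) { return l >> 1; }            // drop the sign bit
Lit negated(Lit l) { return l ^ 1u; }                 // negation flips the sign bit
bool is_negative(Lit l) { return (l & 1u) != 0; }
```

Because literal codes are small dense integers, they can index arrays directly (for example per-literal occurrence lists), which string names cannot.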

Initial Progress Presentation

Create a google slides presentation.
I think it should cover the following aspects:

how did we start (sequential algorithm that we took over)
how did we parallelize it?
how did we test it?
some initial performance evaluations
what do we plan to do next?

"Measure" difficulty of formulas

One of the main questions of the professors was:
"Why are the dots so spread for larger formulas?"

We should be able to answer that question in a "scientific" way, i.e. something like this:
"The formula on the top of the plot is very difficult to solve, the dpll algorithm needs to make 25 branching steps until it is able to solve the formula.
On the other hand the formula on the bottom is very easy to solve, the dpll algorithm doesn't need to branch at all".

We should come up with a numerical representation of the difficulty of a formula, one suggestion (used above) is to use the maximal branching depth until a solution is found or unsat is proven.
Note that this does not mean we can just count the number of "branches" taken during an execution: we need to associate the true and false branches on the same variable and count them as one.
Effectively, we need to find the depth of the search tree.
But the total number of branches would also be a useful metric, since it tells us how "bad" our algorithm's decisions are. I would suggest we do both...

A good starting point is to change the stderr output in dpll.cpp such that we record the necessary information.
We could then create a python script that reads the stderr output and comes up with the branching depth and the overall number of taken branches.

I suggest that we just do it on the sequential version, as it will be identical to the parallel one (this is a property of the cnf formula).
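A minimal sketch of the two metrics, written as an instrumented recursive DPLL rather than as a parser for the stderr output of dpll.cpp (whose format is not shown here); `SearchStats` and `solve_with_stats` are hypothetical, and the sketch simply branches on the first unassigned variable:

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <vector>

// max_depth: deepest level of nested decisions; the true and false branch
//            on the same variable count as ONE level (search-tree depth).
// branches:  total number of branches taken (true and false counted separately).
struct SearchStats {
    int max_depth = 0;
    long branches = 0;
};

using Clauses = std::vector<std::vector<int>>;  // DIMACS-style literals
using Assign = std::vector<int>;                // index 1..n: 0 unassigned, 1 true, -1 false

static int lit_value(int lit, const Assign& a) {
    int v = a[std::abs(lit)];
    return lit > 0 ? v : -v;                    // 1 true, -1 false, 0 unassigned
}

// Applies unit implications until a fixpoint; returns false on a conflict.
static bool unit_propagate(const Clauses& cs, Assign& a) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& c : cs) {
            int unassigned = 0, last = 0;
            bool sat = false;
            for (int lit : c) {
                int val = lit_value(lit, a);
                if (val == 1) { sat = true; break; }
                if (val == 0) { ++unassigned; last = lit; }
            }
            if (sat) continue;
            if (unassigned == 0) return false;  // clause falsified
            if (unassigned == 1) {              // unit clause: force its literal
                a[std::abs(last)] = last > 0 ? 1 : -1;
                changed = true;
            }
        }
    }
    return true;
}

static bool dpll(const Clauses& cs, Assign a, int depth, SearchStats& s) {
    if (!unit_propagate(cs, a)) return false;
    int var = 0;
    for (int v = 1; v < static_cast<int>(a.size()); ++v)
        if (a[v] == 0) { var = v; break; }
    if (var == 0) return true;                  // fully assigned without conflict
    if (depth + 1 > s.max_depth) s.max_depth = depth + 1;
    for (int value : {1, -1}) {                 // true branch, then false branch
        ++s.branches;
        Assign next = a;
        next[var] = value;
        if (dpll(cs, next, depth + 1, s)) return true;
    }
    return false;
}

bool solve_with_stats(const Clauses& cs, int num_vars, SearchStats& s) {
    return dpll(cs, Assign(num_vars + 1, 0), 0, s);
}
```

In this sketch both numbers depend on the branching order as well as on the formula, so comparisons are most meaningful when every run uses the same heuristic.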

Move encoding and parsing of variables from worker to CNF

Jan, I think it would be better to move the encoding and parsing of formulas to the CNF class so that I can reuse it in the Master class (I need to parse the final formula and print it). In a few minutes I will push a data class named State that basically holds an array of unsigned ints and its size, plus a getter and a setter.

Your methods could look like this:
State CNF::encode();   // member function: the CNF encodes itself into a State object

CNF cnf(state);        // the CNF parses itself from a State object
// ---- or -----
CNF cnf;
cnf.parse(state);

What do you think about that?
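A sketch of the proposed interface under the assumptions in the comment above. The buffer layout (`num_vars`, `num_clauses`, then each clause as its length followed by literal codes `2*var + negated`) and all member names are hypothetical, and the CNF class here is a minimal stand-in, since the real class's members are not shown in the issue:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical State: a flat buffer of unsigned ints; its size is the vector's size.
class State {
public:
    State() = default;
    explicit State(std::vector<unsigned> data) : data_(std::move(data)) {}
    const std::vector<unsigned>& get_data() const { return data_; }
    void set_data(std::vector<unsigned> data) { data_ = std::move(data); }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<unsigned> data_;
};

// Minimal CNF stand-in with the proposed encode()/parse() pair.
class CNF {
public:
    CNF() = default;
    CNF(int num_vars, std::vector<std::vector<int>> clauses)
        : num_vars_(num_vars), clauses_(std::move(clauses)) {}
    explicit CNF(const State& state) { parse(state); }

    // Layout: [num_vars, num_clauses, (clause_len, lit_code_1 .. lit_code_k)*]
    State encode() const {
        std::vector<unsigned> buf{static_cast<unsigned>(num_vars_),
                                  static_cast<unsigned>(clauses_.size())};
        for (const auto& clause : clauses_) {
            buf.push_back(static_cast<unsigned>(clause.size()));
            for (int lit : clause)
                buf.push_back(2u * static_cast<unsigned>(std::abs(lit)) + (lit < 0 ? 1u : 0u));
        }
        return State(std::move(buf));
    }

    // Inverse of encode(); .at() throws std::out_of_range on a truncated buffer.
    void parse(const State& state) {
        const std::vector<unsigned>& buf = state.get_data();
        std::size_t pos = 0;
        num_vars_ = static_cast<int>(buf.at(pos++));
        unsigned num_clauses = buf.at(pos++);
        clauses_.assign(num_clauses, std::vector<int>{});
        for (auto& clause : clauses_) {
            unsigned len = buf.at(pos++);
            for (unsigned k = 0; k < len; ++k) {
                unsigned code = buf.at(pos++);
                int var = static_cast<int>(code >> 1);
                clause.push_back((code & 1u) ? -var : var);
            }
        }
    }

    int num_vars() const { return num_vars_; }
    const std::vector<std::vector<int>>& clauses() const { return clauses_; }
private:
    int num_vars_ = 0;
    std::vector<std::vector<int>> clauses_;
};
```

With this layout the Master could construct `CNF(state)` from a received buffer and print it, and `encode()` yields a single buffer of unsigned values whose size is known before sending.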

Use more than 48 cores on Euler

Whenever I try to run something on Euler on more than 48 cores I get the following error message:

virtual.40d: Queue has been closed. Job not submitted.

Weird...

Here's the full output of main.py:

please enter your nethz_account name : ebjan
Euler parameters:
           num_nodes: [16, 64, 128]
            num_runs: 4
             timeout: 15 [seconds]
 max_overall_runtime: 240 [minutes]
         test_folder: 3-sat_instances_medium
[email protected]'s password: *
to_euler.tar                                                                      100% 1860KB   1.8MB/s   00:00    
Pseudo-terminal will not be allocated because stdin is not a terminal.
[email protected]'s password: *
Generic job.
virtual.40d: Queue has been closed. Job not submitted.
#########################################################
unpacking tar archive...
calling run_me_on_euler.sh...
#########################################################
run_me_on_euler.sh is taking over
including necessary modules...
node counts: 16 64 128
choosing highest node count: 128
submitting compilation and run job to the batch system...
an error occurred during job submission, exiting
#########################################################

Fix DPLL

It looks like our DPLL implementation does not work correctly...
fail_example1.cnf and fail_example2.cnf are examples of this behavior.

Identify cnfs in the plots

We should add an option to the plotting script to add numbers to identify formulas.

An idea of what this might look like:
https://stackoverflow.com/questions/14432557/matplotlib-scatter-plot-with-different-text-at-each-data-point

We should add a command line option to the plotting script that enables the numbers.
For command line option parsing one could use the click python package, example usage is in main.py.
