What the research is about ...
A large number of problems that occur in knowledge representation,
learning, planning, scheduling, and other areas
are known to be NP-complete in their most general form. NP stands for 'nondeterministic
polynomial '.
These are problems that cannot be solved in polynomial time (as far as anyone
knows), but if you could guess the answer, you could efficiently check its correctness.
The best known algorithms for solving such problems are known to require exponential
run time (in the worst case).Literally thousands of computational problems have
been shown to be NP-complete.
The completeness property of NP-complete problems means that if an efficient
algorithm for solving just one of these problems could be found, we could immidiately
have an eficient algorithm for all NP-complete problems!
We focus on the K-satisfiability (K-SAT) problem, which is characteristic
of the class of NP-complete problems.
The satisfiability problem, or SAT, and it was the first member of the notorious
class known as NP- complete problems.
SAT has practical importance. In artificial intelligence various methods of
logical deduction and theorem-proving are related to SAT. And similar issues
arise in computer software for scheduling, such as assigning flight crews to
aircraft or planning the production run of an automobile factory .In recent
years SAT has attracted further attention for another reason. Although the hardest
SAT problems do seem very hard, many problem instances yield easily to elementary
methods. If you make up thousands of SAT problems at random, simple algorithms
quickly solve all but a few of them. Looking at these results more closely,
investigators discovered a curious pattern. The hard and easy
instances are not mixed up haphazardly; as a certain parameter is varied,
the problems go from easy to hard and back to easy again.
A physicist looking at this pattern would note a resemblance to the critical
behavior observed near phase transitions in fluids and magnetic materials. And
indeed there is a corresponding phase transition in the SAT system: In one phase
almost all the propositions can be satisfied, but in another phase almost none
can. The cases that are hardest to resolve lie near the transition between these
regimes.
K-SAT problem is defined by a set of N boolean variables,each of which can
be assigned "true" or "false",and
a set of M clauses. Each clause is a logical OR of K variables (where each variable
can be negated).
For example, 2-SAT formula might look like this: (x OR y) AND ( (NOT x) OR (NOT
y) );
The goal is to find the assignment to the variables such that all clauses are
satisfied.Because all the connectives inside each clause are OR operators, any
true literal in a clause makes the entire clause true. On the other hand, because
the clauses are linked by ANDs, any false clause makes the entire formula false.
To put it another way, a true formula is any CNF expression that has no false
clauses; note that this definition includes the empty formula, with no clauses
at all. Conversely, a true clause must have at least one true literal, so that
an empty clause is false.
We study the typical case complexity of K-SAT by considering an ensemble of
randomly generated K-SAT formulae.
For each formula,we generate M clauses, where each clause is generated by randomly
selecting K variables from the
set of N variables and negating each selected variable with probability 0.5
to construct the clause. The value of K
is important: we say that 1-SAT and 2-SAT belong to the class P(polynomial time
solvable problems (the linear time
algorithm is known). For k>=3 K-SAT is NP-complete. We focus on on K=3 because
1)if all clauses are of length two then polynomial algorithms are known 2)a
theory with clauses longer than 3 can be converted to an equvalent theory with
clauses
of length 3 with only a linear increase in the length of the theory.
Although we have no polynomial-time algorithm for SAT, we can do better than exhaustive search --we can do backtracking. The basic strategy is to explore a branch of the tree of possible solutions until you come to a dead end, then back up to some earlier choice point and try another branch. If that path also fails, you back up further still, to an even earlier decision point, until eventually you either find a solution or run out of branches.
The backtracking algorithm for CNF formulas can be imagined as a contest between two players, the Optimist and the Pessimist. The Optimist strives to find a satisfying labeling by looking into each clause for at least one true literal. If she finds one, she strikes out the entire clause. If she can eliminate all the clauses in this way, then the original formula is satisfiable. Meanwhile the Pessimist searches for false literals and removes them from the clauses in which they appear. If she can show that every labeling yields at least one empty clause, then the formula is unsatisfiable.
The backtracking algorithm has exponential running time in the worst case. If you are unlucky, backtracking can take just as long as exhaustive search, but in practice it often runs much faster because it can prune whole limbs from the search tree without exploring their leaves. Also we are using Martin Davis and Hilary Putnam algorithms that suggest attending first to any variable that appears in a singleton clause -- a clause with just one literal(this literal should be True).
We represent the results according to the ratio of clauses to variables, M/N
(the crucial parameter for describing SAT statistics). And we see that the proportion
of satisfiable formulas decreases as m/n increases. Formulas with only a few
clauses and many variables can almost always be satisfied, since most of the
variables appear only once or twice, and a conflict between them is unlikely;
in this region the formulas are said to be underconstrained.
At the other end of the spectrum, with many clauses and few variables, each
variable can be expected to appear in many clauses, so that conflicts are frequent;
here the formulas are overconstrained, and few of them
are satisfiable. The crossover point (for 2-SAT M/N =1,
for 3-SAT M/N=4.2)
is a region between those two. Indeed below this critical point almost all formulae
are satisfiable, whereas above it almost all
are unsatisfiable. In fact the hardest problem found in the underconstrained
region is many times easier than the easiest unsatisfiable problem found in
the neighborhood of the crossover point. The SAT cost curve looks like the curve
of phase transitions and critical phenomena.In nature, phase transitions are
classified as continuous or discontinuous; for example, the onset of magnetization
is continuous, whereas the freezing and boiling of water are discontinuous.
What about transitions in SAT?
The 2-SAT transition is continuous, but the 3-SAT transition is discontinuous.
So,we will try to study properties of this problem, differentiate its patterns, and ,probably,to come to the solution closer.