671: DAA --- Lecture 11

Backtracking and Branch-and-Bound

The solution to many problems can be viewed as making a sequence of decisions. For example, finding a coloring of a graph with vertices {1, 2, ..., n} can be viewed as making a sequence of decisions, where the ith decision is which color to use for the ith vertex, i = 1, 2, ..., n. Given a weighted graph, a solution to the Traveling Salesman problem can be viewed as making a sequence of decisions concerning which city to visit next in the tour. A solution to the 0/1 Knapsack problem can be viewed as making a sequence of decisions as to which objects to put in the knapsack. Given a set of positive integers A = {a1, a2, ..., an} and a positive integer Sum, the Sum of Subsets problem is to find a subset S of A (if any) such that the sum of the elements in S equals Sum. Again, a solution to the Sum of Subsets problem can be viewed as making a sequence of decisions as to which elements to include in the set S.

Any time the solution to a problem involves making a sequence of decisions, it can be modeled using a state space tree T. The root of T corresponds to not yet having made the first decision. The children of the root are nodes corresponding to all possible choices for the first decision. The children of a first-decision node v are nodes corresponding to all possible choices for the second decision given that v was the first decision, and so forth. The total number of decision sequences is usually exponential in the input size n (i.e., where each solution consists of at most n decisions), since every node not corresponding to the final (or next-to-final) decision usually has at least two children.

Backtracking and Branch-and-Bound are problem-solving strategies that are guaranteed to find a solution to any problem modeled by a (finite) state space tree, since they ensure that every node in the tree T that can possibly be a solution node is examined. The two strategies differ in the way in which the tree T is searched. Backtracking follows a "depth-first" search strategy, whereas Branch-and-Bound follows a "breadth-first" search strategy. Backtracking has the advantage that there is no need to explicitly implement the state space tree, since we are only maintaining a path in this tree at any given point in the algorithm. Branch-and-Bound, on the other hand, needs to explicitly maintain the tree (at least the part of the tree generated so far), since all the children of a node are generated before proceeding further. Thus, we need to explicitly generate nodes in the state space tree and keep track of the parents of these nodes. The question, then, is why consider Branch-and-Bound at all? The answer is that for some problems a FIFO Branch-and-Bound might find goal nodes that lie at a shallow depth in the tree, whereas Backtracking might look at a large portion of the implicit state space tree before finding a shallow-depth goal.
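
To make the contrast concrete, here is a rough Python sketch of the two search orders over an abstract state space tree. The names children, is_goal, and is_promising are generic placeholders of my own choosing; a concrete problem supplies its own versions of them.

    from collections import deque

    def backtrack(node, children, is_goal, is_promising):
        """Depth-first search of the implicit state space tree.
        Only the current root-to-node path is kept (on the call stack)."""
        if is_goal(node):
            return node
        for child in children(node):
            if is_promising(child):              # bounding test
                found = backtrack(child, children, is_goal, is_promising)
                if found is not None:
                    return found
        return None                              # backtrack to the parent

    def fifo_branch_and_bound(root, children, is_goal, is_promising):
        """Breadth-first search; the generated nodes are kept in a queue,
        so a goal at shallow depth is found before deeper nodes are expanded."""
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if is_goal(node):
                return node
            queue.extend(c for c in children(node) if is_promising(c))
        return None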

We illustrate Backtracking with a simple example from games, namely, placing n queens on an n-by-n chessboard so that no two queens are attacking. A given queen can attack everything in its row or column, as well as anything on the two diagonals incident with its position. In particular, if there is a placement of the n queens where no two attack, then each column (or row) must contain exactly one queen. Thus, we can assume that the ith queen is placed in the ith column, and view the problem as making a sequence of n decisions, where the ith decision is the choice of row for the ith queen, i = 1, ..., n. Of course, the state space tree for this problem is enormous, i.e., it contains n^n nodes, but Backtracking will only visit a small portion of this tree due to bounding.

In the Backtracking solution, we try to place the ith queen in the first row where it won't be attacked by the placement of the previous i - 1 queens. If we can't find such a row, then we must backtrack and change the position of the last of the i - 1 queens for which such a change is possible. If there is no such queen, then there is no solution to the n-queens problem. In the following figure, we show a sequence of board positions generated by Backtracking for n = 4, where we only show the two positions where a backtrack must be made before a solution is found.

[Figure: board positions generated by Backtracking for n = 4, showing the two positions at which a backtrack occurs before a solution is found.]
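
A Python sketch of this Backtracking solution follows. The function name solve_n_queens and the numbering of rows and columns from 0 are conventions chosen for the sketch, not anything fixed by the problem.

    def solve_n_queens(n):
        """Return a list rows[0..n-1], where rows[i] is the row of the queen in
        column i, or None if no placement exists.  The columns play the role of
        the decisions; the bounding test rejects any row attacked by a queen
        already placed in an earlier column."""
        rows = []

        def attacked(row, col):
            for c, r in enumerate(rows):
                if r == row or abs(r - row) == abs(c - col):   # same row or diagonal
                    return True
            return False

        def place(col):
            if col == n:
                return True                      # all n queens placed
            for row in range(n):
                if not attacked(row, col):
                    rows.append(row)
                    if place(col + 1):
                        return True
                    rows.pop()                   # backtrack: undo the choice
            return False

        return rows if place(0) else None

    # For n = 4 this sketch finds the placement [1, 3, 0, 2] (rows numbered from 0).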

Of course, placing n queens is not a very practical problem, and moreover, there are solutions not based on Backtracking. However, there are a number of practical problems for which no solution is known other than variations of Backtracking. One such problem is CNF satisfiability. Recall that a formula involving boolean variables x1, x2, ..., xn and their negations is in Conjunctive Normal Form (CNF) if it is a conjunction of disjunctions. For example, the formula


(x1 .or. x3) .and. (-x1 .or. x2 .or. x3) .and. (-x1 .or. -x2) .and. (x2 .or. -x3)
is in CNF. The fundamental question about CNF formulas is their satisfiability, i.e., is there a truth assignment to the boolean variables in the formula which makes the formula evaluate to TRUE? Of course, given an assignment to the variables, it is easy to check whether or not it is a satisfying assignment by making a linear scan through the formula, checking whether each clause (= disjunction) contains a literal that evaluates to TRUE.
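
This check is easy to code. In the sketch below a literal is represented by a signed integer (i for xi, -i for its negation); this representation is a convention chosen for the example, not something required by the problem.

    def satisfies(cnf, assignment):
        """cnf is a list of clauses; each clause is a list of signed integers,
        where i stands for xi and -i for its negation.  assignment maps each
        variable index to True or False.  One linear scan checks that every
        clause contains at least one literal that evaluates to TRUE."""
        for clause in cnf:
            if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
                return False          # this clause has no true literal
        return True

    # The example formula above:
    formula = [[1, 3], [-1, 2, 3], [-1, -2], [2, -3]]
    # satisfies(formula, {1: False, 2: True, 3: True}) evaluates to True.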

The satisfiability problem is clearly suitable for Backtracking, since finding a satisfying assignment (or determining that none exists) can be viewed as making n decisions about the truth assignment to the n boolean variables. Note that the (static, see below) state space tree modeling the CNF satisfiability problem has 2^(n+1) - 1 nodes, making it infeasible for exhaustive search unless some bounding can be done.

The CNF satisfiability problem (also called the SAT problem) is one of the most famous in computer science, as it was the first decision problem to be proved NP-complete (Cook). Also, since the problem has many practical applications in such areas as verifying VLSI electronic circuits and in cryptology, it is a very active research area today. In fact, there is an annual international conference devoted to problems related to SAT, which includes an on-line contest for the best SAT solver on a given test suite of CNF formulas. This conference was held here in Cincinnati a couple of years ago, and some of our faculty (Franco, Schlipf) are internationally known researchers in this area.

There are many heuristics that have been developed to help bound the backtracking search of the CNF state space tree. Two simple ones amount to choosing the next variable to branch on according to which variable satisfies the most clauses. This is basically a greedy strategy (see Chapter 7). Preprocessing the input CNF formula and branching based on the fixed ordering of the variables given by this greedy strategy yields a state space tree that is static with respect to this ordering. Slightly more clever is to choose the variable to "branch on" next that satisfies the most clauses not yet satisfied (again, a type of greedy strategy, which agrees with the straightforward greedy strategy at the first decision). The state space tree in this case is dynamic (i.e., it varies from problem to problem as well as within a problem), as opposed to the static state space tree associated with the n-queens problem.
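
The following sketch combines Backtracking with the dynamic greedy heuristic: at each node the literal that would satisfy the most not-yet-satisfied clauses is made true first. It uses the same clause representation as the earlier sketch, and the details (tie-breaking, the falsified-clause bound) are one reasonable choice among many.

    def sat_backtrack(cnf):
        """Backtracking search for a satisfying assignment of a CNF formula,
        with clauses given as lists of signed integers.  The next variable to
        branch on is chosen greedily.  Returns a dict of truth values, or None."""
        variables = {abs(lit) for clause in cnf for lit in clause}

        def search(assignment):
            # clauses not yet satisfied by the partial assignment
            open_clauses = [c for c in cnf
                            if not any(abs(l) in assignment and
                                       assignment[abs(l)] == (l > 0) for l in c)]
            if not open_clauses:
                # every clause is satisfied; fill in any remaining variables
                return {v: assignment.get(v, True) for v in variables}
            # bound: some open clause has all of its literals already falsified
            if any(all(abs(l) in assignment for l in c) for c in open_clauses):
                return None
            # greedy choice: the unassigned literal occurring in the most open clauses
            counts = {}
            for c in open_clauses:
                for l in c:
                    if abs(l) not in assignment:
                        counts[l] = counts.get(l, 0) + 1
            lit = max(counts, key=counts.get)
            for value in ((lit > 0), not (lit > 0)):     # try the greedy value first
                assignment[abs(lit)] = value
                result = search(assignment)
                if result is not None:
                    return result
                del assignment[abs(lit)]                 # backtrack
            return None

        return search({})

    # On the example formula above (with this tie-breaking) the search branches on
    # x3, then x1, then x2 -- three nodes -- and returns {1: False, 2: True, 3: True}.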

The following figure illustrates a static state space tree for any CNF formula involving 3 boolean variables. Nodes of the tree correspond to variables, and edges are labeled 0 or 1 according to whether we assign FALSE or TRUE to the variable in the node. Triangular leaf nodes indicate that all three variables have been assigned truth values. The second figure shows the path in the tree corresponding to the unique satisfying assignment for the above CNF formula. Here we visited 10 nodes in the tree before finding the satisfying assignment (visiting a node corresponds to assigning a truth value to the variable in the node). Of course, if we could have chosen more cleverly, the assignment could have been found rather quickly. For example, if we had used the greedy heuristic mentioned above, we would only have visited 3 nodes before finding a solution. But nobody knows heuristics clever enough to avoid exponential worst-case performance for CNF satisfiability.


[Figures: the static state space tree on the variables x1, x2, x3, and the path in this tree corresponding to the satisfying assignment of the formula (x1 .or. x3) .and. (-x1 .or. x2 .or. x3) .and. (-x1 .or. -x2) .and. (x2 .or. -x3).]

In Branch-and-Bound, one generates all the children of a node in the state space tree before expanding any of those children. This forces one to explicitly maintain the nodes in the tree using something like a queue or a stack. Often the queue that is maintained is a priority queue, where nodes are given a priority based on some heuristic. In spite of the overhead incurred by explicitly maintaining the nodes of the tree, Branch-and-Bound can often be more efficient than Backtracking when solutions might be found in relatively shallow portions of the tree. However, for CNF, when there is little or no choice for a satisfying assignment, Backtracking will outperform Branch-and-Bound.
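
When a priority queue is used, the FIFO queue in the earlier sketch is simply replaced by a heap keyed on the heuristic value. In the sketch below the heuristic h is a placeholder supplied by the problem.

    import heapq
    from itertools import count

    def best_first_branch_and_bound(root, children, is_goal, is_promising, h):
        """Branch-and-Bound where the live nodes are kept in a priority queue
        and the node with the smallest heuristic value h(node) is expanded next."""
        tie = count()                      # tie-breaker so nodes themselves never compare
        heap = [(h(root), next(tie), root)]
        while heap:
            _, _, node = heapq.heappop(heap)
            if is_goal(node):
                return node
            for child in children(node):
                if is_promising(child):
                    heapq.heappush(heap, (h(child), next(tie), child))
        return None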

In optimization problems, such as the 0/1 Knapsack problem (see Chapter 7), bounding is also done by maintaining the best solution found so far in the search. If you can determine that a subtree cannot possibly contain a solution that is better than one already found, then the node can be bounded (the subtree pruned from the search). For example, a given node in the tree can be bounded if, even when fractional portions of the remaining objects in the subtree rooted at the node are allowed to be placed in the knapsack, you still could not improve an already generated solution.
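
A sketch of this fractional bound follows; it assumes the remaining objects are sorted by profit-to-weight ratio, largest first, and the parameter names are illustrative.

    def fractional_bound(profits, weights, capacity, k, current_profit, current_weight):
        """Upper bound on the profit attainable in the subtree rooted at a node
        where objects 0..k-1 have already been decided.  The remaining objects
        (sorted by profit/weight ratio, largest first) may be taken fractionally,
        which can only overestimate the best 0/1 completion."""
        bound = current_profit
        room = capacity - current_weight
        for i in range(k, len(profits)):
            if weights[i] <= room:
                room -= weights[i]
                bound += profits[i]
            else:
                bound += profits[i] * room / weights[i]   # take a fraction and stop
                break
        return bound

    # A node is pruned when fractional_bound(...) <= the best profit found so far.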

Our second illustration of Backtracking is the Sum of Subsets problem. The input to the Sum of Subsets problem is a set A of positive integers together with a given Sum, and the output is a subset S of A whose elements add up to Sum (or the message that no such subset exists). There are two natural (static) state space trees modeling this problem (and any problem that involves choosing elements from a given set, such as 0/1 Knapsack). Both models proceed by scanning the elements of A in the order a1, a2, .... Nodes in the variable tuple state space tree are represented explicitly by the indices of the chosen elements, whereas in the fixed tuple state space tree, nodes are represented by the 0/1 characteristic vector corresponding to the chosen elements. For example, if A = {a1, a2, a3, a4, a5}, then in the variable tuple state space tree the subset {a2, a4, a5} is represented by the node {2,4,5}, whereas in the fixed tuple state space tree it is represented by (0,1,0,1,1). The following figures illustrate the portion of the two different state space trees generated for A = {1, 4, 5, 10, 4} and Sum = 9.
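
Here is a Backtracking sketch for the decision version, phrased in terms of the fixed tuple (characteristic vector) representation. The two bounding tests used, namely that the running total may not exceed Sum and that the elements not yet considered must still be able to reach Sum, are the standard ones.

    def sum_of_subsets(a, target):
        """Backtracking search in the fixed tuple state space tree: the ith
        decision is whether a[i] is in the subset (1) or not (0)."""
        n = len(a)
        remaining = [sum(a[i:]) for i in range(n + 1)]   # remaining[i] = a[i] + ... + a[n-1]
        x = [0] * n                                      # characteristic vector

        def search(i, total):
            if total == target:
                return x[:i] + [0] * (n - i)             # pad the undecided positions with 0
            if i == n or total + remaining[i] < target:
                return None                              # target can no longer be reached
            for choice in (1, 0):                        # include a[i], then exclude it
                if choice == 0 or total + a[i] <= target:
                    x[i] = choice
                    result = search(i + 1, total + choice * a[i])
                    if result is not None:
                        return result
            return None

        return search(0, 0)

    # sum_of_subsets([1, 4, 5, 10, 4], 9) returns [1, 1, 0, 0, 1],
    # i.e., the subset {a1, a2, a5} = {1, 4, 4}.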

In the optimization version of Sum of Subsets, we want to find a subset having the fewest elements that sums to Sum. Here we keep track of the number of elements UB in the best solution found so far. When a node is examined that cannot lead to a solution with fewer than UB elements, it is bounded. We first illustrate this for Backtracking, using the same set A and Sum as before.
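
A sketch of the UB-bounded Backtracking search follows. The bound used here, pruning a node that already has UB - 1 or more chosen elements but has not reached Sum, is one simple way to realize the idea.

    def min_card_subset(a, target):
        """Find a subset of a summing to target with as few elements as possible.
        UB is the size of the best subset found so far; a node needing at least
        UB elements to complete is bounded."""
        n = len(a)
        best = {'subset': None, 'UB': n + 1}

        def search(i, total, chosen):
            if total == target:
                if len(chosen) < best['UB']:
                    best['UB'] = len(chosen)
                    best['subset'] = chosen[:]
                return
            if i == n or total > target:
                return
            if len(chosen) + 1 >= best['UB']:
                return                              # even one more element cannot beat UB
            chosen.append(a[i])
            search(i + 1, total + a[i], chosen)     # include a[i]
            chosen.pop()
            search(i + 1, total, chosen)            # exclude a[i]

        search(0, 0, [])
        return best['subset']

    # min_card_subset([1, 4, 5, 10, 4], 9) returns [4, 5] (two elements),
    # after first finding the three-element subset [1, 4, 4].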

The following figure shows the portion of the state space tree generated by FIFO Branch-and-Bound. You will note that there is no need to maintain the variables UB or LoBd in this problem, since the first goal node found will automatically be optimal because of the breadth-first nature of FIFO Branch-and-Bound.
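
A sketch of the FIFO Branch-and-Bound search on the variable tuple tree: nodes at depth d correspond to subsets of d elements, and all of them are dequeued before any node at depth d + 1, so the first goal node dequeued is optimal.

    from collections import deque

    def fifo_min_card_subset(a, target):
        """FIFO Branch-and-Bound for minimum-cardinality Sum of Subsets."""
        queue = deque([(0, 0, [])])          # (next index, running total, chosen values)
        while queue:
            i, total, chosen = queue.popleft()
            if total == target:
                return chosen                # first goal reached is optimal
            for j in range(i, len(a)):       # child: choose a[j] as the next element
                if total + a[j] <= target:
                    queue.append((j + 1, total + a[j], chosen + [a[j]]))
        return None

    # fifo_min_card_subset([1, 4, 5, 10, 4], 9) returns [4, 5].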

The fixed tuple state space tree is usually used for problems where the solutions (if any) are always leaf nodes in the tree. In other words, a solution sequence always involves exactly n decisions. Examples include graph coloring, boolean formula satisfiability, Traveling Salesman, and so forth. Sometimes, especially in optimization problems, the state space tree can be pruned from the outset. For example, suppose you want to determine the minimum number of colors needed to color a graph (so that no two adjacent vertices have the same color). If a graph has n vertices, then we need at most n colors. Considering the vertices in order, it is clear that we can restrict the color of vertex 1 to color 1. In the same way, we can restrict the color of vertex 2 to color 1 or 2, and so forth. In other words, considering the vertices in the order v1, v2, ..., vn, you can limit the color choices for vertex vi to colors 1, 2, ..., i.
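
A Backtracking sketch of minimum graph coloring that builds in this restriction: vertices are indexed from 0 here, so vertex i is offered colors 1 through i + 1, and the adjacency-list input format is just one convenient choice.

    def min_coloring(adj):
        """Minimum coloring of a graph given as an adjacency list adj
        (adj[i] lists the neighbours of vertex i).  Vertex i is only ever
        offered colors 1..i+1, which prunes colorings that merely permute
        the color names; best records the fewest colors used so far."""
        n = len(adj)
        color = [0] * n
        best = {'colors': n, 'assignment': list(range(1, n + 1))}   # n colors always suffice

        def search(i, used):
            if used >= best['colors']:
                return                       # cannot improve the best coloring found
            if i == n:
                best['colors'] = used
                best['assignment'] = color[:]
                return
            for c in range(1, min(i + 1, best['colors']) + 1):       # colors 1..i+1 only
                if all(color[j] != c for j in adj[i] if j < i):       # no colored neighbour uses c
                    color[i] = c
                    search(i + 1, max(used, c))
                    color[i] = 0

        search(0, 0)
        return best['colors'], best['assignment']

    # For a triangle, min_coloring([[1, 2], [0, 2], [0, 1]]) returns (3, [1, 2, 3]).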