In this article, we discuss two typical and educational problems which can be well solved using randomized algorithm - global min cut problem and median problem.
Global Min Cut
Problem Background
In graph theory, a cut is a partition of the vertices of a graph into two disjoint subsets. The size of a cut is defind as the number of edges across two partitions. A cut is minimum if the size of the cut is not larger than the size of any other cut. The illustration below shows a minimum cut: the size of this cut is 2.
The gloabl min cut problem is how to find the minimum cut in a given graph. A closely related problem is the min s-t cut problem, where two extral vertices s and t are provided. In the min s-t cut problem, we require s and t are divided by the cut. In other words, s should be in one vertex partition, and t should be in another.
One can solve the min s-t cut problem using any polynomial-time algorithm for max s-t flow, and the current fastest algorithm runs in
Karger’s Algorithm
Here we only consider multigraphs without self-loops, i.e., there are possibly multiple edges between a pair of distinct vertices but no edges linking to itself. Following is the randomized algorithm to find the global min cut.
In each loop, this algorithm randomly choose an edge from the graph, and do the edge contraction in line 3. During the edge contraction operation (
Note that after each edge contraction, the new graph is still a multigraph without self-loops.
If you remain confusion about the edge contraction after view the above picture, we provide a formal definition here. We define
to be the graph resulted from contraction of by:
- Replace
by a new vertex ; - Replace every edge
or where by a new edge .
The algorithm ends when there are only two vertices left in the graph, and output the number of edges between them.
In fact, the edge contraction process is partitioning the original vertex set, and the edges between last two vertices represent the cut of this partition.
Here is a lemma describing the success possibility of this algorithm:
The proof of this lemma is at the end of this article.
As we did before, we can boost the success possibility by multiple runs. Here, if we run this algorithm for
The first inequality we used here is union bound, which says that for a set of events, the probability that at least one of the events happens is no greater than the sum of the probabilities of the individual events; the second inequality is
, which is an extended form of Bernoulli’s inequality. These two inequalities are extremely useful in the possibility bounding of randomized algorithm.
So, the success possibility after multiple runs is at least
Finally, let’s look at the overall running time. If we use the adjacency matrix representation, every edge contraction takes
Karger-Stein Algorithm
First, let’s examine the possibility of global min cut surviving when only
Looking close to this inequality, we will find the lower bound of surviving possibility is close to one in initial loops. For example, after the first loop, where
We can observe that when
Defind
Solving the recursion gives
The running time of Algorithm2 satisfies the recursion:
Therefore,
Median
In this section, let’s disucss a easier one - the median problem, which is described as this: Given an unsorted list
QuickSelect is a simple randomized algorithm with expected running time
- Find a good “pivot”
; - Partition
into and ; - The median is in one of
and , and recurse.
We say the pivot
Solving the recursion gives
Here we introduce a simple strategy to find good pivots - picking random elements from
Knowing this, we can follow such a procedure to find a good pivot: Choose a random pivot and check if it is good; if not, repeat. The number of rounds needed to find a good one is
Checking whether a pivot
is good is equivalent to checking whether and , which takes time.
Conclusion
After going through these two basic problems, you may find that the design of randomized algorithm is not as complicated as normal algorithms. All the algorithms we discussed in this article are not longer than four lines! The intricate thing is to decide how many runs are neeeded to boost the success possibility to the
“Success possibility at least
” is a de facto standard in the theory field.
Proof
This section is optional to read.
Given
Let’s prove the lemma:
Denote the sequence of edges chosen and contracted by Algorithm 1 as
A key observation here is that Algorithm 1 outputs
By the chain rule, we have
For two events
and , the chain rule states that
wheredenotes the conditional probability of B given A.
Let’s look at the first term:
where
Next, we need to introduce a fact about the relationship between
The average degree of a graph
The symbol
means the degree of the vertex .
Since
Combining above two formulas, we get
So,
For the second term, suppose
More generally, for each
Finally, we get
as claimed.