Analysis and Design of Algorithms
1 Probability Bounds
Markov Inequality
For a nonnegative random variable $X$ and any $a > 0$,
$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}. \qquad (1)$$
If we have the variance of a random variable, we can apply more detailed analyses:
Chebyshev Inequality
Let $X$ be a random variable with $\mathbb{E}[X] = \mu$ and $\operatorname{Var}[X] = \sigma^2$. Then for any $t > 0$,
$$\Pr[|X - \mu| \ge t] \le \frac{\sigma^2}{t^2}. \qquad (2)$$
The proof is straightforward after applying the Markov inequality to $(X - \mu)^2$.
Chernoff Bound
Let $X_1, \dots, X_n$ be independent random variables and $X = \sum_i X_i$. The Chernoff bound takes advantage of higher-order moments of independent variables. For Bernoulli random variables $X_i$ with success probabilities $p_i$ and $\mu = \mathbb{E}[X] = \sum_i p_i$, for any $\delta > 0$,
$$\Pr[X \ge (1+\delta)\mu] \le \exp\!\left(-\frac{\delta^2 \mu}{2 + \delta}\right). \qquad (3)$$
For $0 < \delta < 1$,
$$\Pr[X \le (1-\delta)\mu] \le \exp\!\left(-\frac{\delta^2 \mu}{2}\right). \qquad (4)$$
The proof considers the random variable $e^{\lambda X}$, whose expectation (the moment generating function) contains higher-order information about $X$.
1.1 An Example: Median Finding
Goal:
• Input: a set $S$ of $n$ integers.
• Find the median $m$: at least $\lceil n/2 \rceil$ elements are $\le m$ and at least $\lceil n/2 \rceil$ elements are $\ge m$.
We want to find $a, b \in S$ such that $a \le m \le b$, with the set $C = \{x \in S : a \le x \le b\}$ small enough to be sorted in $o(n)$ time. Specifically, we do the following:
• Count the number of elements of $S$ larger than $b$ and the number smaller than $a$.
• Sort $C$ in $O(|C| \log |C|)$ time.
• Deduce the median from these counts and the sorted $C$.
How to choose $a$ and $b$? We first select a multi-set $R$ of $n^{3/4}$ elements i.i.d. (with replacement) from $S$, and sort $R$.
• Let $a$ be the $(\tfrac12 n^{3/4} - \sqrt{n})$-th smallest element of $R$ and $b$ the $(\tfrac12 n^{3/4} + \sqrt{n})$-th smallest.
• Compute $C = \{x \in S : a \le x \le b\}$, $\ell = |\{x \in S : x < a\}|$, and $u = |\{x \in S : x > b\}|$. If $\ell > n/2$ or $u > n/2$, we fail. If $|C| > 4n^{3/4}$, we fail.
• Otherwise, we sort $C$ and return the $(\lfloor n/2 \rfloor - \ell + 1)$-th smallest element in $C$.
The runtime is indeed $O(n)$, but we have to bound the probability of failure.
$E_1$: $a > m$ (so $\ell > n/2$). If $a > m$, then fewer than $\tfrac12 n^{3/4} - \sqrt{n}$ of the samples in $R$ are $\le m$.
$E_2$: $b < m$ (so $u > n/2$). Symmetric to $E_1$.
Each sample is $\le m$ with probability at least $1/2$, so the number of such samples has mean at least $\tfrac12 n^{3/4}$ and variance at most $\tfrac14 n^{3/4}$.
$\Pr[E_1], \Pr[E_2] \le O(n^{-1/4})$ by applying the Chebyshev bound. (One could get a better bound using the Chernoff bound.)
$E_3$: $|C| > 4n^{3/4}$. Then either more than $2n^{3/4}$ elements of $C$ are larger than $m$, or more than $2n^{3/4}$ are smaller than $m$. Suppose the first happens; then $b$ has rank at least $n/2 + 2n^{3/4}$ in $S$, so $R$ has at least $\tfrac12 n^{3/4} + \sqrt{n}$ samples that are among the $n/2 - 2n^{3/4}$ largest elements in $S$. (Consider the rightmost interval in $R$.) Let $X_i = 1$ if the $i$-th sample is larger than the element of rank $n/2 + 2n^{3/4}$, and $X = \sum_i X_i$. Then $\mathbb{E}[X] = \tfrac12 n^{3/4} - 2\sqrt{n}$ and $\operatorname{Var}[X] \le \tfrac14 n^{3/4}$, so by Chebyshev,
$$\Pr[E_3] \le O(n^{-1/4}). \qquad (5)$$
By the union bound, we fail with probability at most $O(n^{-1/4})$.
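The following is a minimal Python sketch of the sampling-based median algorithm described above. The parameter choices (sample size $n^{3/4}$, the $\pm\sqrt{n}$ offsets, the $4n^{3/4}$ size check) follow the standard analysis and are illustrative; on failure the sketch simply retries.

```python
import random

def approx_median(S):
    """Randomized median selection by sampling, as sketched in the notes."""
    n = len(S)
    if n <= 100:
        return sorted(S)[n // 2]
    r = int(n ** 0.75)
    R = sorted(random.choice(S) for _ in range(r))      # sample with replacement
    lo = max(0, r // 2 - int(n ** 0.5))
    hi = min(r - 1, r // 2 + int(n ** 0.5))
    a, b = R[lo], R[hi]
    less = sum(1 for x in S if x < a)                   # elements below a
    C = [x for x in S if a <= x <= b]                   # candidates around the median
    if less > n // 2 or less + len(C) <= n // 2 or len(C) > 4 * r:
        return approx_median(S)                         # fail: retry (prob. O(n^{-1/4}))
    C.sort()
    return C[n // 2 - less]                             # median sits inside C
```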
2 Flows
Given a directed graph $G = (V, E)$ with edge capacities and vertices $s, t$, compute the maximum $s$-$t$ flow.
Ford-Fulkerson algorithm: $O(mF)$ time, where $m = |E|$ and $F$ is the value of the maximum flow.
Edmonds-Karp: $O(m^2 n)$ time.
Theorem 2.1.
The maximum flow from $s$ to $t$ equals the capacity of the minimum cut separating $s$ from $t$.
Note: repeatedly augmenting along $s$-$t$ paths in the residual graph gives the maximum flow. In the end $s$ and $t$ are disconnected in the residual graph, which naturally yields a minimum cut.
2.1 Applications
Bipartite Matching. Use max-flow algorithms to find a maximum matching in a bipartite graph.
Theorem 2.2 (Hall).
A bipartite graph $G = (L \cup R, E)$ with $|L| = |R|$ has a perfect matching iff for every $S \subseteq L$, $|N(S)| \ge |S|$, where $N(S)$ is the set of neighbors of $S$.
Image Segmentation. Let $a_i$ be the cost of assigning pixel $i$ to the foreground, $b_i$ the cost of assigning it to the background, and $p_{ij}$ the cost of separating neighboring pixels $i$ and $j$. We want a partition into foreground $A$ and background $B$ minimizing
$$\sum_{i \in A} a_i + \sum_{i \in B} b_i + \sum_{(i,j):\, i \in A,\ j \in B} p_{ij}. \qquad (1)$$
This can be translated into a min-cut problem (connect the source to pixel $i$ with capacity $b_i$, pixel $i$ to the sink with capacity $a_i$, and neighbors with capacity $p_{ij}$).
2.2 Min-Cut
Instead of finding an $s$-$t$ min-cut, we find a global min cut: a partition of the vertices $(S, V \setminus S)$ minimizing the number of crossing edges.
Karger’s randomized algorithm
Repeatedly contract a uniformly random edge, merging its two endpoints, until only two supernodes are left; output the edges between them. The runtime is $O(n^2)$ ($O(n)$ for each edge contraction).
This randomized algorithm returns a particular min cut with probability at least
$$\frac{1}{\binom{n}{2}} = \frac{2}{n(n-1)}. \qquad (2)$$
Repeat $O(n^2 \log n)$ times, so the overall runtime is $O(n^4 \log n)$ to achieve a $1/\operatorname{poly}(n)$ failure probability.
Proof Sketch.
When there are $k$ supernodes left, the probability that the contracted edge belongs to the min cut is at most
$$\frac{c}{kc/2} = \frac{2}{k}, \qquad (3)$$
where $c$ is the size of the min cut. This is because each supernode must have degree at least $c$ if the minimum cut is $c$, so the current multigraph has at least $kc/2$ edges.
Multiplying the survival probabilities together, $\prod_{k=3}^{n}\big(1 - \tfrac{2}{k}\big) = \frac{2}{n(n-1)}$, which yields the final bound. ∎
Corollary 2.3.
There are at most $\binom{n}{2}$ min cuts in a graph.
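A minimal Python sketch of one run of Karger's contraction algorithm follows; the union-find bookkeeping and the repetition count in the comment are illustrative choices, not taken from the notes.

```python
import random

def karger_cut(edges, n):
    """One run of Karger's contraction on a connected graph with vertices
    0..n-1 given as a list of (u, v) edges; returns the cut size found."""
    parent = list(range(n))

    def find(u):                        # union-find with path compression
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    supernodes = n
    while supernodes > 2:
        u, v = random.choice(edges)     # uniform over surviving multi-edges
        ru, rv = find(u), find(v)
        if ru != rv:                    # skip self-loops of the contracted graph
            parent[ru] = rv             # contract: merge the two supernodes
            supernodes -= 1
    return sum(1 for u, v in edges if find(u) != find(v))

# Repeat and keep the best answer, e.g.:
# best = min(karger_cut(edges, n) for _ in range(n * n))
```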
2.3 The complexity class BPP: randomized polynomial time
BPP is the class of problems solvable by a randomized algorithm that runs in $\operatorname{poly}(n)$ time and is correct with probability at least $2/3$ on every input $x$.
Boosting: run an algorithm in BPP $k$ times independently and take a majority vote; then by the Chernoff bound, the error probability drops to
$$2^{-\Omega(k)}. \qquad (4)$$
3 Streaming Algorithms
Inputs arrive continuously and we cannot store all the data, so we need an online algorithm that processes each incoming item using little memory.
Setup.
Inputs: a stream $x_1, x_2, \dots, x_n$, where each $x_i \in \{1, \dots, m\}$.
Assumption: the length $n$ of the stream and the range $m$ are known.
Let $f_j = |\{i : x_i = j\}|$ be the number of occurrences of $j$.
$F_k = \sum_{j=1}^{m} f_j^k$, where $F_0$ is the number of distinct elements.
$F_1 = n$ can be maintained with a simple counter, space complexity $O(\log n)$.
What about $F_0$ and $F_2$? Can we find an algorithm with logarithmic space complexity?
3.1 Deterministic algorithm fails for
Assume there exists a deterministic algorithm that computes $F_0$ exactly using $s$ bits of memory. We construct a stream consisting of the elements of an arbitrary set
$$S \subseteq \{1, \dots, m\}. \qquad (5)$$
Run the algorithm on this stream and let its memory at the end be $\sigma$.
We then initialize the memory to $\sigma$ and feed a single element $j$ to the algorithm. There are two possibilities:
• $F_0$ increases by 1, if $j \notin S$,
• $F_0$ increases by 0, if $j \in S$.
It is thus possible to completely recover $S$ from $\sigma$, so $\sigma$ must take at least $2^m$ possible values. The space complexity is therefore $s = \Omega(m)$.
3.2 Randomized algorithm for
Pick a hash function $h : \{1, \dots, m\} \to \{-1, +1\}$.
• Initialize: set $z = 0$.
• Process $x_i$: add $h(x_i)$ to $z$.
• Return $z^2$.
Space: $z \in [-n, n]$, so it takes $O(\log n)$ space.
We show that $z^2$ is an unbiased estimator of $F_2$. For $j \in \{1, \dots, m\}$, let $Y_j = h(j)$. Then
$$z = \sum_{j=1}^{m} f_j Y_j, \qquad (6)$$
so
$$\mathbb{E}[z^2] = \sum_j f_j^2\,\mathbb{E}[Y_j^2] + \sum_{j \ne j'} f_j f_{j'}\,\mathbb{E}[Y_j Y_{j'}] = \sum_j f_j^2 = F_2. \qquad (7)$$
We now bound the variance of $z^2$ to apply tail bounds:
$$\mathbb{E}[z^4] = \sum_j f_j^4 + 3\sum_{j \ne j'} f_j^2 f_{j'}^2 \le 3 F_2^2, \qquad (8\text{--}10)$$
so $\operatorname{Var}[z^2] = \mathbb{E}[z^4] - F_2^2 \le 2 F_2^2$. Averaging $k$ independent copies reduces the variance to $2F_2^2/k$, and by Chebyshev,
$$\Pr\big[|\hat F_2 - F_2| \ge \epsilon F_2\big] \le \frac{2}{k\epsilon^2}. \qquad (11)$$
This bound is useful, but not fully satisfactory. Actually, using the median-of-means trick, we can get an exponential boost, i.e., a $(1 \pm \epsilon)$-approximation with probability at least $1 - \delta$ using space $O(\epsilon^{-2}\log(1/\delta)\log n)$.
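Here is a small Python sketch of the $F_2$ estimator with the median-of-means trick. The number of rows/columns and the lazily stored random signs are illustrative assumptions; a genuine streaming implementation would use the 4-wise independent hash functions of Section 3.4.

```python
import random
from collections import defaultdict

class F2Sketch:
    """Sketch of the F2 estimator: z = sum_i h(x_i), estimate F2 by z^2,
    boosted by taking a median of means over independent copies."""
    def __init__(self, rows=9, cols=20):
        self.cols = cols
        self.z = [[0] * cols for _ in range(rows)]
        # one lazily-populated +/-1 hash per copy (illustrative, not 4-wise)
        self.sign = [defaultdict(lambda: random.choice((-1, 1)))
                     for _ in range(rows * cols)]

    def process(self, x):
        for r in range(len(self.z)):
            for c in range(self.cols):
                self.z[r][c] += self.sign[r * self.cols + c][x]

    def estimate(self):
        means = [sum(v * v for v in row) / self.cols for row in self.z]
        means.sort()
        return means[len(means) // 2]       # median of means
```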
3.3 Derandomization
Definition 3.1.
A family of random variables $Y_1, \dots, Y_m$ is called $k$-wise independent if for every $k$-tuple of distinct indices $i_1, \dots, i_k$, the variables $Y_{i_1}, \dots, Y_{i_k}$ are independent.
Specifically, when they are $m$-wise independent, the family is called fully independent.
Remark. The randomized algorithm above does not need full independence. In fact, all we need is 4-wise independence (required in the analysis of the variance, i.e., the fourth moment). This is also the reason why we can store such a hash function in $O(\log m)$ bits rather than the $\Omega(m)$ bits a fully random function would require.
Definition 3.2.
A family $H$ of hash functions $h : [m] \to [M]$ is called $k$-wise independent if for any distinct points $x_1, \dots, x_k$ and any values $y_1, \dots, y_k$,
$$\Pr_{h \in H}\big[h(x_1) = y_1 \wedge \dots \wedge h(x_k) = y_k\big] = \frac{1}{M^k}. \qquad (12)$$
3.4 Construction of -wise independent hash functions
Let $p \ge m$ be a prime, and pick $a, b$ uniformly at random from $\mathbb{Z}_p$:
$$h_{a,b}(x) = a x + b \bmod p. \qquad (13)$$
Then for any $x \ne x'$ and any $y, y'$,
$$\Pr_{a,b}\big[h_{a,b}(x) = y \wedge h_{a,b}(x') = y'\big] = \frac{1}{p^2}, \qquad (14)$$
since the linear system $ax + b = y$, $ax' + b = y'$ has exactly one solution $(a, b)$ over $\mathbb{Z}_p$.
So this is a 2-wise independent hash function family. We get $p$ random variables using only $O(\log p)$ bits of space.
We can extend to degree-$(k-1)$ polynomials $h(x) = a_{k-1}x^{k-1} + \dots + a_1 x + a_0 \bmod p$ to create a $k$-wise independent hash function family, with space complexity $O(k \log p)$.
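A compact Python sketch of the 2-wise independent family $h_{a,b}(x) = ax + b \bmod p$; the fixed Mersenne prime and the final reduction mod $M$ (which introduces a small, usually negligible bias) are illustrative assumptions.

```python
import random

class PairwiseHash:
    """2-wise independent hash h_{a,b}(x) = (a*x + b) mod p, mapped into
    {0, ..., M-1}.  p is assumed to be a prime at least the universe size."""
    def __init__(self, M, p=(1 << 61) - 1):
        self.p, self.M = p, M
        self.a = random.randrange(1, p)
        self.b = random.randrange(p)

    def __call__(self, x):
        return ((self.a * x + self.b) % self.p) % self.M

# e.g. +/-1 signs for the F2 sketch:  h = PairwiseHash(2);  sign = 2 * h(x) - 1
```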
3.5 Algorithm for $F_0$.
Pick a pairwise independent hash function $h : [m] \to [m]$.
• Initialize $z = 0$.
• Process $x_i$: let $r(x_i)$ be the exponent of the largest power of 2 that divides $h(x_i)$ (the number of trailing zero bits). If $r(x_i) > z$, set $z = r(x_i)$.
• Return $2^{z + 1/2}$.
Intuition:
If the stream has $d$ distinct elements, there is a good chance that one of the values $h(x_i)$ will be divisible by $2^{\log_2 d}$.
If there are no more than $d$ distinct elements, it is unlikely that some $h(x_i)$ has many more than $\log_2 d$ zeros in a row.
Let $Z_{r,j}$ be the indicator random variable that $h(j)$ is divisible by $2^r$, and let $Z_r = \sum_{j \in D} Z_{r,j}$, where $D$ is the set of distinct elements and $|D| = d$. If $Z_r = 0$, no element has $r$ or more trailing zeros, so $z < r$.
$$\mathbb{E}[Z_r] = \frac{d}{2^r}, \qquad \operatorname{Var}[Z_r] \le \frac{d}{2^r}, \qquad (15)$$
so by Markov,
$$\Pr[Z_r > 0] \le \mathbb{E}[Z_r] = \frac{d}{2^r}, \qquad (16)$$
and by Chebyshev,
$$\Pr[Z_r = 0] \le \frac{\operatorname{Var}[Z_r]}{\mathbb{E}[Z_r]^2} \le \frac{2^r}{d}. \qquad (17)$$
Hence, with constant probability, the returned value $2^{z+1/2}$ is within a constant factor of $d$.
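A short Python sketch of this distinct-elements estimator; the pairwise hash is instantiated with the linear construction of Section 3.4 (the prime and parameters are illustrative).

```python
import random

def f0_estimate(stream):
    """Estimate the number of distinct elements via the largest number of
    trailing zero bits of h(x) seen, returning 2^(z + 1/2)."""
    p = (1 << 61) - 1
    a, b = random.randrange(1, p), random.randrange(p)
    z = 0
    for x in stream:
        hx = (a * x + b) % p
        r = (hx & -hx).bit_length() - 1 if hx else 0   # trailing zeros of h(x)
        z = max(z, r)
    return 2 ** (z + 0.5)
```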
4 Johnson-Lindenstrauss Lemma
The JL lemma states that we can map data lying in a high-dimensional space to a low-dimensional one while preserving some of the structure of the data.
Lemma 1 (JL Lemma).
Let $X$ be a set of $n$ points in $\mathbb{R}^d$ and $\epsilon \in (0, 1/2)$. For some dimension $k = O(\epsilon^{-2}\log n)$, there exists a matrix $A \in \mathbb{R}^{k \times d}$ such that for all $u, v \in X$,
$$(1 - \epsilon)\|u - v\|^2 \le \|Au - Av\|^2 \le (1 + \epsilon)\|u - v\|^2. \qquad (18)$$
4.1 Norm Preservation
Lemma 2.
For any $\epsilon, \delta \in (0, 1/2)$ and integer $d$, there exists a distribution on $k \times d$ real matrices with $k = O(\epsilon^{-2}\log(1/\delta))$, such that for any unit vector $x \in \mathbb{R}^d$,
$$\Pr_A\big[\,\big|\|Ax\|^2 - 1\big| > \epsilon\,\big] < \delta. \qquad (19)$$
Proof of JL Lemma by Lemma 2.
Let $\delta = 1/n^3$, so $k = O(\epsilon^{-2}\log n)$. We apply Lemma 2 to the normalized difference $(u-v)/\|u-v\|$ for each of the $\binom{n}{2}$ pairs $u, v \in X$, take a union bound, and the proof is done. ∎
In fact, a matrix with i.i.d. Gaussian entries $A_{ij} \sim N(0, 1/k)$ is the construction we need for this lemma.
Proof of Lemma 2.
Choose each entry of $A$ independently as $A_{ij} \sim N(0, 1/k)$. For a unit vector $x$,
$$(Ax)_i = \sum_j A_{ij} x_j \sim N(0, 1/k). \qquad (20)$$
This is a Gaussian random variable with variance $1/k$, so $Y_i = \sqrt{k}\,(Ax)_i \sim N(0,1)$ and
$$\|Ax\|^2 = \frac{1}{k}\sum_{i=1}^{k} Y_i^2, \qquad \mathbb{E}\|Ax\|^2 = 1. \qquad (21)$$
Now, for any $0 < \lambda < 1/2$,
$$\Pr\big[\|Ax\|^2 > 1+\epsilon\big] = \Pr\Big[\sum_i Y_i^2 > (1+\epsilon)k\Big] \le \frac{\mathbb{E}\big[e^{\lambda \sum_i Y_i^2}\big]}{e^{\lambda(1+\epsilon)k}} = \frac{(1 - 2\lambda)^{-k/2}}{e^{\lambda(1+\epsilon)k}}. \qquad (22\text{--}26)$$
The last equality holds because $\mathbb{E}[e^{\lambda Y^2}] = (1-2\lambda)^{-1/2}$ for $Y \sim N(0,1)$ and $\lambda < 1/2$. Then, choosing $\lambda = \frac{\epsilon}{2(1+\epsilon)}$,
$$\Pr\big[\|Ax\|^2 > 1+\epsilon\big] \le \big((1+\epsilon)e^{-\epsilon}\big)^{k/2} \le e^{-k\epsilon^2/8}. \qquad (27)$$
The lower tail is analogous, so $k = O(\epsilon^{-2}\log(1/\delta))$ suffices. ∎
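The construction above is easy to try numerically; the following numpy sketch uses an i.i.d. Gaussian projection matrix (the constant 8 in the target dimension is an illustrative choice, not from the notes).

```python
import numpy as np

def jl_project(X, eps=0.5, seed=0):
    """Project the rows of the (n, d) array X to k = O(log n / eps^2)
    dimensions with a k x d matrix of i.i.d. N(0, 1/k) entries."""
    n, d = X.shape
    k = max(1, int(np.ceil(8 * np.log(n) / eps ** 2)))
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return X @ A.T

# Pairwise squared distances of jl_project(X) approximate those of X up to 1 +/- eps.
```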
4.2 Packing
In $\mathbb{R}^k$ we can only have $k$ perfectly orthogonal vectors. Let $e_1, \dots, e_d \in \mathbb{R}^d$ be unit orthogonal vectors. After applying the JL lemma with distortion $\epsilon$, we have $\|Ae_i\|^2 = 1 \pm \epsilon$, and since $\|Ae_i - Ae_j\|^2 = 2 \pm 2\epsilon$, also $|\langle Ae_i, Ae_j\rangle| \le O(\epsilon)$ for $i \ne j$.
This means that for $k = O(\epsilon^{-2}\log d)$, there exist $d$ (exponentially many in $\epsilon^2 k$) pairwise $\epsilon$-near-orthogonal vectors in $\mathbb{R}^k$.
Lower bound. Let $u, v$ be two $\epsilon$-near-orthogonal unit vectors in $\mathbb{R}^k$; then $\|u - v\| \ge 1$, so the two balls of radius $1/2$ centered at $u$ and $v$ are disjoint. All such balls fit inside the ball of radius $3/2$, whose volume is $3^k$ times that of a radius-$1/2$ ball, so there are at most $2^{O(k)}$ such vectors.
With a more fine-grained analysis, one can get a tighter lower bound on the dimension $k$ required.
4.3 Application: Approximate Nearest Neighbor
Given $n$ points $P \subseteq \mathbb{R}^d$ and a query $q$, find the point of $P$ closest to $q$.
If $d$ is small, we have efficient data structures with quasi-linear space and polylogarithmic query time (with constants exponential in $d$).
If $d$ is large, known exact data structures need either space exponential in $d$ with fast queries, or linear space with query time $O(nd)$ (a linear scan).
Definition 4.1 ($(1+\epsilon)$-Approximate NN).
Given $\epsilon > 0$, a $(1+\epsilon)$-approximate nearest neighbor of $q$ is a point $p \in P$ such that $\|p - q\| \le (1+\epsilon)\min_{p' \in P}\|p' - q\|$.
Assume the distance from $q$ to its closest point is 1 (this can be arranged by scaling and guessing the distance).
• Preprocess:
Fix a grid on $\mathbb{R}^d$ with cells of side length $\epsilon/\sqrt{d}$. Let $C$ be the set of grid cells that contain at least one point at distance at most 1 from some point of $P$.
Store, for each cell in $C$, a point of $P$ within distance $1 + \epsilon$ of that cell.
• The space consumption for each $p \in P$ is the number of cells within distance 1 of $p$:
$$\left(O\!\Big(\frac{\sqrt{d}}{\epsilon}\Big)\right)^{d} = \left(\frac{1}{\epsilon}\right)^{O(d)}. \qquad (28)$$
• The query time is $O(d)$: locate the cell containing $q$ and look up the stored point.
How can we do better using the JL lemma?
In the preprocessing phase, project the points to a space of dimension $k = O(\epsilon^{-2}\log n)$ and build the grid data structure there.
The space cost is then
$$n \cdot \left(\frac{1}{\epsilon}\right)^{O(k)} = n^{O(\epsilon^{-2}\log(1/\epsilon))}. \qquad (29)$$
The time cost of the projection during a query is $O(dk)$, so the query time is
$$O\!\left(\frac{d\log n}{\epsilon^{2}}\right). \qquad (30)$$
5 Linear Programming
Standard form
$$\min\ c^\top x \quad \text{s.t.}\ Ax \ge b,\ x \ge 0. \qquad (31)$$
Dual problem
$$\max\ b^\top y \quad \text{s.t.}\ A^\top y \le c,\ y \ge 0. \qquad (32\text{--}33)$$
5.1 Uncapacitated Facility Location
Let $D$ be the set of clients and $F$ a set of facilities. There is a cost $f_i$ to open facility $i$ and a cost $c_{ij}$ to connect client $j$ to an open facility $i$.
Goal: choose a subset of facilities to open and an assignment of clients to open facilities that minimizes the total opening plus connection cost.
The connection costs $c$ should satisfy the triangle inequality.
We first write an equivalent integer program:
$$\min\ \sum_i f_i y_i + \sum_{i,j} c_{ij} x_{ij} \quad \text{s.t.}\ \sum_i x_{ij} = 1\ \forall j,\quad x_{ij} \le y_i,\quad x_{ij}, y_i \in \{0,1\}. \qquad (34)$$
This problem is in general NP-hard. Hence we relax the integer constraint to $x_{ij}, y_i \in [0,1]$.
How do we round a solution of the linear program to an integer solution?
Fix an order for the clients (chosen below). Let $N(j) = \{i : x^*_{ij} > 0\}$ be the set of facilities client $j$ is fractionally connected to. We say $j$ and $j'$ are close if $N(j) \cap N(j') \ne \emptyset$.
Dual problem:
$$\max\ \sum_j v_j \qquad (35)$$
$$\text{s.t.}\quad v_j - w_{ij} \le c_{ij} \quad \forall i, j, \qquad (36)$$
$$\sum_j w_{ij} \le f_i \quad \forall i, \qquad (37)$$
$$w_{ij} \ge 0. \qquad (38)$$
Greedy deterministic LP rounding
• Solve the primal and dual LPs to obtain $(x^*, y^*)$ and $(v^*, w^*)$.
• Order the clients according to increasing $v^*_j$.
While unassigned clients remain:
• Let $j$ be the unassigned client with smallest $v^*_j$, and let $i \in N(j)$ minimize $f_i$. Open facility $i$, assign it to client $j$. Remove $j$ from the unassigned set.
• For every unassigned client $j'$ that is close to $j$, assign $j'$ to $i$. Remove $j'$ from the unassigned set.
Analysis.
Note that $\sum_{i' \in N(j)} x^*_{i'j} = 1$, so for the facility $i$ opened for client $j$ we have
$$f_i = \min_{i' \in N(j)} f_{i'} \le \sum_{i' \in N(j)} f_{i'} x^*_{i'j} \le \sum_{i' \in N(j)} f_{i'} y^*_{i'}, \qquad (39\text{--}40)$$
using the LP constraint $x^*_{i'j} \le y^*_{i'}$. Since the sets $N(j)$ of the clients picked in the loop are disjoint, the total opening cost is at most $\sum_i f_i y^*_i$.
Suppose a client $j'$ is assigned to facility $i$ because $j'$ is close to $j$, say via $i' \in N(j) \cap N(j')$. Then by the triangle inequality,
$$c_{ij'} \le c_{i'j'} + c_{i'j} + c_{ij} \le v^*_{j'} + v^*_j + v^*_j \le 3 v^*_{j'}, \qquad (41\text{--}43)$$
where the last step uses that $j$ precedes $j'$ in the ordering, i.e., $v^*_j \le v^*_{j'}$. This is because complementary slackness implies
$$x^*_{ij} > 0 \ \Rightarrow\ c_{ij} = v^*_j - w^*_{ij} \le v^*_j. \qquad (44)$$
Therefore,
$$\text{total cost} \le \sum_i f_i y^*_i + 3\sum_j v^*_j \le 4\cdot \mathrm{OPT}_{\mathrm{LP}} \le 4\cdot \mathrm{OPT}. \qquad (45\text{--}46)$$
6 Semi-definite programming
We consider the Max-Cut problem on a graph $G = (V, E)$. We want to choose a subset $S \subseteq V$ to maximize $|E(S, V\setminus S)|$, i.e., to maximize the number of edges connecting a node in $S$ to a node outside $S$.
Integer program for Max-Cut
$$\max\ \sum_{(i,j)\in E} \frac{1 - x_i x_j}{2} \quad \text{s.t.}\ x_i \in \{-1, +1\}. \qquad (47)$$
Randomly cutting the graph (placing each vertex in $S$ independently with probability $1/2$) gives a $1/2$-approximation:
$$\mathbb{E}[\text{cut}] = \sum_{(i,j)\in E}\Pr[x_i \ne x_j] = \frac{|E|}{2} \ge \frac{1}{2}\,\mathrm{OPT}. \qquad (48\text{--}49)$$
6.1 SDP definition
Definition 6.1 (Positive semi-definite matrix).
Let $A \in \mathbb{R}^{n\times n}$ be symmetric. $A$ is positive semi-definite, written $A \succeq 0$, if the following equivalent statements are true:
• $x^\top A x \ge 0$ for all $x \in \mathbb{R}^n$,
• $A = V^\top V$ for some matrix $V$,
• all eigenvalues of $A$ are nonnegative.
Definition 6.2 (SDP standard form).
$$\min\ \langle C, X\rangle \quad \text{s.t.}\ \langle A_i, X\rangle = b_i \ (i = 1, \dots, m),\quad X \succeq 0. \qquad (50)$$
Dual problem
$$\max\ b^\top y \quad \text{s.t.}\ \sum_i y_i A_i \preceq C. \qquad (51\text{--}53)$$
To derive this, remember that we want
$$\Big\langle C - \sum_i y_i A_i,\ X\Big\rangle \ge 0 \qquad (54)$$
for all $X \succeq 0$, which means $C - \sum_i y_i A_i \succeq 0$.
6.2 Strong duality
Theorem 6.3 (Slater’s condition).
Strong duality holds for SDP problems if the feasible region has an interior point.
6.3 Max Cut
Now we relax the max cut problem to a SDP problem.
(55) | ||||
(56) |
Then let be the adjacency matrix and be the gram matrix. Then this is a SDP.
Theorem 6.4 (This is an SDP!).
Let be the adjacency matrix, if and otherwise. We define to be the gram matrix . Then is equivalent to the following SDP
(57) |
Theorem 6.5 (Goemans-Williamson).
Solve the SDP to obtain $X^*$; then $\mathrm{SDP} \ge \mathrm{OPT}$. Find vectors $v_1, \dots, v_n$ by decomposing $X^* = V^\top V$. Choose a vector $r$ uniformly on the sphere $S^{n-1}$.
Set $x_i = \operatorname{sign}(\langle v_i, r\rangle)$. The expected cut value is at least $\alpha_{\mathrm{GW}} \cdot \mathrm{SDP} \ge \alpha_{\mathrm{GW}} \cdot \mathrm{OPT}$ with $\alpha_{\mathrm{GW}} \approx 0.878$, since $\Pr[x_i \ne x_j] = \frac{\arccos\langle v_i, v_j\rangle}{\pi} \ge \alpha_{\mathrm{GW}}\cdot\frac{1 - \langle v_i, v_j\rangle}{2}$.
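The rounding step is easy to implement once the SDP vectors are available. The numpy sketch below assumes the rows of V are unit vectors obtained from some SDP solver (not included here); the number of random hyperplanes is an illustrative choice.

```python
import numpy as np

def hyperplane_round(V, edges, trials=50, seed=0):
    """Goemans-Williamson style rounding: V is an (n, r) array of unit
    vectors with X* = V V^T; returns the best cut found over several
    random hyperplanes."""
    rng = np.random.default_rng(seed)
    best_cut, best_x = -1, None
    for _ in range(trials):
        r = rng.normal(size=V.shape[1])
        x = np.sign(V @ r)                    # x_i = sign(<v_i, r>)
        x[x == 0] = 1
        cut = sum(1 for (i, j) in edges if x[i] != x[j])
        if cut > best_cut:
            best_cut, best_x = cut, x
    return best_cut, best_x
```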
6.4 Quadratic programs
$$\max\ \sum_{i,j} a_{ij}\, x_i y_j \quad \text{s.t.}\ x_i, y_j \in \{-1, +1\}. \qquad (58)$$
This problem is hard. Specifically, it is at least as hard as Max-Cut: make the diagonal entries $a_{ii}$ extremely large, so that $x_i = y_i$ must hold in any optimal solution; then Max-Cut reduces to this problem.
Relaxation
$$\max\ \sum_{i,j} a_{ij} \langle u_i, v_j\rangle \quad \text{s.t.}\ \|u_i\| = \|v_j\| = 1. \qquad (59)$$
Let $X$ be the Gram matrix of $(u_1, \dots, u_n, v_1, \dots, v_m)$; the objective is a linear function of the off-diagonal blocks of $X$, and the constraints are $X_{kk} = 1$ and $X \succeq 0$.
This is an SDP.
Theorem 6.6 (Grothendieck's inequality).
$$\mathrm{SDP} \le K_G \cdot \mathrm{OPT} \quad \text{for an absolute constant } K_G. \qquad (60)$$
We first try the hyperplane rounding we used in Goemans-Williamson.
The SDP contributes $a_{ij}\langle u_i, v_j\rangle$ per pair.
The rounding contributes $\mathbb{E}\big[a_{ij}\operatorname{sign}\langle u_i, r\rangle \operatorname{sign}\langle v_j, r\rangle\big] = a_{ij}\cdot\frac{2}{\pi}\arcsin\langle u_i, v_j\rangle$.
However, $a_{ij}$ could be negative, so we cannot compare the two quantities term by term.
Fortunately, we have the following lemma, which directly proves the theorem: we first apply the transformation in the lemma and then do the hyperplane rounding.
Lemma 3.
For any unit vectors $u_1, \dots, u_n, v_1, \dots, v_m$, there exist an absolute constant $c > 0$ and a new set of unit vectors $u'_1, \dots, u'_n, v'_1, \dots, v'_m$ such that
$$\frac{2}{\pi}\arcsin\langle u'_i, v'_j\rangle = c\,\langle u_i, v_j\rangle \quad \text{for all } i, j. \qquad (61)$$
Proof.
Define tensor powers $u^{\otimes k}$, which satisfy $\langle u^{\otimes k}, v^{\otimes k}\rangle = \langle u, v\rangle^k$, and let $u'_i, v'_j$ be weighted direct sums of tensor powers of $u_i, v_j$, so that inner products realize a power series in $\langle u_i, v_j\rangle$.
By the Taylor expansion,
$$\sin\big(c\langle u_i, v_j\rangle\big) = \sum_{k\ge 0}\frac{(-1)^k c^{2k+1}}{(2k+1)!}\,\langle u_i, v_j\rangle^{2k+1}. \qquad (62)$$
∎
Concretely, let
$$u'_i = \bigoplus_{k\ge 0}\sqrt{\tfrac{c^{2k+1}}{(2k+1)!}}\; u_i^{\otimes(2k+1)} \qquad (63)$$
and
$$v'_j = \bigoplus_{k\ge 0}(-1)^k\sqrt{\tfrac{c^{2k+1}}{(2k+1)!}}\; v_j^{\otimes(2k+1)}. \qquad (64)$$
Then $\langle u'_i, v'_j\rangle = \sin\big(c\langle u_i, v_j\rangle\big)$.
And $\|u'_i\|^2 = \|v'_j\|^2 = \sinh(c) = 1$ when $c = \ln(1+\sqrt 2)$. This is because removing the signs in the Taylor expansion yields $\sinh$.
Then hyperplane rounding of the $u'_i, v'_j$ gives
$$\mathrm{OPT} \ \ge\ \frac{2c}{\pi}\cdot\mathrm{SDP}, \quad\text{i.e.,}\quad K_G \le \frac{\pi}{2\ln(1+\sqrt 2)}. \qquad (65)$$
Lovász theta function
Let $\alpha(G)$ be the size of the largest independent set of $G$.
Let $\chi(G)$ be the chromatic number of $G$, and let $\bar G$ be the complement of $G$.
Then
$$\alpha(G) \le \vartheta(G) \le \chi(\bar G). \qquad (66)$$
We define
$$\vartheta(G) = \max\ \sum_{i,j} X_{ij} \quad \text{s.t.}\ \operatorname{tr}(X) = 1,\quad X_{ij} = 0\ \ \forall (i,j)\in E,\quad X \succeq 0. \qquad (67\text{--}69)$$
Theorem 6.7.
$\alpha(G) \le \vartheta(G) \le \chi(\bar G)$.
Proof.
First, $\vartheta(G) \le \chi(\bar G)$.
Fact: for every $k$ there exist $k$ unit vectors $u_1, \dots, u_k$ such that $\langle u_a, u_b\rangle = -\frac{1}{k-1}$ for $a \ne b$.
Then any $k$-coloring of $\bar G$ yields a feasible dual solution of value $k$, so $\vartheta(G) \le k$.
The other direction: $\alpha(G) \le \vartheta(G)$. For an independent set $S$, consider the indicator vector $\mathbb{1}_S$ of the independent set and take
$$X = \frac{1}{|S|}\,\mathbb{1}_S\mathbb{1}_S^\top. \qquad (70)$$
Then $X$ is feasible ($\operatorname{tr}(X) = 1$, $X \succeq 0$, and $X_{ij} = 0$ for every edge since $S$ is independent). However, its objective value is $\sum_{i,j} X_{ij} = |S|$, so $\vartheta(G) \ge \alpha(G)$. ∎
7 Online Algorithms
7.1 Multiplicative Weights
We wish to make predictions over $T$ days, relying on $n$ experts. Define $\ell_i^{(t)}$ to be the loss of expert $i$ on day $t$. Without loss of generality, $\ell_i^{(t)} \in [0,1]$.
Our goal is to minimize the regret
$$R = \sum_{t=1}^{T}\ell_{\mathrm{alg}}^{(t)} - \min_i \sum_{t=1}^{T}\ell_i^{(t)}. \qquad (71)$$
Naive idea: always pick the expert that has been best so far. Then there exists a sequence of losses for which the regret is $\Omega(T)$.
Weighted Majority. Assume we are making binary predictions. Assign each expert $i$ a weight $w_i$. We make a majority vote and predict
$$\operatorname{argmax}_{b\in\{0,1\}} \sum_{i\,\text{predicts}\, b} w_i. \qquad (72)$$
Assumption: there is an expert who is always correct.
Algorithm: remove the experts with wrong predictions. Then each time we make a mistake, at least half of the remaining experts are removed, so we make at most $\log_2 n$ mistakes. This is basically the idea of the multiplicative weights algorithm.
If we remove the assumption that some expert is always correct, we can restart after all experts have been removed. The bound is $O((m+1)\log_2 n)$ mistakes if the best expert makes $m$ mistakes.
The idea of multiplicative weights on binary outcomes:
• Set $w_i^{(1)} = 1$ for every expert $i$.
• After observing the outcome at day $t$, set $w_i^{(t+1)} = w_i^{(t)}/2$ if expert $i$ made a wrong prediction (and keep it unchanged otherwise); predict by weighted majority.
Analysis: let $\Phi^{(t)} = \sum_i w_i^{(t)}$. Then whenever we make a mistake at day $t$, at least half of the total weight gets halved, so
$$\Phi^{(t+1)} \le \frac{3}{4}\Phi^{(t)}, \qquad (73)$$
while for the best expert $i^*$ with $m^*$ mistakes,
$$\Phi^{(T+1)} \ge w_{i^*}^{(T+1)} = 2^{-m^*}. \qquad (74)$$
Therefore, if we make $M$ mistakes,
$$2^{-m^*} \le n\left(\frac{3}{4}\right)^{M}, \quad\text{i.e.,}\quad M \le \frac{m^* + \log_2 n}{\log_2(4/3)} \approx 2.41\,(m^* + \log_2 n). \qquad (75)$$
However, this does not guarantee a sublinear bound on the regret, due to the multiplicative constant. We now state the full version of multiplicative weights.
Multiplicative weights algorithm
• Set $w_i^{(1)} = 1$.
• For $t = 1, \dots, T$:
– Follow expert $i$ with probability $p_i^{(t)} = w_i^{(t)}/\sum_j w_j^{(t)}$.
– Observe the losses $\ell_i^{(t)}$.
– Update $w_i^{(t+1)} = w_i^{(t)}\big(1 - \epsilon\,\ell_i^{(t)}\big)$.
Theorem 7.1.
If $\epsilon \le 1/2$, then multiplicative weights gives the following bound on the expected total loss:
$$\sum_t \langle p^{(t)}, \ell^{(t)}\rangle \le \min_i \sum_t \ell_i^{(t)} + \epsilon T + \frac{\ln n}{\epsilon}. \qquad (76)$$
If $T$ is known, set $\epsilon = \sqrt{\ln n / T}$; the regret is then at most $2\sqrt{T\ln n}$.
Proof.
Define a potential function $\Phi^{(t)} = \sum_i w_i^{(t)}$. Then
$$\Phi^{(t+1)} = \sum_i w_i^{(t)}\big(1 - \epsilon\ell_i^{(t)}\big) = \Phi^{(t)}\big(1 - \epsilon\langle p^{(t)}, \ell^{(t)}\rangle\big) \le \Phi^{(t)} e^{-\epsilon\langle p^{(t)}, \ell^{(t)}\rangle}. \qquad (77)$$
Therefore,
$$\Phi^{(T+1)} \le n\, e^{-\epsilon\sum_t \langle p^{(t)}, \ell^{(t)}\rangle}. \qquad (78)$$
Hence $\ln\Phi^{(T+1)} \le \ln n - \epsilon\sum_t\langle p^{(t)}, \ell^{(t)}\rangle$.
On the other hand, let $i$ be any expert. Using $1 - x \ge e^{-x - x^2}$ for $x \in [0, 1/2]$,
$$\Phi^{(T+1)} \ge w_i^{(T+1)} = \prod_t\big(1 - \epsilon\ell_i^{(t)}\big) \ge e^{-(\epsilon + \epsilon^2)\sum_t \ell_i^{(t)}}. \qquad (79)$$
Then, taking $i$ to be the optimal expert and combining the two bounds,
$$\sum_t \langle p^{(t)}, \ell^{(t)}\rangle \le (1+\epsilon)\min_i\sum_t \ell_i^{(t)} + \frac{\ln n}{\epsilon} \le \min_i\sum_t \ell_i^{(t)} + \epsilon T + \frac{\ln n}{\epsilon}. \qquad (80)$$
∎
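A minimal Python implementation of the algorithm and the tuning from Theorem 7.1; the interface (a precomputed loss matrix) is an illustrative simplification of the online setting.

```python
import numpy as np

def multiplicative_weights(losses, eps=None):
    """losses is a (T, n) array with entries in [0, 1]; returns the played
    distributions and the total expected loss of the algorithm."""
    T, n = losses.shape
    if eps is None:
        eps = min(0.5, np.sqrt(np.log(n) / T))     # tuning from Theorem 7.1
    w = np.ones(n)
    total, dists = 0.0, []
    for t in range(T):
        p = w / w.sum()                            # follow expert i w.p. p_i
        dists.append(p)
        total += p @ losses[t]
        w *= (1.0 - eps * losses[t])               # multiplicative update
    return dists, total

# Guarantee: total <= min_i sum_t losses[t, i] + eps * T + ln(n) / eps.
```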
Remark. When , regret .
7.2 Application of Multiplicative Weight
Zero-sum Games. A zero-sum game is given by a payoff matrix $A \in [0,1]^{n\times m}$: the row player loses (and the column player gains) $A_{ij}$ when they play $i$ and $j$. For mixed strategies $x, y$ the expected payoff is
$$x^\top A y. \qquad (81)$$
Theorem 7.2 (Von Neumann).
Let $x, y$ range over mixed strategies of the two players. Then
$$\min_x \max_y\ x^\top A y = \max_y \min_x\ x^\top A y. \qquad (82)$$
Another way of phrasing it: there is a value $v$ such that
$$\exists x^*\ \forall y:\ (x^*)^\top A y \le v \quad\text{and}\quad \exists y^*\ \forall x:\ x^\top A y^* \ge v. \qquad (83)$$
We try to prove this theorem by multiplicative weights. The inequality $\min_x\max_y \ge \max_y\min_x$ is immediate, so we prove the other direction. Let the row player run multiplicative weights over its pure strategies for $T$ days, producing $x^{(1)}, \dots, x^{(T)}$, while the column player plays a best response $y^{(t)}$ to $x^{(t)}$ each day.
Then, by the regret bound,
$$\frac{1}{T}\sum_t (x^{(t)})^\top A\, y^{(t)} \le \min_x \frac{1}{T}\sum_t x^\top A\, y^{(t)} + \Delta_T = \min_x x^\top A\,\bar y + \Delta_T \le \max_y\min_x x^\top A y + \Delta_T, \qquad (84)$$
where $\bar y = \frac{1}{T}\sum_t y^{(t)}$ and $\Delta_T = O(\sqrt{\ln n / T})$, and
$$\frac{1}{T}\sum_t (x^{(t)})^\top A\, y^{(t)} = \frac{1}{T}\sum_t \max_y (x^{(t)})^\top A\, y \ge \min_x\max_y\ x^\top A y. \qquad (85)$$
Adding them together,
$$\min_x\max_y\ x^\top A y \ \le\ \max_y\min_x\ x^\top A y + \Delta_T. \qquad (86\text{--}88)$$
When $T$ goes to infinity, $\Delta_T \to 0$ and the two bounds are equal, which proves our theorem.
7.3 Application: Max Flow
Consider an unweighted graph $G$ with unit-capacity edges. The max-flow problem can be described by a linear program.
Let $\mathcal{P}$ be the set of all paths from $s$ to $t$; then the primal is
$$\max\ \sum_{P\in\mathcal{P}} f_P \quad \text{s.t.}\ \sum_{P\ni e} f_P \le 1\ \forall e,\quad f_P \ge 0. \qquad (89\text{--}91)$$
The dual problem:
$$\min\ \sum_e \ell_e \quad \text{s.t.}\ \sum_{e\in P}\ell_e \ge 1\ \forall P\in\mathcal{P},\quad \ell_e \ge 0. \qquad (92\text{--}94)$$
Zero-sum game conversion:
Let $F^*$ be the value of the optimal flow. A player $\mathsf{P}$ chooses a path, and a player $\mathsf{E}$ chooses an edge (possibly mixed strategies). The payoff for $\mathsf{E}$ is 1 if the edge lies on the chosen path and 0 otherwise.
Lemma 4.
Let $v^*$ be the value of the game. Then $v^* = 1/F^*$.
Proof.
Given an optimal solution $\ell^*$ to the dual, let $\mathsf{E}$ play edge $e$ with probability $\ell^*_e/\sum_{e'}\ell^*_{e'}$; then for every path $P$, the payoff for $\mathsf{E}$ is
$$\sum_{e\in P}\frac{\ell^*_e}{\sum_{e'}\ell^*_{e'}} \ \ge\ \frac{1}{\sum_{e'}\ell^*_{e'}} \ =\ \frac{1}{F^*} \qquad (95)$$
by LP duality. Conversely, let $f^*$ be an optimal solution to the primal; $\mathsf{P}$ chooses path $P$ with probability $f^*_P/F^*$. For any edge $e$,
$$\sum_{P\ni e}\frac{f^*_P}{F^*} \ \le\ \frac{1}{F^*}. \qquad (96)$$
∎
Remark. We now want to run a multiplicative weights algorithm on this zero-sum game, but the problem is that the path player has exponentially many pure strategies. We have to make some modifications to fix this issue.
• For each round $t$, use multiplicative weights to choose a distribution $p^{(t)}$ on edges. Let $P_t$ be the best response to $p^{(t)}$, which is to find the shortest $s$-$t$ path under edge weights $p^{(t)}_e$.
• Set the reward vector for the edge player to $r^{(t)}_e = \mathbb{1}[e \in P_t]$.
• After $T$ rounds, route $F^*/T$ units of flow on each path $P_t$. Denote this flow by $f$.
Lemma 5.
This flow routes at most $1 + \epsilon'$ units on each edge, with $\epsilon' \to 0$ as $T$ grows.
Proof.
Suppose for contradiction that $f$ routes more than $1 + \epsilon'$ units on some edge $e$; then more than a $(1+\epsilon')/F^*$ fraction of the paths $P_t$ use edge $e$.
If the edge player had played edge $e$ in hindsight, its average payoff would be more than $(1+\epsilon')/F^*$.
In each step, the edge player gets at most the value of the game, $1/F^*$, in expectation, because $P_t$ is a best response to $p^{(t)}$.
This contradicts the multiplicative-weights regret bound once $T$ is large enough; note that the average regret is $O(\sqrt{\ln m / T})$. ∎
7.4 Application: Adaboost
Goal: to learn an unknown function $f : X \to \{0,1\}$, given a sequence of training samples $(x_i, f(x_i))$, $i = 1, \dots, n$. We want to minimize the training error
$$\frac{1}{n}\,\big|\{i : h(x_i) \ne f(x_i)\}\big|. \qquad (97)$$
Weak learner: does slightly better than random guessing, with loss at most $\frac12 - \gamma$ under the given distribution.
We consider the samples in the training set as experts.
For a hypothesis $h$, the penalty for expert $i$ is 0 if $h(x_i) \ne f(x_i)$ and 1 otherwise. (Intuition: we want to put more weight on hard samples.)
In each round $t$, the algorithm gives a distribution $D_t$ over experts and obtains a hypothesis $h_t$ which is a weak learner for $D_t$, i.e., the expected penalty is at least $\frac12 + \gamma$.
The final hypothesis labels according to a majority vote over $h_1, \dots, h_T$.
Analysis:
Let $S$ be the set of training samples labeled incorrectly by the final hypothesis.
For each $i \in S$, at least half of the hypotheses misclassify $x_i$; since the penalty is 1 only when a hypothesis is correct and the final hypothesis is a majority vote, the total penalty of expert $i$ satisfies
$$\sum_{t=1}^{T} m_i^{(t)} \le \frac{T}{2}. \qquad (98)$$
Let the potential function be defined by $\Phi^{(t)} = \sum_i w_i^{(t)}$, with updates $w_i^{(t+1)} = w_i^{(t)}(1-\epsilon)^{m_i^{(t)}}$.
Then, since each $h_t$ has expected penalty at least $\frac12 + \gamma$ under $D_t$,
$$\Phi^{(T+1)} \le n\prod_t\big(1 - \epsilon(\tfrac12 + \gamma)\big), \qquad (99)$$
$$\Phi^{(T+1)} \ge \sum_{i\in S} w_i^{(T+1)} \ge |S|\,(1-\epsilon)^{T/2}. \qquad (100)$$
Then we derive $|S| \le n\, e^{-\Omega(\gamma^2 T)}$, where we set $\epsilon = \Theta(\gamma)$.
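The boosting loop is a direct instance of multiplicative weights over samples. In the Python sketch below, `weak_learner(X, y, D)` is a hypothetical callable (not from the notes) that must return a 0/1-valued hypothesis with error at most $1/2 - \gamma$ under the distribution $D$.

```python
import numpy as np

def boost(X, y, weak_learner, T=50, gamma=0.1):
    """Samples are experts; penalty 1 when a hypothesis classifies a sample
    correctly, so misclassified (hard) samples keep high weight."""
    n = len(y)
    w = np.ones(n)
    eps = gamma                                   # step size from the analysis
    hyps = []
    for _ in range(T):
        D = w / w.sum()
        h = weak_learner(X, y, D)                 # hypothetical weak learner
        hyps.append(h)
        correct = (h(X) == y).astype(float)       # penalty = 1 if correct
        w *= (1.0 - eps) ** correct
    def majority(Xq):                             # final hypothesis: majority vote
        votes = np.mean([h(Xq) for h in hyps], axis=0)
        return (votes >= 0.5).astype(int)
    return majority
```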
8 Spectral Method
8.1 Sparsest cut
Definition 8.1 (Edge expansion).
Let $G$ be an undirected graph and $S$ a subset of vertices. The edge expansion of $S$ is
$$\phi(S) = \frac{|E(S, V\setminus S)|}{\operatorname{vol}(S)}, \qquad (101)$$
where $\operatorname{vol}(S)$ is the total degree of the vertices in $S$.
The edge expansion of a cut $(S, V\setminus S)$ is $\max\{\phi(S), \phi(V\setminus S)\}$.
The edge expansion of a graph is
$$\phi(G) = \min_{S:\ \operatorname{vol}(S)\le \operatorname{vol}(V)/2} \phi(S). \qquad (102)$$
Sparsest cut: compute $\phi(G)$ (and a set achieving it).
Example.
• Cycle: $\phi(G) = \Theta(1/n)$.
• Clique: $\phi(G) = \Theta(1)$.
• Barbell (two cliques joined by a single edge): $\phi(G) = \Theta(1/n^2)$.
Let $A$ be the adjacency matrix of $G$ and suppose $G$ is a $d$-regular graph. The normalized Laplacian is defined as
$$L = I - \frac{1}{d}A. \qquad (103)$$
We compute its eigenvalues $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n \le 2$; then the Cheeger inequality states that
$$\frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2\lambda_2}. \qquad (104)$$
Lemma 6.
Let $M$ be a symmetric matrix and $\lambda_1 \le \dots \le \lambda_n$ be the eigenvalues of $M$. Then
$$\lambda_k = \min_{V:\ \dim(V) = k}\ \max_{x\in V,\ x\ne 0}\ \frac{x^\top M x}{x^\top x}. \qquad (105)$$
Fact. If $x$ is an eigenvector of $A$ with eigenvalue $\mu$, then $x$ is an eigenvector of $L$ with eigenvalue
$$1 - \frac{\mu}{d}. \qquad (106)$$
Note that the Laplacian of a $d$-regular graph satisfies
$$x^\top L x = \frac{1}{d}\sum_{(u,v)\in E}(x_u - x_v)^2. \qquad (107)$$
Let $R(x) = \frac{x^\top L x}{x^\top x}$ denote the Rayleigh quotient.
Theorem 8.2.
Let $G$ be a $d$-regular, undirected graph, and let $\lambda_1 \le \dots \le \lambda_n$ be the eigenvalues of its normalized Laplacian $L$. Then
• $\lambda_1 = 0$ and $\lambda_n \le 2$.
• $\lambda_k = 0$ if and only if $G$ has at least $k$ connected components.
• $\lambda_n = 2$ if and only if at least one of the connected components of $G$ is bipartite.
Proof.
i. Trivial.
ii. If $\lambda_k = 0$, there exists a $k$-dimensional subspace $V$ with $x^\top L x = 0$, i.e., $\sum_{(u,v)\in E}(x_u - x_v)^2 = 0$, for all $x \in V$. This means each such $x$ must be constant within each connected component. Therefore, the dimension of $V$ can be at most the number of connected components. The reverse direction can be proved by letting $V$ be the space of vectors that are constant within each connected component.
iii. $\lambda_n = 2$. Note that
$$2 - \lambda_n = \min_{x\ne 0}\frac{x^\top(2I - L)x}{x^\top x} = \min_{x\ne 0}\frac{\frac{1}{d}\sum_{(u,v)\in E}(x_u + x_v)^2}{x^\top x}. \qquad (108)$$
Therefore,
$$\lambda_n = 2 \iff \exists\, x \ne 0 \text{ with } x_u = -x_v \text{ for every edge } (u,v). \qquad (109)$$
If $\lambda_n = 2$, take such a vector $x$ and let $A = \{u : x_u > 0\}$, $B = \{u : x_u < 0\}$. Then $A \cup B$ must be disconnected from the vertices with $x_u = 0$. Moreover, the component $A \cup B$ is bipartite with sides $A$ and $B$. ∎
8.1.1 Generalize to general graphs
$L = I - D^{-1/2} A D^{-1/2}$, where $D = \operatorname{diag}(\deg(v))$, and $\operatorname{vol}(S) = \sum_{v\in S}\deg(v)$.
Then we generalize the Rayleigh quotient to
$$R(x) = \frac{\sum_{(u,v)\in E}(x_u - x_v)^2}{\sum_v \deg(v)\, x_v^2}. \qquad (110)$$
8.1.2 Proof of Cheeger’s theorem
We want to prove $\frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2\lambda_2}$.
The easy part: $\lambda_2 \le 2\phi(G)$.
Note that plugging a suitably centered indicator vector of the optimal set into the Rayleigh quotient gives the bound. (To fill.)
The hard part: $\phi(G) \le \sqrt{2\lambda_2}$.
Lemma 7.
Given any $y \ge 0$ with $\operatorname{vol}(\operatorname{supp}(y)) \le \operatorname{vol}(V)/2$, there exists an algorithm to find $S \subseteq \operatorname{supp}(y)$ with
$$\phi(S) \le \sqrt{2\,R(y)}. \qquad (111)$$
Proof.
We can compute $y$ (e.g., from the second eigenvector) efficiently; the problem is how we should round it.
Algorithm: sweep cut. Normalize so that $\max_v y_v^2 = 1$, choose a threshold $\tau \in (0, 1]$ uniformly at random, and output $S_\tau = \{v : y_v^2 \ge \tau\}$. This procedure can be simply derandomized by enumerating all $n$ possible sweep cuts.
$$\mathbb{E}\big[|E(S_\tau, \bar S_\tau)|\big] = \sum_{(u,v)\in E}|y_u^2 - y_v^2| = \sum_{(u,v)\in E}|y_u - y_v|\cdot|y_u + y_v| \le \sqrt{\sum_{(u,v)}(y_u - y_v)^2}\cdot\sqrt{2\sum_v \deg(v)\,y_v^2}. \qquad (112\text{--}115)$$
The first square root is just the numerator of $R(y)$. Note also that $\mathbb{E}[\operatorname{vol}(S_\tau)] = \sum_v \deg(v)\,y_v^2$.
Therefore,
$$\frac{\mathbb{E}\big[|E(S_\tau, \bar S_\tau)|\big]}{\mathbb{E}[\operatorname{vol}(S_\tau)]} \le \sqrt{2\,R(y)}. \qquad (116\text{--}117)$$
It means there exists a threshold $\tau$ such that
$$\phi(S_\tau) = \frac{|E(S_\tau, \bar S_\tau)|}{\operatorname{vol}(S_\tau)} \le \sqrt{2\,R(y)}. \qquad (118)$$
∎
Now we only need to show a construction of $y$ from the second eigenvector $x$, which satisfies $\operatorname{vol}(\operatorname{supp}(y)) \le \operatorname{vol}(V)/2$ and, at the same time,
$$R(y) \le \lambda_2. \qquad (119)$$
We know $\sum_v \deg(v)\,x_v = 0$. Suppose WLOG that $x$ takes both positive and negative values. Find the smallest $c$ such that
$$\operatorname{vol}(\{v : x_v \le c\}) \ge \operatorname{vol}(V)/2, \qquad (120)$$
and let $x' = x - c\mathbf{1}$. Let $y^+_v = \max(x'_v, 0)$ and $y^-_v = \max(-x'_v, 0)$. Then we know
$$\operatorname{vol}(\operatorname{supp}(y^+)),\ \operatorname{vol}(\operatorname{supp}(y^-)) \le \operatorname{vol}(V)/2. \qquad (121)$$
Lemma 8.
$\min\big(R(y^+), R(y^-)\big) \le \lambda_2$.
Proof.
First, $R(x') \le R(x) = \lambda_2$, because their numerators are equal and the denominator of $R(x')$ is at least that of $R(x)$ (shifting by $c$ only increases $\sum_v \deg(v)\,x_v^2$, since $\sum_v \deg(v)\,x_v = 0$). Note that $x'$ is derived from $x$.
Moreover,
$$\sum_{(u,v)\in E}(y^+_u - y^+_v)^2 + \sum_{(u,v)\in E}(y^-_u - y^-_v)^2 \le \sum_{(u,v)\in E}(x'_u - x'_v)^2, \qquad (122)$$
and
$$\sum_v \deg(v)\,(y^+_v)^2 + \sum_v \deg(v)\,(y^-_v)^2 = \sum_v \deg(v)\,(x'_v)^2. \qquad (123)$$
Then
$$\frac{\operatorname{num}(y^+) + \operatorname{num}(y^-)}{\operatorname{den}(y^+) + \operatorname{den}(y^-)} \le R(x') \le \lambda_2, \qquad (124)$$
which means one of $R(y^+)$ and $R(y^-)$ must be at most $\lambda_2$. ∎
Finally, we conclude that $\phi(G) \le \sqrt{2\lambda_2}$: applying Lemma 7 to whichever of $y^+, y^-$ has Rayleigh quotient at most $\lambda_2$ gives a set $S$ with $\operatorname{vol}(S) \le \operatorname{vol}(V)/2$ and
$$\phi(S) \le \sqrt{2\lambda_2}. \qquad (125)$$
8.2 Spectral Clustering
Problem: the input is a set of points with a measure of similarity $w_{ij}$. How do we cluster them?
• Embed the points into $\mathbb{R}^k$ by using eigenvectors of $L$: if $x^{(1)}, \dots, x^{(k)}$ are the eigenvectors of the $k$ smallest eigenvalues, map point $i$ to $(x^{(1)}_i, \dots, x^{(k)}_i)$ and cluster the embedded points geometrically.
• When $k = 1$ (thresholding the second eigenvector), it is equivalent to our sweep cut algorithm.
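Below is a numpy sketch of the $k=1$ case: compute the second eigenvector of the normalized Laplacian and take the best sweep cut. The dense eigendecomposition is an illustrative choice for small graphs.

```python
import numpy as np

def sweep_cut(A):
    """Spectral partitioning: second eigenvector of I - D^{-1/2} A D^{-1/2},
    then enumerate all sweep cuts and return the best expansion found."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(A)) - (A * d_inv_sqrt).T * d_inv_sqrt
    vals, vecs = np.linalg.eigh(L)
    x = vecs[:, 1] * d_inv_sqrt              # embedding coordinate per vertex
    order = np.argsort(x)
    vol_total = deg.sum()
    best, best_S = np.inf, None
    for k in range(1, len(A)):               # enumerate all sweep cuts
        S = order[:k]
        cut = A[np.ix_(S, order[k:])].sum()
        vol = min(deg[S].sum(), vol_total - deg[S].sum())
        phi = cut / vol if vol > 0 else np.inf
        if phi < best:
            best, best_S = phi, set(S.tolist())
    return best, best_S
```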
9 Random Walk
2-SAT problem.
Start with a random assignment. At each step, choose a clause that is not satisfied and flip one of its two variables, chosen at random.
Analysis.
Consider the Hamming distance between the current assignment and a satisfying one ($n$ is the number of variables). Each step decreases this distance with probability at least $1/2$, so the process is no worse than an unbiased random walk on $\{0, \dots, n\}$; the hitting time to 0 is $O(n^2)$.
9.1 Random walk on graphs
Let $G = (V, E)$ be an undirected graph.
At each time step, the random walk is at some node $v$. At time $t+1$, the random walk chooses a neighbor of $v$ uniformly at random and moves to that neighbor.
A “lazy” random walk: with probability $1/2$ stay at $v$, otherwise move to a uniformly random neighbor.
Stationary distribution.
• Does there exist one?
• If it exists, how long does it take to converge? (mixing time)
Let $p_t$ be the probability distribution of the walk at time $t$. Then
$$p_{t+1}(v) = \sum_{u:\ (u,v)\in E}\frac{p_t(u)}{\deg(u)}. \qquad (126)$$
We can write this as a matrix multiplication:
$$p_{t+1} = W p_t, \qquad W = A D^{-1}, \qquad (127)$$
where $D = \operatorname{diag}(\deg(v))$ and $A$ is the adjacency matrix. $W$ is called the transition matrix.
Definition 9.1 (Stationary distribution).
A probability distribution $\pi$ over $V$ is a stationary distribution if
$$\pi = W\pi. \qquad (128)$$
We can directly write out a stationary distribution:
$$\pi(v) = \frac{\deg(v)}{2|E|}. \qquad (129)$$
Definition 9.2 (Ergodic).
A random walk is called ergodic if there is a distribution $\pi$ such that for any initial distribution $p_0$,
$$\lim_{t\to\infty} p_t = \pi. \qquad (130)$$
Theorem 9.3.
A random walk on $G$ is ergodic if and only if $G$ is connected and not bipartite.
Lazy random walk:
$$p_{t+1}(v) = \frac{1}{2}p_t(v) + \frac{1}{2}\sum_{u:\ (u,v)\in E}\frac{p_t(u)}{\deg(u)}, \qquad (131)$$
and in matrix form:
$$p_{t+1} = \Big(\frac{1}{2}I + \frac{1}{2}W\Big)p_t. \qquad (132)$$
Theorem 9.4.
A lazy random walk is ergodic if and only if $G$ is connected.
Proof of Theorems 9.3 and 9.4 for $d$-regular graphs.
Let $\lambda_1 \le \dots \le \lambda_n$ be the eigenvalues of the normalized Laplacian $L = I - \frac{1}{d}A$, and let $x_1, \dots, x_n$ be the corresponding orthonormal eigenvectors; the transition matrix $W = \frac{1}{d}A = I - L$ has eigenvalues $1 - \lambda_i$ with the same eigenvectors.
From Theorem 8.2,
• $\lambda_1 = 0$, with $x_1$ proportional to the all-ones vector;
• $\lambda_2 > 0$ if $G$ is connected;
• $\lambda_n \le 2$, and $\lambda_n < 2$ if and only if $G$ is not bipartite.
Let $p_0 = \sum_i c_i x_i$, where $c_i = \langle p_0, x_i\rangle$. Then
$$p_t = W^t p_0 = \sum_i c_i (1 - \lambda_i)^t x_i. \qquad (133)$$
If $G$ is connected and not bipartite, then $|1 - \lambda_i| < 1$ for all $i \ge 2$. Thus, as $t \to \infty$,
$$p_t \to c_1 x_1 = \pi \qquad (134)$$
(the uniform distribution for a regular graph).
For the lazy random walk, the transition matrix $\frac12 I + \frac12 W$ has eigenvalues $1 - \lambda_i/2 \in [0,1]$, all strictly less than 1 for $i \ge 2$ when $G$ is connected. So $p_t \to \pi$ as $t \to \infty$. ∎
9.2 Mixing time
Definition 9.5 (Mixing time).
The mixing time is the smallest time $t$ such that $\|p_t - \pi\|_1 \le 1/4$ for every starting distribution, where $\pi$ is the stationary distribution.
Definition 9.6 (Spectral gap).
We define the spectral gap of a graph by
$$g = \min\{\lambda_2,\ 2 - \lambda_n\}. \qquad (135)$$
For regular random walks,
$$\|p_t - \pi\|_2 = \Big\|\sum_{i\ge 2} c_i (1-\lambda_i)^t x_i\Big\|_2 \le (1 - g)^t\,\|p_0 - \pi\|_2 \le (1 - g)^t. \qquad (136\text{--}139)$$
Then the mixing time is $O\!\left(\frac{\log n}{g}\right)$.
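The convergence is easy to observe numerically. The following sketch iterates the lazy walk on an adjacency matrix and reports the total variation distance to the stationary distribution; the matrix-power formulation is an illustrative simplification.

```python
import numpy as np

def lazy_walk_distribution(A, t, start=0):
    """Run t steps of the lazy random walk on the graph with adjacency
    matrix A, returning (p_t, total variation distance to pi)."""
    deg = A.sum(axis=1)
    W = A / deg[:, None]                 # transition matrix of the simple walk
    M = 0.5 * np.eye(len(A)) + 0.5 * W   # lazy walk
    p = np.zeros(len(A))
    p[start] = 1.0
    for _ in range(t):
        p = p @ M
    pi = deg / deg.sum()                 # stationary distribution deg(v)/2|E|
    return p, np.abs(p - pi).sum() / 2
```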
10 Expander Graph
Definition 10.1.
A two-sided spectral expander is a graph $G$ such that
• $G$ is $d$-regular,
• every nontrivial eigenvalue of the normalized adjacency matrix $\frac{1}{d}A$ has absolute value at most $\lambda$, for some constant $\lambda < 1$.
• Explicit: if it takes $\operatorname{poly}(n)$ time to produce the graph.
• Strongly explicit: if it takes $\operatorname{poly}(\log n)$ time to produce the neighbors of a given vertex.
Remark. For complete graphs, $\lambda = \frac{1}{n-1}$. For a path, $\lambda = 1 - \Theta(1/n^2)$.
Proposition 10.2 (Expanders are reliable networks.).
For expander graphs, if an $\epsilon$ fraction of the edges is removed, the largest connected component still contains $(1 - O(\epsilon))\,n$ vertices.
10.1 Random walks on expander graphs
On a complete graph, it takes $O(k\log n)$ random bits to represent a $k$-step random walk. On a $d$-regular expander, it takes only $O(\log n + k\log d)$.
10.2 Application: Pseudo-random number generation
Suppose we are given an algorithm with a constant error rate. We can decrease its error rate to exponentially small by repeating it polynomially many times.
Using the Chernoff bound and repeating $k$ times, the error rate is $2^{-\Omega(k)}$. Suppose the algorithm requires $r$ random bits; the repeat-and-take-majority approach requires $O(kr)$ random bits. However, with expanders, we can reduce the number of random bits to $r + O(k)$.
Algorithm.
• Choose a starting vertex $v_0 \in \{0,1\}^r$ at random.
• Take $k$ steps of a random walk on the expander to generate $v_1, \dots, v_k$, run the algorithm with each $v_i$ as its random string, and take a majority vote.
Note that a strongly explicit graph is required, as the number of vertices ($2^r$) is exponentially large.
Expander graph $H$ on vertex set $V = \{0,1\}^r$, $d$-regular, with spectral parameter $\lambda$; $W = \frac{1}{d}A$ is its normalized adjacency (transition) matrix.
Let $B \subseteq V$ be the set of random strings on which the algorithm is incorrect, $|B| \le \frac{1}{100}|V|$. Let $N = |V|$. Let $u = \frac{1}{N}\mathbb{1}$ be the uniform distribution.
The characteristic vectors of $B$ and $\bar B$ define the diagonal projections $P_B$ and $P_{\bar B}$.
The probability that a vertex chosen according to a distribution $p$ lies in $B$ is
$$\|P_B\, p\|_1. \qquad (140)$$
$\|P_B W P_B\, u\|_1$ is the probability that a walk starts at a vertex in $B$ and goes again to $B$. The probability that the walk is in $B$ at every time in $S$ ($S$ an arbitrary subset of $\{0, \dots, k\}$) is $\|M_k \cdots M_1 M_0\, u\|_1$. We want to show
$$\|M_k \cdots M_0\, u\|_1 \le \Big(\frac{1}{10} + \lambda\Big)^{|S|}, \qquad (141)$$
where $M_t$ is $P_B W$ if $t \in S$ and $P_{\bar B} W$ otherwise. Summing over the sets $S$ with $|S| \ge k/2$ then bounds the probability that the majority vote fails by $2^k(\frac{1}{10}+\lambda)^{k/2}$, which is exponentially small.
Lemma: $\|P_B W x\|_2 \le \big(\sqrt{|B|/N} + \lambda\big)\|x\|_2 \le \big(\frac{1}{10}+\lambda\big)\|x\|_2$.
Then, applying the lemma for the steps in $S$ (and $\|P_{\bar B} W x\|_2 \le \|x\|_2$ otherwise),
$$\|M_k \cdots M_0\, u\|_2 \le \Big(\frac{1}{10}+\lambda\Big)^{|S|}\|u\|_2 = \Big(\frac{1}{10}+\lambda\Big)^{|S|}\frac{1}{\sqrt N}. \qquad (142)$$
Then, by Cauchy-Schwarz,
$$\|M_k \cdots M_0\, u\|_1 \le \sqrt{N}\,\|M_k \cdots M_0\, u\|_2 \le \Big(\frac{1}{10}+\lambda\Big)^{|S|}. \qquad (143)$$
For the lemma: write $x = c\,u + x^{\perp}$ with $x^{\perp} \perp \mathbb{1}$. Then
$$\|P_B W x\|_2 \le \|P_B\, c\, u\|_2 + \|W x^{\perp}\|_2 \le \sqrt{\tfrac{|B|}{N}}\,\|x\|_2 + \lambda\,\|x\|_2. \qquad (144)$$
11 Hardness Assumption
11.1 Encryption
Alice sends a message to Bob. Bob should be the only person who can decode the message.
Private-key encryption
• Shared key: a random bit string $k$.
• To send a message $m$, Alice sends $m \oplus k$ (one-time pad).
Public-key encryption
• Key generation: generate a public key $pk$ and a secret key $sk$. Publish $pk$.
• Encryption: taking randomness $r$, compute $c = \mathrm{Enc}_{pk}(m; r)$.
• Decryption: $m = \mathrm{Dec}_{sk}(c)$.
Definition 11.1 (Computationally indistinguishable).
We say two distributions are computationally indistinguishable if the following holds:
if there exists an efficient algorithm that distinguishes them, then some underlying hardness assumption is broken.
11.2 Hardness
Definition 11.2 (Learning with errors).
The LWE problem is defined by
• a secret $s \in \mathbb{Z}_q^n$, with $q = \operatorname{poly}(n)$,
• a matrix $A \in \mathbb{Z}_q^{m\times n}$ sampled uniformly at random,
• Consider $b = As + e$ for a small error vector $e$. Given $(A, b)$ with $e = 0$, solving for $s$ is easy (Gaussian elimination).
• Given $(A, b = As + e)$ with small noise $e$, the (search) LWE problem is to find $s$.
Assumption 11.3 (Decisional LWE assumption).
Let $e$ be sampled from a truncated (discrete) Gaussian, $A$ uniform, and $u$ uniform in $\mathbb{Z}_q^m$. Then
$$(A,\ As + e) \ \approx_c\ (A,\ u). \qquad (145)$$
11.3 Encryption scheme from LWE.
Private-key scheme:
• Shared key $s \in \mathbb{Z}_q^n$.
• To encrypt a message bit $\mu \in \{0,1\}$, pick $a \in \mathbb{Z}_q^n$ at random and pick a small noise $e$:
$$c = \big(a,\ \langle a, s\rangle + e + \mu\cdot\lfloor q/2\rfloor\big). \qquad (146)$$
• Send $c$.
• To decrypt $(a, b)$, output $0$ if $b - \langle a, s\rangle$ is closer to $0$ than to $\lfloor q/2\rfloor$ (mod $q$), and $1$ otherwise.
Public-key scheme:
The only difference is that we treat $(A,\ b = As + e)$ as the public key, and treat $s$ as the secret key.
• Randomness $r \in \{0,1\}^m$. Encrypt: $c = \big(r^\top A,\ r^\top b + \mu\cdot\lfloor q/2\rfloor\big)$.
• Decrypt $(c_1, c_2)$: round $c_2 - \langle c_1, s\rangle$.
Security of the Public-key scheme.
We can show $(r^\top A, r^\top b)$ is indistinguishable from uniform by the following lemma.
Lemma 9 (Leftover Hash).
For $A$ uniform and $r \in \{0,1\}^m$ uniform, with $m$ sufficiently larger than $n\log q$, the following two distributions are statistically close:
$$(A,\ r^\top A) \ \approx_s\ (A,\ u), \quad u \text{ uniform.} \qquad (147)$$
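The private-key scheme is small enough to try directly. The Python sketch below is a toy: the modulus, dimension, and noise range are illustrative only and far too small to be secure.

```python
import random

Q, N = 2 ** 13, 16          # toy parameters for illustration only (not secure)

def keygen():
    return [random.randrange(Q) for _ in range(N)]            # secret s in Z_q^n

def encrypt(s, bit):
    a = [random.randrange(Q) for _ in range(N)]               # random a
    e = random.randint(-4, 4)                                  # small noise
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (Q // 2)) % Q
    return a, b

def decrypt(s, ct):
    a, b = ct
    v = (b - sum(ai * si for ai, si in zip(a, s))) % Q         # = e + bit * q/2
    return 1 if Q // 4 < v < 3 * Q // 4 else 0                 # round to 0 or q/2

# sanity check:  s = keygen();  assert decrypt(s, encrypt(s, 1)) == 1
```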
11.4 Worst-case to average-case reduction
Start with a distribution over , such that is hard.
Sample uniformly at random, .
12 Quantum Computing
Consider a qubit $\alpha|0\rangle + \beta|1\rangle$, with $|\alpha|^2 + |\beta|^2 = 1$.
Definition 12.1 (Quantum circuits.).
A quantum circuit is defined by a unitary matrix $U$ ($UU^\dagger = I$), so that
$$|\psi\rangle \mapsto U|\psi\rangle. \qquad (148)$$
Definition 12.2 (Universal quantum gates).
• Toffoli: $|a, b, c\rangle \mapsto |a, b, c \oplus (a \wedge b)\rangle$.
• Hadamard: $H|0\rangle = \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$, and $H|1\rangle = \frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$.
One can check that both gates are unitary.
12.1 Efficient period finding
Simon’s problem. Input a black-box function . Promise that there exists , . How to find ?
Classical algorithms take exponential time.
Quantum speedup.
-
•
Create a uniform superposition: by .
-
•
Apply in superposition:
(149) Note that this operation is unitary (reversible) as we can subtract to recover.
-
•
Measure the second register. If we measure , the probability of is , and the remaining state
(150) -
•
Apply .
(151) Notice that if .
-
•
Sample many to get linear equations . Solve .
12.2 Reduction of factorization to period finding
Factoring: given $N = pq$, find $p, q$.
• Pick a random $a$ with $\gcd(a, N) = 1$.
• Construct the function $f(x) = a^x \bmod N$. Its period $r$ (the order of $a$) satisfies $f(x) = f(y)$ if and only if $x \equiv y \pmod r$. We can thus find $r$ by period finding.
Lemma 10.
If $N = pq$ and $a$ is chosen uniformly at random with $\gcd(a, N) = 1$, then with probability at least $1/2$, the order $r$ of $a$ is even and $a^{r/2} \not\equiv -1 \pmod N$.
• Then $\gcd(a^{r/2} - 1, N)$ is a nontrivial factor of $N$.