2015-12-01

Graph - Shortest Path Tree (SPT)



Shortest Path Tree -
given a start vertex of a graph, find the shortest paths to all other vertices; together these paths form the shortest path tree.

Shortest Path

  • Acyclic Shortest Path in DAG
  • Single source shortest path
    • Dijkstra
    • Bellman-Ford 
  • All-pairs shortest path
    • Floyd-Warshall

Acyclic Shortest Path in DAG

  • Consider vertices in topological order
  • Relax all edges pointing from that vertex (a minimal sketch follows)
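
A minimal Python sketch of the two steps above; the dict-of-adjacency-lists graph format is my own assumption, not something fixed by these notes:

import math

def dag_shortest_paths(graph, topo_order, s):
    # graph: dict of u -> list of (v, weight); topo_order: vertices in topological order
    dist = {v: math.inf for v in graph}
    dist[s] = 0
    for u in topo_order:               # consider vertices in topological order
        if dist[u] == math.inf:
            continue                   # u is not reachable from s
        for v, w in graph[u]:          # relax all edges pointing from u
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist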

Dijkstra's

  • Consider vertices in increasing order of distance from s.
  • Add vertex to tree and relax all edges pointing from that vertex.
  • Running time: O(|E|.Tdk + |V|.Tem), where Tdk is the time to decrease the priority of a vertex in the priority queue, and Tem is the time to extract the min value from the priority queue. Tem is O(log|V|) for the usual heap implementations; Tdk depends on the implementation (O(log|V|) for a binary heap, O(1) amortized for a Fibonacci heap) and is worth researching further. [TODO]
  • since it considers vertices in increasing order of distance from the source, the path whose second-to-last vertex is closest to the source will be found first if ECMP exists.
  • A little discussion about step-1:
    • selecting vertices in increasing order of distance from s means greedily selecting the closest unprocessed vertex, which requires a priority queue to extract the closest vertex.
    • since the distance to each vertex is updated dynamically during edge relaxation, the priority queue must support an operation like decrease_priority()
    • but to change the priority of an element, one has to find it first. how can we find an element efficiently in a priority queue?
    • a heap-based priority queue is not good at searching/finding, which implies we need more advanced data structures, such as a Fibonacci heap, or a workaround like lazy deletion (see the sketch after this list).
  • Consider how to express a path:
    • if we generate a shortest path tree, there is obviously only one path from s to each other vertex. in this case, it can be stored as simply as an array of length n, where each element stores the previous vertex on the path.
    • a more general requirement is to store all shortest paths from s to t, i.e. to support ECMP (equal-cost multi-path). that makes the result a graph (a DAG) instead of a tree. however, if the result is stored only as a graph, then when retrieving a path (s, t) one has to search again with a path-finding algorithm like DFS. as an alternative, we could improve retrieval performance at the price of more memory consumption.
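
A minimal Python sketch of Dijkstra with a binary heap; instead of decrease_priority() it uses lazy deletion (push duplicate entries, skip stale ones on pop), which is a common workaround for the search problem discussed above. The dict-based graph format is my assumption; prev is the parent array storing the shortest path tree:

import heapq, math

def dijkstra(graph, s):
    # graph: dict of u -> list of (v, weight); all weights non-negative
    dist = {v: math.inf for v in graph}
    prev = {s: None}                   # parent pointers = the shortest path tree
    dist[s] = 0
    pq = [(0, s)]                      # (distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                   # stale entry: lazy deletion instead of decrease-key
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    return dist, prev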

Bellman-Ford

  • Repeat |V| - 1 times: relax all |E| edges
    • Relax - test if we can improve the shortest path to a vertex v found so far by passing through another vertex
    • runs in O(|V|.|E|) time
  • works for graphs with negative edge weights, but not with negative cycles
  • run one more iteration; if any distance still improves, there is a negative cycle (see the sketch after this list).
  • also, this algorithm does not support ECMP. (actually we can view ECMP as a matter of storage or data structures rather than of the algorithm itself) in this case, the path with the fewest hops will be found first.
  • Question: if we relax the edges in a different order, is it possible to get a different path for a specific pair (s, t) when ECMP exists?
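
A minimal Python sketch with the extra detection pass; the edge-list input format is my own assumption:

import math

def bellman_ford(vertices, edges, s):
    # vertices: iterable of vertex names; edges: list of (u, v, weight)
    dist = {v: math.inf for v in vertices}
    dist[s] = 0
    for _ in range(len(dist) - 1):     # |V| - 1 rounds
        for u, v, w in edges:          # relax every edge
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:              # one extra round: any improvement => negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from the source")
    return dist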

Floyd-Warshall

  • a dynamic programming example solving all-pairs shortest path
  • running time O(|V|^3), suitable for dense graphs; for sparse graphs, running Dijkstra from every vertex is an alternative, with running time O(|V|.|E|.log|V|) using a binary heap
  • support edges with negative weight but no negative cycle in graph
  • recursive formula:
    • shortestPath(i, j) = min{ shortestPath(i, j), shortestPath(i, k) + shortestPath(k, j) }, trying every intermediate vertex k
  • pseudo code

    initialize distance matrix: d[n][n] = INF, n = |V|
    foreach v in vertices: d[v][v] = 0
    foreach e in edges:    d[e.start][e.end] = e.weight
    for k from 1 to N
        for i from 1 to N
            for j from 1 to N
                d[i][j] = min(d[i][k] + d[k][j], d[i][j])
  • when ECMP exists, the path with the fewest vertices will be found first.


2015-11-30

Some Probability Distributions

Expectation and Variance

  • expectation is the mean value of a random variable.
    • for a sample of n values: E(X) = (x1 + x2 + ... + xn) / n
  • variance is the expected squared difference from the mean.
    • Var(X) = E[(X - E(X))^2]
  • standard deviation is the square root of the variance.
    • sigma = sqrt(Var(X))
  • using the standard deviation, we know what is normal (standard) and what is unusually large or small.

Uniform Distribution

  • pdf: f(x) = 1/(b - a), for a <= x <= b
  • E(X) = (a + b)/2, Var(X) = (b - a)^2 / 12
  • a and b are solvable given E(X) and Var(X)

Normal (Gaussian) Distribution

  • pdf: f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
  • E(X) = mu, Var(X) = sigma^2
Binomial Distribution (Bernoulli Trials)

  • a sequence of independent yes/no experiments; each one is called a Bernoulli trial
  • when n = 1, the binomial distribution is the Bernoulli distribution
  • P(X = k) = C(n, k) * p^k * (1 - p)^(n - k);  E(X) = np, Var(X) = np(1 - p)
  • e.g. 
    • toss a coin 100 times, what is the prob. that heads occurs exactly 30 times? (n = 100, k = 30, p = 1/2; checked numerically below)
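
A quick numeric check of this example in Python (math.comb is in the standard library):

from math import comb

# P(X = 30) with n = 100, k = 30, p = 1/2
print(comb(100, 30) * 0.5 ** 100)    # ~2.3e-05, i.e. exactly 30 heads is rare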


Poisson Distribution

  • P(X = k) = lambda^k * e^(-lambda) / k!;  E(X) = Var(X) = lambda
  • models the number of events in a fixed interval, given an average rate lambda

Geometric Distribution

  • P(X = k) = (1 - p)^(k - 1) * p;  E(X) = 1/p
  • modelling the trials up to and including the first success (k = 1, 2, 3, ...)
  • e.g.
    • toss a fair coin until the first heads: E(X) = 1/p = 2, which means that on average we get heads after 2 tosses.


  • P(X = k) = (1 - p)^k * p;  E(X) = (1 - p)/p
  • modelling the number of failures until the first success (k = 0, 1, 2, 3, ...)
  • e.g.
    • assume the success prob. of an experiment is p; the mean number of failures before the first success is (1 - p)/p. (a quick simulation follows)
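
A quick empirical check of both geometric means in Python; p = 0.5 models the fair-coin example above:

import random

def trials_until_success(p):
    k = 1
    while random.random() >= p:      # each iteration is one Bernoulli trial
        k += 1
    return k

samples = [trials_until_success(0.5) for _ in range(100_000)]
print(sum(samples) / len(samples))                  # trials incl. success: ~ 1/p = 2
print(sum(s - 1 for s in samples) / len(samples))   # failures before success: ~ (1-p)/p = 1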


2015-11-15

String Matching

Naive Algorithm


  • shift the pattern by 1 position on each mismatch
  • running time O(n.m)
  • n - the length of the string
  • m - the length of the pattern
  • the goal of the following 3 string-matching algorithms is to achieve O(n + m) running time.
  • Obviously, if m is small, there isn't much difference among these algorithms. (It's kind of like asking when dynamic programming becomes less efficient than brute force for the knapsack problem: when O(nk) >= O(2^k).) (a sketch of the naive algorithm follows)
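
A minimal Python sketch of the naive algorithm; the function name is my own:

def naive_match(text, pattern):
    # try every alignment: O(n.m) in the worst case
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]

print(naive_match("abracadabra", "abra"))    # [0, 7]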

KMP Algorithm


  • idea: we have to scan the text, which takes O(n), but can we shift more than 1 position on each mismatch?
  • if s[i] != p[j], that means s[i-j ... i-1] == p[0 ... j-1]; if p[0 ... j-1] has a proper prefix that is also its suffix, try to use it. so we do a little pre-processing of the pattern to calculate, for each prefix, the length of the longest proper prefix that is also a suffix (see the sketch below).
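
A minimal Python sketch of that pre-processing (the failure function) plus the search loop; the function names are my own:

def build_failure(pattern):
    # fail[j] = length of the longest proper prefix of pattern[0..j]
    # that is also a suffix of it
    fail, k = [0] * len(pattern), 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail

def kmp_search(text, pattern):
    fail, k, out = build_failure(pattern), 0, []
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]              # fall back instead of re-scanning the text
        if c == pattern[k]:
            k += 1
        if k == len(pattern):            # full match ending at position i
            out.append(i - k + 1)
            k = fail[k - 1]
    return out

print(kmp_search("ababcababa", "ababa"))    # [5]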

Boyer-Moore-Horspool Algorithm


  • compare letters from the rightmost position of the pattern towards the left (instead of comparing from left to right like the KMP algorithm does)
  • what is the difference when comparing from RIGHT to LEFT?
    • let's assume s[i] != p[j] (j = 0 ... m-1) and s[i] does not even occur in the pattern; then we can shift by (j + 1) positions directly.
    • if j = m - 1, we start comparing s[i + m] with p[m - 1] without missing any match.
    • the above case is the best one, which enables us to shift m positions, the maximum possible value; but what if s[i] is present in the pattern? how many positions can we shift?
    • let's assume s[i] exists in the pattern, and s[i] = p[k] (k < j, since k must be on the left side of j); then we can shift (j - k) positions, which aligns s[i] with p[k].
    • from the above analysis, we consider whether s[i] is in the pattern or not, then decide how many positions to shift. KMP goes a different way: it looks at the longest prefix of p[0...j-1] that is also its suffix, aligns that prefix with the end of s[... i-1], and continues comparing s[i] with the pattern character right after that prefix.
  • like the KMP algorithm, we do a little pre-processing of the pattern, called the "bad match table"; each value is how many positions we can shift based on s[i], the character where the comparison starts, which is not necessarily the mismatch point. (see the sketch below)
    • to calculate the "bad match table"
      • possible shift = length - 1 - index
      • we can see that the possible shifts get smaller when a character occurs more than once (later occurrences overwrite earlier ones).
      • the shift for the last letter of the pattern is the full length, if not already defined by an earlier occurrence
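
A minimal Python sketch of Horspool's variant, with the "bad match table" built exactly as described above; the dict-based table is my own implementation choice:

def horspool(text, pattern):
    m, n = len(text) and len(pattern), len(text)
    m = len(pattern)
    # bad match table: shift = length - 1 - index, for every char but the last;
    # characters not in the table (incl. possibly the last) shift the full length m
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    out, i = [], 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:   # compare right to left
            j -= 1
        if j < 0:
            out.append(i)
        i += shift.get(text[i + m - 1], m)            # shift keyed on the rightmost aligned char
    return out

print(horspool("abracadabra", "abra"))    # [0, 7]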

Rabin-Karp Algorithm


  • this algorithm compares hash values between the pattern and the substrings of the text, instead of comparing the substrings themselves with the pattern.
  • if we compute the hash value of each substring from scratch, it is no better than the naive algorithm (even worse), since the hash function takes O(m) running time.
  • the key idea is to achieve O(1) running time when calculating the hash value of each substring of the text, that is, a rolling hash: it updates the previous hash value instead of recomputing it. (see the sketch after this list)
  • then how does it work?
    • let x = s[i]*b^(m-1) + s[i+1]*b^(m-2) + ... + s[i+m-1]*b^0
    • let y = s[i+1]*b^(m-1) + ... + s[i+m]*b^0
    • then y = x*b - s[i]*b^m + s[i+m], where b = |alphabet|
    • h(y) = (h(x)*b - s[i]*(b^m % q) + s[i+m]) % q
    • b^m % q is a constant which is pre-computable; that means h(y) can be updated in O(1) time.
  • single pattern searching vs. multiple pattern searching
    • single pattern searching: KMP, BMH
    • multiple pattern searching: rolling hash is a choice (e.g. Leetcode - repeated DNA sequence)
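
A minimal Python sketch of the rolling-hash search; the base b and modulus q are my own choices:

def rabin_karp(text, pattern, b=256, q=10**9 + 7):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    bm = pow(b, m, q)                       # b^m % q, precomputed once
    hp = ht = 0
    for i in range(m):                      # initial hashes cost O(m) once
        hp = (hp * b + ord(pattern[i])) % q
        ht = (ht * b + ord(text[i])) % q
    out = []
    for i in range(n - m + 1):
        # compare the actual strings on a hash hit to rule out collisions
        if ht == hp and text[i:i + m] == pattern:
            out.append(i)
        if i < n - m:                       # roll: y = x*b - s[i]*b^m + s[i+m]
            ht = (ht * b - ord(text[i]) * bm + ord(text[i + m])) % q
    return out

print(rabin_karp("abracadabra", "abra"))    # [0, 7]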


2015-11-01

Graph - Network Flow Problems

Network Flow 

  • in a finite directed graph (also called a network), each edge has a capacity and each edge receives a flow. Flow satisfies 2 constraints:
    • the amount of flow cannot exceed the capacity of the edge.
    • the amount of flow into a node equals the amount of flow out of it except source, which has only outgoing flow, or sink which has only incoming flow.
  • flow network can be used to model
    • traffic in a road system
    • circulation with demands
    • fluid in pipes
    • currents in an electrical circuit
    • etc
  • Max-Flow Problem: the following network (from CLRS) has a maximum possible flow of 23 from source 0 to sink 5. how do we find the max possible flow in a network?
  • Bipartite Matching Problem
    • Bipartite graph: no edge connects two nodes of the same layer
  • Min-cost Cut Problem
    • Remove some edges from graph G such that there is no path from s to t after the removal (imagine destroying bridges with minimum cost)

Finding Max-Flow (Ford-Fulkerson Algorithm)

Ford-Fulkerson algorithm:

Start with 0 flow.
While there exists an augmenting path:
    - find an augmenting path
    - compute bottleneck capacity
    - increase flow on that path by bottleneck capacity
* this FF algorithm description is very much simplified. To implement the algorithm, one has to understand concepts like: residual graph, forward edge, backward edge.

Questions:

  • how to find an augmenting path? (one common answer, BFS over the residual graph, is sketched below)
  • if FF terminates, does it always compute a max-flow?
  • Does FF always terminate? If so, after how many augmentations?
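
A minimal Python sketch of FF with BFS for the augmenting path (the Edmonds-Karp variant); the capacity-matrix input format is my own assumption:

from collections import deque

def max_flow(cap, s, t):
    # cap: n x n capacity matrix, mutated in place into the final residual graph
    n, flow = len(cap), 0
    while True:
        parent = [-1] * n              # BFS for an augmenting path in the residual graph
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow                # no augmenting path left: flow is maximum
        bottleneck, v = float('inf'), t
        while v != s:                  # compute bottleneck capacity on the path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                  # update forward and backward residual edges
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck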

Min-cost Cut

This is a variant of the max-flow problem. Basically, the problem asks how to remove some edges with minimum cost such that there is no path from s to t. For the network from the CLRS book, the min-cost cut is as follows:

  • solution
    • max-flow min-cut theorem says: (max flow) = (min-cut). 
      • from above example, we know the max flow is 23, the min-cut is 12 + 7 + 4 = 23
      • CLRS book has proof about this theorem
      • sometimes the min-cost cut is also called the minimum s-t cut (s-t: source-terminal/sink)
      • similar in flavor to König's theorem for bipartite graphs: (min vertex cover) = (max matching)
    • one can use the FF algorithm to find the max-flow, but how do we find the min-cost cut?
    • one answer is to use the residual graph (sketched after this list)
      • run the FF algorithm and consider the final residual graph
      • collect all vertices reachable from the source (obviously, this set does not include t)
      • the min-cost cut edges are exactly those edges from reachable vertices to unreachable vertices
  • the min-cost cut problem is different from the min-cut problem mentioned under Randomized Algorithms
    • the randomized min-cut problem is about undirected graphs; this one is about directed graphs
    • the randomized min-cut problem cuts the minimum number of edges to disconnect the graph. min-cost cut is about disconnecting s from t, and it is about the minimum total cost of edges, not the number of edges.
  • the min-cut problem is much harder than the min-cost cut problem discussed here!
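
A minimal Python sketch of the residual-graph procedure above; it assumes the max-flow sketch earlier has already turned the capacity matrix into the final residual graph:

from collections import deque

def min_cut_edges(residual, original, s):
    # residual: capacities left after running max-flow; original: the input capacities
    n, seen, q = len(residual), {s}, deque([s])
    while q:                               # vertices still reachable from s
        u = q.popleft()
        for v in range(n):
            if v not in seen and residual[u][v] > 0:
                seen.add(v)
                q.append(v)
    return [(u, v) for u in range(n) for v in range(n)
            if u in seen and v not in seen and original[u][v] > 0]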

[TODO] Min-cost Max-flow 

finding the cheapest possible way of sending a given amount of flow through a network.

Bipartite Matching


Reference

2015-10-31

Randomized Algorithms


  • Make random choices during execution
  • Traditional (deterministic) algorithm analysis often studies the behavior of an algorithm on an "average" input, rather than a worst-case input.
  • Randomized algorithms do not require such assumptions about the nature of the input: the randomness is purely internal to the algorithm. This makes the underlying model more powerful.
  • 2 broad categories of randomized algorithms
    • Las Vegas randomized algorithms
      • use randomness to achieve good expected running times, e.g. randomized-quicksort
      • guarantee the output is always correct
    • Monte Carlo randomized algorithms
      • have a deterministic running time
      • the output is correct with a certain probability.

Randomized Divide-and-Conquer

the "divide" step is performed using randomization and use expectations of random variables to analyze the time.

Randomized-Selection

  • A more general description of finding-the-median problem is selection. Sometimes called Quick-Selection.
  • Given a set of n numbers S and a number k between 1 and n, Select(S, k) returns the k-th smallest element of S.
  • Basic Ideas of Select(s, k)
    • choose a splitter from s
    • partition S into S1 and S2, where elements of S1 are smaller than the splitter and elements of S2 are larger
    • if |S1| == k - 1 then splitter is the answer
    • if |S1| >= k then answer lies in S1, recursively call Select(s1, k)
    • if |S1| < k then answer lies in S2, recursively call Select(s2, k - |S1| - 1)
  • A good splitter is very important for making this algorithm run in O(n) time; it should split the set into subsets of approximately equal size.
  • The deterministic algorithm
    • partition the numbers into n/5 groups, so each group has 5 numbers
    • find the median of each group, giving n/5 medians
    • recursively find the median of the n/5 medians
    • use that median as the pivot to partition S into S1 and S2
    • recursively search either in S1 or S2
    • T(n) = O(n) + T(n/5) + T(max(|S1|, |S2|)) <= O(n) + T(n/5) + T(7n/10) => O(n)
  • Random Splitters
    • choose the splitter from S uniformly at random; it is "well-centered" with good probability
    • expected running time is O(n); the analysis is a little complicated...
    • randomized selection (aka quick-selection) is a Las Vegas randomized algorithm (a sketch follows this list)
  • There is also a randomized find-approximate-middle algorithm
    • pick an element from A uniformly at random (what a simple algorithm!)
    • this algorithm returns a correct answer with probability 8/10. It is a Monte Carlo randomized algorithm.
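
A minimal Python sketch of quick-selection with a random splitter, following the basic ideas above (k-th smallest, 1-indexed); handling duplicates of the splitter by counting them is my own detail:

import random

def quickselect(s, k):
    # Las Vegas: the answer is always correct, expected time is O(n)
    splitter = random.choice(s)
    s1 = [x for x in s if x < splitter]
    s2 = [x for x in s if x > splitter]
    mid = len(s) - len(s1) - len(s2)          # copies of the splitter itself
    if len(s1) >= k:
        return quickselect(s1, k)             # answer lies among the smaller elements
    if len(s1) + mid >= k:
        return splitter                       # the splitter is the answer
    return quickselect(s2, k - len(s1) - mid) # answer lies among the larger elements

print(quickselect([7, 1, 5, 3, 9], 3))    # 5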

Randomized-Quicksort

  • improve the Quicksort performance against the worst case instance
  • the only difference between randomized quicksort and deterministic quicksort is whether the pivot used to partition the input sequence is picked at random.
  • with a randomly picked pivot, the expected performance is good even on worst-case inputs.
  • For instance, with n = 10,000, the probability of X < n*ln(n) or X > 3n*ln(n) is less than 0.5%. (data from my instructor's lecture notes.)

Min-cut problem

  • Problem description: Given an undirected graph G=(V, E), a cut of G is a partition of V into 2 non-empty sets A and B; the min-cut problem asks for a cut that minimizes the number of edges crossing between A and B.
  • Deterministic Algorithm for this problem:
    • use the min-cost s-t cut solution for every pair of vertices
    • running time: O(V^5) (the Ford-Fulkerson algorithm for a min-cost s-t cut needs O(V.E) = O(V^3), and there are O(V^2) possible pairs)
    • so we can see the power of randomized algorithms.
  • Introduce a concept: edge contraction
    • pick any edge (u, v), and combine u and v into a single node u'
    • remove all edges between u and v
    • reattach all remaining edges incident to u or v so they are incident to u'
  • Randomized Algorithm of Min-cut (Karger's Algorithm)
    • while the graph has more than 2 vertices
      • randomly pick an edge to contract
    • return the remaining edges as the min-cut (see the sketch after this list)
    • n - 2 contractions in total, i.e. O(|V|) iterations
  • We can test that the algorithm above is not always correct. It is a Monte Carlo randomized algorithm
  • Let's look at the probability of a correct answer.
    • Say the min-cut has k edges; the question is: what are the odds that all k edges survive to the end?
    • first, the probability that the min-cut survives the first contraction is Pr(min-cut survives) = 1 - k/e   (e is the total number of edges of G)
    • observation-1: k <= the degree of every node
    • observation-2: (nk/2) <= e    ===> 1 - k/e >= 1 - 2/n = (n-2)/n
    • the odds that the min-cut survives all n-2 contractions are: >= (n-2)/n * (n-3)/(n-1) * ... * 1/3 = 2/(n(n-1)) = 1/C(n, 2)
    • if we execute C(n, 2) runs, the probability that every run gives an incorrect answer is at most: (1 - 1/C(n, 2))^C(n, 2) ~= 1/e ~= 1/2.7
    • as we can see, C(n, 2) runs still leave a pretty high probability of an incorrect answer.
    • if we execute ln(n).C(n, 2) runs, the probability of an incorrect answer drops to (1/e)^ln(n) = 1/n.
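
A minimal Python sketch of Karger's algorithm; using union-find bookkeeping (instead of physically merging nodes) is my own implementation choice:

import random

def karger_min_cut(edges, n):
    # edges: list of (u, v) pairs of a connected undirected graph on vertices 0..n-1
    parent = list(range(n))                # union-find tracks which super-node a vertex is in

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:                   # contract until only 2 super-nodes remain
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                       # skip self-loops inside a super-node
            parent[ru] = rv                # contract the chosen edge
            remaining -= 1
    # edges whose endpoints ended up in different super-nodes form the cut
    return [(u, v) for u, v in edges if find(u) != find(v)]

# one run can be wrong; repeat ~ln(n)*C(n,2) times and keep the smallest cut found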

Sorting

Comparison based sorting algorithms

  • Bubble sort
    • O(n^2), in-place, stable
  • Insertion sort
    • O(n^2), in-place, stable
    • in practice, it is faster for small inputs
    • cache friendly
  • Selection sort
    • O(n^2), in-place, unstable (in the typical array implementation)
  • Merge sort     
    • O(nlgn) time, extra space, stable
    • This algorithm can be used for data too big to fit into memory: load chunks into memory, sort each chunk, and finally merge the sorted pieces (external sorting).
  • Quick sort      
    • O(nlgn) expected time, in-place (O(lgn) stack space), unstable
    • 3 ways to choose pivot
      • always pick the first element: easy to implement but bad performance on sorted input.
      • choose the pivot uniformly at random from the input: by the randomized running-time analysis, the expected running time is O(nlgn) even for sorted input.
      • however, generating random numbers is not cheap; hence strategy 3: median-of-3 (left, right, middle), which is more practical and also has good performance.
  • Heap sort       
    • O(nlgn) time, in-place, unstable
    • a heap is a very good structure for extracting the max/min; heap sort is a nice application of the heap
  • BST sort
    • O(nlgn) time if we keep the tree balanced

Non-comparison based sorting algorithms

  • Counting sort
    • has to know the range of the input data; for instance, grades fall within 100, ages within 200, etc. This is also a limitation of the algorithm.
    • needs extra space, linear in the range, to count the occurrences of each value
  • Bucket sort
    • chop the data into small pieces, then use another algorithm to sort each piece. In this sense, one could argue it is arguably a comparison-based sorting algorithm.
    • assumes the data is uniformly distributed; otherwise this method doesn't make much sense.
    • divide-and-conquer methodology
    • degenerate to counting sort if each element has a bucket
  • Radix sort
    • sort on one digit at a time, distributing elements into queues (see the sketch after this list)
    • it is easiest to go through the digits from right to left (least significant digit first)
    • running time: O(kn), where k is the number of digits of the input
    • the algorithm also works if inputs have different numbers of digits: shorter numbers behave as if padded with leading zeros, so they come out before longer ones.
    • stable
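
A minimal Python sketch of LSD radix sort for non-negative integers; each pass is a stable distribution into base-10 buckets:

def radix_sort(a, base=10):
    digits = len(str(max(a))) if a else 0
    for d in range(digits):                   # least significant digit first
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // base ** d) % base].append(x)
        a = [x for b in buckets for x in b]   # stable concatenation of the queues
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))    # [2, 24, 45, 66, 75, 90, 170, 802]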

Why are there so many sorting algorithms?

  • depends on the size of the input (insertion sort is not that bad for a small amount of input; for N <= 20 it can be faster than quicksort)
  • depends on the data distribution (non-comparison sorting algorithms)
  • depends on memory limits (merge sort can be very useful for a huge amount of input)


2015-10-18

Graph - Topological Sort, DFS, BFS

max number of edges:
  • n(n-1)/2, for undirected graph
  • n(n-1), for directed graph.

Topological Sort

  1. only makes sense for a digraph, and moreover a DAG (Directed Acyclic Graph)
  2. there could be more than one topological sort for a graph
  3. the following is a natural way to get a topological sort (Kahn's algorithm); besides, we can use DFS to track the finishing order, and the reverse of the finishing order is a topological sort.
  • ALGORITHM: O(n + e) (= O(e), since e > n generally speaking)

    find the nodes with indegree 0 and put them into Q     // O(n + e)
    while Q is not empty
        n = dequeue(Q)
        for each edge starting from n                      // O(e) over the whole run
            n.adj_node.indegree--
            if n.adj_node.indegree == 0
                enqueue(Q, n.adj_node)

BFS

  1. applications of BFS
    • testing Bipartite
    • Ford-Fulkerson algorithm for maximum flow problem
    • Garbage Collection (refer to Cheney's algorithm)
    • detecting cycle (DFS also works)
    • path finding (DFS also works)
    • Prim's MST and Dijkstra's single-source shortest path use a structure similar to BFS.
    • find all neighbor nodes in p2p networks, like BitTorrent
    • find people within a given distance 'k' from a person in social networking websites
bfs(G, s)               // O(n + e)
    unmark all vertices
    mark s              // start from here
    enqueue(Q, s)
    while Q is not empty
        u = dequeue(Q)
        for each edge (u, v)
            if v is not marked
                mark v
                enqueue(Q, v)

DFS

  1. more broadly used than bfs, since it gives more information
  2. differs from BFS, which uses a queue: DFS uses a stack to keep the backtracking info
  3. two useful things we can track: the start time and the finishing time; we can do that by pushing the parent node info onto the stack together with each node
  4. applications
    • topological sort is the reverse of the finishing order
    • detecting cycle in a graph is another application of dfs
    • for an unweighted graph, a DFS traversal produces a spanning tree, which is trivially minimum since all edges have equal weight (shortest paths in an unweighted graph, however, come from BFS).
    • find strongly connected components: subgraphs in which every vertex is reachable from every other vertex.
dfs(G, s)
    mark all vertices as unvisited
    push s into stack S
    while S is not empty
        u = pop(S)
        if u is visited, continue
        mark u as visited
        for all edges (u, v)
            if v is not visited, push(S, v)

Testing Bipartite (Application of BFS)

Def. An undirected graph G is bipartite if the nodes can be colored blue or white such that every edge has one blue and one white end.
Corollary. A graph is bipartite iff it contains no odd-length cycle. That is, in a BFS layering no edge joins two nodes of the same layer, which makes BFS a good fit for testing bipartiteness (see the sketch below).
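
A minimal Python sketch of the BFS 2-coloring test; the dict-based adjacency format is my own assumption:

from collections import deque

def is_bipartite(graph):
    # graph: dict of vertex -> list of neighbors (undirected)
    color = {}
    for s in graph:                           # handle disconnected graphs
        if s in color:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in color:            # next BFS layer gets the opposite color
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:    # edge within one layer: odd cycle
                    return False
    return True

print(is_bipartite({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}))    # True (a 4-cycle)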

Strongly Connected Components (Application of DFS)

Definition

Given a digraph G=(V, E), a Strongly Connected Component is a maximal subgraph in which every vertex is reachable from every other vertex.

Observation

Graph G and its transpose graph have the same Strongly Connected Components: vertices u and v are mutually reachable in G iff they are mutually reachable in the transpose graph.

Algorithm

  • DFS(G) computes the finishing time of each vertex and produces a DFS forest
  • compute GT (the transpose of G: reverse all edges in G)
  • DFS(GT), but consider the vertices in decreasing order of finishing time
  • the vertices of each DFS tree in the forest of DFS(GT) form a Strongly Connected Component.
  • example (from CLRS book)
  • why must the second DFS run on the transpose graph? Consider only nodes {c, d, g} in the above graph: if we don't switch to the transpose graph and pick c to start the second DFS, we would conclude that {c, d, g} is strongly connected, but it is not. (a sketch of the whole algorithm follows)
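
A minimal Python sketch of the whole algorithm (Kosaraju's algorithm); the dict-based graph format is my own assumption:

def kosaraju(graph):
    # graph: dict of u -> list of v
    order, seen = [], set()

    def dfs(g, u, out):
        seen.add(u)
        for v in g.get(u, []):
            if v not in seen:
                dfs(g, v, out)
        out.append(u)                 # record u when finished: finishing order

    for u in graph:                   # first pass on G
        if u not in seen:
            dfs(graph, u, order)

    gt = {}                           # transpose: reverse every edge
    for u in graph:
        for v in graph[u]:
            gt.setdefault(v, []).append(u)

    seen, sccs = set(), []
    for u in reversed(order):         # second pass on GT, decreasing finish time
        if u not in seen:
            component = []
            dfs(gt, u, component)
            sccs.append(component)
    return sccs

print(kosaraju({'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': []}))
# [['b', 'c', 'a'], ['d']] -- the 3-cycle and the lone vertex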
