COMP 150: Data structures, algorithms, graphs



This page is meant to be accessed only by students in comp150, or by permission of the instructor. Please keep the link private.



About videos: Low-priority edits (mainly gluing videos) need to be done for sections: 9, 12, 19, 21. They are still fine to watch in their current state.
Still entirely under construction and not yet part of the course: 6, 15, 16, 27.

All links below point to publicly available web content, or a mirror of such content, and/or have been added with permission of the owner.



Some general resources:

Sections
  1. Binomial heaps, Fibonacci heaps, Quake heaps
  2. AVL and WAVL trees
  3. Optimal BSTs (static)
  4. Splay trees
  5. Geometry of BSTs (and dynamic optimality)
  6. Skip lists (no video yet; notes to be refreshed at some point)
  7. Expected (max) depth of a randomly built BST
  8. Fractional cascading
  9. High-dimensional range counting
  10. All-pairs shortest paths
  11. Bipartite matching
  12. Network Flow
  13. Linear Programming (focusing on 2D)
  14. String matching I: the Z algorithm
  15. String matching II: edit distance (to do)
  16. String matching III: Suffix trees (no video yet; notes to be refreshed)
  17. Polygonal path simplification
  18. Spanners I: Theta graphs
  19. Spanners II: Chew's algorithm (high level description)
  20. Ramsey numbers
  21. The Probabilistic Method
  22. Planarity and crossing number
  23. Vertex coloring
  24. Edge coloring
  25. BB-alpha trees and Scapegoat trees (no video yet)
  26. Huffman codes
  27. Other


  1. Binomial heaps, Fibonacci heaps, Quake heaps

  2. AVL and WAVL trees
    • Prerequisite knowledge: BST basics (insertion, deletion, rotation; a minimal rotation sketch appears at the end of this section). Familiarity with red-black trees is a bonus.
    • Class notes
    • Video:
      • AVL: no video available. It is fine to skip this and do WAVL instead.
      • WAVL: edited version [32min]
    • Links and resources
    • Extensions (and possible project topics)
      • Relaxed AVL trees
      • Top-down rebalancing (red-black, AVL, WAVL)
      • Deletion details (comparison of red-black and WAVL, lazy delete)
      • Amortized number of promotions in AVL
      • Amortized rotation cost in AVL
      • Description of AVL, WAVL and red-black trees as rank-balanced (see 3rd link, above)
      • Other balanced trees (not AVL)
        • Left-leaning red-black trees (Sedgewick), building on earlier work by Bayer (1971) and Andersson (1993): wiki
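    • Code sketch: since rotations are the main prerequisite, here is a minimal Python sketch of a single right rotation with AVL-style height bookkeeping. The node layout and names are illustrative choices (and WAVL uses ranks rather than heights), so treat it only as a reminder of the mechanics.

        def height(node):
            # height of an empty subtree is 0, of a single node is 1
            return node.height if node else 0

        class Node:
            def __init__(self, key, left=None, right=None):
                self.key, self.left, self.right = key, left, right
                self.height = 1 + max(height(left), height(right))

        def rotate_right(y):
            # y's left child x becomes the root of this subtree;
            # x's former right subtree becomes y's new left subtree
            x = y.left
            y.left, x.right = x.right, y
            y.height = 1 + max(height(y.left), height(y.right))
            x.height = 1 + max(height(x.left), height(x.right))
            return x    # the caller re-attaches x where y used to be

        # A left-leaning path 3 - 2 - 1 becomes balanced after one right rotation at 3.
        root = rotate_right(Node(3, left=Node(2, left=Node(1))))
        print(root.key, height(root))   # -> 2 2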

  3. Optimal BSTs (restricted to searching, with all search frequencies known in advance)
    • Prerequisite knowledge: dynamic programming, and definition of expected value
    • Class notes
    • Video:
    • Links and resources
      • CLRS chapter 15.
      • Knuth's paper (see below)
      • wiki
    • Extensions (and possible project topics)
      • Knuth's quadratic-time algorithm appears in the same paper as the original cubic-time dynamic program (linked above); a minimal sketch of the cubic-time DP is given at the end of this section.
      • Hu and Tucker quadratic time / linear space result for zero-weight keys. See also "alphabetic trees".
      • Knuth's O(n log n)-time improvement of Hu & Tucker. See the chapter 15 end notes in CLRS.
      • Mehlhorn's approximation algorithm
      • Nearly optimal BSTs: greedy algorithm
      • Static optimality theorem, and dynamic optimality conjecture (links are in the splay tree section below)
      • Lazy finger model
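    • Code sketch: a minimal Python version of the cubic-time dynamic program (the one Knuth's paper improves to quadratic), assuming only successful-search frequencies (no dummy keys as in CLRS); names are illustrative.

        def opt_bst_cost(freq):
            # freq[i] = access frequency (or probability) of key i, keys in sorted order.
            # Returns the minimum total weighted depth (root counted at depth 1).
            n = len(freq)
            cost = [[0] * n for _ in range(n)]     # cost[i][j]: best cost for keys i..j
            weight = [[0] * n for _ in range(n)]   # weight[i][j]: sum of freq[i..j]
            for i in range(n):
                cost[i][i] = weight[i][i] = freq[i]
            for length in range(2, n + 1):
                for i in range(n - length + 1):
                    j = i + length - 1
                    weight[i][j] = weight[i][j - 1] + freq[j]
                    best = float('inf')
                    for r in range(i, j + 1):      # try every key r as the root
                        left = cost[i][r - 1] if r > i else 0
                        right = cost[r + 1][j] if r < j else 0
                        best = min(best, left + right)
                    cost[i][j] = best + weight[i][j]
            return cost[0][n - 1]

        print(opt_bst_cost([2, 3, 5]))   # -> 17 (root = middle key: 3*1 + 2*2 + 5*2)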

  4. Splay trees
  5. Geometry of BSTs (and dynamic optimality)
    • Prerequisite knowledge: It helps to understand the definition of optimal BSTs (see above).
    • Class notes:
    • Video:
      • Edited version [54min]
      • Summary of BST optimality
        Note: when going over the summary table (bottom-left box) I say "conjectured" for splay trees, but the box itself claimed O(OPT). I have since changed that box in the course notes to say "conjectured", matching what's written on the right. Splay trees are O(OPT) when OPT is the cost of the best static tree; they are only conjectured to be O(OPT) when OPT is the cost of the best dynamic tree, i.e., one that may rotate between accesses.
    • Links
    • Extensions (and possible projects)
      • John Iacono's paper on dynamic optimality
      • wiki on Iacono's working set structure
      • Dynamic finger property, level-linked trees (see 13:00 in the MIT video above)
      • Badoiu et al. paper on unifying the working set and dynamic finger properties.
      • Bose et al. paper on layered working-set trees.
      • Combining BSTs. Among several results, this shows that, for dynamic trees, restarting each search at the root doesn't create a disadvantage compared to resuming a search where the previous one ended.
      • Followup MIT video. It shows lower bounds for number of points needed to make a grid pattern valid, and describes Tango trees.
      • wiki on Tango trees

  6. Skip lists
  7. Expected (max) depth of a randomly built BST.
    • Prerequisite knowledge: expected value, indicator random variables
    • Class notes:
    • Video
    • Links:
      • MIT videos (find Lecture 9, starting from 22:30)
      • CLRS: Chapter 12, p.299-302.
    • Extensions (and possible project topics)
      • Look up Devroye's precise bound on the expected height (asymptotically about 4.311 ln n); compare with the simulation sketch at the end of this section.
      • Determine the probability that a random BST has ω(log n) depth. Analyze the variance of the depth. See work by Devroye.
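    • Code sketch: a small simulation to compare against the bounds from class. The constants (about 2 ln n for the average depth, and Devroye's ~4.311 ln n for the expected height) are asymptotic, so expect ballpark agreement only; names are illustrative.

        import math, random

        def random_bst_depths(n):
            # Insert a uniformly random permutation of 0..n-1 into a plain (unbalanced)
            # BST and return the depth at which each key ends up (root has depth 0).
            root = None                       # a node is a list: [key, left, right]
            depths = []
            for key in random.sample(range(n), n):
                if root is None:
                    root = [key, None, None]
                    depths.append(0)
                    continue
                node, depth = root, 0
                while True:
                    depth += 1
                    i = 1 if key < node[0] else 2
                    if node[i] is None:
                        node[i] = [key, None, None]
                        depths.append(depth)
                        break
                    node = node[i]
            return depths

        n = 10_000
        d = random_bst_depths(n)
        print("average depth:", sum(d) / n, "  vs 2 ln n =", 2 * math.log(n))
        print("maximum depth:", max(d), "  vs 4.311 ln n =", 4.311 * math.log(n))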

  8. Fractional cascading
    • Prerequisite knowledge: none
    • Class notes
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Find applications. There are several in Chazelle's papers, but a more up-to-date search would be nice.
      • Dynamic version (if not done in class). Could start with the wiki.
      • Extension to general non-path like dependencies.

  9. High-dimensional range counting
  10. All-pairs shortest paths
  11. Bipartite matching
    • Class notes:
    • Video:
    • Links
      • What I cover is found in the Graph Theory book by Diestel. Please ask me if you need to see a copy. There are many resources online; please send me your favorite ones and I'll add them here. (A minimal augmenting-path sketch is given at the end of this section.)
      • Hall's theorem wiki
    • Extensions (and possible project topics)
      • Look into matchings in non-bipartite graphs, algorithms that use network flow, and algorithms that don't.
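    • Code sketch: a minimal version of the augmenting-path algorithm for maximum bipartite matching (often attributed to Kuhn). The adjacency-list representation and names are illustrative choices, not necessarily the exact presentation from class.

        def max_bipartite_matching(adj, n_right):
            # adj[u] = list of right-side vertices adjacent to left vertex u.
            match_right = [-1] * n_right      # match_right[v] = left partner of v, or -1

            def try_augment(u, seen):
                for v in adj[u]:
                    if v in seen:
                        continue
                    seen.add(v)
                    # v is free, or its current partner can be re-matched elsewhere
                    if match_right[v] == -1 or try_augment(match_right[v], seen):
                        match_right[v] = u
                        return True
                return False

            matching = 0
            for u in range(len(adj)):
                if try_augment(u, set()):
                    matching += 1
            return matching

        # Example: left {0,1,2}, right {0,1}; the maximum matching has size 2.
        print(max_bipartite_matching([[0, 1], [0], [1]], 2))   # -> 2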

  12. Network Flow
    • Prerequisite knowledge: None, for the simple intro that we do. It's nice to look at bipartite matching before this section.
    • Class notes
    • Video:
      • Link [76min]
      • Around 1:08:25... it's actually 2M iterations, not 1M, but the point is made.
    • Links
      • CLRS covers network flow in chapter 26. There is a lot more detail, and more advanced algorithms are given as well.
      • Example showing how the Ford-Fulkerson method might not terminate or even converge, if capacities are irrational and we are not careful about how to find augmenting paths.
    • Extensions (and possible project topics)
      • Details of any algorithm (e.g., some specific implementation of Ford-Fulkerson, or something entirely different); a minimal BFS-based (Edmonds-Karp) sketch is given after this list.
      • Find applications. In particular one of my sources on the probabilistic method mentions using network flow to get an algorithm (need to look this up again).
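    • Code sketch: a minimal Edmonds-Karp implementation (Ford-Fulkerson with BFS, i.e., shortest augmenting paths), assuming an adjacency-matrix capacity representation. This is one simple variant, not necessarily the one from the notes.

        from collections import deque

        def max_flow(cap, s, t):
            # cap is an n x n matrix of capacities; returns the value of a maximum s-t flow.
            n = len(cap)
            flow = [[0] * n for _ in range(n)]
            total = 0
            while True:
                # BFS in the residual graph to find a shortest augmenting path
                parent = [-1] * n
                parent[s] = s
                q = deque([s])
                while q and parent[t] == -1:
                    u = q.popleft()
                    for v in range(n):
                        if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                            parent[v] = u
                            q.append(v)
                if parent[t] == -1:          # no augmenting path: current flow is maximum
                    return total
                # bottleneck residual capacity along the path
                bottleneck = float('inf')
                v = t
                while v != s:
                    u = parent[v]
                    bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
                    v = u
                # augment along the path
                v = t
                while v != s:
                    u = parent[v]
                    flow[u][v] += bottleneck
                    flow[v][u] -= bottleneck
                    v = u
                total += bottleneck

        # Example: two disjoint s-t paths of capacity 1 each -> max flow 2.
        c = [[0, 1, 1, 0],
             [0, 0, 0, 1],
             [0, 0, 0, 1],
             [0, 0, 0, 0]]
        print(max_flow(c, 0, 3))   # -> 2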

  13. Linear Programming (focusing on 2D)
    • Class notes
    • Video:
    • Links
      • The paper by Nimrod Megiddo containing the algorithm for LP in 2D (section 2). It also contains extensions and applications.
      • Another paper by Megiddo on LP for any fixed dimension.
      • See section 4 of this link for the 2D LP algorithm. See also this applet but ignore the description.
    • Further comments
      • An important lesson of this lecture is that failing, as long as it removes a constant fraction of the data, is usually as good as succeeding: the resulting recurrence T(n) = T(cn) + O(n) with c < 1 solves to O(n). Another classic example of this idea is linear-time median computation (a randomized selection sketch is given at the end of this section).
    • Extensions (and possible project topics)
      • Higher dimensional linear programming is a fundamental topic in CS. There is plenty to research here: the simplex algorithm, duality and its practical interpretation, interior point methods, integer programming, stochastic linear programming, etc.
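    • Code sketch: the same "discard a constant fraction" idea, in the context of the median example above: randomized quickselect, which in expectation throws away a constant fraction of the data per round (the deterministic median-of-medians variant achieves this in the worst case). Names are illustrative.

        import random

        def quickselect(items, k):
            # Returns the k-th smallest element (k = 0 is the minimum).  Each round keeps
            # only one side of a random pivot, so in expectation a constant fraction of
            # the data is discarded; the expected running time is O(n).
            assert 0 <= k < len(items)
            while True:
                pivot = random.choice(items)
                lo = [x for x in items if x < pivot]
                eq = [x for x in items if x == pivot]
                hi = [x for x in items if x > pivot]
                if k < len(lo):
                    items = lo
                elif k < len(lo) + len(eq):
                    return pivot
                else:
                    k -= len(lo) + len(eq)
                    items = hi

        print(quickselect([7, 1, 5, 3, 9, 2], 2))   # -> 3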

  14. String matching (Z algorithm)
    • Class notes
    • Video:
    • Links
      • The Z algorithm is described nicely in the textbook by Gusfield. A copy of the section on Z is provided here. (A minimal implementation sketch is given at the end of this section.)
      • I was initially going to do the KMP algorithm, but Gusfield's book convinced me to do Z instead. Still, KMP and Boyer-Moore offer some advantages over Z when the problem is extended (e.g., online matching); see Gusfield's book for an explanation (the link above might not contain it though, so ask me). There are also extensions of KMP and Boyer-Moore, and the book is packed with all kinds of string problems to explore.
    • Extensions (and possible project topics)
      • KMP, and in particular its advantage over Z.
      • Boyer-Moore
      • Other string matching algorithms and specialized problems.
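    • Code sketch: a minimal implementation of the Z array, plus pattern matching via the Z array of pattern + separator + text, following the standard description. The names and the choice of separator character are illustrative (the separator must not occur in either string).

        def z_array(s):
            # Z[i] = length of the longest substring starting at i that matches a prefix of s.
            n = len(s)
            z = [0] * n
            z[0] = n
            l, r = 0, 0                      # [l, r) = rightmost prefix-matching window so far
            for i in range(1, n):
                if i < r:                    # reuse information from the current window
                    z[i] = min(r - i, z[i - l])
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1                # extend the match explicitly
                if i + z[i] > r:             # update the window
                    l, r = i, i + z[i]
            return z

        def find_occurrences(pattern, text, sep='\x00'):
            # All starting indices of pattern in text, via the Z array of pattern+sep+text.
            z = z_array(pattern + sep + text)
            m = len(pattern)
            return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]

        print(find_occurrences("aba", "ababa"))   # -> [0, 2]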

  15. String matching (edit distance)
    • Prerequisite knowledge: Dynamic programming
    • Class notes
      • TO DO
    • Video:
      • TO DO
    • Links
    • Extensions (and possible project topics)
      • To add here once the class notes are made; Gusfield's book will have several extensions. In the meantime, a generic DP sketch is given below.
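    • Code sketch: since the notes aren't written yet, this is only a generic textbook sketch of the O(mn) edit distance DP (Levenshtein distance with unit costs), not necessarily the variant that will be covered in class.

        def edit_distance(a, b):
            # Minimum number of insertions, deletions and substitutions turning a into b.
            m, n = len(a), len(b)
            prev = list(range(n + 1))          # prev[j] = distance between a[:i-1] and b[:j]
            for i in range(1, m + 1):
                cur = [i] + [0] * n
                for j in range(1, n + 1):
                    cur[j] = min(prev[j] + 1,                          # delete a[i-1]
                                 cur[j - 1] + 1,                       # insert b[j-1]
                                 prev[j - 1] + (a[i - 1] != b[j - 1])) # substitute / match
                prev = cur
            return prev[n]

        print(edit_distance("kitten", "sitting"))   # -> 3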

  16. Suffix trees (more impressive string matching, and more)
  17. Polygonal path simplification.
    • Prerequisite knowledge: graph basics. BFS and DAGs are mentioned.
    • Class notes:
    • Video:
    • Links
      • The "Iterative Endpoints Fit" algorithm is also called the "Ramer-Douglas-Peucker" algorithm. See the wiki. In the references are the original two papers, plus a speed-up by Hershberger and Snoeyink.
      • I still need to find the original Iri-Imai paper (and/or other sources about it, and possible improvements).
    • Extensions (and possible project topics)
      • dynamic programming approach
      • Sub-cubic (faster than the naive O(n^3)) algorithm for the parallel strip distance, with preprocessing
      • Several other improvements exist that could be added to the list.
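    • Code sketch: a minimal recursive Ramer-Douglas-Peucker implementation (the classic formulation from the wiki, using perpendicular distance to the line through the endpoints); names and the distance choice are illustrative.

        import math

        def rdp(points, eps):
            # Simplify a polyline (list of (x, y) tuples): keep the endpoints, find the
            # point farthest from the segment joining them, and recurse on both halves
            # if that distance exceeds eps.
            if len(points) < 3:
                return list(points)
            (x1, y1), (x2, y2) = points[0], points[-1]
            seg_len = math.hypot(x2 - x1, y2 - y1)

            def dist(p):
                # perpendicular distance from p to the line through the two endpoints
                if seg_len == 0:
                    return math.hypot(p[0] - x1, p[1] - y1)
                return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

            i_max = max(range(1, len(points) - 1), key=lambda i: dist(points[i]))
            if dist(points[i_max]) <= eps:
                return [points[0], points[-1]]
            left = rdp(points[:i_max + 1], eps)
            right = rdp(points[i_max:], eps)
            return left[:-1] + right           # avoid repeating the split point

        print(rdp([(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)], 1.0))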

  18. Spanners I: Theta graphs
    • Class notes:
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Explore heuristic improvements?
      • Construction of Theta graphs (see p.63-69 in the book above)
      • Other spanners. See book, especially section 20.3
      • Solve an open problem. See section 20 in the book.

  19. Spanners II: Chew's algorithm
    • Prerequisite knowledge: graph basics. Planarity and Delaunay triangulations are mentioned.
    • Class notes:
    • Video:
    • Links
      • Tech report by Paul Chew (should be equivalent to his conference publication), on the empty L1-circle spanner.
      • Link to the journal version of Chew's work, where the empty circle is actually an equilateral triangle. Improves the L1 result. For a copy of the paper, please ask.
      • wiki on geometric spanners.
      • A lower bound on the stretch factor of the Delaunay triangulation. If you attend this lecture in class, this is the "coffee bet" result that I mention (it won't be on video).
    • Extensions (and possible project topics)
      • Details of Chew's journal version
      • See previous section: there are a lot of results and open problems on spanners. Make a survey.

  20. Ramsey numbers: cliques vs independent sets
    • Prerequisite knowledge: none
    • Class notes
    • Video:
    • Links
      • Ramsey's theorem wiki
      • Look up Ramsey numbers, Ramsey's theorem, and Ramsey theory
      • Graph theory book by Diestel (or many others)
      • See section on the probabilistic method for another result on Ramsey numbers
    • Extensions (and possible project topics)
      • Find R(5,5). Ok, just kidding. But you could look into how the known bounds for small or large Ramsey numbers have been obtained. (A tiny sketch of the classic probabilistic lower bound is given below.)
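    • Code sketch: related to the previous bullet (and to the probabilistic method section below), a tiny computation of the classic Erdős lower bound: R(k,k) > n whenever C(n,k) * 2^(1 - C(k,2)) < 1. For small k this is far from the best known bounds; the point is the 2^(k/2)-type growth.

        from math import comb

        def erdos_lower_bound(k):
            # Largest n with C(n, k) * 2**(1 - C(k, 2)) < 1.  For such n, a uniformly
            # random red/blue coloring of the edges of K_n has, in expectation, fewer
            # than one monochromatic K_k, so some coloring has none: R(k, k) > n.
            n = k
            while comb(n + 1, k) < 2 ** (comb(k, 2) - 1):
                n += 1
            return n

        for k in range(3, 8):
            print("R(%d,%d) > %d" % (k, k, erdos_lower_bound(k)))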

  21. The Probabilistic Method
    • Prerequisite knowledge: expected value, linearity of expectation, indicator random variables (IRVs): everything here
    • Class notes
    • Video:
    • Links
      • Books (ask if you need to borrow)
        • An Invitation to Discrete Mathematics. Second Edition. Jiri Matousek and Jaroslav Nesetril
        • The Probabilistic Method. Noga Alon and Joel H. Spencer
        • The Probabilistic Method (Lecture Notes). Jiri Matousek and Jan Vondrak
    • Extensions (and possible project topics)
      • We have splashed around in the shallow end of the pool. Deep sea diving is available in the two books called "The probabilistic method", mentioned just above.

  22. Planarity and crossing number
    • Corequisite knowledge: for the crossing number, it helps to have understood some basics about the probabilistic method.
    • Class notes:
      • Euler's formula for planar graphs, and implications: Euler.pdf
      • Extension to non-planar graphs: crossing-number.pdf (a tiny numeric check of these bounds is given at the end of this section)
      • Future addition: applications
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Explain applications of the crossing number.
      • Polynomial-time algorithms for deciding if a graph is planar (or if it has a low crossing number)
      • Hardness proof for finding the crossing number in general
      • Algorithms for "untangling" graphs
      • Kuratowski's theorem and similar material
      • Anything to do with planarity, its applications, specialized proofs for subclasses of graphs, cases where planar graphs permit faster algorithms than in general...
      • Look into the field of graph drawing.
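    • Code sketch: a tiny numeric check of the bounds from the two sets of notes above, assuming simple connected graphs; K5 and K3,3 are the classic examples. Function names are illustrative.

        def euler_bound_says_nonplanar(v, e, girth=3):
            # Consequence of Euler's formula: a simple connected planar graph with
            # girth g (length of its shortest cycle) satisfies e <= g/(g-2) * (v - 2).
            # If the inequality fails, the graph cannot be planar; if it holds, the
            # test says nothing.
            return e > girth / (girth - 2) * (v - 2)

        def crossing_lower_bound(v, e):
            # From the same counting argument: cr(G) >= e - 3v + 6 (for v >= 3).
            return max(0, e - 3 * v + 6)

        print(euler_bound_says_nonplanar(5, 10))           # K5:   10 > 3*(5-2) = 9 -> True
        print(euler_bound_says_nonplanar(6, 9, girth=4))   # K3,3:  9 > 2*(6-2) = 8 -> True
        print(crossing_lower_bound(6, 15))                 # K6: bound is 15 - 18 + 6 = 3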

  23. Vertex coloring
    • Prerequisite knowledge: Euler's formula for planar graphs (see directly above)
    • Class notes:
    • Video:
      • Edited version [30min]
        Note: this is a continuation of the Euler formula video.
    • Links
    • Extensions (and possible project topics)
      • Efficiently determining chromatic number, given the knowledge that it will be constant (but greater than 2)
      • Hardness of determining chromatic number in general
      • Three-coloring triangle-free graphs
      • Efficiently determining chromatic number for special classes of graphs.
      • Applications of vertex coloring
      • Approximations of optimal coloring (a generic greedy sketch using at most Delta+1 colors is given below)
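    • Code sketch: the generic greedy coloring mentioned in the last bullet. It is only a baseline (Delta+1 colors can be far from the chromatic number), and the representation and names are illustrative.

        def greedy_coloring(adj):
            # Color vertices one by one with the smallest color not used by an
            # already-colored neighbor; uses at most max_degree + 1 colors.
            # adj[v] = iterable of neighbors of v.
            color = {}
            for v in adj:
                used = {color[u] for u in adj[v] if u in color}
                c = 0
                while c in used:
                    c += 1
                color[v] = c
            return color

        # Example: a 5-cycle needs 3 colors; greedy (in this vertex order) also uses 3.
        cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
        print(greedy_coloring(cycle5))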

  24. Edge coloring
    • Class notes:
    • Video
    • Links
      • wiki: edge coloring
      • Vizing's theorem
        • The theorem is found in several texts on graph theory. For instance, "Graph Theory" by Reinhard Diestel. In addition, here are some links that can be found online.
        • Notes by Michele Zito, that closely match what I do in class. In fact here is another description, apparently partially inspired from the previous one.
        • A different proof from the homepage of Lex Schrijver.
        • Another proof on planet math.org.
        • This presentation uses induction on the number of vertices and is based on "Short Proofs of Classical Theorems" by Adrian Bondy; I have a copy if anyone is interested. That paper contains other theorems that might be worth a look (possible project). Note that the proof in the presentation is not rigorous.
    • Extensions (and possible project topics)
      • Different proofs of Vizing's theorem
      • Efficient algorithms for determining edge-chromatic number for specific classes of graphs
      • Constructively approximating optimal coloring. Algorithms that use Delta+1 colors.
      • On the duality of edge coloring and vertex coloring: given a graph G, place a dual vertex on every edge of G, and connect two dual vertices whenever their corresponding edges of G share an endpoint; the result is called the line graph of G. Edge-coloring G is the same as vertex-coloring its line graph (a tiny sketch of this is given below). Some care is needed, though: not all graphs are line graphs, and since the numbers of vertices and edges change, algorithmic results might not transfer directly when you dualize. It would be a good exercise to prove (or even just state) some of the results we obtain in class in terms of this duality; this would make a nice project topic.
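    • Code sketch: a tiny illustration of the line-graph duality described above: build the line graph and edge-color G by greedily vertex-coloring it. The edge representation and names are illustrative, and greedy does not in general achieve Delta or Delta+1 colors.

        from itertools import combinations

        def line_graph(edges):
            # Vertices of the line graph are the edges of G; two are adjacent iff the
            # corresponding edges of G share an endpoint.
            adj = {e: set() for e in edges}
            for e, f in combinations(edges, 2):
                if set(e) & set(f):
                    adj[e].add(f)
                    adj[f].add(e)
            return adj

        def greedy_edge_coloring(edges):
            # Edge-color G by greedily vertex-coloring its line graph (see section 23).
            # The result is a legal edge coloring, but not necessarily an optimal one.
            adj = line_graph(edges)
            color = {}
            for e in adj:
                used = {color[f] for f in adj[e] if f in color}
                c = 0
                while c in used:
                    c += 1
                color[e] = c
            return color

        # Example: a triangle plus a pendant edge (Delta = 3).
        print(greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (2, 3)]))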

  25. BB-alpha trees and Scapegoat trees
    • Class notes:
    • Video
      • None yet.
    • Links
      • BB-alpha trees
        • wiki. Contains useful links (e.g., the papers by Nievergelt and Reingold, and by Blum and Mehlhorn).
        • Analysis of height
        • amortized analysis for insertion
        • Notes by Michiel Smid. He uses both the accounting and potential methods to amortize, in full detail. An application of BB-alpha trees is mentioned on the last page. I would like to add the reference to this list.
        • Still need to resolve why we can amortize easily for alpha=1/3 but not for 1/2, even though there is the result saying alpha > 1/3 implies 1/2
      • Scapegoat trees
        • Open Data Structures by Pat Morin. See section 8. He uses the accounting method for amortizing. I use the potential method in my notes.
        • paper by Galperin and Rivest.
        • paper by Andersson.
        • wiki

  26. Huffman codes
    • Class notes:
    • Video
    • Links
      • wiki
      • CLRS discusses the basics in chapter 16.3
      • More details can be found in Introduction to Data Compression, by Khalid Sayood. I have a copy to lend, and it's easy to find online. (A minimal heap-based sketch is given below.)
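    • Code sketch: a minimal heap-based construction of a Huffman code; the representation and names are illustrative, and the example frequencies are the ones used in CLRS 16.3.

        import heapq

        def huffman_codes(freq):
            # Build a Huffman code by repeatedly merging the two least-frequent subtrees.
            # freq: dict symbol -> weight; returns dict symbol -> bit string.
            # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
            heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
            heapq.heapify(heap)
            if len(heap) == 1:
                return {sym: "0" for sym in freq}      # degenerate single-symbol case
            count = len(heap)
            while len(heap) > 1:
                w1, _, c1 = heapq.heappop(heap)
                w2, _, c2 = heapq.heappop(heap)
                merged = {s: "0" + code for s, code in c1.items()}
                merged.update({s: "1" + code for s, code in c2.items()})
                heapq.heappush(heap, (w1 + w2, count, merged))
                count += 1
            return heap[0][2]

        # Codeword lengths for this instance come out as 1, 3, 3, 3, 4, 4.
        print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))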

  27. Other topics of interest