COMP 150: Data structures, algorithms, graphs



This page is meant to be accessed only by students in comp150, or by permission of the instructor. Please keep the link private.



About videos: Low-priority edits (mainly gluing videos) need to be done for sections: 9, 12, 19, 21. They are still fine to watch in their current state.
Still entirely under construction and not yet part of the course: 6, 15, 16, 27.

All links below point to publicly available web content, or a mirror of such content, and/or have been added with permission of the owner.



Some general resources:

Sections
  1. Binomial heaps, Fibonacci heaps, Quake heaps
  2. AVL and WAVL trees
  3. Optimal BSTs (static)
  4. Splay trees
  5. Geometry of BSTs (and dynamic optimality)
  6. Skip lists (no video yet; notes to be refreshed at some point)
  7. Expected (max) depth of a randomly built BST
  8. Fractional cascading
  9. High-dimensional range counting
  10. All-pairs shortest paths
  11. Bipartite matching
  12. Network Flow
  13. Linear Programming (focusing on 2D)
  14. String matching I: the Z algorithm
  15. String matching II: edit distance (to do)
  16. String matching III: Suffix trees (no video yet; notes to be refreshed)
  17. Polygonal path simplification
  18. Spanners I: Theta graphs
  19. Spanners II: Chew's algorithm (high level description)
  20. Ramsey numbers
  21. The Probabilistic Method
  22. Planarity and crossing number
  23. Vertex coloring
  24. Edge coloring
  25. BB-alpha trees and Scapegoat trees (no video yet)
  26. Huffman codes
  27. Other


  1. Binomial heaps, Fibonacci heaps, Quake heaps

  2. AVL and WAVL trees
    • Prerequisite knowledge: BST basics (insertion, deletion, rotation; a minimal rotation sketch appears at the end of this section). Familiarity with red-black trees is a bonus.
    • Class notes
    • Video:
      • AVL: no video available. It is fine to skip this and do WAVL instead.
      • WAVL: edited version [32min]
    • Links and resources
    • Extensions (and possible project topics)
      • Relaxed AVL trees
      • Top-down rebalancing (red-black, AVL, WAVL)
      • Deletion details (comparison of red-black and WAVL, lazy delete)
      • Amortized number of promotions in AVL
      • Amortized rotation cost in AVL
      • Description of AVL, WAVL and red-black trees as rank-balanced (see 3rd link, above)
      • Other balanced trees (not AVL)
        • Left-leaning red-black trees (Sedgewick), building on earlier work by Bayer (1971) and Andersson (1993): wiki
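    • Code sketch: since rotations are the main prerequisite, here is a minimal Python sketch of a single right rotation with AVL-style height bookkeeping. The node layout and names are illustrative choices (and WAVL uses ranks rather than heights), so treat it only as a reminder of the mechanics.

        def height(node):
            # height of an empty subtree is 0, of a single node is 1
            return node.height if node else 0

        class Node:
            def __init__(self, key, left=None, right=None):
                self.key, self.left, self.right = key, left, right
                self.height = 1 + max(height(left), height(right))

        def rotate_right(y):
            # y's left child x becomes the root of this subtree;
            # x's former right subtree becomes y's new left subtree
            x = y.left
            y.left, x.right = x.right, y
            y.height = 1 + max(height(y.left), height(y.right))
            x.height = 1 + max(height(x.left), height(x.right))
            return x    # the caller re-attaches x where y used to be

        # A left-leaning path 3 - 2 - 1 becomes balanced after one right rotation at 3.
        root = rotate_right(Node(3, left=Node(2, left=Node(1))))
        print(root.key, height(root))   # -> 2 2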

  3. Optimal BSTs (restricted to searching, with all search frequencies known in advance)
    • Prerequisite knowledge: dynamic programming, and definition of expected value
    • Class notes
    • Video:
    • Links and resources
      • CLRS chapter 15.
      • Knuth's paper (see below)
      • wiki
    • Extensions (and possible project topics)
      • Knuth's quadratic-time algorithm appears in the same paper as the original cubic-time dynamic program (linked above); a minimal sketch of the cubic-time DP is given at the end of this section.
      • Hu and Tucker quadratic time / linear space result for zero-weight keys. See also "alphabetic trees".
      • Knuth's O(n log n)-time improvement of Hu & Tucker. See the chapter 15 end notes in CLRS.
      • Mehlhorn's approximation algorithm
      • Nearly optimal BSTs: greedy algorithm
      • Static optimality theorem, and dynamic optimality conjecture (links are in the splay tree section below)
      • Lazy finger model
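    • Code sketch: a minimal Python version of the cubic-time dynamic program (the one Knuth's paper improves to quadratic), assuming only successful-search frequencies (no dummy keys as in CLRS); names are illustrative.

        def opt_bst_cost(freq):
            # freq[i] = access frequency (or probability) of key i, keys in sorted order.
            # Returns the minimum total weighted depth (root counted at depth 1).
            n = len(freq)
            cost = [[0] * n for _ in range(n)]     # cost[i][j]: best cost for keys i..j
            weight = [[0] * n for _ in range(n)]   # weight[i][j]: sum of freq[i..j]
            for i in range(n):
                cost[i][i] = weight[i][i] = freq[i]
            for length in range(2, n + 1):
                for i in range(n - length + 1):
                    j = i + length - 1
                    weight[i][j] = weight[i][j - 1] + freq[j]
                    best = float('inf')
                    for r in range(i, j + 1):      # try every key r as the root
                        left = cost[i][r - 1] if r > i else 0
                        right = cost[r + 1][j] if r < j else 0
                        best = min(best, left + right)
                    cost[i][j] = best + weight[i][j]
            return cost[0][n - 1]

        print(opt_bst_cost([2, 3, 5]))   # -> 17 (root = middle key: 3*1 + 2*2 + 5*2)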

  4. Splay trees
  5. Geometry of BSTs (and dynamic optimality)
    • Prerequisite knowledge: It helps to understand the definition of optimal BSTs (see above).
    • Class notes:
    • Video:
      • Edited version [54min]
      • Summary of BST optimality
        Note: when going over the summary table (bottom-left box) I say "conjectured" for splay trees, but the box itself claimed O(OPT). I have since changed that box in the course notes to say "conjectured", matching what's written on the right. Splay trees are O(OPT) when OPT is the cost of the best static tree; they are only conjectured to be O(OPT) when OPT is the cost of the best dynamic tree, i.e., one that may rotate between accesses.
    • Links
    • Extensions (and possible projects)
      • John Iacono's paper on dynamic optimality
      • wiki on Iacono's working set structure
      • Dynamic finger property, level-linked trees (see 13:00 in the MIT video above)
      • Badoiu et al. paper on unifying the working set and dynamic finger properties.
      • Bose et al. paper on layered working-set trees.
      • Combining BSTs. Among several results, this shows that, for dynamic trees, restarting each search at the root doesn't create a disadvantage compared to resuming a search where the previous one ended.
      • Followup MIT video. It shows lower bounds for number of points needed to make a grid pattern valid, and describes Tango trees.
      • wiki on Tango trees

  6. Skip lists
  7. Expected (max) depth of a randomly built BST.
    • Prerequisite knowledge: expected value, indicator random variables
    • Class notes:
    • Video
    • Links:
      • MIT videos (find Lecture 9, starting from 22:30)
      • CLRS: Chapter 12, p.299-302.
    • Extensions (and possible project topics)
      • Look up Devroye's precise bound on the expected height (asymptotically about 4.311 ln n); compare with the simulation sketch at the end of this section.
      • Determine the probability that a random BST has ω(log n) depth. Analyze the variance of the depth. See work by Devroye.
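    • Code sketch: a small simulation to compare against the bounds from class. The constants (about 2 ln n for the average depth, and Devroye's ~4.311 ln n for the expected height) are asymptotic, so expect ballpark agreement only; names are illustrative.

        import math, random

        def random_bst_depths(n):
            # Insert a uniformly random permutation of 0..n-1 into a plain (unbalanced)
            # BST and return the depth at which each key ends up (root has depth 0).
            root = None                       # a node is a list: [key, left, right]
            depths = []
            for key in random.sample(range(n), n):
                if root is None:
                    root = [key, None, None]
                    depths.append(0)
                    continue
                node, depth = root, 0
                while True:
                    depth += 1
                    i = 1 if key < node[0] else 2
                    if node[i] is None:
                        node[i] = [key, None, None]
                        depths.append(depth)
                        break
                    node = node[i]
            return depths

        n = 10_000
        d = random_bst_depths(n)
        print("average depth:", sum(d) / n, "  vs 2 ln n =", 2 * math.log(n))
        print("maximum depth:", max(d), "  vs 4.311 ln n =", 4.311 * math.log(n))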

  8. Fractional cascading
    • Prerequisite knowledge: none
    • Class notes
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Find applications. There are several in Chazelle's papers, but a more up-to-date search would be nice.
      • Dynamic version (if not done in class). Could start with the wiki.
      • Extension to general non-path like dependencies.

  9. High-dimensional range counting
  10. All-pairs shortest paths
  11. Bipartite matching
    • Class notes:
    • Video:
    • Links
      • What I cover is found in the Graph Theory book by Diestel. Please ask me if you need to see a copy. There are many resources online; please send me your favorite ones and I'll add them here. (A minimal augmenting-path sketch is given at the end of this section.)
      • Hall's theorem wiki
    • Extensions (and possible project topics)
      • Look into matchings in non-bipartite graphs, algorithms that use network flow, and algorithms that don't.
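    • Code sketch: a minimal version of the augmenting-path algorithm for maximum bipartite matching (often attributed to Kuhn). The adjacency-list representation and names are illustrative choices, not necessarily the exact presentation from class.

        def max_bipartite_matching(adj, n_right):
            # adj[u] = list of right-side vertices adjacent to left vertex u.
            match_right = [-1] * n_right      # match_right[v] = left partner of v, or -1

            def try_augment(u, seen):
                for v in adj[u]:
                    if v in seen:
                        continue
                    seen.add(v)
                    # v is free, or its current partner can be re-matched elsewhere
                    if match_right[v] == -1 or try_augment(match_right[v], seen):
                        match_right[v] = u
                        return True
                return False

            matching = 0
            for u in range(len(adj)):
                if try_augment(u, set()):
                    matching += 1
            return matching

        # Example: left {0,1,2}, right {0,1}; the maximum matching has size 2.
        print(max_bipartite_matching([[0, 1], [0], [1]], 2))   # -> 2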

  12. Network Flow
    • Prerequisite knowledge: None, for the simple intro that we do. It's nice to look at bipartite matching before this section.
    • Class notes
    • Video:
      • Link [76min]
      • Around 1:08:25... it's actually 2M iterations, not 1M, but the point is made.
    • Links
      • CLRS covers network flow in chapter 26. There is a lot more detail, and more advanced algorithms are given as well.
      • Example showing how the Ford-Fulkerson method might not terminate or even converge, if capacities are irrational and we are not careful about how to find augmenting paths.
    • Extensions (and possible project topics)
      • Details of any algorithm (e.g., some specific implementation of Ford-Fulkerson, or something entirely different); a minimal BFS-based (Edmonds-Karp) sketch is given after this list.
      • Find applications. In particular one of my sources on the probabilistic method mentions using network flow to get an algorithm (need to look this up again).
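    • Code sketch: a minimal Edmonds-Karp implementation (Ford-Fulkerson with BFS, i.e., shortest augmenting paths), assuming an adjacency-matrix capacity representation. This is one simple variant, not necessarily the one from the notes.

        from collections import deque

        def max_flow(cap, s, t):
            # cap is an n x n matrix of capacities; returns the value of a maximum s-t flow.
            n = len(cap)
            flow = [[0] * n for _ in range(n)]
            total = 0
            while True:
                # BFS in the residual graph to find a shortest augmenting path
                parent = [-1] * n
                parent[s] = s
                q = deque([s])
                while q and parent[t] == -1:
                    u = q.popleft()
                    for v in range(n):
                        if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                            parent[v] = u
                            q.append(v)
                if parent[t] == -1:          # no augmenting path: current flow is maximum
                    return total
                # bottleneck residual capacity along the path
                bottleneck = float('inf')
                v = t
                while v != s:
                    u = parent[v]
                    bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
                    v = u
                # augment along the path
                v = t
                while v != s:
                    u = parent[v]
                    flow[u][v] += bottleneck
                    flow[v][u] -= bottleneck
                    v = u
                total += bottleneck

        # Example: two disjoint s-t paths of capacity 1 each -> max flow 2.
        c = [[0, 1, 1, 0],
             [0, 0, 0, 1],
             [0, 0, 0, 1],
             [0, 0, 0, 0]]
        print(max_flow(c, 0, 3))   # -> 2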

  13. Linear Programming (focusing on 2D)
    • Class notes
    • Video:
    • Links
      • The paper by Nimrod Megiddo containing the algorithm for LP in 2D (section 2). It also contains extensions and applications.
      • Another paper by Megiddo on LP for any fixed dimension.
      • See section 4 of this link for the 2D LP algorithm. See also this applet but ignore the description.
    • Further comments
      • An important lesson of this lecture is that failing, as long as it removes a constant fraction of the data, is usually as good as succeeding: the resulting recurrence T(n) = T(cn) + O(n) with c < 1 solves to O(n). Another classic example of this idea is linear-time median computation (a randomized selection sketch is given at the end of this section).
    • Extensions (and possible project topics)
      • Higher dimensional linear programming is a fundamental topic in CS. There is plenty to research here: the simplex algorithm, duality and its practical interpretation, interior point methods, integer programming, stochastic linear programming, etc.
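    • Code sketch: the same "discard a constant fraction" idea, in the context of the median example above: randomized quickselect, which in expectation throws away a constant fraction of the data per round (the deterministic median-of-medians variant achieves this in the worst case). Names are illustrative.

        import random

        def quickselect(items, k):
            # Returns the k-th smallest element (k = 0 is the minimum).  Each round keeps
            # only one side of a random pivot, so in expectation a constant fraction of
            # the data is discarded; the expected running time is O(n).
            assert 0 <= k < len(items)
            while True:
                pivot = random.choice(items)
                lo = [x for x in items if x < pivot]
                eq = [x for x in items if x == pivot]
                hi = [x for x in items if x > pivot]
                if k < len(lo):
                    items = lo
                elif k < len(lo) + len(eq):
                    return pivot
                else:
                    k -= len(lo) + len(eq)
                    items = hi

        print(quickselect([7, 1, 5, 3, 9, 2], 2))   # -> 3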

  14. String matching (Z algorithm)
    • Class notes
    • Video:
    • Links
      • The Z algorithm is described nicely in the textbook by Gusfield. A copy of the section on Z is provided here. (A minimal implementation sketch is given at the end of this section.)
      • I was initially going to do the KMP algorithm, but Gusfield's book convinced me to do Z instead. Still, KMP and Boyer-Moore offer some advantages over Z when the problem is extended (e.g., online matching); see Gusfield's book for an explanation (the link above might not contain it though, so ask me). There are also extensions of KMP and Boyer-Moore, and the book is packed with all kinds of string problems to explore.
    • Extensions (and possible project topics)
      • KMP, and in particular its advantage over Z.
      • Boyer-Moore
      • Other string matching algorithms and specialized problems.
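    • Code sketch: a minimal implementation of the Z array, plus pattern matching via the Z array of pattern + separator + text, following the standard description. The names and the choice of separator character are illustrative (the separator must not occur in either string).

        def z_array(s):
            # Z[i] = length of the longest substring starting at i that matches a prefix of s.
            n = len(s)
            z = [0] * n
            z[0] = n
            l, r = 0, 0                      # [l, r) = rightmost prefix-matching window so far
            for i in range(1, n):
                if i < r:                    # reuse information from the current window
                    z[i] = min(r - i, z[i - l])
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1                # extend the match explicitly
                if i + z[i] > r:             # update the window
                    l, r = i, i + z[i]
            return z

        def find_occurrences(pattern, text, sep='\x00'):
            # All starting indices of pattern in text, via the Z array of pattern+sep+text.
            z = z_array(pattern + sep + text)
            m = len(pattern)
            return [i - m - 1 for i in range(m + 1, len(z)) if z[i] >= m]

        print(find_occurrences("aba", "ababa"))   # -> [0, 2]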

  15. String matching (edit distance)
    • Prerequisite knowledge: Dynamic programming
    • Class notes
      • TO DO
    • Video:
      • TO DO
    • Links
    • Extensions (and possible project topics)
      • To add here once the class notes are made; Gusfield's book will have several extensions. In the meantime, a generic DP sketch is given below.
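    • Code sketch: since the notes aren't written yet, this is only a generic textbook sketch of the O(mn) edit distance DP (Levenshtein distance with unit costs), not necessarily the variant that will be covered in class.

        def edit_distance(a, b):
            # Minimum number of insertions, deletions and substitutions turning a into b.
            m, n = len(a), len(b)
            prev = list(range(n + 1))          # prev[j] = distance between a[:i-1] and b[:j]
            for i in range(1, m + 1):
                cur = [i] + [0] * n
                for j in range(1, n + 1):
                    cur[j] = min(prev[j] + 1,                          # delete a[i-1]
                                 cur[j - 1] + 1,                       # insert b[j-1]
                                 prev[j - 1] + (a[i - 1] != b[j - 1])) # substitute / match
                prev = cur
            return prev[n]

        print(edit_distance("kitten", "sitting"))   # -> 3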

  16. Suffix trees (more impressive string matching, and more)
  17. Polygonal path simplification.
    • Prerequisite knowledge: graph basics. BFS and DAGs are mentioned.
    • Class notes:
    • Video:
    • Links
      • The "Iterative Endpoints Fit" algorithm is also called the "Ramer-Douglas-Peucker" algorithm. See the wiki. In the references are the original two papers, plus a speed-up by Hershberger and Snoeyink.
      • I still need to find the original Iri-Imai paper (and/or other sources about it, and possible improvements).
    • Extensions (and possible project topics)
      • dynamic programming approach
      • Sub-cubic (faster than the naive O(n^3)) algorithm for the parallel strip distance, with preprocessing
      • Several other improvements exist that could be added to the list.
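    • Code sketch: a minimal recursive Ramer-Douglas-Peucker implementation (the classic formulation from the wiki, using perpendicular distance to the line through the endpoints); names and the distance choice are illustrative.

        import math

        def rdp(points, eps):
            # Simplify a polyline (list of (x, y) tuples): keep the endpoints, find the
            # point farthest from the segment joining them, and recurse on both halves
            # if that distance exceeds eps.
            if len(points) < 3:
                return list(points)
            (x1, y1), (x2, y2) = points[0], points[-1]
            seg_len = math.hypot(x2 - x1, y2 - y1)

            def dist(p):
                # perpendicular distance from p to the line through the two endpoints
                if seg_len == 0:
                    return math.hypot(p[0] - x1, p[1] - y1)
                return abs((x2 - x1) * (y1 - p[1]) - (x1 - p[0]) * (y2 - y1)) / seg_len

            i_max = max(range(1, len(points) - 1), key=lambda i: dist(points[i]))
            if dist(points[i_max]) <= eps:
                return [points[0], points[-1]]
            left = rdp(points[:i_max + 1], eps)
            right = rdp(points[i_max:], eps)
            return left[:-1] + right           # avoid repeating the split point

        print(rdp([(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)], 1.0))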

  18. Spanners I: Theta graphs
    • Class notes:
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Explore heuristic improvements?
      • Construction of Theta graphs (see p.63-69 in the book above)
      • Other spanners. See book, especially section 20.3
      • Solve an open problem. See section 20 in the book.

  19. Spanners II: Chew's algorithm
    • Prerequisite knowledge: graph basics. Planarity and Delaunay triangulations are mentioned.
    • Class notes:
    • Video:
    • Links
      • Tech report by Paul Chew (should be equivalent to his conference publication), on the empty L1-circle spanner.
      • Link to the journal version of Chew's work, where the empty circle is actually an equilateral triangle. Improves the L1 result. For a copy of the paper, please ask.
      • wiki on geometric spanners.
      • A lower bound on the stretch factor of the Delaunay triangulation. If you attend this lecture in class, this is the "coffee bet" result that I mention (it won't be on video).
    • Extensions (and possible project topics)
      • Details of Chew's journal version
      • See previous section: there are a lot of results and open problems on spanners. Make a survey.

  20. Ramsey numbers: cliques vs independent sets
    • Prerequisite knowledge: none
    • Class notes
    • Video:
    • Links
      • Ramsey's theorem wiki
      • Look up Ramsey numbers, Ramsey's theorem, and Ramsey theory
      • Graph theory book by Diestel (or many others)
      • See section on the probabilistic method for another result on Ramsey numbers
    • Extensions (and possible project topics)
      • Find R(5,5). Ok, just kidding. But you could look into how the known bounds for small or large Ramsey numbers have been obtained. (A tiny sketch of the classic probabilistic lower bound is given below.)
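    • Code sketch: related to the previous bullet (and to the probabilistic method section below), a tiny computation of the classic Erdős lower bound: R(k,k) > n whenever C(n,k) * 2^(1 - C(k,2)) < 1. For small k this is far from the best known bounds; the point is the 2^(k/2)-type growth.

        from math import comb

        def erdos_lower_bound(k):
            # Largest n with C(n, k) * 2**(1 - C(k, 2)) < 1.  For such n, a uniformly
            # random red/blue coloring of the edges of K_n has, in expectation, fewer
            # than one monochromatic K_k, so some coloring has none: R(k, k) > n.
            n = k
            while comb(n + 1, k) < 2 ** (comb(k, 2) - 1):
                n += 1
            return n

        for k in range(3, 8):
            print("R(%d,%d) > %d" % (k, k, erdos_lower_bound(k)))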

  21. The Probabilistic Method
    • Prerequisite knowledge: expected value, linearity of expectation, indicator random variables (IRVs): everything here
    • Class notes
    • Video:
    • Links
      • Books (ask if you need to borrow)
        • An Invitation to Discrete Mathematics. Second Edition. Jiri Matousek and Jaroslav Nesetril
        • The Probabilistic Method. Noga Alon and Joel H. Spencer
        • The Probabilistic Method (Lecture Notes). Jiri Matousek and Jan Vondrak
    • Extensions (and possible project topics)
      • We have splashed around in the shallow end of the pool. Deep sea diving is available in the two books called "The probabilistic method", mentioned just above.

  22. Planarity and crossing number
    • Corequisite knowledge: for the crossing number, it helps to have understood some basics about the probabilistic method.
    • Class notes:
      • Euler's formula for planar graphs, and implications: Euler.pdf
      • Extension to non-planar graphs: crossing-number.pdf (a tiny numeric check of these bounds is given at the end of this section)
      • Future addition: applications
    • Video:
    • Links
    • Extensions (and possible project topics)
      • Explain applications of the crossing number.
      • Polynomial-time algorithms for deciding if a graph is planar (or if it has a low crossing number)
      • Hardness proof for finding the crossing number in general
      • Algorithms for "untangling" graphs
      • Kuratowski's theorem and similar material
      • Anything to do with planarity, its applications, specialized proofs for subclasses of graphs, cases where planar graphs permit faster algorithms than in general...
      • Look into the field of graph drawing.
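    • Code sketch: a tiny numeric check of the bounds from the two sets of notes above, assuming simple connected graphs; K5 and K3,3 are the classic examples. Function names are illustrative.

        def euler_bound_says_nonplanar(v, e, girth=3):
            # Consequence of Euler's formula: a simple connected planar graph with
            # girth g (length of its shortest cycle) satisfies e <= g/(g-2) * (v - 2).
            # If the inequality fails, the graph cannot be planar; if it holds, the
            # test says nothing.
            return e > girth / (girth - 2) * (v - 2)

        def crossing_lower_bound(v, e):
            # From the same counting argument: cr(G) >= e - 3v + 6 (for v >= 3).
            return max(0, e - 3 * v + 6)

        print(euler_bound_says_nonplanar(5, 10))           # K5:   10 > 3*(5-2) = 9 -> True
        print(euler_bound_says_nonplanar(6, 9, girth=4))   # K3,3:  9 > 2*(6-2) = 8 -> True
        print(crossing_lower_bound(6, 15))                 # K6: bound is 15 - 18 + 6 = 3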

  23. Vertex coloring
    • Prerequisite knowledge: Euler's formula for planar graphs (see directly above)
    • Class notes:
    • Video:
      • Edited version [30min]
        Note: this is a continuation of the Euler formula video.
    • Links
    • Extensions (and possible project topics)
      • Efficiently determining chromatic number, given the knowledge that it will be constant (but greater than 2)
      • Hardness of determining chromatic number in general
      • Three-coloring triangle-free graphs
      • Efficiently determining chromatic number for special classes of graphs.
      • Applications of vertex coloring
      • Approximations of optimal coloring (a generic greedy sketch using at most Delta+1 colors is given below)
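    • Code sketch: the generic greedy coloring mentioned in the last bullet. It is only a baseline (Delta+1 colors can be far from the chromatic number), and the representation and names are illustrative.

        def greedy_coloring(adj):
            # Color vertices one by one with the smallest color not used by an
            # already-colored neighbor; uses at most max_degree + 1 colors.
            # adj[v] = iterable of neighbors of v.
            color = {}
            for v in adj:
                used = {color[u] for u in adj[v] if u in color}
                c = 0
                while c in used:
                    c += 1
                color[v] = c
            return color

        # Example: a 5-cycle needs 3 colors; greedy (in this vertex order) also uses 3.
        cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
        print(greedy_coloring(cycle5))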

  24. Edge coloring
    • Class notes:
    • Video
    • Links
      • wiki: edge coloring
      • Vizing's theorem
        • The theorem is found in several texts on graph theory. For instance, "Graph Theory" by Reinhard Diestel. In addition, here are some links that can be found online.
        • Notes by Michele Zito, that closely match what I do in class. In fact here is another description, apparently partially inspired from the previous one.
        • A different proof from the homepage of Lex Schrijver.
        • Another proof on planet math.org.
        • This presentation uses induction on the number of vertices and is based on "Short Proofs of Classical Theorems" by Adrian Bondy; I have a copy if anyone is interested. That paper contains other theorems that might be worth a look (possible project). Note that the proof in the presentation is not rigorous.
    • Extensions (and possible project topics)
      • Different proofs of Vizing's theorem
      • Efficient algorithms for determining edge-chromatic number for specific classes of graphs
      • Constructively approximating optimal coloring. Algorithms that use Delta+1 colors.
      • On the duality of edge coloring and vertex coloring: given a graph G, place a dual vertex on every edge of G, and connect two dual vertices whenever their corresponding edges of G share an endpoint; the result is called the line graph of G. Edge-coloring G is the same as vertex-coloring its line graph (a tiny sketch of this is given below). Some care is needed, though: not all graphs are line graphs, and since the numbers of vertices and edges change, algorithmic results might not transfer directly when you dualize. It would be a good exercise to prove (or even just state) some of the results we obtain in class in terms of this duality; this would make a nice project topic.
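    • Code sketch: a tiny illustration of the line-graph duality described above: build the line graph and edge-color G by greedily vertex-coloring it. The edge representation and names are illustrative, and greedy does not in general achieve Delta or Delta+1 colors.

        from itertools import combinations

        def line_graph(edges):
            # Vertices of the line graph are the edges of G; two are adjacent iff the
            # corresponding edges of G share an endpoint.
            adj = {e: set() for e in edges}
            for e, f in combinations(edges, 2):
                if set(e) & set(f):
                    adj[e].add(f)
                    adj[f].add(e)
            return adj

        def greedy_edge_coloring(edges):
            # Edge-color G by greedily vertex-coloring its line graph (see section 23).
            # The result is a legal edge coloring, but not necessarily an optimal one.
            adj = line_graph(edges)
            color = {}
            for e in adj:
                used = {color[f] for f in adj[e] if f in color}
                c = 0
                while c in used:
                    c += 1
                color[e] = c
            return color

        # Example: a triangle plus a pendant edge (Delta = 3).
        print(greedy_edge_coloring([(0, 1), (1, 2), (2, 0), (2, 3)]))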

  25. BB-alpha trees and Scapegoat trees
    • Class notes:
    • Video
      • None yet.
    • Links
      • BB-alpha trees
        • wiki. Contains useful links (e.g., the papers by Nievergelt and Reingold, and by Blum and Mehlhorn).
        • Analysis of height
        • amortized analysis for insertion
        • Notes by Michiel Smid. He uses both the accounting and potential methods to amortize, in full detail. An application of BB-alpha trees is mentioned on the last page. I would like to add the reference to this list.
        • Still need to resolve why we can amortize easily for alpha=1/3 but not for 1/2, even though there is the result saying alpha > 1/3 implies 1/2
      • Scapegoat trees
        • Open Data Structures by Pat Morin. See section 8. He uses the accounting method for amortizing. I use the potential method in my notes.
        • paper by Galperin and Rivest.
        • paper by Andersson.
        • wiki

  26. Huffman codes
    • Class notes:
    • Video
    • Links
      • wiki
      • CLRS discusses the basics in chapter 16.3
      • More details can be found in Introduction to Data Compression, by Khalid Sayood. I have a copy to lend, and it's easy to find online. (A minimal heap-based sketch is given below.)
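    • Code sketch: a minimal heap-based construction of a Huffman code; the representation and names are illustrative, and the example frequencies are the ones used in CLRS 16.3.

        import heapq

        def huffman_codes(freq):
            # Build a Huffman code by repeatedly merging the two least-frequent subtrees.
            # freq: dict symbol -> weight; returns dict symbol -> bit string.
            # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
            heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
            heapq.heapify(heap)
            if len(heap) == 1:
                return {sym: "0" for sym in freq}      # degenerate single-symbol case
            count = len(heap)
            while len(heap) > 1:
                w1, _, c1 = heapq.heappop(heap)
                w2, _, c2 = heapq.heappop(heap)
                merged = {s: "0" + code for s, code in c1.items()}
                merged.update({s: "1" + code for s, code in c2.items()})
                heapq.heappush(heap, (w1 + w2, count, merged))
                count += 1
            return heap[0][2]

        # Codeword lengths for this instance come out as 1, 3, 3, 3, 4, 4.
        print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))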

  27. Other topics of interest