COMP 150TP: Text Processing: Inductive Techniques and Applications
Reading Assignments
[MS] corresponds to Manning and Schütze's text
[M] corresponds to Mitchell's machine learning text
[CLR] corresponds to the algorithms textbook by Cormen, Leiserson and Rivest.
[A] corresponds to J. Allen's NLP textbook
[RN] corresponds to the AI textbook:
S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 1995.
[JM] corresponds to Jurafsky and Martin's text
[Ch] corresponds to Charniak's text
- Introductory material: [MS] Ch 1, 5.1, 5.2
- Word sense disambiguation, Lesk's algorithm, vector space representation, term weighting:
[MS] Ch 7.1, 7.3 (esp. 7.3.1), 8.5.1, 15.1, 15.2
Recommended: consult the SENSEVAL paper
- Probabilities, statistical Bayesian inference, and the
Naive Bayes algorithm:
[MS] Ch 2.1. Recommended: consult [CLR] 6.1-3 and parts of [M] Ch 6.
- Additional practical aspects, evaluation measures, and methodology:
WSD paper (Mooney 1996), [MS] Ch 4.1-2, Ch 8.1, [M] pages 66-69 and 108-112
- Perceptron, Winnow, Weighted Majority, and spelling correction:
[MS] 16.3, [M] 7.5, Golding and Roth paper.
Recommended: read, or at least skim, Littlestone's Winnow paper
and the Littlestone and Warmuth Weighted Majority paper.
- Decision Trees: [MS] 16.1, [M] Chapter 3
- Statistical Foundations for Hypothesis Testing: [MS] 5.3, [M] Chapter 5
- Introduction to Natural Language and its processing:
[MS] Chapter 3. Recommended:
[RN] Chapter 22. Also look through [A] for an appreciation
of more complex issues.
- Part-of-Speech Tagging:
[MS], SNoW paper. Recommended: TBL (transformation-based learning) paper.
- Markov chain models and distance measures for
probability distributions: [MS] Ch 2.2, Ch 6.
Recommended: consult [JM] and [Ch]
- Hidden Markov Models and their use:
[MS] Ch 9, and 10.2-3
Recommended: consult [Ch] Ch 3-4
- Algorithms for Probabilistic CFGs:
[MS] Ch 11
Recommended: consult [Ch] Ch 6
- Foundations for the EM Algorithm:
[MS] Sec 14.2 and the HMM paper
(Rabiner, Levinson, and Sondhi)
- Evaluating and Learning Probabilistic Parsers:
[MS] Ch 12.
Recommended: consult [JM] and paper by Collins (1997)
- Clustering and the EM algorithm: [MS] Ch 14, [M] Section 6.12