CS 4650 and CS 7650 will meet jointly on Tuesdays and Thursdays, 3:05-4:25 PM, in College of Computing 101.
This is a (permanently) provisional schedule. Readings, notes, slides, and homework will change. Readings and homeworks are final at the time of the class before they are due (e.g., Thursday's readings are final on the preceding Tuesday); problem sets are final on the day they are "out." Please check for updates until then.
- History of NLP and modern applications. Review of probability.
- Reading: Chapter 1 of Linguistic Fundamentals for NLP. You should be able to access this PDF for free from a Georgia Tech computer.
- Optional reading: Functional programming in Python. The scaffolding code in this class will make heavy use of Python's functional programming features, such as iterators, generators, list comprehensions, and lambda expressions. If you haven't seen much of this style of programming before, it will be helpful to read up on it before getting started with the problem sets; a small illustrative sketch follows this unit.
- Optional reading: Section 2.1 of Foundations of Statistical NLP. A PDF version is accessible through the GT library.
- Optional reading: these other reviews of probability.
- Project 0 out
- Slides
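For readers new to the functional style mentioned above, here is a tiny illustrative sketch (not taken from the course scaffolding) of generators, list comprehensions, and lambda expressions applied to a toy tokenization task:

```python
# Illustrative only: the functional Python idioms the problem sets rely on.

def tokens(lines):
    """Generator: lazily yield lowercased tokens from an iterable of lines."""
    for line in lines:
        for tok in line.split():
            yield tok.lower()

corpus = ["the quick brown fox", "the lazy dog"]

# List comprehension: collect tokens longer than three characters.
long_tokens = [t for t in tokens(corpus) if len(t) > 3]

# Lambda expression: sort unique tokens by length, then alphabetically.
ranked = sorted(set(tokens(corpus)), key=lambda t: (len(t), t))

print(long_tokens)  # ['quick', 'brown', 'lazy']
print(ranked)       # ['dog', 'fox', 'the', 'lazy', 'brown', 'quick']
```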
- Bag-of-words models, naive Bayes, and sentiment analysis (a short sketch follows this unit).
- Homework 1 due
- Reading: my notes, chapter 3.
- Optional readings: Sentiment analysis and opinion mining, especially parts 1, 2, 4.1-4.3, and 7; Chapters 0-0.3, 1-1.2 of LXMLS lab guide
- Slides
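As a preview of the lecture material, here is a minimal naive Bayes sketch over bag-of-words features with add-one smoothing; the tiny corpus and all counts are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus (illustrative only).
train = [("good great fun", "pos"),
         ("great acting good plot", "pos"),
         ("boring bad plot", "neg"),
         ("bad awful boring", "neg")]

# Count label frequencies and per-label word frequencies (bag of words).
label_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text, alpha=1.0):
    """Return the label maximizing log P(label) + sum_w log P(w | label),
    with add-alpha smoothing over the vocabulary."""
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / len(train))
        for w in text.split():
            score += math.log((word_counts[label][w] + alpha)
                              / (total + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("good fun plot"))  # pos
```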
- Discriminative classifiers: perceptron and passive-aggressive learning; word-sense disambiguation. See the perceptron sketch after this unit.
- Problem set 0 due
- Problem set 1a out
- Reading: my notes, chapter 5-5.2.
- Optional supplementary reading: Parts 4-7 of log-linear models; survey on word sense disambiguation
- Optional advanced reading: adagrad; passive-aggressive learning
- Slides
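A minimal sketch of the perceptron's mistake-driven update, here in its binary form on invented bag-of-words data (the problem sets use their own scaffolding):

```python
from collections import defaultdict

# Toy binary training data: bag-of-words features with labels +1 / -1.
train = [({"good": 1, "fun": 1}, +1),
         ({"bad": 1, "boring": 1}, -1),
         ({"good": 1, "plot": 1}, +1),
         ({"awful": 1, "plot": 1}, -1)]

weights = defaultdict(float)

def score(features):
    """Dot product of the weight vector with a sparse feature vector."""
    return sum(weights[f] * v for f, v in features.items())

# Perceptron: on each mistake, add (or subtract) the feature vector.
for epoch in range(5):
    for features, y in train:
        if y * score(features) <= 0:          # mistake (or tie)
            for f, v in features.items():
                weights[f] += y * v

print(sorted(weights.items()))
print(score({"good": 1}))   # positive weight on "good"
```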
- Logistic regression and online learning (a gradient-update sketch follows this unit).
- Homework 2 due
- Reading: my notes, chapter 5.3-5.6.
- Optional supplementary reading: Parts 4-7 of log-linear models
- Slides
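To make the online update concrete, here is a sketch of binary logistic regression trained by stochastic gradient ascent on the log-likelihood; the data, step size, and epoch count are illustrative assumptions:

```python
import math
from collections import defaultdict

# Same toy data a perceptron would use (illustrative); labels are 0 / 1.
train = [({"good": 1, "fun": 1}, 1),
         ({"bad": 1, "boring": 1}, 0),
         ({"good": 1, "plot": 1}, 1),
         ({"awful": 1, "plot": 1}, 0)]

weights = defaultdict(float)

def prob(features):
    """P(y = 1 | x) under the logistic model."""
    z = sum(weights[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Online gradient ascent: for each example, move the weights by
# eta * (y - p) * x, the gradient of that example's log-likelihood.
eta = 0.5
for epoch in range(50):
    for features, y in train:
        residual = y - prob(features)
        for f, v in features.items():
            weights[f] += eta * residual * v

print(round(prob({"good": 1, "fun": 1}), 3))   # close to 1
print(round(prob({"bad": 1}), 3))              # well below 0.5
```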
- Problem set 1a due on September 3 at 3pm
- Problem set 1b out on September 3 at 3pm
- Reading: Expectation maximization chapter by Michael Collins (a toy EM sketch follows this unit)
- Optional supplementary reading: Tutorial on EM
- Optional advanced reading: Nigam et al.; Word sense clustering
- Demo: Word sense clustering with EM
- Slides
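If the EM reading feels abstract, this self-contained toy may help: EM for a mixture of two biased coins. The data and initialization are invented, and this is not the word-sense demo above:

```python
# Toy EM: each row records the number of heads in 10 flips of one of two
# coins with unknown biases; which coin produced each row is hidden.
from math import comb

data = [9, 8, 9, 1, 2, 1, 8, 2]   # heads out of n = 10 (illustrative)
n = 10
theta = [0.6, 0.4]                 # initial guesses for the two biases

def likelihood(h, p):
    """Binomial probability of h heads in n flips with bias p."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

for it in range(20):
    # E-step: posterior responsibility of each coin for each row.
    resp = []
    for h in data:
        w = [likelihood(h, p) for p in theta]
        resp.append([wi / sum(w) for wi in w])
    # M-step: re-estimate each bias as the responsibility-weighted
    # fraction of heads.
    for k in range(2):
        num = sum(r[k] * h for r, h in zip(resp, data))
        den = sum(r[k] * n for r in resp)
        theta[k] = num / den

print([round(p, 3) for p in theta])  # roughly [0.85, 0.15]
```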
- N-grams, smoothing, speech recognition (a smoothed bigram sketch follows this unit)
- Reading: Language modeling
- Homework 3 due
- Optional advanced reading: An empirical study of smoothing techniques for language models, especially sections 2.7 and 3 on Kneser-Ney smoothing; A hierarchical Bayesian language model based on Pitman-Yor processes (requires some machine learning background)
- Slides
- Demo
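As a warm-up for the readings, a minimal bigram language model with add-k smoothing; the corpus and the value of k are invented, and Kneser-Ney (from the optional reading) is considerably more sophisticated:

```python
import math
from collections import Counter

# Toy corpus with sentence-boundary markers (illustrative).
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"],
          ["<s>", "the", "cat", "ran", "</s>"]]

unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    for i in range(len(sent) - 1):
        unigrams[sent[i]] += 1
        bigrams[(sent[i], sent[i + 1])] += 1
vocab = set(unigrams) | {"</s>"}

def bigram_prob(w_prev, w, k=0.1):
    """Add-k smoothed estimate of P(w | w_prev)."""
    return ((bigrams[(w_prev, w)] + k)
            / (unigrams[w_prev] + k * len(vocab)))

def logprob(sent):
    """Log probability of a sentence under the bigram model."""
    return sum(math.log(bigram_prob(sent[i], sent[i + 1]))
               for i in range(len(sent) - 1))

print(logprob(["<s>", "the", "cat", "sat", "</s>"]))   # relatively high
print(logprob(["<s>", "sat", "the", "</s>"]))          # much lower
```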
- Problem set 1b due on September 10 at 3pm
- Reading: Knight and May (sections 1-3)
- Supplemental reading: my notes, chapter 10-10.3; Jurafsky and Martin chapter 2.
- Slides on morphology
- Transduction and composition, edit distance; see the dynamic-programming sketch after this unit.
- Homework 4 due
- Reading: Chapter 2 of Linguistic Fundamentals for NLP
- Reading: my notes, chapter 10.4 onward (still in progress)
- Optional reading: OpenFST slides.
- Optional, more formal reading: Weighted Finite-State Transducers in speech recognition
- Slides
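Edit distance has a compact dynamic-programming implementation; here is a sketch with unit costs (the weighted-FST view in the readings generalizes this to arbitrary costs):

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions (each cost 1) turning source into target."""
    m, n = len(source), len(target)
    # d[i][j] = distance between source[:i] and target[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # delete everything
    for j in range(n + 1):
        d[0][j] = j                    # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[m][n]

print(edit_distance("intention", "execution"))  # 5
```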
- Part-of-speech tags, hidden Markov models.
- Problem set 2a out
- Reading: Bender chapter 6
- Reading: my notes, chapters 11 and 12.
- Optional reading: Tagging problems and hidden Markov models
- Slides
- Viterbi, the forward algorithm, and B-I-O encoding (a Viterbi sketch follows this unit).
- Homework 5 due
- Reading: Conditional random fields
- Optional reading: CRF tutorial; Discriminative training of HMMs
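A sketch of Viterbi decoding over a toy two-tag HMM; every probability below is invented for illustration:

```python
import math

# Toy HMM (illustrative numbers): tags N and V, a tiny vocabulary.
tags = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit = {"N": {"fish": 0.6, "sleep": 0.4},
        "V": {"fish": 0.4, "sleep": 0.6}}

def viterbi(words):
    """Return the highest-probability tag sequence by dynamic programming."""
    # V[i][t] = (log prob of best path for words[:i+1] ending in tag t, path)
    V = [{t: (math.log(start[t] * emit[t][words[0]]), [t]) for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            # Choose the best previous tag for the current tag t.
            score, path = max(
                (V[-1][p][0] + math.log(trans[p][t] * emit[t][w]),
                 V[-1][p][1] + [t])
                for p in tags)
            row[t] = (score, path)
        V.append(row)
    return max(V[-1].values())[1]

print(viterbi(["fish", "sleep"]))  # ['N', 'V']
```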
- Discriminative structure prediction, conditional random fields, and the forward-backward algorithm.
- Problem set 2a due
- Problem set 2b out (September 24)
- Reading: Forward-backward
- Optional reading: Two decades of unsupervised POS tagging: how far have we come?
- Context-free grammars, constituency, and parsing; a CKY sketch follows this unit.
- Homework 6 due
- Reading: Probabilistic context-free grammars
- Optional reading: My notes, chapter 13.
- Slides on parsing
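CKY recognition for a grammar in Chomsky normal form fits in a few lines; the toy grammar below is illustrative and unweighted:

```python
# CKY recognition for a toy grammar in Chomsky normal form (illustrative).
binary_rules = {("NP", "VP"): {"S"},
                ("Det", "N"): {"NP"},
                ("V", "NP"): {"VP"}}
lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cky(words):
    """Return True iff words can be derived from the start symbol S."""
    n = len(words)
    # chart[i][j] = set of nonterminals spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for width in range(2, n + 1):          # widest spans last
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary_rules.get((B, C), set())
    return "S" in chart[0][n]

print(cky("the dog saw the cat".split()))  # True
print(cky("saw the dog the".split()))      # False
```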
- Problem set 2b due (October 1, 5pm)
- Reading: my notes, chapter 14.
- Optional reading: Eisner algorithm worksheet; Characterizing the errors of data-driven dependency parsing models; Short textbook on dependency parsing; the PDF should be free from a GT computer.
- Slides on dependency parsing
- The always-useful Language Log on non-projectivity in dependency parsing.
- Homework 7 due
- Reading: Lexicalized PCFGs
- Reading: my notes, sections 13.13 and 13.14
- Slides
- Optional reading: Accurate unlexicalized parsing
- Problem set 3 out
- Mostly CCG, but a little about L-TAG and HPSG.
- Homework 8 due
- Reading: Intro to CCG
- Slides
- Optional reading: The inside-outside algorithm; Corpus-based induction of linguistic structure; Much more about CCG; LTAG; Probabilistic disambiguation models for wide-coverage HPSG
- Homework 9 due
- Reading: Manning, Intro to Formal Computational Semantics
- Optional reading: Learning to map sentences to logical form
- Slides
- Frame semantics, and semantic role labeling.
- Homework 10 due
- Problem set 3 due
- Reading: Gildea and Jurafsky, sections 1-3; Banarescu et al., sections 1-4
- Optional reading: SRL via ILP; Syntactic parsing in SRL; AMR parsing
- Optional video
- Slides
- Notes on Integer Linear Programming for SRL
- Vector semantics, latent semantic indexing, neural word embeddings (a co-occurrence-vector sketch follows this unit)
- Problem set 4 out
- Reading: Vector-space models, sections 1, 2, 4-4.4, 6
- Optional: my notes, chapter 15; Python coding tutorial for word2vec word embeddings
- Slides
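To make distributional vectors concrete, here is a sketch that builds word vectors from co-occurrence counts and compares them with cosine similarity; the corpus is invented, and latent semantic indexing would additionally apply a truncated SVD to the count matrix:

```python
import math
from collections import Counter, defaultdict

# Tiny corpus (illustrative); a word's contexts are its neighbors
# within a fixed window.
corpus = ["the cat purrs", "the cat meows", "the dog barks",
          "the dog growls", "the car honks"]

window = 1
vectors = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                vectors[w][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

print(round(cosine(vectors["cat"], vectors["dog"]), 2))    # high: shared contexts
print(round(cosine(vectors["cat"], vectors["honks"]), 2))  # zero: no shared contexts
```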
- Knowing who's on first: coreference resolution.
- Homework 11 due
- Reading: my notes, chapter 16; Multi-pass sieve (good coverage of linguistic features that bear on coreference)
- Optional reading: Large-scale multi-document coreference, Easy victories and uphill battles (a straightforward machine learning approach to coreference)
- Slides
- Coherence, speech acts, and discourse connectives
- Homework 12 due
- Reading: Discourse structure and language technology
- Optional: Modeling local coherence; Sentence-level discourse parsing
- Slides
- Rhetorical structure theory, Penn Discourse Treebank
- Reading: Analysis of discourse structure...
- Problem set 4 due
- Learning from the wrong data
- Reading: my notes, chapter 17.
- Optional reading: Jerry Zhu's survey; Jerry Zhu's book
- Slides
- Independent project proposal due on November 16 at 2pm.
- Homework 13 due
- Reading: Collins, IBM Models 1 and 2 (a toy Model 1 sketch follows this unit)
- Optional Reading: Chiang, Intro to Synchronous Grammars; Lopez, Statistical machine translation
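A compact sketch of the IBM Model 1 EM estimator from the Collins notes, on an invented three-sentence parallel corpus; the NULL word and alignment bookkeeping are omitted for brevity:

```python
from collections import defaultdict

# Toy parallel corpus of (foreign, english) sentence pairs (illustrative).
pairs = [("das haus", "the house"),
         ("das buch", "the book"),
         ("ein buch", "a book")]
pairs = [(f.split(), e.split()) for f, e in pairs]

# Initialize t(f | e) uniformly over the foreign vocabulary.
f_vocab = {f for fs, _ in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for it in range(20):
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    # E-step: each foreign word distributes responsibility over the
    # English words in its sentence, proportional to t(f | e).
    for fs, es in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # M-step: renormalize the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("haus", "house")], 2))  # rises toward 1
print(round(t[("das", "the")], 2))     # rises toward 1
```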
- Work in teams on the final project; drop-in session with the professor and TA
- See here
- Initial result submissions due December 1 at 5pm.
- Homework 14 due
- Optional reading: Semantic compositionality through recursive matrix-vector spaces; Vector-based models of semantic composition
- See here
- December 5: Initial project report due at 5PM
- December 11: Final project report due at 5PM