Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"

ashwincv/gt-nlp-class

CS 4650 and 7650

  • Course: Natural Language Understanding
  • Instructor: Jacob Eisenstein
  • Semester: Fall 2015
  • Time: Mondays and Wednesdays, 3:05-4:25pm
  • TAs: TBD
  • Schedule
  • Grading
  • Policies

This course gives an overview of modern statistical techniques for analyzing natural language. The rough organization is to move from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way we will cover machine learning techniques that are especially relevant to natural language processing.

Learning goals

  • Acquire the fundamental linguistic concepts that are relevant to language technology. This goal will be assessed in the short homework assignments, midterm, and class participation.
  • Analyze and understand state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the midterm, the assigned projects, and class participation.
  • Implement state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the assigned and independent projects.
  • Adapt and apply state-of-the-art language technology to new problems and settings. This goal will be assessed in the independent project.
  • (7650 only) Read and understand current research on natural language processing. This goal will be assessed in assigned projects and classroom participation.

The assignments, readings, and schedule are subject to change, but I will try to give as much advance notice as possible.

Readings

Readings will be drawn from my notes, from published papers and tutorials, and from the following two texts:

Supplemental textbooks

These are completely optional, but might deepen your understanding of the material.

Prerequisites

The official prerequisite for CS 4650 is CS 3510/3511, "Design and Analysis of Algorithms." This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.

Furthermore, this course assumes:

  • Good coding ability, at the level of at least a third- or fourth-year undergraduate CS major. Assignments will be in Python.
  • Background in basic probability, linear algebra, and calculus.
  • Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.

People sometimes want to take the course without having all of these prerequisites. Frequent cases are:

  • Junior CS students with strong programming skills but limited theoretical and mathematical background,
  • Non-CS students with strong mathematical background but limited programming experience.

Students in the first group struggle on the exam and have trouble following the lectures, and students in the second group struggle with the problem sets. My advice is to get the background material first, and then take this course.
