Computational linguistics (CL) is the science of doing what linguists do with language, but using computers. Natural language processing (NLP) is the engineering discipline of doing what people do with language, but using computers. We'll cover both, though the emphasis is on NLP. We will largely focus on machine learning-based approaches to a wide variety of challenging problems in NLP, with an emphasis on recent deep learning-based techniques. Class time and readings will focus on techniques; homeworks will largely focus on using NLP techniques to address socially relevant problems. A focus throughout the course will be on bias and fairness in machine learning systems.
- Basic Course Information
- Prerequisites
- Coursework and Grading
- Course Project
- Class Policies
- Course Schedule
Instructor | Hal Daumé III (he/him) |
When | T/R 3:30pm-4:45pm |
Where | IRB 1116 |
TAs | Kianté Brantley (he/him); Trista Cao (she/her); Amr Sharaf (he/him) |
Discussion & Homework | ELMS |
Office Hours | Hal: Thr 1:45p-2:30p, IRB 4150; Trista: Mon 4:00p-5:00p, IRB 4th floor, in front of 4105; Amr: Wed 10:00a-11:00a, IRB 4th floor, in front of 4105 |
The required prerequisite for this course is an undergraduate AI course, though a machine learning course, an algorithms course, or LING 689/889 (Computational Psycholinguistics) should be sufficient. In particular, you should be able to:
- Program in Python
- Use core Unix commands (background)
- Function with foundational probability and statistics (background)
- Apply essential linear algebra (background)
- Implement and understand central machine learning techniques (e.g., CIML chapters 1-5 and 7)
If you cannot handle all of these things (and cannot pick them up quickly), you should expect to run into challenges in the course.
The components of grading are:
- Homework assignments (7% each, 35% total)
- Course project (35% total)
- Early exam (10%) and late exam (15%)
- In-class/ELMS participation (5%)
Final class grades will be assigned based on the following mapping, possibly with thresholds adjusted down (but never adjusted up):
Score | Grade | Score | Grade | Score | Grade |
---|---|---|---|---|---|
>=94 | A | >=90 | A- | | |
>=87 | B+ | >=84 | B | >=80 | B- |
>=77 | C+ | >=74 | C | >=70 | C- |
>=67 | D+ | >=64 | D | >=60 | D- |
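To make the arithmetic concrete, here is a minimal Python sketch of how the component weights and the grade mapping above combine. It is an illustration only, not the official grading script: the names `WEIGHTS`, `GRADE_THRESHOLDS`, `weighted_score`, and `letter_grade` are hypothetical, the thresholds are the unadjusted ones from the table (they may be lowered, never raised), and returning `None` for scores below 60 is an assumption, since the table lists no grade there.

```python
# Component weights (percent of the final grade), copied from the grading list.
WEIGHTS = {
    "homework": 35,
    "project": 35,
    "early_exam": 10,
    "late_exam": 15,
    "participation": 5,
}

# Unadjusted thresholds from the table, highest first.
GRADE_THRESHOLDS = [
    (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (67, "D+"), (64, "D"), (60, "D-"),
]


def weighted_score(fractions):
    """Combine per-component results (each a fraction in [0, 1]) into a 0-100 score."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(score):
    """Return the first grade whose threshold the score meets.

    Assumption: scores below the lowest listed threshold return None,
    because the table does not name a grade for them.
    """
    for threshold, grade in GRADE_THRESHOLDS:
        if score >= threshold:
            return grade
    return None


# Hypothetical student: made-up fractions, purely for illustration.
example = {
    "homework": 0.90,
    "project": 0.85,
    "early_exam": 0.80,
    "late_exam": 0.75,
    "participation": 1.00,
}
score = weighted_score(example)
print(f"score = {score:.2f}, grade = {letter_grade(score)}")
```

Because thresholds can only move down, the grade this sketch reports is a lower bound on the grade the same score would earn.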
During this course, you will have five homework assignments that include both programming and written aspects. The written aspects are largely designed to help you do the programming more efficiently, by working through some of the details of what you will implement. These assignments are to be completed individually, and will be graded individually (see "collaboration policy" below). The goal of these assignments is to ensure that you learn and can implement standard NLP techniques, and understand and process language data effectively. These are:
- HW1: Distributional semantics and text categorization (no code hand-in)
- HW2: Neural networks and word embeddings
- HW3: Data collection and evaluation
- HW4+5: Sequence labeling and encoder-decoder models with attention (worth 2 assignments)
You will also complete one large course project, in teams of 4-8 students (exceptions are possible). The goal of this project is to enable you to work on a more significant, potentially impactful project dealing with natural language. See the course project description for more information.
Participation: You are to participate actively in class or in the online discussions. If you participate online, every question you answer well will get you 1% credit (marked by "instructor approved answer"); every question you ask will get you 0.5% credit (marked as "good question" by the instructor or a TA). Asking/answering questions in class counts the same.
Lateness: In general, nothing may be handed in late without prior approval. However, every student may use one "stuff happens" card for one homework deadline, and every team may use one "stuff happens" card for one project deadline. These cards give you 48 extra hours at no penalty in grade.
Score adjustments: Everyone makes mistakes, including us when grading. If you handed something in and did not get a score for it, or if you believe there is an error in grading (on a homework, exam, or project), you may raise this issue with us within one week of when we hand back grades.
A substantial portion of your coursework is a team-based project. We highly recommend interdisciplinary teams; and because diverse teams often produce better outcomes than homogeneous teams, we encourage you to reach out and work with people who aren't (yet) your friends. As a team, you will complete a project of your choosing throughout the semester. The topic of the course project is open-ended, though it must fulfill certain requirements (most notably, relevance to natural language processing or computational linguistics). This is your opportunity to put your NLP/CL knowledge to use in a project of your choosing.
There are several deliverables for the course project, with associated grade percentages:
- P1: Project brainstorming, pitch, and feedback (5%)
- P2: Survey of related work, and plans for data (5%)
- P3: Description of proposed approach and measures of success (5%)
- P4: Prototype/baseline implementation and initial results (5%)
- P5: Final write-up and presentation (15%)
Each team will be assigned one of the TAs with whom you should meet once before Thanksgiving break. You should also meet with Hal once before Thanksgiving break. We will create a signup sheet; use your own judgment for when would be most useful for you to meet with us.
Please see the course project pages for more details!
Credit: Some ideas for course project implementation are from Walter Lasecki's course on Social Computing Systems and/or Chris Callison-Burch's course on Crowdsourcing.
Disability Support: Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first TWO weeks of the semester.
Laptops in Class: It's been repeatedly documented in many studies that if you can, you are likely better off not using a laptop in class (example study; h/t Jacob Eisenstein). You can make your own decision, but if your laptop use is distracting others, an instructor may ask you to cease using it (in particular, please avoid using websites with popup videos and the like). Please reach out to any instructor if we can help.
Academic Integrity: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance---if you're having trouble understanding the material, please let us know and we will be more than happy to help.
Anti-Harassment: The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of this course. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. Harassment and hostile behavior are unwelcome in any part of this course. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in the course. We aim for this course to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. Please contact an instructor or CS staff member if you have questions or if you feel you are the victim of harassment (or otherwise witness harassment of others). (Adapted from the ACL Anti-Harassment Policy.)
Web Accessibility: The University of Maryland is committed to equal access to Web content. If you need to request Web content in an alternative format or have comments or suggestions on accessibility, contact itaccessibility@umd.edu.
Note that readings and homeworks are to be completed before the class period on which they are marked. For instance, you should have completed reading SLP3 6.2-6.5 before class on 29 Aug, and you must hand in HW1 before class on 12 Sep.
Readings may be from:
- SLP3: Jurafsky and Martin, Speech and Language Processing (3rd edition)
- CIML: Daumé III, A Course in Machine Learning
- NLP: Eisenstein, Natural Language Processing
- Neu: Neubig, Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
Date | Topic | Reading | Deadline |
---|---|---|---|
T 27 Aug | Introduction to computational linguistics | | |
R 29 Aug | Distributional semantics | SLP3 6.2-6.5 | |
T 03 Sep | Review: linear models and loss functions | CIML 7 | OH Poll |
R 05 Sep | Text categorization: linguistic features and evaluation | SLP3 4.7, and Stylometry §2,5 | |
T 10 Sep | Bias and fairness in NLP systems | Webinar* | |
R 12 Sep | Computation graphs and backpropagation | NLP 3.1-3.3 | HW1 |
T 17 Sep | Word meaning as classification | SLP3 6.8-6.9, and RacistAI | |
R 19 Sep | Data collection and annotation | DataInNLP, and AnnCaseStudy | |
T 24 Sep | Measurement and validity | Measurement, and MeasurementCaseStudy, Sec "Reliability, Validity, ..." | |
R 26 Sep | Crowdsourcing annotations | CrowdsourcingNLP, and AnnMyths | HW2 |
T 01 Oct | CLASS CANCELLED (Hal sick): Multilinguality and linguistic variety | TheBenderRule, and Elicitation, Sec 3, and optional: ActiveElicitation | |
R 03 Oct | Early exam | | |
T 08 Oct | N-gram language models | SLP3 3 | |
R 10 Oct | Recurrent neural language models | SLP3 9 | |
T 15 Oct | Sequence labeling | CIML 17 | |
R 17 Oct | Encoder-decoder models | Neu 7-7.3.1, Neu 8 | HW3 |
T 22 Oct | Project Pitches | | P1 |
R 24 Oct | Machine translation and evaluation (guest lecturer: Marine Carpuat) | Bleu | |
T 29 Oct | Dependency parsing | SLP3 15-15.4 | |
R 31 Oct | Imitation learning; notes1 | CIML 18 | |
T 05 Nov | Imitation learning II (same slides); notes2 | DepParse | P2 |
R 07 Nov | Reinforcement learning; notes1 | RL4IE | |
T 12 Nov | Reinforcement learning II (same slides); notes2 | RL4IE | |
R 14 Nov | Late exam | | |
T 19 Nov | Semantic parsing | Artzi+Zettlemoyer'13 background | |
R 21 Nov | Language grounding | Matuszek'18 Regier+Carlson'01, to skim | P3 |
T 26 Nov | Language to action | Branavan+al'09 + Khanh+D'19 | |
R 28 Nov | Thanksgiving Break | | |
T 03 Dec | Reading comprehension and question answering | Chen+al'17, Jia+Liang'17 | P4 |
R 05 Dec | Interpretation of neural models | TBA | HW5 |
T 17 Dec | Project Poster Session (10:30a-12:30p) | | P5 |
* The webinar link requires you to "register"; if this is an issue for you for any reason, please let any instructor know at least three days ahead of time so we can find a work-around.