CS224N: Natural Language Processing with Deep Learning - A Comprehensive Summary

This README provides a concise overview of the key takeaways from each week of the CS224N course. It is designed as a quick reference guide to the essential NLP concepts, equations, and models covered in the course.

Week 1: Word Embeddings

  • Word Vectors: Representing words as dense vectors capturing semantic relationships.
    • Word2Vec: Learning word embeddings with a shallow neural network that predicts the target word from its context (CBOW) or the context words from the target word (Skip-gram).
      • Objective Function (Skip-gram): Maximize the probability of observing context words given the target word.
    • GloVe (Global Vectors): Combines global co-occurrence statistics with local context window information for robust embeddings.
  • Word Window Classification: Using word embeddings for tasks like Named Entity Recognition by classifying words within a fixed window.
  • Key Equations (see the sketch after this list):
    • Softmax: p(o|c) = exp(u_o^T v_c) / sum_{w in V} exp(u_w^T v_c)
    • Cross-Entropy Loss: J = - sum_i y_i * log(p_i)
  • Evaluation: Intrinsic (e.g., word similarity) and extrinsic (e.g., downstream task performance) evaluation methods for embeddings.
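
Below is a minimal NumPy sketch of the skip-gram softmax and cross-entropy loss above. The vocabulary size, embedding dimension, and index o are toy placeholders, not course-provided values.

```python
import numpy as np

def skipgram_softmax(v_c, U):
    """p(o|c) for every outside word o, given the center vector v_c and the
    matrix U whose rows are the outside-word vectors u_w."""
    scores = U @ v_c                      # u_w^T v_c for all w
    scores -= scores.max()                # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # softmax over the vocabulary

def cross_entropy(p, o):
    """J = -log p(o|c): cross-entropy with a one-hot target at index o."""
    return -np.log(p[o])

# Toy example: vocabulary of 5 words, 3-dimensional embeddings.
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))   # outside-word vectors u_w
v_c = rng.normal(size=3)      # center-word vector v_c
p = skipgram_softmax(v_c, U)
print(p, cross_entropy(p, o=2))
```

Computing this softmax over the full vocabulary is expensive, which is why approximations such as negative sampling are commonly used in practice.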

Week 2: Neural Networks and Dependency Parsing

  • Backpropagation: The core algorithm for training neural networks by calculating gradients and updating weights.
  • Neural Networks: Architectures for learning complex patterns from data.
  • Dependency Parsing: Analyzing grammatical structure by identifying head-dependent relationships between words in a sentence.
    • Transition-based Parsing: Uses a stack and a buffer to predict dependency relations one transition at a time (see the sketch after this list).
  • Key Concepts: Gradient Descent, Chain Rule, Activation Functions (Sigmoid, ReLU), Backpropagation Through Time (BPTT)
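
As a rough illustration of transition-based parsing (not the course's parser), here is a toy arc-standard run driven by a hand-written transition sequence; a real parser predicts each transition with a neural classifier over stack and buffer features.

```python
def parse(words, transitions):
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC transitions to a stack and buffer,
    collecting (head, dependent) arcs as word indices."""
    stack, buffer, arcs = [0], list(range(1, len(words))), []   # index 0 = ROOT
    for t in transitions:
        if t == "SHIFT":                 # move the next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":            # second-from-top depends on the top word
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT-ARC":           # top word depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["ROOT", "She", "reads", "books"]
# Hand-written gold transitions for this sentence (normally predicted by a model).
arcs = parse(words, ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"])
print([(words[h], words[d]) for h, d in arcs])
# [('reads', 'She'), ('reads', 'books'), ('ROOT', 'reads')]
```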

Week 3: Recurrent Neural Networks (RNNs) and Language Models

  • RNNs: Networks designed to process sequential data like text by maintaining hidden states that capture information from previous time steps.
  • Language Models: Predicting the probability of a sequence of words, used for tasks like text generation and machine translation.
  • Vanishing/Exploding Gradients: Difficulty in training RNNs on long sequences because gradients shrink (or blow up) as they are backpropagated through many time steps.
  • Key Architectures:
    • LSTM (Long Short-Term Memory): Addresses vanishing gradients with a gating mechanism.
    • GRU (Gated Recurrent Unit): Simplified LSTM with fewer parameters.
  • Key Equations (see the sketch after this list):
    • RNN Hidden State Update: h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)
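
A minimal NumPy sketch of the hidden-state update above, unrolled over a toy input sequence (the dimensions and random weights are arbitrary placeholders):

```python
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Toy dimensions: hidden size 4, input size 3, sequence length 5.
rng = np.random.default_rng(0)
W_hh, W_xh, b_h = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h = np.zeros(4)                          # initial hidden state h_0
for x_t in rng.normal(size=(5, 3)):      # unroll over the input sequence
    h = rnn_step(h, x_t, W_hh, W_xh, b_h)
print(h)                                 # final hidden state h_5
```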

Week 4: Machine Translation and Attention

  • Machine Translation: Using neural networks to translate text between languages.
  • Sequence-to-Sequence Models: Encoder-decoder architecture for mapping input sequences to output sequences.
  • Attention Mechanism: Allows the decoder to focus on specific parts of the input sequence when generating each output token (see the sketch after this list).
    • Improves performance by handling long-range dependencies.
  • Subword Models: Handling out-of-vocabulary words by representing words as subword units (e.g., BPE, WordPiece).
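
A toy sketch of dot-product attention for a single decoder step: score each encoder hidden state against the current decoder state, normalize the scores, and return the weighted context vector. Real systems typically add learned parameters (additive or multiplicative attention).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_product_attention(dec_state, enc_states):
    """Return the context vector and attention weights for one decoder step."""
    scores = enc_states @ dec_state      # e_i = h_i^T s_t
    weights = softmax(scores)            # alpha_i, sums to 1 over source positions
    context = weights @ enc_states       # sum_i alpha_i h_i
    return context, weights

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(6, 8))     # 6 source positions, hidden size 8
dec_state = rng.normal(size=8)           # current decoder hidden state
context, weights = dot_product_attention(dec_state, enc_states)
print(weights.round(2), context.shape)
```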

Week 5: Transformers and Pretraining

  • Transformers: Architecture based on self-attention mechanisms, replacing recurrent networks for sequence modeling.
    • Self-Attention: Lets each word attend to every other word in the sequence, capturing long-range dependencies effectively (see the sketch after this list).
  • Pretraining: Training large language models on massive text data (e.g., BERT, GPT) and fine-tuning them for specific downstream tasks.
  • Key Concepts: Multi-Head Attention, Positional Encoding, Layer Normalization
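
A single-head scaled dot-product self-attention sketch in NumPy (toy dimensions; a full Transformer adds multiple heads, masking, positional encodings, residual connections, and layer normalization):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with Q, K, V projected from X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # each token mixes all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                            # 5 tokens, model dimension 16
W_q, W_k, W_v = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)           # (5, 16)
```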

Week 6: Question Answering and Natural Language Generation

  • Question Answering (QA): Building systems that can answer questions posed in natural language.
    • Extractive QA: Identifying the answer span within a given context passage (see the sketch after this list).
    • Open-domain QA: Retrieving relevant information from a large knowledge base to answer questions.
  • Natural Language Generation (NLG): Generating coherent and grammatically correct text from structured data or other inputs.
  • Key Models:
    • BiDAF (Bidirectional Attention Flow): Popular model for extractive QA.
    • Pointer Networks: Used for tasks like summarization and text simplification.
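
A toy sketch of how an extractive QA model turns per-token start and end logits into an answer span; the logits below are random placeholders rather than real model outputs.

```python
import numpy as np

def best_span(start_logits, end_logits, max_len=15):
    """Pick the span (i, j) maximizing start_logits[i] + end_logits[j],
    subject to i <= j and a maximum span length."""
    best, best_score = (0, 0), -np.inf
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            if s + end_logits[j] > best_score:
                best, best_score = (i, j), s + end_logits[j]
    return best

context = "BERT was introduced by researchers at Google in 2018".split()
rng = np.random.default_rng(0)
start_logits = rng.normal(size=len(context))   # placeholder model outputs
end_logits = rng.normal(size=len(context))
i, j = best_span(start_logits, end_logits)
print(" ".join(context[i:j + 1]))              # predicted answer span
```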

Week 7: Coreference Resolution and Large Language Models

  • Coreference Resolution: Identifying and clustering mentions in text that refer to the same entity (see the sketch after this list).
  • Large Language Models (LLMs): Capabilities, limitations, and ethical considerations of LLMs like GPT-3 and their impact on NLP research.
  • Key Concepts: Anaphora Resolution, Mention Detection, Clustering Algorithms
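
A toy sketch of turning predicted antecedent links into coreference clusters; the mentions and links here are hand-written placeholders, whereas real systems score candidate antecedents with a neural model.

```python
def cluster_mentions(mentions, antecedent_of):
    """Greedy clustering: each mention joins its predicted antecedent's cluster,
    or starts a new cluster if it has no antecedent (None)."""
    cluster_id = {}
    for i in range(len(mentions)):
        ante = antecedent_of[i]
        cluster_id[i] = cluster_id[ante] if ante is not None else i
    clusters = {}
    for i, c in cluster_id.items():
        clusters.setdefault(c, []).append(mentions[i])
    return list(clusters.values())

mentions = ["Barack Obama", "the president", "he", "Michelle", "she"]
# Toy antecedent predictions: index of each mention's antecedent, or None.
antecedent_of = {0: None, 1: 0, 2: 1, 3: None, 4: 3}
print(cluster_mentions(mentions, antecedent_of))
# [['Barack Obama', 'the president', 'he'], ['Michelle', 'she']]
```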

Week 8-11: Advanced Topics and Applications

  • Knowledge Integration: Incorporating external knowledge sources (e.g., knowledge graphs) into language models.
  • Model Analysis and Explanation: Understanding the inner workings of NLP models and making their predictions more interpretable.
  • Social and Ethical Considerations: Bias in NLP models, fairness, and responsible AI development.
  • Future of NLP: Emerging trends and research directions in the field.

Final Project: Implementing BERT from Scratch

  • Folder: finnal_project
  • Description: Comprehensive project involving the implementation of a BERT model from scratch.
  • Components:
    • base_bert.py: Base BERT model implementation.
    • bert.py: BERT model adaptations.
    • classifier.py: Classifier implementation.
    • config.py: Configuration settings.
    • evaluation.py: Evaluation metrics and methods.
    • multitask_classifier.py: Multi-task learning classifier.
    • optimizer_test.py: Optimizer testing script.
    • optimizer.py: Optimizer implementation (see the sketch after this list).
    • prepare_submit.py: Script to prepare project submission.
    • sanity_check.data: Data for sanity checks.
    • sanity_check.py: Sanity check script.
    • setup.sh: Setup script.
    • tokenizer.py: Tokenization script.
    • utils.py: Utility functions.
    • README.md: Detailed description and instructions for the final project.
  • Notes: Ensure all dependencies are installed and configurations are set before running the scripts.
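
The interfaces above are defined by the project handout and the files themselves. Purely as a hedged illustration of the kind of update an optimizer.py like this typically implements, here is a minimal Adam-style step in NumPy; it is not the course's required implementation and omits details such as weight decay.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update (illustrative only; see optimizer.py for the real spec)."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize ||param||^2, whose gradient is 2 * param.
param = np.array([1.0, -2.0])
m, v = np.zeros_like(param), np.zeros_like(param)
for t in range(1, 101):
    param, m, v = adam_step(param, 2 * param, m, v, t)
print(param)   # both entries end up close to zero
```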
