cmput497_a2_yonael_zichun3

In this assignment, we implemented various n-gram language models (unsmoothed, Laplace smoothing, and deleted interpolation) and used them to compute the perplexity of the test dataset against the training corpus.
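To make the Laplace and perplexity terms concrete, here is a minimal sketch of an add-one smoothed bigram model. It is not the code in main.py: the function names are hypothetical, and it assumes character-level bigrams, base-2 logarithms, and one extra vocabulary slot reserved for unseen characters.

```python
import math
from collections import Counter


def train_bigram_counts(text):
    """Count bigram contexts and bigrams in the training text (character-level, an assumption)."""
    context_counts = Counter(text[:-1])           # every character that starts a bigram
    bigram_counts = Counter(zip(text, text[1:]))  # adjacent character pairs
    vocab_size = len(set(text)) + 1               # +1 slot for unseen characters (assumption)
    return context_counts, bigram_counts, vocab_size


def laplace_prob(prev, cur, context_counts, bigram_counts, vocab_size):
    """Add-one estimate: (c(prev, cur) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)


def perplexity(test_text, context_counts, bigram_counts, vocab_size):
    """Perplexity = 2 ** (-average log2 probability of the test bigrams)."""
    pairs = list(zip(test_text, test_text[1:]))
    if not pairs:
        raise ValueError("test text must contain at least two characters")
    log_sum = sum(
        math.log2(laplace_prob(prev, cur, context_counts, bigram_counts, vocab_size))
        for prev, cur in pairs
    )
    return 2 ** (-log_sum / len(pairs))


if __name__ == "__main__":
    counts = train_bigram_counts("abab")
    print(perplexity("ab", *counts))
```

A lower perplexity means the test text is better predicted by the training corpus. Without smoothing, any bigram missing from the training data would receive zero probability, which makes the perplexity undefined.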

Prerequisites

Python 3.7+
virtualenv

Setup

# Setup python virtual environment
$ virtualenv venv --python=python3
$ source venv/bin/activate

# Install python dependencies
$ pip install -r requirements.txt

How to run

$ python main.py --unsmoothed

$ python main.py --laplace --train_dir=custom_path_to_dataset --debug=True

By default, the program assumes the training dataset is located at data_train and the test dataset at data_dev. See below for advanced usage.

We also assume that all dataset files follow the naming convention (.*)-(.*).txt.(tra|dev|tes).
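As a rough illustration of this convention, the sketch below matches a filename against the pattern and returns its three captured parts. The helper name and the example filename are hypothetical, and escaping the dots so they match literally is an assumption. The README does not say what the two leading groups mean, so they are returned unnamed.

```python
import re

# The README's pattern, with the dots escaped so they match literally (assumption).
FILENAME_PATTERN = re.compile(r"(.*)-(.*)\.txt\.(tra|dev|tes)")


def parse_dataset_filename(name):
    """Return the three captured groups, or None if the name does not match."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match.groups()


if __name__ == "__main__":
    # "udhr-eng.txt.tra" is a made-up filename used only for illustration.
    print(parse_dataset_filename("udhr-eng.txt.tra"))
    print(parse_dataset_filename("notes.txt"))
```

The last group distinguishes the split (tra, dev, or tes), so files in a custom --train_dir or --test_dir presumably need to keep the matching suffix.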

Usage: main.py [OPTIONS]

  You must select one of the language models
  --unsmoothed|--laplace|--interpolation

Options:
  --unsmoothed      Use unsmoothed language model.
  --laplace         Use Laplace smoothing language model.
  --interpolation   Use interpolation language model.
  --train_dir TEXT  Path to training dataset.
  --test_dir TEXT   Path to test dataset.
  --debug BOOLEAN   Enable debug mode.
  --help            Show this message and exit.

Authors

Yonael Bekele
Michael Lin
