Disambiguating Voynich Manuscript transliterations with word embeddings

CS-433/ml-project-2-scikit-learn2

ML Project 2: Disambiguating Voynich Manuscript transliterations with word embeddings

Team members

  • Jirka Lhotka
  • Francesco Salvi
  • Liudvikas Lazauskas

Repo structure

The repository contains 3 main notebooks as well as 4 modules:

  • embeddings_italian.ipynb: Trains and evaluates embeddings on Italian text (Dante's Inferno).
  • embeddings_latin.ipynb: Trains and evaluates embeddings on Latin text (Albert of Aix).
  • embeddings_voynich.ipynb: Trains embeddings on the Voynich Manuscript.
  • corruptions.py: Provides methods to compute ambiguity distributions and to artificially corrupt the texts.
  • uncertainties.py: Provides a class that represents an ambiguity together with its context, and methods to build a list of ambiguities from a corrupted text.
  • baseline.py: Provides methods to generate baseline predictions by computing letter frequencies in the text.
  • validation.py: Provides methods to generate predictions and to evaluate the models by computing their accuracy.
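
To illustrate how the four modules fit together, here is a minimal, self-contained sketch of the corrupt / collect ambiguities / baseline / accuracy loop. It is not the repository's code: every name in it (`Ambiguity`, `corrupt`, `baseline_predict`, `accuracy`), the `?` placeholder, and the `confusable` map of interchangeable letters are assumptions made for illustration only.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ambiguity:
    """Hypothetical record of one uncertain character and its candidate letters."""
    word_index: int
    char_index: int
    candidates: tuple
    truth: str


def corrupt(words, confusable, rate, seed=0):
    """Replace confusable letters with '?' at the given rate and record each one."""
    rng = random.Random(seed)
    corrupted, ambiguities = [], []
    for wi, word in enumerate(words):
        chars = list(word)
        for ci, ch in enumerate(chars):
            if ch in confusable and rng.random() < rate:
                ambiguities.append(Ambiguity(wi, ci, confusable[ch], ch))
                chars[ci] = "?"
        corrupted.append("".join(chars))
    return corrupted, ambiguities


def baseline_predict(corrupted, ambiguities):
    """For each ambiguity, pick the candidate letter that is most frequent in the text."""
    freq = Counter(ch for word in corrupted for ch in word if ch != "?")
    return [max(a.candidates, key=lambda c: freq[c]) for a in ambiguities]


def accuracy(predictions, ambiguities):
    """Fraction of ambiguities whose predicted letter matches the original letter."""
    if not ambiguities:
        return 0.0
    correct = sum(p == a.truth for p, a in zip(predictions, ambiguities))
    return correct / len(ambiguities)


# Toy example: pretend 'm' and 'n' are indistinguishable in the transliteration.
words = "nel mezzo del cammin di nostra vita".split()
confusable = {"m": ("m", "n"), "n": ("m", "n")}
corrupted, ambiguities = corrupt(words, confusable, rate=1.0)
predictions = baseline_predict(corrupted, ambiguities)
print(corrupted)
print(accuracy(predictions, ambiguities))
```

An embedding-based model would replace `baseline_predict` with a function that scores each candidate word using its context, while the corruption and accuracy steps stay the same.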

Data

The texts used in this project can mostly be found in the texts/ folder. It contains historical texts such as Dante's Inferno and Albert of Aix, as well as the Voynich transliterations available here. The transliterations are further processed with ivtt, and the processed texts are in the data/ folder.

Resources

  • Benchmarks The benchmark used for the Latin synonym selection task can be found in the benchmarks/ folder.

  • Software The software used for filtering and processing the transliterations can be found in the software/ folder; it was taken from here.

  • Documentation Documentation on using IVTT and the IVTFF format can be found in the documentation/ folder.

Predictions

The resulting predictions of the model trained on Voynich can be found in the predictions/ folder.

Requirements
