ML Project 2: Disambiguating Voynich Manuscript transliterations with word embeddings

Team members

Jirka Lhotka
Francesco Salvi
Liudvikas Lazauskas

Repo structure

The repository contains three main notebooks as well as four modules:

embeddings_italian.ipynb: Trains and evaluates embeddings on Italian text (Dante's Inferno).
embeddings_latin.ipynb: Trains and evaluates embeddings on Latin text (Albert of Aix).
embeddings_voynich.ipynb: Trains embeddings on the Voynich Manuscript.
corruptions.py: Provides methods to compute ambiguity distributions and to artificially corrupt the texts.
uncertainties.py: Provides a class that represents ambiguities together with their contexts, and methods to build a list of ambiguities from a corrupted text.
baseline.py: Provides methods to generate baseline predictions by computing letter frequencies in the text.
validation.py: Provides methods to generate predictions and to evaluate the models by computing their accuracy.
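
The corrupt-then-predict workflow that these modules describe can be illustrated with a minimal sketch. It is not the code in corruptions.py, baseline.py, or validation.py: the function names (`corrupt_text`, `baseline_predict`, `accuracy`), the `?` placeholder for an ambiguous letter, and the single-letter corruption scheme are all assumptions made for illustration.

```python
import random
from collections import Counter


def corrupt_text(text, rate, seed=0):
    """Replace a fraction of letters with '?' and remember the originals.

    Hypothetical helper: returns the corrupted text and a dict that maps
    each corrupted position to the letter that was there.
    """
    rng = random.Random(seed)
    chars = list(text)
    truth = {}
    for i, ch in enumerate(chars):
        if ch.isalpha() and rng.random() < rate:
            truth[i] = ch
            chars[i] = "?"
    return "".join(chars), truth


def baseline_predict(corrupted):
    """Predict the most frequent visible letter for every '?' position."""
    counts = Counter(ch for ch in corrupted if ch.isalpha())
    if not counts:
        return {}  # nothing left to estimate frequencies from
    most_common = counts.most_common(1)[0][0]
    return {i: most_common for i, ch in enumerate(corrupted) if ch == "?"}


def accuracy(predictions, truth):
    """Fraction of corrupted positions whose letter was recovered."""
    if not truth:
        return 0.0
    correct = sum(predictions.get(i) == letter for i, letter in truth.items())
    return correct / len(truth)


text = "nel mezzo del cammin di nostra vita"
corrupted, truth = corrupt_text(text, rate=0.3, seed=1)
predictions = baseline_predict(corrupted)
print(corrupted)
print(f"baseline accuracy: {accuracy(predictions, truth):.2f}")
```

A frequency baseline of this kind ignores context entirely, which is what makes it a useful floor against which an embedding-based model can be compared.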

Data

The texts used in this project are mainly found in the texts/ folder. It contains historical texts such as Dante's Inferno and Albert of Aix, and Voynich transliterations available here. The transliterations are further processed with ivtt, and the processed texts are found in the data/ folder.

Resources

Benchmarks: The benchmark used for the Latin synonym selection task can be found in the benchmarks/ folder.
Software: The software used for filtering and processing the transliterations can be found in the software/ folder, taken from here.
Documentation: Documentation on using IVTT and the IVTFF format can be found in the documentation/ folder.

Predictions

The resulting predictions of the model trained on Voynich can be found in the predictions/ folder.
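
As a rough illustration of how word embeddings could drive such predictions, the sketch below scores each candidate reading of an ambiguous word by its average cosine similarity to the surrounding words and keeps the best one. This is an assumed procedure, not necessarily the one implemented in validation.py. The names `cosine` and `disambiguate` are hypothetical, and the two-dimensional vectors are hand-written toy values standing in for embeddings that would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(candidates, context, vectors):
    """Return the candidate most similar, on average, to the context words.

    Candidates without a vector get the lowest possible score; context
    words without a vector are skipped.
    """
    def score(word):
        if word not in vectors:
            return float("-inf")
        sims = [cosine(vectors[word], vectors[c]) for c in context if c in vectors]
        return sum(sims) / len(sims) if sims else 0.0

    return max(candidates, key=score)


# Toy vectors for illustration only; they are NOT trained embeddings.
TOY_VECTORS = {
    "chedy": [0.9, 0.1],
    "shedy": [0.8, 0.2],
    "qokeedy": [0.85, 0.15],
    "daiin": [0.1, 0.9],
    "dain": [0.2, 0.8],
}

best = disambiguate(["chedy", "daiin"], ["shedy", "qokeedy"], TOY_VECTORS)
print(best)
```

With real embeddings, the `vectors` lookup would be replaced by the trained model's word-to-vector mapping; the scoring logic stays the same.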

Requirements

Gensim Models
- version: 4.1.2
- package name: gensim
NumPy
- version: 1.19.5
- package name: numpy
SciPy
- version: 1.7.3
- package name: scipy
Natural Language Toolkit
- version: 3.6.5
- package name: nltk
Smart Open
- version: 5.2.1
- package name: smart-open
The Classical Language Toolkit
- version: 1.0.21
- package name: cltk
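
To confirm that an environment matches the versions listed above, a small check along the following lines may help. It is a sketch that assumes Python 3.8 or later (for `importlib.metadata`); the `PINNED` dictionary and the `check_requirements` helper are illustrative names and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Requirements list above.
PINNED = {
    "gensim": "4.1.2",
    "numpy": "1.19.5",
    "scipy": "1.7.3",
    "nltk": "3.6.5",
    "smart-open": "5.2.1",
    "cltk": "1.0.21",
}


def check_requirements(pinned):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_requirements(PINNED).items():
    status = "ok" if ok else f"expected {PINNED[package]}"
    print(f"{package}: {installed} ({status})")
```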
