
lisni946/Word2Vec_koine_greek


Word2Vec_koine_greek

Project Abstract

Despite its long and well-documented history, Koine Greek lexicography has been slow to adopt techniques for lexical analysis that are truly grounded in modern linguistic theory and method. Louw and Nida’s Greek-English Lexicon (1988) is often hailed as a linguistic breakthrough in this regard, promising a reassessment of Koine Greek in light of lexical field theory and componential analysis. However, major theoretical and methodological issues seriously undercut this lexicon’s claims to linguistic rigour.

Recent advances in distributional semantics and Natural Language Processing (NLP) open promising new directions for lexicographical tasks. This thesis makes use of one such NLP tool, the vector space model Word2Vec (Mikolov et al., 2013). Word2Vec is an unsupervised learning algorithm that assigns a vector to each word in a corpus’s vocabulary based on that word’s distributional profile, that is, the contexts in which it occurs. Because the model’s outputs are vectors, a cosine similarity metric can be used to compute the similarity between any two words. This effectively operationalises Zellig Harris’ (1954) distributional hypothesis: the notion that words appearing in similar contexts will have similar meanings.

I seek to demonstrate the utility of Word2Vec for Koine Greek lexicography, specifically for issues relating to linguistic categorisation. I show that categorisation based on corpus data cannot be intuited through a process of logical taxonomic delineation; instead, vector space modelling shows that categorisation reflects prototypical encyclopaedic knowledge. Since Koine Greek is a dead language, and methods of introspection and elicitation are therefore unavailable to the lexicographer, vector space modelling offers a uniquely empirical basis for researching Koine Greek categorisation.
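To make the distributional hypothesis and the cosine similarity metric concrete, here is a minimal sketch. It is not Word2Vec and not this repository's code: it swaps in a simpler count-based model, where each word's vector is just the counts of words seen within a fixed window around it. Word2Vec instead learns dense vectors by training a shallow neural network (skip-gram or CBOW) to predict context words, but similarity between the resulting vectors is compared with the same cosine formula. The toy corpus of transliterated, lemmatised clauses and the helper names `cooccurrence_vectors`, `cosine` and `most_similar` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it.

    Count-based stand-in for illustration only; Word2Vec learns dense vectors instead.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, target in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][sentence[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[word] for word, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy corpus of transliterated, lemmatised clauses (not real corpus data).
corpus = [
    ["anthropos", "esthio", "artos"],
    ["gyne", "esthio", "artos"],
    ["anthropos", "pino", "oinos"],
    ["gyne", "pino", "hydor"],
    ["hippos", "trecho", "hodos"],
]

vectors = cooccurrence_vectors(corpus, window=2)
print(most_similar("anthropos", vectors))  # first entry: ('gyne', 0.75)
```

Because "anthropos" and "gyne" share three of their four context words, they come out as nearest neighbours, while "hippos", which never shares a context with "anthropos", scores 0.0. With a trained Word2Vec model (for example in the gensim library), the analogous query is made on the model's word vectors via their `most_similar` method.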

About

Word2Vec Model for Koine Greek Categorisation
