Word Embeddings (Word2Vec) for Nepali Language

This pre-trained Word2Vec model provides 300-dimensional vectors for more than 0.5 million Nepali words and phrases. A dedicated Nepali-language text corpus was built from news content freely available in the public domain; the corpus contains more than 90 million running words.

Word2Vec Model

  • Embedding dimension: 300
  • Architecture: Continuous Bag of Words (CBOW)
  • Training algorithm: negative sampling (15 negative samples)
  • Context (window) size: 10
  • Minimum token count: 2
  • Encoding: UTF-8
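
The training script itself is not part of this repository. As a rough illustration only, the sketch below shows how a model with the settings listed above could be trained with gensim 4.x; the corpus path nepali_corpus.txt and the workers value are assumptions, not details from the authors.

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream a UTF-8 corpus with one sentence per line (placeholder path)
sentences = LineSentence('nepali_corpus.txt')

# CBOW with the hyperparameters listed above
model = Word2Vec(
    sentences,
    vector_size=300,  # embedding dimension
    sg=0,             # Continuous Bag of Words architecture
    negative=15,      # negative sampling with 15 noise words
    window=10,        # context window size
    min_count=2,      # minimum token count
    workers=4,        # assumed number of worker threads
)

# Save the keyed vectors in the same plain-text format as the released file
model.wv.save_word2vec_format('nepali_embeddings_word2vec.txt', binary=False)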

Download the model from IEEE Dataport: https://ieee-dataport.org/open-access/300-dimensional-word-embeddings-nepali-language

(Size: 1,881,180,827 bytes; file type: .txt)

Using the Word2Vec model

from gensim.models import KeyedVectors

# Load the vectors (plain-text word2vec format)
model = KeyedVectors.load_word2vec_format('.../path/to/nepali_embeddings_word2vec.txt', binary=False)

# Find the similarity between two words
model.similarity('फेसबूक', 'इन्स्टाग्राम')

# Most similar words
model.most_similar('ठमेल')

# Try some vector arithmetic with Nepali words (fill in positive/negative words)
model.most_similar(positive=['', ''], negative=[''], topn=1)
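
Because the released .txt file is roughly 1.9 GB, loading it in full can take a while. The snippet below is an optional sketch, not part of the original instructions: it converts the text file once to gensim's binary format for faster reloads, loads only the first N vectors for quick experiments, and fetches a raw 300-dimensional vector. The file names are placeholders.

from gensim.models import KeyedVectors

# One-off conversion: load the released text file, then re-save it in binary format
model = KeyedVectors.load_word2vec_format('nepali_embeddings_word2vec.txt', binary=False)
model.save_word2vec_format('nepali_embeddings_word2vec.bin', binary=True)

# Subsequent loads from the binary file are much faster
model = KeyedVectors.load_word2vec_format('nepali_embeddings_word2vec.bin', binary=True)

# For quick experiments, load only the first 100,000 vectors
small = KeyedVectors.load_word2vec_format('nepali_embeddings_word2vec.txt', binary=False, limit=100000)

# Each entry is a 300-dimensional NumPy vector
vector = model['फेसबूक']
print(vector.shape)  # (300,)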

The Nepali text corpus was designed, and the Word2Vec model trained, at the Database Systems and Artificial Intelligence Lab, School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi.
