Deep Learning Based Approach for Compound Type Identification in Sanskrit

Code for the paper "Revisiting the Role of Feature Engineering for Compound Type Identification in Sanskrit".
This code is adapted from this tutorial by Ruder.

Requirements:

The following software must be installed on your machine.

Python 3.5
TensorFlow 1.13.1
numpy
gensim
pandas
scikit-learn
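A quick way to confirm that these packages are importable before running anything is a small check like the following. It is a sketch, not a script shipped with this repository. It only checks that each module can be found (scikit-learn is imported as sklearn). It does not check installed versions, so TensorFlow 1.13.1 still has to be verified separately, for example by printing tf.__version__.

```python
import importlib.util
import sys

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ("tensorflow", "numpy", "gensim", "pandas", "sklearn")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```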

File organization

code: contains train.py; run this Python file to reproduce the results reported in the paper.
data: contains the data required to run the code.
model: the trained model is saved to this folder.

To run the code:

We provide only the implementation that uses our best-performing word embeddings, i.e. FastText. To reproduce the results, go to the code directory and run train.py:

python train.py
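Because train.py is run from inside code/, missing folders are a likely source of path errors. The hypothetical pre-flight check below (not part of the repository) looks for the directories named in this README, assuming the repository root is the parent of the current working directory. The exact paths train.py reads from are an assumption here.

```python
import os

# Directories named in this README ("File organization" and "Corpora" sections).
# Whether train.py needs exactly these paths is an assumption of this sketch.
EXPECTED_DIRS = ["code", "data", "model", os.path.join("data", "fast_text_features")]


def missing_paths(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    return [p for p in EXPECTED_DIRS if not os.path.isdir(os.path.join(repo_root, p))]


if __name__ == "__main__":
    # train.py is run from inside code/, so the repository root is its parent.
    root = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
    for path in missing_paths(root):
        print("Missing directory:", path)
```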

Dataset

Description of the data files. We use the same transliteration scheme as Hellwig.

Corpora

File name	Description
train.csv / test.csv	Training and test sets for the compound type classification task.
compound_dic.pickle	Dictionary used to look up word embedding vectors for the compound classification dataset.
Fast_text_features	Folder containing the FastText embeddings for the classification dataset.

These features can be downloaded from here.

Make sure they are placed in the path data/fast_text_features.
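The table above describes compound_dic.pickle as a dictionary used to obtain word embedding vectors. The sketch below assumes it maps each word (a string) directly to its vector; that structure, the helper names, and concatenating the two component vectors into one feature vector are illustrative assumptions, not necessarily what train.py does. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def load_embedding_dict(path):
    """Load a pickled dictionary, assumed here to map each word to its vector."""
    with open(path, "rb") as f:
        return pickle.load(f)


def pair_features(word1, word2, embeddings):
    """Concatenate the vectors of the two compound components (hypothetical helper).

    Raises KeyError if either word is missing from the dictionary.
    """
    return list(embeddings[word1]) + list(embeddings[word2])
```

For example, pair_features("xqDa", "vikramaH", load_embedding_dict("data/compound_dic.pickle")) would return one flat vector for that compound, assuming the pickle sits in data/ and contains both words.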

Sample data

There are four classes, represented by the integer mapping Avyayibhava (0), Bahuvrihi (1), Dvandva (2), Tatpurusha (3).

Index	Word1	Word2	Class
1	xqDa	vikramaH	1
2	prawi	icCakaH	0
3	saMmAna	SuSrURA	2
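The integer mapping and the sample rows above can be expressed as plain Python data, which makes it easy to turn predicted class ids back into compound type names. This is a sketch; the class-name spellings are normalized here and may differ from any names used inside the repository's code.

```python
# Label mapping as stated in this README (spellings normalized).
ID_TO_CLASS = {0: "Avyayibhava", 1: "Bahuvrihi", 2: "Dvandva", 3: "Tatpurusha"}
CLASS_TO_ID = {name: idx for idx, name in ID_TO_CLASS.items()}

# The three sample rows above as (Word1, Word2, Class) tuples.
SAMPLE = [
    ("xqDa", "vikramaH", 1),
    ("prawi", "icCakaH", 0),
    ("saMmAna", "SuSrURA", 2),
]


def describe(rows):
    """Map (Word1, Word2, class id) rows to (Word1, Word2, class name)."""
    return [(w1, w2, ID_TO_CLASS[label]) for w1, w2, label in rows]


for w1, w2, name in describe(SAMPLE):
    print(w1, "+", w2, "->", name)
```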

Statistics of the Sanskrit corpora

Corpus	No. of verses	No. of words
Vedabase	13013	190343
DCS	127376	3797593
wiki	78K lines	663521
Total		4651457
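The Total row is the sum of the per-corpus word counts. The verse column has no total, presumably because the wiki corpus is counted in lines rather than verses. A one-line check of the word total:

```python
# Word counts from the statistics table above.
WORDS = {"Vedabase": 190343, "DCS": 3797593, "wiki": 663521}

total = sum(WORDS.values())
print(total)  # 4651457, matching the "Total" row
```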

Repository: Jivnesh/ISCLS-19