Semantic annotation works by first representing words and documents in a vector space using word2vec and doc2vec implementations. The resulting vectors are used as features to train a classifier, and the trained model assigns a document to categories from the 2012 ACM classification tree.
$ workon myvirtualenv [Optional]
$ pip3 install -r requirements.txt
Download the ACM dataset into the ACM directory from here.
$ python3 run.py
$ python3 classify.py
##Mentors:
- Course Instructor:
  - Vasudev Verma
- TA:
  - Priya Radhakrishnan
##Major Packages Required
- nltk
- gensim
- numpy
- scikit-learn
- pickle
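A quick way to check that the packages listed above are importable is sketched below. It uses only the standard library; the mapping from package name to import name is an assumption of this example (`scikit-learn` is imported as `sklearn`, and `pickle` ships with Python, so it needs no separate installation).

```python
import importlib.util

# Package name -> module name used in `import` statements.
required = {
    "nltk": "nltk",
    "gensim": "gensim",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "pickle": "pickle",  # standard library
}

# find_spec returns None when a top-level module cannot be located.
missing = [
    name for name, module in required.items()
    if importlib.util.find_spec(module) is None
]
print("missing:", missing or "none")
```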
Members:
Quoc V. Le and Tomas Mikolov, "Distributed Representations of Sentences and Documents", ICML, 2014.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, "Efficient Estimation of Word Representations in Vector Space", in Proceedings of Workshop at ICLR, 2013.
Cao et al., "A Novel Neural Topic Model and Its Supervised Extension", AAAI, 2015.
Link (Le and Mikolov): https://cs.stanford.edu/~quocle/paragraph_vector.pdf
Resources are available here.