
Venue Prediction with bag of words + heterogeneous information using sklearn SGDClassifier

ss87021456/Venue-Prediction

Venue-Prediction

Venue Prediction with bag-of-words + heterogeneous information as features using sklearn SGDClassifier

Dataset (DBLP):

training: https://www.dropbox.com/s/rrbksqvvoefrr4p/training.txt?dl=0
validation: https://www.dropbox.com/s/tw094y2xfcoosv3/validation.txt?dl=0
Dataset format (tab-separated fields):
Paper_Id <TAB> Paper_title <TAB> Publication_venue <TAB> Cited_Papers <TAB> Cited_Papers_Venues
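A minimal sketch for loading either file with pandas, assuming there is no header row and that no field contains a literal tab. The column names come from the format description above; `load_dblp` is a hypothetical helper, not part of this repository. The inner layout of Cited_Papers and Cited_Papers_Venues is not documented, so those fields are kept as raw strings, and the sample rows are made up.

```python
import csv
import io

import pandas as pd

# Column names taken from the dataset format description.
COLUMNS = [
    "Paper_Id",
    "Paper_title",
    "Publication_venue",
    "Cited_Papers",
    "Cited_Papers_Venues",
]


def load_dblp(path_or_buffer):
    """Hypothetical loader: one tab-separated record per line, no header.

    Cited_Papers / Cited_Papers_Venues are left as raw strings because
    their internal delimiter is not documented.
    """
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=COLUMNS,
        dtype=str,
        quoting=csv.QUOTE_NONE,  # titles may contain quote characters
        keep_default_na=False,   # keep empty fields as "" instead of NaN
    )


if __name__ == "__main__":
    # Two made-up rows, only to illustrate the expected layout.
    sample = (
        "p1\tDeep learning for graphs\tKDD\tp7 p9\tICML NIPS\n"
        "p2\tQuery optimization revisited\tVLDB\t\t\n"
    )
    df = load_dblp(io.StringIO(sample))
    print(df[["Paper_Id", "Publication_venue"]])
```

For the real files, pass a path such as ./input/training.txt instead of the in-memory sample.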

Dependencies:

python3
scikit-learn (imported as sklearn)
pandas
numpy
pickle (part of the Python standard library)

Pipeline:

mkdir input # Create input directory
<Download the training and validation datasets from the links above and move them into the input directory>
python3 ./src/clean_data.py --input ./input/training.txt --output ./input/cleaned_training.txt
python3 ./src/clean_data.py --input ./input/validation.txt --output ./input/cleaned_validation.txt
python3 ./src/create_data_example.py --train ./input/cleaned_training.txt --validation ./input/cleaned_validation.txt
python3 ./src/train_classifier.py --train ./input/cleaned_training.txt --validation ./input/cleaned_validation.txt

Default Configuration:

bag-of-words dimension (vocabulary size): 3000
classifier: sklearn SGDClassifier (default hyperparameters)
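The configuration above could be approximated as follows. This is a sketch under stated assumptions, not the code in ./src/train_classifier.py: `build_features` and `train_and_predict` are hypothetical helpers, the 3000-dimension cap is applied to title words via CountVectorizer(max_features=3000), and cited venues are assumed to be whitespace-separated tokens forming a second bag of words that is concatenated with the title features (multi-word venue names would need a different delimiter). random_state=0 is added only for reproducibility; every other SGDClassifier setting is the sklearn default.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier


def build_features(train_titles, valid_titles,
                   train_venues=None, valid_venues=None, max_features=3000):
    """Return sparse (X_train, X_valid) feature matrices.

    Title words become a bag-of-words capped at `max_features` columns.
    If cited venues are given, each whitespace-separated venue token
    (an assumption about the field format) becomes an extra column.
    """
    title_vec = CountVectorizer(max_features=max_features)
    X_train = title_vec.fit_transform(train_titles)
    X_valid = title_vec.transform(valid_titles)

    if train_venues is not None:
        # r"\S+" keeps venue names such as "KDD" as single, case-preserved tokens
        venue_vec = CountVectorizer(token_pattern=r"\S+", lowercase=False)
        X_train = hstack([X_train, venue_vec.fit_transform(train_venues)]).tocsr()
        X_valid = hstack([X_valid, venue_vec.transform(valid_venues)]).tocsr()
    return X_train, X_valid


def train_and_predict(X_train, y_train, X_valid):
    """Fit an SGDClassifier with default settings and predict venues."""
    clf = SGDClassifier(random_state=0)  # fixed seed only for reproducibility
    clf.fit(X_train, y_train)
    return clf.predict(X_valid)


if __name__ == "__main__":
    # Toy, made-up records purely to show the call sequence.
    train_titles = ["graph neural network", "query index database",
                    "deep graph learning", "database transaction index"]
    train_venues = ["KDD ICML", "VLDB SIGMOD", "KDD NIPS", "VLDB ICDE"]
    y_train = ["KDD", "VLDB", "KDD", "VLDB"]
    valid_titles = ["graph learning", "index query"]
    valid_venues = ["ICML", "SIGMOD"]

    X_tr, X_va = build_features(train_titles, valid_titles,
                                train_venues, valid_venues, max_features=5)
    print(X_tr.shape, train_and_predict(X_tr, y_train, X_va))
```

Passing only titles corresponds to the title-only setting in the results; also passing cited venues corresponds to the title + cited-venue setting.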

Results (on the validation dataset):

Features                      F1-micro   F1-macro   Accuracy
title only                    0.266      0.172      0.267
title + cited-paper venues    0.982      0.758      0.981
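The three metrics in the table can be computed with sklearn.metrics, as sketched below; `report` is a hypothetical helper, not part of this repository. Micro-averaging pools all predictions, so for single-label multiclass output it reduces to accuracy, which is why the F1-micro and Accuracy columns are nearly identical. Macro-averaging gives every venue equal weight, so poorly predicted rare venues pull F1-macro down, which is consistent with the gap between the two F1 columns.

```python
from sklearn.metrics import accuracy_score, f1_score


def report(y_true, y_pred):
    """Return the three metrics used in the results table."""
    return {
        # pools every prediction; equals accuracy for single-label multiclass
        "F1-micro": f1_score(y_true, y_pred, average="micro"),
        # unweighted mean of per-venue F1, so each venue counts equally
        "F1-macro": f1_score(y_true, y_pred, average="macro"),
        "Accuracy": accuracy_score(y_true, y_pred),
    }


if __name__ == "__main__":
    # Made-up labels: venue "A" is half-missed, so its F1 drops to 2/3.
    print(report(["A", "A", "B", "C"], ["A", "B", "B", "C"]))
```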
