PredStackOverflowTags

Classification of tags for StackOverflow questions using SVM-based machine learning models.

Classifiers employed

The scikit-learn implementations of the following classifiers were used:

• LinearSVC

• SVC with a linear kernel

• Stochastic Gradient Descent (SGD)
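As an illustration, here is a minimal sketch of how the three scikit-learn classifiers could be trained for multi-label tag prediction. It is not the setup used in the notebooks. The TF-IDF features, the one-vs-rest wrapping (one binary classifier per tag), the toy questions and tags, and the helper `predict_tags` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC, LinearSVC

# Hypothetical toy corpus standing in for question titles/bodies from Posts.xml.
questions = [
    "How do I merge two dictionaries in Python",
    "Python list comprehension with condition",
    "Pandas DataFrame groupby and sum columns",
    "Select rows from a pandas DataFrame by value",
    "JavaScript array map returns undefined",
    "How to debounce an event handler in JavaScript",
    "SQL join two tables on multiple columns",
    "SQL query to count distinct values",
]
tags = [
    ["python"], ["python"], ["python", "pandas"], ["python", "pandas"],
    ["javascript"], ["javascript"], ["sql"], ["sql"],
]

# Assumed feature extraction: TF-IDF over the question text.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(questions)

# Each question may carry several tags, so encode them as a binary indicator matrix.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# The three classifiers named above. SGDClassifier with hinge loss is a linear SVM
# trained by stochastic gradient descent.
models = {
    "LinearSVC": LinearSVC(),
    "SVC(kernel='linear')": SVC(kernel="linear"),
    "SGD": SGDClassifier(loss="hinge", random_state=0),
}

# Assumed multi-label strategy: one binary classifier per tag (one-vs-rest).
fitted = {name: OneVsRestClassifier(clf).fit(X, Y) for name, clf in models.items()}


def predict_tags(model, text):
    """Hypothetical helper: return the tuple of tags a fitted model predicts for text."""
    pred = model.predict(vectorizer.transform([text]))
    return mlb.inverse_transform(pred)[0]


if __name__ == "__main__":
    for name, model in fitted.items():
        print(name, predict_tags(model, "Filter a pandas DataFrame by column value"))
```

Wrapping each classifier in `OneVsRestClassifier` lets the same code compare all three models on a multi-label target. On the full dataset the toy lists would be replaced by text and tags streamed from Posts.xml.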

Dataset

All the data used is available as a Creative Commons data dump on the Internet Archive (https://archive.org/details/stackexchange). From this sanitized dump, I used the Posts and Tags data for stackoverflow.com. The full Posts dataset is a 56 GB XML file containing about 37M questions, labelled with a set of roughly 45K tags.
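A 56 GB file cannot be loaded into memory at once, so a streaming parser is a natural way to read Posts.xml. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`. It assumes the public dump's schema, where each post is a `<row>` element with attributes such as `Id`, `PostTypeId` (1 for questions), `Title`, `Body`, and `Tags`. The function name `iter_questions` and the inline sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_questions(source):
    """Stream (Id, Title, Body, Tags) tuples for question rows of a Posts.xml file.

    `source` may be a filename or a binary file object. Assumes the dump's schema:
    one <row> element per post, with PostTypeId="1" marking questions.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <posts> root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            if elem.get("PostTypeId") == "1":
                yield (elem.get("Id"), elem.get("Title", ""),
                       elem.get("Body", ""), elem.get("Tags", ""))
            # Discard finished rows so memory stays roughly constant on a huge file.
            root.clear()


# Hypothetical sample modeled on the dump's row format (attribute values are XML-escaped).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="How to parse XML in Python?" Body="&lt;p&gt;...&lt;/p&gt;" Tags="&lt;python&gt;&lt;xml&gt;" />
  <row Id="2" PostTypeId="2" ParentId="1" Body="&lt;p&gt;Use ElementTree.&lt;/p&gt;" />
  <row Id="3" PostTypeId="1" Title="SQL join syntax" Body="&lt;p&gt;?&lt;/p&gt;" Tags="&lt;sql&gt;" />
</posts>"""

rows = list(iter_questions(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

On the real data, `iter_questions("Posts.xml")` would read the file directly. The answer row (`PostTypeId="2"`) is skipped, and the escaped attribute values come back unescaped (for example `<python><xml>`).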

Posts.xml, obtained from the dump, is the master collection of all the questions; its format is illustrated in "sample.xml".
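The `Tags` attribute of each question row packs all of its tags into a single string, so it has to be split before it can serve as a label set. The helper below is hypothetical. It assumes the `<python><pandas>` form used in the Stack Exchange dump. Its pipe-delimited branch (`|python|pandas|`) covers a form that some more recent dumps appear to use, so check it against "sample.xml" before relying on it.

```python
import re


def parse_tags(tags_attr):
    """Split a Posts.xml Tags attribute into a list of tag names.

    Assumed formats: '<python><pandas>' (angle-bracket form) or
    '|python|pandas|' (pipe-delimited form seen in some newer dumps).
    """
    if not tags_attr:
        return []  # answers and untagged rows have no Tags attribute
    if tags_attr.startswith("<"):
        return re.findall(r"<([^>]+)>", tags_attr)
    return [t for t in tags_attr.split("|") if t]


print(parse_tags("<python><pandas>"))
print(parse_tags("|c#|.net|"))
```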

upmangaurav/PredStackOverflowTags