A general set of tools for text classification, ranking, feature extraction, and prediction
##Introduction/Intention
The goal of this tool is to make document classification easier by providing a simple, high-level interface to a number of existing tools, and to serve as a place where novel algorithms can find their way to users.
##Dependencies
Install the following dependencies:

- nltk
- textblob
- networkx
- scikit-learn

They can all be installed at once with:

    sudo pip install -U -r requirements.txt

Then download the NLTK corpora:

    import nltk
    nltk.download()
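`nltk.download()` opens an interactive downloader. For a scripted setup, individual resources can be fetched by name instead; exactly which corpora this package needs is an assumption here, the ones below are common requirements for tokenization and tagging:

    import nltk

    # Fetch specific resources without the interactive GUI.
    # The exact set of corpora required by text_classify is an assumption;
    # punkt (tokenization), brown (tagged text), and stopwords are typical
    # needs for the NLTK/TextBlob stack.
    for resource in ("punkt", "brown", "stopwords"):
        nltk.download(resource)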
##Installation
To install, simply run:

    sudo python setup.py install
This will install the package.
##Some simple examples
###Naive Bayes Classification
    from text_classify.algorithms import naive_bayes

    # Training data is a list of (text, label) pairs
    training = [("hello there", "greeting"), ("later", "goodbye")]
    cl = naive_bayes(training)

    test = "Hello there friends"
    cl.classify(test)  # returns "greeting"
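The same interface scales to a larger training set. Here is a minimal sketch that only uses the `naive_bayes` and `classify` calls shown above; the training sentences themselves are made up for illustration:

    from text_classify.algorithms import naive_bayes

    # A slightly larger toy training set; the sentences are illustrative only.
    training = [
        ("hello there", "greeting"),
        ("hi, how are you", "greeting"),
        ("good morning", "greeting"),
        ("later", "goodbye"),
        ("see you tomorrow", "goodbye"),
        ("bye for now", "goodbye"),
    ]
    cl = naive_bayes(training)

    # Classify a batch of unseen sentences.
    for sentence in ("hello friends", "catch you later"):
        print("{0} -> {1}".format(sentence, cl.classify(sentence)))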
###Support Vector Machines
    from text_classify.algorithms import svm, preprocess

    # Training data is a list of (text, label) pairs
    training = [("hello there", "greeting"), ("later", "goodbye")]
    cl = svm(training)

    test = preprocess("Hello there friends")
    cl.classify(test)  # returns "greeting"
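In practice the training pairs usually come from a file rather than being written inline. The sketch below builds the same (text, label) list from a CSV with the standard library; the file name and its two-column layout are assumptions made for this example:

    import csv
    from text_classify.algorithms import svm, preprocess

    # Build the (text, label) training list from a CSV file.
    # "labeled_sentences.csv" and its two-column layout (text, label) are
    # assumptions for this sketch; adapt them to your own data.
    with open("labeled_sentences.csv") as f:
        training = [(row[0], row[1]) for row in csv.reader(f)]

    cl = svm(training)
    print(cl.classify(preprocess("Hello there friends")))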
###Decision Tree
    from text_classify.algorithms import decision_tree, preprocess

    # Training data is a list of (text, label) pairs
    training = [("hello there", "greeting"), ("later", "goodbye")]
    cl = decision_tree(training)

    test = preprocess("Hello there friends")
    cl.classify(test)  # returns "greeting"
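Since all three classifiers share the same constructor and `classify` interface, they can be swapped in and compared on the same data. This is a minimal sketch reusing the toy training set from the examples above:

    from text_classify.algorithms import naive_bayes, svm, decision_tree, preprocess

    training = [("hello there", "greeting"), ("later", "goodbye")]
    test = "Hello there friends"

    # The classifiers share a constructor/classify interface, so a single
    # loop can train and query each one in turn.
    for name, algorithm in [("naive bayes", naive_bayes),
                            ("svm", svm),
                            ("decision tree", decision_tree)]:
        cl = algorithm(training)
        # In the examples above, svm and decision_tree take preprocessed input,
        # while naive_bayes takes raw text.
        document = test if algorithm is naive_bayes else preprocess(test)
        print("{0}: {1}".format(name, cl.classify(document)))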
###Text Rank
    from text_classify import algorithms

    ranker = algorithms.textrank("hello there friends how are you")
    print(ranker.keyphrases)
    print(ranker.summary)
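Text Rank is more useful on longer documents. A minimal sketch, assuming the text lives in a local file (the file name is made up for this example):

    from text_classify import algorithms

    # Rank a longer document; "article.txt" is an assumption for this sketch.
    with open("article.txt") as f:
        text = f.read()

    ranker = algorithms.textrank(text)
    print(ranker.keyphrases)  # extracted keyphrases
    print(ranker.summary)     # extractive summary of the top-ranked sentences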
##Currently supported algorithms
- TF-IDF
- Cosine similarity (illustrated together with TF-IDF in the sketch after this list)
- SVM text classification
- Naive Bayes classification
- Text Rank
- Expectation-Maximization (EM) algorithm
- N-grams
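The package's own interface for TF-IDF and cosine similarity is not shown in the examples above, so the following is only a sketch of the underlying technique using scikit-learn, which is already a dependency; the toy documents are illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy corpus; the documents are illustrative only.
    documents = [
        "hello there friends how are you",
        "hello there",
        "see you later, goodbye",
    ]

    # Build a TF-IDF matrix (one row per document) and compare documents pairwise.
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(documents)
    similarities = cosine_similarity(tfidf)

    # similarities[i][j] is the cosine similarity between documents i and j.
    print(similarities)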
###TODOs
- Implement Deep Belief Networks
- Implement neural networks
- Create a high-level interface to send jobs to Spark and Hadoop