Scikit-learn compatible vectorizers built with the spaCy NLP framework.
This repo contains customized scikit-learn compatible classes and vectorizers inspired by CountVectorizer, but with more accurate tokenization and lemmatization functionality provided by the spaCy NLP framework. Simple Keras-like punctuation removal is also supported.
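As a rough illustration of the idea (not this repo's actual classes), the sketch below plugs a spaCy-backed tokenizer into a plain CountVectorizer. The names `SpacyLemmaTokenizer`, `build_vectorizer`, and `KERAS_LIKE_FILTERS` are hypothetical and introduced only for this example; the filter string is assumed to mirror the default character set that Keras text preprocessing strips.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: same characters Keras' text preprocessing filters by default
# (punctuation except the apostrophe, plus tab and newline).
KERAS_LIKE_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


class SpacyLemmaTokenizer:
    """Hypothetical helper: a callable usable as CountVectorizer(tokenizer=...)."""

    def __init__(self, nlp, filters=KERAS_LIKE_FILTERS):
        # `nlp` is any callable that returns tokens exposing `lemma_` and
        # `is_space`, such as a loaded spaCy pipeline.
        self.nlp = nlp
        # Replace each filtered character with a space, as Keras does.
        self.table = str.maketrans({ch: " " for ch in filters})

    def __call__(self, text):
        doc = self.nlp(text.translate(self.table))
        # Keep lowercased lemmas and drop whitespace-only tokens.
        return [tok.lemma_.lower() for tok in doc if not tok.is_space]


def build_vectorizer(nlp):
    # lowercase=False keeps the original casing for spaCy's analysis;
    # lowercasing happens on the lemmas inside the tokenizer instead.
    return CountVectorizer(
        tokenizer=SpacyLemmaTokenizer(nlp),
        lowercase=False,
        token_pattern=None,  # unused when a custom tokenizer is supplied
    )


# Usage (requires spaCy and an installed English model):
#   import spacy
#   vectorizer = build_vectorizer(spacy.load("en_core_web_sm"))
#   X = vectorizer.fit_transform(["The cats were running.", "A cat runs!"])
```

Passing the pipeline in as an argument keeps the tokenizer independent of any particular spaCy model, so a different language model can be swapped in without changing the vectorizer code.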
- Python 3.5.4
- scikit-learn 0.19.1
- spaCy 2.0.4
Please refer to the Usage Examples & Tests Jupyter notebook or here.