change BagOfWordsTransformer to CountTransformer #20
This PR renames the transformer for clarity. All three of the transformers we have right now are based on the "bag of words" concept: TF-IDF and BM25 apply additional weighting, but both are derived from the document-term matrix (DTM), which is just a count of each word in each document. The most basic form is therefore the raw DTM itself, which we can call the CountTransformer (the equivalent in sklearn is the CountVectorizer).
I think this is technically a breaking change, since we are renaming one of the models.
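To illustrate the relationship described above, here is a minimal pure-Python sketch, not this package's implementation. It builds a raw document-term matrix and then derives a weighted matrix from it. The function names `document_term_matrix` and `tfidf_from_dtm` are hypothetical, tokenization is a plain whitespace split, and the weighting uses the simple unsmoothed form `count * log(N / df)`, which is an assumption and may differ from the variant the transformers here use.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term vocab[j] in docs[i]."""
    # Naive tokenization (lowercase + whitespace split); an assumption for this sketch.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tfidf_from_dtm(matrix):
    """Weight raw counts by log(N / df); the only input is the count matrix."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]


docs = ["the cat sat", "the cat ate the fish"]
vocab, dtm = document_term_matrix(docs)   # raw counts: the "count" transform
weighted = tfidf_from_dtm(dtm)            # weighting layered on top of the counts
```

The weighted matrix is computed from `dtm` alone, which is the sense in which the count-based transform is the base case of the other two.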