Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change BagOfWordsTransformer to CountTransformer #20

Merged
merged 1 commit into from
Feb 7, 2022

Conversation

pazzo83
Copy link
Collaborator

@pazzo83 pazzo83 commented Feb 7, 2022

Changing the name of this transformer for more clarity. Essentially, all three of the transformers we have right now are based on the "bag of words" concept (TF-IDF and BM25 do additional weighting, but they are derived from the document-term matrix - DTM - which is just a count of each word in each document). Thus, one of the more basic forms of this is just the raw DTM which we can call the CountTransformer (in sklearn this is the CountVectorizer).

I think this would technically be a breaking change since we are changing the names of one of the models.

@pazzo83 pazzo83 requested review from ablaom and storopoli February 7, 2022 17:14
Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pazzo83 Thanks for your continued support of this package.

This change makes sense. I agree it should be marked as breaking.

Could you please open an issue at MLJModels to update the registry after this is merged?
And an issue at MLJ to update the "list of models" in docs, would be good too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants