Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example for preprocessing.dictmapper.DictMapper and meta.outlier_classifier.OutlierClassifier #646

Merged
merged 7 commits into from
Mar 29, 2024

Conversation

anopsy
Copy link
Contributor

@anopsy anopsy commented Mar 24, 2024

Example added to preprocessing.dictmapper.DictMapper is the correct pull request.

Hi folks,
I did today the example for DictMapper, this one felt a bit tricky.

Since the DictMapper uses a dictionary to map the values to int's I tried to think of a useful example, that will not mimic the LabelEncoder, but will input meaningful values (population of a city /ranking of a university) if I went to far and should go back to a more abstract example, let me know.

What caught my attention is that if you want to cover multiple columns in a df you need to create a dict that includes all of the values just as I did in my example. Is my intuition that this seems a bit clumsy correct?

I also had my doubts about what's the proper style to space between comments in the dict, this version got all the green checks from Codespace tools, so I went for that. Btw this time I checked how the docstrings rendered via mkdocs on the docs site ;)

I'm very curious what was the purpose of the DictMapper. :D

Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Example of how to use DictMapper added to docstrings

Partially Fixes #596

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines (ruff)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning.

@FBruzzesi
Copy link
Collaborator

FBruzzesi commented Mar 24, 2024

Hey @anopsy, thanks for the PR.

One suggestion I have (and I should add it in the issue referencing these examples) is to combine the examples as a unique one in the top level docstring for the class. In this way users will see the following sections: Parameters, Attributes, Example, and then the list of methods rather than single examples in each method.

Example in documentation and its code.

What caught my attention is that if you want to cover multiple columns in a df you need to create a dict that includes all of the values just as I did in my example. Is my intuition that this seems a bit clumsy correct?

This feels buggy as different columns could have share values that one could want to map to different values. It may be useful to either:

  • report it as an issue and think if there is a way to fix it
  • at least document how to work around it (e.g. using sklego ColumnSelector/skrub SelectCols in combination of scikit-learn FeatureUnion)

@anopsy
Copy link
Contributor Author

anopsy commented Mar 24, 2024

Thank you @FBruzzesi,
sure I will combine the examples into one, next time.
I also thought that maybe the author ment it to be used with SelectCols, if that's the case, I can change the example to a single column df or combine with SelectCols if it can be used in a pipeline.

@anopsy
Copy link
Contributor Author

anopsy commented Mar 25, 2024

Hi, I have a question concerning committing from the Codespace. I opened Codespace from my fork, I run git pull origin examples (name of my branch), then added and committed the changes and then I run git push origin examples. The problem is I can't see the Compare&Pull Request button anywhere. I'm completely new to Codespace, tried to google that but couldn't find any specific answer. Should I be opening the Codespaces from scikit-lego original fork for the git push origin to work? I hope it's okay to ask this here. If not let me know and sorry for the inconvenience

@koaning
Copy link
Owner

koaning commented Mar 25, 2024

You probably pushed to your own repository on Github. If you look at your own branches do you see the branch that you've just worked on?

@FBruzzesi
Copy link
Collaborator

FBruzzesi commented Mar 25, 2024

Hey there! I never worked with codespace's but in general once a PR is open, every pushed commit will automatically appear in the PR without the need to Compare & pull request.

If the commits in the screenshots are what you refer to, then you can see them in the PR changes already.
image

@anopsy
Copy link
Contributor Author

anopsy commented Mar 25, 2024

You probably pushed to your own repository on Github. If you look at your own branches do you see the branch that you've just worked on?

Yes and it says: This branch is 5 commits ahead of koaning/scikit-lego:main.

once a PR is open, every pushed commit will automatically appear in the PR without the need to Compare & pull request.

That's convenient to know, thank you Francesco!

If the commits in the screenshots are what you refer to, then you can see them in the PR changes already. image

Yes those are mine! Cool, so it worked after all, thank you folks!

@@ -52,6 +52,33 @@ def fit(self, X, y=None):
ValueError
- If the underlying model is not an outlier detection model.
- If the underlying model does not have a `decision_function` method.

Example
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anopsy, I have the same feedback as for DictMapper: if you could move the example up in the docstring I think it would be easier and faster for folks to find when scrolling through the api documentation without the need to step down into the .fit(..) method.

I think this example is ready to merge after that change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do that!

@@ -43,6 +43,41 @@ def fit(self, X, y=None):
-------
self : DictMapper
The fitted transformer.

Example
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you manage to add how to make it interact with either sklego.preprocessing.ColumnSelector or sklearn.composeColumnTransformer I believe it would be of great help and ready to merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this double dict was making me really uncomfortable 😅 I'm on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the fixes, but I'm unable to push them. I need to figure out what's going on and be back 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also confused. However, I finally figured out what was happening and was able to push this time. The problem was that one of the files, sklego.model_selection.py, which I wasn’t even working on, failed to pass the ruff-format and was reformatted by ruff. Since I hadn’t worked on it, I decided to revert that change (git restore) and attempted to commit only the two files I had been working on. But after pushing, I received the message “Everything-up-to-date” and didn’t see any changes on my branch. Today, I decided to accept the formatting changes made by ruff, and I was finally able to push. I hope the formatter didn’t break anything in model_selection.py. What’s the proper way to handle this kind of problem in the future?

Copy link
Collaborator

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🎉 Thanks for the changes!

@FBruzzesi FBruzzesi changed the title Example of preprocessing.dictmapper.DictMapper added Example for preprocessing.dictmapper.DictMapper and meta.outlier_classifier.OutlierClassifier Mar 29, 2024
@FBruzzesi FBruzzesi merged commit 14ef241 into koaning:main Mar 29, 2024
17 checks passed
@FBruzzesi FBruzzesi mentioned this pull request Apr 8, 2024
33 tasks
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOCS] Example usage in docstring
3 participants