Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] bring pandas support to the sklearn wrapper back #904

Merged
merged 3 commits into from
Sep 19, 2017
Merged

[python] bring pandas support to the sklearn wrapper back #904

merged 3 commits into from
Sep 19, 2017

Conversation

StrikerRUS
Copy link
Collaborator

My fault is that in recent PR with sklearn compatibility I forgot about pandas support. This PR is expected to bring pandas support back to the sklearn wrapper.
Fix #901 .

Since scikit-learn has poor pandas support, I think that the best thing to do is just pass pandas.DataFrame to Booster as is.

@msftclas
Copy link

@StrikerRUS,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by Microsoft. We will now review your pull request.
Thanks,
Microsoft Pull Request Bot

@wxchan
Copy link
Contributor

wxchan commented Sep 11, 2017

aren't there too many tests now? @guolinke

@guolinke
Copy link
Collaborator

@wxchan @StrikerRUS
I think the python version 2.7 3.4 3.5 of OXS test can be removed.
For the appveyor, can we only keep one vs and one mingw ?

@wxchan
Copy link
Contributor

wxchan commented Sep 11, 2017

@guolinke can we disable if-else test on osx?

@guolinke
Copy link
Collaborator

@wxchan yeah

@StrikerRUS
Copy link
Collaborator Author

StrikerRUS commented Sep 11, 2017

@guolinke @wxchan The current total runtime of Appveyor and Travis is approximately equal (if there are no issues on the Travis side :) ) that I think is reasonable. In addition, Travis jobs run in parallel mode (practically all osx-jobs starts at the same time), so it will be no significant reduction of the runtime.
Also I want to remind that the bug of overriding parameters was found with the help of random test (OSX Python 3.5 which you want to remove).

In my opinion, if-else case on OSX could be disabled but all other tests are important.
@guolinke Why do you want to remove them? It the runtime inadmissible?

@guolinke
Copy link
Collaborator

@StrikerRUS
The reason is the osx test are slow and easily time-out.

@StrikerRUS
Copy link
Collaborator Author

@guolinke I believe that it was a sporadic case of Travis OSX freezes.

np.testing.assert_almost_equal(pred0, pred1)
np.testing.assert_almost_equal(pred0, pred2)
np.testing.assert_almost_equal(pred0, pred3)
np.testing.assert_almost_equal(pred0, pred4)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wxchan I can't find out what's wrong with the last test. Can you please help me?

Copy link
Contributor

@henry0312 henry0312 Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is because gbm4 isn't regarded as Classifier or pred4 are probabilities, not labels of class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry0312 Of course, it's probabilities! How could I be so inattentive?..
Thank you very much!

@henry0312
Copy link
Contributor

henry0312 commented Sep 12, 2017

I think it's better to run tests with Python 2.7, 3.4, 3.5 and 3.6 and we should run the regular test with Python 2.7.x and 3.6.x at least. > #904 (comment), #904 (comment)

Additionaly, as @StrikerRUS pointed out, even if we remove some tests of OSX, the total time of tests won't change.

@StrikerRUS StrikerRUS changed the title [WIP] [python] bring pandas support to the sklearn wrapper back [python] bring pandas support to the sklearn wrapper back Sep 12, 2017
@henry0312
Copy link
Contributor

@guolinke please run travis ci manually. there was something wrong.

@guolinke
Copy link
Collaborator

any updates of this ? ready to merge ?

Copy link
Contributor

@henry0312 henry0312 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@guolinke guolinke merged commit 0350a9a into microsoft:master Sep 19, 2017
@StrikerRUS StrikerRUS deleted the cat branch September 22, 2017 18:56
guolinke pushed a commit that referenced this pull request Oct 9, 2017
* added test for sklearn handle categorical features

* use raw X, y in sklearn wrapper in case of pandas.DataFrame

* fixed probs
@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sklearn categorical handling
5 participants