-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python package]: suggestion: lgb.Booster.predict() should check that the input X data makes sense #812
Comments
@j-mark-hou sure, very happy to see this feature. |
@j-mark-hou any updates ? |
I think maybe this should be rolled into a more systematic rewrite of the pandas api. There's some current things about the implementation that I don't quite understand, and unfortunately I don't currently have the time to dig deeper into this. Sorry, I'll let you know if I revisit this at some point in the future. |
Currently, it leads to some kind of inconsistency:
Imho, check on feature count is really important, and this lack of assertion may lead to harsh, not so easy to debug issues. |
Closed in favor of being in #2302. We decided to keep all feature requests in one place. Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature. |
I may be interested helping improve this. What would be the ideal behavior here? Does LightGBM make predictions only based on column order (like sklearn), or based on column names? |
The shape of data for prediction is now checked at cpp side, thanks to #2464. The next steps may be to check the type and order of features, and their names in case of pandas DataFrame. |
Reopening because I'm working on this. |
…812) (#4909) * check feature names and order in predict with dataframe * slice df in predict to remove the target * scramble features * handle int column names * only change column order when needed * include validate_features param in booster and sklearn estimators * document validate_features argument * use all_close in preds checks and check for assertion error to compare different arrays * perform remapping and checks in cpp * remove extra logs * fixes * revert cpp * proposal * remove extra arg * lint * restore _data_from_pandas arguments * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * move data conversion to Predictor.predict * use Vector2Ptr Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
This issue has been automatically locked since there has not been any recent activity since it was closed. |
In particular, I'm thinking about these things:
if these things sound reasonable I'd be happy to add these checks to the lgb.Booster.predict() function prior to calling the lgb._InnerPredictor.predict()
The text was updated successfully, but these errors were encountered: