Tutorial for adding custom loss function in C++ #114
Hi, we know about this issue: it happens because we calculate the derivatives not once per iteration, but many more times. For now you can try to do it on your own. You need to add a function to catboost/libs/algo/error_functions.h, add this function's name to the enumeration of loss functions, and check carefully that the correct enum value is passed everywhere.
@annaveronika thank you for your reply. That is a shame. Hopefully it can be resolved soon with a speed-up for the Python classes and functions. I have no C++ knowledge, so I will not be able to rewrite this in C++ from scratch. I will look forward to the C++ tutorial, as I might be able to adapt it to my needs. Is it worth rewriting the above in Cython? Would that help with a speed-up? What about Numba? Finally, it seems from what you are saying that the model should run, just very slowly. I will run this on my full dataset and let it go; hopefully it completes. I will update with a time for the ~200,000 points.
Hi @annaveronika,
@Ubikas You could add it to the C++ code and send it as a pull request.
Can someone please explain what the der1 and der2 variables signify in the above code, as well as in the usage example in the official docs? Are they the first and second derivatives? With respect to what?
@rupak-118 They are derivatives of the loss function w.r.t. the model's predictions. MSE is a good example:
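To make der1 and der2 concrete, here is a sketch of an MSE-style custom objective in CatBoost's Python `calc_ders_range` form (this is my own illustration in the spirit of the docs example, not a verbatim quote of it; CatBoost maximizes the objective, so the derivatives below are those of the negative loss):

```python
class MseObjective(object):
    """Sketch of a CatBoost-style custom objective for MSE.

    der1 and der2 are the first and second derivatives of the
    objective (the negative loss) w.r.t. the model's prediction.
    """

    def calc_ders_range(self, approxes, targets, weights):
        result = []
        for i in range(len(targets)):
            # loss = 0.5 * (target - approx)^2, objective = -loss
            der1 = targets[i] - approxes[i]  # d(objective)/d(approx)
            der2 = -1.0                      # d^2(objective)/d(approx)^2
            if weights is not None:
                der1 *= weights[i]
                der2 *= weights[i]
            result.append((der1, der2))
        return result
```

An instance of such a class can then be passed as `loss_function` to a CatBoost model, the same way the docs' usage example does.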
MSE is a nice example. My issue, and the main topic of this post, is calculating MAE, and more generally calculating a custom loss function quickly enough to be useful.
@JoshuaC3 Thanks. I was hoping to build one for recall, but it seems it will be too slow. I'll try to explore the C++ route mentioned by @annaveronika, though I'm not sure I'll be able to bring it to fruition.
@rupak-118 do let us know if you get this to work. |
We have added a C++ custom error function tutorial.
Hi, I am implementing the Fair loss function to approximate MAE. However, it runs incredibly slowly. Is this a known issue, or is my implementation of the function below just a bad one?
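For reference, a Fair-loss custom objective in CatBoost's Python form can be sketched as follows (my own sketch, not necessarily identical to the implementation discussed here; the class name and smoothing constant `c` are my choices, and the sign convention follows CatBoost's maximization of the objective):

```python
class FairLossObjective(object):
    """Fair loss, a smooth approximation to MAE:

        L(r) = c^2 * (|r|/c - log(1 + |r|/c)),  r = target - approx

    CatBoost maximizes the objective, so der1/der2 are the first and
    second derivatives of -L w.r.t. the prediction.
    """

    def __init__(self, c=1.0):
        self.c = c

    def calc_ders_range(self, approxes, targets, weights):
        result = []
        for i in range(len(targets)):
            r = targets[i] - approxes[i]
            denom = 1.0 + abs(r) / self.c
            der1 = r / denom            # -> r as r -> 0, bounded like MAE far out
            der2 = -1.0 / denom ** 2    # always negative, as for a concave objective
            if weights is not None:
                der1 *= weights[i]
                der2 *= weights[i]
            result.append((der1, der2))
        return result
```

This would then be passed as, e.g., `CatBoostRegressor(loss_function=FairLossObjective(c=2.0), eval_metric='MAE')`. Note that a per-point Python loop like this is exactly what makes custom Python objectives slow, since CatBoost calls it many times per iteration.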
Training on 10, 100, and 1,000 samples took around 5x-10x as long as with a built-in loss. For my entire dataset of 200,000 rows it took too long and threw a Python/Jupyter notebook warning.
FYI: the built-in RMSE objective gives a decent score of 183 MAE on my application. The built-in MAE objective gave around 360 MAE (hence trying my own MAE approximation). The custom Fair loss on 1,000 samples (0.5% of the dataset) gave 280 MAE, so a good improvement is expected if it can be run on all 200,000 points.
Benchmarking: XGBoost with a custom Fair loss gave 179 MAE with little change in speed (190 MAE with the RMSE objective).
LightGBM with its built-in Fair loss gave 187 MAE, versus 192 MAE with the RMSE objective.
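For comparison, the XGBoost version of a custom Fair objective can be sketched like this (my own sketch; `c=2.0` is an arbitrary choice, and `fair_obj` follows XGBoost's `obj(preds, dtrain)` callback signature, where the gradient and Hessian are those of the loss being minimized). Because it is vectorized with NumPy rather than looping per point in Python, it adds little overhead:

```python
import numpy as np

def fair_grad_hess(preds, labels, c=2.0):
    """Gradient and Hessian of the Fair loss
    L(x) = c^2 * (|x|/c - log(1 + |x|/c)), x = pred - label."""
    x = preds - labels
    denom = np.abs(x) + c
    grad = c * x / denom          # bounded by c, like a smoothed sign(x)
    hess = c ** 2 / denom ** 2    # positive, shrinks for large residuals
    return grad, hess

def fair_obj(preds, dtrain):
    # Adapter matching xgb.train(..., obj=fair_obj)
    return fair_grad_hess(preds, dtrain.get_label())
```

The same `grad`/`hess` formulas should also work with LightGBM's `fobj` callback, which uses the same minimization convention.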