The goal of this task is to add another variant to the inferencing benchmark for LightGBM. We are already comparing lightgbm python, lightgbm C, and treelite. We'd like to try onnxruntime, as it seems to be applicable. In particular, we'd like to reproduce the results in this post on hummingbird and onnxruntime for classical ML models.

Feel free to reach out to the authors of the blog post for collaboration.
The expected impact of this task:

- increase the value of the benchmark for the lightgbm community, in particular for production scenarios
- identify better production inferencing technologies
⚠️ It is unknown at this point whether hummingbird allows the conversion of lightgbm>=v3 models to ONNX. If that turns out to be impossible, it is still a good thing to know, and worth reporting in the hummingbird issues.
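A minimal sketch of what the conversion attempt could look like, assuming hummingbird's `convert` API accepts a trained LightGBM model and an `"onnx"` backend; the data, model, and output location below are hypothetical:

```python
import numpy as np
from lightgbm import LGBMRegressor
from hummingbird.ml import convert

# Train a tiny model on synthetic data, for illustration only.
X = np.random.rand(100, 10).astype(np.float32)
y = np.random.rand(100).astype(np.float32)
model = LGBMRegressor(n_estimators=10).fit(X, y)

# Convert to ONNX; hummingbird uses the test input to infer shapes.
onnx_model = convert(model, "onnx", X)
onnx_model.save("lightgbm_onnx")  # hypothetical output location
```

If this fails for lightgbm>=v3 models, that is exactly the finding to report upstream.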
## Learning Goals

By working on this project you'll be able to learn:

- how to use onnxruntime for classical ML models
- how to compare inferencing technologies in a benchmark
- how to write components and pipelines for AzureML (component sdk + shrike)
## Expected Deliverable

To complete this task, you need to deliver:

- two working python scripts: one to convert lightgbm models into ONNX (using hummingbird?) and one to use onnxruntime for inferencing (see the sketch after this list)
- their corresponding working AzureML components
- a successful run of the lightgbm inferencing benchmark pipeline
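As a starting point, here is a minimal sketch of what the onnxruntime inferencing script could do at its core, assuming the converted model was saved as a plain `.onnx` file; the path and feature count are hypothetical:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical path: wherever the conversion script saved the ONNX model.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Build a batch matching the model's expected feature count (here: 10).
input_name = session.get_inputs()[0].name
X = np.random.rand(1000, 10).astype(np.float32)

# Run inference; outputs come back as a list of numpy arrays.
predictions = session.run(None, {input_name: X})
print(predictions[0][:5])
```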
## Instructions

### Prepare for coding

Clone this repo, then create your own branch `username/onnxruntime` (or something similar) for your own work (commit often!).

In `src/scripts/model_transformation`, create a folder `lightgbm_to_onnx/` and copy the content of `src/scripts/samples/` into it.
### Local development

Let's start locally first. To iterate on your python script, you need to consider a couple of constraints:
- Follow the instructions in the sample script to modify it and make it your own.
- Please consider using inputs and outputs that are provided as directories, not single files. There's a helper function to let you automatically select the unique file contained in a directory (see the function `input_file_path` in `src/common/io.py`, and the sketch after this list).
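As an illustration, here's a minimal sketch of how a script could consume directory inputs with that helper. The helper name comes from the repo, but its exact signature is an assumption, and all paths and argument names are hypothetical:

```python
import argparse

# Helper from src/common/io.py; assumed to return the path of the
# unique file contained in a given directory.
from common.io import input_file_path

parser = argparse.ArgumentParser()
# AzureML provides inputs/outputs as mounted directories, not files.
parser.add_argument("--model", type=str, help="directory containing the model file")
parser.add_argument("--output", type=str, help="directory to write predictions into")
args = parser.parse_args()

# Resolve the single model file inside the input directory.
model_path = input_file_path(args.model)
```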
Here's a couple of links to get you started: feel free to check out the current treelite modules (`model_conversion/treelite_compile` and `inferencing/treelite_python`); they have a similar behavior. You can also implement some unit tests, starting from `tests/scripts/test_treelite_python.py`.
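For example, a self-contained unit test for the conversion logic could check that ONNX predictions match the original model. This is only a sketch, and it assumes the hummingbird usage shown earlier:

```python
import numpy as np
from lightgbm import LGBMRegressor
from hummingbird.ml import convert


def test_onnx_predictions_match_lightgbm():
    # Train a tiny model on synthetic data.
    X = np.random.rand(200, 10).astype(np.float32)
    y = np.random.rand(200).astype(np.float32)
    model = LGBMRegressor(n_estimators=5).fit(X, y)

    # Convert to ONNX and compare predictions between the two backends.
    onnx_model = convert(model, "onnx", X)
    np.testing.assert_allclose(
        model.predict(X), onnx_model.predict(X), rtol=1e-4, atol=1e-4
    )
```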
### Develop for AzureML

#### Component specification

First, unit tests. Edit `tests/aml/test_components.py` and look for the list of components. Add the relative path to your component spec to this list.
You can test your component by running the component unit tests (e.g. `python -m pytest tests/aml/test_components.py`, assuming the repo's tests run under pytest).
Edit the file `spec.yaml` in the directory of your component (copied from the sample) and align its arguments with the expected arguments of your script until you pass the unit tests.
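For reference, a component spec typically looks like the sketch below. Copy the real schema and boilerplate from the sample component; every name, path, and argument here is hypothetical:

```yaml
# Hypothetical sketch of a command component spec; take the actual
# schema and boilerplate from src/scripts/samples/spec.yaml.
name: lightgbm_to_onnx
version: 0.0.1
display_name: Convert LightGBM model to ONNX
type: CommandComponent

inputs:
  model:
    type: path
    description: directory containing the lightgbm model file

outputs:
  output:
    type: path
    description: directory where the ONNX model is written

command: >-
  python convert.py
  --model {inputs.model}
  --output {outputs.output}
```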
#### Integration in the inferencing pipeline

WORK IN PROGRESS