This repository has been archived by the owner on Apr 8, 2024. It is now read-only.
The goal of this task is to reproduce the results reported in the paper LightGBM: A Highly Efficient Gradient Boosting
Decision Tree, and to test LightGBM on a publicly available, well-known benchmark dataset to ensure our benchmark is reproducible. We already have a generic training script for LightGBM, so this task consists of writing a pre-processor for this particular dataset, identifying the right parameters for running LightGBM on it, and running the whole thing in AzureML.
The expected impact of this task is to:
establish trust in our benchmark by obtaining comparable results with existing reference benchmarks
increase value of this benchmark for the community by providing reproducible results on standard data
Learning Goals
By working on this project you'll be able to learn:
how to write components and pipelines for AzureML (component sdk + shrike)
how to use lightgbm in practice on a sample dataset
how to use mlflow and AzureML run history to report metrics
Expected Deliverable:
To complete this task, you need to deliver:
a working Python script to parse the original LETOR dataset and feed it into LightGBM
a working AzureML component
[stretch] a working pipeline with pre-processing and training, reporting training metrics
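As a starting point for the parsing deliverable, here is a hypothetical sketch of a parser for LETOR's svmlight-style ranking lines (`<label> qid:<id> <feat>:<value> ... # comment`). The feature count and the `#` comment handling are assumptions; check them against the actual LETOR release you download.

```python
# Hypothetical LETOR parser sketch; verify feature count and line format
# against the actual LETOR release.
import numpy as np

def parse_letor_line(line, num_features):
    """Parse one LETOR line into (label, qid, dense feature vector)."""
    # Some LETOR releases append "# docid = ..." comments; strip them.
    body = line.split("#", 1)[0].split()
    label = float(body[0])
    qid = body[1].split(":", 1)[1]
    features = np.zeros(num_features)
    for token in body[2:]:
        idx, val = token.split(":", 1)
        features[int(idx) - 1] = float(val)  # LETOR features are 1-indexed
    return label, qid, features

def parse_letor_file(path, num_features=46):  # 46 assumes LETOR 4.0
    labels, qids, rows = [], [], []
    with open(path) as f:
        for line in f:
            if line.strip():
                label, qid, feats = parse_letor_line(line, num_features)
                labels.append(label)
                qids.append(qid)
                rows.append(feats)
    return np.array(labels), qids, np.vstack(rows)
```

The `qid` list can then be turned into group sizes for LightGBM's ranking objectives.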
Instructions
Prepare for coding
Clone this repo, create your own branch username/letor (or something) for your own work (commit often!).
In src/scripts/, create a folder preprocess_letor/ and copy the contents of src/scripts/samples/ into it.
Download the LETOR dataset from the original source, unzip it if necessary and put it in a subfolder under data/ at the root of the repo (git ignored).
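The prep steps above can be sketched as the following shell commands, run from the repo root. The branch name and the data/letor subfolder name are just suggestions, and the download URL is intentionally left as a comment since it depends on the LETOR source you use.

```shell
# Sketch of the prep steps; run from the root of the cloned repo.
# Step 1: work on your own branch (name is a suggestion):
#   git checkout -b username/letor
mkdir -p src/scripts/preprocess_letor
cp -r src/scripts/samples/. src/scripts/preprocess_letor/ 2>/dev/null || true
# Step 2: keep the dataset out of version control under data/ (git-ignored):
mkdir -p data/letor
# Download and unzip the LETOR dataset from its original source here, e.g.:
#   curl -L -o data/letor/letor.zip "<LETOR source URL>"
#   unzip data/letor/letor.zip -d data/letor
```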
Local development
Let's start locally first...
WORK IN PROGRESS
Develop for AzureML
WORK IN PROGRESS