This app creates and fits an XGBoost gradient-boosted tree regression model on Parquet-formatted input data. The program takes the following arguments:
train
: (str, required) local path or URI of a parquet file containing training data
test
: (str, required) local path or URI of a parquet file containing test data
n-trees
: (int) number of trees for the regressor; default 100
m-depth
: (int) maximum depth of the regressor's trees; default 10
learning-rate
: (float) learning rate of the model, between 0.0 and 1.0; default 0.2
loss
: (str) name of the loss function to use; default "rmse"
label-col
: (str, required) name of the label column in the dataset
feat-cols
: (str) names of the dataset columns to use as features, given as one string with names delimited by commas. If no argument is provided, all columns except the label column are treated as feature columns.
This app currently assumes that the input data is all numerical.
To run the app with default parameters, execute the following command from the root directory:
mlflow run apps/gbt-regression -P train="insert/data/path/" -P test="insert/data/path/" -P label-col="insert.label.col"
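Other parameters are overridden the same way, with one `-P name=value` pair per parameter. As a hypothetical convenience (the `build_mlflow_command` helper below is not part of the app), the command can be assembled in Python. Keyword names use underscores and are converted to the hyphenated parameter names listed above. The resulting list could be passed to `subprocess.run`, but this sketch only builds and prints it.

```python
import shlex


def build_mlflow_command(train, test, label_col, project="apps/gbt-regression", **overrides):
    # Hypothetical helper: required parameters first, then any overrides.
    # Python identifiers cannot contain hyphens, so n_trees becomes n-trees, etc.
    params = {"train": train, "test": test, "label-col": label_col}
    params.update({key.replace("_", "-"): value for key, value in overrides.items()})
    cmd = ["mlflow", "run", project]
    for name, value in params.items():
        cmd += ["-P", f"{name}={value}"]  # one -P flag per MLflow parameter
    return cmd


if __name__ == "__main__":
    cmd = build_mlflow_command(
        "data/train.parquet", "data/test.parquet", "price",
        n_trees=200, m_depth=6, feat_cols="sqft,rooms",
    )
    # shlex.join quotes any argument containing shell-unsafe characters.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths or column names contain spaces.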