Sparse Matrix Support #10
Comments
From my understanding, when the mlr3 data.table-based backend is used, your code applies some preprocessing steps to transform the data.table into a data.frame, then into a numerical format usable by lightgbm, and finally into a matrix. If the user uses mlr3 with a DataBackendMatrix, this matrix could be passed directly to the lgb.Dataset function without as.matrix; the sparsity would then be preserved in the canonical mlr3 way.
I'm not sure if it is worth setting
I am currently working on it! @statist-bhfz, good idea to move everything to backend_preprocessing; however, I need to figure out how best to do it, since we currently seem to need the "as.matrix" function for passing data.tables to lgb.Dataset.
@kapsner |
Indeed, that's correct. The problem is that this
data = task$data(
  cols = task$feature_names,
  data_format = "Matrix"
)
does not work with data.table backends (there is no internal transformation), and I need to figure out a different solution for allowing both data backends.
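One way to allow both kinds of input could be to branch on the class of the object before handing it to lgb.Dataset. The following is only a minimal sketch under stated assumptions: prepare_features is a hypothetical helper (it is not part of mlr3 or of this package), it deliberately avoids the mlr3 API, and it assumes that all feature columns are already numeric.

```r
library(Matrix)

# Hypothetical helper (not part of mlr3 or lightgbm): returns a feature
# matrix that could be passed on to lightgbm::lgb.Dataset().
prepare_features <- function(data, feature_names) {
  if (inherits(data, "Matrix")) {
    # Matrix-package object (e.g. dgCMatrix): subset columns only,
    # so the sparse representation is kept.
    return(data[, feature_names, drop = FALSE])
  }
  # data.frame or data.table: dense conversion.
  # Assumption: the selected columns are already numeric.
  as.matrix(as.data.frame(data)[, feature_names, drop = FALSE])
}

# Toy inputs for illustration only.
dense_input <- data.frame(a = c(1, 0, 0), b = c(0, 2, 0), c = c(0, 0, 3))
sparse_input <- sparseMatrix(
  i = c(1, 2, 3, 4),
  j = c(1, 2, 3, 1),
  x = c(1, 2, 3, 4),
  dims = c(4, 3),
  dimnames = list(NULL, c("a", "b", "c"))
)

from_dense <- prepare_features(dense_input, c("a", "c"))
from_sparse <- prepare_features(sparse_input, c("a", "c"))

# Either result could then be used as the `data` argument, e.g.:
# lightgbm::lgb.Dataset(data = from_sparse, label = label, free_raw_data = FALSE)
```

With this kind of branching, as.matrix() is applied only to inputs that are dense anyway, while a matrix that is already sparse is passed through unchanged.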
I could be wrong, but it's necessary to specify matrix backend instead of
Possibly
Potential data loss is quite a serious contraindication!
This is how tidymodels seems to handle this issue, but to my understanding this would not be consistent with how mlr3 was designed. If the user has already prepared the data as a numeric matrix, the data loss should not occur. For factors, it would be the data.table backend anyway.
The current code transforms all data into a dense matrix with as.matrix():
private$dtrain = lightgbm::lgb.Dataset(
  # as.matrix() densifies the features, so any sparsity is lost here
  data = as.matrix(data[, task$feature_names, with = FALSE]),
  label = label,
  free_raw_data = FALSE
)
But both mlr3 and the lightgbm R package support sparse matrices:
https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.Dataset.html
and mlr3
https://mlr3.mlr-org.com/reference/DataBackendMatrix.html
It would be really great if sparse matrices (dgCMatrix) were supported, maybe via as(data, "sparseMatrix") or something similar.
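To illustrate what is at stake, here is a small sketch using only the Matrix package. The sizes and the random data are made up for illustration, and as(x, "CsparseMatrix") is used here as a close relative of the as(data, "sparseMatrix") idea above; for a non-square numeric matrix it should yield a dgCMatrix.

```r
library(Matrix)

set.seed(1)
n_rows <- 1000L
n_cols <- 50L
n_nonzero <- 500L

# Draw distinct cell positions so that no (row, column) pair repeats.
cells <- sample.int(n_rows * n_cols, n_nonzero)
i <- (cells - 1L) %% n_rows + 1L
j <- (cells - 1L) %/% n_rows + 1L

# Sparse feature matrix with 1% of its entries filled.
sparse_x <- sparseMatrix(
  i = i, j = j, x = runif(n_nonzero),
  dims = c(n_rows, n_cols)
)

# What as.matrix() does to it: every zero is stored explicitly.
dense_x <- as.matrix(sparse_x)

# Converting a dense numeric matrix back to a sparse one.
resparsified <- as(dense_x, "CsparseMatrix")

# Compare the memory footprint of the two representations.
print(object.size(sparse_x), units = "Kb")
print(object.size(dense_x), units = "Kb")
```

Converting back from dense to sparse is possible, but the dense intermediate still has to fit into memory first, which is why passing an existing sparse matrix straight through is preferable.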