Releases: google/yggdrasil-decision-forests
Python API 0.9.0
0.9.0 - 2024-12-02
Breaking
- Classification label classes are now consistently ordered lexicographically
(for string labels) or increasingly (for integer labels).
- Rename the misspelled partial_depepence_plot to partial_dependence_plot on
model.analyze().
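To illustrate the ordering rule, here is a minimal sketch in plain Python. The helper `ordered_label_classes` is hypothetical and is not part of the YDF API; it only shows what "lexicographic" and "increasing" mean for the two label types, and in particular that string labels made of digits do not sort numerically.

```python
def ordered_label_classes(labels):
    """Return the distinct label values in a deterministic order.

    Hypothetical helper for illustration only; not part of the YDF API.
    """
    distinct = set(labels)
    # Python compares str values lexicographically and int values numerically,
    # so sorted() reproduces both orderings described in the release note.
    return sorted(distinct)


string_classes = ordered_label_classes(["b", "a", "10", "2", "a"])
integer_classes = ordered_label_classes([10, 2, 2, 1])

print(string_classes)   # ['10', '2', 'a', 'b']  ("10" sorts before "2")
print(integer_classes)  # [1, 2, 10]
```

Code that relied on a previous class order (for example, when indexing the columns of predicted probabilities) may need to be checked against the new order.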
Feature
- Add support for Avro files for path / distributed training with the "avro:"
prefix.
- Add support for discretized numerical features for in-memory datasets.
- Expose MRR for ranking models.
- Add model.predict_class to generate the most likely predicted class of
classification models.
- Add support for automatic feature selection with the feature_selector
learner constructor argument. See the feature selection tutorial for more
details.
- Add standalone prediction evaluation ydf.evaluate_predictions().
- Add new hyperparameter sparse_oblique_max_num_projections.
- Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
- Emit proper errors when using lists for multi-dimensional features.
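For readers unfamiliar with MRR (mean reciprocal rank), the following is a minimal sketch of the textbook definition in plain Python. It is not YDF's implementation: the function name and the input layout are assumptions made for illustration, and details such as tie handling or rank truncation are not covered.

```python
def mean_reciprocal_rank(groups):
    """Textbook MRR; hypothetical helper, not the YDF implementation.

    groups: one list per query, each containing (score, is_relevant) pairs.
    """
    total = 0.0
    for items in groups:
        # Rank the items of the query by decreasing predicted score.
        ranked = sorted(items, key=lambda item: item[0], reverse=True)
        reciprocal = 0.0  # A query with no relevant item contributes 0.
        for rank, (_, is_relevant) in enumerate(ranked, start=1):
            if is_relevant:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(groups)


queries = [
    [(0.9, False), (0.8, True), (0.1, False)],  # first relevant item at rank 2
    [(0.7, True), (0.2, False)],                # first relevant item at rank 1
]
mrr = mean_reciprocal_rank(queries)
print(mrr)  # 0.75
```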
Fix
- Regression and Ranking CEPs scaling corrected.
Release music
The John B. Sails. Traditional
Python API 0.8.0
0.8.0 - 2024-09-23
Breaking
- Disallow positional parameters for the learners, except for label and task.
- Remove the unsupported / invalid hyperparameters from the Isolation Forest
learner.
- Remove parameters for distributed training and resuming training from
learners that do not support these capabilities.
- By default, model.analyze runs for a maximum of 20 seconds (i.e.
maximum_duration=20 by default).
- Convert boolean values in categorical sets to lowercase, matching the
treatment of categorical features.
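The first item above (no positional parameters except label and task) corresponds to Python's keyword-only arguments. The sketch below uses a hypothetical stand-in class, `SketchLearner`, with made-up hyperparameter defaults; it is not the real YDF learner and only shows how such a constructor behaves from the caller's side.

```python
class SketchLearner:
    """Hypothetical stand-in for a learner; not the real YDF class."""

    # Everything after the bare `*` can only be passed by keyword.
    def __init__(self, label, task="CLASSIFICATION", *, num_trees=300, max_depth=6):
        self.label = label
        self.task = task
        self.num_trees = num_trees
        self.max_depth = max_depth


# label and task may still be positional; hyperparameters must be named.
ok = SketchLearner("income", "REGRESSION", num_trees=50)

try:
    SketchLearner("income", "REGRESSION", 50)  # third positional argument
    rejected = False
except TypeError:
    rejected = True

print(rejected)  # True
```

Code written against earlier versions that passed hyperparameters positionally would need to name them explicitly.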
Feature
- Warn if training on a VerticalDataset and fail if attempting to modify the
columns in a VerticalDataset during training.
- User can override the model's task, label or group during evaluation.
- Add num_examples_per_tree() method to Isolation Forest models.
- Expose the slow engine for debugging predictions and evaluations with
use_slow_engine=True.
- Speed up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate
TF Record datasets usage.
- Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but
can still be used.
- Allow configuring the truncation of NDCG losses.
- Enable multi-threading when using model.predict and model.evaluate.
- Default number of threads of model.analyze is equal to the number of cores.
- Add multi-threaded results in model.benchmark.
- Add argument to control the maximum duration of model.analyze.
- Add support for Unicode strings, normalize categorical set values in the
same way as categorical values, and validate their types.
- Add support for distributed training for ranking gradient boosted tree
models.
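As background for the NDCG truncation item above, here is a minimal sketch of NDCG@k in plain Python. It uses the common exponential-gain formulation, which is an assumption and not necessarily the exact formula used by YDF; the function names are hypothetical, and the name of the YDF parameter that controls truncation is not shown here.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (hypothetical helper)."""
    return sum(
        (2.0 ** rel - 1.0) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances_in_predicted_order, k):
    """NDCG truncated at k: items ranked below position k are ignored."""
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0  # No relevant item in the query.
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


# The only relevant item was ranked third by the model.
ranking = [0, 0, 3]
ndcg_2 = ndcg_at_k(ranking, 2)  # The relevant item falls outside the top 2.
ndcg_3 = ndcg_at_k(ranking, 3)

print(ndcg_2)  # 0.0
print(ndcg_3)  # 0.5
```

The example shows why the truncation matters: the same ranking scores 0.0 or 0.5 depending only on k.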
Fix
- Fix labels of regression evaluation plots.
- Improved errors if Isolation Forest training fails.
Release music
Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)
v1.10.0
1.10.0 - 2024-08-21
Features
- Add support for Isolation Forests model.
- The default value of num_candidate_attributes in the CART learner is
changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
the generally accepted logic of CART.
- Added support for GCS for file I/O.
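The sketch below illustrates what the two special values of num_candidate_attributes mean in practice. It is a simplified illustration, not YDF code: the function name is hypothetical, and the "Random Forest style" rule used here (square root of the number of features for classification, one third for regression, rounded up) is an assumption about the classical default rather than something stated in this changelog.

```python
import math


def resolve_num_candidate_attributes(value, num_features, task="CLASSIFICATION"):
    """Number of features tested at each split (hypothetical helper).

    Assumed conventions:
      -1 -> no sampling, every feature is a candidate.
       0 -> classical Random Forest default (assumed sqrt / one-third rule).
      >0 -> explicit number of candidates, capped at num_features.
    """
    if value == -1:
        return num_features
    if value == 0:
        if task == "CLASSIFICATION":
            return max(1, math.ceil(math.sqrt(num_features)))
        return max(1, math.ceil(num_features / 3))
    return min(value, num_features)


new_default = resolve_num_candidate_attributes(-1, 16)
old_default = resolve_num_candidate_attributes(0, 16)

print(new_default)  # 16: each split considers all features
print(old_default)  # 4: each split considers a random subset
```

Under these assumptions, a CART model trained with the new default examines every feature at every split, which matches the usual description of CART.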
Python API 0.7.0
Python API 0.7.0 - 2024-08-21
Feature
- Expose validate_hyperparameters() on the learner.
- Clarify which parameters in the learner are optional.
- Add support in JAX FeatureEncoder for non-string categorical feature values.
- Improve performance of Isolation Forests.
- Models can be serialized/deserialized to/from bytes with model.serialize()
and ydf.deserialize_model.
- Models can be pickled safely.
- Native support for Xarray as a dataset format for all operations (e.g.,
training, evaluation, predictions).
- The output of model.to_jax_function can be converted to a TensorFlow Lite
model.
- Change the default number of examples to scan when training on files to
determine the semantic and dictionaries of columns from 10k to 100k.
- Various improvements of error messages.
- Evaluation for Anomaly Detection models.
- Oblique splits for Anomaly Detection models.
Fix
- Fix parsing of multidimensional ragged inputs.
- Fix isolation forest hyperparameter defaults.
- Fix bug causing distributed training to fail on a sharded dataset containing
an empty shard.
- Handle unordered categorical sets in training.
- Fix dataspec ignoring definitions of unrolled columns, such as
multidimensional categorical integers.
- Fix error when defining categorical sets for non-ragged multidimensional
inputs.
- MacOS: Fix compatibility with other protobuf-using libraries such as
TensorFlow.
Release music
Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen",
Op. 129. Ludwig van Beethoven
Python API 0.6.0
Feature
- model.to_jax_function now always outputs a FeatureEncoder to help feed
data to the JAX model.
- The default value of num_candidate_attributes in the CART learner is
changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
the generally accepted logic of CART.
- model.to_tensorflow_saved_model supports preprocessing functions which have
a different signature than the YDF model.
- Improve error messages when feeding wrong size NumPy arrays.
- Add option for weighted evaluation in model.evaluate.
Fix
- Fix display of confusion matrix with floating point weights.
Known issues
- MacOS build is broken.
Python API 0.5.0
Feature
- Add support for Isolation Forests model.
- Add max_depth argument to model.print_tree.
- Add verbose argument to the train method, which is equivalent to but
sometimes more convenient than ydf.verbose.
- Add SKLearn to YDF model converter: ydf.from_sklearn.
- Improve error messages when calling the model with unsupported data.
- Add support for numpy 2.0.
Tutorials
- Add anomaly detection tutorial.
- Add YDF and JAX model composition tutorial.
Fix
- Fix error when plotting oblique trees (model.plot_tree) in Colab.
Python API 0.4.3
Python API - Changelog
Feature
- Add model.to_jax_function() function to convert a YDF model into a JAX
function that can be combined with other JAX operations.
- Print warnings when categorical features look like numbers.
- Add support for Python 3.12.
Fix
- Fix cross-validation for non-classification learners.
- Fix missing ydf/model/tree/plotter.js
- Solve dependency collision of YDF Proto between PYDF and TF-DF.
Python API 0.4.1
Python API - Changelog
Fix
- Solve dependency collision to YDF between PYDF and TF-DF. If TF-DF is
installed after PYDF, importing YDF fails with a "has no attribute 'DType'"
error.
- Allow for training on cached TensorFlow dataset.
Python API 0.4.0
Python API - 0.4.0 - 2024-04-10
Feature
- Multi-dimensional features can be selected / configured with the features=
training argument.
- Programmatic access to partial dependence plots and variable importances.
- Add model.to_tensorflow_function() function to convert a YDF model into a
TensorFlow function that can be combined with other TensorFlow operations.
This function is compatible with Keras 2 and Keras 3.
- Add arguments servo_api=False and feed_example_proto=False for
model.to_tensorflow_function(mode="tf") to export TensorFlow SavedModel
following respectively the Servo API and consuming serialized TensorFlow
Example protos.
- Add pre_processing and post_processing arguments to the
model.to_tensorflow_function function to pack pre/post processing
operations in a TensorFlow SavedModel.
Tutorials
- Add tutorial Vertex AI with TF Serving
- Add tutorial Deep-learning with YDF and TensorFlow
Python API 0.3.0
Python API 0.3.0 - 2024-03-15
Breaking
- Custom losses must now provide the gradient, instead of the negative of the
gradient.
- Clarified that YDF may modify numpy arrays returned by a custom loss
function.
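To make the sign convention concrete, the sketch below computes the gradient of a squared error loss in plain Python. It is only an illustration of "gradient" versus "negative gradient": the function name is hypothetical, and it does not follow the actual signature that YDF expects from a custom loss.

```python
def squared_error_gradient(labels, predictions):
    """Gradient of 0.5 * (prediction - label)^2 with respect to the prediction.

    Hypothetical helper; it returns a fresh list on every call, so a caller
    that modifies the result in place cannot corrupt cached state.
    """
    return [p - y for y, p in zip(labels, predictions)]


labels = [1.0, 0.0, 2.0]
predictions = [0.5, 0.5, 2.0]

gradient = squared_error_gradient(labels, predictions)
# Before 0.3.0 the opposite sign (label - prediction) was expected.
negative_gradient = [-g for g in gradient]

print(gradient)  # [-0.5, 0.5, 0.0]
```

A custom loss written for an earlier version would therefore need its gradient sign flipped when upgrading.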
Features
- Allow using Jax for custom loss definitions.
- Allow setting may_trigger_gc on custom losses.
- Add support for MHLD oblique decision trees.
- Expose hyperparameter sparse_oblique_max_num_projections.
- HTML plots for trees with model.plot_tree().
- Pin protobuf version to 4.24.3 to fix some incompatibilities when using
conda.
- Allow listing compatible engines with model.list_compatible_engines().
- Allow choosing a fast engine with model.force_engine(...).
Fix
- Fix slow engine creation for some combination of oblique splits.
- Improve error message when feeding multi-dimensional labels.
Documentation
- Clarified documentation of hyperparameters for oblique splits.
- Fix plots, typos.
Release music
Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy