Merge main to stable #133

rstz · 2024-09-24T10:30:10Z

No description provided.

The training condition evaluation is composed of a loop over the examples and a switch over the condition type (and a few other things). Prior to this change, the example loop was outside the condition switch, forcing the algorithm to re-check the condition type (and other things) for each example. After this change, the condition switch is outside of the example loop.

Example of speed-ups:
1. Average speed-up of 3.7% on all benchmarks.
2. Speed-up of 10-15% on Adult dataset with GBT.
3. Speed-up of 8% on Adult dataset with RF.
4. Speed-up of 9% on 4M dataset with 200 features with discretized GBT.
5. No speed difference (<1% gain) on 4M dataset with 200 features with non-discretized GBT. Note: The absolute gain is the same as in 4., but since training in 5. takes longer, the relative gain is insignificant.

PiperOrigin-RevId: 670888905
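The restructuring described in this commit message can be illustrated with a minimal Python sketch. The condition types (`"higher"`, `"lower"`) and the function names are hypothetical and do not come from the YDF C++ code; the sketch only shows the dispatch on the condition type moving out of the per-example loop.

```python
from typing import List, Sequence


def evaluate_switch_inside(kind: str, threshold: float,
                           values: Sequence[float]) -> List[bool]:
    """Before: the condition type is re-checked for every example."""
    results = []
    for value in values:
        if kind == "higher":
            results.append(value >= threshold)
        elif kind == "lower":
            results.append(value < threshold)
        else:
            raise ValueError(f"unknown condition type: {kind}")
    return results


def evaluate_switch_outside(kind: str, threshold: float,
                            values: Sequence[float]) -> List[bool]:
    """After: the condition type is resolved once, then the loop runs."""
    if kind == "higher":
        return [value >= threshold for value in values]
    if kind == "lower":
        return [value < threshold for value in values]
    raise ValueError(f"unknown condition type: {kind}")
```

Both functions return the same result for a known condition type; the second one simply performs the type check once per call instead of once per example.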

…ust. This adds support for Unicode strings, normalizes categorical set values in the same way as categorical values, and validates their types. As a consequence, boolean values in categorical sets are converted to lowercase, matching the treatment of categorical features. PiperOrigin-RevId: 675906253
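As a rough illustration of the normalization described above, the following sketch converts the items of a categorical set to strings. The function names and the exact set of accepted types are assumptions made for this example and are not taken from the YDF code; only the lowercase treatment of booleans and the type validation come from the commit message.

```python
from typing import Iterable, List, Union

Value = Union[str, bytes, bool]


def normalize_value(value: Value) -> str:
    """Normalizes one categorical item (hypothetical helper)."""
    # bool must be tested explicitly so that True becomes "true", not "True".
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, str):
        return value  # Unicode strings are kept as-is.
    raise TypeError(f"unsupported categorical value type: {type(value).__name__}")


def normalize_categorical_set(values: Iterable[Value]) -> List[str]:
    """Applies the same normalization to every item of a categorical set."""
    return [normalize_value(v) for v in values]
```

For example, `normalize_categorical_set([True, "é", b"x"])` yields `["true", "é", "x"]` in this sketch, while a float item raises `TypeError`.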

This change adds a few fixes to the NDCG truncation:
- Add a Python test that the learner correctly truncates
- Simplify the proto by re-using the existing LambdaMart options proto
- Set the different ranking options as mutually exclusive hyperparameters
- Fix the definition of the truncation hyperparameters as integers

PiperOrigin-RevId: 676387366
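To make the effect of the truncation concrete, here is a minimal sketch of NDCG with an integer truncation `k`. It uses one common formulation (exponential gain, log2 discount), which is an assumption and may differ in detail from YDF's implementation; the function names are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items."""
    return sum((2.0 ** rel - 1.0) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order: Sequence[float], k: int) -> float:
    """NDCG truncated at k; relevances are listed in predicted rank order."""
    if not isinstance(k, int) or k <= 0:
        raise ValueError("the truncation must be a positive integer")
    ideal = dcg_at_k(sorted(relevances_in_predicted_order, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # No relevant item in the group.
    return dcg_at_k(relevances_in_predicted_order, k) / ideal
```

With relevances `[0, 0, 1]`, truncating at 2 ignores the only relevant item and gives 0, whereas truncating at 3 gives 0.5, which is why the truncation is worth exposing as a parameter.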

github-advanced-security · 2024-09-24T10:30:12Z

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

rstz and others added 30 commits August 23, 2024 02:55

[YDF] Print a warning when training on a VerticalDataset and fail for…

5c63fdb

… most problematic options PiperOrigin-RevId: 666714677

The user can override the task, label and group of the model evaluation.

07116e6

PiperOrigin-RevId: 666880050

Move logic of categorical set parsing to C++.

44d9efe

PiperOrigin-RevId: 667578796

[YDF] Remove invalid hyperparameters from isolation forests

7848ac3

PiperOrigin-RevId: 668022926

Disallow positional parameters in the Python API

1a4e9f9

This is a breaking change that will allow us more flexibility in re-ordering the hyperparameters. PiperOrigin-RevId: 668342872
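The mechanism behind this kind of change can be sketched with Python's keyword-only parameters. The class below is hypothetical (it is not a YDF learner, and its parameter names and defaults are placeholders); it only shows how a bare `*` in the signature rejects positional arguments, so that parameters can later be re-ordered without breaking callers.

```python
class ExampleLearner:
    """Hypothetical learner used only to illustrate keyword-only parameters."""

    def __init__(self, *, label: str, num_trees: int = 100, max_depth: int = 6):
        # Everything after the bare "*" must be passed by name.
        self.label = label
        self.num_trees = num_trees
        self.max_depth = max_depth


learner = ExampleLearner(label="income", num_trees=50)

try:
    ExampleLearner("income")  # Positional use is rejected.
except TypeError as error:
    positional_error = str(error)
```

Callers that previously passed arguments positionally need to switch to the named form, which is what makes the change breaking.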

Format specialized learner generated file

fd8570f

PiperOrigin-RevId: 668404720

[YDF] Clean up PYDF learner parameters

579c3b8

PYDF exposes some parameters on every learner that are only supported on some of them. This change cleans up the lists of hyperparameters. This change also fixes some of the documentation of the learners. PiperOrigin-RevId: 668420630

Fix spelling mistake.

4e0f99c

PiperOrigin-RevId: 668905381

Add num_examples_per_tree method for Isolation forest models.

b1807df

PiperOrigin-RevId: 669276932

Fix rare bug in the benchmark synthetic data generation.

f531c1c

PiperOrigin-RevId: 670139426

Fix parsing of NAs in Xarray datasets.

abe57f6

PiperOrigin-RevId: 670559565

[YDF] Change margin in Jax finetuning test to reduce flakes

c8d5530

PiperOrigin-RevId: 670917003

Reduce the RAM usage of distributed training with discretized numeric…

7005541

…al values. PiperOrigin-RevId: 670980793

[YDF] Allow prediction and evaluation with slow engine in Python

b080eab

PiperOrigin-RevId: 671272015

Add a test for the accuracy of Isoforest on Adult

c5168ec

PiperOrigin-RevId: 671289711

Isoforest: Fail if training a tree fails

1aa8415

PiperOrigin-RevId: 671322125

Add Python utility to read and write TFRecord datasets

abb74f1

PiperOrigin-RevId: 671709902

Fix labels of regression evaluation plots.

fd32d8c

PiperOrigin-RevId: 672518215

[numpy] Fix users of NumPy APIs that are removed in NumPy 2.0.

20529c1

PiperOrigin-RevId: 672951343

Create GZip FileSystem reader without tf dependencies.

3e22c90

PiperOrigin-RevId: 673274751

Improve error message when using the distributed gbt learner with a n…

b7f5472

…on supported task. PiperOrigin-RevId: 673362728

Add support for compressed tfrecords without tf dependencies.

0b87800

PiperOrigin-RevId: 673388515

IsoForest: Refactor splitting code

2a21aa0

PiperOrigin-RevId: 673852441

Multi-threaded model prediction and evaluation

2c77d15

PiperOrigin-RevId: 675092823

Fix documentation typo

9eee47b

PiperOrigin-RevId: 675094172

Multi-threaded benchmark.

b192787

PiperOrigin-RevId: 675114325

[YDF] Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG

d5630e6

We want to make the truncation parameter configurable. Renaming the loss is the first step. PiperOrigin-RevId: 675162089

[YDF] Allow configuring the truncation of NDCG losses

8b04210

PiperOrigin-RevId: 675173198

[YDF] Force validation dataset to have the same data spec as the trai…

be8a0ab

…ning PiperOrigin-RevId: 675174273

rstz and others added 15 commits September 16, 2024 10:04

[YDF] Expose NDCG truncation parameter in Python evaluations

bccdcb7

PiperOrigin-RevId: 675191829

GZip file writer

fa0bd2f

PiperOrigin-RevId: 675508141

Add parameter to control the maximum duration of the model analysis. …

ec48a7b

…Default to 10 seconds. PiperOrigin-RevId: 675552053

Add support for TFRecord writing without TF dependency.

ee38cda

PiperOrigin-RevId: 675560007
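For readers unfamiliar with the container format, the sketch below writes and reads TFRecord framing using only the Python standard library. It is an illustration of the file format (length, masked CRC32C of the length, payload, masked CRC32C of the payload) and is not the code added by this commit; all function names are made up for the example.

```python
import io
import struct
from typing import BinaryIO, Iterator

_CRC32C_POLY = 0x82F63B78  # Reflected Castagnoli polynomial.


def _make_table() -> list:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC32C (slow, but dependency-free)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """CRC32C with the rotation and offset used by the TFRecord format."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream: BinaryIO, payload: bytes) -> None:
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc32c(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc32c(payload)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        header = stream.read(8)
        if not header:
            return  # Clean end of file.
        footer = stream.read(4)
        if len(header) != 8 or len(footer) != 4:
            raise ValueError("truncated record header")
        if struct.unpack("<I", footer)[0] != masked_crc32c(header):
            raise ValueError("corrupted record length")
        (length,) = struct.unpack("<Q", header)
        payload = stream.read(length)
        footer = stream.read(4)
        if len(payload) != length or len(footer) != 4:
            raise ValueError("truncated record payload")
        if struct.unpack("<I", footer)[0] != masked_crc32c(payload):
            raise ValueError("corrupted record payload")
        yield payload


buffer = io.BytesIO()
for item in (b"first", b"", b"third record"):
    write_record(buffer, item)
buffer.seek(0)
round_trip = list(read_records(buffer))
```

Wrapping the stream with the standard `gzip` module (for instance `gzip.open(path, "wb")`) should give the compressed variant, since GZIP-compressed TFRecord files are, to my understanding, the same byte stream compressed as a whole.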

[YDF] Fix test flakes

d2e24d0

PiperOrigin-RevId: 675978237

Add support for PyGrain DataLoaders and Datasets.

601798b

PiperOrigin-RevId: 675993859

[YDF] Move loss options definition

7dafafd

Old: learner/gradient_boosted_trees/gradient_boosted_trees.proto New: model/gradient_boosted_trees/gradient_boosted_trees.proto PiperOrigin-RevId: 676396458

[YDF] Store loss options in the model

a7c8db2

PiperOrigin-RevId: 676427940

Enable distributed training for Ranking GBT.

a95ab83

PiperOrigin-RevId: 676466613

Isoforest: Add support for splits on boolean features

38bb39f

Boolean features are split deterministically with positive going to the right and negative going to the left. PiperOrigin-RevId: 676796602
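A minimal sketch of this routing rule, assuming examples are identified by their index and that missing values are handled elsewhere (the commit message does not say how); the function name is illustrative only.

```python
from typing import List, Sequence, Tuple


def split_on_boolean(values: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Returns (left, right) example indices for a boolean feature.

    Negative (False) examples go to the left child and positive (True)
    examples go to the right child, so the split involves no randomness.
    """
    left = [index for index, value in enumerate(values) if not value]
    right = [index for index, value in enumerate(values) if value]
    return left, right


left_child, right_child = split_on_boolean([True, False, True, False])
```

Unlike splits on numerical features in an isolation forest, there is no threshold to draw here, which is why the split is deterministic once the feature is chosen.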

Move tsl/lib to xla/tsl/lib

3488afc

PiperOrigin-RevId: 676926239

Isoforest: Add support for categorical feature

967aab7

PiperOrigin-RevId: 677691273

[YDF] Prepare release of PYDF 0.8.0

a89064f

PiperOrigin-RevId: 677766701

rstz merged commit ddba189 into stable Sep 24, 2024
5 checks passed
