Consistent preprocessing output on all backends #1777

Merged

Conversation

@mattdangerw (Member) commented on August 15, 2024

Old behavior:

  • On the TF backend, ragged and string outputs were returned as tf tensors.
  • On the Jax/Torch backends, ragged and string outputs were returned as lists.
  • Preprocessing functions outside of __call__, like tokenize(), detokenize(), and generate_preprocess(), always returned tf tensors on all backends.

This made it hard to write backend-agnostic code. TF showed up in unexpected places, and anyone flipping from tf to jax (or vice versa) had to switch between handling tensors and lists.

New behavior:

  • On all backends, for all preprocessing functions, ragged and string outputs are returned as Python lists.
  • Inside a tf.data pipeline or a tf compiled function, preprocessing layers still always output tf tensors (see the sketch below).
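For illustration, here is a minimal sketch of the intended behavior. The tokenizer class and preset name are just placeholders for any KerasNLP preprocessing layer, not something specific to this change:

```python
import tensorflow as tf
import keras_nlp

# Any KerasNLP tokenizer works the same way; BERT is just an example here.
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_base_en_uncased")

# Eager call on any backend (tf, jax, or torch): ragged output comes back
# as plain Python lists of ints rather than tf tensors.
token_ids = tokenizer(["the quick brown fox", "hello"])
print(type(token_ids))     # <class 'list'>
print(type(token_ids[0]))  # <class 'list'>

# Inside a tf.data pipeline the same layer still produces tf tensors, so
# graph-mode preprocessing keeps working unchanged.
ds = tf.data.Dataset.from_tensor_slices(["the quick brown fox", "hello"])
ds = ds.batch(2).map(tokenizer)  # elements are tf.RaggedTensor in the graph
```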

This requires a little extra complexity to avoid over-converting back and forth between tf and python in nested calls, but thankfully we can hide most of that complexity in a decorator.
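The decorator itself isn't reproduced in this description. As a rough, hypothetical sketch of the idea (the names `convert_preprocessing_outputs` and `_to_python` are made up for illustration; the actual decorator in this PR is its own implementation, later renamed to `tf_preprocessing_function` per the commit log), it could look something like:

```python
import functools

import tensorflow as tf

# Module-level flag so nested decorated calls skip the tf -> python
# conversion and hand tf tensors straight through to the outer call.
_IN_PREPROCESSING = False


def convert_preprocessing_outputs(fn):
    """Run `fn` with tf ops, then convert ragged/string outputs to lists."""

    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        global _IN_PREPROCESSING
        if _IN_PREPROCESSING or not tf.executing_eagerly():
            # Nested call, or tracing inside tf.data / tf.function:
            # keep everything as tf tensors.
            return fn(self, *args, **kwargs)
        _IN_PREPROCESSING = True
        try:
            outputs = fn(self, *args, **kwargs)
        finally:
            _IN_PREPROCESSING = False
        return _to_python(outputs)

    return wrapper


def _to_python(x):
    """Recursively convert ragged and string tensors to Python lists."""
    if isinstance(x, dict):
        return {k: _to_python(v) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(_to_python(v) for v in x)
    if isinstance(x, tf.RaggedTensor):
        return x.to_list()
    if isinstance(x, tf.Tensor) and x.dtype == tf.string:
        # Only scalar and 1D string tensors are handled in this sketch.
        if x.shape.rank == 0:
            return x.numpy().decode("utf-8")
        return [s.decode("utf-8") for s in x.numpy().tolist()]
    return x
```

Dense numeric tensors are left alone here; the point of the sketch is only the guard that decides when to convert and when to pass tf tensors through untouched.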

@mattdangerw force-pushed the consistent-preprocessing-outputs branch 7 times, most recently from 2008b84 to 5de16cd on August 16, 2024 at 03:04
@mattdangerw changed the title from "[DRAFT] Consistent preprocessing output on all backends" to "Consistent preprocessing output on all backends" on August 16, 2024
@mattdangerw marked this pull request as ready for review on August 16, 2024 at 03:05
@mattdangerw force-pushed the consistent-preprocessing-outputs branch from c582212 to 331f6a1 on August 16, 2024 at 20:56
@SamanehSaadat (Member) left a comment


LGTM! Thanks, Matt! Just left a couple of nit comments!

keras_nlp/src/models/bart/bart_preprocessor.py (review comment; outdated, resolved)
keras_nlp/src/utils/tensor_utils.py (review comment; outdated, resolved)
@mattdangerw added the kokoro:force-run (Runs Tests on GPU) label on August 19, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label on August 19, 2024
@mattdangerw merged commit 180c7ec into keras-team:master on August 19, 2024
10 checks passed
pkgoogle pushed a commit to pkgoogle/keras-hub that referenced this pull request on August 22, 2024
* Consistent preprocessing output on all backends

* Rename preprocessing_function -> tf_preprocessing_function

* address comments