TextVectorization: output_mode={multi_hot, count} promise int arrays but output floats #18973
* Fix custom functional reload issue
* Fix issue with TextVectorization as first Sequential input
* Fix text vectorization output spec
@nicdumz I tried with all backends and it seems to be returning correctly; can you please double check?
With a test program containing:
import tensorflow as tf, tensorflow.version as tv
print(f"{tv.VERSION}, {tv.COMPILER_VERSION}, {tv.GIT_VERSION}")
v = tf.keras.layers.TextVectorization(output_mode="count")
v.adapt(["foo", "bar", "baz"])
print(v(["bar baz"]).dtype)
Output is:
I would have expected an int64 output.
Oh I see, this is a tf.keras issue. The commit you linked was in the Keras 3 repo.
Thank you, sorry, I was not aware of the difference; and thanks for the redirect.
Documentation for output_mode currently reads:
"multi_hot": Outputs a single int array per batch, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item.
"count": Like "multi_hot", but the int array contains a count of the number of times the token at that index appeared in the batch item.
But this isn't actually the case. A little test to show this:
The source in fact currently outputs ints for output_mode="int", but floats for everything else. This seems to have been introduced as part of ef72bfb.
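For reference, here is a minimal plain-Python sketch of the integer outputs the documentation describes for "multi_hot" and "count". It is not the Keras implementation: `build_vocab` and `vectorize` are hypothetical helpers, the vocabulary is simply sorted alphabetically, and out-of-vocabulary handling is ignored.

```python
from collections import Counter


def build_vocab(corpus):
    """Hypothetical helper: map each whitespace-separated token to an index.

    Tokens are sorted alphabetically; the real layer orders its vocabulary
    differently and reserves slots for special tokens.
    """
    tokens = sorted({tok for text in corpus for tok in text.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab, output_mode="count"):
    """Hypothetical helper: return one list of Python ints per input string."""
    if output_mode not in ("count", "multi_hot"):
        raise ValueError(f"unsupported output_mode: {output_mode!r}")
    # Count only tokens that are in the vocabulary (unknown tokens are dropped).
    counts = Counter(tok for tok in text.split() if tok in vocab)
    row = [0] * len(vocab)
    for tok, n in counts.items():
        # "count" stores the number of occurrences, "multi_hot" stores 1.
        row[vocab[tok]] = n if output_mode == "count" else 1
    return row


vocab = build_vocab(["foo", "bar", "baz"])  # {'bar': 0, 'baz': 1, 'foo': 2}
count_row = vectorize("bar baz baz", vocab, output_mode="count")
multi_hot_row = vectorize("bar baz baz", vocab, output_mode="multi_hot")
print(count_row)      # [1, 2, 0]
print(multi_hot_row)  # [1, 1, 0]
```

Every element here is an int, which is what the docstring wording ("int array") leads a reader to expect from the layer.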