
Build against TF2.13 #2835

Merged: 40 commits from build-tf213 into tensorflow:master on Jul 11, 2023

Conversation

seanpmorgan (Member) commented May 31, 2023

Description

Waiting on:

  • TF 2.13 release
  • Update to the TFA README regarding EOL
  • TF 2.13 macOS release

boring-cyborg bot added the github label May 31, 2023
seanpmorgan (Member Author) commented:

@qlzh727 we're getting a lot of `Please ensure wrapped layer is a valid Keras layer` errors -- is there a new spec we have to comply with?

qlzh727 (Member) commented May 31, 2023

I think there is a newly added check in keras-team/keras@7eb8ef2#diff-5b101fb3499cadc6810aaa4ec68b961018d6d0c86ed9e9295e5ffb097e516be2 for layer-instance type checking in the wrapper.
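For context, the check added in that commit is roughly of the following shape. This is an illustrative sketch of the behavior, not the exact Keras source, and the error message is paraphrased:

```python
import tensorflow as tf


class StrictWrapper(tf.keras.layers.Wrapper):
    """Sketch of the type check the Keras Wrapper base class now performs."""

    def __init__(self, layer, **kwargs):
        if not isinstance(layer, tf.keras.layers.Layer):
            raise ValueError(
                "Please ensure wrapped layer is a valid Keras layer, got: "
                f"{layer!r}"
            )
        super().__init__(layer, **kwargs)


# Passing a tensor instead of a layer instance now fails fast:
# StrictWrapper(tf.constant([1.0]))  # raises ValueError
```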

seanpmorgan (Member Author) commented May 31, 2023

Thanks... this is causing breakage throughout the repo. Is there documentation you can share on what changed here? Can we use legacy serialization instead?

I'm also seeing seq2seq break in many tests... mind taking a look when time allows?

qlzh727 (Member) commented May 31, 2023

From the test error log, it seems that the wrapper was being fed a tensor rather than a layer instance. Is that expected behavior?

Also cc @nkovela1, who is the original author of the change on the Keras side.

boring-cyborg bot added the layers label Jun 1, 2023
seanpmorgan (Member Author) commented:

> From the test error log, it seems that the wrapper was being fed a tensor rather than a layer instance. Is that expected behavior?
>
> Also cc @nkovela1, who is the original author of the change on the Keras side.

Ack, the test was to verify that an error is thrown on a Tensor input, and it now raises ValueError instead of AssertionError. We can test for either one using pytest without much issue.
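A minimal sketch of such a test, accepting either exception type; the `SpectralNormalization` call is illustrative and may not match the exact TFA test:

```python
import pytest
import tensorflow as tf
import tensorflow_addons as tfa


def test_wrapping_a_tensor_raises():
    # Older stacks raised AssertionError here, TF 2.13 raises ValueError;
    # pytest.raises accepts a tuple, so one test covers both versions.
    with pytest.raises((ValueError, AssertionError)):
        tfa.layers.SpectralNormalization(tf.constant([[1.0, 2.0]]))
```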

There is another error though that is only happening on TF2.13:

ValueError: Unknown layer: 'SpectralNormalization'. Please ensure you are using a `keras.utils.custom_object_scope` and that this object is included in the scope. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

We register our layers using `@tf.keras.utils.register_keras_serializable(package="Addons")`. Is there a new check that is forcing a `keras.utils.custom_object_scope`? Seems like a regression?
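For reference, the registration and a serialization round trip look roughly like this; the layer below is a stand-in for illustration, not the actual TFA class:

```python
import tensorflow as tf


@tf.keras.utils.register_keras_serializable(package="Addons")
class DemoLayer(tf.keras.layers.Layer):
    """Stand-in custom layer registered the same way TFA registers its layers."""

    def call(self, inputs):
        return inputs


# With the class registered, a serialize/deserialize round trip should not
# require an explicit keras.utils.custom_object_scope:
config = tf.keras.layers.serialize(DemoLayer(name="demo"))
restored = tf.keras.layers.deserialize(config)
assert isinstance(restored, DemoLayer)
```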

qlzh727 (Member) commented Jun 1, 2023

There is a fork of the SpectralNormalization layer that was sent to Keras in keras-team/keras#17648, and I think you are now getting a naming conflict.

seanpmorgan (Member Author) commented:

> There is a fork of the SpectralNormalization layer that was sent to Keras in keras-team/keras#17648, and I think you are now getting a naming conflict.

@qlzh727 It appears that this isn't related to Addons, as I can make minimal reproducible examples:
tensorflow/tensorflow#61085

Can you please help get this looked at before the final release is cut?

qlzh727 (Member) commented Jun 26, 2023

> There is a fork of the SpectralNormalization layer that was sent to Keras in keras-team/keras#17648, and I think you are now getting a naming conflict.
>
> @qlzh727 It appears that this isn't related to Addons, as I can make minimal reproducible examples: tensorflow/tensorflow#61085
>
> Can you please help get this looked at before the final release is cut?

Ack. CC'ed the relevant people on the issue.

bhack (Contributor) commented Jul 3, 2023

@seanpmorgan You can cherry-pick a6ff967 when 2.13 final is out.

bhack (Contributor) commented Jul 10, 2023

@qlzh727 The test is the same as always, but it is failing now. Here is the log:
https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log

qlzh727 (Member) commented Jul 10, 2023

> @qlzh727 The test is the same as always, but it is failing now. Here is the log: https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log

This is quite weird: the same test is passing for 2.12 and failing for 2.13? Do we see the same error if we build with tf-nightly?

seanpmorgan (Member Author) commented:

> @qlzh727 The test is the same as always, but it is failing now. Here is the log: https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log
>
> This is quite weird: the same test is passing for 2.12 and failing for 2.13? Do we see the same error if we build with tf-nightly?

Yes, the tests pass on TF2.12:
https://source.cloud.google.com/results/invocations/836b1250-2695-44d0-ad6d-b7cda5f65722/targets/tensorflow_addons%2Fubuntu%2Fgpu%2Fpy3%2Fpresubmit/log

Based on this commit:
606071c

Sean Morgan added 2 commits July 10, 2023 14:12
seanpmorgan (Member Author) commented Jul 10, 2023

Confirmed that it is TF2.13 and not the container:

https://source.cloud.google.com/results/invocations/67d3a1b3-2448-411d-9b83-2f90ab204982/targets/tensorflow_addons%2Fubuntu%2Fgpu%2Fpy3%2Fpresubmit/log

seanpmorgan (Member Author) commented:

> @qlzh727 The test is the same as always, but it is failing now. Here is the log: https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log
>
> This is quite weird: the same test is passing for 2.12 and failing for 2.13? Do we see the same error if we build with tf-nightly?

See the tests above; confirmed that this is failing for TF2.13 and not TF2.12. It's also failing for tf-nightly:
https://source.cloud.google.com/results/invocations/48c8ca19-45bd-433d-a259-b8cd64ed89bc/targets

seanpmorgan changed the title from "[WIP] Build against TF2.13" to "Build against TF2.13" on Jul 10, 2023
qlzh727 (Member) commented Jul 10, 2023

> @qlzh727 The test is the same as always, but it is failing now. Here is the log: https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log
>
> This is quite weird: the same test is passing for 2.12 and failing for 2.13? Do we see the same error if we build with tf-nightly?
>
> See the tests above; confirmed that this is failing for TF2.13 and not TF2.12. It's also failing for tf-nightly: https://source.cloud.google.com/results/invocations/48c8ca19-45bd-433d-a259-b8cd64ed89bc/targets

Thanks for the confirmation. Since we should have the same test internally, let me see what the status is there.

seanpmorgan changed the title from "Build against TF2.13" to "[WIP] Build against TF2.13" on Jul 10, 2023
seanpmorgan (Member Author) commented:

> @seanpmorgan Glad to hear it works with the old serialization flags. The majority of TF OSS packages have been debugged for compatibility with the new format along with backwards compatibility so that these flags are not necessary. However, due to TFAddons EOL, I have given TFAddons lower priority due to lack of bandwidth.

Just want to quickly address this comment. The upcoming EOL of TFA should not lower the priority of TFA testing. Through the years we probably caught 20-30 bugs in releases because of the number of tests we run. Checking TFA tests should help the Google team prevent breaking changes and net new bugs, not simply ensure that our builds pass so we can publish.

bhack (Contributor) commented Jul 10, 2023

> @qlzh727 The test is the same as always, but it is failing now. Here is the log: https://source.cloud.google.com/results/invocations/037ec635-4ae1-4658-84bc-d6ec0af40bba/log
>
> This is quite weird: the same test is passing for 2.12 and failing for 2.13? Do we see the same error if we build with tf-nightly?
>
> See the tests above; confirmed that this is failing for TF2.13 and not TF2.12. It's also failing for tf-nightly: https://source.cloud.google.com/results/invocations/48c8ca19-45bd-433d-a259-b8cd64ed89bc/targets

@seanpmorgan Is it different on nightly?

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf. Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem.

qlzh727 (Member) commented Jul 10, 2023

I just saw a similar error in the Keras nightly build, which is caused by a numpy version change in tf-nightly.

bhack (Contributor) commented Jul 10, 2023

It seems that with 2.13 we get an error when tracing:

@tf.function
def apply_gradients():
    opt.apply_gradients([(grads, var)])

device.run(apply_gradients)
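A self-contained version of that pattern might look like the following. The strategy, optimizer, and values are stand-ins for whatever the failing TFA test actually sets up, so treat this as a sketch of the failing pattern rather than the test itself:

```python
import tensorflow as tf

# Stand-in for the `device` fixture above; the real test may use a GPU strategy.
strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

with strategy.scope():
    var = tf.Variable(1.0)
    # Legacy-style (OptimizerV2) optimizer as a stand-in for the optimizer under test.
    opt = tf.keras.optimizers.legacy.SGD(learning_rate=0.1)

grads = tf.constant(0.5)


@tf.function
def apply_gradients():
    opt.apply_gradients([(grads, var)])


strategy.run(apply_gradients)
```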

bhack (Contributor) commented Jul 10, 2023

    def autograph_handler(*args, **kwargs):
      """Calls a converted version of original_func."""
      try:
        return api.converted_call(
            original_func,
            args,
            kwargs,
            options=converter.ConversionOptions(
                recursive=True,
                optional_features=autograph_options,
                user_requested=True,
            ))
      except Exception as e:  # pylint:disable=broad-except
        if hasattr(e, "ag_error_metadata"):
>         raise e.ag_error_metadata.to_exception(e)
E         tensorflow.python.autograph.impl.api.StagingError: in user code:
E         
E             File "/usr/local/lib/python3.9/dist-packages/keras/src/optimizers/utils.py", line 175, in _all_reduce_sum_fn  **
E                 return distribution.extended.batch_reduce_to(
E         
E             IndexError: tuple index out of range

/usr/local/lib/python3.9/dist-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py:52: StagingError

seanpmorgan changed the title from "[WIP] Build against TF2.13" to "Build against TF2.13" on Jul 11, 2023
seanpmorgan requested a review from bhack on July 11, 2023 15:07
seanpmorgan merged commit aa52afd into tensorflow:master on Jul 11, 2023
56 checks passed
seanpmorgan deleted the build-tf213 branch on July 11, 2023 15:21