No automatical artifacts upload with tensorflow 2.13 #1112

Open
Kaczmarekrr opened this issue Sep 5, 2023 · 6 comments
Labels
bug Something isn't working

Comments


Kaczmarekrr commented Sep 5, 2023

Hi! First of all, thanks for a really good tool!

I am having trouble with logging on newer TensorFlow versions.
On older versions (tested on 2.8, but I think it applies to anything below 2.11) everything works: tf.keras.callbacks.ModelCheckpoint() saves the model, and the model is automatically uploaded to ClearML (a local server in our case).

With the older version, the artifacts contain an "outputs models" section with an entry for each saved checkpoint.
With the newer version (2.13), only a "variables file" is uploaded, which is not the whole model; it is overwritten again and again and is not usable at all. The same goes for MODEL CONFIGURATION, which is uploaded with the older version, but here nothing happens.
I already tested every possible saving format, and it still does not work.

I noticed a TensorFlow warning saying that an import needs to be fixed:

WARNING:tensorflow:Please fix your imports. Module tensorflow.python.training.tracking.util has been moved to tensorflow.python.checkpoint.checkpoint. The old module will be deleted in version 2.11.

The TrackableSaver that ClearML uses was moved from
tensorflow.python.training.tracking.util to tensorflow.python.checkpoint.checkpoint

I hoped that fixing the import would do the job. I tried it, but it is not enough: the warning is gone, but logging still does not work correctly.
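For illustration, a fallback import of the kind described above could look roughly like the sketch below. It only assumes the two module paths quoted in the warning; the helper names `import_first_available` and `load_checkpoint_module` are hypothetical, and this is not ClearML's actual patch. As noted, changing the import alone silenced the warning but did not restore the uploads.

```python
import importlib

# Module paths taken from the TensorFlow warning quoted above.
# The new location is tried first, the old one is the fallback.
CHECKPOINT_MODULE_CANDIDATES = (
    "tensorflow.python.checkpoint.checkpoint",
    "tensorflow.python.training.tracking.util",
)


def import_first_available(module_paths):
    """Return the first module that imports cleanly, or None if none do."""
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError:
            # Covers ModuleNotFoundError too: try the next candidate.
            continue
    return None


def load_checkpoint_module():
    """Hypothetical helper: locate the checkpoint module for the installed TF."""
    return import_first_available(CHECKPOINT_MODULE_CANDIDATES)
```

Calling `load_checkpoint_module()` returns `None` when TensorFlow is not installed, so the caller can decide whether to skip the integration or raise.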

To reproduce

Run any training loop on a newer TF version and compare the logging results with the older one.

To test this issue, I used the basic TF classification tutorial with the code needed by ClearML added.

Environment

  • Server type: self hosted
  • ClearML SDK Version: 1.12.2
  • ClearML Server Version: WebApp: 1.12.1-397 • Server: 1.12.1-397 • API: 2.26
  • Python Version: 3.10.12
  • OS: Linux 22
Kaczmarekrr added the bug label Sep 5, 2023
@eugen-ajechiloae-clearml
Collaborator

Hi @Kaczmarekrr ! Thank you for letting us know. We will fix this ASAP.

@niemiaszek

Thanks @eugen-ajechiloae-clearml, that would help a lot. Keras introduced the "Keras v3" format with the .keras extension, recommended as of TF 2.13. Not sure if it is related to this issue, but it would be nice if ClearML worked with both SavedModel and Keras v3.

@eugen-ajechiloae-clearml
Collaborator

@niemiaszek Can you please post an example of "Keras v3"? We would like to look into it as well.

@niemiaszek

niemiaszek commented Sep 7, 2023

Sure @eugen-ajechiloae-clearml. It was introduced in the 2.13 release as the default format in place of SavedModel. It can be created by following the example in the Keras documentation. Here is an output model generated from this example: example.keras.zip. I had to zip it to upload it directly here. Upon further inspection, it contains 3 files: "config.json", "metadata.json" and "model.weights.h5".
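Since a .keras file is a zip archive, the inspection mentioned above can be reproduced with the standard library alone. This is a minimal sketch; the function name `list_keras_archive` is hypothetical, and the path passed to it is whatever file was saved locally.

```python
import zipfile


def list_keras_archive(path):
    """Return the sorted member names of a .keras archive (a zip file)."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

For an archive like the one described above, `list_keras_archive("example.keras")` would be expected to list the config, metadata, and weights files.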

@pollfly
Contributor

pollfly commented Oct 2, 2023

Hey @Kaczmarekrr! Just letting you know that this issue has been resolved in v1.13.0. Let us know if there are any issues :)
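To check whether an environment already has the release mentioned above, a rough sketch along these lines may help. It assumes the package is installed under the distribution name `clearml`; the helper names are hypothetical, and the version parsing is deliberately simplified (it looks only at the first three numeric components and ignores pre-release suffixes).

```python
import re
from importlib import metadata

FIXED_IN = "1.13.0"  # release named in the comment above


def version_tuple(version):
    """Simplified parse: first three numeric components, padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def has_fix(installed_version, fixed_in=FIXED_IN):
    """True if installed_version is at least the release that contains the fix."""
    return version_tuple(installed_version) >= version_tuple(fixed_in)


def installed_clearml_version():
    """Return the installed clearml version string, or None if not installed."""
    try:
        return metadata.version("clearml")
    except metadata.PackageNotFoundError:
        return None
```

With the SDK version from the report, `has_fix("1.12.2")` is `False`, while `has_fix("1.13.0")` is `True`.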

@Kaczmarekrr
Author

@pollfly Thanks for letting me know! Already tested. At the moment it works as intended. :))


4 participants