Notebook 05: load_weights results in "Incompatible tensor with shape (1280, 10)..." #544

Open
ivanthecrazy opened this issue Apr 27, 2023 · 23 comments

Comments

@ivanthecrazy

ivanthecrazy commented Apr 27, 2023

When creating model_2 and trying to load the weights with

model_2.load_weights(checkpoint_path)

I'm getting the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-55-d2e3006b884f>](https://localhost:8080/#) in <cell line: 2>()
      1 # Load model from checkpoint, that way we can fine-tune from the same stage the 10 percent data model was fine-tuned from
----> 2 model_2.load_weights(checkpoint_path) # revert model back to saved weights

1 frames
[/usr/local/lib/python3.9/dist-packages/tensorflow/python/ops/resource_variable_ops.py](https://localhost:8080/#) in _restore_from_tensors(self, restored_tensors)
    718             self.handle, self.shape, restored_tensor)
    719       except ValueError as e:
--> 720         raise ValueError(
    721             f"Received incompatible tensor with shape {restored_tensor.shape} "
    722             f"when attempting to restore variable with shape {self.shape} "

ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block7a_se_reduce/kernel:0.

I tried to download the notebook from this repo, but have the same result.

@mrdbourke mrdbourke changed the title load_weights results in "Incompatible tensor with shape (1280, 10)..." Notebook 05: load_weights results in "Incompatible tensor with shape (1280, 10)..." May 12, 2023
@mrdbourke
Owner

mrdbourke commented May 12, 2023

Hi @ivanthecrazy,

Investigating this issue myself.

I'm going through the following resources:

Looks like it's an issue with newer versions of TensorFlow and tf.keras.applications.efficientnet models and using the load_weights() method.

My current solution is installing TensorFlow 2.9.0 (as suggested by the links above) and running it from there.

For example:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "upgrade" and -q stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I will make sure this works and investigate it further if something is wrong.

I'll post another comment here once I've fixed the notebook: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

@mrdbourke
Owner

mrdbourke commented May 12, 2023

Update: I've confirmed that running notebook 05 works end-to-end with TensorFlow 2.9.0 (as per the links above).

Install TensorFlow 2.9.0 with:

# Install TensorFlow 2.9.0 to avoid issues (later versions may work)
# -U stands for "upgrade" and -q stands for "quiet"
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

I'm not quite sure what's happening with later versions (e.g. 2.10.0+); the issues above seem to be long-standing.

The notebook code has been updated to reflect installing TensorFlow 2.9.0 at the start.

See the updated code here and let me know how it goes: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

@filipposkar

I had the same problem. I can verify that Daniel's new notebook (TF version 2.9.0) works fine. Furthermore, all the “Model failed to serialize as JSON” warnings that appeared while fitting the various models have disappeared.

@mrdbourke
Owner

Hi @filipposkar , glad to hear you got it fixed!

Looks like this should also be fixed further in upcoming versions of TensorFlow (e.g. 2.13+).

For now, it looks like TensorFlow 2.9.0 works.

See this comment here: keras-team/tf-keras#383

@mrdbourke
Owner

Update: looks like TensorFlow 2.9.0 is still the most stable here, see: #553

TL;DR tried tf-nightly(2.14.0-dev20230520) and it still broke.

@ivanthecrazy
Author

Thank you @mrdbourke

@OFALOFAL

OFALOFAL commented Jul 1, 2023

I have an issue with changing the version of TensorFlow: '!pip install -U -q tensorflow==2.9.0' doesn't work.

@VuduVations

@OFALOFAL

Here is my temporary work around.

The cause seems to stem from the line of code at [29] in @mrdbourke's 05_transfer_learning notebook, where the TF install upgrades to the latest version of TF; however, some of the dependencies are deprecated in the latest version, since we are working with TensorFlow 2.9.0.

  1. Remove import tensorflow as tf from block [29] of the 05_transfer_learning notebook on GitHub.

Screenshot 2023-07-01 at 6 18 29 PM

  2. Scroll to the top of your code, to block [1], and use
  • !pip uninstall -y tensorflow to remove the 2.12.x version

Screenshot 2023-07-01 at 6 24 12 PM

  3. Insert the tensorflow==2.9.0 install and import in block [2]
  • !pip install -U -q tensorflow==2.9.0
  • import tensorflow as tf
  • print(tf.__version__)
  • from tensorflow import keras

Screenshot 2023-07-01 at 6 24 18 PM

(Notes on protobuf are below, as the dependency is incompatible; however, the results came out the same as predicted.)

  4. Clear all outputs and run the code from the beginning.

Screenshot 2023-07-01 at 6 14 46 PM

The Protobuf dependency used in TensorFlow is used to serialize and deserialize data. This means that it can be used to convert data from one format to another, such as from a Python object to a binary file. This is useful for TensorFlow because it allows models to be saved and loaded easily, and it also allows for communication between different TensorFlow components.

Specifically, Protobuf is used in TensorFlow for the following purposes:

  • Serializing and deserializing TensorFlow models: When a TensorFlow model is saved, it is serialized into a Protobuf file. This file can then be loaded back into TensorFlow to restore the model.

  • Communicating between different TensorFlow components: TensorFlow components, such as the TensorFlow Serving server and the TensorFlow Lite library, use Protobuf to communicate with each other. This allows them to exchange data in a format that is both efficient and easy to understand.

  • Providing a common data format for TensorFlow and other libraries: Protobuf is a widely used data format, so it can also be used to communicate with other libraries that use Protobuf. This makes it easier to integrate TensorFlow with other libraries, such as the gRPC RPC framework.

Overall, the Protobuf dependency used in TensorFlow is a valuable tool that allows TensorFlow models to be saved, loaded, and communicated with other components. It is a versatile data format that is widely used in the industry, and it makes TensorFlow more accessible to other libraries and frameworks. - Source: Bard

@arpadikuma

arpadikuma commented Jul 4, 2023

Your version of protobuf will most likely cause errors with tensorflow-datasets, which requires a much more recent version. The issue is that it requires a module called builder.py that's not present in version 3.19.x.
The best workaround so far is to force-reinstall protobuf 3.20.3 using pip install --force-reinstall "protobuf==3.20.3". Pip will complain about incompatibilities left and right, but I've found it to work without issues so far with TF 2.9 to 2.12, tensorflow-datasets, and other libraries.
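
A quick way to check whether the installed protobuf ships the builder module mentioned above is sketched below. This is a minimal sketch, not part of the notebook: the helper name has_module is hypothetical, and the dotted path google.protobuf.internal.builder is an assumption about where builder.py lives in newer protobuf releases.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the dotted module name can be located, without raising."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises instead of returning None.
        return False


# Assumed location of builder.py inside the protobuf package.
if has_module("google.protobuf.internal.builder"):
    print("protobuf provides builder.py")
else:
    print("builder.py not found; protobuf is missing or too old")
```

If the second message prints, the force-reinstall described above is the step to try.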

@ammarsaf

ammarsaf commented Aug 9, 2023

Hi @mrdbourke .

I ran the suggested lines:

!pip uninstall -y tensorflow
!pip install -U -q tensorflow==2.9.0
import tensorflow as tf
print(f"TensorFlow version: {tf.__version__}")

But it showed this.

Found existing installation: tensorflow 2.9.0
Uninstalling tensorflow-2.9.0:
  Successfully uninstalled tensorflow-2.9.0
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
WARNING: Ignoring invalid distribution -ensorflow (/usr/local/lib/python3.10/dist-packages)
TensorFlow version: 2.12.0

@arpadikuma

Did you restart the runtime? IIRC, TensorFlow tells you the new version will only take effect after restarting it.
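
One way to tell whether a restart is still pending is to compare the version pip has on disk with the version already imported into the running session. The sketch below is only an illustration; the helper names installed_version and needs_restart are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the version pip has on disk for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def needs_restart(on_disk, in_memory):
    """True when the imported version differs from the one installed on disk."""
    return on_disk is not None and on_disk != in_memory


# Values from the output above: 2.9.0 was installed, but 2.12.0 was still imported.
print(needs_restart("2.9.0", "2.12.0"))  # True
```

In a notebook, this would be called as needs_restart(installed_version("tensorflow"), tf.__version__) after the install cell has run.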

@mrdbourke
Copy link
Owner

Hi all,

After much troubleshooting, I've found the best fix for tf.keras.applications.EfficientNetB0 problems is to simply upgrade to tf.keras.applications.efficientnet_v2.EfficientNetV2B0.

You can see a full write-up of the fix here: #575

@talha-0

talha-0 commented Aug 21, 2023

It worked for me when I recompiled the model before loading the weights. It may be because the model had been training and some layers had changed, so the tensor shapes were no longer compatible.

@mrdbourke
Owner

@talha-0 Great catch! Thank you for the update!

@ezawadzki

I got the same issue when trying to customize my model for image classification.
I noticed that it worked the first time, but after that I got this error.
If I delete the exported model folder each time I train, it works, even with TensorFlow 2.11.0.
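
The cleanup step described above can be sketched with the standard library. This is a minimal sketch: the helper name reset_dir and the folder name exported_model are hypothetical, and the demo runs in a temporary directory so it does not touch real files.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(path):
    """Delete `path` and everything in it (if present), then recreate it empty."""
    target = Path(path)
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return target


# Demo: a stale file from a previous run is gone after the reset.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "exported_model"  # hypothetical folder name
    export_dir.mkdir()
    (export_dir / "stale_weights.bin").write_text("old")
    reset_dir(export_dir)
    print(list(export_dir.iterdir()))  # []
```

Calling reset_dir on the export path before each training run ensures no files from an earlier run are left to be picked up.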

@SGhuman123

I tried the EfficientNetV2B0 solution above, but it doesn't seem to work for me.

@mrdbourke
Owner

Oh damn!

What error are you getting now?

Did you try to reference the updated Notebook 05? See: https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/05_transfer_learning_in_tensorflow_part_2_fine_tuning.ipynb

@nika-va

nika-va commented Oct 4, 2023

I recompiled the model:

# Recompile the model before calling load_weights()
model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

and removed .ckpt from the checkpoint_path:
checkpoint_path = 'ten_percent_model_checkpoints_weights/checkpoint'
It works perfectly fine now.
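
When it is unclear which path to pass to load_weights(), it can help to list the files that actually exist for a given checkpoint prefix. The sketch below assumes the weights were saved in the TensorFlow checkpoint format, which writes files such as `<prefix>.index` and `<prefix>.data-00000-of-00001`. The helper name checkpoint_files is hypothetical, and the demo uses empty placeholder files rather than real checkpoints.

```python
import glob
import os
import tempfile


def checkpoint_files(prefix):
    """List files on disk whose names are `prefix` followed by a dot suffix."""
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


# Demo with empty placeholder files standing in for a saved checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "checkpoint.ckpt")
    for suffix in (".index", ".data-00000-of-00001"):
        open(prefix + suffix, "w").close()
    names = [os.path.basename(p) for p in checkpoint_files(prefix)]
    print(names)
```

An empty list for the path you intend to load suggests the prefix does not match what was saved.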

@AgusZanini

AgusZanini commented Oct 5, 2023

Using tf.keras.applications.efficientnet_v2.EfficientNetV2B0 didn't work for me, and neither did using other versions of TensorFlow. It only works if I compile the model again before loading the weights. Whether or not I leave the .ckpt extension in the checkpoint path doesn't affect the result, I think.

@MiaZhengLS

I got a similar error when I tried to load the best model from the Keras Tuner. I'm using a custom transformer model, and the tuning itself works fine.

image

@MiaZhengLS

I also tested reusing the tuner instance created for fine-tuning, instead of creating a new tuner instance with the same parameters (except 'overwrite=False'). I don't get the error anymore, but this time I'm required to provide input_shape for model.build.
image

@shounak03

Getting the same error in 2024 as well: "ValueError: Received incompatible tensor with shape (1280, 10) when attempting to restore variable with shape (1, 1, 1152, 48) and name Adam/m/block6h_se_reduce/kernel:0." I tried installing version 2.9, but it doesn't work. Any help, @mrdbourke?

@evgen1100

Actually, this issue is caused by the model being recompiled between saving and loading the weights.
In other words, we are trying to load the weights into a slightly different model (with unfrozen layers in the base model).
A fairly obvious solution is to recreate the model from scratch and load the weights again (and unfreeze the layers again if needed).
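
The explanation above can be illustrated with a toy model in plain Python. This is not TensorFlow's restore code. It assumes, purely for illustration, that optimizer slot variables are matched by position and that one Adam "m" slot exists per trainable variable. The shapes and the block7a_se_reduce name come from the error message in the original post, while output_layer/kernel is a hypothetical name.

```python
def adam_slots(trainable_vars):
    """Return one (name, shape) 'm' slot per trainable variable, in order."""
    return [(f"Adam/m/{name}", shape) for name, shape in trainable_vars]


def restore(saved_slots, current_slots):
    """Match slots by position and raise on the first shape mismatch."""
    for (_, saved_shape), (name, shape) in zip(saved_slots, current_slots):
        if saved_shape != shape:
            raise ValueError(
                f"Received incompatible tensor with shape {saved_shape} "
                f"when attempting to restore variable with shape {shape} "
                f"and name {name}."
            )


# At save time only the output layer was trainable.
saved = adam_slots([("output_layer/kernel", (1280, 10))])

# After unfreezing base-model layers and recompiling, the first slot
# belongs to a different variable with a different shape.
current = adam_slots([
    ("block7a_se_reduce/kernel", (1, 1, 1152, 48)),
    ("output_layer/kernel", (1280, 10)),
])

try:
    restore(saved, current)
except ValueError as err:
    message = str(err)
    print(message)
```

Under these assumptions, restoring into a model whose trainable layers match the saved state (restore(saved, saved)) raises nothing, which is consistent with the advice to recreate the model before loading the weights.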
