
Compiling MobileNetV2 from documentation: "failed to legalize operation 'stablehlo.convolution' that was explicitly marked illegal" #19852

Open
metal3d opened this issue Jan 30, 2025 · 7 comments
Labels: bug 🐞 Something isn't working · integrations/stablehlo StableHLO (JAX/TensorFlow/etc) import and conversion · integrations/tensorflow TensorFlow model import and conversion

Comments

@metal3d commented Jan 30, 2025

What happened?

Hello,

I successfully converted MobileNet V2 to MLIR, but then iree-compile failed to create the .vmfb file.

I followed the documentation page: https://iree.dev/guides/deployment-configurations/gpu-vulkan/#compile-a-program

Steps to reproduce your issue

# prepare workspace
mkdir -p Projects/ML/ireetest
cd Projects/ML/ireetest
python3.12 -mvenv venv
source venv/bin/activate

pip install tensorflow iree-base-compiler iree-base-runtime iree-tools-tf

# download mobilenet v2 from tfhub (that is now kaggle)
mkdir models
cd models
curl -L -o mobilenetv2.tar.gz \
  https://www.kaggle.com/api/v1/models/google/mobilenet-v2/tensorFlow2/035-128-classification/2/download
tar xf mobilenetv2.tar.gz
cd ..

# checks:
ls -lah models/
total 7.4M
drwxr-xr-x. 1 metal3d metal3d   82 Jan 30 09:01 .
drwxr-xr-x. 1 metal3d metal3d   50 Jan 30 09:19 ..
-rw-r--r--. 1 metal3d metal3d 6.2M Jan 30 09:01 mobilenetv2.tar.gz
-rwx------. 1 metal3d metal3d 1.3M Nov 15  2023 saved_model.pb
drwxr-x--x. 1 metal3d metal3d   88 Nov 15  2023 variables


# in python:
python -c "import tensorflow.compat.v2 as tf;model = tf.saved_model.load('./models/');print(list(model.signatures.keys()))"
# output:
['serving_default']


# import
iree-import-tf \
  --tf-import-type=savedmodel_v1 \
  --tf-savedmodel-exported-names=serving_default \
  ./models/ -o iree_input.mlir

# checks:
ls -lah iree_input.mlir
-rw-r--r--. 1 metal3d metal3d 6.5M Jan 30 09:12 iree_input.mlir

### The problem starts here:

# compile:
 iree-compile --iree-hal-target-backends=vulkan-spirv iree_input.mlir --iree-vulkan-target=ampere -o mobilenet.vmfb
-:549:11: error: failed to legalize operation 'stablehlo.convolution' that was explicitly marked illegal
-:549:11: note: see current operation: %420 = "stablehlo.convolution"(%415, %419) <{batch_group_count = 1 : i64, dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>, feature_group_count = 16 : i64, padding = dense<1> : tensor<2x2xi64>, precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>], rhs_dilation = array<i64: 1, 1>, window_strides = array<i64: 1, 1>}> : (tensor<?x64x64x16xf32>, tensor<3x3x1x16xf32>) -> tensor<?x64x64x16xf32>

I also tried importing with --tf-import-type=savedmodel_v2 and then compiling:

iree-compile --iree-hal-target-backends=vulkan-spirv iree_input.mlir -o mobilenet.vmfb
-:1:1: error: outer module does not contain a vm.module op
-:1:1: note: see current operation:
"builtin.module"() ({
^bb0:
}) : () -> ()
error opening input file: failed to generate bytecode

What component(s) does this issue relate to?

Compiler, Python

Version information

IREE compiler version 3.1.0rc20250107 @ d224220
LLVM version 20.0.0git
Optimized build

Additional context

Running on Fedora 41 - GPU is RTX 3060 - using "Vulkan"

metal3d added the bug 🐞 Something isn't working label Jan 30, 2025
ScottTodd added the integrations/tensorflow and integrations/stablehlo labels Feb 4, 2025
@ScottTodd (Member)

I can reproduce this in Colab: https://colab.research.google.com/gist/ScottTodd/39b0ac7f054650011b2a4012d34b6afa/iree-issue19852.ipynb

MobileNetV2 should work and StableHLO should be stable. Neither is currently the case, but that can be fixed. This is especially worth fixing since, as you point out, our documentation for Vulkan and some other backends uses that as the first example. I have a task filed on #18174 to "replace TensorFlow MobileNet example with something more recent / supported" too...

For this specific issue, a few things stand out to me:

@ScottTodd (Member)

Trying to see how we'd get to the VHLO dialect of StableHLO from TF, since ideally we wouldn't need a downstream import tool at all; we'd just consume StableHLO that the framework exports.

The StableHLO website has tutorials for JAX and PyTorch to StableHLO and StableHLO --> TF, but not TF --> StableHLO?

@ScottTodd (Member)

Maybe we could bundle stablehlo-translate as part of iree-tools-tf and run serialize_portable_artifact(module, target_version) after the TensorFlow pass pipelines: https://openxla.org/stablehlo/compatibility. Though we'd actually need stablehlo-translate to match the version of StableHLO that TensorFlow has, not the version that IREE has, so it would need to be part of the tensorflow Python package 🤔.
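
For reference, the compatibility doc linked above describes this round-trip with the stablehlo-translate CLI. A minimal sketch (assuming a stablehlo-translate binary matching the StableHLO version that TensorFlow vendors; the file names and target version are illustrative):

# Serialize to a versioned portable artifact, then deserialize it back to StableHLO for the consumer (here, IREE).
stablehlo-translate --serialize iree_input.mlir --target=1.8.0 > iree_input_portable.mlir.bc
stablehlo-translate --deserialize iree_input_portable.mlir.bc > iree_input_stablehlo.mlir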

@ScottTodd (Member) commented Feb 4, 2025

Tried to use the stablehlo nightly releases from close to the installed tensorflow version but I think I'm holding the APIs wrong? Or maybe stablehlo can't handle the additional dialects like ml_program in the program. Can't tell much from "ValueError: failed to serialize module".

# for python 3.11
pip install -f https://github.com/openxla/stablehlo/releases/expanded_assets/dev-wheels stablehlo==1.8.0.1730182293+acc379ab
from mlir.dialects import stablehlo

with open("iree_input_text.mlir", "r") as f:
  data = f.read()
  print(data[-1000:])
  
  serialized = stablehlo.serialize_portable_artifact_str(
      data, stablehlo.get_current_version()
  )
%from_elements_7 : tensor<2xindex>, tensor<2xindex> -> tensor<2xindex>
      %223 = stablehlo.dynamic_broadcast_in_dim %211, %222, dims = [0, 1] : (tensor<?x1001xf32>, tensor<2xindex>) -> tensor<?x1001xf32>
      %224 = stablehlo.dynamic_broadcast_in_dim %213, %222, dims = [0, 1] : (tensor<?x1xf32>, tensor<2xindex>) -> tensor<?x1001xf32>
      %225 = stablehlo.subtract %223, %224 : tensor<?x1001xf32>
      shape.assuming_yield %225 : tensor<?x1001xf32>
    }
    %217 = stablehlo.exponential %216 : tensor<?x1001xf32>
    %218 = stablehlo.reduce(%217 init: %cst_5) applies stablehlo.add across dimensions = [1] : (tensor<?x1001xf32>, tensor<f32>) -> tensor<?xf32>
    %dim_8 = tensor.dim %218, %c0 : tensor<?xf32>
    %from_elements_9 = tensor.from_elements %dim_8, %c1 : tensor<2xindex>
    %219 = shape.shape_of %217 : tensor<?x1001xf32> -> tensor<2xindex>
    %220 = shape.cstr_broadcastable %219, %from_elements_9 : tensor<2xindex>, tensor<2xindex>
    return %210 : tensor<?x1001xf32>
  }
}
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
ValueError: failed to serialize module

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-13-2f38caeacad8> in <cell line: 0>()
      4   print(data[-1000:])
      5 
----> 6   serialized = stablehlo.serialize_portable_artifact_str(
      7       data, stablehlo.get_current_version()
      8   )

SystemError: <built-in method serialize_portable_artifact_str of PyCapsule object at 0x7ba3a1c1ea30> returned a result with an exception set
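
One rough way to test the "extra dialects" theory is a purely textual scan of the imported file for dialect prefixes (only an approximation, since it also matches attribute and type names):

# Count "dialect.op"-looking tokens by their dialect prefix.
grep -oE '\b[a-z_]+\.[a-z_]+\b' iree_input_text.mlir | cut -d. -f1 | sort | uniq -c | sort -rn | head

If dialects like ml_program, shape, or tensor show up alongside stablehlo (the printed tail above already contains shape.assuming_yield and tensor.from_elements), that would be consistent with the serialization failure.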

@ScottTodd (Member)

Bisected IREE releases:

!pip install iree-compiler==20240226.813
!iree-compile --version
!iree-compile --iree-hal-target-backends=llvm-cpu iree_input.mlir -o mobilenet_cpu.vmfb

IREE (https://iree.dev/):
  IREE compiler version 20240226.813 @ 14895845b13cb776b33116c49998e2629e2fa1b8
  LLVM version 19.0.0git
  Optimized build
-:545:10: error: 'stablehlo.convolution' op attribute 'window_strides' failed to satisfy constraint: 64-bit signless integer elements attribute
-:545:10: note: see current operation: %277 = "stablehlo.convolution"(%16, %26) {batch_group_count = 1 : i64, dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>, feature_group_count = 1 : i64, padding = dense<[[0, 1], [0, 1]]> : tensor<2x2xi64>, precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>], rhs_dilation = array<i64: 1, 1>, window_strides = array<i64: 2, 2>} : (tensor<?x128x128x3xf32>, tensor<3x3x3x16xf32>) -> tensor<?x64x64x16xf32>
-:545:10: note: in bytecode version 1 produced by: MLIR20.0.0git
!pip install iree-compiler==20240410.859
!iree-compile --version
!iree-compile --iree-hal-target-backends=llvm-cpu iree_input.mlir -o mobilenet_cpu.vmfb

IREE (https://iree.dev/):
  IREE compiler version 20240410.859 @ b4273a4bfc66ba6dd8f62f6483d74d42a7b936f1
  LLVM version 19.0.0git
  Optimized build
-:549:11: error: failed to legalize operation 'stablehlo.convolution' that was explicitly marked illegal
-:549:11: note: see current operation: %420 = "stablehlo.convolution"(%415, %419) {batch_group_count = 1 : i64, dimension_numbers = #stablehlo.conv<[b, 0, 1, f]x[0, 1, i, o]->[b, 0, 1, f]>, feature_group_count = 16 : i64, padding = dense<1> : tensor<2x2xi64>, precision_config = [#stablehlo<precision DEFAULT>, #stablehlo<precision DEFAULT>], rhs_dilation = array<i64: 1, 1>, window_strides = array<i64: 1, 1>} : (tensor<?x64x64x16xf32>, tensor<3x3x1x16xf32>) -> tensor<?x64x64x16xf32>

So... it's not good that this is broken in multiple ways, but it does give a date range for the "failed to legalize operation" error: between 20240226.813 and 20240410.859. The earliest change that looks relevant is #16561.

@ScottTodd (Member)

Tried https://www.kaggle.com/models/google/mobilenet-v3/tensorFlow2/small-075-224-classification instead of mobilenet-v2 using IREE compiler version 3.1.0rc20250107 @ d2242207764230ad398585a5771f9d54ce91b4c8, got a different error:

-:269:14: error: failed to legalize operation 'stablehlo.dynamic_broadcast_in_dim' that was explicitly marked illegal
-:269:14: note: see current operation: %2591 = "stablehlo.dynamic_broadcast_in_dim"(%176, %2590) <{broadcast_dimensions = array<i64: 0, 1, 2, 3>}> : (tensor<?x112x112x16xf32>, tensor<4xindex>) -> tensor<?x112x112x16xf32>

@ScottTodd (Member)

I think we may drop TensorFlow support, or at least heavily de-emphasize it. I've filed a few issues to help with planning there and sent a PR to switch those docs to a working ONNX example instead.

Hope that helps!
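
For anyone landing here before the docs update, the ONNX route looks roughly like this (a sketch only: the model file name is illustrative, e.g. a MobileNetV2 exported to ONNX or downloaded from the ONNX Model Zoo):

pip install "iree-base-compiler[onnx]" iree-base-runtime

# Import the ONNX model into MLIR, then compile for Vulkan as before.
iree-import-onnx mobilenetv2.onnx -o mobilenetv2_onnx.mlir
iree-compile mobilenetv2_onnx.mlir \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-vulkan-target=ampere \
  -o mobilenetv2_onnx.vmfb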

ScottTodd added a commit that referenced this issue Feb 6, 2025

Progress on #18174, updating some stale documentation.

> [!NOTE]
> Demo here: https://scotttodd.github.io/iree/guides/deployment-configurations/cpu/

Changes included:

* Switch examples to use ONNX instead of TensorFlow given that users are trying to use TensorFlow and failing: #19852
* Add more documentation for CPU targets and features for #18561
* Standardize some formatting across CPU/CUDA/ROCm/Vulkan pages
* Adjust some parts of the ONNX guide now that support is more mature