
Can't run Keras example using PlaidML backend on macOS on GPU #290

Open
robinhad opened this issue Oct 18, 2019 · 5 comments

@robinhad

OS: macOS 10.15
Python: 3.6.9
TensorFlow: 1.14
nGraph: installed from PyPI
The following code:

import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import numpy as np
import ngraph_bridge

# Route TensorFlow graph execution through nGraph's PlaidML backend.
ngraph_bridge.set_backend('PLAIDML')

model = ResNet50(weights='imagenet')

# Random data stands in for real images (NHWC layout).
batch_size = 128
img = np.random.rand(batch_size, 224, 224, 3)

# Inference pass.
preds = model.predict(preprocess_input(img))
print('Predicted:', decode_predictions(preds, top=3)[0])

# One training round against dummy labels (fit returns a History object).
model.compile(tf.keras.optimizers.SGD(), loss='categorical_crossentropy')
history = model.fit(
    preprocess_input(img), np.zeros((batch_size, 1000), dtype='float32'))
print('Ran a train round')

Throws the following error:

Compilation failed: 
program_source:44:8: error: cannot initialize a variable of type 'int4' (vector of 4 'int' values) with an rvalue of type 'metal::float4' (aka 'float4')
  int4 LX_T3 = floor((float4)LX_T2);
       ^       ~~~~~~~~~~~~~~~~~~~~

CPU backend works fine.
It seems the generated Metal kernel initializes an int4 directly from the metal::float4 returned by floor(), without an explicit conversion back to int4.
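
For reference, the backend selection itself can be sanity-checked before running the model. A minimal sketch, assuming ngraph_bridge exposes list_backends() as in the ngraph-bridge README (an assumption; only set_backend() is used in the repro above):

import ngraph_bridge

# list_backends() is assumed from the ngraph-bridge Python API; it should
# report the backends this build supports, e.g. ['CPU', 'PLAIDML', ...].
print(ngraph_bridge.list_backends())
ngraph_bridge.set_backend('PLAIDML')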

@denise-k commented Oct 21, 2019

Thanks for reporting @robinhad. I'm from the PlaidML team.

Does this code work if you use PlaidML with OpenCL instead of Metal? You can configure PlaidML for the nGraph-TF bridge using the following instructions.

As a note: we've had several open issues with Metal on macOS Mojave (10.14) that we expect to be fixed in Catalina (10.15), but we have not yet been able to upgrade our build/test machines to Catalina to verify. Either way, this information is helpful for our team in resolving any issues with Catalina.
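
A minimal sketch of pointing PlaidML at a specific (here, OpenCL) device from Python, assuming the PlaidML 0.x PLAIDML_EXPERIMENTAL and PLAIDML_DEVICE_IDS settings are honored at import time; the device id below is illustrative and should come from plaidml-setup:

import os

# PlaidML 0.x reads these settings when it initializes, so set them before
# importing ngraph_bridge. The device id is illustrative; plaidml-setup
# lists the ids actually available on a given machine.
os.environ['PLAIDML_EXPERIMENTAL'] = '1'
os.environ['PLAIDML_DEVICE_IDS'] = 'opencl_intel_iris(tm)_plus_graphics_655.0'

import ngraph_bridge
ngraph_bridge.set_backend('PLAIDML')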

@robinhad (Author)

Thanks for the response, @dgkutnic.

I ran it with OpenCL and it crashed. The error looks pretty similar to the Metal one.
Device: opencl_intel_iris(tm)_plus_graphics_655.0
Error:
[WARN] 2019-10-21T18:22:49z src/ngraph/runtime/plaidml/plaidml_logger.cpp 44 Failed to build program kernel_c45_sdk_0
[WARN] 2019-10-21T18:22:49z src/ngraph/runtime/plaidml/plaidml_logger.cpp 44 Failed build log: :34:8: error: initializing '__int4' with an expression of incompatible type '__float4'
int4 LX_T3 = floor(convert_float4(LX_T2));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[WARN] 2019-10-21T18:22:49z src/ngraph/runtime/plaidml/plaidml_logger.cpp 44 Code was:
1: // gid: 1 1 1
2: // lid: 1 1 1
3: // Names: { i1 }
4: // Ranges: { 1 }
5: // Out stride: { 1 }
6: // Elementwise input X_I_1 shape: i32(4):(1):16 bytes
7: // Elementwise input X_I_0 shape: i32(4):(1):16 bytes
8: // Elementwise op: X_T4 = cmp_lt(X_I_1, X_T0)
9: // Elementwise op: X_T1 = cond(X_T4, X_T0, X_I_1)
10: // Elementwise op: X_T2 = div(X_I_0, X_T1)
11: // Elementwise op: X_T3 = floor(X_T2)
12: // Tile size: { 1 }
13: // Contraction output var shape: bool(4):(1):4 bytes
14: // Computed true ops: 16
15: // Computed work groups: 1
16: // Computed inner loops: 1
17: // Computed shared mem: 0
18: // Computed out regs: 512
19: // Computed mem read: 8
20: // Computed mem write: 256
21: // Computed operations: 1
22: // Computed rollups: 0
23: // Computed threads used: 1
24: // lwork = 1, 1, 1
25: // gwork = 1, 1, 1
26: __kernel void kernel_c45_sdk_0(__global int4* restrict X_T3, __global const int4* restrict X_I_1, __global const int4* restrict X_I_0)
27: {
28: int tid = get_local_id(0);
29: int4 LX_I_1 = X_I_1[0];
30: int4 LX_I_0 = X_I_0[0];
31: int4 LX_T4 = (LX_I_1 < (int4)1);
32: int4 LX_T1 = select(convert_int4(LX_I_1), (int4)1, convert_int4(LX_T4));
33: int4 LX_T2 = (LX_I_0 / LX_T1);
34: int4 LX_T3 = floor(convert_float4(LX_T2));
35: X_T3[0] = LX_T3;
36: }

[WARN] 2019-10-21T18:22:49z src/ngraph/runtime/plaidml/plaidml_logger.cpp 44 Failed to build program kernel_c45_sdk_0: CL_BUILD_PROGRAM_FAILURE
(The same warning and kernel dump are printed a second time before the crash.)

libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::promise_already_satisfied> >: The state of the promise has already been set.

@denise-k

Thanks for this info. We'll look into this and let you know once we've triaged the errors you are seeing.

@robinhad (Author)

Is there any news regarding this?

@robinhad (Author) commented Jul 7, 2020

@dgkutnic, I tried it again on a 5700 XT and it works fine, with no crash.
Maybe the iGPU doesn't have enough memory?
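
If memory is the suspect, one quick check is to shrink the batch in the repro above; a minimal sketch (8 is an arbitrary small value, not a tuned number):

# Same repro, but with a much smaller batch to reduce memory pressure on
# the integrated GPU.
batch_size = 8
img = np.random.rand(batch_size, 224, 224, 3)
preds = model.predict(preprocess_input(img))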
