This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

operator: Support Clip op for cpu, wasm, and webgl backends #107

Merged (12 commits) Mar 20, 2019

Conversation

hariharans29
Member

Resolve #92

@hariharans29 hariharans29 changed the title (WIP) Support Clip operator for cpu, wasm, and webgl backends Support Clip operator for cpu, wasm, and webgl backends Mar 13, 2019
const size = result.floatData.length;
if (inputs[0].type === 'float32') {
WasmBinding.getInstance().ccall(
'_clip_f32', [inputs[0].floatData, 'float32ptr'], [result.floatData, 'float32ptr', 'out'], [size, 'int32'],
Member Author

@hariharans29 hariharans29 Mar 13, 2019

There is no benefit to templatizing this as a generic unary-op in the WASM backend, since there is not much to templatize (no shared core implementation here). In fact, the overhead of dealing with type and attribute parsing (which is much more prevalent here than in the binary-op case) makes it an unattractive option. So the kernels are kept separate even for unary-op-like implementations.


// Core implementation of the op
template <typename T>
void clip_imp(const T *input, T *output, const int32_t length, const float min,
Member Author

@hariharans29 hariharans29 Mar 13, 2019

There is no benefit to templatizing this the way binary_imp is templatized; in binary ops there is a lot to templatize on (for example, the broadcasting logic). Here there is only a single for loop, and it becomes cumbersome to deal with the special-casing of attributes in the unary-op case. So we only templatize on the op's data types and instantiate as necessary.
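As a sketch of that approach (illustrative names, not the PR's exact kernel code, and using the corrected clamp expression discussed later in this review): the core loop is templated only on the element type, and the attributes arrive as plain parameters rather than any generic attribute object.

```cpp
#include <cstdint>

// Core implementation, templated only on the element type; the clip
// bounds are passed as plain float parameters.
template <typename T>
void clip_imp(const T *input, T *output, const int32_t length,
              const float min, const float max) {
  for (int32_t i = 0; i < length; ++i) {
    const T val = input[i];
    output[i] = (val < min) ? static_cast<T>(min)
                            : (val > max) ? static_cast<T>(max)
                                          : val;
  }
}

// Instantiate per supported type at an exported entry point
// (hypothetical wrapper name, for illustration only).
extern "C" void clip_f32(const float *in, float *out, int32_t n,
                         float min, float max) {
  clip_imp<float>(in, out, n, min, max);
}
```

Each additional data type would get its own thin `extern "C"` wrapper, keeping the attribute handling per-op rather than forcing a shared template over all unary ops.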

Contributor

I think it is OK for now. If we end up with 10+ unary-op implementations, that will be the time to revisit this.

Member Author

I think we can potentially support all unary-ops in WASM. Even then, I would find it hard to templatize the core implementation, as each op has its own attributes. Since there is no native "attribute" object in our WASM layer, templatizing this becomes really hard.

@hariharans29 hariharans29 changed the title Support Clip operator for cpu, wasm, and webgl backends operator: Support Clip op for cpu, wasm, and webgl backends Mar 13, 2019
@hariharans29 hariharans29 requested a review from fs-eire March 13, 2019 01:36
Contributor

@fs-eire fs-eire left a comment

Generally looks good to me. There is one small bug in the WASM implementation.

About WASM:
Comparing WASM vs. JavaScript on element-wise unary ops for performance:

  • possible perf gain: avoiding a float32<->float64 conversion for each element (JavaScript numbers are float64, while the underlying data is usually a Float32Array)
  • possible drawback: extra interop cost (ccallSerialize() and ccallDeserialize())

And this is also a JIT(V8)-vs-emcc battle, so the perf results may differ considerably across browser versions.

Based on our previous perf data (we tried Relu before), there was no performance gain in resnet50 from implementing a simple element-wise unary op in WASM. But it is still possible for a WASM implementation to be faster than JavaScript, so in my opinion it's a good idea to have WASM unary-op implementations.

const float max) {
for (size_t i = 0; i < length; ++i) {
const auto &val = input[i];
output[i] = (val < min) ? min : (val > max) ? val : val;
Contributor

output[i] = (val < min) ? min : (val > max) ? max : val;

Member Author

Thanks for catching this. Will fix.
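For reference, the difference between the two expressions can be checked standalone (illustrative helper functions, not the PR's test code): the original ternary never applies the upper bound, while the corrected one clamps to max.

```cpp
// Buggy version from the diff: the (val > max) branch returns val,
// so the upper bound is never applied.
float clip_buggy(float val, float min, float max) {
  return (val < min) ? min : (val > max) ? val : val;
}

// Corrected version from the review: the (val > max) branch returns max.
float clip_fixed(float val, float min, float max) {
  return (val < min) ? min : (val > max) ? max : val;
}
```

With min=0 and max=1, an input of 10 passes through the buggy version unchanged but is clamped to 1 by the fixed version, which is exactly what an op-level test should catch.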

Member Author

@hariharans29 hariharans29 Mar 18, 2019

This is weird; I am not sure why this is not being caught by the node tests. I tried adding unit tests, but realized that our test data sits in another repo, making it hard to add tests :(

Contributor

The op tests are in the repo; the node tests are not. Do you want to add a test case to the node tests?

Member Author

I had not resolved this op in the wasm backend, so all tests were passing (despite the bug) by falling back to the cpu backend. I added the native wasm Clip in the operator resolve function, and this caught the bug (as expected).

@hariharans29
Member Author

Generally looks good to me. There is one small bug in the WASM implementation.

About WASM:
Comparing WASM vs. JavaScript on element-wise unary ops for performance:

  • possible perf gain: avoiding a float32<->float64 conversion for each element (JavaScript numbers are float64, while the underlying data is usually a Float32Array)
  • possible drawback: extra interop cost (ccallSerialize() and ccallDeserialize())

And this is also a JIT(V8)-vs-emcc battle, so the perf results may differ considerably across browser versions.

Based on our previous perf data (we tried Relu before), there was no performance gain in resnet50 from implementing a simple element-wise unary op in WASM. But it is still possible for a WASM implementation to be faster than JavaScript, so in my opinion it's a good idea to have WASM unary-op implementations.

Another thing to consider for JS vs. WASM unary-ops is the actual operation performed on each element. For example, consider exp(x): I do not know how slow exp(x) is in JS compared to C++. My hypothesis (I do not have proof yet) is that, for some sufficiently large number of elements, interop cost + float32<->float64 conversion + C++ element-wise op performance <= pure JS element-wise op performance.

@fs-eire
Contributor

fs-eire commented Mar 18, 2019

Generally looks good to me. There is one small bug in the WASM implementation.
About WASM:
Comparing WASM vs. JavaScript on element-wise unary ops for performance:

possible perf gain: avoiding a float32<->float64 conversion for each element (JavaScript numbers are float64, while the underlying data is usually a Float32Array)
possible drawback: extra interop cost (ccallSerialize() and ccallDeserialize())

And this is also a JIT(V8)-vs-emcc battle, so the perf results may differ considerably across browser versions.
Based on our previous perf data (we tried Relu before), there was no performance gain in resnet50 from implementing a simple element-wise unary op in WASM. But it is still possible for a WASM implementation to be faster than JavaScript, so in my opinion it's a good idea to have WASM unary-op implementations.

Another thing to consider for JS vs. WASM unary-ops is the actual operation performed on each element. For example, consider exp(x): I do not know how slow exp(x) is in JS compared to C++. My hypothesis (I do not have proof yet) is that, for some sufficiently large number of elements, interop cost + float32<->float64 conversion + C++ element-wise op performance <= pure JS element-wise op performance.

It is actually: interop cost + WASM C++ element-wise op performance <= float32<->float64 conversion + pure JS element-wise op performance.

For a float32 tensor, the data is stored in a Float32Array. In JavaScript, reading an element from that typed array causes a float32 -> float64 conversion, and writing an element back causes a float64 -> float32 conversion.

The underlying math function is always native, so if both sides call the same function there should be no difference. However, exp(double) (as happens in JS) and exp(float32) (as happens in WASM) are expected to differ.
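Both effects can be illustrated in a few lines of C++ (a hypothetical demo, not code from this PR): the float64 -> float32 narrowing a Float32Array write performs is lossy, and exp evaluated at float32 precision differs from exp at float64 precision.

```cpp
#include <cmath>

// float64 -> float32 -> float64 round trip, modeling a Float32Array
// write followed by a read from JavaScript.
double round_trip_f32(double x) {
  float stored = static_cast<float>(x);  // narrowing: may lose precision
  return static_cast<double>(stored);    // widening: exact
}

// exp evaluated at float32 precision (as a WASM float kernel would),
// widened back to double for comparison.
double exp_f32(float x) {
  return static_cast<double>(std::exp(x));  // float overload of std::exp
}
```

For example, round_trip_f32(0.1) does not equal 0.1, and exp_f32(1.0f) differs from std::exp(1.0) in the low bits, so the two paths are not bit-identical even for the "same" math function.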

@hariharans29 hariharans29 merged commit 0432a35 into microsoft:master Mar 20, 2019