segfault in native code while trying to use CustomOp #11926
Comments
@nswamy Could you please add labels: Bug, Scala, Operator |
I'll take a look since I happen to be working on custom operator stuff already. |
Alright, I did some investigating. First, a workaround: changing the returns in your inferShape and inferType to this seems to get around the problem: Likely the middle could also be null, but I didn't test that explicitly. Now for a more in-depth explanation of what's happening. The segfault happens in this inferShape JNI code and would also happen in this inferType JNI code. The issue is that what the JNI code calls numInputs is actually defined as the sum of the inputs, outputs, and aux. So in this particular case numInputs is 1, we enter the for loop and call env->GetObjectArrayElement(ret, 0), which fails. Hopefully this unblocks you. I'll give some thought to how this should be fixed in the JNI code. |
@andrewfayres This workaround is working, thank you. |
Happy to help. I've added a task to fix the JNI code to our backlog (https://issues.apache.org/jira/browse/MXNET-773). @nswamy You can close this issue. The user is unblocked and we're tracking the bug in JIRA. |
@mdespriee Thanks for raising this issue, and glad to hear the workaround worked. I am curious: what is your use case? |
@nswamy Sure. MXNet looks like a good candidate for the task, doesn't it? |
@mdespriee Thanks for explaining your use case. It looks very interesting, and yes, MXNet is perfect for such a use case. We are aware of a few issues with this and are working to resolve them (#11753). We will further enhance this to bring parity with the MXNet Python RNN APIs. MXNet Python provides a very good set of APIs and RNN building blocks. Please reach out to the community on dev@mxnet.apache.org or on https://the-asf.slack.com (#mxnet/#mxnet-scala channels). |
@andrewfayres Unfortunately, I stumbled on another problem when trying to instantiate 2 constants in a flow. Either there's something I don't get about how CustomOp should be used, or there are some subtle bugs in the code. Or both. Here we go. I create a symbol adding a variable and 2 instances of Constant.
I executed a forward(), with
So, 2 problems: the values in kwargs got mixed at instantiation, and only one ConstantOp instance (and value) was used in forward(). Here is the full code:
Thanks for your help. EDIT: added a bit of logging for kwargs as well, as it may help. |
@nswamy @andrewfayres Should I open another ticket for this second problem? A JIRA? |
I'll take a look and depending on complexity I might open another ticket to track this. |
@andrewfayres Could you please link your ticket to this issue? What is the current status? Thanks! |
Current status: I looked into this some last week, but I couldn't find any obvious reason why it was happening. I need to make a JIRA ticket for the investigation and resolution. I'll try to get one made this afternoon and link it here. |
@nswamy @sandeep-krishnamurthy This issue is not related to Operator implementation. Please remove the [Operator] label. We have created a JIRA ticket MXNET-933 to implement the native constant operator in MXNet backend. |
@mdespriee Here to help. Are you still facing these problems? |
After some tests, your problem can be solved through this:
It will actually give you the result you need. The weird thing is that the registered Custom Operator is a single object with shared state. That state is used in the symbolic graph, as you did here. When two ops are defined in the same state, they share the same kwargs, because the engine treats them as the same op since they are both registered to the same op. The other quirk of the symbolic graph is that it is only analysed at the end, in order to find the best way to execute. That causes the overwrite problem you have. The same problem is not reproducible in sequential execution with NDArray. |
I accidentally added the wrong issue ID to the PR above. It doesn't have anything to do with this. |
@mdespriee Closing this for now and linking a new issue to this one, since the problem seems to be solved. Please feel free to reopen if the problem persists or you would like to follow up. Thanks |
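To make the overwrite described above concrete, here is a minimal toy model in plain Scala. It is not MXNet code: `ToyOpProp`, `registry`, `create` and `evaluate` are hypothetical names, and the sketch assumes only what the comment states, namely that one object is shared per registered op type and that the graph is evaluated after all nodes have been created.

```scala
import scala.collection.mutable

object SharedOpStateDemo {
  // Hypothetical stand-in for a registered custom-op object; not an MXNet class.
  final class ToyOpProp {
    var kwargs: Map[String, String] = Map.empty
  }

  // Assumption: exactly one shared instance exists per registered op type.
  private val registry = mutable.Map.empty[String, ToyOpProp]

  def register(opType: String): Unit = {
    registry(opType) = new ToyOpProp
  }

  // A graph node only remembers its op type; it keeps no copy of its kwargs.
  final case class Node(opType: String)

  def create(opType: String, kwargs: Map[String, String]): Node = {
    // Overwrites whatever an earlier node of the same op type stored.
    registry(opType).kwargs = kwargs
    Node(opType)
  }

  // Evaluation is deferred: it runs after every node has been created.
  def evaluate(node: Node): String = registry(node.opType).kwargs("value")

  def main(args: Array[String]): Unit = {
    register("constant")
    val a = create("constant", Map("value" -> "1"))
    val b = create("constant", Map("value" -> "2"))
    println(evaluate(a)) // prints 2: the first node's value was overwritten
    println(evaluate(b)) // prints 2
  }
}
```

In this toy model, evaluating each node immediately after creating it would return "1" and then "2", which mirrors the remark that sequential execution does not show the problem.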
I'm trying to use CustomOp to create a Constant, as it has been suggested in #8428.
As soon as I define that my CustomOp has no inputs, it fails with a segfault, and I can't find a workaround.
Environment
Code
Error:
in the log
side-note
As you can see in the code, I'm obliged to hack NDArrays into strings to transmit the data. That's because the CustomOp implementation defines Map[String, String] for kwargs, whereas Symbol.Custom allows Map[String, Any]. This leads to very strange things where, at runtime, we actually have non-string objects behind java String references. But they can't be cast back anyway because of the type system. Weird.
A change of the def in CustomOp would be welcome.
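The effect described in this side-note can be reproduced with plain Scala collections, independently of MXNet. The sketch below is only an illustration under that assumption: `passed` and `received` are hypothetical names standing in for the two kwargs views, and no MXNet API is called.

```scala
object KwargsErasureDemo {
  // What a caller is allowed to pass: values of any type (like Map[String, Any]).
  val passed: Map[String, Any] = Map("value" -> 1.5f)

  // What the receiver declares: Map[String, String]. The cast is unchecked because
  // type arguments are erased on the JVM, so it succeeds without converting anything.
  val received: Map[String, String] = passed.asInstanceOf[Map[String, String]]

  // Reading through the declared type forces a cast to String, which fails.
  def readAsString(): Either[String, String] =
    try {
      val s: String = received("value")
      Right(s)
    } catch {
      case _: ClassCastException => Left("ClassCastException")
    }

  // Viewing the map through its honest type and converting explicitly works.
  def readViaToString(): String = {
    val honest = received.asInstanceOf[Map[String, Any]]
    honest("value").toString
  }

  def main(args: Array[String]): Unit = {
    println(readAsString())    // prints Left(ClassCastException)
    println(readViaToString()) // prints 1.5
  }
}
```

Declaring kwargs as Map[String, Any] on the receiving side, as requested above, would remove the need for the unchecked cast.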