Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestReplyHelper call-chain exceptions #163

Closed
roded opened this issue Feb 18, 2016 · 30 comments
Closed

RequestReplyHelper call-chain exceptions #163

roded opened this issue Feb 18, 2016 · 30 comments
Assignees

Comments

@roded
Copy link

roded commented Feb 18, 2016

(Copied from: https://groups.google.com/forum/#!topic/quasar-pulsar-user/wkENHH0xAIQ)

If you would consider the following pattern:

public class Start extends BasicActor<Object, Void> {
    @Override
    protected Void doRun() throws InterruptedException, SuspendExecution {
        Object call = RequestReplyHelper.call(new Mid().spawn(), new RequestMessage<Object>() {
        });
        System.out.println(call);
        return null;
    }
}

public class Mid extends BasicActor<RequestMessage<Object>, Void> {
    @Override
    protected Void doRun() throws InterruptedException, SuspendExecution {
        RequestMessage<Object> receive = receive();
        ActorRef<RequestMessage<Object>> spawn = new End().spawn();
        Object call = RequestReplyHelper.call(spawn, receive);
        RequestReplyHelper.reply(receive, call);
        return null;
    }
}

public class End extends BasicActor<RequestMessage<Object>, Void> {
    @Override
    protected Void doRun() throws InterruptedException, SuspendExecution {
        RequestMessage<Object> receive = receive();
        RequestReplyHelper.reply(receive, new Object());
        return null;
    }
}

new Start().spawn();

The behavior works fine.
However, the following exception (see below) chain is thrown in relation to ExitMessage signaling (?).
This seems like a RequestReplyHelper chain issue.
Is there anything wrong with the code above?
Thanks

java.lang.Object@7f681f5
Exception in Fiber "fiber-10000004" java.lang.RuntimeException
at co.paralleluniverse.common.util.Exceptions.rethrow(Exceptions.java:26)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper$1.handleLifecycleMessage(RequestReplyHelper.java:167)
at co.paralleluniverse.actors.SelectiveReceiveHelper.receive(SelectiveReceiveHelper.java:121)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:174)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
at com.xxx.services.rest.temp.Mid.doRun(Mid.java:17)
at com.xxx.services.rest.temp.Mid.doRun(Mid.java:12)
at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1026)
WARNING: Uninstrumented methods (marked '') or call-sites (marked '!!') detected on the call stack:
at co.paralleluniverse.common.util.Exceptions.rethrow(java.lang.Throwable) (Exceptions.java:26 bci: 29) **
at co.paralleluniverse.actors.behaviors.RequestReplyHelper$1.handleLifecycleMessage (RequestReplyHelper.java:167 bci: 41) **
at co.paralleluniverse.actors.SelectiveReceiveHelper.receive (SelectiveReceiveHelper.java:121 bci: 341) !! (instrumented suspendable calls at: [])
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call (RequestReplyHelper.java:174 bci: 663)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call (RequestReplyHelper.java:112 bci: 335)
at com.xxx.services.rest.temp.Mid.doRun (Mid.java:17 bci: 192)
at com.xxx.services.rest.temp.Mid.doRun (Mid.java:12 bci: 1) (optimized)
at co.paralleluniverse.actors.Actor.run0 (Actor.java:691 bci: 222)
at co.paralleluniverse.actors.ActorRunner.run (ActorRunner.java:51 bci: 148)
at co.paralleluniverse.fibers.Fiber.run (Fiber.java:1026 bci: 11)
at co.paralleluniverse.fibers.Fiber.run1 (Fiber.java:1021 bci: 1)
Exception in Fiber "fiber-10000004" java.lang.RuntimeException
at co.paralleluniverse.common.util.Exceptions.rethrow(Exceptions.java:26)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper$1.handleLifecycleMessage(RequestReplyHelper.java:167)
at co.paralleluniverse.actors.SelectiveReceiveHelper.receive(SelectiveReceiveHelper.java:121)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:174)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call(RequestReplyHelper.java:112)
at com.xxx.services.rest.temp.Mid.doRun(Mid.java:17)
at com.xxx.services.rest.temp.Mid.doRun(Mid.java:12)
at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1026)
WARNING: Uninstrumented methods (marked '') or call-sites (marked '!!') detected on the call stack:
at co.paralleluniverse.common.util.Exceptions.rethrow(java.lang.Throwable) (Exceptions.java:26 bci: 29) **
at co.paralleluniverse.actors.behaviors.RequestReplyHelper$1.handleLifecycleMessage (RequestReplyHelper.java:167 bci: 41) **
at co.paralleluniverse.actors.SelectiveReceiveHelper.receive (SelectiveReceiveHelper.java:121 bci: 341) !! (instrumented suspendable calls at: [])
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call (RequestReplyHelper.java:174 bci: 663)
at co.paralleluniverse.actors.behaviors.RequestReplyHelper.call (RequestReplyHelper.java:112 bci: 335)
at com.xxx.services.rest.temp.Mid.doRun (Mid.java:17 bci: 192)
at com.xxx.services.rest.temp.Mid.doRun (Mid.java:12 bci: 1) (optimized)
at co.paralleluniverse.actors.Actor.run0 (Actor.java:691 bci: 222)
at co.paralleluniverse.actors.ActorRunner.run (ActorRunner.java:51 bci: 148)
at co.paralleluniverse.fibers.Fiber.run (Fiber.java:1026 bci: 11)
at co.paralleluniverse.fibers.Fiber.run1 (Fiber.java:1021 bci: 1)

@circlespainter circlespainter self-assigned this Feb 18, 2016
@roded
Copy link
Author

roded commented Feb 24, 2016

Oddly enough; this issue stopped manifesting even though the general actor calling pattern remained the same (actually, the number of actors in the chain has increased).
I'm honestly not sure what has changed. I'll update if it reappears.

@circlespainter
Copy link
Member

Ok, let's leave it open for now.

@circlespainter
Copy link
Member

@roded Has it happened again so far?

@roded
Copy link
Author

roded commented Apr 12, 2016

Nope. Maybe close this issue and I'll update once/if it happens again?

@circlespainter
Copy link
Member

Alright.

@roded
Copy link
Author

roded commented Jun 20, 2016

Same issue?

Exception in Fiber "fiber-10000011" co.paralleluniverse.actors.LifecycleException: ExitMessage{actor: ActorRef@7753e5bc{AuthActor$$EnhancerByGuice$$75b9cab7@60a91bbe[owner: fiber-10000013]}, cause: null}
at co.paralleluniverse.actors.Actor.handleLifecycleMessage(Actor.java:755)
at co.paralleluniverse.actors.Actor.filterMessage(Actor.java:571)
at co.paralleluniverse.actors.Actor.receive(Actor.java:467)
at com.xxx.protocols.rest.webactors.RestActor.doAct(RestActor.java:66)
at com.xxx.flow.common.GeneralActor.doRun(GeneralActor.java:67)
at com.xxx.flow.common.GeneralActor.doRun(GeneralActor.java:18)
at co.paralleluniverse.actors.Actor.run0(Actor.java:691)
at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1027)
WARNING: Uninstrumented methods (marked '*') or call-sites (marked '!!') detected on the call stack:
at co.paralleluniverse.actors.Actor.handleLifecycleMessage(co.paralleluniverse.actors.LifecycleMessage) (Actor.java:755) *

at co.paralleluniverse.actors.Actor.filterMessage (Actor.java:571) **
at co.paralleluniverse.actors.Actor.receive (Actor.java:467) !! (instrumented suspendable calls at: [])
at com.xxx.protocols.rest.webactors.RestActor.doAct (RestActor.java:66)
at com.xxx.flow.common.GeneralActor.doRun (GeneralActor.java:67) (optimized)
at com.xxx.flow.common.GeneralActor.doRun (GeneralActor.java:18) (optimized)
at co.paralleluniverse.actors.Actor.run0 (Actor.java:691)
at co.paralleluniverse.actors.ActorRunner.run (ActorRunner.java:51)
at co.paralleluniverse.fibers.Fiber.run (Fiber.java:1027)
at co.paralleluniverse.fibers.Fiber.run1 (Fiber.java:1022)

@pron
Copy link
Contributor

pron commented Jun 22, 2016

@roded

Hi. Are you blocking inside handleLifecycleMessage? The method's Javadoc says:

This method is not allowed to block. If you want to block as a result of a lifecycle message, return the message from this method (rather than returningnull), and have it processed by the caller to receive.

@roded
Copy link
Author

roded commented Jul 1, 2016

Sorry for the delay.

I'm not overriding handleLifecycleMessage.
I wasn't clear on why RestActor should receive an ExitMessage. It's the only non-single-serving actor in the actor chain.
Checking the ExitMessage being processed, it seems like there are cases (this isn't consistent) in which ExitMessage is created for AuthActor and is handled in turn by RestActor's handleLifecycleMessage. Not sure if this is to be expected though.
The messaging between the actors is still done w/ a RequestReplyHandler call.

@roded
Copy link
Author

roded commented Apr 6, 2017

Hi,
I think I've found the issue.
I'm passing the same RequestMessage object between the actors. This apparently confuses the RequestReplyHandler mechanism (I should've noticed earlier that RequestMessage has a from field).
Passing a new RequestMessage object for every call mitigates the issue.
An example is available here: https://github.com/roded/quasar-request-reply-handler-test

Edit:
The LifecycleException: ExitMessage issue mentioned above is a different one.
It seems like there are cases in which call throws the LifecycleException for the called actor instead of returning the reply value.
Adding a Strand.sleep(1000) at the end of the exiting actor resolves this. Could this be a race condition between the ExitMessage and reply somehow?
Thanks

@roded
Copy link
Author

roded commented Apr 20, 2017

The ExitMessages are due to more incorrect assumptions on my part it seems.
I'm using RequestReplyHandler.call multiple times in the same actor to call different actor objects (of the same actor class).
The 2nd call starts handling the LifecycleException of the first actor called and thus fails receiving the 2nd actor's reply.
Example here: https://github.com/roded/quasar-request-reply-handler-test

As sleeping in the replying actor for a bit mitigates this, I'm guessing that the called actor's ExitMessage is being sent into the calling actor's mailbox before the currentActor.unlink(actor) is called (?).
@circlespainter @pron Is this expected behavior?
Thanks

@roded
Copy link
Author

roded commented May 3, 2017

Hi,
I'd really appreciate some feedback or a point in the right direction on this issue.
Thanks

@pron
Copy link
Contributor

pron commented May 3, 2017

Hi. You will receive it. Please be patient.

@circlespainter
Copy link
Member

Hi, I think there's a race condition here that happens if called actors exit quickly after replying, i.e. when they exit after call has received a reply but before it unlinks them.

@roded
Copy link
Author

roded commented May 6, 2017

Thanks for the confirmation.
I'd be happy to submit a PR for this.
Is the ExitMessage guaranteed to arrive after the response? If so, the ExitMessage could be ignored in case a response had already been received.

@circlespainter
Copy link
Member

That's precisely what I'm investigating right now, according to what I see it looks like this is not guaranteed but I need to dig deeper.

@circlespainter
Copy link
Member

circlespainter commented May 6, 2017

I was wrong, it seems to be guaranteed instead but still there is (at least) one problematic trace:

  • ActorA calls ActorZ-0, which links it and then sends the message
  • ActorZ-0 enqueues a reply to ActorA
  • ActorZ-0 ends and enqueues an Exit to ActorA
  • ActorA gets a reply from ActorZ-0
  • ActorA unlinks ActorZ-0 and proceeds
    ...
  • ActorA tries receiving from ActorZ-1
  • The previous Exit from ActorZ-0, which is still in the mailbox, is processed. It is not directly thrown from helper's handleLFCM but passed to super, which forwards to the actor's, which sees it's from a (long gone) link and dies anyway.

Now, I think the problem is that the call pattern would require the replying actor to be unlinked immediately after it has sent the reply (and before exiting) but this is not so trivial to do because it can probably only be ensured by the replying actor. However it should do so only in a call case and (currently) it has has no way to know that it's being called (call just sends the user message and receiving user code just uses a regular receive to fetch it).

@roded
Copy link
Author

roded commented May 8, 2017

The called actor should be using reply though, so might that be a adequate place to unlink? (Although, reply only has an ActorRef to work with).
Unlinking only in the replying actor means that if no reply is sent and the called actor exits, then an ExitMessage will propagate from call which makes sense to me.

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter Isn't this a symptom of a larger issue, where an actor can be unlinked after it has sent an ExitMessage but before it's been processed? This applies to watches in addition to links. Wouldn't it be easier to check when processing an ExitMessage whether the sender is still watched/linked? I think this can be easily done by checking the observed collection, and using it to track links in addition to watches.

@circlespainter
Copy link
Member

circlespainter commented May 11, 2017

@pron I think it is certainly a problem that the call pattern uses link internally and user code could be affected by fatal (by default) ExitMessages because of that.

As for the larger picture I think it's a matter of behavior definition and I think we can't rely on Erlang as a reference this time because Erlang's default link handling is signal-based while Quasar's is message-based in all cases.

The way I see it, in a message-based setting it's correct for an actor to possibly have to deal with a (delayed) ExitMessage if a linked actor dies before it is unlinked.

But you could also define that an actor has to deal with an ExitMessage about a linked actor only if the dead actor is unlinked after the living one sees (i.e. processes) the ExitMessage . What I personally don't like much about this definition is that it depends on when the ExitMessage is seen rather than when the linked actor dies and it feels less natural and intuitive in a message-based setting.

It is true that in this specific situation the second definition would make things easier (actually I thought about adopting it) because it's easier for the caller (which has really a predominant role here, not a peer one) to both link and unlink but since link is symmetric I think this is not always the case.

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter But every actor processes messages based on its own current state, and link/watch unlink/unwatch are directives about the actor's state that influence its default handling of lifecycle messages. Seems perfectly reasonable, no?

@circlespainter
Copy link
Member

@pron Sure. But why do you think this default would be better? That is, why by default I shouldn't take care of ExitMessages about linked/watched actors that died before I unlinked/unwatched them (which in turn I did, true, before I had a chance to see their death)?

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter Because the entire model is pull rather than push. The whole idea of selective receive, for example, is that the receiver is responsible for defining the order of messages, not the sender(s). That the receiver defines the order is what makes message handling far easier (in terms of managing state) than frameworks where the order is set by the sender(s).

Once you unlink/unwatch you signal that you're in a state where you no longer care about lifecycle messages. Whether a message is already in transit or not is entirely accidental.

@circlespainter
Copy link
Member

@pron I agree about the model but I thought of unlink/unwatch as signalling that you no longer care about care about the actual linked/watched actor's death from that moment on (but not before that), so as a signal about the actual lifecycle event rather than about its message. Does this interpretation make sense or why doesn't it?

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter I don't think it makes sense, because whether or not there are messages in transit is accidental. What does it mean "from that moment on"? Wall-clock time? In a concurrent/distributed setting, time is local, and different processes have a different view on time, and wallclock time rarely makes sense.

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter Also, to demonstrate the problem with "from that moment on", consider the case where actor A calls unlink at time t0, and while the unlinking is in transit (because, say, actor B is remote), actor B dies at time t1, t1 > t0. So that just doesn't work. The only time that makes sense is for every actor to work in its own local time.

@circlespainter
Copy link
Member

circlespainter commented May 11, 2017

@pron "From that moment on" would mean "from the moment unlink/unwatch is agreed" so in this case of unsuccessful unlink (before B's death, that is) actor A should get an ExitMessage about B. What I'm not sure about is the value of such an agreement (and, as a result, of such an ExitMessage information) w.r.t. a simple local decision of ignoring ExitMessages about B (as well as not sending them to B about A of course).

Or, put another way: with the current default the user actor code receives some information that we'd instead discard with your proposed new default; I see a justification for that only if this information has no meaning, or at least little use, in the "common" case. It's not so easy to define that but I'm trying to see anyway what's the meaning and use (if any) of that information.

Maybe you're right, the only important thing is A's interest in B and that's because A is the initiator of the unlink/unwatch, so receiving info about B's after that A has expressed a lack of interest, even though B might have died earlier, would not be very useful.

@pron
Copy link
Contributor

pron commented May 11, 2017

@circlespainter

"From that moment on" would mean "from the moment unlink/unwatch is agreed"

But that moment is completely arbitrary from A's perspective. It means that it may or may not accept exit messages after calling unlink/unwatch, and that it will stop getting them some time in the future.

I see a justification for that only if this information has no meaning, or at least little use, in the "common" case.

It's not that it has no meaning or no use, but that actor A has put itself in a state that means it is no longer interested in B.

@circlespainter
Copy link
Member

@pron

But that moment is completely arbitrary from A's perspective. It means that it may or may not accept exit messages after calling unlink/unwatch, and that it will stop getting them some time in the future.

Yes, but getting it means that B died before the unlink/unwatch agreement's completion and I was under the impression that an agreement was needed at least in unlink's case because of the symmetry of the link relationship. But even though the relationship is symmetrical and cancellation impacts both sides, unlink is still a local and asymmetric operation initiated by one party and, precisely because of that, it expresses a ceased interest that nullifies the value in the reception of any further ExitMessage. So I agree with you finally.

@circlespainter
Copy link
Member

circlespainter commented May 12, 2017

@pron In the end it all boils down to what A is I saying it's not interested anymore in when it unlinks / unwatches B: any ExitMessages from B (so, based on local link/watch state) or only the ones after an unlink/unwatch agreement has completed (so, based on distributed link/watch state)?

Actually the reason why the former is preferrable is even more articulate than "because A is the one initiating the unlink/unwatch": even in the link/watch case one party initiates but in that case an agreement is absolutely necessary before any ExitMessage message can be received because without one the initiating party would not get any information, i.e. the initiating party needs the other one to know. Not so in the unlink/unwatch case, which can also be simpler and more efficient because of this locality (at least in the distributed case). And that is also why the actor paradigm prefers local concurrency-safe state by definition.

So there are really several reasons why your proposed default is better, I think:

  • Because unlink / unwatch are asymmetric operations that express a local interest by the initiator (even though the relation link builds is symmetrical and in both cases it is cooperative) and
  • Because we can, i.e. unlink / unwatch on the initiating side can work based on local state only and have effect before (and completely without, at least locally at the initiator's, in the worst case) the other party agreeing and
  • Because locality is simpler and more efficient in the distributed case and
  • Because for this very reason actors as a paradigm, by definition, prefer behaving based on local state whenever possible.

I'll close the PR then and try to propose another one implementing a discarding default handler in general for unlinked / unwatched actors (through an expanded observed). This will be a more extensive change though and it will include docs as well as, possibly, testsuite adjustments.

@pron pron closed this as completed in eb8a29a May 22, 2017
@pron pron assigned pron and unassigned circlespainter May 22, 2017
@roded
Copy link
Author

roded commented May 24, 2017

Looks good.
Thanks a lot for the fix and discussion.

pron added a commit that referenced this issue May 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants