Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Akka.Remote: implement "log frame size exceeding" #879

Closed
pcwiek opened this issue Apr 20, 2015 · 18 comments
Closed

Akka.Remote: implement "log frame size exceeding" #879

pcwiek opened this issue Apr 20, 2015 · 18 comments

Comments

@pcwiek
Copy link

pcwiek commented Apr 20, 2015

Is there a way to log everything that's happening in an actor and, if possible, the actor system in general?

I'm using TopShelf for service hosting and Serilog (min level: debug) for logging.

Including:

akka {
    (...)
     actor {
          loglevel = DEBUG
          stdout-loglevel = DEBUG
          provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
          debug {
             receive = on
             autoreceive = on
             fsm = on
             lifecycle = on
             unhandled = on
             event-stream = on
             router-misconfiguration = on
          }
    loggers = ["Akka.Logger.Serilog.SerilogLogger,Akka.Logger.Serilog"]
    remote {
          log-remote-lifecycle-events = DEBUG
          (...)
     }
}

in the configuration doesn't change anything when you compare the log output without the debug section specified at all.

I'm asking because I'm trying to track a really bizarre behavior (one message, which has 4 collections in it, all of collection items have Date (DateTime) member, is going nowhere unless I set members' dates to default(DateTime) in at least one of the collections...?!) but it might be hard to reproduce. So I'd rather exhaust all my monitoring and debugging options first before spending half my day setting up a separate solution...

@rogeralsing
Copy link
Contributor

That could be some odd json.net datetime issue you are seeing.
That would explain why it works with default values as those are not emitted in json by default

Skickat från min iPhone

20 apr 2015 kl. 17:37 skrev Patryk Ćwiek notifications@github.com:

Is there a way to log everything that's happening in an actor and, if possible, the actor system in general?

I'm using TopShelf for service hosting and Serilog (min level: debug) for logging.

Including:

akka {
(...)
actor {
loglevel = DEBUG
stdout-loglevel = DEBUG
provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
debug {
receive = on
autoreceive = on
fsm = on
lifecycle = on
unhandled = on
event-stream = on
router-misconfiguration = on
}
loggers = ["Akka.Logger.Serilog.SerilogLogger,Akka.Logger.Serilog"]
remote {
log-remote-lifecycle-events = DEBUG
(...)
}
}
in the configuration doesn't change anything when you compare the log output without the debug section specified at all.

I'm asking because I'm trying to track a really bizarre behavior (one message, which has 4 collections in it, all of collection items have Date (DateTime) member, is going nowhere unless I set members' dates to default(DateTime) in at least one of the collections...?!) but it might be hard to reproduce. So I'd rather exhaust all my monitoring and debugging options first before spending half my day setting up a separate solution...


Reply to this email directly or view it on GitHub.

@pcwiek
Copy link
Author

pcwiek commented Apr 20, 2015

Could be, although if I 'manually' convert to JSON, everything seems to work fine.

Also, dates in each and every collection are converted just fine separately, but not when I'm trying to send all of them, hence my enormous WTF.

@rogeralsing
Copy link
Contributor

Can you try this?:

            var serializer = actorSystem.Serialization.FindSerializerFor(myMessage);
            var binary = serializer.ToBinary(myMessage);
            var result= serializer.FromBinary(binary, typeof(MyMessageType));

And see what comes out of result
if its a good copy of your message, we can rule out serialization.

If it's not a serializer issue, my best bet would be that the dates doesnt have anything to do with it except for message size and it could be a remoting issue with large messages (?)

But lets verify the serializer first

@pcwiek
Copy link
Author

pcwiek commented Apr 20, 2015

Sure, I'll try that tomorrow in the morning.

My first thought was some kind of message size limit too - especially considering that all parts of the message seem to work on their own, but when I put them together the message goes into void - but I don't know nearly enough about internals of Akka to tell whether my hunch was even close to being right.

@pcwiek
Copy link
Author

pcwiek commented Apr 21, 2015

@rogeralsing I have just tried it, serialization works fine - round trip serialization gives the exact copy.

[EDIT] Although it's a bit of XY problem, my original question still stands - how to enable full debugging and tracing in a system? :) It would make it that much easier to troubleshoot...

[EDIT2] OK, I found a way to log all messages, unfortunately the heartbeat and gossip messages clutter the log a little bit...

akka {
    loglevel = DEBUG
    remote {
         log-sent-messages = on
         log-received-messages = on
    }
}

@pcwiek
Copy link
Author

pcwiek commented Apr 22, 2015

@rogeralsing @Aaronontheweb I encountered the same thing in a different place in the application. Is there some kind of a size limit for remote messages? It seems that big messages get lost somewhere on their way... Either that or there's something weird going on with Ask.

For example, this time I have a details screen with entries in it. From the business process perspective, the number of entries can vary between ~500 and 20000. One entry is a string (~10 - 12 characters) and a boolean flag. The details screen has an extra description (string, up to 200 characters), one date (DateTimeOffset), user name (string, in practice up to ~20 characters), boolean flag, two ids (int).

When I take <= ~950 elements, I get the reply back to the Web application that Asks for the data. When I take >= ~1000 elements, originalSender.Tell in the remote actor sends the reply, but Ask in the web app is just hanging there.

I want to add that the first case when I had that happen (~4 - 5 collections, 240 elements each) is also when I Ask the local proxy actor from the Web app. It can be a coincidence, but I think it's worth mentioning. That 'proxy' actor forwards the message to all remote actors in the cluster with original sender which should get the reply, which is, for example, tcp.akka//my-system@127.0.0.1:2300/temp/f or similar.

Serialization seems to work just fine...

Is it possible that that Ask right at the very beginning is messing something up? Could that be somehow related to the other issue with Tell going missing?

@pcwiek
Copy link
Author

pcwiek commented Apr 22, 2015

@rogeralsing @Aaronontheweb Another update! Sorry guys for flooding you with all that...

Anyway, when the message is 'small' (as in, it passes through), the chatter between the actors looks like this:

'Small' message

  1. Web (Asking local proxy actor)

    2015-04-22 12:08:45.914 -04:00 [Debug] "RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [[akka.tcp://my-system@127.0.0.1:21300/]]<+[akka.tcp://my-system@127.0.0.1:21300/] from [[akka://my-system/temp/g]]"
    
  2. Service (processing and Tell to sender, which would be temporary Ask actor)

    • Receive is acked

      2015-04-22 12:08:45.922 -04:00 [Debug] received local message ["RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [akka://my-system/]<+akka://my-system/ from [akka.tcp://my-system@127.0.0.1:23000/temp/g]"]
      
    • Reply is sent

      2015-04-22 12:08:46.019 -04:00 [Debug] "RemoteMessage: Shared.General.Data.Details to [[akka.tcp://my-system@127.0.0.1:23000/temp/g]]<+[akka.tcp://my-system@127.0.0.1:23000/temp/g] from [[akka://my-system/user/details/get/$d]]"
      
  3. Web (ack when Ask returns)

    2015-04-22 12:08:46.071 -04:00 [Debug] received local message ["RemoteMessage: Shared.General.Data.Details to [akka://my-system/temp/g]<+akka://my-system/temp/g from [akka.tcp://my-system@127.0.0.1:21300/user/details/get/$d]"]
    

Everything just works.

On the other hand...

'Big' message

  1. Web (Asking local proxy actor) [Pretty much the same as above]

    2015-04-22 12:08:06.641 -04:00 [Debug] "RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [[akka.tcp://my-system@127.0.0.1:21300/]]<+[akka.tcp://my-system@127.0.0.1:21300/] from [[akka://my-system/temp/e]]"
    
  2. Service (processing and Tell to sender) [Pretty much the same as above]

    • Receive is acked

      2015-04-22 12:08:06.867 -04:00 [Debug] received local message ["RemoteMessage: ActorSelectionMessage - Message: Shared.General.DetailsRequest - WildCartFanOut: False - Elements: user/details to [akka://my-system/]<+akka://my-system/ from [akka.tcp://my-system@127.0.0.1:23000/temp/e]"]
      
    • Reply is sent

      2015-04-22 12:08:08.790 -04:00 [Debug] "RemoteMessage: Shared.General.Data.Details to [[akka.tcp://my-system@127.0.0.1:23000/temp/e]]<+[akka.tcp://my-system@127.0.0.1:23000/temp/e] from [[akka://my-system/user/details/get/$b]]"
      

    So, far so good, right?

  3. Web (ack when Ask returns)

    Missing completely...

No idea what happens here. All preceding steps and operations are exactly the same.

@Aaronontheweb
Copy link
Member

@pcwiek so the big message makes it over the wire, but the Ack never returns?

@pcwiek
Copy link
Author

pcwiek commented Apr 22, 2015

@Aaronontheweb Hard to say if it actually makes it over the wire, because there's no way of telling what happens in the transport layer. What I can say is that the processing actor actually sends the message, but it's never received on the Ask side and Ask is just left there hanging...

@Aaronontheweb
Copy link
Member

@pcwiek so one thing worth bearing in mind is that the underlying Helios transport does have a maximum size threshold for messages it can send:

https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Transport/Helios/HeliosTransport.cs#L76

MaxFrameSize must be at least 32k. By default it's 128k https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Configuration/Remote.conf#L359

You can increase the size of this value to support whatever you need, but be aware that some network stacks (UDP) may not support arbitrarily large messages. We have some tests inside Helios that ramp up to 4mb and pass, so you have a lot of room to work with when using TCP: https://github.com/helios-io/helios/blob/dev/tests/Helios.MultiNodeTests/LargeMessageTests.cs#L105

@pcwiek
Copy link
Author

pcwiek commented Apr 22, 2015

@Aaronontheweb Thank you!

I upped the maximum-frame-size to 32 MB and now even the large analysis results are making it back. Perfect!

I just wish there was some kind of an error when that kind of situation happens, so that it could be logged - something simple like 'Remote message exceeds maximum frame size' or something :) Right now it's a completely silent failure.

@pcwiek pcwiek closed this as completed Apr 22, 2015
@Aaronontheweb Aaronontheweb reopened this Apr 22, 2015
@Aaronontheweb
Copy link
Member

@pcwiek so there is supposed to be a setting for this: https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote/Configuration/Remote.conf#L142

But we don't respect or use it inside Akka.Remote right now - so I'm going to rename this issue and leave it open as a to-do.

@Aaronontheweb Aaronontheweb changed the title Logging everything that's happening in an actor and actor system as a whole Akka.Remote: implement "log frame size exceeding" Apr 22, 2015
@Aaronontheweb
Copy link
Member

This might need to wait until Helios 2.0 ships, where we'll have some better error capture. Alternatively we can do this inside the HeliosTcpAssociationHandle.

@pcwiek
Copy link
Author

pcwiek commented Apr 22, 2015

@Aaronontheweb For now I would just add a paragraph about message sizes and potential pitfalls to 'How transparency is broken' section in 'Remoting' docs. I can actually do that and send a pull request if you'd like.

@Aaronontheweb
Copy link
Member

@pcwiek that would be excellent - please send along a PR

@Aaronontheweb
Copy link
Member

@cpx86
Copy link
Contributor

cpx86 commented Aug 6, 2015

This looks fun :) I'll have a look at it.

cpx86 added a commit to cpx86/akka.net that referenced this issue Aug 10, 2015
- Added property MaximumPayloadBytes to Transport. For HeliosTransport
  this is the same as MaxFrameSize.
- If payload size > MaximumPayloadBytes, log OversizedPayloadException
  and do not send message.

Corresponding Akka Scala code:
https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/Endpoint.scala#L745
https://github.com/akka/akka/blob/master/akka-remote/src/test/scala/akka/remote/RemotingSpec.scala#L527
https://github.com/akka/akka/search?utf8=%E2%9C%93&q=maximumpayloadbytes
cpx86 added a commit to cpx86/akka.net that referenced this issue Aug 16, 2015
- Added property MaximumPayloadBytes to Transport. For HeliosTransport
  this is the same as MaxFrameSize.
- If payload size > MaximumPayloadBytes, log OversizedPayloadException
  and do not send message.

Corresponding Akka Scala code:
https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/Endpoint.scala#L745
https://github.com/akka/akka/blob/master/akka-remote/src/test/scala/akka/remote/RemotingSpec.scala#L527
https://github.com/akka/akka/search?utf8=%E2%9C%93&q=maximumpayloadbytes
Aaronontheweb added a commit that referenced this issue Aug 17, 2015
Do not send if payload size > max allowed. Fixes #879
@Aaronontheweb Aaronontheweb removed the WIP label Aug 17, 2015
@Aaronontheweb Aaronontheweb modified the milestone: Akka.NET v1.1 Jun 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants