Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider eliding span.sync=false because it is the typical case #1881

Closed
trentm opened this issue Nov 18, 2020 · 13 comments
Closed

consider eliding span.sync=false because it is the typical case #1881

trentm opened this issue Nov 18, 2020 · 13 comments
Assignees
Labels
agent-nodejs Make available for APM Agents project planning.

Comments

@trentm
Copy link
Member

trentm commented Nov 18, 2020

tl;dr: Setting span.sync to true or false on every APM span results in unhelpful UI noise in Kibana. Let's only set it when it is "interesting". For Node.js the typical case is that spans are async (most traced work span event loop async operations). That leaves "span.sync==true" as the "interesting" case.

The span.sync field can be true, false, null or not set per the api-server spec.

Here is the Kibana code that adds a SyncBadge to spans in the Waterfall view and the code for the SyncBadge element. I.e. sync: true will showing "blocking" in the ui, sync: false will show "async", sync: null (or not specified) will show nothing. E.g.:

Screen Shot 2020-11-17 at 2 28 08 PM

From chat discussion (this came up when discussing #1878):

"Felix
I remember that we’ve talked about the usefulness of having all spans marked as sync=false in Node.js. Setting sync=false makes the UI add a async flag next to the span name. But as async is the default in Node.js, it would probably make more sense to not set the sync flag in the default async case but only set sync=true when there’s a blocking operation, which makes the UI show a blocking label. This is what the RUM agent is doing."

@trentm trentm added the agent-nodejs Make available for APM Agents project planning. label Nov 18, 2020
@trentm
Copy link
Member Author

trentm commented Nov 18, 2020

TODOs to consider:

  • fix this issue in the node.js APM agent
  • doc "span.sync" in apm.git specs

@smith
Copy link

smith commented Nov 18, 2020

I think the agent should be sending the most true information, so if a span is async the agent should be sending us that information. Whether or not it should be displayed in the UI is a UI issue. We can decide in the UI whether or not to show that data. IMO.

@trentm
Copy link
Member Author

trentm commented Nov 18, 2020

@smith Hi! Quoting @sqren from some earlier discussion:

"As you can see from that slack thread this decision was not communicated super well originally so agents have implemented this differently. But the intention is still for everyone to align on only setting span.sync to true or false when the async mode is different from the default mode of the language.

Søren Louv-Jansen 7 hours ago
So for node.js you’d never set it to false (because that’s the default mode of javascript). Instead you’d occasionally set it to true when that is the case"

What would be a good place to capture this intent? In the APM specs? In the APM UI docs?

@trentm trentm self-assigned this Nov 18, 2020
@smith
Copy link

smith commented Nov 18, 2020

Yeah so I'd disagree there but ok. "The default mode of the language" is ambiguous. If I'm using Twisted in Python or EventMachine in Ruby the "default mode" is async but the "default mode of the language" is sync.

@trentm
Copy link
Member Author

trentm commented Nov 18, 2020

"The default mode of the language" is ambiguous.

I agree with that. I'd change "default mode" to something like "what is typical". Speaking only for node.js, it is pretty easy to say "async is almost always the typical case". Less so for Python, as you point out. If every span is marked sync=true or false I don't know how the UI can reasonably determine if it notable enough to have a "blocking" or "async" badge -- which is why an intention for the APM Agent to decide whether to set span.sync=<bool> makes some sense to me.

@sorenlouv
Copy link
Member

Yeah so I'd disagree there but ok. "The default mode of the language" is ambiguous. If I'm using Twisted in Python or EventMachine in Ruby the "default mode" is async but the "default mode of the language" is sync.

Isn't that a good example of why we should leave it up to the agent to specify what is "typical" and what is not? Always specifying span.sync will require the UI to determine what is typical for every framework across every language. I trust the agent developers to have better domain knowledge than we do.

@astorm
Copy link
Contributor

astorm commented Nov 18, 2020

Great discussion so far all -- just coming in with my two cents:

Short Version: The Node agent's current intended behavior -- which is to start a span as sync=true and to mark it with sync=false as soon as we detect that the span has ended in an asynchronous context other than the one it started in (that's a mouthful! :)) -- feels like the correct behavior. This whole conversation kicked off because of a bug in this intended behavior -- once we close that bug I think we'll be good. @felixbarny -- this is probably another area where a cross agent spec on how agents report on threads vs. single process async vs. sync code would be a useful long term thing to have.

Long Version: I'm aligned with @smith in that it's an agent's job to represent The Truth of what's happened to the best of its ability. My experience elsewhere tells me that if we start seeing agent data as flags for particular pieces of the UI/UX we're eventually going to end up in a situation where our implementation is tied too heavily to one UI/UX and not to another. It's better to have the agent represent what actually happened to the best of it's ability.

Also -- I'd also lightly push back on the idea that Node's default behavior is async. It's true that the community of programmers around Node.js encourages asynchronous programming patterns. However by default every line of code in a Node.js program executes synchronously. It's not until a programmer taker proactive action (or the library they're using-blindly takes proactive active) to schedule something asynchronously that a span becomes async.

(waves hands like a dessert hermit yes, I know, worker threads complicate this a bit -- let's save that discussion for another day -- these are not the bleeding edge features you are looking for)

Also, regarding UI noise -- seeing async or sync next to every span may be noise to experienced javascript engineers working at a tech first company and programming in javascript/typescript for 95% of the day -- but I'm not sure that's our typical/ideal user. Folks are coming to these charts when there's problems. They're often not the engineers who wrote the code -- or when they are they're often developers who need to work in multiple different languages to keep their company's systems up and running. It seems like there's a value to being explicit about whether a span is considered async or not -- if only to help orient folks working with multiple-language environments.

@sorenlouv
Copy link
Member

It seems like there's a value to being explicit about whether a span is considered async or not

This is how the conversation started and the thinking was that it was valuable to display when it was "atypical" but noise to always show it. Particularly it becomes hard to spot the atypical. I still think that's the case but if we get feedback from users that they lack this information I'm good with always showing it.

@sorenlouv
Copy link
Member

sorenlouv commented Nov 18, 2020

if we start seeing agent data as flags for particular pieces of the UI/UX we're eventually going to end up in a situation where our implementation is tied too heavily to one UI/UX and not to another. It's better to have the agent represent what actually happened to the best of it's ability.

Makes sense. I'm interested to hear the other agent devs take on this so I'd appreciate if you can bring it up with them.

@sorenlouv
Copy link
Member

sorenlouv commented Aug 23, 2021

Reviving this old discussion. I have created elastic/kibana#109661 to solve this issue by hiding async badge from spans from the Node agent, however, the following makes me doubt if we are in agreement

I'd also lightly push back on the idea that Node's default behavior is async. It's true that the community of programmers around Node.js encourages asynchronous programming patterns. However by default every line of code in a Node.js program executes synchronously. It's not until a programmer taker proactive action (or the library they're using-blindly takes proactive active) to schedule something asynchronously that a span becomes async.

So the question is: are we happy with the current state (in which this issue and elastic/kibana#109661 can be closed immediately) or should we go ahead and implement elastic/kibana#109661 ?

@trentm
Copy link
Member Author

trentm commented Aug 23, 2021

[Alan]

Also -- I'd also lightly push back on the idea that Node's default behavior is async. ...

Alan, would you object to language that a "typical" Node.js traced span is async? I.e. we get away from using the term "default".


tl;dr: My votes are to (a) close this issue (i.e. have the Node.js APM agent report what it knows and not attempt to only report what it thinks is "interesting") and (b) continue with elastic/kibana#109661 (though I would say "when interesting", rather than "when relevant").

  • AFAICT the "sync" field isn't discussed in the APM agent specs at all. The Python async examples above show that it might be useful to at some point to have an cross-agent discussion clarifying span.sync. I don't think we need to get into it now, however. (FWIW, my naive read of the Python APM agent code is that span.sync is set False when asyncio is used, and otherwise is almost always left blank, i.e. almost never set True.)
  • I think the "async" badge in the UI on almost all spans in Node.js traces is visual clutter, so I think [APM] Only show span.sync badge when relevant kibana#109661 is a good idea.
  • Note that the "blocking" badge on all the ES spans in @sqren's screenshot in [APM] Only show span.sync badge when relevant kibana#109661 is a bug. Those ES spans are async. I hope to fix that in upcoming working.

@sorenlouv
Copy link
Member

sorenlouv commented Sep 2, 2021

One thing to mention:

Also, regarding UI noise -- seeing async or sync next to every span may be noise to experienced javascript engineers working at a tech first company and programming in javascript/typescript for 95% of the day -- but I'm not sure that's our typical/ideal user.

The intention is to still show async/blocking badge unconditionally in the flyout (when clicking a span or transaction). So for people in doubt they can always get the information there.
However in the waterfall/timeline we have different things that compete for space and in that context the async/blocking badge is currently taking up too much space.
Another option is to hide the badge entirely on the waterfall and only show it in the flyout. That would certainly be easier but then we definitely won't be calling out "atypical" items.

@trentm
Copy link
Member Author

trentm commented Sep 2, 2021

My votes are to (a) close this issue ...

Closing. Alan had no objections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

No branches or pull requests

4 participants