APM summit #58

Closed
watson opened this issue Jun 12, 2016 · 20 comments

@watson
Member

watson commented Jun 12, 2016

APM stands for Application Performance Management.

We had a really good tracing/APM session at NodeConf Adventure two days ago with many of the APM vendors represented (NodeSource, Dynatrace, AppNeta and Opbeat).

There seemed to be a general agreement that we would all benefit from working closer together. A first step in this process would be to arrange an APM summit and meet up in person. Kind of like the error summit held this January.

It would be most beneficial if we could narrow the scope of the summit as much as possible. I'd like the first item on the agenda to be laying out a roadmap of what we would like to achieve, but please pitch in below.

I suggest that we have the summit at NodeSummit in San Francisco on July 25th (the day before the conference starts). I've heard they have an extra meeting room that we might be able to borrow (I'll follow up with more details).

Here are some of the notes from the NodeConf Adventure session:

  • APM is hard to get right in Node.js (lots of monkey patching, lots of edge cases, lots of unsolved issues)
  • Callback queues in user-land modules are especially hard (think generic-pool)
  • Maybe we should formalise a generic tracing protocol for user-land modules to use if they want to be easily traceable
  • Everyone keeps reinventing the wheel
  • Part of the problem space we deal with every day might be better solved in Node core
  • A good first step would be to create a roadmap of what we as a group want to achieve by working together
  • Having regular in-person summits would be of great value and would help speed things up (this is how TC39 gets all their stuff done)
  • The foundation might be able to help pay to get key people to attend who can't get their employer to sponsor
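To make the monkey-patching point concrete: an agent typically replaces a module's exported functions with wrappers that record timing and then call the original. The sketch below assumes plain, mutable CommonJS-style exports; `wrapMethod` and the `db` object are hypothetical names for illustration, not any vendor's actual code.

```javascript
'use strict';

// Hypothetical helper: replace obj[name] with a wrapper that times each call,
// keeping a reference to the original so the patch can be undone.
function wrapMethod(obj, name, onTiming) {
  const original = obj[name];
  if (typeof original !== 'function') {
    throw new TypeError(`${name} is not a function`);
  }
  function wrapped(...args) {
    const start = process.hrtime.bigint();
    try {
      return original.apply(this, args);
    } finally {
      // Only synchronous time is measured here; timing async work means also
      // wrapping the callback or the returned promise.
      onTiming(name, Number(process.hrtime.bigint() - start) / 1e6);
    }
  }
  obj[name] = wrapped;
  return function unwrap() {
    obj[name] = original;
  };
}

// Hypothetical stand-in for a user-land module's exports.
const db = {
  query(sql) {
    return `result of ${sql}`;
  },
};

const timings = [];
const unwrap = wrapMethod(db, 'query', (name, ms) => timings.push({ name, ms }));
console.log(db.query('SELECT 1')); // prints "result of SELECT 1"
unwrap();
```

Even this small wrapper only covers synchronous work; callbacks, promises and queued work each need their own handling, which is where many of the edge cases listed above come from.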

I most likely forgot some of what we discussed, so please add your comments below. In fact, all comments are highly appreciated 😃

Action needed:

Please fill in this Doodle if you want to attend the APM Summit and mark the dates / locations you are able to attend: http://doodle.com/poll/utqxycqki8chyddd

/cc @othiym23 @brycebaril @danielkhan @groundwater @Qard @dshaw

@AndreasMadsen
Member

What does the acronym APM mean? "Asynchronous Programming Model" doesn't make sense to me in this context.

@ofrobots
Contributor

@watson Thanks for posting this. An APM summit would be great! Please count me and @matthewloring in as well. It would be great if we could formalize a generic APM tracing protocol.

Insofar as context loss due to user-land queuing is concerned, I started writing this simple module that could be a good starting point: https://github.com/ofrobots/context-is-everything. The basic idea is that this could be a central protocol that both context observers (APM modules, continuation-local-storage, etc.) and user-land queuing modules (mongodb, mysql, redis, grpc, etc.) could sign up to in order to propagate async context. Here's an example patch showing how continuation-local-storage could work with this module: ofrobots/node-continuation-local-storage@8fca413.
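The underlying problem is that a queuing module stores a callback in one async context and invokes it later from another, so a context observer sees the wrong context. Node.js later shipped `AsyncLocalStorage` and `AsyncResource.bind` in `async_hooks` (v12.19 / v14.8 or later), which follow roughly the idea described above: the queuing module captures the context at enqueue time and restores it at dequeue time. This is a minimal sketch under that assumption; it is not the `context-is-everything` API, and `TinyPool` is a hypothetical stand-in for a module like generic-pool.

```javascript
'use strict';
const { AsyncLocalStorage, AsyncResource } = require('async_hooks');

// The "context observer" side, e.g. what an APM agent would use to track
// which request is currently being served.
const requestContext = new AsyncLocalStorage();

// Hypothetical user-land queue: callers wait for a single shared resource.
class TinyPool {
  constructor({ propagateContext }) {
    this.propagateContext = propagateContext;
    this.waiting = [];
    this.busy = false;
  }

  acquire(cb) {
    // bind() ties the callback to the async context active at enqueue time.
    const queued = this.propagateContext ? AsyncResource.bind(cb) : cb;
    if (this.busy) {
      this.waiting.push(queued);
    } else {
      this.busy = true;
      queued();
    }
  }

  release() {
    const next = this.waiting.shift();
    if (next) next(); // an unbound callback runs in release()'s context
    else this.busy = false;
  }
}

function demo(propagateContext) {
  const pool = new TinyPool({ propagateContext });
  const seen = [];
  requestContext.run('request-A', () => {
    pool.acquire(() => seen.push(requestContext.getStore()));
  });
  requestContext.run('request-B', () => {
    pool.acquire(() => seen.push(requestContext.getStore()));
  });
  // Request A releases the resource; B's queued callback runs from A's context.
  requestContext.run('request-A', () => pool.release());
  return seen;
}

console.log(demo(false)); // [ 'request-A', 'request-A' ]  (B's context is lost)
console.log(demo(true));  // [ 'request-A', 'request-B' ]
```

The design choice is that only the queuing module knows where a callback is parked, so it is the one place that can capture context cheaply; observers then need no knowledge of each queue.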

@AndreasMadsen APM stands for Application Performance Management.

@hmdhk

hmdhk commented Jun 13, 2016

Zone.js provides similar functionality, but at the Node.js API level.
It is far from comprehensive, but its new API is interesting.

@mhdawson
Member

Sounds good to me as well. @tobespc and @mchamberlain FYI

@danielkhan
Contributor

Thanks for putting all that together @watson - I'm obviously in as well.

@rvagg
Member

rvagg commented Jun 14, 2016

IIRC, ES Modules present some important challenges for APM, mainly because their static nature conflicts with the standard APM monkey-patching pattern. That should probably be on the agenda, unless I'm misremembering the details. It's really important that APM is part of the Modules discussion so we can move forward without leaving a massive chunk of our tooling ecosystem behind.
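The conflict can be seen directly: a CommonJS exports object is an ordinary mutable object that an agent can patch in place, whereas an ES module namespace object rejects assignment to its exports, since only the exporting module can change its own bindings. A small check, assuming a Node.js version that supports `import()` from CommonJS; `node:path` is used only as a convenient module to import, and `cjsExports` is a hypothetical plain object standing in for a CommonJS module.

```javascript
'use strict';

// CommonJS-style exports: an ordinary object, so classic monkey patching works.
const cjsExports = { join: (...parts) => parts.join('/') };
const cjsPatched = Reflect.set(cjsExports, 'join', () => 'patched');

// ES modules: the namespace object's [[Set]] always fails, so the same patch
// is rejected. Reflect.set reports this as `false` instead of throwing.
async function tryPatchNamespace() {
  const ns = await import('node:path');
  const patched = Reflect.set(ns, 'join', () => 'patched');
  return { patched, stillOriginal: ns.join('a', 'b') !== 'patched' };
}

tryPatchNamespace().then((esm) => {
  console.log({ cjsPatched, esmPatched: esm.patched }); // { cjsPatched: true, esmPatched: false }
});
```

This is why instrumenting ES module exports needs a different mechanism (for example, intercepting module loading) rather than reassigning properties after the fact.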

@Qard
Member

Qard commented Jun 14, 2016

Indeed. @bmeck has already contacted some of us about ES6 module concerns. It'd be good for us to find a solution together face-to-face.

@joshgav
Contributor

joshgav commented Jun 15, 2016

Thanks @watson, I'd certainly like to meet you all F2F 👍 /cc @avanderhoorn @nikmd23

> A first step in this process would be to arrange an APM summit and meet up in person.

It could also help to have some open discussions in this repo now. Some topics from this thread that could become their own issues:

  • monkey-patching and ES6 module semantics;
  • protocol for traces (e.g. name, level, object with expected props);
  • F2F meetings

> Maybe we should formalise a generic tracing protocol for user-land modules to use if they want to be easily traceable.

I didn't specify what the payload objects would look like, but I was prototyping an architecture and API for this in #50 and joshgav/node-trace.
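For the idea of a generic tracing protocol for user-land modules, Node.js later added `diagnostics_channel`, a built-in publish/subscribe API: a library publishes to a named channel and APM tools subscribe to it, with no monkey patching. A minimal sketch, assuming Node.js 16 or later; the channel name `mylib:query`, `runQuery`, and the message fields are hypothetical, since the payload shape is exactly what this thread leaves open.

```javascript
'use strict';
const diagnostics_channel = require('diagnostics_channel');

// Library side: a hypothetical database client publishes one message per query.
const queryChannel = diagnostics_channel.channel('mylib:query');

function runQuery(sql) {
  const start = process.hrtime.bigint();
  const result = `rows for ${sql}`; // stand-in for real work
  // hasSubscribers keeps the cost near zero when nobody is listening.
  if (queryChannel.hasSubscribers) {
    queryChannel.publish({
      sql,
      durationMs: Number(process.hrtime.bigint() - start) / 1e6,
    });
  }
  return result;
}

// APM side: look the channel up by name; publish() calls subscribers synchronously.
const traces = [];
function onQuery(message, name) {
  traces.push({ channel: name, sql: message.sql });
}
diagnostics_channel.channel('mylib:query').subscribe(onQuery);

runQuery('SELECT 1');
console.log(traces); // [ { channel: 'mylib:query', sql: 'SELECT 1' } ]
```

The library and the agent only share a channel name and a payload convention, which is the part a common protocol would still need to standardize.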

A few more places we might start from:

Also see #53 for the work @matthewloring and @ofrobots are doing.

> Part of the problem space we deal with every day might be better solved in Node core.

Here is my module referenced above, integrated into core: joshgav/node/trace-event-integration.

It seems that putting a trace system in core will be necessary to enable data collection without requiring developers to explicitly opt in (e.g. by importing a module).

@danielkhan
Contributor

In addition to the tracing facilities needed, I think we should also define metrics like event loop timings that should be provided via a potential API.

@mcollina
Member

👍 for providing internal APIs for event loop timing. I'm currently using http://npm.im/loopbench, which is far from ideal.
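Node.js later gained an API of this kind: `perf_hooks.monitorEventLoopDelay()` (Node.js 11.10 or later) samples event-loop delay from a native timer into a histogram, so no extra JavaScript callbacks are scheduled for measurement. A sketch under that assumption; `sampleLoopDelay`, the 10 ms resolution, and the 50 ms busy-wait are illustrative choices, not recommendations.

```javascript
'use strict';
const { monitorEventLoopDelay } = require('perf_hooks');

// Busy-wait to simulate synchronous work that blocks the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Sample loop delay for `durationMs`, then report summary statistics in ms.
// The histogram records values in nanoseconds.
function sampleLoopDelay(durationMs, onDone) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  setTimeout(() => blockFor(50), 20); // one deliberate stall
  setTimeout(() => {
    histogram.disable();
    const toMs = (ns) => ns / 1e6;
    onDone({
      min: toMs(histogram.min),
      max: toMs(histogram.max),
      mean: toMs(histogram.mean),
      p99: toMs(histogram.percentile(99)),
    });
  }, durationMs);
}

sampleLoopDelay(200, (stats) => console.log(stats));
```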

@megastef

megastef commented Jun 20, 2016

+1 for the API to get pre-aggregated metrics from node core and solve various problems for each metric type:

  • GC time, GC runs, avg. released memory per GC type - please see Add GC Insights node#4496 - compiling native packages is often a problem for Windows users ...
  • EventLoop latency - most solutions frequently inject events into the event loop and measure when each event is handled. But this actually puts many more events on the event loop than usual ...
  • HTTP stats for clients (e.g. accessing APIs) and servers - typically a monkey-patching adventure ...

Instead of handling all kinds of events (like GC) and running our own aggregations in user land, it might be much more efficient if stats were collected in Node core and emitted at a defined interval, e.g. every 10 seconds or once a minute. For example, listening to each GC event (as we do today) triggers the metrics-collecting function many times, while an internal function could just update internal counters/arrays and emit the event once in a while.
For example:
process.on('stats', statsListener)
resulting in an object that provides the most relevant key metrics, like this:

{
  gc: {
    full_cycles: {
      duration: 200,
      count: 4,
      releasedMemory: 1024
    },
    sc_cycles: {
      duration: 200,
      count: 4,
      releasedMemory: 1024
    }
  },
  eventloop_latency: {
    min: 0.001,
    max: 10,
    avg: 2
  },
  http_server: {
    requests: 10,
    rx: 1200,
    tx: 500,
    response_time: {
      min: ...,
      max: ...,
      avg: ...
    },
    status: {
      '2xx': 196,
      '3xx': 1,
      '4xx': 2,
      '5xx': 1
    }
  },
  http_client: {
    ...
  },
  udp_stats: {},
  tcp_stats: {},
  fs_stats: {}
}

All values should be reset after emitting the 'stats' event. I've often seen APIs that just count up, so agents collecting this data have to keep the last value and calculate the difference from the current value, another waste of CPU cycles ...
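The aggregate-then-reset semantics proposed here can be sketched in user land. Everything below is hypothetical: `StatsAggregator` stands in for the proposed core `process.on('stats')` event, and only the HTTP server part of the object above is modeled. Hot paths only bump counters; one snapshot is emitted per interval and the counters are reset.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical stand-in for the proposed 'stats' event.
class StatsAggregator extends EventEmitter {
  constructor() {
    super();
    this.reset();
  }

  reset() {
    this.http = {
      requests: 0,
      responseTime: { min: Infinity, max: 0, total: 0 },
      status: { '2xx': 0, '3xx': 0, '4xx': 0, '5xx': 0 },
    };
  }

  // Cheap per-request bookkeeping; no listener runs here.
  recordHttpResponse(statusCode, durationMs) {
    const h = this.http;
    h.requests += 1;
    h.responseTime.min = Math.min(h.responseTime.min, durationMs);
    h.responseTime.max = Math.max(h.responseTime.max, durationMs);
    h.responseTime.total += durationMs;
    const bucket = `${Math.floor(statusCode / 100)}xx`;
    if (bucket in h.status) h.status[bucket] += 1;
  }

  // Emit one snapshot and reset, so consumers get per-interval values and
  // never have to diff against the previous cumulative reading.
  flush() {
    const h = this.http;
    const { min, max, total } = h.responseTime;
    const snapshot = {
      http_server: {
        requests: h.requests,
        response_time: {
          min: h.requests ? min : 0,
          max,
          avg: h.requests ? total / h.requests : 0,
        },
        status: { ...h.status },
      },
    };
    this.reset();
    this.emit('stats', snapshot);
    return snapshot;
  }

  start(intervalMs) {
    const timer = setInterval(() => this.flush(), intervalMs);
    timer.unref(); // don't keep the process alive just for metrics
    return () => clearInterval(timer);
  }
}

const stats = new StatsAggregator();
stats.on('stats', (s) => console.log(JSON.stringify(s.http_server)));
stats.recordHttpResponse(200, 12);
stats.recordHttpResponse(404, 3);
stats.recordHttpResponse(200, 30);
stats.flush();
// {"requests":3,"response_time":{"min":3,"max":30,"avg":15},"status":{"2xx":2,"3xx":0,"4xx":1,"5xx":0}}
```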

BTW, a while ago I wrote an article about Node.js metrics; I hope it is helpful for the discussion: https://sematext.com/blog/2015/12/02/top-nodejs-metrics-to-watch/

@danielkhan
Contributor

With one month to go until Node Summit and flight bookings coming up, I think we should announce the APM Summit and set a time and date (25th or 26th of July).

After the date has been set:
How do we get the message out to all vendors in the space? Could some neutral party, like @othiym23 or @brycebaril, take care of letting the right people know?

@Qard
Member

Qard commented Jun 23, 2016

AppNeta seems to be unwilling to send me down for this. 😞

@yunong
Member

yunong commented Jul 5, 2016

Please sign me up for this summit. We've been working on USDT support for perf and eBPF on Linux, and would also like to discuss how we could integrate this more tightly into restify.

@jkrems
Contributor

jkrems commented Jul 5, 2016

(For people like me who had to google what USDT stands for: http://www.brendangregg.com/blog/2015-07-03/hacking-linux-usdt-ftrace.html)

@yunong
Member

yunong commented Jul 5, 2016

For additional details on USDT, see issue #61, which @brendangregg has filed.

@brendangregg

Thanks @yunong; that's the Linux perf_events work, which is all mainline.

There's also the Linux bcc/BPF work, where the BPF is mainline and bcc is a python add-on. @goldshtn wrote a post showing initial Node.js USDT support here:

http://blogs.microsoft.co.il/sasha/2016/03/30/usdt-probe-support-in-bpfbcc/

@watson
Member Author

watson commented Jul 14, 2016

Important update: The APM Summit was on the agenda at yesterday's Node.js Diagnostics Working Group meeting, and it was decided not to hold it at NodeSummit this month.

If you're interested, you can watch the APM segment from the meeting on YouTube or read the minutes.

Action needed:

Please fill in this Doodle if you want to attend the APM Summit and mark the dates / locations you are able to attend: http://doodle.com/poll/utqxycqki8chyddd

@joshgav
Contributor

joshgav commented Sep 13, 2016

Opened a continuation of this as a proposal for the Austin collaboration summit: openjs-foundation/summit#30

@joshgav
Contributor

joshgav commented Sep 21, 2016

Closing in deference to Austin Summit thread, @watson - please re-open if you'd like. Thanks!

@joshgav joshgav closed this as completed Sep 21, 2016