Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Emit telemetry events for caching tracing #106

Conversation

SophisticaSean
Copy link

@SophisticaSean SophisticaSean commented Mar 15, 2021

I'm very large fan of tracing and also of nebulex. Here I've gone through a rough draft that works in production for me to produce traces like this (none of the light blue/ice blue spans are possible without these changes):
image

Would like feedback on the current implementation. I also need to add example telemetry consumers for people to use if they want to utilize these events.

Most inspiration for the telemetry emit and consume tracing came from spandex_ecto

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 99.726% when pulling ba44322 on SophisticaSean:sophisticasean/support-before-after-blocks-in-decorator into 323bd76 on cabol:master.

@SophisticaSean
Copy link
Author

Since telemetry is optional we'll need to wrap our actual calls to telemetry in a dummy implementation. Looking for advice on how you'd like that done and then I'm happy to make that change.

@cabol
Copy link
Owner

cabol commented Mar 21, 2021

Hi @SophisticaSean , first of all, thanks a lot for your work, I can see it would be really helpful to have also the latency for the cache operations somehow. Let me check the PR, and also come up with some ideas, see what is the best way to approach this, will try to give you my feedback soon.

@cabol
Copy link
Owner

cabol commented Mar 21, 2021

Hey @SophisticaSean! I was checking out the PR, and as you mentioned, the problem is the adapter would depend on :telemetry. There is another option, and I personally think it could be even easier, and it is by using hooks, check this out: tracking example by using hooks – the example actually does exactly what you want, but instead of logging, you can just add the telemetry events you want. Please take a look at it and let me know your thoughts.

@SophisticaSean
Copy link
Author

Hey @SophisticaSean! I was checking out the PR, and as you mentioned, the problem is the adapter would depend on :telemetry. There is another option, and I personally think it could be even easier, and it is by using hooks, check this out: tracking example by using hooks – the example actually does exactly what you want, but instead of logging, you can just add the telemetry events you want. Please take a look at it and let me know your thoughts.

awesome. I'll take a look and adjust if it works!

@SophisticaSean
Copy link
Author

SophisticaSean commented Mar 27, 2021

@cabol So yes, hooks would work for me just fine. The reason for the PR is not to get Nebulex telemetry events just for me but to allow for other people and consumers of Nebulex to leverage these same telemetry events without having to spin up a large amount of custom hooks.

I'm following in the example of Ecto and Phoenix (which accomplish this by both emitting telemetry events internally like this implementation, and also providing configurable logging backends).

I think its important for us to build opt-in tooling directly into very useful libraries like this so we all can benefit from my work on this. After all this is why Telemetry was created, for exactly these scenarios and some of the largest and most used elixir Repos are following this pattern. The telemetry events and pluggable loggers in both Ecto and Phoenix are why libraries like https://github.com/spandex-project/spandex_ecto and https://github.com/spandex-project/spandex_phoenix are possible.

Please let me know what you think! I'm happy to refactor this anyway you see fit to ensure backwards compatibility and internal consistency but my reason for the PR is not to just solve my problem but to help push this great library forward!

@cabol
Copy link
Owner

cabol commented Mar 27, 2021

Hey @SophisticaSean, I agree with you, I've been thinking about his, and it would be good to have these metrics built-in as you propose, I'm thinking the best way to do it, maybe at cache level to be able to use them in all adapters, not only the local one. Let me think about it and get back to you soon, thanks 👍 !!

@SophisticaSean
Copy link
Author

Hey @SophisticaSean, I agree with you, I've been thinking about his, and it would be good to have these metrics built-in as you propose, I'm thinking the best way to do it, maybe at cache level to be able to use them in all adapters, not only the local one. Let me think about it and get back to you soon, thanks 👍 !!

Heck yeah man! Just let me know, I'm happy to do the work. :)

@cabol
Copy link
Owner

cabol commented Apr 2, 2021

@SophisticaSean I've been thinking about this and also the best and fastest way to do it, at least in a way it is optional and no that intrusive to avoid too many changes. What I'd suggest doing right now is to provide a build-in a hook module encapsulating the tracing logic (e.g.: Nebulex.Hook.Tracing.emit_telemetry_event/1), so at least from a user standpoint would be only a matter to declare it, for example:

defmodule MyApp.Cache do
  use Nebulex.Hook
  @decorate_all before(&Nebulex.Hook.Tracing.emit_telemetry_event/1)
  
  use Nebulex.Cache,
    otp_app: :may_app,
    adapter: Nebulex.Adapters.Local
end

Open to other names for the hook module and function, but overall that would be the idea, then we can add docs and examples of usage, but in this way, it would be very easy to use I think. Let me know your thoughts about it and also if you would like to do it, stay tuned!

@SophisticaSean
Copy link
Author

SophisticaSean commented Apr 5, 2021

@cabol I think that's a solid first pass but the issue is then granularity.

I want to know what is taking time in nebulex, not what is taking time in my application. I already have metrics and spans around my calls to nebulex, but I don't have those from within nebulex.

The comparison is as follows:
I can see how long the overall ecto interaction takes here:
image
but without the Ecto telemetry I wouldn't know how long it spent in the ecto queue, how long the query itself took (run portion) or how long decoding took (little sliver at the end).

The same can be said for nebulex here:
image
Using hooks, I believe I would lose the more granular information about what is happening inside the adapter. This is very important to know because I can begin to understand or try to diagnose performance problems at specific parts of the adapter. Otherwise I'd just have that top light blue span telling me how much time overall all of nebulex took which is not very useful.

@SophisticaSean
Copy link
Author

And again Hooks are not the way the the major libraries are doing this. Ecto and Phoenix do not make telemetry an optional dependency. Telemetry is the future and maybe we should just commit to it being a dependency and adding the events where they need to be in all of the adapters. Maybe its time for a major version upgrade and this is the major change in that version update?

I'm happy to do this work.

merge upstream nebulex
@SophisticaSean
Copy link
Author

@cabol any ideas? I'm super willing and happy to do literally all the work, I just need your go-ahead on how you want it to look. <3

Copy link
Owner

@cabol cabol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SophisticaSean I like the way you addressed the Telemetry in the adapter, which will be loaded/used if :telemetry is available. I added some comments, but overall for this local adapter looks good. My suggestions and/or next steps:

  • Please send a first smaller PR only instrumenting the local adapter.
  • Then one PR per adapter adding the Telemetry instrumentation (3 more PRs: partitioned, replicated, and multi-lever adapters).
  • I'm not sure if I'd like to add Telemetry all over the code, like to the RPC, caching decorators, etc., so this is something we can discuss, but I think at least the main need/goal of getting instrumented the adapters will be covered.

I see you have done a lot of work here 😄 , thanks a lot for your contribution! Looking forward for the next PRs!

|> update_stats(:get, ref)
log_telemetry_start(:get)

result =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the result = since it is not being used, and avoid breaking the pipe flow, don't you think?


if Code.ensure_loaded?(:telemetry) do
defp log_telemetry_start(action) do
start_time = System.os_time(:nanosecond)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to use System.system_time()? I guess by default is nanoseconds, anyway, you can also pass the unit if it is required.

start_time = System.os_time(:nanosecond)

:telemetry.execute(
[:nebulex, :start],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, by default, the telemetry prefix should be [:nebulex, :cache], so the events would be [:nebulex, :cache, :start], and in the metadata I would like to have:

  • The cache name.
  • The action as you have already.
  • Maybe the adapter?

All those values you can get them from the adapter metadata.

:telemetry.execute(
[:nebulex, :end],
%{completion_time: completion_time},
%{action: action, adapter: :local}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same previous comment.

@cabol
Copy link
Owner

cabol commented Apr 25, 2021

@SophisticaSean I'm settle down some foundations to support the Telemetry events in the adapters, similar to Ecto. And I was checking out telemetry and I found there is a function span/3 to actually emit telemetry events when a function starts and when it ends with the duration time. I'm wondering if that covers your needs, because I see in the :start event it sends the syatem_time, but for the :stop it sets the duration as measurement, not the timestamp, not sure if that works for you. Because I'd like to reuse as much as possible from Telemetry. Please take a look at it and let me know, stay tuned!

@cabol
Copy link
Owner

cabol commented Apr 25, 2021

Hey @SophisticaSean, quick update, I've pushed to master the changes to instrument the local adapter with Telemetry events. This is the ISSUE I created for this, also you will find the related changes. Please take a look at it and let me know if that covers your needs. Also, I will create more issues for instrumenting the rest of the adapters. Let me know your thoughts, stay tuned!

@cabol cabol force-pushed the master branch 5 times, most recently from a045984 to 6239874 Compare May 15, 2021 14:30
@cabol
Copy link
Owner

cabol commented May 17, 2021

@SophisticaSean I published release 2.1.0, which has the Telemetry events needed for tracing, please take a look at the docs, also try it out and let me know if that works for you. Thanks, stay tuned!

@cabol cabol force-pushed the master branch 2 times, most recently from f70c5c3 to 1c4067f Compare June 19, 2021 15:29
@cabol
Copy link
Owner

cabol commented Jul 3, 2021

Hey 👋 !! I will close this PR because tracing is now possible with Nebulex (since v2.1.0). Feel free to open an issue or a new PR to improve current features regarding Telemetry events. Thanks 👍 !!

@SophisticaSean
Copy link
Author

Ended up swapping us over to use your new events, thanks for making them and it works well. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants