-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Emit telemetry events for caching tracing #106
WIP: Emit telemetry events for caching tracing #106
Conversation
Since telemetry is optional we'll need to wrap our actual calls to telemetry in a dummy implementation. Looking for advice on how you'd like that done and then I'm happy to make that change. |
Hi @SophisticaSean , first of all, thanks a lot for your work, I can see it would be really helpful to have also the latency for the cache operations somehow. Let me check the PR, and also come up with some ideas, see what is the best way to approach this, will try to give you my feedback soon. |
Hey @SophisticaSean! I was checking out the PR, and as you mentioned, the problem is the adapter would depend on |
awesome. I'll take a look and adjust if it works! |
@cabol So yes, hooks would work for me just fine. The reason for the PR is not to get Nebulex telemetry events just for me but to allow for other people and consumers of Nebulex to leverage these same telemetry events without having to spin up a large amount of custom hooks. I'm following in the example of Ecto and Phoenix (which accomplish this by both emitting telemetry events internally like this implementation, and also providing configurable logging backends). I think its important for us to build opt-in tooling directly into very useful libraries like this so we all can benefit from my work on this. After all this is why Telemetry was created, for exactly these scenarios and some of the largest and most used elixir Repos are following this pattern. The telemetry events and pluggable loggers in both Ecto and Phoenix are why libraries like https://github.com/spandex-project/spandex_ecto and https://github.com/spandex-project/spandex_phoenix are possible. Please let me know what you think! I'm happy to refactor this anyway you see fit to ensure backwards compatibility and internal consistency but my reason for the PR is not to just solve my problem but to help push this great library forward! |
Hey @SophisticaSean, I agree with you, I've been thinking about his, and it would be good to have these metrics built-in as you propose, I'm thinking the best way to do it, maybe at cache level to be able to use them in all adapters, not only the local one. Let me think about it and get back to you soon, thanks 👍 !! |
Heck yeah man! Just let me know, I'm happy to do the work. :) |
@SophisticaSean I've been thinking about this and also the best and fastest way to do it, at least in a way it is optional and no that intrusive to avoid too many changes. What I'd suggest doing right now is to provide a build-in a hook module encapsulating the tracing logic (e.g.: defmodule MyApp.Cache do
use Nebulex.Hook
@decorate_all before(&Nebulex.Hook.Tracing.emit_telemetry_event/1)
use Nebulex.Cache,
otp_app: :may_app,
adapter: Nebulex.Adapters.Local
end Open to other names for the hook module and function, but overall that would be the idea, then we can add docs and examples of usage, but in this way, it would be very easy to use I think. Let me know your thoughts about it and also if you would like to do it, stay tuned! |
@cabol I think that's a solid first pass but the issue is then granularity. I want to know what is taking time in nebulex, not what is taking time in my application. I already have metrics and spans around my calls to nebulex, but I don't have those from within nebulex. The comparison is as follows: The same can be said for nebulex here: |
And again Hooks are not the way the the major libraries are doing this. Ecto and Phoenix do not make telemetry an optional dependency. Telemetry is the future and maybe we should just commit to it being a dependency and adding the events where they need to be in all of the adapters. Maybe its time for a major version upgrade and this is the major change in that version update? I'm happy to do this work. |
merge upstream nebulex
@cabol any ideas? I'm super willing and happy to do literally all the work, I just need your go-ahead on how you want it to look. <3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SophisticaSean I like the way you addressed the Telemetry in the adapter, which will be loaded/used if :telemetry
is available. I added some comments, but overall for this local adapter looks good. My suggestions and/or next steps:
- Please send a first smaller PR only instrumenting the local adapter.
- Then one PR per adapter adding the Telemetry instrumentation (3 more PRs: partitioned, replicated, and multi-lever adapters).
- I'm not sure if I'd like to add Telemetry all over the code, like to the RPC, caching decorators, etc., so this is something we can discuss, but I think at least the main need/goal of getting instrumented the adapters will be covered.
I see you have done a lot of work here 😄 , thanks a lot for your contribution! Looking forward for the next PRs!
|> update_stats(:get, ref) | ||
log_telemetry_start(:get) | ||
|
||
result = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the result =
since it is not being used, and avoid breaking the pipe flow, don't you think?
|
||
if Code.ensure_loaded?(:telemetry) do | ||
defp log_telemetry_start(action) do | ||
start_time = System.os_time(:nanosecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to use System.system_time()
? I guess by default is nanoseconds, anyway, you can also pass the unit if it is required.
start_time = System.os_time(:nanosecond) | ||
|
||
:telemetry.execute( | ||
[:nebulex, :start], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, by default, the telemetry prefix should be [:nebulex, :cache]
, so the events would be [:nebulex, :cache, :start]
, and in the metadata I would like to have:
- The cache name.
- The action as you have already.
- Maybe the adapter?
All those values you can get them from the adapter metadata.
:telemetry.execute( | ||
[:nebulex, :end], | ||
%{completion_time: completion_time}, | ||
%{action: action, adapter: :local} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same previous comment.
@SophisticaSean I'm settle down some foundations to support the Telemetry events in the adapters, similar to Ecto. And I was checking out telemetry and I found there is a function span/3 to actually emit telemetry events when a function starts and when it ends with the duration time. I'm wondering if that covers your needs, because I see in the |
Hey @SophisticaSean, quick update, I've pushed to master the changes to instrument the local adapter with Telemetry events. This is the ISSUE I created for this, also you will find the related changes. Please take a look at it and let me know if that covers your needs. Also, I will create more issues for instrumenting the rest of the adapters. Let me know your thoughts, stay tuned! |
a045984
to
6239874
Compare
@SophisticaSean I published release 2.1.0, which has the Telemetry events needed for tracing, please take a look at the docs, also try it out and let me know if that works for you. Thanks, stay tuned! |
f70c5c3
to
1c4067f
Compare
Hey 👋 !! I will close this PR because tracing is now possible with Nebulex (since v2.1.0). Feel free to open an issue or a new PR to improve current features regarding Telemetry events. Thanks 👍 !! |
Ended up swapping us over to use your new events, thanks for making them and it works well. Thank you! |
I'm very large fan of tracing and also of nebulex. Here I've gone through a rough draft that works in production for me to produce traces like this (none of the light blue/ice blue spans are possible without these changes):
Would like feedback on the current implementation. I also need to add example telemetry consumers for people to use if they want to utilize these events.
Most inspiration for the telemetry emit and consume tracing came from spandex_ecto